Sunday, 8 April 2018

PSOD

In your answer, highlight the important step: capturing the log file information after the PSOD (Purple Screen of Death) has occurred.
To get the log, extract it from the vmkernel-zdump file using a command-line utility on the ESX or ESXi host. The utility differs between versions of ESX and ESXi.
  • For ESXi 3.5, ESXi/ESX 4.x and ESXi 5.x, use the esxcfg-dumppart utility:
    # esxcfg-dumppart -L vmkernel-zdump-filename
To extract the log file from a vmkernel-zdump file:
  1. Find the vmkernel-zdump file in the /root/ or /var/core/ directory:
    # ls /root/vmkernel* /var/core/vmkernel*
    /var/core/vmkernel-zdump-073108.09.16.1
  2. Use the vmkdump or esxcfg-dumppart utility to extract the log. For example:
    # vmkdump -l /var/core/vmkernel-zdump-073108.09.16.1
    created file vmkernel-log.1
    # esxcfg-dumppart -L /var/core/vmkernel-zdump-073108.09.16.1
    created file vmkernel-log.1
  3. The vmkernel-log.1 file is plain text, though it may start with null characters. Focus on the end of the log, which is similar to:
VMware ESX Server [Releasebuild-98103]
PCPU 1 locked up. Failed to ack TLB invalidate.
frame=0x3a37d98 ip=0x625e94 cr2=0x0 cr3=0x40c66000 cr4=0x16c
es=0xffffffff ds=0xffffffff fs=0xffffffff gs=0xffffffff
eax=0xffffffff ebx=0xffffffff ecx=0xffffffff edx=0xffffffff
Note: The file name created for the log in this example is vmkernel-log.1. If another file with the same name already exists, the new file is created with the number suffix incremented.
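Because the extracted log may begin with null characters and the interesting part is at the end, it can help to strip the nulls and print only the last lines. The following is a minimal sketch, not a VMware utility: show_log_tail is a hypothetical helper name, and it assumes that tr and tail are available in the shell where you read the log (if they are not, copy the file to a workstation first).

```shell
#!/bin/sh
# show_log_tail (hypothetical helper): print the last N lines of an
# extracted vmkernel log with NUL bytes removed.
#   $1 = path to the extracted log, e.g. vmkernel-log.1
#   $2 = number of lines to show (optional, defaults to 20)
show_log_tail() {
    log_file=$1
    line_count=${2:-20}

    # Fail early with a message if the log cannot be read.
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi

    # Delete NUL bytes so the padding at the start of the file does not
    # confuse text tools, then keep only the end of the log.
    tr -d '\000' < "$log_file" | tail -n "$line_count"
}
```

For example, show_log_tail vmkernel-log.1 40 prints the last 40 lines of the log created in step 2, which is where the lockup message and register dump appear.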

Most of the time the cause is a hardware issue, and you need to open a case with the hardware vendor, in this case HP. Based on the findings, you replace the faulty hardware or upgrade the firmware as the vendor suggests, following the ITIL Change Management process.
In some cases the problem lies with software installed on the ESXi server, such as additional agents for software and hardware monitoring, or additional VIBs added for storage.

Finally, if you want to become expert enough to analyze the logs on your own, VMware has a good KB article on the subject. An interviewer will rarely ask you to debug this issue, but will want to check your understanding of the procedure to follow when a PSOD occurs.
