Sunday, 9 March 2014

Troubleshooting GPFS Issues

In this article, we discuss the most general methods for troubleshooting GPFS issues.

What to do when you have a GPFS issue

Got a problem? Don’t panic! 
Check for possible basic problems: 
  • Is the network OK?
  • Check the status of the cluster: “mmgetstate -a”
  • Check the status of the NSDs: “mmlsdisk fsname”
Take a 5-minute break
  • In most cases GPFS will recover by itself, without any intervention from the administrator
If it has not recovered
  • Ensure that you are the only person working on the problem!
  • Check the GPFS logs (first on the cluster manager, then on the file system manager, then on the NSD servers)
  • Check syslog (/var/log/messages) for possible errors
  • Check disk availability: “mmlsdisk fsname”
  • Consult the “Problem Determination Guide”
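
The cluster status check above can be partly automated. The following is a minimal sketch, not an official tool: it assumes that the data rows printed by “mmgetstate -a” have the form "node number, node name, GPFS state", and the function name inactive_nodes is made up for this example.

```shell
# List the names of nodes whose GPFS state is anything other than "active".
# Assumption: data rows look like "<node number> <node name> <state>";
# header and separator lines are skipped because their first field is not a number.
inactive_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $3 != "active" { print $2 }'
}

# Usage on a cluster node:
#   mmgetstate -a | inactive_nodes
```

An empty result means every node reported itself as active; any name printed is a good place to start reading logs.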

Some useful commands:

  • “mmfsadm dump waiters” helps to find long-lasting processes
  • “mmdiag --network | grep pending” helps to identify a non-responsive node
  • “mmdiag --iohist” lists the last 512 I/O operations performed by GPFS on the current node (helps to find a malfunctioning disk)
  • “gpfs.snap” gathers all logs and configurations from all nodes in the cluster; its output is the first thing to send to IBM support when opening a service request
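
On a busy node the waiters dump can be long, so it helps to filter it. The sketch below assumes that each waiter line contains the words "waiting <seconds> seconds"; that shape may differ between GPFS releases, so check it against your own output first. The function name long_waiters and the default threshold of 60 seconds are choices made for this example only.

```shell
# Print only the waiter lines that have been waiting longer than a threshold.
# $1 is the threshold in seconds (default: 60).
# Assumption: each waiter line contains "... waiting <seconds> seconds, ...".
long_waiters() {
    awk -v min="${1:-60}" '{
        for (i = 1; i < NF; i++)
            if ($i == "waiting" && $(i + 1) + 0 > min) { print; break }
    }'
}

# Usage on a cluster node:
#   mmfsadm dump waiters | long_waiters 300
```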

GPFS V3.4 Problem Determination Guide:

NFS stale file handle:

When a GPFS mount point is in the "NFS stale file handle" state, for example:
[root@um-gpfs1 root]# df 
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/gpfs_um1 8125032448 8023801088 101231360 99% /storage/gpfs_um
df: `/storage/gpfs_um': Stale NFS file handle 
then check whether any NSD is listed as "down":
[root@um-gpfs1 root]# mmlsdisk gpfs_um 
disk         driver   sector failure holds    holds
name         type       size   group metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
disk21       nsd         512    4015 yes      yes   ready         up
disk22       nsd         512    4015 yes      yes   ready         down
disk23       nsd         512    4015 yes      yes   ready         down
disk24       nsd         512    4013 yes      yes   ready         up
Restart the NSDs (important: do it for all NSDs listed as "down" in one command):
[root@um-gpfs1 root]# mmchdisk gpfs_um start -d "disk22;disk23"
Then re-mount the file systems.
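
Since all down NSDs have to be started in a single command, the disk list can be built from the “mmlsdisk” output rather than typed by hand. This is a minimal sketch under the assumption that the output has the column layout shown above, with availability as the eighth field; a multi-word status value would shift the columns and break it. The function name down_disks is made up for this example.

```shell
# Print the names of all disks whose availability is "down", joined with ";"
# so the result can be passed to the -d option of mmchdisk.
# Assumption: availability is the 8th whitespace-separated field of each disk row.
down_disks() {
    awk '$8 == "down" { printf "%s%s", sep, $1; sep = ";" }
         END { if (sep) print "" }'
}

# Usage on a cluster node:
#   disks=$(mmlsdisk gpfs_um | down_disks)
#   [ -n "$disks" ] && mmchdisk gpfs_um start -d "$disks"
```

Checking that the list is non-empty before calling mmchdisk avoids running the command with an empty disk list when nothing is down.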

Recovery of GPFS configuration:

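If a node of the cluster has lost its configuration (for example, it has been re-installed) but is still listed as a member of the cluster (“mmgetstate” shows it in the “unknown” state), use this command to recover the node. Here -p names a node that holds a valid copy of the cluster configuration and -R names the remote file copy command: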
/usr/lpp/mmfs/bin/mmsdrrestore -p diskserv-san-5 -R /usr/bin/scp

Checking existing NSD:

  • If you get this warning while creating a new NSD: Disk descriptor xxx system refers to an existing NSD
Use this command to check whether the device is actually used in one of the file systems:
mmfsadm test readdescraw /dev/emcpowerax
