In this article, we discuss the most general methods for troubleshooting GPFS
issues.
When you have a GPFS issue
Got a problem? Don’t panic!
Check for possible basic problems:
- Is the network OK?
- Check the status of the cluster:
“mmgetstate -a”
- Check the status of the NSDs:
“mmlsdisk fsname”
Then check whether any NSD has the status "down".
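The cluster status check can be scripted. The following is a minimal sketch, not an official tool: `inactive_nodes` is a hypothetical helper name, and it assumes that the data rows printed by “mmgetstate -a” have the form "node number, node name, GPFS state". If your GPFS level prints extra columns, adjust the field numbers.

```shell
# Hypothetical helper: read `mmgetstate -a` output on stdin and print
# "<node name> <state>" for every node whose state is not "active".
# Assumption: data rows look like "<node number> <node name> <GPFS state>";
# header and separator lines are skipped because field 1 is not a number.
inactive_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $3 != "active" { print $2, $3 }'
}

# Typical use on a cluster node (not executed here):
#   mmgetstate -a | inactive_nodes
```

An empty result means every listed node reported "active"; anything printed is a node worth looking at first.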
Take a 5 minute break
- In most cases GPFS will recover by itself without any intervention from the administrator
If not recovered
- Ensure that you are the only person who is doing the work!
- Check the GPFS logs (first on the cluster manager, then on the FS manager, then on the NSD servers)
- Check syslog (/var/log/messages) for possible errors
- Check disk availability (mmlsdisk fsname)
- Consult the “Problem Determination Guide”
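For the log checks, a small filter can save some scrolling. This is only a sketch: `recent_errors` is a hypothetical helper, and the keyword list is an assumption that you should tune to the messages you actually see. On most installations the current GPFS log is /var/adm/ras/mmfs.log.latest, but verify the path on your system.

```shell
# Hypothetical helper: print the last N lines (default 20) of a log file
# that contain an error-like keyword, matched case-insensitively.
# The keyword list below is an assumption, not an official GPFS list.
recent_errors() {
    file=$1
    count=${2:-20}
    grep -iE 'error|fail|expel|unrecover' "$file" | tail -n "$count"
}

# Typical use (not executed here):
#   recent_errors /var/adm/ras/mmfs.log.latest
#   recent_errors /var/log/messages 50
```

Run it in the same order as the checks: cluster manager first, then the FS manager, then the NSD servers.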
Some useful commands:
- “mmfsadm dump waiters” helps to find long-lasting processes
- “mmdiag --network | grep pending” helps to identify a non-responsive node
- “mmdiag --iohist” lists the last 512 I/O operations performed by GPFS on the current node (helps to find a malfunctioning disk)
- “gpfs.snap” gathers all logs and configurations from all nodes in the cluster; it is the first thing to send to IBM support when opening a service request
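The waiters dump can be long, so it may help to keep only the entries that have waited longer than some threshold. The sketch below assumes that each waiter line contains the word "waiting" followed by the wait time in seconds; the exact layout differs between GPFS levels, so treat the field matching as an assumption. `long_waiters` is a hypothetical name.

```shell
# Hypothetical helper: read waiter lines on stdin and print only those whose
# wait time is at least the given number of seconds (default 10).
# Assumption: each line contains "waiting <seconds>" (or "Waiting <seconds>").
long_waiters() {
    min=${1:-10}
    awk -v min="$min" '{
        for (i = 1; i < NF; i++)
            if (tolower($i) == "waiting" && $(i + 1) + 0 >= min) { print; break }
    }'
}

# Typical use (not executed here):
#   mmfsadm dump waiters | long_waiters 30
```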
GPFS V3.4 Problem Determination Guide:
NFS stale file handle:
When a GPFS mount point is in the "NFS stale file handle" state, for example:
[root@um-gpfs1 root]# df
Filesystem     1K-blocks       Used Available Use% Mounted on
/dev/gpfs_um1 8125032448 8023801088 101231360  99% /storage/gpfs_um
df: `/storage/gpfs_um': Stale NFS file handle
[root@um-gpfs1 root]# mmlsdisk gpfs_um
disk         driver   sector failure holds    holds
name         type       size   group metadata data  status        availability
------------ -------- ------ ------- -------- ----- ------------- ------------
disk21       nsd         512    4015 yes      yes   ready         up
disk22       nsd         512    4015 yes      yes   ready         down
disk23       nsd         512    4015 yes      yes   ready         down
disk24       nsd         512    4013 yes      yes   ready         up
Restart the NSDs (important: do it for all NSDs with status "down" in one command):
[root@um-gpfs1 root]# mmchdisk gpfs_um start -d "disk22;disk23"
Then re-mount the file systems.
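Since all "down" NSDs should be started in one command, the disk list can be built from the “mmlsdisk” output instead of being typed by hand. This is a sketch under assumptions: `down_disks` is a hypothetical helper, and it relies on the column layout shown in the listing above, where the driver type is the 2nd field and availability is the 8th.

```shell
# Hypothetical helper: read `mmlsdisk <fs>` output on stdin and print the
# names of all NSDs whose availability is "down", joined by ';'.
# Assumption: field 2 is the driver type ("nsd") and field 8 is availability.
down_disks() {
    awk '$2 == "nsd" && $8 == "down" { printf "%s%s", sep, $1; sep = ";" }
         END { if (sep != "") print "" }'
}

# Typical use (not executed here); the command is printed for review
# rather than run automatically:
#   fs=gpfs_um
#   disks=$(mmlsdisk "$fs" | down_disks)
#   [ -n "$disks" ] && echo "mmchdisk $fs start -d \"$disks\""
```

Printing the command first, instead of running it, lets you check the disk list before anything is changed.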
Recovery of GPFS configuration:
If a node of the cluster has lost its configuration (for example, it has been re-installed) but is still present as a member of the cluster
(“mmgetstate” lists it in the “unknown” state), use this command to recover the node:
/usr/lpp/mmfs/bin/mmsdrrestore -p diskserv-san-5 -R /usr/bin/scp
Checking existing NSD:
- If you get this warning while creating a new NSD: "Disk descriptor xxx system refers to an existing NSD",
use this command to verify whether the device is actually used in one of the file systems:
mmfsadm test readdescraw /dev/emcpowerax