Thursday, 4 April 2013

HOWTO: Clean up a MISSING PV disk

While wandering around the web I found several blogs comparing Solaris 10 and AIX 6. One of them has a number of AIX articles, and one of those (scroll down a bit if you follow the link) explains how to trick the ODM into letting you remove a MISSING disk. Anyone who has taken an AIX administration course (well, the advanced one) knows there is a command that does all this for you, even if editing the ODM is fun for some of us.

Below is my extended guide for removing a MISSING PV from the VGDA of the other disks and from the AIX ODM.

Introduction

How a disk becomes PVMISSING is irrelevant here; these things happen. What matters is getting the system repaired, and there is a simpler way to correct the volume group's VGDA and the AIX ODM than editing the ODM by hand.

The single command we will be using to remove the disk is:
ldeletepv -g VGID -p PVID

But, before we do, there are a number of steps we should follow as a matter of "best practice".

CASE: While the volume group is offline, maintenance is performed on its disks. One disk was damaged beyond repair, or was replaced during the process. Now, back in AIX, the volume group is to be reactivated.
root@umaix:[/]lsvg -p vgExport
0516-010 : Volume group must be varied on; use varyonvg command.
root@umaix:[/]varyonvg vgExport
PV Status:      hdisk1  00c39b8d69c45344        PVACTIVE
               hdisk2  00c39b8d043427b6        PVMISSING

The disk hdisk2 is PVMISSING. We assume hdisk2 (PVID 00c39b8d043427b6) is physically destroyed and all its data is lost; however, the AIX ODM and the VGDA on the other disks in the volume group do not know this yet.
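If you saved the varyonvg output, a small filter can pick out the missing disks for you. This is a minimal sketch, assuming the "PV Status:" layout shown above (hdisk name, PVID and state are the last three fields of each line); missing_pvs is my own hypothetical helper, not an AIX command.

```shell
#!/bin/sh
# Sketch: pick the PVMISSING disks out of the "PV Status:" listing
# printed by varyonvg. Assumes the layout shown in this post: hdisk
# name, PVID and state are the last three fields of each status line.
missing_pvs() {
    # Prints "hdiskN PVID" for every line whose last field is PVMISSING.
    awk '$NF == "PVMISSING" { print $(NF-2), $(NF-1) }'
}
```

Usage, with a hypothetical file holding the saved output: missing_pvs < varyon.out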

First, document what is lost: we need to know which logical volumes are (were) on the missing disk. Normally we could use lspv -l hdiskX (or the new, undocumented variation lspv -l PVID); however, with the disk missing, lspv -l hdiskX will not work on its own. Instead, we add the VGID (volume group identifier).

1. Query the VGDA of the working disk to get the VGID and PVID of all disks in the volume group

root@umaix:[/]lqueryvg -p hdisk1 -vPt
Physical:       00c39b8d69c45344                2   0
                00c39b8d043427b6                1   0
VGid:           00c39b8d00004c000000011169c45a4b
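To avoid copying long IDs by hand, both values can be pulled out of the tagged lqueryvg output. A sketch, assuming the layout shown above: "VGid:" is followed by the VGID, "Physical:" by the first PVID, and each further PVID sits on its own untagged line until the next tag. The function names are hypothetical helpers, not AIX commands.

```shell
#!/bin/sh
# Sketch: parse the output of "lqueryvg -p hdiskN -vPt".
# Assumption: tagged layout as shown in this post.

# Print the VGID (second field of the "VGid:" line).
lqvg_vgid() {
    awk '$1 == "VGid:" { print $2 }'
}

# Print one PVID per line from the "Physical:" block.
lqvg_pvids() {
    awk '
        $1 == "Physical:" { inphys = 1; print $2; next }  # first PVID
        $1 ~ /:$/         { inphys = 0; next }            # next tag ends block
        inphys && NF > 0  { print $1 }                    # further PVIDs
    '
}
```

Usage on the host: lqueryvg -p hdisk1 -vPt | lqvg_vgid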

2. Get a list of all the logical volumes on the missing disk

root@umaix:[/]lspv -l -v 00c39b8d00004c000000011169c45a4b hdisk2
hdisk2:
LV NAME               LPs   PPs   DISTRIBUTION          MOUNT POINT
lvTest                512   512   109..108..108..108..79 /scratch
loglv00               1     1     00..00..00..00..01    N/A
(Note: lspv -l 00c39b8d043427b6 should give the same output.)

3. Verify all filesystems are unmounted.

root@umaix:[/]lsvg -l vgExport
vgExport:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
lvExport            jfs2       416   416   1    closed/syncd  /export
lvTest              jfs        512   512   1    closed/syncd  /scratch
loglv00             jfslog     1     1     1    closed/syncd  N/A
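The LV STATE column can also be checked mechanically, for example to guard a script before anything is removed. This sketch assumes the lsvg -l layout shown above, where LV STATE is the sixth field; check_all_closed is a hypothetical helper that prints every logical volume still open and returns non-zero in that case.

```shell
#!/bin/sh
# Sketch: verify that no LV in an "lsvg -l VG" listing is open.
# Assumption: columns as shown in this post, LV STATE is field 6.
check_all_closed() {
    awk '
        /^LV NAME/          { hdr = 1; next }          # skip up to header
        hdr && $6 ~ /^open/ { print "still open: " $1; bad = 1 }
        END                 { exit (bad ? 1 : 0) }     # non-zero if any open
    '
}
```

Usage on the host: lsvg -l vgExport | check_all_closed || echo "unmount first"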

From steps 2 and 3 I know that any data in /scratch is suspect and should be restored from a backup.
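The two listings can be combined to judge how badly each logical volume is hit. A sketch, assuming both were saved to files in the layouts shown above (lspv -l: PPs in the third column; lsvg -l: PPs in the fourth). If all of an LV's PPs were on the missing disk it is reported as "entirely"; otherwise "partly", which could for example be a mirrored LV that still has a good copy and deserves a closer look before removal. classify_lvs and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: classify the LVs on the missing disk.
#   $1 = saved "lspv -l -v VGID hdiskN" output (PPs in column 3)
#   $2 = saved "lsvg -l VG" output            (PPs in column 4)
classify_lvs() {
    awk '
        FNR == 1     { hdr = 0 }                # new file: reset header flag
        /^LV NAME/   { hdr = 1; next }
        !hdr         { next }
        FNR == NR    { onpv[$1] = $3; next }    # first file: PPs on missing PV
        ($1 in onpv) { print $1, (onpv[$1] == $4 ? "entirely" : "partly") }
    ' "$1" "$2"
}
```

Usage: classify_lvs lspv_hdisk2.out lsvg_vgExport.out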

4. Remove the affected logical volumes from the volume group before deleting the missing disk's definition from the VGDA of the other disks.

root@umaix:[/]rmfs /scratch
rmfs:  0506-936  Cannot read superblock on /dev/lvTest.
rmfs:  0506-936  Cannot read superblock on /scratch.
rmfs: Unable to clear superblock on /scratch
rmlv: Logical volume lvTest is removed.
root@umaix:[/]rmlv loglv00
Warning, all data contained on logical volume loglv00 will be destroyed.
rmlv: Do you wish to continue? y(es) n(o)? y
rmlv: Logical volume loglv00 is removed.
root@umaix:[/]lsvg -p vgExport
vgExport:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk1            active            511         95          00..00..00..00..95
hdisk2            missing           542         29          51..18..51..51..51
root@umaix:[/]lsvg -l vgExport
vgExport:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
lvExport            jfs2       416   416   1    closed/syncd  /export
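With more than a couple of damaged logical volumes, the removal commands can be generated from the step 2 listing and reviewed before running them. A sketch, assuming the lspv -l layout shown above: an LV with a mount point is removed with rmfs (which, as the output above shows, also removes the logical volume), anything else with rmlv -f, which skips the confirmation prompt. It only prints the commands; plan_lv_removal is a hypothetical name.

```shell
#!/bin/sh
# Sketch: turn an "lspv -l" listing into removal commands (printed only).
#   mount point set -> rmfs MOUNTPOINT (removes filesystem and LV)
#   mount point N/A -> rmlv -f LVNAME  (no confirmation prompt)
plan_lv_removal() {
    awk '
        /^LV NAME/     { hdr = 1; next }
        hdr && NF >= 5 {
            if ($NF == "N/A") print "rmlv -f " $1
            else              print "rmfs " $NF
        }
    '
}
```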

5. Remove the definition of the damaged disk from the VGDA of the remaining disk(s)

The volume group has been prepared: all damaged logical volume definitions have been removed, so the damaged disk's definition can now be deleted from the VGDA of the remaining disk(s).
root@umaix:[/]ldeletepv -g 00c39b8d00004c000000011169c45a4b -p 00c39b8d043427b6
Note: the command produces no output when it succeeds.
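Because ldeletepv changes the VGDA and is silent, it is worth double-checking the IDs before running it. A minimal dry-run sketch: it checks that both arguments look like lowercase hexadecimal IDs and prints the command instead of executing it. The accepted lengths (16 hex digits for a PVID, 16 or 32 for a VGID) are my assumption, based on the IDs in this post; is_hex and build_ldeletepv are hypothetical helpers.

```shell
#!/bin/sh
# Dry-run sketch: validate the IDs, then print (not run) the ldeletepv call.
# Assumption: IDs are lowercase hex; PVID 16 digits, VGID 16 or 32 digits.
is_hex() {
    case $1 in
        '' | *[!0-9a-f]*) return 1 ;;
    esac
    return 0
}

build_ldeletepv() {
    vgid=$1 pvid=$2
    if ! is_hex "$vgid" || { [ ${#vgid} -ne 32 ] && [ ${#vgid} -ne 16 ]; }; then
        echo "build_ldeletepv: bad VGID '$vgid'" >&2
        return 1
    fi
    if ! is_hex "$pvid" || [ ${#pvid} -ne 16 ]; then
        echo "build_ldeletepv: bad PVID '$pvid'" >&2
        return 1
    fi
    printf 'ldeletepv -g %s -p %s\n' "$vgid" "$pvid"
}
```

Passing a disk name such as hdisk2 instead of a PVID is rejected rather than turned into a command.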
Now use the regular AIX commands to verify that the VGDA and ODM are in order.
root@umaix:[/]lsvg -p vgExport
vgExport:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk1            active            511         95          00..00..00..00..95
root@umaix:[/]mount /export
root@umaix:[/]lsvg -l vgExport
vgExport:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
lvExport            jfs2       416   416   1    open/syncd    /export
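The lsvg -p check above can also be scripted, for example at the end of a maintenance script. A sketch assuming the lsvg -p layout shown above (PV STATE is the second field); no_missing_pvs is a hypothetical helper that reports any PV still listed as missing and returns non-zero in that case.

```shell
#!/bin/sh
# Sketch: confirm that "lsvg -p VG" no longer lists any missing PV.
# Assumption: PV_NAME is the first column, PV STATE the second.
no_missing_pvs() {
    awk '
        /^PV_NAME/             { hdr = 1; next }
        hdr && $2 == "missing" { print "still missing: " $1; bad = 1 }
        END                    { exit (bad ? 1 : 0) }
    '
}
```

Usage on the host: lsvg -p vgExport | no_missing_pvs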

6. The remaining steps, which I will only list here:

a. add a new disk to the volume group (extendvg)
b. recreate the deleted logical volumes (mklv)
c. format, as needed, the log logical volumes (logform)
d. create the filesystems (crfs, or use smit)
e. restore the data from a backup (restore, tar, cpio, etc.)
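For this example the list translates roughly into the commands below. This is a dry-run sketch that only prints the commands: the replacement disk name (hdisk3 in the usage line) is hypothetical, it assumes the new disk has enough free PPs, and the LV names, types and sizes are copied from the lspv -l and lsvg -l listings above. The log volume is created and formatted before crfs, following the order of the list; the restore itself depends on your backup tool and is left out.

```shell
#!/bin/sh
# Dry-run sketch of step 6: print, do not run, the rebuild commands.
# Usage: rebuild_plan VG NEWDISK   (NEWDISK = replacement disk, hypothetical)
# LV names, types and sizes are taken from the listings in this post.
# Restoring /scratch from backup (restore, tar, cpio, ...) is not included.
rebuild_plan() {
    vg=$1 newdisk=$2
    cat <<EOF
extendvg $vg $newdisk
mklv -y loglv00 -t jfslog $vg 1 $newdisk
logform /dev/loglv00
mklv -y lvTest -t jfs $vg 512 $newdisk
crfs -v jfs -d lvTest -m /scratch -A yes
mount /scratch
EOF
}
```

Usage: rebuild_plan vgExport hdisk3, then review the output before running it line by line.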

Summary

This procedure is much less error-prone than using ODM commands. All the commands demonstrated here have been available in AIX for disk management since at least 1995 (when AIX 4 first came out). They may have been in AIX 3 as well, taking it back to 1991 or earlier.

Important commands to review

lspv -l -v VGID hdiskX
lqueryvg
ldeletepv
