Troubleshooting process hangs caused by JFS2 inode with an empty extended attribute entry
Summary: In this article, we reveal a special kind of inode that cannot be detected by the fsck command, and explain why it can cause process hangs and drop system performance. We also introduce some approaches to tackle such an issue.Introduction
It is not surprising that file system corruption often causes process (or system) hangs.But sometimes, a file system that looks clean might still cause process hangs.
In this article, we introduce methods to create an access control list (ACL)-enabled enhanced journaled file system (JFS2) inode that has an empty extended attribute (EA) entry, and investigate how it cause process hangs. This article also outlines approaches to tackle process hangs caused by such JFS2 inodes.
Setting extended permissions for AIXC ACL
The AIXC ACL includes base permissions and extended permissions. Base permissions are the traditional file-access modes assigned to the file owner, file group, and other users. Extended permissions modify the base file permissions (of the owner, group, or others) by permitting, denying, or specifying access modes for specific individuals, groups, or user and group combinations.
To enable ACL for the /testfs/foo file and add specific permissions to a user, we can use the acledit command:
# export EDITOR=/usr/bin/vi; acledit /testfs/foo
Then, we can use the vi commands to change the extended permissions to enabled and add the specific permissions:
* * ACL_type AIXC * attributes: base permissions owner(root): rw- group(system): r-- others: r-- extended permissions enabled permit rwx u:bin Should the modified ACL be applied? (yes) or (no) yes
We can run aclget to check the ACL of a file.
Extended attribute descriptor in inode
Many applications require the ability to associate variable length control information with a file system object. These objects are called extended attributes or EA. The EA descriptor in the inode, di_ea, describes if the inode has any EAs stored for it. If the inode does have EAs associated with it, the EA descriptor will also describe the inode extension associated with the inode. The EA descriptor is defined in j2_types.h, ead_t:
typedef struct { uint8 flag; /* 1: flags */ uint8 nEntry; /* 1: */ uint8 len; /* 1: length in unit of fsblksize */ uint8 addr1; /* 1: address in unit of fsblksize */ uint32 addr2; /* 4: address in unit of fsblksize */ uint16 type; /* 2: ea type */ int16 nblocks; /* 2: nBlocks for outline pages */ int32 rsrvd; /* 4: */ } ead_t; /* 16 */
Create an ACL-enabled JFS2 inode with an empty EA entry
When ACL is enabled, the di_ea.nEntry of inode normally will not be zero. For test purpose we need to intentionally set di_ea.nEntry to 0 for the ACL enabled inode. We have several methods to accomplish it.
Using fsdb to modify inode
First, we create a test file:
# echo "a test file" > /testfs/foo # sync
Then, we get the inode number:
# ls -i /testfs/foo 3 /testfs/foo
The fsdb can be used to modify the on-disk inode after umount file system:
# umount /testfs # fsdb /testfs File System: /testfs File System Size: 2620952 (512 byte blocks) Aggregate Block Size: 4096 Allocation Group Size: 8192 (aggregate blocks) > i 3 Inode 3 at block 33, offset 0x600: …… [5] di_nlink: 1 [22] di_ixpxd.addr2: 0x00000021 [6] di_mode: 0x000081a4 di_ixpxd.address: 33 0100644 -rw-r--r-- [24] di_uid: 0 …… [13] di_ea.flag: 0x00 [30] di_ea.len: 0 EAv1 [31] di_ea.addr1: 0x00 [15] di_ea.nEntry: 0x00 [32] di_ea.addr2: 0x00000000 [16] di_ea.type: 0x0000 di_ea.address: 0 [34] di_ea.nblocks: 0 change_inode: [m]odify, [e]a, [t]ree, or e[x]it > m 6 0x020081a4 change_inode: [m]odify, [e]a, [t]ree, or e[x]it > m 16 2 Inode 3 at block 33, offset 0x600: …… Inode 3 at block 33, offset 0x600: …… [5] di_nlink: 1 [22] di_ixpxd.addr2: 0x00000021 [6] di_mode: 0x020081a4 di_ixpxd.address: 33 0100644 -rw-r--r-- [24] di_uid: 0 …… [13] di_ea.flag: 0x00 [30] di_ea.len: 0 EAv1 [31] di_ea.addr1: 0x00 [15] di_ea.nEntry: 0x00 [32] di_ea.addr2: 0x00000000 [16] di_ea.type: 0x0002 di_ea.address: 0 [34] di_ea.nblocks: 0 change_inode: [m]odify, [e]a, [t]ree, or e[x]it > x > q
Now, we run the fscck command to check /testfs:
# fsck -yvv /testfs The current volume is: /dev/fslv00 Primary superblock is valid. Superblock s_state = 0x0 mode = 0x3 *** Phase 1 - Initial inode scan *** Phase 2 - Process remaining directories *** Phase 3 - Process remaining files *** Phase 4 - Check and repair inode allocation map *** Phase 5 - Check and repair block allocation map File system is clean.
The output shows that the fsck command cannot detect the ACL-enabled JFS2 inode that has an empty EA entry.
Using kdb to modify inode
In IBM® AIX® 6.1 and AIX 7.1, you can use the kdb command to modify the inode in memory while the file system is mounted.
(0)> i2 -i 3 ADDRESS DEVICE I_NUM/FS COUNT TYPE FLAG F1000A00588F6880 8000000A0000000C 3/16 00000 VREG …… On-disk Persistent Inode @ 0xF1000A00588F6A20: uid..........0x00000000 gid..........0x00000000 mode......0x000081A4 mode.........-rw-r--r--- mode.........0100644 …… ea.flag......0x00000000 ea.address...0x0000000000000000 ea.addr1.....0x00000000 ea.addr2.....0x00000000 ea.nEntry....0x00000000 ea.len.........0x00000000 ea.type.......0x00000000 (0)> mw 0xF1000A00588F6A20+0x3C F1000A00588F6A5C: 000081A4 = 020081a4 F1000A00588F6A60: 00000000 = . (0)> mw 0xF1000A00588F6A20+0x88 F1000A00588F6AA8: 00000000 = 00020000 F1000A00588F6AAC: 00000000 = .
Check inode again to verify the changes:
(0)> i2 F1000A00588F6880 …… On-disk Persistent Inode @ 0xF1000A00588F6A20: uid..........0x00000000 gid..........0x00000000 mode......0x020081A4 mode.........-rw-r--r--- mode.........0100644 …… ea.flag......0x00000000 ea.address...0x0000000000000000 ea.addr1.....0x00000000 ea.addr2.....0x00000000 ea.nEntry....0x00000000 ea.len.........0x00000000 ea.type.......0x00000002 typeNames.... AIXACL
Troubleshooting process hang
The process might hang when it opens the inode created using the above methods:
# more foo # ps ax|grep more 25100524 pts/3 A 0:25 more foo
The processor system time will increase drastically when more processes try to open that file, and the hang process could not be killed:
# kill -9 25100524 # ps ax|grep more 25100524 pts/3 A 0:57 more foo
Signals are delivered only when a thread returns from the kernel mode. So, we need to examine what function the thread is currently running in the kernel mode and why it cannot return to user mode. Trace and kdb are the commands that are very useful commands in such situation.
Using kdb to examine the hang process
# kdb (0)> th * |grep more pvthread+006200 98*more RUN 062027 06D 0 0 pvthread+00EB00 235!more RUN 0EB039 078 2 0 pvthread+015800 344!more RUN 158047 06C 7 0 (0)> f 344 pvthread+015800 STACK: Use current context [F00000002FF47600] of cpu 7 [00009518].simple_lock+000018 () [00274808]iPut+000054 (??, ??) [20646678]20646678 () [DEADBEEB]DEADBEEB () [0044FE64]openpnp+000788 (??, ??, ??, ??, ??) [0044FF9C]openpath+0000BC (??, ??, ??, ??, ??, ??, ??) [0045027C]copen+000218 (??, ??, ??, ??, ??) [0044F678]kopen+00001C (??, ??, ??) (0)> f 344 pvthread+015800 STACK: Use current context [F00000002FF47600] of cpu 7 [0000B598].fetch_and_addlp+000018 () [003DAEEC]lookuppn+0000D8 (??, ??, ??, ??, ??, ??, ??, ??) [0044FE64]openpnp+000788 (??, ??, ??, ??, ??) [0044FF9C]openpath+0000BC (??, ??, ??, ??, ??, ??, ??) [0045027C]copen+000218 (??, ??, ??, ??, ??) [0044F678]kopen+00001C (??, ??, ??)
The kdb command shows that the hang thread's kernel stack is different in different time, but keeps calling the openpath() andopenpnp() functions.
It means the hang thread is not sleeping for some resource, but is getting in to an infinite looping between the openpath() andopenpnp() kernel functions. This also explains why the hang process is consuming so much processor system time.
Trace log and report
Trace can be used to get more details about the hang process.
# trace -anl -C all -T20M -L40M -o trace.raw; sleep 5; trcstop # ps ax|grep more 286908 pts/5 A 13:07 more foo 323740 pts/8 A 12:26 more foo # trcrpt -C all -O 'exec=on,pid=on,tid=on,cpu=on' -o trace.log -p 323740 trace.raw
The trace.log repeated with following block:
107 more 3 323740 1401077 4.450242983 0 lookuppn: foo 4DF more 3 323740 1401077 4.450244140 1 JFS2 iget: vp = F100010038159FE8, count = 0017, ino = 0002, dev = 000000A0000000B 4DB more 3 323740 1401077 4.450244521 0 vnop_hold(vp = F100010038159FE8, getcaller = 3DB1B8) = 0000 4DF more 3 323740 1401077 4.450246527 2 JFS2 iget: vp = F100010038169FE8, count = 0006, ino = 0003, dev = 000000A0000000B 107 more 3 323740 1401077 4.450247078 0 vnop_lookup(dvp = F100010038159FE8, flag = 000A) = 0000, *vpp = 100010038169FE8 107 more 3 323740 1401077 4.450247558 0 lookuppn exit: 'foo' = vnode F100010038169FE8 15B more 3 323740 1401077 4.450248828 1 vnop_open(vp = F100010038169FE8, flags = 4000001, ext = 0000) = 0002
It is obvious that the thread first called the lookuppn() function and found the foo file exit, but vnop_open() returned the 02 (ENOENT) error b when it tried to open foo. Then, it repeated calling thelookuppn()andvnop_open() functions constantly, thus resulting in process hang.
How to tackle the problem
We usually have to restart the system when hang process cannot be killed. But reboot is not preferred for important production systems. When we know the root cause of the hang, sometimes, we can fix it without a restart. In this specific example, we already know that the hang is caused by empty EA entry. So, we can exit the infinite loop by disabling ACL.
Disable ACL using kdb
In AIX 6.1 and 7.1, the inode in memory can be modified using kdb to disable ACL.
(0)> i2 F1000A0034496880 On-disk Persistent Inode @ 0xF1000A0034496A20: uid..........0x00000000 gid..........0x00000000 mode...... 0x020081A4 mode.........-rw-r--r--- mode.........0100644 …… ea.flag....... 0x00000000 ea.address...0x0000000000000000 ea.addr1.....0x00000000 ea.addr2.....0x00000000 ea.nEntry....0x00000000 ea.len.........0x00000000 ea.type........0x00000002 typeNames.... AIXACL (0)> mw 0xF1000A0034496A20+0x3c F1000A0034496A5C: 020081A4 = 000081A4 F1000A0034496A60: 00000000 = . (0)> mw 0xF1000A0034496A20+0x88 F1000A0034496AA8: 00020000 = 0 F1000A0034496AAC: 00000000 = . (0)> q
Now all hang processes will continue to run or would be killed if kill signal was pending before.
Writing a program to disable ACL
In AIX 5.3, we write a fix program because kdb cannot modify the memory. (See the Download section.)
The fix program also works for AIX 6.1 and AIX 7.1.
Firstly, we get the memory address of on-disk persistent inode using kdb. Then we read, modify and write back ea.typeand di_mode through the /dev/kmem interface:
/* on-disk inode */ dinode_t dip; /* on-disk inode address got from kdb */ unsigned long long ip=0xF1000A005D84F620; open("/dev/kmem", O_RDWR, 0); kread(ip, (char *)&dip, sizeof(dinode_t)); /* clear S_IXACL */ if(dip.di_mode == 0x20081a4) dip.di_mode = 0x81a4; /* clear ea.type */ if(dip.di_ea.type == 0x02) dip.di_ea.type = 0; kwrite(ip, (char *)&dip, sizeof(dinode_t));
After running the fix program, all hang process will continue to run or would be killed if the kill signal was pending before.
Conclusion
Processes will hang when they open an ACL-enabled file with an empty EA entry. IBM recently has provided an authorized program analyst report (APAR) IV34969 that can avoid this problem. If your AIX system unfortunately encounters this problem before applying for IV34969 and you do not want to restart your system, you can apply the approaches introduced in this article as a workaround.
Download
Description | Name | Size | Download method |
---|---|---|---|
Pogram to disable ACL | fix.c | 3 KB | HTTP |
0 blogger-disqus:
Post a Comment