Where is my device?
When you have a faulty device, you need to know where it is physically located on your system so that it can be replaced. The errpt and lscfg commands provide a location code specifying where the faulty device is located. Armed with the location code and a server manual or an IBM® Redbooks® title covering your model, or better still with access to the IBM information center on the web, you should be able to identify exactly where the device is located.
Introduction
Getting a device failure is definitely an inconvenience. The failing device might be hot swappable, such as a fan cooling unit or a hot-swap Peripheral Component Interconnect (PCI) card, or it might not be. In either case, you need to know the physical location of the device for it to be replaced, which means you need its location code. A failing device shows up in the error report (using the errpt command), where the physical location code is posted as well. Alternatively, the lscfg command also tells you the physical locations of devices. After getting the location code, how do you go about locating the device?
AIX internal codes and physical codes
AIX provides two different codes, and they are:
- IBM AIX® (internal) location system codes
- Physical location codes
AIX internal location codes can be used in conjunction with physical codes to identify devices, as we will see later in this article. The codes generated by AIX reference certain devices, for example:
0-80-00-3,0 SCSI CD Drive
10-80-00-2,0 SCSI disk
02-08-00 SAS disk
The above codes are the internal paths to the actual device, which can be viewed with the lsdev command.
The other location code, the physical type and the one we are particularly interested in, is generated by the firmware. For example:
U789C.001.DQD3F62-P2-D3 SAS Disk Drive
Since the release of IBM POWER5 processors a few years back, the physical location code is the preferred method for locating devices. As a rule, the physical code is generally all that you need, and that is what I focus on in this article. The commands provided in Table 1 enable you to get various information about your devices.
Table 1. Commands to get information about your devices
Command | Description |
---|---|
lsdev -C -H -F "name status physloc location description" | Get the AIX (if present) and physical location codes. |
lsdev -Cc disk -F 'name location physloc' | Get the AIX and physical location codes of all disks. |
lsdev -Cl hdisk0 -F physloc | Get the location code of hdisk0. |
lscfg -vpl hdisk0 | Get extended information of hdisk0. |
lsdev -C| grep hdisk0 | Get the AIX location code of hdisk0. |
lsparent -Cl hdisk0 | Get the parent devices for hdisk0. |
lscfg -l fcs0 | Get information about the fcs0 device. |
Use the IBM information center or Redbooks
How can you locate a device using the physical location code? It depends on what type of system you have, as the codes might differ slightly across the system ranges. Always make sure that you have your server system manual or refer to the online information at the IBM information center. These references provide the schematics of your model, including the location codes of your system for easy identification. Even without them, all is not lost: there are ways to physically identify a device.
What's in a code?
The location code of a physical device comes from the firmware side. If you follow the location code correctly, it eventually points to the device you are looking for.
The format of a location code is the same no matter what server you have; only the values (numbers and letters) differ, pointing to different physical locations on your system. The first character of the code for a locatable device is always "U". So far so good. Next, it gets interesting. Here is the general format of a location code, with an example taken from an IBM Power Systems™ 520 model (floor standing), which is the model used for examples unless otherwise stated. All location code examples are physical locations unless otherwise stated.
Unit enclosure type | Enclosure model | Serial number | Location |
---|---|---|---|
U789C | 001 | DQD3F62 | P2-D3 |
The first three fields are the unit type, model, and serial number of the unit or drawer. Your system might contain different unit enclosure types. Do not expect to have the same enclosure on all your location codes. This is especially true if you have expansion units, such as additional disk drawers.
For this article, the location field is the interesting bit.
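The field layout in the table above can be sketched in a few lines of Python. This is only an illustration, assuming the `<unit type>.<model>.<serial>-<location>` layout shown in the example; the names `LocationCode` and `split_location_code` are hypothetical and not part of any AIX tool.

```python
from typing import NamedTuple


class LocationCode(NamedTuple):
    unit_type: str   # for example "U789C"
    model: str       # for example "001"
    serial: str      # for example "DQD3F62"
    location: str    # for example "P2-D3"


def split_location_code(code: str) -> LocationCode:
    """Split a physical location code into its four fields.

    Assumes the layout <unit type>.<model>.<serial>-<location>.
    """
    if not code.startswith("U"):
        raise ValueError(f"not a physical location code: {code!r}")
    # The first two dots separate unit type, model, and the remainder.
    unit_type, model, rest = code.split(".", 2)
    # The first hyphen separates the serial number from the location field.
    serial, _, location = rest.partition("-")
    return LocationCode(unit_type, model, serial, location)


print(split_location_code("U789C.001.DQD3F62-P2-D3"))
```

Splitting on the first hyphen only keeps a multi-label location such as P1-C1-T1 in one piece.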
If the physical location cannot be resolved, AIX assumes it is a logical device that at some point is linked to a physical device. Typically, these can be logical devices connected to say, external storage such as Redundant Array of Independent Disks (RAID) SCSI devices or tape units. The codes can have different meanings depending on the type of hardware, for instance SCSI, serial, ttys and adapters.
The location code can be made up of several prefix letters and numbers. Common prefixes are shown in Table 2.
Table 2. Common prefixes
Code prefix | Description |
---|---|
A | Air moving device, for example, fan |
C | Card, for example, PCI slots, memory slots |
D | Devices, for example, disk slot, disk drawer |
E | Electrical, for example, power supply |
L | Logical path, for example, Fibre Channel |
P | Planar, for example, a system or I/O back-plane, system board |
T | Interface connector/port, for example, serial port, usually followed by a number to denote which port |
U | Unit |
V | Virtual planar |
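As a rough illustration of how the prefixes in Table 2 combine, the following Python sketch turns a location field into a list of plain descriptions. The mapping is copied from Table 2; the function name `describe_location` is hypothetical, and labels that are not a single letter followed by digits are passed through untouched.

```python
import re
from typing import List

# Prefix meanings copied from Table 2.
PREFIXES = {
    "A": "Air moving device",
    "C": "Card",
    "D": "Device",
    "E": "Electrical",
    "L": "Logical path",
    "P": "Planar",
    "T": "Interface connector/port",
    "U": "Unit",
    "V": "Virtual planar",
}


def describe_location(location: str) -> List[str]:
    """Describe each label of a location field such as "P1-C1-T1"."""
    descriptions = []
    for label in location.split("-"):
        match = re.fullmatch(r"([A-Z])(\d+)", label)
        if match and match.group(1) in PREFIXES:
            descriptions.append(f"{PREFIXES[match.group(1)]} {match.group(2)}")
        else:
            # Labels this sketch does not model are kept unchanged.
            descriptions.append(label)
    return descriptions


print(describe_location("P2-D3"))
```

The descriptions only name the kind of component; where that component sits in the chassis still has to come from the schematics for your model.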
That's a code
Let's now look at an example, say hdisk0. Below is a partial output from lscfg for hdisk0:
lscfg -vpl hdisk0
hdisk0 U789C.001.DQD3F62-P2-D3 SAS Disk Drive (146800 MB)
Manufacturer................IBM
Machine Type and Model......ST3146356SS
FRU Number..................10N7204
ROS Level and ID............45363045
Serial Number...............3QN2JFEP
Hardware Location Code......U789C.001.DQD3F62-P2-D3
PLATFORM SPECIFIC
Name:disk
Node: disk
Device Type:block
Looking at the above lscfg output, it first tells me this is a SAS disk, but let's look more closely at the location code and break it down:
U789C.001.DQD3F62-P2-D3
Code | Description |
---|---|
U789C | Unit type |
001 | Enclosure model |
DQD3F62 | Serial number |
P2 | Planar 2 (this is accessible from the front of the unit) |
D3 | Device slot 3 (disk drive number 1, which is in the second bay down, the first disk on the left) |
How did I know all the above? By using my Redbooks and referencing the location with the schematic representation of the model, I know exactly where it is located!
Typically, you will find location codes quoted by the location field only, without the system details. This is certainly true if the devices all come off the same planar, and in such cases, the following format is used: Un<location>
When dealing with card slots, the cards could have dual ports, for example Ethernet or fiber cards. If this is the case, then the location has the letter "T" associated with it, and the number following the T is the port number. If a location has a "T" but is not a card slot (there is no "C" in the location code), then you can be fairly sure that this is an integrated (on-board) interface. Here, I am thinking of serial ports or Ethernet ports.
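The rule of thumb about "T" and "C" can be written down as a small check. The following Python sketch is only an illustration of that rule; the function name `port_kind` and the three result strings are hypothetical.

```python
def port_kind(location: str) -> str:
    """Classify a location field using the T/C rule of thumb.

    Returns "card port" when the field has both a C and a T label,
    "integrated port" when it has a T label but no C label,
    and "not a port" when it has no T label at all.
    """
    prefixes = [label[0] for label in location.split("-") if label]
    if "T" not in prefixes:
        return "not a port"
    return "card port" if "C" in prefixes else "integrated port"


for loc in ("P1-C1-T1", "P1-T5", "P2-D3"):
    print(loc, "->", port_kind(loc))
```

Like the rule itself, this is a heuristic: it tells you what to expect, and the system documentation confirms it.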
Let's now turn our attention to a fiber card slot. Looking at the fiber card (fcs0) location:
lscfg -vl fcs0
Physical Location: U789C.001.DQD3F62-P1-C1-T1
Looking at the location code:
P1-C1-T1
We know that:
Code | Description |
---|---|
P1 | Planar 1. This is accessible from the backside of the unit. |
C1 | Card (PCI) slot 1. This is the first PCI slot looking down from the top of the unit. |
T1 | This is the first port (upper port). |
Looking at an integral (on-board) port, say a Hardware Management Console (HMC) Ethernet port:
P1-T5
We know that this is not a card as there is no "C" in the location.
Code | Description |
---|---|
P1 | Planar 1. This is accessible from the backside of the unit. |
T5 | Port 5. Located at the left side of the machine. Left port. |
Another way to tell if a device could be on-board is when the device returns an AIX internal location code as well. For example, taken from a Power Systems 570 model, here are a couple of Ethernet devices:
ent0 Available 02-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent1 Available 02-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent0 U7879.001.DQD1AE7-P1-T6
ent1 U7879.001.DQD1AE7-P1-T7
The above AIX internal location codes (02-08 and 02-09) inform us that the Ethernet devices are both using the same address location, 02. As there is no "C" in the physical location, that is, no card slot, we can assume that this is an on-board dual port. As a rule of thumb, if you have two on-board devices that are T1 and T2 and the pair is horizontal, T1 will be on the right and T2 on the left. If the devices are vertical, then T1 will be at the top and T2 beneath it.
Many systems have storage area network (SAN) storage, so it is good to know how to locate a logical unit number (LUN). Here is hdisk20, which is on an IBM System Storage® DS3400 disk system attached to a 570 model:
lscfg -vl hdisk20
hdisk20 U7879.001.DQD1AE7-P1-C2-T1-W202500A0B85B6194-LE000000000000 MPIO Other DS3K Array Disk
The above output is pretty long. So, let's see what is going on with that location code:
Code | Description |
---|---|
P1 | Planar 1 (this is accessible from the backside of the unit) |
C2 | Card (PCI) slot 2 fiber card |
T1 | First fiber port (top) |
W202500A0B85B6194 | Worldwide port name (WWPN) of the remote storage port |
LE000000000000 | (Logical) LUN ID (in hexadecimal value) of the remote disk |
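To pull the SAN-specific parts out of a long location field like this one, a sketch along the following lines may help. It assumes that the label starting with "W" is the worldwide port identifier and the label starting with "L" is the LUN ID, as in the table above; the function name `parse_san_location` and the dictionary keys are hypothetical.

```python
def parse_san_location(location: str) -> dict:
    """Separate the adapter path, W label, and L label of a SAN disk location."""
    fields = {"path": [], "wwpn": None, "lun": None}
    for label in location.split("-"):
        if label.startswith("W"):
            fields["wwpn"] = label[1:]   # worldwide port identifier
        elif label.startswith("L"):
            fields["lun"] = label[1:]    # LUN ID, hexadecimal
        else:
            fields["path"].append(label) # planar, card, and port labels
    return fields


info = parse_san_location("P1-C2-T1-W202500A0B85B6194-LE000000000000")
print(info["path"], info["wwpn"], info["lun"])
```

The path part (P1-C2-T1) is what you locate physically on the server; the W and L parts identify the remote side.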
Some codes have the letter "L" followed by a number; these are logical paths. Typical users of logical paths are SCSI disks, including RAID disks. For example, a SCSI disk array location is shown below:
P1-C8-T1-L0-L0 SCSI RAID 5 Disk Array
Hot plugs to go
Knowing your PCI hot-plug cards is always good, because if a hot-plug device fails, it takes only a few minutes to replace it. To view your PCI hot-plug slots, use the following command:
lsslot -c pci
# Slot Description Device(s)
U789C.001.DQD3F62-P1-C1 PCI-E capable, Rev 1 slot with 8x lanes fcs0
U789C.001.DQD3F62-P1-C2 PCI-E capable, Rev 1 slot with 8x lanes fcs1
U789C.001.DQD3F62-P1-C3 PCI-E capable, Rev 1 slot with 16x lanes Empty
U789C.001.DQD3F62-P1-C4 PCI-X capable, 64 bit, 266MHz slot Empty
U789C.001.DQD3F62-P1-C5 PCI-X capable, 64 bit, 266MHz slot Empty
We have already covered the location of the fcs0 card. We can also see that both active cards (fcs0 and fcs1) are next to each other. We already know that fcs0 is the first card and is at the top of the slots, so the second one down is the fcs1 card. The other three slots, P1-C3, C4, and C5, are empty.
To view all your logical slots, use the following command:
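If you save the lsslot output to a file or capture it in a script, picking out the empty slots is straightforward. The following Python sketch assumes the three-column layout shown above, with the slot location as the first word and the device name (or "Empty") as the last word on each line; the function name `empty_slots` is hypothetical.

```python
from typing import List

# Output of "lsslot -c pci" as shown in this article.
SAMPLE = """\
# Slot Description Device(s)
U789C.001.DQD3F62-P1-C1 PCI-E capable, Rev 1 slot with 8x lanes fcs0
U789C.001.DQD3F62-P1-C2 PCI-E capable, Rev 1 slot with 8x lanes fcs1
U789C.001.DQD3F62-P1-C3 PCI-E capable, Rev 1 slot with 16x lanes Empty
U789C.001.DQD3F62-P1-C4 PCI-X capable, 64 bit, 266MHz slot Empty
U789C.001.DQD3F62-P1-C5 PCI-X capable, 64 bit, 266MHz slot Empty
"""


def empty_slots(lsslot_output: str) -> List[str]:
    """Return the location codes of slots whose device column is "Empty"."""
    empty = []
    for line in lsslot_output.splitlines():
        parts = line.split()
        # Skip blank lines and the header; slot lines start with a "U" code.
        if not parts or not parts[0].startswith("U"):
            continue
        if parts[-1] == "Empty":
            empty.append(parts[0])
    return empty


print(empty_slots(SAMPLE))
```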
lsslot -c slot
Get that location code quickly
If you are still confused about the locations, and you have a failed unit and the IBM engineer is knocking at the door, expecting you to know where the failing device is, you can always go into 'SMIT diag' and identify that failing device. This tells you the location code as well. Be sure to review your errpt before you do so to confirm that you are identifying the correct device.
Fix that attention light flashing
When a device fails, you can identify it through an indicator light flashing in amber, or through a symbol when you are logged on to the HMC. After the device is replaced or fixed, turn the status back to normal using:
/usr/lpp/diagnostics/bin/usysfault -s normal
Conclusion
So, now you know how the location codes can help you. As mentioned earlier, you need to know this if you are replacing devices. However, as described, it depends very much on you having access to your system hardware documentation, as the locations are specific to your particular system.