Saturday, 18 May 2013

Using AIX Tools to Debug Network Problems

Question

Using AIX Tools to Debug Network Problems

Answer

This document discusses some standard AIX commands that can check for network connectivity or performance problems.

From time to time, users may be unable to access servers through their client applications, or they may experience performance problems. When application and system checks do not reveal the cause, the system administrator may need to check the network or the system's network settings. Using standard AIX tools, you can quickly determine whether a server's problem is caused by its own configuration or by the network. These tools include the netstat and tcpdump commands, which can help you isolate problems ranging from loss of connectivity to more complex network performance problems.

  • Basic tools and the OSI-RM
  • Using the netstat command
  • Using the tcpdump command

Basic tools and the OSI-RM

The AIX commands you can use for a quick checkup include the lsdev, errpt, netstat, and tcpdump commands. With these tools, you can assess the lower layers of your system's network configuration within the model known as the Open Systems Interconnection (OSI) Reference Model (RM) (see Table 1). Working through the OSI-RM lets you check the common points of failure first, without spending too much time chasing elusive errors inside an application that are actually caused by a loss of network access.

Table 1. Open Systems Interconnection Reference Model

Model Layer            Function                         Assessment Tools

7. Application Layer   Consists of application
                       programs that use the network.
6. Presentation Layer  Standardizes data presentation
                       to the applications.
5. Session Layer       Manages sessions between
                       applications.
4. Transport Layer     Organizes datagrams into         netstat -s
                       segments and reliably delivers   iptrace
                       them to upper layers.            tcpdump
3. Network Layer       Manages connections across the   netstat -in, -rn, -s, -D
                       network for the upper layers.    topas
                                                        iptrace
                                                        tcpdump
2. Data Link Layer     Provides reliable data delivery  netstat -v, -D
                       across the physical link.        iptrace
                                                        tcpdump
1. Physical Layer      Defines the physical             netstat -v, -D
                       characteristics of the           lsdev -C
                       network media.                   errpt
                                                        iptrace
                                                        tcpdump

Using the netstat command

One of the netstat tools, the netstat -v command, can help you decide whether corrective action needs to be taken on the server or elsewhere in the network. Its output is the same as the combined output of the entstat, tokstat, fddistat, and atmstat commands. The netstat -v command assesses the physical and data link layers of the OSI-RM. Thus, it is one of the first commands you should use after determining that there is no hardware availability problem. (The errpt and lsdev -C commands can help determine availability.) The netstat -v output can indicate whether you need to adjust the configuration of a network adapter (to reestablish or improve communications) or tune an adapter for better data throughput.

Sample scenario

A simple scenario illustrates how the netstat -v command helps determine why a system is not communicating on its network.

The scenario assumes a system with the following characteristics:
  • An IBM 4-Port 10/100 Mbps Ethernet PCI Adapter (ent0 - ent3)
  • An onboard IBM 10/100 Mbps Ethernet PCI Adapter (ent4)
  • A single cable connected to one of the ports on the four-port adapters
  • A single IP address configured, on en0, which also maps to one of the logical devices (ent0) on the 4-Port card
The problem: Since TCP/IP was configured on en0, the system has been unable to ping any system on the network.
Example 1
  1. The lsdev -C and errpt commands were used to verify the availability of the adapter and interface.

  2. The netstat -in command (interface configuration) and the netstat -rn (route configuration) command were used to check the IP configuration.

  3. After the first two preliminary steps, the next step is to use the netstat -v command to review specific statistics for adapter operations. Without a filter, the netstat -v command produces at least 10 screens of data, so this example uses the netstat -v ent0 command to limit the output as follows:

    netstat -v ent0 | grep -p "Specific Statistics"

    The RJ45 Port Link Status line in the sample output indicates whether or not the adapter has a link to the network. In this example, the RJ45 Port Link Status is down:
    IBM 4-Port 10/100 Base-TX Ethernet PCI Adapter Specific Statistics:
    ------------------------------------------------
    Chip Version: 26
    RJ45 Port Link Status : down
    Media Speed Selected: Auto negotiation
    Media Speed Running: 100 Mbps Full Duplex
    Receive Pool Buffer Size: 384
    Free Receive Pool Buffers: 128
    No Receive Pool Buffer Errors: 0
    Inter Packet Gap: 96
    Adapter Restarts due to IOCTL commands: 1
  4. Running netstat -v a second time without a filter allows you to check the port link status for every adapter. For example, enter:

    netstat -v | more

    and then use /Specific as the search string for the more command. In this example, such a search shows that ent3, not ent0, has a port link status of up. This information indicates that the cable is in the wrong port on the 4-Port Adapter, and that moving the cable to the correct (that is, configured) port fixes the problem.
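
The port-by-port check in step 4 can also be scripted. The following is a minimal sketch, assuming a POSIX-style shell with awk, and assuming each adapter reports a line containing "Port Link Status" as in the sample output above (other adapter types may label this field differently). The function name link_status is hypothetical, not an AIX command.

```shell
# link_status: hypothetical helper, not an AIX command.
# Reads "netstat -v <adapter>" output on stdin and prints the value of
# the first "Port Link Status" line (for example "up" or "down").
link_status() {
    awk '!seen && /Port Link Status/ {
        sub(/^[^:]*:[ \t]*/, "")   # drop the label and the colon
        sub(/[ \t]+$/, "")         # drop trailing whitespace
        print
        seen = 1                   # ignore any later matches
    }'
}
```

For the four ports in this scenario, it might be used like this:

    for dev in ent0 ent1 ent2 ent3; do
        echo "$dev: $(netstat -v "$dev" | link_status)"
    done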
Example 2
Interpreting the portion of the netstat -v output that shows adapter resource configuration can help isolate a system configuration problem. When setting up servers that provide network backup (such as TSM or SysBack), administrators commonly do some preliminary testing and achieve good results. Then, as more remote servers are added to the backup schedule, performance can decrease. Where network throughput was once good but has since decreased, netstat -v can uncover potential problems with adapter resources.

Many modern adapters have tunable buffers that allow you to adjust the resources a device can obtain. When a backup server requires extensive resources to handle data reception, looking at the output of netstat -v for Receive Statistics and for Adapter Specific Statistics can help isolate potential network performance bottlenecks. It is not uncommon for the Adapter Specific section of a 10/100 Mbps adapter to show a nonzero count for "No Receive Pool Buffer Errors". In Example 2 the netstat -v command is run twice, 30 seconds apart, while the server is handling several backup jobs. The counter is still climbing between the two runs (from 999875 to 1005761), which shows that the default receive pool buffer setting of 384 needs to be raised. As long as no other errors suggesting additional problems show up in the output, you can reasonably expect performance to improve when the receive pool buffer on ent4 is increased.
  1. Run the following command to see specific statistics for ent4:

    netstat -v ent4 | grep -p "Specific Statistics"

    Command output is similar to the following:
    IBM 4-Port 10/100 Base-TX Ethernet PCI Adapter Specific Statistics:
    ------------------------------------------------
    Chip Version: 26
    RJ45 Port Link Status : up
    Media Speed Selected: Auto negotiation
    Media Speed Running: 100 Mbps Full Duplex
    Receive Pool Buffer Size: 384
    Free Receive Pool Buffers: 128
    No Receive Pool Buffer Errors: 999875
    Inter Packet Gap: 96
    Adapter Restarts due to IOCTL commands: 1
    
  2. Run the following commands to check the No Receive Pool Buffer Errors after 30 seconds:

    sleep 30 ; netstat -v ent4 | grep "Receive Pool Buffer Errors"

    Output is similar to the following:
    No Receive Pool Buffer Errors: 1005761
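
The two-sample comparison above can be reduced to a single figure. The following is a minimal sketch, assuming a POSIX-style shell with awk and the counter label shown in the sample output. The function names buffer_errors and error_growth are hypothetical helpers, not AIX commands.

```shell
# buffer_errors: hypothetical helper, not an AIX command.
# Reads "netstat -v <adapter>" output on stdin and prints the
# "No Receive Pool Buffer Errors" counter (the last field on that line).
buffer_errors() {
    awk '/No Receive Pool Buffer Errors/ { print $NF }'
}

# error_growth BEFORE AFTER SECONDS
# Prints how much the counter grew and the whole-number rate per second.
error_growth() {
    delta=$(( $2 - $1 ))
    echo "$delta new errors, about $(( delta / $3 )) per second"
}
```

A possible way to use it on ent4:

    before=$(netstat -v ent4 | buffer_errors)
    sleep 30
    after=$(netstat -v ent4 | buffer_errors)
    error_growth "$before" "$after" 30

With the two sample values above (999875 and 1005761), this reports 5886 new errors, about 196 per second.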

Using the tcpdump command

The netstat tools (netstat -in, netstat -rn, and netstat -v) cannot always determine the nature of a connection problem.
Example 3
Suppose your server has four separate network adapters configured and attached to separate network segments. Two are working fine (VLAN A and B) while no connections can be established to your server on the other two segments (VLAN C and D). The output of netstat -v shows that data is coming in on all four adapters and no errors are being logged, indicating that the configuration at the physical and data link layers is working. In such a case, you need to examine the inbound data itself. You can use the tcpdump tool to examine the data online to help you determine the connection problem.

The tcpdump command produces a great deal of data, but for quick analysis only some basic pieces of its output (the IP addresses) are needed.
You also need to consider the logical configuration you have set up for your interfaces (netstat -in). In this example, en2 was configured with address 9.3.6.225 and is in VLAN C (IP network 9.3.6.224, netmask 255.255.255.240); en3 was configured with address 9.3.6.243 and is in VLAN D (IP network 9.3.6.240, netmask 255.255.255.240).
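
Whether a given address belongs to one of these segments is a matter of simple arithmetic: ANDing the address with the netmask must yield the network address. The following is a minimal sketch of that check, assuming a POSIX-style shell. The function name in_subnet is hypothetical, and the comparison is done octet by octet to stay within small integers.

```shell
# in_subnet ADDRESS NETWORK NETMASK
# Hypothetical helper: succeeds (exit status 0) when ADDRESS ANDed with
# NETMASK equals NETWORK, comparing one octet at a time.
in_subnet() {
    ip=$1
    net=$2
    mask=$3
    old_ifs=$IFS
    IFS=.
    # Split all three dotted quads into 12 positional parameters:
    # $1-$4 address, $5-$8 network, $9-${12} netmask.
    set -- $ip $net $mask
    IFS=$old_ifs
    [ $(( $1 & $9 )) -eq "$5" ] &&
    [ $(( $2 & ${10} )) -eq "$6" ] &&
    [ $(( $3 & ${11} )) -eq "$7" ] &&
    [ $(( $4 & ${12} )) -eq "$8" ]
}
```

For example, in_subnet 9.3.6.225 9.3.6.224 255.255.255.240 succeeds (en2's address is in VLAN C), while in_subnet 9.3.6.243 9.3.6.224 255.255.255.240 fails because 9.3.6.243 belongs to the 9.3.6.240 network (VLAN D).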

  1. Run the following command to check traffic on en2:

    tcpdump -i en2 -I -n

    Output similar to the following is displayed:
    -TIME STAMP-    -SOURCE IP-    -DESTINATION IP-   -FLAG   -ADDITION INFO-
    09:04:27.313527323 9.3.6.244.23 > 9.3.6.241.38160: P 7:9(2) ack 8 win 65535
    09:04:27.402377282 9.3.6.245.45017 > 9.53.168.52.23: . ack 24 win 17520 (DF) [tos 0x10]
    09:04:27.418818536 9.3.6.241.38160 > 9.3.6.244.23: . ack 9 win 65535 [tos 0x10
    09:04:27.419054751 9.3.6.244.23 > 9.3.6.241.38160: P 9:49(40) ack 8 win 65535
    09:04:27.524512144 9.3.6.245.45017 > 9.53.168.52.23: P 4:5(1) ack 24 win 17520 (DF) [tos 0x10]
    09:04:27.526159054 9.53.168.52.23 > 9.3.6.245.45017: P 24:25(1) ack 5 win 2482 (DF)
    09:04:27.602600775 9.3.6.245.45017 > 9.53.168.52.23: . ack 25 win 17520 (DF) [tos 0x10]
    09:04:27.628488745 9.3.6.241.38160 > 9.3.6.244.23: . ack 49 win 65535 [tos 0x1
  2. Press Ctrl-C to stop the output display:

    ^C
    38 packets received by filter
    0 packets dropped by kernel
Useful data can be gained from the tcpdump output simply by recognizing the source IP addresses in the traffic (the first address on each line of the sample output). Here, the sample output shows that ent2 is physically attached to the wrong network segment: the local source IP addresses should be in the 9.3.6.22x range (VLAN C), not the 9.3.6.24x range (VLAN D). It is possible that swapping the cables for ent2 and ent3 will solve the problem. If not, you may need to ask your network administrator to reconfigure switch ports to pass the correct traffic. With the information you gain from using the netstat -v and tcpdump tools, you can better decide which action is most appropriate.
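
Picking the source addresses out of a long capture by eye can be tedious, so the step can be sketched as a small filter. This is a minimal sketch, assuming a POSIX-style shell with awk and sort, and assuming the line layout of the sample output (timestamp, source address with the port appended, ">", destination). The function name source_ips is hypothetical. Any line that does not have ">" as its third field, such as a header or a wrapped continuation line, is ignored.

```shell
# source_ips: hypothetical helper, not an AIX command.
# Reads tcpdump text output on stdin and prints each distinct source
# IP address followed by the number of packets seen from it.
source_ips() {
    awk '$3 == ">" {
        src = $2
        sub(/\.[0-9]+$/, "", src)   # strip the trailing ".port"
        count[src]++
    }
    END {
        for (ip in count) print ip, count[ip]
    }' | LC_ALL=C sort
}
```

One possible use is to save a short capture to a file (here the hypothetical /tmp/en2.txt) with tcpdump -i en2 -I -n > /tmp/en2.txt, stop it with Ctrl-C, and then run source_ips < /tmp/en2.txt. Source addresses outside the network configured on the interface suggest that the cable or switch port belongs to a different segment.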

AIX provides many tools for querying TCP/IP status on AIX servers. Among them, the netstat and tcpdump commands offer quick methods for problem determination. For example, these tools can help determine whether you own the problem or whether it needs to be addressed by a network administrator.

For additional information, refer to the AIX online documentation.
