Thursday 4 July 2013

Debug and Fix RMC Connection Errors

Question

How do you debug and correct a "No RMC Connection" error message when using the HMC?

Cause

The "No RMC Connection" error message can occur on the HMC when attempting to dynamically configure AIX or VIOS, when attempting LPAR mobility, or when configuring virtual resources. RMC is an encrypted communication channel between the HMC and an LPAR that uses port 657 over both TCP and UDP. Changing IP addresses, cloning AIX LPARs, or a host of other administrative tasks can cause RMC to break down.

Answer

There are some basic commands that can be run to check the status of the RMC configuration, and which commands you use depends on the RSCT version. RSCT 3.1.x.x levels are the newest and are included in AIX 6.1 TL6 or higher; RSCT 2.x.x.x levels are included in AIX 6.1 TL5 or lower. The following queries provide a quick method to assess RMC health.

- As root on AIX or VIOS LPAR
-- IF AIX 6.1 TL5 or lower
lslpp -l csm.client ---> This fileset needs to be installed
-- IF AIX 6.1 TL6 or higher
lslpp -l rsct.core.rmc ---> This fileset needs to be 3.1.0.x level or higher
-- For all AIX versions
/usr/sbin/rsct/bin/ctsvhbac ---> Are all IP and host IDs trusted?
-- For AIX 6.1 TL5 or lower
lsrsrc IBM.ManagementServer ---> Is HMC listed as a resource?
-- For AIX 6.1 TL6 or higher
lsrsrc IBM.MCP ---> Is the HMC listed as a resource?
- On HMC (as hscroot)
lspartition -dlpar ---> Is the LPAR's DCaps value non-zero?

If a required fileset is missing or you answer no to any of the questions above, then corrective action is required.
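The version-dependent choices above can be collapsed into a small helper. The following is a minimal sketch, not an IBM-supplied tool: the function name `rmc_checks_for_rsct` is hypothetical, and it assumes that any RSCT level whose first field is 3 or greater should be treated like the 3.1.x.x levels described above. It only prints the LPAR-side commands to run; it does not execute them.

```shell
#!/bin/sh
# Hypothetical helper: print the LPAR-side RMC health-check commands
# that apply to a given RSCT level. It prints commands and runs nothing.
# Usage: rmc_checks_for_rsct 3.1.0.2
rmc_checks_for_rsct() {
    level=$1
    major=${level%%.*}          # first dotted field, e.g. "3" from "3.1.0.2"

    # Reject an empty or non-numeric first field.
    case $major in
        ''|*[!0-9]*)
            echo "usage: rmc_checks_for_rsct <rsct level, e.g. 3.1.0.2>" >&2
            return 1
            ;;
    esac

    if [ "$major" -ge 3 ]; then
        # Assumption: treated as AIX 6.1 TL6 or higher (RSCT 3.1.x.x)
        echo "lslpp -l rsct.core.rmc"
        echo "/usr/sbin/rsct/bin/ctsvhbac"
        echo "lsrsrc IBM.MCP"
    else
        # AIX 6.1 TL5 or lower (RSCT 2.x.x.x)
        echo "lslpp -l csm.client"
        echo "/usr/sbin/rsct/bin/ctsvhbac"
        echo "lsrsrc IBM.ManagementServer"
    fi
}
```

The level argument can be read from the output of lslpp -l rsct.core.rmc on the LPAR. The HMC-side check (lspartition -dlpar) is the same for every level, so it is left out of the helper.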

- Fix It Commands (run as root on LPAR, HMC, or both)

Caution: Running the commands listed below on AIX LPARs is only safe if the node is only a member of the HMC's RMC domain. These commands should not be used in an active CAA clustered environment. If you need to determine if your system is a member of a CAA cluster then please refer to the Reliable Scalable Cluster Technology document titled, "Diagnosing problems with the Resource Monitoring and Control (RMC) subsystem."

http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.rsct312.trouble%2Fbl507_diagrmc.htm

Pay particular attention to the section titled Diagnostic procedures, which helps you learn whether your node is a member of any domain other than the HMC management domain.

odmdelete -o CuAt -q "name='cluster0'" (Only run this on AIX or VIOS)
/usr/sbin/rsct/install/bin/recfgct
/usr/sbin/rsct/bin/rmcctrl -p
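
Because these commands reset the RSCT configuration, it can help to review them before anything runs. The sketch below is a hypothetical wrapper for an AIX or VIOS LPAR: the function names `rmc_reset_commands` and `rmc_reset` and the `RMC_RESET_CONFIRM` variable are illustrative and not part of RSCT. It prints the three commands above by default and executes them only when explicitly confirmed. It does not check for CAA cluster membership; that decision stays with the administrator.

```shell
#!/bin/sh
# Hypothetical wrapper around the three fix commands (AIX or VIOS LPAR only).
# Default is a dry run; set RMC_RESET_CONFIRM=yes to execute.

# Print the fix commands in the order given in this note.
rmc_reset_commands() {
    cat <<'EOF'
odmdelete -o CuAt -q "name='cluster0'"
/usr/sbin/rsct/install/bin/recfgct
/usr/sbin/rsct/bin/rmcctrl -p
EOF
}

rmc_reset() {
    if [ "${RMC_RESET_CONFIRM:-no}" != "yes" ]; then
        echo "Dry run. Set RMC_RESET_CONFIRM=yes to execute:" >&2
        rmc_reset_commands
        return 0
    fi

    # The commands must be run as root.
    if [ "$(id -u)" -ne 0 ]; then
        echo "rmc_reset: must be run as root" >&2
        return 1
    fi

    # Run all three in sequence, as listed above, without stopping on a
    # failure; the return status is that of the final rmcctrl command.
    odmdelete -o CuAt -q "name='cluster0'"
    /usr/sbin/rsct/install/bin/recfgct
    /usr/sbin/rsct/bin/rmcctrl -p
}
```

Calling rmc_reset with no environment changes only lists the commands, which gives a last chance to confirm that the node is a member of the HMC's RMC domain only.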

You need a pesh password for your HMC in order to run the above fix commands on the HMC itself.
You can try the following command first as hscroot:

lspartition -dlparreset

If that does not help, you will need to request pesh passwords from IBM Support for your HMC so you can run the recfgct and rmcctrl commands listed above.

After running the above commands, it will take several minutes before the RMC connection is restored. The best way to monitor progress is to run the lspartition -dlpar command on the HMC every few minutes and watch for the target LPAR to show up with a non-zero DCaps value.
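
The polling described above can be scripted from an administrative workstation. This is a rough sketch under several assumptions: the function names are hypothetical, ssh key access to the HMC as hscroot is already set up, and lspartition -dlpar prints each partition as a `Partition:<...>` line (containing the LPAR's host name and IP address) followed by a line containing `DCaps:<0x...>`. Verify that layout against your own HMC output before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers for watching an LPAR's DCaps value.

# Read lspartition -dlpar output on stdin and print the DCaps value of the
# first partition whose "Partition:<...>" line contains the given string
# (host name or IP address). Assumes the output layout described above.
dcaps_for_lpar() {
    [ -n "$1" ] || return 1
    awk -v name="$1" '
        /Partition:</ { hit = (index($0, name) > 0) }
        hit && /DCaps:</ {
            s = $0
            sub(/.*DCaps:</, "", s)   # drop everything up to the value
            sub(/>.*/, "", s)         # drop everything after it
            print s
            exit
        }
    '
}

# Succeed only if the argument is a hex value with a non-zero digit.
dcaps_nonzero() {
    hex=${1#0x}
    case $hex in
        ''|*[!0-9a-fA-F]*) return 1 ;;   # missing or malformed
    esac
    case $hex in
        *[!0]*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Poll the HMC until the target shows a non-zero DCaps value.
# Usage: wait_for_rmc <hmc> <lpar host name or IP> [tries] [delay seconds]
wait_for_rmc() {
    hmc=$1
    target=$2
    tries=${3:-10}
    delay=${4:-120}
    n=0
    while [ "$n" -lt "$tries" ]; do
        dcaps=$(ssh "hscroot@$hmc" lspartition -dlpar | dcaps_for_lpar "$target")
        if dcaps_nonzero "$dcaps"; then
            echo "$target DCaps:<$dcaps>"
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    echo "$target still has no non-zero DCaps value" >&2
    return 1
}
```

For example, `wait_for_rmc hmc01.example.com lpar1.example.com` (both names are placeholders) polls every two minutes, up to ten times. The match is a plain substring test, so a full host name is safer than a short IP prefix.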

- Things to consider before using the above fix commands or if the reconfigure commands don't help.

If you are still unsure whether your LPAR is a member of a CAA cluster, then some application names might help (PowerHA 7, HPC applications such as GPFS, ViSDs, CSM, etc.). Most administrators should have a good idea of how their servers are configured and what is running on them, so the decision to proceed should be easy. The diagnostic checks covered in the RSCT document should help with the decision if you are unsure.

Network issues are often overlooked or disregarded. If the commands that reconfigure RSCT don't restore DLPAR functions, there may be network configuration issues, and perhaps even APAR issues, that need to be addressed; those require additional debug steps not covered in this tech note. However, some common network issues can prevent RMC communications from passing between the HMC and the LPARs, including the following.

- Firewalls blocking bidirectional RMC related traffic for UDP and TCP on port 657.
- Mix of jumbo frames and standard Ethernet frames between the HMC and LPARs.
- Multiple interfaces with IP addresses on the LPARs that can route traffic to the HMC. 
