Question
How do you debug and correct a "No RMC Connection" error message reported by the HMC?
Cause
The "No RMC Connection" error message can appear on the HMC when you attempt to dynamically configure (DLPAR) an AIX or VIOS partition, perform LPAR mobility, or configure virtual resources. RMC is an encrypted communication channel between the HMC and an LPAR that uses port 657 over both TCP and UDP. Changing IP addresses, cloning AIX LPARs, and many other administrative tasks can cause RMC to break down.
Answer
There are some basic commands you can run to check the status of the RMC configuration, and which commands you use depends on the RSCT version. RSCT 3.1.x.x levels are the newer ones and are included in AIX 6.1 TL6 or higher; RSCT 2.x.x.x levels are included in AIX 6.1 TL5 or lower. The following queries provide a quick way to assess RMC health.
- As root on AIX or VIOS LPAR
-- IF AIX 6.1 TL5 or lower
lslpp -l csm.client ---> This fileset needs to be installed
-- IF AIX 6.1 TL6 or higher
lslpp -l rsct.core.rmc ---> This fileset needs to be 3.1.0.x level or higher
-- For all AIX versions
/usr/sbin/rsct/bin/ctsvhbac ---> Are all IP and host IDs trusted?
-- For AIX 6.1 TL5 or lower
lsrsrc IBM.ManagementServer ---> Is the HMC listed as a resource?
-- For AIX 6.1 TL6 or higher
lsrsrc IBM.MCP ---> Is the HMC listed as a resource?
- On HMC (as hscroot)
lspartition -dlpar ---> Is the LPAR's DCaps value non-zero?
If you answer no to any of the above then corrective action is required.
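Because the LPAR-side checks above branch on the AIX level, a small helper can pick the right fileset and resource class for you. This is a minimal sketch, assuming the usual `oslevel -s` format (VVRR-TL-SP-BUILD, e.g. 6100-06-01-1043) and treating AIX 6.1 TL6 or any later release as RSCT 3.1.x.x. The function names (`rsct_generation`, `rsct_fileset`, `rmc_resource_class`, `run_rmc_checks`) are hypothetical, not IBM tools.

```shell
#!/bin/sh
# Sketch only: choose the RSCT checks from the AIX level reported by `oslevel -s`.
# Assumes the VVRR-TL-SP-BUILD format, e.g. 6100-06-01-1043.
# All function names here are hypothetical helpers, not IBM commands.

# Prints "new" for AIX 6.1 TL6 or higher (RSCT 3.1.x.x), otherwise "old" (RSCT 2.x.x.x).
rsct_generation() {
    vr=${1%%-*}        # version/release field, e.g. 6100 or 7100
    rest=${1#*-}
    tl=${rest%%-*}     # technology level field, e.g. 06 (test compares it as decimal)
    if [ "$vr" -gt 6100 ] || { [ "$vr" -eq 6100 ] && [ "$tl" -ge 6 ]; }; then
        echo new
    else
        echo old
    fi
}

# Fileset to inspect with lslpp -l.
rsct_fileset() {
    if [ "$(rsct_generation "$1")" = new ]; then echo rsct.core.rmc; else echo csm.client; fi
}

# Resource class that should list the HMC.
rmc_resource_class() {
    if [ "$(rsct_generation "$1")" = new ]; then echo IBM.MCP; else echo IBM.ManagementServer; fi
}

# Runs the LPAR-side checks; call as root on the AIX or VIOS LPAR.
run_rmc_checks() {
    level=$(oslevel -s)
    lslpp -l "$(rsct_fileset "$level")"
    /usr/sbin/rsct/bin/ctsvhbac
    lsrsrc "$(rmc_resource_class "$level")"
}
```

On the LPAR you would source the file and call `run_rmc_checks`, then read its output against the questions in the list; the `lspartition -dlpar` check still has to be run on the HMC.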
- Fix It Commands (run as root on LPAR, HMC, or both)
Caution: Running the commands listed below on AIX LPARs is safe only if the node's sole RMC domain is the HMC management domain. Do not use these commands in an active CAA clustered environment. To determine whether your system is a member of a CAA cluster, refer to the Reliable Scalable Cluster Technology document titled "Diagnosing problems with the Resource Monitoring and Control (RMC) subsystem."
http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.rsct312.trouble%2Fbl507_diagrmc.htm
Pay particular attention to the section titled "Diagnostic procedures," which helps you learn whether your node is a member of any domain other than the HMC management domain.
odmdelete -o CuAt -q "name='cluster0'" (Only run this on AIX or VIOS)
/usr/sbin/rsct/install/bin/recfgct
/usr/sbin/rsct/bin/rmcctrl -p
You will need a pesh password for your HMC to run the above fix commands on the HMC.
You can try the following command first as hscroot:
lspartition -dlparreset
If that does not help, you will need to request pesh passwords from IBM Support for your HMC so that you can run the recfgct and rmcctrl commands listed above.
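The fix-it commands have to run in the order listed, and the `odmdelete` step applies only to AIX or VIOS. The sketch below captures that order with a dry run by default: it prints each command and executes them only when a hypothetical `RUN=1` switch is set. The platform argument (`lpar` or `hmc`) and the function names are assumptions for illustration. It does not check CAA membership, so the caution above still applies, and on the HMC the commands still require pesh access.

```shell
#!/bin/sh
# Sketch only: the fix-it sequence above, printed in order (dry run) unless RUN=1.
# $1 = platform: "lpar" (AIX or VIOS) or "hmc"; the cluster0 ODM delete is LPAR-only.
# RUN, rmc_fix_steps and rmc_fix are hypothetical, not IBM-provided options or tools.
# Safe only when the node's sole RMC domain is the HMC domain (no active CAA cluster).

# Prints the commands, one per line, in the order they must run.
rmc_fix_steps() {
    if [ "$1" = lpar ]; then
        echo "odmdelete -o CuAt -q \"name='cluster0'\""
    fi
    echo "/usr/sbin/rsct/install/bin/recfgct"
    echo "/usr/sbin/rsct/bin/rmcctrl -p"
}

# Dry run by default; with RUN=1, executes each step and stops at the first failure.
rmc_fix() {
    while IFS= read -r cmd; do
        if [ "${RUN:-0}" = 1 ]; then
            echo "running: $cmd"
            eval "$cmd" </dev/null || { echo "failed: $cmd" >&2; return 1; }
        else
            echo "would run: $cmd"
        fi
    done <<EOF
$(rmc_fix_steps "$1")
EOF
}
```

For example, `rmc_fix lpar` only lists the three commands, while `RUN=1 rmc_fix lpar` runs them as root on the LPAR.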
After running the above commands, it will take several minutes before the RMC connection is restored. The best way to monitor it is to run the lspartition -dlpar command on the HMC every few minutes and watch for the target LPAR to show up with a non-zero DCaps value.
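The monitoring step can be scripted from an admin workstation. This is a minimal sketch, assuming SSH access to the HMC as hscroot and assuming `lspartition -dlpar` prints a `Partition:<…, hostname, ip>` line followed by a line containing `DCaps:<0x…>` for each LPAR, as in the comment below; verify that layout against your own HMC's output. The function names and the HMC host and LPAR name arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: poll lspartition -dlpar until an LPAR reports a non-zero DCaps value.
# Assumes output shaped like:
#   <#0> Partition:<5*8203-E4A*1234567, lpar1, 10.1.1.1>
#          Active:<1>, OS:<AIX, 6.1, 6100-06-01-1043>, DCaps:<0x2c5f>, ...
# Function names and arguments are hypothetical helpers, not HMC commands.

# Reads lspartition -dlpar output on stdin; prints the DCaps value for LPAR $1.
dcaps_for_lpar() {
    awk -v name="$1" '
        /Partition:</ { want = (index($0, ", " name ",") > 0); next }
        want && /DCaps:</ {
            v = $0
            sub(/.*DCaps:</, "", v)   # drop everything up to the value
            sub(/>.*/, "", v)         # drop everything after it
            print v
            exit
        }'
}

# Succeeds only if the value looks like a hex number and is non-zero.
dcaps_nonzero() {
    case $1 in
        0x[0-9a-fA-F]*) [ $(($1)) -ne 0 ] ;;
        *) return 1 ;;
    esac
}

# Usage: wait_for_rmc HMC_HOST LPAR_NAME [interval_seconds] [tries]
wait_for_rmc() {
    hmc=$1 lpar=$2 interval=${3:-180} tries=${4:-10} i=0
    while [ "$i" -lt "$tries" ]; do
        d=$(ssh "hscroot@$hmc" lspartition -dlpar | dcaps_for_lpar "$lpar")
        if dcaps_nonzero "$d"; then
            echo "$lpar: RMC connection restored (DCaps $d)"
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$lpar: DCaps still zero or missing after $tries checks" >&2
    return 1
}
```

The default of checking every three minutes, up to ten times, is an arbitrary choice that matches "every few minutes"; adjust it to your environment.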
- Things to consider before using the above fix commands or if the reconfigure commands don't help.
If you are still unsure whether your LPAR is a member of a CAA cluster, some application names might help (PowerHA 7, and HPC applications such as GPFS, ViSDs, CSM, etc.). Most administrators should have a good idea of how their servers are configured and what is running on them, so the decision to proceed is usually easy. If you are unsure, the diagnostic checks covered in the RSCT document should help you decide.
Network issues are often overlooked or disregarded. If the commands that reconfigure RSCT don't restore DLPAR functions, there may be network configuration issues, or even APAR issues, that need to be addressed; those require additional debug steps not covered in this tech note. However, some common network issues can prevent RMC communication between the HMC and the LPARs, including the following:
- Firewalls blocking bidirectional RMC related traffic for UDP and TCP on port 657.
- Mix of jumbo frames and standard Ethernet frames between the HMC and LPARs.
- Multiple interfaces with IP addresses on the LPARs that can route traffic to the HMC.
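Before suspecting a firewall, it can help to confirm that something is actually listening on port 657 locally for both TCP and UDP. The sketch below parses `netstat -an` output and reports which protocols have a socket on port 657. It assumes the common layout in which the protocol is the first column and the local address is the fourth, written as `*.657` on AIX or `0.0.0.0:657` on Linux; the function names are hypothetical. It only checks the local side and cannot show whether a firewall between the HMC and the LPAR drops the traffic.

```shell
#!/bin/sh
# Sketch only: check for local TCP (LISTEN) and UDP sockets on RMC port 657.
# Assumes netstat -an columns: Proto Recv-Q Send-Q Local-Address Foreign-Address [State].
# Function names are hypothetical helpers.

# Reads netstat -an output on stdin; prints "tcp" and/or "udp", one per line.
rmc_port_listeners() {
    awk '
        $1 ~ /^(tcp|udp)/ && $4 ~ /[.:]657$/ {
            p = substr($1, 1, 3)
            if (p == "tcp" && $NF != "LISTEN") next   # count only listening TCP sockets
            seen[p] = 1
        }
        END {
            if ("tcp" in seen) print "tcp"
            if ("udp" in seen) print "udp"
        }'
}

# Returns non-zero unless both a TCP and a UDP socket are bound to port 657.
check_rmc_ports() {
    found=$(netstat -an | rmc_port_listeners | tr '\n' ' ')
    case $found in
        *tcp*udp*) echo "port 657 sockets present: tcp and udp" ;;
        *) echo "missing RMC socket(s) on port 657; found: ${found:-none}" >&2; return 1 ;;
    esac
}
```

If both protocols are present locally but the HMC still reports no RMC connection, the firewall, frame-size, and routing items in the list above are the next places to look.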