Wednesday 2 October 2013

Hardware Management Console (HMC ) Explained


HMC (Hardware Management Console) is a technology created by IBM  Vendor to provide a standard utility (interface) for configuring and operating logical partitions (also known as an LPAR or virtualized systems) and managing the  SMP (Symmetric multiprocessing)  systems such as IBM System i/z/p and IBM Power Systems.

Basically HMC is customized Linux blended with Java and many other graphical components. As per wiki  "The HMC is a Linux kernel using Busybox to provide the base utilities and X Window using the Fluxbox window manager to provide graphical logins. The HMC also utilizes Java applications to provide additional functionality."

As AIX admins like me very much  fond of  HMC uses in day-today operations. HMC supports the system with features that enable a system administrator to manage configuration and operation of partitions in a system, as well as to monitor the system for hardware problems. It consists of a 32-bit Intel-based desktop PC with a DVD-RAM drive.

Connection of HMC with different managed systems is shown in below diagram.

What does the HMC do?

  •     Creates and maintains a multiple-partitioned environment
  •     Displays a virtual operating system session terminal for each partition
  •     Displays virtual operator panel values for each partition
  •     Detects, reports, and stores changes in hardware conditions
  •     Powers managed systems on and off
  •     Powers Logical partitions on and off 
  •     Booting systems in Maintenance mode and  doing dump reboots
  •     Acts as a service focal point for service representatives to determine an appropriate service                                                   strategy  and enable the Service Agent Call-Home capability
  •     Activates additional resources on demand ( we call it as CoD, capacity on demand)
  •     Perform DLAR Operations. 
  •     Perform  Firmware up-gradations on managed systems
  •     Remote  management of managed systems

HMC  Facts:

  • Single HMC can manage multiple physical frames frames ( managed systems)
  • You can't open more than one virtual console for a given lpar at a given  time.
  • If your HMC is down , nothing will happen to your managed systems and their lpars they will operate as usual but only thing we can't manage them if something happens
  • There wont be direct root login . By default we get hscroot. ( need to engage  IBM support to get the root password)

HMC Operating Modes:

You can operate  HMC in two modes.

  1. Command Line Interface ( CLI )
  2. Graphical Interface
Each methods has its own merits ,  graphical interface can you clear view , how you can operate the managed systems even with  minimal knowledge on HMC

Where as using CLI you can run the information very fastly using commands  and scripts.

Below figure show how graphical interface.






















 

HMC Version Evolution:

  • HMC V7, for POWER5, POWER6 and POWER7 models
    • HMC V7R7.2.0 (Initial support for Power 710, Power 720, Power 730, Power 740 and Power 795 models)
    • HMC V7R7.1.0 (Initial support for POWER7)
    • HMC V7R3.5.0 (released Oct. 30, 2009)
    • HMC V7R3.4.0
    • HMC V7R3.3.0
    • HMC V7R3.2.0
    • HMC V7R3.1.0 (Initial support for POWER6 models)
  • HMC V6
    • HMC V6R1.3
    • HMC V6R1.2
  • 5.2.1
  • 5.1.0
  • 4.5.0
  • 4.4.0
  • 4.3.1
  • 4.2.1
  • 4.2.0, for POWER5 models
  • 4.1.x
  • 3.x, for POWER4 models

 RMC (Resource Monitoring and Control) & Association with HMC:

RMC is a distributed framework and architecture that allows the HMC to communicate with a managed logical partition. for example "IBM.DMSRM" is deamon which needs to run on the lapr inorder do DLAPR operation on through HMC on the lpar.

Both daemons in LPARs and HMCs use external network  to communicate among themselves but not through server processor  means both have access to same external  network in order to work with RMC related commands.

In order for RMC to work, port 657 upd/tcp must be open in both directions between the HMC public interface and the lpar.

The RMC daemons are part of the Reliable, Scalable Cluster Technology (RSCT) and are controlled by the System Resource Controller (SRC). These daemons run in all LPARs and communicate with equivalent RMC daemons running on the HMC. The daemons start automatically when the operating system starts and synchronize with the HMC RMC daemons.

Note: Apart from rebooting, there is no way to stop and start the RMC daemons on the HMC!

Things to check at the HMC:

- checking the status of the managed nodes: /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc  (you must be root on the HMC)

- checking connection between HMC and LPAR:
hscroot@umhmc1:~> lspartition -dlpar
<#0> Partition:<2 data-blogger-escaped-10.10.50.18="" data-blogger-escaped-aix10.domain.com="">
       Active:<1>, OS:, DCaps:<0x4f9f>, CmdCaps:<0x1b data-blogger-escaped-0x1b="">, PinnedMem:<1452>
<#1> Partition:<4 data-blogger-escaped-10.10.50.71="" data-blogger-escaped-aix20.domain.com="">
       Active:<0>, OS:, DCaps:<0x0>, CmdCaps:<0x1b data-blogger-escaped-0x1b="">, PinnedMem:<656>

For correct DLPAR function:
- the partition must return with the correct IP of the lpar.
- the active value (Active:...) must be higher than zero,
- the decaps value (DCaps:...) must be higher 0x0

(The first line shows a DLPAR capable LPAR, the second line is anon-working LPAR)

----------------------------------------

Things to check at the LPAR:

- checking the status of the managed nodes: /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc

- Checking RMC status:

# lssrc -a | grep rsct
 ctrmc            rsct             8847376      active   <== it is a RMC subsystem
 IBM.DRM          rsct_rm          6684802      active   <== it is for executing the DLPAR command on the partition
 IBM.DMSRM        rsct_rm          7929940      active    <== it is for tracking statuses of partitions
 IBM.ServiceRM    rsct_rm          10223780     active
 IBM.CSMAgentRM   rsct_rm          4915254      active   <== it is for  handshaking between the partition and HMC
 ctcas            rsct                          inoperative    <== it is for security verification
 IBM.ERRM         rsct_rm                       inoperative
 IBM.AuditRM      rsct_rm                       inoperative
 IBM.LPRM         rsct_rm                       inoperative
 IBM.HostRM       rsct_rm                       inoperative    <==it is for obtaining OS information

You will see some active and some missing (The key for DLPAR is the IBM.DRM)
- Stopping and starting RMC without erasing configuration:

# /usr/sbin/rsct/bin/rmcctrl -z   <== it stops the daemons
# /usr/sbin/rsct/bin/rmcctrl -A   <== adds entry to /etc/inittab and it starts the daemons
# /usr/sbin/rsct/bin/rmcctrl -p   <== enables the daemons for remote client connections

(This is the correct method to stop and start RMC without erasing the configuration.)
Do not use stopsrc and startsrc for these daemons; use the rmcctrl commands instead!

- recfgct: deletes the RMC database, does a discovery, and recreates the RMC configuration

# /usr/sbin/rsct/install/bin/recfgct (Wait several minutes)
# lssrc -a | grep rsct

(If you see IBM.DRM active, then you have probably resolved the issue)

Getting  Information of LPARS & HMC either way:

Make a Note: In-order to work with these commands you should have rsct daemons running on the servers means make sure RMC communication between the HMC and LPAR is happening.

1) Getting HMC IP  information from LPAR:

If you get the information of which HMC/HMCs your lpar associate  managed system ( frame) connected by using "lsrsrc" command .

Command: finding the HMC IP address
 (lsrsrc IBM.ManagementServer (or lsrsrc IBM.MCP on AIX 7))
$ lsrsrc IBM.ManagementServer

Resource Persistent Attributes for IBM.ManagementServer
resource 1:
Name             = "192.168.1.2″
Hostname         = "192.168.1.2″
ManagerType      = "HMC"
LocalHostname    = "ldap1-en1″
ClusterTM        = "9078-160″
ClusterSNum      = ""
ActivePeerDomain = ""
NodeNameList     = {"lpar1"}
So in this case, the HMC IP address is 192.168.1.2.

2) Get Managed System & LPAR information :

Below script will give us full details about Frame & LPAR information and their allocated CPU & Memory.

Script to get Frame & LPAR information
for system in `lssyscfg -r sys -F "name,state" | sort | grep ",Operating" | sed 's/,Operating//'`; do 
  echo $system
  echo "    LPAR            CPU    VCPU   MEM    OS"
  for lpar in `lssyscfg -m $system -r lpar -F "name" | sort`; do
     default_prof=`lssyscfg -r lpar -m $system --filter "lpar_names=$lpar" -F default_profile`
     procs=`lssyscfg -r prof -m $system --filter "profile_names=$default_prof,lpar_names=$lpar" -F desired_proc_units`
     vcpu=`lssyscfg -r prof -m $system --filter "profile_names=$default_prof,lpar_names=$lpar" -F desired_procs`
     mem=`lssyscfg -r prof -m $system --filter "profile_names=$default_prof,lpar_names=$lpar" -F desired_mem`
     os=`lssyscfg -r lpar -m $system --filter "lpar_names=$lpar" -F os_version`
     printf "    %-15s %-6s %-6s %-6s %-30s\n" $lpar $procs $vcpu $mem "$os"
  done
done


Generally  people will think there is no way to run scripts in HMC,but we have a possibility for this  use "rnvi" command to make scrippt file i.e  "rnvi -f  hmcscriptfile".

To run the script, use the "source" command.   For example "source hmcscriptfile".   This will run the script in your current shell. 


Here  "hmcscriptfile" is the script name and run the script like below you will see the o/p as below.

How to run script & o/p
hscroot@umhmc1:~> source hmcscriptfile
p570_frame5_ms
    LPAR            CPU    VCPU   MEM    OS
    umlpar1         0.1    3      512    AIX 6.1 6100-07-05-1228       
    umlpar2         0.1    3      512    AIX 6.1 6100-07-05-1228       
    umlpar3         0.1    3      512    Unknown                       
    linux1          0.1    3      512    Unknown                       
    vio1            0.2    2      512    VIOS 2.2.1.4                  
    vio2            0.1    2      352    Unknown       

5 comments:

  1. I think it is very effective post for us. Thanks for this nice and Helpful post. There is exciting moderately nice about the "Hardware Management Console (HMC ) Explained". I like the suggestion.
    I think, Online computer troubleshooting is rapidly becoming the most demanded online service on the World Wide Web.
    I want to share some Information about computer troubleshooters.

    ReplyDelete
  2. I like to add a small tip. Wrap your block of script snippet into a shell function, source the script file, and invoke the block of shell commands by calling the shell function.

    ReplyDelete
  3. Good Script., Very helpful. Is it possible to add the LPARs uptime in the same script.
    I am trying to get this Info. But not able to find. Please let me know if its possible to do so..

    Thanks,
    Srinivasan R K

    ReplyDelete
  4. Thank you for an informative post.

    ReplyDelete
  5. good article i certified AIX 4.3 but I know I way out of date. :)

    ReplyDelete