Friday 17 May 2013

How to replace a failing HBA when using SDD storage

This procedure describes how to replace a failing HBA (Fibre Channel adapter) on a system that uses SDD storage:
  1. Determine which adapter is failing (0, 1, 2, etcetera):
    # datapath query adapter
  2. Check if there are dead paths for any vpaths:
    # datapath query device
  3. Try to set a "degraded" adapter back to online using:
    # datapath set adapter 1 offline
    # datapath set adapter 1 online
    (this example assumes that adapter "1" is failing; substitute the correct adapter number).
  4. If the adapter is still in a "degraded" status, open a call with IBM. They will most likely ask you to take a snap of the system and send them the snap file, which they analyze to determine whether the adapter needs to be replaced.
  5. Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN of the failing adapter once it is replaced with a new adapter that has a new WWN.
  6. If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA. Note the new WWN and supply it to the SAN storage team.
  7. Remove the adapter:
    # datapath remove adapter 1
    (replace the "1" with the number of the failing adapter).
  8. Check if the vpaths now all have one path fewer:
    # datapath query device | more
  9. De-configure the adapter (this also de-configures all the child devices, so you won't have to do this manually): run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes" and "KEEP definition in database" to "no". Hit ENTER.
  10. Replace the adapter: run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful: you won't see the adapter name here, only "Unknown", because the device was unconfigured).
  11. Have the IBM CE replace the adapter.
  12. Close any events on the failing adapter on the HMC.
  13. Validate that the attention LED is now off on the system. If it is not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and disable the attention LED.
  14. Check the adapter firmware level using:
    # lscfg -vl fcs1
    (replace this with the actual adapter name).
    If required, update the adapter microcode. Validate that the adapter is functioning correctly by running:
    # errpt
    # lsdev -Cc adapter
  15. Have the SAN admin update the WWN.
  16. Run:
    # cfgmgr -S
  17. Check the adapter and the child devices:
    # lsdev -Cc adapter
    # lsdev -p fcs1
    # lsdev -p fscsi1
    (replace these with the correct adapter names).
  18. Add the paths to the device:
    # addpaths
  19. Check if the vpaths have all paths again:
    # datapath query device | more
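
Steps 1 and 3 depend on spotting an adapter that is not in the "NORMAL" state. The following is a minimal sketch of a helper that filters the output of "datapath query adapter" for such adapters. The function name find_degraded_adapters is hypothetical, and the sketch assumes the usual SDD column layout (Adpt#, Name, State, Mode, ...); verify the layout against the output on your own system before relying on it.

```shell
# Hypothetical helper: reads "datapath query adapter" output on stdin and
# prints the number, name and state of every adapter that is not NORMAL.
# Assumes the column order: Adpt#  Name  State  Mode  Select  Errors ...
find_degraded_adapters() {
    # Only data rows start with a numeric adapter number; header lines
    # and the "Active Adapters" summary line are skipped.
    awk '$1 ~ /^[0-9]+$/ && NF >= 4 && $3 != "NORMAL" { print $1, $2, $3 }'
}

# Usage on a system with SDD installed:
#   datapath query adapter | find_degraded_adapters
```

If the helper prints nothing, all adapters report "NORMAL". Any line it does print gives the adapter number to use in the "datapath set adapter" and "datapath remove adapter" commands above.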
