ONTAP Hardware

HA pair FAS2220 troubleshooting issue

Abouk

I have an issue with my HA pair configuration with FAS2220.

After a shelf fault, one node took over the other, and I now have these issues:

s0u1sanb(takeover)> environment shelf
Environment for channel 0a
Number of shelves monitored: 1  enabled: yes
Environmental failure on shelves on this channel? yes

s0u1sanb(takeover)> sysconfig -a
*** This system has taken over s0u1sana
System Storage Configuration: Single-Path HA
System ACP Connectivity: Partial Connectivity

 

slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)

slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)

The PCM LED is also on.

Is there a solution for this, or should I replace the failed node?

Thanks a lot for your advice!

 


GidonMarcus

Hi. What was the shelf fault, and are we sure it has been resolved?

Currently, the system seems to only see the following disks in the built-in shelf:

Shelf mapping (shelf-assigned addresses) for channel 0a:

  Shelf   0: XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX XXX  11  10   9   8   7   6   5   4   3   2   1   0

It's aware of a cable connected to the 0b port of controller B - but no link at the end (hence offline-hard)

SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)
Firmware rev:       01.11.07.00
Base WWN:           5:00a098:001c4aa:74
Phy State:          [4] Enabled, Rate unknown
                    [5] Enabled, Rate unknown
                    [6] Enabled, Rate unknown
                    [7] Enabled, Rate unknown
QSFP Vendor:        Molex Inc.      
QSFP Part Number:   112-00176+A0    
QSFP Type:          Passive Copper 0.5m ID:00

It's also aware of a cable connected to the second node's SAS port (0a, I believe), but I can't tell if it's up or not:

  [4] Vendor: Molex Inc.
      Type: QSFP passive copper 0.5-1.0m  ID: 00  Swaps: 1

 

So again - it's unclear what happened to the shelf and to the other node (maybe the other node panicked because it lost its root aggregate disks? You need to log in via SP/console and see).
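To compare adapter states between runs without reading the whole "sysconfig -a" dump, something like the following Python sketch could help. It only assumes the adapter lines look like the ones pasted in this thread; the function name and regular expression are my own illustration, not a NetApp tool.

```python
import re

# Matches lines such as:
#   slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)
# Assumption: the state is whatever sits between "<" and ">" at the end of the line.
ADAPTER_RE = re.compile(r"SAS Host Adapter (\w+) \(.*<(.+)>\)")

def sas_adapter_states(sysconfig_text):
    """Return {adapter_name: state} for each SAS host adapter line found."""
    states = {}
    for line in sysconfig_text.splitlines():
        match = ADAPTER_RE.search(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states

sample = """\
slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)
slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)
"""
states = sas_adapter_states(sample)
# Adapters whose state is anything other than UP need a physical check.
not_up = [name for name, state in states.items() if state != "UP"]
print(states)
print(not_up)
```

Run against saved console output before and after reseating a cable, this shows at a glance whether 0b has come back.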

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK

Abouk

Hi Marcus,

 

The problem is still not resolved.

I cannot get a console on the sana controller.

I can only reach the console of the sanb controller, which is working.

As you can see in the attached file:

s0u1sanb(takeover)> environment shelf

...
Environment for channel 0a
Number of shelves monitored: 1  enabled: yes
Environmental failure on shelves on this channel? yes

s0u1sanb(takeover)> sysconfig -a

This system has taken over s0u1sana

...

System Storage Configuration: Single-Path HA
System ACP Connectivity: Partial Connectivity

 

slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)

slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)

 

So, should I restart the sana controller so that I can reach its console and then boot it?

The PCM LED is on; maybe it's a boot problem?

I'm really a beginner with SAN and I need your help.

 

Thanks

 

GidonMarcus

You can run the command "cf status" to see whether the other node may be up and ready for giveback.

 

If not - the system has a physical COM console port which you can connect to with a standard RJ-45-Console cable.

It also has a service processor (SP), an IP-based remote-control port (a bit like iLO/iDRAC/BMC on servers). You can find the IP of the working controller's SP with the command "sp status", then perhaps guess the non-working controller's SP IP and try to connect to it via telnet/ssh (the user is "naroot", with the same password as the "root" user on the controller itself).

Once you are connected, you can run the command "system console" to jump into the "physical" console port of the controller and troubleshoot any boot/disk issues.
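For the "guess the IP" part, a rough Python sketch of what I mean is below. It assumes the two SPs were given neighbouring addresses, which is common but by no means guaranteed; the function name and the 192.0.2.11 address are placeholders for illustration only.

```python
import ipaddress

def candidate_partner_sp_ips(working_sp_ip, spread=2):
    """List addresses within `spread` of the working SP's address, nearest first."""
    base = ipaddress.ip_address(working_sp_ip)
    candidates = []
    for offset in range(1, spread + 1):
        # Try the address above first, then the one below.
        for signed in (offset, -offset):
            candidates.append(str(base + signed))
    return candidates

# 192.0.2.11 stands in for the address reported by "sp status" on the working node.
for ip in candidate_partner_sp_ips("192.0.2.11"):
    print(f"ssh naroot@{ip}")
```

Each printed line is a connection attempt to try by hand; if none of them answers, checking the DHCP server or the site's IP documentation is the next step.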

 

It's still unclear from your reply what the status of the external shelf is. If the SAS HBA on the second controller still does not recognize it, some pictures of the cables/LEDs and some physical troubleshooting will be needed. (You may be able to do some resets/power-cycles via ACP if it is cabled, but the output line "System ACP Connectivity: Partial Connectivity" suggests that either it is not cabled or the shelf is down.)


Abouk

Hi Marcus,

You can read my attached file for more information.

The console connection doesn't work with the failed controller.

How do I connect to the SP?

E800966F-B3E3-4059-92B1-962D461CF871.jpeg

paul_stejskal

The controller may be bad. If the SP wasn't previously configured, it should be pulling an IP address via DHCP, so you can check your DHCP server for its lease and then SSH in.

 

If you aren't getting a console response using the standard console properties (just like Cisco), and CTRL+G or CTRL+D don't do anything, it's likely dead.

 

Have you tried a reseat?

Abouk


Hi,

I'm getting a console response using CTRL+G or CTRL+D, and I can get the SP prompt.

So what should I do to enable my 0a adapter, which is down, to resolve my shelf fault, and to reboot my controller from the SP?

 

thanks 

Abouk

So, to summarize:

I have an HA FAS2220 with two controllers, s0u1sana and s0u1sanb.

sanb has taken over sana.

s0u1sanb(takeover)>
s0u1sanb(takeover)> environment shelf
Environment for channel 0a
Number of shelves monitored: 1  enabled: yes
Environmental failure on shelves on this channel? yes

There is no power fault.

s0u1sanb(takeover)> sysconfig  -a
*** This system has taken over s0u1sana
NetApp Release 8.2.1 7-Mode: Fri 
System ID:      xxxxxxxx (s0u1sanb); partner ID: yyyyyyy(s0u1sana)

System Storage Configuration: Single-Path HA
System ACP Connectivity: Partial Connectivity


Interconnect Port: port not active
                        memory mapped I/O base 0xdf400000, size 0x100000
                        prefetchable memory base 0xde800000, size 0x800000
slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)
Firmware rev:       01.11.07.00
Base WWN:           5:00a098:001c4aa:70
Phy State:          [0] Enabled, 6.0 Gb/s
                    [1] Enabled, 6.0 Gb/s
                    [2] Enabled, 6.0 Gb/s
                    [3] Enabled, 6.0 Gb/s
                00.0 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6V0MJ)
                00.1 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6U2RJ)
                00.2 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6V3HJ)
                00.3 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6V82J)
                00.4 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG7TSLJ)
                00.5 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6VKVJ)
                00.6 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6A8MJ)
                00.7 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG7TZ0J)
                00.8 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6V1DJ)
                00.9 : NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6U8EJ)
                00.10: NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6V0HJ)
                00.11: NETAPP   X487_HCOBE600A10 NA00 560.0GB 520B/sect (KSG6U9HJ)
Shelf 0: DS2126E  Firmware rev. IOM6E A: ----  IOM6E B: 0142
slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)
Firmware rev:       01.11.07.00
Base WWN:           5:00a098:001c4aa:74
Phy State:          [4] Enabled, Rate unknown
                    [5] Enabled, Rate unknown
                    [6] Enabled, Rate unknown
                    [7] Enabled, Rate unknown
QSFP Vendor:        Molex Inc.      
QSFP Part Number:   112-00176+A0    
QSFP Type:          Passive Copper 0.5m ID:00
QSFP Serial Number: 213820027


s0u1sanb(takeover)> cf status
s0u1sanb has taken over s0u1sana.

So, I can connect only to the SP prompt.

Which SP commands should I use to correct these issues:
- Environmental failure on shelves on this channel? yes
- Interconnect Port: port not active
- System Storage Configuration: Single-Path HA
- System ACP Connectivity: Partial Connectivity
- slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)
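To keep track of which of these symptoms are still present after each attempt, I could save the console output and scan it with a small Python sketch like this. The labels and the function name are my own; the marker strings are copied from the output above.

```python
# Marker strings copied from the "environment shelf" and "sysconfig -a" output
# in this thread; the short labels are my own shorthand.
SYMPTOMS = {
    "shelf environmental failure": "Environmental failure on shelves on this channel? yes",
    "interconnect down": "Interconnect Port: port not active",
    "single-path storage": "System Storage Configuration: Single-Path HA",
    "partial ACP": "System ACP Connectivity: Partial Connectivity",
    "SAS adapter offline": "<OFFLINE (hard)>",
}

def remaining_symptoms(console_text):
    """Return the labels of all symptoms whose marker still appears in the text."""
    return [label for label, marker in SYMPTOMS.items() if marker in console_text]

sample = """\
System Storage Configuration: Single-Path HA
System ACP Connectivity: Partial Connectivity
slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)
"""
still_present = remaining_symptoms(sample)
print(still_present)
```

An empty list would mean none of the marker lines appear any more in the captured output.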

Thanks a lot

Abouk

Hi,

I used "system power on" and "system power cycle", but the system power is still off.
