I have a two-node FAS2554 running CDoT 8.3.1. The HA status shows everything is good. Interface groups, failover targets, etc. all check out. I have a root aggregate on each node (partitioned, as 8.3.1 is wont to do) and an aggregate for an SVM owned by node1. When I initiate a takeover of node1 by node2, everything works fine: node1 reboots normally and giveback works as expected.
However, when I do a takeover of node2 by node1, node2 hangs on reboot. Even the SP goes offline. The only thing I can do is yank the power cable at the datacenter. Then it finally boots into "waiting for giveback" mode and accepts a giveback.
Anyone ever seen this behavior? We even wiped the config and started over from scratch, but we get the same issue whenever I do failover testing from node2 to node1. I don't see any config or cabling issues, so I'm wondering if there's a problem with the node or the interconnect hardware.
Here's some info, but I'm happy to provide anything else that might help someone troubleshoot.
Thanks,
Steve
cluster1::> storage failover show-takeover
Node Node Status Aggregate Takeover Status
---------- --------------------- -------------- -------------------------------
node1
In takeover.
- -
Warning: Unable to list entries on node node2. RPC: Port mapper
failure - RPC: Timed out
cluster1::> system node run -node node1 -command storage show fault
Enclosure Status: critical
Channel: 0a
Shelf: 0
Shelf Type: DS4246
Product Serial Number: SHJSG1504000090
Module Type: IOM6E
Disk Elements:
Element Status Status Bytes Status Descriptions
0 [Bay 0]: OK 01,00,00,00
1 [Bay 1]: OK 01,01,00,00
2 [Bay 2]: OK 01,02,00,00
3 [Bay 3]: OK 01,03,00,00
4 [Bay 4]: OK 01,04,00,00
5 [Bay 5]: OK 01,05,00,00
6 [Bay 6]: OK 01,06,00,00
7 [Bay 7]: OK 01,07,00,00
8 [Bay 8]: OK 01,08,00,00
9 [Bay 9]: OK 01,09,00,00
10 [Bay 10]: OK 01,0A,00,00
11 [Bay 11]: OK 01,0B,00,00
12 [Bay 12]: OK 01,0C,00,00
13 [Bay 13]: OK 01,0D,00,00
14 [Bay 14]: OK 01,0E,00,00
15 [Bay 15]: OK 01,0F,00,00
16 [Bay 16]: OK 01,10,00,00
17 [Bay 17]: OK 01,11,00,00
18 [Bay 18]: OK 01,12,00,00
19 [Bay 19]: OK 01,13,00,00
20 [Bay 20]: OK 01,14,00,00
21 [Bay 21]: OK 01,15,00,00
22 [Bay 22]: OK 01,16,00,00
23 [Bay 23]: OK 01,17,00,00
Power Supplies:
Element Status Status Bytes Status Descriptions
1: OK 01,00,00,A0 RQSTED ON
2: OK 01,00,00,A0 RQSTED ON
3: OK 01,00,00,A0 RQSTED ON
4: OK 01,00,00,A0 RQSTED ON
Fans:
Element Status Status Bytes Status Descriptions
1: OK 01,03,AC,A7
2: OK 01,03,66,A6
3: OK 01,03,AC,A7
4: OK 01,03,66,A6
5: OK 01,03,AC,A7
6: OK 01,03,66,A6
7: OK 01,03,AC,A7
8: OK 01,03,66,A6
Temperature Sensors:
Element Status Status Bytes Status Descriptions
1: OK 01,00,21,00
2: OK 01,00,2B,00
3: OK 01,00,2A,00
4: OK 01,00,38,00
5: OK 01,00,2A,00
6: OK 01,00,36,00
7: OK 01,00,2A,00
8: OK 01,00,36,00
9: OK 01,00,2B,00
10: OK 01,00,39,00
11: OK 01,00,30,00
12: OK 01,00,30,00
Enclosure Electronics:
Element Status Status Bytes Status Descriptions
1 [IOM6E A] : OK 01,00,01,80 REPORT
2 [IOM6E B] : OK 01,00,00,80
OPS Panel:
Element Status Status Bytes Status Descriptions
1: OK 01,00,00,00
Enclosure:
Element Status Status Bytes Status Descriptions
1: OK 01,00,02,00 FAIL
Voltage Sensors:
Element Status Status Bytes Status Descriptions
1: OK 01,00,FF,01
2: OK 01,00,BA,04
3: OK 01,00,FF,01
4: OK 01,00,BE,04
5: OK 01,00,FF,01
6: OK 01,00,BE,04
7: OK 01,00,FF,01
8: OK 01,00,BE,04
Current Sensors:
Element Status Status Bytes Status Descriptions
1: OK 01,00,82,01
2: OK 01,00,B7,02
3: OK 01,00,B7,00
4: OK 01,00,03,02
5: OK 01,00,11,01
6: OK 01,00,E8,01
7: OK 01,00,21,01
8: OK 01,00,3E,02
SAS Connectors:
Element Status Status Bytes Status Descriptions
1: OK 01,3F,FF,00
2: OK 01,03,FF,00
3: OK 01,3F,FF,00
4: OK 01,03,FF,00
Vendor Unique Element 83-IOM6E: (SAS)
Element Status Status Bytes Status Descriptions
1 [IOM6E A] : OK 01,08,00,00 MASTER
2 [IOM6E B] : OK 01,00,00,00
Vendor Unique Element 85-IOM6E: (ACP)
Element Status Status Bytes Status Descriptions
1 [IOM6E A] : OK 01,00,00,00
2 [IOM6E B] : CRITICAL 02,00,00,40 FAIL
Vendor Unique Element 88-IOM6E: (PCM)
Element Status Status Bytes Status Descriptions
1: OK 01,01,00,00
2: OK 01,01,07,80 PC SHELF FAULT RQSTD
Vendor Unique Element 8B-IOM6E: (ETHERNET)
Element Status Status Bytes Status Descriptions
1 [IOM6E A] : OK 01,01,00,00
2 [IOM6E B] : OK 01,01,00,00
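In case it saves anyone from scrolling through the whole dump, here's a rough Python sketch that pulls out only the element rows worth a look: those whose status is not OK, or whose description mentions FAIL or FAULT. This is not a NetApp tool. The function name flag_elements, the regex, and the assumed row layout are my own guesses based purely on the pasted output above.

```python
import re

# Matches element rows such as "2 [IOM6E B] : CRITICAL 02,00,00,40 FAIL"
# or "1: OK 01,00,00,A0 RQSTED ON". The layout is an assumption taken
# from the pasted output, not from any documented format.
ROW = re.compile(
    r"^\s*(?P<elem>\d+(?:\s*\[[^\]]+\])?)\s*:\s*"
    r"(?P<status>[A-Z-]+)\s+"
    r"(?P<bytes>[0-9A-F]{2}(?:,[0-9A-F]{2}){3})"
    r"\s*(?P<desc>.*)$"
)


def flag_elements(text):
    """Return (section, element, status, description) for suspicious rows."""
    section, prev, flagged = None, "", []
    for line in text.splitlines():
        if line.strip().startswith("Element Status"):
            # The line just before each column header names the section,
            # e.g. "Power Supplies:" or "Vendor Unique Element 85-IOM6E: (ACP)".
            section = prev.strip().rstrip(":")
        else:
            m = ROW.match(line)
            if m:
                desc = m.group("desc").strip()
                status = m.group("status")
                # Flag anything not OK, plus OK rows that still say FAIL/FAULT.
                if status != "OK" or "FAIL" in desc or "FAULT" in desc:
                    flagged.append((section, m.group("elem"), status, desc))
        prev = line
    return flagged
```

Pasting the "storage show fault" output into a string and passing it to flag_elements should leave just three rows: the Enclosure element (OK but FAIL), IOM6E B under the ACP vendor-unique element (CRITICAL, FAIL), and PCM element 2 (PC SHELF FAULT RQSTD).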