I tried to update the AFF220 cluster from ONTAP 9.4P3 to 9.5P1.
The first node went fine, then the second node got stuck. I got an email saying the automated NDU had paused.
Now I see both cluster links down:

e0a   Cluster   Cluster   down   9000   1000/-   -   false
e0b   Cluster   Cluster   down   9000   1000/-   -   false
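For anyone hitting the same state: the port status above can be re-checked from the clustershell. This is just a sketch using the standard ONTAP 9 commands as I know them (exact output fields may differ per release); `cluster ping-cluster` needs advanced privilege:

```
::> network port show -ipspace Cluster
::> cluster show
::> set -privilege advanced
::*> cluster ping-cluster -node local
```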
The nodes don't see each other anymore. The cables are fine; they worked before and nobody touched them. I can access both BMCs and both nodes, but the logs say:
3/27/2019 14:13:02 aff220-01 ALERT callhome.andu.pausederr: subject="AUTOMATED NDU PAUSED", epoch="9fb37de9-7eae-497e-8a65-e2a1132d88b0"
3/27/2019 14:12:02 aff220-01 ALERT callhome.andu.pausederr: subject="AUTOMATED NDU PAUSED", epoch="60d38721-a585-42c5-83a5-bba67f05ddb9"
3/27/2019 14:11:46 aff220-01 ERROR net.ifgrp.lacp.link.inactive: ifgrp a0a, port e0d has transitioned to an inactive state. The interface group is in a degraded state.
3/27/2019 14:11:43 aff220-01 ERROR net.ifgrp.lacp.link.inactive: ifgrp a0a, port e0c has transitioned to an inactive state. The interface group is in a degraded state.
Mar 27 17:30:31 [aff220-02:netif.init.failed:ALERT]: Initialization of network interface e0a failed due to unexpected software error ix:9.
Mar 27 17:30:33 [aff220-02:netif.init.failed:ALERT]: Initialization of network interface e0b failed due to unexpected software error ix:9.
I moved all VMs off it before updating 😉 so no production impact.
It's just that the cluster ports e0a and e0b are dead, roughly since the update of one node from 9.4 to 9.5.
Node 02 went fine; I even saw the check mark in the cluster update overview. The second node then got stuck, and I recently saw a log line saying a firmware update of the NIC e0a failed. Could that be the reason the cluster interconnect was down?
I can boot both nodes; yesterday I could even backup-boot into 9.4 or choose a normal 9.5 boot.
Since node 01 never fully completed the upgrade, yes, I have a mixed-version state.
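The mixed-version state can be confirmed from the clustershell; this is a sketch with the standard ONTAP commands as far as I know them:

```
::> version
::> system image show -fields version, iscurrent, isdefault
```

That should show one node still reporting 9.4P3 as its current image while the other reports 9.5P1.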
Actually, we just wiped it and started over. The good news is that everything is working now. I didn't have time to investigate further since I need the system. The links are up, running 9.5P1 (it's a Lenovo AFF220 ThinkSystem), because I can't get P2 yet. Everything is back to normal; I'll try the cluster update again when I get P2 and will report back if I see anything unusual or if the failover fails again.
Just had this issue after upgrading from 9.4P3 to 9.5P3. After two days of trying to figure it out with NetApp engineers, the solution was simply to issue the power-cycle command (not a node reboot) from the BMC... I like simple solutions, but COME ON, two days and it was that simple. I feel stupid... Glad it was that easy though 🙂
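For reference, that power cycle is done from the service processor / BMC CLI, not from ONTAP. A sketch, assuming the standard NetApp SP/BMC command set (check status first, and be aware this cuts power to the controller):

```
SP> system power status
SP> system power cycle
```

This forces a full power cycle of the controller, which is what cleared the dead e0a/e0b cluster ports, unlike a plain `system node reboot` from ONTAP.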