NetApp FAS 2220 has detected a disk failure. After that, I received an email with the following subject.
HA Group Notification from mydevice02 (SHUTDOWN PENDING (degraded mode)) CRITICAL
Some time after logging in, I also received the following messages:
Mon Mar 23 10:00:00 JST [mydevice01:kern.uptime.filer:info]: 10:00am up 2692 days, 23:11 238839860 NFS ops, 0 CIFS ops, 0 HTTP ops, 0 FCP ops, 0 iSCSI ops
Mon Mar 23 10:00:05 JST [mydevice01:cf.takeover.disabled:warning]: Controller Failover is licensed but takeover of partner is disabled due to reason : partner halted in notakeover mode.
Mon Mar 23 10:00:05 JST [mydevice01:cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #0 has been down for 178 minutes
Mon Mar 23 10:01:14 JST [mydevice01:wafl.vol.full:notice]: Insufficient space on volume vol3 to perform operation. 8.00KB was requested but only 1.00KB was available.
"cf status" and "vol status" returned the following results.
mydevice01> cf status
mydevice02 may be down, takeover disabled because of reason (partner halted in notakeover mode)
mydevice01 has disabled takeover by mydevice02 (interconnect error)
VIA Interconnect is down (link down).
mydevice01> vol status
         Volume State           Status            Options
           vol0 online          raid_dp, flex     root
                                64-bit
          vol02 online          raid_dp, flex
                                64-bit
          vol03 online          raid_dp, flex
                                64-bit
Only one disk has failed, but the storage appears to be no longer accessible. We have arranged for a replacement disk, but I suspect we need to do more than just swap the disk. I would appreciate any advice you could give.
By default, when a disk is broken, the system shuts down automatically after 24 hours to encourage you to replace the disk. If you reboot the system, it will run for another 24 hours before shutting down again. (The 24-hour timeout can be increased by changing the raid.timeout value with the options command.)
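As a rough sketch of the options command mentioned above, assuming a Data ONTAP 7-Mode console (which the cf and vol commands in the question suggest): entering the option name alone should display the current value, and entering it with a number sets the timeout in hours. The value 48 is only an illustrative assumption, not a recommendation, and the lines starting with # are annotations rather than console input.

```
# show the current timeout (in hours)
mydevice01> options raid.timeout
# illustrative value only: extend the window to 48 hours
mydevice01> options raid.timeout 48
```

Extending the timeout only buys time. The array stays degraded until the failed disk is replaced and the rebuild completes.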
I see a few issues going on with the filer:
1) A disk is broken and no spare is available, so the system will naturally proceed to shut down after 24 hours. When that happens, the aggregates would normally be taken over by the partner and the system would shut down, but due to insufficient space on volume vol3 it likely cannot save its 'work' before it is completely safe to shut down. Hence it is more or less stuck in a halted state.
2) I also see that the interconnect adapter link is down. Verify its status:
filer> priv set diag
filer> ic status
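To check the points above before and after swapping the disk, a few read-only commands may help. This is a minimal sketch assuming Data ONTAP 7-Mode. The lines starting with # are annotations rather than console input, and the output will differ on your system.

```
# list failed disks
mydevice01> aggr status -f
# list available spare disks
mydevice01> aggr status -s
# show free space per volume
mydevice01> df -h
# show the failover state of the HA pair
mydevice01> cf status
```

If no spares are listed, the rebuild cannot start until the replacement disk is inserted and recognized as a spare. Once the df output shows free space on the full volume and cf status no longer reports the interconnect as down, the partner can be brought back up.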