ONTAP Hardware
Hi Team,
We have an issue on one of our NetApp appliances.
One of the controllers keeps faulting every 2 days. Only a hard reset seems to solve the issue temporarily.
Some of the resources stay online, but the rest do not fail over to the other controller.
Kindly advise.
Oops!
Careful! You have at least 2 hard drives out of order.
You are close to losing your data.
Call NetApp to have these failed disks replaced.
Beyond that, you have many other issues caused by mistakes made during installation. We can look at those once your disks have been replaced.
The client confirmed that the disks were replaced 2 weeks ago; however, the log shows that the disks are still out of order.
Attached is the result of
storage disk show -broken
Here's what Cedric was referring to:
4/30/2024 09:00:00 MOCO-STR-BKP2 EMERGENCY monitor.shutdown.brokenDisk: two data disks in RAID group "/Aggr01_FSAS/plex0/rg0" are broken. Halting system now.
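If you need to scan a longer event log for the same message, a small script can pull out the node, severity, and affected aggregate. This is only a rough sketch: the regular expressions are assumptions based on the single line quoted above, and the function name `parse_ems_line` is made up for illustration. Other log formats will need a different pattern.

```python
import re

# Assumed layout, inferred from the one log line quoted in this thread:
# <date> <time> <node> <SEVERITY> <event.name>: <message>
EMS_LINE = re.compile(
    r'^(?P<date>\d{1,2}/\d{1,2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'(?P<node>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<event>[\w.]+):\s+(?P<message>.*)$'
)

# RAID group paths in the message look like "/<aggregate>/<plex>/<raid group>".
RAID_GROUP = re.compile(r'RAID group "(?P<path>/(?P<aggregate>[^/"]+)/[^/"]+/[^/"]+)"')


def parse_ems_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    raid_group = RAID_GROUP.search(fields["message"])
    fields["aggregate"] = raid_group.group("aggregate") if raid_group else None
    fields["raid_group"] = raid_group.group("path") if raid_group else None
    return fields


# The line quoted in this thread, used as the only sample input.
sample = (
    '4/30/2024 09:00:00 MOCO-STR-BKP2 EMERGENCY monitor.shutdown.brokenDisk: '
    'two data disks in RAID group "/Aggr01_FSAS/plex0/rg0" are broken. '
    'Halting system now.'
)
event = parse_ems_line(sample)
print(event["node"], event["severity"], event["event"], event["aggregate"])
```

Filtering the parsed events on `monitor.shutdown.brokenDisk` would show how often, and on which node, the shutdown was triggered.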
I had something similar with a customer.
I have some questions: Do you have spare disks? And what firmware version is your SP running?
In my incident with the FAS2552, one of my aggregates had two disk failures and one controller went down. We had 3 spare disks, but cDOT (ONTAP 9.x) did not pick up any of them, due to a bug in the Service Processor firmware version. While we were waiting for the replacement disks to arrive, we needed to change the RAID timeout from 24 to 72. Check these commands:
storage raid-options show
storage raid-options modify -node node1 -name raid.timeout 48
Then, if your controller keeps shutting down, it is possible that you have more failed disks.
Agreed, this sounds like the issue to me too.
Reviewing the disk show.txt output, I can see one spare disk:
1.1.17 3.63TB 1 17 FSAS spare Pool0 MOCO-STR-BKP2
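To check spares across the whole attached file instead of reading it row by row, something like the following could help. It is a sketch only: it assumes every data row has the same eight whitespace-separated columns as the row above (disk, usable size, shelf, bay, type, container type, container name, owner), and the names `DiskRow`, `parse_disk_rows` and `spares_by_owner` are hypothetical helpers, not part of any NetApp tool.

```python
from collections import Counter, namedtuple

# Column order is an assumption taken from the single row quoted in this thread.
DiskRow = namedtuple(
    "DiskRow",
    "disk size shelf bay disk_type container_type container_name owner",
)


def parse_disk_rows(text):
    """Parse disk rows from saved 'disk show' output, skipping anything else."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only eight-field lines whose first field looks like a disk name (e.g. 1.1.17).
        if len(parts) == 8 and parts[0][0].isdigit() and "." in parts[0]:
            rows.append(DiskRow(*parts))
    return rows


def spares_by_owner(rows):
    """Count disks whose container type is 'spare', grouped by owning node."""
    return Counter(row.owner for row in rows if row.container_type == "spare")


# The only sample input is the row quoted above.
sample = "1.1.17 3.63TB 1 17 FSAS spare Pool0 MOCO-STR-BKP2"
rows = parse_disk_rows(sample)
print(spares_by_owner(rows))
```

Running it over the full disk show.txt would show at a glance whether each node really owns a spare of the right type for the degraded aggregate.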
I think the failed aggregate has not started rebuilding yet.
Need more data.
full sysconfig -r from both nodeshells
full disk show -n from either nodeshell