NetApp FAS2552: one of the controllers keeps faulting every 2 days
2024-04-30
03:39 AM
Hi Team,
We have an issue on one of our NetApp appliances.
One of the controllers keeps faulting every 2 days. Only a hard reset solves the issue, and only temporarily.
Some of the resources stay online, but the rest do not fail over to the other controller.
Kindly advise.
Oops!
Careful! You have at least two failed hard drives.
You are close to losing your data.
Call NetApp to replace the failed disks.
You also have several other issues caused by mistakes during the installation; we can look at those once your disks have been replaced.
The client confirmed that the disks were replaced 2 weeks ago; however, the log shows that the disks are still out of order.
Attached is the output of:
storage disk show -broken
Here's what Cedric was referring to:
4/30/2024 09:00:00 MOCO-STR-BKP2 EMERGENCY monitor.shutdown.brokenDisk: two data disks in RAID group "/Aggr01_FSAS/plex0/rg0" are broken. Halting system now.
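If you need to pull events like this one out of a larger log export, a small parser helps. The sketch below is a minimal illustration in Python, assuming every line follows the same layout as the single line quoted above (date, time, node, severity, event name, colon, message). That layout, and the helper names `parse_ems_line` and `EmsEvent`, are assumptions for illustration, not an official EMS format definition.

```python
import re
from typing import NamedTuple, Optional

# Layout ASSUMED from the one log line quoted in this thread:
# "<date> <time> <node> <SEVERITY> <event.name>: <message>"
EMS_LINE = re.compile(
    r"^(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<node>\S+)\s+"
    r"(?P<severity>[A-Z]+)\s+(?P<event>[\w.]+):\s+(?P<message>.*)$"
)
# Picks the quoted RAID group path out of the free-text message, if present.
RAID_GROUP = re.compile(r'RAID group "(?P<rg>[^"]+)"')


class EmsEvent(NamedTuple):
    """Hypothetical container for the fields extracted from one log line."""
    node: str
    severity: str
    event: str
    raid_group: Optional[str]


def parse_ems_line(line: str) -> Optional[EmsEvent]:
    """Split one log line into fields; return None if it does not match."""
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    rg = RAID_GROUP.search(m.group("message"))
    return EmsEvent(
        node=m.group("node"),
        severity=m.group("severity"),
        event=m.group("event"),
        raid_group=rg.group("rg") if rg else None,
    )


if __name__ == "__main__":
    sample = ('4/30/2024 09:00:00 MOCO-STR-BKP2 EMERGENCY '
              'monitor.shutdown.brokenDisk: two data disks in RAID group '
              '"/Aggr01_FSAS/plex0/rg0" are broken. Halting system now.')
    print(parse_ems_line(sample))
```

Filtering on `event == "monitor.shutdown.brokenDisk"` would then show how often, and for which RAID group, the node has halted.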
I had something similar with a customer.
I have some questions: Do you have spare disks? And what firmware version is your SP running?
In my incident with the FAS2552, one of the aggregates had two disk failures and one controller went down. We had three spare disks, but cDOT (ONTAP 9.x) didn't take any of them; this was due to a bug in the Service Processor firmware version. While we were waiting for the replacement disks to arrive, we needed to raise the RAID timeout from 24 to 72 hours. Check these commands (the modify example below sets 48 hours):
storage raid-options show
storage raid-options modify -node node1 -name raid.timeout -value 48
Then, if your controller keeps shutting down, it is possible that you have more failed disks.
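`raid.timeout` is the number of hours ONTAP keeps running with a degraded (RAID4) or double-degraded (RAID-DP) RAID group before it shuts down; the default is 24. The Python sketch below just does that arithmetic so you can see when a halt is due. It assumes, as described in this thread, that the group stays degraded with no reconstruction onto a spare; `shutdown_deadline` and `hours_remaining` are hypothetical helper names, not part of any NetApp tool.

```python
from datetime import datetime, timedelta

DEFAULT_RAID_TIMEOUT_HOURS = 24  # ONTAP default value of raid.timeout


def shutdown_deadline(degraded_since: datetime,
                      raid_timeout_hours: int = DEFAULT_RAID_TIMEOUT_HOURS) -> datetime:
    """Time at which the node is expected to halt, ASSUMING the RAID group
    stays degraded and no reconstruction onto a spare is running."""
    return degraded_since + timedelta(hours=raid_timeout_hours)


def hours_remaining(degraded_since: datetime, now: datetime,
                    raid_timeout_hours: int = DEFAULT_RAID_TIMEOUT_HOURS) -> float:
    """Hours left before the timeout expires (negative once it has passed)."""
    delta = shutdown_deadline(degraded_since, raid_timeout_hours) - now
    return delta.total_seconds() / 3600


if __name__ == "__main__":
    start = datetime(2024, 4, 30, 9, 0, 0)  # example start of degraded mode
    print(shutdown_deadline(start))          # default 24 h timeout
    print(shutdown_deadline(start, 48))      # after raising raid.timeout to 48
```

Raising the timeout only buys time; the halt still comes once the window expires if no spare is available.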
The Service Processor has no control over RAID or spare disks.
If you do not have any spare disks, ONTAP will not be able to start the reconstruction and will shut down until spare disks are added.
If you have unassigned disks, you need to assign them to the node.
To check for unassigned disks, run "run -node * disk show -n".
If you use ADP (Advanced Drive Partitioning) and disk autoassign isn't working, you will also need to assign the partitions created by ONTAP, since they will also be unassigned.
Unassigned partitions will also show up in "disk show -n".
Agreed, this sounds like the issue to me too.
If you review the disk show.txt output, you can see one spare disk:
1.1.17 3.63TB 1 17 FSAS spare Pool0 MOCO-STR-BKP2
I think the failed aggregate has not started rebuilding yet.
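To check spares (and broken disks) across a long `storage disk show` listing rather than by eye, you could group rows by their container-type column. This Python sketch assumes the column order seen in the row above (Disk, Usable Size, Shelf, Bay, Type, Container Type, Container Name, Owner) and that every column holds a single whitespace-free token; `disks_by_container_type` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List


def disks_by_container_type(output: str) -> Dict[str, List[str]]:
    """Group disk names by the container-type column (6th field).

    ASSUMES the column layout of the row quoted in this thread. Header,
    separator and blank lines are skipped because their first field is not
    a disk name of the form <stack>.<shelf>.<bay>.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or truncated line
        disk = fields[0]
        parts = disk.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue  # header or separator row
        groups[fields[5]].append(disk)
    return dict(groups)


if __name__ == "__main__":
    row = "1.1.17 3.63TB 1 17 FSAS spare Pool0 MOCO-STR-BKP2"
    print(disks_by_container_type(row))
```

A single entry under `spare` matches what is visible in the attached output; the spare count is what decides whether ONTAP can start a reconstruction at all.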
Need more data:
the full sysconfig -r output from both nodeshells
the full disk show -n output from either nodeshell
