Let's say one disk in a double-parity (RAID-DP) raid group has already failed. Theoretically, the group can tolerate another disk failure, but if a second disk does fail, will the system shut itself down automatically after 24 hours to protect its data?
Thank you for this one, sir! Another question: what happens to the system if three disks fail and the raid.timeout option has not been applied? I know there will be data loss or a 24-hour shutdown, but is there a TR or white paper that supports this?
A few things to consider. Remember that there can be multiple aggregates on the system, each of which can consist of multiple raid groups - aggr status -r will show the raid groups.
You can have two failed drives in every aggregate and still not lose data, because each aggregate is a separate entity. In addition, you can lose two drives in each raid group of a _single_ aggregate without losing data, because each raid group is its own raid-dp setup. So, if you have an aggregate with 4 raid-dp raid groups, you could lose 8 drives without losing data, as long as no more than two come from any one raid group. For the record, I've seen this - an entire shelf powered off, but only two drives from that shelf were in any single raid group, so no data loss.
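To make the counting above concrete, here's a rough Python sketch. It is only a toy model of the "no more than two failed drives per raid-dp raid group" rule, not anything ONTAP actually runs; the function names and the raid group labels (rg0 through rg3) are made up for illustration.

```python
from typing import Mapping

# RAID-DP tolerates up to two failed drives per raid group.
RAID_DP_TOLERANCE = 2


def raid_group_survives(failed_drives: int) -> bool:
    """True if a single raid-dp raid group still holds its data."""
    return failed_drives <= RAID_DP_TOLERANCE


def aggregate_survives(failed_per_group: Mapping[str, int]) -> bool:
    """An aggregate keeps its data only if every raid group in it survives."""
    return all(raid_group_survives(n) for n in failed_per_group.values())


# Hypothetical aggregate with 4 raid groups: a shelf outage takes out
# two drives from each group (8 drives total).
shelf_outage = {"rg0": 2, "rg1": 2, "rg2": 2, "rg3": 2}

# Only 3 drives lost in total, but all from the same raid group.
triple_failure = {"rg0": 3, "rg1": 0, "rg2": 0, "rg3": 0}

print(aggregate_survives(shelf_outage))    # True
print(aggregate_survives(triple_failure))  # False
```

The point is that the total number of failed drives doesn't matter; what matters is how they are spread across raid groups.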
If you lost a third drive in a raid-dp raid group, that raid group would fail and the aggregate would go offline, and you'd lose data. Not sure if a failover would happen - assuming dual path HA, if the drive is down on one controller it'll be down on the other also. Also not sure if raid.timeout would shut the system down - I don't know that an offline aggregate constitutes a degraded state. Degraded implies that it's still running, which it isn't, technically.
I had a quick look, but couldn't find anything that describes what happens when you lose that third drive. This link (https://kb.netapp.com/support/index?page=content&id=3013638) hints that the controller will panic, but it covers a different situation (media error on rebuild). One of the links is to article 2014172, which says to call support in the rare event that there was actually another disk failure - so maybe they don't publish what happens in that case.