Last week I tested all the resiliency scenarios on a new FAS8040 FMC (7-Mode) and encountered the following issue.
Here is a summary of the configuration:
FAS8040 controller A: one stack with 2 x DS4246 SATA shelves (48 disks), holding Pool0 of controller A and Pool1 of controller B.
FAS8040 controller B: one stack with 2 x DS4246 SATA shelves (48 disks), holding Pool0 of controller B and Pool1 of controller A.
When we powered off one SATA shelf in Site A, the mirrored aggregate of controller B went offline on its second plex (Pool1). Normal behaviour.
When we powered the SATA shelf back on, the mirror plex of the controller B aggregate started a double reconstruction using spares from both Pool1 and Pool0 (Pool0!).
Why: when you power-cycle a SATA shelf, the disks are not initialized at the same time; you can see the disks start one column after another.
So Data ONTAP started to see disks, but not all of them at once. Once a minimum number of disks had been detected, the system started the double reconstruction with the spare disks.
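A minimal sketch of the presumed failure mode (the column layout and re-evaluation point here are purely illustrative assumptions, not documented ONTAP internals): disks from the power-cycled shelf reappear in batches, and if the RAID layer re-evaluates the plex while only a partial batch is visible, the not-yet-detected disks look failed and get replaced by spares.

```python
# Illustrative model of staggered disk detection (NOT actual ONTAP logic).
# Disks reappear column by column; a re-evaluation that runs as soon as
# some disks are back will treat the not-yet-detected ones as failed.

def missing_disks(detected, all_disks):
    """Disks the system would consider failed at this point in time."""
    return sorted(set(all_disks) - set(detected))

all_disks = [f"disk{i}" for i in range(8)]     # 8 disks in the plex (example)
columns = [all_disks[i::4] for i in range(4)]  # 4 spin-up columns

detected = []
detected += columns[0] + columns[1]            # first two columns back online
# Re-evaluating now, before columns 3 and 4 have spun up:
print(missing_disks(detected, all_disks))
# → ['disk2', 'disk3', 'disk6', 'disk7']  (would be reconstructed onto spares)
```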
As a workaround, we repeated the test, but before powering the shelf back on we removed the SAS cables. Once the shelf was fully initialized, we reconnected the SAS cables. That worked.
So, here is my question: is there a specific setting in Data ONTAP to prevent this behaviour?
And is it normal to reconstruct an aggregate with a spare disk from the remote pool? Is this by design, to protect the RAID groups?
Thank you for reading
Thank you, this is really useful information. Which version of Data ONTAP are you running (to exclude the possibility that this was fixed in the latest version)?
Regarding spare disk selection: yes, this behavior is documented; see the KB article "How does Data ONTAP select spares for aggregate creation, aggregate addition and failed disk replacement?"
We are running version 8.2.3P2 (almost the latest one).
NetApp provided me with the following information:
Disk drive spin-up time differs between disk shelves with four power supplies and those with two. For a shelf with four power supplies, all disk drives spin up at the same time. For a shelf with two power supplies, each of the four columns of disk drives spins up at 12-second intervals.
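For a two-PSU shelf, that stagger means the last of the four columns does not even begin spinning up until 36 seconds after the first. A quick back-of-the-envelope sketch (the 12-second interval is from the statement above; everything else is just arithmetic):

```python
# Staggered spin-up schedule for a 2-PSU shelf: four columns, each
# starting 12 seconds after the previous one (per NetApp's statement).
STAGGER_INTERVAL_S = 12
NUM_COLUMNS = 4

# Time at which each column begins its spin-up, relative to power-on:
start_times = [col * STAGGER_INTERVAL_S for col in range(NUM_COLUMNS)]
print(start_times)   # → [0, 12, 24, 36]

# Worst-case gap between the first and last column starting:
extra_wait = start_times[-1] - start_times[0]
print(extra_wait)    # → 36
```

So any logic that reacts to the first columns coming back would need to tolerate on the order of half a minute of further delay before the shelf is fully visible.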
It seems that adding two power supplies should solve the issue.
Otherwise, I've been told that removing the SAS cables before powering on the shelf is the recommended approach.
Well, NetApp ships SATA shelves with two PSUs, and the official statement is that two PSUs are enough for SATA. That is true in the sense that a shelf with SATA drives will function on a single PSU. But as we have seen, it can cause issues.
This is really independent of MetroCluster (although having two pools makes it worse). Consider a short power outage: the FAS and the shelves power back on when it is over, and this may trigger excessive reconstructions that could have been avoided. And even with four PSUs there is no guarantee that all four of them will have power when an outage happens.
But returning to the question of spare selection: it looks like a bug. According to the KB I mentioned,
"Data ONTAP will search for suitable spares in the opposite pool only if the aggregate is mirror-degraded or is resyncing, with the plex containing the failed disk serving as the source of the resync."
According to your description, that was not the case: you had "failed" disks in the target plex of the resync in a mirrored aggregate, so it should not have used spares from a different pool. In your place I would open a separate case, pointing to this KB article and asking for clarification.
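My reading of the KB's rule can be written as a small predicate (this is a paraphrase of the KB text, not NetApp code; the state and role names are my own, and the exact scoping of the conditions is precisely what a support case should clarify):

```python
# Paraphrase of the KB rule (NOT NetApp code): opposite-pool spares are
# allowed only when the aggregate is mirror-degraded, or is resyncing with
# the plex containing the failed disk serving as the SOURCE of the resync.

def may_use_opposite_pool(aggr_state, failed_disk_plex_role=None):
    """aggr_state: 'normal' | 'mirror-degraded' | 'resyncing'
    failed_disk_plex_role: 'source' | 'target' (role in the resync)."""
    if aggr_state == "mirror-degraded":
        return True
    return aggr_state == "resyncing" and failed_disk_plex_role == "source"

# The scenario described: the failed disks sit in the TARGET plex of the
# resync, so by this reading the opposite pool should not have been used:
print(may_use_opposite_pool("resyncing", "target"))   # → False
```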
The issue here is that Data ONTAP could see some disks of the failed plex, but not all of them (because of the delay in restarting all the disks).
So instead of waiting for perhaps an additional 24 seconds, some mechanism kicks in and starts a double reconstruction with whatever spare disks are available.
When you are connected to the console, you can really see the disks coming online in batches, and once enough had been detected, the reconstruction started.
I talked with someone from NetApp yesterday who told me that disconnecting the SAS cables is the "rule". I've asked him to point me to the official documentation on this; I'm still waiting.
There are two independent problems here: staggered disk startup and spare selection. Whatever the reason for the degraded plex, the spare selection does not look right.