Hello,
Last week, I tested all the resiliency scenarios on a new FAS8040 FMC running 7-Mode.
I ran into the following issue.
Here is the summary of the configuration:
Site A:
FAS8040 controller A, 1 stack with 2 x 48 SATA disks (DS4246): pool 0 of controller A, pool 1 of controller B.
Site B:
FAS8040 controller B, 1 stack with 2 x 48 SATA disks (DS4246): pool 0 of controller B, pool 1 of controller A.
When we powered off one SATA shelf in Site A, the second plex (pool 1) of controller B's mirrored aggregate went offline. That is normal behaviour.
When we powered the SATA shelf back on, the mirror plex of controller B's aggregate started a double reconstruction using spares from both pool 1 and pool 0 (pool 0!).
The reason: when a SATA shelf is power cycled, its disks do not all initialize at the same time. You can see the disks start up one column after the other.
So Data ONTAP began to see the disks, but not all at once. As soon as a minimum number of disks had been detected, the system started the double reconstruction with the spare disks.
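To make the timing problem concrete, here is a small toy model in Python. It is only a sketch of how I understand the race, not Data ONTAP's actual logic: the function name, the arrival times, and the `settle_delay` parameter are all hypothetical, and the only real-world fact it relies on is that a RAID-DP group tolerates two missing disks.

```python
from typing import List

RAID_DP_MAX_MISSING = 2  # RAID-DP tolerates two missing disks per RAID group


def disks_to_reconstruct(arrival_times: List[float], settle_delay: float = 0.0) -> int:
    """Toy model: count the disks a controller would rebuild onto spares.

    arrival_times: seconds at which each disk of one RAID group becomes visible.
    settle_delay: hypothetical extra wait after the group first becomes usable.
    """
    times = sorted(arrival_times)
    total = len(times)
    if total <= RAID_DP_MAX_MISSING:
        raise ValueError("a RAID group needs more than two disks in this model")
    # The group first becomes usable once all but two disks are visible.
    usable_at = times[total - RAID_DP_MAX_MISSING - 1]
    decision_time = usable_at + settle_delay
    # Any disk still invisible at decision time is treated as failed.
    return sum(1 for t in times if t > decision_time)


# Hypothetical 14-disk group whose disks show up in four staggered waves.
staggered = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
print(disks_to_reconstruct(staggered))                    # late disks count as failed
print(disks_to_reconstruct(staggered, settle_delay=5.0))  # waiting avoids the rebuild
```

In this model, the staggered start leaves two disks "missing" at the moment the group becomes usable, which matches the double reconstruction we saw. Waiting until every disk is visible gives the same result as our workaround with the SAS cables.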
As a workaround, we repeated the test, but removed the SAS cables before powering the shelf back on. Once the shelf was fully initialized, we reconnected the SAS cables. This time it worked: no reconstruction was started.
So, here are my questions: is there a specific setting in Data ONTAP to prevent this behaviour?
And is it normal for an aggregate to be reconstructed with a spare disk from the remote pool? Is this done by design to protect the RAID groups?
Thank you for reading.