ONTAP Discussions

resyncing aggr plex

gdefevere
6,435 Views

What happens if you lose the loop to the good plex (the connection between the controller and the shelf that holds the latest, up-to-date WAFL file system) during an aggr resync?

1 ACCEPTED SOLUTION

aborzenkov

I would expect Data ONTAP to panic due to multiple disk failures in aggregate. The second plex cannot be used because it is stale. So you suddenly lost your aggregate.

P.S. I just tested this in the simulator, and it does indeed panic. I would be greatly surprised if anything else happened.
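To make the reasoning above concrete, here is a toy model in Python. It is not ONTAP code: the Plex class, its fields, and the outcome strings are hypothetical. It encodes only the rule stated in this answer: a plex can serve data only if it is both reachable and in sync, and a stale plex is never promoted on its own, so losing the loop to the good plex mid-resync leaves nothing usable (the panic seen in the simulator).

```python
from dataclasses import dataclass


@dataclass
class Plex:
    name: str
    reachable: bool  # hypothetical: is the loop to this plex's shelf up?
    in_sync: bool    # False while the plex is stale / still being resynced


def aggregate_outcome(plexes: list) -> str:
    """Toy rule: the aggregate stays online only through a reachable, in-sync plex."""
    if any(p.reachable and p.in_sync for p in plexes):
        return "online"
    # Only stale (or unreachable) plexes remain; a stale plex is not promoted.
    return "failed: controller panics"


# Resync in progress: plex0 holds the latest data, plex1 is the resync target.
plex0 = Plex("plex0", reachable=False, in_sync=True)  # loop to the good plex lost
plex1 = Plex("plex1", reachable=True, in_sync=False)  # reachable but stale
print(aggregate_outcome([plex0, plex1]))  # failed: controller panics
```

Note that reachability alone is not enough: plex1 is perfectly reachable here, yet the outcome is still a failure because its data is out of date.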

6 replies

pascalduk

The same as when you lose access to the disks in a configuration without SyncMirror: downtime.

The exact damage depends on the scenario.

gdefevere

I'm not convinced about the downtime. But maybe I didn't explain the circumstances very well:

1. Both plexes are fine, and SyncMirror between both shelf loops works fine (we have only one loop to each plex).

2. One loop is interrupted, so there is no SyncMirror between the two plexes. But there is no downtime, since the other plex is fine.

3. The loop is recovered, and SyncMirror resyncs the latest data to the out-of-date plex.

4. The other loop is interrupted, so there is no SyncMirror between the plexes anymore (the resync is interrupted).

=> WE STILL HAVE ONE PLEX, HOWEVER, SO YOU WOULD EXPECT NO DOWNTIME, BUT THIS PLEX DOESN'T HAVE AN UP-TO-DATE WAFL (since the resync hadn't finished yet).

QUESTION: What would happen here?!
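The four steps can be replayed in a small Python simulation. This is a hypothetical sketch, not how Data ONTAP actually tracks resync progress: each plex simply counts the writes it has missed, a resync step copies a few of them, and the aggregate is treated as lost once no reachable plex is fully in sync. The class and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Plex:
    name: str
    reachable: bool = True  # is the loop to this plex's shelf up?
    missing: int = 0        # writes this plex has not received (0 == in sync)


class MirroredAggregate:
    """Hypothetical two-plex mirror; illustrative only, not ONTAP internals."""

    def __init__(self) -> None:
        self.plexes = [Plex("plex0"), Plex("plex1")]
        self.online = True

    def current(self) -> list:
        # Plexes that can serve data: reachable AND fully in sync.
        return [p for p in self.plexes if p.reachable and p.missing == 0]

    def _recheck(self) -> None:
        # A stale plex is never promoted; with no current plex the aggregate is lost.
        # (In this toy model the loss is permanent.)
        if not self.current():
            self.online = False

    def write(self) -> None:
        if not self.online:
            raise RuntimeError("aggregate offline")
        for p in self.plexes:
            if not p.reachable:
                p.missing += 1  # mirroring to this plex is broken, so it falls behind

    def set_loop(self, idx: int, up: bool) -> None:
        self.plexes[idx].reachable = up
        self._recheck()

    def resync_step(self, idx: int, blocks: int) -> None:
        # Copy up to `blocks` missed writes from a current plex to plex `idx`.
        target = self.plexes[idx]
        if self.online and target.reachable and self.current():
            target.missing = max(0, target.missing - blocks)


agg = MirroredAggregate()
agg.write()                  # step 1: both plexes in sync
agg.set_loop(1, False)       # step 2: loop to plex1 lost ...
for _ in range(10):
    agg.write()              # ... plex1 falls 10 writes behind, aggregate stays online
online_during_step2 = agg.online
agg.set_loop(1, True)        # step 3: loop recovered, resync starts
agg.resync_step(1, 4)        # resync has copied 4 of the 10 missed writes
agg.set_loop(0, False)       # step 4: loop to the up-to-date plex0 lost
print(online_during_step2, agg.online, agg.plexes[1].missing)  # True False 6
```

The point of the model is that "one plex left" is not the same as "one usable plex left": after step 4 the surviving plex is still 6 writes behind, so there is nothing current to serve from.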

jonathon_lanzon

aborzenkov wrote:

I would expect Data ONTAP to panic due to multiple disk failures in aggregate. The second plex cannot be used because it is stale. So you suddenly lost your aggregate.


I can confirm this behaviour. The filer won't use a plex that is out of sync.

gdefevere

Thanks for the feedback so far!

Taking it a little further: in this scenario, if you have an HA configuration or a MetroCluster, would it fail over to the other node and then continue the resync from there?

aborzenkov

Yes, as long as partner has access to both plexes.
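A toy Python sketch of that answer, under stated assumptions: the node names, plex names, and the after_loop_loss helper are hypothetical, and a real takeover in Data ONTAP depends on far more than plex reachability. It only encodes what this thread says: some node must reach an up-to-date plex to keep the aggregate online, and it also needs the stale plex to continue the resync.

```python
from typing import Dict, Set


def after_loop_loss(paths: Dict[str, Set[str]], current: Set[str],
                    stale: Set[str], owner: str) -> str:
    """Toy decision for a mirrored aggregate after a loop failure.

    paths   -- hypothetical map: node name -> plexes that node can still reach
    current -- plexes holding the latest data
    stale   -- plexes still being resynced
    owner   -- node that currently owns the aggregate
    """
    def can_serve(node: str) -> bool:
        # A node can keep the aggregate online only through an up-to-date plex.
        return bool(paths[node] & current)

    def can_resync(node: str) -> bool:
        # Resync needs the up-to-date source plex and every stale target plex.
        return can_serve(node) and stale <= paths[node]

    # Try the owner first, then any partner that could take over.
    for node in [owner] + [n for n in paths if n != owner]:
        if can_serve(node):
            action = "serves" if node == owner else "takes over and serves"
            if can_resync(node):
                action += "; resync continues"
            return f"{node} {action}"
    return "aggregate lost: no node can reach an up-to-date plex"


# node_a lost its loop to plex0 (the good plex); its partner node_b still sees both plexes.
paths = {"node_a": {"plex1"}, "node_b": {"plex0", "plex1"}}
print(after_loop_loss(paths, current={"plex0"}, stale={"plex1"}, owner="node_a"))
# node_b takes over and serves; resync continues
```

If the partner could reach only plex0, this sketch would still let it serve data but would leave the resync waiting; if it could reach only the stale plex, the aggregate would be lost on both nodes.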