snapmirror failback problem

JACKIEXIE19 · ‎2011-08-28

During DR testing via snapmirror on fas2040 pairs, we broke snapmirror relationship between primary and dr filers, data became available on DR filer snapmirrored volume, this failover works fine.

We made some new updates on DR volume. But when trying to failback from DR to Primary (primaryfiler*> snapmirror resync -S drfiler:dr_cifs1 primaryfiler:pri_cifs1), it failed to failback.

primaryfiler*> snapmirror resync -S drfiler:dr_cifs1 primaryfiler:pri_cifs1

primaryfiler: snapmirror.dst.resync.info:notice]: SnapMirror resync of pri_cifs1 to drfiler:dr_cifs1 is using drfiler(0142238385)_dr_cifs01.37 as the base snapshot.

Volume pri_cifs1 will be briefly unavailable before coming back online.

cifs open files prevent operation.

Snapmirror resynchronization of pri_cifs1 to dr_cifs1 : revert to resync base snapshot failed

Aborting resync.

primaryfiler*> [primaryfiler: replication.dst.resync.failed:error]: SnapMirror resync of pri_cifs1 to drifiler:dr_cifs1 : revert to resync base snapshot failed.

We discovered that during failover time, because we just run snapmirror break to stop the snapmirror without really shutdown the entire Prod site, therefore Primary site volume was still continuely receiving I/O or updates from applications and users, means primary site also has new updates while snapmirror was broken.

We could resync the original snapmirror (from Primary to DR) without any issue, updates on DR are deleted as expected.

My question is, for the DR testing like this (both DR and Primary sites have new updates during broken snapmirror), how can I failback from DR to Primary with

1. just keep DR updates and delete Primary updates?

2. keep both DR and Primary updates that was got written to their own vol during snapmirror broken?

thomas_glodde · ‎2011-08-29

1. just keep DR updates and delete Primary updates?

SnapMirror is destination driven, so you need to stop work on DR, resync from Primary, then break, active Primary and then resync from DR ( as a quick direction)

2. keep both DR and Primary updates that was got written to their own vol during snapmirror broken?

You CANNOT do so, when using NetApp methods, you will lose on of the sides.

When using SnapMirror for DR, you are NOT supposed to have both sides up, running and receiving updates. If you are testing, then test in the scope of being able to kill the DR side afterwards.

JACKIEXIE19 · ‎2011-08-29

Thanks for the answer. answer for the #2 was clear, but not fully understand #1.

say if in the real disaster situation, primary site is complete down, then I will need to re-initialize snapmirror volume run from primary site when it comes back up, direction will be from DR to Primary, then we can break snapmirror, and on DR site to run resync for snapmirror direction from Primary to DR, right?

But what happen if primary site is not complete down, and we decide to break the mirror and make DR as active site, during snapmirror broken, assume primary site still has some sort of updates got written to pri_vol, which we don't want to keep, how do I go with resync in the direction of DR to Primary, and force to override primary updates with DR updates on pri_vol? do I need to run re-initialize at the direction of DR to Primary, or id there any way I can force to resync without rerun initialize? I think it will get errors of what I post before, which will cause the resync to fail, right?

aborzenkov · ‎2011-08-29

how do I go with resync in the direction of DR to Primary, and force to override primary updates with DR updates on pri_vol?

You just do snapmirror resync on primary specifying DR as source.

JACKIEXIE19 · ‎2011-08-29

That's what I did, I ran 'primaryfiler*> snapmirror resync -S drfiler:dr_cifs1 primaryfiler:pri_cifs1' but got error (also see my initial post):

cifs open files prevent operation.

Snapmirror resynchronization of pri_cifs1 to dr_cifs1 : revert to resync base snapshot failed

Aborting resync.

It seems because there were open cifs sessions on primary cifs vol, that stopping resync from DR to Primary, I have tried restrict pri_cifs1, but snapmirror resync requires vol to be online. there seem be no way I can do resync from DR to Primary while primary cifs vol has open cifs sessions?

aborzenkov · ‎2011-08-29

Well, you could briefly stop CIFS, start resync and then restart CIFS. This is probably the simplest way.

JACKIEXIE19 · ‎2011-08-29

that makes sense, I will test it in the next maint window. Thanks

SBLANKENSHIP · ‎2014-12-13

I had a problem trying resync after a DR test (overwrite changes on DR volume, NOT a reverse resync). This is the first time I had encountered a resync problem when I had a common snapshot and was not performing a volume migration.

I received the following generic errors in the console and in the syslog file; the "snapmirror" log was not providing further diagnostic information.

Sat Dec 13 15:11:54 EST [DR-NETAPP2:snapmirror.dst.resync.info:notice]: SnapMirror resync of vol_cdata_v01 to PR-NETAPP2:vol_cdata_v01 is using DR-NETAPP2(1786420718)_vol_cdata_v01.2327 as the base snapshot.
Sat Dec 13 15:11:54 EST [DR-NETAPP2:replication.dst.resync.failed:error]: SnapMirror resync of vol_cdata_v01 to PR-NETAPP2:vol_cdata_v01 : revert to resync base snapshot failed.

I found this thread and terminated/restarted CIFS on the destination controller (DR) and was able to resync immediately. This thread was helpful and saved me time so I figured I would confirm this worked in my case.

Sean