SVM DR and the physical location of CIFS Share Information in cDOT

TMADOCTHOMAS · ‎2017-11-01

First, some background: I have just upgraded to ONTAP 9.1P8 and want to take advantage of the SVM DR feature on our CIFS SVMs.

The Data Protection guide indicates I should ensure that all volumes, including the root volumes, have the same names in the source and destination SVMs. Currently I have root volumes on the source and destination SVMs, and on each SVM I use root volume protection as recommended in NetApp documentation.

My question: how can I incorporate the source SVM's root volume into this without disrupting the SVM root volume protection on the destination? Or, should I replicate the source root volume and keep a separate volume for the destination's root volume?

Also: is it accurate to assume that the CIFS Share Information is stored in the root volume, and that's the reason for replicating it offsite?

Finally: during DR tests, we currently flexclone volumes for CIFS Servers and then use the results of a PowerShell script to create commands in an Excel spreadsheet to rebuild the shares. Very convoluted. With SVM DR I'm hoping there's an easier way to conduct the test. Any suggestions?

Thanks in advance for any thoughts / suggestions on these questions!
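For readers following along, the share-rebuild step described above (a script that emits one share-creation command per share) can be sketched roughly as below. This is only an illustration, not the poster's PowerShell script: the function name, the SVM name, and the share list are hypothetical, and the sketch assumes share names and paths contain no spaces or characters that need quoting.

```python
def share_create_commands(vserver, shares):
    """Build one 'vserver cifs share create' line per (share_name, path) pair.

    Assumption: names and paths need no quoting (no spaces or special characters).
    """
    commands = []
    for name, path in shares:
        commands.append(
            "vserver cifs share create "
            f"-vserver {vserver} -share-name {name} -path {path}"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical test SVM and shares, for illustration only.
    for line in share_create_commands(
        "svm_drtest", [("finance", "/vol_fin"), ("hr", "/vol_hr/data")]
    ):
        print(line)
```

The generated lines would still need to be reviewed before being pasted into a cluster shell, since share properties and ACLs are not covered here.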

JGPSHNTAP · ‎2017-11-01

SVM-DR is super simple. We use it with identity preserve discard network. It's a derivative of 7-mode's vfiler dr.

If you are talking about LS mirrors on the root, don't worry about it.

Just follow the guide for svm-dr and you will be set

Forget that flexclone workflow as well. We do full failovers and failbacks. That's the only way to truly test DR

TMADOCTHOMAS · ‎2017-11-01

Thanks jgpshntap! Unfortunately we can't do failover/failback because our DR network is only available during DR tests and/or actual DR situations (we rent the space in a datacenter from IBM). Because of that we do flexclones, although with SVM DR I would have to modify the plan to flexclone to a different SVM (as I understand it).

When you say "don't worry about it" re: LS mirrors on the root, what do you mean? Should I not have LS mirrors on the root of the destination (DR) SVM?

JGPSHNTAP · ‎2017-11-01

If we were in a true DR we would establish LS mirrors on the root.

So you are still stuck with flexclones, and you absolutely would have to present them to a new SVM.

Again, that's not a true DR. Tell management that a real DR test means actually failing over the full workload with the full DR network. How else do you know if stuff really works?

TMADOCTHOMAS · ‎2017-11-01

Thanks JGPSHNTAP. During our tests we have the full DR network, but it's isolated from our main network so there's no network conflict. We actually use the same IPs as production, so we have to isolate the network.

I don't think I'm expressing my question very clearly. Let me try again: on both the source and destination SVMs, I currently have a root volume with root volume protection in place. Documentation indicates I should replicate the source root volume to a matching volume on the destination with the same name. My question is, should I:

(a) remove the existing destination root volume and root volume protection, and replace it with a single root volume destination with the same name as the source root volume?

Or -

(b) leave the existing destination root volume and root volume protection in place, and create a new volume with the same name as the source root volume for SVM DR?

TMADOCTHOMAS · ‎2017-11-01

Never mind, I think I found the answer. The Express Guide says the following after creating the destination SVM:

------------------------------------------------

The destination SVM is created without a root volume and is in the stopped state.

------------------------------------------------

So apparently I would delete the root on the destination.

I'm starting to think SVM DR isn't going to work for us. Looking further in the guide, it looks like you have to set up a CIFS Server on the destination which I won't be able to do since there's no live data network unless we're in the middle of a DR test.

JGPSHNTAP · ‎2017-11-01

If you have relationships in play and you want to convert them to an SVM-DR relationship, that's pretty easy

TMADOCTHOMAS · ‎2017-11-01

Thanks, yeah I see that in the guide.

To clarify in case someone searches this topic - I was mistaken about having to create a CIFS Server on the destination. That's only if I set identity-preserve to false, which I'm not planning to do.

TMADOCTHOMAS · ‎2017-11-01

Testing not going well. Anyone have insights on the following?

I have a test source SVM and test destination SVM. Each side has a root volume and a single test data volume. I verified the names of both volumes are the same on each side. I have a standard snapmirror job set up for the test data volume, to simulate conditions in my existing CIFS SVMs.

I successfully create the SVM DR relationship, but when I try to resync I get the following:

-----------------------------------------------------------------------------

Error: command failed: There are one or more volumes in this Vserver which do not have a volume-level SnapMirror relationship with volumes in Vserver <source_vserver>.

-----------------------------------------------------------------------------

I thought the issue might be the load sharing volumes on the source. So I ran the volume modify command with the -vserver-dr-protection unprotected option, but got the following:

-----------------------------------------------------------------------------

Error: command failed: Modification of the following fields: vserver-dr-protection not allowed for volumes of the type "Flexible Volume - LS read-only volume"

-----------------------------------------------------------------------------

I then thought, maybe I need to identify the root volume as unprotected, and I got the following:

-----------------------------------------------------------------------------

Error: command failed: Cannot change the protection type of volume <volume> as it is the root volume.

-----------------------------------------------------------------------------

Finally, I changed the name of the destination root volume to something different, and set up a standard replication job for the source root volume. When I try to resync I get:

-----------------------------------------------------------------------------

Error: command failed: The source Vserver root volume name <volume> is not the same as the destination Vserver's root volume name <volume>. Rename the destination volume and then try again.

-----------------------------------------------------------------------------

I am at a loss. Any ideas?

axsys · ‎2017-11-02

Hey man,

What I don't get is why do you care so much about the root volume? SVMDR is just the exact same thing you have as on the source system. If you change/modify/manipulate your volumes in the SVMDR you're bound to fail. So if you'd be in a DR you'd break the vserverdr / svmdr snapmirror on the destination and then fire up the vserver/svm in the destination et voilà you have everything running active in your destination datacenter.

So first error you get is an indicator that the volume information in your SVMDR is not identical anymore with the source SVM - there must have been a modification.
vol show -vserver svmy
vol show -vserver svmy_dr

should give you an idea of what is different. So this is my go to document regarding SVMDR:

create

DEST> vserver create -vserver svmy_dr -subtype dp-destination
SRC> vol show -vserver svmy -volume *
DEST> vserver add-aggregates -vserver svmy_dr -aggregates // this is for disk type for example whether you want to have the data on SATA or SSD
DEST> vserver peer create -vserver svmy_dr -peer-vserver svmy -applications snapmirror -peer-cluster v
SRC> vserver peer accept -vserver svmy -peer-vserver svmy_dr
DEST> snapmirror create -source-vserver svmy -destination-vserver svmy_dr -type DP -throttle unlimited -policy DPDefault -schedule hourly -identity-preserve true
DEST> snapmirror initialize -destination-vserver svmy_dr

failover

DEST> snapmirror quiesce -destination-vserver svmy_dr
DEST> snapmirror break -destination-path svmy_dr
SRC> vserver stop -vserver svmy
DEST> vserver start -vserver svmy_dr

and you are running in destination with the full configuration of all your shares, exports, interfaces etc.
now you have to decide whether you want to resync back to the state you had on the source or you'll have to create a new svmdr to mirror everything back to your previous source. So:

SRC> snapmirror resync -destination-vserver svmy

OR
OLD_DEST> snapmirror create -source-vserver svmy_dr: -destination-vserver svmy -type DP -throttle unlimited -policy DPDefault -schedule hourly -identity-preserve true
// Day X when mainsite will be switched back
OLD_DEST> snapmirror quiesce -destination-vserver svmy_dr
OLD_DEST> snapmirror break -destination-path svmy_dr
OLD_DEST> vserver stop -vserver svmy_dr
OLD_SRC> vserver start -vserver svmy

When you delete volumes on the source SVM it's a bit more tricky to get the SVMDR mirror running again:

delete svmdr volume

DEST> snapmirror break -vserver svmy_dr
SRC> snapshot delete -vserver svmy -volume volx -snapshot * -ignore-owners true
SRC> snapmirror list-destinations -source-vserver svmy -source-volume volx
SRC> set diag
SRC> snapmirror release -destination-path svmy_dr:volx -relationship-id z -force
SRC> vol offline volx -vserver svmy
SRC> vol delete volx -vserver svmy
DEST> snapmirror resync -vserver svmy_dr

if the volume still exists at the destination

DEST> snapmirror break -vserver svmy_dr
DEST> vol offline volx -vserver svmy_dr
DEST> vol delete volx -vserver svmy_dr
DEST> snapmirror resync -vserver svmy_dr

hope this clears some of the situation or if not please give us a bit more background.

Cheers,

axsys

TMADOCTHOMAS · ‎2017-11-02

axsys,

Thank you very much for the detailed response.

This morning I spent two hours with a very helpful NetApp Support representative. We got the issues cleared up as well as some confusion on my part as to how SVM DR worked. It turns out that my deleting/recreating of some SVM root volume replicated copies was the culprit - there were "shadow" deleted copies that caused a problem. I actually deleted them but there must have been a reference to them somewhere that was cleaned up in an overnight process, because today the resync worked. Well - at least it got me to the next step.

This time we encountered a new issue. Turns out the source volume must be thin provisioned, OR you have to turn off data compaction, one of the two. Since all my prod volumes are thin provisioned anyway, I checked the thin provision box and the resync worked without issue.
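As a rough illustration of the condition reported above (a source volume must be thin provisioned, or data compaction must be off), the following sketch flags volumes that violate it. The rule is taken from this post as stated, not verified against NetApp documentation, and the dictionary keys (`name`, `thin_provisioned`, `data_compaction`) are hypothetical names for values you would collect yourself.

```python
def volumes_blocking_resync(volumes):
    """Return names of volumes that are thick provisioned AND have compaction on.

    Per the post, either thin provisioning or disabled data compaction
    satisfies the requirement, so only the combination of neither is flagged.
    """
    return [
        v["name"]
        for v in volumes
        if not v["thin_provisioned"] and v["data_compaction"]
    ]


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = [
        {"name": "vol_a", "thin_provisioned": True, "data_compaction": True},
        {"name": "vol_b", "thin_provisioned": False, "data_compaction": True},
        {"name": "vol_c", "thin_provisioned": False, "data_compaction": False},
    ]
    print(volumes_blocking_resync(inventory))
```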

I then worked with the NetApp rep to test my plan during DR tests to clone the volume(s) in the DR SVM to a different SVM for the duration of the test. At first it didn't work, but then we realized the documentation was unclear and we had some options wrong. Once we got the options straightened out it worked like a charm. I am going to run through the whole process again to be sure, but we are now ready to implement SVM DR.

As for the root volume, my concern had been (a) that copies of the root volume on the source would be an issue (since I couldn't exclude them) and (b) that copies of the root volume on the destination would be an issue. I decided to remove the destination copies and just leave the one root volume, renamed to match the source, and it worked.

TMADOCTHOMAS · ‎2017-11-02

One additional question:

I noticed the following on this page: https://library.netapp.com/ecmdocs/ECMLP2426782/html/GUID-0AE20AEC-8330-4864-8CE6-A64DCDA7E2F1.html

------------------------------------------------

If the source cluster reboots, then the source SVM is operationally stopped and is locked for any management operations to avoid data corruption in case data is accessed inadvertently from both the source and destination SVMs.

------------------------------------------------

Is this accurate? It appears to say the source (production) SVM goes offline if there's a failover/giveback or an unplanned failover. If so, I can't imagine implementing this feature if it will cause the production SVM to go offline when we do planned failovers (or if we have unplanned ones).

TMADOCTHOMAS · ‎2017-11-09

For anyone who searches this topic, I ended up opting not to set up SVM DR at this time after testing and starting an implementation. There were too many buggy issues that came up, and I had somehow missed the fact that you have to duplicate every custom schedule on the source cluster at the destination. We have over 80 custom schedules and regularly add / change them - I don't want a simple change in schedule to potentially break replication. Although I got clarification on the issue raised in the post right above this one with a very helpful NetApp rep, my uneasiness only grew the further I got into it. I will keep watching Release Notes and may revisit this in the future.
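For anyone weighing the same schedule-duplication requirement, a simple drift check along these lines might make it less fragile. This is a minimal sketch, assuming you have already exported the custom schedule names from each cluster into plain lists; the function name and the sample schedule names are hypothetical, and the comparison is by name only (it does not check that the schedule definitions match).

```python
def missing_schedules(source_schedules, destination_schedules):
    """Return schedule names present on the source but absent on the destination."""
    return sorted(set(source_schedules) - set(destination_schedules))


if __name__ == "__main__":
    # Hypothetical schedule names, for illustration only.
    source = ["hourly", "nightly_2am", "weekly_sun"]
    destination = ["hourly", "weekly_sun"]
    for name in missing_schedules(source, destination):
        print(f"schedule missing on destination: {name}")
```

Running a check like this after each schedule change would at least surface a mismatch before it has a chance to affect replication.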