Seeking DR Procedures on CDOT


We have two clusters: one at the primary site and the other at the DR site.


NFS is the primary protocol, followed by CIFS and some iSCSI. The volumes that require DR have already been replicated to the DR cluster.


Off the top of my head, these are the questions I can think of:
1. Should we use identical names for vservers and/or volumes during normal operation? If not, we would have to create the vservers and/or volumes manually during DR, correct?

2. In 7-Mode, for CIFS, I could edit the file /etc/cifsconfig_share.cfg to add CIFS shares on the DR filer. In cDOT this file no longer exists, so I would have to set the shares up manually, correct?



Can somebody please provide the procedure for failing over to the DR site, so that the DR site becomes the primary site?


If anybody has experience with this, I would appreciate your sharing it.






Re: Seeking DR Procedures on CDOT

You have to create the SVMs (vservers) and volumes for SnapMirror manually. Their names do not need to be identical on both sides.
The CIFS configuration and the shares also have to be created manually on the DR side. It's boring work!!

Re: Seeking DR Procedures on CDOT


The vservers, volumes, CIFS shares and SnapMirror replication need to be created/set up at the DR site during normal operation. I am asking what we need to do during an actual DR event or a DR exercise.


Volume names could be different, but mount points should be the same. Correct?

Vserver names could be different as well, since we just need to change DNS names. Correct?

Volume mount points should all be the same. Correct?


What about the AD domain? I think that for CIFS, AD at the DR site is a MUST, but if there is no CIFS, is AD still required?




Re: Seeking DR Procedures on CDOT

There are a couple of options you can choose for DR - this description is just what I like in my configurations.  I use this from small DR setups all the way to my current gig which flips over about 1400 volumes with associated shares/LUNs for both testing and (hopefully never) real DR.


General setup:  On the DR side I match SVMs one to one from production, but with different names.  Our style has a location identifier in the SVM name, but a functional identifier works as well.  So, for example, SVM prod-svm01 would have a DR equivalent SVM of dr-svm01.  Although you could, I don't like to use the same SVM name on both ends, because then you depend on DNS or other lookups at both ends to be sure you are seeing the correct one.  I prefer access control to be a bit more specific when not in a DR test or an actual DR situation.  Especially for CIFS, I create a different CIFS server on the DR side up front (traditionally I match SVM and CIFS server name for convenience).  I prefer to have the CIFS server up and running in DR all the time rather than jump at DR time.


For SnapMirrors I use the same volume name on both ends - just convenient - unless I can't for some reason.  Example - SRM on ESX hasn't always played nice with the same volume name at both ends.  Where I don't use the same volume name, I use a simple prefix like "DR_" in front of the DR side versions.
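As a rough illustration of the naming convention described above, here is a minimal Python sketch. The function names and the "prod"/"dr" tags are my own assumptions for the example; only the prod-svm01 / dr-svm01 pairing and the "DR_" prefix come from the post.

```python
def dr_svm_name(prod_svm: str, prod_tag: str = "prod", dr_tag: str = "dr") -> str:
    """Map a production SVM name to its DR counterpart by swapping the location tag."""
    prefix = prod_tag + "-"
    if not prod_svm.startswith(prefix):
        raise ValueError(f"{prod_svm!r} does not start with {prefix!r}")
    return dr_tag + "-" + prod_svm[len(prefix):]


def dr_volume_name(prod_volume: str, same_name_allowed: bool = True) -> str:
    """Keep the volume name where possible; otherwise prepend the DR_ prefix."""
    return prod_volume if same_name_allowed else "DR_" + prod_volume
```

With these helpers, dr_svm_name("prod-svm01") gives "dr-svm01", and dr_volume_name("esx_ds01", same_name_allowed=False) gives "DR_esx_ds01" for the cases (such as SRM) where the same name can't be used.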


A tricky part of cDOT and DR is namespaces.  You can't share a volume until it is mounted, and you can't mount a SnapMirror destination until after it is initialized the first time.  This is similar to behavior in 7-mode but with the extra mount step.  So you have two basic options - mount and share the replicas ahead of time through the DR SVMs or mount/share at actual DR time.  Either one is workable.  The advantage of pre-mounting is that you can then just alias the DR SVM as the Production SVM when the time is right.  A disadvantage of pre-mounting is that it is somewhat less flexible.  Let me explain.


For DR testing, my configuration leaves all the destination SnapMirrors running in place, since we have to maintain short RPO/RTO's and we might be running a multi-day test.  For testing, I FlexClone all the target DR volumes (pre-scripted), mount the clones, and then define shares/LUNs as needed.  If the originals were pre-mounted, I couldn't mount the clones at the same location.  If I didn't conduct my tests this way, I'd probably pre-mount the destination volumes.
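The clone-based test flow above could be pre-scripted along these lines. This is only a sketch: it builds command strings rather than running anything, the "_drtest" suffix and the function name are hypothetical, and the command syntax is clustered ONTAP CLI as I understand it, so check it against your release before use.

```python
def clone_test_commands(dr_svm: str, volumes: dict, suffix: str = "_drtest") -> list:
    """Build CLI strings that FlexClone each DR destination volume and mount the clone.

    volumes maps a destination volume name to the junction path the clone
    should be mounted at. The SnapMirror destinations themselves are left
    untouched, so replication keeps running during the test.
    """
    commands = []
    for volume, junction_path in volumes.items():
        clone = volume + suffix  # hypothetical naming choice for test clones
        commands.append(
            f"volume clone create -vserver {dr_svm} -flexclone {clone} -parent-volume {volume}"
        )
        commands.append(
            f"volume mount -vserver {dr_svm} -volume {clone} -junction-path {junction_path}"
        )
    return commands
```

A matching cleanup script would unmount and delete the clones once the test is done.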


So for pre-mounting and sharing, as I create volumes that need DR, I immediately initialize the SnapMirror copy so I could then define mounts and shares on both Prod and DR at the same time.  If you follow this idea, after each volume/mirror creation you will have two shares, either NFS or CIFS, just from different servers, and of course the DR copies are read-only.  
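For the pre-mount approach, the per-volume steps on the DR side might look like the sketch below. It assumes the SnapMirror relationship has already been created, the helper name is made up, and the command strings are clustered ONTAP CLI syntax as I understand it; treat them as something to verify, not as a tested procedure.

```python
from typing import Optional


def premount_commands(dr_svm: str, volume: str, junction_path: str,
                      share_name: Optional[str] = None) -> list:
    """Build CLI strings to initialize a mirror, then mount and share the replica."""
    destination = f"{dr_svm}:{volume}"
    commands = [
        # The destination can only be mounted after the first initialize completes.
        f"snapmirror initialize -destination-path {destination}",
        f"volume mount -vserver {dr_svm} -volume {volume} -junction-path {junction_path}",
    ]
    if share_name is not None:
        # CIFS only; for NFS the volume's export policy applies instead.
        commands.append(
            f"vserver cifs share create -vserver {dr_svm} "
            f"-share-name {share_name} -path {junction_path}"
        )
    return commands
```

Calling it without a share name gives just the initialize and mount steps, which is enough for an NFS-only volume.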


When it comes time to DR a solution like this, NFS is easy - update the DNS to point the Prod server address to the DR side by CNAME, A, whatever you prefer, or add the IP address as an alias on a LIF in DR.  Lots of ways to do that.  Break the mirrors to bring the volumes into read-write state, bring up your DR servers, instant data.
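The "break the mirrors" step is easy to script for many volumes. Here is a minimal sketch that only generates the command strings; the optional quiesce before the break is my own addition rather than something from the description above, and the path format (svm:volume) is what I'd expect on 8.2 and later, so verify it for your version.

```python
def break_mirror_commands(dr_svm: str, volumes: list, quiesce_first: bool = True) -> list:
    """Build CLI strings that make each SnapMirror destination read-write."""
    commands = []
    for volume in volumes:
        destination = f"{dr_svm}:{volume}"
        if quiesce_first:
            # Assumption: stop scheduled transfers before breaking the relationship.
            commands.append(f"snapmirror quiesce -destination-path {destination}")
        commands.append(f"snapmirror break -destination-path {destination}")
    return commands
```

For example, break_mirror_commands("dr-svm01", ["vol1", "vol2"]) yields four commands, a quiesce and a break per volume, in volume order.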


CIFS is a little trickier to do 100% right.  Remember that CIFS clients have multiple ways of identifying who is who - DNS lookups, Active Directory, etc.  And it isn't easy to just change the CIFS name on the DR server - if you delete and re-add as the original production name you wipe out the shares so any pre-mounting advantage is lost and you are back to a script to recreate them all.  Here is one way you can alias CIFS servers in both DNS and AD.  It is a little drastic, so be careful with it.


CIFS clients typically just need the DNS update thing to work.  I prefer to actually change the original A/PTR records in DNS to set the DR version IP address.  When I get to DoT 8.3, I will also add a NetBIOS alias at the DR side just for completeness of name response.  This covers just about everyone's needs in practice, and it's easy to reverse.


However, some AD users that need to authenticate a machine need to do an SPN lookup in AD.  I've run into this most often with SQL Server setups.  SPNs don't get corrected by a simple DNS change - they want to find a particular hosting server and they get really picky about it.  So for my DR, in AD I also delete the server objects that represent the production side CIFS servers, then use the "SETSPN" utility (details from Microsoft here) to add the SPNs of the production side CIFS vServers to the DR side equivalents.  This covers the odd cases.


Note that this also destroys the original production side AD server object.  In a real DR that might not be so bad - but it is a little harder in test situations.  I have a completely isolated test AD environment to use when we run our DR tests, so this is safe.  If you don't have the isolation required, the SPN thing might not be available to you, but then hopefully you also don't need it.  I haven't yet set up cDOT 8.3 in DR, so I don't know if the NetBIOS aliasing that is now back in cDOT would also mitigate the SPN issue.
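The SPN step could be scripted roughly as follows. This is a sketch under assumptions: it supposes the relevant SPNs are the HOST/ short-name and HOST/ FQDN forms, the domain "example.com" is a placeholder, and it only prints setspn command lines for review. Which SPNs your production CIFS server actually holds should be checked (for instance with setspn -L) before the production object is deleted.

```python
def setspn_commands(prod_cifs: str, dr_cifs: str, dns_domain: str) -> list:
    """Build setspn command lines that add the production server's SPNs to the DR account."""
    spns = [
        f"HOST/{prod_cifs}",               # short-name form (assumed)
        f"HOST/{prod_cifs}.{dns_domain}",  # fully qualified form (assumed)
    ]
    return [f"setspn -A {spn} {dr_cifs}" for spn in spns]


if __name__ == "__main__":
    for line in setspn_commands("prod-svm01", "dr-svm01", "example.com"):
        print(line)
```

As with the rest of this approach, it belongs in an isolated test AD or a real DR, since the production server object has to be gone first.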


Lastly - LUNs.  Even if pre-mounting/sharing CIFS/NFS shares I don't do the same with LUNs.  Depends on the OS of course, but most servers I use don't really like read-only LUNs.  So those I map and mount on DR servers only at time of need, and that is just scripts - one to break/clone all the appropriate volumes, one to do the mapping/mounting (I am fortunate to use SnapDrive so the scripts are trivial but they do take a while to mount), and then a reverse pair to undo all that when my DR test is done.  Igroups are preset for the target servers of course so that doesn't have to be done during a DR.


In general, though the mechanisms are slightly different, I think the general approach remains the same between 7-mode and cDOT.  Hope this helps you.





Lead Storage Engineer

Huron Consulting Group

NCDA, NCIE-SAN (Clustered)




Re: Seeking DR Procedures on CDOT

Thanks for all the great info, Bob!


I'm trying a CIFS DR test now and have an issue with access at the receiving site, presumably because of NetBIOS names. I've pre-created a DR svm, SnapMirror of the CIFS volume, a CIFS server (with a different name from the primary), mounted the mirrored volume and applied all the same permissions at the DR site.


During a failover test, I changed the DNS A record for the CIFS server (we're using all external DNS, no Windows DNS) to point to a lif in the DR svm, broke the mirror, and the volume at the DR site becomes read/write.


Accessing the share by IP (\\ip-address\sharename) works fine, but \\original-cifs-server-name cannot be found {"Windows cannot access ..."}. From the Windows client nslookup on the name returns the DR IP.


Any ideas on what to try? We're at 8.2.2 now and I'm not sure if the 8.3 NetBIOS aliases will help or not.


Kind regards





Re: Seeking DR Procedures on CDOT

>> Accessing the share by IP (\\ip-address\sharename) works fine, but \\original-cifs-server-name cannot be found {"Windows cannot access ..."}. From the Windows client nslookup on the name returns the DR IP.

This usually means that Kerberos authentication fails. Using the IP address falls back to NTLM, bypassing Kerberos. I think that resetting the machine password (vserver cifs password-reset) should fix it, but this will in turn block your original SVM.

Re: Seeking DR Procedures on CDOT

>> I think that resetting machine password (vserver cifs password-reset) should fix it, but this will in turn block your original SVM.

Blocking the original svm would be fine as this is a DR failover. I'd just need to do the same on the original CIFS server when failing back. I'll try this...

I noticed that Kerberos is not enabled for the CIFS lif at either site.

Could it be an issue with the SPN? Possibly Scenario 3 in this article?

Re: Seeking DR Procedures on CDOT

Thanks to Bob for your ideas.


I have some more questions. They may sound basic, but please help me clear them up:


1.    What should the network infrastructure look like at the DR site before we start work on the storage?

       For instance, the DR network has to be cut off from the primary site during a DR test. That means everything we said here would have to be based on this assumption. Correct?

       Only then could the CNAME created for production reference the vservers at the DR site.


2.    To ensure the clients can mount the volumes in DR, for an NFS vserver the volume mount points (junction paths) would have to be the same as those of the production volumes. Correct?



Please keep sharing your ideas.

Re: Seeking DR Procedures on CDOT

For the suggestions I made with CIFS and so forth, yes, in a test scenario you would need to isolate networks.  On the test DR side, you'd want to be sure that the Active Directory infrastructure does not get updates from the primary nor make updates to the primary.  In my network infrastructure, we do it with VLANs.  The DR infrastructure has VLANs that are replicas of the VLANs in production.  When we test, we purposely move all the connected ports at the DR side into a different set of VLANs that don't have routing back to production.


For instance - if we have VLANs 300,301,302 - the first step in the test is to update all the ports to VLANs 1300,1301,1302.  For our few physical servers we bring the VLANs to the servers, so there is a manual step.  The virtual environment (ESX) is done in conjunction with SRM to make key changes to the network setup.  On my Filers, I have both the "normal" and the "test" VLANs defined and I move the LIFs for the affected vServers to the test VLAN ports for the duration.  Since my "test" AD server is a VM, once that is up everything stays in sync.  The primary cluster management interface stays in the normal VLAN so that I have easy remote access to my DR storage during the test.
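The VLAN renumbering in that example is just a fixed offset, which makes it easy to derive the test-side names in a script. A small sketch follows; the offset of 1000 matches the 300 -> 1300 example, while the "port-vlanid" naming (e.g. a0a-1300) is my assumption about how the VLAN ports are named on the cluster, and a0a is a made-up base port.

```python
TEST_VLAN_OFFSET = 1000  # 300 -> 1300, 301 -> 1301, ... as in the example above


def to_test_vlan(prod_vlan: int) -> int:
    """Return the isolated test VLAN ID that shadows a production VLAN ID."""
    test_vlan = prod_vlan + TEST_VLAN_OFFSET
    if not 1 <= test_vlan <= 4094:
        raise ValueError(f"VLAN {prod_vlan} has no valid test equivalent")
    return test_vlan


def vlan_port_name(base_port: str, vlan_id: int) -> str:
    """Name of the VLAN port on a node, assuming the <port>-<vlan id> convention."""
    return f"{base_port}-{vlan_id}"
```

So a LIF normally homed on a0a-300 would be moved to vlan_port_name("a0a", to_test_vlan(300)), i.e. a0a-1300, for the duration of the test and moved back afterwards.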


If you can't do that level of isolation, it gets harder.  Obviously in a real DR situation that isn't as big a deal.


For NFS on Clustered Data ONTAP - yes, the junction-path structure should be planned to be the same on both sides.  In cDoT there isn't an "actual-path" parameter on an export, so aliasing an alternate path gets a little harder to do.  You could probably cobble together something with symbolic links, but it's simpler to just use the same junction-path structure.
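Since NFS clients depend on identical junction paths, it can be worth checking the two sides against each other before a test. A minimal sketch, assuming you have already collected a volume-to-junction-path mapping from each cluster by whatever means you prefer (the function name and data shape are hypothetical):

```python
def junction_mismatches(prod_paths: dict, dr_paths: dict) -> dict:
    """Return volumes whose DR junction path is missing or differs from production.

    Both arguments map volume name -> junction path. The result maps
    volume name -> (production path, DR path or None if not mounted in DR).
    """
    problems = {}
    for volume, prod_path in prod_paths.items():
        dr_path = dr_paths.get(volume)
        if dr_path != prod_path:
            problems[volume] = (prod_path, dr_path)
    return problems
```

An empty result means every production volume is mounted at the same path in DR. If the DR volumes use a prefix such as "DR_", the names would need to be normalized before comparing.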


This highlights one of the two things I do miss the most in cDoT - actual path on NFS export.  Despite having a junction path to build a logical structure out of a virtual one, sometimes you want to map a different logical structure on the same data, and actual path is needed to do that easily.




Re: Seeking DR Procedures on CDOT


Hi Bob,


Thanks again for such practical advice.


Just wanted to make sure. When I said "isolate" the DR and primary sites, I really meant cutting off every connection between them, which is the foundation before we can start the DR exercise. This is how I understood it, and as far as I know it can be done.


So what you are saying is that, for instance, moving all ports to a different VLAN is the way to isolate the network at the storage level. However, if we can completely separate the primary and DR networks, we will not need to move ports to different VLANs and will not need to worry about routing back to production.


Also, in this case, we will have our own AD server located at the DR site.


Am I correct?