ONTAP Discussions

Seeking DR Procedures on CDOT

dragontiger
11,150 Views

We have two clusters, one in the Primary site and the other in the DR site.

 

NFS is the primary protocol, followed by CIFS and some iSCSI. The data that requires DR has already been replicated to the DR cluster.

 

Off the top of my head, the following is what I can think of:
1. Should we have identical names for vservers and/or volumes, etc., during normal time? If not, then we would have to manually create vservers and/or volumes during DR, correct?

2. In 7-Mode, for CIFS, I can edit the file /etc/cifsconfig_share.cfg to add CIFS shares on the DR filer. In cDOT this file no longer exists, so I would have to set the shares up manually, correct?

 

 

Can somebody please provide procedures for what needs to be done to fail over to the DR site, so that the DR site then becomes the Primary site?

 

If somebody has such experience, I would appreciate you sharing it.

 

Thanks!

 

 

 

10 REPLIES

YIshikawa
11,065 Views
You have to create the SVMs (vservers) and volumes for SnapMirror manually. Their names do not need to be identical on both sides.
CIFS configuration and share creation also have to be done manually on the DR side. It's boring work!!
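As a minimal sketch of that manual DR-side build (all names here - dr-svm01, dr_svm01_root, dr_aggr1, DR-SVM01, example.com - are placeholders, and options such as security style, language, and DNS/LIF setup will differ per environment):

DR::> vserver create -vserver dr-svm01 -rootvolume dr_svm01_root -aggregate dr_aggr1 -rootvolume-security-style ntfs
DR::> vserver cifs create -vserver dr-svm01 -cifs-server DR-SVM01 -domain example.com
DR::> vserver cifs share create -vserver dr-svm01 -share-name data01 -path /data01

(The share can only be created once the mirrored volume is mounted into the namespace - see the discussion further down in this thread.)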

netappmagic
11,057 Views

Vservers, volumes, CIFS shares and SnapMirror replication need to be created/set up at the DR site during normal operation. I am asking what we need to do during a DR event / DR exercise.

 

volume names could be different, but mount points should be the same. Correct?

vserver names could be different as well, since we just need to change DNS names. Correct?

volume mount points should be all the same. Correct?

 

What about the AD domain? For CIFS, I think AD at the DR site is a must, but if there will be no CIFS, is AD required?

 

 

 

bobshouseofcards
11,057 Views

There are a couple of options you can choose for DR - this description is just what I like in my configurations.  I use this from small DR setups all the way to my current gig which flips over about 1400 volumes with associated shares/LUNs for both testing and (hopefully never) real DR.

 

General setup:  On the DR side I match SVMs one to one from production, but with different names.  Our style has a location identifier in the SVM, but a functional identifier works as well.  So, for example, SVM prod-svm01 would have a DR equivalent SVM of dr-svm01.  Although you could, I don't like to use the same SVM name on both ends because then you depend on DNS or other lookups at both ends to be sure you are seeing the correct one.  I prefer access control to be a bit more specific when not in a DR test or actual situation.  Especially for CIFS, I create a different CIFS server on the DR side up front (traditionally I match SVM and CIFS server name for convenience).  I prefer to have the CIFS server up and running in DR all the time rather than jump at DR time.

 

For SnapMirrors I use the same volume name on both ends - just convenient - unless I can't for some reason.  Example - SRM on ESX hasn't always played nice with the same volume name at both ends.  Where I don't use the same volume, I use a simple prefix like "DR_" in front of the DR side versions.  
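As a rough example of the "DR_" prefix style (placeholder names; type DP is shown since this thread is on 8.2/8.3 - newer releases would typically use XDP with a mirror policy):

DR::> volume create -vserver dr-svm01 -volume DR_app01 -aggregate dr_aggr1 -size 1t -type DP
DR::> snapmirror create -source-path prod-svm01:app01 -destination-path dr-svm01:DR_app01 -type DP -schedule hourly
DR::> snapmirror initialize -destination-path dr-svm01:DR_app01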

 

A tricky part of cDOT and DR is namespaces.  You can't share a volume until it is mounted, and you can't mount a SnapMirror destination until after it is initialized the first time.  This is similar to behavior in 7-mode but with the extra mount step.  So you have two basic options - mount and share the replicas ahead of time through the DR SVMs or mount/share at actual DR time.  Either one is workable.  The advantage of pre-mounting is that you can then just alias the DR SVM as the Production SVM when the time is right.  A disadvantage of pre-mounting is that it is somewhat less flexible.  Let me explain.

 

For DR testing, my configuration leaves all the destination SnapMirrors running in place, since we have to maintain short RPO/RTO's and we might be running a multi-day test.  For testing, I FlexClone all the target DR volumes (pre-scripted), mount the clones, and then define shares/LUNs as needed.  If the originals were pre-mounted, I couldn't mount the clones at the same location.  If I didn't conduct my tests this way, I'd probably pre-mount the destination volumes.
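A sketch of that test-time clone step (placeholder names; this assumes the destination itself is not pre-mounted at the target path, and the clone of a SnapMirror destination is built from an existing replica Snapshot copy, so pick one from "volume snapshot show" first):

DR::> volume snapshot show -vserver dr-svm01 -volume DR_app01
DR::> volume clone create -vserver dr-svm01 -flexclone DR_app01_test -parent-volume DR_app01 -parent-snapshot <snapmirror_snapshot_name>
DR::> volume mount -vserver dr-svm01 -volume DR_app01_test -junction-path /app01
DR::> vserver cifs share create -vserver dr-svm01 -share-name app01 -path /app01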

 

So for pre-mounting and sharing, as I create volumes that need DR, I immediately initialize the SnapMirror copy so I can then define mounts and shares on both Prod and DR at the same time.  If you follow this idea, after each volume/mirror creation you will have two shares, either NFS or CIFS, just from different servers, and of course the DR copies are read-only.
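For the pre-mount/pre-share approach, the DR-side commands after the first initialize are roughly the following (placeholder names and client subnet; the DR copy stays read-only until the mirror is broken):

DR::> volume mount -vserver dr-svm01 -volume DR_app01 -junction-path /app01
DR::> vserver export-policy create -vserver dr-svm01 -policyname app01_policy
DR::> vserver export-policy rule create -vserver dr-svm01 -policyname app01_policy -clientmatch 10.10.0.0/16 -rorule sys -rwrule sys -superuser sys
DR::> volume modify -vserver dr-svm01 -volume DR_app01 -policy app01_policy

(The CIFS share create on the DR SVM is the same as shown earlier, just pointed at the mirrored volume's junction path.)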

 

When it comes time to DR a solution like this, NFS is easy - update the DNS to point the Prod server address to the DR side by CNAME, A record, whatever you prefer, or add the IP address as an alias on a LIF in DR.  Lots of ways to do that.  Break the mirrors to bring the volumes into read-write state, bring up your DR servers, instant data.
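At failover time the storage-side piece is just the break, repeated (or scripted) per mirror, plus optionally an alias LIF carrying the production address. All names and addresses below are placeholders:

DR::> snapmirror break -destination-path dr-svm01:DR_app01
DR::> snapmirror show -destination-path dr-svm01:DR_app01 -fields state,status
DR::> network interface create -vserver dr-svm01 -lif prod_alias01 -role data -data-protocol nfs -home-node dr-node01 -home-port e0c -address 192.168.10.50 -netmask 255.255.255.0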

 

CIFS is a little trickier to do 100% right.  Remember that CIFS clients have multiple ways of identifying who is who - DNS lookups, Active Directory, etc.  And it isn't easy to just change the CIFS name on the DR server - if you delete and re-add as the original production name you wipe out the shares so any pre-mounting advantage is lost and you are back to a script to recreate them all.  Here is one way you can alias CIFS servers in both DNS and AD.  It is a little drastic, so be careful with it.

 

CIFS clients typically just need the DNS update thing to work.  I prefer to actually change the original A/PTR records in DNS to set the DR version IP address.  When I get to DoT 8.3, I will also add a NetBIOS alias at the DR side just for completeness of name response.  This covers just about everyone's needs in practice, and it's easy to reverse.

 

However, some AD users that need to authenticate a machine need to do an SPN lookup in AD.  I've run into this most often with SQL Server setups.  SPNs don't get corrected by a simple DNS change - they want to find a particular hosting server and they get really picky about it.  So for my DR, in AD I also delete the server objects that represent the production side CIFS servers, then use the "SETSPN" utility (details from Microsoft here) to add the SPNs of the production side CIFS vServers to the DR side equivalents.  This covers the odd cases.
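As a hedged example of that SETSPN step (the account and domain names are placeholders, and it assumes the production computer object has already been removed so its SPNs are free to re-register on the DR account; -L lists, -A adds):

C:\> setspn -L DR-SVM01
C:\> setspn -A cifs/PROD-SVM01 DR-SVM01
C:\> setspn -A cifs/prod-svm01.example.com DR-SVM01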

 

Note that this also destroys the original production side AD server object.  In a real DR that might not be so bad - but it is a little harder in test situations.  I have a completely isolated test AD environment to use when we run our DR tests, so this is safe.  If you don't have the isolation required, the SPN approach might not be available to you, but then hopefully you also don't need it.  I haven't yet set up cDOT 8.3 in DR, so I don't know if the NetBIOS aliasing that is now back in cDOT would also mitigate the SPN issue.

 

Lastly - LUNs.  Even if pre-mounting/sharing CIFS/NFS shares I don't do the same with LUNs.  Depends on the OS of course, but most servers I use don't really like read-only LUNs.  So those I map and mount on DR servers only at time of need, and that is just scripts - one to break/clone all the appropriate volumes, one to do the mapping/mounting (I am fortunate to use SnapDrive so the scripts are trivial but they do take a while to mount), and then a reverse pair to undo all that when my DR test is done.  Igroups are preset for the target servers of course so that doesn't have to be done during a DR.
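The per-LUN portion of those scripts is small - roughly the following on the DR cluster (placeholder names and initiator; the igroup is pre-created up front, and only the map runs at DR/test time against the broken or cloned volume):

DR::> lun igroup create -vserver dr-svm01 -igroup dr_sql01 -protocol iscsi -ostype windows -initiator iqn.1991-05.com.microsoft:dr-sql01.example.com
DR::> lun map -vserver dr-svm01 -path /vol/DR_sqldata01/lun0 -igroup dr_sql01
DR::> lun show -vserver dr-svm01 -path /vol/DR_sqldata01/lun0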

 

In general, though the mechanisms are slightly different, I think the general approach remains the same between 7-Mode and cDOT.  Hope this helps you.

 

 

 

Bob

Lead Storage Engineer

Huron Consulting Group

NCDA, NCIE-SAN (Clustered)

 

 

 

gofilesgo
11,034 Views

Thanks for all the great info, Bob!

 

I'm trying a CIFS DR test now and have an issue with access at the receiving site, presumably because of NetBIOS names. I've pre-created a DR svm, SnapMirror of the CIFS volume, a CIFS server (with a different name from the primary), mounted the mirrored volume and applied all the same permissions at the DR site.

 

During a failover test, I changed the DNS A record for the CIFS server (we're using all external DNS, no Windows DNS) to point to a LIF in the DR SVM and broke the mirror, and the volume at the DR site became read/write.

 

Accessing the share by IP (\\ip-address\sharename) works fine, but \\original-cifs-server-name cannot be found {"Windows cannot access ..."}. From the Windows client nslookup on the name returns the DR IP.

 

Any ideas on what to try? We're at 8.2.2 now and I'm not sure if the 8.3 NetBIOS aliases will help or not.

 

Kind regards

 

 

 

 

aborzenkov
11,017 Views

Accessing the share by IP (\\ip-address\sharename) works fine, but \\original-cifs-server-name cannot be found {"Windows cannot access ..."}. From the Windows client nslookup on the name returns the DR IP.


This usually means that Kerberos authentication fails. Using the IP will fall back to NTLM, bypassing Kerberos. I think that resetting the machine password (vserver cifs password-reset) should fix it, but this will in turn block your original SVM.

gofilesgo
11,004 Views
>> I think that resetting machine password (vserver cifs password-reset) should fix it, but this will in turn block your original SVM.

Blocking the original svm would be fine as this is a DR failover. I'd just need to do the same on the original CIFS server when failing back. I'll try this...

I noticed that Kerberos is not enabled for the CIFS lif at either site.

Could it be an issue with the SPN? Possibly Scenario 3 in this article?
https://kb.netapp.com/support/index?page=content&id=1013601&pmv=print&impressions=false

netappmagic
10,968 Views

Thanks to Bob for your ideas.

 

I have some more questions; they may sound basic, but please help me clear these up:

 

1.    What should the network infrastructure look like at the DR site before we start to work on the storage?

       For instance, the DR network has to be cut off from the Primary site during a DR test. That means everything we said here would have to be based on this assumption. Correct?

       Only then could the CNAME created for production reference the vservers at the DR site.

 

2.    To ensure the clients can mount the volumes at DR, for an NFS vserver, the volume mount points (junction paths) would have to be the same as the production volumes. Correct?

 

 

Please continue sharing your ideas.

bobshouseofcards
10,944 Views

For the suggestions I made with CIFS and so forth, yes, in a test scenario you would need to isolate networks.  On the test DR side, you'd want to be sure that the Active Directory infrastructure does not get updates from the primary nor make updates to the primary.  In my network infrastructure, we do it with VLANs.  The DR infrastructure has VLANs that are replicas of the VLANs in production.  When we test, we purposely move all the connected ports at the DR side into a different set of VLANs that don't have routing back to production.

 

For instance - if we have VLANs 300,301,302 - the first step in the test is to update all the ports to VLANs 1300,1301,1302.  For our few physical servers we bring the VLANs to the servers, so there is a manual step.  The virtual environment (ESX) is done in conjunction with SRM to make key changes to the network setup.  On my Filers, I have both the "normal" and the "test" VLANs defined and I move the LIFs for the affected vServers to the test VLAN ports for the duration.  Since my "test" AD server is a VM, once that is up everything stays in sync.  The primary cluster management interface stays in the normal VLAN so that I have easy remote access to my DR storage during the test.
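On the filer side, the port/LIF move is roughly the following (node, port, VLAN IDs, and LIF names are placeholders; after the test you set -home-port back to the normal VLAN port and revert again):

DR::> network port vlan create -node dr-node01 -vlan-name e0c-1300
DR::> network interface modify -vserver dr-svm01 -lif cifs_lif1 -home-node dr-node01 -home-port e0c-1300
DR::> network interface revert -vserver dr-svm01 -lif cifs_lif1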

 

If you can't do that level of isolation, it gets harder.  Obviously in a real DR situation that isn't as big a deal.

 

For NFS on Clustered Data OnTAP - yes, the junction-path structure should be planned to be the same on both sides.  In cDoT there isn't an "actual-path" parameter on an export, so aliasing an alternate path gets a little harder to do.  Could probably cobble together something with symbolic links but it's simpler to just use the same junction-path structure.
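A quick way to verify that is to dump the junction paths from both clusters and compare them (placeholder SVM names):

PROD::> volume show -vserver prod-svm01 -fields junction-path
DR::> volume show -vserver dr-svm01 -fields junction-path

Any DR volume that landed in the wrong place can be remounted with "volume unmount" / "volume mount ... -junction-path" before it is shared.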

 

This highlights one of the two things I miss the most in cDOT - actual path on an NFS export.  Despite having a junction path to build a logical structure out of a virtual one, sometimes you want to map a different logical structure onto the same data, and actual path is needed to do that easily.

 

 

 

netappmagic
10,924 Views

Hi Bob,

 

Thanks again for such practical advice.

 

Just wanted to make sure. When I said "isolate" the DR and Primary sites, I really meant cutting off any connection between them, which is the foundation before we could start the DR exercise. That is how I understood it, and as far as I know it can be done.

 

So, what you are saying is that, for instance, updating all ports to a different VLAN would be the method at the storage level to isolate the network. However, if we could completely separate the Primary and DR networks, we would then have no need to change ports to a different VLAN, and no need to worry about routing back to Production.

 

Also, in this case, we would have our own AD server located at the DR site.

 

Am I correct?

 

 

 

Sanjeev_Pabbi
6,662 Views

Guys, I spent some time and got the CIFS DR failover testing working. Hopefully it can help others who are still testing.

 

Different scenarios were tested using Netapp simulators (Prod Cluster=Q9T, DR Cluster=Q9B)

 

a) Traditional standalone SVM at Prod site and another Standalone SVM at DR site (No SVM DR)

b) SVM DR (Preserve Identity=False)

c) SVM DR (Preserve Identity=True)

 

In scenario c):

Since the DR site doesn't hold a different CIFS server, it uses the exact same configuration (and even the same CIFS server name) as the Prod SVM. RW access works normally in DR mode using the original Prod UNC path.

 

In scenarios a) and b):

 

Normal mode: The Prod site provides RW access to CIFS shares and the DR site provides read-only access to CIFS shares, using their respective CIFS servers.

DR mode: After the SnapMirror break, and with the Prod SVM kept down, RW CIFS access worked normally when pointing to the DR CIFS server name or the DR CIFS server IP.

However, clients pointing to their shares using the original UNC path (pointing to the Prod CIFS server name) failed even after updating the DNS A record for the Prod server and pointing it to the DR IP.

 

Challenge: In Prod environments, end users are not expected to change their UNC paths even in DR mode, so RW access from the DR site is expected even when using the Prod CIFS server name - the DNS A record update must work.

 

Note: DFS worked just fine in scenarios a) and b) above and doesn't need a DNS record update, because when it detects that the high-priority link pointing to the Prod server is down, it directs traffic to the DR CIFS server, which is already functional (but users keep pointing to the virtual namespace through DFS, so there is no change from their perspective).

 

Resolution: To get it working through the DNS A record flip:

Step 1: After breaking the SnapMirror, add a NetBIOS alias for the Prod CIFS server on the DR CIFS server (using the cifs add -netbios-aliases command).

Step 2: Delete the Prod CIFS server computer object from AD.

 

Note: Please check first that you would be able to restore the deleted computer object from the Recycle Bin in Windows 2012 R2 (Tools - AD Administrative Center); it is required when failing back to the Prod site.

 

Example Setup: (please validate in test environments first)

 

Prod Cluster: Q9T
DR Cluster:   Q9B

#### ---- Source provides RW access and DR provides read-only access: ---- ####

C:\>dir \\Q9B_C_SVM_01\Q9T_C_SVM_01_CIFS_01
 Volume in drive \\Q9B_C_SVM_01\Q9T_C_SVM_01_CIFS_01 is Q9T_C_SVM_01_CIFS_01
 Volume Serial Number is 8000-0402

 Directory of \\Q9B_C_SVM_01\Q9T_C_SVM_01_CIFS_01

12/03/2015  10:59 PM    <DIR>          .
12/04/2015  09:40 AM    <DIR>          ..
11/30/2015  10:14 PM                20 File1.txt
11/30/2015  11:44 PM    <DIR>          dir1
12/02/2015  09:44 PM                26 File2.txt
12/03/2015  02:43 PM    <DIR>          dir2
12/03/2015  02:23 PM                58 File3.txt
12/03/2015  04:28 PM                58 File4.txt
12/03/2015  09:58 PM                61 File5.txt
12/03/2015  10:37 PM                22 File6.txt
12/03/2015  10:59 PM                45 File7.txt
               7 File(s)            290 bytes
               4 Dir(s)     372,334,592 bytes free

C:\>
C:\>echo Hello ------------------ > \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01\File08.txt

C:\>echo Hello ------------------ > \\Q9B_C_SVM_01\Q9T_C_SVM_01_CIFS_01\File08.txt
Access is denied.

C:\>

#### ---- SM Update: ---- ####

Q9B::> snapmirror update Q9B_C_SVM_01:Q9T_C_SVM_01_VOL_01
Operation is queued: snapmirror update of destination "Q9B_C_SVM_01:Q9T_C_SVM_01_VOL_01".
Q9B::>


#### ---- Data replicated to Target: ---- ####

C:\>dir \\Q9B_C_SVM_01\Q9T_C_SVM_01_CIFS_01\File08.txt
 Volume in drive \\Q9B_C_SVM_01\Q9T_C_SVM_01_CIFS_01 is Q9T_C_SVM_01_CIFS_01
 Volume Serial Number is 8000-0402

 Directory of \\Q9B_C_SVM_01\Q9T_C_SVM_01_CIFS_01

12/16/2015  01:53 PM                27 File08.txt
               1 File(s)             27 bytes
               0 Dir(s)     372,260,864 bytes free

C:\>


#### ---- SM Break: ---- ####

Q9B::> snapmirror break  Q9B_C_SVM_01:Q9T_C_SVM_01_VOL_01
Operation succeeded: snapmirror break for destination "Q9B_C_SVM_01:Q9T_C_SVM_01_VOL_01".
Q9B::>

#### ---- DNS A Record Update: ---- ####

Q9B::> traceroute -vserver Q9B_C_SVM_01 Q9T_C_SVM_01
traceroute to Q9T_C_SVM_01.xyz.com (192.168.1.123) from 192.168.1.133, 64 hops max, 44 byte packets
 1  q9t_c_svm_01.xyz.com (192.168.1.123)  0.342 ms  0.224 ms  0.204 ms
Q9B::>

Change DNS A record and point to DR IP 192.168.1.133:

Q9B::> traceroute -vserver Q9B_C_SVM_01 Q9T_C_SVM_01
traceroute to Q9T_C_SVM_01.xyz.com (192.168.1.133) from 192.168.1.133, 64 hops max, 44 byte packets
 1  q9b_c_svm_01.xyz.com (192.168.1.133)  0.155 ms  0.125 ms  0.127 ms
Q9B::>


#### ---- Stop Source Vserver: ---- ####

Q9T::> vserver stop Q9T_C_SVM_01
[Job 333] Job succeeded: DONE
Q9T::>


#### ---- Check share access using Prod name : ---- ####

C:\>dir \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01\File08.txt
A device attached to the system is not functioning.
C:\>

#### ---- Add CIFS Netbios alias : ---- ####

Q9B::> cifs show -vserver Q9B_C_SVM_01 -display-netbios-aliases

Vserver: Q9B_C_SVM_01

      Server Name: Q9B_C_SVM_01
  NetBIOS Aliases: -
Q9B::>

Q9B::> cifs add -netbios-aliases Q9T_C_SVM_01 -vserver Q9B_C_SVM_01

Q9B::> cifs show -vserver Q9B_C_SVM_01 -display-netbios-aliases

Vserver: Q9B_C_SVM_01

      Server Name: Q9B_C_SVM_01
  NetBIOS Aliases: Q9T_C_SVM_01

Q9B::>


#### ---- Restart CIFS server: ---- ####

Q9B::> cifs server stop Q9B_C_SVM_01
Q9B::>
Q9B::> cifs server start Q9B_C_SVM_01
Q9B::>


#### ---- Delete the AD machine account for Prod site: ---- ####

Action: AD Users and Computers - delete the computer object for Q9T_C_SVM_01


C:\Users\sanjeev.pabbi>dir \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01\File08.txt
The security database on the server does not have a computer account for this workstation trust relationship.
C:\Users\sanjeev.pabbi>

#### ---- Restart CIFS DR Server  : ---- ####

Q9B::> cifs server stop Q9B_C_SVM_01
Q9B::> cifs server start Q9B_C_SVM_01
Q9B::>

Q9B::> cifs server show -display-netbios-aliases

Vserver: Q9B_C_N_SVM_03

      Server Name: Q9T_C_N_SVM_03
  NetBIOS Aliases: -

Vserver: Q9B_C_SVM_01

      Server Name: Q9B_C_SVM_01
  NetBIOS Aliases: Q9T_C_SVM_01

Vserver: Q9B_C_SVM_02

      Server Name: Q9B_C_SVM_02
  NetBIOS Aliases: Q9T_C_SVM_02
3 entries were displayed.

Q9B::>

#### ---- VALIDATION: Check access to shares using Prod name - OK: ---- ####

C:\Users\sanjeev.pabbi>dir \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01
 Volume in drive \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01 is Q9T_C_SVM_01_CIFS_01
 Volume Serial Number is 8000-0402

 Directory of \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01

12/16/2015  01:53 PM    <DIR>          .
12/04/2015  09:40 AM    <DIR>          ..
11/30/2015  10:14 PM                20 File1.txt
11/30/2015  11:44 PM    <DIR>          dir1
12/02/2015  09:44 PM                26 File2.txt
12/03/2015  02:43 PM    <DIR>          dir2
12/03/2015  02:23 PM                58 File3.txt
12/03/2015  04:28 PM                58 File4.txt
12/03/2015  09:58 PM                61 File5.txt
12/03/2015  10:37 PM                22 File6.txt
12/03/2015  10:59 PM                45 File7.txt
12/16/2015  01:53 PM                27 File08.txt
               8 File(s)            317 bytes
               4 Dir(s)     372,170,752 bytes free

C:\Users\sanjeev.pabbi>


#### ---- VALIDATION: Check again RW access on DR site using Prod name: - OK: ---- ####

C:\Users\sanjeev.pabbi>echo Hello Dear --------------- > \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01\File09.txt
C:\Users\sanjeev.pabbi>

C:\Users\sanjeev.pabbi>dir \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01
 Volume in drive \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01 is Q9T_C_SVM_01_CIFS_01
 Volume Serial Number is 8000-0402

 Directory of \\Q9T_C_SVM_01\Q9T_C_SVM_01_CIFS_01

12/16/2015  02:32 PM    <DIR>          .
12/04/2015  09:40 AM    <DIR>          ..
11/30/2015  10:14 PM                20 File1.txt
11/30/2015  11:44 PM    <DIR>          dir1
12/02/2015  09:44 PM                26 File2.txt
12/03/2015  02:43 PM    <DIR>          dir2
12/03/2015  02:23 PM                58 File3.txt
12/03/2015  04:28 PM                58 File4.txt
12/03/2015  09:58 PM                61 File5.txt
12/03/2015  10:37 PM                22 File6.txt
12/03/2015  10:59 PM                45 File7.txt
12/16/2015  01:53 PM                27 File08.txt
12/16/2015  02:32 PM                29 File09.txt
               9 File(s)            346 bytes
               4 Dir(s)     372,154,368 bytes free

C:\Users\sanjeev.pabbi>

 

Hope it helps,

 

-- Sanjeev Pabbi

 

 

 

 

 

 
