We are looking to perform a DR failover test of our CIFS shares/LUNs from live to our DR site across our FAS2554 8.2 7-Mode storage; the DR site will run live for a number of weeks before we fail back. All data is currently SnapMirrored from live to DR. High-level steps to be performed: break the SnapMirror relationship, create the shares on the DR filer, change the DNS record to point at the DR filer, attach the LUNs, then resync SnapMirror from DR back to live.
Is there anything we can do in advance to minimise the risk/time of this DR test? Currently the shares only exist on the live site; could we configure them on DR in advance, hidden until required?
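For reference, on 7-Mode the shares can be pre-created on the DR filer while the destination volumes are still read-only SnapMirror targets, and a trailing $ hides a share from browsing. A minimal sketch (the filer, volume, and share names below are examples only):

```shell
# 7-Mode CLI sketch: pre-create CIFS shares on the DR filer while the
# destination volume is still a read-only SnapMirror target.
cifs shares -add eng_share /vol/vol_eng -comment "DR copy of eng"

# A $ suffix hides the share from network browsing until it is needed;
# users can still reach it explicitly via \\filer\eng_share$.
cifs shares -add eng_share$ /vol/vol_eng
```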
You can't really change a share name from the CLI (you'd just re-create it), but I don't see any harm in creating them in advance. As long as the volumes/qtrees are in a mirrored state they are write-protected, so you shouldn't worry about users saving data on the DR side.
In a DR scenario you're unlikely to have access to the old shares anyway. I would always leave them configured, or at least keep a cheat sheet of the commands to create them.
As for the DNS move: if you're taking the filer name as-is, or a CNAME that has Kerberos delegation, make sure you move the Kerberos delegation as well. There have been a few patches in recent years that prevent an already-delegated name from being used against another name.
As for the cutover, I tend to be very careful with it. I know some people (especially management) want to see it as realistic as possible and just simulate a big bang, but they tend to miss all the possibilities of data loss during the process. So here are the things I would add:
1. Backup. Don't rely on the backup from the night before; take a dedicated one as close as you can to the failover.
2. Shut down the VMs / SAN clients properly before the last mirror update; otherwise the file system will have to attempt recovery on the DR side.
3. Cut access to the LUNs and shares before the last mirror update (by removing the DNS record and killing sessions if needed), and verify there are no active sessions (cifs sessions; iscsi session show; fcp show initiator; vfiler run * nfsstat -l). You don't want someone writing data in just before the last mirror update and losing it.
4. Do two SnapMirror updates before releasing. There are some scenarios where SnapMirror will try to use a previous snapshot when doing a resync.
5. Make sure you reverse the mirror as soon as possible (even before you repoint the DNS); otherwise you don't have real DR protection again if the DR site goes bad. Also, the volumes on the other site are left writable, and someone might still manage to write there (via cached DNS/NetBIOS names, direct use of the IP, etc.).
6. Make sure you run backups while you're in DR. You might have cascade relationships created before the failover that are no longer relevant in the DR scenario.
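The session checks and cutover sequence above can be sketched in 7-Mode CLI terms roughly as follows. This is an illustrative outline only, under the assumption of a single volume; the filer and volume names (live_filer, dr_filer, vol_eng) are hypothetical placeholders:

```shell
# --- On the live filer: verify no one is still connected (step 3) ---
cifs sessions
iscsi session show
fcp show initiator
vfiler run * nfsstat -l

# --- On the DR filer: final transfers, then make the volume writable ---
snapmirror update dr_filer:vol_eng
snapmirror update dr_filer:vol_eng   # second update, per step 4
snapmirror break vol_eng

# --- On the live filer: reverse the relationship ASAP (step 5), so the
# old source becomes the destination of the now-active DR volume ---
snapmirror resync -S dr_filer:vol_eng live_filer:vol_eng
```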
Thanks. The DNS is a static A record, so no issues on the redirect/Kerberos. Can we do the SnapMirror update/break from the CLI, or is this best performed via System Manager?
My internal procedure uses System Manager only, not the CLI, and it has worked successfully around 15 times since I joined the company.
The delegation is not required only for CNAMEs; it's required for any DNS name used to access CIFS. It's also created by default for the filer hostname and FQDN when joining a filer to the domain.
So if you repoint the actual filer name to another filer (another computer object), it will not be able to Kerberise, and it will not fall back to NTLM (see https://blogs.technet.microsoft.com/askds/2008/06/11/kerberos-authentication-problems-service-principal-name-spn-issues-part-3/ for exactly what happens when you have a delegation pointing to the wrong computer object).
If you don't have delegation against the A record (you can confirm with setspn.exe -q */dnsname), then you're probably OK for the DR, and clients will manage to fall back to NTLM (not recommended in today's world, but...).
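Concretely, the check can be run from any domain-joined Windows machine; a sketch, assuming a hypothetical DNS name netappdr.example.com (the comment lines are annotations, not commands to paste):

```shell
# Windows command-prompt sketch: search the forest for any SPN
# registered against the DNS name clients use for the shares.
setspn -Q */netappdr.example.com

# "No such SPN found" means no Kerberos delegation is tied to that
# name, so a DR repoint should fall back to NTLM; any hits mean the
# SPN/delegation must be moved along with the DNS record.
```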
There is no delegation against the A record or filer hostname; it's used for drive mappings, i.e. \\NETAPP\sharename. Thanks for the heads-up on the NTLM.
What are your thoughts on sitting a WFS 2016 server in front of the storage at the live and DR sites for client-access shares, using a DFS Namespace that we could leverage in a DR without a DNS change?
We are also thinking about using DFS Replication and moving replication off the storage stack onto Windows, though I have concerns about placing our data replication in the hands of Windows.
In short: I don't like either DFS-R or DFS-N.
DFS-R is not really scalable and cannot handle lots of files and frequent updates; a large file count can take days to replicate.
DFS-N I see as another point of failure: instead of depending on DNS plus a single AD authentication request, you now depend on DNS, the DFS server, and two authentication requests. If you have multiple sites you will likely want a local DFS server next to each filer, which is also not very scalable.
Also, DFS-N makes things much less clear for help-desk/local support to handle (it's not just a ping and a clock-skew check anymore; there's one more layer they have no idea how it works).
I don't see the big advantage DFS-N adds. My boss is very much in favour of it (saying it's the more enterprise way, and it centralises all the dozens of CNAMEs we have).
But for me it's much easier to script around DNS only (as I can't get rid of the existing CNAMEs; they are hardcoded in a million places),
and I also don't want to bother changing the DFS-N structure every time the organisation structure changes (twice a year there's an org change somewhere in my organisation).
Sorry for the venting :-D. This subject has been keeping me busy for almost a year now in my org.