
Failover of CIFS - Issues with DFS


Perhaps those of you who use DFS can help.

 

I have a fairly typical setup: a cluster at my prod site and a cluster at my DR site, with SnapMirror protecting my CIFS data. My hardware is AFF8080s running CDOT 9.

 

I have a DNS A record for my production CIFS SVM. Let's call it

 

uscifsProd1.companyname.net

 

and a DNS A record for my DR SVM. Let's call it

 

uscifsDR1.companyname.net

 

I also have a CNAME record pointing to the production CIFS SVM:

 

usfs1.companyname.net > uscifsProd1.companyname.net

 

I use this name in my DFS namespace, so a typical target looks like:

 

\\usfs1.companyname.net\Project1\NYC

 

During a failover, I will "float" the CNAME record over to my DR CIFS SVM:

 

usfs1.companyname.net > uscifsDR1.companyname.net

 

 

 

I also change the Service Principal Names for the CIFS service:

 

Old:

 

setspn.exe -D HOST/usfs1.companyname.net USCIFSProd1

 

setspn.exe -D HOST/usfs1 USCIFSProd1

 

New:

 

setspn.exe -A HOST/usfs1.companyname.net USCIFSDR1

 

setspn.exe -A HOST/usfs1 USCIFSDR1

 

 

I then force a replication in AD. Once the DNS change propagates, clients should be able to access the CIFS shares at the DR site.
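The SPN move above can be sketched as a small Python helper. It only builds the setspn command lines for review and does not execute anything. The function name and its parameters are hypothetical; the -D and -A switches, the HOST/ service class, and the account names are the ones shown above.

```python
def spn_failover_commands(alias_fqdn, old_account, new_account):
    """Build setspn command lines that move the alias SPNs between accounts.

    Returns a list of strings; nothing is executed here.
    """
    # The short (NetBIOS-style) name is the first label of the FQDN.
    short_name = alias_fqdn.split(".", 1)[0]
    spns = [f"HOST/{alias_fqdn}", f"HOST/{short_name}"]

    # Remove the SPNs from the old account before adding them to the new
    # one, so the same SPN is never registered on two accounts at once.
    removes = [f"setspn.exe -D {spn} {old_account}" for spn in spns]
    adds = [f"setspn.exe -A {spn} {new_account}" for spn in spns]
    return removes + adds


if __name__ == "__main__":
    for line in spn_failover_commands(
        "usfs1.companyname.net", "USCIFSProd1", "USCIFSDR1"
    ):
        print(line)
```

Swapping the last two arguments gives the commands for failing back.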

 

Key facts:

 

The CNAME record is updated and responds correctly to pings.

 

The workstation can access the share if I browse directly via

 

\\usfs1.companyname.net\Project1\NYC

or

 

\\uscifsDR1.companyname.net\Project1\NYC

However, when I browse to the network locations via the drive letter assigned to the namespace, i.e.

 

N:\Project1\NYC

 

or the UNC which uses the namespace, i.e.

 

\\companyname.net\NDrive\ProjectVol\Project1\NYC

 

I receive an error: The network path cannot be found.

 

A Wireshark trace reveals a Kerberos mismatch, so this is not a network issue; Kerberos authentication is failing.

 

I have tried using klist to purge every ticket I can think of, including those of the Network Service and Local System accounts. I have also purged the DFS caches using DfsUtil. None of this helped.
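For anyone trying to reproduce this, the purges described above would typically look like the commands listed in the sketch below. This is an assumption about the usual syntax, not a record of exactly what was run: the logon IDs are the well-known session IDs for Local System and Network Service, and the dfsutil lines use the newer "cache ... flush" form. The helper name is hypothetical and it only returns the command strings.

```python
# Well-known Windows logon session IDs for the built-in service accounts.
SYSTEM_LOGON_IDS = {
    "LocalSystem": "0x3e7",
    "NetworkService": "0x3e4",
}


def purge_commands():
    """Return the ticket and DFS cache purge commands as strings (not executed)."""
    # Purge tickets for the interactive user's own logon session.
    cmds = ["klist purge"]
    # Purge tickets held by the service-account logon sessions
    # (requires an elevated prompt on the client).
    for logon_id in SYSTEM_LOGON_IDS.values():
        cmds.append(f"klist -li {logon_id} purge")
    # Flush the client-side DFS referral, domain, and provider caches.
    cmds += [
        "dfsutil cache referral flush",
        "dfsutil cache domain flush",
        "dfsutil cache provider flush",
    ]
    return cmds


if __name__ == "__main__":
    print("\n".join(purge_commands()))
```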

 

I do know that the client gets its DFS info through the Workstation service, and restarting that service (or rebooting the client) clears the issue.

 

So, two questions:

 

1) Is there a way to remedy the issue without rebooting the clients (~1500) or restarting the Workstation service?

 

2) If not, is there another/better way to engineer the failover? I am NOT willing to move my CIFS service to a Windows environment as many have suggested, for many reasons.

 

I've also considered modifying the links in the namespace directly via a script, but that is not my preference: I'd much rather change one CNAME record than 2000 DFS target links.
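The core of such a script would be rewriting the server part of each link target. A minimal Python sketch of that step follows, assuming the targets are available as plain UNC strings; the function name is hypothetical, and reading or writing the actual DFS namespace is deliberately left out.

```python
def retarget_unc(path, old_host, new_host):
    """Return path with its server component swapped from old_host to new_host.

    Paths whose server is not old_host are returned unchanged.
    """
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    # Split "\\server\share\dir" into the server name and everything after it.
    server, sep, rest = path[2:].partition("\\")
    # Host names are case-insensitive, so compare in lower case.
    if server.lower() != old_host.lower():
        return path
    return "\\\\" + new_host + sep + rest


if __name__ == "__main__":
    print(
        retarget_unc(
            r"\\uscifsProd1.companyname.net\Project1\NYC",
            "uscifsProd1.companyname.net",
            "uscifsDR1.companyname.net",
        )
    )
```

Applied to every target in the namespace, this would point the links straight at the DR SVM and take the CNAME out of the path, at the cost of touching every link on each failover and failback.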

 

Thanks.