SnapProtect, cDot, and NDMP Backups

MURRAYCH1

Hey all,

 

Wondering if anyone else is doing NDMP backups to tape from cDot via SnapProtect?  I have it working, but the speed is atrocious (~3GB/hr).  This feels specific to cDot NAS volumes: I can back up my 7-mode systems, and SAN volumes from my cDot systems, at normal speed.

 

Running against a FAS8040 on ONTAP 8.2.2P1, with SnapProtect 10 SP10, backing up NAS data (CIFS volumes).

 

The test backups I'm running are on a couple of volumes totalling maybe 3TB.  Yes, there are "lots of files", but it's not an extreme situation (mostly user home drives).

 

Not really sure where to go from here to troubleshoot.

dankirkwood

Hey Murray

 

Are you running your backups via a MediaAgent server or are your tapes direct-connected (or SAN-connected) to the cDOT nodes (aka local, LAN-free or 3-way NDMP backup)?

 

I'm doing the former (running NDMP backups via a MediaAgent) in the environments I'm working on, and getting in excess of 400GB per hour to tape off SATA disks on a FAS8040 pair running cDOT 8.2.2. The tape drives are LTO6 and are FC-connected to a physical MediaAgent server. I've got 10Gb Ethernet between the MediaAgent and the cDOT cluster.

 

If you're not doing LAN-free NDMP  backups, you might find that the NDMP data connection is not going over the interface that you expect it to. Try doing a "netstat" on your MediaAgent when the backups are running to see if the data is coming from the IP address you're expecting. If not, you can try editing the "Host name" of the client in SnapProtect: open the client properties, then click Edit next to hostname, and set it to a DNS name or IP address of a LIF (it can be a data LIF or an intercluster LIF) that has good connectivity to the MediaAgent (i.e. not something hosted on e0M :))
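For anyone who wants to script the netstat check above, here is a minimal Python sketch. It assumes you paste in the text of `netstat -n` from the MediaAgent and supply your own table of LIF addresses; the function name, the IP addresses and the labels are all made up for illustration, and the parsing only handles plain IPv4 `address:port` columns.

```python
from collections import Counter

def peers_by_lif(netstat_output, known_lifs):
    """Count ESTABLISHED TCP connections per remote peer.

    netstat_output: text captured from `netstat -n` (Windows or Linux layout).
    known_lifs: hypothetical mapping of LIF IP address -> human-readable label.
    Peers not in known_lifs are reported under their raw IP address.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        tokens = line.split()
        if "ESTABLISHED" not in tokens:
            continue
        # The first two tokens containing ':' are the local and foreign address.
        addrs = [t for t in tokens if ":" in t]
        if len(addrs) < 2:
            continue
        remote_ip = addrs[1].rsplit(":", 1)[0]
        counts[known_lifs.get(remote_ip, remote_ip)] += 1
    return dict(counts)

# Illustrative input only: these addresses are invented, not from a real system.
SAMPLE = """\
  TCP    10.0.0.5:51234     10.0.0.20:10000     ESTABLISHED
  TCP    10.0.0.5:51301     10.0.0.20:18600     ESTABLISHED
  TCP    10.0.0.5:51400     10.0.0.77:443       ESTABLISHED
  TCP    10.0.0.5:445       10.0.0.99:50000     TIME_WAIT
"""

LIFS = {"10.0.0.20": "node-mgmt (e0M)", "10.0.1.20": "intercluster"}

if __name__ == "__main__":
    print(peers_by_lif(SAMPLE, LIFS))
```

If most of the connections during a backup show up under a management LIF rather than a data or intercluster LIF, that is a hint the NDMP traffic is taking the slow path.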

 

MURRAYCH1

It is going through a media agent.  This is a Cisco UCS environment, so the Media Agent is a blade in there. The library is a Quantum i80 (FC) connected to the upstream FC switches (the same ones the UCS FIs are uplinked to), and the NetApp systems uplink to those same switches (they are Nexus switches, so there is a combination of FC and 10Gb Ethernet from the NetApp to the Nexus).

 

So yes, it is "three way".  The weird part is that the three-way thing seems to work fine for a 7-mode filer, and for SAN volumes off this particular cDot 8040.

 

The tape drives here are "just" LTO-5, but as other backups still end up giving me over 400GB/Hr it's not that.  Nor is it the disks being busy as I have now watched this particular backup over the last (literally) 141 hours to see if it was some NDMP thing.  Looking in the logs the Pass1/2/3 stuff does take a while but it's not the case where "once it hits Pass IV it speeds up to normal".  It's weird.

 

I will check the netstat thing though.  

MURRAYCH1

Well, **bleep**.  The media agent is talking to the 8040 over the cluster management port, which is in fact coming off an e0M port.  

 

I have this in Vserver-scoped mode (i.e. NOT node-scoped mode), and I don't know if that's actually a good thing.

 

So I guess I will try to figure out how to get this to do NDMP over one of the data ports 🙂  I'm not 100% convinced this is the total solution but it's the best thing I have currently.

MURRAYCH1

Ok, thanks for this.  Definitely the issue.  This is a 4-node cluster with two 8040 and two 3220 nodes; the data in question lives on one of the 3220s, and it was pulling off its e0M port (which on a 3220 is a 100Mbit connection).
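To put numbers on why a 100Mbit port can never deliver tape-speed backups, here is a quick back-of-the-envelope sketch in Python. It uses decimal units (1 GB = 10^9 bytes) and ignores protocol overhead, so these are theoretical ceilings rather than expected throughput; the function names are just for illustration.

```python
def link_ceiling_gb_per_hour(link_mbit_per_s):
    """Theoretical maximum GB/hr for a link speed given in Mbit/s (no overhead)."""
    bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return bytes_per_s * 3600 / 1_000_000_000

def required_mbit_per_s(gb_per_hour):
    """Sustained Mbit/s needed to move the given GB/hr."""
    return gb_per_hour * 1_000_000_000 * 8 / 3600 / 1_000_000

if __name__ == "__main__":
    print(link_ceiling_gb_per_hour(100))     # 100Mbit e0M ceiling
    print(link_ceiling_gb_per_hour(10_000))  # 10GbE ceiling
    print(required_mbit_per_s(400))          # what 400GB/hr needs
```

A 100Mbit link tops out at 45GB/hr even in the ideal case, while 400GB/hr needs close to 900Mbit/s sustained, so the e0M path was always going to be the bottleneck no matter how fast the disks or tape drives are.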

 

I went into the cluster and set the -preferred-interface-role for the admin vserver to just "intercluster" and assumed that would fix it (since I have intercluster LIFs on each node), but now SnapProtect complains that the client (in this case my admin vserver) cannot connect to the media server to start a backup.  It only worked when I re-allowed "node-mgmt" in the preferred interface role.

 

Now I'm not entirely sure how to proceed.  Is it preferred to have this in node-scoped mode?

MURRAYCH1

A HA.  And the problem is now solved.

 

Turns out I had broken routing groups for my intercluster LIFs (which are still too **bleep** complicated to set up).  I had everything set to prefer those LIFs but they couldn't actually route.  Added route entries so I could ping the LIFs (which were in place but unused) and voila, now my NDMP is going over the correct interface and a few mins into my backup it's about to break 400GB/hr.

dankirkwood

Excellent! I might have led you astray with that "change the hostname" thing in SnapProtect - I think that is for setting up the control connection to the filer, and then the filer initiates a connection back to the DMA (SnapProtect) according to that "-preferred-interface-role" setting you mentioned.

 

Here's the relevant part of the cdot commands doco (for the benefit of anyone else who comes here with the same issue) : 

 

Command: vserver services ndmp modify [-preferred-interface-role {cluster|data|node-mgmt|intercluster|cluster-mgmt}, ...]

 

Preferred Interface Role

 

This option allows the user to specify the preferred Logical Interface (LIF) role while establishing an NDMP data connection channel. The NDMP data server or the NDMP mover establishes a data channel from the node that owns the volume or the tape device respectively. This option is used on the node that owns the volume or the tape device. The order of IP addresses that are used to establish the data connection depends on the order of LIF roles specified in this option.

 

The default value for this option for the admin Vserver is intercluster, cluster-mgmt, node-mgmt

The default value for this option for a data Vserver is intercluster, data.
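As a rough mental model of the doc text above (not ONTAP's actual code, which I haven't seen), the role list behaves like an ordered preference: walk the roles in order and take the first LIF of that role that can actually reach the backup server. The Python sketch below is purely illustrative; the `Lif` class, the `routable` flag, the LIF names and the fall-through behaviour are assumptions that happen to match what was observed in this thread.

```python
from dataclasses import dataclass

@dataclass
class Lif:
    name: str       # hypothetical LIF name
    role: str       # e.g. "intercluster", "node-mgmt", "data"
    routable: bool  # assumption: whether the LIF can reach the MediaAgent

def pick_data_lif(preferred_roles, lifs):
    """Return the first usable LIF following the role preference order, else None."""
    for role in preferred_roles:
        for lif in lifs:
            if lif.role == role and lif.routable:
                return lif
    return None

if __name__ == "__main__":
    admin_default = ["intercluster", "cluster-mgmt", "node-mgmt"]
    lifs = [
        Lif("ic1", "intercluster", routable=False),  # broken routing group
        Lif("mgmt1", "node-mgmt", routable=True),    # e0M, slow but reachable
    ]
    print(pick_data_lif(admin_default, lifs))     # falls through to mgmt1
    print(pick_data_lif(["intercluster"], lifs))  # nothing usable: None
    lifs[0].routable = True                       # routes fixed
    print(pick_data_lif(admin_default, lifs))     # now ic1
```

In this model the three cases line up with the thread: the default list ends up on the slow management LIF, restricting the list to intercluster leaves nothing usable, and fixing the routes puts the data connection on the intercluster LIF.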

 

^^ reading this above (my bold) I think SnapProtect must have been connecting to the cluster management LIF, otherwise by default, ONTAP wouldn't have tried to use the node management LIF. If you change that hostname setting to be a management LIF on your data vserver, it would probably only have been able to use the intercluster or data port.

 

Anyway I'm glad you fixed it. I'm still getting my head around the CAB extensions myself, and SnapProtect is never quite as straightforward as I'd like 🙂 10GbE is way faster than 100Mb, right? 😉

 

MURRAYCH1

Exactly, I was connecting to the Admin vServer as my client (because at the time that seemed to make sense), and I also couldn't get it to work using any of the vServers directly (it kept complaining they couldn't connect to the Media Server).  Eventually it became clear WHY... I did have the preferred interface role for the vservers set to intercluster,data; there were intercluster LIFs (with broken routing) and no data LIFs at all.

 

Add routing and voila, now connecting to the vServers for backups works as expected.