
File-based backup from CIFS Vault shares causes high latency and utilization

thomasb82

Hi guys,

 

We use SnapVault to replicate various volumes (NFS and CIFS) from a remote 8.3 cluster to our internal 8.3 cluster.

So far so good.

 

Now we would like to back up the replicated CIFS shares to tape with TSM.

Every time a backup starts on a read-only Vault share, we get warning and critical alerts regarding CIFS latency.

 

For comparison:

~ 140us on "normal" read-write CIFS shares

~ 502992us on the replicated VAULT CIFS shares
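
For scale, here is a minimal sketch (not part of the original post) that converts the two latency figures above to milliseconds and computes their ratio. The helper names `us_to_ms` and `slowdown` are illustrative only, and the result covers just these two sample values.

```python
# Latency figures quoted in the post, in microseconds.
NORMAL_SHARE_US = 140
VAULT_SHARE_US = 502_992


def us_to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds."""
    return microseconds / 1000.0


def slowdown(slow_us: float, fast_us: float) -> float:
    """Return how many times larger slow_us is than fast_us."""
    return slow_us / fast_us


if __name__ == "__main__":
    print(f"normal share: {us_to_ms(NORMAL_SHARE_US):.2f} ms")
    print(f"vault share:  {us_to_ms(VAULT_SHARE_US):.2f} ms")
    print(f"ratio:        {slowdown(VAULT_SHARE_US, NORMAL_SHARE_US):.0f}x")
```

In other words, the vault share sample is roughly half a second per operation, against well under a millisecond on the read-write share.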

 

This also happens if tools like xcopy, robocopy etc. are used to copy the files.

 

 

statistics (just 5 seconds) while doing backups

cluster.cluster: 7/1/2015 21:44:00

metric          minimum   average (15 samples)   maximum
cpu avg         23%       28%                    40%
cpu busy        41%       67%                    83%
total ops       5239      6120                   7554
nfs-ops         611       1488                   2927
cifs-ops        4273      4631                   4774
fcache ops      0         0                      0
spin-ops        4241      5111                   6514
total recv      5.56MB    7.26MB                 12.2MB
total sent      4.86MB    49.2MB                 137MB
data busy       0%        3%                     11%
data recv       5.14MB    6.99MB                 11.9MB
data sent       4.69MB    49.0MB                 136MB
cluster busy    0%        0%                     0%
cluster recv    142KB     282KB                  664KB
cluster sent    120KB     280KB                  659KB
disk read       528KB     13.1MB                 38.7MB
disk write      0B        6.94MB                 26.3MB
pkts recv       6302      9862                   15809
pkts sent       8296      41362                  106834
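
If you want to compare several such samples, a rough sketch like the following (not from the original post) turns one raw, whitespace-separated row of this output into named numeric fields. The field names are derived from the metrics in the statistics output above. The function names are made up for illustration, and treating KB/MB/GB as powers of 1024 is an assumption.

```python
# Field names derived from the metrics in the statistics output above.
COLUMNS = [
    "cpu_avg", "cpu_busy", "total_ops", "nfs_ops", "cifs_ops", "fcache_ops",
    "spin_ops", "total_recv", "total_sent", "data_busy", "data_recv",
    "data_sent", "cluster_busy", "cluster_recv", "cluster_sent",
    "disk_read", "disk_write", "pkts_recv", "pkts_sent",
]

# Assumption: size suffixes are powers of 1024. Longer suffixes come first
# so that "MB" is not mistaken for a bare "B".
UNITS = {"GB": 1024**3, "MB": 1024**2, "KB": 1024, "B": 1}


def parse_value(token: str) -> float:
    """Convert '67%', '49.2MB', '0B' or '6120' to a plain number."""
    if token.endswith("%"):
        return float(token[:-1])
    for suffix, factor in UNITS.items():
        if token.endswith(suffix):
            return float(token[: -len(suffix)]) * factor
    return float(token)


def parse_row(line: str) -> dict:
    """Map one data row onto the column names."""
    tokens = line.split()
    if len(tokens) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(tokens)}")
    return dict(zip(COLUMNS, (parse_value(t) for t in tokens)))


# The averages row from the post, in its raw form.
AVERAGES = ("28% 67% 6120 1488 4631 0 5111 7.26MB 49.2MB 3% 6.99MB 49.0MB "
            "0% 282KB 280KB 13.1MB 6.94MB 9862 41362")

if __name__ == "__main__":
    row = parse_row(AVERAGES)
    print(f"CIFS share of total ops: {row['cifs_ops'] / row['total_ops']:.0%}")
```

For the averages row, this shows that CIFS accounts for about three quarters of all operations during the sample.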

 

 

Has anyone experienced similar issues?

Is there anything I could try to resolve this? (Replacing the backup software is not an option at the moment.)

 

Many thanks!


JGPSHNTAP

So I'm on the same page as you: are you using NDMP?

 

And what is your disk setup on the backup side?

thomasb82

Unfortunately it's just basic file copying. No NDMP for now.

It does not matter whether we use TSM, robocopy, xcopy, etc., and it also does not matter whether we copy to a VM on the same cluster, a QNAP NAS, or a physical Windows server.

 

NetApp support told us that "low-end systems" like ours (2552) can be affected by the bug BURT "880471", and our system is affected.

They said it was caused by a LIF that was not on its home port. But we have this issue even when all LIFs are on their home ports.

So far we did not get a proper solution.

 

JGPSHNTAP

Well, that's an ugly way to back up...

 

OK, what's your disk setup? How many disks are in the aggregate?

 

 

thomasb82

I know it's not ideal and it's going to be changed, but not at this time.

 

We have 2 shelves, each with 20x900GB SAS and 4x200GB SSD.

Out of those we have built 2 aggregates with the same assignment.

 

I hope 8.3.1 or a future release will fix it. If this is a hardware issue, I hope we get replacements.

 

 

JGPSHNTAP
OK, I apologize for asking again, but I want to make sure I follow.

Your vaulted shares seem to be going to an aggregate with enough IOPS, based on your disk layout above. I assume you have hybrid aggregates with SAS and SSD.
Give me the exact RAID group layout of the aggregate underlying the destination shares. I just want to triple-check.

Also, did someone accidentally put QoS on?

thomasb82

RAID type = RAID-DP for SAS, RAID4 for SSD

raid gr. size = 19

raid alloc = 18

1 Flash Pool with 6 disks

 

The latency during file copies/backups on the vault CIFS shares is 300-400x higher than on the normal CIFS shares.

And the normal shares hold about 3TB (with no backup or latency issues), compared to only 50GB on the vault shares.

 

Both reside on the same aggregate, so I think this is a software-related issue.

QoS is active for all CIFS shares (100MB/s).  Nothing improves when QoS is disabled.

 
