This past weekend was spent examining why the throughput of NDMP/SMTAPE operations varied so much.
I do not have an answer yet, just more mystery.
The configuration is three LTO5 tape drives connected over a 4Gb SAN fabric to a FAS3240 and a Dell R815 (quad-socket, 12-core, 256GB memory), with the NetApp able to perform NDMP/SMTAPE operations directly to tape.
What has been found is:
- Using NetBackup V7 with NDMP/SMTAPE, the backup operation can go directly from the filer to tape without involving the R815 server.
- If the source volume (and the dozens of snapshots within it) is a CIFS share with over 8 million files, it can be written to an LTO5 tape drive at upwards of 113MB/sec.
- If the source volume (and the dozens of snapshots within it) contains LUNs used by our Exchange 2010 servers, writing to an LTO5 tape drive does not get above 10MB/sec. I have seen it as low as 3MB/sec.
- Both volumes can be on the same or different controllers. Does not make a difference.
- Both volumes can be in the same aggregate. Does not make a difference.
- The volumes can reside on 7.2K SATA or 15K SAS. Does not make a difference.
- The volume sizes have ranged from hundreds of GB through 7TB. Does not make a difference.
- The same ratio holds across multiple volumes of CIFS and Exchange data.
- The NDMP/SMTAPE operation can be initiated from NetBackup or from the filer command line using 'smtape backup ...'. Does not make a difference.
- Under the covers, NDMP appears to use SnapMirror functionality to transport the data to tape.
- There is no 'throttle' option for NDMP/SMTAPE operations; an error message is displayed stating as much. I had been thinking the limit was due to the 'options replication.*' values I had set.
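To put the observed rates in perspective, here is a rough back-of-the-envelope calculation of time-to-tape for a volume at the largest size mentioned above (7TB). This is plain shell arithmetic; the volume size and the 113/10/3 MB/sec rates are the figures from the measurements above, and the volume and tape device names in the comment are hypothetical.

```shell
#!/bin/sh
# The filer-side invocation looks roughly like (names hypothetical):
#   smtape backup /vol/exch_vol rst0a
#
# Rough time-to-tape estimates for a 7TB volume at the observed rates.
vol_mb=$((7 * 1024 * 1024))     # 7TB expressed in MB

for rate in 113 10 3; do        # MB/sec: CIFS fast case, LUN slow cases
    secs=$((vol_mb / rate))
    hours=$((secs / 3600))
    echo "${rate}MB/sec -> ~${hours} hours to tape"
done
```

At 113MB/sec the 7TB volume fits in roughly a day; at the LUN-volume rates the same backup stretches to more than a week, which is why the gap matters so much in practice.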
From what I can tell, the NDMP/SMTAPE operation takes a new snapshot to get a static version of the data, which is then analyzed and sent to tape.
The controllers are not 'beat' (busy); nothing glaring showed up on the filer, at least nothing I could think to examine.
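For reference, these are the sort of filer-console checks one would run during a slow dump to see whether the controller itself is the bottleneck (a sketch, not a prescription; run on the FAS3240 console, not a regular shell):

```
sysstat -x 1      # per-second CPU, disk util, network, and tape KB/s
statit -b         # begin a detailed per-disk statistics window
statit -e         # end the window and print the results
```

During the slow LUN-volume transfer, the tape column should hover near the observed 10MB/sec while the CPU and disk columns show whether the filer has headroom.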
I have opened a support case and sent them this information along with perfstat output to try to solve this puzzle.
The question is: why would there be such a drastic difference in throughput depending on whether the volume holds CIFS data versus LUNs?
I have no current answer and am wondering if others have found the same thing and maybe the answer/solution to getting great throughput all the time?!
Thanks.
pdc