We have a FAS2040 (Data ONTAP 8.0.2P2) at a DR site, to which we SnapMirror from our live site.
We have a volume for archiving general files from live, and another volume for archiving old virtual machines.
The DR filer is fibre-attached to a Quantum Scalar i80 with LTO-5 drives.
Also fibre-attached is a NetBackup 7.5 server.
We're seeing a consistent ceiling of approx. 65 MB/s using direct NDMP. I see people on here bemoaning 120 MB/s limitations; if we could get anywhere near that I would be very happy.
So today I am doing an NDMP backup of the old virtual machines: it's a very low file count, approx. 900 GB of data, and it's still only getting 65 MB/s.
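To put the gap in perspective, a back-of-the-envelope calculation (assuming 1 GB = 1024 MB and a sustained, uninterrupted streaming rate, which real backups rarely achieve) shows what those two figures mean for this 900 GB job:

```python
# Rough backup-window arithmetic for the 900 GB job described above.
# Assumption: sustained throughput with no ramp-up or tape repositioning.
DATA_GB = 900
DATA_MB = DATA_GB * 1024

def backup_hours(mb_per_sec: float) -> float:
    """Hours needed to stream DATA_MB at a constant rate."""
    return DATA_MB / mb_per_sec / 3600

print(f"at  65 MB/s: {backup_hours(65):.1f} h")   # roughly 3.9 hours
print(f"at 120 MB/s: {backup_hours(120):.1f} h")  # roughly 2.1 hours
```

So reaching the oft-quoted 120 MB/s would nearly halve the backup window.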
The underlying aggregate is only 13 x 7,200 RPM SATA disks. This is a capacity system, not a performance one. However, there's not a lot going on with this filer apart from receiving SnapMirror updates from live, so I'm surprised if the disks can't push harder than that.
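For a rough sense of what those spindles could do on a purely sequential workload, here is a sketch using illustrative assumptions (not measurements from this system): RAID-DP with two parity disks, and a sustained sequential rate of around 80 MB/s per 7,200 RPM SATA drive:

```python
# Illustrative sequential-read ceiling for a 13-disk SATA aggregate.
# Assumptions (illustrative, not measured): RAID-DP (2 parity disks),
# ~80 MB/s sustained sequential read per 7,200 RPM SATA spindle.
TOTAL_DISKS = 13
PARITY_DISKS = 2          # RAID-DP assumption
PER_DISK_MB_S = 80        # assumed sustained sequential rate

data_disks = TOTAL_DISKS - PARITY_DISKS
sequential_ceiling = data_disks * PER_DISK_MB_S
print(f"{data_disks} data disks x {PER_DISK_MB_S} MB/s ~ {sequential_ceiling} MB/s")
# Well above the observed 65 MB/s, so a purely sequential read
# shouldn't be the limit -- something else is in the way.
```

Even with pessimistic per-disk figures, a streaming read across that many spindles should comfortably beat 65 MB/s, which is why the questions below about fragmentation and competing workloads matter.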
Are there any methods to diagnose where the bottleneck is?
Am I expecting too much from these disks?
Are there any tuning parameters I can change?
Lastly, in case I have a fit of bookishness, is there a good tech doc covering NDMP performance that I might reasonably understand? (I have a lot of other systems to look after too, and am not a NetApp expert!)
Can you run sysstat -x 1 on the filer for a few minutes while a backup is in progress and post the output here? We can then see whether you are hitting any bottlenecks.
There are a couple of different ways NDMP can be configured: the filer can stream the backup data to the backup server, which then forwards it to tape, or the backup server can direct the filer to send the backup data directly to tape. Can you confirm which of the two modes you have set up? This should be a configuration option in NetBackup somewhere.
Another experiment worth trying would be the dump command (see the command reference manual for usage and examples). The backup speed you get with dump might help isolate the problem to the filer itself or to the network.
Your two readouts seem very different, as if the filer were doing different things.
The first paste shows lots of data coming in over the network and being written to disk, plus high CPU utilisation, which as abzorkenov says looks like an incoming SnapMirror update. What are your SnapMirror schedules? You'd normally want to be very sure the SnapMirror updates are complete before you start your tape dump.
The second posting, the screenshot, shows minimal data coming in over the network and being written to disk, but, as you said, correlated disk-read and tape-write numbers.
It's notable in both cases that the disk utilisation is typically in the 90s, often 100%. Note that this is the utilisation of the busiest disk, not of all the disks. This may be a hint that the disks are being maxed out, which in turn may hint that your data is heavily fragmented and there's a lot of seeking going on.
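To see why fragmentation could pin the backup near the observed figure, here's a hedged seek-limited estimate. The per-disk IOPS and chunk size below are assumptions picked as typical for 7,200 RPM SATA, not measurements from this system:

```python
# Seek-limited throughput estimate for a heavily fragmented read workload.
# Assumptions (typical figures, not measured): ~75 random IOPS per
# 7,200 RPM SATA disk, ~64 KB read per seek, and 11 data disks
# (13 total minus 2 RAID-DP parity).
DATA_DISKS = 11
IOPS_PER_DISK = 75
CHUNK_KB = 64

seek_limited_mb_s = DATA_DISKS * IOPS_PER_DISK * CHUNK_KB / 1024
print(f"~{seek_limited_mb_s:.0f} MB/s when every read costs a seek")
# Lands in the same ballpark as the observed 65 MB/s ceiling, which
# is consistent with fragmentation/seeking being the bottleneck.
```

If the real numbers sit anywhere near these, a fragmented volume would cap the dump at roughly the speed being seen, regardless of how fast the tape drive is.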
A couple of further questions: what's the output of sysconfig -r? (It shows your RAID config.) Did you gradually add disks to the aggregate to grow it, or did you add them all at once? How full are the aggregate and the volumes within it? (df -A and df -r will show this.) Are you using volume or qtree SnapMirror? If volume SnapMirror, how many snapshots are being retained on the source volume? Are you doing full NDMP backups or differential backups?
If you're doing very regular SnapMirror updates to an aggregate/volume that is nearly full, there could be a lot of fragmentation going on, and if the aggregate was grown gradually over time that would exacerbate the problem.
Dump to null is a good test of the maximum throughput the filer can dump while bypassing tape (although, as mentioned, tape isn't the bottleneck here): "dump 0f null /vol/volname".
With a SnapMirror target there is WAFL deswizzling, and that can really affect backups through contention for I/O. If you run "priv set advanced; wafl scan status", do you see deswizzling operations? It would be good to schedule backups offset from SnapMirror updates to work around deswizzling. A perfstat captured during a backup, together with a performance case, would also help identify this and troubleshoot any other bottlenecks.
For a FAS2040 with SATA as a SnapMirror target this doesn't sound like unexpected performance, but some tuning and the workarounds listed above may help speed it up.
Sorry to hear that Andy, best of luck sorting that out.
(Ever thought about getting rid of the tape side of things altogether? You could hook up a rake of cheap SATA drives to your 2040 and use SnapVault to retain the archives. Deduplication is the big thing that makes this possible.)