Hi All, We currently have a V3140 (7.3) backed off onto an XP12000. We are getting amazing results from the backup operation, which averages around 150 MB/s per controller (we have a cluster). The problem is that we experience very high latencies, between 50 and 250, which make the system very unresponsive. In more recent cases this has started causing services on machines to fail, and on Linux boxes it puts file systems into read-only mode. Has anyone experienced this, and is there a way of setting the priority of NDMP dumps lower than normal operations?
You might want to take a look at "FlexShare" (if I didn't get the marketing gobbledygook wrong), i.e. the "priority" command, as a good stopgap measure (the manpage is an OK place to start). It might even just solve the problem for you. There are a couple of TRs (3459 is one) on this as well. Basically, this will let you set I/O priorities per volume on a global scale. You don't need any extra licenses or anything.
A quick run through for you, depending on your setup, would be:
1. Get an overview of expected I/O priorities for all of your volumes on both cluster members. I used a quick Excel table with volume, description, level, system, cache columns.
2. Fill in the Excel table. You can then parse that for some simple command lists/scripts to implement this.
3. Enable priority globally with "priority on" on both filer heads (controllers).
4. Enable priority for all volumes simply with 'priority set volume <volname> service=on'. At this point you haven't really changed a lot, but the I/O will be a bit more "even". You might want to do this at a quieter point in the day. All of the volume level and system priorities are set to "Medium" at this point (the default).
5. If you really just want to affect the NDMP backup problem, go through and set your "system" priority per volume to "Low" on the volumes that use NDMP for backup. Then NDMP backup should get its I/O prioritized lower than normal user/volume access.
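As a rough illustration of steps 2 to 5, here is a minimal Python sketch that turns a CSV export of the planning table into a command list you could paste into each filer. The column names mirror the table described above; the volume names and priority values are made up, and the option names (service, level, system, cache) and their values should be checked against the "priority" manpage for your Data ONTAP release before you run anything.

```python
import csv
import io

# Hypothetical plan exported from the spreadsheet (one file per controller).
# Volume names and values below are invented for illustration only.
PLAN_CSV = """volume,description,level,system,cache
vol_db,database,High,Low,keep
vol_home,home directories,Medium,Low,default
vol_scratch,scratch space,Low,Medium,reuse
"""


def build_commands(plan_text):
    """Return the list of 'priority' commands for one controller."""
    # Step 3: enable priority globally on this head.
    commands = ["priority on"]
    for row in csv.DictReader(io.StringIO(plan_text)):
        vol = row["volume"].strip()
        # Step 4: switch the service on for the volume.
        commands.append(f"priority set volume {vol} service=on")
        # Step 5: apply the planned volume level, system and cache settings.
        commands.append(
            f"priority set volume {vol} level={row['level'].strip()} "
            f"system={row['system'].strip()} cache={row['cache'].strip()}"
        )
    return commands


if __name__ == "__main__":
    for cmd in build_commands(PLAN_CSV):
        print(cmd)
```

The script only prints commands and never touches the filer, so you can review the list by eye first and then run it on each head in a quiet window.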
The rest is just a matter of knowing which volumes need higher/lower access. Remember, the system priority is still relative to the volume priority, so a volume with volume=High and system=Low will still get its I/O prioritized above a volume with volume=Medium and system=Low. Priority is still smart enough to avoid I/O starvation of volumes (as far as I've seen) with lower priorities.
You should be able to plan and implement this in a few hours. The support case will definitely take infinitely longer, unfortunately.
Your backups may well take a lot longer now. You will probably need to track how things are going and look for errors. I've implemented this on some pretty heavily loaded filers in production without noticeable problems. Read the TR a few times until you get the hang of it. The FlashCache TR (3832) will also tell you how to get better performance out of PAM cards (if you have any) by toggling the "cache" setting.
Have you opened a support case with NetApp Global Services? We have experts who can help you collect data to pinpoint exactly what the issue is and suggest corrective actions. As performance issues are difficult to assess accurately without detailed information, I think a support case is the most efficient way to get your problem solved.
Daniel Isaacs Technical Marketing Engineer - VBU V-Series