Slow NDMP Backup to Tape After Upgrading to FAS8040 Cluster and 8.3

JRGLENNIE

Hello All-

We have been using Symantec NetBackup with our NetApp for NDMP backups to tape for quite some time.  Recently we migrated off our old 7-Mode filer to a new FAS8040 cluster running 8.3.  I have been trying to get our tape backups running again and was able to set things up in NetBackup, but now whenever I run a full backup of any of the volumes, the job writes at only 150-200 KB/s and eventually fails.  I am using the same FC switch as before and updated the configuration on the switch and library to allow communication with the new FC adapters on the NetApp.  I have two FC adapters connecting node 1 of the NetApp to the FC switch (though I recently disabled one of them to see if it was causing the problem), and I am using the cluster management LIF as my NDMP host in NetBackup, per the following guide:

https://www.veritas.com/support/en_US/article.000025335
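
On the cluster side, here is roughly what I ran to confirm that NDMP is enabled and that the node can see the tape drives over FC (command names are as best I can tell from the 8.3 CLI; NODE1 and backup_svm are the names in my environment):

::> vserver services ndmp show
::> vserver services ndmp on -vserver backup_svm
::> system node hardware tape drive show -node NODE1

That side of things looks fine here: NDMP is enabled on the SVM and node 1 sees the drives, which matches the fact that NetBackup is able to mount media on them at all.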

Does anyone have any ideas?  I've found some documentation from NetApp discussing how to troubleshoot poor backup performance, but I can't seem to set up a working "dump to null" command, which seems pretty integral to most of their troubleshooting steps.  Still, I don't think the performance issue is caused by an overloaded controller. 
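
For reference, my understanding is that in clustered ONTAP the dump-to-null test has to be run from the nodeshell, something like the following (NODE1 and the volume path are from my environment, and I may well have the 8.3 syntax wrong):

::> system node run -node NODE1
NODE1> dump 0f null /vol/volume_to_backup

If someone can confirm the correct form of that command, that alone would be a help.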

Here's some of the output I am seeing from a given backup job in the NetBackup admin console:

10/19/2016 10:14:55 - Info nbjm (pid=6560) starting backup job (jobid=53570) for client CLUSTERMGMT, policy POLICY1, schedule SCHEDULE1
10/19/2016 10:14:55 - Info nbjm (pid=6560) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=53570, request id:{3A3732BC-A6DD-4FB4-8C35-2C57512B093A})
10/19/2016 10:14:55 - requesting resource backup_svm-SCHEDULE1
10/19/2016 10:14:55 - requesting resource NB-HOST.NBU_CLIENT.MAXJOBS.CLUSTERMGMT
10/19/2016 10:14:55 - requesting resource NB-HOST.NBU_POLICY.MAXJOBS.POLICY1
10/19/2016 10:14:55 - granted resource  NB-HOST.NBU_CLIENT.MAXJOBS.CLUSTERMGMT
10/19/2016 10:14:55 - granted resource  NB-HOST.NBU_POLICY.MAXJOBS.POLICY1
10/19/2016 10:14:55 - granted resource  101323
10/19/2016 10:14:55 - granted resource  IBM.ULTRIUM-TD3.004
10/19/2016 10:14:55 - granted resource  NB-HOST-hcart3-robot-tld-3-CLUSTERMGMT
10/19/2016 10:14:56 - estimated 0 kbytes needed
10/19/2016 10:14:56 - Info nbjm (pid=6560) started backup (backupid=CLUSTERMGMT_1476886495) job for client CLUSTERMGMT, policy POLICY1, schedule SCHEDULE1 on storage unit NB-HOST-hcart3-robot-tld-3-CLUSTERMGMT
10/19/2016 10:14:56 - started process bpbrm (pid=11208)
10/19/2016 10:14:57 - Info bpbrm (pid=11208) CLUSTERMGMT is the host to backup data from
10/19/2016 10:14:57 - Info bpbrm (pid=11208) reading file list for client
10/19/2016 10:14:57 - connecting
10/19/2016 10:14:57 - Info bpbrm (pid=11208) starting ndmpagent on client
10/19/2016 10:14:57 - Info ndmpagent (pid=12152) Backup started
10/19/2016 10:14:57 - Info ndmpagent (pid=12152) PATH(s) found in file list = 1
10/19/2016 10:14:57 - Info ndmpagent (pid=12152) PATH[1 of 1]: /backup_svm/volume_to_backup
10/19/2016 10:14:57 - Info bptm (pid=10756) start
10/19/2016 10:14:57 - Info bptm (pid=10756) using 30 data buffers
10/19/2016 10:14:57 - Info bptm (pid=10756) using 65536 data buffer size
10/19/2016 10:14:57 - connected; connect time: 0:00:00
10/19/2016 10:14:58 - Info bptm (pid=10756) start backup
10/19/2016 10:14:58 - Info bptm (pid=10756) Waiting for mount of media id 101323 (copy 1) on server NB-HOST.
10/19/2016 10:14:58 - mounting 101323
10/19/2016 10:14:59 - Info ndmpagent (pid=12152) CLUSTERMGMT: Session identifier: 28042
10/19/2016 10:15:44 - Info bptm (pid=10756) media id 101323 mounted on drive index 8, drivepath /NODE1/nrst3a, drivename IBM.ULTRIUM-TD3.004, copy 1
10/19/2016 10:15:44 - Info ndmpagent (pid=12152) CLUSTERMGMT: SCSI: TAPE READ: short read for nrst3a
10/19/2016 10:15:44 - mounted 101323; mount time: 0:00:46
10/19/2016 10:15:44 - positioning 101323 to file 2
10/19/2016 10:15:47 - Info ndmpagent (pid=12152) NDMP 3Way - Data Affinity 13102a5c-7740-11e5-8b3a-f34bfadd9084 is not equal to Tape Affinity d1c11ad3-7740-11e5-b678-5fd0506b00a8
10/19/2016 10:15:47 - positioned 101323; position time: 0:00:03
10/19/2016 10:15:47 - begin writing
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: Session identifier for Mover : 28042
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: Session identifier for Backup : 30232
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Using "/backup_svm/volume_to_backup/../4hours.2016-10-19_0800" snapshot.
10/19/2016 10:15:49 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Using Full Volume Dump
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Using 4hours.2016-10-19_0800 snapshot
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Date of this level 0 dump snapshot: Wed Oct 19 08:00:00 2016.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Date of last level 0 dump: the epoch.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Dumping /backup_svm/volume_to_backup to NDMP connection
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: mapping (Pass I)[regular files]
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Reference time for next incremental dump is : Wed Feb  3 09:15:02 2016.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: mapping (Pass II)[directories]
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: estimated 84603127 KB.
10/19/2016 10:15:51 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: dumping (Pass III) [directories]
10/19/2016 10:19:18 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: dumping (Pass IV) [regular files]
10/19/2016 10:20:52 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Wed Oct 19 10:20:52 2016 : We have written 173211 KB.
...lines repeating as dump progresses
10/19/2016 16:33:13 - Info ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Wed Oct 19 16:33:13 2016 : We have written 11166693 KB.
10/19/2016 16:36:04 - Error nbjm (pid=6560) nbrb status: LTID reset media server resources
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) terminated by parent process
10/19/2016 16:36:14 - Info ndmpagent (pid=0) done
10/19/2016 16:36:14 - Info ndmpagent (pid=12152) Received ABORT request from bptm
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) NDMP backup failed, path = /backup_svm/volume_to_backup
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Write to socket failed
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) CLUSTERMGMT: DUMP: DUMP IS ABORTED
10/19/2016 16:36:14 - Warning ndmpagent (pid=12152) CLUSTERMGMT: DUMP: Total Dir to FH time spent is greater than 15 percent of phase 3 total time. Please verify the settings of backup application and the network connectivity.
10/19/2016 16:36:14 - Error ndmpagent (pid=12152) CLUSTERMGMT: DATA: Operation terminated (for /backup_svm/volume_to_backup).
10/19/2016 16:36:15 - Error ndmpagent (pid=12152) CLUSTERMGMT: BACKUP: job aborted
10/19/2016 16:36:15 - Error ndmpagent (pid=12152) CLUSTERMGMT: BACKUP: BACKUP_NET IS ABORTED
10/19/2016 16:36:15 - Info ndmpagent (pid=12152) CLUSTERMGMT: MOVER: Tape writing operation terminated
10/19/2016 16:37:45 - Info ndmpagent (pid=0) done. status: 150: termination requested by administrator
10/19/2016 16:37:45 - end writing; write time: 6:21:58
client process aborted  (50)

1 ACCEPTED SOLUTION

JRGLENNIE

Never mind, it looks like I figured it out.  The new cluster has 16/8/4 Gb capable FC adapters, while the library's are 4/2/1 Gb: the tape library was connecting at 4 Gb/s while the NetApp FC adapters were linking at 8 Gb/s.  I couldn't find a way to force the speed down to 4 Gb/s on the NetApp side (the speed option only appeared for FC adapters in target mode, and these are in initiator mode so they can talk to the library), but I was able to set the port speed on the switch to 4 Gb/s.  After doing that, NDMP jobs write at ~78,119 KB/s (roughly 268 GB per hour), the job runs to completion, and I no longer receive any of the NDMP error messages.
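
For anyone else who runs into this: the switch-side fix was simply pinning the switch ports that face the NetApp initiators to 4 Gb/s, so they link at the same speed as the library.  The exact syntax depends on your switch vendor; on a Brocade, for example, it would be something along these lines (port 5 is just an illustration):

switch:admin> portdisable 5
switch:admin> portcfgspeed 5 4
switch:admin> portenable 5
switch:admin> portshow 5

After re-enabling the port, portshow should confirm the fixed 4 Gb/s speed and the port coming back online.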
