
NDMP backup failure for large file system after 16hrs

Hello Friends,

 

The NDMP backup of a 17 TB file system is failing with NetBackup status code 99 about 16 hours after the backup starts. None of the NDMP phases shows any progress.

 

 

Filer Backup logs:

 

dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2/(7) Error (job aborted)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Tape_close (ndmp)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Abort (0 MB)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Log_msg (reg inodes: 3232764 other inodes: 22265 dirs: 165388 nt dirs: 0 nt inodes: 0 acls: 0)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Log_msg (Phase 1 time: 0)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Log_msg (Phase 3: directories dumped: 0)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Log_msg (Phase 3: wafl directory blocks read: 0)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Log_msg (Phase 3 throughput (MB sec): read 0 write 0)
dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Log_msg (Percent of phase3 time spent for: reading inos 0% dumping ino 0%)
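For anyone comparing several failed runs, the filer log lines above can be broken into fields with a small script. This is only a minimal sketch: the field names (`timestamp`, `path`, `dump_id`, `event`, `info`) are my own labels, and the pattern is inferred from the lines pasted here, not taken from any NetApp specification.

```python
import re

# Pattern inferred from the log lines in this post (an assumption, not an official format):
#   dmp <timestamp> <path>(<id>) <Event> (<optional info>)
DUMP_LINE = re.compile(
    r"^dmp (\w{3} \w{3} +\d+ [\d:]{8} \w+ \d{4}) "  # e.g. Sun Jan 22 02:00:43 CST 2017
    r"(\S+)\((\d+)\) "                              # path followed by an id in parentheses
    r"(\w+)"                                        # event name, e.g. Log_msg, Abort
    r"(?: \((.*)\))?$"                              # optional info in parentheses
)

def parse_dump_line(line):
    """Return a dict of fields for one 'dmp' log line, or None if it does not match."""
    match = DUMP_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, path, dump_id, event, info = match.groups()
    return {
        "timestamp": timestamp,
        "path": path,
        "dump_id": int(dump_id),
        "event": event,
        "info": info,
    }

if __name__ == "__main__":
    sample = "dmp Sun Jan 22 02:00:43 CST 2017 /vol/pool2_1/(7) Abort (0 MB)"
    print(parse_dump_line(sample))
```

Filtering the parsed records on `event` (for example `Error` or `Abort`) makes it easier to line up the filer's view with the NetBackup job details.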

 

Netbackup logs:

 

01/08/2017 10:45:41 - Error ndmpagent (pid=51001) aborting operation - no mover progress

01/08/2017 10:45:43 - Error ndmpagent (pid=51001) NDMP backup failed, path = /vol/pool2_1/

01/08/2017 10:45:43 - Info ndmpagent (pid=51001)  tapprd01-b: DUMP: job aborted

01/08/2017 10:45:43 - Info ndmpagent (pid=51001)  tapprd01-b: DUMP: DUMP IS ABORTED

 

I tried a dump to null for this volume and it works fine (command used: "dump -0f null /vol/pool2_1").

 

I'm not sure how to trace the cause of this issue.

 

Thanks
Saran

 

Re: NDMP backup failure for large file system after 16hrs

Unfortunately this doesn't tell us anything except the dump was aborted.

I recommend enabling NDMP debug logging before the backup starts, following this KB:

https://kb.netapp.com/support/s/article/how-to-enable-ndmp-debugging-in-data-ontap

As soon as the backup fails, collect the NDMP and backup logs, and note which volume was being backed up and the start and failure times.

Then open a case with NetApp and attach the collected data.

Re: NDMP backup failure for large file system after 16hrs

  1. Error 99 is a generic error, used for NDMP backup to disk or tape.

    You may be out of resources because your filer reached the maximum number of threads.
    Because you're backing up to tape, you might want to try the steps in this article:

    https://kb.netapp.com/support/s/article/ka11A00000015mnQAA/troubleshooting-workflow-ndmp-backup-fails-to-start?language=en_US

  2. There are NDMP session limitations for either disk or tape backups.  Take a look at this article:

    https://kb.netapp.com/support/s/article/ka21A0000000j8DQAQ/how-do-ndmp-session-limitations-work-with-data-ontap-8-2-and-later?language=en_US

    Dump and Restore sessions: These are NDMP sessions directly responsible for backing up data to and from disk or tape. They interface with the NDMP data or tape service.

    Control or 'common' sessions: These are bi-directional NDMP control connections responsible for NDMP control commands or messages between the NDMP client (typically a backup application) and the NDMP server (typically a NetApp Storage Controller).

    [Attached image: NDMPsessions.PNG]

 

Re: NDMP backup failure for large file system after 16hrs


We have decided to split the backup by subfolder across multiple policies. Hopefully that will work, but we still don't understand the cause of this issue.
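One way to decide which subfolders go into which policy is to balance them by size. The sketch below is only an illustration of that idea using a greedy largest-first assignment. The folder paths and sizes are hypothetical, and the function name `split_into_policies` is made up for this example; it is not part of NetBackup or ONTAP.

```python
import heapq

def split_into_policies(folder_sizes, n_policies):
    """Assign folders to n_policies groups, largest first, always to the least-loaded group."""
    # Heap of (total size assigned so far, group index).
    heap = [(0, i) for i in range(n_policies)]
    heapq.heapify(heap)
    groups = [[] for _ in range(n_policies)]
    # Sort by size descending, then by path so the result is deterministic.
    for path, size in sorted(folder_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        groups[idx].append(path)
        heapq.heappush(heap, (total + size, idx))
    return groups

if __name__ == "__main__":
    # Hypothetical subfolders and sizes in TB, for illustration only.
    sizes = {
        "/vol/pool2_1/a": 8,
        "/vol/pool2_1/b": 5,
        "/vol/pool2_1/c": 3,
        "/vol/pool2_1/d": 1,
    }
    for i, group in enumerate(split_into_policies(sizes, 2), start=1):
        print(f"policy {i}: {group}")
```

Each resulting group would then become the backup selection list of one policy, so that no single stream has to cover the whole 17 TB.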