NDMP Backup hanging in phase 1 (calculating inodes) [aborting operation - no mover progress]

axsys

Hello Everyone,


We've already had this issue with a FAS3020 in the past. As I understood it, the filer just couldn't count the files/inodes fast enough to create the inode map before the NetBackup job ran into a "timer expired" issue, or more precisely: "aborting operation - no mover progress". With the new system bought in 2014, a FAS3250, we hadn't had any issues until late January 2015. We upgraded from 8.1.2P3 to 8.2.3 on this particular single-head system. Now some volumes, or more precisely some subdirectories, can't be backed up anymore.

This particular volume has over 200 million files (inodes) and about 13 TB of space used. We've already split it up so we can actually back it up. Now a subdirectory with 33 million files is failing, and we are running into the same problem again: after 8 hours Symantec NetBackup has made no progress, is still in phase 1 with 0 files backed up, and cancels the job.

Previous attempts to get this problem solved via a NetApp case resulted in upgrading the head, but the volume hasn't really grown in file count since then. Has anyone else had backup issues after upgrading to 8.2.3?
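To put the numbers above in perspective, here is a rough back-of-the-envelope sketch of the mapping rate phase 1 would need to sustain to finish inside the 8-hour window. It assumes, for illustration only, that phase 1 has to visit just the 33 million files of the subdirectory; the variable names are made up for this example.

```python
# Figures quoted in the post above.
files_in_subdirectory = 33_000_000
timeout_hours = 8

# Assumption: phase 1 only needs to map the files of this subdirectory.
timeout_seconds = timeout_hours * 3600
required_rate = files_in_subdirectory / timeout_seconds

print(f"Phase 1 must map at least {required_rate:.0f} files/s "
      f"to finish within {timeout_hours} h")
```

If the filer actually has to walk more of the volume to resolve the subtree, the required rate would be correspondingly higher.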

 

Thanks!

 


axsys

In the NDMP backup log on the filer it looks like this:

 

dmp Wed Feb 11 09:16:06 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Start (Level 0, NDMP)
dmp Wed Feb 11 09:16:06 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Options (b=0, u)
dmp Wed Feb 11 09:16:06 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Filesystem (/vol/volumename/nfs/u01/axn/staticfileservice/files/PL)
dmp Wed Feb 11 09:16:06 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Granularity (subtree)
dmp Wed Feb 11 09:16:06 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Snapshot (fil-01-dr(2014870175)_volumename.57966, Wed Feb 11 09:00:04 CET 2015)
dmp Wed Feb 11 09:16:06 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Tape_open (ndmp)
dmp Wed Feb 11 09:16:09 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Phase_change (I)
dmp Wed Feb 11 17:15:58 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Error (job aborted)
dmp Wed Feb 11 17:15:58 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Tape_close (ndmp)
dmp Wed Feb 11 17:15:58 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Abort (0 MB)
dmp Wed Feb 11 17:15:58 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Error (DUMP IS ABORTED)

 

In the NetBackup job status:

02/10/2015 18:43:30 - Info bpbrm (pid=24745) start bpbkar on client
02/10/2015 18:43:30 - Info bpbkar (pid=25005) Backup started
02/10/2015 18:43:30 - Info bpbrm (pid=24745) Sending the file list to the client
02/10/2015 18:43:48 - Info ndmpagent (pid=25005) fil-01-dr: DUMP: using "/vol/volumename/../fil-01-dr(2014870175)_volumename.57905" snapshot.
02/10/2015 18:43:55 - Info ndmpagent (pid=25005) fil-01-dr: DUMP: Using subtree dump
02/10/2015 18:44:04 - Info ndmpagent (pid=25005) fil-01-dr: DUMP: Date of this level 0 dump: Tue Feb 10 17:45:04 2015.
02/10/2015 18:44:04 - Info ndmpagent (pid=25005) fil-01-dr: DUMP: Date of last level 0 dump: the epoch.
02/10/2015 18:44:04 - Info ndmpagent (pid=25005) fil-01-dr: DUMP: Dumping /vol/volumename/nfs/u01/axn/staticfileservice/files/PL to NDMP connection
02/10/2015 18:44:04 - Info ndmpagent (pid=25005) fil-01-dr: DUMP: mapping (Pass I)[regular files]
02/11/2015 02:03:43 - end writing; write time: 8:15:20
02/11/2015 02:43:47 - Error ndmpagent (pid=25005) aborting operation - no mover progress
02/11/2015 02:43:47 - Error ndmpagent (pid=25005) NDMP backup failed, path = /vol/volumename/nfs/u01/axn/staticfileservice/files/PL
02/11/2015 02:43:47 - Info ndmpagent (pid=25005) fil-01-dr: DUMP: job aborted
02/11/2015 02:43:47 - Error ndmpagent (pid=25005) fil-01-dr: DUMP: DUMP IS ABORTED
02/11/2015 02:43:54 - Error ndmpagent (pid=25005) fil-01-dr: DATA: Operation terminated (for /vol/volumename/nfs/u01/axn/staticfileservice/files/PL).
02/11/2015 02:58:33 - Error bpbrm (pid=25003) socket read failed: errno = 62 - Timer expired
02/11/2015 02:58:33 - Info bpbrm (pid=24745) sending message to media manager: STOP BACKUP fil-01-dr_1423586903

 

Don't get confused by the job times; I restarted the job several times and it always failed with the same issue.
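For anyone wanting to check how long their own dump sits in phase 1, here is a minimal sketch that parses filer backup log lines like the ones above and measures the time between the "Phase_change (I)" entry and the first "Error" entry. The regular expression and the helper name `parse` are my own assumptions based only on the lines pasted here, not an official log format description. The time zone field is ignored, which is fine as long as both timestamps are in the same zone.

```python
import re
from datetime import datetime

# Two lines copied from the filer log above.
LOG = """\
dmp Wed Feb 11 09:16:09 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Phase_change (I)
dmp Wed Feb 11 17:15:58 CET 2015 /vol/volumename/nfs/u01/axn/staticfileservice/files/PL(1) Error (job aborted)
"""

# Assumed layout: dmp <weekday> <month> <day> <time> <tz> <year> <path>(<id>) <event> (<detail>)
LINE = re.compile(
    r"^dmp \w{3} (\w{3}) +(\d+) (\d\d:\d\d:\d\d) \w+ (\d{4}) (\S+)\(\d+\) (\w+) \((.*)\)$"
)

def parse(line):
    """Return (timestamp, path, event, detail) or None if the line does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    month, day, clock, year, path, event, detail = m.groups()
    stamp = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return stamp, path, event, detail

events = [e for e in map(parse, LOG.splitlines()) if e]
phase_start = next(t for t, _, ev, d in events if ev == "Phase_change" and d == "I")
abort = next(t for t, _, ev, _ in events if ev == "Error")
elapsed = abort - phase_start

print(f"Time spent in phase I before abort: {elapsed}")
```

For the excerpt above this comes out at just under eight hours, which matches the 8-hour abort described in the first post.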

saranraj456

Did you get a solution to this?

We are also having a similar issue, but we have multiple dump snapshots which are not cleared.

We are trying to clear the snapshots and rerun the backup; I will keep you posted on that.

 

Thanks

Saran

axsys

Hi Saran,

 

Sorry for not posting the fix here earlier. The problem was more or less solved when we started to back up the top level of the volumes. The subdirectory walk just takes too long. I was advised to back up the top level, meaning "/vol/volumename"; after that, full and incremental backups worked again. Generally, if you have volumes with more than 100 million inodes and larger than 10 TB, try to split them up (that is on a FAS32xx here; I could imagine the limit being higher with a FAS8xxx). I think NetApp is pretty much at a limit here and has no "real" solution for the often requested change of media, for example onto tape drives.
My advice if you run into this problem: get in touch with NetApp (open a case) and they will run perfstat and other checks to analyse the backup.


I'm pinning my hopes on SnapMirror to tape, as they will introduce (or have already introduced) single file restore.

saranraj456

We got it fixed once we restarted the NDMP services and cleared the orphaned snapshots.

 

Thanks

Saran
