Data Backup and Recovery

SMO hangs while restoring a backup

wintermute
6,309 Views

Hello,

I'm trying to restore a backup. After the unmount of the NFS mount u01 on volume v20_s041_003 (there is an empty u02 on the same volume, see below), the restore of the snapshot hangs. Does anybody have an idea what the problem could be, or where on the system I could get further information about the cause of the hang?

Here is the log from /var/log/sd-recover.log:

16319: BEGIN 15-08-2012 09:54:25 snapdrive snap restore -fs /AUMT/u01 -snapname smo_aumt_aumt_f_h_1_8a9a9aa03924e1b1013924e1b7280001_0 -force -noprompt

16319: BEGIN 15-08-2012 09:54:29 create snapshot:  smo_aumt_aumt_f_h_1_8a9a9aa03924e1b1013924e1b7280001_0.RESTORE_ROLLBACK_08152012_095426_16318 on n20

16319: END   15-08-2012 09:54:29 create snapshot:  smo_aumt_aumt_f_h_1_8a9a9aa03924e1b1013924e1b7280001_0.RESTORE_ROLLBACK_08152012_095426_16318 on n20,n20 successful

16319: BEGIN 15-08-2012 09:54:29 unmount file system:  /AUMT/u01

16319: END   15-08-2012 09:54:29 unmount file system:  /AUMT/u01 successful

16319: BEGIN 15-08-2012 09:54:29 restore from snapshot:  n20:/vol/v20_s041_003:smo_aumt_aumt_f_h_1_8a9a9aa03924e1b1013924e1b7280001_0

That's the output of df -k on the server; /AUMT/u01 is unmounted:

Filesystem                       size   used  avail   used%  Mounted
/vol/v20_s041_003/AUMT/u02    20G   4.5G    16G   23%   /AUMT/u02
/vol/v20_s041_004/AUMT/u00    10G   1.2G   8.8G   13%   /AUMT/u00
/vol/v20_s041_004/AUMT/u50    10G   1.2G   8.8G   13%   /AUMT/u50
/vol/v20_s041_004/AUMT/u51    10G   1.2G   8.8G   13%   /AUMT/u51
/vol/v20_s041_004/AUMT/u60    10G   1.2G   8.8G   13%   /AUMT/u60
/vol/v20_s041_005/AUMT/u99    20G   5.0G    15G   25%   /AUMT/u99

Here is the output from a ps command:

root 16313 16312   0 09:54:25 ?       0:05 /usr/sbin/snapdrive snap restore -fs /AUMT/u01 -snapname smo_aumt_aumt_f_h_1_8a
root 16318 11188   0 09:54:25 ?       0:12 snapdrived start
root 26934  6996   0   Jul 24 pts/5  40:34 /usr/jre1.6.0_25/bin/java -jar snapcreator.jar
root 11188 1   0   May 08 ?      36:24 snapdrived start
root  5103 1   0   Jul 24 pts/5   0:05 ./snapcreator --start-agent 9399 -debug -verbose
root 16312 15498   0 09:54:25 ?       0:00 /bin/sh -c "/usr/sbin/snapdrive" "snap" "restore" "-fs" "/AUMT/u01" "-snapname"

And here is the log from SMO:

--[ INFO] SMO-13036: Starting operation Backup Restore on host abc

--[ INFO] SMO-13046: Operation GUID 8a9a9aa03929460f0139294616320001 starting on Profile AUMT

--[ INFO] SMO-07431: Saving starting state of the database: AUMT(OPEN).

--[ INFO] SMO-07431: Saving starting state of the database: AUMT(OPEN).

--[ INFO] SMO-07127: Locked database for SnapManager operations - created lock file "/u00/app/oracle/product/11.2.0.1.2/dbs/.sm_lock_AUMT" on host krebs.

--[ INFO] ORACLE-20000: Changing state for database instance AUMT from OPEN to MOUNTED.

--[ INFO] ORACLE-20009: Attempting to reconnect to instance AUMT after shutdown/startup.

--[ INFO] ORACLE-20009: Attempting to reconnect to instance AUMT after shutdown/startup.

--[ INFO] ORACLE-20009: Attempting to reconnect to instance AUMT after shutdown/startup.

--[ INFO] ORACLE-20011: Reconnect to instance AUMT successful.

--[ INFO] SMO-07200: Beginning restore of database "AUMT".

--[ INFO] SD-00022: Querying for snapshot n20:/vol/v20_s041_003:smo_aumt_aumt_f_h_1_8a9a9aa03924e1b1013924e1b7280001_0.

--[ INFO] SD-00023: Finished querying for snapshot n20:/vol/v20_s041_003:smo_aumt_aumt_f_h_1_8a9a9aa03924e1b1013924e1b7280001_0.

--[ INFO] SD-00016: Discovering storage resources for /AUMT/u60.

--[ INFO] SD-00017: Finished storage discovery for /AUMT/u60.

--[ INFO] SD-00016: Discovering storage resources for /AUMT/u99.

--[ INFO] SD-00017: Finished storage discovery for /AUMT/u99.

--[ INFO] SD-00016: Discovering storage resources for /AUMT/u50.

--[ INFO] SD-00017: Finished storage discovery for /AUMT/u50.

--[ INFO] SD-00016: Discovering storage resources for /AUMT/u51.

--[ INFO] SD-00017: Finished storage discovery for /AUMT/u51.

--[ INFO] SD-00004: Beginning restore of filesystem(s) [/AUMT/u01] from snapshot smo_aumt_aumt_f_h_1_8a9a9aa03924e1b1013924e1b7280001_0.

1 ACCEPTED SOLUTION


8 Replies

aborzenkov

Do you use volume restore or file restore?

wintermute

That's what the SMO preview function shows:

Restore Plan:

  Preview:

     The following components will be restored completely via: storage side file system restore

      /AUMT/u01

When I force a file-based restore, it works fine.

himanshp

Can you please send the complete SMO log and sd-trace.log for the restore operation tried above (with SFSR)?

Also, just curious to know (it shouldn't matter, though): are the mounted paths above recursive directories created in a volume, or directories created inside a qtree?

Thanks,

Himanshu

wintermute

Hello Himanshu

I have uploaded a zip file with all the logs of a failed storage-side file system restore. I had to run the job again to ensure I had the right log files.

For reference in the logs :

Restore was started at 09:50 am

My sysadm killed the snapdrive processes at 10:05 am

The SMO GUI, df, and ps output are the same as in my first post, so I refrain from posting them again.

Thanks for your help

Eric

himanshp

Hello Eric,

From the logs you sent, it seems the mount command highlighted below is taking a long time, and that's when the operation appears to hang in SMO.

17-08-2012 09:51:50 [312d]v,9,0,ASSISTANT EXECUTION (at 8708147.968388): umount /AUMT/u01 2>&1

17-08-2012 09:51:51 [312d]v,9,0,ASSISTANT EXECUTION (at 8708148.148446): Output:

17-08-2012 09:51:51 [312d]v,9,0,ASSISTANT EXECUTION (at 8708148.484608): mkdir -p /tmp/SDU1_08172012_095151_2374

17-08-2012 09:51:51 [312d]v,9,0,ASSISTANT EXECUTION (at 8708148.496457): Output:

17-08-2012 09:51:51 [312d]v,9,0,ASSISTANT EXECUTION (at 8708148.518998): mount -F nfs n20:/vol/v20_s041_003 /tmp/SDU1_08172012_095151_2374

Can you please try these operations manually, outside SDU/SMO, and see how long it takes to mount the volume on the temporary path:

1. umount /AUMT/u01 2>&1

2. mkdir -p /tmp/SDU1_08172012_095151_2374

3. mount -F nfs n20:/vol/v20_s041_003 /tmp/SDU1_08172012_095151_2374

FYI, SDU mounts the volume for cleanup purposes before doing the restore.
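A small sketch of how one might time each step in isolation. The hostname, volume, and temp path are taken from the sd-trace log above; the umount/mount lines are commented out because they need the real environment (and use Solaris mount syntax):

```shell
#!/bin/sh
# Time each SDU step separately to find which one stalls.
step() {
    echo "== $*"
    time "$@"
}

step mkdir -p /tmp/SDU1_test_dir    # should return instantly
# The next two need the real host (Solaris syntax; on Linux: mount -t nfs):
# step umount /AUMT/u01
# step mount -F nfs n20:/vol/v20_s041_003 /tmp/SDU1_test_dir
```

If the mount step is the one that stalls, the problem is on the NFS path rather than in SMO/SDU itself.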

Thanks,

Himanshu

wintermute

Hello Himanshu,

Thank you for pointing us in the right direction!! We tried the three commands and found that the mount of n20:/vol/v20_s041_003 on /tmp/SDU1_08172012_095151_2374 is the problem: n20 is the 1 Gb interface address of our NetApp server, and that address is not known on the DB server. The mount should be made over the 10 Gb address, which is n20a.

My sysadmin added a line to /etc/hosts with n20 pointing to the IP address of n20a. With this workaround the mount works fine. It is, however, just a temporary fix, as we would prefer to have SMO/SDU know that n20a is the address to use for the mount.
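For reference, the temporary workaround amounts to an /etc/hosts entry like the following (the IP address shown is a placeholder, not the real address of n20a):

```
# map the 1 Gb hostname to the 10 Gb interface's IP (placeholder address)
192.0.2.20   n20
```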

Do you have any idea where in SMO/SDU we could specify something like that? I have browsed through snapdrive.config but couldn't tell which keyword it could be.

Thanks, Eric

himanshp

Hello Eric,

I assume that you are currently using n20a in /etc/fstab for mounting the NFS volumes.

Use the following snapdrive commands to configure the data path (n20a here):

# snapdrive config set root n20

# snapdrive config set -mgmtpath n20 n20a   (n20 will be used for management communication using the NetApp APIs, and n20a will be used for data exchange)
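Putting the two commands together as a sketch, with a verification step at the end (assuming SnapDrive for UNIX's `snapdrive config list` is available on your version; run on the database host as root):

```shell
#!/bin/sh
# Configure the management path, then verify (SnapDrive for UNIX CLI).
# n20 / n20a are the hostnames from this thread; substitute your own.
snapdrive config set root n20             # management login for filer n20
snapdrive config set -mgmtpath n20 n20a   # APIs over n20, data I/O over n20a
snapdrive config list                     # check that the mapping is in place
```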

Hope this helps

Thanks

Himanshu

wintermute

Hello Himanshu,

The commands solved our problem! Thank you very much for your help!

Regards, Eric
