Solved: Snap Creator fails to delete SnapShots because of SnapMirror

danmalcor · 2011-11-08

We have not found a pattern yet. There are 12 different DBs being backed up, and every other night or so one or two of them will fail (with a "Critical" error) because a snapshot delete fails with the reason "snapmirror":

# cat -n PDB1d.out.20111102190002

...

1698 ########## Running NetApp Snapshot Delete on Primary anpdfil2 ##########

1699 [Wed Nov 2 19:00:42 2011] WARN: More than 14 NetApp snapshots exist, older snapshots of anpdfil2:OraPdb1Data will be automatically deleted!

1700 [Wed Nov 2 19:00:42 2011] WARN: Deleting NetApp Snapshot PDB1-daily_20111015190000 on anpdfil2:OraPdb1Data

1701 [Wed Nov 2 19:00:42 2011] DEBUG: ZAPI REQUEST

1702 <snapshot-delete>

1703 <volume>OraPdb1Data</volume>

1704 <snapshot>PDB1-daily_20111015190000</snapshot>

1705 </snapshot-delete>

1706

1707 [Wed Nov 2 19:00:42 2011] DEBUG: ZAPI RESULT

1708 <results status="failed" errno="16" reason="snapmirror"></results>

1709

1710 [Wed Nov 2 19:00:42 2011] ZAPI: snapmirror

1711 [Wed Nov 2 19:00:42 2011] ERROR: [scf-00013] NetApp Snapshot Delete of PDB1-daily_20111015190000 on anpdfil2:OraPdb1Data failed! Exiting
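One way to look for a pattern across the nightly .out files is to pull out every failed delete together with its errno and reason. The following is only a minimal sketch, assuming the log lines keep the format shown above. The function name and regular expressions are illustrative and not part of Snap Creator, and pairing each ZAPI result with the most recent "Deleting" line is a heuristic.

```python
import re
import sys

# "WARN: Deleting NetApp Snapshot <snapshot> on <filer>:<volume>" as seen in the log.
DELETING = re.compile(r"WARN: Deleting NetApp Snapshot (\S+) on (\S+)")
# "<results status=... errno=... reason=...>" as seen in the ZAPI RESULT section.
RESULT = re.compile(
    r'<results status="(\w+)"(?:\s+errno="(\d+)")?(?:\s+reason="([^"]*)")?'
)


def find_failed_deletes(log_text):
    """Return a list of (snapshot, volume, errno, reason) for failed deletes.

    Heuristic: a <results> line is attributed to the most recent
    "Deleting NetApp Snapshot" line that has not yet been paired.
    """
    failures = []
    current = None  # (snapshot, volume) waiting for its ZAPI result
    for line in log_text.splitlines():
        deleting = DELETING.search(line)
        if deleting:
            current = (deleting.group(1), deleting.group(2))
            continue
        result = RESULT.search(line)
        if result and current is not None:
            status, errno, reason = result.groups()
            if status == "failed":
                failures.append(
                    (current[0], current[1], int(errno) if errno else None, reason)
                )
            current = None  # this result has been consumed
    return failures


if __name__ == "__main__":
    # Usage (hypothetical file names): python find_failed_deletes.py *.out.*
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for failure in find_failed_deletes(handle.read()):
                print(path, *failure)
```

Run against a directory of logs, this would list which snapshots and volumes hit the "snapmirror" failure on which nights, which may make any pattern easier to spot.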

We are using SC to kick off the SnapMirror update. This cleanup is the last thing SC does, and the leftover snapshots do seem to get cleaned up on the next run (if I don't clean them up manually).

Any ideas where to look next?

:-Dan

ktenzer · 2011-11-08

Hi Dan,

This can happen if SC is too fast and ONTAP still has the volume locked because of SnapMirror. The volume is locked for a second or so when the update is started.

The easiest thing to do is add a "sleep 30" as a POST_NTAP_DATA_TRANSFER_CMD:

POST_NTAP_DATA_TRANSFER_CMD=sleep 30

If you're using the agent, the command gets sent there, so make sure you add sleep to agent.conf. If it is SC 3.4, then use something like this in agent.conf:

command: sleep

In SC 3.5 this issue should be fixed, but it hasn't been released on the NOW site yet (Jan 12 2012).

Regards,

Keith


danmalcor · 2011-11-08

That looks like it works. Only time will tell for sure, but it looks like this will make a difference. Thanks!

[Tue Nov 8 10:39:56 2011] INFO: Running NetApp Snapmirror Update on destination aobdnas2:OraTnetData source anpdnfg2-e0b:OraTnetData

[Tue Nov 8 10:40:17 2011] INFO: NetApp Snapmirror Update on destination aobdnas2:OraTnetData Started Successfully

[Tue Nov 8 10:40:17 2011] INFO: Running post netapp data transfer command POST_NTAP_DATA_TRANSFER_CMD01 [sleep 30]

[Tue Nov 8 10:40:47 2011] INFO: Running post netapp data transfer command [sleep 30] finished successfully

[Tue Nov 8 10:40:47 2011] WARN: More than 14 NetApp snapshots exist, older snapshots of anpdfil2:OraTnetData will be automatically deleted!

[Tue Nov 8 10:40:47 2011] WARN: Deleting NetApp Snapshot TNET-daily_20111023200000 on anpdfil2:OraTnetData

[Tue Nov 8 10:40:49 2011] INFO: NetApp Snapshot Delete of TNET-daily_20111023200000 on anpdfil2:OraTnetData completed Successfully

[Tue Nov 8 10:40:49 2011] WARN: More than 14 NetApp snapshots exist, older snapshots of anpdfil2:OraTnetData will be automatically deleted!

[Tue Nov 8 10:40:49 2011] WARN: Deleting NetApp Snapshot TNET-daily_20111022200000 on anpdfil2:OraTnetData

[Tue Nov 8 10:40:52 2011] INFO: NetApp Snapshot Delete of TNET-daily_20111022200000 on anpdfil2:OraTnetData completed Successfully
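To confirm from a log that the pause actually sits between the SnapMirror update and the first snapshot delete, the timestamps can be compared. This is a rough sketch, assuming the bracketed timestamp format and message wording shown above and the default C locale for day and month names. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches "[Tue Nov 8 10:40:17 2011] <message>" as printed in the log above.
STAMPED = re.compile(
    r"^\[(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\]\s+(.*)$"
)


def parse_line(line):
    """Return (datetime, message) for a timestamped log line, else None."""
    match = STAMPED.match(line.strip())
    if not match:
        return None
    # Collapse any repeated spaces so strptime sees single separators.
    stamp = " ".join(match.group(1).split())
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"), match.group(2)


def seconds_between_update_and_delete(log_text):
    """Seconds from 'Snapmirror Update ... Started Successfully' to the
    first following 'Deleting NetApp Snapshot' line, or None if not found."""
    started = None
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, message = parsed
        if "Snapmirror Update" in message and "Started Successfully" in message:
            started = when
        elif started is not None and "Deleting NetApp Snapshot" in message:
            return (when - started).total_seconds()
    return None
```

A gap close to the configured sleep suggests the command ran where expected. A gap near zero would suggest the sleep is not being executed (for example, if the agent rejected the command).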