Snapshot delete failing - why?

ntitlow01 · 2010-11-29

Hello,

Occasionally, snapcreator jobs fail when trying to delete snapshots.

Here is a snippet from the .out log:

[Thu Nov 25 09:11:44 2010] WARN: More than 14 NetApp snapshots exist, older snapshots of myfiler-b:myvol will be automatically deleted!
[Thu Nov 25 09:11:44 2010] WARN: Deleting NetApp Snapshot qb_dr-daily_20101125080102 on myfiler-b:myvol
[Thu Nov 25 09:11:45 2010] ZAPI: snapmirror
[Thu Nov 25 09:11:45 2010] ERROR: NetApp Snapshot Delete of qb_dr-daily_20101125080102 on myfiler-b:myvol failed! Exiting
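
The first WARN line suggests a count-based retention policy: once more than 14 snapshots exist, the oldest ones are removed. Below is a minimal Python sketch of that selection step. It assumes that the policy simply keeps the newest N snapshots and that every name ends in a YYYYMMDDhhmmss stamp like qb_dr-daily_20101125080102. It illustrates the idea only and is not Snap Creator's actual code. The function names snapshot_time and snapshots_to_delete are hypothetical.

```python
from datetime import datetime
from typing import List


def snapshot_time(name: str) -> datetime:
    """Parse the trailing _YYYYMMDDhhmmss stamp (assumed naming convention)."""
    stamp = name.rsplit("_", 1)[-1]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def snapshots_to_delete(names: List[str], retain: int) -> List[str]:
    """Return the snapshots beyond the newest `retain`, oldest first."""
    ordered = sorted(names, key=snapshot_time)  # oldest first
    excess = len(ordered) - retain
    return ordered[:excess] if excess > 0 else []


if __name__ == "__main__":
    # Sixteen hourly snapshots, 00:01:02 through 15:01:02 on 2010-11-25.
    names = [f"qb_dr-daily_20101125{h:02d}0102" for h in range(16)]
    print(snapshots_to_delete(names, 14))
    # ['qb_dr-daily_20101125000102', 'qb_dr-daily_20101125010102']
```

Sorting on the parsed timestamp rather than on the raw name keeps the result correct even if the prefixes of the snapshot names differ.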

From the .error file:

[Thu Nov 25 09:11:45 2010] NetApp Snapshot Delete of qb_dr-daily_20101125080102 on myfiler-b:myvol failed! Exiting at /</usr/local/scServer_v3.2/snapcreator>SnapCreator/Snap.pm line 319, <S> line 90.

No errors on the filer.

Any thoughts on what might be happening? I would also like to submit an enhancement request: have snapcreator pass specific error codes to the SENDTRAP command/script. That would allow me to recover automatically in a script called via the SENDTRAP variable.
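
Without error codes, a SENDTRAP script has to recognize a failure from the message text. Below is a minimal Python sketch of that approach. It rests on two assumptions: that the SENDTRAP command line passes the message as arguments, and that the message wording matches the ERROR line quoted above. DELETE_FAILED, classify and main are hypothetical names, and the message text is not a documented, stable interface.

```python
import re
import sys
from typing import List, Optional

# Hypothetical pattern written against the ERROR line quoted above; the message
# wording is not a documented, stable Snap Creator interface.
DELETE_FAILED = re.compile(
    r"NetApp Snapshot Delete of (?P<snapshot>\S+) on "
    r"(?P<filer>[^:\s]+):(?P<volume>\S+) failed"
)


def classify(message: str) -> Optional[dict]:
    """Return details of a failed snapshot delete, or None for other messages."""
    match = DELETE_FAILED.search(message)
    if match is None:
        return None
    return {"error": "snapshot_delete_failed", **match.groupdict()}


def main(args: List[str]) -> None:
    # Assumption: the SENDTRAP command line passes the message text as arguments.
    info = classify(" ".join(args))
    if info is None:
        return  # not a failure this sketch knows how to handle
    # A real recovery step (for example, retrying the delete) would go here.
    print(f"{info['error']}: {info['snapshot']} on {info['filer']}:{info['volume']}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Matching on free text like this breaks as soon as the wording changes. Specific error codes would remove that fragility.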

-nathan

ntitlow01 · 2010-11-29

My apologies - I included nothing about my configuration:

Replication is via SnapMirror between three source 3170s running 7.3.2P3 and one target running 7.3.3P5. I am using SnapCreator version 3.2. Snapcreator jobs are kicked off every 5 minutes. There are 5 configurations in a single profile: I replicate each of the 5 volumes in a dataset separately. This is done to parallelize the process (we don't have enough time to replicate the volumes serially and still maintain our desired RPO).

The problem occurs multiple times a day. As a test, I inserted a 30-second sleep as POST_NTAP_DATA_TRANSFER_CMD01, and that seemed to mask the problem until I ran a long (> 24 hours) test. It does not seem to be related to load or change rate, and there is no apparent time-of-day pattern to the errors.

-nathan

ktenzer · 2010-11-29

Hi Nathan,

The issue here is that after a snapmirror update is issued, a brief quiesce occurs on the primary volume. Snapshots cannot be deleted during that window. The quiesce is very fast, but sometimes snapcreator is just too quick. There are two workarounds. The first one you already discovered: add a small sleep to the post data transfer commands. The other is to have snapcreator wait for the SnapMirror transfer, which you enable by setting the clone secondary option to "Y". In this case, though, I think the sleep is the better choice, since updates are running every 5 minutes.

We plan to fix this issue so that the sleep won't be required. It was low priority because we have a few workarounds, but since customers are hitting the issue, it needs to be fixed ASAP.

I like your idea about SENDTRAP improvements and am adding it as a feature request. We may need some more information to capture the requirement, so I will ping you.

Regards,
Keith
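
Keith's explanation suggests that the delete fails only during a short window after the SnapMirror update. One alternative to a fixed sleep is to retry the delete with exponential backoff. The Python sketch below is purely illustrative and does not describe how Snap Creator itself behaves. delete_with_retry, SnapshotBusy and the delay values are hypothetical, and the sketch calls no real NetApp API. The caller supplies whatever function actually deletes the snapshot.

```python
import time
from typing import Callable


class SnapshotBusy(Exception):
    """Hypothetical error raised by `delete` while the source volume is quiesced."""


def delete_with_retry(
    delete: Callable[[], None],
    attempts: int = 5,
    first_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `delete` until it succeeds and return the number of attempts used.

    Waits first_delay, 2 * first_delay, 4 * first_delay, ... between attempts
    and re-raises the last SnapshotBusy if every attempt fails.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            delete()
            return attempt
        except SnapshotBusy:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
    # Only reached when attempts < 1.
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def fake_delete() -> None:
        # Simulated delete that fails twice while the "quiesce" lasts.
        calls["n"] += 1
        if calls["n"] < 3:
            raise SnapshotBusy("volume still quiesced")

    waits = []
    used = delete_with_retry(fake_delete, sleep=waits.append)
    print(used, waits)  # 3 [2.0, 4.0]
```

Compared with a fixed 30-second sleep, this approach waits only as long as the quiesce actually lasts and still fails loudly if the snapshot never becomes deletable. The sleep parameter is injectable so the timing can be tested without real delays.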