
Not deleted snapshot in SnapManager for Exchange causes snapmirror to fail


Hello,

I have a weird issue using SnapManager for Exchange and SnapMirror.

Environment:

- DATA ONTAP 7.3.3

- SnapManager for Exchange 5.0

On the source filer:

- /vol/mail01_data contains 3 Exchange LUNs

On the destination filer:

- /vol/mail01_data_sm is the SnapMirror destination volume

SnapMirror is configured in volume mode.
SnapMirror scheduling is disabled on the destination filer (- - - -).
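For reference, this kind of relationship, with the destination-side schedule disabled so that only SME-triggered updates run, would look roughly like this in /etc/snapmirror.conf on the destination filer (a sketch using the filer and volume names above, not a copy of my actual file):

```
# /etc/snapmirror.conf on dest_filer (Data ONTAP 7-mode)
# source:volume              destination:volume           args  schedule (minute hour day-of-month day-of-week)
source_filer:mail01_data     dest_filer:mail01_data_sm    -     - - - -
```

With the schedule set to "- - - -", the filer never starts transfers on its own; updates happen only when requested, e.g. by SME's -updatemirror option.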

A SnapManager backup job runs every 2 hours, with the -updatemirror option set and no verification.

Every scheduled run after 22:00 fails with these errors:

On the source filer:

Wed May 12 10:01:36 EDT replication.src.err: SnapMirror: source transfer from mail01_data to dest_filer:mail01_data_sm : transfer failed.

On the destination filer:

Wed May 12 10:01:36 EDT snapmirror.dst.snapDelErr: Snapshot {7cddaa60-6211-4577-9bda-20001dae5171} in destination volume mail01_data_sm is in use, cannot delete.
Wed May 12 10:01:37 EDT replication.dst.err: SnapMirror: destination transfer from source_filer:mail01_data to mail01_data_sm : replication transfer failed to complete.

And obviously, on the destination filer:

- the snap list command shows the {7cddaa60-6211-4577-9bda-20001dae5171} snapshot flagged (busy,LUN)

- there are 3 online LUNs:

/vol/mail01_data_sm/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.rws 
/vol/mail01_data_sm/{3dd2624e-56f9-49ce-b461-b3f9093883fd}.rws 
/vol/mail01_data_sm/{e7387ee3-d09d-453a-8840-69d44decbe4c}.rws 

Actually, I can clean this up by quiescing and breaking the SnapMirror relationship, taking those LUNs offline, destroying them, deleting the snapshot and resyncing the SnapMirror relationship.
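For anyone hitting the same state, the manual cleanup on the destination filer looks roughly like this (7-mode CLI, names taken from above; adapt to your environment):

```
dest_filer> snapmirror quiesce mail01_data_sm
dest_filer> snapmirror break mail01_data_sm

dest_filer> lun offline /vol/mail01_data_sm/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.rws
dest_filer> lun destroy /vol/mail01_data_sm/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.rws
# repeat lun offline / lun destroy for the other two .rws LUNs

dest_filer> snap delete mail01_data_sm {7cddaa60-6211-4577-9bda-20001dae5171}

dest_filer> snapmirror resync -S source_filer:mail01_data mail01_data_sm
```

Once the LUNs backed by the snapshot are gone, the snapshot is no longer busy and can be deleted, and the resync re-establishes the mirror.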

I first thought that someone had mounted a snapshot on the destination filer to perform a restore with SMBR or something similar, but the next day the issue appeared again under the same conditions, after the 22:00 SME snapshot.

Here is the log on the source filer:

Tue May 11 22:00:00 EDT kern.uptime.filer:  10:00pm up 11 days,  3:59 0 NFS ops, 6393604 CIFS ops, 154 HTTP ops, 268591012 FCP ops, 0 iSCSI ops
Tue May 11 22:00:32 EDT [source_filer: SMBRPCWorker05:notice]: Multicreation of snapshot named {426ece1b-912d-48c1-bbcb-dd6d3e4804ea} successful. It took 162 milli seconds from start to finish in ZAPI.
Tue May 11 22:00:33 EDT lun.offline: LUN /vol/mail01_data/{4126d53b-0fb8-4af6-96d8-a745883aea3d}.aux has been taken offline
Tue May 11 22:00:33 EDT lun.destroy: LUN /vol/mail01_data/{4126d53b-0fb8-4af6-96d8-a745883aea3d}.aux destroyed
Tue May 11 22:00:33 EDT lun.offline: LUN /vol/mail01_data/{dc7a2c40-3a0d-4e0d-86cd-fcaf65e9eeed}.aux has been taken offline
Tue May 11 22:00:33 EDT lun.destroy: LUN /vol/mail01_data/{dc7a2c40-3a0d-4e0d-86cd-fcaf65e9eeed}.aux destroyed
Tue May 11 22:00:33 EDT lun.offline: LUN /vol/mail01_data/{6fc85d54-5730-4345-adc9-9cfe113e8991}.aux has been taken offline
Tue May 11 22:00:34 EDT lun.destroy: LUN /vol/mail01_data/{6fc85d54-5730-4345-adc9-9cfe113e8991}.aux destroyed
Tue May 11 22:00:51 EDT [source_filer: SMBRPCWorker09:notice]: Multicreation of snapshot named {7cddaa60-6211-4577-9bda-20001dae5171} successful. It took 222 milli seconds from start to finish in ZAPI.
Tue May 11 22:00:55 EDT lun.offline: LUN /vol/mail01_data/{3dd2624e-56f9-49ce-b461-b3f9093883fd}.aux has been taken offline
Tue May 11 22:00:55 EDT lun.destroy: LUN /vol/mail01_data/{3dd2624e-56f9-49ce-b461-b3f9093883fd}.aux destroyed
Tue May 11 22:00:55 EDT lun.offline: LUN /vol/mail01_data/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.aux has been taken offline
Tue May 11 22:00:55 EDT lun.destroy: LUN /vol/mail01_data/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.aux destroyed
Tue May 11 22:00:55 EDT lun.offline: LUN /vol/mail01_data/{e7387ee3-d09d-453a-8840-69d44decbe4c}.aux has been taken offline
Tue May 11 22:00:55 EDT lun.destroy: LUN /vol/mail01_data/{e7387ee3-d09d-453a-8840-69d44decbe4c}.aux destroyed
Tue May 11 22:00:56 EDT lun.offline: LUN /vol/mail01_data/{3dd2624e-56f9-49ce-b461-b3f9093883fd}.rws has been taken offline
Tue May 11 22:00:57 EDT lun.map: LUN /vol/mail01_data/{3dd2624e-56f9-49ce-b461-b3f9093883fd}.rws was mapped to initiator group viaRPC.21:00:00:xx:xx:xx:xx:xx.mail01=14
Tue May 11 22:01:01 EDT lun.offline: LUN /vol/mail01_data/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.rws has been taken offline
Tue May 11 22:01:01 EDT lun.map: LUN /vol/mail01_data/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.rws was mapped to initiator group viaRPC.21:00:00:xx:xx:xx:xx:xx.mail01=15
Tue May 11 22:01:03 EDT lun.offline: LUN /vol/mail01_data/{e7387ee3-d09d-453a-8840-69d44decbe4c}.rws has been taken offline
Tue May 11 22:01:03 EDT lun.map: LUN /vol/mail01_data/{e7387ee3-d09d-453a-8840-69d44decbe4c}.rws was mapped to initiator group viaRPC.21:00:00:xx:xx:xx:xx:xx.mail01=16
Tue May 11 22:01:17 EDT wafl.snap.delete: Snapshot copy exchsnap__mail01_05-11-2010_08.00.12 on volume mail01_data IBM was deleted by the Data ONTAP function zapi_snapshot_delete. The unique ID for this Snapshot copy is (218, 2975623).
Tue May 11 22:01:30 EDT wafl.snap.delete: Snapshot copy @snapmir@{02F0A978-3657-4048-B4CE-F1C535FBCB1B} on volume mail01_data IBM was deleted by the Data ONTAP function zapi_snapshot_delete. The unique ID for this Snapshot copy is (240, 2979422).
Tue May 11 22:01:48 EDT app.log.info: mail01: SME Version 5.0: (144) Backup: SnapManager for Exchange online backup successfully completed.  (Exchange Version 8.1 (Build 240.6))  Number of storage groups backed up successfully: 1.
Tue May 11 22:17:05 EDT lun.map.unmap: LUN /vol/mail01_data/{3dd2624e-56f9-49ce-b461-b3f9093883fd}.rws unmapped from initiator group viaRPC.21:00:00:xx:xx:xx:xx:xx.mail01
Tue May 11 22:17:05 EDT lun.destroy: LUN /vol/mail01_data/{3dd2624e-56f9-49ce-b461-b3f9093883fd}.rws destroyed
Tue May 11 22:17:05 EDT lun.map.unmap: LUN /vol/mail01_data/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.rws unmapped from initiator group viaRPC.21:00:00:xx:xx:xx:xx:xx.mail01
Tue May 11 22:17:05 EDT lun.destroy: LUN /vol/mail01_data/{16d381d8-4a5e-4b0d-ac12-7df039a7ec7e}.rws destroyed
Tue May 11 22:17:05 EDT lun.map.unmap: LUN /vol/mail01_data/{e7387ee3-d09d-453a-8840-69d44decbe4c}.rws unmapped from initiator group viaRPC.21:00:00:xx:xx:xx:xx:xx.mail01
Tue May 11 22:17:05 EDT lun.destroy: LUN /vol/mail01_data/{e7387ee3-d09d-453a-8840-69d44decbe4c}.rws destroyed
Tue May 11 22:17:06 EDT wafl.snap.delete: Snapshot copy {7cddaa60-6211-4577-9bda-20001dae5171} on volume mail01_data IBM was deleted by the Data ONTAP function zapi_snapshot_delete. The unique ID for this Snapshot copy is (249, 2981390).
Tue May 11 23:00:00 EDT kern.uptime.filer:  11:00pm up 11 days,  4:59 0 NFS ops, 6413653 CIFS ops, 154 HTTP ops, 268976664 FCP ops, 0 iSCSI ops

We can see that everything happened normally on the source: our LUNs were unmapped and destroyed successfully, and the snapshot was deleted (the wafl.snap.delete entry at 22:17:06).

But why are those LUNs and the snapshot containing them still present at the destination? This prevents the subsequent SnapMirror schedules from completing properly.

In a volume SnapMirror relationship, the entire volume state is mirrored, so I suppose it is normal for the LUNs in a destination snapshot to be online if they were online on the source at the moment of the transfer.

Hmm... actually I have a clue in the timestamps; here is the SME log:

[22:01:22.689]  *** REQUEST SNAPMIRROR UPDATE AFTER BACKUP ***

[22:01:22.689]  Starting SnapMirror update...

[22:01:22.689]  Querying LUN list for SnapMirror update...

[22:01:22.690]  Requesting SnapMirror update for LUN [F] on volume [mail01_data] of filer [source_filer]...

[22:01:36.884]  Request for SnapMirror update for LUN [F] was completed successfully.

[22:01:36.884]  Requesting SnapMirror update for LUN [S] on volume [mail01_silogs] of filer [other_source_filer]...

[22:01:44.527]  Request for SnapMirror update for LUN [S] was completed successfully.

[22:01:48.785]  Number of storage groups backed up successfully: 1.

[22:01:48.786]  Storage system AutoSupport is not sent because all operations completed successfully.

[22:01:48.786]  Backup of storage groups successfully completed

[22:01:48.786]  ***SNAPMANAGER BACKUP SUBJOB ENDED AT: [05-11-2010 22.01.48]

[22:01:48.786]  ***SUBJOB #1: END OF BACKUP

It seems that SME requests the SnapMirror update at 22:01, but our LUNs were not unmapped and destroyed on the source until 22:17!


Has anyone seen this issue before, or did I miss something?

Thank you.

Olivier

Re: Not deleted snapshot in SnapManager for Exchange causes snapmirror to fail

Actually, I've found that the "ghost" snapshot and LUN mapping are created by SMVI, which was originally scheduled at the same time (22:00).

Rescheduling it to 22:30 solved my issue.

I forgot to mention that the server is a VM and the LUNs are RDMs.

But I'm still wondering why the scheduled SMVI snapshot maps the LUNs...