2010-09-19 04:59 PM
During a recent DR test I failed over 20 physical-mode RDM virtual machines, all using SnapDrive 6.2 and hosted on a FAS3160. I replicate on an aggressive 5-minute schedule but have had no problems pulling the data to the destination. This was a controlled failover, so I powered off all my VMs, issued a "snapmirror update", and then, after the data transfer completed, issued a "snapmirror break" and brought up my VMs at my DR site. I have identically configured and provisioned VMs managed by a second Virtual Center at the DR site, which reside on an identically configured SnapMirror destination FAS3160. 19 of the 20 VMs booted; one VM experienced a BSOD. After unsuccessfully trying to boot that VM, I decided to power the server back up at my production source site and then power it down again. I waited a few minutes and issued the snapmirror commands again, after which the VM booted without issue at the DR site on the destination array.
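For reference, the controlled failover sequence above looks roughly like this on a 7-mode destination filer (the filer and volume names here are hypothetical, not from my actual setup):

```
# Run on the DR-site controller after the production VMs are powered off
dr-fas3160> snapmirror update -S prod-fas3160:vol_rdm dr-fas3160:vol_rdm
dr-fas3160> snapmirror status vol_rdm    # wait until the transfer shows Idle
dr-fas3160> snapmirror break vol_rdm    # destination volume becomes writable
```

After the break, the destination LUNs can be mapped and the DR-site VMs powered on.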
The only explanation I can think of is that there were WAFL/FlexVol-related writes still in cache on the source FAS3160 (even though the VM was powered off), resulting in a SnapMirror snapshot that was inconsistent with respect to the FC LUNs inside the FlexVol, one of which was the Windows OS LUN for the VM in question.
I would like to know if there is a command to force de-staging of any pending writes for a FlexVol to disk, or if someone can suggest a possible cause of the corruption and a way to prevent it going forward.
A SnapDrive snapshot is no help here because the VM was powered off, so there was no way to issue a SnapDrive-consistent snapshot from the guest.
2010-09-20 06:30 AM
Hi Greg, I'm not sure what caused your VM to BSOD, but it was not because writes were still sitting in cache at the source without having been written to disk. When a snapmirror update happens, a snapshot is created on the source volume, and taking that snapshot causes all data in cache to be committed to disk first. SnapMirror then transfers that snapshot to the destination.
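If you want to see that commit happen explicitly before the transfer, you can take a manual snapshot yourself: creating any snapshot forces a consistency point, which flushes the volume's cached writes to disk. A rough sketch on the source filer (volume, filer, and snapshot names hypothetical):

```
# On the source controller: snap create forces a consistency point,
# so cached writes for the volume are on disk before the snapshot exists
prod-fas3160> snap create vol_rdm pre_failover_check

# The subsequent update still creates and ships its own snapshot,
# which goes through the same consistency-point mechanism
dr-fas3160> snapmirror update -S prod-fas3160:vol_rdm dr-fas3160:vol_rdm
```

In other words, the manual snapshot is not required for correctness; the snapshot taken by snapmirror update already guarantees on-disk consistency of the source volume at that point in time.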