Data Backup and Recovery
Data Backup and Recovery
2-node Exchange DAG (built on vSphere):
SME - 6.0.4.1008
SDW - 7.0.1
Windows OS - 2008 R2 EE SP1
ONTAP - 8.1.2P4
FCP
Stale LUN left attached to DAG node (verification node) after failed verification...
Storage System Name: filer01
LUN Path: /vol/sdw_cl_vol_maildata_DAGNODE1_System_0/lun_maildata_DAGNODE1_System
LUN Size: 60 GB
Volume Name: \\?\Volume{22b06c02-dc83-11e4-ba5c-0050569c4f7e}\
Backed by Snapshot path: /vol/vol_maildata_DAGNODE1_System/.snapshot/exchsnap_maildata_DAGNODE1_04-09-2015_07.22.09/lun_maildata_DAGNODE1_System
SnapMirror: No
Disk Type: Disk in Snapshot Read Write
Disk Serial Number: B7sUt+C0gEVP
PartitionType: GPT
Restore Status: Normal
RDM File Name : [NetApp SATA - vmware Servers Config Files] DAGNODE1 (Dag Server)/DAGNODE1 (Dag Server)_SD_filer01_B7sUt+C0gEVP_0.vmdk
Data Store Name : NetApp SATA - vmware Servers Config Files
I know how to clean this up but am looking for an RCA, as customer has seen this occur a number of times now and would like to get to the bottom of why it is happening.
Looking in the SME log for the failed verification job we see the failure occurs sometime between 10:35-10:37:
SME Log...
[10:25:56.250] [DAGNODE1] Running Integrity Verification using CheckSgFiles API
[10:25:56.609] [DAGNODE1] File: C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk114\Exchange Mailstores\System Management\System Management.edb
...
[10:35:22.388] [DAGNODE1] Operation completed successfully in 566.174 seconds.
[10:35:22.388] [DAGNODE1] Dismounting LUN [C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk114] of Snapshot [exchsnap__mail-DAGNODE1_04-09-2015_07.22.09]...
[10:37:01.317] [DAGNODE1] [SnapDrive Error]:Failed to delete disk in virtual machine, Failed to delete virtual disk: Unable to connect to the remote server.
(SnapDrive Error Code: 0xc0040414)
[10:37:14.530] [DAGNODE1] [SnapDrive Error]:The LUN may not be connected, because its mount point cannot be found.
(SnapDrive Error Code: 0xc0041085)
[10:37:14.530] [DAGNODE1] SnapDrive failed to dismount the snapshot.
[10:37:15.013] [DAGNODE1] Unknown Error, Error Code: 0xc0041085
[10:37:15.013] [DAGNODE1] Re-trying to force dismounting LUN...
[10:37:15.294] [DAGNODE1] [SnapDrive Error]:The LUN with serial number 'B7sUt+C0gEVP' is not found on the system
(SnapDrive Error Code: 0xc0040374)
[10:37:15.294] [DAGNODE1] SnapManager will pause 70 seconds after force dismount, please wait...
[10:38:25.303] [DAGNODE1] SnapDrive failed to dismount the snapshot.
[10:38:25.303] [DAGNODE1] [SnapDrive Error]:The LUN with serial number 'B7sUt+C0gEVP' is not found on the system
Going to SDW debugs to try and track for more detail and it looks like the mount point removal occurs (although there is still an empty mount point dir left in that location?) but RDM removal fails:
SDW Debug log...
04/09-10:22:46.643 | PID:4744 | TID:2344 | SnapShot.cpp@4151 | MountSnapShotInternal: mch='MAIL-DAGNODE1', livedsk='G', snap='exchsnap__mail-DAGNODE1_04-09-2015_07.22.09', bwritble=1, bAuto=0, mntPtPath='C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk114', smdest=0, dfile='none', dvol='none', mcalback=-1, shared='0', clustergroup='none', SnapVaultSecondaryConnect='0', IgnoreCloneLicense='0' |
..
04/09-10:25:55.844 | PID:4744 | TID:2344 | DrvLetEdit.cpp@708 | Start addVolumeMountPoint(), volumeName:'\\?\Volume{22b06c02-dc83-11e4-ba5c-0050569c4f7e}\', mount point:'C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk114\' |
..
04/09-10:35:29.251 | PID:4744 | TID:9132 | DrvLetEdit.cpp@673 | Start removeVolumeMountPoint(), volumeName:'\\?\Volume{22b06c02-dc83-11e4-ba5c-0050569c4f7e}\', mount point:'C:\Program Files\NetApp\SnapManager for Exchange\SnapMgrMountPoint\MPDisk114\' |
..
04/09-10:35:29.251 | PID:4744 | TID:9132 | DrvLetEdit.cpp@677 | Finish removeVolumeMountPoint() |
..
04/09-10:35:30.297 | PID:4744 | TID:9132 | FCPVdisk.cpp@4325 | CFCPVdisk::DeleteRawDeviceMapping() Lun Serial No=B7sUt+C0gEVP | |
04/09-10:35:58.063 | PID:4744 | TID:5156 | System.cpp@4738 | MonitorCacheThrdProc(): remaining time=9 minute(s) | |
04/09-10:37:01.302 | PID:4744 | TID:9132 | FCPVdisk.cpp@4345 | Failed to delete the virtual Disk through raw device mapping, , error code: '0x80004005', error description: 'Failed to delete virtual disk: Unable to connect to the remote server' |
Same error again, no more detail.
Nothing further in the Windows Event logs surround this event to indicate connectivity issues, or other. Same with Filer logs.
So the questions I'm currently looking for answers on are:
1. "Unable to connect to remote server" - is the remote server vCenter?
2. Any bugs / known issues that could relate to this - I've trawled support resources and can't find anything to pin this on
Solved! See The Solution
I have decripted the file and here is what I could see:
here is when we successfully connected the lun during verification:
VirtualizationAPILog Critical: 1 : 04/09-10:23:49.772 PID:4744 TID:2344 @-1 CEsxServer::MapVirtualDisk() - Successfullly mapped LUN for VM
then we tried to delete the RDM lun and I think we partially managed that:
VirtualizationAPILog Warning: 3 : 04/09-10:35:30.297 PID:4744 TID:9132 @-1 Start CEsxServer::GetDeleteLunInformation()
VirtualizationAPILog Warning: 3 : 04/09-10:35:30.297 PID:4744 TID:9132 @-1 The incoming lun Serial No is B7sUt+C0gEVP
VirtualizationAPILog Critical: 1 : 04/09-10:36:09.060 PID:4744 TID:9132 @-1 CEsxServer::ExecuteVIMAPI() - Name: WebException
]tMessage: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
]tTrace: at System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
at VimApi.VimService.RetrieveProperties(ManagedObjectReference _this, PropertyFilterSpec[] specSet)
at VirtualizationAPI.CEsxServer.ExecuteVIMAPI(VIMSDKAPI vimAPI, Object[] Parameters)
VirtualizationAPILog Critical: 1 : 04/09-10:36:35.158 PID:4744 TID:9132 @-1 CEsxServer::ConnectToEsxServer() - Failed to connect to ESX server the exception is - Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.224 PID:4744 TID:9132 @-1 CEsxServer::ConnectToEsxServer() - Failed to connect to ESX server the exception is - Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.239 PID:4744 TID:9132 @-1 CEsxServer::ConnectToEsxServer() - Failed to connect to ESX server the exception is - Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 CEsxServer::ExecuteVIMAPI() - Caught Exception:
Name: WebException
]tMessage: Unable to connect to the remote server
]tTrace: at VirtualizationAPI.CEsxServer.ConnectToEsxServer()
at VirtualizationAPI.CEsxServer.ExecuteVIMAPI(VIMSDKAPI vimAPI, Object[] Parameters)
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 CEsxServer::GetObjectProperties():
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 Caught Exception :in GetDeleteLunInformation() Name : WebException Message : Unable to connect to the remote server Trace : at VirtualizationAPI.CEsxServer.GetObjectProperties(ManagedObjectReference mobj, String[] properties)
at VirtualizationAPI.CEsxServer.GetHostStorageDeviceInfo()
at VirtualizationAPI.CEsxServer.GetDeleteLunInformation(String strLunSerialNo, String& strDeviceName, String& strLUNuuid)
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 CEsxServer::UnmapVirtualDisk() - Failed to delete the virtual disk
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.271 PID:4744 TID:9132 @-1 CVirtualServer::UnmapVirtualDisk() - Failed to delete virtual disk: Exception message: Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:38:41.354 PID:4744 TID:9132 @-1 CEsxServer::ExecuteVIMAPI() - Name: WebException
]tMessage: Unable to connect to the remote server
however, because of the connection drop in the middle of it, we errored out.
Please check if you still download the tool from here:
http://mysupport.netapp.com/NOW/download/tools/internal/snapdrive_win_debug_decoder/
Meanwhile,
I have submited a request to check why this is not showing on http://mysupport.netapp.com/download/tools/toolchest_legacy.shtml nor http://mysupport.netapp.com/tools/index.html?TaskID=70201
I hope that clarifies it.
Domenico Di Mauro.
Hi,
the remote server can only be indeed the Vcenter server, as snapDrive needs to talk to Vcenter in order to enumerate and eliminate RDM lun's.
SnapDrive also needs to send a XZAPI call to the Storage System ultimately, but we are not this far, at the point where we throw that error.
Please check the VirtualizationAPI.log in order to see whether there is something similar to this, aroundd the failure time:
CEsxServer::ConnectToEsxServer() - Failed to connect to ESX server the exception is - Unable to connect to the remote server
If so, it may be that when the Vcenter is very busy, the connection is dropped and it needs to be re-established, if Vcenter then accepts it.
hope that helps,
Domenico Di Mauro
Thanks for the response, Domenico, much appreciated.
I suspected as much but don't have a way of decoding the VirtualizationAPI.dat from nSanity (I miss SANview!) so was unable to confirm. Unless there is tool I can utilize now (as an ASP/SSP Partner) but not aware of one? Would be great if so. If not, I've attached the log here.
Customer was carrying out ESX upgrades during this time, so would have had a lot of vMotion traffic and host reboots, etc going on, so busy vSphere is a good fit for this.
I have decripted the file and here is what I could see:
here is when we successfully connected the lun during verification:
VirtualizationAPILog Critical: 1 : 04/09-10:23:49.772 PID:4744 TID:2344 @-1 CEsxServer::MapVirtualDisk() - Successfullly mapped LUN for VM
then we tried to delete the RDM lun and I think we partially managed that:
VirtualizationAPILog Warning: 3 : 04/09-10:35:30.297 PID:4744 TID:9132 @-1 Start CEsxServer::GetDeleteLunInformation()
VirtualizationAPILog Warning: 3 : 04/09-10:35:30.297 PID:4744 TID:9132 @-1 The incoming lun Serial No is B7sUt+C0gEVP
VirtualizationAPILog Critical: 1 : 04/09-10:36:09.060 PID:4744 TID:9132 @-1 CEsxServer::ExecuteVIMAPI() - Name: WebException
]tMessage: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
]tTrace: at System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(WebRequest request)
at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
at VimApi.VimService.RetrieveProperties(ManagedObjectReference _this, PropertyFilterSpec[] specSet)
at VirtualizationAPI.CEsxServer.ExecuteVIMAPI(VIMSDKAPI vimAPI, Object[] Parameters)
VirtualizationAPILog Critical: 1 : 04/09-10:36:35.158 PID:4744 TID:9132 @-1 CEsxServer::ConnectToEsxServer() - Failed to connect to ESX server the exception is - Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.224 PID:4744 TID:9132 @-1 CEsxServer::ConnectToEsxServer() - Failed to connect to ESX server the exception is - Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.239 PID:4744 TID:9132 @-1 CEsxServer::ConnectToEsxServer() - Failed to connect to ESX server the exception is - Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 CEsxServer::ExecuteVIMAPI() - Caught Exception:
Name: WebException
]tMessage: Unable to connect to the remote server
]tTrace: at VirtualizationAPI.CEsxServer.ConnectToEsxServer()
at VirtualizationAPI.CEsxServer.ExecuteVIMAPI(VIMSDKAPI vimAPI, Object[] Parameters)
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 CEsxServer::GetObjectProperties():
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 Caught Exception :in GetDeleteLunInformation() Name : WebException Message : Unable to connect to the remote server Trace : at VirtualizationAPI.CEsxServer.GetObjectProperties(ManagedObjectReference mobj, String[] properties)
at VirtualizationAPI.CEsxServer.GetHostStorageDeviceInfo()
at VirtualizationAPI.CEsxServer.GetDeleteLunInformation(String strLunSerialNo, String& strDeviceName, String& strLUNuuid)
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.255 PID:4744 TID:9132 @-1 CEsxServer::UnmapVirtualDisk() - Failed to delete the virtual disk
VirtualizationAPILog Critical: 1 : 04/09-10:37:01.271 PID:4744 TID:9132 @-1 CVirtualServer::UnmapVirtualDisk() - Failed to delete virtual disk: Exception message: Unable to connect to the remote server
VirtualizationAPILog Critical: 1 : 04/09-10:38:41.354 PID:4744 TID:9132 @-1 CEsxServer::ExecuteVIMAPI() - Name: WebException
]tMessage: Unable to connect to the remote server
however, because of the connection drop in the middle of it, we errored out.
Please check if you still download the tool from here:
http://mysupport.netapp.com/NOW/download/tools/internal/snapdrive_win_debug_decoder/
Meanwhile,
I have submited a request to check why this is not showing on http://mysupport.netapp.com/download/tools/toolchest_legacy.shtml nor http://mysupport.netapp.com/tools/index.html?TaskID=70201
I hope that clarifies it.
Domenico Di Mauro.
Looks like that's been made internal only now:
That is a useful tool, so would be good to get partner access to it, if possible, so thanks for flagging it up.
Again, you help is much appreciated, the above should be plenty to feed back to customer with.
Looks like this tool is still not up. Will we something like this in the near future? Agreed that it would be helpful for partners.