SnapDrive cannot connect one specific LUN

Dry_Den · 2015-10-13

Hi all.

Just updated to SnapDrive 7.0.3 from 5.2 (it's a long story...)

We have some scripted overnight jobs which create cloned volumes based on SMSQL snapshots and attach these to various SQL servers. Multiple servers run essentially the same script. Since upgrading SnapDrive, one job on one server has started failing.

The job disconnects the destination volumes, removes the LUNs that were disconnected, then creates new cloned LUNs and connects them. One LUN holds the SQL data, the other the SQL logs, and each is in a volume of its own. The part that fails is connecting the log LUN, bearing in mind that the data LUN has connected successfully in the previous step.

The error on the first run of the job is:

Failed to connect to LUN. Failure in connecting to the LUN.

LUN Name = servername_lun_LiveLogs
Storage Path = /vol/servername_clone_servername2_vol_Logs_Snapshot/
Protocol Type = HTTPS
Storage System Name = Filer2
Requested Mount Point = C:\VOLUMEMOUNTPOINTS\SNAPSHOT\LOGS\
Assigned Mount Point = C:\VOLUMEMOUNTPOINTS\SNAPSHOT\LOGS\

Error code : Timeout has occurred while waiting for disk arrival notification from the operating system.

The log volume looks as if it's been created properly if you check System Manager.

If I rerun the job immediately, it fails again, and I get a bunch of errors trying to connect the same LUN:

Error description: LUN is not recognized by Windows. Possible reasons: this disk is not formatted, or this disk is under delete/disconnect/restore operation.

If I bounce the SnapDrive service and try again, it runs fine.
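For anyone wanting to automate that workaround while the root cause is being investigated, here is a minimal sketch of "try the connect, restart the service on failure, try again". It is only an illustration: `connect` and `restart_service` are hypothetical callables standing in for whatever the job script actually runs, and nothing here is a SnapDrive API.

```python
import time
from typing import Callable


def connect_with_restart(
    connect: Callable[[], bool],           # hypothetical: returns True if the LUN connected
    restart_service: Callable[[], None],   # hypothetical: bounces the SnapDrive service
    attempts: int = 2,
    settle_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the number of the attempt that succeeded; raise if none did."""
    for attempt in range(1, attempts + 1):
        if connect():
            return attempt
        # Only restart when another attempt is still to come.
        if attempt < attempts:
            restart_service()
            sleep(settle_seconds)  # give the service time to come back up
    raise RuntimeError(f"LUN connect failed after {attempts} attempts")
```

This only papers over the failure, so it is worth logging every restart so the underlying problem does not go unnoticed.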

So it looks like something in the process of either cloning the volume or attaching the volume stalls, but I'm not sure what. There's another job which runs right before this one which clones a different database in the same way, and that runs fine.

Any thoughts?

dmauro · 2015-10-14

Hi,

This is a common issue in misconfigured environments.

Here is a KB article which may help you:

https://kb.netapp.com/support/index?page=content&id=3014069&locale=en_US&access=s

In short, a disk arrival timeout means the blocks from the SAN did not reach the upper layer of the OS where the Volume Manager sits. The Volume Manager has not even tried to find a volume on the disk, because the notification that the disk has arrived did not get that far.

SnapDrive won't wait forever, so we have a tunable timeout. However, it is advisable to fix the underlying issue rather than work around it by increasing the timeout in the registry.
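To illustrate what a timeout like this means in general, here is a minimal sketch of a "poll until the disk shows up or the deadline passes" loop. It is not SnapDrive's actual implementation: `disk_visible` is a hypothetical check supplied by the caller, and the timeout and poll values are arbitrary.

```python
import time
from typing import Callable


def wait_for_disk(
    disk_visible: Callable[[], bool],  # hypothetical: True once the OS reports the disk
    timeout_seconds: float,
    poll_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the disk appeared before the deadline, else False."""
    deadline = clock() + timeout_seconds
    while True:
        if disk_visible():
            return True
        if clock() >= deadline:
            return False  # the "disk arrival" timeout case
        sleep(poll_seconds)
```

Raising `timeout_seconds` only helps if the disk eventually arrives; if the notification never reaches the upper layers, the loop simply fails later.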

Typically, this can be an issue with a misconfigured fabric, multihomed filers sending ZAPI responses to SnapDrive over different interfaces, or unsupported MPIO components such as the DSM version, HBA firmware and drivers.

You also need to check the IMT to find out whether you have a supported SAN configuration.

Hope that helps,

Domenico Di Mauro