We have a daily process that takes snapshots of multiple databases on Server A and then mounts FlexClones of those snapshots on Server B. The snapshot process on Server A kicks off at a designated time, and once it completes it starts the process on Server B. The process on Server B detaches the current databases, disconnects their drives, deletes the old FlexClones, grabs the new snapshot, creates new FlexClones, connects the new FlexClones as drives on the server, and then re-attaches the databases. The process on Server B is handled by a Perl script that uses the SnapDrive/NetApp API.

This should all work great, but we have issues with Server B almost weekly, to the point where we have to reboot it. The Perl script and API calls become unresponsive and can't mount the FlexClones. We end up with errors like this:
ServerB: Checking input parameters
ServerB : Checking access control
ServerB : Checking policies
ServerB : Turning on space reservation
ServerB : Connecting to the LUN
Unable to connect to the LUN
Error: A timeout of 120 secs elapsed while waiting for volume arrival notification from the operating system.
Can't spawn "sdcli disk connect -m ServerB -p 172.xx.xx.xx:/vol/nb1serverbdb/lun01 -d W:\ -IG ServerB ServerB -dtype dedicated"
Server B seems to become more and more unresponsive each time we retry the process/script, to the point where it hangs when we even try to execute the script. At that point we reboot the server and everything runs perfectly. I am confused about why Server B becomes unresponsive and what could be causing it.
Let me give some details about the OS and software we are running. Both Server A and Server B are running the same software.
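To help narrow this down, the failing step can be run on its own under a hard timeout, recording how long each attempt takes, so that a hang shows up as a timed-out attempt instead of a stuck script. The following is only a minimal sketch in Python, not our actual Perl script: the function name `run_with_timeout` and the timeout, retry, and pause values are assumptions chosen for illustration, and the only SnapDrive command it uses is the `sdcli disk connect` line copied from the log above.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_secs=180, attempts=3, pause_secs=30):
    """Run cmd (a list of arguments), retrying on timeout or non-zero exit.

    Returns a list of (attempt, outcome, elapsed_secs) tuples, where outcome
    is "ok", "failed" (non-zero exit code) or "timeout".
    Hypothetical helper for illustration; the defaults are assumptions.
    """
    history = []
    for attempt in range(1, attempts + 1):
        started = time.monotonic()
        try:
            # subprocess.run kills the child process if the timeout expires.
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_secs
            )
            outcome = "ok" if result.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            outcome = "timeout"
        elapsed = round(time.monotonic() - started, 1)
        history.append((attempt, outcome, elapsed))
        if outcome == "ok":
            break
        if attempt < attempts:
            time.sleep(pause_secs)  # give the host a moment before retrying
    return history


if __name__ == "__main__":
    # Same command and arguments as in the error output above.
    connect_cmd = [
        "sdcli", "disk", "connect",
        "-m", "ServerB",
        "-p", "172.xx.xx.xx:/vol/nb1serverbdb/lun01",
        "-d", "W:\\",
        "-IG", "ServerB", "ServerB",
        "-dtype", "dedicated",
    ]
    for attempt, outcome, elapsed in run_with_timeout(connect_cmd):
        print(f"attempt {attempt}: {outcome} after {elapsed}s")
```

If the elapsed times grow from one attempt to the next before the server finally stops responding, that would at least show whether the slowdown builds up gradually or appears all at once.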
- Windows Server 2003 R2 64-Bit SP2
- SQL Server 2005 SP2
- SnapDrive 6.2.0.4519
- NetApp Windows Host Utilities 5.2.3297.2229
- Snap Manager for SQL Server 5.0
- Data ONTAP DSM for Windows MPIO 3.3.3298.1305
- NetApp Data ONTAP 7.3.3P2
Is anyone else experiencing these same issues, or does anyone have any ideas about what could be causing our problems?