2014-05-18 02:31 PM
We've been working on this issue for a month now and have tickets open with NetApp and Microsoft, but we have not been able to resolve it yet. We are seeing problems in our Windows clusters after migrating data from EMC storage to NetApp storage; the data was migrated with a FalconStor NSS device. Two Windows clusters (2008 and 2003) are affected, while single (non-clustered) hosts are working OK.
After mapping the NetApp LUNs to the hosts, they do not restart properly and hang on a black screen. Even when Windows does not start and sits at the black screen, the server is still pingable. If we unplug the FC cables from the HBAs, the hosts restart OK and we can then plug the cables back in. Another workaround is to unmap the LUNs on the NetApp, boot the host, and map the LUNs back to the server; after that the hosts see all the NetApp LUNs normally. Sometimes the hosts also boot normally with more than half of the LUNs mapped (we have unmapped some LUNs). The NetApp software (DSM, Host Utilities Kit) and the HBA driver versions are a supported configuration, and we have removed all EMC software from these servers.
Does anybody have any idea or hint as to why this is happening? We want to find the reason for such strange behaviour.
Could the issue be somehow related to the EMC software that was installed on these hosts before?
2014-05-19 05:06 AM
Please provide the controller type and the version of ONTAP.
Also, are you running the NetApp MPIO software on your clusters?
I'm not familiar with FalconStor, but perhaps someone else can assist.
2014-05-19 05:48 AM
The system is:
We have installed:
Data ONTAP DSM
Previously, this EMC software was installed:
EMC PowerPath LAM for Invista
EMC PowerPath LAM for Generic
EMC PowerPath LAM for Symmetrix
EMC PowerPath LAM for CLARiiON
Microsoft Multi-Path Device Specific Module
Microsoft Multi-Path Bus Driver
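To help rule out PowerPath leftovers, the DSM that has claimed each disk can be checked from an elevated prompt on the 2008 cluster nodes using the standard Windows MPIO tooling (a rough sketch; exact output varies by OS version):

```shell
:: List MPIO-managed disks and the DSM controlling each one.
:: Every NetApp LUN should be claimed by the Data ONTAP DSM,
:: not by msdsm or any remaining PowerPath DSM.
mpclaim -s -d

:: Show the device hardware IDs currently claimed for MPIO.
mpclaim -h

:: Enumerate third-party driver packages still in the driver store;
:: any leftover EMC/PowerPath INF files would show up here.
pnputil -e
```

Note that `mpclaim` shipped with Server 2008; on the 2003 cluster the controlling multipath driver would have to be verified through Device Manager or the vendor's MPIO snap-in instead.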
2014-05-20 01:32 AM
Given that this is a MetroCluster, this opens up all sorts of possible scenarios.
Are the LUNs being mounted on the Local or Remote Controller?
Is the connectivity between Controllers and Shelves working correctly?
Are you presenting via SnapDrive? (Highly recommended)
2014-05-20 01:40 AM
LUNs are being mounted from both locations. The reason is that one MetroCluster site has larger SATA disks and the other site has faster SAS disks. LUNs from the Windows cluster that need better performance are placed on the faster SAS disks, and the others on SATA disks. There is also a lot more space at the location with the SATA disks. It's a matter of design – the customer insisted on this configuration.
We do not use SnapDrive, and connectivity is OK – this was confirmed by NetApp support.
It now also happens that a server will boot normally a few times, then fail to boot several times, and so on.
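Since the hang only occurs with the LUNs mapped, it may be worth comparing what each controller thinks is mapped against what the hosts see at boot. A rough check, assuming 7-Mode CLI syntax (igroup and LUN names are site-specific):

```shell
# On each MetroCluster controller: list LUNs with their
# igroup mappings and LUN IDs (look for duplicate IDs).
lun show -m

# Confirm the host HBA WWPNs are actually logged in over FC.
fcp show initiators

# Check igroup type, OS setting and ALUA match the host config.
igroup show -v
```

A duplicate LUN ID presented from both controllers, or a WWPN missing from an igroup on one site, could explain why a host hangs only when all paths are cabled and boots fine once some LUNs are unmapped.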
2014-05-20 01:48 AM
Lots of dependencies on the inter-site communication for normal production then.
My best bet would be that latency is causing the problem. I'd be looking at the HBA logs and Windows logs to see what's being posted.
It might be worth removing the Microsoft and NetApp MPIO software, rebooting, and reinstalling.
Also, unless you are using the likes of CommVault/SnapProtect, I'd present the LUNs via SnapDrive.
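For the Windows log review, on the 2008 nodes the disk-stack events around a failed boot can be pulled in one go with PowerShell (the provider-name filter below is an assumption; adjust it to whatever source names actually appear in Event Viewer on these hosts):

```shell
# Pull recent System-log events from the multipath/disk stack
# (mpio, msdsm, disk, and the Data ONTAP DSM provider).
Get-WinEvent -LogName System -MaxEvents 500 |
  Where-Object { $_.ProviderName -match 'mpio|msdsm|disk|ontap' } |
  Format-Table TimeCreated, ProviderName, Id, Message -AutoSize
```

Path-removal or reservation-conflict events timestamped during the black-screen window would support the latency theory.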