2012-02-26 07:27 PM
When you say “cluster failover” do you mean NetApp takeover/giveback or host clustering?
For NetApp, the usual cause is improperly configured timeouts in the host's drivers. Another possibility is an incorrect connection (e.g. no path via the partner).
2012-02-27 06:11 AM
We are running 3160s for our ESX guest images, RDMs, and NAS storage. We had a bad CNA card that had to be replaced, and the failover did not impact any of the services, including ESX guests with RDMs and NAS storage. The NAS side had a couple of timeouts, but none of the LUNs reported any issues.
Drivers and settings are always important, but the HBA's drivers and settings are extremely important. We had issues with the VMs at one point when their default timeout settings were too low: they would get timeout errors pretty regularly when accessing their RDMs. So our usual steps:
1.) what is different between the working and affected hosts?
2.) drivers/settings/config checks
3.) best practices for the equipment involved checks
4.) multipathing software checks
5.) and my favorite... log scrubbing on everything to see what each piece reported (can be tedious and time-consuming)
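The log scrubbing in step 5 can be partly scripted. What follows is only a rough sketch, not anyone's actual tooling: it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, and the keyword list, `scrub`, and `merge_timeline` are all made up for illustration. Adjust them to whatever your HBA driver, multipathing software, and filer really write.

```python
import re
from datetime import datetime, timedelta

# Hypothetical keyword list; replace with the strings your own
# drivers, multipathing stack, and filer actually log.
KEYWORDS = re.compile(
    r"timeout|timed out|path.*(down|dead|failed)|takeover|giveback",
    re.IGNORECASE,
)


def scrub(lines, source, event_time, window_minutes=10):
    """Return (timestamp, source, message) tuples for lines that fall
    within window_minutes of event_time and match KEYWORDS.

    Assumes every line begins with 'YYYY-MM-DD HH:MM:SS ' (an assumed
    format); lines that do not parse are skipped.
    """
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # no leading timestamp in the assumed format
        message = line[20:].strip()
        if abs(ts - event_time) <= window and KEYWORDS.search(message):
            hits.append((ts, source, message))
    return hits


def merge_timeline(*hit_lists):
    """Merge hits from several sources into one list sorted by time."""
    return sorted((h for hits in hit_lists for h in hits), key=lambda h: h[0])
```

Run `scrub` once per host or filer log with the approximate failover time, then pass all the results to `merge_timeline` to see what each piece reported in order.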
2012-02-29 06:45 AM
There really isn't anything to configure on the filer end in a cluster, so all of your hosts should be running the same version of the HBA driver and have the same MPIO settings, the same firmware settings, etc.
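One way to check that the hosts really do match is to collect each host's settings into a dictionary and flag whatever deviates from the majority. This is a minimal sketch under assumptions: `find_outliers` and the setting names used with it are hypothetical, and how you gather the values (driver version, MPIO policy, firmware settings) depends on your hosts.

```python
from collections import Counter


def find_outliers(hosts):
    """hosts maps hostname -> {setting_name: value}.

    Returns {setting_name: {hostname: value}} for every host whose value
    differs from the most common value of that setting. A setting that is
    missing on a host is reported as None. On a tie, the value seen first
    is treated as the reference.
    """
    setting_names = sorted({name for cfg in hosts.values() for name in cfg})
    outliers = {}
    for name in setting_names:
        values = {host: cfg.get(name) for host, cfg in hosts.items()}
        most_common, _count = Counter(values.values()).most_common(1)[0]
        odd = {h: v for h, v in values.items() if v != most_common}
        if odd:
            outliers[name] = odd
    return outliers
```

An empty result means every host reported the same value for every setting that was collected; anything else lists which hosts to look at first.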