Cluster Failover - VMware NFS/iSCSI

jasonnash · ‎2011-08-18

Hi All,

I have a few queries. We have a vSphere 4 setup with a NetApp FAS2040 dual controller serving NFS datastores and some iSCSI RDMs. I need to ensure correct failover operation in the event of a controller failover or a problem with a network switch. My questions are below.

1. Controller failover is configured, and I know from initial testing that we can fail over correctly between controllers. I have not attempted this with live virtual machines running. What do I need to do to ensure that VMs running from NFS datastores (some with iSCSI RDMs) continue to operate in the event of a controller failover? I set all the recommended values within VMware using the NetApp VSC, and on some VMs I have run the guest OS tools/script from NetApp to set the SCSI I/O timeouts. Is this all that needs to be done to ensure VMs continue to run during a controller failover?

2. Currently both controllers have trunked interfaces for the storage network: both interfaces of controller A go into switch A and both interfaces of controller B go into switch B. These switches then connect to our vSphere hosts, and connectivity from both switches is redundant on the VMware side. I would like to configure NetApp cluster failover to protect against network switch failure (i.e. both interfaces in the storage vif go down). I have seen some details about the commands to achieve this, but I am not 100% sure which commands are required. I should also mention that the other two interfaces on the controllers are trunked, go to a different switch, and are used for CIFS traffic. I only want cluster failover to occur if the two interfaces that make up the storage vif fail; I am not concerned about the CIFS interfaces, as they connect to a redundant switch anyway. Is it possible to fail over only in the event of a specific interface failure?

Hope this makes sense, can anyone assist?

Thanks

Jason

rwelshman · ‎2011-08-19

1) Yes, as long as the datastore settings within VMware are set as recommended by NetApp and the VMs are running the tools that set the SCSI timeout to at least 60 seconds, they should survive. (I just did this last night on a 6070 filer with over 370 VMs running.)

2) If you want to split the connections between switch A and switch B (instead of just going to switch A or switch B), then you'll have to change the VIF from multimode (I'm assuming it is multimode) to single mode, unless the switches you are using can port channel across multiple switches. If you do have to change the VIF type, it will require removing the old VIF and creating a new one, which will cause a service interruption, although you can cut and paste the commands in and have it complete fairly quickly if all goes well. What sort of constraints are you under in terms of outages?
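On point 1, for Linux guests the per-disk SCSI timeout can be read from sysfs at /sys/block/sdX/device/timeout. Below is a minimal Python sketch for checking it against the 60-second minimum mentioned above. It assumes a Linux guest with sysfs mounted; the function names are made up for illustration, and this is not the NetApp guest OS script itself, just a read-only check.

```python
from pathlib import Path

# Minimum guest SCSI timeout (seconds) quoted in this thread.
MIN_TIMEOUT_SECONDS = 60


def scsi_timeouts(sys_block="/sys/block"):
    """Return {device_name: timeout_seconds} for sd* disks under sys_block."""
    timeouts = {}
    for disk in sorted(Path(sys_block).glob("sd*")):
        timeout_file = disk / "device" / "timeout"
        if timeout_file.is_file():
            timeouts[disk.name] = int(timeout_file.read_text().strip())
    return timeouts


def devices_below_minimum(timeouts, minimum=MIN_TIMEOUT_SECONDS):
    """Return the sorted names of devices whose timeout is under the minimum."""
    return sorted(dev for dev, secs in timeouts.items() if secs < minimum)


if __name__ == "__main__":
    current = scsi_timeouts()
    for dev, secs in current.items():
        print(f"{dev}: {secs}s")
    low = devices_below_minimum(current)
    if low:
        print("Below minimum:", ", ".join(low))
    else:
        print("All sd* disks meet the minimum timeout.")
```

Run it inside each Linux VM; any disk listed as below the minimum is one where the timeout setting has not been applied. Windows guests keep the equivalent value in the registry, so this sketch does not cover them.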

jasonnash · ‎2011-08-19

Thanks Riley, exactly what I was after.

1) All timeouts look good, so as you said this should work. I think I will test with a few test VMs. What is the worst case for an incorrect failover: do the VMs crash and require a restart, or could you potentially see corrupt VMs?

2) Pretty much as I thought. Ideally I want to keep multimode for performance. I was looking at putting in some stackable switches that offer cross-switch port channel. However, when looking into this, I came across some details about a cluster failover on network failure option, cf.takeover.on_network_interface_failure. Could this achieve what we are looking to do by failing over the cluster if both interfaces in the vif go down? If so, any pointers or recommendations on setting this up?

Thanks

Jason