NetApp + VMware ESX 4 multipath problem

dorian_attard · ‎2010-03-23

Hi,

I have the following setup:

A FAS2020A (12 disks) with 2 nodes (Node A & Node B) connected to a DS14mk4 (14 disks) using FC, running Data ONTAP 7.3.2 and configured as an active/active cluster.

Node A using FAS2020A disks
Node B using DS14mk4 disks
Node A FC port 0a connected with blade center brocade switch 3
Node B FC port 0a connected with blade center brocade switch 4
Node A FC port 0b connected with DS14mk4
Node B FC port 0b connected with DS14mk4

A blade center with 5 blades, each with VMware ESX 4 installed locally.

Configured LUNs on both nodes. Configured zones and igroups. Attached the configured LUNs to each respective blade. Installed virtual machines successfully.

I didn't encounter any problems with the above configuration and steps, but the big problem I'm facing comes when I try to take over one of the nodes to test redundancy. When one of the nodes is down, I cannot access the LUN configured on the 'failed' node from the blades attached to it. I can still access the volumes and LUNs from FilerView or from System Manager, but not from the ESX server.

What I'm noticing when all nodes are up is that, in the vSphere Client under the Storage tab, there is only 1 active connection to the LUN. I think I should see a second, passive connection to the LUN from the same HBA for multipathing.
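
For illustration, the path count can be reasoned about with a small Python sketch of the cabling listed above. It is only a toy model: it assumes that switches 3 and 4 are separate fabrics and that each blade has one HBA port on each switch, neither of which is confirmed in this thread, and the names (hba0, hba1, direct_paths) are made up. Under those assumptions each controller is reachable through exactly one HBA port, which would be consistent with seeing a single path per LUN.

```python
# Toy model of the cabling described in this post. Assumptions (not confirmed
# in the thread): switches 3 and 4 are separate fabrics, and each blade has
# one HBA port on each switch. All names are illustrative.

# Controller target ports and the switch each one is cabled to.
TARGET_PORTS = {
    ("NodeA", "0a"): "switch3",
    ("NodeB", "0a"): "switch4",
}

# Hypothetical host HBA ports and the switch each one is cabled to.
HOST_HBA_PORTS = {
    "hba0": "switch3",
    "hba1": "switch4",
}


def direct_paths(controller):
    """Return (hba, target) pairs where the HBA shares a switch with the controller."""
    paths = []
    for hba, hba_switch in HOST_HBA_PORTS.items():
        for (node, port), target_switch in TARGET_PORTS.items():
            if node == controller and hba_switch == target_switch:
                paths.append((hba, f"{node}:{port}"))
    return paths


if __name__ == "__main__":
    for node in ("NodeA", "NodeB"):
        print(node, direct_paths(node))
```

If the real cabling or zoning differs from these assumptions, the counts will differ too; the sketch only shows how a single target port per controller limits the number of direct paths.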

Do you have any idea what can be wrong?

Any help will be really appreciated.

Thanks,

Dorian

evilensky · ‎2010-03-25

How are zones and igroups configured? Are you using ALUA?

ogra · ‎2010-03-25

Have you installed the NetApp Host Attach Kit and FC Utilities?

You can now get all of these through a new tool called 'NetApp Virtual Storage Console'. It will highlight whether there are any MPIO issues for that ESX server.

All the best !!!

radek_kubka · ‎2010-03-26

As per post from evilensky - what type of zoning are you using?

One hypothetical explanation is that you have zoning based on port numbers. If that's the case, then when you do a controller failover, the failed controller's virtual instance runs on the partner controller and uses different physical ports on the FC switch, which do not belong to the ESX host zone.
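
To make the hypothesis concrete, here is a toy Python model of port-based versus WWPN-based zoning. Everything in it is hypothetical: the port numbers, the WWPN strings and the function names are invented, it uses a single switch for simplicity, and it simply assumes that after takeover the failed controller's target identity shows up on a different switch port. It is not a description of how any particular switch or filer behaves.

```python
# Toy model of port-based vs WWPN-based zoning during a controller takeover.
# Hypothetical values throughout; it assumes the failed controller's target
# WWPN reappears on the partner's switch port, as the hypothesis describes.

HOST_WWPN = "host-wwpn"
NODE_A_WWPN = "nodeA-wwpn"

# Which WWPN is logged in on which switch port, before and after takeover.
LOGINS_NORMAL = {1: HOST_WWPN, 5: NODE_A_WWPN}
LOGINS_TAKEOVER = {1: HOST_WWPN, 6: NODE_A_WWPN}  # Node A identity on partner's port

PORT_ZONE = {1, 5}                     # zone members are switch port numbers
WWPN_ZONE = {HOST_WWPN, NODE_A_WWPN}   # zone members are WWPNs


def reachable_by_port_zone(logins, zone_ports, a, b):
    """Both WWPNs must currently sit on switch ports that are zone members."""
    port_of = {wwpn: port for port, wwpn in logins.items()}
    return port_of.get(a) in zone_ports and port_of.get(b) in zone_ports


def reachable_by_wwpn_zone(logins, zone_wwpns, a, b):
    """Both WWPNs must be logged in somewhere and be zone members themselves."""
    present = set(logins.values())
    return {a, b} <= present and {a, b} <= zone_wwpns


if __name__ == "__main__":
    for name, logins in (("normal", LOGINS_NORMAL), ("takeover", LOGINS_TAKEOVER)):
        print(name,
              reachable_by_port_zone(logins, PORT_ZONE, HOST_WWPN, NODE_A_WWPN),
              reachable_by_wwpn_zone(logins, WWPN_ZONE, HOST_WWPN, NODE_A_WWPN))
```

In this model the port-based zone stops matching once the target moves to port 6, while the WWPN-based zone keeps matching wherever the WWPN logs in.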

Regards,
Radek