Solved: LUNs not visible after active/active takeover

matt_stevens · ‎2011-11-29

We have 2 heads set up in an active/active cluster running Data ONTAP 7.3.6. Takeover and giveback complete without issue, but during a takeover the LUNs are no longer visible to the ESX hosts that use them. Each head is a separate NetApp head unit, and the two are connected with interconnect cables. Each head has a fiber connection to a different fiber switch (head A to switch A, head B to switch B). I'm looking for ideas on what might be wrong with our setup.

paleon · ‎2011-11-29

Are you using FC, FCoE, or iSCSI LUNs?

If you are using iSCSI LUNs:
* The ESX hosts will need to be configured to handle up to 120 seconds of storage connectivity loss. Please refer to NetApp's best practices TR for VMware environments (http://media.netapp.com/documents/tr-3428.pdf) for more details on configuring guest OSes and ESX hosts for loss of iSCSI or NFS connectivity. There are other documents for later versions of VMware, but from what I recall the TR I referenced is very thorough.

If you are using FC,
* I would suggest confirming that the NetApp controllers are configured to use the same world-wide node name (WWNN). This can be configured through software if there is a mismatch.
* The FC ports will have different world-wide port names (WWPNs). In our environment, our zones are set up using WWPN, not WWNN. I'm not sure whether this is consistent between SAN manufacturers. If the zones connecting the ESX hosts and the NetApps are based on WWPNs (which I suspect they will be), the target WWPN through which a LUN is presented will change when the LUN moves between controllers. You may need to configure additional zones to enable connectivity.
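
To illustrate the WWPN point, here is a minimal Python sketch of how WWPN-based zoning can hide a LUN after takeover. The port names (`esx-hba1`, `headA-0a`, `headB-0a`) and the `visible_targets` helper are made up for illustration, and the model assumes that during takeover only the surviving head's target ports are logged in to the fabric.

```python
# Hypothetical port names standing in for real WWPNs.
ESX_HBA = "esx-hba1"
HEAD_A_PORT = "headA-0a"
HEAD_B_PORT = "headB-0a"


def visible_targets(initiator, zones, online_ports):
    """Return ports that share a zone with the initiator and are online."""
    reachable = set()
    for members in zones.values():
        if initiator in members:
            reachable |= (members - {initiator}) & online_ports
    return reachable


# Original zoning: the ESX HBA is zoned only to head A's target port.
zones_before = {"z_esx_headA": {ESX_HBA, HEAD_A_PORT}}

# Zoning with an additional zone to head B's target port.
zones_after = dict(zones_before)
zones_after["z_esx_headB"] = {ESX_HBA, HEAD_B_PORT}

# Assumption: during takeover only head B's port is logged in.
takeover_ports = {HEAD_B_PORT}

lost = visible_targets(ESX_HBA, zones_before, takeover_ports)
kept = visible_targets(ESX_HBA, zones_after, takeover_ports)
print("before fix:", lost)
print("after fix:", kept)
```

With only the head A zone, the host has no reachable target port once head A is taken over; adding the head B zone keeps one path available.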

If you are using FCoE,
* I'm sorry, but we have yet to deploy FCoE, so I have limited knowledge on this point. I would recommend checking the latest NetApp TRs for SAN best practices.

Please let me know if any of these suggestions help identify the problem or if I can offer any further suggestions.

Sincerely,

Bill

matt_stevens · ‎2011-11-29

Aha! OK, I think this might be getting somewhere. We are using FC, by the way. The switches are using WWN, which I believe is actually WWPN in the case of our FC switches (Brocade). I'm thinking what I probably have to do is connect BOTH heads to each FC switch, and then set up a zone that has both WWPNs, or at least create an alias that contains both WWPNs... I assume this is what you did in your config with the different WWPNs?

paleon · ‎2011-11-29

That sounds about right. We use Cisco switches, but I expect the concepts are similar.
1. We create zones for each pairing of server HBA and NetApp target interface.
1a. You can put more than two elements in a zone, but this creates unnecessary connections within the zone fabric. If hba1, hba2, and target1 are in the same zone, the SAN switch creates connections between hba1 & target1, hba2 & target1, and hba1 & hba2. The connection from hba1 & hba2 is unused. In large SAN environments, these unnecessary connections can be an issue. In small SAN environments, the impact should be minimal.
2. We add the zones to the zoneset
3. We activate the zoneset. This applies the updates to the active zoneset.

If the zone changes require any additional igroups or LUN mappings, you will need to make those changes as well.
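
As a rough illustration of step 1, the sketch below generates one zone per initiator/target pair in Python. The device names, the single-digit placeholder WWPNs, and the `pair_zones` helper are assumptions for the example; this is not switch syntax for Cisco or any other vendor.

```python
from itertools import product


def pair_zones(initiators, targets):
    """Build one zone per (initiator, target) pair, keyed by zone name."""
    zones = {}
    for (i_name, i_wwpn), (t_name, t_wwpn) in product(
        initiators.items(), targets.items()
    ):
        zones[f"z_{i_name}_{t_name}"] = (i_wwpn, t_wwpn)
    return zones


# Placeholder WWPNs; real WWPNs are 8-byte colon-separated values.
initiators = {"esx1_hba1": "1", "esx2_hba1": "2"}
targets = {"headA_0a": "5", "headB_0a": "6"}

zones = pair_zones(initiators, targets)
for name, members in sorted(zones.items()):
    print(name, members)
```

Each generated zone would then be added to the zoneset, and the zoneset activated, using whatever commands the switch vendor provides.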

In the attached diagram, I used numbers 1-6 for the WWPN of the devices and included a conceptual list of commands (they are not the correct Cisco commands) for the SAN switch config.

Please let me know if you have any other questions.

matt_stevens · ‎2011-11-30

This was the fix to my issue!! Thanks so much for pointing me in the right direction and providing such helpful information!

paleon · ‎2011-12-01

You're very welcome. I'm glad we were able to figure out the problem and resolve it.

scottgelb · ‎2011-12-01

Most of our customers zone this way too, but some follow a zone per initiator scheme...what are your thoughts on that instead of a zone per initiator/target pairs?

paleon · ‎2011-12-01

If I understand the "zone per initiator" model, each zone will contain:
- One and only one initiator HBA
- One or more target HBAs

Is this correct?

First of all, I would like to be clear on something. While I have worked with FC SAN for several years, I do not consider myself an expert. I based my decision to deploy zones with one initiator and one target on two things.

First, I read best practices documents and manuals from my SAN switch manufacturer. I do not have the document handy, so I cannot reference it directly. From what I recall, the Cisco best practices guide for NX-OS (I cannot recall whether it was 3.x or 4.x) indicated that for small SAN fabrics, placing more than two (2) HBAs in a zone would not create issues; however in large SAN environments, the wasted resources could become an issue.

Second, I worked in an environment where the SAN switches were daisy-chained together instead of kept in a strict A-B topology. This was done due to a combination of physical plant constraints, a requirement to manage all zones and zonesets from one SAN switch, and a lack of understanding of why strict A-B topologies are such a very good idea. In this environment, each zone contained all initiator HBAs on the host, all target HBAs on all required SAN controllers, and all target HBAs on all tape drives. This model made some aspects of administration easier. Unfortunately, it added a very difficult problem. Because there was more than one SAN switch but only one SAN fabric, and because all initiator HBAs on a host could see all target HBAs on the necessary storage controllers, it was very easy for a host to choose an initiator-to-target path that crossed the inter-switch links. The SAN switches were capable of line speed between ports on the same switch, but the inter-switch links contained only two (2) interfaces per switch. In other words, the inter-switch links could easily become a bottleneck for SAN traffic. To make matters worse, the hosts could not easily identify which initiator-to-target connections crossed inter-switch links and which remained within a single switch. As a result, I strongly recommend using a strictly A-B SAN topology.
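
To make the inter-switch link problem concrete, here is a small Python sketch that flags which initiator-to-target paths would cross an ISL in a single daisy-chained fabric. The port names, switch names, and the `crosses_isl` helper are hypothetical; the only input is an assumed map of which switch each port is plugged into.

```python
def crosses_isl(initiator, target, port_to_switch):
    """A path crosses an ISL when its two ends sit on different switches."""
    return port_to_switch[initiator] != port_to_switch[target]


# Assumed cabling: one HBA and one target port per switch.
port_to_switch = {
    "hba1": "sw1",
    "hba2": "sw2",
    "tgt1": "sw1",
    "tgt2": "sw2",
}

# With every initiator zoned to every target, all four paths are usable.
paths = [(i, t) for i in ("hba1", "hba2") for t in ("tgt1", "tgt2")]
crossing = [p for p in paths if crosses_isl(*p, port_to_switch)]
print("paths crossing the ISL:", crossing)
```

In this layout half of the usable paths traverse the ISL, and the host's multipathing has no obvious way to tell them apart. With a strict A-B topology there is no ISL between the two fabrics, so such paths cannot exist.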

That being said, if my customers have maintained a strict A-B topology and the number of wasted connections is insignificant to the switch's/fabric's maximum number of connections, then I wouldn't worry about putting more than two (2) HBAs per zone.
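
For a rough sense of scale, the Python sketch below counts the pairings in a single zone and how many of them are initiator-to-initiator or target-to-target pairings that carry no storage traffic. The `zone_pairs` helper is made up for this example, and it assumes, as described earlier in the thread, that the switch permits a connection between every pair of zone members.

```python
from math import comb


def zone_pairs(n_initiators, n_targets):
    """Return (total, useful, unused) member pairings for one zone."""
    total = comb(n_initiators + n_targets, 2)
    useful = n_initiators * n_targets  # initiator-to-target pairings only
    return total, useful, total - useful


for n_init, n_tgt in [(1, 1), (2, 1), (4, 4)]:
    total, useful, unused = zone_pairs(n_init, n_tgt)
    print(f"{n_init} initiators, {n_tgt} targets: "
          f"{total} pairings, {useful} useful, {unused} unused")
```

The two-initiator, one-target case reproduces the hba1/hba2/target1 example: three pairings, one of them unused. The unused count grows quickly as more HBAs share a zone, which is why it matters more in large fabrics.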

On the bright side, since HBAs can be members of more than one zone, it may be possible to replace the more-than-two-HBA zones with several only-two-HBA zones without client disruption. However, I would confirm with my SAN switch manufacturer (and with some testing, if possible) that such a migration is supported.

scottgelb · ‎2011-12-01

Correct: 1 initiator and multiple targets, but with two fabrics, so each initiator is zoned only with targets on the same fabric. Having a zone per initiator/target pair works but can mean more entries. If disk and tape are mixed, though, I wouldn't use the same initiator for both, as mentioned. But I'm wondering if anyone has seen issues with single-initiator zones to multiple disk targets on the same fabric.