Re: Filer HA Pairs and SRM 5.1 - Array Managers

VMCREATOR · ‎2013-11-06

Just a quick question. I am hoping you may help me.

If you have two NetApp NFS filers in a HA pair in protected site (each have separate IP addresses) and two NetApp NFS filers in a HA pair in the recovery site (each have separate IP addresses).

When you configure SINGLE SRM array managers for each site it asks for only 1 IP address for one of the filers in both sites, and you can pair them accordingly.

If a NetApp NFS filer HA failover occurs then the secondary filer is then the Primary and SRM cannot reference this other IP address and the array pairing will be disconnected.

The field for several NFS addresses separated by commas is still tied to the controller IP address, so this does not help.

I cannot find any information anywhere of how SRM works in a NetApp NFS based HA pair.

Other storage systems use a Virtual IP address (VIP) to do this, as it is a cluster.

Cheers

Rob

CJSCOLTRINITY · ‎2013-11-07

Bob, are all your NFS datastores served from the Primary filer with the secondary only being used if the Primary fails or are they distributed between the two filer heads at the Protected site?

If you have datastores presented from both filer heads then you need to configfure BOTH filers in SRM as seperate Array Managers.

Have you used the vif_mgt address in the array manager or the e0M address? You need to use the vif_mgt address (see /etc/hosts) in the array manager.

In this case vif_mgt is a ifgrp using e0a and e0b and configured in the /etc/rc file to be taken over by an equivalant ifgrp on the partner filer.

VMCREATOR · ‎2013-11-07

Calvin said

"are all your NFS datastores served from the Primary filer with the secondary only being used if the Primary fails or are they distributed between the two filer heads at the Protected site?"

Yes I believe they are.

Below is the rc file and other information and hosts file.

I have substituted the real controller name with acme-protected. See contents of each etc/rc file for all controllers.

The information below is from documents left for me to peruse. I am no NetApp expert so I cannot comment much deeper than what is shown below:

We have two FAS2240-2 controllers in Protected Site in a HA pair.

We have two FAS2240-2 controllers in Recovery Site in a HA pair.

OnTap version is 8.1.3 7-mode.

VMware VSC is 4.2

OnCommand System Manager is version 2.2

Snapmirror is being used to replicate NFS LUNs from Protected to Recovery site.

Both sites have a single FAS2240-2 filer, each of which will have two controller head units in a single 2u chassis. These controller heads have a total of 6 network interfaces, plus 2* management interfaces. 2 of the network interfaces are 10 Gigabit SFP (10Gbe), all others are CAT6/RJ45 gigabit (Gbe).

The vSphere hosts have access to the NFS volumes via one of two 10Gbe network interfaces, the filer head has both 10Gbe interfaces connected to a single Cisco 4900m switch using the Fibre SFP cables provided. The 2* 10Gbe Cisco 4900m ports used by the Netapp interfaces are configured with LACP for load balancing for an aggregated throughput of 20Gbps. In the event of a single cable/port failing then connections will continue but at a reduced throughput of 10Gbps.

In addition to the base FAS2240 licence kit, other licenses required include:

Snapmirror
Flexclone
Snaprestore
SMVI Backup

The design for <Datacentre 1> is to connect each of the Netapp FAS2240 filer controllers to the Cisco 4900M switches, with two 10Gbe interfaces from each controller (blue connections above) used as the primary uplink for NFS Storage traffic. The two 10Gbe connections will be connected to one physical Switch and configured as an LACP aggregated pair.

A pair of 1Gbe interfaces (green connections above) will also be configured in an LACP aggregated pair and will act as a standby connection for NFS Storage traffic. These 1Gbe interfaces will be physically connected to the second Cisco 4900 switch. The LACP pairs (blue & green) will then be paired together using Netapp Virtual interfaces to create a resilient link for VMWare NFS Storage traffic.

In addition to the Virtual Interfaces used for NFS storage traffic, connections will be required for management of the Netapp controllers. This will consist of resilient links for management of the ONTAP system using Netapp tools such as the Virtual Storage Console in the vSphere client and Netapp System manager, but there will also be a link for ‘Out of band’ management, connecting to the dedicated Cisco 3750 management/iLO switch.

The 10Gbe interfaces on each controller will be paired together into a dynamic Multimode VIF, with LACP used for link aggregation. With this configuration the Netapp controllers at the recovery site will have the same network throughput as the controllers at <DC1> but as there is only one 4900m Switch available at <DC2>, there will be no resilience against a switch failure.

Management/administration access to the ONTAP system on each Netapp Controller will be over a single 1Gbe connection to the Cisco 4900.

While there is only 1 physical switch at <DC2>, the intention is to create the Virtual interfaces as close to the configuration at <DC1> as is practical. This will make any future changes easier as the VIF configuration will already exist.

The blue connections in the diagram above show two 10Gbe SFP cables from each Netapp controller, connecting to a single Cisco 4900 Switch. Each Netapp controller will have both 10Gbe interfaces configured in a dynamic Multimode VIF with LACP used for load balancing and resilience (load balancing will be based on source/destination IP address). Controller A of the FAS2240 will have both 10Gbe SFP cables connected to the 4900_A switch. Controller B of the FAS2240 will have both 10Gbe SFP cables connected to the second 4900m switch (4900_B).

This configuration will result in a single virtual interface (VIF) on each controller. However on its own, while this VIF configuration will offer high throughput speeds and resilience against port/cable failure, it will not protect against switch failure. As the 4900m Switches do not support cross switch etherchannel/LACP, another VIF is required to act as a standby for NFS traffic in case of switch failure.

A dynamic multimode VIF using the two 10Gbe interfaces will be created, with the two 10Gbe interfaces on the controller assigned. A second Dynamic Multimode VIF will also be created, but with two of the 1Gbe interfaces from the Netapp controller (Green connections in the diagrams) added to this VIF. The two 1Gbe interfaces will then be connected to the second Cisco 4900 switch and configured with LACP, using source/destination IP Address load balancing.

Once this second (1Gbe) dynamic VIF has been configured, a static multimode VIF will be created (second level VIF) which will be used to create an Active/Standby pairing. One interface is chosen as the preferred uplink and the other as standby only. In our design, we will add both of the dynamic multimode VIF’s to this static VIF and configure the 10Gbe (blue) VIF as the primary uplink, with the 1Gbe (green) VIF as standby.

This design will provide automatic failover if one of the 4900m switches were to fail, although this will be at a reduced throughput (combined equal to 2Gbps as opposed to the 20Gbps we would expect to see in normal operations). If the switch failure is likely to be for an extended period, then a manual failover to the secondary Netapp controller can be performed using the CF-takeover command. The secondary controller would then direct traffic via its own (aggregated) 10Gbe uplinks.

The vSphere hosts will access the NFS volumes over a dedicated, layer 2 VLAN with no route to any other networks. Access to this VLAN is via direct connection on the VLAN only. Each Netapp Controller will have 2 IP addresses alias’ specified within the Ctrlr_a_vif_nfs_static VIF. This is to support load sharing across the two 10Gbe interfaces.

NFS volumes will be exported using one of these two IP addresses, with the Netapp VSC within vCenter being used to mount these NFS exports correctly on the vSphere hosts. Separating NFS exports onto two IP addresses like this means that the IP Hash calculation used by LACP will utilise both physical interfaces when a vSphere host is accessing multiple NFS Datastores.

The alias IP addresses will be assigned to the second layer VIF (Ctrlr_a_vif_nfs_static; described in the last section). This ensures that the IP addresses will remain in place in the event of a failover to the standby VIF.

dvSwitches have been replaced by standard VMware vSwitches Calvin !. They had problems when vCenter was down ?

ESXi 5.1 VLANs

DC1, host1	dvSwitch2	10Gbe – Port 1	Vmnic8	dvuplink1	Xx/yy	2,200-208,300-304
DC1, Host1	dvSwitch2	10Gbe – Port 2	Vmnic11	dvuplink2	Xx/yy	2,200-208,300-304
DC1, Host1	dvSwitch1	1Gbe – O/B Port 1	Vmnic0	dvuplink1	Xx/yy	200,204
DC1, Host1	dvSwitch1	1Gbe – PCI Port 1	Vmnic4	dvuplink2	Xx/yy	200,204
DC1, host2	dvSwitch2	10Gbe – Port 1	Vmnic8	dvuplink1	Xx/yy	2,200-208,300-304
DC1, Host2	dvSwitch2	10Gbe – Port 2	Vmnic11	dvuplink2	Xx/yy	2,200-208,300-304
DC1, Host2	dvSwitch1	1Gbe – O/B Port 1	Vmnic0	dvuplink1	Xx/yy	200,204
DC1, Host2	dvSwitch1	1Gbe – PCI Port 1	Vmnic4	dvuplink2	Xx/yy	200,204
DC1, host3	dvSwitch2	10Gbe – Port 1	Vmnic8	dvuplink1	Xx/yy	2,200-208,300-304
DC1, Host3	dvSwitch2	10Gbe – Port 2	Vmnic11	dvuplink2	Xx/yy	2,200-208,300-304
DC1, Host3	dvSwitch1	1Gbe – O/B Port 1	Vmnic0	dvuplink1	Xx/yy	200,204
DC1, Host3	dvSwitch1	1Gbe – PCI Port 1	Vmnic4	dvuplink2	Xx/yy	200,204
DC2, host1	dvSwitch2	10Gbe – Port 1	Vmnic8	dvuplink1	Xx/yy	2,200-208,300-304
DC2, Host1	dvSwitch2	10Gbe – Port 2	Vmnic11	dvuplink2	Xx/yy	2,200-208,300-304
DC2, Host1	dvSwitch1	1Gbe – O/B Port 1	Vmnic0	dvuplink1	Xx/yy	200,204
DC2, host1	dvSwitch1	1Gbe – PCI Port 1	Vmnic4	dvuplink2	Xx/yy	200,204
DC2, Host2	dvSwitch2	10Gbe – Port 1	Vmnic8	dvuplink1	Xx/yy	2,200-208,300-304
DC2, Host2	dvSwitch2	10Gbe – Port 2	Vmnic11	dvuplink2	Xx/yy	2,200-208,300-304
DC2, host2	dvSwitch1	1Gbe – O/B Port 1	Vmnic0	dvuplink1	Xx/yy	200,204
DC2, Host2	dvSwitch1	1Gbe – PCI Port 1	Vmnic4	dvuplink2	Xx/yy	200,204
DC2, Host3	dvSwitch2	10Gbe – Port 1	Vmnic8	dvuplink1	Xx/yy	2,200-208,300-304
DC2, Host3	dvSwitch2	10Gbe – Port 2	Vmnic11	dvuplink2	Xx/yy	2,200-208,300-304
DC2, Host3	dvSwitch1	1Gbe – O/B Port 1	Vmnic0	dvuplink1	Xx/yy	200,204
DC2, Host3	dvSwitch1	1Gbe – PCI Port 1	Vmnic4	dvuplink2	Xx/yy	200,204

acme protected> rdfile /etc/rc

hostname acme-protected

ifgrp create single ctlr_a_vif_mgt e0a e0b

ifgrp create lacp ctlr_a_nfs_1 -b ip e0c e0d

ifgrp create lacp ctlr_a_nfs_10 -b ip e1a e1b

ifgrp create single ctlr_a_nfs_l2 ctlr_a_nfs_10 ctlr_a_nfs_1

ifgrp favor ctlr_a_nfs_10

vlan create ctlr_a_nfs_l2 203

ifconfig ctlr_a_vif_mgt `hostname`-ctlr_a_vif_mgt netmask 255.255.255.0 partner ctlr_b_vif_mgt mtusize 1500

ifconfig e0M `hostname`-e0M netmask 255.255.254.0 mtusize 1500

ifconfig ctlr_a_nfs_-203 `hostname`-ctlr_a_nfs_l2 netmask 255.255.255.224 partner ctlr_b_nfs_-203 mtusize 1500

route add default 10.xxx.xxx.1 1

routed on

options dns.enable on

options dns.domainname acme protected

options nis.enable off

savecore

Note VLan is 203

Hosts:

127.0.0.1 localhost localhost-stack
127.0.10.1 localhost-10 localhost-bsd
127.0.20.1 localhost-20 localhost-sk
10.xxx.226.112 acme recovery acme recovery-ctlr_a_vif_mgt
10.xxx.225.19 acme recovery-e0M
10.xxx.226.12 acme protected
10.xxx.230.137 acme recovery-ctlr_a_nfs_l2

Regards

Bob

JIM_SURLOW · ‎2013-11-07

VMware SRM is going to login to the filer just fine. When the HA pair fails over to the partner, the IP & MAC will move over in your 7-mode configuration. At this point, you haven't mentioned using vFilers (Multi-store), so this will simplify the explanation a little bit (I'll still have to explain in a multi-store config).

So, each filer, has an entity known as vFiler0. This is everything that you are configuring right now. When the HA pair fails over, the surviving filer will be running both its vFiler0 and the failed partner's vFiler0. Imagine 2 VMs running on the same ESX server, for a poor analogy.

So, VMware SRM may be configured to talk to IP1 and IP2 of two separate filers via the API. Upon failover, it still talks to IP1&2. But, now, it is running on the same head.

The only thing that I can't recall right now is that if during failover, if the API interface is working (http / https). I believe that there can be issues with the CLI, depending on the OnTap vintage (need to connect to the partner and then issue the partner command).

The important thing is that since the vFiler context moves over, your user perms, snapmirrors, IPs, are still okay.

VMCREATOR · ‎2013-11-08

Thank for your help on this Jim.

I understand the vFiler / Multistore configuration now and things are becoming more clear.

1. We have two FAS 2240-2 units, one for protected site and one for recovery site with each having a management IP address lets say A & B.

2. Each unit has two heads with NFS IP adddresse's, lets say C&D and E&F.

When I configure the Array Manager firstly in the Protected Site vCenter I enter the Protected Site FAS IP address (A) and its corresponding NFS IP addresses separated with commas C&D. This fails, and only works with just NFS address C entered?.

Is it a case of requiring two array managers at each site?

I have configured Lefthand, Dell Equalogic and HP EVA, of course not NFS, and we provide a VIP and it all works accordingly.

Best Regards

Robert

JIM_SURLOW · ‎2013-11-08

So, if you have filer clusters A&B which have controllers, C&D & E&F, respectively....(it is going to be much easier if you think of this in terms of the individual controllers ... and I'm going to guess that the fact that you have the nice compact 2240-2s that it seems like it is one unit ... but think of them as 2 separate ones in the same box ... it'll be much easier).

Snapmirror relationships would be setup between C&E and D&F. As a result, this will need to show up as 2 separate array pairings in SRM as well. If you are setting up D & F as just failover units (I think I saw that in your longer thread), then there would not be an array pairing configuration. You only need array pairings if you have snapmirrors between the controllers.

I should also point out that I had a great deal of trouble initially setting this up in a multi-store environment because our VMware PS consultants gave us some incorrect (i.e. dated) information. I later discovered that the NetApp TR (http://www.netapp.com/us/system/pdf-reader.aspx?m=tr-4064.pdf&cc=us) for SRM is much more beneficial than the SRA install guide that NetApp has or some of the other docs that VMware publishes.

VMCREATOR · ‎2013-11-09

Hi Jim,

I can qualify your thoughts here, and the SRM stuff looks good to go for the primary (active) heads.

I also have other problems.

Both Protected and Recovery Site controller heads have authenticated over SSL to ONCommand ( 4 heads total )
Both Protected and Recovery Site controller heads are visible in the vCenter – Home Tab – NetApp
The active controller heads for each site ( 2 heads ) are visible in the vCenter – Hosts and Clusters – VSC NetApp Tabs
However the Snapmirror and Backup jobs are not visible in the vCenter – Hosts and Clusters – VSC NetApp Tabs.
The secondary controller heads for each site will not authenticate through the vCenter – Hosts and Clusters – VSC NetApp Tabs, and are showing status = Unknown, then after 10 seconds they drop off out of sight.
I have reinstalled VSC on both sites, but still no joy.
SRM configures okay for the two visible heads , cannot add the other heads in SRM arrays, even though are visible in OnCommand and vCenter Home tabs ???
We have Linked Mode and Multi-Configuration Single Sign On ( vSphere 5.1 Update 1 ). Hmmmm

JIM_SURLOW · ‎2013-11-12

4-6 sound like VSC specific. The plugin doesn't like NATted IPs. So, if your client is using a NAT, that could be an issue. Found that I have RDP to the vCenter box with that connected without NATs to the filers. But, it seems that you've made progress on SRM, so that plugin is functional. So, maybe NATs aren't your issue.

#7 - Do you have user creds installed at the remote sites? Those creds won't move over automatically. Also, DNS needs to be set correctly for this to work - vCenter needs to map the IPnames listed in the snapmirror relationship (see snapmirror status from the CLI) in the same way that the filers do.

#8 - unsure on the impact here.

VMCREATOR · ‎2013-11-15

Hi Jim,

Sorry for the belated reply.

NATs are not our issue, but you gave me clues to check ports are listening as they should. We have dropped vCenter 5.1 and have gone to vCenter 5.5a for improved SSO but keeping our ESXi 5.1 hosts (this could bring more bugs until 1st update).

Things have improved, and all plugins are working. There was some high number ports being blocked and telnet etc showed that flaw up.

Moving on to the reinstall of SRM now with one array manager for each head as suggested.

Thanks for your help on all this.

Regards

VMCreator