VMware Solutions Discussions

SRM5 Stretched VLAN Question

esoriano1

We are in the process of deciding how best to configure the network between our primary site and our recovery site for SRM 5. We have a 100 Mb, 13 ms MPLS pipe between the primary and disaster recovery sites. It is an Ethernet handoff and we can configure each end as we see fit. We will have SnapMirror in place for datastore replication between our primary FAS3240 and recovery FAS3210.

My question surrounds stretched VLANs. We would rather not re-IP the VMs in our protection groups; we want them to keep the same IP at the DR site when failed over. How is the network set up for this? The default gateways for our VLANs are located at the primary site. If we stretch the VLANs to our DR site and the production site goes down, how would machines at the DR site know how to route traffic without each VLAN's respective default gateway?


Can someone enlighten me? Do I need to set up alternate gateways on the Juniper L3 switches at the DR site? Create static routes for traffic that needs to traverse the WAN? Configure default and alternate gateways in the VMs' operating systems? Your insights will be appreciated.

Thanks,

Ed


4 REPLIES

f_duranti

We have a similar setup but a bit more complex.

Our infrastructure is in two datacenters, and we have firewalls in those datacenters facing our peripheral sites. Each datacenter is connected to the peripheral sites with MPLS lines going through our firewalls, so if one firewall/router fails we want the connections to pass through the other datacenter's routers/firewalls.

Our two datacenters each have a Nexus 7K, and we decided to use BGP to pass routing information between the sites and to the peripheral sites.

Our L3 public VLANs are configured at both sites: the VLAN at the site hosting the active servers is configured UP, and the same VLAN (with the same IP on the interface) is configured DOWN at the DR site (we have many VLANs, some active at one site and some at the other).

The Nexus 7Ks have a BGP session between them, and each has a session with the firewall at its local site. The firewalls have BGP sessions with the MPLS routers. Servers have a default route pointing at the Nexus switches. The preferred route from an external site is the one with the shortest path (a server in site 1 is reachable in 2 hops via the site 1 routers and in 3 hops via the site 2 routers). If the MPLS routers or firewalls at site 1 fail, connections to site 1 pass through site 2, and vice versa. If the link between the sites fails, traffic from site 1 to site 2 (and vice versa) goes through the MPLS network, so servers at the two sites can still communicate.

At each site we also have a separate, non-routed L2 network used for VMware and server NFS storage access, and the storage controllers on that network are configured with exactly the same IP addresses. That way, when we fail over a VM with an NFS mount (like an Oracle database), it finds its mountpoint and can use the volumes without any configuration changes.

Where possible, we are also configuring a single VLAN/datastore per application, so that we can move a single application with just a few simple operations (stop the VM, SnapMirror, disable the VLAN at site 1, enable it at site 2, SRM migrate).

This is probably more than you need, but leaving out all the routing requirements you can do something similar. You can create a separate VLAN interconnecting the two sites to use as a routing VLAN, and configure each server VLAN as active at one of the two sites and inactive at the other. When you run SRM to the secondary site, you only have to activate the VLAN there and the servers keep the same default gateway. The only prerequisite is that servers on a given VLAN must live at only one of the two sites.

I'm not sure (I'm not a network expert), but you can probably also configure it this way (I don't know about Juniper, but I think it has this functionality): each site has its own IP on the VLAN (say .2 and .3), then you configure HSRP (HA between the switches) to own .1 and set it up so that it normally stays at your primary site. In a disaster, the switch at site 1 is unavailable and the switch at site 2 takes over .1, so your VMs don't need to be reconfigured. I don't know whether HSRP works over a 13 ms geographic link or whether Juniper supports it, but it should work. The only problem is if the networking between the sites goes down: you will probably end up with .1 active at each site, but that may not be a problem (VMs at each site will still find their default router) if you have another kind of backup connection between them.

esoriano1

Here's what we found by talking to VMware SRM experts as well as Juniper. Setting up an SRM network on a stretched VLAN requires a gateway at the primary site and a gateway at the disaster recovery site that are both on the same subnet. We happen to have three VLANs that will be protected, which means three gateways at the primary site and therefore three alternate gateways at the DR site. Our 100 Mb pipe will be trunked for the three VLANs. By doing this we can keep the local IP addresses on the protected VMs when they are failed over to the DR site; only each VM's default gateway needs to be re-pointed to the DR gateway. This is the simplest scenario, with no need to re-IP any of the VMs' local addresses.

We will have to throttle SnapMirror so it does not consume the full bandwidth, leaving some for other daily operational traffic.
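Roughly, the math behind picking a throttle value looks like the sketch below. It's only a back-of-the-envelope illustration: the 50/50 split is an assumption, not a recommendation, and the exact unit your SnapMirror throttle setting expects (kilobytes vs. kilobits per second) should be double-checked against the ONTAP documentation.

    # Back-of-the-envelope SnapMirror throttle calculation.
    # Assumption: give replication half of the 100 Mb pipe and leave the rest
    # for daily operational traffic; adjust the share to taste.

    LINK_MBIT_PER_S = 100      # the MPLS pipe between the sites
    REPLICATION_SHARE = 0.5    # assumed split, purely for illustration

    def throttle_kb_per_s(link_mbit_per_s: float, share: float) -> int:
        """Convert a share of the link (in Mbit/s) to kilobytes per second."""
        bytes_per_s = link_mbit_per_s * share * 1_000_000 / 8
        return int(bytes_per_s / 1_000)

    if __name__ == "__main__":
        print(throttle_kb_per_s(LINK_MBIT_PER_S, REPLICATION_SHARE))  # ~6250 KB/s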

This is apparently only one of the ways to do it, and we won't know whether it works until we make the changes, but on paper it looks good.

In summary:

SRM stretched VLAN (requirement: protected VMs keep the same local IP when failed over)

1) Primary gateway (protected site) 10.1.1.1 and alternate gateway (DR site) on the same subnet, 10.1.1.2

2) On failover, only re-point the protected VMs' default gateway from 10.1.1.1 to 10.1.1.2 (see the sketch below)
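To make step 2 concrete, here's roughly what a gateway swap could look like inside a failed-over Windows VM. This is just a sketch we put together, not something SRM hands you out of the box: the addresses are ours, and running it as a post-power-on step in the recovery plan is an assumption.

    # Hypothetical post-failover helper for step 2: re-point a Windows VM's
    # default gateway from the protected-site gateway (10.1.1.1) to the DR-site
    # gateway (10.1.1.2). The VM keeps its own LAN IP; only the default route
    # changes. Run inside the guest with administrator rights.

    import subprocess

    DR_GATEWAY = "10.1.1.2"   # alternate gateway on the same subnet at the DR site

    def repoint_default_gateway(new_gw: str) -> None:
        """Replace the persistent default route with one via new_gw."""
        # Drop the existing default route(s); ignore failures if none exist.
        subprocess.run(["route", "delete", "0.0.0.0"], check=False)
        # Add a persistent (-p) default route via the DR gateway.
        subprocess.run(["route", "-p", "add", "0.0.0.0", "mask", "0.0.0.0", new_gw],
                       check=True)

    if __name__ == "__main__":
        repoint_default_gateway(DR_GATEWAY)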

f_duranti

Can't you do the L3 routing on the Juniper switches instead of having a VM that works as the gateway?

It would probably be simpler and would only need a switch command to put the VLAN at site 2 into an active state (you would just have to bring up the VLAN interfaces).

In your case, if you have site A with GW 10.1.1.1 and site B with GW 10.1.1.2, when you fail all your VMs over from site A to site B, the VMs native to site B will continue to use 10.1.1.2, but the VMs you migrate will use their original 10.1.1.1 GW. Why do you need to re-IP the protected GW?

esoriano1

Francesco Duranti wrote:

Can't you do the L3 routing on the Juniper switches instead of having a VM that works as the gateway?

It would probably be simpler and would only need a switch command to put the VLAN at site 2 into an active state (you would just have to bring up the VLAN interfaces).

We were thinking about having an inactive DR switch: when failing over, a script would activate the DR switch with the primary gateway address so that no re-IP of the VMs' gateway (Windows OS, LAN IP) would be needed. However, it would introduce more traffic passing over the WAN for any devices at the DR site.
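For what it's worth, the script piece we had in mind would be along these lines. It's only a sketch under a pile of assumptions: junos-eznc (PyEZ) as the library, made-up hostname and credentials, NETCONF enabled on the switch, and a DR switch that keeps the gateway VLAN interfaces configured but disabled until failover (newer Junipers may use irb units rather than vlan units).

    # Sketch of a failover script: bring up the (previously disabled) stretched-VLAN
    # L3 interfaces on the DR Juniper switch so it starts serving the gateways.
    # Assumes junos-eznc (PyEZ) and that the interfaces below exist with a
    # "disable" statement while the DR site is passive.

    from jnpr.junos import Device
    from jnpr.junos.utils.config import Config

    SET_COMMANDS = "\n".join([
        "delete interfaces vlan unit 10 disable",
        "delete interfaces vlan unit 20 disable",
        "delete interfaces vlan unit 30 disable",
    ])

    # hostname and credentials are placeholders
    with Device(host="dr-switch.example.com", user="admin", passwd="secret") as dev:
        cu = Config(dev)
        cu.load(SET_COMMANDS, format="set")
        cu.commit(comment="SRM failover: activate DR gateways")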

In your case, if you have site A with GW 10.1.1.1 and site B with GW 10.1.1.2, when you fail all your VMs over from site A to site B, the VMs native to site B will continue to use 10.1.1.2, but the VMs you migrate will use their original 10.1.1.1 GW. Why do you need to re-IP the protected GW?

We would need to re-point all the protected VMs' gateways (Windows OS, LAN IP) to 10.1.1.2 when failed over, because they need an active gateway local to the DR site.
