Re: Understanding mirrored HA pairs

GVM666GVM · ‎2012-12-14

Hi all,

I am reading in the High Availability Configuration Guide about mirrored HA pairs.

This is literally in the guide:

"Mirrored HA pairs do not provide the capability to fail over to the partner node if one node is completely lost. For example, if power is lost to one entire node, including its storage, you cannot fail over to the partner node. For this capability, use a MetroCluster."

If you have a full copy of the data, why wouldn't it be possible to fail over to the copy when one node goes down?

What is the actual use of a mirrored HA pair if it doesn't give you any more availability? If a node goes down, you will still have an outage in your environment.

Could someone please explain?

aborzenkov · ‎2012-12-14

You missed the words “completely” and “including storage”. It is of course possible to fail over if a node goes down. It is not possible to fail over automatically if an entire site (node + storage) goes down.

Please ask Google about “split brain”. You need a third instance to resolve it; it can be implemented as part of MSCS or as a separate tiebreaker. Or it can be resolved manually, when an administrator accepts responsibility for declaring one node dead.
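To make the split-brain problem concrete, here is a minimal Python sketch of the decision a surviving node faces. It is a generic illustration under simplifying assumptions, not NetApp's actual takeover logic; `Observation`, `decide`, and the return strings are names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """What the surviving node can see (hypothetical model, not a NetApp API)."""
    partner_heartbeat: bool        # is the partner still answering on the interconnect?
    partner_storage_visible: bool  # can we still reach the partner's disks?


def decide(obs: Observation, tiebreaker_says_partner_dead: Optional[bool] = None) -> str:
    """Return 'no_action', 'takeover', or 'wait_for_admin'."""
    if obs.partner_heartbeat:
        return "no_action"
    if obs.partner_storage_visible:
        # Partner is silent but its disks are reachable: in this simplified
        # model only the node failed, so automatic takeover is allowed.
        return "takeover"
    # Heartbeat AND storage are gone. A dead site and a cut inter-site link
    # look identical from here, so a third party has to decide.
    if tiebreaker_says_partner_dead is True:
        return "takeover"
    return "wait_for_admin"
```

The last branch is the case from the guide: with node and storage both gone, the survivor has no information of its own to act on, so either a tiebreaker or an administrator must make the call.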

paulstringfellow · ‎2012-12-14

So this is a terminology thing I’d suggest.

Mirrored HA is probably talking about SnapMirror between two mirrored pairs. Using SnapMirror is not an automatic failover process in the way that failover between two HA controllers is.

If you have two controllers in an HA pair and one controller fails, the surviving controller automatically presents the storage of both controllers to maintain availability.

However, if you lose the pair and it mirrors to another site for DR purposes, failing over to that site is not an automated process (although it can be automated with NetApp tools such as OnCommand).

MetroCluster, which is mentioned, does do this, however. It allows you to “stretch” a pair of HA controllers across a distance (kilometres if need be). For this to work, you need to replicate the disk trays exactly at both ends and have the bandwidth to write data to geographically spread disks in real time, so MetroCluster has some important prerequisites.

So I suppose the question is, what are you looking to achieve?

GVM666GVM · ‎2012-12-14

"For example, if power is lost to one entire node, including its storage, you cannot fail over to the partner node. For this capability, use a MetroCluster."

It doesn't say that failover is not automatic. It says you can't fail over.

If you have all data duplicated, including the root volume, failover should be possible, as you have a SnapMirror sync between 2 controllers (called nodes in the documentation).

@aborzenkov

The last sentence says that for failover you need MetroCluster. And MetroCluster doesn't do automatic failover. So it would surprise me if they are actually talking about automatic failover here.

Maybe it is just bad wording in the documentation. But it is really confusing.

> So I suppose the question is, what are you looking to achieve?

I am looking into taking the NS0-155 exam.

paulstringfellow · ‎2012-12-14

MetroCluster does indeed fail over automatically; that’s the point of it.

MetroCluster is an HA pair that is stretched over distance, with full disk shelf duplication and synchronous writes to the production tray and its mirror.

The point of MetroCluster is that if you lose an entire computer room, so a controller and all of its shelves, it will automatically fail over and present all storage from the remaining node.

However, the quote you use talks about mirrored HA pairs, so this is two HA controller builds mirrored to each other using SnapMirror. SnapMirror does not offer the option to automatically “fail over” to the SnapMirror target. This is a manual process, although it can be automated using OnCommand and other third-party tools.

What does fail over automatically is the nodes within each HA pair: if one controller in an HA pair fails, the other controller will present all of the data to the environment.

However, the document is talking about two of these HA pairs mirroring data via SnapMirror, which indeed does not fail over automatically. It is correct, though, that MetroCluster does allow this kind of geographically spread, automated HA failover.

Hope that helps.

aborzenkov · ‎2012-12-14

> The point of metrocluster is if you lose an entire computer room, so a controller and all of its shelves, it will automatically failover and present all storage from the remaining node.

No, it won't. Please do not spread totally false and misleading information.

paulstringfellow · ‎2012-12-17

Sorry you feel that way; however, it is neither false nor misleading.

The following paragraph is from page 4 of the MetroCluster design document:

“A MetroCluster (either Stretch or Fabric) behaves in most ways just like an active-active configuration. All of the protection provided by core NetApp technology (RAID-DP®, Snapshot™ copies, automatic controller failover) also exists in a MetroCluster configuration (Figure 1). However, MetroCluster adds complete synchronous mirroring along with the ability to perform a complete site failover from a storage perspective with a single command.”

As this states, a MetroCluster is basically a “stretched” HA pair that operates in almost the same way. That includes the ability to fail over between controllers and present all storage from a single controller in the event of a controller outage (planned or otherwise).

The single trigger command can be automated by a failover witness management box, which can trigger the failover if necessary. Many people don’t wish to automate failover across sites and want manual control of that, but the automated option does exist.

TR-3548 is the reference document for MetroCluster design.
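As a rough illustration of what such a witness does, here is a minimal Python sketch. It is not the NetApp tooling: `Witness`, `record_probe`, and `trigger_site_failover` are hypothetical names, and the threshold of three consecutive failed probes is an arbitrary assumption.

```python
from typing import Callable


class Witness:
    """Hypothetical third-site monitor that may trigger a site failover."""

    def __init__(self, trigger_site_failover: Callable[[], None], threshold: int = 3):
        self.trigger = trigger_site_failover  # stands in for the single failover command
        self.threshold = threshold            # assumed count of consecutive failed probes
        self.missed = 0
        self.fired = False

    def record_probe(self, site_a_reachable: bool, site_b_reachable: bool) -> None:
        """Feed in one round of probe results, as seen from the witness."""
        if self.fired:
            return
        if site_a_reachable:
            self.missed = 0
            return
        self.missed += 1
        # Only act if site A stays dark AND the survivor is reachable;
        # otherwise the witness itself may be the isolated party.
        if self.missed >= self.threshold and site_b_reachable:
            self.fired = True
            self.trigger()
```

Replacing the callback with a function that merely alerts an administrator gives the manual-control variant that many sites prefer.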

N_ANDREEV · ‎2013-08-07

Guys, what you were missing here is the fact that a Mirrored HA Pair does NOT use SnapMirror. Both Mirrored HA Pair and MetroCluster use SyncMirror to synchronously replicate data.

alaa_samarji · ‎2012-12-15

Guys, if I got the question right, I think this is getting out of proportion.

In a normal HA pair, you have two controllers, each with its own NVRAM and each controlling a certain set of disks. In this normal day-to-day configuration, if a node fails (the node only, not a power loss to its disks), MPIO kicks in and redirects I/O (automatically, preferably with load below 50%), node 2 takes control of the whole disk set that originally belonged to node 1, and it is a normal office day.

With a mirrored HA pair, both nodes access the same data pool with replicated NVRAM: whatever node A does, node B mimics. If node A goes down, there would be no outage or disturbance, versus the first scenario, where there would be an outage and all NVRAM data would be lost and not committed to disk.

This is the exact same concept as clustered ONTAP when running mirrored HA nodes, with all that "immortal data" and so on. To ensure full infrastructure failover (nodes and disks), you have two ways of doing it: a MetroCluster with mirrored shelves, or synchronous SnapMirror if within the same infrastructure.