
Active/Passive Fabric Metrocluster

Hi

I have a customer (IHAC) who wants to use MetroCluster as an active/passive solution, i.e. at any given time they will only serve data from one site.

Let me explain:

Primary side controller hostname:       XXXpri

Secondary side controller hostname:   XXXsec

Both sides are fully loaded, i.e. 9 loops and 18 shelves (252 disks) on each side. What we are planning to do is:

Primary side node:

252 disks all assigned to XXXpri pool0

Secondary side node:

3 disks (root aggr) + 1 spare assigned to XXXsec pool0

248 disks assigned to XXXpri pool1
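To sanity-check the disk counts in the plan above, here is a small Python sketch. It assumes 14-disk shelves (18 × 14 = 252, which matches the per-site total quoted). The dictionary layout and constant names are purely illustrative and are not an ONTAP interface.

```python
# Hypothetical sanity check of the disk plan; assumes 14-disk shelves
# (18 x 14 = 252), which matches the per-site total quoted above.
DISKS_PER_SHELF = 14
SHELVES_PER_SITE = 18
LOOPS_PER_SITE = 9

disks_per_site = DISKS_PER_SHELF * SHELVES_PER_SITE    # disks at each site
shelves_per_loop = SHELVES_PER_SITE // LOOPS_PER_SITE  # shelves on each loop

# Primary site: every disk owned by XXXpri in pool0 (the production plexes).
pri_site = {("XXXpri", "pool0"): disks_per_site}

# Secondary site: a small unmirrored root aggregate plus one spare for XXXsec,
# and everything else owned by XXXpri in pool1 (the mirror plexes).
SEC_ROOT_DISKS = 3
SEC_SPARE_DISKS = 1
sec_site = {
    ("XXXsec", "pool0"): SEC_ROOT_DISKS + SEC_SPARE_DISKS,
    ("XXXpri", "pool1"): disks_per_site - SEC_ROOT_DISKS - SEC_SPARE_DISKS,
}


def site_total(site: dict) -> int:
    """Sum the disks assigned at one site across all (owner, pool) pairs."""
    return sum(site.values())


# Print each site's total and its per-(owner, pool) breakdown.
for name, site in (("primary", pri_site), ("secondary", sec_site)):
    print(name, site_total(site), site)
```

Both sites should add up to the same 252 disks. The secondary site's 4 XXXsec disks have no counterpart in the primary-site assignment, which is exactly the unmirrored secondary root asked about below.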

In this case the root of the secondary side is not mirrored to the primary. Is that a problem? Does this solution look feasible? Am I missing something?

Last but not least, we have a strange disaster scenario in our heads. The customer has the habit of serving data from either pri or sec. If he does a manual takeover and is serving data from the sec side (pri is waiting for giveback), and for some reason the sec site goes down (let us say the motherboard blows), how can we give control back to pri, if we can at all?

Thanks in advance.

Regards,

Babar

Re: Active/Passive Fabric Metrocluster

Hi Babar,

It is okay to have an active/passive MetroCluster where most of the storage is at one site and there is only minimal non-mirrored storage at the other (i.e. root). As long as the storage at the production site is mirrored and the mirror plexes are physically located at the standby site, you are okay.

In the case you mentioned, where you have failed over to the standby and then the standby goes down, you will have the following situation:

There is now data on what were the mirror plexes (written while the standby was serving) that would be lost if you were to try to bring up the original system on the original plexes. Not sure you would want to do it.

Re: Active/Passive Fabric Metrocluster

Hi Lanson,

First of all your help is highly appreciated.

There is now data on what were the mirrored plexes that would be lost if you were to try to bring up the original system on the original plexes. Not sure you would want to do it.

But if you can be sure that the mirrored side will stay down (powered off or something), how can you bring the original site up? The original-site system would be in the "waiting for giveback" state.

Regards,

Babar

Re: Active/Passive Fabric Metrocluster

It depends on the scenario. If the primary site failed and you were never able to execute the forcetakeover command, then you could bring up the primary again without data loss. It would not come up as "waiting for giveback", since the standby site never took over.

Re: Active/Passive Fabric Metrocluster

Nah, the way I am looking at it is this: our customer has the habit of running periodic drills in which they do a takeover and start serving data from the secondary site. In this case the pri node would be in "waiting for giveback" mode. Now if the secondary node goes down, can we start serving data from the pri site?

Thanks in advance.

Re: Active/Passive Fabric Metrocluster

To enable failover from the primary site to the secondary in case of a primary disaster, you must have the primary root volume and all production (whatever that means) data mirrored. As you have only 248 disks on the secondary, you can mirror only 248 disks on the primary, which leaves you with 4 disks on the primary that serve no useful purpose. Unless your customer has some use for a lone 4-disk aggregate, you may as well simply mirror the secondary root onto these disks. It will make life much easier.
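The arithmetic behind this suggestion can be sketched in a few lines of Python. This is a simplification, assuming all disks are the same size and that each pool0 disk needs one pool1 partner at the other site; real SyncMirror plex sizing has more rules than this, and `mirror_plan` is a hypothetical helper, not an ONTAP command.

```python
from typing import Tuple


def mirror_plan(pool0_disks: int, pool1_disks: int) -> Tuple[int, int]:
    """Return (mirrorable_disks, unmirrored_pool0_disks).

    Simplifying assumption: all disks are the same size and each pool0 disk
    needs one pool1 partner at the other site, so at most min(pool0, pool1)
    pool0 disks can be mirrored.
    """
    mirrored = min(pool0_disks, pool1_disks)
    return mirrored, pool0_disks - mirrored


# Original plan: 252 XXXpri pool0 disks at the primary site, but only
# 248 XXXpri pool1 disks at the secondary site.
original = mirror_plan(252, 248)

# Suggested alternative: 248 XXXpri pool0 disks at the primary site, with the
# remaining 4 primary-site disks used as pool1 for XXXsec (3 for the root
# plex plus 1 spare), matching XXXsec's 4 pool0 disks at the secondary site.
pri_node = mirror_plan(248, 248)
sec_node = mirror_plan(4, 4)

print("original plan (mirrored, left over):", original)
print("XXXpri after change:", pri_node)
print("XXXsec after change:", sec_node)
```

With the original assignment, 4 primary-site disks are left without a partner; after the change, both nodes' data is fully mirrored and nothing is left over.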

Re: Active/Passive Fabric Metrocluster

As Jim pointed out, the exact steps depend on the scenario...

If the plexes were in a mirrored state prior to the MB failure at the "sec" site, you can start the procedures to break the SCSI reservations and clear the mailbox contents, in order to allow node "pri" to start up and bring the aggrs online. You might lose whatever was in NVRAM and not yet committed to disk, depending on the state of the "pri" filer at the point of the "sec" filer failure.

However, as mentioned, content which was not committed to disk at the point in time of the MB failure (panic, watchdog reset, fire...) might be lost, and that could be as much as 0.5 GB. You might also need to run a DB replay, but again, a few transactions might be lost...

Therefore, such a scenario is not something to be tested lightly. If a disaster strikes, you should engage the Support Center and the customer's CIO, so that the impact, risk and benefit can be evaluated fully and with due diligence...

Regards