In our company environment we have a 6-node cDOT cluster (ONTAP 9.1) located in a single server room:
2 x FAS8200
2 x AFF300
2 x FAS8020
For redundancy purposes we're reshaping the server room, splitting the nodes between 2 separate, independent rooms with cluster connectivity between them. The cluster switches will be in room 2:
2 x FAS8200 (room 1)
2 x AFF300 (room 2)
2 x FAS8020 (room 2)
If connectivity issues arise between the rooms, will the 2 x FAS8200 nodes continue to serve data locally (Fibre Channel LUNs only)?
You could have an issue with the 2 nodes that are separated, due to quorum and epsilon.
In a 6-node cluster, if 2 nodes fail there are still enough nodes left to elect an epsilon node and keep the remaining 4 nodes in quorum. The two that were left out in lala land would go "I don't have enough nodes for a quorum" and they would shut off data services.
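To make the quorum arithmetic concrete, here is a minimal Python sketch of a simplified majority-plus-epsilon rule. It is an assumed model for illustration, not NetApp's implementation: a partition keeps quorum if it holds more than half of the cluster's nodes, or exactly half when it includes the epsilon holder. The node names and the choice of epsilon holder are hypothetical.

```python
def has_quorum(partition, all_nodes, epsilon_node):
    """Simplified model (assumption): strict majority wins; epsilon breaks an exact tie."""
    members = set(partition)
    total = len(all_nodes)
    size = len(members)
    if 2 * size > total:
        # More than half of the nodes: quorum regardless of where epsilon sits.
        return True
    if 2 * size == total:
        # Exactly half: only the side holding epsilon keeps quorum.
        return epsilon_node in members
    # Fewer than half: no quorum, even if epsilon is on this side.
    return False


# Hypothetical node names for the proposed two-room layout.
nodes = ["fas8200-01", "fas8200-02", "aff300-01", "aff300-02", "fas8020-01", "fas8020-02"]
room1 = nodes[:2]  # 2 x FAS8200
room2 = nodes[2:]  # 2 x AFF300 + 2 x FAS8020 (and the cluster switches)

print("room 1 in quorum:", has_quorum(room1, nodes, epsilon_node="fas8200-01"))
print("room 2 in quorum:", has_quorum(room2, nodes, epsilon_node="fas8200-01"))
```

Under this model, a loss of connectivity between the rooms leaves room 1 with 2 of 6 nodes, which is short of a majority wherever epsilon sits, so the FAS8200 pair would fall out of quorum while the 4 nodes in room 2 keep it.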
For the "separation" to a new room part, are you just planning to run a direct cable from the back of the cluster switch to the back of the 2 nodes you move?
Our view on this, when it has come up in the past, is that it is not supported due to the risk of cluster partition. The diversity of failure modes makes it an option we would not encourage. As @SpindleNinja points out, the cluster quorum marker may end up in a different room from the surviving nodes, which would cause them to shut down as well. So instead of reducing risk, you're actually increasing it.
If you are interested in geographic diversity, even within the same building, we have an option called MetroCluster, which deploys identically configured systems in two locations (with the option of a tiebreaker in a third), using IP connectivity. As of ONTAP 9.6 it can run on our most cost-effective systems, the FAS2750/AFF A220 nodes.
It doesn't work with 6-node environments, and in almost all cases it should be set up at install time, so the best deployment option for it would be a new environment to install and migrate highly critical workloads to. We have some more information at (business) https://www.netapp.com/us/products/backup-recovery/metrocluster-bcdr.aspx and (technical) https://www.netapp.com/us/media/tr-4705.pdf Please send me a message either here or via my first.last @ netapp. com email with your corporate contact details if you'd like your NetApp account manager to reach out with more details.
Hope this helps!
Exactly what Alex said.
I had ONE customer that had split the cluster between floors: one switch on the first floor and the other on the second floor, one HA pair on the first floor and 3 HA pairs on the second. As indicated above, this could have been dangerous in certain failure scenarios. Late last year, the customer was convinced to relocate everything to one floor. We added a few new HA pairs, migrated, and eventually removed the nodes from the first floor.
Unless MetroCluster is used, it is not even close to a good idea to split a single cluster.
Many thanks for the reply.
Well, our goal is to increase local redundancy by splitting fully redundant VMs between different storage nodes and also between independent rooms. If one room fails for whatever reason (power, fire, flood...), the other continues to provide service.
If, in a cluster environment, an initially independent HA pair now depends entirely on connectivity to the other nodes, then indeed it doesn't meet our needs.
I think the solution is to remove that single HA pair from the cluster, but I guess it's probably not that simple?
You can certainly remove an HA pair and make it an independent switchless cluster. However, the nodes have to be reinitialized after removal from the cluster, so you have to store the data somewhere else and move it later to the new cluster (via SnapMirror, for example).
For what you're describing, I third the MetroCluster recommendation.
What's the distance between the "rooms"?
And as Gidon says, to break apart the cluster you would need to remove all data from the nodes, remove the pair, and then move the data back on.
The goal is neither to geographically split the storage nor to replicate data between rooms.
The rooms are in the same datacentre but have independent power capacity, plus fire and flood protection.
Our goal is to split redundant servers between 2 different rooms, so having one room totally dependent on the other because of the storage cluster design is indeed a problem.
We'll have to figure out how to migrate data off one of the HA pairs and remove that pair from the cluster. I don't see any other solution.
Many thanks for the reply.