Solved: newbie data ontap cluster 8.2

wolfontap99 · 2015-12-05

Hello Everyone,

I am new to NetApp cDOT, so excuse my ignorance. I was going through the e-learning for ONTAP cluster fundamentals this morning.
I have a few questions that I tried researching but couldn't find answers to, so I am trying my luck here.

1. HA pairs still exist in clustered Data ONTAP, meaning storage failover must be configured with a partner. Help me understand this.

Can I have a cluster with an odd number of nodes, i.e. a cluster of 3 nodes or 5 nodes?
Does it have to be an even number so that each node has a partner?
If a 3-node cluster exists, how is HA paired among the nodes?

2. Can I use ONTAP 8.3 to simulate failover? I read it's not possible in ONTAP 8.2 or below. How can I test or configure failover in the simulator?
If you managed to do it, can you please share the procedure?

3. One last question 🙂

I just need a high-level, step-by-step overview (not in detail, just task by task). I see there are many helpful blogs on setting up clustered ONTAP step by step.

1. Create a 2-node cluster
2. Create an aggregate
3. Create a Vserver/SVM
4. Create file or block protocol LIFs for connectivity, and volumes

Since failover isn't possible on the simulator, I would like opinions from others who use the simulator to practice. Please give me a list of tasks that I can try on the simulator.
I tried failover first, and it took me a lot of effort to understand that it's not possible. I am seeking help from pros who have already used the simulator and tested the functionality of clustered ONTAP.

bobshouseofcards · 2015-12-06

Cluster vs. HA can sometimes cause confusion, especially if you have any experience or knowledge of a cluster concept in other domains. There are a lot of common terms thrown around - redundancy, resiliency, availability, etc. Some of these tend to overlap while others have very specific meanings in context. For this post I'm just going to talk about keeping things running in one cluster with specific redundancy resources. Redundancy will mean keeping things running when something physical breaks, in a generally automatic way. I'm not extending redundancy to failover sites, etc. Where applicable I will describe the availability factor of the redundancy. Also, some of the mechanisms described will be conceptual rather than really low-level technical.

To start, I find it helps to break it down first physically, then logically.

"Redundancy" is fundamentally a physical concept. Doesn't matter what services the "cluster" provides - if there isn't some form of physical redundancy something will be lost when physical equipment breaks. Also, without basic redundancy there is no availability, "high" or otherwise. So we start with physical redundancy.

Data ONTAP redundancy starts with the RAID layout on disk, which stores extra parity data so the loss of a disk doesn't cause data loss. Hot spares are swapped in to replace failed disks and restore the basic level of disk redundancy. Of course this disk concept is pretty basic to storage arrays.

The disk controller system also requires redundancy, such that any single interconnection or processing point can break without losing access. The disk shelf controllers, the cabling, all the way to the NetApp nodes must be at least paired. That is where a High Availability (HA) node pair comes into the mix. Of course you can run storage off a single node, but then there is no redundancy at that point in the chain: if the node goes down, access to its storage goes with it. NetApp, in both 7-mode and cDot, combines two nodes together into an HA pair. Both nodes are wired to a set of disks. The aggregates defined on those disks can be owned by either node. The owning node accesses the disk for reads and writes. The two nodes are also "wired" to each other so that writes can be stored in the NVRAM of both nodes. This ensures that the redundancy mechanism also covers acknowledged writes.

The two nodes in an HA pair only cover the disk to which they are physically attached. Not every disk is wired to every node in a cDot cluster. Any given set of disk is wired to a single HA pair. You also cannot create "virtual" HA pairs - for instance, have 3 nodes and split disk shelves between nodes 1&2, nodes 2&3, and nodes 1&3 (would be interesting). The HA pair and its attached disk form a single physical unit where redundancy is created for data storage.
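To make the pairing rule concrete, here is a minimal Python sketch of it. This is only a toy model, not anything ONTAP exposes: the function names `validate_ha_pairs` and `eligible_owners`, and the node and shelf names, are made up for illustration.

```python
from collections import Counter


def validate_ha_pairs(pairs):
    """Reject layouts where a node appears in more than one HA pair.

    pairs: list of (node_a, node_b) tuples. Hypothetical model only.
    """
    for pair in pairs:
        if len(set(pair)) != 2:
            raise ValueError(f"an HA pair needs two distinct nodes: {pair}")
    counts = Counter(node for pair in pairs for node in pair)
    shared = sorted(node for node, count in counts.items() if count > 1)
    if shared:
        raise ValueError(f"nodes in more than one HA pair: {shared}")
    return True


def eligible_owners(shelf, shelf_to_pair):
    """Nodes that could own aggregates built on the given shelf.

    A shelf is wired to exactly one HA pair, so only those two nodes qualify.
    """
    return set(shelf_to_pair[shelf])
```

With this model, the "virtual" layout of nodes 1&2, 2&3 and 1&3 is rejected because every node would sit in two pairs, while a shelf wired to one pair can only ever be owned by that pair's two nodes.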

An aggregate is a physical storage container that lives within a single node. By extension, an aggregate can only live on disks attached to that node. So an aggregate, though "mobile", can only move between the two nodes in the HA pair. There are two types of aggregates, at a high level - cluster aggregates and user storage aggregates. User storage aggregates can be flipped between nodes in an HA pair using Storage Failover (SFO). This is a transparent operation that can be initiated manually or triggered automatically, such as when a node fails. Under the covers, cDot flips the ownership of the aggregate disks between the two HA nodes. Cluster services handle the redirection of user requests and such so that data access continues.
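Conceptually, the ownership flip looks something like the Python sketch below. It is a simplified illustration under my own assumptions: `take_over`, the dictionaries, and the aggregate names are hypothetical, and the real mechanism involves far more than relabeling owners.

```python
def take_over(owners, ha_partner, failed_node):
    """Return aggregate ownership after failed_node goes down.

    owners:     dict of aggregate name -> owning node
    ha_partner: dict of node -> HA partner (unpaired nodes are absent)
    Aggregates of an unpaired node map to None, meaning offline.
    """
    partner = ha_partner.get(failed_node)
    new_owners = {}
    for aggr, node in owners.items():
        if node != failed_node:
            new_owners[aggr] = node       # unaffected by the failure
        elif partner is not None:
            new_owners[aggr] = partner    # ownership flips to the HA partner
        else:
            new_owners[aggr] = None       # no partner, so the data is offline
    return new_owners


owners = {"aggr1": "node1", "aggr2": "node2", "aggr3": "node3"}
ha_partner = {"node1": "node2", "node2": "node1"}  # node3 is unpaired

print(take_over(owners, ha_partner, "node1"))
print(take_over(owners, ha_partner, "node3"))
```

The first call moves aggr1 to node2; the second leaves aggr3 with no owner, because node3 has no partner wired to its disks.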

There is one cluster aggregate per node. It contains the base data a node requires to boot up and start the underlying cluster services. For all intents, a node is a virtual machine on top of a base OS, as are all the Storage Virtual Machines that control user data. When a node fails or when a node needs to be restarted, the cluster aggregate (also referred to as the "root" aggregate of the node) uses Controller Failover (CFO) to jump between nodes in the HA pair. The surviving node is already running, of course. But it can start another "node" virtual machine to read specific configurations from its partner's cluster aggregate. Thus it can act, loosely, as a proxy for that node. That is why you see CFO and SFO called out differently. For instance, in a standard "giveback", first the CFO aggregate is handed back to the original node. Then the node boots up and starts all the backend cluster services. After a time to settle and ensure the shared cluster databases are in sync, the SFO aggregates are handed back to the original node.
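The giveback ordering described above can be sketched as a small Python function. Again this is just an illustration of the sequence, with made-up names (`giveback_steps`, `aggr0_node1`, and so on); it does not call or mirror any real ONTAP interface.

```python
def giveback_steps(node, aggregates):
    """Return the conceptual order of a giveback to `node`.

    aggregates: list of (name, policy) tuples, policy being "cfo" or "sfo".
    The root (CFO) aggregate goes first, user (SFO) aggregates last.
    """
    cfo = [name for name, policy in aggregates if policy == "cfo"]
    sfo = [name for name, policy in aggregates if policy == "sfo"]

    steps = [f"give back {name} to {node}" for name in cfo]
    steps.append(f"boot {node} and start cluster services")
    steps.append("wait for cluster databases to sync")
    steps += [f"give back {name} to {node}" for name in sfo]
    return steps


for step in giveback_steps(
    "node1",
    [("aggr1_data", "sfo"), ("aggr0_node1", "cfo"), ("aggr2_data", "sfo")],
):
    print(step)
```

Whatever order the aggregates are listed in, the root aggregate is returned first and the data aggregates only follow once the node is up and in sync.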

Thus far everything described is just in one HA pair. The actions above do not go beyond a single pair of nodes. At the core physical level, the HA pair is the building block for redundancy. As mentioned in a previous post, cDot doesn't really care about HA pairs. You could cluster a whole bunch of single nodes. The cluster itself does not provide any redundancy to node loss in that configuration.

Then what does the cluster provide? Using the underlying physical node building blocks, the cluster provides virtualization for logical storage allocations and access to data. The resources across the cluster that are virtualized are the data ports (network and Fibre Channel) and aggregates. You create a Storage Virtual Machine which will define volumes (storage), network access (LIFs), NAS access (CIFS/NFS), SAN access (FC/iSCSI), and authentication (AD/LDAP/etc.), among other services. The SVM-defined elements can live on any of the physical resources across all the nodes in the cluster, independent of the HA pairs. A volume can be on any SFO aggregate. An IP network interface can be on any defined network port in the cluster that can talk to the designated network. Almost all of these elements can be freely moved between the underlying physical resources across the nodes (SAN LIFs being the notable exception).

Thus the cluster can provide more logical redundancy for network access than an HA pair can, but for data access the HA pair remains the mechanism for redundancy in case of failure.

With that background to your specific question:

If you have a 3 node cluster, you either pair two nodes and leave one with no physical redundancy in case of hardware failure, or you pair none of the nodes and leave them all subject to specific hardware failures. When an unpaired node goes offline, the data it controls goes offline too. In practice, you either have a cluster with one node (used for small or specialized setups where on-site high availability/redundancy isn't required) or you have a cluster built from HA pairs. If you want to use all the features of cDot, such as non-disruptive software upgrades, etc., you must use HA pairs.
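As a quick illustration of the 3 node case, the following Python sketch lists which nodes end up without a partner. The function `unprotected_nodes` and the node names are hypothetical; it is only meant to show the counting argument.

```python
def unprotected_nodes(nodes, pairs):
    """Return the nodes that belong to no HA pair, in their original order."""
    paired = {node for pair in pairs for node in pair}
    return [node for node in nodes if node not in paired]


nodes = ["node1", "node2", "node3"]

# Option 1: pair two nodes, which leaves the third on its own.
print(unprotected_nodes(nodes, [("node1", "node2")]))

# Option 2: pair nothing, so every node is on its own.
print(unprotected_nodes(nodes, []))
```

Since a node can only be in one pair, any odd node count leaves at least one node in the unprotected list.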

Obviously the cluster can function with an odd number of nodes. If a node fails in a 6 node cluster (3 HA pairs) or while you are upgrading software in that same cluster, at least one node is offline for a while leaving you with a five node running cluster. But redundancy within the entire cluster has now been lost until the offline node is brought back online.

I hope this helps you.

Bob Greenwald

Huron Legal | Huron Consulting Group

NCDA, NCIE - SAN Clustered, Data Protection

Kudos and accepted solutions are always appreciated.

aborzenkov · 2015-12-05

Only clusters built from HA pairs are supported (with the sole exception of single-node clusters). Technically, the cluster itself does not care - you can build a cluster from multiple non-HA nodes, e.g. simulators. In that case you obviously lose access to storage if a node fails.

I have not used the simulator for ages (it needs too many resources), so I do not know whether it supports C-Mode HA now.

wolfontap99 · 2015-12-06

Thanks for the reply 🙂

My confusion is with the SFO/CFO concept.

Let's say I have a 3-node cluster. How do I pair the nodes with each other?

From what I read, every node needs a partner to perform storage failover. I couldn't find much about it online.

bobshouseofcards · 2015-12-06