newbie data ontap cluster 8.2

wolfontap99
7,340 Views

Hello Everyone,

I am new to NetApp cDOT, so please excuse my ignorance. I was going through the e-learning for ONTAP cluster fundamentals this morning.
I have a few questions that I tried researching but couldn't find answers to, so I am trying my luck here.

1. An HA pair still exists in clustered Data ONTAP, meaning storage failover must be configured with a partner. Help me understand this.

Can I have a cluster with an odd number of nodes, i.e. a cluster of 3 nodes or 5 nodes?
Does it have to be an even number so that each node can have a partner?
If a 3-node cluster exists, how is HA paired among the nodes?

2. Can I use ONTAP 8.3 to simulate failover? I read it's not possible in ONTAP 8.2 or below. How can I test or configure failover in the simulator?
If you managed to do it, can you please share the procedure?

3. One last question 🙂

I just need a high-level, step-by-step overview (not in detail, just task by task). I see there are many helpful blogs on setting up clustered ONTAP step by step.

1. Create a 2-node cluster
2. Create an aggregate
3. Create a Vserver/SVM
4. Create file or block protocol LIFs for connectivity, and volumes

Since failover isn't possible on the simulator, I want to know the opinion of others who are using the simulator to practice. Please give me a list of tasks that I can try on the simulator.
I tried to test failover, and it took me a lot of effort to understand that it's not possible. I am trying to seek help from pros who have already used the simulator and tested the functionality of clustered ONTAP.

1 ACCEPTED SOLUTION

bobshouseofcards
7,282 Views

Cluster vs. HA can sometimes cause confusion, especially if you have any experience or knowledge of a cluster concept in other domains.  There are a lot of common terms thrown out - redundancy, resiliency, availability, etc.  Some of these tend to overlap while others have very specific meanings in context.  For this post I'm just going to talk about keeping things running in one cluster with specific redundancy resources.  Redundancy will mean keeping things running when something physical breaks, in a generally automatic way.  I'm not extending redundancy to failover sites, etc.  Where applicable I will describe the availability factor of the redundancy.  Also some of the mechanisms described will be conceptual rather than really low level technical.

 

To start, I find it helps to break it down first physically, then logically.

 

"Redundancy" is fundamentally a physical concept.  Doesn't matter what services the "cluster" provides - if there isn't some form of physical redundancy something will be lost when physical equipment breaks.  Also, without basic redundancy there is no availability, "high" or otherwise.  So we start with physical redundancy.

 

 

Data OnTap redundancy starts with the RAID layout on disk which stores extra data so the loss of disks doesn't cause data loss.  Hot spares are swapped in to replace failed disks to restore the basic level of disk redundancy.  Of course this disk concept is pretty basic to storage arrays.  

 

The disk controller system also requires redundancy such that any interconnection or processing point can break. The disk shelf controllers, the cabling, all the way to the NetApp nodes must be at least paired. That is where a High Availability (HA) node pair comes into the mix. Of course you can run storage off a single node, but if the node goes down, you've lost redundancy in the chain. NetApp, in both 7-Mode and cDot, combines two nodes together into an HA pair. Both nodes are wired to a set of disks. The aggregates defined on those disks can be owned by either node. The owning node accesses the disk for reads and writes. The two nodes are also "wired" to each other so that writes can be stored in the NVRAM of both nodes. This ensures that the redundancy mechanism also covers acknowledged writes.

 

The two nodes in an HA pair only cover the disk to which they are physically attached.  All disk is not wired to all nodes in a cDot cluster.  Any given set of disk is wired to a single HA pair.  You also cannot create "virtual" HA pairs - for instance have 3 nodes and split disk shelves between nodes 1&2, nodes 2&3, and nodes 1&3 (would be interesting).  The HA pair and attached disk is a single physical unit where redundancy is created for data storage.

 

Aggregates are a physical storage container that lives within a single node.  By extension an aggregate can only live on disks attached to that single node.  So an aggregate, though "mobile", can only move between the two nodes in the HA pair.  There are two types of aggregates, at a high level - cluster aggregates and user storage aggregates.  User storage aggregates can be flipped between nodes in an HA pair using Storage Failover (SFO).  This is a transparent operation that can be manually initiated or on-demand initiated such as when a node fails.  Under the covers, cDot flips the ownership of the aggregate disks between the two HA nodes.  Cluster services handle the redirection of user requests and such so that data access continues.
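To make the ownership flip concrete, here is a minimal conceptual sketch in Python. It is not ONTAP code or any NetApp API; the class names, node names, and aggregate names are all made up for illustration. It only models the rule described above: an aggregate's owner can change, but only to the partner inside the same HA pair.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    name: str
    home: str    # node that normally owns the aggregate (hypothetical name)
    owner: str   # node that currently owns it


@dataclass
class HAPair:
    node_a: str
    node_b: str
    aggregates: list = field(default_factory=list)

    def partner_of(self, node: str) -> str:
        """Return the HA partner; nodes outside the pair are rejected."""
        if node == self.node_a:
            return self.node_b
        if node == self.node_b:
            return self.node_a
        raise ValueError(f"{node} is not a member of this HA pair")

    def takeover(self, failed_node: str) -> None:
        """Move every aggregate owned by the failed node to its partner."""
        survivor = self.partner_of(failed_node)
        for aggr in self.aggregates:
            if aggr.owner == failed_node:
                aggr.owner = survivor


pair = HAPair("node1", "node2", [
    Aggregate("aggr_data1", home="node1", owner="node1"),
    Aggregate("aggr_data2", home="node2", owner="node2"),
])
pair.takeover("node1")
owners = {a.name: a.owner for a in pair.aggregates}
print(owners)
```

After the simulated takeover both aggregates are owned by node2, while each aggregate still remembers its home node so it can be handed back later.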

 

Cluster aggregates are singular per node. They contain the base data a node requires to boot up and start the underlying cluster services. For all intents and purposes, a node is a virtual machine on top of a base OS, as are all the Storage Virtual Machines that control user data. When a node fails or when a node needs to be restarted, the cluster aggregate (also referred to as the "root" aggregate of the node) uses Controller Failover (CFO) to jump between nodes in the HA pair. The surviving node is already running of course. But it can start another "node" virtual machine to read specific configurations from its partner's cluster aggregate. Thus it can act, loosely, as a proxy for that node. That is why you see CFO and SFO called out differently. For instance, in a standard "giveback" first the CFO aggregate is handed back to the original node. Then the node boots up and starts all the backend cluster services. After a time to settle and ensure the shared cluster databases are in sync, the SFO aggregates are handed back to the original node.
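The giveback ordering can be sketched the same way. This is a conceptual illustration only, assuming each aggregate is tagged with a "cfo" or "sfo" policy; the function name, dictionary keys, and aggregate names are hypothetical and do not correspond to ONTAP commands.

```python
def giveback_steps(aggregates, returning_node):
    """Return ordered giveback steps: root (CFO) aggregate first, then data (SFO)."""
    mine = [a for a in aggregates if a["home"] == returning_node]
    root = [a["name"] for a in mine if a["policy"] == "cfo"]
    data = [a["name"] for a in mine if a["policy"] == "sfo"]

    steps = [f"give back root aggregate {name}" for name in root]
    steps.append(f"wait for {returning_node} to boot and cluster services to sync")
    steps += [f"give back data aggregate {name}" for name in data]
    return steps


aggregates = [
    {"name": "aggr_data1", "home": "node1", "policy": "sfo"},
    {"name": "aggr0_node1", "home": "node1", "policy": "cfo"},
    {"name": "aggr_data2", "home": "node2", "policy": "sfo"},
]
steps = giveback_steps(aggregates, "node1")
for step in steps:
    print(step)
```

The point of the sketch is only the order: the root aggregate goes back first so the node can boot, and the data aggregates follow once the node has rejoined the cluster.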

 

Thus far everything described is just in one HA pair.  The actions above do not go beyond a single pair of nodes.  At the core physical level, the HA pair is the building block for redundancy.  As mentioned in a previous post, cDot doesn't really care about HA pairs.  You could cluster a whole bunch of single nodes.  The cluster itself does not provide any redundancy to node loss in that configuration.

 

Then what does the cluster provide? Using the underlying physical node building blocks, the cluster provides virtualization for logical storage allocations and access to data. The resources across the cluster that are virtualized are the data ports (network and Fibre Channel) and aggregates. You create a Storage Virtual Machine which will define volumes (storage), network access (LIFs), NAS access (CIFS/NFS), SAN access (FC/iSCSI), authentication (AD/LDAP/etc.) among other services. The SVM-defined elements can live on any of the physical resources across all the nodes in the cluster, independent of the HA pairs. A volume can be on any SFO aggregate. An IP network interface can be on any defined network port in the cluster that can talk to the designated network. Almost all of these elements can be freely moved between the underlying physical resources across the nodes (SAN LIFs being the notable exception).

 

Thus the cluster can provide more logical redundancy for network access than an HA pair can, but for data access the HA pair remains the mechanism for redundancy in case of failure.

 

With that background to your specific question:

 

 

If you have a 3-node cluster, you either pair two nodes and leave one with no physical redundancy in case of hardware failure, or you pair none of the nodes and leave them all subject to specific hardware failures. When the odd node goes offline, the data it controls goes offline too. In practice, you either have a cluster with one node (used for small or specialized setups where on-site high availability/redundancy isn't required) or you have a cluster built from HA pairs. If you want to use all the features of cDot, such as non-disruptive software upgrades, etc., you must use HA pairs.

 

Obviously the cluster can function with an odd number of nodes. If a node fails in a 6-node cluster (3 HA pairs), or while you are upgrading software in that same cluster, at least one node is offline for a while, leaving you with a five-node running cluster. But full redundancy within the cluster has been lost (the affected HA pair is running on a single node) until the offline node is brought back online.
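The 3-node case can be worked through with a small sketch. Again this is a conceptual model, not ONTAP behavior captured from a real system; the node names and helper functions are assumptions made for the example.

```python
def unpaired_nodes(nodes, ha_pairs):
    """Return the nodes that have no HA partner."""
    paired = {node for pair in ha_pairs for node in pair}
    return sorted(set(nodes) - paired)


def effect_of_failure(failed_node, ha_pairs):
    """Describe what happens to the storage owned by the failed node."""
    for node_a, node_b in ha_pairs:
        if failed_node == node_a:
            return f"{node_b} takes over storage from {failed_node}"
        if failed_node == node_b:
            return f"{node_a} takes over storage from {failed_node}"
    return f"storage owned by {failed_node} goes offline"


nodes = ["node1", "node2", "node3"]
ha_pairs = [("node1", "node2")]   # node3 is the odd node out

print(unpaired_nodes(nodes, ha_pairs))
print(effect_of_failure("node1", ha_pairs))
print(effect_of_failure("node3", ha_pairs))
```

With one HA pair and one standalone node, a failure of node1 or node2 is covered by its partner, while a failure of node3 takes its data offline.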

 

 

 

I hope this helps you.

 

Bob Greenwald

Huron Legal | Huron Consulting Group

NCDA, NCIE - SAN Clustered, Data Protection

 

Kudos and accepted solutions are always appreciated.

 


9 REPLIES

aborzenkov
7,307 Views
Only clusters built from HA pairs are supported (with the sole exception of single-node clusters). Technically the cluster itself does not care: you can build a cluster from multiple non-HA nodes, e.g. simulators. In that case you obviously lose storage if a node fails.

I have not used the simulator for ages (it needs too many resources), so I do not know if it supports C-Mode HA now.

wolfontap99
7,294 Views

Thanks for the reply 🙂 

 

My confusion is with the SFO/CFO concept.

Let's say I have a 3-node cluster. How do I pair the nodes with each other?

As per what I read, every node needs a partner to perform storage failover. I couldn't find much about it online.

wolfontap99
7,272 Views

Thank you very much for the explanation. I understand the whole concept well now. I don't want to keep this post running too long, but you have given me a direction to dig for more info. I appreciate it. I am now wondering: if the cluster manages logical redundancy, what makes it a challenge to manage block storage LIFs? Why can't it move block access LIFs between nodes?

bobshouseofcards
7,235 Views

Good question.  The answer is fundamental to the protocols themselves and how "redundancy" is achieved.

 

Whether you consider the SAN protocols or the NAS protocols, the interconnects between client and storage are a network in both.  But, they are different kinds of networks.  NAS protocols today that are built on top of IP networks rely on the underlying network to handle things like routing and visibility between end points.  

 

The NAS protocols assume a single endpoint, typically identified by an IP address. The fact that there might be multiple paths to get between two endpoints is not relevant - one is chosen by the networking layers and a connection is established. Redundancy is introduced, in a sense, in that the multiple paths will work out how to get from point A to point B. The network might choose an optimal path, but typically the client has minimal influence on the path. If multiple paths are present, typically the application layer must be aware to look for multiple paths and to use them.

 

Since the path is not preset, and the network accepts that the path between two endpoints can be fluid, it is a simple matter to take an IP address that is currently presented on network port A and move it to network port B. The underlying network layer will fix up the delivery of information transparently to endpoint systems. You see this type of behavior in all manner of cluster-ized IP based systems. For cDot, the use of multiple NAS protocol data LIFs increases potential throughput of the storage, but they are not needed to provide redundancy. LIF failover policies (how LIFs fail over between physical ports) cover redundancy.
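As a rough illustration of what a failover policy decides, here is a minimal Python sketch. It is a conceptual model only; the function, the "node:port" naming, and the ordered target list are assumptions for the example and not the actual ONTAP failover logic.

```python
def choose_port(current_port, failover_targets, port_is_up):
    """Pick the port that should host a NAS LIF.

    Stay on the current port while it is up; otherwise take the first
    healthy port from the ordered list of failover targets.
    """
    if port_is_up.get(current_port, False):
        return current_port
    for port in failover_targets:
        if port_is_up.get(port, False):
            return port
    return None  # no healthy port left, the LIF goes down


# Hypothetical port names in "node:port" form.
port_is_up = {"node1:e0c": False, "node1:e0d": True, "node2:e0c": True}
new_port = choose_port("node1:e0c", ["node1:e0d", "node2:e0c"], port_is_up)
print(new_port)
```

The IP address simply follows the LIF to the new port, and the client keeps talking to the same address, which is why this works for NAS without any client-side multipathing.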

 

SAN protocols are different.  While the network is there, all connections are essentially considered to be point to point.  A particular session between end points will be established along a fixed path that is pre-determined through switch zoning, LUN mapping, etc.  Because each path is pre-determined, general mechanisms at the presentation layer and lower are available to manage multiple paths and select optimal paths.  Consider MPIO and ALUA in the block protocols, for instance.  

 

Because of the predetermined nature of the paths, it is not a simple matter to move a WWN from port C to port D on storage.  Consider that the WWN is logged into the switch on a particular port.  The client HBA has also logged into the switch and zoning has established a traffic path between two ports.  There is a direction (initiator/target) to all data flows.  The initiator also logs into the target.  To pick up and move the connection would require all kinds of potential adjustments up the entire chain that are not built into the protocol.

 

iSCSI in the block world is a bit of a hybrid. Because it is based on an IP network, in theory the address could move. But since most implementations at clients follow the basics of block style storage with multi-pathing mechanisms and multi-session mechanisms, it is simpler to treat iSCSI more like FC than like a NAS protocol. There is much less to reimplement in the OS and storage layers if all block protocols are treated the same way.

 

Multiple block data protocol LIFs are needed to provide full redundancy as well as potentially increased bandwidth to a single client.  It's important to have at least one block data protocol LIF on at least two separate nodes visible to every client system, otherwise access to data could be lost in a cluster.  Best practice would dictate more like one block data protocol LIF per node, although you have to watch path limits as well in the clients.  An 8 node cluster for instance can easily overwhelm some OS limits with respect to storage paths if you're not careful.
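A quick back-of-the-envelope calculation shows how fast the path count grows. The sketch below assumes a simple model in which every host initiator port in a fabric can reach every target LIF in that fabric; the fabric layout and the host limit of 8 paths are made-up numbers for illustration, so check your own OS and multipath documentation for real limits.

```python
def paths_per_lun(fabrics):
    """Count paths to one LUN: initiator ports x target LIFs, summed per fabric."""
    return sum(f["initiator_ports"] * f["target_lifs"] for f in fabrics)


# Assumed layout: 8-node cluster, one block LIF per node in each of two
# fabrics, and a host with one HBA port in each fabric.
fabrics = [
    {"name": "fabric_a", "initiator_ports": 1, "target_lifs": 8},
    {"name": "fabric_b", "initiator_ports": 1, "target_lifs": 8},
]

HYPOTHETICAL_HOST_PATH_LIMIT = 8  # illustrative only, not a real OS limit

total = paths_per_lun(fabrics)
over_limit = total > HYPOTHETICAL_HOST_PATH_LIMIT
print(f"paths per LUN: {total}, over assumed host limit: {over_limit}")
```

In this assumed layout a single LUN already shows up over 16 paths, which is why the number of LIFs visible to each client needs to be planned rather than maximized.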

 

 

I hope this answer helps you.

 

Bob Greenwald

Lead Storage Engineer

Huron Legal | Huron Consulting Group

NCDA, NCIE - SAN Clustered, Data Protection

 

Kudos and accepted solutions are always appreciated.

 

 

 

aborzenkov
7,231 Views

Because of the predetermined nature of the paths, it is not a simple matter to move a WWN from port C to port D on storage.  Consider that the WWN is logged into the switch on a particular port.


Sorry, I have to chime in here. Moving a WWN to a different port is easy and does not cause any problem to hosts. But this does complicate the implementation on the storage side. Today multipath stacks are ubiquitous in operating systems, so it is easier to rely on them than to implement it in storage. But there are no inherent limitations of FC that would make a different implementation impossible. If you remember, NetApp had used WWN relocation in the past. It actually worked 🙂

bobshouseofcards
7,215 Views

Granted - it "can" work, assuming you also don't have SAN switch port based security (WWN must come in on a fixed port) and that you don't have port based zoning as opposed to WWN based zoning. I concede that both possibilities are rarer than straight WWN based zoning (either WWPN or WWNN), but NetApp has to worry about both possibilities. It's simpler and more straightforward, since multi-pathing solutions exist and are mature at the client host, to simply choose the option that meets all conditions without a set of specialized exceptions for edge cases.

wolfontap99
7,146 Views

Thanks a lot .. Appreciate it 🙂 

nickE10mm
6,913 Views

Excellent explanation there, Bob!
