
Does root volume\aggregate need to be separate from data aggregates in cDOT?

This is my first rodeo with Clustered Data ONTAP after years of 7-Mode.

 

We have a new FAS8040 2-node cluster with cDOT 8.3.

 

I separated out the disk ownership with SATA on one node and SAS on the other node.

 

I see both nodes grabbed the 3 disks for ONTAP root volume and aggregate. I also see that System Manager says no Data Aggregates.

 

So my question is... Does cDOT force you to have a dedicated root aggregate for the root volume?

 

I know this has always been a best practice, and there has always been debate about whether it is necessary. I never configured things that way with 7-Mode; I always kept everything together and never had an issue.

 

Thanks!

Re: Does root volume\aggregate need to be separate from data aggregates in cDOT?

IMHO, yes, root gets its own aggregates.

 

And make sure you set up your load-sharing mirrors on all root volumes.

Re: Does root volume\aggregate need to be separate from data aggregates in cDOT?

Load-sharing mirrors on all root volumes?

 

Re: Does root volume\aggregate need to be separate from data aggregates in cDOT?

Since you mentioned that this is your first go round with cDot, here are some key concepts that are different from 7-mode.

 

Clustered DoT is like 7-mode at the edges - there is physical stuff like ports and disks and aggregates that is owned by a particular node (controller, head - pick your term).  There are volumes and shares and LUNs that are exposed to the outside world.  In the middle is the new creamy filling.  A virtualization layer is always present that takes all the physical stuff in a cluster and presents it as a single pool of resources that can then be divvied up into the logical Storage Virtual Machines (previously - vServers).

 

SVMs for all intents are like the vFilers of 7-mode.  In cDot 8.3 you can have IPspaces, or not.  You overload virtual network interfaces and IP addresses onto the physical network ports.  New in cDot compared to 7-mode, you can also overload virtual FC WWNs onto the physical FC ports, something a vFiler couldn't do.  For all intents, think in terms of vFiler.

 

Remember in 7-mode that when you used "MultiStore" to add vFiler capability, there was always a default "vfiler0" which represented the actual node itself.  Aggregates, disks, and ports were controlled through vFiler0 as the owner of physical resources.

 

So the big switch in cDot is that you're always in "MultiStore" mode and that "vFiler0" is reserved for control use only.  You can't create user volumes and define user access to vFiler0.  Instead you have to create one or more user "vFilers" where logical stuff like volumes and LUNs and shares and all that get created.

 

More implications of this design.  Each node needs a root volume from which to start operations.  Remember in 7-mode that the root volume held OS images, log files, basic configuration information, etc.  The node root volume in cDot is pretty much the same, except it cannot hold any user data at all.  The node root volume needs a place to live, hence the node root aggregates.  Each node needs one, just like in a 7-mode HA pair.  Yes, the only contents of the node root aggregates are the node root volumes.  And they are aggregates, so at least 3 disks.

My suggestion for a heavily used system is actually to use 5 disks to avoid certain odd IOPS dependencies on lower class disk.  The node root volume will get messages and logs and all kinds of internal operational data dumped to it.  I have experienced, especially when using high capacity slower disks, that node performance can be constrained by the single data disk performance of a 3 disk root aggregate, so I have standardized on 5 for my root aggregates.  Now, for my installation, 20 disks (4 node cluster) out of 1200 capacity disks isn't a big deal.  A smaller cluster can certainly run just fine with 3 disks.  Similarly, because I want all my high speed disks available for user data, I purposely put some capacity disks on all nodes, even if they just serve the root aggregate needs.  Again, my installation allows for it easily, your setup may not.

 

So yes - root aggregate is one per node and you don't get to use it for anything else.  Not a best practice question - it's a design requirement for cDot.
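
If you want to see this on a running cluster, something like the following should list each node's root aggregate and the node root volume that lives in it.  Treat it as a sketch from memory rather than tested output: it assumes the node root volumes still have the default name "vol0", and the filter and field names are worth confirming with "?" on your release.

```
# List only the aggregates that hold a node root volume
storage aggregate show -has-mroot true
# Show where each node root volume lives (assumes the default name vol0)
volume show -volume vol0 -fields aggregate,size
```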

 

About the load sharing mirrors.  Here is where we jump from physical to logical.  After you have your basic cluster functional, you need to create SVMs (again, think vFilers) as a place for user data to live.  Just like a 7-mode vFiler, an SVM has a root volume.  Now this root volume is typically small and contains only specifics to that SVM.  It is a volume, and thus needs an aggregate to live in.  So you'll create user aggregates of whatever size and capacity meets your needs, and then create a root volume as you create your SVM.  For instance, let's say you create SVM "svm01".  You might then call the root volume "svm01_root" and you specify what user aggregate will hold it.
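
As a rough sketch of that step on 8.3, creating the SVM and its root volume in one go could look like this.  The aggregate name "aggr_data_n1" and the unix security style are my assumptions for illustration, not something your cluster will have by default, and earlier releases required additional parameters.

```
# Hypothetical names: svm01, svm01_root, and an existing data aggregate aggr_data_n1
vserver create -vserver svm01 -rootvolume svm01_root -aggregate aggr_data_n1 -rootvolume-security-style unix
# Confirm which aggregate the SVM root volume landed on
volume show -vserver svm01 -volume svm01_root -fields aggregate,size
```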

 

For file sharing, cDot introduces the concept of a namespace.  Instead of specifying a CIFS share or an NFS export with a path like "/vol/volume-name", you instead create a logical "root" mount point and then "mount" all your data volumes into the virtual file space.  A typical setup would be to set the vserver root volume as the base "/" path.  Then, you can create junction-paths for each of the underlying volumes, for instance create volume "svm01_data01" and mount it under "/".  You then could create a share by referencing the path as "/svm01_data01".  Unlike 7-mode, junction points can be used to cobble together a bunch of volumes in any namespace format you desire - you could create quite the tree of mount locations.  It is meant to be like the "actual-path" option of 7-mode export shares by creating a virtual tree if you will, but it doesn't exactly line up with that functionality in all use cases.
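
To make the junction idea concrete, here is a minimal sketch.  The volume names use underscores because ONTAP volume names do not accept hyphens; the aggregate, sizes, share name, and the second volume "svm01_data02" are all hypothetical, and the share command assumes a CIFS server already exists on the SVM.

```
# Create a data volume and mount it into the namespace directly under "/"
volume create -vserver svm01 -volume svm01_data01 -aggregate aggr_data_n1 -size 100g -junction-path /svm01_data01
# An existing volume can be mounted later, even beneath another volume's junction
volume mount -vserver svm01 -volume svm01_data02 -junction-path /svm01_data01/data02
# Shares reference the namespace path, not /vol/...
vserver cifs share create -vserver svm01 -share-name data01 -path /svm01_data01
# Review the resulting namespace layout
volume show -vserver svm01 -fields junction-path
```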

 

Of course, if you are creating LUNs, throw the namespace concept out the window.  LUNs are always referenced via a path that starts with "/vol/" in the traditional format and the volumes that contain LUNs don't need a junction-path.  Unless, of course, you want to also put a share on the same volume that contains a LUN...then to set up the share you need a namespace and junction-paths.  Confusing?  Yes, and it is something I wish NetApp would unify at some point, as there are at least four different ways to refer to a path based location in cDot depending on context, and they are not interchangeable.  That and a number of commands which have parameters with the same meaning but different parameter names are my two lingering issues with the general operation of cDot.  Sorry - I digress.

 

So - why the big deal on namespaces and how does that apply to load sharing mirrors?  Here's the thing.  Let's assume you have created svm01 as above.  And you give it one logical IP address on one logical network interface.  All well and good.  That logical address lives on only one physical port at a time, which could be on either node.  Obviously you want to set up a failover mechanism so that the logical network interface can fail over between nodes and keep functioning if needed.  You share some data from the SVM via CIFS or NFS.  A client system will contact the IP address for the SVM and that contact will come through node 1, for instance, if a port on node 1 currently holds the logical interface.  But, for a file share, all paths need to work through the root of the namespace to resolve the target, and typically the root of the namespace is the SVM's root volume.  If the root volume resides on an aggregate owned by node 2, all accesses to any share in the SVM, whether residing on a volume/aggregate in node 1 or 2, must traverse the cluster backplane to access the namespace information on the SVM root on node 2 and then bounce to whatever node the target volume lives on.

 

So, let's say we add a 2nd network interface for svm01, this time by default assigned to a port that lives on node 2.  By DNS round robin we now get half the accesses going first through node 1 and half through node 2.  Better, but not perfect.  And there remains the fact that the SVM's root volume living on node 2 still becomes a performance choke point if the load gets heavy enough.  What we really want is for the SVM's root volume to kinda "live" on both nodes, so at least that level of back and forth is removed.  And that is where load sharing mirrors come in.

 

A load sharing mirror is a special kind of SnapMirror relationship where an SVM's root volume is mirrored to read only copies.  Because most accesses through the SVM's root volume are read only, it works.  You have the master SVM root, as above called "svm01_root".  You can create replicas, for instance "svm01_m1" and "svm01_m2", each of which exists on an aggregate owned by a different node (m1 for the mirror on node 1, m2 for the mirror on node 2).  Once you initialize the load sharing relationship, read level accesses are automatically redirected to the mirror on the node where the request came in.  You will need a schedule to keep the mirrors up to date, and there are some other small caveats.  Is this absolutely required?  No, it isn't.  The performance benefit of load-sharing mirrors depends heavily on the total load on your SVMs.  A heavily used SVM will certainly benefit.  It can sometimes be a good thing, other times it can be a pain.  A load sharing SnapMirror works just like a traditional SnapMirror where you have a primary that can be read/write and a secondary shared as read only.  The extras are that no SnapMirror license is needed to do load sharing, load sharing mirrors can only be created within a single cluster, and any read access to the primary is automatically directed to one of the mirrors.  Yes - you should also create a mirror on the same node where the original exists, otherwise all access will get redirected to a non-local mirror, which defeats the purpose.

 

You will also want to review the CIFS access redirection mechanisms.  When an SVM has multiple network interfaces across multiple nodes, a redirect request can be sent back to a client so that subsequent accesses to data are directed to the node that owns the volume without needing to traverse the backplane.  Definitely review that before putting a volume/share structure in place, because you can defeat that redirection if you aren't careful with your share hierarchy.

 

Hope this helps with both some general background as you get up to speed on cDot and some specifics in response to your topic points.

 

Bob Greenwald

Lead Storage Engineer | Huron Legal

Huron Consulting Group

NCDA | NCIE-SAN Clustered Data OnTap

 

Re: Does root volume\aggregate need to be separate from data aggregates in cDOT?

Thank you for the reply and great explanation.

 

I have the cluster set up and have sucked in A LOT of information over the last couple of weeks.

 

As for the load-sharing mirrors, I'm not sure we really need them right now. It's a small cluster and is 95% NFS presentation for VMs with VMware and Hyper-V.

 

I'm sure this is something we could implement down the road if we begin to see a performance issue and load-sharing mirrors would help.

 

Re: Does root volume\aggregate need to be separate from data aggregates in cDOT?


I think he also meant that LS mirrors provide high availability of the NFS SVM root namespace, and not just for load distribution, but I might have misinterpreted.

 

With NFS, if I remember correctly, if the SVM root junction becomes inaccessible at any time (i.e., if the SVM rootvol goes down), access to all other NFS junctions in that SVM is lost until access to the SVM rootvol is restored. DP volumes configured as LS mirrors prevent this from becoming an issue.

 

Here's an example of the configuration commands you'd need to run to put this in place:

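# Load-sharing mirrors for the SVM root volume.
# Bracketed values are placeholders: replace them and drop the brackets.
# Create one DP-type destination volume per mirror, each on a different aggregate.
vol create -vserver [svm1_nfs] -volume [svm1_rootvol_m1] -aggregate [sas_aggr1] -size 1g -type DP
vol create -vserver [svm1_nfs] -volume [svm1_rootvol_m2] -aggregate [sas_aggr2] -size 1g -type DP
# Create the LS relationships. The schedule name (15min here) must already exist as a job schedule.
snapmirror create -source-path [//svm1_nfs/svm1_rootvol] -destination-path [//svm1_nfs/svm1_rootvol_m1] -type LS -schedule 15min
snapmirror create -source-path [//svm1_nfs/svm1_rootvol] -destination-path [//svm1_nfs/svm1_rootvol_m2] -type LS -schedule 15min
# Initialize every mirror in the LS set with one command.
snapmirror initialize-ls-set [//svm1_nfs/svm1_rootvol]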

 

I apologise if my syntax is wrong.

Re: Does root volume\aggregate need to be separate from data aggregates in cDOT?

@bobshouseofcards - You state: "...read level accesses are automatically redirected to the mirror on the node where the request came in."

 

If a request for read-level access comes in on the same node, shouldn't it go directly to the root volume instead of using an LS mirror? Please explain why an LS mirror on the same node does or doesn't create double work.

 

"Yes - you should also create a mirror on the same node where the original exists, otherwise all access will get redirected to a non-local mirror, which defeats the purpose."

Re: Does root volume\aggregate need to be separate from data aggregates in cDOT?

Good question...

 

The answer is pretty basic.  If you are using Load Sharing mirrors, then when access to the root volume of an SVM is needed, the Load Sharing mirrors trump the actual root volume.  It's either all or none.  So if load sharing mirrors are in use, and an access comes in that reads from the root volume in some fashion, that access has to come from a load sharing mirror copy of the root volume.  That way all accesses are consistent from an ONTAP point of view.

 

The explanation goes to what LSMs actually are and how they are implemented.  At the heart of it, an LSM copy is nothing more than a SnapMirror destination.  Recall that a SnapMirror destination, while the relationship is active, can be accessed in a read only fashion.  So that's what LSMs are - read only copies.

 

Now add in the concept that the LSMs must be in sync with both the root volume and each other to be functional and consistent across all nodes.  If direct access were allowed to the real SVM root volume, it might be out of sync with the LSM copies, necessitating a SnapMirror update on the LSMs to bring them all back into sync.  That's why, if LSMs are present, normal access is read only and goes through the LSM copies, ensuring a consistent presentation from all the copies, even on the node where the real SVM root resides.

 

You can access the real SVM root volume for write access if needed through an alternate mount point.  For NFS, the "/.admin" path is the "real" root volume.  For CIFS, you can create a separate share that references the "/.admin" path on the SVM.

 

You should not do this unless you need to make changes to the SVM root (a new folder, a permission change to the root volume, etc.), and then of course immediately update the LSM copies to be sure everyone sees the same change.  In 8.3.2P9 and later (not sure when in the ONTAP 9.x line) there is a feature change available.  When the SVM root volume has LSMs and the SVM root volume is changed, a SnapMirror update is now automatically triggered from the cluster rather than having to do it manually or by schedule.  The automatic update relieves overhead of having to regularly run updates on a schedule or manually in workflows.
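
For the manual case, the update after a change through "/.admin" can be sketched as follows.  The SVM and volume names are the hypothetical "svm01" and "svm01_root" from earlier in the thread, and the field names are from memory, so verify them on your release.

```
# Push changes made on the real SVM root volume out to every LS mirror in the set
snapmirror update-ls-set -source-path //svm01/svm01_root
# Check that the LS mirrors are healthy and idle again
snapmirror show -type LS -fields state,status
```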

 

Like every feature, LSMs should be used at the appropriate time for the right purpose, rather than "always" for every CIFS/NFS configuration.  For clusters with significant CIFS/NFS workloads and many files, they can improve performance.  LSMs serve no purpose and should not be used for SVMs that exclusively serve block level data.

 

 

Hope this helps you.

 

Bob Greenwald

Senior Systems Engineer | cStor

NCIE SAN ONTAP, Data Protection

 

 

 
