Solved: Moving a volume to an aggregate on the second HA within a 4-node CDOT environment

mdvillanueva · 2015-10-14

Hi experts,

I have vol1 containing LUN1, which is connected to a Windows machine. My SVM1 is configured with two portsets: portsetHA1 and portsetHA2. The igroup for the Windows machine is mapped to portsetHA1. vol1 is in HA1_aggr1 on HA1. I want to move vol1 to HA2_aggr1. Since I am moving vol1 to an aggregate on a different HA pair, will that cause an outage on the LUN?

What can I do to avoid downtime?

I am new to CDOT so any help would be great.

Thanks!

bobshouseofcards · 2015-10-14

Hi.

Your general configuration will not prevent the move, but it does leave open some failure scenarios that could lead to loss of access.

From your description I understand that you have two HA pairs, HA1 and HA2, in a single cluster. You have an aggregate within each HA pair. Each HA pair has a portset. And you're new to cDOT. All good. Let's start with a few basics then.

A cDOT cluster is, for all intents and purposes, a single storage system image to connecting clients. Your volumes can be located anywhere within the cluster on any aggregate. A request to access that data can come in on any "available" interface on any node, and the data will be located on the storage and delivered back through that interface to the connected client. That just works, using the private cluster interconnect network on the backend. It is completely transparent to the client.

The "virtualization" of the storage is done through SVMs. SVMs manage the volume/LUN "space" that is presented and the interfaces (network, iSCSI, FC) through which the volumes/LUNs are presented. An SVM can also be limited as to where its volumes can live on the backend if desired. So the magic of the physical storage is hidden behind the SVM.

Now - you can tell the SVM "how" to present data. Take an IP interface for example. An SVM may have only a single logical network interface, with a single logical IP address (not best practice of course, but it suffices for this discussion). That interface will be mapped on the backend to a physical network port on one physical node (no matter the size of the cluster). Any access to the data that the SVM owns must come through that single interface. The aggregate that holds the data could be controlled by any other node in the cluster, or it might be on the node where the logical interface lives. Either way, the data will be located and delivered.

When a request comes into a node via any interface, it is fully processed on that node - for instance authentication lookups, etc. When the node goes to disk to access specific data blocks, that's when the cluster interconnect comes into play as needed. Surprisingly, there is little difference between access to local disk and access to "remote" disk from a node perspective, although in extremely high IO environments it could make a small difference. cDOT is highly optimized for accessing data through an alternate node. But one could, in theory, overwhelm the request processing capability of a node if all the requests came into a single node and the other nodes were just serving disk traffic. Having multiple "access points" is a good thing for both performance and failure isolation, provided the logical access points are mapped to physical access points in separate networks or SANs.
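To make the local versus "remote" distinction concrete, here is a toy Python model. It is only a sketch of the idea, not ONTAP code: the node names, LIF names, and the ownership tables are all made up for illustration.

```python
# Hypothetical layout: which node owns each aggregate and which node hosts
# each logical interface (LIF). None of these names come from a real system.
AGGR_OWNER = {"HA1_aggr1": "node1", "HA2_aggr1": "node3"}
LIF_HOME = {"lif_n1": "node1", "lif_n3": "node3"}


def data_path(lif: str, aggregate: str) -> str:
    """Describe how a request entering on `lif` reaches data on `aggregate`."""
    entry_node = LIF_HOME[lif]         # the node that processes the request
    owner_node = AGGR_OWNER[aggregate]  # the node that controls the disks
    if entry_node == owner_node:
        return "local"
    # Different nodes: the blocks travel over the private cluster interconnect.
    return f"remote via cluster interconnect ({entry_node} -> {owner_node})"


print(data_path("lif_n1", "HA1_aggr1"))
print(data_path("lif_n1", "HA2_aggr1"))
```

Either way the request is answered on the interface it arrived on; only the backend hop differs.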

So how does a portset play into this? Just like in 7-mode, a portset limits the ports on which a given igroup will see mapped LUNs. In your example, you have a portset defined for each HA pair. So for LUN1, clients are zoned to WWNs in portsetHA1 and access the LUN through ports in the HA1 pair. It doesn't matter where the volume/LUN is physically located; your clients will still map the LUN through the portset on HA1 and access the data. The volume move is transparent and non-disruptive.
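Here is the same point as a small Python sketch. Again, this is a toy model under my own assumptions, not an ONTAP API: the port names and the igroup name are hypothetical, and `move_volume` simply stands in for a volume move.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    aggregate: str


# Hypothetical port names; the portset names follow the original question.
PORTSETS = {
    "portsetHA1": frozenset({"node1_fc_a", "node2_fc_a"}),
    "portsetHA2": frozenset({"node3_fc_a", "node4_fc_a"}),
}
# The igroup is bound to exactly one portset ("windows_igroup" is made up).
IGROUP_PORTSET = {"windows_igroup": "portsetHA1"}


def visible_ports(igroup: str) -> frozenset:
    """Ports on which the igroup sees its LUNs: decided by the portset only."""
    return PORTSETS[IGROUP_PORTSET[igroup]]


def move_volume(volume: Volume, destination_aggregate: str) -> None:
    """Stand-in for a volume move: only the backend location changes."""
    volume.aggregate = destination_aggregate


vol1 = Volume("vol1", "HA1_aggr1")
ports_before = visible_ports("windows_igroup")
move_volume(vol1, "HA2_aggr1")
ports_after = visible_ports("windows_igroup")

print(vol1.aggregate, ports_before == ports_after)
```

Note that `visible_ports` never looks at the volume's aggregate, which is the whole point: the host's view is set by the igroup and portset, not by where the data sits.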

The downside: you only have access to LUN1 through ports in the HA1 pair. So - rare, but what happens if the entire HA1 pair goes offline? For instance, you hit a bug or condition that panics one controller in HA1, and during the giveback you hit the same bug or condition on the other controller and it panics, leading to a moment in time when both controllers are down. Granted - should be rare, but I've been there any number of times in both 7-mode and cDOT. So let's assume that you hit an odd condition where HA1 is completely off. Your data is still up and good on HA2, but by limiting the access to a portset that only uses ports from HA1, you have lost access to the data, because HA1 is still the gatekeeper to that igroup and LUN.

A more robust portset definition might be to choose two ports from each HA pair to be in the portset (it certainly doesn't need to be all of the ports), or even at least one port per controller, assuming you have carefully split the ports up between SAN fabrics. Standard multi-pathing is going to locate and choose the best port to use for direct access to the LUN, no matter where you move it.
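A quick way to compare the two portset designs is to ask which ports survive if a whole HA pair goes down. The Python below is a minimal sketch with hypothetical port names, and it assumes the volume already lives on the surviving pair so the data itself stays online.

```python
# Hypothetical mapping of each FC port to the HA pair its node belongs to.
PORT_HA_PAIR = {
    "node1_fc_a": "HA1",
    "node2_fc_a": "HA1",
    "node3_fc_a": "HA2",
    "node4_fc_a": "HA2",
}


def surviving_ports(portset: set, failed_ha_pair: str) -> set:
    """Ports in the portset that are still up after one HA pair fails."""
    return {port for port in portset if PORT_HA_PAIR[port] != failed_ha_pair}


def lun_reachable(portset: set, failed_ha_pair: str) -> bool:
    """The host keeps access only if at least one portset port survives."""
    return len(surviving_ports(portset, failed_ha_pair)) > 0


ha1_only = {"node1_fc_a", "node2_fc_a"}   # portset limited to HA1
one_per_controller = set(PORT_HA_PAIR)    # one port on every controller

for name, portset in [("HA1 only", ha1_only), ("one per controller", one_per_controller)]:
    print(name, "-> reachable after HA1 outage:", lun_reachable(portset, "HA1"))
```

With the HA1-only portset nothing survives an HA1 outage, while the spread-out portset still leaves the HA2 ports for multipathing to use.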

There are lots of ways to work this, and each has specific advantages and disadvantages - what works for you and your organization still counts for a lot. But rest assured, so long as you have a means to access the SVM via network interface or block protocol interface, the data's physical location will be transparent to the client.

Hope this helps you out.

Bob Greenwald

Lead Storage Engineer

Huron Legal | Huron Consulting Group

NCIE - SAN, Data Protection
