Re: Decommission nodes/LIFs from a cluster and remount NFS

netappmagic · ‎2020-01-07

We want to decommission some nodes and the LIFs homed on these nodes. The problem is that some VMware NFS datastores and other NFS file systems are using these LIFs/IPs. Is there any way to remove these LIFs/IPs non-disruptively?

My understanding is that we would need downtime to remount the NFS datastores and file systems.

Thanks!

dbenadib · ‎2020-01-07

Hi lad,

Basically, you are able to move a LIF from node to node, which is a non-disruptive process. So if you find that you need to keep some LIFs, move them to another node.

BTW, you can check LIF usage from the CLI: network connections active show

Cheers !

netappmagic · ‎2020-01-07

dbenadib,

There is a concern with that approach: all network traffic will go to the two nodes that I move these LIFs to, which would cause an unbalanced load.

So, in the long run, it would be better to clean them up and avoid confusion. Removing them will require downtime.

Make sense?

dbenadib · ‎2020-01-07

Can you clarify?

How many nodes do you have in your cluster?

What kind of maintenance are you performing?

netappmagic · ‎2020-01-07

I have a total of 8 nodes in the cluster. I am planning to replace two of them by adding two new ones first, then taking two out. In the end, there will still be 8 nodes.

Understood, I can move all LIFs to the other two nodes (HA pair) without interruption. However, as I said, moving the LIFs will also move all connections for the NFS datastores and file systems. That will put a lot of load on those two nodes and cause an unbalanced load.

I am thinking of manually unmounting the NFS mounts that connect to the two nodes going away and remounting them against the two new nodes. That will mean service downtime.

Make sense?

dbenadib · ‎2020-01-07

I would do the following:

Add 2 new nodes (the cluster will have 10 nodes)

Validate that the new nodes are correctly connected to the relevant networks

Create intercluster LIFs (if SnapMirror is used)

Move all volumes to the new HA pair 1:1 (that way you ensure the same level of performance)

Move / rebalance all LIFs to the new HA pair 1:1 (that way you ensure the same level of performance)

Ensure that no volumes reside on the old HA pair

Clean up the old nodes:

Remove the intercluster LIFs

Delete the aggregates

Disable HA

Move epsilon out of this HA pair

Evict the nodes one by one
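
As an illustration of the clean-up checks in the list above, here is a minimal Python sketch that reports what still blocks a node from being evicted. It does not talk to a cluster: the `NodeInventory` class, its field names, and the sample data are all hypothetical and would have to be filled in by hand or from your own tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeInventory:
    """Hypothetical snapshot of what still lives on one node."""
    name: str
    volumes: List[str] = field(default_factory=list)
    data_aggregates: List[str] = field(default_factory=list)
    home_lifs: List[str] = field(default_factory=list)
    holds_epsilon: bool = False


def removal_blockers(node: NodeInventory) -> List[str]:
    """Return the reasons a node is not yet ready to be evicted."""
    blockers = []
    if node.volumes:
        blockers.append(f"{len(node.volumes)} volume(s) still hosted")
    if node.data_aggregates:
        blockers.append(f"{len(node.data_aggregates)} data aggregate(s) not deleted")
    if node.home_lifs:
        blockers.append(f"{len(node.home_lifs)} LIF(s) still homed here")
    if node.holds_epsilon:
        blockers.append("epsilon still assigned")
    return blockers


# Sample data (assumed, not taken from a real cluster).
old_nodes = [
    NodeInventory("node1", home_lifs=["nfs-lif-node1"], holds_epsilon=True),
    NodeInventory("node2"),
]

for node in old_nodes:
    blockers = removal_blockers(node)
    status = "ready" if not blockers else "blocked: " + "; ".join(blockers)
    print(f"{node.name}: {status}")
```

A node with an empty blocker list has nothing from the checklist left on it. HA state is deliberately left out of the sketch because it is a property of the pair rather than of one node.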

netappmagic · ‎2020-01-07

Your steps look very good.

However, there is one piece of information I had not shared with you: the two new nodes have already been added to the cluster, and there are now 10 nodes with new LIFs and new everything. Load is already balanced across all 10 nodes.

Now I just need to remove the two old nodes. I could move all their LIFs to the other nodes, but that would unbalance the load and leave the old LIFs with the old naming convention (e.g. nfs-lif-node1, nfs-lif-node2, ...) in the cluster forever, even though node1 and node2 would already be gone. That is why I am thinking of taking a downtime and removing all the old LIFs.

Make sense?

dbenadib · ‎2020-01-07

It makes sense. The only issue with that is the downtime.

If you want to avoid downtime, you have to migrate the LIFs. After that, balance the LIFs across the nodes (for better performance, make sure each LIF and the volumes it serves reside on the same node) and rename them according to your naming convention.

BR

netappmagic · ‎2020-01-07

Thank you!

aborzenkov · ‎2020-01-07

@netappmagic wrote:

Make sense?

No. You cannot remove nodes that host volumes. If the volumes have already been relocated to other nodes, any traffic to LIFs on these nodes goes via the cluster interconnect to the other node(s). So moving the LIFs to the nodes that actually host the volumes will improve the situation by avoiding indirection via the cluster interconnect.
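
The point about indirection can be shown with a small Python sketch that flags LIF/volume pairs whose traffic would cross the cluster interconnect, that is, where the LIF sits on a different node than the volume it serves. The function name and the sample names `node9`, `ds01`, and `ds02` are made up for illustration; only `nfs-lif-node1` comes from this thread.

```python
def indirect_paths(lif_node, volume_node, mounts):
    """Return (lif, volume) pairs served via the cluster interconnect.

    lif_node:    LIF name -> node currently hosting the LIF
    volume_node: volume name -> node hosting the volume
    mounts:      (lif, volume) pairs: which LIF clients use for each volume
    """
    return [
        (lif, vol)
        for lif, vol in mounts
        if lif_node[lif] != volume_node[vol]
    ]


# Assumed state: volumes already moved to node9, old LIF still on node1.
lif_node = {"nfs-lif-node1": "node1", "nfs-lif-node9": "node9"}
volume_node = {"ds01": "node9", "ds02": "node9"}
mounts = [("nfs-lif-node1", "ds01"), ("nfs-lif-node9", "ds02")]

before = indirect_paths(lif_node, volume_node, mounts)
print("indirect before LIF migration:", before)

# Migrating the LIF to the node that hosts the volume removes the indirection.
lif_node["nfs-lif-node1"] = "node9"
after = indirect_paths(lif_node, volume_node, mounts)
print("indirect after LIF migration:", after)
```

In this toy data set, the only indirect path is the one through the old LIF, and it disappears once that LIF lives on the same node as its volume.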

netappmagic · ‎2020-01-07

@aborzenkov

You are right. Unfortunately, we didn't do what you suggested. Now throughput to the LIFs on two specific nodes is much heavier than on the others.

Question: how do I know whether the throughput to these LIFs is too heavy and causing performance issues? Or how heavy is too heavy? As far as I can tell, there is no way to see latency on LIFs.