Changing MTU on broadcast domain

geauxgetter · 2019-10-08

I have a FAS8020 running ONTAP 9.4 with an ifgrp using e0c and e0d on both nodes, tagged for VLAN 13. e0c and e0d have the MTU set to 9000. However, port a0a-13 on both nodes belongs to a broadcast domain where the MTU is set to 1500.

My question is: can I move the LIF from netapp-01:a0a-13 to netapp02:a0a-13, remove netapp-01:a0a-13 from the broadcast domain, add it to a new broadcast domain where the MTU is set to 9000, and then move the LIF back to netapp-01:a0a-13?

I'm trying to work out how to get my NFS LIFs set for jumbo frames without the MTU change causing any interruption.

SpindleNinja · 2019-10-09

A couple of questions:

Are these for some kind of NFS VMware datastore?

Is there anything else in that broadcast domain besides those two VLAN 13 ports?

And was this set up in error initially, or did something change?

Ontapforrum · 2019-10-10

Hi,

While you come back with answers to those questions: I think this is a great question, all the more so because you are not often in this situation, so few people have first-hand experience of it.

I am not an expert on networking, but I will attempt this.

My thoughts below are based on the details you provided in your query.

Your case (please correct me if this is wrong):

1) ifgrp = a0a
2) Vlan Port = a0a-13 <---Broadcast Domain MTU 1500
3) Physical Port = [e0c,e0d] <---MTU 9000

You want to change the MTU of:

2) Vlan Port = a0a-13 <---Broadcast Domain MTU 1500

Question:
Is the VLAN port a0a-13 part of a separately defined (dedicated) broadcast domain? If yes, you can use the following command, which will briefly disrupt connections.

The following command changes the MTU to 9000 for all ports in the broadcast domain test:
cluster1::> network port broadcast-domain modify -broadcast-domain test -mtu 9000
Warning: Changing broadcast domain settings will cause a momentary data-serving
interruption.
Do you want to continue? {y|n}: y

If the VLAN port a0a-13 is not in a dedicated broadcast domain and is instead part of the default broadcast domain, which has an MTU of 1500, then changing the MTU there will change it on all the VLAN-tagged and non-tagged ports in that domain, which may not be what you need.

For example: Default broadcast domain [MTU 1500]

                 |--------->a0a-10 ---LIF
                 |
                 |--------->a0a-20 ---LIF
a0a[e0c,e0d]-----|
                 |--------->a0a-30 ---LIF
                 |
                 |--------->a0a-40 ---LIF

However, if you want to change the MTU of only the VLAN port a0a-10 above from 1500 to 9000, you could split it off into a new broadcast domain without disruption.

::> broadcast-domain split -broadcast-domain Storage -new-broadcast-domain vlan10 -ports node-01:a0a-10,node-02:a0a-10

After the split command is run, a new broadcast domain and failover group will be created, and the failover group on the affected LIFs will be updated. After this, you can use the modify command to change the MTU to 9000 for that VLAN's broadcast domain.

Experts, let us know if this is correct.
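The two steps above (split first, then modify) can be sketched as a small helper that only builds the command strings so they can be reviewed before anyone pastes them into the cluster shell. This is a minimal sketch, not a tested procedure: the function name `plan_mtu_change`, the domain names `Default` and `vlan13`, and the node names in the usage line are assumptions for illustration, and the flags are simply the ones shown in the commands earlier in this post.

```python
def plan_mtu_change(source_domain, new_domain, ports, mtu=9000):
    """Return the ONTAP CLI commands, in order: split, then modify.

    Nothing is executed here; the strings are only assembled for review.
    All names passed in are supplied by the caller (hypothetical in this sketch).
    """
    if not ports:
        raise ValueError("at least one node:port entry is required")

    port_list = ",".join(ports)

    # Step 1: move the VLAN ports into their own broadcast domain.
    split_cmd = (
        "network port broadcast-domain split"
        f" -broadcast-domain {source_domain}"
        f" -new-broadcast-domain {new_domain}"
        f" -ports {port_list}"
    )

    # Step 2: raise the MTU on the new domain only.
    modify_cmd = (
        "network port broadcast-domain modify"
        f" -broadcast-domain {new_domain}"
        f" -mtu {mtu}"
    )
    return [split_cmd, modify_cmd]


if __name__ == "__main__":
    # Assumed names; substitute the real broadcast domain and node:port values.
    for cmd in plan_mtu_change(
        "Default", "vlan13", ["netapp-01:a0a-13", "netapp-02:a0a-13"]
    ):
        print(cmd)
```

Keeping the modify step scoped to the new domain is the point of the ordering: the other ports in the original domain stay at their current MTU.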

SpindleNinja · 2019-10-10

Changing the broadcast domain will also warn of a blip, I believe.

geauxgetter · 2019-10-10

a0a-13 is being used for NFS datastores to VMware.

I believe that when the filer was upgraded years ago to a release that introduced broadcast domains, the upgrade put all of the interfaces into one default broadcast domain with the MTU set to 1500.

I know I can change the MTU and there will be disruption when I do this, so my question is specifically how can I change the MTU without any disruption. I'm not against just removing the ports from the broadcast domain as long as data disruption does not occur.

Ontapforrum · 2019-10-11

I tested this in my lab (real ONTAP 9.1) for both an NFS datastore and iSCSI, and it made no difference to the end client. There was a blip for a second that caused a couple of lost pings, but then it was back. However, it is safer to schedule a maintenance window of half an hour, just in case.
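Since the lab check above relied on ping, a related sanity check after the change is a ping sized to the new MTU with fragmentation disallowed. A rough sketch of the arithmetic follows, assuming plain IPv4 with no IP options (20-byte IP header, 8-byte ICMP echo header); the helper name `max_icmp_payload` is made up for illustration, and how you set the don't-fragment flag depends on the client's ping tool.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU is too small to carry an ICMP echo")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ping payload {max_icmp_payload(mtu)} bytes")
```

If a payload of that size gets through with fragmentation disallowed, every hop on the path is passing jumbo frames; if only the smaller size works, something between the client and the LIF is still at 1500.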

geauxgetter · 2019-10-11

@Ontapforrum wrote:

I tested this in my lab (real ONTAP 9.1) for both an NFS datastore and iSCSI, and it made no difference to the end client. There was a blip for a second that caused a couple of lost pings, but then it was back. However, it is safer to schedule a maintenance window of half an hour, just in case.

Thanks. This will be performed at a hospital that has its EMR system running on these links with lots of SQL transactions, and I cannot take the risk of anything dropping when this change is made. A maintenance window would mean shutting everything down, which would mean shutting down the hospital, and that isn't really an option. That is why I am trying to find a way to move LIFs around in order to adjust the MTU on the ports that belong to the broadcast domain.

Ontapforrum · 2019-10-11

Hi,

I understand your concern, totally. Hospital IT infrastructure takes priority over everything.

Regarding LIF migration: even though it is NDO for NFSv3 [because the protocol is stateless], it is designed to recover from 2 blips or 2 lost pings. My point here is that even when you do a LIF migration there will be a blip; it just depends on which protocols can take it and which cannot.

I have a few questions:

1) What is the need for changing the MTU from 1500 to 9000? Was an RCA done that concluded MTU 1500 is the cause of latency and performance issues?

For example, are the hospital application end users complaining that they are unable to save patient records quickly, that doctors are unable to see a patient's history in good time, or that it takes too long for them to open an image/X-ray?

Was the RCA done on the basis of these complaints, and did it suggest that MTU 1500 is the cause and that moving to 9000 will massively improve the situation?

I work in an environment where requests for any NDO activity [even LIF migration] are always rejected; if it happens naturally, due to a physical port failure, that's a different story. Therefore, we always assess how strong our case is. Sometimes we have to tie the change to a SQL application upgrade or Oracle DB patching and do it when they take the application down, but in our environment that is still possible because it is not a 24/7 shop.

If there is no performance or user experience issue, then I would hold off until such an opportunity arrives. But if performance-related tickets are being raised because of latency, then you will have to look at the usage trend of the application users in terms of IOPS, and probably choose a time when load is expected to be low and the change is least disruptive.