ONTAP Discussions

SnapMirror to DR moving from 1GbE to 10GbE

izzi

What, in your opinion, are the trade-offs in moving from a 1GbE to a 10GbE SnapMirror network?

How much impact do you see latency and packet loss having on this decision?

Is it worth the expense of the connection?


lmunro_hug

Hi,

Do you know how much disk throughput the source filer is currently using to service its actual workloads? Even with good 1GbE layer-2 connections between source and destination, I found there was a noticeable performance impact while SnapMirror transfers were running. This was on mid-range 32xx systems doing SnapMirror transfers totalling roughly 110MB/sec, with user requests for data running between 300MB/sec and 400MB/sec. I have always viewed SnapMirror as consuming disk throughput that could otherwise be servicing user requests. Although system tasks are supposed to have lower priority than user data requests, I didn't find that to be the case in practice. You could try tweaking the priority, but I found it didn't help much.

If you do move to 10GbE for SnapMirror traffic, my opinion is that you may have to limit the transfer speed to prevent performance impacts. Also, can the destination system's disk throughput keep up (if it is a smaller system)?
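As a rough way of sizing that limit: work out how much disk throughput the source can spare once user workload is covered, and throttle SnapMirror to the remainder. A minimal sketch with made-up numbers; on a 7-Mode system the resulting figure would go into the per-relationship kbs= option in /etc/snapmirror.conf (kilobytes per second):

# Back-of-the-envelope throttle sizing: give SnapMirror only the headroom
# left over after user workload. All numbers are illustrative placeholders.

aggregate_throughput_mbs = 600   # what the source aggregate can sustain (MB/s), assumed
peak_user_workload_mbs   = 400   # busiest observed user demand (MB/s), assumed
safety_margin_mbs        = 50    # extra headroom to keep free (MB/s), assumed

snapmirror_budget_mbs = aggregate_throughput_mbs - peak_user_workload_mbs - safety_margin_mbs
kbs_value = snapmirror_budget_mbs * 1024   # kbs= expects kilobytes per second

print(f"SnapMirror budget: {snapmirror_budget_mbs} MB/s -> kbs={kbs_value}")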

This is my 2c

Luke

justin_smith

We noticed a performance impact going to a 10GbE pipe between DCs...

Previously everything was 10GbE internally, leaving the DC at 1GbE and back onto 10GbE networks at the far end... SnapMirror worked just fine...

Once we went 10GbE internally, left the DC at 10GbE, and came into our 10GbE networks at the DR site, the lag times increased significantly... not sure how or why. We're running 6240s.

radek_kubka

Well, you can throttle SnapMirror bandwidth, can't you?

I would say it all depends: if the filer isn't heavily loaded by user requests, then upping the bandwidth for SnapMirror could make sense, *if* it is really needed for anything (more data / more data changes expected?)
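A quick sanity check on the "is it really needed" part is to compare your change rate against what each link can move within the transfer window. A rough sketch, all inputs assumed:

# Does the existing link even constrain the SnapMirror updates?
# Placeholder inputs - substitute your own change rate and schedule window.

daily_change_gb   = 300   # data changed between updates (GB), assumed
transfer_window_h = 8     # window the updates should finish in (hours), assumed

def hours_to_transfer(change_gb, link_gbit, efficiency=0.8):
    """Hours to push change_gb over a link_gbit Gbit/s link at the given utilisation."""
    usable_mbs = link_gbit * 1000 / 8 * efficiency   # MB/s actually achievable
    return change_gb * 1024 / usable_mbs / 3600

for link in (1, 10):
    hours = hours_to_transfer(daily_change_gb, link)
    verdict = "fits" if hours <= transfer_window_h else "does NOT fit"
    print(f"{link} GbE: {hours:.1f} h ({verdict} in a {transfer_window_h} h window)")

If the 1GbE figure already fits comfortably in the window, the wider pipe mostly just lets SnapMirror hit the source disks harder.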

Regards,

Radek

justin_smith

Curious, though: what would throttling the bandwidth accomplish?

If we ran fine on a 1GbE interface, why would increasing the pipe 10x between the two sites cause poor performance? And how would throttling the bandwidth help?

bbjholcomb

What is the MTU size on the 10G interface on the NetApp side? Do you have a firewall between the DCs, and how much bandwidth can it handle? We hit a problem with jumbo frames enabled on only one end: it drove the switch to 100% CPU changing 9000-byte packets down to 1500. Most people use throttling when they have limited bandwidth between the DCs; that doesn't seem to apply in your case.
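One quick way to see what segment size the two ends actually settle on, and whether something in the path is clamping it, is to open a TCP connection towards the destination and read back the MSS. A Linux-only sketch; the hostname is a placeholder and 10566 is assumed to be the SnapMirror data port on a 7-Mode system:

import socket

# Connect towards the DR filer (placeholder address) and read the MSS the
# kernel is using on that connection. If you expect jumbo frames (MTU 9000)
# end to end but see an MSS around 1460, something in the path is still at
# 1500, or a middlebox is clamping the MSS.
DEST = ("dr-filer.example.com", 10566)

sock = socket.create_connection(DEST, timeout=5)
mss = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)   # Linux-specific option
print(f"MSS in use towards {DEST[0]}: {mss} bytes")
sock.close()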

radek_kubka

"If we ran fine on a 1GbE interface, why would increasing the pipe 10x between the two sites cause poor performance? And how would throttling the bandwidth help?"

I was thinking that SnapMirror traffic was now using more than 1Gbit of bandwidth, hence more reads from disk, hence lower performance of the system. It is a moot point, indeed, if SnapMirror throughput stays below 1Gbit in both cases.

justin_smith

The NetApp is configured for 1500. I'll triple-check the switch, but I want to say it's the same.

It's probably just a setting on one side, or a switch that doesn't match up...

bbjholcomb

As long as the switch is set higher than 1500, it shouldn't be part of the problem. I would do a traceroute from one end to the other and see what sits in between: can all of the connection points handle 10G? You could be overloading the firewall if it only has a 1G connection. I would also run a packet trace on both ends and make sure you are seeing 1500-byte packets. Is the connection encrypted? We found problems with a packet size of 1500 over VPN, so we set it lower. We have two 10G connections from coast to coast and are able to take advantage of the 10G. Just my opinion, but it appears something is being overloaded now that you have gone from 1G to 10G.
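On the VPN point, the tunnel overhead comes straight out of the 1500, which is usually why dropping the packet size helps. The arithmetic, with the overhead figure as an assumption (it varies with cipher and tunnel type):

# Effective payload once tunnel overhead is subtracted from the path MTU.
# The 73-byte figure is an assumed worst case for IPsec ESP in tunnel mode.

path_mtu        = 1500   # MTU of the WAN link
tunnel_overhead = 73     # assumed encryption/encapsulation overhead, bytes
ip_tcp_headers  = 40     # IPv4 (20) + TCP (20), no options

inner_mtu = path_mtu - tunnel_overhead
mss_clamp = inner_mtu - ip_tcp_headers

print(f"Inner MTU after tunnel overhead: {inner_mtu}")
print(f"Suggested TCP MSS clamp:         {mss_clamp}")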

paul_wolf

This brings Long Fat Networks (LFNs) into play: as the bandwidth gets wider, effectively utilizing it becomes more of a challenge. Not sure if this is the case here, but in the past, when we have gone to wide, low-latency links between sites, we would see the same or slightly worse overall performance between hosts. This was due to the nature of TCP, specifically window size, acknowledgements, and other TCP overhead.
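The arithmetic behind that: a single TCP stream tops out at roughly window / RTT, and the window needed to fill the pipe is the bandwidth-delay product. A quick sketch, with the round-trip time between sites assumed:

# Bandwidth-delay product vs. TCP window: why a fat, long link can run no
# faster than a thin one if the window doesn't scale. The RTT is an assumption.

link_gbit = 10
rtt_ms    = 70     # assumed round trip between the sites

bdp_bytes = link_gbit * 1e9 / 8 * (rtt_ms / 1000)
print(f"Window needed to fill a {link_gbit} Gbit/s, {rtt_ms} ms path: {bdp_bytes / 1e6:.1f} MB")

for window_kb in (64, 256, 1024, 8192):
    max_mbit = window_kb * 1024 * 8 / (rtt_ms / 1000) / 1e6
    print(f"{window_kb:>5} KB window -> at most {max_mbit:,.0f} Mbit/s")

With a default-sized window the extra bandwidth simply never gets used, which matches seeing the same or slightly worse numbers after widening the link.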

Here are some links that discuss LFNs

http://en.wikipedia.org/wiki/Bandwidth-delay_product

http://itperformancemanagement.blogspot.com/2010/04/tcp-throughput-over-long-fat-networks.html

http://technopragmatica.blogspot.com/2012/09/long-fat-networks.html
