I have been pulling my hair out for the past few weeks trying to figure out this exact issue. We are replicating from a FAS8020 to a FAS3210 over 10G ports connected as Appliance ports on a UCS FI. Each filer is directly connected to the same FI, on a non-routed private VLAN. Same exact issue: SnapMirror would run like a dog on the 10G links, even though NFS was working without issue on the same ports. Putting in your suggested tweaks has improved the replication from 50Mb/sec to 1900Mb/sec! 🙂
Can't compare to doing XDP relationships over 10G, but I can share my experience running standard Snapmirror (DP) over long distance 10G (1000 mile replication, latency 24ms round trip).
My sources are two 4-node clusters - "small" (2x6240,2x6220) and "big" (2x6280, 2x6290) both replicating to a single 4 node (4x8060) cluster at the target. The "small" source cluster is mostly high capacity disks (3/4TB), the "big" source cluster is all performance disk (600/900GB). All replications are to capacity disk (4TB) at the destination end.
My prep test was to run volume moves internally on both source clusters. A volume move is, for all intents and purposes, an internal SnapMirror (with the obvious bonus of keeping the volume live the whole time). I used these tests to set expectations of how fast a SnapMirror transfer might run. At best my small cluster would go just under 1Gbps for volume moves and my big cluster would go closer to 1.5Gbps. These set the limits for the speed I'd expect from a single replication.
For the replications, I maxed out the WAN receive buffers ahead of time (because of the low latency in the network):
network connections*> options buffer show
Service      Layer 4 Protocol  Network  Receive Buffer Size (KB)  Auto-Tune?
-----------  ----------------  -------  ------------------------  ----------
ctlopcp      TCP               WAN      7168                      true
ctlopcp      TCP               LAN      256                       false
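As a rough sanity check on those buffer sizes, here is a back-of-the-envelope sketch of what a single TCP stream could carry at the 24ms round trip mentioned above. It assumes the receive buffer is what bounds the TCP window (at most one full buffer in flight per round trip) and ignores packet loss, auto-tuning and protocol overhead, so treat the numbers as ceilings rather than predictions. The function names are made up for illustration.

```python
def window_limited_bps(buffer_kb: float, rtt_ms: float) -> float:
    """Ceiling for one TCP stream: one full receive buffer per round trip."""
    buffer_bits = buffer_kb * 1024 * 8
    return buffer_bits / (rtt_ms / 1000.0)


def bandwidth_delay_product_bytes(link_bps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    return link_bps * (rtt_ms / 1000.0) / 8


RTT_MS = 24  # round trip time quoted in the post

wan_bps = window_limited_bps(7168, RTT_MS)  # WAN buffer from the table
lan_bps = window_limited_bps(256, RTT_MS)   # LAN buffer from the table
bdp = bandwidth_delay_product_bytes(10e9, RTT_MS)

print(f"7168 KB buffer: about {wan_bps / 1e9:.2f} Gbps per stream")
print(f" 256 KB buffer: about {lan_bps / 1e6:.0f} Mbps per stream")
print(f"10G at {RTT_MS} ms needs about {bdp / 1e6:.0f} MB in flight to fill the pipe")
```

Under those assumptions the 7168 KB WAN buffer allows roughly 2.4Gbps per stream, while a 256 KB buffer would cap a stream at under 100Mbps over the same distance. Filling the whole 10G pipe therefore takes several streams in parallel.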
I've found that transfers from capacity disk are very much limited by disk speeds. I control all replications with scripts because the standard schedules don't work well enough to keep the pipe at maximum speed - we replicate updates for about 1800 volumes daily. For the small cluster, I limit concurrent replications to 3 per aggregate, which seems to be as fast as the aggregate will go before disk limitations slow down individual transfers. The control script starts updates, then checks every 5 minutes and starts new replications in each aggregate as slots free up. Even so, the best the "small" cluster has ever achieved is about 5.5Gbps sustained, and individual transfers are always less than 1Gbps. The best I've ever seen on an individual transfer is about 700Mbps.
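The throttling logic described above can be sketched roughly as follows. This is not the actual control script, only a minimal illustration of "at most 3 transfers per aggregate, re-check every 5 minutes". The `list_running` and `start_update` hooks are hypothetical placeholders for whatever you use to query and trigger transfers on your cluster.

```python
import time
from collections import Counter

MAX_PER_AGGR = 3       # from the post: 3 concurrent transfers per aggregate
POLL_SECONDS = 5 * 60  # from the post: re-check every 5 minutes


def pick_updates_to_start(pending, running, limit=MAX_PER_AGGR):
    """Choose which pending (volume, aggregate) pairs can start now.

    pending: list of (volume, aggregate) waiting, in priority order
    running: list of (volume, aggregate) currently transferring
    """
    active = Counter(aggr for _, aggr in running)
    to_start = []
    for vol, aggr in pending:
        if active[aggr] < limit:
            to_start.append((vol, aggr))
            active[aggr] += 1  # count the slot we just handed out
    return to_start


def run(pending, list_running, start_update,
        poll_seconds=POLL_SECONDS, sleep=time.sleep):
    """Keep every aggregate busy up to the limit until nothing is pending.

    list_running and start_update are hypothetical callables supplied by
    the caller; they are not part of any NetApp tooling.
    """
    pending = list(pending)
    while pending:
        for item in pick_updates_to_start(pending, list_running()):
            start_update(item)
            pending.remove(item)
        if pending:
            sleep(poll_seconds)
```

Keeping the slot selection in a pure function makes it easy to try different per-aggregate limits against a list of volumes without touching a live system.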
The large cluster just fires on usual schedules without regard for aggregates. Nodes don't seem to sneeze at it much - the large cluster can easily fill the 10G pipe with multiple transfers, though individual transfers max out near the 1.5Gbps mark similar to the internal move.
We are facing a similar problem: basically SnapMirror/SnapVault is slow, even on 1 Gbps links, with speeds reaching only 10-15% of the available bandwidth. In our troubleshooting, we have found that the common ground for this problem is that the intercluster LIF(s) are on etherchannel ports (ifgrp) that are using port-based load balancing. When using IP-based load balancing (which is the default when creating ifgrps), or even working active-passive, we are not seeing this performance degradation. Furthermore, if we do use port-based load balancing on the source controller but disable all but one physical port, SnapMirror/SnapVault throughput is very good. Note that SnapMirror/SnapVault in cDOT tends to open a multitude of TCP sessions between the source and destination systems.
So I am wondering whether any of you facing SnapVault performance issues on a 10 Gbps network are using port-based load balancing, by any chance?
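To illustrate why the load-balancing mode changes how those many TCP sessions are placed, here is a toy model. The hash below is a deliberately simplified assumption, not ONTAP's actual distribution function, and the addresses and ports are made up. It only shows the general idea: with IP-based hashing every session between the same two LIFs lands on the same physical link, while with port-based hashing the sessions are spread across the links of the ifgrp.

```python
import ipaddress


def pick_link_ip(src_ip, dst_ip, n_links):
    """IP-based (toy model): the link depends only on the address pair."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return key % n_links


def pick_link_port(src_ip, dst_ip, src_port, dst_port, n_links):
    """Port-based (toy model): TCP ports are mixed into the key as well."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    key ^= src_port ^ dst_port
    return key % n_links


# Made-up intercluster LIF addresses and ports, for illustration only.
SRC, DST = "192.168.10.11", "192.168.10.21"
DST_PORT = 11105
SRC_PORTS = range(50000, 50008)  # eight parallel sessions
N_LINKS = 2

ip_links = {pick_link_ip(SRC, DST, N_LINKS) for _ in SRC_PORTS}
port_links = {pick_link_port(SRC, DST, p, DST_PORT, N_LINKS) for p in SRC_PORTS}

print("IP-based, links used:  ", sorted(ip_links))
print("port-based, links used:", sorted(port_links))
```

This does not explain the slowdown by itself. It only shows that port-based mode is the one configuration where a single replication relationship is split across several physical ports, which matches the observation that disabling all but one port restores throughput.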