Intercluster XDP (SnapVault) performance over 10G

Hi all,

I'm having performance issues with SnapMirror XDP relationships between two clusters (primary: 4-node 3250, secondary: 2-node 3220, all running cDOT 8.2P5).

The replication LIFs on both sides are pure 10G (private replication VLAN), flat network (no routing), using jumbo frames (I have also tested without jumbo and the problem persists).

The vaulting works in general, but average throughput for a single node/relationship never goes beyond 1 Gbit/s - most of the time it is much slower (300-500 Mbit/s or less).

I verified that neither the source node(s) nor the destination node(s) are CPU or disk bound during the transfers (at least not all of the source nodes).

I also verified the SnapMirror traffic is definitely going through the 10G interfaces.

There is no compressed volume involved, only dedupe (on source).

The dedupe schedules are not within the same time window.

I also checked the physical network interface counters on the switch and the NetApp ports: no errors / drops, clean.

However, the replication is SLOW, no matter what I try.

The customer's impression is that it has gotten even slower over time, e.g. throughput from a source node was ~1 Gbit/s when the relationship(s) were initialized (not as high as expected) and dropped to ~500 Mbit/s after some months of operation / regular updates...

Meanwhile the daily update (of all relationships) adds up to ~1.4 TB per day, and it takes almost the whole night to finish :-(

So the question is: how to tune that?

Is anyone having similar issues with SnapMirror XDP throughput over 10 Gbit Ethernet?

Are there any configurable parameters (network compression? TCP window size? TCP delayed ACKs? Anything I haven't thought of?) on the source/destination side?

Thankful for all ideas / comments,

Mark

Re: Intercluster XDP (SnapVault) performance over 10G

No one?

Is anyone actually using SnapMirror XDP? In a 10GbE environment?

What throughput do you see?

Re: Intercluster XDP (SnapVault) performance over 10G

When we had vol move issues between cluster nodes (via 10Gb), we hit bug 768028 after opening a performance case. This may or may not be related to what you see; it also impacted SnapMirror relationships.

http://support.netapp.com/NOW/cgi-bin/bugrellist?bugno=768028

The workaround was to keep the redirect scanners from running, or you can disable free_space_realloc completely:

storage aggregate modify -aggregate <aggr_name> -free-space-realloc no_redirect
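To check what an aggregate is currently set to (and to put it back afterwards), something like the following should work - the aggregate name is a placeholder, and I believe the valid values are on / off / no_redirect, so double-check on your release:

storage aggregate show -aggregate <aggr_name> -fields free-space-realloc

storage aggregate modify -aggregate <aggr_name> -free-space-realloc on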



Re: Intercluster XDP (SnapVault) performance over 10G

We have been having this problem for months with plain old SnapMirror. We have a 10G connection between two different clusters, and SnapMirror performance has been calculated at around 100 Mb/sec - completely unacceptable.

We disabled free-space reallocation per bug 768028 - no help. We disabled throttling, and that helped, but not enough - and since it is a debug flag, disabling throttling comes with a production performance cost.

We used this to disable the throttle:

    • node run local -command "priv set diag; setflag repl_throttle_enable 0;"
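From memory, you can verify and later revert the flag the same way (printflag should print the current value at the diag level, and I believe setflag changes do not survive a reboot - treat this as a sketch and confirm on your systems):

    • node run local -command "priv set diag; printflag repl_throttle_enable"

    • node run local -command "priv set diag; setflag repl_throttle_enable 1;"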

Re: Intercluster XDP (SnapVault) performance over 10G

Thanks for the tips.

I tried both settings (disabled free-space realloc on the source and destination aggregates, and set repl_throttle_enable to 0 on the source and destination nodes), but this did not make things noticeably better - maybe slightly, but not really.

In the meantime I experimented a bit more and migrated all intercluster interfaces of my source nodes to gigabit ports (instead of the VLAN-tagged 10-gig interfaces).

Although unexpected, this helped quite a lot! Going to slower NICs on the source side doubled my overall throughput.

Next step is to finally open a performance case.

Re: Intercluster XDP (SnapVault) performance over 10G

It smells like flow control on the 10Gb links to me..

Ask the network guys if they see a lot of RX/TX pause packets on the switch ports. You may as well check for switch port errors while you are at it, or work with them to see if there are a lot of retransmits.
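If you want to rule flow control out on the NetApp side as well, a rough sketch of what I'd check and change (node/port names are placeholders; changing flow control can bounce the link briefly, so do it in a quiet window):

network port show -node <node> -port <ic_port> -fields flowcontrol-admin,flowcontrol-oper

network port modify -node <node> -port <ic_port> -flowcontrol-admin none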

This is the kind of situation where I would have hoped NetApp shipped network performance tools like iperf, so we could validate infrastructure and throughput when a system is put in.
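In the meantime, what I usually do is put two plain Linux hosts (or VMs) on the same replication VLAN and run iperf between them, just to validate the switch path itself - roughly like this (the IP is a placeholder):

iperf -s                         (on host A in the replication VLAN)

iperf -c <hostA-ip> -P 4 -t 30   (on host B: 4 parallel streams, 30 seconds)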

Re: Intercluster XDP (SnapVault) performance over 10G

Checked that already. Switch stats look clean (don't have them anymore).

ifstat on the NetApp side looks good:

-- interface  e1b  (54 days, 8 hours, 12 minutes, 26 seconds) --

RECEIVE

Frames/second:    3690  | Bytes/second:    18185k | Errors/minute:       0

Discards/minute:     0  | Total frames:    30935m | Total bytes:     54604g

Total errors:        0  | Total discards:      0  | Multi/broadcast:     7

No buffers:          0  | Non-primary u/c:     0  | Tag drop:            0

Vlan tag drop:       0  | Vlan untag drop:     0  | Vlan forwards:       0

Vlan broadcasts:     0  | Vlan unicasts:       0  | CRC errors:          0

Runt frames:         0  | Fragment:            0  | Long frames:         0

Jabber:              0  | Bus overruns:        0  | Queue drop:          0

Xon:                 0  | Xoff:                0  | Jumbo:               0

TRANSMIT

Frames/second:    3200  | Bytes/second:    12580k | Errors/minute:       0

Discards/minute:     0  | Total frames:    54267m | Total bytes:       343t

Total errors:        0  | Total discards:      0  | Multi/broadcast: 78699

Queue overflows:     0  | No buffers:          0  | Xon:                 6

Xoff:              100  | Jumbo:            3285m | Pktlen:              0

Timeout:             0  | Timeout1:            0

LINK_INFO

Current state:       up | Up to downs:         2  | Speed:           10000m

Duplex:            full | Flowcontrol:       full

So there are some Xon/XOff packets, but very few.

The very same link (same node) carries NFS I/O at 500 MByte/sec (via a different LIF), so I don't think the interface has any problems. However, SnapMirror average throughput remains below 50 MByte/sec, no matter what I try.

Re: Intercluster XDP (SnapVault) performance over 10G

Hi Mark,

As you saw in my other posts in a similar thread...

You say it's a flat network (no routing)... OK. But with SnapMirror, the routing groups / routes on the clusters themselves can be tricky (as I discovered in a weird issue I was having previously).

Can you output the following?

cluster::> network routing-groups show

cluster::> network routing-groups route show

Can you output a trace from a source IC LIF to a destination IC LIF as well, please?

Also - check this KB (if you have not already): https://library.netapp.com/ecmdocs/ECMM1278318/html/onlinebk/protecting/task/t_oc_prot_sm-adjust-tcp-window-size.html

Hope I can attempt to help out!

Thanks!
Trey

Re: Intercluster XDP (SnapVault) performance over 10G

Hi, thanks for your info.

I have no gateway (no route) defined in my intercluster routing groups - because these IC LIFs are on a flat subnet (low latency) and should only communicate within this local network.

The strange thing is: if I migrate my SOURCE IC LIF to a 1-gig port, the traffic flows at a constant 110 MByte/s.

As soon as I migrate it to a 10-gig port, traffic is very peaky and never constantly above 100 MByte/sec. On average it is even slower... Same network, same Nexus, same transfer...

I have played a lot with window sizes already, but I think in cDOT (at least in the latest releases) this is configured at the cluster scope, not per node, as stated here:

https://kb.netapp.com/support/index?page=content&id=1013603

At the moment I have:

ucnlabcm04::*> net connections options buffer show (/modify)

  (network connections options buffer show)

Service      Layer 4 Protocol  Network  Receive Buffer Size (KB) Auto-Tune?

-----------  ----------------  -------  ------------------------ ----------

ctlopcp      TCP               WAN      512                      false

ctlopcp      TCP               LAN      256                      false

The default ("WAN" = intercluster) is 2048 (2MB receive window size) with auto-tuning enabled for WAN.

I've played with different values on the destination. Changing the values may require resetting all TCP connections, e.g. by rebooting or by setting the IC LIFs down for a while.
The configured window sizes are then definitely in effect according to netstat -n.
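For reference, this is roughly how I changed the size and forced new sessions on the destination (the exact parameter names of the buffer modify command may differ slightly between releases, so treat it as a sketch):

ucnlabcm04::*> network connections options buffer modify -protocol TCP -service ctlopcp -network WAN -receive-buffer 1024

ucnlabcm04::*> network interface modify -vserver ucnlabcm04-01 -lif repl1 -status-admin down

ucnlabcm04::*> network interface modify -vserver ucnlabcm04-01 -lif repl1 -status-admin up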

However, my problem remains: throughput is flaky as soon as the source IC LIFs are on 10G...

Here is info about my source cluster (at the moment it consists only of nodes 03 and 04) - I am testing with a single volume from node 03 as the source node.

ucnlabcm01::> net port show -role intercluster

  (network port show)

                                      Auto-Negot  Duplex     Speed (Mbps)

Node   Port   Role         Link   MTU Admin/Oper  Admin/Oper Admin/Oper

------ ------ ------------ ---- ----- ----------- ---------- ------------

ucnlabcm01-03

       e0b    intercluster up    1500  true/true  full/full   auto/1000

       e2b    intercluster up    1500  true/true  full/full   auto/10000

ucnlabcm01-04

       e0b    intercluster up    1500  true/true  full/full   auto/1000

ucnlabcm01::> net int show -role intercluster

  (network interface show)

            Logical    Status     Network            Current       Current Is

Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home

----------- ---------- ---------- ------------------ ------------- ------- ----

ucnlabcm01-03

            repl1        up/up    10.230.4.33/16     ucnlabcm01-03 e2b     true

ucnlabcm01-04

            repl1        up/up    10.230.4.34/16     ucnlabcm01-04 e0b     true

ucnlabcm01::> net routing-groups show

Vserver   Group     Subnet          Role         Metric

--------- --------- --------------- ------------ -------

ucnlabcm01

          c10.230.0.0/16

                    10.230.0.0/16   cluster-mgmt      20

ucnlabcm01-03

          c169.254.0.0/16

                    169.254.0.0/16  cluster           30

          i10.230.0.0/16

                    10.230.0.0/16   intercluster      40

          n10.210.0.0/16

                    10.210.0.0/16   node-mgmt         10

ucnlabcm01-04

          c169.254.0.0/16

                    169.254.0.0/16  cluster           30

          i10.230.0.0/16

                    10.230.0.0/16   intercluster      40

          n10.210.0.0/16

                    10.210.0.0/16   node-mgmt         10

...

And here is my destination cluster:

ucnlabcm04::*> net port show -role intercluster

  (network port show)

                                      Auto-Negot  Duplex     Speed (Mbps)

Node   Port   Role         Link   MTU Admin/Oper  Admin/Oper Admin/Oper

------ ------ ------------ ---- ----- ----------- ---------- ------------

ucnlabcm04-01

       e1a-230

                intercluster up    1500  true/true  full/full   auto/10000

       e2d    intercluster up    1500  true/true  full/full   auto/1000

ucnlabcm04::> net int show -role intercluster

  (network interface show)

            Logical    Status     Network            Current       Current Is

Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home

----------- ---------- ---------- ------------------ ------------- ------- ----

ucnlabcm04-01

            repl1        up/up    10.230.4.41/16     ucnlabcm04-01 e1a-230 true

ucnlabcm04::> routing-groups show -vserver ucnlabcm04-01

  (network routing-groups show)

          Routing

Vserver   Group     Subnet          Role         Metric

--------- --------- --------------- ------------ -------

ucnlabcm04-01

          i10.230.0.0/16

                    10.230.0.0/16   intercluster      40

          n10.210.0.0/16

                    10.210.0.0/16   node-mgmt         10

ucnlabcm04::> routing-groups route show -vserver ucnlabcm04-01

  (network routing-groups route show)

          Routing

Vserver   Group     Destination     Gateway         Metric

--------- --------- --------------- --------------- ------

ucnlabcm04-01

          n10.210.0.0/16

                    0.0.0.0/0       10.210.254.254  10

The destination node is a dedicated replication destination.

Source LIF running on 1G:

ucnlabcm04::> node run -node local sysstat -x 1

CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s

                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out

59%      0      0      0      13  117390   2903    1340  84500       0      0   >60     92%   50%  :    50%      13      0      0       0      0       0      0

64%      0      0      0       0  115171   2866    5680 161280       0      0   >60     94%   85%  H    70%       0      0      0       0      0       0      0

63%      0      0      0       0  119723   2974    5099 158230       0      0   >60     94%   78%  Hf   67%       0      0      0       0      0       0      0

66%      0      0      0       1  116859   2902    5396 160052       0      0   >60     93%   69%  Hf   70%       1      0      0       0      0       0      0

81%      0      0      0    5200  113367   2806    6492 127552       0      0   >60     93%   57%  Hf   55%    5200      0      0       0      0       0      0

86%      0      0      0    7007  115025   2847    5634 107421       0      0   >60     93%   52%  Hf   49%    7007      0      0       0      0       0      0

86%      0      0      0    6533  117410   2925    2653  86653       0      0   >60     92%   70%  :    62%    6529      4      0       0      0       0      0

91%      0      0      0    6943  117531   2920    4888 160884       0      0   >60     93%   71%  H    65%    6943      0      0       0      0       0      0

71%      0      0      0    2056  109902   2863    4931 161846       0      0   >60     94%   69%  H    64%    2056      0      0       0      0       0      0

64%      0      0      0     147  115349   2855    7048 133812       0      0   >60     92%   50%  Hf   47%     147      0      0       0      0       0      0

61%      0      0      0      14  114891   2844    6315 119106       0      0   >60     92%   68%  Hf   55%      14      0      0       0      0       0      0

63%      0      0      0       0  118481   2928    3976  80359       0      0   >60     91%   64%  Hs   51%       0      0      0       0      0       0      0

61%      0      0      0       0  117321   2923    2845 149546       0      0   >60     93%   67%  :    66%       0      0      0       0      0       0      0

62%      0      0      0       0  117185   2913    3992 160355       0      0   >60     93%   69%  H    61%       0      0      0       0      0       0      0

63%      0      0      0       0  115966   2886    6422 152112       0      0   >60     93%   66%  Hf   56%       0      0      0       0      0       0      0

64%      0      0      0      13  118045   2938    7341 130641       0      0   >60     93%   63%  Hf   55%      13      0      0       0      0       0      0

62%      0      0      0       0  117927   2933    4692 136488       0      0   >60     93%   72%  Hf   60%       0      0      0       0      0       0      0

62%      0      0      0     109  119707   2967    2830  95054       0      0   >60     92%   51%  Hs   58%     109      0      0       0      0       0      0

63%      0      0      0     305  117229   3072    2245 129655       0      0   >60     94%   61%  :    62%     305      0      0       0      0       0      0

Source LIF running on 10G:

ucnlabcm04::> node run -node local sysstat -x 1

CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s

                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out

14%      0      0      0     104   22012    588     836  39936       0      0   >60     90%  100%  :f   33%     104      0      0       0      0       0      0

16%      0      0      0     166   20780    564     995  35692       0      0   >60     91%  100%  :f   34%     166      0      0       0      0       0      0

21%      0      0      0       2   38318   1021     788  29900       0      0   >60     90%   80%  :    36%       2      0      0       0      0       0      0

27%      0      0      0       0   33097    886    1476      0       0      0   >60     87%    0%  -     6%       0      0      0       0      0       0      0

19%      0      0      0     143   22712    632   10713  52947       0      0   >60     90%   84%  Hf   43%     143      0      0       0      0       0      0

14%      0      0      0      20    2645     70     964  50608       0      0   >60     87%  100%  :f   40%      20      0      0       0      0       0      0

  8%      0      0      0       1      17      1    1176  52596       0      0   >60     90%  100%  :v   50%       1      0      0       0      0       0      0

18%      0      0      0       1   30655    795     880     12       0      0   >60     88%    5%  :    11%       1      0      0       0      0       0      0

20%      0      0      0       1   23746    663    1144      0       0      0   >60     88%    0%  -     8%       1      0      0       0      0       0      0

23%      0      0      0       3   29419    762    1340      0       0      0   >60     88%    0%  -     9%       3      0      0       0      0       0      0

30%      0      0      0      16   81171   2189    8796  41272       0      0   >60     92%   36%  Hf   31%      16      0      0       0      0       0      0

19%      0      0      0       1   23010    610    1168 114004       0      0   >60     90%   83%  :    67%       1      0      0       0      0       0      0

21%      0      0      0       1   22341    630    1412      0       0      0   >60     88%    0%  -    13%       1      0      0       0      0       0      0

17%      0      0      0       0   14863    422     932      0       0      0   >60     88%    0%  -     8%       0      0      0       0      0       0      0

27%      0      0      0       0   19099    499    1492      0       0      0   >60     87%    0%  -     7%       0      0      0       0      0       0      0

21%      0      0      0      16   29642    765    6504  94544       0      0   >60     91%   90%  Hf   55%      16      0      0       0      0       0      0

16%      0      0      0       1   19368    568    3080  61488       0      0   >60     89%   96%  :    63%       1      0      0       0      0       0      0

21%      0      0      0       0    1121     29    1363      0       0      0   >60     86%    0%  -     8%       0      0      0       0      0       0      0

26%      0      0      0       0   65267   1753    1380      0       0      0   >60     89%    0%  -    13%       0      0      0       0      0       0      0

18%      0      0      0     143   17263    490    1036      0       0      0   >60     88%    0%  -     6%     143      0      0       0      0       0      0

I just don't get it.

Re: Intercluster XDP (SnapVault) performance over 10G

Lots of great info.


So - am I reading that correctly in that your Intercluster LIFs are on the same subnet as your cluster management LIF?

What interface are you using for the cluster management LIF?


Can you output the routing groups and routes for both clusters, please?

(I saw in the previous output that you included the routing groups for the source cluster, but no routes. For the destination cluster, only the routing groups and routes for that one node are shown. Just trying to get a solid comparison, that's all.)

Also - can you give the output for the following command?

node run -node <node> route -gsn

cDOT routing groups handle all the traffic regardless of whether you're working with layer 2 or layer 3. These rules apply based on the routing group configuration within the cluster and its nodes. It's only when you're working with layer 3 that you need a specific routing group route.

So, let's say your cluster management LIF does, in fact, happen to be on a 1 Gb port. Your cluster management LIF belongs to routing group c10.230.0.0/16. Your Intercluster LIFs belong to routing group i10.230.0.0/16. Those routing groups are most likely sharing the same default gateway, and that could lead to traffic routing over the cluster management LIF.
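To make that concrete, here is roughly what I would check (node name and gateway are placeholders based on your output, and the syntax is from memory, so please verify): confirm which routing group each LIF actually sits in and what routes the node really has, and if the intercluster group ever does need to leave the subnet, give it its own route rather than letting traffic ride the cluster-mgmt one:

network interface show -role intercluster,cluster-mgmt -fields routing-group,curr-port

node run -node ucnlabcm01-03 route -gsn

network routing-groups route create -vserver ucnlabcm01-03 -routing-group i10.230.0.0/16 -destination 0.0.0.0/0 -gateway <gateway-ip>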

Also - am I reading the output correctly in that you seem to be replicating over different port configuration types? Your source shows the LIF on port e2b (assuming an access port) and your destination shows e1a-230, which is a tagged VLAN over an ifgrp (assuming a trunk port). I've seen this lead to problems before but haven't had a chance to dig back into it. The result was to move the intercluster traffic back to dedicated 1 Gb ports for more consistent traffic flow. I hope to revisit this specific issue later in the week and troubleshoot it like I did the one I posted about in the other thread.
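As a quick sanity check on that, I would compare the two ports side by side - MTU, flow control and VLAN setup should match on both ends (node/port names taken from your output; the field list is from memory):

network port show -node ucnlabcm01-03 -port e2b -fields mtu,flowcontrol-admin,flowcontrol-oper

network port show -node ucnlabcm04-01 -port e1a-230 -fields mtu,flowcontrol-admin,flowcontrol-oper

network port vlan show -node ucnlabcm04-01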

I know that some of this might seem like grasping at straws, but can't hurt, right?