
SnapMirror between two C190 clusters running ONTAP 9.8P4 is very slow.

RuleS

This is our primary NetApp cluster at our Cape Town site, and we're running an SVM-DR relationship to a NetApp cluster in Centurion (1,500 km away). From that DR site we run a SnapVault to another destination. Around three weeks ago we started having problems updating one volume in the SVM-DR relationship. We tried restarting the SnapMirror relationship, doing a resync, and every other operation we could think of, except running a new baseline, which we definitely want to avoid since the WAN link is poor. Nothing helped, so we decided to change our backup method: run a SnapVault directly between the primary site and the backup site and skip the SVM-DR relationship entirely. The SnapVault relationships have been rebuilt at the destination site, and the source path is now set to the primary system. All the "new" relationships are now in a resync and hung.
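For anyone wanting to check the same thing, the state of the relationships can be inspected from the destination cluster with something like the following (the path and field names below are placeholders/examples, not our real SVM and volume names):

dst::> snapmirror show -destination-path backup_svm:vol1_vault -fields state,status,last-transfer-error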


After troubleshooting the DR site, the SnapVault destination, the cluster peer status, and the WAN link, and finding them all OK, I started looking at the source NetApp cluster. The problem seems to be here: the system does not deliver any data during a SnapMirror transfer. I've set up several local test SnapMirror relationships within the same box. Establishing a relationship is no problem, but essentially no data gets transferred or copied. There is enough space in the aggregates, and overall utilization of the cluster is very low.
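For the local tests I set the relationships up roughly like this; the volume, SVM, and aggregate names are placeholders:

src::> vol create -vserver svm1 -volume test_dst -aggregate aggr1 -size 100g -type DP
src::> snapmirror create -source-path svm1:test_src -destination-path svm1:test_dst -policy MirrorAllSnapshots
src::> snapmirror initialize -destination-path svm1:test_dst
src::> snapmirror show -destination-path svm1:test_dst -fields status,total-progress

The last command is how I'm watching the transferred bytes (total-progress) crawl.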

Here, a baseline has been running for 3 days and has moved 82 GB.

I ran the same test on another C190, and a 100 GB baseline took around 3 minutes to complete.

The next step is to upgrade ONTAP from 9.8P4 to 9.8P8 tomorrow night.


Any suggestions please?


4 REPLIES

AlexDawson

Packet loss or link MTU mismatch would be my guess.


Use the network performance test feature between the two intercluster LIFs to check link throughput - https://docs.netapp.com/us-en/ontap/performance-admin/measure-latency-throughput-nodes-task.html 
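If memory serves, that test is run with network test-path at advanced privilege; something along these lines, with the cluster and node names as placeholders:

src::> set -privilege advanced
src::*> network test-path -source-node local -destination-cluster dst_cluster -destination-node dst_node01 -session-type AsyncMirrorRemote

I believe AsyncMirrorRemote models inter-cluster SnapMirror traffic, so the reported throughput should approximate what replication can achieve on that link.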


Then, use ping with different packet sizes to check the path maximum transmission unit (PMTU) - https://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-cmpr-960%2Fnetwork__ping.html - start at 1000 bytes with -disallow-fragmentation and increase by 100 bytes per test to see when it stops working, then make sure the LIF MTUs on both sides are set to a smaller value than that. They may be set to 1500 while your PMTU is 1460 or lower due to VXLAN, VLAN, or MPLS encapsulation.
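A sketch of that sweep, assuming an intercluster LIF named ic_lif1 owned by SVM svm_ic and a peer address of 10.0.0.1 (all placeholders):

src::> network ping -lif ic_lif1 -vserver svm_ic -destination 10.0.0.1 -packet-size 1000 -disallow-fragmentation true
src::> network ping -lif ic_lif1 -vserver svm_ic -destination 10.0.0.1 -packet-size 1100 -disallow-fragmentation true

Keep stepping up by 100 bytes until the ping fails; the last size that works, plus 28 bytes of IP/ICMP header, is roughly your PMTU.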

 

Hope this helps!

tahmad

Were you able to fix the issue by upgrading, @RuleS, or did you find a network issue like @AlexDawson suggested?

RuleS (Accepted Solution)

The problem turned out to be that another administrator had changed one of the global options and throttled SnapMirror by setting:

replication.throttle.enable       on
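For anyone else who hits this, the option can be checked and switched back off with something like the following (written from memory, so treat it as a sketch):

src::> options replication.throttle.enable
src::> options replication.throttle.enable off

If I remember right, the related options replication.throttle.incoming.max_kbs and replication.throttle.outgoing.max_kbs set the actual limit once throttling is enabled.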


tahmad

Thank you for sharing the solution, @RuleS!
