We are seeing an issue with our SnapMirror jobs, which run from our Production NetApp to another NetApp at our DR site. The issue is occurring on both controllers.
Currently we have a point-to-point 100Mbps link between our HQ and DR site. This line has been moved to another MPLS provider. The line details and WAN networking equipment are still the same; only the provider has changed.
What we have found is that since the changeover to the new WAN point-to-point link, all SnapMirror jobs that run to the DR site fail with the message 'transfer failed'. When we watch a job start at its scheduled time, it begins the transfer and moves a certain amount of data, but eventually fails with 'transfer failed' appearing on the screen. If we rerun the SnapMirror job, the same thing happens again.
Fortunately, for the moment we still have the old MPLS link available and can revert to it. As soon as we do, the SnapMirror jobs work without issue.
We have done the following to understand and try to address the issue:
Raised the issue with the new MPLS provider, explaining that it started at the changeover. Their engineers have been on site at both end points, performed additional testing, and confirmed again that all is OK with the WAN link.
Limited the bandwidth on the SnapMirror jobs to see whether this makes any difference.
Transferred a large 40GB file from a CIFS share on the Production NetApp to a volume on a virtual machine on a server in the DR environment, which also uses the WAN link, and the whole file copied successfully. We ran this test to check whether all transfers were failing or only SnapMirror transfers; because the copy worked, it more or less confirms that this is an issue on the NetApp side.
I have looked at the snapmirror.log file on the destination filer. Since we made the change today, it shows only that the transfer failed, with no additional detail on what caused it.
Below is an example:
dst Wed Jan 6 10:03:54 GMT CENSLFASPROD01:NETAPP_prf1_esx_w2k8_sas_04 CENWBFASDR01:NETAPP_prf1_esx_w2k8_sas_04 Start
dst Wed Jan 6 10:10:01 GMT CENSLFASPROD01:NETAPP_prf1_esx_w2k8_sas_04 CENWBFASDR01:NETAPP_prf1_esx_w2k8_sas_04 Abort (replication transfer failed to complete)
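To put a number on how long the transfer ran before it aborted, here is a minimal Python sketch that pairs the Start and Abort entries from the example and reports the elapsed time. It is only an illustration: the function names parse_entry and aborted_transfers are hypothetical, the field layout is assumed from the two entries quoted here, and a placeholder year is used because the log does not record one. Lines of any other shape are not handled.

```python
from datetime import datetime

# The two entries from the example, one per line.
LOG = """\
dst Wed Jan 6 10:03:54 GMT CENSLFASPROD01:NETAPP_prf1_esx_w2k8_sas_04 CENWBFASDR01:NETAPP_prf1_esx_w2k8_sas_04 Start
dst Wed Jan 6 10:10:01 GMT CENSLFASPROD01:NETAPP_prf1_esx_w2k8_sas_04 CENWBFASDR01:NETAPP_prf1_esx_w2k8_sas_04 Abort (replication transfer failed to complete)
"""

# Assumption: the log omits the year, so any leap year serves as a placeholder.
PLACEHOLDER_YEAR = 2000


def parse_entry(line):
    """Split one entry into (timestamp, source, destination, event)."""
    # Assumed layout: role weekday month day time tz source destination event...
    _role, _weekday, month, day, clock, _tz, src, dst, event = line.split(None, 8)
    stamp = datetime.strptime(
        f"{PLACEHOLDER_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )
    return stamp, src, dst, event


def aborted_transfers(log_text):
    """Return (source, destination, seconds) for each Start followed by an Abort."""
    started = {}
    results = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, src, dst, event = parse_entry(line)
        if event == "Start":
            started[(src, dst)] = stamp
        elif event.startswith("Abort") and (src, dst) in started:
            begun = started.pop((src, dst))
            results.append((src, dst, (stamp - begun).total_seconds()))
    return results


for src, dst, seconds in aborted_transfers(LOG):
    print(f"{src} -> {dst}: aborted after {seconds:.0f} s")
```

For the example entries this reports 367 seconds, so the transfer ran for just over six minutes before aborting.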
Does anyone have any idea what the issue might be? Has anyone else had similar issues?