Snapmirror replication suddenly stopped working

We have snapmirror replication set up on a metro-E network. We also have a point-to-point network which is being optimized by two Riverbed Steelheads. Here is what I don't get: Snapmirror replication broke the minute we enabled the Steelhead appliances. I ran packet traces and confirmed that the filers are talking to each other over the metro-E, and that no snapmirror or control traffic is passing over the point-to-point (only CIFS).

I opened a case with NetApp support. The only suggestion I got was to change my MTU on the filers. The MTU is currently set to 9000 on the iSCSI VIF and that's what we're using for snapmirror replication. Again, this worked perfectly fine until the Steelheads were enabled on a different network.

Here are the errors I'm seeing:

On my destination filer:

recover.abort.ROOLR:notice]: The abort event, snapmirror: Cannot Init NTM, aborting , is just notified.

On my source filer:

replication.src.err:error]: SnapMirror: source transfer from x to x : transfer failed.

Has anyone else seen anything like this?
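On the MTU point raised above: if jumbo frames are suspected, one common sanity check is to ping across the replication path with the largest payload that fits in a single unfragmented frame. The sketch below only does the arithmetic for that payload size. It assumes IPv4 with a 20-byte header (no IP options) and the 8-byte ICMP echo header; the function name is illustrative and not part of any NetApp tooling.

```python
# Sketch: largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumptions: IPv4 header without options (20 bytes), ICMP echo header (8 bytes).
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu: int) -> int:
    """Return the largest ICMP echo payload, in bytes, that fits within `mtu`."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu < overhead:
        raise ValueError("MTU is too small to carry an IPv4 ICMP echo")
    return mtu - overhead


# Compare a standard Ethernet MTU with the 9000-byte MTU on the iSCSI VIF.
for mtu in (1500, 9000):
    print(f"MTU {mtu}: max ping payload {max_ping_payload(mtu)} bytes")
```

If a ping of that size with fragmentation disallowed fails between the two replication interfaces while a small ping succeeds, some device on the path is probably not passing jumbo frames.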

Re: Snapmirror replication suddenly stopped working

This is not a NetApp SnapMirror issue at all; it is about your Steelhead appliance, which you need to tune to work with your network. Make sure it passes traffic on all the ports SnapMirror needs; you can test them with telnet.

thank you,

AK G
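The telnet check suggested above can also be scripted. The following is a minimal sketch, meant to be run from a host on the replication network rather than on the filer itself. The helper name `port_reachable` is hypothetical, and the port and address in the usage note are assumptions to be replaced with the values for your environment.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection performs the full TCP handshake, like `telnet host port`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_reachable("192.0.2.10", 10566)` would test a placeholder address on TCP 10566, the port usually documented for 7-Mode SnapMirror; confirm the port for your Data ONTAP release. A successful connection only shows that the port is open from where the script runs, not that the filers can reach each other.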

Re: Snapmirror replication suddenly stopped working

AK G, the snapmirror traffic isn't traversing the Steelheads. It's on a separate network.

Re: Snapmirror replication suddenly stopped working

Does traceroute from the target to the source, and vice versa, go over the expected network? Routing in 7-mode can be interesting and can take some unexpected routes unless you add a network route (route add net) for the dedicated SnapMirror network. You probably already checked, but it is worth making sure the traffic is going over the expected network; if it is not, a route add net should fix it.
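To illustrate why a missing network route can send replication traffic the wrong way, here is a small sketch of longest-prefix route selection. It is a simplified model, not how Data ONTAP implements routing, and the networks and interface names are hypothetical.

```python
import ipaddress


def pick_interface(routes, destination):
    """Return the interface of the most specific route containing `destination`.

    `routes` is a list of (network_cidr, interface_name) pairs.
    Returns None when no route matches.
    """
    dest = ipaddress.ip_address(destination)
    best_net, best_if = None, None
    for cidr, interface in routes:
        net = ipaddress.ip_network(cidr)
        # Longest-prefix match: prefer the matching route with the longest mask.
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_if = net, interface
    return best_if


# Hypothetical tables: only a default route vs. an added route for the replication net.
default_only = [("0.0.0.0/0", "e0a")]
with_net_route = default_only + [("10.10.20.0/24", "vif_iscsi")]

print(pick_interface(default_only, "10.10.20.5"))
print(pick_interface(with_net_route, "10.10.20.5"))
```

With only the default route, traffic to the replication peer leaves via the default interface. Once the more specific network route exists, it wins.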

Re: Snapmirror replication suddenly stopped working

Yes, the traffic is flowing just as expected.

Re: Snapmirror replication suddenly stopped working

Is options snapmirror.allow set to a hostname that resolves to a different IP address? Does changing the setting to "*" or "all" let it work? That is just to test; afterwards you can put in the IP of the source controller.
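The hostname question above can be checked from any host that uses the same name resolution as the filers. This is a rough sketch: the helper name, hostname, and address are hypothetical, and the filer's own /etc/hosts or DNS settings may give a different answer than the machine running the script.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if `hostname` resolves to `expected_ip` (IPv4).

    `resolver` is injectable so the check can be exercised without real DNS.
    """
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    return expected_ip in addresses
```

For example, `resolves_to("filer-dst", "10.10.20.6")` returning False would suggest that the name in snapmirror.allow points at an interface other than the one carrying replication traffic.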

Re: Snapmirror replication suddenly stopped working

I have snapmirror.allow set to *. The filers are talking to each other on the correct network and IPs, but the initial transfer fails with a generic network error. Not surprisingly, NetApp support has been of no help. All the tech wants to do is close the ticket. That's why I'm posing the question here.

Re: Snapmirror replication suddenly stopped working

Hi Ben,

I'm sorry to hear that you are not able to resolve your issue with NetApp Support. Please send me a private message with the case number and I will look into it.

Thanks,

Christine

Re: Snapmirror replication suddenly stopped working

The NetApp tech has gone silent. He intimated that snapmirror is sensitive to any network delays or problems. I know for a fact that this is false. I have run snapmirror on saturated, high-latency networks with no problem. Either the tech I got assigned to doesn't know the product, or Snapmirror is just not ready for prime-time. At this point, I'm starting to suspect both.

Re: Snapmirror replication suddenly stopped working

Sorry to hear you didn't get a response. It may be worth calling back, asking for the Duty Manager, and escalating. Most cases get handled just fine, but as NetApp grew to 6 billion there can be some growing pains. SnapMirror has been in production at many of our customers for many years (I have been doing this for about 12 years), and I can vouch for its production quality. Hopefully support can escalate and quickly solve this issue. The NetApp and/or VAR team may be able to help escalate as well, and start packet traces that escalations can review to find the issue.