Re: repl.engine.error: replStatus="5", replFailureMsg="5898500"

jappereuling · ‎2016-04-14

Hi,

We have an 8020 running ONTAP 8.3 in cluster mode and a separate 2554 cluster, also on ONTAP 8.3. The 8020 cluster SnapMirrors to the 2554 cluster.

Everything is working: all SnapMirror relationships are up and running, with no errors or issues. But on the SnapMirror destination (the 2554) we see a DEBUG message in the logs every 15 minutes:

DEBUG repl.engine.error: replStatus="5", replFailureMsg="5898500", replFailureMsgDetail="0", functionName="void repl_volume::Query::_queryResponse(repl_spinnp::Request&, const spinnp_repl_result_t&, repl_spinnp::Response*)", lineNumber="149"
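For anyone who wants to filter or count these events, the message can be split into its key="value" fields. The following is only a minimal sketch: `parse_event` is a hypothetical helper, not part of any ONTAP tooling, and it assumes the field values never contain a double quote. It does not interpret what the codes mean.

```python
import re

# Event text copied from the post above.
EVENT = (
    'DEBUG repl.engine.error: replStatus="5", replFailureMsg="5898500", '
    'replFailureMsgDetail="0", functionName="void repl_volume::Query::_queryResponse('
    'repl_spinnp::Request&, const spinnp_repl_result_t&, repl_spinnp::Response*)", '
    'lineNumber="149"'
)


def parse_event(line):
    """Return every key="value" pair found in the line as a dict of strings."""
    # Assumption: values never contain a double quote, so [^"]* is enough.
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


fields = parse_event(EVENT)
print(fields["replStatus"], fields["replFailureMsg"], fields["lineNumber"])
```

The function name contains commas and parentheses, so splitting the line on commas would break it; matching on the quotes avoids that.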

The SnapMirror transfers (replication) run every hour and are up to date, with no errors. No other log messages give any hint.

So this debug error is bugging me: what does it mean, and which volume does it relate to?

Hope someone can help

warrenb · ‎2016-04-14

I saw an internal article that suggested that this might be due to MTU issues. Have a look at this public KB and see if it helps: https://kb.netapp.com/support/index?page=content&id=2020029&locale=en_US

Regards,

Warren

jappereuling · ‎2016-04-14

Thanks for your quick reply!

MTU is 9000 on all intercluster interfaces. The Nexus switches have an even higher MTU (9216), and the source cluster also uses MTU 9000.

cluster peer ping works every time:

drs1::*> cluster peer ping -destination-cluster nac1
Node: fas13a                 Destination Cluster: nac1
Destination Node IP Address       Count TTL RTT(ms) Status
---------------- ---------------- ----- ---- ------- -------------------------
fas14a           x.x.x.82          1 255   0.262 interface_reachable
fas14b           x.x.x.84          1 255    0.26 interface_reachable
fas15a           x.x.x.86          1 255   0.291 interface_reachable
fas15b           x.x.x.88          1 255   0.247 interface_reachable

Node: fas13b                 Destination Cluster: nac1
Destination Node IP Address       Count TTL RTT(ms) Status
---------------- ---------------- ----- ---- ------- -------------------------
fas14a           x.x.x.82          1 255   0.207 interface_reachable
fas14b           x.x.x.84          1 255   0.231 interface_reachable
fas15a           x.x.x.86          1 255    0.22 interface_reachable
fas15b           x.x.x.88          1 255   0.227 interface_reachable
8 entries were displayed.

(I obfuscated the IPs.)

Lowering the MTU to 1500 seems quite a waste to me 😉

So it doesn't seem to be this issue, or at least I'm not seeing what the KB article describes.

Interesting fact: the messages appear almost exactly on the quarter hour:
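A common way to rule out MTU trouble end to end is a ping with fragmentation disallowed and a payload that fills the frame exactly. The sketch below only does the arithmetic for the MTU values quoted above; the 20-byte IPv4 header (no options) and 8-byte ICMP echo header are standard sizes, while `max_icmp_payload` is a hypothetical helper name. The exact ping syntax depends on the platform and is deliberately not shown.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# MTU values mentioned in this thread.
payloads = {mtu: max_icmp_payload(mtu) for mtu in (9000, 9216, 1500)}
for mtu, payload in payloads.items():
    print(f"MTU {mtu}: largest unfragmented ICMP payload is {payload} bytes")
```

If the reachability check uses small packets (an assumption, since the packet size is not shown in the output above), it would succeed even on a path that drops full-size jumbo frames, so a full-size test is the more telling one.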

14:00:04
14:15:03
14:15:04
14:45:02
15:00:04
15:00:07
15:30:04
15:45:03
16:00:07
16:00:08
16:15:02
16:30:05

The :00 / :15 / :30 / :45 marks match the SnapMirror schedules:

hourly0             @:00
hourly1             @:15
hourly2             @:30
hourly3             @:45
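The match between the message times and the schedules can be checked mechanically. This is a rough sketch using only the timestamps listed above; the names `offset_seconds` and `slot` are made up for illustration, and it assumes each message belongs to the most recent quarter-hour mark.

```python
from collections import Counter
from datetime import datetime

# Message times copied from the list above.
TIMESTAMPS = [
    "14:00:04", "14:15:03", "14:15:04", "14:45:02", "15:00:04", "15:00:07",
    "15:30:04", "15:45:03", "16:00:07", "16:00:08", "16:15:02", "16:30:05",
]
SCHEDULE_MINUTES = (0, 15, 30, 45)  # hourly0 .. hourly3


def slot(ts):
    """Return (hour, schedule minute) of the latest schedule mark at or before ts."""
    t = datetime.strptime(ts, "%H:%M:%S")
    minute = max(m for m in SCHEDULE_MINUTES if m <= t.minute)
    return t.hour, minute


def offset_seconds(ts):
    """Seconds elapsed between the schedule mark and the message."""
    t = datetime.strptime(ts, "%H:%M:%S")
    _, minute = slot(ts)
    return (t.minute - minute) * 60 + t.second


offsets = [offset_seconds(ts) for ts in TIMESTAMPS]
per_slot = Counter(slot(ts) for ts in TIMESTAMPS)

print("largest delay after a schedule mark:", max(offsets), "seconds")
for (hour, minute), count in sorted(per_slot.items()):
    print(f"{hour:02d}:{minute:02d} -> {count} message(s)")
```

Every message in this sample falls within a few seconds of a schedule mark, some marks produce two messages, and some (14:30 and 15:15) produce none.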

But the SnapMirror relationships seem to be OK:

drs1::snapmirror*> show -fields last-transfer-error-codes,last-transfer-error,status,last-transfer-end-timestamp,healthy,last-transfer-size
source-path   destination-path status healthy last-transfer-error last-transfer-error-codes last-transfer-size last-transfer-end-timestamp
------------- ---------------- ------ ------- ------------------- ------------------------- ------------------ ---------------------------
svm0:vol1     dsvm0:vol1       Idle   true    -                   -                         34.87GB            04/14 16:05:41
svm1:vol1     dsvm1:vol1       Idle   true    -                   -                         26.96GB            04/14 16:19:20
svm2:vol1     dsvm2:vol1       Idle   true    -                   -                         39.88GB            04/14 16:36:18
svm3:vol1     dsvm3:vol1       Idle   true    -                   -                         29.15GB            04/14 15:49:49
svm4:vol1     dsvm4:vol1       Idle   true    -                   -                         424KB              04/14 16:00:22

warrenb · ‎2016-04-14

Hi,

It was a shot in the dark really. My only other suggestion would be to open a case.

Regards,

Warren

Rouden · ‎2016-12-06

Hi,

Did you find out the root cause of your problem?

I have exactly the same error message, but in my case I am trying to initialize an SVM DR relationship.

Additionally, the transfer doesn't start, and after a while the relationship stays uninitialized and unhealthy.

Thanks in advance,

Jorge DIAS

jappereuling · ‎2016-12-06

Hi,

Wow, blast from the past 🙂. Off the top of my head, we didn't do anything, and it's been ages since I've seen this message. I just checked my mail with support, and one thing I noticed:

functionName="void repl_volume"

They were hinting that it might have been caused by there being nothing to mirror (no changes). I never got that confirmed, though.

I never had this when setting up a relationship, though.

Sorry that I can't help you further.