ONTAP Discussions

PM Job failing after upgrade to 4.0.1

lovik_netapp

Hi all,

Yesterday we upgraded our DFM from 4.0D16 to 4.0.1 without problems; however, a few hours later all of the PM mirror jobs started failing. I checked all the logs and don't see any error other than 'connection entry not found' in the job logs.

We have around 50 relationships in total, configured for snapshots and SnapMirror, and all of them have been affected since the upgrade. I am not sure what went wrong a few hours after the upgrade to make all the SM jobs fail while the snapshot jobs keep working, but it has left us in serious trouble.

We have a case open with support and are working with them; however, if anyone else has an idea of what could be wrong, please suggest it.

Thanks,


adaikkap

Hi,

Can you get the output of dfpm job detail <jobid> for a couple of the failed jobs?

Also, was there any specific reason for you to be on 4.0D16? IIRC some of the fixes in D releases are not rolled up in 4.0.1.

So if you were on D16 because of a bug you hit, you must make sure the same fix is available in 4.0.1 (I can check that for you if you give me the bug ID).

In that case I would recommend that you request a D patch release on 4.0.1 with the specific D release items.

Regards

adai

lovik_netapp

Job Id:                    2624020

Job State:                 completed
Job Description:           Mirror data from node 'Primary data' to node 'Mirror' of dataset 'vol_11' (3283)
Job Type:                  mirror
Job Status:                failure
Bytes Transferred:         0
Dataset Name:              vol_11
Dataset Id:                3283
Object Name:               vol_11_mirror
Object Id:                 3283
Policy Name:               SNAPMIRROR_PRI_SEC
Policy Id:                 4767
Started Timestamp:         09 Dec 2010 08:25:04
Abort Requested Timestamp:
Completed Timestamp:       09 Dec 2010 08:25:04
Submitted By:              DFMScheduler
Destination Node Id:       2
Destination Node Name:     Mirror
Source Node Id:            1
Source Node Name:          Primary data
Job progress messages:
Event Id:      29254140
Event Status:  normal
Event Type:    job-start
Job Id:        2624020
Timestamp:     09 Dec 2010 08:25:04
Message:
Error Message:
Event Id:      29254143
Event Status:  error
Event Type:    job-progress
Job Id:        2624020
Timestamp:     09 Dec 2010 08:25:04
Message:
Error Message: vol_11: There are no data protection relationships for connection 1 of dataset vol_11_mirror (3283) and the source node has data to protect so the update from the primary node to the secondary node has failed. Please check the conformance status of the dataset.
Event Id:      29254144
Event Status:  error
Event Type:    job-end
Job Id:        2624020
Timestamp:     09 Dec 2010 08:25:04
Message:
Error Message:


The upgrade to D16 was due to bug # 42274, which we hit after the 7.3.4 upgrade.
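For anyone who has to go through many failed jobs, the job detail text above can be scanned mechanically. This is only a rough Python sketch that assumes the "Key: value" layout shown in this post; the function names parse_job_events and failed_events are made up for illustration, and other DFM versions may format the output differently.

```python
# Sketch only: assumes the "Key: value" layout of the dfpm job detail
# output pasted in this thread. Function names are hypothetical.

SAMPLE = """
Job Id:                    2624020
Job Status:                failure
Job progress messages:
Event Id:      29254140
Event Status:  normal
Event Type:    job-start
Message:
Error Message:
Event Id:      29254143
Event Status:  error
Event Type:    job-progress
Message:
Error Message: vol_11: There are no data protection relationships for connection 1 of dataset vol_11_mirror (3283) and the source node has data to protect so the update from the primary node to the secondary node has failed. Please check the conformance status of the dataset.
Event Id:      29254144
Event Status:  error
Event Type:    job-end
Message:
Error Message:
"""


def parse_job_events(text):
    """Return one dict per event listed after 'Job progress messages:'."""
    events = []
    current = None
    in_events = False
    for line in text.splitlines():
        if line.strip() == "Job progress messages:":
            in_events = True
            continue
        if not in_events or ":" not in line:
            continue
        # Split on the first colon only, so timestamps and messages
        # that contain colons stay intact in the value.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Event Id":
            current = {}
            events.append(current)
        if current is not None:
            current[key] = value
    return events


def failed_events(events):
    """Keep only error events that actually carry an error message."""
    return [
        e for e in events
        if e.get("Event Status") == "error" and e.get("Error Message")
    ]


if __name__ == "__main__":
    for event in failed_events(parse_job_events(SAMPLE)):
        print(event["Event Id"], event["Error Message"])
```

With the sample above, only event 29254143 is reported, which is the one carrying the "no data protection relationships" message.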

sinhaa

The error message states "Please check the conformance status of the dataset."

What is the output of the command "dfpm dataset conform -D <Dataset_Name>"?

Is your dataset conformant? If not, did the conformance status change after upgrading to 4.0.1?

Warm regards,
Abhishek


lovik_netapp

The conformance error is due to SM lag, and it wasn't there before the upgrade or for the first few hours after it.


adaikkap

Hi,

Lag messages are not conformance actions.

If you can post the output of the following commands, it will help us to help you better:

dfpm job detail <jobid> (ignore this one, as you already did it)

dfpm dataset list -x 3283

dfpm dataset list -m 3283

dfpm dataset list -R 3283

dfpm dataset conform -D 3283

This error message is generated when there are no relationships.
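If the same output is needed for several datasets, the commands listed in this post could be collected with a small script. The following is a hedged Python sketch: the dfpm subcommands and flags are copied from this thread as posted and have not been checked against any particular DFM release, and the helper names diagnostic_commands and collect are hypothetical.

```python
# Sketch only: the dfpm arguments are taken verbatim from this thread and
# are not verified against a specific DFM version. Helper names are made up.
import shutil
import subprocess


def diagnostic_commands(dataset):
    """Build the argument lists for the four dataset commands in this post."""
    ds = str(dataset)
    return [
        ["dfpm", "dataset", "list", "-x", ds],
        ["dfpm", "dataset", "list", "-m", ds],
        ["dfpm", "dataset", "list", "-R", ds],
        ["dfpm", "dataset", "conform", "-D", ds],
    ]


def collect(dataset):
    """Run each command and return its combined output keyed by command line.

    Returns an empty dict when dfpm is not on the PATH, so the script can be
    tried safely on a machine that is not the DFM server.
    """
    if shutil.which("dfpm") is None:
        return {}
    results = {}
    for cmd in diagnostic_commands(dataset):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = proc.stdout + proc.stderr
    return results


if __name__ == "__main__":
    for command, output in collect(3283).items():
        print("### " + command)
        print(output)
```

Run on the DFM server, this prints each command followed by its output, which can then be pasted into a reply or attached to the support case.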

Regards

adai

lovik_netapp

Sorry, I don't have access to my system now so I can't give you the details, but earlier in the day when I was checking these things I noticed that -R doesn't show any relationships, and a log file (smmon, if I remember right) shows a malformed SNMP response.

Another thing I noticed is that the dfdrm commands are working well. I also forgot to tell you that our ONTAP release is a 7.3.2P* release, which has an SNMP burt; the fix was applied in 7.3.3P3.

It could be a red herring, but correlating all these things suggests that 4.0.1 isn't able to read the SM relationship details from 7.3.2 because of the SNMP problem; therefore it has removed all the relationships from its DB and now no updates are happening.

But if this is the case, then I wonder how D16 was able to get everything working before?

adaikkap

Any updates from the customer?

Regards

adai

lovik_netapp

No, it's still pending.

deannamcneil

I would have loved to read a follow-up. I am running ONTAP 8.0.1 and I am trying to wrap my head around odd failures with SnapMirror reporting. I have upgraded to OnCommand 5 and just want to monitor my jobs using the DFM management console. I created an empty dataset and a policy with just notifications, but I get errors like "Baseline failure" and it doesn't make sense to me. I tried running some of the recommended CLI commands listed in this thread, but I guess they don't apply to my version of ONTAP. Any thoughts are welcome.
