2010-12-08 01:47 PM
Yesterday we upgraded our DFM from 4.0D16 to 4.0.1 without trouble; however, after a few hours all of the PM mirror jobs started failing. I checked all the logs and don't see any error other than 'connection entry not found' in the job logs.
We have around 50 relationships in total, configured for snapshots and SnapMirror, and all of them have been affected since the upgrade. I am not sure what went wrong a few hours after the upgrade to make all the SM jobs fail while the snapshot jobs keep working, but it has put us in big trouble.
We have a case open with support and are working with them; however, if anyone else has an idea what else could be wrong, please suggest.
2010-12-09 12:41 AM
Can you get the output of dfpm job detail <jobid> for a couple of failed jobs?
Also, was there any specific reason for you to be on 4.0D16? IIRC some of the fixes in D releases are not rolled up in 4.0.1.
So if you were on D16 because of a bug you hit, you must make sure the same fix is available in 4.0.1 (I can check that for you if you give me the bug ID).
If it is not, I would recommend you request a D patch release on 4.0.1 with the specific D release items.
2010-12-09 02:18 AM
Job Id: 2624020
2010-12-09 03:03 AM
The error message states "Please check the conformance status of the dataset."
What is the output of the command "dfpm dataset conform -D <Dataset_Name>"?
Is your dataset conformant? If not, did the conformance status change after upgrading to 4.0.1?
2010-12-09 10:31 AM
Lag messages are not conformance actions.
If you can post the output of the following,
dfpm job detail <jobid> (ignore this one, as you already did it)
dfpm dataset list -x <3283>
dfpm dataset list -m <3283>
dfpm dataset list -R <3283>
dfpm dataset conform -D <3283>
it will help us to help you better.
The error message is generated when there are no relationships.
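If it is easier to gather everything in one go, the commands listed above could be wrapped in a small script. This is only a minimal sketch: the function names build_commands and collect are made up for illustration, the dfpm subcommands and flags are copied from the posts in this thread rather than checked against the 4.0.1 documentation, and it assumes dfpm is on the PATH of the DFM server and that 3283 is the dataset ID.

```python
import subprocess


def build_commands(dataset_id, job_ids=()):
    """Return argv lists for the diagnostic commands named in this thread.

    Hypothetical helper: the subcommands and flags are taken verbatim
    from the posts above, not from the dfpm reference.
    """
    commands = [["dfpm", "job", "detail", str(job_id)] for job_id in job_ids]
    for flag in ("-x", "-m", "-R"):
        commands.append(["dfpm", "dataset", "list", flag, str(dataset_id)])
    commands.append(["dfpm", "dataset", "conform", "-D", str(dataset_id)])
    return commands


def collect(commands, run=subprocess.run):
    """Run each command and return one text report.

    A command that cannot be started (for example, dfpm is not
    installed) is recorded in the report instead of raising.
    """
    sections = []
    for argv in commands:
        try:
            result = run(argv, capture_output=True, text=True)
            body = result.stdout + result.stderr
        except OSError as exc:
            body = f"could not run: {exc}\n"
        sections.append("$ " + " ".join(argv) + "\n" + body)
    return "\n".join(sections)
```

Running print(collect(build_commands(3283, job_ids=[2624020]))) on the DFM server would produce a single block of text that can be pasted into a reply or attached to the support case.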
2010-12-09 02:33 PM
Sorry, I don't have access to my system right now so I can't give you the details, but earlier in the day when I was checking these things I noticed that -R doesn't show any relationships, and a log file (smmon, if I remember right) shows a malformed SNMP response.
Another thing I noticed is that the dfdrm commands are working well. I also forgot to tell you that our ONTAP release is a 7.3.2P* release, which had an SNMP burt; the fix was applied in 7.3.3P3.
It could be a red herring, but correlating all these things suggests that 4.0.1 isn't able to read the SM relationship details from 7.3.2 due to the SNMP problem; it has therefore removed all the relationships from its DB, and now no updates are happening.
But if this is the case, then I wonder how D16 was able to get everything working before?
2011-11-16 07:22 AM
I would have loved to read a follow-up. I am running ONTAP 8.0.1 and I am trying to wrap my head around odd failures with SnapMirror reporting. I have upgraded to OnCommand 5 and just want to monitor my jobs using the DFM management console. I created an empty dataset and a policy with just notifications, but I get errors like "Baseline failure" and it doesn't make sense to me. I tried running some of the recommended CLI commands listed in this thread, but I guess they didn't apply to my version of ONTAP. Any thoughts are welcome.