Adding Protection Policy fails after a 'cf giveback -f'

I recently encountered burt 369072 (during intense inode cleaning, CP coordinator thread fails to yield processor), which caused a panic on my backup filer. Panic message: 'process on cpu1 hung (coordinator) for 5002 milliseconds! in process coordinator on release NetApp Release 7.3.2P3.'

At the same time as the panic, there was a disk failure... to be able to 'cf giveback' I had to 'cf giveback -f', which was successful. Since the giveback, I've not been able to add a Protection Policy to any of my unprotected datasets. When I attempt to add the backup provisioning policy, my resource pool gets a blue '?' and will not allow the provisioning of the required volumes... I get a message stating:

Reason: Storage system: 'filername'(16689):Active/Active failover: Take Over by partner disabled.

Suggestion: Storage system: 'filername'(16689):Enable Active/Active failover on the partner.

I've verified that CF status is enabled and the partner is up... I've cleared all events in DFM... not sure what else I can do here.

Any thoughts?

Thanks,
Scott

P.S. I can send a screenshot to any of the engineering folks who want to have a look at the exact output... I will not post it as it contains the FQDN.

Re: Adding Protection Policy fails after a 'cf giveback -f'

Hi Scott,

Your backup provisioning policy is enabled for "Controller Failure Reliability", i.e. for provisioning only on active-active clusters.

So can you run dfm host discover on both nodes of your active-active pair, so that DFM can discover that they are in the active-active state?

Then close and reopen NMC and try again.

Regards

adai

Re: Adding Protection Policy fails after a 'cf giveback -f'

Hey Adai,

Thanks for the response. I did a 'dfm host discover <filer>' and logged out of NMC as suggested... it would appear as though this part worked (Refreshing data from host <filername> (24232) now.). However, I'm faced with the same error when attempting to apply a protection policy to a dataset. Now, I've learned over the last while that DFM is extremely slow at picking up changes. I waited about 30 minutes for the host discover to take effect; should this be sufficient?

Thanks,
Scott

Re: Adding Protection Policy fails after a 'cf giveback -f'

All dfm monitors have a default monitoring interval.

So it takes time, as it's not event-driven but rather periodic polling.

NMC also takes time to refresh. For an immediate change in NMC, you will have to move to some other page to make it refresh, or close and reopen it.

Regards

adai

Re: Adding Protection Policy fails after a 'cf giveback -f'

Looking into the DFM server logs, I see that I'm getting '[dfmserver:ERROR]: Thread 0x102c: cf settings is not 2, instead its 4'.

Speaking with Justin Parisi (NetApp support), we tried a number of different things. Justin mentioned this is similar to burt 382019, where the SNMP traps are not sending the correct info.
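For anyone hunting for the same message, here is a minimal Python sketch that pulls the two numbers out of a log line shaped like the one quoted above. The regular expression is built only from that single quoted line, so the exact format is an assumption, and the field names `expected` and `actual` are my own labels; I am not claiming to know what the values 2 and 4 mean to DFM.

```python
import re

# Pattern derived from the single log line quoted in this thread.
# Assumption: other occurrences follow the same wording and layout.
CF_LINE = re.compile(
    r"\[dfmserver:(?P<level>\w+)\]: Thread (?P<thread>0x[0-9a-fA-F]+): "
    r"cf settings is not (?P<expected>\d+), instead its (?P<actual>\d+)"
)


def parse_cf_line(line):
    """Return the fields of a 'cf settings' log line, or None if no match."""
    match = CF_LINE.search(line)
    if match is None:
        return None
    return {
        "level": match.group("level"),
        "thread": match.group("thread"),
        "expected": int(match.group("expected")),  # hypothetical label
        "actual": int(match.group("actual")),      # hypothetical label
    }


sample = "[dfmserver:ERROR]: Thread 0x102c: cf settings is not 2, instead its 4"
print(parse_cf_line(sample))
```

Feeding each line of the server log through `parse_cf_line` and keeping the non-None results would show how often, and for which threads, the mismatch is reported.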

Any thoughts?

Re: Adding Protection Policy fails after a 'cf giveback -f'

It appears that the root cause of the issue was running out of disk space... Free space on the C:\ drive of the DFM server dropped below 5%, causing DFM to stop updating.

I didn't notice, but the events I had configured were not triggering emails when thresholds were reached... this happened in parallel with the problem assigning a Protection Policy. Once I realized that I was almost out of space and cleaned up (10-15%), I started receiving alert emails and OM updated to show that CF was enabled and up.

During the time that I was running low on disk space, OM stopped gathering data, so I now have a gap in my usage statistics.

Re: Adding Protection Policy fails after a 'cf giveback -f'

Hi Scott,

The best practice is to set up email alerts for the management station events.

Below is the list of important management station events.

[root@lnx ~]# dfm eventtype list | grep -i free
management-station:enough-free-space                   Normal  dfm.free.space
management-station:filesystem-filesize-limit-reached   Error   dfm.free.space
management-station:not-enough-free-space               Error   dfm.free.space
management-station:perf-advisor-enough-free-space      Normal  dfm.perfAdvisor.free.space
management-station:perf-advisor-not-enough-free-space  Error   dfm.perfAdvisor.free.space
[root@lnx ~]#
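To pick out just the events worth alerting on, the listing above can be filtered by its middle column. This is a rough Python sketch that works on the pasted text only; it assumes each data row is three whitespace-separated fields (event name, severity, event class), which is inferred from the output shown here rather than from any documented format, and the function names are made up for illustration.

```python
# Output pasted from the listing above; prompt lines are included on purpose
# to show that they are skipped.
SAMPLE = """\
[root@lnx ~]# dfm eventtype list | grep -i free
management-station:enough-free-space          Normal       dfm.free.space
management-station:filesystem-filesize-limit-reached Error        dfm.free.space
management-station:not-enough-free-space      Error        dfm.free.space
management-station:perf-advisor-enough-free-space Normal       dfm.perfAdvisor.free.space
management-station:perf-advisor-not-enough-free-space Error        dfm.perfAdvisor.free.space
[root@lnx ~]#
"""


def parse_event_types(text):
    """Split each three-field row into name, severity and class."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:  # skips shell prompts and blank lines
            continue
        name, severity, event_class = parts
        rows.append({"name": name, "severity": severity, "class": event_class})
    return rows


def names_with_severity(text, severity="Error"):
    """Return the event names whose severity column matches."""
    return [r["name"] for r in parse_event_types(text) if r["severity"] == severity]


for name in names_with_severity(SAMPLE):
    print(name)
```

The three Error-severity names it prints are the ones to attach email alarms to.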

Another place to look for the same information is the output of dfm about (output sanitized for posting here).

[root@lnx ~]# dfm about

Version                          4.0 (4.0D1)

Serial Number                    XXXXXXXXXXXX

Administrator Name               root

Host Name                        lnx186-223

Host IP Address                  XXXXXXXXXXXX

Host Full Name                   lnx186-223.XXXXXXXXXXXXXXXX

Operations Manager Node limit    999 (currently managing 3)

Provisioning Manager Node Limit  999 (currently managing 0)

Protection Manager Node Limit    999 (currently managing 2)

Operating System                 Red Hat Enterprise Linux AS release 4 (Nahant Update 4) 2.6.9-42.ELsmp i686

CPU Count                        1

System Memory                    2026 MB (load excluding cached memory: 58%)

Installation Directory           /opt/NTAPdfm

                                 3.94 GB free (31.0%)   <<< Here you would find Error instead of free.

The next place is dfm diag | grep -i management, which shows the events that are generated on the management station.

The dfmserver.log also records that monitoring is suspended due to the space issue.

Regards

adai