Need DFM/OCUM help - stop collecting data/alerts

Hi, our DFM server (5.2R1) is having some issues that support hasn't been able to help us with, and I was hoping someone on here could give me some ideas; our case has been open for over a month now. If you're at NetApp, our case number is 2005006488. What happens is that data just stops being collected every day or two. All of the services appear to be running normally, but no new/current data is being collected. I usually just do a dfm service stop/start to get it running again. Stopping and starting only the monitoring service also gets it collecting again, but I usually restart them all.
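For readers following along, the monitoring-only restart mentioned above can be sketched as a small shell function. This is a minimal sketch, not something from the support case: it assumes the monitoring service is named `monitor` (check with `dfm service list` on your server), and the `DFM` variable and `restart_monitor` function name are hypothetical helpers introduced here for illustration.

```shell
# Path to the dfm CLI; overridable so the function can be dry-run
# (for example DFM=echo). This variable is an assumption of this sketch.
DFM="${DFM:-dfm}"

# Restart only the monitoring service instead of all DFM services.
# Assumes the service name is "monitor"; verify with `dfm service list`.
restart_monitor() {
    "$DFM" service stop monitor && "$DFM" service start monitor
}
```

Calling `restart_monitor` as the user who normally runs `dfm` restarts just that one service; setting `DFM=echo` first prints the commands instead of running them.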

Performance Advisor Checklist

perfAdvisorEnabled     Passed

hostType               Passed

hostRevision           Passed

hostLogin              Passed

perfAdvisorTransport   Passed

-sh-4.1$

-sh-4.1$ dfm about

Version                          5.2 (5.2R1)

Executable Type                  64-bit

Serial Number                    1-50-007252

Edition                          Standard edition of DataFabric Manager server

Data ONTAP Operating Mode        7-Mode

Here are the things they have had me do:

Modify the semaphores

[root]# ipcs -sl

------ Semaphore Limits --------

max number of arrays = 1024

max semaphores per array = 250

max semaphores system wide = 256000

max ops per semop call = 32

semaphore max value = 32767
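To compare these limits against what is configured in `/etc/sysctl.conf`, the `ipcs -sl` output can be converted into the equivalent `kernel.sem` line, whose four fields are SEMMSL, SEMMNS, SEMOPM and SEMMNI in that order. The helper below is only a sketch: the function name `sem_limits_to_sysctl` is made up for this example, and it assumes the `ipcs` wording shown above.

```shell
# Read `ipcs -sl` output on stdin and print the matching kernel.sem line.
# kernel.sem field order: SEMMSL SEMMNS SEMOPM SEMMNI
sem_limits_to_sysctl() {
    awk -F' = ' '
        /max number of arrays/       { semmni = $2 }  # SEMMNI
        /max semaphores per array/   { semmsl = $2 }  # SEMMSL
        /max semaphores system wide/ { semmns = $2 }  # SEMMNS
        /max ops per semop call/     { semopm = $2 }  # SEMOPM
        END { printf "kernel.sem = %s %s %s %s\n", semmsl, semmns, semopm, semmni }
    '
}
```

Running `ipcs -sl | sem_limits_to_sysctl` on the values above gives `kernel.sem = 250 256000 32 1024`, which can be checked against `sysctl kernel.sem`.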

Shut down 2nd DFM server

We also have a second DFM server running in our DR site. Support said that having it monitor the same servers was not recommended, so I shut it down over a weekend... No change.

Please change the following option in order to help resolve some of these issues:

dfm options set hostLoginProtocol=ssh

Reconfigure the VM to use 8 GB of reserved memory rather than 4 GB allocated

I then volunteered to rebuild the entire dfm server but I didn't want to lose my data.

I did an rpm -e and blew away all the old directories etc.

Reinstalled and added back in the filers manually and let it cook over a weekend again and it remained up.

I then restored our database and the issue has returned. 

Support now wants me to have the entire vfiler destroyed and rebuilt, but our change control is very strict, and destroying and rebuilding a server would mean weeks of change control. I want to make sure I exhaust other options before I do a burn-down and restore, which I don't believe will resolve it.

I am willing to do another rebuild, but is there a way to do selective restores where I only pull in historical data and my custom reports, and not configuration settings?

I'm open to ideas.

Thanks,

James




Re: Need DFM/OCUM help - stop collecting data/alerts

First, has support said anything about you running RC1 and not 5.2R1? You are essentially running a release candidate.

Are you saying that your Performance Advisor data just stops being collected, or that events such as volume full stop generating alerts?

Re: Need DFM/OCUM help - stop collecting data/alerts

Oops, it was 5.2R1 on the last rebuild; I must have grabbed the wrong file.

This is from my original output.

Version 5.2 (5.2R1)

Executable Type 64-bit

Serial Number 1-50-007252

Edition Standard edition of DataFabric Manager server

Data ONTAP Operating Mode        7-Mode