Hi, our DFM server (5.2R1) is having some issues that support hasn't been able to help us with, and I was hoping someone on here can give me some ideas. Our case has been open for over a month now. If you're with NetApp, our case number is 2005006488. What happens is that data just stops being collected every day or two. All of the services appear to be running normally, but no new or current data is collected. I usually just do a dfm service stop/start to get it running again. Stopping and starting only the monitoring service is enough to get collection going again, but I usually restart them all.
Performance Advisor Checklist
perfAdvisorEnabled Passed
hostType Passed
hostRevision Passed
hostLogin Passed
perfAdvisorTransport Passed
-sh-4.1$
-sh-4.1$ dfm about
Version 5.2 (5.2R1)
Executable Type 64-bit
Serial Number 1-50-007252
Edition Standard edition of DataFabric Manager server
Data ONTAP Operating Mode 7-Mode
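For anyone who wants to script the stop/start workaround described above rather than run it by hand, here is a minimal sketch. It assumes the monitoring service is named `monitor` (worth confirming with `dfm service list` on your own server), and `DFM_CMD` is a hypothetical variable added only so the function can be dry-run without touching the real services.

```shell
#!/bin/sh
# DFM_CMD is a hypothetical override (not part of DFM) that allows a dry run
# with "echo" in place of the real dfm binary.
DFM_CMD="${DFM_CMD:-dfm}"

restart_monitor() {
    # Assumption: the monitoring service is called "monitor".
    # Only start the service if the stop succeeded.
    "$DFM_CMD" service stop monitor &&
    "$DFM_CMD" service start monitor
}

# Example call (commented out so that sourcing this file changes nothing):
# restart_monitor
```

Setting `DFM_CMD=echo` before calling `restart_monitor` prints the two commands instead of running them. This only automates the workaround; it does not address why collection stops.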
Here are the things they have had me do:
Modify the semaphores
[root]# ipcs -sl
------ Semaphore Limits --------
max number of arrays = 1024
max semaphores per array = 250
max semaphores system wide = 256000
max ops per semop call = 32
semaphore max value = 32767
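On Linux these limits come from the `kernel.sem` sysctl, whose four fields are ordered SEMMSL, SEMMNS, SEMOPM, SEMMNI. The sketch below labels the fields so the values can be compared against the `ipcs -sl` output above. The function name `describe_sem` is made up for illustration.

```shell
#!/bin/sh
# Label the four fields of kernel.sem in their documented order:
# SEMMSL SEMMNS SEMOPM SEMMNI
describe_sem() {
    # $1 is a whitespace-separated string such as "250 256000 32 1024"
    set -- $1
    echo "max semaphores per array (SEMMSL) = $1"
    echo "max semaphores system wide (SEMMNS) = $2"
    echo "max ops per semop call (SEMOPM) = $3"
    echo "max number of arrays (SEMMNI) = $4"
}

# Values taken from the ipcs -sl output above
describe_sem "250 256000 32 1024"

# On a Linux host, the live values can be read the same way
if [ -r /proc/sys/kernel/sem ]; then
    describe_sem "$(cat /proc/sys/kernel/sem)"
fi
```

With the limits shown above, the matching sysctl setting would be `kernel.sem = 250 256000 32 1024`.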
Shut down 2nd DFM server
We also have a second DFM server running in our DR site. They said having two DFM servers monitor the same servers was not recommended, so I shut that one down over a weekend. No change.
Please change the following option in order to help resolve some of these issues:
dfm options set hostLoginProtocol=ssh
Reconfigure the VM to use 8 GB of reserved memory rather than 4 GB allocated
I then volunteered to rebuild the entire dfm server but I didn't want to lose my data.
I did an rpm -e and blew away all the old directories, etc.
I reinstalled, added the filers back in manually, and let it cook over a weekend again, and it remained up.
I then restored our database, and the issue returned.
Support now wants me to have the entire vfiler destroyed and rebuilt, but our change control is very strict, and destroying and rebuilding a server would mean weeks of change control. I want to make sure I exhaust other options before I do a burn down and restore, which I don't believe will resolve it.
I am willing to do another rebuild, but is there a way to do a selective restore where I only pull in historical data and my custom reports, and not configuration settings?
I'm open to ideas.
Thanks,
James