I have an installation of DFM 4.02 running on a Windows Server 2008 R2 virtual machine with 12 GB RAM and 6 vCPUs.
We are monitoring 8 filers in total, all running Data ONTAP 8.1.1 7-Mode.
Furthermore, we use Protection Manager extensively, with a multitude of datasets, protection policies, management of secondary space, etc. This all works reasonably well. The major problem we are facing is that the Management Console is really, really slow. Quite frequently, the GUI shows an error message saying that the current command has taken over 60 seconds without a response and asking whether we would like to retry.
A reboot makes the situation somewhat better, but not by much. It's only a matter of time before the box slows down to the speed of running your hand through ice-cold tar.
I haven't done any twiddling of performance-related parameters in the Sybase database. From other posts, I've learned that on Windows the database caches should dynamically use available memory in a semi-intelligent fashion.
Last night was really bad: we had a lot of backup jobs failing all over the place, and I can only conclude this was due to the DFM server being slower than ever, so we rebooted this morning.
Looking at the sybase.log file, I can see, for example:
Performance warning: Database file "D:\Program Files (x86)\NetApp\DataFabric Manager\DFM\data\monitordb.db" consists of 56 disk fragments
Performance warning: No unique index or primary key for table "srmFiles" in database "monitordb"
The SRM bit worries me. We also use FileSRM to monitor our NAS environment, which is made up of only two vFilers, but with terabytes and terabytes of data. We serve tens of thousands of users, and we have file scans running several days a week on a staggered schedule.
Does FileSRM store its data in monitordb.db (it looks like it does), and could this slow down all other operations?
Just looking in Task Manager in Windows, I notice fairly high CPU utilization on the server, and the process dbsrv10.exe is the culprit. It isn't pegging the CPU, but it is still running at 30-70%, and that's across 6 vCPUs.
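To track whether the warnings quoted from sybase.log get worse between reboots, a small script can summarize them. This is a minimal sketch, not a NetApp or Sybase tool: it assumes the two warnings keep exactly the wording shown above and ignores every other log line. The function name `summarize_warnings` is hypothetical, and only the latest fragment count seen for each database file is kept.

```python
import re
from collections import Counter

# Assumed wording, copied from the two sybase.log warnings quoted in this post.
FRAGMENT_RE = re.compile(
    r'Database file "(?P<path>[^"]+)" consists of (?P<count>\d+) disk fragments'
)
NO_KEY_RE = re.compile(
    r'No unique index or primary key for table "(?P<table>[^"]+)" '
    r'in database "(?P<db>[^"]+)"'
)


def summarize_warnings(lines):
    """Return (fragment count per database file, Counter of tables lacking a key).

    `lines` is any iterable of log lines, e.g. an open file object.
    A later fragment warning for the same file overwrites an earlier one.
    """
    fragments = {}
    missing_keys = Counter()
    for line in lines:
        if "Performance warning" not in line:
            continue  # skip everything that is not a performance warning
        match = FRAGMENT_RE.search(line)
        if match:
            fragments[match.group("path")] = int(match.group("count"))
            continue
        match = NO_KEY_RE.search(line)
        if match:
            missing_keys[(match.group("db"), match.group("table"))] += 1
    return fragments, missing_keys
```

Usage would be along the lines of `with open(r"...\sybase.log", errors="replace") as fh: frags, keys = summarize_warnings(fh)`, where the log path is whatever your installation uses. Running it after each reboot shows whether the fragment count for monitordb.db keeps climbing.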
Quick question. You said you are using Protection Manager extensively. Do you have any datasets that are out of conformance? Or any long-running conformance jobs? I have seen conformance totally choke a 4.x system before. It put such a stranglehold on the server that it was basically unusable and things took FOREVER to load. In my case the conformance issues were caused by exhausted resource pools. Once I fixed the resource pools, conformance was able to run and performance returned to normal.
Check the conformance log for details about any datasets that are non-conformant.
12 GB of RAM and 6 vCPUs should be plenty to work with. Are you reserving the resources or just allocating them? For a heavy Protection Manager environment plus monitoring, I recommend reserving half of these resources for the VM (6 GB RAM, 3 vCPUs). Try that and report back.
FSRM is neither supported nor sold any longer, and it can add a large amount of data to the database, which can slow down the server.
You might also consider upgrading to 5.0.2P1 or 5.1 (64-bit) to take advantage of the larger addressable memory space, assuming the server is x64. This would allow the server to work more efficiently, especially if the database is large (more than 4 GB).