Active IQ Unified Manager Discussions
Hi guys, can anyone shed some light on DFM database performance? We have 14 filers being monitored via DFM / OnCommand 5.0.0 running on a VM (MS Windows 2008 R2 SP1, 64-bit, 4 vCPUs and 10 GB RAM).
Whenever I delete events from the DFM database, the dbsrv10.exe process ramps right up and, more often than not, an alert is generated in DFM saying that the "Management station load is too high". DFM is then pretty much unresponsive for up to 10 minutes.
Has anyone else had performance issues with the DFM server? Does the DFM database require some kind of cleanup or tuning? The current size of the monitordb.db file is 13.2 GB. Any advice much appreciated. Cheers Ian
Hi,
We are running DFM too, but an older version (4.0.2), and notice the same thing: performance is very bad when listing, deleting or selecting alerts, and the DFM host CPU goes to 100% while this is done. I was hoping our performance would improve when we upgrade to OnCommand 5, which I thought changed to a 64-bit architecture, but from your experience it still looks bad.
The DFM interface is very slow and inefficient to use like this. I believe you have to log a call to "prune" the DB, which just seems ridiculous (this should be built in). Our monitordb.db is 9 GB but we are only monitoring 4 filers.
Luke
Hi
Same problem here. DFM 4.0.2 on a 6 CPU / 12 GB RAM machine, monitoring 8 filers. DFM is pretty much unusable.
Did you guys find some kind of solution? We have tried to purge our DB with no difference in performance (it took about two hours for a 7 GB DB including purge/reload, if someone wants to know). According to NetApp support our dfmserver process is maxing out memory-wise. I think there is a 2 GB cap per process as DFM 4.0.2 is a 32-bit application.
/Hans
Guys, try out the new dfm purge utility and also upgrade to 5.0.2P1.
Regards
adai
Hi Adai,
Does the P1 fix the slowness issue? I literally have to wait for 2-3 minutes while editing datasets.
Regards,
Usman
Hi Usman,
The slowness is not due to any bug, but it could be related to multiple configuration issues. What version of OCUM/DFM are you running currently?
Is your server a VMware VM? If so, what is the memory and CPU configuration? Are they reserved or allocated? How many datasets are currently there? How many controllers are being monitored? Is Performance Advisor also enabled on the same server?
The answers to many of the above questions affect what you described. In general, 5.0.2P1 is a very stable release with no known memory, performance or functional issues.
Regards
adai
Hi Adai,
Thanks for getting back to me. I have been working on this for a while and even opened a case a couple of times, but nothing has really helped. I just installed the 5.0.2P1 patch, which ran well for about 5 minutes, after which it slowed down again. I am constantly seeing at least one core of the server being pegged and the dbsrv10.exe process using about 25% CPU. It seems to me the delay is being caused by the Sybase database. Prior to installing the 5.0.2P1 patch I used the dfm purge utility as well.
What version of OCUM/DFM are you running currently?
5.0.2P1 on Windows 2008 R2 SP1 x64
Is your server a VMware VM?
Yes
If so, what is the memory and CPU configuration?
4 vCPU, 6 GB RAM (played around with different configs to test but none have helped)
Are they reserved or allocated?
Yup, I have 5 GHz reserved for CPU and 6 GB for RAM
How many datasets are currently there?
113
How many controllers are being monitored?
8
Is Performance Advisor also enabled on the same server?
It was initially; I disabled it in hopes of improving performance, but that didn't help.
Regards,
Usman
Adai,
I have also noticed that the slowness occurs during the night, when our backup jobs are running. During the daytime it's usually pretty fast. Any ideas?
Regards,
Usman
Guys,
Can anyone provide any insight into this issue? Are there any changes that can be made to speed things up?
Regards,
Usman
Hi Usman,
Which version did you upgrade from? How long has it been since you upgraded?
After an upgrade to 5.0.x we do the following.
1. Purge all data protection jobs older than 90 days, which happens every day at midnight. So soon after the upgrade, due to the number of jobs to be purged, you would encounter slowness for a week or so.
2. Prune all perf data files for stale instances, which starts every Sunday at midnight and runs once every week. Since you have perf data, which will definitely have stale entries due to mark-deleted objects, this will also consume resources on your DFM server.
3. By default, events pruning happens every day at midnight for events older than 180 days.
Since all 3 of these happen at midnight, I think these are the reasons why you see the slowness. But 1 and 2 should stabilize in a week or 2, and you should be able to see marked improvements in your DFM server.
I would also recommend upgrading your RAM to 16 GB or so, since you are running 100+ datasets.
Regards
adai
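To make the retention windows in the post above concrete, here is a minimal Python sketch that computes the cutoff date for each nightly purge and counts how many records would fall before it. The 90-day and 180-day values come from the post; the dictionary keys and function names are hypothetical labels for illustration and are not DFM option names or part of any DFM API.

```python
from datetime import datetime, timedelta

# Retention windows as quoted in the post above (in days).
# The keys are hypothetical labels, not real DFM option names.
RETENTION_DAYS = {
    "data_protection_jobs": 90,
    "events": 180,
}


def purge_cutoff(now, kind):
    """Return the timestamp before which records of `kind` are purge candidates."""
    return now - timedelta(days=RETENTION_DAYS[kind])


def count_purge_candidates(timestamps, now, kind):
    """Count how many record timestamps are older than the retention cutoff."""
    cutoff = purge_cutoff(now, kind)
    return sum(1 for ts in timestamps if ts < cutoff)


if __name__ == "__main__":
    now = datetime(2013, 1, 1)
    for kind in RETENTION_DAYS:
        print(kind, "cutoff:", purge_cutoff(now, kind).date())
```

The larger the count of candidates on the first nights after an upgrade, the longer the midnight purge would be expected to run, which is consistent with the "slow for a week or so" description.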
Hello guys,
We have a huge DFM environment with the same problems as you. The only way we found to speed things up was to split the DFM servers. But now two of them, DFM for NAS and DFM for SAN, have become slow again. We found that it is really related to the number of datasets with a protection policy. We also tried to purge the DB with the embedded tool, but we have seen no improvement.
We tried increasing CPU, RAM and so on, but nothing helps. We have the idea of putting these VMs on SSD drives; has anybody tried this?
Now we have this configuration:

DFM for NAS
OnCommand Core 5.1
VM on Linux, 6 vCPU, 12 GB RAM, database on RDM FC
218 Datasets
14 Filers

DFM for SAP
OnCommand Core 5.1
VM on Linux, 6 vCPU, 12 GB RAM, database on RDM FC
220 Datasets
23 Filers

DFM for SAN
OnCommand Core 5.1
VM on Linux, 6 vCPU, 12 GB RAM, database on RDM FC
222 Datasets
12 Filers

DFM for Performance Advisor
OnCommand Core 5.1
VM on Linux, 6 vCPU, 12 GB RAM, database on RDM FC
22 Filers monitored

Regards
Jerome
That's funny Jerome, I was thinking of putting the DFM server on SSD as well but don't have any available right now.
Jerome, are you noticing the dbsrv10.exe process constantly using around 25% CPU when your environment slows down? I myself had to split up the DFM servers as a result of this slowness and turned off Performance Advisor as well, but nothing has helped so far.
Hello,
When it's slow we use 400% to 500% of CPU, which means the DB uses 4 to 5 vCPUs (out of 6 vCPUs in total).
  PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
27182 root  16   0 6664m 4.7g  15m S 370.8 40.0  65899:10 dbsrv11
27222 root  15   0  245m  26m  10m S   0.3  0.2  36:37.45 dfmeventd
27226 root  15   0  159m 6608 5848 S   0.3  0.1  21:19.48 dfmwatchdog
jerome
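As a rough illustration of how the %CPU column in the top output above maps to vCPUs, the following Python sketch parses one top line and divides %CPU by 100. It assumes the standard 12-column top layout shown in the post; the function name is hypothetical.

```python
# Column layout of the top output quoted in the post above.
TOP_COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR",
               "S", "%CPU", "%MEM", "TIME+", "COMMAND"]


def vcpus_used(top_line):
    """Return (command, vCPU-equivalents) for a single line of top output.

    top reports %CPU relative to one core, so 370.8% is about 3.7 cores busy.
    """
    # Split on whitespace, keeping everything after the 11th gap as COMMAND.
    fields = top_line.strip().split(None, len(TOP_COLUMNS) - 1)
    row = dict(zip(TOP_COLUMNS, fields))
    return row["COMMAND"], float(row["%CPU"]) / 100.0


if __name__ == "__main__":
    line = "27182 root 16 0 6664m 4.7g 15m S 370.8 40.0 65899:10 dbsrv11"
    command, cores = vcpus_used(line)
    print(f"{command} is using about {cores:.1f} of 6 vCPUs")
```

On the sample line, dbsrv11 alone accounts for well over half of the 6 vCPUs, while dfmeventd and dfmwatchdog are negligible.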
We also found that a lot of things are not purged from the DB, and I think that's not helping performance. Look at the event counts, they're horrific.
Example from our DFM NAS (command: dfm diag):
DP Job Information
Job State                     Count
Jobs Running                  44
Jobs Completed Total          124949
Jobs Aborted Total            0
Jobs Aborting Total           0
Jobs Completed Today          6184
Jobs Aborted Today            0
Jobs Aborting Today           0

Dataset Protection Status
Protection State              Count
Protected                     216
Unprotected                   2

Event Counts
Table                         Count
Events                        15694712
Current Events                15641683
Abnormal Events               12152877

Event Type Counts
Event Type                    Count
volume.growthrate             11211599
sm.update                     3177912
data-protection-job.status    892396
snap-status                   29322
df.avail                      29253
volume-clone.discovered       27466
df.kbytes                     25057
df.snapshot.kbytes            24847
df.inodes                     24845
volume.first-snap             24843

Version 5.1 (5.1)
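To see which event types dominate the table, the counts from the dfm diag output above can be turned into shares of the total. This is only a sketch over the numbers quoted in the post; it does not query DFM, and the function and variable names are hypothetical.

```python
# Counts copied from the "Event Counts" and "Event Type Counts"
# sections of the dfm diag output quoted above.
TOTAL_EVENTS = 15694712
EVENT_TYPE_COUNTS = {
    "volume.growthrate": 11211599,
    "sm.update": 3177912,
    "data-protection-job.status": 892396,
    "snap-status": 29322,
    "df.avail": 29253,
    "volume-clone.discovered": 27466,
    "df.kbytes": 25057,
    "df.snapshot.kbytes": 24847,
    "df.inodes": 24845,
    "volume.first-snap": 24843,
}


def event_shares(counts, total):
    """Return each event type's fraction of all events, largest first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, count / total) for name, count in ranked]


if __name__ == "__main__":
    for name, share in event_shares(EVENT_TYPE_COUNTS, TOTAL_EVENTS):
        print(f"{name:<28} {share:6.1%}")
```

With these numbers, volume.growthrate alone is roughly 71% of all events, and the top three types together make up more than 97%, so those are the obvious candidates to look at when tuning event purging.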
Hi Jerome,
In your case the problem seems to be mainly due to the number of DP jobs per day and their history. It looks like your jobs purge is set to 21 days; I would recommend bringing this down to 1 week.
With respect to the events purge, 5.2 will address this. I suggest lowering your events purge to 1 month or so.
Also, moving to SSD will definitely help.
Regards
adai
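The 21-day estimate in the post above can be sanity-checked from the dfm diag numbers: dividing the total completed jobs by the jobs completed per day gives the approximate number of days of history retained. This is a back-of-the-envelope sketch that assumes "Jobs Completed Today" (6184) is a representative daily rate and that the total (124949) reflects only retained history; both are assumptions, and the function names are hypothetical.

```python
def implied_retention_days(jobs_total, jobs_per_day):
    """Estimate how many days of job history are kept at a steady daily rate."""
    return jobs_total / jobs_per_day


def projected_history(jobs_per_day, retention_days):
    """Estimate how many job records remain for a given retention window."""
    return jobs_per_day * retention_days


if __name__ == "__main__":
    jobs_total = 124949   # "Jobs Completed Total" from dfm diag
    jobs_per_day = 6184   # "Jobs Completed Today" from dfm diag
    days = implied_retention_days(jobs_total, jobs_per_day)
    print(f"Implied retention: about {days:.1f} days")
    print("Records at 7-day retention:", projected_history(jobs_per_day, 7))
```

Under these assumptions the implied retention is about 20 days, in line with a 21-day purge setting, and a 1-week setting would cut the retained job history to roughly a third of its current size.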
Hello Adai,
What do you mean by "I suggest lowering your events purge to 1 month or so"?
Is there a parameter to configure the purging of events?
Jerome
Hi Adai,
It's been a while since we upgraded to 5.0.1. A few weeks back I upgraded to 5.0.2, and just a couple of days back I upgraded to 5.0.2P1. It is still very slow, especially when backup jobs are running.
I'll try to increase RAM to 16 GB, but I doubt that will make a difference as the RAM utilization is barely 3 GB right now.
Set RAM to 16 GB, still seeing the same issues.
It seems the Sybase database is the bottleneck. I've tried setting the dbcachesize parameter higher before, which didn't help much. All other functions appear to be fine, but everything slows down as soon as any datasets come into play.
Any ideas gents?
I have upgraded to 5.1 in hopes of Sybase 11 improving performance, but it seems to still be the same. As I mentioned, I believe the bottleneck is the database itself. Based on the following article, my guess is that only one vCPU gets pegged waiting for a SQL query to finish and, as a result, any other function takes a long time to execute (when the vCPU gets pegged, I can tell from Task Manager that the other processes are not using much CPU).
http://www.symantec.com/business/support/index?page=content&id=TECH190207
Does anyone have experience with using dbisqlc to monitor what's going on with the Sybase database? I'm not really sure what credentials to use and what connection parameters should be set to access the database. Any help would be greatly appreciated.
Regards,
Usman
Hi Usman,
The Symantec link that you point to doesn't have much relevance here. But can you go through this VMware KB to check that you are not affected by it?
Also, the time it takes to edit a dataset is not necessarily due to the server but also to a few other things. How many relationships per dataset do you have?
Regards
adai
Hi Adai,
I was simply trying to point out that we may be able to see which query is taking such a long time to execute by following the procedure in the Symantec link, which uses the dbisqlc utility. Can you provide some guidance on how I could go about it? What credentials should I use? I tried the default sa with no password, which didn't work.
I have verified that CPU contention is not being caused by CPU ready time (%RDY) or high co-stop (%CSTP).
The slowness does not only occur when editing existing datasets. It also occurs when I am adding new datasets and trying to add datastores from the data tab. Each dataset has 2-5 datastores max.