Re: Poor performance - DFM database server

ITSSANSTORAGE · 2012-11-11

Hi guys, can anyone shed some light on DFM database performance? We have 14 filers being monitored via DFM / OnCommand 5.0.0 running on a VM (MS Windows 2008 R2 SP1 64-bit, 4 vCPUs and 10GB RAM).

Whenever I delete events from the DFM database, the dbsrv10.exe process ramps right up, and more often than not DFM generates an alert saying that the "Management station load is too high". DFM is then pretty much unresponsive for up to 10 minutes.

Has anyone else had poor performance issues with the DFM server? Does the DFM database require some kind of cleanup or tuning? The monitordb.db file is currently 13.2GB. Any advice much appreciated. Cheers, Ian

lmunro_hug · 2012-11-12

Hi,

We are running DFM, but an older version (4.0.2), and notice the same thing: performance is very bad when listing/deleting/selecting alerts, and when this is done the DFM host CPU goes to 100%. I was hoping our performance would improve when we upgrade to OnCommand 5, which I thought moved to a 64-bit architecture, but from your experience it still looks bad.

The DFM interface is very slow and inefficient to use like this. I believe you have to log a call to "prune" the DB, which just seems ridiculous (it should have this built in). Our monitordb.db is 9GB, but we are only monitoring 4 filers.

Luke

hawenxxxx · 2013-01-22

Hi

Same problem here. DFM 4.0.2 on a 6 CPU/12GB RAM machine, monitoring 8 filers. DFM is pretty much unusable.

Did you guys find some kind of solution? We have tried purging our DB with no difference in performance (it took about two hours for a 7GB DB, including purge/reload, if anyone wants to know). According to NetApp support, our dfmserver process is maxing out memory-wise. I think there is a 2GB per-process cap, as DFM 4.0.2 is a 32-bit application.

/Hans

adaikkap · 2013-03-01

Guys, try out the new DFM purge utility and also upgrade to 5.0.2P1.

DFM Purge Tool: How to Video

Regards

adai

USMANBUTT · 2013-03-31

Hi Adai,

Does the P1 fix the slowness issue? I literally have to wait for 2-3 minutes while editing datasets.

Regards,

Usman

adaikkap · 2013-03-31

Hi Usman,

The slowness is not due to any bug, but it could be related to several configuration issues. What version of OCUM/DFM are you currently running?

Is your server a VMware VM? If so, what is the memory and CPU configuration? Are they reserved or allocated? How many datasets are there currently? How many controllers are being monitored? Is Performance Advisor also enabled on the same server?

The answers to many of these questions affect what you describe. In general, 5.0.2P1 is a very stable release with no known memory, performance or functional issues.

Regards

adai

USMANBUTT · 2013-03-31

Hi Adai,

Thanks for getting back to me. I have been working on this for a while and even opened a case a couple of times, but nothing has really helped. I just installed the 5.0.2P1 patch, which ran well for about 5 minutes before slowing down again. I constantly see at least one core of the server pegged, with the dbsrv10.exe process using about 25% CPU. It seems to me the delay is being caused by the Sybase database. I also ran the DFM purge utility before installing the 5.0.2P1 patch.

What version of OCUM/DFM are you running currently?

5.0.2P1 on Windows 2008 R2 SP1 x64

Is your server a VMware VM?

Yes

If so, what is the memory and CPU configuration?

4 vCPU, 6GB RAM (I played around with different configs to test, but none helped)

Are they reserved or allocated?

Yup, I have 5GHz reserved for CPU and 6GB for RAM

How many datasets are there currently?

113

How many controllers are being monitored?

8

Is Performance Advisor also enabled on the same server?

It was initially. I disabled it in hopes of improving performance, but that didn't help.

Regards,

Usman

USMANBUTT · 2013-04-01

Adai,

I have also noticed that the slowness happens at night, when our backup jobs are running. During the day it's usually pretty fast. Any ideas?

Regards,

Usman

USMANBUTT · 2013-04-02

Guys,

Can anyone provide any insight into this issue? Are there any changes that can be made to speed things up?

Regards,

Usman

adaikkap · 2013-04-02

Hi Usman,

Which version did you upgrade from? How long ago did you upgrade?

After an upgrade to 5.0.x, we do the following:

1. Purge all data protection jobs older than 90 days. This runs every day at midnight, so soon after the upgrade, because of the number of jobs to be purged, you may encounter slowness for a week or so.

2. Prune all perf data files of stale instances. This starts every Sunday at midnight and runs once a week. Since you have perf data, which will definitely have stale entries from mark-deleted objects, this will also consume resources on your DFM server.

3. By default, events older than 180 days are pruned every day at midnight.

Since all three of these run at midnight, I think they are why you see the slowness. But items 1 and 2 should stabilize within a week or two, and you should then see marked improvements in your DFM server.
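A minimal sketch of why the first nightly purge after an upgrade is the expensive one: an age-based cutoff applied to a long backlog deletes many records at once, while later runs only remove what aged out since the previous night. The 90/180-day windows are the defaults quoted above; the record list and `split_by_retention` are hypothetical illustrations, not DFM's schema or API.

```python
from datetime import datetime, timedelta

# Default retention windows as quoted in this thread (assumed, not read from DFM).
JOB_RETENTION = timedelta(days=90)     # data protection job history
EVENT_RETENTION = timedelta(days=180)  # events


def split_by_retention(timestamps, retention, now):
    """Split timestamps into (kept, purged): anything older than now - retention is purged."""
    cutoff = now - retention
    kept = [t for t in timestamps if t >= cutoff]
    purged = [t for t in timestamps if t < cutoff]
    return kept, purged


if __name__ == "__main__":
    now = datetime(2013, 4, 2)
    # Hypothetical backlog: one job record per day for the last 200 days.
    history = [now - timedelta(days=d) for d in range(200)]

    kept, purged = split_by_retention(history, JOB_RETENTION, now)
    # The first run after the upgrade has the whole backlog to delete...
    print(f"first run: keep {len(kept)}, purge {len(purged)}")

    # ...whereas the next night's run only sees records that aged out since then.
    tomorrow = now + timedelta(days=1)
    _, purged_next = split_by_retention(kept, JOB_RETENTION, tomorrow)
    print(f"next run: purge {len(purged_next)}")
```

With one record per day, the first run purges 109 records and the next run purges 1; real job volumes are much higher, so the gap is correspondingly larger.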

I would also recommend upgrading your RAM to 16GB or so, since you are running 100+ datasets.

Regards

adai

jerome_barrelet · 2013-04-03

Hello guys,

We have a huge DFM environment with the same problems as you. The only way we found to speed things up was to split the DFM servers. But now two of them, DFM for NAS and DFM for SAN, have become slow again, and we found that it is really related to the number of datasets with a protection policy. We also tried purging the DB with the embedded tool, but saw no improvement.

We tried increasing CPU and RAM, but nothing helped. We are thinking of putting these VMs on SSD drives; has anybody tried this?

Now we have this configuration:

DFM for NAS: OnCommand Core 5.1, Linux VM, 6 vCPU, 12GB RAM, database on RDM FC, 218 datasets, 14 filers

DFM for SAP: OnCommand Core 5.1, Linux VM, 6 vCPU, 12GB RAM, database on RDM FC, 220 datasets, 23 filers

DFM for SAN: OnCommand Core 5.1, Linux VM, 6 vCPU, 12GB RAM, database on RDM FC, 222 datasets, 12 filers

DFM for Performance Advisor: OnCommand Core 5.1, Linux VM, 6 vCPU, 12GB RAM, database on RDM FC, 22 filers monitored

Regards

Jerome

USMANBUTT · 2013-04-03

That's funny, Jerome. I was thinking of putting the DFM server on SSD as well, but don't have any available right now.

Jerome, are you seeing the dbsrv10.exe process constantly using around 25% CPU when your environment slows down? I had to split up the DFM servers myself because of this slowness, and turned off Performance Advisor as well, but nothing has helped so far.

jerome_barrelet · 2013-04-03

Hello,

When it's slow, we see 400% to 500% CPU, which means the DB is using 4 to 5 vCPUs (out of 6 vCPUs in total).

  PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
27182 root  16  0 6664m 4.7g  15m S 370.8 40.0 65899:10 dbsrv11
27222 root  15  0  245m  26m  10m S   0.3  0.2 36:37.45 dfmeventd
27226 root  15  0  159m 6608 5848 S   0.3  0.1 21:19.48 dfmwatchdog
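Linux `top` and Windows Task Manager report process CPU differently, which is why Jerome sees 370% while Usman sees 25%. A small sketch, assuming `top` is in its default (Irix) mode, where 100% equals one fully busy CPU, and that Task Manager shows a share of all CPUs combined; the helper names are hypothetical.

```python
# Column order of top's default process list, as in the output above.
TOP_COLUMNS = ["PID", "USER", "PR", "NI", "VIRT", "RES", "SHR", "S",
               "%CPU", "%MEM", "TIME+", "COMMAND"]


def parse_top_line(line):
    """Split one process row from top into a dict keyed by column name."""
    # maxsplit keeps a COMMAND containing spaces in a single field.
    fields = line.split(None, len(TOP_COLUMNS) - 1)
    return dict(zip(TOP_COLUMNS, fields))


def busy_cpus_from_top(cpu_percent):
    """Irix-mode top counts 100% per fully busy CPU, so 370.8% is ~3.7 CPUs."""
    return cpu_percent / 100.0


def busy_cpus_from_task_manager(cpu_percent, n_vcpus):
    """Task Manager shows a share of all CPUs combined, so scale by the CPU count."""
    return cpu_percent / 100.0 * n_vcpus


if __name__ == "__main__":
    row = parse_top_line(
        "27182 root 16 0 6664m 4.7g 15m S 370.8 40.0 65899:10 dbsrv11")
    print(row["COMMAND"], busy_cpus_from_top(float(row["%CPU"])))
    # Usman's case: about 25% for dbsrv10.exe on a 4 vCPU Windows VM.
    print(busy_cpus_from_task_manager(25, 4))
```

Both readings point to the same kind of observation: 25% of 4 vCPUs is exactly one core fully pegged.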

jerome

jerome_barrelet · 2013-04-03

We also found that a lot of things are not purged from the DB, and I think that's hurting performance. Look at the event counts; they're horrific.

Example from our DFM for NAS server (command: dfm diag):

DP Job Information
Job State                     Count
Jobs Running                  44
Jobs Completed Total          124949
Jobs Aborted Total            0
Jobs Aborting Total           0
Jobs Completed Today          6184
Jobs Aborted Today            0
Jobs Aborting Today           0

Dataset Protection Status
Protection State              Count
Protected                     216
Unprotected                   2

Event Counts
Table                         Count
Events                        15694712
Current Events                15641683
Abnormal Events               12152877

Event Type Counts
Event Type                    Count
volume.growthrate             11211599
sm.update                     3177912
data-protection-job.status    892396
snap-status                   29322
df.avail                      29253
volume-clone.discovered       27466
df.kbytes                     25057
df.snapshot.kbytes            24847
df.inodes                     24845
volume.first-snap             24843

Version 5.1 (5.1)
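A quick way to see where the event table's bulk comes from is to rank the event types by their share of the total. This sketch only uses the figures from the `dfm diag` output above; it does not query DFM, and `top_event_shares` is a hypothetical helper.

```python
# Figures copied from the `dfm diag` output above.
EVENTS_TOTAL = 15694712
EVENT_TYPE_COUNTS = {
    "volume.growthrate": 11211599,
    "sm.update": 3177912,
    "data-protection-job.status": 892396,
    "snap-status": 29322,
    "df.avail": 29253,
    "volume-clone.discovered": 27466,
    "df.kbytes": 25057,
    "df.snapshot.kbytes": 24847,
    "df.inodes": 24845,
    "volume.first-snap": 24843,
}


def top_event_shares(counts, total, n=3):
    """Return the n largest event types with their fraction of all events."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, count / total) for name, count in ranked[:n]]


if __name__ == "__main__":
    shares = top_event_shares(EVENT_TYPE_COUNTS, EVENTS_TOTAL)
    for name, share in shares:
        print(f"{name:28s} {share:6.1%}")
    print(f"{'top 3 combined':28s} {sum(s for _, s in shares):6.1%}")
```

By these numbers, volume.growthrate alone is about 71% of all events, and the top three types together are about 97%, so any pruning of those types would account for nearly all of the table.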

adaikkap · 2013-04-04

Hi Jerome,

In your case, the problem seems to be mainly due to the number of DP jobs per day and their history. It looks like your jobs purge is set to 21 days; I would recommend bringing this down to 1 week.

With respect to the events purge, 5.2 will address this. I suggest lowering your events purge to 1 month or so.
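The effect of shortening the job purge window can be estimated from Jerome's own `dfm diag` numbers. A rough sketch, assuming the daily job rate ("Jobs Completed Today") stays constant; the helper names are hypothetical.

```python
# Figures from the `dfm diag` output earlier in the thread.
JOBS_COMPLETED_TODAY = 6184
JOBS_COMPLETED_TOTAL = 124949


def implied_retention_days(total_jobs, jobs_per_day):
    """Days of history the job table holds if the daily job rate were constant."""
    return total_jobs / jobs_per_day


def retained_jobs(jobs_per_day, retention_days):
    """Steady-state size of the job history for a given purge window."""
    return jobs_per_day * retention_days


if __name__ == "__main__":
    days_held = implied_retention_days(JOBS_COMPLETED_TOTAL, JOBS_COMPLETED_TODAY)
    print(f"history currently spans about {days_held:.1f} days")
    for days in (21, 7):
        print(days, "days ->", retained_jobs(JOBS_COMPLETED_TODAY, days), "job records")
```

The current total works out to roughly 20 days of jobs, which is consistent with a 21-day purge setting, and a 7-day window would keep about a third as many job records (43288 instead of 129864).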

Putting the database on SSD will also definitely help.

Regards

adai

jerome_barrelet · 2013-04-05

Hello Adai,

What do you mean by "lower your events purge to 1 month or so"?

Is there a parameter to configure the event purge?

Jerome

USMANBUTT · 2013-04-03

Hi Adai,

It's been a while since we upgraded to 5.0.1. A few weeks back I upgraded to 5.0.2, and just a couple of days ago I upgraded to 5.0.2P1. It's still very slow, especially when backup jobs are running.

I'll try increasing RAM to 16GB, but I doubt it will make a difference, as RAM utilization is barely 3GB right now.

USMANBUTT · 2013-04-03

Set RAM to 16GB, still seeing same issues.

It seems the Sybase database is the bottleneck. I tried raising the dbcachesize parameter before, which didn't help much. All other functions appear to be fine, but everything slows down as soon as any datasets come into play.

Any ideas gents?

USMANBUTT · 2013-04-03

I have upgraded to 5.1 in the hope that Sybase 11 would improve performance, but it seems to be the same. As I mentioned, I believe the bottleneck is the database itself. Based on the following article, my guess is that a single vCPU gets pegged waiting for a SQL query to finish, and as a result every other function takes a long time to execute (when that vCPU is pegged, Task Manager shows that the other processes are not using much CPU).

http://www.symantec.com/business/support/index?page=content&id=TECH190207

Does anyone have experience using dbisqlc to monitor what's going on in the Sybase database? I'm not sure what credentials to use or what connection parameters to set to access the database. Any help would be greatly appreciated.

Regards,

Usman

adaikkap · 2013-04-04

Hi Usman,

The Symantec link you point to doesn't have much relevance here. But can you go through this VMware KB to check that you are not affected by it?

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005362

Also, the time it takes to edit a dataset is not necessarily due to the server alone; a few other things contribute. How many relationships per dataset do you have?

Regards

adai

USMANBUTT · 2013-04-04

Hi Adai,

I was simply pointing out that we may be able to see which query is taking so long to execute by following the procedure in the Symantec link, using the dbisqlc utility. Can you provide some guidance on how to go about it? What credentials should I use? I tried the default sa with no password, which didn't work.

I have verified that the CPU contention is not caused by high CPU ready time (%RDY) or high co-stop (%CSTP).
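esxtop reports %RDY directly, but if you check CPU ready in the vCenter performance charts instead, the counter is a summation in milliseconds per sample. This sketch applies the commonly cited VMware conversion, percent = ready_ms / (interval_s × 1000) × 100, assuming a 20-second real-time chart interval and dividing by the vCPU count to get a per-vCPU average; the function name and sample values are hypothetical.

```python
def cpu_ready_percent(ready_ms, interval_s=20, n_vcpus=1):
    """Convert a vCenter CPU ready summation value (ms) to percent ready per vCPU.

    interval_s is the chart's sample interval (assumed 20 s for real-time charts).
    The VM-level summation is totalled over all vCPUs, so dividing by n_vcpus
    gives an average per vCPU.
    """
    return ready_ms / (interval_s * 1000.0) * 100.0 / n_vcpus


if __name__ == "__main__":
    # Hypothetical sample: 4,000 ms of ready time in one 20 s sample on a 4 vCPU VM.
    print(cpu_ready_percent(4000, interval_s=20, n_vcpus=4))
```

Historical chart rollups use longer intervals, so pass the matching `interval_s` when reading those.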

The slowness is not limited to editing existing datasets; it also occurs when I add new datasets and try to add datastores from the data tab. Each dataset has 2-5 datastores at most.