
Active IQ Unified Manager Discussions

Poor performance - DFM database server

ITSSANSTORAGE

Hi guys, can anyone shed some light on DFM database performance? We have 14 filers being monitored via DFM / OnCommand 5.0.0 running on a VM (MS Windows 2008 R2 SP1, 64-bit, 4 vCPUs and 10GB RAM).

Whenever I delete events from the DFM database, the dbsrv10.exe process ramps right up, and more often than not an alert is generated in DFM saying that the "Management station load is too high". DFM is then pretty much unresponsive for up to 10 minutes.

Has anyone else had poor performance issues with DFM or the server it runs on? Does the DFM database require some kind of cleanup or tuning? The current size of the monitordb.db file is 13.2GB. Any advice much appreciated. Cheers, Ian

25 REPLIES

markaroach

I'm having the same issue.  It looks to me like a disk queue issue.  The disk queue length for the disk with the databases frequently goes to 250.  Way, Way, Way too high when it should be 1 or less.
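For reference, one way to watch that counter over time from the command line on the DFM host is Windows' built-in typeperf. This is just a sketch, assuming the databases sit on the default (_Total) physical disk instance and that diskqueue.csv is an acceptable output path; substitute the specific disk instance and path for your setup:

typeperf "\PhysicalDisk(_Total)\Current Disk Queue Length" -si 5 -sc 120 -o diskqueue.csv

That samples the queue length every 5 seconds, 120 times, and writes the results to a CSV you can graph against the times when dbsrv10.exe spikes.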

kaikapola

I've heard rumors that the number of running DP jobs may be related to this issue. But how can I collect the number of running DP jobs automatically?

//KK

adaikkap

Hi KK,

Run this CLI and count the lines of its output: dfpm job list -v jobs-running | wc -l
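If you want to collect it automatically, a minimal sketch is to wrap that command in a small script and run it from cron. The script name, the log path, and the assumption that dfpm is on the PATH (and that you may want to subtract a header line from the count) are all just illustrative:

#!/bin/sh
# count_dp_jobs.sh - append a timestamped count of running DP jobs to a log
COUNT=`dfpm job list -v jobs-running | wc -l`
echo "`date '+%Y-%m-%d %H:%M:%S'` running_dp_jobs=$COUNT" >> /var/log/dfm-running-jobs.log

A cron entry such as */10 * * * * /usr/local/bin/count_dp_jobs.sh would then record the count every 10 minutes, so you can line it up against the periods of slowness.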

Regards

adai

USMANBUTT

Guys,

Can anyone provide any insight into this issue? Are there any changes that can be made to speed things up?

Regards,

Usman

adaikkap

Hi Usman,

Which version did you upgrade from? How long has it been since you upgraded?

After an upgrade to 5.0.x, the following happens:

1. All data protection jobs older than 90 days are purged; this runs every day at midnight. So, soon after the upgrade, you may encounter slowness for a week or so due to the volume of jobs that have to be purged.

2. All perf data files for stale instances are pruned; this starts every Sunday at midnight and runs once a week. Since your perf data will definitely have stale entries due to mark-deleted objects, this will also consume resources on your DFM server.

3. By default, events older than 180 days are pruned every day at midnight.

Since all three of these run at midnight, I think they are the reason you are seeing slowness. But 1 and 2 should stabilize in a week or two, and you should then see a marked improvement on your DFM server.

I would also recommend upgrading your RAM to 16GB or so, since you are running 100+ datasets.

Regards

adai

USMANBUTT

Also, I am seeing in dfmserver.log that DFM is still trying to connect to vFilers that were deleted a while ago. How would I go about stopping that?

Regards,

Usman

USMANBUTT

I have upgraded to 5.1 in the hope that Sybase 11 would improve performance, but it seems to still be the same. As I mentioned, I believe the bottleneck is the database itself. Based on the following article, my guess is that only one vCPU gets pegged waiting for a SQL query to finish, and as a result any other function takes a long time to execute (when a vCPU gets pegged, I can tell from Task Manager that the other processes are not using much CPU).

http://www.symantec.com/business/support/index?page=content&id=TECH190207

Does anyone have experience with using dbisqlc to monitor what's going on with the Sybase database? I'm not really sure what credentials to use and what connection parameters should be set to access the database. Any help would be greatly appreciated.

Regards,

Usman

adaikkap

Hi Usman,

The Symantec link that you point to doesn't have much relevance here. But can you run through this VMware KB to check whether you are affected by this?

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005362

Also, the time it takes to edit a dataset is not necessarily down to the server alone; a few other things matter as well. How many relationships per dataset do you have?

Regards

adai

USMANBUTT

Hi Adai,

I was simply trying to point out that we may be able to see which query is taking so long to execute by following the procedure in the Symantec link, which uses the dbisqlc utility. Can you provide some guidance on how I could go about it? What credentials should I use? I tried the default sa with no password, which didn't work.

I have verified that CPU contention is not being caused by high CPU ready time (%RDY) or high co-stop (%CSTP).

Slowness is not necessarily only when editing existing datasets. It even occurs when I am adding new datasets and trying to add datastores from the data tab. Each dataset has 2-5 datastores max.

USMANBUTT

In Task Manager, when no vCPU is at 100%, I experience normal performance on the server. Whenever any vCPU is at 100%, it's associated with the dbsrv11.exe process, which is what makes me believe that the database queries are causing the slowness.

USMANBUTT

Set RAM to 16GB, still seeing same issues.

It seems the Sybase database is the bottleneck. I've tried setting the dbcachesize parameter higher before, which didn't help much. All other functions appear to be fine, but everything slows down as soon as any datasets come into play.

Any ideas gents?

USMANBUTT

Hi Adai,

It's been a while since we upgraded to 5.0.1. A few weeks back I upgraded to 5.0.2, and just a couple of days ago I upgraded to 5.0.2P1. It is still very slow, especially when backup jobs are running.

I'll try increasing the RAM to 16GB, but I doubt that will make a difference as RAM utilization is barely 3GB right now.

jerome_barrelet

Hello guys,

We have a huge DFM environment with the same problems as you. The only way we found to speed things up was to split the DFM servers. But now two of them, DFM for NAS and DFM for SAN, have become slow again; we found that it is really related to the number of datasets with a protection policy. We also tried to purge the DB with the embedded tool, but we have seen no improvement.

We tried to increase CPU, RAM... but nothing helped. We are thinking of putting these VMs on SSD drives; has anybody tried this?

Now we have this configuration:

DFM for NAS: OnCommand Core 5.1, VM on Linux, 6 vCPU, 12GB RAM, database on RDM FC, 218 datasets, 14 filers

DFM for SAP: OnCommand Core 5.1, VM on Linux, 6 vCPU, 12GB RAM, database on RDM FC, 220 datasets, 23 filers

DFM for SAN: OnCommand Core 5.1, VM on Linux, 6 vCPU, 12GB RAM, database on RDM FC, 222 datasets, 12 filers

DFM for Performance Advisor: OnCommand Core 5.1, VM on Linux, 6 vCPU, 12GB RAM, database on RDM FC, 22 filers monitored

Regards

Jerome

USMANBUTT

That's funny Jerome, I was thinking of putting the DFM server on SSD as well but don't have any available right now.

Jerome, are you noticing the dbsrv10.exe process constantly using around 25% CPU when your environment slows down? I had to split up the DFM servers myself as a result of this slowness, and turned off Performance Advisor as well, but nothing has helped so far.

jerome_barrelet

We also found that a lot of things are not purged from the DB, and I think that's not helping performance. Look at the event count numbers; they are horrific.

Example from our DFM for NAS (command: dfm diag):

DP Job Information
Job State                        Count
Jobs Running                     44
Jobs Completed Total             124949
Jobs Aborted Total               0
Jobs Aborting Total              0
Jobs Completed Today             6184
Jobs Aborted Today               0
Jobs Aborting Today              0

Dataset Protection Status
Protection State                 Count
Protected                        216
Unprotected                      2

Event Counts
Table                            Count
Events                           15694712
Current Events                   15641683
Abnormal Events                  12152877

Event Type Counts
Event Type                       Count
volume.growthrate                11211599
sm.update                        3177912
data-protection-job.status       892396
snap-status                      29322
df.avail                         29253
volume-clone.discovered          27466
df.kbytes                        25057
df.snapshot.kbytes               24847
df.inodes                        24845
volume.first-snap                24843

Version                          5.1 (5.1)

adaikkap

Hi Jerome,

In your case, the problem seems to be mainly due to the number of DP jobs per day and their history. It looks like your jobs purge is set to 21 days; I would recommend bringing this down to 1 week.

With respect to the events purge, 5.2 will address this. I suggest you lower your events purge interval to 1 month or so.
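One way to check what the purge-related settings currently are on your server is to list the DFM options and filter for "purge". The exact option names differ between releases, so the names in this sketch (eventsPurgeInterval in particular) are only illustrative; take the real name and value format from the output of the list command, not from here:

# See which purge-related options exist and their current values (Linux; use findstr on Windows)
dfm option list | grep -i purge

# Then lower the relevant interval, using whatever option name and value format the list shows, e.g.:
dfm option set eventsPurgeInterval=4weeks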

Putting the database on SSD will also definitely help.

Regards

adai

jerome_barrelet

Hello Adai,

What do you mean by "I suggest you lower your events purge to 1 month or so"?

Is there a parameter to configure the purging of events?

Jerome

jerome_barrelet

Hello,

When it's slow we use 400% to 500% of CPU, meaning the DB uses 4 to 5 vCPUs (out of 6 vCPUs in total):

PID   USER  PR  NI  VIRT   RES   SHR   S  %CPU   %MEM  TIME+     COMMAND
27182 root  16  0   6664m  4.7g  15m   S  370.8  40.0  65899:10  dbsrv11
27222 root  15  0   245m   26m   10m   S  0.3    0.2   36:37.45  dfmeventd
27226 root  15  0   159m   6608  5848  S  0.3    0.1   21:19.48  dfmwatchdog

jerome

lmunro_hug

Hi,

We are running DFM, but an older version (4.0.2), and notice the same thing: performance is very bad when listing/deleting/selecting alerts, and when this is done the DFM host CPU goes to 100%. I was hoping our performance would improve when we upgrade to OnCommand 5, which I thought changed to a 64-bit architecture, but from your experience it still looks bad.

The DFM interface is very slow and inefficient to use like this. I believe you have to log a support call to "prune" the DB, which just seems ridiculous (it should have this built in). Our monitordb.db is 9GB, but we are only monitoring 4 filers.

Luke

hawenxxxx

Hi

Same problem here. DFM 4.0.2 on a 6 CPU / 12GB RAM machine, monitoring 8 filers. DFM is pretty much unusable.

Did you guys find some kind of solution? We have tried purging our DB with no difference in performance (it took about two hours for a 7GB DB, including purge/reload, if someone wants to know). According to NetApp support, our dfmserver process is maxing out memory-wise. I think there is a 2GB cap per process, as DFM 4.0.2 is a 32-bit application.

/Hans
