ADVICE - splitting a large DFM install base from one server to two new servers

emanuel

I submitted a post about the preferred version of DFM to upgrade to ... this one is more about what we intend to do after the primary upgrade is complete, and some questions that have come up.

SITUATION:

We are monitoring (with DFM 4.0D12) 100-plus controllers worldwide, mainly 7.3.2P4 with some 8.0.1P2. The primary functions are monitoring, alerting, and Performance Advisor. They have around 2200 total volumes and 14400 qtrees created, expected to grow steadily (in 12 months qtrees could reach 30000).

PROBLEMS:

Numerous cases where the monitor.db service has stopped, and backups "kill" Performance Advisor while the DFM database is backing up.

DESIRED SOLUTION:

Upgrade their current DFM server to the latest version of DFM (leaning towards 4.0.1D5 or later). Then migrate from the current server to two new servers. One server will monitor, alert, and report on all filers under monitoring. The other server will run only Performance Advisor; backup will be disabled to prevent loss of monitoring. We would take a backup copy of the current (upgraded) DFM server to populate the new servers; both will start with exact copies of the original database, which will then be trimmed down to perform their specific functions.

QUESTIONS:

1.     Is it possible to configure DFM to be a Performance Advisor-only system (turn off all unneeded monitoring or functions) while still allowing PA to gather performance stats?  A bare-bones PA system.

2.     On the system performing monitoring and alerting; can we blow away the performance advisor database?

3.     They are going to use two modern servers - 24 cores, 24 GB RAM; the question they are asking is whether DFM will take advantage of the cores on the system. dfm about detects them, but does it actually use "more than one core"?

Thanks for your time, Emanuel

adaikkap
QUESTIONS:

1.     Is it possible to configure DFM to be a Performance Advisor-only system (turn off all unneeded monitoring or functions) while still allowing PA to gather performance stats?  A bare-bones PA system.

I can think of switching off the following monitoring: dfmon, ccmon, hostRbacmon, and userquota mon, which are not required by PA.

2.     On the system performing monitoring and alerting; can we blow away the performance advisor database?

PA data is not stored in the database; it is stored as flat files in the perfdir, the location of which can be found in the output of the dfm about CLI command.

3.     They are going to use two modern servers - 24 cores, 24 GB RAM; the question they are asking is whether DFM will take advantage of the cores on the system. dfm about detects them, but does it actually use "more than one core"?

We use all available cores.

Regards

adai

hadrian

How exactly do we do this?

"I can think of switching off the following monitoring: dfmon, ccmon, hostRbacmon, and userquota mon, which are not required by PA."

adaikkap

Run dfm options list | grep -i moninterval to see the monitoring intervals, then set the ones you do not need to off. For example:

[root@lnx~]# dfm options set hostRBACMonInterval=off
Changed host RBAC monitoring interval to Off.
[root@lnx ~]#

In fact, you can turn off all monitoring other than the discovery ones, provided you are going to dedicate this server only to Performance Advisor.

regards

adai
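
A minimal sketch of how the advice above could be scripted as a dry run. It only prints the dfm options set commands and does not run them. It assumes (this is not confirmed in the thread) that each line of dfm options list starts with the option name as its first field, and that discovery-related options can be recognized by "discover" in their name. gen_off_cmds and someDiscoveryMonInterval are hypothetical names used for illustration; only hostRBACMonInterval comes from the post above.

```shell
# Dry run: read option lines on stdin and print (not execute) one
# "dfm options set <name>=off" command per monitoring-interval option.
# Assumption: the option name is the first whitespace-separated field.
# Assumption: discovery options contain "discover" in their name and are kept on.
gen_off_cmds() {
  awk 'tolower($1) ~ /moninterval/ && tolower($1) !~ /discover/ {
         print "dfm options set " $1 "=off"
       }'
}

# Example with sample input; someDiscoveryMonInterval is a made-up name.
printf '%s\n' \
  'hostRBACMonInterval 1 day' \
  'someDiscoveryMonInterval 15 minutes' | gen_off_cmds
```

On a real server the input would come from dfm options list, and the printed commands should be reviewed by hand before any of them is run.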

emanuel

It would be interesting if DFM had specific modes selectable from the options menu ... so that it turns all the relevant options on and off.

-- Performance Monitoring Only

-- Alerting Only

-- Reporting Only

-- etc

I know it may not all be possible, since DFM needs to poll data to use it for other purposes.

shaunjurr

We have a setup with probably half the number of controllers and a lot fewer qtrees, but we see the same behavior during backup.  We basically lose almost all DFM functionality during backups.  Is there any ongoing work to fix this?

We tried using snapshot backup once, but it seemed to balloon the storage requirements immensely.  Our local NetApp partner opened a case, but nothing ever came of it.  In fact, we've almost never gotten any results from DFM support...

(Sorry to piggy-back your issue... but we've thought of splitting up monitoring as well, just haven't gotten that far)

adaikkap

Can you get us the output of the following ?

dfm volume list -a | wc -l and dfm volume list | wc -l

dfm qtree list -a | wc -l and dfm qtree list | wc -l

dfm lun list -a | wc -l and dfm lun list | wc -l

If the two counts differ by more than 2x, that could explain why your snapshot backup is consuming more space.

Open a case with NGS and prune your perf data and the DFM db, which will reduce your storage requirement for snapshot-based backup.

It's the only way. IIRC, only Performance Advisor data is not collected during the entire duration of the backup.

Everything else works.

Regards

adai

shaunjurr

Hi,

Here is the information... after some delay:

[root@dfm ~]# dfm volume list -a | wc -l

17894

[root@dfm ~]# dfm volume list | wc -l

1146

[root@dfm ~]# dfm qtree list -a | wc -l

20926

[root@dfm ~]# dfm qtree list | wc -l

1214

[root@dfm ~]# dfm lun list -a | wc -l

8149

[root@dfm ~]# dfm lun list | wc -l

683

I am not sure, however, how you conclude that this has some effect on the size of a snapshot backup.  If taking a snapshot simply means setting the database in "backup mode" (however this is done with Sybase), quiescing the filesystem, and taking a snapshot, why should the snapshot balloon?

The performance data is on a different NFS mount (I think back in the day we just moved the database back to local storage...).  In any case, the details fail me now and there was never any real solution from NetApp support, so we gave up and used our limited time on things that gave us results. Sybase should really work on an NFS mount (and has worked for us internally in testing), but that doesn't seem to be supported.

Now, there are some KB articles on taking snapshot backups, but this doesn't seem to be included in the standard documentation.  Even if it were, having the database on one of the systems that one is trying to manage can be a complicated (or just terrible) solution.  It seems to me that if I had a database solution that was offline for 30-45 minutes daily because of poor design choices, then I probably couldn't say that I have an enterprise-class solution. Either the method of sampling data or the backup method needs to be changed.
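
For reference, the "more than 2x" check adai describes can be worked out from the counts posted above. This is only a sketch: ratio is a hypothetical helper name, and the wc -l counts may include a few header lines from the dfm output, so the ratios are approximate.

```shell
# Ratio of the "-a" count to the plain count, rounded to one decimal place.
ratio() {
  awk -v all="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", all / cur }'
}

ratio 17894 1146   # volumes: 15.6
ratio 20926 1214   # qtrees:  17.2
ratio 8149 683     # LUNs:    11.9
```

All three ratios are well above the 2x threshold mentioned earlier in the thread.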

adaikkap

Raise a case against Bug 439756 for your database backup problem and large snapshot space.

In order for snapshot-based backup to work, the perfdata, script-plugin, and db directories need to be on either a LUN or local storage, and nothing else.

The documentation for setting up snapshot-based backup:

https://now.netapp.com/NOW/knowledge/docs/DFM_win/rel381/html/software/upgrade/install7.htm

Regards

adai

francoisbnc

And what about db consistency for snapshot-based backup? Is it not necessary to switch into backup mode first?

francois

adaikkap

No, a consistent snapshot is taken by SnapDrive for UNIX or Windows, as the case may be, depending on your DFM server's OS.

Regards

adai

jakub_wartak

Hi,


3.     They are going to use two modern servers - 24 cores, 24 GB RAM; the question they are asking is whether DFM will take advantage of the cores on the system. dfm about detects them, but does it actually use "more than one core"?

We use all available cores.

Are you sure?

I've conducted detailed tests using DFM 3.8.1D14 (on Windows Server Enterprise Edition 2003R2) and it appears that at least the dfmserver part is *NOT* scalable. As that part is responsible for communication with SnapDrive agents, it looks like any bottleneck there causes a lot of 'HTTP Post Errors', which in turn wreak havoc on the whole NetApp stack (SnapManagers, NMC GUIs, etc.). My measurements indicated that, e.g., a non-virtualized server with 4 AMD cores can sometimes be bottlenecked at 7 requests/s (a pretty low value; typical web servers, other than the embedded libzapid built into dfmserver, can handle much more)... what's even more interesting is that one can fully reproduce this in a lab (simulating a DFM server under extreme load). At this point I'm not sure whether it is specific to dfmserver on Windows or an overall application design/implementation/architecture problem. What is interesting is that I couldn't drive dfmserver.exe to report more than 25% total CPU usage on a 4-way machine ...

It looks to me like any more serious deployment of the whole NetApp stack (with SMO/SMSAP/SD with RBAC) is going to hit this problem.

Is there any work happening on allowing DFM to scale up instead of only scaling out?

-Jakub.

emanuel

Hello

Another good question is ... will Protection Manager, version 4 and later on, support large memory on these host systems?  Customers are purchasing 16+ GB memory systems, some are using over 32GB.

adaikkap

The next version of DFM is 64-bit, and it can use as much RAM and CPU as you make available to it.

Regards

adai

wmccormick

According to TR-3440, DFM 3.8 was limited to two (Linux) and four (Windows) cores.  DFM 4.x has removed this restriction and will use an unlimited number of cores.

jakub_wartak

TR-3440 is artificial. Although it covers many monitored NetApp filers, many aggregates, PM jobs and so on, it misses one critical thing: many SnapManagers interacting with PM (dfmserver), and this is where I'm seeing a lot of scalability problems. IMHO dfmserver won't scale up... perhaps Sybase will, but not dfmserver.

-J.

adaikkap

Yes, you are correct, the sizing was not done with many SnapManager datasets.

BTW, how many SnapManagers do you have talking to your PM?

Regards

adai

jakub_wartak

We had something like 200 LPARs with SnapDrive, but only some of those, around 80(?), were configured to use SnapManager + SnapDrive + RBAC... All of them were using a single DFM server plus several NMC GUIs running concurrently. We are 1-2 days past the "scale-out" migration, so we'll see how it behaves.

BTW: are you interested in the artificial benchmark results for dfmserver from our lab? If yes, I can forward them to you.

adaikkap

Yes please.

Regards

adai
