2011-10-07 10:58 AM
I have a customer who has been experiencing a lot of reporting performance problems with his Operations Manager server (4.0.2). After some investigation, I came across this in TR-3440, OnCommand 5.0 Sizing Guide:
<<When the difference between the following commands is more than 2x, contact NGS to prune your DB to improve response times.
<<dfm volume list -a and the volume list is 3x or more
<<dfm qtree list -a and qtree list is 2x or more
<<dfm lun list -a and lun list is 2x or more
<<dfm host list -a and host list is 2x or more
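The check quoted above can be sketched in a few lines of Python. This is only an illustration: the thresholds are the ones quoted from TR-3440, the function name `needs_pruning` is made up for this example, and the counts passed in at the bottom are hypothetical. The counts would have to be gathered by hand from the output of the plain list command and the same command with -a.

```python
# Ratio thresholds as quoted from TR-3440: count from "list -a" divided by
# count from the plain "list" command, per object type.
PRUNE_THRESHOLDS = {"volume": 3.0, "qtree": 2.0, "lun": 2.0, "host": 2.0}


def needs_pruning(object_type, count_all, count_active):
    """Return (ratio, flag) for one object type.

    count_all    -- number of objects reported with -a (includes deleted ones)
    count_active -- number of objects reported without -a
    """
    if count_active <= 0:
        raise ValueError("active count must be positive")
    ratio = count_all / count_active
    return ratio, ratio >= PRUNE_THRESHOLDS[object_type]


# Hypothetical counts, for illustration only.
ratio, flag = needs_pruning("volume", 85_000, 9_000)
print(f"volume: {ratio:.1f}x, pruning recommended: {flag}")
```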
Going by the 4.0 Sizing Guide (the older TR-3440), the maximum for a setup with DFM, ProtMgr, and ProvMgr is 1650 volumes and 5800 qtrees. This customer of mine also has Performance Advisor running, and the output of the above commands is 85k+ for volumes (51x over the limit) and 48k+ for qtrees (8x over the limit)! So it appears that the performance problems are, at least to some extent, due to the massive amounts of historical data lingering around, which should therefore be purged.
Now, I know of all the warnings about purging a filer object: doing so also purges every record that it owns, so it could take a long time and you would be better off just reinstalling the database. However, since I'm not intending to purge the filer object but just volumes and qtrees, what would the impact be? Will it take a long time and/or will it impact performance?
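As a quick check of the "over limit" factors, here is the arithmetic as a small Python sketch. The counts are the rounded 85k and 48k figures from this post (the real counts are somewhat higher), and `over_limit_factor` is just a helper name chosen for this example.

```python
def over_limit_factor(observed, documented_max):
    """How many times the observed object count exceeds the documented maximum."""
    return observed / documented_max


# Rounded counts from the post versus the 4.0 Sizing Guide limits.
print(f"volumes: {over_limit_factor(85_000, 1650):.1f}x over the limit")
print(f"qtrees:  {over_limit_factor(48_000, 5800):.1f}x over the limit")
```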
Also, if reinstalling the database would be the fastest and best way, is there a way to export the ProtMgr configs and reimport them back into the newly installed database? Otherwise we would have to document all the datasets and manually reconfigure them (yuck!).
Thanks for any input!
2011-10-08 12:37 AM
The numbers you are talking about, 1650 volumes and 5800 qtrees, are for a DFM server with all licenses enabled (i.e. Ops-Mgr, Perf Advisor, Prot Mgr and Prov Mgr), which covers just 40 nodes.
For a DFM server doing only OM + PA with 250 nodes, the numbers are 10K volumes and 50K qtrees.
If the difference between dfm volume list and dfm volume list -a is more than 3x, and similarly for the others, then it's time to purge the data for those deleted instances from the DB.
You won't be able to access them anyway unless you use the -a option in the CLI, so it makes perfect sense to run the DB pruning.
You should also think about moving to OnCommand 5.0, which is a 64-bit architecture and takes advantage of the available compute and memory.
Please do open an NGS case and get your DB and perf data cleaned up; after that you will definitely feel the difference. BTW, do you have a lot of SnapManagers in your environment?
If that's the case, the FlexClone volumes/LUNs are the cause of a lot of these deleted instances.
2011-11-21 07:52 AM
I have the same problem with OC 5.0 (upgraded over time from several previous DFM versions).
My deleted object count is also very high. Yes, it's the SnapManagers and their temporary vol clones / LUN clones that fill up the database.
Is there an official way to prune deleted objects from the DB?
Something like "DELETE FROM volumes WHERE Deleted = Yes" ?
I'd like to prune the DB of our own lab DFM - not a production customer environment...
2011-11-29 04:49 PM
The official way to prune the DB from deleted objects is to open a support case with NetApp Global Support.
I've been researching it for days now to see if there is any way to avoid opening the case, but there isn't =).
Hope this helps,
2012-01-11 11:05 AM
This whole issue of having all these artifact objects is due to SnapManager. I now have almost 100k volume objects in total, both managed and "deleted," plus 53k qtrees and about 6000 LUNs. We migrated off SMO, but we still have SMVI. That has slowed down the growth, but I still need to migrate to a new DFM instance (hence my other posts that you have been responding to).
Anyway, I would strongly urge the OnCommand development team to integrate some intelligence into DFM to ignore the clones that are created in the SnapManager backup process, so that large environments like mine don't run into this again.
I ran a test in our lab, purging 600 volume objects and reloading the database. It took 50 minutes to do this. With as many objects as I would have to purge in production, this would take forever.
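To put a rough number on "forever", the lab result can be extrapolated. The sketch below assumes purge time scales linearly with object count, which the lab test does not actually establish, and it uses 100,000 objects purely as an illustrative upper bound (not all of the roughly 100k volume objects are deleted ones). The function name `estimate_purge_minutes` is made up for this example.

```python
def estimate_purge_minutes(objects_to_purge, sample_objects=600, sample_minutes=50.0):
    """Linear extrapolation from the lab test (600 objects purged in 50 minutes).

    Assumes the per-object cost stays constant, which may not hold in practice.
    """
    return objects_to_purge * sample_minutes / sample_objects


# Illustrative upper bound only, not a measured production figure.
minutes = estimate_purge_minutes(100_000)
print(f"{minutes:.0f} minutes, about {minutes / 60 / 24:.1f} days")
```

Under those assumptions the purge would run for several days, which is consistent with choosing a migration instead.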
I opened a case with NGS and after looking at my situation, they agreed that the best course was to just migrate to a new DFM instance.
2013-03-05 08:46 PM
We now have a purge tool that takes care of cleaning all this up: dfmpurge. It removes all these stale instances but requires downtime. The utility has 2 modes and also gives an estimate of the downtime required. In most cases the cleanup shouldn't take more than 30 minutes.
Please take a look at this video (3.43 mins) and read the KB on how this tool works.
Video Link: DFM Purge Tool: How to Video
Link to tool chest: http://support.netapp.com/NOW/download/tools/dfmpu
2013-03-06 07:02 AM
The tool looks for these kinds of objects:
Qtree 13658 ( 99%) 13573 ( 99%)
Volume 9018 ( 99%) 8966 ( 99%)
Interface 32 (100%) 32 (100%)
FCP Target 17 (100%) 17 (100%)
Aggregate 4 (100%) 4 (100%)
What about the events table? In our case it is pretty huge:
WHERE eventDeleted IS NULL
WHERE eventDeleted IS NOT NULL