This isn't easy to find in the docs: does anyone know the current maximum number of nodes a single Unified Manager instance is certified to poll? We really want to consolidate our many OCUM instances to make this a bit more unified.
Yep, the hope was that 9.5 would bring some improvements over 7.2 in terms of scaling up. I guess we are stuck for now and will keep scaling horizontally, which is not ideal, of course. Maybe it's time to take "Unified" out of the name. Thanks anyway.
How many nodes are you looking to have? If you wish to improve scalability, perhaps open a case and we can look into an RFE, or your account team may be able to get you a PVR if that is something you'd like to look into.
To answer one of the questions: we have a site with 60 cDOT clusters and a combined node count of a bit over 340. Currently we spread them over 5 Unified Manager instances (all VMs, but they are maxing out their 12 CPU / 40 GB RAM OS resources). Basically, we don't want to spend a ton of money going physical (increased OS specs) if we cannot get the number of instances down substantially, knowing that we will get pushback from support if we see performance problems after going above the recommended ratio of nodes to Unified Manager instances.
TR-4621 (page 8) explains how you should plan for scale. As a rule of thumb, you should not have more than 72 nodes managed by a single OCUM instance. There have been many changes since UM 7.2 that make resource utilization more intuitive; the events help you with reactive measures as well.
This limit is based on multiple factors:
Performance data has been collected by default since 7.2, and this is not configurable. You cannot decouple the performance monitoring engine.
On average, each node has been observed to take up to 5 GB to retain 13 months of performance data.
With the DB size on the higher side, it would be hard to take backups of the data at scale.
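To put the per-node figure in perspective, here is a rough sketch of the arithmetic. It only multiplies the "up to 5 GB per node for 13 months" number quoted above by a node count; the function and constant names are made up for illustration, and real database sizes will depend on the workload.

```python
# Upper-bound figure quoted in this thread: up to 5 GB per node
# for 13 months of retained performance data.
GB_PER_NODE_13_MONTHS = 5


def estimated_perf_db_gb(node_count, gb_per_node=GB_PER_NODE_13_MONTHS):
    """Rough upper estimate of the performance DB size in GB (illustrative only)."""
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    return node_count * gb_per_node


# One instance at the 72-node rule of thumb vs. all ~340 nodes in one instance.
print(estimated_perf_db_gb(72))
print(estimated_perf_db_gb(340))
```

By this estimate, a single instance at the 72-node mark could already carry a few hundred GB of performance data, which is the backup concern raised above.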
Going beyond the 72-node limit is not recommended. Moreover, I do not think all of the 340-odd nodes are deployed at the same site. Having UM monitor more than 72 nodes per datacenter is a rare occurrence. If your node count goes beyond 72, you will need to add further UM instances. The Technical Report covers the considerations and the reasons behind them in detail.
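As a quick sanity check on the numbers in this thread, the minimum instance count under the 72-node rule of thumb can be sketched as below. This is only a lower bound under a simplifying assumption: it treats nodes as freely distributable, whereas in practice whole clusters are assigned to an instance, so the real count may be higher. The names are hypothetical.

```python
import math

# Rule-of-thumb limit quoted in this thread.
MAX_NODES_PER_INSTANCE = 72


def min_instances(total_nodes, max_per_instance=MAX_NODES_PER_INSTANCE):
    """Lower bound on the number of instances needed for total_nodes."""
    if total_nodes < 0 or max_per_instance <= 0:
        raise ValueError("node counts must be non-negative and the limit positive")
    return math.ceil(total_nodes / max_per_instance)


print(min_instances(340))  # the ~340-node site described above
print(min_instances(72))   # exactly at the limit
```

For roughly 340 nodes this gives five instances, which matches the five the original poster is already running.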
Please feel free to reach out to me in case you need further information.
Here is another NetApp customer suffering from OCUM (latest), ever since v7.2 changed it to include the "performance data".
We have a huge NetApp installed base, spread around the globe.
Some 2-3 years ago, we had ~230 x 7-Mode nodes in one DFM; it worked OK for years once the DFM-internal DB was on an SSD!
Now we have
170 x cDOT nodes in OCUM v7.1 and DFM v5.2.1 (Operations Manager, but let's stick with DFM)
82 x 7-Mode nodes in DFM v5.2.2
Our DFM and OCUM servers run on Windows.
The company is on MS Active Directory, so because of DFM/OCUM RBAC and user management, Windows is the OS of choice!
DFM and OCUM are critical parts of our IT infrastructure!
Many critical applications here refer to them whenever NetApp is part of their application or architecture (almost every tool and application has something on a NetApp somehow!)
By design, since we have always had only one DFM, all those applications and tools, no matter where in the world they are, refer to this one DFM (or OCUM) today.
Our prod OCUM is still v7.1 because newer versions (see above, topic "performance data included") never worked here:
Backup is not working
When we did manage to get a backup, recovery never worked
Upgrades from v7.2 to anything newer also never worked
We have had a case open with NetApp support for months (the guys we got for that case are GREAT, btw!) and took our request, to be able to split off the "performance data" at least for backups/upgrades, up to product management... the request was declined!
Now, after some months with NetApp support and various cases, we are forced to split the one OCUM instance into FIVE new ones (!!!)
Which for me means we will
multiply all the trouble we face with OCUM, see "critical" above
have to change all our own tools and applications and, even worse, those of our internal customers and colleagues, to refer to FIVE OCUM instances instead of one in the next months
BTW, talking about OCUM: it doesn't even offer historical Qtree performance and capacity data, which the old DFM does... so I must warn every NetApp customer: DO NOT rely on OCUM and don't use it for anything serious!
P.S. We also have plenty of Harvest instances (NABox!) in parallel, but:
These are only for us, the NetApp admin team
Also no Qtree performance or capacity info there
P.P.S.: Does anyone know how to include Qtree performance and capacity data into Harvest or, better, NABox? 😐
I'm a QA manager from the Unified Manager engineering team. We'd like to get in touch with you to help you on this. Please send me an email at firstname.lastname@example.org and I will get you connected with a few engineers and product management here.