Active IQ Unified Manager Discussions

max number of nodes per unified manager

gberger
6,746 Views

Hello,

This is not easy to locate in the docs: does anyone know the maximum number of nodes a single Unified Manager instance is currently certified to poll? We really want to consolidate our many OCUM instances to make this a bit more unified.

I'm talking about the latest 9.5 release of OCUM.

 

thanks,

1 ACCEPTED SOLUTION

Dhiman
6,405 Views

Hi,

 

The TR-4621 (page 8) explains how you should plan for scale. As a rule of thumb, you should not have more than 72 nodes managed by a single OCUM instance. There have been many changes since UM 7.2 that make resource utilization more intuitive; the events help you with reactive measures as well.

 

This limit is based on several factors:

  1. Performance data is collected by default from 7.2 onward, and this is not configurable. You cannot decouple the performance monitoring engine.
  2. On average, each node can take up to 5 GB to retain 13 months of performance data.
  3. With the DB size on the higher side, it would be hard to take backups of the data at scale.

Going beyond the 72-node limit is not recommended. Moreover, I do not think all of the 340-odd nodes are deployed at the same site. Having UM monitor more than 72 nodes per datacenter is a rare occurrence. If your node count goes beyond 72, you will need to expand your UM deployment. The Technical Report covers the considerations and the reasoning behind this in detail.
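The arithmetic behind these numbers can be sketched as follows. This is only a rough illustration using the figures quoted in this post (72 nodes per instance, up to 5 GB per node for 13 months of data, about 340 nodes in total); the function names are made up for this example and are not part of any NetApp tool.

```python
import math

MAX_NODES_PER_INSTANCE = 72   # rule-of-thumb limit quoted in this post
GB_PER_NODE_13_MONTHS = 5     # "up to 5 GB" per node, so an upper-bound estimate


def instances_needed(total_nodes: int, max_nodes: int = MAX_NODES_PER_INSTANCE) -> int:
    """Smallest number of UM instances that keeps each one at or below max_nodes."""
    return math.ceil(total_nodes / max_nodes)


def perf_data_gb(nodes: int, gb_per_node: float = GB_PER_NODE_13_MONTHS) -> float:
    """Upper-bound performance-data footprint for the given node count."""
    return nodes * gb_per_node


if __name__ == "__main__":
    total_nodes = 340  # approximate node count mentioned in this thread
    print(instances_needed(total_nodes))         # 5
    print(perf_data_gb(MAX_NODES_PER_INSTANCE))  # 360 (GB on one full instance)
    print(perf_data_gb(total_nodes))             # 1700 (GB across all instances)
```

Under these assumptions, a fully loaded instance carries up to roughly 360 GB of performance data, which is the database size that makes backups difficult.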

 

Please feel free to reach out to me in case you need further information. 

 

Best,

Dhiman

OCUM TME

 

 

   


9 REPLIES

ruijuan
6,737 Views

Please refer to this doc: https://www.netapp.com/us/media/tr-4621.pdf

Section 3.

gberger
6,727 Views

Yep, the hope was that 9.5 would have some improvements over 7.2 in terms of scaling up. I guess we are stuck for now and will keep scaling horizontally, which is not ideal of course. Maybe it's time to take that "unified" out of the name. Thanks anyway.

paul_stejskal
6,617 Views

How many nodes are you looking to have? If you wish to improve scalability, perhaps open a case and we can look into an RFE, or your account team may be able to get you a PVR if that is something you'd like to look into.

TKFUSSELL
6,573 Views

no problem so far with 48 nodes in 17 sites in 7 different countries...  🙂

gberger
6,529 Views

To answer one of the questions: we have a site with 60 cDOT clusters and a combined node count of a bit over 340. Currently we spread it over 5 Unified Manager instances (all VMs, but they are maxing out their 12 CPU / 40 GB RAM OS resources). Basically, we don't want to spend a ton of money going physical (increased OS specs) if we cannot get the number of instances down substantially. We also know that we will get pushback from support if we see performance problems after going above the recommended ratio of nodes to Unified Manager instances.
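For anyone planning a similar layout, here is a minimal sketch of how clusters could be spread over instances while staying under the 72-node rule of thumb mentioned in this thread. It uses a plain first-fit decreasing heuristic and assumes each cluster is monitored as a whole by exactly one instance. The cluster names and sizes are hypothetical (60 clusters, 342 nodes), not our real inventory.

```python
from typing import Dict, List

MAX_NODES_PER_INSTANCE = 72  # rule-of-thumb limit quoted in this thread


def assign_clusters(cluster_nodes: Dict[str, int],
                    max_nodes: int = MAX_NODES_PER_INSTANCE) -> List[Dict[str, int]]:
    """First-fit decreasing: place each cluster, largest first, into the first
    instance that still has room; open a new instance when none does."""
    instances: List[Dict[str, int]] = []
    by_size = sorted(cluster_nodes.items(), key=lambda kv: kv[1], reverse=True)
    for name, nodes in by_size:
        if nodes > max_nodes:
            raise ValueError(f"{name} has {nodes} nodes, above the {max_nodes}-node limit")
        for inst in instances:
            if sum(inst.values()) + nodes <= max_nodes:
                inst[name] = nodes
                break
        else:
            # No existing instance has room, so start a new one.
            instances.append({name: nodes})
    return instances


if __name__ == "__main__":
    # Hypothetical inventory: 60 clusters, 342 nodes in total.
    sizes = [24] * 2 + [12] * 6 + [8] * 13 + [4] * 20 + [2] * 19
    inventory = {f"cluster{i:02d}": n for i, n in enumerate(sizes)}
    for number, inst in enumerate(assign_clusters(inventory), start=1):
        print(f"instance {number}: {len(inst)} clusters, {sum(inst.values())} nodes")
```

With numbers in this range the heuristic cannot do better than five instances, since 342 / 72 is already more than 4. That is why the node limit itself, rather than the placement, is what blocks consolidation.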

paul_stejskal
6,483 Views

I'd talk to your account team then. They can help push for an RFE fix or help you with a PVR.

anton_oks
6,143 Views

Hi there.

 

Here is another NetApp customer who has been suffering with OCUM (including the latest release) ever since v7.2, when it was changed to include the "performance data".

 

Some background:

  • We have a huge NetApp installed base, spread around the globe
  • Some 2-3 years ago, we had ~230x 7-Mode nodes in one DFM; it worked OK for years once the DFM-internal DB was on an SSD!
  • Now we have
    • 170x cDOT nodes in OCUM v7.1 and DFM v5.2.1 (Operations Manager, but let's stick with DFM 😛)
    • 82x 7-Mode nodes in DFM v5.2.2
  • Our DFM and OCUM servers run Windows
    • The company is on MS "Active Directory", so because of DFM/OCUM "RBAC + user management", Windows is the OS of choice!
  • DFM and OCUM are critical parts of our IT infrastructure!
    • Many critical applications here refer to them whenever NetApp is part of their application / architecture (almost every tool and application has something on a NetApp!)
      • By design, as we have always had "only one" DFM, all those applications and tools, no matter where in the world they are, refer to this one DFM (or OCUM) today
  • Our prod. OCUM is still v7.1 because newer versions (see above, topic "performance data included") never worked here
    • Backup is not working
    • If we did manage to get a backup, recovery never worked
    • Upgrades from v7.2 to anything newer also never worked

We have had a case open with NetApp support for months (the guys we got for that case are GREAT, btw!) and took our request to be able to split out the "performance data", at least for backups/upgrades, up to product management... The request was declined!

Now, after some months with NetApp support and various cases, we are forced to split the one OCUM instance into FIVE new ones (!!!)

 

Which for me means that

  • we will multiply all the trouble we face with OCUM (see "critical" above)
  • all our tools and applications and, even worse, those of our internal customers and colleagues, will have to be changed in the coming months to refer to FIVE OCUM instances instead of one

 

 

BTW, talking about OCUM, it doesn't even offer historical Qtree performance and capacity data, which the old DFM does... so I must warn every NetApp customer: DO NOT rely on OCUM, and don't use it for anything serious!

 

Frustrated greetings....

Anton Oks

 

P.S. We also have plenty of Harvest instances (NABox!) running in parallel, but:

  • This is only for us, the NetApp admin team
  • There is no Qtree performance or capacity info there either

P.P.S.: Does anyone know how to include Qtree performance and capacity data in Harvest or, better, NABox? 😐

ruijuan
6,085 Views

Hi Anton,

 

I'm a QA manager on the Unified Manager engineering team. We'd like to get in touch with you to help you with this. Please send me an email at ruijuan@netapp.com and I will connect you with a few engineers and product management here.

 

Thank you,

Ruijuan
