Active IQ Unified Manager Discussions
Is anyone having issues with Harvest 1.6 consuming 100% CPU?
I have 2 clusters and 1 OCUM.
If I take out one cluster and run it with just the other, it's still at 100% CPU usage.
Tasks: 117 total,   3 running, 114 sleeping,   0 stopped,   0 zombie
%Cpu(s): 43.0 us,  8.1 sy,  0.0 ni, 48.7 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem :  1843120 total,   959732 free,   276056 used,   607332 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1354556 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7914 netapp-+  20   0  216068  20896   3712 R 100.0  1.1  14:20.07 netapp-worker
 8683 root      20   0  482196  41196   6100 S   2.0  2.2   0:04.84 svc2influxdb.py
 5786 thollow+  20   0   20376   5848   1172 S   0.3  0.3   0:05.52 nmon
    1 root      20   0  128124   6700   4168 S   0.0  0.4   0:25.32 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
I upgraded the AWS instance and it still consumed all available CPU.
If I add the second cluster back in, I get two netapp-worker processes, both consuming 100% CPU.
Any ideas?
CentOS 7, fully patched.
So I have upgraded the instance to m5a.large and still have the issue.
Hi Greg,
This is strange. Do you observe 100% CPU usage all of the time? Normally Harvest pollers should be sleeping most of the time, so you should see CPU usage for only a few seconds each minute. (And even so, 100% CPU isn't something you should normally see.)
There are a few things you could do:
- Check the Harvest logs to see if there are any warnings or errors (a quick way to grep for these is sketched below).
- Check the Grafana dashboard of the Harvest poller, especially the graphs showing API time. If anything is higher than 10 seconds, that might be part of the issue.
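For example, assuming the default Harvest 1.x install under /opt/netapp-harvest (adjust the paths, log file name, and poller name to your setup), something like this should surface recent problems:

# Show the status of all configured pollers
/opt/netapp-harvest/netapp-manager -status

# Scan a poller's log for warnings and errors
grep -E 'WARNING|ERROR' /opt/netapp-harvest/log/<poller-name>_netapp-harvest.log | tail -20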
Hi,
Yes, it is consuming 100% of the CPU all the time.
I'm wondering if it's AWS and how they limit the CPUs on smaller instances:
https://forums.aws.amazon.com/thread.jspa?threadID=71347
This is the first time I have tried to deploy it in AWS. I have been running it using both
NABOX and VMware (2 CPUs and 8 GB RAM) on prem, and it's never been an issue.
I even dropped the polling of cDOT down to every 5 minutes.
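For reference, that's just the poller's data_update_freq in netapp-harvest.conf, i.e. 300 seconds:

data_update_freq = 300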
The Harvest logs look good.
I'll keep poking around.
This is the log for the cDOT cluster:
[2019-11-20 12:20:32] [NORMAL ] [main] Startup complete. Polling for new data every [600] seconds.
[2019-11-20 12:23:19] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: lpaunetapp2001]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will monitor a [filer] at [10.x.x.x:443]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:23:20] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [NetApp Release 9.5P8] successful.
[2019-11-20 12:23:20] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [cdot-9.5.0.conf]
[2019-11-20 12:23:20] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/cdot-9.5.0.conf]
[2019-11-20 12:23:20] [NORMAL ] [main] Metrics will be submitted with graphite_root [netapp.perf.lpau.lpaunetapp2001]
[2019-11-20 12:23:20] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.perf.lpau.lpaunetapp2001]
[2019-11-20 12:23:20] [NORMAL ] Creating output plugins
[2019-11-20 12:23:20] [NORMAL ] Created output plugins
[2019-11-20 12:23:20] [WARNING] [wafl_hya_per_aggr] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [cifs:vserver] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [hostadapter] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [object_store_client_op] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [lun] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl_comp_aggr_vol_bin] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv3] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [processor] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [offbox_vscan_server] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4_1:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [offbox_vscan] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [cifs:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [volume:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv3:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl_hya_sizer] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [workload] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [iscsi_lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [fcvi] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [token_manager] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [disk:constituent] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [workload_volume] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [fcp_lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [system:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [copy_manager] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [resource_headroom_aggr] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nic_common] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [volume] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [path] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [resource_headroom_cpu] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl] Object type does not exist in Data ONTAP release; skipping
:
This is the log for OCUM:
[2019-11-20 12:15:57] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.19.21.235]
[2019-11-20 12:15:57] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:15:57] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:15:57] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:15:57] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:15:57] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:15:57] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:15:57] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:15:57] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:15:57] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:15:57] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_19_21_235]
[2019-11-20 12:15:57] [NORMAL ] Creating output plugins
[2019-11-20 12:15:57] [NORMAL ] Created output plugins
[2019-11-20 12:15:57] [NORMAL ] [main] Startup complete. Polling for new data every [900] seconds.
[2019-11-20 12:20:31] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.19.21.235]
[2019-11-20 12:20:31] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:20:31] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:20:32] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:20:32] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:20:32] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:20:32] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:20:32] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:20:32] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:20:32] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:20:32] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_x_x_x]
[2019-11-20 12:20:32] [NORMAL ] Creating output plugins
[2019-11-20 12:20:32] [NORMAL ] Created output plugins
[2019-11-20 12:20:32] [NORMAL ] [main] Startup complete. Polling for new data every [900] seconds.
[2019-11-20 12:23:19] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.x.x.x]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:23:19] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:23:19] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:23:19] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:23:19] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:23:19] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:23:19] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:23:19] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:23:19] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_x_x_x]
[2019-11-20 12:23:19] [NORMAL ] Creating output plugins
[2019-11-20 12:23:19] [NORMAL ] Created output plugins
[2019-11-20 12:23:19] [NORMAL ] [main] Startup complete. Polling for new data every [900] seconds.
That might explain it, though 2 CPUs should be completely fine for two pollers. I run 10 pollers on 2 CPUs and I'm still fine.
But the warnings in the first log also look odd. What is your cDOT version? Even if you have a very old release, I don't think so many object types should be unavailable.
I shouldn't be asking for your release since it's right up there in the log 🙂
No, this isn't right for ONTAP 9.5. If you want us to take a closer look, run your poller in verbose mode and share the logs with me (either here or send them to me directly).
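Assuming the default install path, running a poller by hand in verbose mode looks roughly like this (check the Harvest 1.x admin guide if your build uses a different flag):

/opt/netapp-harvest/netapp-worker -poller lpaunetapp2001 -v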
After weeks of stuffing around, I finally found out what the issue was.
In my netapp-harvest.conf,
under the host, I had this:
[a1c34-cdot1]
hostname = 19.19.209.109
site = a1c34-lab
username = netapp-harvest
password = test1234
data_update_freq = 60
host_type = filer
Once I removed the line
host_type = filer
everything worked.
The CPU dropped to barely anything and it's now collecting perf data.
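For anyone hitting the same thing, the working stanza ends up looking like this (same values as above, just without the host_type line; if I read the docs right, host_type defaults to FILER when omitted, so my guess is the lowercase value was the trigger):

[a1c34-cdot1]
hostname = 19.19.209.109
site = a1c34-lab
username = netapp-harvest
password = test1234
data_update_freq = 60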