Active IQ Unified Manager Discussions

Harvest 1.6 consuming 100% CPU

Greg_Wilson
6,085 Views

Is anyone else having issues with Harvest 1.6 consuming 100% CPU?

 

I have two clusters and one OCUM server.

 

If I take out one cluster and run with just the other, it still sits at 100% CPU usage:

 

Tasks: 117 total,   3 running, 114 sleeping,   0 stopped,   0 zombie
%Cpu(s): 43.0 us,  8.1 sy,  0.0 ni, 48.7 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem :  1843120 total,   959732 free,   276056 used,   607332 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1354556 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7914 netapp-+  20   0  216068  20896   3712 R 100.0  1.1  14:20.07 netapp-worker
 8683 root      20   0  482196  41196   6100 S   2.0  2.2   0:04.84 svc2influxdb.py
 5786 thollow+  20   0   20376   5848   1172 S   0.3  0.3   0:05.52 nmon
    1 root      20   0  128124   6700   4168 S   0.0  0.4   0:25.32 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd

I upgraded the AWS instance and it still consumed all available CPU.

 

If I add the second cluster back in, I get two netapp-worker processes, both consuming 100% CPU.

 

Any ideas?

 

CentOS 7, fully patched.

 

I have since upgraded the instance to m5a.large and still have the issue.

1 ACCEPTED SOLUTION

Greg_Wilson
5,762 Views

After weeks of stuffing around, I finally found out what the issue was.

 

In my harvest.conf file,

 

under the host, I had this:

 

[a1c34-cdot1]
hostname = 19.19.209.109
site = a1c34-lab
username = netapp-harvest
password = test1234
data_update_freq = 60
host_type = filer

 

Once I removed the line

 

host_type = filer

everything worked.

 

CPU dropped to barely anything, and it's now collecting perf data.
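
So the working stanza ends up being the same block, just minus that one line:

[a1c34-cdot1]
hostname = 19.19.209.109
site = a1c34-lab
username = netapp-harvest
password = test1234
data_update_freq = 60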

 

 


5 REPLIES

vachagan_gratian
6,005 Views

Hi Greg,

 

This is strange. Do you observe 100% CPU usage all of the time? Normally Harvest pollers should be sleeping most of the time, so you should only see CPU usage for a few seconds each minute. (And even then, 100% CPU isn't something you should normally see.)

 

There are a few things you could do:

- Check the Harvest logs to see if there are any warnings or errors (see the grep sketch below the list).

- Check the Grafana dashboard of the Harvest poller, especially the graphs showing API time. If anything is higher than 10 seconds, that might be part of the issue.
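
A quick way to do the first check, assuming the default install location of /opt/netapp-harvest (your log directory may differ):

# recursively pull anything above NORMAL out of the poller logs
grep -rE 'WARNING|ERROR' /opt/netapp-harvest/log/ | tail -n 50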

Greg_Wilson
5,942 Views

Hi,

 

Yes, it is consuming 100% of the CPU all the time.

 

I'm wondering if it's AWS and how they limit the CPUs on smaller instances:

 

https://forums.aws.amazon.com/thread.jspa?threadID=71347

 

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html
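
If CPU credits were the culprit (this only applies to the burstable t2/t3 families), the credit balance can be pulled from CloudWatch; a quick sketch, with a placeholder instance ID and time window:

# average CPU credit balance over 5-minute periods for one instance
aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 --metric-name CPUCreditBalance \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --start-time 2019-11-20T00:00:00Z --end-time 2019-11-20T12:00:00Z \
    --period 300 --statistics Average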

 

This is the first time I have tried to deploy it in AWS. I have been running it using both NAbox and VMware (2 CPUs and 8 GB RAM) on-prem, and it's never been an issue.

 

I even dropped the polling of cDOT down to every 5 minutes (see the stanza line below).
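
That corresponds to this line in the poller's stanza in netapp-harvest.conf (the setting is in seconds):

data_update_freq = 300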

 

The Harvest logs look good.

 

I'll keep poking around.

 

 

This is the log for the cDOT cluster:

 

[2019-11-20 12:20:32] [NORMAL ] [main] Startup complete. Polling for new data every [600] seconds.
[2019-11-20 12:23:19] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: lpaunetapp2001]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will monitor a [filer] at [10.x.x.x:443]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:23:20] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [NetApp Release 9.5P8] successful.
[2019-11-20 12:23:20] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [cdot-9.5.0.conf]
[2019-11-20 12:23:20] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/cdot-9.5.0.conf]
[2019-11-20 12:23:20] [NORMAL ] [main] Metrics will be submitted with graphite_root [netapp.perf.lpau.lpaunetapp2001]
[2019-11-20 12:23:20] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.perf.lpau.lpaunetapp2001]
[2019-11-20 12:23:20] [NORMAL ] Creating output plugins
[2019-11-20 12:23:20] [NORMAL ] Created output plugins
[2019-11-20 12:23:20] [WARNING] [wafl_hya_per_aggr] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [cifs:vserver] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [hostadapter] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [object_store_client_op] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [lun] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl_comp_aggr_vol_bin] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv3] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [processor] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [offbox_vscan_server] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4_1:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [offbox_vscan] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [cifs:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [volume:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv3:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl_hya_sizer] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [workload] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [iscsi_lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [fcvi] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [token_manager] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [disk:constituent] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [workload_volume] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [fcp_lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [system:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [copy_manager] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [resource_headroom_aggr] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nic_common] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [volume] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [path] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [resource_headroom_cpu] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl] Object type does not exist in Data ONTAP release; skipping
:

This is the log for OCUM:

 

[2019-11-20 12:15:57] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.19.21.235]
[2019-11-20 12:15:57] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:15:57] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:15:57] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:15:57] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:15:57] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:15:57] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:15:57] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:15:57] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:15:57] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:15:57] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_19_21_235]
[2019-11-20 12:15:57] [NORMAL ] Creating output plugins
[2019-11-20 12:15:57] [NORMAL ] Created output plugins
[2019-11-20 12:15:57] [NORMAL ] [main] Startup complete.  Polling for new data every [900] seconds.
[2019-11-20 12:20:31] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.19.21.235]
[2019-11-20 12:20:31] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:20:31] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:20:32] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:20:32] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:20:32] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:20:32] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:20:32] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:20:32] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:20:32] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:20:32] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_x_x_x]
[2019-11-20 12:20:32] [NORMAL ] Creating output plugins
[2019-11-20 12:20:32] [NORMAL ] Created output plugins
[2019-11-20 12:20:32] [NORMAL ] [main] Startup complete.  Polling for new data every [900] seconds.
[2019-11-20 12:23:19] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.x.x.x]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:23:19] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:23:19] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:23:19] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:23:19] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:23:19] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:23:19] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:23:19] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:23:19] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_x_x_x]
[2019-11-20 12:23:19] [NORMAL ] Creating output plugins
[2019-11-20 12:23:19] [NORMAL ] Created output plugins
[2019-11-20 12:23:19] [NORMAL ] [main] Startup complete.  Polling for new data every [900] seconds.

vachagan_gratian
5,883 Views

That might explain it, although 2 CPUs should be completely fine for two pollers. I run 10 pollers on 2 CPUs and I'm still fine.

 

But the warnings in the first log also look odd. What is your cDOT version? Even on a very old release, I wouldn't expect so many object types to be unavailable.

vachagan_gratian
5,883 Views

I shouldn't be asking for your release, since it's right up there in the log 🙂

 

No, this isn't right for ONTAP 9.5. If you want us to take a closer look, run your poller in verbose mode and share the logs with me (either here or just send them to me).
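
For Harvest 1.x that should be something along these lines, using your cluster poller's name (the -v flag and worker path are from memory, so double-check them against the Harvest admin guide):

/opt/netapp-harvest/netapp-worker -poller lpaunetapp2001 -conf netapp-harvest.conf -v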

Greg_Wilson
5,763 Views

After weeks of stuffing around, I finally found out what the issue was.

 

In my harvest.conf file,

 

under the host, I had this:

 

[a1c34-cdot1]
hostname = 19.19.209.109
site = a1c34-lab
username = netapp-harvest
password = test1234
data_update_freq = 60
host_type = filer

 

Once I removed the line

 

host_type = filer

everything worked.

 

CPU dropped to barely anything, and it's now collecting perf data.