Active IQ Unified Manager Discussions

Harvest 1.6 consuming 100% CPU

Greg_Wilson

Is anyone having issues with Harvest 1.6 consuming 100% CPU?

 

I have 2 clusters and 1 OCUM

 

If I take out one cluster and run it with just one cluster, it's still at 100% CPU usage.

 

Tasks: 117 total,   3 running, 114 sleeping,   0 stopped,   0 zombie
%Cpu(s): 43.0 us,  8.1 sy,  0.0 ni, 48.7 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem :  1843120 total,   959732 free,   276056 used,   607332 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1354556 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7914 netapp-+  20   0  216068  20896   3712 R 100.0  1.1  14:20.07 netapp-worker
 8683 root      20   0  482196  41196   6100 S   2.0  2.2   0:04.84 svc2influxdb.py
 5786 thollow+  20   0   20376   5848   1172 S   0.3  0.3   0:05.52 nmon
    1 root      20   0  128124   6700   4168 S   0.0  0.4   0:25.32 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
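A quick way to confirm which poller a busy worker belongs to is to look at its full command line (a sketch; the PID below is taken from the top output above, substitute your own):

```shell
# Identify which Harvest poller a busy netapp-worker belongs to via its
# full command line. 7914 is the PID from the top output above; pass your own
# PID as the first argument to override it.
PID="${1:-7914}"
ps -o args= -p "$PID" || echo "PID $PID not found"
```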

I upgraded the AWS instance and it still consumed all available CPU.

 

If I add the second cluster back in, I get two netapp-worker processes, both consuming 100% CPU.

 

Any ideas?

 

CentOS 7, fully patched.

 

I have since upgraded the instance to m5a.large and still have the issue.

 

 

 

 

5 REPLIES

Re: Harvest 1.6 consuming 100% CPU

vachagan_gratian

Hi Greg,

 

This is strange. Do you observe 100% CPU usage all of the time? Normally Harvest pollers should be sleeping most of the time, so you should see CPU usage only for a few seconds each minute. (And even then, 100% CPU isn't something you should normally see.)

 

There are a few things you could try:

- Check the Harvest logs to see if there are any warnings or errors.

- Check the Grafana dashboard of the Harvest poller, especially the graphs showing API time. If anything is higher than 10 seconds, it might be part of the issue.
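A quick scan of the poller logs for warnings and errors can be done like this (a sketch; the log directory is an assumption based on the default /opt/netapp-harvest install path seen in the logs, so adjust LOGDIR to your setup):

```shell
# Summarize WARNING/ERROR lines across all Harvest poller logs.
# LOGDIR is an assumption -- Harvest 1.6 installs under /opt/netapp-harvest
# by default; override LOGDIR if your logs live elsewhere.
LOGDIR="${LOGDIR:-/opt/netapp-harvest/log}"
grep -hE '\[(WARNING|ERROR)' "$LOGDIR"/*.log 2>/dev/null | sort | uniq -c | sort -rn | head -20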

Re: Harvest 1.6 consuming 100% CPU

Greg_Wilson

Hi,

 

Yes, it is consuming 100% of the CPU all the time.

 

I'm wondering if it's AWS and how they limit the CPUs for smaller instances:

 

https://forums.aws.amazon.com/thread.jspa?threadID=71347

 

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html

 

This is the first time I have tried to deploy it in AWS. I have been running it on both NAbox and VMware (2 CPUs and 8 GB RAM) on-prem and it's never been an issue.

 

I even dropped the polling of cDOT down to every 5 minutes.

 

The logs for harvest look good.

 

I'll keep poking around.

 

 

This is a log for the cDOT cluster:

 

[2019-11-20 12:20:32] [NORMAL ] [main] Startup complete. Polling for new data every [600] seconds.
[2019-11-20 12:23:19] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: lpaunetapp2001]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will monitor a [filer] at [10.x.x.x:443]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:23:20] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [NetApp Release 9.5P8] successful.
[2019-11-20 12:23:20] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [cdot-9.5.0.conf]
[2019-11-20 12:23:20] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/cdot-9.5.0.conf]
[2019-11-20 12:23:20] [NORMAL ] [main] Metrics will be submitted with graphite_root [netapp.perf.lpau.lpaunetapp2001]
[2019-11-20 12:23:20] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.perf.lpau.lpaunetapp2001]
[2019-11-20 12:23:20] [NORMAL ] Creating output plugins
[2019-11-20 12:23:20] [NORMAL ] Created output plugins
[2019-11-20 12:23:20] [WARNING] [wafl_hya_per_aggr] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [cifs:vserver] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [hostadapter] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [object_store_client_op] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [lun] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl_comp_aggr_vol_bin] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv3] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [processor] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [offbox_vscan_server] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4_1:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [offbox_vscan] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [cifs:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [volume:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv4] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nfsv3:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl_hya_sizer] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [workload] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [iscsi_lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [fcvi] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [token_manager] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [disk:constituent] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [workload_volume] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [fcp_lif] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [system:node] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [copy_manager] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [resource_headroom_aggr] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [nic_common] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [volume] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [path] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [resource_headroom_cpu] Object type does not exist in Data ONTAP release; skipping
[2019-11-20 12:23:20] [WARNING] [wafl] Object type does not exist in Data ONTAP release; skipping
:

This is a log for OCUM:

 

[2019-11-20 12:15:57] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.19.21.235]
[2019-11-20 12:15:57] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:15:57] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:15:57] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:15:57] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:15:57] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:15:57] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:15:57] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:15:57] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:15:57] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:15:57] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_19_21_235]
[2019-11-20 12:15:57] [NORMAL ] Creating output plugins
[2019-11-20 12:15:57] [NORMAL ] Created output plugins
[2019-11-20 12:15:57] [NORMAL ] [main] Startup complete.  Polling for new data every [900] seconds.
[2019-11-20 12:20:31] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.19.21.235]
[2019-11-20 12:20:31] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:20:31] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:20:32] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:20:32] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:20:32] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:20:32] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:20:32] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:20:32] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:20:32] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:20:32] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_x_x_x]
[2019-11-20 12:20:32] [NORMAL ] Creating output plugins
[2019-11-20 12:20:32] [NORMAL ] Created output plugins
[2019-11-20 12:20:32] [NORMAL ] [main] Startup complete.  Polling for new data every [900] seconds.
[2019-11-20 12:23:19] [NORMAL ] WORKER STARTED [Version: 1.6] [Conf: netapp-harvest.conf] [Poller: 10.x.x.x]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will monitor a [OCUM] at [10.x.x.x:443]
[2019-11-20 12:23:19] [NORMAL ] [main] Poller will use [password] authentication with username [netapp-harvest] and password [**********]
[2019-11-20 12:23:19] [NORMAL ] [sysinfo] Discovered [lpaunetapp2001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:23:19] [NORMAL ] [sysinfo] Discovered [lpaunetapp0001] on OCUM server and will submit metrics under group [lpau].
[2019-11-20 12:23:19] [NORMAL ] [main] Collection of system info from [10.x.x.x] running [9.4] successful.
[2019-11-20 12:23:19] [NORMAL ] [main] Found best-fit monitoring template (same generation and major release, minor same or less): [ocum-9.4.0.conf]
[2019-11-20 12:23:19] [NORMAL ] [main] Added and/or merged monitoring template [/opt/netapp-harvest/template/default/ocum-9.4.0.conf]
[2019-11-20 12:23:19] [NORMAL ] [main] Metrics for cluster [lpaunetapp0001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp0001]
[2019-11-20 12:23:19] [NORMAL ] [main] Metrics for cluster [lpaunetapp2001] will be submitted with graphite_root [netapp.capacity.lpau.lpaunetapp2001]
[2019-11-20 12:23:19] [NORMAL ] [main] Using graphite_meta_metrics_root [netapp.poller.capacity.lpau.10_x_x_x]
[2019-11-20 12:23:19] [NORMAL ] Creating output plugins
[2019-11-20 12:23:19] [NORMAL ] Created output plugins
[2019-11-20 12:23:19] [NORMAL ] [main] Startup complete.  Polling for new data every [900] seconds.

Re: Harvest 1.6 consuming 100% CPU

vachagan_gratian

That might explain it, though 2 CPUs should be completely fine for two pollers. I run 10 pollers on 2 CPUs and I'm still fine.

 

But the warnings in the first log also look odd. What is your cDOT version? Even if you have a very old release, I don't think so many object types should be unavailable.

Re: Harvest 1.6 consuming 100% CPU

vachagan_gratian

I shouldn't be asking for your release since it's right up there in the log 🙂

 

No, this isn't right for ONTAP 9.5. If you want us to take a closer look, run your poller in verbose mode and share the logs with me (either here or send them to me directly).

Re: Harvest 1.6 consuming 100% CPU

Greg_Wilson

After weeks of stuffing around, I finally found out what the issue was.

 

In my netapp-harvest.conf,

 

under the host section I had this:

 

[a1c34-cdot1]
hostname = 19.19.209.109
site = a1c34-lab
username = netapp-harvest
password = test1234
data_update_freq = 60
host_type = filer

 

Once I removed the line

 

host_type = filer

everything worked.

 

CPU usage dropped to barely anything and it's now collecting perf data.
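For reference, the working stanza is simply the block above with the host_type line removed:

```ini
; Working poller stanza -- same as above, minus "host_type = filer"
[a1c34-cdot1]
hostname = 19.19.209.109
site = a1c34-lab
username = netapp-harvest
password = test1234
data_update_freq = 60
```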

 

 
