NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

Hi there. Since 12/31/2018, NetApp-Harvest has not been receiving SVM capacity metrics from UM 9.4. Only the SVM capacity metrics from the UM server are affected; node/aggr capacity metrics ARE being received properly. Things we've tried, without success:

 

1. Restarting the UM NetApp-Harvest pollers on both Grafana/NetApp-Harvest servers
2. Rebooting the UM server
3. Running SVM and node curl commands from the Grafana/NetApp-Harvest servers, which completed successfully (the servers are receiving UM API responses)

 

Configuration:

ONTAP 9.3P4

NetApp-Harvest 1.4

UM 9.4

 

 

[2019-01-03 00:32:02] [WARNING] [qtree] Cluster name for aggr [svm-nas-oma-c01] (3751f7fb-30fe-11e7-963c-90e2bac3282c:type=vserver,uuid=290f9c35-3211-11e7-963c-90e2bac282c) not found in cache; skipping

 

[2019-01-03 09:16:01] [WARNING] [volume] update failed with reason: Timeout. Could not read API response.

[2019-01-03 09:16:01] [WARNING] [volume] data-list update failed.
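
The log lines above appear to follow a `[timestamp] [LEVEL] [object] message` layout, so a quick tally of warnings per object type can show whether the "not found in cache" and "Timeout" messages come from the same collectors. A minimal sketch in Python, assuming that layout holds for the whole log; the function name and the three message categories are my own, not part of Harvest:

```python
import re
from collections import Counter

# Assumed line layout: [timestamp] [LEVEL] [object] message
# (the [object] tag is optional; some lines carry only a level).
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"\[(?P<level>\w+)\s*\] "
    r"(?:\[(?P<obj>[\w-]+)\] )?"
    r"(?P<msg>.*)$"
)

def tally_warnings(lines):
    """Count WARNING lines per (object, kind of message)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "WARNING":
            continue
        msg = m.group("msg")
        if "not found in cache" in msg:
            kind = "not found in cache"
        elif "Timeout" in msg:
            kind = "timeout"
        else:
            kind = "other"
        counts[(m.group("obj") or "-", kind)] += 1
    return counts

# The three warnings quoted in this post.
SAMPLE = [
    "[2019-01-03 00:32:02] [WARNING] [qtree] Cluster name for aggr [svm-nas-oma-c01] (3751f7fb-30fe-11e7-963c-90e2bac3282c:type=vserver,uuid=290f9c35-3211-11e7-963c-90e2bac282c) not found in cache; skipping",
    "[2019-01-03 09:16:01] [WARNING] [volume] update failed with reason: Timeout. Could not read API response.",
    "[2019-01-03 09:16:01] [WARNING] [volume] data-list update failed.",
]

for (obj, kind), n in sorted(tally_warnings(SAMPLE).items()):
    print(f"{obj:10s} {kind:20s} {n}")
```

To run it against a real poller log, pass an open file object instead of `SAMPLE`; the log file name depends on the local setup.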

 

We need to get this resolved as soon as possible because it is affecting team operations. A NetApp Support case was opened last week, but they referred us here because Harvest is community-supported.

13 REPLIES

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

All metrics (both Node/Cluster and SVM) are available within UM. The NetApp-Harvest poller logs all show "...not found in cache; skipping".

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

To get more debug information, you can run Harvest manually for that specific poller: start netapp-worker with -v and without -daemon.

 

Usage: /opt/netapp-harvest/netapp-worker -poller <str> [-conf <str>] [-confdir <str>] [-logdir <str>] [-daemon] [-v] [-h]

PURPOSE:
Collect performance data from Data ONTAP or OCUM and submit to Graphite.
VERSION:
1.4
ARGUMENTS:
Required:
-poller <str> Poller section to run
Optional:
-conf <str> Name of config file to use to find poller name
(default: netapp-harvest.conf)
-confdir <str> Name of directory where config file is located
(default: /opt/netapp-harvest)
-logdir <str> Name of directory where log files should be written
(default: /opt/netapp-harvest/log)
-h Output this help text
-v Output verbose output to stdout and logfile
-daemon Daemonize process (linux only)
EXAMPLE:
Run poller netapp-1 interactively in verbose mode
netapp-worker -poller netapp-1 -v
Run poller netapp-2 as a daemon
netapp-worker -poller netapp-2 -daemon
Run poller netapp-5 from conf file test.conf as a daemon
netapp-worker -poller netapp-5 -conf test.conf -daemon

 

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

Restarted poller with -v option. Nothing jumps out at me. Metrics collection dropped even lower than before...

 

Normal: ~ 100K metrics

Jan 3 Poller Restart: 12.5K metrics

Jan 9 Poller Restart: 1.5K metrics

 

$ sudo ./netapp-manager --restart --poller nc2pwnaocum01 -v
[OK ] Line [18] is Section [global]
[OK ] Line [26] in Section [global] has Key/Value pair [grafana_api_key]=[**********]
[OK ] Line [28] in Section [global] has Key/Value pair [grafana_url]=[http://nc2plgrafana01:3000]
[OK ] Line [29] in Section [global] has Key/Value pair [grafana_dl_tag]=[]
[OK ] Line [35] is Section [default]
[OK ] Line [37] in Section [default] has Key/Value pair [graphite_enabled]=[1]
[OK ] Line [38] in Section [default] has Key/Value pair [graphite_server]=[10.65.44.73]
[OK ] Line [39] in Section [default] has Key/Value pair [graphite_port]=[2003]
[OK ] Line [40] in Section [default] has Key/Value pair [graphite_proto]=[tcp]
[OK ] Line [41] in Section [default] has Key/Value pair [normalized_xfer]=[mb_per_sec]
[OK ] Line [42] in Section [default] has Key/Value pair [normalized_time]=[millisec]
[OK ] Line [43] in Section [default] has Key/Value pair [graphite_root]=[default]
[OK ] Line [44] in Section [default] has Key/Value pair [graphite_meta_metrics_root]=[default]
[OK ] Line [47] in Section [default] has Key/Value pair [host_type]=[FILER]
[OK ] Line [48] in Section [default] has Key/Value pair [host_port]=[443]
[OK ] Line [49] in Section [default] has Key/Value pair [host_enabled]=[1]
[OK ] Line [50] in Section [default] has Key/Value pair [template]=[default]
[OK ] Line [51] in Section [default] has Key/Value pair [data_update_freq]=[60]
[OK ] Line [52] in Section [default] has Key/Value pair [ntap_autosupport]=[0]
[OK ] Line [53] in Section [default] has Key/Value pair [latency_io_reqd]=[10]
[OK ] Line [54] in Section [default] has Key/Value pair [auth_type]=[password]
[OK ] Line [55] in Section [default] has Key/Value pair [username]=[netapp-harvest]
[OK ] Line [56] in Section [default] has Key/Value pair [password]=[**********]
[OK ] Line [71] is Section [NC2DACSTORE02]
[OK ] Line [72] in Section [NC2DACSTORE02] has Key/Value pair [hostname]=[nc2dacstore02-mgmt.us.ad.lfg.com]
[OK ] Line [73] in Section [NC2DACSTORE02] has Key/Value pair [group]=[gso_dev]
[OK ] Line [78] is Section [nc1pacstore01]
[OK ] Line [79] in Section [nc1pacstore01] has Key/Value pair [password]=[**********]
[OK ] Line [80] in Section [nc1pacstore01] has Key/Value pair [hostname]=[nc1pacstore01-mgmt.us.ad.lfg.com]
[OK ] Line [81] in Section [nc1pacstore01] has Key/Value pair [group]=[gso]
[OK ] Line [83] is Section [NC2PACSTORE01]
[OK ] Line [84] in Section [NC2PACSTORE01] has Key/Value pair [hostname]=[nc2pacstore01-mgmt.us.ad.lfg.com]
[OK ] Line [85] in Section [NC2PACSTORE01] has Key/Value pair [password]=[**********]
[OK ] Line [86] in Section [NC2PACSTORE01] has Key/Value pair [data_update_freq]=[150]
[OK ] Line [87] in Section [NC2PACSTORE01] has Key/Value pair [group]=[gso]
[OK ] Line [89] is Section [NC2PACSTORE02]
[OK ] Line [90] in Section [NC2PACSTORE02] has Key/Value pair [hostname]=[nc2pacstore02-mgmt.us.ad.lfg.com]
[OK ] Line [91] in Section [NC2PACSTORE02] has Key/Value pair [password]=[**********]
[OK ] Line [92] in Section [NC2PACSTORE02] has Key/Value pair [data_update_freq]=[150]
[OK ] Line [93] in Section [NC2PACSTORE02] has Key/Value pair [group]=[gso]
[OK ] Line [95] is Section [NC2PACSTORE03]
[OK ] Line [96] in Section [NC2PACSTORE03] has Key/Value pair [hostname]=[nc2pacstore03-mgmt.us.ad.lfg.com]
[OK ] Line [97] in Section [NC2PACSTORE03] has Key/Value pair [password]=[**********]
[OK ] Line [98] in Section [NC2PACSTORE03] has Key/Value pair [group]=[gso]
[OK ] Line [100] is Section [NC2PACSTORE04]
[OK ] Line [101] in Section [NC2PACSTORE04] has Key/Value pair [hostname]=[nc2pacstore04-mgmt.us.ad.lfg.com]
[OK ] Line [102] in Section [NC2PACSTORE04] has Key/Value pair [password]=[**********]
[OK ] Line [103] in Section [NC2PACSTORE04] has Key/Value pair [group]=[gso]
[OK ] Line [105] is Section [nc2pacstore05]
[OK ] Line [106] in Section [nc2pacstore05] has Key/Value pair [hostname]=[nc2pacstore05-mgmt.us.ad.lfg.com]
[OK ] Line [107] in Section [nc2pacstore05] has Key/Value pair [password]=[**********]
[OK ] Line [108] in Section [nc2pacstore05] has Key/Value pair [group]=[gso]
[OK ] Line [124] is Section [ga1pacstore01]
[OK ] Line [125] in Section [ga1pacstore01] has Key/Value pair [hostname]=[ga1pacstore01-mgmt.us.ad.lfg.com]
[OK ] Line [126] in Section [ga1pacstore01] has Key/Value pair [password]=[**********]
[OK ] Line [127] in Section [ga1pacstore01] has Key/Value pair [group]=[atl]
[OK ] Line [133] is Section [il3pzcstore001]
[OK ] Line [134] in Section [il3pzcstore001] has Key/Value pair [hostname]=[il3pzcstore001-mgmt.us.ad.lfg.com]
[OK ] Line [135] in Section [il3pzcstore001] has Key/Value pair [group]=[il3]
[OK ] Line [150] is Section [va1pzcstore001]
[OK ] Line [151] in Section [va1pzcstore001] has Key/Value pair [hostname]=[va1pzcstore001-mgmt.us.ad.lfg.com]
[OK ] Line [152] in Section [va1pzcstore001] has Key/Value pair [group]=[va1]
[OK ] Line [178] is Section [nh1pacstore01]
[OK ] Line [179] in Section [nh1pacstore01] has Key/Value pair [hostname]=[nh1pacstore01-mgmt.us.ad.lfg.com]
[OK ] Line [180] in Section [nh1pacstore01] has Key/Value pair [password]=[**********]
[OK ] Line [181] in Section [nh1pacstore01] has Key/Value pair [group]=[cnc]
[OK ] Line [187] is Section [pa1pacstore01]
[OK ] Line [188] in Section [pa1pacstore01] has Key/Value pair [hostname]=[pa1pacstore01-mgmt.us.ad.lfg.com]
[OK ] Line [189] in Section [pa1pacstore01] has Key/Value pair [password]=[**********]
[OK ] Line [190] in Section [pa1pacstore01] has Key/Value pair [group]=[rad]
[OK ] Line [196] is Section [in2pacstore01]
[OK ] Line [197] in Section [in2pacstore01] has Key/Value pair [hostname]=[in2pacstore01-mgmt.us.ad.lfg.com]
[OK ] Line [198] in Section [in2pacstore01] has Key/Value pair [password]=[**********]
[OK ] Line [199] in Section [in2pacstore01] has Key/Value pair [group]=[fwa]
[OK ] Line [380] is Section [ct1pacstore01]
[OK ] Line [381] in Section [ct1pacstore01] has Key/Value pair [hostname]=[ct1pacstore01-mgmt.us.ad.lfg.com]
[OK ] Line [382] in Section [ct1pacstore01] has Key/Value pair [password]=[**********]
[OK ] Line [383] in Section [ct1pacstore01] has Key/Value pair [group]=[hfd]
[OK ] Line [388] is Section [nc2pwnaocum01]
[OK ] Line [389] in Section [nc2pwnaocum01] has Key/Value pair [password]=[**********]
[OK ] Line [390] in Section [nc2pwnaocum01] has Key/Value pair [hostname]=[nc2pwnaocum01.us.ad.lfg.com]
[OK ] Line [391] in Section [nc2pwnaocum01] has Key/Value pair [group]=[gso]
[OK ] Line [392] in Section [nc2pwnaocum01] has Key/Value pair [host_type]=[OCUM]
[OK ] Line [393] in Section [nc2pwnaocum01] has Key/Value pair [data_update_freq]=[900]
[OK ] Line [394] in Section [nc2pwnaocum01] has Key/Value pair [normalized_xfer]=[gb_per_sec]
STATUS POLLER GROUP
############### #################### ##################
[STOPPED] nc2pwnaocum01 gso
[STARTED] nc2pwnaocum01 gso

$

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

Restarted the UM collector but it's still throwing the same error messages in the logs. Output attached to the NetApp Support case.

 

sudo ./netapp-manager --restart --poller <UM_Server> -v

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

It's important that you run netapp-worker with -v and without -daemon, then redirect the output to a file. There are other ways, but I think this is the easiest.

 

netapp-manager is the tool for starting and stopping the daemons; it does not produce meaningful debug output from Harvest.
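
The advice above (run netapp-worker in the foreground with -v and capture what it prints) could be scripted roughly as follows. This is only a sketch: the worker path and the -poller, -conf, -confdir and -v flags come from the usage text earlier in the thread, while the function names, the output file and the 1200-second cutoff are assumptions of mine. The cutoff is chosen to outlast one 900-second OCUM polling interval.

```python
import subprocess

# Path taken from the usage text quoted earlier in this thread.
WORKER = "/opt/netapp-harvest/netapp-worker"

def build_worker_command(poller, conf=None, confdir=None):
    """Foreground, verbose invocation: -v is set and -daemon is deliberately absent."""
    cmd = [WORKER, "-poller", poller, "-v"]
    if conf:
        cmd += ["-conf", conf]
    if confdir:
        cmd += ["-confdir", confdir]
    return cmd

def run_and_capture(poller, out_path, seconds=1200):
    """Run the worker in the foreground and write stdout and stderr to out_path.

    The worker keeps polling until it is stopped, so it is ended after
    `seconds` (hypothetical default, a bit longer than one 900 s interval).
    """
    cmd = build_worker_command(poller)
    with open(out_path, "w") as out:
        try:
            subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, timeout=seconds)
        except subprocess.TimeoutExpired:
            pass  # subprocess.run kills the child before raising this
    return out_path
```

For example, `run_and_capture("nc2pwnaocum01", "/tmp/um-poller-debug.log")` would leave the verbose output in that file. Stop the daemonized poller for the same section first so two workers are not polling at once; whether sudo is needed depends on how Harvest was installed.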

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

Restarted netapp-worker. Same error: "update failed with reason: Timeout. Could not read API response". Full UM Poller Log attached to the NetApp Support Case.

 

[2019-01-10 08:11:02] [NORMAL ] Creating output plugins
[2019-01-10 08:11:02] [NORMAL ] Created output plugins
[2019-01-10 08:11:02] [DEBUG ] [lun] data-list poller first poll at [2019-01-10 08:15:00]
[2019-01-10 08:11:02] [DEBUG ] [qtree] data-list poller first poll at [2019-01-10 08:15:00]
[2019-01-10 08:11:02] [DEBUG ] [volume] data-list poller first poll at [2019-01-10 08:15:00]
[2019-01-10 08:11:02] [DEBUG ] [aggregate] data-list poller first poll at [2019-01-10 08:15:00]
[2019-01-10 08:11:02] [NORMAL ] [main] Startup complete. Polling for new data every [900] seconds.
[2019-01-10 08:11:02] [DEBUG ] Sleeping [238] seconds
[2019-01-10 08:15:00] [DEBUG ] [aggregate] Found instance [netapp.capacity.gso.NC2PACSTORE04.node.NC2PACSTORE04-02.aggr.nc2pacstore04_n02_a1_sp3_sas10k900_esx] metric [size-used-per-day] with value [18448096445]

...

[2019-01-10 08:15:01] [DEBUG ] M= netapp.poller.capacity.gso.nc2pwnaocum01.aggregate.pluginTime 0 1547126100
[2019-01-10 08:15:01] [DEBUG ] [aggregate] Issuing new socket connect to Graphite server.
[2019-01-10 08:16:01] [WARNING] [volume] update failed with reason: Timeout. Could not read API response.
[2019-01-10 08:16:01] [DEBUG ] [volume] data-list poller next refresh at [2019-01-10 08:30:00]
[2019-01-10 08:16:01] [WARNING] [volume] data-list update failed.
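
One thing that can be read off the excerpt above is how long the volume collector waited before giving up. The sketch below pairs each scheduled poll time with the following Timeout warning for the same object and reports the gap in seconds. It assumes the `[timestamp] [LEVEL] [object] message` layout seen in these lines; the function name is hypothetical.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumed layout: [timestamp] [LEVEL] [object] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d: -]{19})\] \[(?P<level>\w+)\s*\] \[(?P<obj>[\w-]+)\] (?P<msg>.*)$"
)
# Matches both "first poll at [...]" and "next refresh at [...]".
SCHED_RE = re.compile(r"poller (?:first poll|next refresh) at \[(?P<when>[\d: -]{19})\]")

def timeout_delays(lines):
    """Return (object, seconds) for each Timeout warning, measured from the scheduled poll."""
    scheduled = {}
    delays = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        obj, msg = m.group("obj"), m.group("msg")
        sched = SCHED_RE.search(msg)
        if sched:
            scheduled[obj] = datetime.strptime(sched.group("when"), TS_FORMAT)
        elif "Timeout" in msg and obj in scheduled:
            logged = datetime.strptime(m.group("ts"), TS_FORMAT)
            delays.append((obj, int((logged - scheduled[obj]).total_seconds())))
    return delays

# Lines copied from the log excerpt in this post.
SAMPLE = [
    "[2019-01-10 08:11:02] [DEBUG ] [volume] data-list poller first poll at [2019-01-10 08:15:00]",
    "[2019-01-10 08:16:01] [WARNING] [volume] update failed with reason: Timeout. Could not read API response.",
    "[2019-01-10 08:16:01] [DEBUG ] [volume] data-list poller next refresh at [2019-01-10 08:30:00]",
]

print(timeout_delays(SAMPLE))
```

For the lines quoted here the gap is 61 seconds (08:15:00 to 08:16:01). That would be consistent with a client-side timeout of about a minute on the volume request, but nothing in the thread confirms the configured value.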

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

Thank you,

 

This indicates an error talking to OCUM. Are you positive that Harvest successfully connects to the OCUM server?

Re: NetApp-Harvest 1.4 poller not working - Unified Manager 9.4

Yes - curl returns output when run against the OCUM server.

 

[prcpa8@nc2plgrafana02 log]$ curl -k -X GET --header 'Accept: application/vnd.netapp.object.inventory.hal+json' 'https://nc2pwnaocum01/rest/v1/svms?limit=20'
<html><head><title>OnCommand Unified Manager | Error</title></head><body><h1>Error 401 - Unauthorized</h1><p>Please go back to the <a href='/'>homepage</a> and try agan.</p></body></html>
[prcpa8@nc2plgrafana02 log]$ curl -k -X GET --header 'Accept: application/vnd.netapp.object.inventory.hal+json' 'https://nc2pwnaocum01/rest/svms?li
<html><head><title>OnCommand Unified Manager | Error</title></head><body><h1>Error 401 - Unauthorized</h1><p>Please go back to the <a href='/'>homepage</a> and try agan.</p></body></html>
[prcpa8@nc2plgrafana02 log]$
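
Worth noting about the output above: both responses are the UM "Error 401 - Unauthorized" page, so they show that the server is reachable but not that a login works (the curl commands as pasted carry no credentials). A minimal sketch of a check that separates "reachable" from "authenticated", using only the Python standard library. It assumes the UM REST endpoint accepts HTTP Basic authentication, which should be verified locally; the function names are hypothetical and the Accept header is the one from the curl command.

```python
import base64
import ssl
import urllib.error
import urllib.request

ACCEPT = "application/vnd.netapp.object.inventory.hal+json"  # from the curl command

def classify(status, body):
    """Only a 2xx status with a non-HTML body is treated as real data."""
    if status == 401:
        return "unauthorized"
    if 200 <= status < 300 and not body.lstrip().lower().startswith("<html"):
        return "data"
    return "error"

def fetch(url, user, password):
    """GET url with Basic auth (an assumption) and return (status, body text)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # skips certificate checks, like curl -k
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, headers={"Accept": ACCEPT, "Authorization": "Basic " + token}
    )
    try:
        with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the error page is still readable.
        return err.code, err.read().decode("utf-8", "replace")

# The response shown in this post classifies as unauthorized, not as data.
print(classify(401, "<html><head><title>OnCommand Unified Manager | Error</title></head></html>"))
```

Calling `classify(*fetch("https://nc2pwnaocum01/rest/v1/svms?limit=20", user, password))` with the account Harvest uses would show whether that account can actually read SVM data over this interface.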
