NetApp Harvest Graphite Issue

andrewramos

Hey Chris,

 

First off, Harvest is such an awesome tool, so thank you for setting up the process and continuing to support it. I had everything set up and working for several days, but noticed I was hitting the 1-day retention issue. I quickly realized I had added the NetApp entries after the defaults in the storage-schemas.conf file, so I corrected that and ran rm -rf /var/lib/graphite/whisper/netapp to delete the existing metrics, as indicated in this thread:

 

http://community.netapp.com/t5/OnCommand-Storage-Management-Software-Discussions/netapp-harvest-graphs-not-showing-more-than-1-day/td-p/110793
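For anyone hitting the same retention problem: Carbon applies the first section in storage-schemas.conf whose pattern matches a metric, so the NetApp sections have to sit above the catch-all default, and existing Whisper files keep their old retention until they are deleted or resized. A rough sketch of how the section order could be checked follows. It assumes the NetApp section names start with "netapp" and that the catch-all uses the pattern ".*"; both are assumptions for illustration, as is the Ubuntu package path in the usage comment.

```shell
#!/usr/bin/env bash
# Report whether the first catch-all pattern (".*") in a storage-schemas.conf
# appears before the first section whose name starts with "netapp".
# The "netapp" name prefix is an assumption; adjust it to your section names.
check_schema_order() {
    local conf="$1"
    awk '
        # Remember the current section name, e.g. [netapp_perf] -> netapp_perf
        /^[ \t]*\[/ { section = $0; gsub(/^[ \t]*\[|\].*$/, "", section); next }
        /^[ \t]*pattern[ \t]*=/ {
            pat = $0
            sub(/^[^=]*=[ \t]*/, "", pat)
            sub(/[ \t\r]+$/, "", pat)
            if (pat == ".*" && !catchall) catchall = NR
            if (section ~ /^netapp/ && !netapp) netapp = NR
        }
        END {
            if (!netapp) { print "WARN: no netapp section found"; exit 2 }
            if (catchall && netapp > catchall) {
                print "BAD: netapp sections come after the catch-all"
                exit 1
            }
            print "OK: netapp sections precede the catch-all"
        }
    ' "$conf"
}

# Example (path assumed for the Ubuntu graphite-carbon package):
# check_schema_order /etc/carbon/storage-schemas.conf
```

The function only reads the file; it does not reorder anything.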

 

At first it broke Grafana, as the dashboards and data source were no longer displayed, but I was able to correct that by initializing the DB again with graphite-manage syncdb. The data source now shows up and the connection test succeeds.

 

Now the issue is that the netapp Whisper folder is not being created automatically in Graphite, so no data is displayed even though everything appears to be working. I've confirmed carbon-cache, grafana-server, and apache2 are all running and have restarted each several times. The pollers start successfully and are also running. When I send a test metric it shows up in Graphite just fine, so it appears to be an issue between Harvest, Graphite, and the DB. I've gone over the installation steps a few times but I'm not sure what I'm missing. Any light you can shed would be appreciated. Thanks
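For reference, a test metric like the one mentioned above can be pushed to Carbon's plaintext listener, which accepts lines of the form "<path> <value> <timestamp>". This is a minimal sketch, assuming Carbon is reachable on localhost port 2003 and using a made-up metric name (test.harvest.check). It relies on bash's /dev/tcp redirection, which may not be available in every shell build.

```shell
#!/usr/bin/env bash
# Build one line in Carbon's plaintext format: "<path> <value> <epoch seconds>".
format_metric() {
    # $1 = metric path, $2 = value, $3 = epoch seconds (defaults to now)
    printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Send one metric line to a Carbon plaintext listener over TCP.
send_metric() {
    # $1 = host, $2 = port, remaining arguments go to format_metric
    local host="$1" port="$2"
    shift 2
    format_metric "$@" > "/dev/tcp/${host}/${port}"
}

# Example (hypothetical metric name; not run automatically):
# send_metric localhost 2003 test.harvest.check 1
```

If the metric shows up in Graphite, Carbon and Whisper are accepting writes, which narrows the problem to what Harvest is sending and where.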

 

 

 

1 ACCEPTED SOLUTION

madden

Hi @andrewramos

 

I would check the carbon logs:

 

Installed from source (RHEL): /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log

Installed from package (Ubuntu): /var/log/carbon/creates.log

 

My guess is filesystem permissions are preventing carbon from creating the files.  If the logs agree then something like this should do the trick:

 

Installed from source (RHEL): # chown -R carbon:carbon /opt/graphite/storage 

Installed from package (Ubuntu): # chown -R _graphite:_graphite /var/lib/graphite/whisper
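To confirm whether ownership is really the problem before running chown, something along these lines could be used. It is only a sketch: it assumes GNU stat (as shipped on Ubuntu and RHEL), and the function name check_owner is made up for illustration.

```shell
#!/usr/bin/env bash
# Compare the owner of a directory with the user:group Carbon is expected to run as.
check_owner() {
    # $1 = directory, $2 = expected owner in user:group form
    local dir="$1" expected="$2" actual
    actual=$(stat -c '%U:%G' "$dir") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: $dir is owned by $actual"
    else
        echo "MISMATCH: $dir is owned by $actual, expected $expected"
        return 1
    fi
}

# Examples matching the two layouts above (not run automatically):
# check_owner /var/lib/graphite/whisper _graphite:_graphite
# check_owner /opt/graphite/storage carbon:carbon
```

This checks only the top-level directory; subdirectories created by another user would need the same check.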

 

Let us know how it goes!

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

 

If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!

 

 


6 REPLIES 6

andrewramos

Hey @madden, thanks for the quick reply.

 

The current creates.log just shows the test metrics being accepted without any errors. In the older logs I can see where it was working properly.

 

I tried setting the permissions as you stated, but the /var/lib/graphite/storage folder didn't exist. I tried updating the graphite-carbon package but nothing changed. I then created the folder manually, set the permissions, and restarted the services and pollers, but I'm still in the same boat. Also, FYI, I'm running everything on a single Ubuntu 14.04 server. Thanks

 

 

madden

Hi @andrewramos

 

Metrics are sent by Harvest to Carbon, so both must be running:

 

root@nps-nl-metrics:/var/log/carbon# service carbon-cache status
 * carbon-cache is running
root@nps-nl-metrics:/var/log/carbon# service netapp-harvest status
STATUS          POLLER               SITE
############### #################### ##################
[RUNNING]       nps-nl-cdot          nps-nl

If something isn't running, use the same command but replace "status" with "start".

 

Next is to check the carbon logs:

root@nps-nl-metrics:/var/log/carbon# ls -ltr /var/log/carbon | tail
-rw-r--r-- 1 _graphite _graphite     217 Aug 31  2015 listener.log.3.gz
-rw-r--r-- 1 root      root          399 Aug 31  2015 console.log.3.gz
-rw-r--r-- 1 _graphite _graphite     662 Aug 31  2015 listener.log.2015_8_31
-rw-r--r-- 1 _graphite _graphite  590231 Feb 23 21:55 creates.log.1
-rw-r--r-- 1 _graphite _graphite   50601 Mar 10 15:05 query.log.1
-rw-r--r-- 1 _graphite _graphite     403 Mar 22 14:53 listener.log.2.gz
-rw-r--r-- 1 root      root          521 Mar 22 14:53 console.log.2.gz
-rw-r--r-- 1 root      root          723 Apr 25 07:22 console.log.1
-rw-r--r-- 1 _graphite _graphite     261 May 11 17:18 listener.log.1
-rw-r--r-- 1 _graphite _graphite     808 May 11 17:19 query.log

Review the ones with the most recent activity (the bottom ones in the list).  See if you get any clues.

 

Next check the Harvest logs:

root@nps-nl-metrics:/var/log/carbon# ls -ltr /opt/netapp-harvest/log
total 9912
-rw-rw-r-- 1 nps-nl-admin nps-nl-admin 5186677 May 11 21:23 nps-nl-cdot_netapp-harvest.log

Review the logs and see if you get any clues.

 

You can also start the poller in verbose mode to see if it gives you more info:

root@nps-nl-metrics:/var/log/carbon# /opt/netapp-harvest/netapp-worker -poller nps-nl-cdot -v
[2016-05-11 21:25:21] [NORMAL ] WORKER STARTED [Version: 1.2.2P1] [Conf: netapp-harvest.conf] [Poller: nps-nl-cdot]
[2016-05-11 21:25:21] [WARNING] Started in foreground mode; messages to STDERR are redirected to the logfile and are not visible on the console.
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [17] is Section [global]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [18] in Section [global] has Key/Value pair [grafana_api_key]=[XXXXXX=]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [19] in Section [global] has Key/Value pair [grafana_url]=[https://localhost:443]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [20] in Section [global] has Key/Value pair [grafana_dl_tag]=[]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [26] is Section [default]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [28] in Section [default] has Key/Value pair [graphite_enabled]=[1]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [29] in Section [default] has Key/Value pair [graphite_server]=[localhost]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [30] in Section [default] has Key/Value pair [graphite_port]=[2003]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [31] in Section [default] has Key/Value pair [graphite_proto]=[tcp]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [32] in Section [default] has Key/Value pair [normalized_xfer]=[mb_per_sec]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [33] in Section [default] has Key/Value pair [normalized_time]=[millisec]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [34] in Section [default] has Key/Value pair [graphite_root]=[]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [35] in Section [default] has Key/Value pair [graphite_meta_metrics_root]=[]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [38] in Section [default] has Key/Value pair [host_type]=[FILER]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [39] in Section [default] has Key/Value pair [host_port]=[443]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [40] in Section [default] has Key/Value pair [host_enabled]=[1]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [41] in Section [default] has Key/Value pair [template]=[default]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [42] in Section [default] has Key/Value pair [data_update_freq]=[60]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [43] in Section [default] has Key/Value pair [ntap_autosupport]=[0]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [44] in Section [default] has Key/Value pair [latency_io_reqd]=[10]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [45] in Section [default] has Key/Value pair [auth_type]=[ssl_cert]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [46] in Section [default] has Key/Value pair [ssl_cert]=[netapp-harvest.pem]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [47] in Section [default] has Key/Value pair [ssl_key]=[netapp-harvest.key]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [60] is Section [nps-nl-cdot]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [61] in Section [nps-nl-cdot] has Key/Value pair [hostname]=[192.168.100.102]
[2016-05-11 21:25:21] [DEBUG  ] [conf] Line [62] in Section [nps-nl-cdot] has Key/Value pair [site]=[nps-nl]
[2016-05-11 21:25:21] [NORMAL ] [main] Poller will monitor a [FILER] at [192.168.100.102:443]
[2016-05-11 21:25:21] [NORMAL ] [main] Poller will use [ssl_cert] authentication with ssl_cert [netapp-harvest.pem] and ssl_key [netapp-harvest.key]
[2016-05-11 21:25:21] [DEBUG  ] [connect] Reverse hostname lookup successful.  Using HTTP/1.1 for communication.
[2016-05-11 21:25:21] [DEBUG  ] [sysinfo] Updating system-info cache
[2016-05-11 21:26:21] [WARNING] [sysinfo] Update of system-info cache DOT Version failed with reason: in Zapi::invoke, cannot connect to socket

So in my case I see the poller cannot connect to the storage system, which I can go troubleshoot further.  

 

If it's able to connect you'll see a lot of messages fly by, and after 60-120s the first metrics will be sent to Carbon.  A metric being sent looks like:

[2016-05-11 11:01:10] [DEBUG  ] M= netapp.perf.dev.blob1.svm.asp-nfs-vvol.vol.rootvol.qos_ops 0.166666666666667 1462957264

 

If Harvest fails to hand the metrics off to Carbon it will log a warning. If there is no warning, then something accepted them on the configured server and port!
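A quick way to apply this to a poller log is sketched below. It assumes metric lines contain "] M= " and warnings are tagged "[WARNING]", exactly as in the excerpts above. The metric lines are logged at DEBUG level, so they only appear when the poller runs with -v. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Count metric lines and warnings in a Harvest poller log.
summarize_harvest_log() {
    # $1 = path to a poller log, e.g. a file under /opt/netapp-harvest/log
    local log="$1"
    printf 'metrics=%s warnings=%s\n' \
        "$(grep -c '\] M= ' "$log")" \
        "$(grep -c '\[WARNING\]' "$log")"
}

# Example (log name taken from the listing above; not run automatically):
# summarize_harvest_log /opt/netapp-harvest/log/nps-nl-cdot_netapp-harvest.log
```

A non-zero metric count with no hand-off warnings means Harvest believes it delivered the data, so the next thing to check is which server and port it delivered to.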

 

There is also the good ol' troubleshooting step of a reboot 🙂

 

Cheers,
Chris Madden


andrewramos

I haven't had much time since last week to troubleshoot, but most of the services seem to work properly. I've attached some logs and output. From what I can tell, Harvest is collecting metrics and sending them to Carbon, but Whisper is not creating anything.

madden

I think I found it:

 

[2016-05-12 13:42:23] [DEBUG  ] [conf] Line [30] in Section [default] has Key/Value pair [graphite_enabled]=[1]
[2016-05-12 13:42:23] [DEBUG  ] [conf] Line [31] in Section [default] has Key/Value pair [graphite_server]=[10.120.126.115]
[2016-05-12 13:42:23] [DEBUG  ] [conf] Line [32] in Section [default] has Key/Value pair [graphite_port]=[81]

 

By default Carbon listens on port 2003 and the web interface is on port 81. Change graphite_port to 2003 and I think all will work.
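A small check along these lines might catch this kind of mismatch earlier. It is a sketch that assumes the "key = value" layout with [section] headers seen in the verbose output above. The function name and the path in the usage comment are assumptions. It only reports the configured values and changes nothing.

```shell
#!/usr/bin/env bash
# List every assignment of a key in an INI-style conf file, with its section.
list_conf_key() {
    # $1 = conf file, $2 = key to look for (e.g. graphite_port)
    awk -v key="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        /^[ \t]*\[/ {                            # remember the current section
            section = $0
            gsub(/^[ \t]*\[|\].*$/, "", section)
            next
        }
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1); gsub(/[ \t]/, "", k)
            v = substr($0, eq + 1);    gsub(/^[ \t]+|[ \t\r]+$/, "", v)
            if (k == key) print section ": " k " = " v
        }
    ' "$1"
}

# Example (conf path assumed from the install location shown above):
# list_conf_key /opt/netapp-harvest/netapp-harvest.conf graphite_port
```

Any line reporting a port other than the one Carbon's plaintext listener uses (2003 by default) is worth a second look.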

 

Cheers,
Chris Madden


andrewramos

That did the trick! I must've changed the port to 81 while troubleshooting the original issue and didn't note it. I'm pretty certain the original issue was resolved by setting the ownership of the whisper directory, so I'll mark that as the resolution to the original question. Thanks for helping troubleshoot two issues!

 

Looking back at the original thread I was working from, that poster seemed to experience similar behavior at the end, where it stopped reporting. I'm wondering if he had the same issue, but he never replied back and I didn't check whether he started a new thread. The PuTTY log from when I removed the directory is a little sloppy, but I don't see anything out of line that would affect permissions. I don't see how removing a subdirectory would either, but maybe it's worth looking into.

 

http://community.netapp.com/t5/OnCommand-Storage-Management-Software-Discussions/netapp-harvest-graphs-not-showing-more-than-1-day/td-p/110793

 
