Active IQ Unified Manager Discussions
Hey Chris,
First off, Harvest is such an awesome tool, so thank you for setting up the process and continuing to support it. I had everything set up and working for several days, but noticed I was having the 1-day retention issue and quickly realized I had added the entries after the defaults in the storage-schemas.conf file. I corrected that and ran rm -rf /var/lib/graphite/whisper/netapp to delete the metrics, as indicated in this thread:
At first it broke Grafana, as the dashboards and data source were no longer displayed, but I was able to correct that by initializing the DB again with graphite-manage syncdb; the data source now shows up and the connection tests successfully.
Now the issue is that the netapp Whisper folder is not being created automatically in Graphite, so no data is displayed even though everything appears to be working. I've confirmed carbon-cache, grafana-server, and apache2 are all running and have restarted each several times. The pollers start successfully and are also running. When I send a test metric it shows up in Graphite just fine, so it appears to be an issue between Harvest, Graphite, and the DB. I've gone over the installation steps a few times but I'm not sure what I'm missing. Any light you can shed would be appreciated. Thanks
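On the retention ordering mentioned above: Carbon applies the first section in storage-schemas.conf whose pattern matches a metric name, so a catch-all default placed ahead of the NetApp entries wins. A minimal sketch for listing the sections in match order follows. The function name list_schema_order is made up for illustration, and it assumes the usual "[section]" / "pattern = ..." layout of the file.

```shell
#!/bin/sh
# list_schema_order FILE
# Prints each section of a storage-schemas.conf style file in the order
# Carbon evaluates it, as "<n>: [section] <pattern>".
# Hypothetical helper; assumes one "pattern = ..." line per section.
list_schema_order() {
    awk '
        /^\[.*\]/ { name = $0 }               # remember the current section header
        /^[ \t]*pattern[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")           # strip "pattern = ", keep the regex
            n++
            print n ": " name " " $0
        }
    ' "$1"
}
```

Typical use would be list_schema_order /etc/carbon/storage-schemas.conf on an Ubuntu package install (the path differs for a source install). Any catch-all pattern such as .* should appear after the netapp entries in the output.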
Hi @andrewramos
I would check the carbon logs:
Installed from source (RHEL): /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log
Installed from package (Ubuntu): /var/log/carbon/creates.log
My guess is filesystem permissions are preventing carbon from creating the files. If the logs agree then something like this should do the trick:
Installed from source (RHEL): # chown -R carbon:carbon /opt/graphite/storage
Installed from package (Ubuntu): # chown -R _graphite:_graphite /var/lib/graphite/whisper
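Before changing ownership it can help to confirm what the current owner actually is. A minimal sketch, assuming GNU stat (as shipped on Ubuntu and RHEL); the function name check_whisper_owner is made up for illustration:

```shell
#!/bin/sh
# check_whisper_owner DIR USER
# Prints "ok" when DIR exists and is owned by USER, otherwise prints
# what was found and returns non-zero. Hypothetical helper.
check_whisper_owner() {
    dir=$1
    want=$2
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # stat -c is GNU coreutils syntax; %U is the owning user name
    have=$(stat -c '%U' "$dir")
    if [ "$have" != "$want" ]; then
        echo "owner is $have, expected $want"
        return 1
    fi
    echo "ok"
}
```

For the Ubuntu package layout above this would be called as check_whisper_owner /var/lib/graphite/whisper _graphite, and for a source install as check_whisper_owner /opt/graphite/storage carbon.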
Let us know how it goes!
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data
If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO or both!
Hey @madden thanks for the quick reply.
The current creates.log just shows the test metrics being accepted without any errors. I can see in the older logs where it was working properly.
I tried setting the permissions as you stated, but the /var/lib/graphite/storage folder didn't exist. I tried updating the graphite-carbon install package but nothing changed. I then created the folder manually, set permissions, and restarted the services and pollers, but I'm still in the same boat. Also, FYI, I'm running everything on a single Ubuntu 14.04 server. Thanks
Hi @andrewramos
Metrics are sent by Harvest to Carbon, so both must be running:
root@nps-nl-metrics:/var/log/carbon# service carbon-cache status
 * carbon-cache is running
root@nps-nl-metrics:/var/log/carbon# service netapp-harvest status
STATUS          POLLER               SITE
############### #################### ##################
[RUNNING]       nps-nl-cdot          nps-nl
If something isn't running use the same as above but replace "status" with "start".
Next is to check the carbon logs:
root@nps-nl-metrics:/var/log/carbon# ls -ltr /var/log/carbon | tail
-rw-r--r-- 1 _graphite _graphite    217 Aug 31  2015 listener.log.3.gz
-rw-r--r-- 1 root      root         399 Aug 31  2015 console.log.3.gz
-rw-r--r-- 1 _graphite _graphite    662 Aug 31  2015 listener.log.2015_8_31
-rw-r--r-- 1 _graphite _graphite 590231 Feb 23 21:55 creates.log.1
-rw-r--r-- 1 _graphite _graphite  50601 Mar 10 15:05 query.log.1
-rw-r--r-- 1 _graphite _graphite    403 Mar 22 14:53 listener.log.2.gz
-rw-r--r-- 1 root      root         521 Mar 22 14:53 console.log.2.gz
-rw-r--r-- 1 root      root         723 Apr 25 07:22 console.log.1
-rw-r--r-- 1 _graphite _graphite    261 May 11 17:18 listener.log.1
-rw-r--r-- 1 _graphite _graphite    808 May 11 17:19 query.log
Review the ones with the most recent activity (the bottom ones in the list). See if you get any clues.
Next check the Harvest logs:
root@nps-nl-metrics:/var/log/carbon# ls -ltr /opt/netapp-harvest/log
total 9912
-rw-rw-r-- 1 nps-nl-admin nps-nl-admin 5186677 May 11 21:23 nps-nl-cdot_netapp-harvest.log
Review the logs and see if you get any clues.
You can also start the poller in verbose mode to see if it gives you more info:
root@nps-nl-metrics:/var/log/carbon# /opt/netapp-harvest/netapp-worker -poller nps-nl-cdot -v
[2016-05-11 21:25:21] [NORMAL ] WORKER STARTED [Version: 1.2.2P1] [Conf: netapp-harvest.conf] [Poller: nps-nl-cdot]
[2016-05-11 21:25:21] [WARNING] Started in foreground mode; messages to STDERR are redirected to the logfile and are not visible on the console.
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [17] is Section [global]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [18] in Section [global] has Key/Value pair [grafana_api_key]=[XXXXXX=]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [19] in Section [global] has Key/Value pair [grafana_url]=[https://localhost:443]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [20] in Section [global] has Key/Value pair [grafana_dl_tag]=[]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [26] is Section [default]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [28] in Section [default] has Key/Value pair [graphite_enabled]=[1]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [29] in Section [default] has Key/Value pair [graphite_server]=[localhost]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [30] in Section [default] has Key/Value pair [graphite_port]=[2003]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [31] in Section [default] has Key/Value pair [graphite_proto]=[tcp]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [32] in Section [default] has Key/Value pair [normalized_xfer]=[mb_per_sec]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [33] in Section [default] has Key/Value pair [normalized_time]=[millisec]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [34] in Section [default] has Key/Value pair [graphite_root]=[]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [35] in Section [default] has Key/Value pair [graphite_meta_metrics_root]=[]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [38] in Section [default] has Key/Value pair [host_type]=[FILER]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [39] in Section [default] has Key/Value pair [host_port]=[443]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [40] in Section [default] has Key/Value pair [host_enabled]=[1]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [41] in Section [default] has Key/Value pair [template]=[default]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [42] in Section [default] has Key/Value pair [data_update_freq]=[60]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [43] in Section [default] has Key/Value pair [ntap_autosupport]=[0]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [44] in Section [default] has Key/Value pair [latency_io_reqd]=[10]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [45] in Section [default] has Key/Value pair [auth_type]=[ssl_cert]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [46] in Section [default] has Key/Value pair [ssl_cert]=[netapp-harvest.pem]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [47] in Section [default] has Key/Value pair [ssl_key]=[netapp-harvest.key]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [60] is Section [nps-nl-cdot]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [61] in Section [nps-nl-cdot] has Key/Value pair [hostname]=[192.168.100.102]
[2016-05-11 21:25:21] [DEBUG ] [conf] Line [62] in Section [nps-nl-cdot] has Key/Value pair [site]=[nps-nl]
[2016-05-11 21:25:21] [NORMAL ] [main] Poller will monitor a [FILER] at [192.168.100.102:443]
[2016-05-11 21:25:21] [NORMAL ] [main] Poller will use [ssl_cert] authentication with ssl_cert [netapp-harvest.pem] and ssl_key [netapp-harvest.key]
[2016-05-11 21:25:21] [DEBUG ] [connect] Reverse hostname lookup successful. Using HTTP/1.1 for communication.
[2016-05-11 21:25:21] [DEBUG ] [sysinfo] Updating system-info cache
[2016-05-11 21:26:21] [WARNING] [sysinfo] Update of system-info cache DOT Version failed with reason: in Zapi::invoke, cannot connect to socket
So in my case I see the poller cannot connect to the storage system, which I can go troubleshoot further.
If it's able to connect you'll see a lot of messages fly by, and after 60-120s the first metrics will be sent to Carbon. A metric being sent looks like:
[2016-05-11 11:01:10] [DEBUG ] M= netapp.perf.dev.blob1.svm.asp-nfs-vvol.vol.rootvol.qos_ops 0.166666666666667 1462957264
If it fails to hand them off to Carbon it will show a warning, otherwise it gave them to something!
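To check a long verbose run for these two things without scrolling, the output can be saved to a file and searched. A minimal sketch, assuming the line formats shown above ("M= " for a metric, "[WARNING]" for a warning); the function names are made up for illustration:

```shell
#!/bin/sh
# count_sent_metrics FILE
# Prints how many metric lines ("... ] M= <path> <value> <timestamp>")
# appear in saved verbose poller output. Hypothetical helper.
count_sent_metrics() {
    # -F: fixed string match; "|| true" because grep exits 1 on zero matches
    grep -c -F '] M= ' "$1" || true
}

# list_warnings FILE
# Prints every [WARNING] line, such as a failed hand-off or connection.
list_warnings() {
    grep -F '[WARNING]' "$1" || true
}
```

A count of zero after a couple of minutes suggests the poller never got as far as sending, while a non-zero count with no warnings suggests the metrics were handed off to whatever is listening at the configured server and port.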
There is also the good ol' troubleshooting step of a reboot 🙂
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data
I haven't had much time since last week to troubleshoot, but most of the services seem to work properly. I've attached some logs and output; from what I can tell, it appears to be collecting metrics and sending them to Carbon, but Whisper is not creating anything.
I think I found it:
[2016-05-12 13:42:23] [DEBUG ] [conf] Line [30] in Section [default] has Key/Value pair [graphite_enabled]=[1]
[2016-05-12 13:42:23] [DEBUG ] [conf] Line [31] in Section [default] has Key/Value pair [graphite_server]=[10.120.126.115]
[2016-05-12 13:42:23] [DEBUG ] [conf] Line [32] in Section [default] has Key/Value pair [graphite_port]=[81]
By default Carbon listens on port 2003 and the web interface is on 81. Change graphite_port to 2003 and I think all will work.
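A minimal sketch of two quick checks for this kind of mismatch: reading the configured port out of netapp-harvest.conf, and building a test metric in Carbon's plaintext format ("path value timestamp"). It assumes the conf uses "key = value" lines; both function names and the metric path test.harvest.check are made up for illustration.

```shell
#!/bin/sh
# harvest_graphite_port CONF
# Prints every graphite_port value set in a Harvest conf file
# (one per line if it is set in more than one section). Hypothetical helper.
harvest_graphite_port() {
    sed -n 's/^[[:space:]]*graphite_port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# carbon_line PATH VALUE [TIMESTAMP]
# Prints one metric in Carbon's plaintext line format.
# TIMESTAMP defaults to the current epoch time.
carbon_line() {
    printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}
```

For example, harvest_graphite_port /opt/netapp-harvest/netapp-harvest.conf should print 2003 on a default setup, and carbon_line test.harvest.check 1 | nc -w 1 localhost 2003 would send a test metric to the same port Harvest uses (nc options vary between implementations, so adjust as needed). If the test metric arrives but Harvest metrics do not, the two are probably not pointing at the same port.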
Cheers,
Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)
Blog: It all begins with data
That did the trick! I must've changed the port to 81 when troubleshooting the original issue and didn't note it. I'm pretty certain the original issue was resolved by setting the ownership of the whisper directory, so I'll mark that as the resolution to the original question, but thanks for helping troubleshoot two issues!
Looking back at the original thread I was working from, that guy seemed to experience similar behavior at the end, where it stopped reporting. I'm wondering if he had the same issue, but he never replied back and I didn't check to see if he started a new thread. The PuTTY log from when I removed the directory is a little sloppy, but I don't see anything out of line that would affect permissions. I don't see how removing a subdirectory would either, but maybe it's worth looking into.