grafana timeseries data request errors

tim_dicon

Hi all,

 

I've followed Graphite_Grafana_Quick_Start_v1.4.pdf and NetApp_Harvest_IAG_1.2.2.pdf and added some clusters, and the stats and graphs are all good for most of them. However, for the first 6-node cluster added to the setup, I see a lot of N/As and "!" markers in the Grafana dashboards. Hovering over a "!" shows the detail "timeseries data request errors".

 

If I use the Grafana node dashboard and look at 'node1', only a couple of graphs show a "!". However, if I look at 'node4', pretty much every graph shows one.

 

If I look at the top CPU domains for the same cluster and nodes directly in Graphite, the stats are there for the nodes whose graphs draw correctly in Grafana. But as soon as I add a CPU domain stat for a node that produces a "!" in Grafana, I lose all stats in the Graphite browser/composer.

 

I assume some of the stats are not being polled into Graphite correctly, but I am unsure where to look next to find the root cause. Can anybody help?

 

Thanks Tim 


tim_dicon

Additional info, after reviewing the setup:

 

During setup of /etc/httpd/conf.d/graphite-vhost.conf I originally did NOT update the path to remove the /conf, as per section 4.3.3.2.b. The original block was:

 

<Directory /opt/graphite/conf/>
Order deny,allow
Allow from all
</Directory>

 

which should be changed to the following for RHEL 6:

 

<Directory /opt/graphite/>
Options All
AllowOverride All
</Directory>

 

I have now removed the /conf and restarted the server, but I do not see any difference in the graphs in Grafana. The previously problematic ones still display the "!" (timeseries data request error).

 

I have also checked /opt/graphite/storage/log/carbon-cache/carbon-cache-a/console.log, and it is full of errors like the one below:

 

26/11/2015 14:40:53 :: Error writing to /opt/graphite/storage/whisper/netapp/perf/EUDC/eu-cnas01/svm/eu-nasd-01/vol/db_tst_a18/qos_latency.wsp
26/11/2015 14:40:53 :: Unhandled Error
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/twisted/python/threadpool.py", line 207, in _worker
result = context.call(ctx, function, *args, **kwargs)
File "/usr/lib64/python2.6/site-packages/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/usr/lib64/python2.6/site-packages/twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
File "/opt/graphite/lib/carbon/writer.py", line 158, in writeForever
writeCachedDataPoints()
--- <exception caught here> ---
File "/opt/graphite/lib/carbon/writer.py", line 137, in writeCachedDataPoints
whisper.update_many(dbFilePath, datapoints)
File "/usr/lib/python2.6/site-packages/whisper.py", line 577, in update_many
return file_update_many(fh, points)
File "/usr/lib/python2.6/site-packages/whisper.py", line 584, in file_update_many
header = __readHeader(fh)
File "/usr/lib/python2.6/site-packages/whisper.py", line 220, in __readHeader
raise CorruptWhisperFile("Unable to read header", fh.name)
whisper.CorruptWhisperFile: Unable to read header (/opt/graphite/storage/whisper/netapp/perf/EUDC/eu-cnas01/svm/eu-vnasd-01/vol/db_tst_a18/qos_latency.wsp)

madden

Hi,

 

 

This error is the root cause:

whisper.CorruptWhisperFile: Unable to read header (/opt/graphite/storage/whisper/netapp/perf/EUDC/eu-cnas01/svm/eu-vnasd-01/vol/db_tst_a18/qos_latency.wsp)

 

That is telling you that you have a corrupt whisper file. If a file is corrupt, accesses to it from the API will fail. If you had clicked on the Grafana timeseries warning notice and clicked through the debug tabs, you would likely have seen the same error, although it isn't so obvious when it is buried in a whole traceback.

 

In working with Graphite at dozens of customers, I have seen a corrupt metrics file at just one other installation. Maybe your disk was full and that could be the cause?
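
To check the disk-full theory quickly, something like the following could be run on the Graphite host. It is only a minimal sketch using Python 3's standard library; the function name `percent_used` is made up for illustration, and the figure may differ slightly from what `df` reports because of reserved blocks.

```python
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total
```

For example, `percent_used("/opt")` would report on the filesystem that holds the whisper files in this thread, assuming /opt is where Graphite stores its data as in the paths above.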

 

Searching Google for the error above might give you some tips on how to find them all, or grepping them from the logs could work too. Once you have them, assuming you have little history, just delete them and let them be recreated.
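
As a rough illustration of the "grep them from the logs" idea, the sketch below pulls the unique file paths out of log lines that match the `CorruptWhisperFile: Unable to read header (...)` message quoted above. The regular expression is based only on that one sample line, and the helper name `corrupt_paths` is hypothetical; other error variants would need their own patterns.

```python
import re

# Pattern taken from the single sample error line in this thread (an assumption
# about the general log format, not a documented carbon log specification).
CORRUPT_RE = re.compile(
    r"CorruptWhisperFile: Unable to read header \((?P<path>[^)]+\.wsp)\)"
)


def corrupt_paths(lines):
    """Return the sorted, de-duplicated .wsp paths reported as corrupt."""
    found = set()
    for line in lines:
        match = CORRUPT_RE.search(line)
        if match:
            found.add(match.group("path"))
    return sorted(found)
```

It can be fed an open file object, for example `corrupt_paths(open("/opt/graphite/storage/log/carbon-cache/carbon-cache-a/console.log"))`. Review the resulting list before deleting anything.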

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

 

P.S.  Please select “Options” and then “Accept as Solution” if this response answered your question so that others will find it easily!

tim_dicon

Hi Chris,

 

First off, thanks for this (Harvest, Graphite & Grafana) setup. It's amazing, and it certainly fills the big hole in perf stats left since 7-Mode, which nothing presently available direct from NetApp fills for cDOT (OPM is coming along, but this is way better :), and OCI is mega $$$s).

 

Re the original issue: yes, I did hit 100% on /opt, and wondered if this was part of the issue (sorry for not mentioning it before!). I will find the affected wsp files and delete them and see where I get...
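
One possible way to find the affected files without going through the logs is to walk the whisper tree and flag any .wsp file that is too short to hold its header, which is what a write cut off by a full disk would tend to leave behind. The sketch below assumes the whisper header layout of a 16-byte metadata block (`!2LfL`) followed by one 12-byte entry (`!3L`) per archive; that layout and the helper names are assumptions to verify against the installed whisper.py, and a file that passes this check can still be damaged further in.

```python
import os
import struct

# Assumed whisper header layout: aggregation type, max retention,
# xFilesFactor, archive count, then one (offset, seconds, points) per archive.
METADATA_FORMAT = "!2LfL"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")


def header_readable(path):
    """Return True if the file is large enough to contain a complete header."""
    try:
        size = os.path.getsize(path)
        with open(path, "rb") as fh:
            packed = fh.read(METADATA_SIZE)
    except OSError:
        return False
    if len(packed) < METADATA_SIZE:
        return False  # empty or truncated file
    archive_count = struct.unpack(METADATA_FORMAT, packed)[3]
    needed = METADATA_SIZE + archive_count * ARCHIVE_INFO_SIZE
    return archive_count >= 1 and size >= needed


def find_unreadable(root):
    """Return sorted paths of .wsp files under `root` that fail the check."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if name.endswith(".wsp") and not header_readable(full):
                bad.append(full)
    return sorted(bad)
```

Calling `find_unreadable("/opt/graphite/storage/whisper")` only lists candidates; it deletes nothing, so the list can be checked by hand first.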

 

Thanks so far.

 

Tim
