How to install Graphite and Grafana

by Extraordinary Contributor on ‎2015-09-08 06:39 AM

 

Introduction: This guide provides basic installation steps for the open-source software Graphite and Grafana. These tools are commonly used with OnCommand Performance Manager (OPM) and/or NetApp Harvest, available on the ToolChest.  Although you can find Graphite and Grafana installation instructions on the internet, this guide provides a tested recipe for RHEL 6 & 7 and Ubuntu to get you up and running fast.

 

Step 1:

Learn about the possible solution components Graphite, Grafana, OnCommand Performance Manager, and NetApp Harvest in Chapter 1.

 

Step 2:

Prepare for the installation by determining the hardware requirements, installing the base Linux operating system, and opening firewalls in Chapter 2.

 

Step 3:

Install Graphite and Grafana on Ubuntu using the steps in Chapter 3, or on RHEL using the steps in Chapter 4.

 

Step 4:

Verify that the installation was successful by running the verification tests in Chapter 5.

 

Step 5:

If any issues were encountered, see Chapter 6 for some troubleshooting steps.

 

Comments

Something to understand.

There's a video on the NetApp SE YouTube channel that shows an OVA version of the entire package with new interfaces, dashboards, and so on.

And here there's a link to a new 1.4 version composed of the individual parts to install.

 

Here's a link to the OVA. It's not from NetApp; is it good?

 

http://ybontap.tynsoe.org/wordpress/graphite-va/pcd-va-release-notes/

Sept. 11 comment: starting today that site is password protected, so I have no idea how to get it.

 

 

Frequent Contributor

Hi,

 

That OVA is not public, but NetApp SEs and the Graphite community are the support pathways for Graphite (setup, configuration, chart creation, etc.).  CSS is limited to troubleshooting the push of data from OPM to a Graphite instance (using the External Data Provider facility in the OPM Maintenance Console); the OPM docs have the details for OPM-to-Graphite configuration.

 

We are working on a better way to distribute it, but we're not there yet.

 

Thank you

It may not be public, but in the video I've seen on the NetApp channel there's a moment where that name can be read.
By the way: downloaded, installed, up and running. Very nice, good enhancements. The new dashboards and metrics are appreciated.

 

My company is an SSC/Star partner here in Italy, so I don't think we need assistance from a NetApp SE :)

And I think it's strange that there's a public video about this new OVA while the OVA itself is not available...

Regards

Extraordinary Contributor

As a follow-up to the discussion about the VA, recently the NetApp Harvest data collector was posted to the NetApp Support ToolChest. Previously Harvest was available only through a NetApp SE, much like the VA, but it is now available to all customers and partners. So if you don't have access to the VA, or prefer to install and scale the system yourself, you can now use the instructions in the PDF of this article to install Graphite + Grafana, and then download and install the NetApp Harvest data collector from the Support ToolChest to get the same result the OVA provides.  In a recent blog post of mine I have a short video of the solution and the steps to accomplish it, and again, this is available to any customer or partner today!

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA

 

Hi Chris,

 

thank you very much for the update.

By the way, the new release of the performance displayer is very beautiful, and the ready-made free dashboards are really good! Much more than the announced features of OCPM 2.0 :)

 

 

 


 Hi,

 

I've been using this tool for months and it's an excellent one, (finally) developed for monitoring usage.

This whitepaper is useful indeed, even if I found few documents regarding its customization and/or further specific implementations.

I already created different dashboards in order to have quick/different views of some parameters, for example to avoid the aggr0 statistics in the per-node aggregate statistics. And it works, even if it's quite difficult to know what's possible to do without a specific reference guide. In fact, I'm referring to the Graphite/Grafana docs in order to do that.

Still, I noticed that only NFSv3 is included. I have many SVMs with only NFSv4, and in that case the dashboards are missing important statistics indeed.

In theory, since the statistics command provides some counters for NFSv4, this should be possible.

For this reason, I would like to know how I can add this additional protocol to Grafana.

 

But again, these are not criticisms, because this tool is the best I have ever used to monitor NetApp filers.

 

Extraordinary Contributor

Hi mark_display,

 

If using OPM as the data source it is not possible to customize the counter list, but if using Harvest from the ToolChest it is possible.  Section 10.2.3 of the Harvest user guide has a small discussion of how to add collection for an additional object like nfsv4.  If the nfsv4 counter and instance list is similar to nfsv3's, it might be just a little copy/paste work and a few other tweaks, and then the same in the Grafana dashboards.  It is also on my to-do list to add to the Harvest default templates and dashboards, but unfortunately I haven't had time to do it yet.

 

Cheers,
Christopher Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Hi Christopher,

 

many thanks for your advice!

I'm using the OVA v.2.1.0-pre1 with Harvest as data source.

As far as I understand, it would probably be better to install a Linux system with Harvest from the ToolChest and use that to customize and/or add additional objects, as you suggested.

It should also be possible to export the dashboard settings and import them, but I don't know about the historical data, which is also quite important for us.

By the way, if it isn't possible (to migrate the historical data) I will keep both for a while, stopping the polling on the old one.

What do you advise?

If you confirm that, I will try this new environment ASAP :)

 

thank you again & Best Regards

 

 

Great tool! 

 

Extraordinary Contributor

Hi @mark_display ,

 

If you want to take your data with you it's as simple as copying the /opt/graphite/storage/whisper (or /var/lib/graphite/whisper) directory to your new system.  So just stop Harvest and carbon on your old host, copy those directories to your new host, and then start things up on your new host.  If you have both old and new running and want to merge data between files, see the whisper-fill utility, which basically copies over data points that are missing.  To use it on your whole metrics hierarchy you'd need a small script to fill the new files from your old ones.
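The steps above can be sketched roughly as follows (a minimal sketch, not the definitive procedure: the paths, hostnames, and the `whisper-fill.py` script name are assumptions that depend on your install and packaging):

```shell
# 1) Stop the writers on the old host so the .wsp files are quiescent:
#      service netapp-harvest stop && service carbon-cache stop
# 2) Copy the whisper tree to the new host (path may be
#    /var/lib/graphite/whisper on Debian/Ubuntu packages):
#      rsync -a /opt/graphite/storage/whisper/ newhost:/opt/graphite/storage/whisper/
# 3) If both hosts collected for a while, backfill the gaps with whisper-fill,
#    which copies only the data points missing in the destination. A small loop
#    over the whole hierarchy (OLD and NEW are assumed staging paths):
OLD=/tmp/old-whisper
NEW=/tmp/new-whisper
mkdir -p "$OLD" "$NEW"
find "$OLD" -name '*.wsp' | while read -r src; do
  # Map each old file to the same relative path under the new tree
  dst="$NEW/${src#"$OLD"/}"
  [ -f "$dst" ] && whisper-fill.py "$src" "$dst"
done
```

The key design point, as Chris says, is that whisper-fill never overwrites existing data points, so it is safe to run over files that already contain fresh data.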

 

Hope that helps!

 

Cheers,
Christopher Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Hi Chris

 

Any update on where/when the pre-built virtual appliance OVA will be available to customers?

 

I have reached out to my SE, but I'm waiting to hear back.

 

I heard on a Tech ONTAP podcast where it was being discussed that it was imminent and waiting on licensing or something?

Hey All,

 

I know an OVA would be much easier, but if anyone is considering going about this per the instructions above, it's really not too bad. It took me about 4 hours or so (with interruptions), and I'm not even a Linux guy. Just follow the instructions exactly.

 

Good luck!

 

Change request submitted.

 

Ubuntu downloaded.

 

Nearly ready to give it a go ;-)

Agreed...it's pretty painless to install. Took us about 3 hours end to end.

 

Very nice :-)

This was a simple install.  One question though.  When would one want to use InfluxDB or Cyanite instead of the carbon/whisper DB?  Is it a question of workload and scalability?

 

I apologize if this should be in another thread, but I felt it was relevant enough to ask here.

 

Thanks!

Extraordinary Contributor

Hi Waynemcc,

 

Glad you got up and running quickly!  Regarding other time series databases indeed scalability would be the main reason I would look to them.  Graphite and whisper are rock solid, have lots of functions you can use to render the data, and it scales to a few hundred thousand metrics/min without getting fancy.  For my use cases so far it's been enough.  But looking to the future I wouldn't be surprised if one of the challengers betters Graphite/whisper and becomes my preferred 'default' metrics store.  Which one will it be?  No idea, time will tell.

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

Hi @madden,

 

many thanks for your answer.

It helps for sure! :)

 

Meanwhile I installed a Red Hat 7 system and netapp-harvest. I preferred to install the new Grafana version, at first 2.5, but since I noticed some issues (title bars didn't collapse, etc.) I installed the pre-2.6 version, which fixes some issues with 2.5.

Then I installed the new dashboards and almost everything is fine, apart from some errors in the 7-mode and c-mode graphs, but I'm working on that (e.g. deferred_back-to-back_CP).

NFSv4 was implemented without problems.

I also changed the node "disk utilization" to exclude aggr0 (hosting only the root).

 

Still, now I have a request to implement volume checks per node rather than per SVM.

I checked, but I'm not sure whether I could simply add a new section within the dashboard (adding it to the volume_summary one) and collect the values using the "c-dot volume" plugin, or whether that's useless because the plugin itself would need to be changed for this purpose.

Could you advise me on this?

Cheers

Extraordinary Contributor

Hi Mark,

 

Please send me a direct message with any mistakes you found in the dashboards (be as detailed as possible, screenshots might help) so I can fix them.

 

Regarding volumes, today I put them in the SVM hierarchy because they are owned by the SVM and over time might be moved around aggrs and nodes.  There is a common request, however, to diagnose which volume(s) are making a node busy.  I wrote a small plugin (which still needs to be cleaned up, which is why it's not released yet) that you run from Harvest on the Graphite server itself; it connects to the cluster, finds all vols + svm + current node/aggr, and then creates symlinks per vol in node.<nodename>.aggr.<aggrname>.vol.<volname> that point to the correct dir with the metrics files in svm.<svmname>.vol.<volname>.  In Graphite this allows you to find the busiest volumes on an aggr.  This technique doesn't require any extra metrics or disk space and allows for vol moves without leaving behind stale metrics files.  It does require you to run a Harvest instance (that might not be your main collector, just one to update the symlinks) on the Graphite server itself.  So it's more work to set up, but I don't know of another solution.  I also want to do this to map LIFs to their current home port, to enable you to see which LIFs are consuming a port.
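The symlink technique described above can be illustrated with plain `ln` (a toy sketch only: the whisper root and metric hierarchy below are made-up illustrations, and the real plugin's naming may differ):

```shell
# Illustrative whisper root; real installs keep metrics under something like
# /opt/graphite/storage/whisper/netapp/perf/<group>/<cluster>/...
W=/tmp/whisper-demo/netapp/perf/mygroup/cluster1

# vol1's metric files live under the owning SVM:
mkdir -p "$W/svm/svm1/vol/vol1"

# Expose the same files under the volume's current node/aggr via a symlink,
# so no metric files are duplicated and a vol move just means re-pointing
# the link rather than leaving stale copies behind:
mkdir -p "$W/node/node1/aggr/aggr1/vol"
ln -sfn "$W/svm/svm1/vol/vol1" "$W/node/node1/aggr/aggr1/vol/vol1"
```

Because Graphite discovers metrics by walking the filesystem, the same series then shows up in both the `svm.*` and `node.*` trees.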

 

Once I have something ready I will announce it on my blog.

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

hi Chris

 

Is there any way to monitor ONTAP 8.0.5 7-Mode with Harvest? The Harvest notes say to enable TLS, but it is not an option on 8.0.x and I cannot get it working without it. I have other systems running 8.x.x and they are all working fine. Any suggestion would be much appreciated.


Regards

TA

Member

I am having some issues with a couple of systems that I initially created pollers for when Graphite was not working properly. Many of the panels are showing "Timeseries data request errors" for these two systems. All of the other systems added afterward appear to be working properly.

 

Is there an easy way to clear these issues, or remove the systems completely from the database and recreate them?

 

The ! corners in the panels show "Timeseries data request errors" when you hover over them. It definitely seems to be related to Graphite.

 

 

UPDATE: Resolved issue. There were several corrupted wsp files. This probably happened when the volume filled up before I moved them to a separate file system.

 

Basically I just ran this to delete all the empty wsp files: sudo find <root whisper path> -type f -empty -delete

 

 

 


 

 

 

 

Extraordinary Contributor

Hi @TARIQALIJAN 

 

The need for TLS comes from the SSL libraries of the OS.  I have seen that TLS is required on Debian/Ubuntu but that it is not required on RHEL/CentOS. If you really need that old 8.0 system (and can't upgrade it to 8.1), then maybe you can run Harvest from RHEL/CentOS?

 

Cheers,

Chris

Frequent Contributor

Yes, it really depends on the security policy of the distribution and the libssl versions. Some versions of SSLv3 are now known to have important flaws, hence the TLS requirement enforced in the NetApp SDK in the latest releases (and there was something else going on way before that as well).

thanks both

 

I really appreciate your response. I will try RHEL until we upgrade our ONTAP to 8.2.x. Thanks again.

 

Regards

TA

I'm having a problem. I have been following these documents:

  • NetApp Harvest Installation and Administration Guide 1.2.2
  • Quick Start: Installing Graphite and Grafana

The tests mentioned in the manuals run as expected. However, I don't see the storage controllers appearing in Grafana, and as a result I don't see data points... Pointing me in the right direction would be nice...

 

Thanks for helping me out.

 

Regards,

Rik Daniels

Frequent Contributor

What did you configure as the datasource in Grafana?

hi @madden @yannb

 

I have now installed Harvest/Graphite/Grafana on CentOS but am still not seeing controllers with version 8.0.x, although I can see the ones on 8.1.x with TLS. Is there anything I need to be changing in the configuration for it to work?

 

Regards

TA

Extraordinary Contributor
@TARIQALIJAN Yann mentioned SSL is also blocked in newer SDK releases. Maybe try SDK 5.3 or 5.3.1? On the normal software download page, at the very bottom, there is a drop-down box where you can choose an older release. Chris

hi Chris

 

I am already using SDK 5.3.1. Do you think I should downgrade it further to 5.3?  Would it work with the latest Harvest (v1.2.2)?

 

Regards

Tariq

Extraordinary Contributor
@TARIQALIJAN I would try 5.3. This was the release I originally used a couple of years ago.

@madden

I removed SDK 5.3.1 and replaced the files with v5.3, and it worked like a charm. Thanks Chris.

 

Regards 

TA

Hello,

 

I've a strange issue here... I restarted the server after migrating the VM to a different host and now, 2 weeks later, I can only see 24 hours of historical data. Across all nodes and all graphs. I haven't touched any config files. How can I fix this?

 

Regards,

Igor

Extraordinary Contributor

@IGORSTOJNOV

 

Sounds like your storage-schemas.conf file is not configured correctly.  See this post for more details and how to resolve it.
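For context: Graphite bakes retention into each .wsp file when it is first created, from the first matching rule in storage-schemas.conf, so a missing (or added-too-late) rule leaves files with a short default retention, which produces exactly this "only 24 hours of history" symptom. A hedged example of what an entry might look like (the retention values here are illustrative only, not the official Harvest recommendation; check the Harvest admin guide for your release):

```ini
# /opt/graphite/conf/storage-schemas.conf -- rules are matched top-down and
# the FIRST matching pattern wins, so specific rules go above the catch-all.
[netapp_perf]
pattern = ^netapp\.perf\.
retentions = 60s:35d, 5m:100d, 15m:395d

[default]
pattern = .*
retentions = 60s:1d
```

Note that fixing the config only affects newly created files; existing .wsp files keep the retention they were created with and have to be rewritten with the whisper-resize utility to pick up the new scheme.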

 

Cheers,
Chris

 

Hi Chris,

 

I've been setting up Harvest in our environment and noticed that some of the graphs have no data points.  I'm getting most items graphed, like latency, IOPS, & Node Disk Util, but not others like Node Capacity Used, Top SVM Capacity Used, & Top Aggr % Used.


 

Detail of the 'No DataPoints' graph:


 

I am seeing data in Graphite charts:


Any suggestions on how to troubleshoot missing data points on some of the graphs?  How can I tell if the data source of the missing points is the filer or OCUM?

 

 

Thanks

 

PatS

PatS,

Are your aggregates named *root* or *aggr0*? If so, then under the metrics for capacity, remove them from the exclude function. I had the same problem, and I had to edit that metric to get the capacity data points.
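For illustration, a capacity panel's target might wrap its series in Graphite's exclude() function like the line below (the metric path is made up for this example; check your own panel's query). If all of your aggregates match the pattern, the panel goes empty, and removing or narrowing the pattern brings the data back:

```
exclude(netapp.capacity.*.*.node.*.aggr.*, 'aggr0|root')
```

exclude() drops every series whose name matches the regular expression in the second argument, so aggregates named aggr0_n1 or n1_root vanish from the graph.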

 


 

 

Extraordinary Contributor

@PSimmons

 

Storage capacity metrics (netapp.capacity.*) are submitted by an OCUM poller.  Did you set this up already?  If you did, then I would check the specific OCUM server log in /opt/netapp-harvest/log for hints on what is wrong.  If you didn't, please see the Harvest Admin Guide for info on how to set this up.

 

Hope this helps!

 

Cheers,

Chris

@madden & @mzp

 

Thanks for the input and help.  I looked over the poller logs and did not see anything amiss.  So I nuked the SDK and Harvest parts, then re-installed with my saved config files, and that seemed to get things working.  I am getting node and SVM graphs in Grafana now.

 

On a side note, Chris, for a future blog entry, how about a topic on how to create custom metrics to graph in Grafana from the thousands of data points collected?

 

 

Thanks again

 

 

Pat

Christopher,

 

Concerning your quote of "2015-11-26 10:05 AM" and "to find the busiest volumes on an aggr":

 

Did you complete the plugin yet? If so, can you provide information on how to implement it?

 

Greetings,

Kris Boeckx

 

 


madden wrote:

 

Regarding other time series databases indeed scalability would be the main reason I would look to them.  Graphite and whisper are rock solid, have lots of functions you can use to render the data, and it scales to a few hundred thousand metrics/min without getting fancy.  For my use cases so far it's been enough.  But looking to the future I wouldn't be surprised if one of the challengers betters Graphite/whisper and becomes my preferred 'default' metrics store.  Which one will it be?  No idea, time will tell.

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

 


 

Great work on this Chris!  Very glad to see NetApp embracing this technology and providing this to their customers.

 

We have a large graphite/grafana system that we've been using for the last year.  We just recently found out about Harvest and deployed it in our lab last week so we could begin to kick the tires.  Being able to overlay NetApp metrics with our other infrastructure is something we've wanted to do.

 

With regards to other time series databases, do you have anything on the roadmap enabling this to function with a different backend (other than Graphite, like InfluxDB, Prometheus, Cyanite, OpenTSDB, etc.)?  Do you have any customers using Harvest with their time series database system located in a SaaS offering?

 

thanks,

Matt

Extraordinary Contributor

Hi @mattbowden

 

Thanks, and glad you like it so far! For your questions: since Harvest sends data in the Graphite newline-separated format, it is pretty easy to get it into any time series DB you want.  Creating dashboards, though, is quite a bit more work, and not something that I would enjoy doing again for other time series DBs, so you are on your own if you want a different one.  For SaaS offerings, I don't know of anyone using a hosted option, but Harvest shouldn't be any different from other data sources.
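For anyone wiring Harvest into another backend: the Graphite newline-separated (plaintext) format is just one `metric.path value unix-timestamp` record per line, sent to carbon's plaintext listener (TCP port 2003 by default), which is why so many other time series databases can ingest it. A minimal sketch (the hostname and metric path below are made up):

```shell
# One record per line: <metric path> <value> <epoch seconds>
now=$(date +%s)
line=$(printf 'netapp.perf.mygroup.cluster1.node.node1.cpu_busy 42 %s' "$now")
echo "$line"

# Sending the same line to a carbon (or carbon-compatible) listener
# would look like:
#   echo "$line" | nc graphite-host 2003
```

Any backend that speaks this protocol (or sits behind a carbon-compatible relay) can therefore receive Harvest's output without changes to Harvest itself.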

 

Cheers,
Chris

 

 

Hi,

 

Any update on the pre-built OVF?

 

Thanks!

I was able to get Harvest set up and have verified that it's collecting data from our cluster (I can see the appropriate whisper files and data in Graphite).  The Harvest logs seem clean, with no errors that would indicate issues.  However, the drop-downs at the top for Group, Cluster, SVM, and TopResources have only "None" as an available option... they don't actually show any data from our environment. The graphs just show test-series-0 metrics.

 

Can someone point me in the right direction of what to troubleshoot? I can't edit these drop-downs like a normal Grafana graph to see where they're grabbing these data sources.

 


Frequent Contributor

Can you go into the Graphite user interface (/graphite if using the virtual appliance) and open the folders on the right so we can see the hierarchy?

 

You can also go into the whisper directory and run a couple of ls commands in one or two subdirectories.

 

From experience, I suspect you might have some special characters in your site names or cluster names?

Hmm...maybe the dashes in the cluster name na-atc-cl1?

 


Frequent Contributor

 

I think you are not "talking" to Graphite at all; these are just test series.

 

Can you show your datasources?

I get the feeling I'm missing something obvious.  The Graphite data source is there and tests OK.  I can create a new graph with the Graphite metrics.  I just can't get anything to auto-populate from the drop-downs in the imported Harvest graphs.

 

 


Argh, I just noticed that "default" was not checked for the Graphite data source.  After checking that, everything is working.  I wouldn't have seen that without your help, yannb.

Frequent Contributor

Happy it's working!

Having issues with collecting data from 7-Mode filers.

From the netapp-harvest log:

[2016-02-22 12:53:54] [NORMAL ] WORKER STARTED [Version: 1.2.2] [Conf: netapp-harvest.conf] [Poller: nlnaf26]
[2016-02-22 12:53:54] [NORMAL ] [main] Poller will monitor a [FILER] at [nlnaf26:443]
[2016-02-22 12:53:54] [NORMAL ] [main] Poller will use [password] authentication with username [aodfm01a] and password [**********]
[2016-02-22 12:53:54] [WARNING] [sysinfo] Update of system-info cache DOT Version failed with reason: No response received from server


It seems there might be a connection issue to port 443.

Are there some options which need to be set (at the filer?)

 

Does someone know?

 

For cDOT it works just fine!

 

 

Frequent Contributor

Try setting TLS on:

 

options tls.enable on

Great. It works. Thanks very much.

 

I see, it can be found in the NetApp Harvest installation document (5.1), which I overlooked.

 

Maybe Chris Madden can add the error "[WARNING] [sysinfo] Update of system-info cache DOT Version failed with reason: No response received from server" with the solution "options tls.enable on" to the documentation.

 

Regards, Maarten de Boer

 

 
