Active IQ Unified Manager Discussions

NetApp-Harvest 1.6.1 updater + some news about Harvest 2.0

vachagan_gratian

 

Dear Harvest users,

 

First of all, apologies for not responding to your questions lately. I was too busy, but I'll try to get back to unanswered messages during the next week. Here is some good news: we released a Harvest updater that fixes some issues in Harvest 1.6 and adds some requested features. We are not doing an official release, since that would take a lot more time. The updates include:

 

  • Support for SSL authentication in Harvest Extensions
  • Bug fix in the extension snapmirror_replications.py
  • New extension to collect capacity counters (without OCUM)
  • Caching resolved Graphite hostname (previously sending metrics to Graphite could add pressure on your DNS if caching was not configured in your server/network).
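To illustrate the idea behind the last item, here is a minimal sketch of caching a resolved hostname for a fixed time. It is not the code shipped in the updater; the class name CachedResolver, the ttl_seconds parameter, and the 300-second default are assumptions made for this example.

```python
import socket
import time


class CachedResolver:
    """Resolve a hostname once and reuse the answer until the TTL expires."""

    def __init__(self, ttl_seconds=300, resolve=socket.gethostbyname,
                 clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._resolve = resolve   # injectable, so the cache can be tested offline
        self._clock = clock
        self._cache = {}          # hostname -> (ip_address, expiry_time)

    def lookup(self, hostname):
        now = self._clock()
        cached = self._cache.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]      # still fresh: no DNS query is made
        address = self._resolve(hostname)
        self._cache[hostname] = (address, now + self.ttl_seconds)
        return address
```

A poller would create one resolver and call lookup() before each send, so DNS is queried at most once per TTL rather than once per batch of metrics.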

Here is how to run the updater:

  • Download the updater package here
  • Verify MD5 checksum:
$ md5sum harvest_updater_161.tar.gz 
> 1923977dee44366080ca19e724ad4650  harvest_updater_161.tar.gz
  • Extract the package somewhere on your Harvest server, e.g.:
$ tar -xzvf harvest_updater_161.tar.gz -C /tmp/
  • Stop all Harvest pollers
  • Run the updater:
$ cd /tmp/harvest_updater_161/
$ ./harvest_updater
  • Restart Harvest
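If you would rather script the checksum step above than compare the md5sum output by eye, something like the following should work. It is only a sketch; the function names md5_of_file and checksum_matches are made up for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For the package in this post, the call would be checksum_matches("harvest_updater_161.tar.gz", "1923977dee44366080ca19e724ad4650").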

The updater adds three Grafana dashboards. To use or update them, you'll need to import them manually in the Grafana web GUI:

/opt/netapp-harvest/grafana/db_netapp-detail-nfs-connections.json
/opt/netapp-harvest/grafana/db_netapp-detail-snapmirror.json
/opt/netapp-harvest/grafana/db_netapp-detail-volume-capacity.json
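Note that these are JSON definitions to be imported, not scripts to be executed. The web GUI import is the route described here; as a possible alternative, the sketch below prepares an upload through Grafana's HTTP API (POST /api/dashboards/db). It makes several assumptions: that your Grafana accepts API keys as bearer tokens, and that each file holds either a bare dashboard object or one wrapped under a "dashboard" key. The names build_import_request, grafana_url and api_key are hypothetical.

```python
import json
import urllib.request


def build_import_request(dashboard_path, grafana_url, api_key):
    """Prepare (but do not send) a request that uploads one dashboard file."""
    with open(dashboard_path, encoding="utf-8") as handle:
        document = json.load(handle)
    # Assumption: the file may already wrap the dashboard under "dashboard".
    dashboard = document.get("dashboard", document)
    dashboard["id"] = None  # let Grafana assign or match the internal id
    payload = {"dashboard": dashboard, "overwrite": True}
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )
```

The request would then be sent with urllib.request.urlopen(request). Building it separately makes it easy to inspect the payload first.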

 

Reversing the update: before the updater changes any files, it creates a backup in /opt/netapp-harvest/backup/harvest_updater_16100/, so if something goes wrong, you can reverse the update with:

$ ./harvest_updater --reverse

 

Secondly, many of you are asking about Harvest 2.0 and about replacing Graphite. We are well aware of Graphite's scalability issues, and while we will continue supporting Graphite, the main backend in Harvest 2.0 will most likely be Prometheus. Unfortunately, I can't give an estimated release date, since we are currently trying to get more manpower behind this project, but I can tell you that Harvest 2.0 is our main focus at the moment.

 

Finally, if by any chance you have written a Python module to send performance metrics to Prometheus and you want to contribute to an open-source project (Harvest 2.0 will be on GitHub!), please get in touch with me.

 

Cheers,

 

Vachagan


jakari

And it's expired again.


I'll do the 1.6 update anyway (the 9.7 update broke our old 1.4 install), but either 1.6.1 needs to be rolled up into the normal download package or... 1.7?

 

(I know this is rough, since time is better spent getting 2.0 ready!)

 

Hi,

 

Unfortunately, the link has expired again 🙂

 

Cheers & thx

Florian

Hi Florian,

 

Thanks. I know; I found something new in one of the extensions that I need to fix, and I'll post a new link soon.

Hello,

 

It seems the link has expired again 😞

 

Not sure if it's related to the snapmirror features, but our MetroCluster cannot be graphed by netapp-harvest 1.6.0.

 

Via the OCUM section, we get this in the log file:

 

 

[2020-07-10 10:57:46] [WARNING] [sysinfo] Discovered [SUMMERMCDC1] on OCUM server but unable to submit metrics because no matching conf section found; to collect this cluster please add a section.
[2020-07-10 10:57:46] [WARNING] [sysinfo] Discovered [SUMMERMCDC2] on OCUM server but unable to submit metrics because no matching conf section found; to collect this cluster please add a section.

 

 

Via the dedicated section in the Harvest configuration file, I get:

 

 

[2020-07-10 10:57:46] [NORMAL ] [main] Metrics will be submitted with graphite_root [netapp.perf.***************]

 

 

But there is absolutely nothing in the graphs (no IOPS, no latency, etc.).

 


 

Any ideas?

Thanks,

GS

bkamil

Thanks @vachagan_gratian !

I'm getting these in the _snapmirror_replications.log log:

[2020-03-20 12:42:13,859] [ERROR] [find_missing_nodes] ZAPI request failed: either instances or instance-uuids must be given
[2020-03-20 12:45:56,091] [WARNING] [timeout_handler] Extension timeout exceeded. Terminating process
[2020-03-20 12:46:27,599] [ERROR] [find_missing_nodes] ZAPI request failed: either instances or instance-uuids must be given
[2020-03-20 12:47:26,989] [ERROR] [find_missing_nodes] ZAPI request failed: either instances or instance-uuids must be given

 

Regarding the "python_extension_methods_v12.py" file:

If I do the copy like you suggested, the existing file "python_extension_methods.py" won't get overwritten. I'm not sure if that is desired. I tried both ways, overwriting the existing file and copying as a new one, with the same result.

 

 

 

vachagan_gratian

@bkamil, hmm, this is strange. Which ONTAP release do you use? Also, could you run the extension in verbose mode and share the full logs with me?

 

There is no need to overwrite the extension methods file. The old file is still required by the other extensions, so you should have both python_extension_methods.py and python_extension_methods_v12.py in /opt/netapp-harvest/extension/.

 

I'm getting:

[ERROR] [get_snapmirrors] ZAPI request failed: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618)

 

This is with ONTAP 9.5P10; everything else works with Harvest.

bkamil

One more thought.

How can I check whether Harvest AutoSupport actually works?

My Harvest servers can only access the Internet via a proxy, so I just wanted to make sure you guys will receive the useful info.

 

Appreciate all the great work being done on this project!

 

vachagan_gratian

As far as I know, no internet is required for AutoSupport, so we'll get the stats. Thank you so much 🙂

Cbrown

Awesome!  Thanks for the update! 

 

I had a successful update but I'm getting Permission denied when I update the dashboards.  Any ideas or did I miss a step? 

 

[root@HOST harvest_updater_161]# /opt/netapp-harvest/grafana/db_netapp-detail-nfs-connections.json
-bash: /opt/netapp-harvest/grafana/db_netapp-detail-nfs-connections.json: Permission denied

[root@HOST harvest_updater_161]# /opt/netapp-harvest/grafana/db_netapp-detail-snapmirror.json
-bash: /opt/netapp-harvest/grafana/db_netapp-detail-snapmirror.json: Permission denied

 

I noticed that for the snapmirror metric, it's trying to pull this metric from my Graphite server, but it doesn't exist: netapp.perf.$Group.$Cluster.node.$Node.snapmirror.src.* I have 10 cDOT clusters ranging from 9.3.P6 to 9.5, and that snapmirror metric doesn't exist on any of them.

 

 

 

vachagan_gratian

As for the metrics not existing, have you activated the extension? I would check the logs of the extension (/opt/netapp-harvest/log/CLUSTER_netapp-harvest_snapmirror_replications.log) and, if there are no messages, try running the extension in the foreground in verbose mode:

 

$ ./extension/snapmirror_replications.py -host <HOSTNAME> -user <USERNAME> -pass <PASSWORD> -v

or:

$ ./extension/snapmirror_replications.py -host <HOSTNAME> -auth_type ssl_cert -ssl_cert <SSL_CERT_FILENAME> -ssl_key <SSL_KEY_FILENAME> -v

 

Hi,

 

I fixed most of the things and when running the script with the "-v" switch I get:

Skipping send: no graphite root defined

 

Could that explain why I see no data? Where do I need to define my server?

 

I see in the files there are parameters:

 
REQUIRED:
_HARVEST_HOSTNAME - hostname/IP of ONTAP system
_HARVEST_AUTH_TYPE - either "password" (default) or "ssl_cert"
_HARVEST_CLUSTER - Name of ONTAP Cluster
_HARVEST_GRAPHITE_ROOT - prefix of metric path in Graphite
(e.g.: 'netapp.capacity.GROUP.CLUSTER'),
if prefix is "_", data will be collected
but not sent to Graphite.

 

Do I need to define them in each script? They don't inherit the info from the "old" harvest?

 

Also, do I need to schedule the scripts to run separately, or will they be run automatically by the "old" Harvest? I already ran the Harvest updater script:

./harvest_updater

bugfinder

Basically, I ran the plugins for testing with args like this:

/opt/netapp-harvest/extension/volume_capacity_counters.py -v -cluster mycluster -host myclusterhostname -user myuser -pass 'mypassword' -graphite_root 'netapp.capacity.mygroup.mycluster'

 

If called via the pre-post-exec plugin, these values are passed via the environment, but if you call the extension directly, you have to specify all of them yourself.
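To make that concrete, here is a rough sketch of how a script can take its settings from the command line first and fall back to the _HARVEST_* environment variables. This is not the code of the shipped extensions. The variable and flag names come from the posts above, while the function name resolve_settings and the exact fallback rules (for example, treating a missing graphite root as "collect but don't send") are my assumptions.

```python
import argparse
import os

# Command-line flag name -> environment variable set by Harvest for extensions.
ENV_MAP = {
    "host": "_HARVEST_HOSTNAME",
    "auth_type": "_HARVEST_AUTH_TYPE",
    "cluster": "_HARVEST_CLUSTER",
    "graphite_root": "_HARVEST_GRAPHITE_ROOT",
}


def resolve_settings(argv, environ=None):
    """Prefer explicit flags, then fall back to the environment."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name in ENV_MAP:
        parser.add_argument("-" + name)
    args = parser.parse_args(argv)

    settings = {}
    for name, env_name in ENV_MAP.items():
        value = getattr(args, name)
        settings[name] = value if value is not None else environ.get(env_name)

    if settings["auth_type"] is None:
        settings["auth_type"] = "password"   # documented default
    missing = [n for n in ("host", "cluster") if settings[n] is None]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    # A root of "_" (or, in this sketch, no root at all) means: do not send.
    settings["send_to_graphite"] = settings["graphite_root"] not in (None, "_")
    return settings
```

Called with an empty argument list under Harvest, everything would come from the environment; called by hand, the same script needs -host, -cluster and -graphite_root on the command line.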

israelmmi

what do you mean by the "pre-post-exec plugin"?

I have the general harvest service running, but right now I have no data from these dashboards.

And there is no such log file /opt/netapp-harvest/log/CLUSTER_netapp-harvest_snapmirror_replications.log.

 

Is there a way for me to enable debug permanently so I can see if this is running at all? Or does the fact that there are no debug logs mean that nothing is running?

bugfinder

the basic structure of the setup for the harvest extensions is explained in docs/NetApp_Harvest_Extension_Manager_1.6.pdf 

on page 5 of 9 it talks about the extensions.conf file (probably meant extension.conf, but whatever ...)

 

Basically, at the end of the main config netapp-harvest.conf, in the section for your cluster, you can have a line

"template = default,extensions.conf"

Then in template/extensions.conf you can have the structure as explained in the howto, which uses the pre-post-exec plugin to call scripts from the "command_list" variable and thus collect additional data.

When running the test command I get the following error:


[2020-02-04 10:05:25,763] [ERROR] [connect_zapi] Failed to import NaServer: No module named NaServer
[2020-02-04 10:05:25,763] [ERROR] [connect_zapi] Make sure NetApp SDK python package is available in /opt/netapp-harvest/lib/. Exiting

 

For this extension you need to manually install the NMSDK Python library. Here's how:

  • download NMSDK from here,
  • extract netapp-manageability-sdk-*.zip
  • copy the contents of /netapp-manageability-sdk-*/lib/python/NetApp/ to /opt/netapp-harvest/lib/python/
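After copying, a quick way to confirm that Python can actually find NaServer is a check like the one below. It is just a sketch; the function name find_module_dir is made up, and the directories in the usage line are taken from the error message and the copy step above.

```python
from importlib.machinery import PathFinder


def find_module_dir(module_name, candidate_dirs):
    """Return the first directory that module_name could be imported from."""
    for directory in candidate_dirs:
        # find_spec only looks the module up; it does not import or run it.
        if PathFinder.find_spec(module_name, [directory]) is not None:
            return directory
    return None
```

For example, find_module_dir("NaServer", ["/opt/netapp-harvest/lib", "/opt/netapp-harvest/lib/python"]) returns the directory holding NaServer.py, or None if the copy did not land in either place.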

That helped. Now I get this error:

 

[ERROR] [poll_snapmirrors] ZAPI request failed: Insufficient privileges: user 'netapp-harvest' does not have read access to this resource

 

 

You'll need to extend the user privileges; see the explanation here, under 4.

I've played a bit with the idea of the volume capacity extension. I have to admit my love for Python is not that deep, so I started by rewriting the new volume_capacity_counters.py in Perl. Having done that, I added a little code to add the vol_summary data as well, to get the summary per SVM. And as things were going so nicely, I did a similar thing for aggregates.

And here is a one-liner for netapp-worker to clean up the zombies:

--- a/netapp-worker
+++ b/netapp-worker
@@ -605,6 +605,9 @@ while (1)
     # Check and rotate logfile if needed
     check_and_rotate_logfile();
 
+    # clean up zombies
+    while ( waitpid ( -1, POSIX::WNOHANG ) >0 ) {};
+
     my $sleep_time = ($counter_nextrun - time());
     if ($sleep_time > 0)
     {
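
For anyone more at home in Python, the same non-blocking reaping loop would look roughly like this. It is an illustration of the technique only, not part of netapp-worker, and the function name reap_zombies is made up.

```python
import os


def reap_zombies():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break            # this process has no children at all
        if pid == 0:
            break            # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

Calling this once per pass through the main loop keeps finished extension processes from piling up as zombies, while WNOHANG ensures the poller never waits on a child that is still running.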

 

The Perl variant, volume_capacity_counters.pl, will be attached; you'll probably have to adapt lines 83/84 to use the files from the SDK.

 

bugfinder

Okay, where can I post files as an extension to this?

The terms of use tell me not to attach software files ...

I have extension/volume_capacity_counters.pl and extension/aggregate_capacity_counters.pl.

I'm willing to share them freely, but where should I put them?
