Solved: NetApp-Harvest

Shadowkrusha · ‎2015-11-11

I just wanted to say thank you for making the tool available.

Within a few hours of it collecting data, we already knew more about what the storage was doing than before 🙂

A minor note: the init.d script needs a #!/bin/bash on the first line (at least under RedHat 7.1); otherwise it gets an exec error when the system starts and doesn't start the pollers.
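The fix described above amounts to making sure the script's first line names an interpreter. A minimal sketch, assuming the script contents are already loaded as a string; `ensure_shebang` and the sample script body are hypothetical and not part of Harvest:

```python
def ensure_shebang(script_text: str, interpreter: str = "/bin/bash") -> str:
    """Return script_text with a shebang line prepended if it lacks one."""
    if script_text.startswith("#!"):
        # Already names an interpreter; leave it untouched.
        return script_text
    return f"#!{interpreter}\n{script_text}"


# Hypothetical init script body that is missing its first line.
broken = 'case "$1" in\n  start) echo starting ;;\nesac\n'
fixed = ensure_shebang(broken)
print(fixed.splitlines()[0])
```

Running the function on an already fixed script returns it unchanged, so it is safe to apply more than once.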

Looking further into its capabilities, is there a way to collect SnapMirror statistics?

Iain

madden · ‎2015-11-11

Hi Iain,

Glad you like it! Thanks for the note on the init.d script. The next update will fix this and will also set exit codes so that the script works with tools such as Puppet.

Regarding replication (SnapMirror and SnapVault) I haven't thought too much about adding it into Harvest because Unified Manager does a good job at it already. Did you try UM and find it lacking for this job? If you can give more details on what you're looking for (type and granularity of info, frequency of collection) I can have a think on it.

One customer I was working with wanted to know how much bandwidth each cluster was using on their WAN for replication. Because all remote replication was flowing over the peer link LIFs, we just created some graphs that summed these LIFs across the cluster. It was a 10-minute job to create these, and now he can answer the network team when they come asking! So this is also an idea, depending on your need.
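One way to get a summed graph like the one described above is Graphite's `sumSeries` function, requested through the render API. A minimal sketch, assuming a Graphite server at a placeholder URL and a hypothetical metric glob for the peer link LIFs; the real metric path depends on how LIF metrics are named in your install:

```python
from urllib.parse import urlencode


def replication_bandwidth_url(graphite_base, lif_metric_glob, window="-24h"):
    """Build a Graphite render URL that sums all series matching the glob."""
    # sumSeries adds the matching series point by point; alias gives the
    # resulting line a readable legend name.
    target = f'alias(sumSeries({lif_metric_glob}), "replication_total")'
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{graphite_base.rstrip('/')}/render?{query}"


# Both values below are placeholders, not real Harvest metric paths.
url = replication_bandwidth_url(
    "http://graphite.example.com/",
    "netapp.perf.site1.cluster1.*.lif.intercluster*.sent_data",
)
print(url)
```

The same `sumSeries(...)` expression can be pasted directly into a Grafana panel's query editor instead of being fetched as JSON.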

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

TomMattingly · ‎2016-02-02

Chris,

I just heard about Harvest yesterday while talking to a NetApp tech, but I would like to chime in on the previous comment about snapmirrors. It would be beneficial to have a screen that shows all snapmirrors/snapvaults with their start/stop times, the length of each transfer, and the amount of data transferred, overlaid for comparison with the other snapmirrors (in an easy-to-view format, not the command line).

Basically, a way for an admin to visualize start/stop times and see, for example, how snapmirrors overlap one another at a particular time of day.

Not sure if Harvest is designed for or even targeted at this problem, but it's an issue I deal with when balancing deduplications, snapmirrors and snapvaults across our cDoT controller load. I started by doing something similar with deduplications, hand-building an Excel spreadsheet to display which dedupes run at what time of day and then 'balancing' the load out. I get far fewer performance alerts, but the problem persists, so snapmirrors/snapvaults are what I'll go after next.

Tom

madden · ‎2016-02-03

Hi @TomMattingly

I understand your challenge.

For deduplication, I actually post a boolean for running / stopped from OCUM as this metric:

netapp.capacity.$Group.$Cluster.svm.$SVM.vol.$Volume.dedupe_status

And another metric containing the size of data scanned:

netapp.capacity.$Group.$Cluster.svm.$SVM.vol.$Volume.last_dedupe_scanned_size

If you look at the "Per Volume Capacity Efficiency Drilldown (Must Select Cluster/SVM/Volume)" row on the Grafana Volume dashboard, you will see a graph that shows dedupe job size when active. So you might be able to make your own graph with these metrics to determine concurrency of dedupe jobs. I also have something in the works that provides volume-to-aggr mapping, so then you can see which volumes are on which aggrs and answer a question like "show me the last 24 hrs of node1:aggr1 dedupe jobs" and see which vols were active at what time periods.
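As an illustration of how concurrency could be derived from the `dedupe_status` boolean, here is a minimal sketch. It assumes the samples have already been fetched from Graphite, that a truthy value means "running", and that all volumes share the same timestamps; the volume names and data are made up:

```python
def concurrent_jobs(status_by_volume):
    """Return, for each time step, the list of volumes whose status is truthy.

    status_by_volume maps a volume name to a list of 0/1 samples taken at
    the same timestamps. Missing samples (None) count as not running.
    """
    timeline = []
    for samples in zip(*status_by_volume.values()):
        active = [vol for vol, flag in zip(status_by_volume, samples) if flag]
        timeline.append(active)
    return timeline


# Made-up samples: three volumes, four collection intervals.
sample = {
    "vol_a": [0, 1, 1, 0],
    "vol_b": [0, 0, 1, 1],
    "vol_c": [0, None, 1, 0],
}
timeline = concurrent_jobs(sample)
peak = max(len(active) for active in timeline)
print(timeline)
print("peak concurrent dedupe jobs:", peak)
```

In Graphite itself, summing the boolean series across volumes with `sumSeries` would give the same per-interval count without listing the volume names.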

For replication it gets more complicated because a volume can be the src or dst, can have 1:many relationships, may have relationships created and broken often, etc. Graphite doesn't offer an API to rename or otherwise manage metrics, so I think it could get messy with stale metric names if the naming includes src/dst information. Doing something like the dedupe metrics should be possible though: new metrics as simple booleans, like repl_src_status and repl_dst_status, to track whether an update is active, plus repl_progress to show how much data has been transferred if active.
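To make the naming idea concrete, here is a sketch that formats the three proposed metrics as Graphite plaintext-protocol lines (`path value timestamp`). It only illustrates the idea and is not how Harvest implements it; the prefix, volume names, and the `src_active` / `dst_active` / `bytes_transferred` keys are hypothetical:

```python
import time


def replication_metric_lines(prefix, volume_states, timestamp=None):
    """Format per-volume replication state as Graphite plaintext lines.

    The metric path is keyed on the local volume only, with no peer name,
    so relationships that are created or broken do not leave stale names.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    lines = []
    for volume, state in sorted(volume_states.items()):
        base = f"{prefix}.vol.{volume}"
        lines.append(f"{base}.repl_src_status {int(state['src_active'])} {ts}")
        lines.append(f"{base}.repl_dst_status {int(state['dst_active'])} {ts}")
        lines.append(f"{base}.repl_progress {state['bytes_transferred']} {ts}")
    return lines


# Hypothetical prefix and state for a single volume.
lines = replication_metric_lines(
    "netapp.capacity.site1.cluster1.svm.svm1",
    {"vol_a": {"src_active": True, "dst_active": False, "bytes_transferred": 1048576}},
    timestamp=1454457600,
)
print("\n".join(lines))
```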

Would that be enough info to meet your use case?

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

P.S. Please select “Options” and then “Accept as Solution” if this response answered your question so that others will find it easily!

TomMattingly · ‎2016-02-03

Chris,

So that might work... In the case of the dedupes, because I couldn't find a tool that displays them in GUI form in one place, I created a simple Excel spreadsheet. By looking at the (hard) start times and the (soft, i.e. variable) finish times, I was able to put together a rough idea of when dedupes were occurring. In my case they run every day. The spreadsheet guided me in adjusting the dedupes that were overlapping. By doing this I dramatically reduced the alerts for overloaded disks and processors. And my NetApp was much happier!
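The spreadsheet approach described above boils down to checking which start/finish windows intersect. A minimal sketch of that check, with made-up job names and times; it assumes each job starts and finishes on the same day, so windows crossing midnight are not handled:

```python
from datetime import datetime
from itertools import combinations


def overlapping_jobs(jobs):
    """Return the pairs of job names whose (start, finish) windows overlap.

    jobs maps a job name to a (start, finish) tuple of datetime values.
    Windows that only touch at an endpoint are not counted as overlapping.
    """
    pairs = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
        sorted(jobs.items()), 2
    ):
        if a_start < b_end and b_start < a_end:
            pairs.append((a, b))
    return pairs


def at(hhmm):
    """Parse an HH:MM string into a datetime on a fixed placeholder date."""
    return datetime.strptime(hhmm, "%H:%M")


# Made-up schedule: start time is fixed, finish time is the observed end.
schedule = {
    "vol_a": (at("01:00"), at("02:30")),
    "vol_b": (at("02:00"), at("03:00")),
    "vol_c": (at("03:00"), at("04:00")),
}
print(overlapping_jobs(schedule))
```

The same function works for snapmirror or snapvault windows, since it only needs a name and a start/finish pair per job.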

I was looking for something similar for the SnapMirrors and SnapVaults. In my case, I would see a time for the snapmirror/vault to start and then (at least from what I can see in my PuTTY session) a time that the snapmirror/vault finished. I was hoping that Harvest could display at least a read-only view of the snapmirrors and dedupes so that administrators like me could see where snaps and dedupes overlap. Changing the schedules would be done through System Manager and is outside the scope of Harvest.

Having transfer speed and total length would be nice, but it's not a hard requirement for improving the overlap issue, at least in my case; I mentioned those features as a 'value-add' to the display.

Tom

madden · ‎2016-02-04

Hi @TomMattingly,

Great, so the idea I have seems to be enough. I will add it to the backlog as a new feature in Harvest. I plan to do some updates to the OCUM code soon and can add a snapmirror active/idle boolean and transfer rate while I'm at it. For now, know that dedupe running/stopped and data processed are already in there as mentioned above, so check that out!

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data


rcasero1 · ‎2021-04-14

Hi Chris, hope all is well.

I have a question regarding the Capacity drill down: I'm not seeing the Top Volume Capacity Drill down.

Any help on how to see these stats?

Thank you,

Ralph.
