Active IQ Unified Manager Discussions

Whisper file size

kkusunoki

I completed the setup of Harvest following this guide. (Thank you so much, @madden)

But now I'm facing a new problem with whisper file size.

My storage-schemas.conf is as follows, and I'm collecting data from 6 NetApp HA pairs (metrics: 5~6k/min).

In this case the whisper files are growing by 2 GB per day, so I would have to provision at least 730 GB of fast storage per year, which I think is too much...

Is there any way to reduce the whisper file size, or is there another storage engine I could use, like InfluxDB? (I actually tried the InfluxDB Graphite plugin, but the Grafana dashboards cannot read it.)

Any suggestion is really appreciated.

 

[netapp.capacity]
pattern = ^netapp\.capacity\.*
retentions = 15m:180d, 1d:5y

[netapp.poller.capacity]
pattern = ^netapp\.poller\.capacity\.*
retentions = 15m:180d, 1d:5y

[netapp.perf]
pattern = ^netapp\.perf\.*
retentions = 60s:35d, 5m:180d, 15m:395d, 1h:5y

[netapp.poller.perf]
pattern = ^netapp\.poller\.perf\.*
retentions = 60s:35d, 5m:180d, 15m:395d, 1h:5y

[netapp.perf7]
pattern = ^netapp\.perf7\.*
retentions = 60s:35d, 5m:180d, 15m:395d, 1h:5y

[netapp.poller.perf7]
pattern = ^netapp\.poller\.perf7\.*
retentions = 60s:35d, 5m:180d, 15m:395d, 1h:5y

madden

Hi @kkusunoki,

 

Great to hear you like the solution!

 

Regarding the whisper files: they are 'thick' at the time of creation to support the full configured retention.  So if you see continual growth, that means brand new metrics files are being created.  Usually when you start monitoring a new system you will see a jump in disk used, but after the initial instances are discovered and added it calms down and increases more moderately as metrics files are added for newly created volumes, for example.  You can check creates.log in the Carbon logging directory structure to see which metrics are being created on an average day.  Once you know what they are, you can better understand the growth and whether it is expected.
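
A quick way to eyeball that (the log path below assumes a default /opt/graphite layout and a carbon-cache instance named "a"; adjust for your install):

# See which metric files carbon-cache has created recently
tail -n 50 /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log

# Rough count of creates recorded in the current log (the message wording can vary by carbon version)
grep -c 'creating database' /opt/graphite/storage/log/carbon-cache/carbon-cache-a/creates.log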

 

One downside of Graphite is that there is no built-in way to programmatically purge metrics files.  So if a volume is created you get metrics files for it, but when that volume is deleted those metrics files stay there forever.  Also, if an aggregate moves (for example during the upgrade of a node), it may be discovered as located on the other node, resulting in brand new metrics under that other node that stay there until something deletes them.

 

Some people put a housekeeping job in cron to periodically remove the stale metrics files like this:

# Purge metrics files that are not updated in last 90 days
find /opt/graphite/storage/whisper -type f -mtime +90 -name \*.wsp -delete; find /opt/graphite/storage/whisper -depth -type d -empty -delete
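
For example, as a system cron entry (the schedule, file name, and paths below are just an illustration; adjust them to your install):

# /etc/cron.d/graphite-housekeeping - purge whisper files not updated in 90 days, nightly at 03:30
30 3 * * * root find /opt/graphite/storage/whisper -type f -mtime +90 -name '*.wsp' -delete; find /opt/graphite/storage/whisper -depth -type d -empty -delete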

I also learned that if you store the data on NetApp, the not-yet-filled timestamps in the whisper files don't consume any disk space, so if you use thin provisioning you won't pay for the data storage up front.  So that may be another way to worry less about what looks like high consumption in the Graphite server filesystem.

 

If, after the above, you still want your metrics files to be smaller, you can absolutely change the retention settings in storage-schemas.conf to something shorter.  Using the whisper-info.py utility you can see the space used by each retention 'archive' and understand which one(s) to optimize.  If you do decide to change storage-schemas.conf, it only impacts *new* metrics files; to update existing ones you can use the whisper-resize.py utility.  I explain a bit of its usage in this post, and while the discussion there is about increasing retention, the steps to reduce it are the same.
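
A rough sketch of that workflow (the metric path and the shorter retentions below are only illustrative, not a recommendation):

# Show the archives, and the space each one uses, in an existing metric file
whisper-info.py /opt/graphite/storage/whisper/netapp/perf/cluster1/node1/vol/vol1/total_ops.wsp

# Rewrite one file to a shorter retention (take a backup first; data outside the new retention is discarded)
whisper-resize.py /opt/graphite/storage/whisper/netapp/perf/cluster1/node1/vol/vol1/total_ops.wsp 60s:14d 5m:90d 1h:2y

# Or apply the same resize to every file under a subtree
find /opt/graphite/storage/whisper/netapp/perf -name '*.wsp' -exec whisper-resize.py {} 60s:14d 5m:90d 1h:2y \;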

 

Hope that helps and good luck!

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

 

If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO

kkusunoki

Hi @madden!

 

Thank you so much for the concrete advice!

I noticed that a lot of volumes were created last weekend by another team.

I had misunderstood and thought the whisper files would keep growing just because a lot of metrics arrive every minute.

Now I see that the whisper files don't grow unless new metrics arrive from newly created objects such as volumes.

 

But I have another question.

I'd like to estimate how much space I should provision for whisper data, because it's not easy to expand the Graphite VM's volume in our production environment.

Could you tell me how I should calculate the size of new metrics for a typical case on NetApp?

 

My current assumption, for the case where just one iSCSI volume is newly created on a new SVM, is as follows.

 

- /var/lib/graphite/whisper/netapp/perf/

My current wsp file size is 2207584 bytes (see the sizing check after the example paths below).

One SVM with one volume produces 15 wsp files.

So I should estimate 2207584 * 15 = 33.11376 megabytes for a new SVM and volume.

If 1000 volumes on 1000 SVMs are created per year, I have to prepare almost 33 GB per year.

 

For example:

/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol_summary/other_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol_summary/read_data.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol_summary/total_data.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol_summary/read_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol_summary/total_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/volume01/other_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/volume01/read_data.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/volume01/total_data.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/volume01/read_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/volume01/total_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/svm3345_root/other_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/svm3345_root/read_data.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/svm3345_root/total_data.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/svm3345_root/read_ops.wsp
/var/lib/graphite/whisper/netapp/perf/block-lab2-1/block-lab2-1/svm/svm3345/vol/svm3345_root/total_ops.wsp
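
If I understand the whisper on-disk format correctly (16-byte file header, 12-byte header per archive, 12 bytes per data point), the 2207584-byte figure matches the perf retentions 60s:35d, 5m:180d, 15m:395d, 1h:5y exactly:

# Points per archive: 35d@60s=50400, 180d@5m=51840, 395d@15m=37920, 5y@1h=43800
echo $(( 16 + 4*12 + (50400 + 51840 + 37920 + 43800) * 12 ))   # prints 2207584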

 

- /var/lib/graphite/whisper/netapp/poller/

Recently no new files have been created there, so I have no idea about this folder's size.

In which cases does this folder grow?

madden

Hi @kkusunoki

 

The /var/lib/graphite/whisper/netapp/poller/ directory holds metadata about the poller's own work.  For each poller, and each object in that poller, 3 metrics files are created to track the time spent collecting the data and the number of metrics received.  Those metrics are visible in the default Grafana dashboard 'NetApp Detail: Harvest Poller'.  The metric count under poller is not affected by the number of nodes, SVMs, volumes, etc.
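
If you want to confirm how small that tree stays, something like this will show its on-disk size and file count (path taken from your example above):

# Total size and number of whisper files in the poller metadata tree
du -sh /var/lib/graphite/whisper/netapp/poller/
find /var/lib/graphite/whisper/netapp/poller/ -name '*.wsp' | wc -l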

 

Hope that helps!

 

Cheers,
Chris Madden

Storage Architect, NetApp EMEA (and author of Harvest)

Blog: It all begins with data

 

If this post resolved your issue, please help others by selecting ACCEPT AS SOLUTION or adding a KUDO

 

kkusunoki

Hi @madden,

 

Thank you so much for educating me about the poller's size.

I understand now that I only need to calculate the increase under "perf" to forecast the necessary whisper file size.

 

By the way, what do you think of my estimation for the "perf" part?

I'd appreciate it if you could point out anything I'm missing.
