Hello!
Are there any OnCommand tools that can track Flash Cache hit percentage over a period of time?
Unified Manager and Balance don't seem to cover this, and I'd like to compare different Flash Cache settings in order to choose the most efficient one.
Thanks,
Igor
Did you ever get an answer for this? I would like to be able to do this as well. I have not been able to find anything to do this. Please let me know either way. Thanks
The tool you are looking for is the Automated Workload Analyzer (AWA), and it's built into Data ONTAP for both cDOT and 7-Mode. The general process is that you start the tool, run some workloads, then stop it and print the results (somewhat like a perfstat).
Data Ontap is already categorizing IO loads and collecting details about them. What AWA does is organize which IO loads would have been served via FlashCache or FlashPool as compared to the overall load. The summary output then identifies, based on the cache hit ratio desired, how much cache would be needed to get that ratio. If you already have some Flash (cache or pool) available, the AWA will also tell you how well the current Flash is doing for the workload being analyzed.
AWA up to DOT 8.3 is aggregate based: a given aggregate is run through the process and the results are reported for that aggregate. AWA in cDOT 8.3.1 breaks that down to the volume level. Since you can tune how each volume is cached (random read/write versus sequential read/write in various combinations), the data lets you finely tune which volumes in an aggregate use the various cache options (FlashCache/Pool) compared to others, making best use of your Flash even if the workloads aren't optimized per aggregate.
I've used the older AWA a lot to gather details that demonstrate what I have seen in other performance reports: my available cache is way undersized for my workloads. With my recent deployment of cDOT 8.3.1, I'm looking forward to rerunning AWA collections to get volume-level statistics ahead of budget planning, to make sure I don't overbuy where I don't need it.
More AWA information is available in the standard DOT documentation - it's not a hidden feature or anything like that. In "real" 7-Mode back in the day I believe it was still present, but only as a hidden or diagnostic access option. Also, there are some bugs written against AWA that are tied to filer disruption. I suggest using the support site to search on "AWA" and verify that the conditions associated with those bugs don't apply to you, or that you are on a fixed version of DOT, just to be safe.
Hope this helps you.
Bob Greenwald
Lead Storage Engineer
Huron Legal | Huron Consulting Group
NCIE - SAN, Data Protection
Hello Bob,
Thanks for the great response, you've clearly given this some thought. I agree, AWA is a useful tool in this respect. It allows you to monitor specific aggregates for caching performance, but it only churns out average stats for the monitored period.
The only alternative appeared quite recently. Chris Madden created NetApp Harvester, which can be used with the Grafana plugin for Graphite to create very nice dashboards. It's all open source and relatively easy to set up, you can run it on Ubuntu, and it produces very detailed graphs.
Installation guide is here:
Among other things, it also tracks Flash Cache statistics. But this time - on controller level, not aggregate.
Here's an example of what it looks like:
You can pick out whichever graph you want, choose which data series to highlight, zoom in / pick a time period, etc and there's more. Nice piece of work!
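For anyone curious what ends up in Graphite, here is a minimal sketch of the Graphite plaintext format, where each data point is one line of "path value timestamp". This is not Harvester's code, and the metric path, values and timestamps below are made up for illustration; Harvester uses its own naming scheme.

```python
def graphite_lines(metric_path, samples, start_ts, interval_s):
    """Format samples as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    return [
        f"{metric_path} {value} {start_ts + i * interval_s}"
        for i, value in enumerate(samples)
    ]


# Hypothetical metric path, values and epoch timestamps, for illustration only.
lines = graphite_lines(
    "netapp.cluster1.node1.flash_cache.hit_percent",
    [23, 6, 21],
    start_ts=1445402640,
    interval_s=2,
)
for line in lines:
    print(line)
```

Because every point carries its own timestamp, Grafana can plot the hit percentage over any period you pick instead of showing a single average.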
Regards,
Igor
I am also planning to add the Graphite/Grafana solution to my monitoring toolkit, but my need is more for predictive analysis where I don't have enough FC/FP, and balancing out per volume when I do.
Nice thing is that there are a lot of tools that reach into the same set of counters. For that matter, in a pinch you could go old school and get counter data directly from the CLI as in...
cluster1::statistics*> show-periodic -object ext_cache_obj -instance ec0 -counter hit_percent -interval 2 -iterations 5
cluster1: ext_cache_obj.ec0: 10/21/2015 04:44:00
hit
percent
-------
23%
6%
21%
22%
22%
cluster1: ext_cache_obj.ec0: 10/21/2015 04:44:10
hit
percent
-------
Minimums:
6%
Averages for 5 samples:
18%
Maximums:
23%
I've been digging into the statistics functionality more of late, especially for quick-hit scenarios such as capturing key performance data during a workload test that doesn't need history or fancy graphing. I've been slowly building up a library of statistics command templates that cover the essentials for when I need maybe 10-15 minutes of data once in a while.
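If you save that kind of show-periodic output to a text file, a few lines of Python can pull the samples back out for comparison between runs. This is only a rough sketch: it assumes a single counter whose samples appear one per line as "NN%", exactly as in the capture above, and the function name is just something I made up.

```python
import re
from statistics import mean

# Captured CLI output, copied from the show-periodic example above.
SAMPLE_OUTPUT = """\
cluster1: ext_cache_obj.ec0: 10/21/2015 04:44:00
hit
percent
-------
23%
6%
21%
22%
22%
cluster1: ext_cache_obj.ec0: 10/21/2015 04:44:10
hit
percent
-------
Minimums:
6%
Averages for 5 samples:
18%
Maximums:
23%
"""


def parse_hit_percent(text):
    """Return the per-interval samples, ignoring the CLI's own summary block."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Minimums"):
            break  # everything from here on is the min/avg/max summary
        match = re.fullmatch(r"(\d+)%", line)
        if match:
            samples.append(int(match.group(1)))
    return samples


samples = parse_hit_percent(SAMPLE_OUTPUT)
summary = {"min": min(samples), "avg": mean(samples), "max": max(samples)}
print(samples)   # [23, 6, 21, 22, 22]
print(summary)
```

Note that the computed average is 18.8 while the CLI prints 18%, so the CLI appears to drop the fraction. Keeping the raw samples is what lets you compare different Flash Cache settings run against run, rather than relying on one averaged number.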
Hope this helps you.
Bob Greenwald
Lead Storage Engineer
Huron Legal | Huron Consulting Group
NCIE - SAN, Data Protection