I'm after a way to monitor the IO of individual files so I can see which are the busiest VMs on the cluster from a storage point of view. We have VMware running over NFS and Hyper-V running over SMB3 at my company. As these are both file protocols, I was wondering if there is a file-IO counter that I could query using Harvest or similar software.
I've trialled Insight as it has that functionality, but it's so expensive that I can't justify it for this one area I want to monitor. It also doesn't monitor Ethernet switches, which is all we run. I've seen that Tintri can monitor VMs, as the data is already there if your hypervisor is connecting via a file protocol.
Does anyone know if this is possible? I can't seem to see a file-IO counter, but I thought I'd check as this would be really handy for me.
OCI (OnCommand Insight) is the packaged product from NetApp for your requirement. If you are looking to build something yourself, you could use Harvest for the NetApp storage, plus some other data collector for VMware and Hyper-V that sends to Graphite, with dashboards in Grafana. If you google a bit you will find collectors for those use cases; I wish I could point you to the 'best' ones, but I always get distracted when I go looking!
If you had a smaller number of VMs, you could also enable QoS on them (no limit needed), which causes a workload object to be created for each one and a variety of stats to be collected. If you use Harvest, look at the volume detail page in Grafana, then at the panels in the rows with QoS in the title, to see what you get. I would caution, though, that if you try this on thousands of VMs the amount of data could be overwhelming. Also, with this approach you get no other info on the VM, like CPU or memory utilization, which you could get if you collect data from the hypervisor layer.
In 7-Mode we had a feature that showed "top clients" and "top files", and I expect this will come to cDOT as well. But it is more of a live list of "what is hammering my system right now" than something you collect and store in a DB. Because a cluster can store billions of files, tracking IO load on all files all the time would be too expensive. This is why tracking workloads explicitly (like the QoS approach mentioned) or statistically (a changing "top" list) has been our strategy.
For your use case I would see if you can find a collector on the internet (or build one!) that sends to Graphite and then build a dashboard in Grafana to visualize it all.
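To make the collector idea concrete, here is a minimal sketch of the sending side in Python, using Graphite's plaintext line protocol (one `path value timestamp` line per metric over TCP, port 2003 by default). The hostname, metric paths, and IOPS values are all hypothetical placeholders; the real numbers would come from whatever hypervisor or ONTAP stats your collector gathers.

```python
import socket
import time

GRAPHITE_HOST = "graphite.example.com"  # assumption: your carbon daemon's host
GRAPHITE_PORT = 2003                    # carbon's default plaintext port

def format_metric(path, value, timestamp):
    """Render one metric in Graphite's plaintext line protocol."""
    return f"{path} {value} {int(timestamp)}\n"

def send_metrics(metrics, host=GRAPHITE_HOST, port=GRAPHITE_PORT):
    """Send a batch of (path, value) pairs to carbon over one TCP connection."""
    now = time.time()
    payload = "".join(format_metric(path, value, now) for path, value in metrics)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))

# Example usage (requires a reachable carbon daemon); metric names are made up:
# send_metrics([
#     ("vmware.cluster1.vm42.read_iops", 1250),
#     ("vmware.cluster1.vm42.write_iops", 310),
# ])
```

Once the data lands in Graphite, a Grafana dashboard on top of it is just a matter of pointing panels at those metric paths.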
Cheers, Chris Madden
Storage Architect, NetApp EMEA (and author of Harvest)