I was wondering if anyone could save the day for me here. We have a VMware environment where we mount FlexVol volumes on a NetApp filer via NFS. The VMs are all contained in one large volume that is deduplicated to save space and make better use of the Flash Pools we have fronting the aggregates. I've put some specs for each piece of the environment below.
Basically, we're getting periods on our production filer heads where they struggle because the CPU gets maxed out trying to deal with all the reads and writes coming in. Using a perfstat, I've been able to identify that one of our VM datastores is reading and writing massive amounts of data for short periods at random points of the day. As the FlexVol volume that acts as the datastore hosts many VMs, I can't identify which VMs are actually causing the load. Does anyone have any ideas on what software could do this, or how they have tackled this issue if they've encountered it? We have OnCommand Balance, but the monitoring interval seems stuck at one hour, so short bursts of throughput or IO just get lost when the data is summarized. I've trialed OnCommand Insight in the past, which was quite useful, but the last quote we got for it was ridiculously expensive.
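To show what I mean about bursts getting lost in the hourly summary, here's a rough Python sketch. All the numbers are made up for illustration (a 500 IOPS baseline, one 60-second burst at 20000 IOPS, and a 20-second interval picked as an example of a finer sampling rate); none of this comes from our actual perfstat output.

```python
def interval_means(samples, interval):
    """Average consecutive per-second samples into fixed-size intervals."""
    return [
        sum(samples[i:i + interval]) / len(samples[i:i + interval])
        for i in range(0, len(samples), interval)
    ]

# Hypothetical data: one hour of per-second IOPS for a single datastore,
# idling at 500 IOPS with one 60-second burst at 20000 IOPS.
samples = [500] * 3600
for second in range(1800, 1860):
    samples[second] = 20000

hourly = interval_means(samples, 3600)  # one data point for the whole hour
short = interval_means(samples, 20)     # 180 data points, one per 20 seconds

print(f"hourly average:    {hourly[0]:.0f} IOPS")    # prints 825
print(f"peak 20 s average: {max(short):.0f} IOPS")   # prints 20000
```

With an hourly data point the burst shows up as a mild bump (825 IOPS), while a 20-second interval shows the full 20000 IOPS. That's why I need something that samples per VM at a much finer interval than Balance is giving me.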
Cluster of 4 x FAS3250
3 of these are the production nodes with SAS disks attached
Each head has 1 TB of Flash Pool on it
Each head has one 50 TB aggregate and hosts a variety of applications such as SQL, VMware, Hyper-V, etc.
Each head has multiple datastores on it that host around 100 VMs each. Some datastores are offset to account for misalignment of the VMs in them
Our VMware hosts are running ESX version 5.1
If anyone has any ideas on this it would be great. I do realise I'm clutching at straws though.