Is there a good starters guide to monitoring


We use a FAS2040 to provide SAN datastores to our small vCenter environment of three vSphere 4.1 ESX/ESXi hosts (running on HP DL380s). The Virtual Storage Console (VSC) is installed in vCenter.


In the last few weeks we have started seeing very high disk latency (thousands of ms) on one of these datastores, which causes certain VMs to occasionally grind to a halt.


Is there a good introduction for new users on how to use the NetApp tools to track down issues like this? I have no idea where to start with the massive pile of documentation available.


Alec Keeler, Release Engineer

Re: Is there a good starters guide to monitoring



This is not 100% the answer to your question, but maybe it will help you a little:


(I am not a NetApp expert, but I have had similar problems in the past.)


Since you use VMware, are you perhaps using VMware-snapshot-based backup software such as Veeam? In my system, this caused high latency when Veeam took its snapshots shortly after midnight.


The best thing you can do, if you have a maintenance contract, is to open a case with NetApp!


If not:

I would check whether the times of high latency correlate with backups or other regularly scheduled events. Do they occur around midnight, when deduplication and/or reallocation are running? These jobs put additional load on the filer and its disks, too.
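To make that correlation check concrete, here is a minimal Python sketch. It assumes you have already exported latency samples as (timestamp, latency in ms) pairs, for example from the vCenter performance charts. The function name, the 1000 ms threshold, and the sample data are all made up for illustration.

```python
from collections import Counter
from datetime import datetime


def spikes_by_hour(samples, threshold_ms=1000.0):
    """Count samples at or above threshold_ms, grouped by hour of day."""
    counts = Counter()
    for timestamp, latency_ms in samples:
        if latency_ms >= threshold_ms:
            counts[timestamp.hour] += 1
    return counts


# Hypothetical samples: (timestamp, datastore latency in ms).
samples = [
    (datetime(2011, 5, 2, 0, 5), 2300.0),
    (datetime(2011, 5, 2, 0, 20), 1800.0),
    (datetime(2011, 5, 2, 14, 0), 12.0),
    (datetime(2011, 5, 3, 0, 10), 3100.0),
]

for hour, count in sorted(spikes_by_hour(samples).items()):
    print(f"{hour:02d}:00  {count} spike(s)")
```

If most spikes land in the same hour on different days, compare that hour with your backup, deduplication, and reallocation schedules.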

I would also check the CPU load during periods of high latency.

Furthermore, I would look for back-to-back consistency points (b2b CPs) with the command "sysstat -x 1". The 1 is the interval in seconds: every second (or every 5, 10, ... seconds if you use another number) you get a line with the current CPU load and other interesting figures.
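If you capture the sysstat output over a longer period, counting the back-to-back CPs can be scripted. The Python sketch below is only an illustration. It assumes you have already pulled the values of the "CP ty" column into a list, and that a value starting with "B" or "b" marks a back-to-back (or deferred back-to-back) CP. That is how I understand the sysstat documentation, so please verify it against the man page for your Data ONTAP release. The function name and the sample values are hypothetical.

```python
def count_b2b(cp_types):
    """Count intervals whose CP type starts with 'B' or 'b'.

    Assumption: 'B' marks a back-to-back CP and 'b' a deferred
    back-to-back CP in the "CP ty" column of sysstat -x.
    """
    return sum(1 for cp in cp_types if cp[:1] in ("B", "b"))


# Hypothetical "CP ty" values, one per sysstat interval.
cp_types = ["T", "-", "B", "b", "F", "B"]

b2b = count_b2b(cp_types)
share = b2b / len(cp_types)
print(f"{b2b} of {len(cp_types)} intervals ({share:.0%}) show b2b CPs")
```

An occasional b2b CP is probably nothing to worry about. A large share of them during the latency spikes would point to the filer not being able to flush writes to disk fast enough.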


There is a tool called "perfstat" that collects performance information, but I guess you will not be able to interpret the resulting file yourself. The NetApp support team can do that, if you have a maintenance subscription.

"To make running the perfstat easier, here is a link to a GUI tool: . Download it to its own folder. Then open the GUI and follow the instructions. You will need to download the perfstat tool and an OpenSSH tool with the links provided in the tool and put them in the same folder as the GUI. Then you go through the drop-down menu steps one by one. Afterwards, in the perfstat tool, go down to the Run Perfstat 7 with All Options page. Fill out the top two boxes of both columns. Then check the box listed as Force All Iterations to Run. This should force all of the required iterations to complete. Set the length of the iterations to 5 and the number of iterations to 6. Then run the perfstat. The perfstat will run and the output file will be placed in the folder where the GUI is installed. Then you can upload the file to NetApp using the GUI or following this article:"


"The 7200 RPM drives in a FAS2040 are still going to be a possible bottleneck when working with VMs and with this type of filer. This system has a small NVRAM and it is split up into buffers. The filer has two buffers for accepting and logging write data. With a filer that has a small amount of NVRAM, the buffers will fill up quickly and with the slower disks, you may still hit the b2b CPs that I mentioned in an earlier email, during high load times. That is because it will take longer to write things to the slower disks, even when they are working optimally, and since the 2nd buffer is not that large, it will fill up pretty quickly when there is a lot of work coming into the filer."


As I had to learn, the FAS2040 with 7.2k drives is limited in performance.

Adding DS14mk4 shelves with 15k FC drives helped in my case, but I am still not really happy with the speed of my filer: not enough memory, not enough NVRAM, not enough CPU power, and slow 7.2k SATA disks with approx. 80 IOPS each. The 15k FC disks are a lot faster (approx. 180 IOPS each) and can be attached to the FAS2040 if you still have FC ports available. 15k SAS drives could be used too, but they are normally more expensive and smaller in capacity.
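To give a feeling for what those per-disk numbers mean, here is a rough back-of-the-envelope calculation in Python. It uses the approximate 80 and 180 IOPS per disk from above. The count of 10 data disks is just an example I picked, and the calculation ignores caching, RAID overhead, and the NVRAM effects described in the quote, so treat the result as an order of magnitude only.

```python
def raw_disk_iops(data_disks, iops_per_disk):
    """Rough upper bound: spindle count times per-disk IOPS."""
    return data_disks * iops_per_disk


SATA_IOPS = 80    # approx. per 7.2k SATA disk (figure from the post)
FC_IOPS = 180     # approx. per 15k FC disk (figure from the post)
DATA_DISKS = 10   # hypothetical number of data disks in the aggregate

sata_total = raw_disk_iops(DATA_DISKS, SATA_IOPS)
fc_total = raw_disk_iops(DATA_DISKS, FC_IOPS)

print(f"7.2k SATA: about {sata_total} IOPS")
print(f"15k FC:    about {fc_total} IOPS")
print(f"Ratio:     {fc_total / sata_total:.2f}x")
```

With the same number of spindles, the 15k disks deliver a bit more than twice the raw IOPS, which matches my experience that they helped but did not solve everything.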