2012-02-20 06:05 AM - edited 2015-12-18 01:21 AM
I am seeing latency on my FAS2040 at certain times of day. I noticed from Ops Manager that the average latency figures were starting to increase, so I did a bit of digging and have tracked it down to one specific LUN.
The LUN in question contains a number of SQL databases on a clustered instance, one of which is 700GB in size. The DBAs run a DBCC check on the LUN every night around 3am. During this time the latency shoots up to 65ms and does not drop much below 45ms for the entire duration. They also run a backup prior to this, during which it climbs to around 35ms. The storage system's total disk and network throughput also mirrors the graph below, with an average of 60-90 MB per sec. They also run DBCC checks on other LUNs during this time, but the figures are generally a lot better (although there is the odd spike here and there).
The graph below is from Performance Advisor and shows the latency spike:
The aggregate in question has around 50 volumes (and LUNs) on it, but they are all fairly small in comparison to this one, and this is by far the busiest LUN. The aggregate comprises 25 disks and is assigned to one controller in the pair. The controllers are connected to a 4Gb Fibre Channel fabric.
Should I be concerned about these figures? Is there anything I can do to track down further where the bottleneck may be?
I know I can open a support call, but as I am relatively new to SAN performance monitoring I thought I would throw a question at the community first.
2012-02-21 05:58 PM
It's pretty clear in your graph that the disks in your aggregate are sweating pretty hard. DBCC integrity checks are known to hammer the disks they are on. It would not hurt to investigate controller CPU and check the latency on the LUNs in the other aggregates on your controllers to confirm they are not performing badly as well. However, I'm fairly confident you are having I/O pressure on that aggregate.
Here are some ideas you can think about to perhaps relieve some of the pressure:
1) Stagger your DBCC checks at different times of the night. Better yet, alternate database sets on different days.
2) Split your databases among separate aggregates. The more spindles you get underneath your databases during these operations, the better.
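To make idea 1 a bit more concrete, here is a rough Python sketch of one way to spread the checks out. It is only an illustration: the database names, the choice of nights, the 01:00 start and the 90 minute gap are all made-up placeholders, and the function `stagger_checks` is hypothetical, not part of any NetApp or SQL Server tooling. You would still create the actual jobs in whatever scheduler your DBAs already use.

```python
from datetime import datetime, timedelta


def stagger_checks(databases, nights, first_start="01:00", gap_minutes=90):
    """Assign databases to nights round-robin, then space out the
    start times of the checks that land on the same night."""
    schedule = {night: [] for night in nights}
    base = datetime.strptime(first_start, "%H:%M")
    for index, db in enumerate(databases):
        night = nights[index % len(nights)]
        # Each extra job on the same night starts gap_minutes later.
        slot = len(schedule[night])
        start = base + timedelta(minutes=slot * gap_minutes)
        schedule[night].append((start.strftime("%H:%M"), db))
    return schedule


# Hypothetical inputs: replace with your own databases and maintenance nights.
databases = ["big_700gb_db", "db_a", "db_b", "db_c", "db_d"]
nights = ["Mon", "Wed", "Fri"]

plan = stagger_checks(databases, nights)
for night, jobs in plan.items():
    for start, db in jobs:
        print(night, start, db)
```

Listing the largest database first keeps it at the front of the rotation, and you can widen the gap or add more nights until the checks stop overlapping in your latency graphs.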
What we do is SnapMirror our database LUNs to our DR site on a separate pair of controllers. We then create and mount FlexClones of our SQL data on a SQL Server down in our DR site and have it perform the DBCC integrity checks. It requires some extra hardware you may not have, but if you do have it handy, offloading your integrity checks would be ideal.