Hi there,
Go easy on me, as this is my first post (:
We have a NetApp FAS2020 that has delivered disappointing performance since day one, especially for our Exchange server. Users frequently see Outlook pop-up messages saying it is waiting for the Exchange server. I ran the Performance Troubleshooter in the Exchange Toolbox, and it reports high disk latency on both the OS and data drives.
In addition, I am using LogicMonitor to monitor most of our infrastructure, and it shows similar performance issues, not only on the Exchange server but also on the NetApp volume where ALL of the VMs are stored. Both Exchange and LogicMonitor typically report latencies of 50-80 ms, and occasionally upwards of 110 ms.
I have built a fresh Exchange 2010 server and migrated everyone to it from Exchange 2007, but the issues remain.
Our NetApp vendor/consulting company came in, spent an entire day here, and found nothing out of the ordinary. They see the latency spikes but have no idea what is causing them or what to do about them.
I suspect that our choice of all-SATA drives in the NetApp might be part of the problem. We chose SATA for the same reasons everyone does: cheaper, larger, etc.
I am looking at several scenarios to boost performance, and most of them involve migrating to faster drives.
I would appreciate comments on the options listed below.
Current config and details:
NetApp FAS2020, dual controllers, active-active config, e0a for CIFS shares, e0b for NFS to VMware, 12x 500 GB 7200 RPM SATA drives, plus a DS14MK2-AT disk shelf with 14x 500 GB X267A-R5 SATA drives.
The NetApp is configured with all available disks in a single aggregate.
VMware ESXi 4.1, 3-node HA cluster, ~18 VMs including 1 Exchange 2010 server and the soon-to-be-retired Exchange 2007 server. The ESXi hosts are Dell 2950 IIIs with 8 CPU cores each and 16/32 GB of RAM. They show plenty of available resources, so I don't think the servers are the problem.
The VMware hosts have 5 NICs configured: 2 to the internal network, 2 to the isolated SAN network switches (a stacked pair of Dell 5524 managed switches), and 1 for vMotion.
~1.5 TB of CIFS shares and ~1 TB of VMs.
Exchange hosts around 125 mailboxes, which is not much of a load in my opinion.
Options:
Option A: Replace the external SATA shelf with a pair of DS14MK2 Fibre Channel shelves with 14x 300 GB 10K X276A-R5 drives.
Option B: Replace all internal 500 GB SATA drives with 15K SAS drives.
Option C: Both.
Options A and B would cost about the same; Option C obviously costs roughly double.
All shelves and drives would be purchased used. Our filer is new, but the SATA shelf is used, and yes, we understand the issues with used hardware and are fine with the risks.
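To put rough numbers on these options, the sketch below sums per-spindle random IOPS for each configuration. The per-disk figures (about 75 IOPS for 7200 RPM SATA, 130 for 10K FC, 175 for 15K SAS) are common rules of thumb, not NetApp specifications, and reading Option A as two shelves of 14 drives each is an assumption. It ignores RAID-DP parity disks, spares, controller disk ownership, and caching, so treat the output only as a relative comparison.

```python
# Rough aggregate random-IOPS estimate per configuration.
# ASSUMPTION: per-spindle IOPS values are rules of thumb, not vendor specs.
PER_DISK_IOPS = {
    "sata_7200": 75,
    "fc_10k": 130,
    "sas_15k": 175,
}

# ASSUMPTION: Option A = two DS14MK2 shelves x 14 drives = 28 FC drives.
CONFIGS = {
    "current": {"sata_7200": 12 + 14},                # internal + external shelf
    "option_a": {"sata_7200": 12, "fc_10k": 28},      # swap the external shelf
    "option_b": {"sas_15k": 12, "sata_7200": 14},     # swap the internal drives
    "option_c": {"sas_15k": 12, "fc_10k": 28},        # both
}


def estimate_iops(disks: dict[str, int]) -> int:
    """Sum per-spindle IOPS; ignores parity, spares, cache and controller split."""
    return sum(PER_DISK_IOPS[kind] * count for kind, count in disks.items())


if __name__ == "__main__":
    for name, disks in CONFIGS.items():
        print(f"{name:9s} ~{estimate_iops(disks)} IOPS")
```

Under these assumptions, the current 26 SATA spindles come to roughly 1,950 IOPS, Option A to about 4,540, Option B to about 3,150, and Option C to about 5,740.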
Are there any other ideas or suggestions on what to do here?
Is the general consensus that I have too much going on for basic SATA drives to handle, or should I be looking at something else?
Thanks in advance for any and all help!
Here is an example of an error from a LogicMonitor alert:
Subject: LMD768 error Write latency on usnetapp volume esx_os is 108.28 ms
The NetApp usnetapp, volume esx_os , has a latency of 108.28 ms for write requests, putting it in a state of error.
Thresholds are >= 30 40.
This state has existed since 2011-08-12 12:28:06 EST - or for 0h 14m.
Latency is usually caused by either high CPU load, or disks that are too busy.
If this data is performance sensitive, either:
- add more or faster drives
- replace the filer heads with faster models
- migrate some data to a different system or different aggregate
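The alert's "Thresholds are >= 30 40" line can be read as a two-level rule. The sketch below assumes, without confirmation from LogicMonitor's documentation, that the first number is the warning level and the second the error level, both in milliseconds; the names `WARN_MS`, `ERROR_MS`, and `latency_state` are hypothetical.

```python
# ASSUMPTION: ">= 30 40" means warning at >= 30 ms and error at >= 40 ms.
WARN_MS = 30.0
ERROR_MS = 40.0


def latency_state(latency_ms: float) -> str:
    """Map a write-latency sample (ms) to an alert state under the assumed thresholds."""
    if latency_ms >= ERROR_MS:
        return "error"
    if latency_ms >= WARN_MS:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # The alerted value, the typical 50-80 ms range, and two lower samples.
    for sample in (108.28, 80.0, 50.0, 35.0, 12.0):
        print(sample, latency_state(sample))
```

Under that reading, even the typical 50-80 ms figures would already sit in the error state, which suggests sustained latency rather than only occasional spikes.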