
FAS2020: upgrade internal disks and/or external shelves to increase performance?

relay7000

Hi there,

Go easy on me as this is my first post (:

We have a NetApp FAS2020 that has provided disappointing performance since day one, especially with our Exchange server. Users frequently see pop-up messages from Outlook indicating that it is waiting for the Exchange server. In the Exchange toolbox, I have run the Performance Troubleshooter, and it tells me that both the OS and data drives are seeing high drive latency.

In addition, I am using LogicMonitor to monitor most of our infrastructure, and it corroborates similar performance issues, not only with the Exchange server but also with the NetApp volume where ALL of the VMs are stored. Exchange and LogicMonitor typically show latencies of 50-80 ms and occasionally upwards of 110 ms.

I have built a fresh Exchange 2010 server and migrated everyone to the new server from Exchange 2007 and the issues remain.

I have had our NetApp vendor / consulting company come in and spend an entire day here, and they discovered nothing out of the ordinary. They see the spikes but have no idea what is causing them or what to do about them.

I am thinking that perhaps our choice of all SATA drives in the NetApp might be part of the problem. We chose SATA for all the same reasons folks choose SATA drives: cheaper, larger, etc.

There are several scenarios that I am looking at to boost performance, and most of them involve migrating to faster drives.

I would appreciate comments from folks on several options listed below.

Current config and details:

     NetApp FAS2020, dual controllers, Active-Active config, e0a-CIFS shares, e0b-NFS for VMware, 12x 500GB 7200 RPM SATA drives, Disk Shelf DS14MK2-AT w/14x 500GB X267A-R5 SATA.

     NetApp is configured with all available disks in 1 aggregate.

     VMware ESXi v. 4.1, 3 node cluster in HA,  ~18 VMs including 1 Exchange 2010 server and soon to be retired Exchange 2007 server.  ESXi servers are Dell 2950 III with 8 cores of CPU each and 16/32 GB of RAM.  Plenty of resources showing available so I don't think it's the servers.

     VMware servers have 5 NICs configured: 2 to the internal network, 2 to isolated SAN network switches (pair of Dell 5524 managed switches, stacked) and 1 NIC for vMotion.

     ~1.5 TB of CIFS shares and ~1 TB of VMs.

     Exchange is configured for around 125 mailboxes.  Not much of a load in my opinion.

Options:

     Option A  Replace external SATA shelf with a pair of DS14MK2 w/14x 300GB 10K X276A-R5 Fibre Channel shelves.

     Option B  Replace all internal 500GB SATA drives with 15K SAS drives.

     Option C  Both. 

Options A and B will cost about the same. Option C would obviously cost double.

All shelves and drives would be purchased used. Our filer is new but the SATA shelf is used and yes, we understand all the issues with used equipment and are fine with the risks.

Are there any other ideas or suggestions on what to do here?

Is the general agreement that I might have too much going on for basic SATA drives to handle or should I be looking at something else?

Thanks in advance for any and all help given!!!!!

    

Here is an example of an error from a LogicMonitor alert:

Subject: LMD768 error Write latency on usnetapp volume esx_os is 108.28 ms

The NetApp usnetapp, volume esx_os , has a latency of 108.28 ms for write requests, putting it in a state of error.

Thresholds are >= 30 40.

This state has existed since 2011-08-12 12:28:06 EST - or for 0h 14m.

Latency is usually caused by either high CPU load, or disks that are too busy.

If this data is performance sensitive, either:

- add more or faster drives

- replace the filer heads with faster models

- migrate some data to a different system or different aggregate


ajeffrey

Hi There,

Just to be thorough (and your vendor hopefully checked this): if you have a recent version of ONTAP, you can issue an nfsstat and the last part of the output will show you something like this:

<snip>

Misaligned Read request stats

BIN-0    BIN-1    BIN-2    BIN-3    BIN-4    BIN-5    BIN-6    BIN-7

0        0        0        0        0        0        0        0

Misaligned Write request stats

BIN-0    BIN-1    BIN-2    BIN-3    BIN-4    BIN-5    BIN-6    BIN-7

0        0        0        0        0        0        0        0

</snip>

Hopefully you will have a clean alignment report. (If you don't, you can check some of Vaughn Stewart's blogs about the importance of alignment to get some context.)

If you are on an older release of ONTAP, you can always check for alignment problems with the mbralign tool or the latest VSC vCenter plugin. The only reason I ask is that you said this was an upgrade of an older Exchange server. Alignment problems will not get created with Windows Server 2008, but they also will not be fixed in an upgrade scenario.
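For context on what "aligned" means here, a minimal sketch in Python, assuming 512-byte logical sectors and the 4 KB block size that WAFL uses. The function name is made up for illustration; this is not what mbralign or the VSC plugin actually run, just the arithmetic behind the check.

```python
SECTOR_BYTES = 512        # assumed logical sector size
WAFL_BLOCK_BYTES = 4096   # WAFL stores data in 4 KB blocks


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES, block_bytes=WAFL_BLOCK_BYTES):
    """Return True if a partition starting at start_sector begins on a 4 KB boundary."""
    return (start_sector * sector_bytes) % block_bytes == 0


if __name__ == "__main__":
    # Older Windows releases started the first partition at sector 63.
    print(is_aligned(63))     # False: 63 * 512 = 32256, not a multiple of 4096
    # Windows Server 2008 starts the first partition at 1 MiB (sector 2048).
    print(is_aligned(2048))   # True: 2048 * 512 = 1048576
```

A guest partition that fails this check makes the filer touch two 4 KB blocks for many single-block guest I/Os, which is why misalignment tends to show up as extra latency.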

Also - just a thought (and I don't know the cost here) but you may want to simply add another shelf of SATA and grow the aggregate you are running on.  14 disks is not a very large aggregate and as you may know - with ONTAP spindles are king. You will likely need to do this for future growth at some point anyway.
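To put rough numbers on the spindle argument, here is a back-of-the-envelope sketch in Python. The per-disk IOPS figures are assumed rule-of-thumb planning values, not NetApp specifications, and the layouts (spares, RAID group counts) are hypothetical; the only hard fact used is that RAID-DP sets aside two parity disks per RAID group.

```python
# Rule-of-thumb random IOPS per spindle. ASSUMED planning figures for
# illustration only; they are not NetApp specifications or measurements.
ASSUMED_IOPS_PER_DISK = {
    "sata_7200": 75,
    "fc_10k": 125,
    "sas_15k": 175,
}

PARITY_DISKS_PER_RAID_DP_GROUP = 2  # RAID-DP uses two parity disks per group


def data_spindles(total_disks, raid_groups=1, spares=0):
    """Disks left for data after removing spares and RAID-DP parity."""
    data = total_disks - spares - raid_groups * PARITY_DISKS_PER_RAID_DP_GROUP
    if data < 0:
        raise ValueError("not enough disks for the requested layout")
    return data


def estimated_iops(total_disks, disk_type, raid_groups=1, spares=0):
    """Very rough back-end random IOPS for one aggregate."""
    spindles = data_spindles(total_disks, raid_groups, spares)
    return spindles * ASSUMED_IOPS_PER_DISK[disk_type]


if __name__ == "__main__":
    # Hypothetical layouts: one shelf versus two shelves of SATA.
    print(estimated_iops(14, "sata_7200", raid_groups=1, spares=1))  # 825
    print(estimated_iops(28, "sata_7200", raid_groups=2, spares=1))  # 1725
```

Treat the output as a way to compare layouts against each other, not as a prediction; cache hits, write coalescing and the workload mix all move the real figure.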

Yet another option, if all else fails, is a PAM accelerator card, which would significantly boost performance without having to swap shelves. I am not a sales guy so I don't know the relative merits of the two possibilities.

.02

Cheers

relay7000

Thanks for your help!!

My ONTAP version is 7.3.3 and it does not show me Misaligned stats as you show.

My current aggregate is a combination of the available internal SATA drives and the external shelf, so something like 20 disks given how many are allocated to each controller, etc. I will investigate further on alignment and see what I can find.

Also, I did not upgrade the Exchange 2007 server to 2010.  I would NEVER upgrade any Microsoft installation  (:  Always a fresh install and migration.  The current server is 2008 R2 SP1 and was fresh installed from scratch and then users migrated to the new server.

Thanks again for your help!
