We recently set up a FAS2020 with two controllers, each with an aggregate containing six 450GB SAS drives in RAID-DP. We are running ESX4 and have 3 new host servers. We set up iSCSI LUNs, and users who have large email accounts (over 1GB) experience a long delay sifting through their folders in Outlook 2007. We think the issue is not having enough spindles/IOPS to perform this work. We have spent hours playing around with different settings and are looking for any assistance.
How many Outlook users are there? How many disks are in the aggregates? Have you collected performance monitor logs from the Exchange server(s)? If so, what does perfmon look like?
There could be many reasons why Exchange users experience slowness...
We are seeing RPC latency and read latency lag. We have RAID-DP with six disks, so only 3 disks hold data, since two are used for parity and one is a spare. There are 250 Outlook users in total, but the issue occurs when users open folders with more than 5000 emails in them; there can be a delay of up to 30 seconds.
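To put rough numbers on the spindle concern above, here is a back-of-the-envelope sketch in Python. The disk counts and user count come from this thread; the per-disk IOPS figure is an assumed rule of thumb for a 15k RPM SAS drive (the drive speed is not stated in the thread), and the sketch assumes all 250 mailboxes land on a single aggregate. Treat the result as an order-of-magnitude estimate, not a measurement.

```python
# Rough spindle budget for one aggregate, using the numbers from this thread.
TOTAL_DISKS = 6    # disks per aggregate (from the post)
PARITY_DISKS = 2   # RAID-DP uses two parity disks
SPARE_DISKS = 1    # one spare (from the post)
USERS = 250        # Outlook users (from the post)

# ASSUMPTION: ~175 random IOPS per disk is a common rule of thumb for
# 15k RPM SAS drives; it is not a measured value for this system.
ASSUMED_IOPS_PER_DISK = 175


def data_disks(total, parity, spares):
    """Number of disks that actually hold data."""
    return total - parity - spares


def iops_per_user(disks, iops_per_disk, users):
    """Raw back-end IOPS available per user, ignoring cache effects."""
    return disks * iops_per_disk / users


disks = data_disks(TOTAL_DISKS, PARITY_DISKS, SPARE_DISKS)
budget = iops_per_user(disks, ASSUMED_IOPS_PER_DISK, USERS)
print(f"data disks: {disks}")
print(f"approx. IOPS per user: {budget:.1f}")
```

With only three data spindles, the per-user budget is small, so a few users opening very large folders at once could plausibly saturate the disks. Real controller stats are still needed to confirm this.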
Microsoft has historically had hard limits and/or performance recommendations for # of items in folders. This (see "Do you have many items in a single folder?") seems to indicate that depending on your patch level, you may experience performance problems around 10k items. It seems you shouldn't see this at 5k, but it may be worth looking at.
What type of disks are you using? FC, SAS, SATA? You probably need more disks in the RAID-DP group. It would help if you could collect some stats from the controller when it is busy.
450GB SAS drives with iSCSI LUNs, and the connections are even trunked via LACP across two gigabit links.
How do you recommend I run stats? Eric mentioned some steps, but sysstat -m 1 doesn't work.
I think you can try sysstat -x when the controller is busy, then copy & paste the results. You can also ask your NetApp support person to help you run perfstat, which is a tool that collects a lot of stats from the controller.
You could start off by collecting more data. For disks, I suggest you use the statit
command. Have a look at what it does. If you are under support warranty, I'd get a capture of data from statit
and log a case with support for them to review it, unless you are confident about reading the data. You could
also post it here for us to take a look at. Do the following:
When the slowness is occurring:
priv set advanced
statit -b (to begin statit)
sysstat -m 1
wait for 1 minute
ctrl-c (to end sysstat)
statit -e (to end statit)
Grab all the output in a text file and attach it here and/or send it to tech support.
I didn't see -m as an available option:
usage: sysstat [-c count] [-s] [-u | -x | -f | -i | -b] [interval]
-c count - the number of iterations to execute
-s - print out summary statistics when done
-u - print out utilization format instead
-x - print out all fields (overrides -u)
-f - print out FCP target statistics
-i - print out iSCSI target statistics
-b - print out SAN statistics
interval - the interval between iterations in seconds, default is 15 seconds
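Going by the usage text above, a one-minute capture with all fields would combine -c, -s, -x and a 1-second interval. The small Python helper below is only a sketch for assembling that command line from the options listed in the usage output; `build_sysstat_cmd` is a hypothetical name, not a NetApp tool, and it only knows about the flags shown above (so it rejects -m, matching what this version of sysstat reports).

```python
def build_sysstat_cmd(count=None, summary=False, mode=None, interval=None):
    """Assemble a sysstat command line from the options in the usage text.

    Hypothetical helper for illustration; it only builds a string and
    does not talk to the controller.
    """
    # Output-format flags listed in the usage text (mutually exclusive).
    allowed_modes = {"u", "x", "f", "i", "b"}

    parts = ["sysstat"]
    if count is not None:
        parts += ["-c", str(count)]   # number of iterations
    if summary:
        parts.append("-s")            # print summary statistics when done
    if mode is not None:
        if mode not in allowed_modes:
            raise ValueError(f"-{mode} is not listed in the usage text")
        parts.append("-" + mode)
    if interval is not None:
        parts.append(str(interval))   # seconds between iterations
    return " ".join(parts)


# 60 iterations at 1-second intervals, all fields, with a summary at the end.
print(build_sysstat_cmd(count=60, summary=True, mode="x", interval=1))
# prints: sysstat -c 60 -s -x 1
```

Running the resulting command on the controller while the slowness is happening, and pasting the output here, would give the same kind of data as the sysstat step suggested earlier.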