2011-03-20 09:13 AM
We're having some performance issues with a FAS2050. The device has 2 controllers. Our problems seem to be with the aggregate under the control of controller 1. I'm not a storage guy. We have a small IT department, and the FAS2050 was installed and configured a few years back by a consultant who we think didn't do a great job with the install.
We've run reallocate measure on the volumes in this aggregate and are getting a "4". For comparison, the aggregate on controller 2 comes back with a "2", and its performance is vastly better (I mean that: anecdotally, our servers are much faster on that aggregate, disk_busy is lower, and latency is great).
I can run the reallocate on the aggregate, but when it gets to this VMFS1 volume, it aborts with the message:
Reallocate scan for volume VMFS1 has stopped because there is insufficient free space.
I've tried this several different ways and just can't get it to run. Physically, the aggregate is 6 disks: 1 x parity, 1 x dparity, 3 x data and 1 x spare. All disks are 500 GB. The aggregate holds 3 volumes: vol0 is 8 GB, vcb1 is 225 GB and VMFS1 is 800 GB. The VMFS1 volume (and its LUN) is for VMware ESX, and that's the primary volume I want to reallocate. But the entire 800 GB in the volume is used up...no free space.
So what are the options in a case like this? Is adding more disks the only one? We don't need the vcb1 volume anymore, so we could scrap it. Could I remove vcb1 and make that space available to the VMFS1 volume? If so, how would I do that, and could I do it without shutting down the ESX farm that accesses VMFS1 (we use iSCSI for our connectivity)?
We're not sure about the status of our support contract with NetApp and have not opened a case with them. We're trying to figure this thing out and fix it, and hopefully learn something in the process.
Thanks for your help.
2011-03-20 10:44 AM
I think you've been confusing aggregate and volume a few times in your post, but I think I understood what you want to do.
It's true: to run a "reallocate" on a VOLUME, you need some free space in that volume. Since your VOLUME is completely full, performance on it will degrade. Note that the recommended fill ratio for a volume is around 80%, to leave enough "breathing space" for internal volume operations.
So here's what you do:
Try "df -Ah" on the console and see if there's any space left in the AGGREGATE. If not, you can free some up by deleting the volume you don't need anymore. Once there is free space in the aggregate, you can resize the volume.
To delete volume vcb1 (ALL DATA WILL BE LOST!):
vol offline vcb1
vol destroy vcb1
Run "df -Ah" again afterwards; it should now show some free space in the aggregate.
Then grow VMFS1, for example by 150 GB:
vol size VMFS1 +150g
You can, of course, also do all of this in the web frontend.
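To get a feel for how much to grow the volume against the ~80% guideline, here is a small arithmetic sketch. It assumes sizes in GB and ignores snapshot reserve, LUN space reservation and other overheads; the function names are hypothetical, and only the 800 GB, 225 GB and +150g figures come from this thread.

```python
# Sketch: how much must a volume grow so that used/size <= target fill ratio?
# Assumptions: sizes in GB; snapshot reserve and LUN space reservation ignored.

def fill_ratio(used_gb: float, size_gb: float) -> float:
    """Fraction of the volume that is in use."""
    return used_gb / size_gb


def growth_needed_gb(used_gb: float, size_gb: float, target_ratio: float = 0.80) -> float:
    """Extra GB needed so the volume is at most target_ratio full (0 if already there)."""
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")
    required_size = used_gb / target_ratio
    return max(0.0, required_size - size_gb)


if __name__ == "__main__":
    # Thread figures: VMFS1 is 800 GB and full; destroying vcb1 frees about 225 GB.
    print(f"Grow by about {growth_needed_gb(800, 800):.0f} GB to reach 80% full")
    print(f"After +150g the volume is {fill_ratio(800, 950):.0%} full")
```

By this rough math, +150g brings VMFS1 to about 84% full, and roughly +200 GB would be needed to reach 80%. Both fit within the space freed by vcb1, before any aggregate-level overhead.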
2011-03-21 10:05 AM
Thanks so much for the reply. I believe you understood my question perfectly. I am away from the office this week, but hopefully by early next week, I can use the commands you sent to grow the volume and give the reallocate another shot. I'll check back after I've given that a try.
2011-03-22 03:55 AM
Your performance problems are directly related to the small number of disks, besides the fact that you have probably filled the aggregate too full. The recommendation is to stay below 80%; above 90% you will see a very significant performance degradation.
It seems that the aggregate in question is really only 5 disks (the spare doesn't actually do any production IO). I would suggest going down to "single parity" raid4 and using the second parity disk for data. Since you will only have a total of 4 data disks, the statistical difference in risk between one and two parity disks is pretty minimal. Run 'aggr options <your_aggr> raidtype raid4', then 'disk zero spares', then 'aggr add <your_aggr> 1' (use the -n option first for a dry run that shows which disk the filer will choose).
Basically, you can't expect much from 3 or 4 disks as far as I/O goes. If 'sysstat -x 1' very often shows disk utilization at 100%, then your bottleneck is simply that you have too few disks.
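The 80%/90% rule of thumb above can be written down as a tiny check you apply to the used and total figures that "df -Ah" reports for the aggregate. This is a sketch: the function name and the sample numbers are hypothetical, and the thresholds are just the ones stated in this post.

```python
# Sketch: classify an aggregate's fill level using the 80% / 90% rule of thumb.
# Inputs are the used and total sizes (same units) as read off "df -Ah".

def fill_status(used: float, total: float) -> str:
    """Return 'ok', 'above-recommended' (>80%) or 'severe' (>90%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    pct = 100.0 * used / total
    if pct > 90:
        return "severe"             # expect very significant performance degradation
    if pct > 80:
        return "above-recommended"  # past the recommended fill level
    return "ok"


if __name__ == "__main__":
    # Hypothetical aggregate figures in GB, not taken from the thread.
    print(fill_status(1033, 1100))
```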
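To make the trade-off concrete, this sketch counts data disks for the 6-disk layout described in the question under raid_dp and raid4. It assumes the disk counts from the thread and uses nominal 500 GB per disk, so it ignores right-sizing and ONTAP reserves; the real usable capacity will be lower.

```python
# Sketch: data disks and nominal data capacity under raid_dp vs raid4.
# Nominal only: right-sizing, WAFL and aggregate reserves are ignored.

PARITY_DISKS = {"raid4": 1, "raid_dp": 2}


def data_disks(total_disks: int, spares: int, raid_type: str) -> int:
    """Disks left for data after removing spares and parity disks."""
    return total_disks - spares - PARITY_DISKS[raid_type]


def nominal_data_capacity_gb(total_disks: int, spares: int, raid_type: str,
                             disk_gb: float) -> float:
    return data_disks(total_disks, spares, raid_type) * disk_gb


if __name__ == "__main__":
    # Thread layout: 6 x 500 GB disks, 1 spare.
    for rt in ("raid_dp", "raid4"):
        print(rt, data_disks(6, 1, rt), "data disks,",
              nominal_data_capacity_gb(6, 1, rt, 500), "GB nominal")
```

Going from raid_dp to raid4 here adds one data spindle (3 to 4), which is the extra I/O and capacity this post is after, at the cost of losing double-parity protection.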
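One way to put a number on "very often" is to note the disk utilization percentage from each line of 'sysstat -x 1' output and compute the fraction of samples at 100%. This sketch assumes you have already copied those values into a list by hand; it does not parse sysstat output, and the sample values and function name are hypothetical.

```python
# Sketch: what fraction of per-second disk-utilization samples hit the threshold?
# Assumes the values were copied manually from 'sysstat -x 1' output.
from typing import Sequence


def busy_fraction(util_samples: Sequence[int], threshold: int = 100) -> float:
    """Fraction of samples whose disk utilization is at or above threshold."""
    if not util_samples:
        raise ValueError("no samples given")
    return sum(1 for u in util_samples if u >= threshold) / len(util_samples)


if __name__ == "__main__":
    samples = [100, 100, 87, 100, 64, 100, 100, 95]  # hypothetical values
    print(f"{busy_fraction(samples):.0%} of samples at 100% disk utilization")
```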
You might try running 'sis' (deduplication) on your volumes to see if you can:
1. Reduce the storage usage
2. Fit the reduced number of "active" blocks better into the filesystem cache, which would give you better read performance.
Otherwise:
1. Make sure you are following the recommendations for volume creation.
2. Make sure you are following the recommendations for using iSCSI.
2011-03-31 03:33 AM
Michael, I did follow the commands to destroy the unneeded volume. And I added that space to the volume that needed some free space. I was able to get the reallocate job to run to completion. I don't see a big performance benefit yet. And right now I'm running a reallocate measure to see if I'm under the threshold this time.
But your answer was correct. It looks like the consultant who installed this device drastically under-provisioned it, and we're now looking at options to expand the storage in this unit or replace it altogether.
Thanks for your help.