Our FAS2240-2 has a volume that is growing for unexplained reasons. How can I determine what's using all the space and stop it without deleting the volume?
A week ago, four VMware VMs that used the volume locked up when it ran out of space. One VM was deleted to free up working space. Over the next few days the volume kept growing, even though the VMs should not have been increasing in size. All VMs have since been moved to other volumes, and the volume snapshots were deleted. Browsing the datastore from vSphere, the volume appears empty, but OnCommand reports 97% used: 54GB available of 1.8TB.
The problematic volume has not been deleted because I'd like to know why it's behaving this way before destroying the evidence. The volume snapshots were deleted two days ago. The volume snapshot detail in OnCommand reports about 1MB Cumulative Total Size for each of the six snaps, but the volume reports 28GB available of 2TB.
You said the datastore in VMware shows them empty - are these NFS datastores, or have you carved out LUNs in the volume and presented those as datastores? If you're using LUNs, and they still exist, that is what is taking the space.
Is the volume in question vol2? vol2 and vol5 have no space guarantee, so they will report as free space the lesser of the aggregate's free space and the difference between the volume size and the volume's used space. Even if the other volumes are at 50% capacity, if they are volume-guaranteed, the entire size of each volume is counted as "used" in the aggregate.
Please post a df of the volume in question and a df -A of aggr0.
That's the problem. The aggregate only has 80G available, and because these volumes have no space guarantee, they can only use whatever space is left in the aggregate. Even though they are 1.8TB (virtual), there is only 80G of physical space left, so that is all they report as available.
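The free-space math described above can be sketched like this (a rough model of the reporting behavior, not ONTAP's exact accounting; the numbers are illustrative, based on the figures in this thread):

```python
# Sketch of how a volume with no space guarantee (guarantee=none)
# reports free space: it can never promise more free space than the
# containing aggregate physically has left.

def reported_free(vol_size_gb, vol_used_gb, aggr_free_gb):
    """Free space shown for a thin volume: the lesser of the
    aggregate's free space and the volume's own unused capacity."""
    return min(aggr_free_gb, vol_size_gb - vol_used_gb)

# A nearly empty 1.8TB thin volume in an aggregate with only
# ~80G physically free reports just 80G available:
print(reported_free(vol_size_gb=1843, vol_used_gb=10, aggr_free_gb=80))  # -> 80
```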
You can change the space guarantee (vol options vol2 guarantee volume) - but it won't work in your case, because the containing aggregate does not have enough space to cover the request.
You can reduce the size of the other volumes - that will free space in the aggregate. Then you can size the thin volumes appropriately and make them thick. All this of course depends on your needs and projected usage - I'm just saying that you CAN do this....
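The recovery plan above can be expressed as a back-of-the-envelope feasibility check (a simplified model, not ONTAP's exact space accounting; the sizes are hypothetical):

```python
# Sketch of the plan: shrink a space-guaranteed volume to return
# space to the aggregate, then check whether the thin volume can be
# converted to guarantee=volume (i.e. made thick).

def aggr_free_after_shrink(aggr_free_gb, shrink_by_gb):
    """Shrinking a space-guaranteed volume releases its reservation
    back to the aggregate gigabyte for gigabyte."""
    return aggr_free_gb + shrink_by_gb

def can_set_volume_guarantee(vol_size_gb, vol_used_gb, aggr_free_gb):
    """Setting guarantee=volume reserves the volume's remaining
    (unused) capacity up front, so the aggregate must have at least
    that much free for the change to succeed."""
    return aggr_free_gb >= vol_size_gb - vol_used_gb

# With only 80G free in the aggregate, the guarantee change fails;
# after shrinking other volumes by 2000G, it succeeds:
free = aggr_free_after_shrink(aggr_free_gb=80, shrink_by_gb=2000)
print(can_set_volume_guarantee(vol_size_gb=1843, vol_used_gb=10, aggr_free_gb=free))  # -> True
```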
Now I know how to recover from this increasingly restrictive state. Thanks.
In your opinion, is there a guideline for how much aggregate space should be left available for growth? That must differ depending on whether thick or thin volumes are in use, but maybe 20% would be a good starting point. Clearly I let it get too small by not understanding how free space is calculated.
Is there any problem with shrinking a volume that has VMs in it, or should it be empty during the resize operation?
I have resized NFS datastores live many times - both up and down - with no impact. I can't remember whether a rescan is needed on the VMware side - if it doesn't see the new size, I'd recommend one.
NetApp has documentation on what percentage of space should be left available in volumes and aggregates, based on WAFL and performance - I want to say it's in the 80-90% range. Personally I've run aggregates (and volumes) at 99% for extended periods and not really noticed any issues, but I'm sure there are people out there who would say otherwise - of course it depends on change rate, etc. And yes, it depends on thick or thin provisioning, and how much data you actually expect to use. Thin provisioning has always made me a little uneasy because a single app, user, or in your case VM can affect many others. But there are use cases where thin provisioning is great. It all depends....