Deleted FlexVol, aggregate won't release the space

petermiet · ‎2012-04-16

I have an aggregate on a FAS2050 running ONTAP 7.3.2 with several 1TB FlexVols on it. On Friday I deleted one of them to re-create it during some troubleshooting, but the space never showed up as available. I figured the filer needed time to process and release the blocks (our array is SATA), so I let it run over the weekend, and still no additional space was freed. Running "aggr show_space" makes me believe the space is still sitting in WAFL reserve:

Aggregate Allocated Used Avail

Total space 9024035984KB 2521786544KB 348701216KB

Snap reserve 0KB 0KB 0KB

WAFL reserve 1041451312KB 99056988KB 942394324KB

That's almost 900GB available in reserve! I searched and found suggestions of it being tied up in snapshots or snapshot reserve, but this aggregate has none. I've also been watching "df -Ak" for a while and the numbers do appear to be fluctuating, but not by much. Is it really possible that deleting an empty volume can take 3+ days for the space to be returned, or is there something else going on?
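For readers checking the "almost 900GB" figure, here is a minimal Python sketch of the arithmetic on the WAFL reserve row above. It assumes the "KB" in the output means 1024 bytes and that 1GB is 1024 * 1024 KB; the variable and function names are only illustrative.

```python
# Figures copied from the "WAFL reserve" row of the aggr show_space output (KB).
allocated_kb = 1041451312
used_kb = 99056988
avail_kb = 942394324

# Assumption: 1 GB = 1024 * 1024 KB (binary units).
KB_PER_GB = 1024 * 1024


def kb_to_gb(kb):
    """Convert a KB figure to GB under the binary-unit assumption above."""
    return kb / KB_PER_GB


# The three columns are internally consistent: Allocated - Used == Avail.
assert allocated_kb - used_kb == avail_kb

print(f"WAFL reserve avail: {kb_to_gb(avail_kb):.1f} GB")
```

The check only confirms that the row adds up; it says nothing about whether that space is reclaimable.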

scottgelb · ‎2012-04-16

Does "priv set advanced; wafl scan status" show space reclamation running? Are aggr snapshots running? Even when they roll off they keep getting recreated, so it might help to disable aggr snaps, delete the existing ones, and see if that makes a difference. Space should come back when the aggr snaps clear.

aborzenkov · ‎2012-04-16

Do you have aggregate snapshots?

petermiet · ‎2012-04-17

Scott, the wafl scan status shows space reclamation on the other aggregate and individual volumes, but not the aggr in question. I also don't have any aggregate snapshots configured or taken but I disabled the schedule anyway.

Apparently I had a disk fail last night though, so I'm going to say the rebuild is going to take precedence now. Thankfully I already have a replacement on the way.

scottgelb · ‎2012-04-17

After the rebuild is done and you have time... does the total of every volume in aggr show_space not match the total used in the aggregate? Which figure to add up depends on the volume guarantee: with the guarantee on, use each volume's total (allocated) size, otherwise use its used size. Then see how much extra space isn't freed up in the aggr.

clackamas · ‎2012-04-19

Run these commands:

FAS3070*> snap reserve -A

FAS3070*> snap list -A

FAS3070*> snap delta -A

I bet the data is being held in an aggregate snapshot, since that is exactly what they do. Also, getting all the blocks back can be a slow process in large volumes.

scottgelb · ‎2012-04-19

It seems that should be the issue but he said he doesn't have aggr snaps... please post the output of the commands just to make sure.

petermiet · ‎2012-04-25

Sorry guys, that single disk failure turned into a multi-disk failure and a rebuild of a different aggregate... all is back to normal (finally) except this problem. Anyway, here is the output from those commands:

snap reserve -A:

Aggregate aggr2: current snapshot reserve is 0% or 0 k-bytes.

snap list -A

Aggregate aggr2

working...

No snapshots exist.

snap delta -A

Aggregate aggr2

working...

snap delta: No snapshots exist in Aggregate aggr2

Also aggr show_space:

Aggregate 'aggr2'

Total space      WAFL reserve    Snap reserve    Usable space    BSR NVLOG    A-SIS
10414513152KB    1041451312KB    0KB             9373061840KB    0KB          0KB

Space allocated to volumes in the aggregate

Volume Allocated Used Guarantee

SQL_DB_14 1349633820KB 83706340KB volume

SQL_DB_12 1349633820KB 651752992KB volume

SQL_DB_16 1349633820KB 504296272KB volume

SQL_DB_21 1199674504KB 718353544KB volume

SQL_DB_22 1199674504KB 443348312KB volume

SQL_DB_24 1679544308KB 508884544KB volume

SQL_DB_25 896241208KB 541405156KB volume

Aggregate Allocated Used Avail

Total space 9024035984KB 3451747160KB 348701220KB

Snap reserve 0KB 0KB 0KB

WAFL reserve 1041451312KB 95460992KB 945990320KB

The WAFL reserve allocated/available is what is puzzling me. Should it really be almost 1TB in size?
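The comparison suggested earlier in the thread (add up the per-volume figures and compare them with the aggregate totals) can be sketched in Python using the numbers from the output above. This is only a sanity check under the assumption that every volume has guarantee "volume", as the listing shows; the dictionary layout and names are illustrative.

```python
# (allocated_kb, used_kb) per volume, copied from the aggr show_space listing.
volumes = {
    "SQL_DB_14": (1349633820, 83706340),
    "SQL_DB_12": (1349633820, 651752992),
    "SQL_DB_16": (1349633820, 504296272),
    "SQL_DB_21": (1199674504, 718353544),
    "SQL_DB_22": (1199674504, 443348312),
    "SQL_DB_24": (1679544308, 508884544),
    "SQL_DB_25": (896241208, 541405156),
}

# Aggregate-level figures from the same output (KB).
aggr_allocated_kb = 9024035984
aggr_used_kb = 3451747160
usable_kb = 9373061840
reported_avail_kb = 348701220

vol_allocated_kb = sum(alloc for alloc, _ in volumes.values())
vol_used_kb = sum(used for _, used in volumes.values())

# Space in the aggregate that is not allocated to any volume.
unallocated_kb = usable_kb - vol_allocated_kb

print("volumes allocated:", vol_allocated_kb, "aggregate:", aggr_allocated_kb)
print("volumes used:     ", vol_used_kb, "aggregate:", aggr_used_kb)
print("usable - allocated:", unallocated_kb, "reported avail:", reported_avail_kb)
```

With these figures the per-volume sums equal the aggregate's "Total space" Allocated and Used values, and usable minus allocated lands within roughly 0.3GB of the reported Avail. That is consistent with the remaining volumes accounting for the allocated space, though it does not by itself explain the WAFL reserve row.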

ASUNDSTROM · ‎2013-03-15

Peter,

Were you able to find out why the space wasn't showing up? I'm having a similar issue with all the same indicators.

nqlcadmin · ‎2013-04-30

Having same issue. Any ideas?

johannbell · ‎2013-04-30

nqlcadmin/ASUNDSTROM, do you have Aggregate snapshots enabled? Which ONTAP version are you running?

nqlcadmin · ‎2013-04-30

Using 7.3.2

There are a few but they don't equal the amount missing. All vols except vol0 have been deleted and some space has been reclaimed but it looks to have stopped at 900gig. The aggr0 says 1.17 TB in use... but there is nothing there.

if I run a wafl scan status I get:

wafl scan status -A aggr0

Aggregate aggr0:

Scan id Type of scan progress

1 active bitmap rearrangement fbn 6652 of 19669 w/ max_chain_len 1

But I'm still not getting any space reclaimed.

Let me know what command outputs you want.

edit:

df

Filesystem kbytes used avail capacity Mounted on

/vol/vol0/ 20132660 830560 19302100 4% /vol/vol0/

/vol/vol0/.snapshot 5033164 168332 4864832 3% /vol/vol0/.snapsh

> snap list -A aggr0

Aggregate aggr0

working...

No snapshots exist.

Doesn't make sense

johannbell · ‎2013-04-30

Thanks nqlcadmin. So the aggregate is empty? If you do an 'aggr show_space -g' it has no volumes in it? Do you have an aggregate snap reserve?

nqlcadmin · ‎2013-04-30

aggr0 is empty now. It has vol0 on it because it's a small setup. There are no other volumes on it now; all have been deleted and some space was reclaimed, but how it's using 1.17TB with nothing "visibly" on there, I'm not sure.

Aggregate 'aggr0'

Total space    WAFL reserve    Snap reserve    Usable space    BSR NVLOG    A-SIS
2449GB         244GB           110GB           2093GB          0GB          7GB

Space allocated to volumes in the aggregate

Volume Allocated Used Guarantee

vol0 25GB 1GB volume

Aggregate Allocated Used Avail

Total space 25GB 1GB 896GB

Snap reserve 110GB 0GB 110GB

WAFL reserve 244GB 1189GB 0GB

That's the output. WAFL reserve will not let go.

johannbell · ‎2013-04-30

The WAFL reserve is fixed at 10% of the aggregate size, and this won't change. This isn't space that needs to be reclaimed, it's the reserved space overhead for the WAFL system on the aggregate so it can store data.

The figure which should change is the Avail 896GB in the total aggregate space.

When you refresh the aggr show_space -g, does this change?
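The 10% figure can be checked against both outputs posted in this thread. A small Python sketch, assuming the reserve is simply one tenth of the "Total space" value with the filer's own rounding (the exact rounding rule is not stated here, so a small tolerance is used):

```python
def expected_wafl_reserve(total):
    """One tenth of the aggregate's total space, using integer division."""
    return total // 10


# aggr2, in KB (from the show_space output without -g).
aggr2_total_kb = 10414513152
aggr2_reported_reserve_kb = 1041451312

# aggr0, in GB (from the show_space -g output).
aggr0_total_gb = 2449
aggr0_reported_reserve_gb = 244

diff_kb = expected_wafl_reserve(aggr2_total_kb) - aggr2_reported_reserve_kb
print("aggr2 expected vs reported differs by", diff_kb, "KB")
print("aggr0 expected reserve:", expected_wafl_reserve(aggr0_total_gb), "GB")
```

Both aggregates come out at 10% to within a few KB, which matches the "Allocated" column of the WAFL reserve row in each output.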

nqlcadmin · ‎2013-04-30

No, it does not, and this is where it's stuck: at 896gig. It reclaimed that ~900gig in about 10 mins but has now stopped well and truly.

johannbell · ‎2013-04-30

How long has it been like that?

For the WAFL reserve, I meant the allocated won't change, not the used. The used should be decreasing as the reclamation progresses. It can take some time though.

You can collect a perfstat, or check the system utilisation. If it's busy, this process could be taking longer than expected.

Try a sysstat -x 1

and check the CPU/Disk utilisation figures. They're likely very high.

nqlcadmin · ‎2013-04-30

Been like that for hours now.....

The system is doing very little.

0% 0 0 0 0 0 0 8 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk FCP iSCSI FCP kB/s iSCSI kB/s

in out read write read write age hit time ty util in out in out

3% 0 0 0 0 0 0 320 400 0 0 >60 100% 8% T 6% 0 0 0 0 0 0

0% 0 0 0 0 0 1 24 32 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

1% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 32 24 0 0 >60 100% 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 8 0 0 >60 - 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 24 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 8 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

3% 0 0 0 0 0 0 420 484 0 0 >60 100% 9% T 8% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 8 0 0 >60 100% 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 - 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 24 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 8 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 32 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

1% 0 0 0 0 0 0 0 0 0 0 >60 - 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 32 24 0 0 >60 100% 0% - 1% 0 0 0 0 0 0

CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk FCP iSCSI FCP kB/s iSCSI kB/s

in out read write read write age hit time ty util in out in out

3% 0 0 0 0 0 1 344 384 0 0 >60 100% 9% T 7% 0 0 0 0 0 0

0% 0 0 0 0 0 1 0 8 0 0 >60 100% 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 24 0 0 >60 100% 0% - 3% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 8 0 0 0 >60 - 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 24 0 0 >60 100% 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 8 0 0 >60 100% 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

1% 0 0 0 0 0 0 24 24 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 8 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

3% 0 0 0 0 0 0 316 388 0 0 >60 100% 9% T 7% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 32 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 32 24 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 1 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 8 0 0 >60 - 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 24 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 8 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

CPU NFS CIFS HTTP Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk FCP iSCSI FCP kB/s iSCSI kB/s

in out read write read write age hit time ty util in out in out

3% 0 0 0 0 0 0 328 424 0 0 >60 100% 9% T 9% 0 0 0 0 0 0

0% 0 0 0 0 0 1 0 8 0 0 >60 100% 0% - 1% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 - 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 24 0 0 >60 100% 0% - 2% 0 0 0 0 0 0

0% 0 0 0 0 0 0 8 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 0 0 0 0 >60 100% 0% - 0% 0 0 0 0 0 0

0% 0 0 0 0 0 0 24 32 0 0 >60 100% 0% - 2% 0 0 0 0 0 0
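To put a number on "doing very little", the sysstat rows can be summarized with a short Python sketch. It assumes the 22-field layout implied by the header above, with CPU in the first field and disk utilisation in the sixteenth; those column positions are my reading of the header, and the sample rows are a handful copied from the output.

```python
# A few rows copied from the sysstat -x 1 output above.
SAMPLE = """\
3% 0 0 0 0 0 0 320 400 0 0 >60 100% 8% T 6% 0 0 0 0 0 0
0% 0 0 0 0 0 1 24 32 0 0 >60 100% 0% - 2% 0 0 0 0 0 0
0% 0 0 0 0 0 0 0 8 0 0 >60 - 0% - 1% 0 0 0 0 0 0
3% 0 0 0 0 0 0 328 424 0 0 >60 100% 9% T 9% 0 0 0 0 0 0
"""

# Assumed zero-based field positions, derived from the two header lines.
CPU_COL = 0
DISK_UTIL_COL = 15
EXPECTED_FIELDS = 22


def parse_pct(token):
    """Turn a token such as '9%' into the integer 9."""
    return int(token.rstrip("%"))


def summarize(text):
    """Return (max CPU %, max disk util %) over all data rows in text."""
    cpu, disk = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip header lines and anything that is not a full data row.
        if len(fields) != EXPECTED_FIELDS or not fields[CPU_COL].endswith("%"):
            continue
        cpu.append(parse_pct(fields[CPU_COL]))
        disk.append(parse_pct(fields[DISK_UTIL_COL]))
    return max(cpu), max(disk)


max_cpu, max_disk = summarize(SAMPLE)
print(f"max CPU {max_cpu}%, max disk util {max_disk}%")
```

On these rows both peaks stay in single digits, so a busy system does not look like the reason for the slow reclaim.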

johannbell · ‎2013-04-30

Is the WAFL scan status progressing? It's normal for a large reclaim to take hours, but it should be progressing.

Alternatively check the aggregate space without the -g switch and see if it's incrementing (as it will take a long time for the reclaim to show in GB).

nqlcadmin · ‎2013-04-30

Yes it is:

wafl scan status

Volume vol0:

Scan id Type of scan progress

2 active bitmap rearrangement fbn 1964 of 2955 w/ max_chain_len 3

wafl scan status -A

Aggregate aggr0:

Scan id Type of scan progress

1 active bitmap rearrangement fbn 10943 of 19669 w/ max_chain_len 13

WAFL is progressing but the aggr space is staying the same:

aggr show_space -g

Aggregate 'aggr0'

Total space    WAFL reserve    Snap reserve    Usable space    BSR NVLOG    A-SIS
2449GB         244GB           110GB           2093GB          0GB          7GB

Space allocated to volumes in the aggregate

Volume Allocated Used Guarantee

vol0 25GB 1GB volume

Aggregate Allocated Used Avail

Total space 25GB 1GB 896GB

Snap reserve 110GB 0GB 110GB

WAFL reserve 244GB 1189GB 0GB

I've done this before with a CIFS share and it didn't take this long; it gave me the correct space back and I reallocated it to a new vol. Now I have no volumes and I'm using 1.1TB of nothing, haha.
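The two aggr0 scan lines posted in this thread can be compared with a small Python sketch. It assumes the "fbn X of Y" field is a rough progress counter; whether it advances linearly with the work remaining is not something the output confirms.

```python
import re

# Matches the "fbn <done> of <total>" part of a wafl scan status line.
PROGRESS_RE = re.compile(r"fbn (\d+) of (\d+)")


def scan_fraction(line):
    """Return done/total from a scan status line, or None if not present."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    done, total = map(int, match.groups())
    return done / total


# The two aggr0 lines posted earlier and later in the thread.
earlier = "1 active bitmap rearrangement fbn 6652 of 19669 w/ max_chain_len 1"
later = "1 active bitmap rearrangement fbn 10943 of 19669 w/ max_chain_len 13"

for label, line in (("earlier", earlier), ("later", later)):
    print(f"{label}: {scan_fraction(line):.0%}")
```

By that measure the scan moved from about a third to a bit over half between the two samples, so it is advancing even though the GB figures have not changed.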

johannbell · ‎2013-04-30

As long as it's progressing, you should get the space back in time.

If you try the 'aggr show_space' and compare it again in say 30 minutes, it will likely have increased some. If you're waiting on this to be able to reuse the space, you could thin provision the new volume, otherwise I'd leave it overnight and check it in the morning.
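Comparing two 'aggr show_space' samples by eye is error-prone with KB figures this long, so here is a hedged Python sketch that parses the three summary rows and prints the change per column. The sample text is the pair of aggr2 summaries posted earlier in this thread; the regular expression assumes the row labels and the KB suffix shown there.

```python
import re

# Matches summary rows such as "WAFL reserve 1041451312KB 99056988KB 942394324KB".
ROW_RE = re.compile(
    r"^(Total space|Snap reserve|WAFL reserve)\s+(\d+)KB\s+(\d+)KB\s+(\d+)KB$"
)


def parse_summary(text):
    """Map row label -> (allocated, used, avail) in KB."""
    rows = {}
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows[match.group(1)] = tuple(int(v) for v in match.groups()[1:])
    return rows


def delta(before, after):
    """Per-row, per-column change (after minus before) in KB."""
    return {
        name: tuple(a - b for b, a in zip(before[name], after[name]))
        for name in before
        if name in after
    }


FIRST = """\
Total space 9024035984KB 2521786544KB 348701216KB
Snap reserve 0KB 0KB 0KB
WAFL reserve 1041451312KB 99056988KB 942394324KB
"""

SECOND = """\
Total space 9024035984KB 3451747160KB 348701220KB
Snap reserve 0KB 0KB 0KB
WAFL reserve 1041451312KB 95460992KB 945990320KB
"""

changes = delta(parse_summary(FIRST), parse_summary(SECOND))
for name, (alloc, used, avail) in changes.items():
    print(f"{name}: allocated {alloc:+d}KB, used {used:+d}KB, avail {avail:+d}KB")
```

For the aggr2 samples the aggregate's Avail moved by only 4KB between the two posts, while WAFL reserve Used dropped by about 3.6 million KB. Running the same comparison on samples taken 30 minutes apart would show whether the reclaim is still making headway.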