Network and Storage Protocols

Deleted FlexVol, aggregate won't release the space

petermiet
15,892 Views

I have an aggregate on a FAS2050 running OnTap 7.3.2 with several 1TB FlexVols on it, and on Friday I deleted one of them to re-create it during some troubleshooting but didn't see the space available. I figured it must have had to process and release (our array is SATA) so I let it go over the weekend and still no additional space freed. Running "aggr show_space" makes me believe that the space is sitting still in WAFL reserve:

Aggregate                       Allocated            Used           Avail

Total space                  9024035984KB    2521786544KB     348701216KB

Snap reserve                          0KB             0KB             0KB

WAFL reserve                 1041451312KB      99056988KB     942394324KB

That's almost 900GB available in reserve! I searched and found suggestions of it being tied up in snapshots or snapshot reserve, but this aggregate has none. I've also been watching "df -Ak" for a while and the numbers appear to be fluctuating, but not by much. Is it really possible that deleting an empty volume can take 3+ days for the space to be returned, or is there something else going on?

25 REPLIES

scottgelb
14,526 Views

Does "priv set advanced ; wafl scan status" show space reclamation running?  Are aggr snapshots running?  Even as they roll off they keep getting recreated, so it might help to disable aggr snaps and delete the existing aggr snaps and see if that makes a difference.  Space should come back when the aggr snaps clear.

aborzenkov
14,526 Views

Do you have aggregate snapshots?

petermiet
14,526 Views

Scott, the wafl scan status shows space reclamation on the other aggregate and individual volumes, but not the aggr in question. I also don't have any aggregate snapshots configured or taken but I disabled the schedule anyway.

Apparently I had a disk fail last night though, so I'm going to say the rebuild is going to take precedence now. Thankfully I already have a replacement on the way.

scottgelb
14,526 Views

After the rebuild is done and you have time... does the total of every volume in "aggr show_space" match the total used in the aggregate?  It will vary depending on whether the volume guarantee is on, but you can compute it based on the used or total size depending on the guarantee... then see how much extra space isn't freed up in the aggr.
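That arithmetic can be done directly from an "aggr show_space" listing; a quick Python sketch (the KB figures are copied from the show_space output posted later in this thread) shows the per-volume allocations adding up to the aggregate's allocated total:

```python
# Per-volume "Allocated" figures (KB) copied from the
# 'aggr show_space' output posted later in this thread.
vol_allocated_kb = {
    "SQL_DB_14": 1349633820,
    "SQL_DB_12": 1349633820,
    "SQL_DB_16": 1349633820,
    "SQL_DB_21": 1199674504,
    "SQL_DB_22": 1199674504,
    "SQL_DB_24": 1679544308,
    "SQL_DB_25": 896241208,
}

aggr_allocated_kb = 9024035984  # aggregate "Total space" Allocated line

total = sum(vol_allocated_kb.values())
print(total == aggr_allocated_kb)  # True: the volumes account for all allocated space
```

So in this case the volume allocations match the aggregate's allocated figure exactly, which points the question at the Used/Avail side rather than a leaked allocation.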

clackamas
14,868 Views

Try these commands:

FAS3070*> snap reserve -A

FAS3070*> snap list -A

FAS3070*> snap delta -A

I bet the data is being held in an aggregate snapshot, since that's exactly what they do.  Also, it can be a slow process getting all the blocks back in large volumes.

scottgelb
14,868 Views

It seems like that should be the issue, but he said he doesn't have aggr snaps... please post the output of those commands just to make sure.

petermiet
14,868 Views

Sorry guys, that single disk failure turned into a multi-disk failure and a rebuild of a different aggregate... everything is back to normal (finally) except this problem. Anyway, here is the output from those commands:

snap reserve -A:

Aggregate aggr2: current snapshot reserve is 0% or 0 k-bytes.

snap list -A

Aggregate aggr2

working...

No snapshots exist.

snap delta -A

Aggregate aggr2

working...

snap delta: No snapshots exist in Aggregate aggr2

Also aggr show_space:

Aggregate 'aggr2'

      Total space    WAFL reserve    Snap reserve    Usable space       BSR NVLOG           A-SIS

  10414513152KB    1041451312KB             0KB    9373061840KB             0KB             0KB

 

Space allocated to volumes in the aggregate

 

Volume                          Allocated            Used       Guarantee

SQL_DB_14                    1349633820KB      83706340KB          volume

SQL_DB_12                    1349633820KB     651752992KB          volume

SQL_DB_16                    1349633820KB     504296272KB          volume

SQL_DB_21                    1199674504KB     718353544KB          volume

SQL_DB_22                    1199674504KB     443348312KB          volume

SQL_DB_24                    1679544308KB     508884544KB          volume

SQL_DB_25                     896241208KB     541405156KB          volume

 

Aggregate                       Allocated            Used           Avail

Total space                  9024035984KB    3451747160KB     348701220KB

Snap reserve                          0KB             0KB             0KB

WAFL reserve                 1041451312KB      95460992KB     945990320KB

The WAFL reserve allocated/available is what is puzzling me. Should it really be almost 1TB in size?

ASUNDSTROM
14,868 Views

Peter,

Were you able to find out why the space wasn't showing up?  I'm having a similar issue with all the same indicators.

nqlcadmin
14,868 Views

Having same issue. Any ideas?

johannbell
9,737 Views

nqlcadmin/ASUNDSTROM, do you have Aggregate snapshots enabled? Which ONTAP version are you running?

nqlcadmin
9,737 Views

Using 7.3.2

There are a few, but they don't equal the amount missing. All vols except vol0 have been deleted and some space has been reclaimed, but it looks to have stopped at 900GB. aggr0 says 1.17TB in use... but there is nothing there.

if I run a wafl scan status I get:

wafl scan status -A aggr0

Aggregate aggr0:

Scan id                   Type of scan     progress

       1    active bitmap rearrangement     fbn 6652 of 19669 w/ max_chain_len 1

But I'm still not getting any space reclaimed.

Let me know what command outputs you want.

edit:

df

Filesystem              kbytes       used      avail capacity  Mounted on

/vol/vol0/            20132660     830560   19302100       4%  /vol/vol0/

/vol/vol0/.snapshot    5033164     168332    4864832       3%  /vol/vol0/.snapsh

> snap list -A aggr0

Aggregate aggr0

working...

No snapshots exist.

Doesn't make sense

johannbell
9,737 Views

Thanks nqlcadmin. So the aggregate is empty? If you do an 'aggr show_space -g' it has no volumes in it? Do you have an aggregate snap reserve?

nqlcadmin
9,737 Views

aggr0 is basically empty now; it only has vol0 on it because it's a small setup. All other volumes have been deleted and some space was reclaimed, but I'm not sure how it's using 1.17TB with nothing "visibly" on there.

Aggregate 'aggr0'

    Total space    WAFL reserve    Snap reserve    Usable space       BSR NVLOG           A-SIS

         2449GB           244GB           110GB          2093GB             0GB             7GB

Space allocated to volumes in the aggregate

Volume                          Allocated            Used       Guarantee

vol0                                 25GB             1GB          volume

Aggregate                       Allocated            Used           Avail

Total space                          25GB             1GB           896GB

Snap reserve                        110GB             0GB           110GB

WAFL reserve                        244GB          1189GB             0GB

That's the output. The WAFL reserve will not let go.

johannbell
9,738 Views

The WAFL reserve is fixed at 10% of the aggregate size, and this won't change. This isn't space that needs to be reclaimed, it's the reserved space overhead for the WAFL system on the aggregate so it can store data.
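To make the 10% figure concrete with the numbers from this aggregate (a rough check, using the GB values as "aggr show_space -g" truncates them):

```python
# WAFL reserve is a fixed ~10% of the raw aggregate size.
# GB figures copied from the 'aggr show_space -g' output above.
total_space_gb = 2449
wafl_reserve_gb = 244

print(int(total_space_gb * 0.10))  # 244 -- matches the reserve shown
```

So the 244GB reserve is expected overhead on a 2449GB aggregate, not space waiting to be reclaimed.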

The figure which should change is the Avail 896GB in the total aggregate space.

When you refresh the "aggr show_space -g", does this change?

nqlcadmin
9,738 Views

No it does not, and this is where it's stuck, at 896GB. It reclaimed that ~900GB in about 10 mins but has now stopped, well and truly.

johannbell
8,736 Views

How long has it been like that?

For the WAFL reserve, I meant the allocated won't change, not the used. The used should be decreasing as the reclamation progresses. It can take some time though.

You can collect a perfstat, or check the system utilisation. If it's busy, this process could be taking longer than expected.

Try a sysstat -x 1

and check the CPU/Disk utilisation figures. They're likely very high.

nqlcadmin
8,736 Views

Been like that for hours now.....

The system is doing very little.

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s

                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out

  0%     0     0     0       0     0     0      8      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  3%     0     0     0       0     0     0    320    400     0     0   >60  100%   8%  T    6%      0     0     0     0     0     0

  0%     0     0     0       0     0     1     24     32     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  1%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     32     24     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      8     0     0   >60    -    0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     24     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      8      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  3%     0     0     0       0     0     0    420    484     0     0   >60  100%   9%  T    8%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      8     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60    -    0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     24     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      8      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     32     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  1%     0     0     0       0     0     0      0      0     0     0   >60    -    0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     32     24     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s

                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out

  3%     0     0     0       0     0     1    344    384     0     0   >60  100%   9%  T    7%      0     0     0     0     0     0

  0%     0     0     0       0     0     1      0      8     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     24     0     0   >60  100%   0%  -    3%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      8      0     0     0   >60    -    0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     24     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      8     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  1%     0     0     0       0     0     0     24     24     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      8      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  3%     0     0     0       0     0     0    316    388     0     0   >60  100%   9%  T    7%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     32     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     32     24     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     1     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      8     0     0   >60    -    0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     24     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      8      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s

                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out

  3%     0     0     0       0     0     0    328    424     0     0   >60  100%   9%  T    9%      0     0     0     0     0     0

  0%     0     0     0       0     0     1      0      8     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60    -    0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     24     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      8      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0      0      0     0     0   >60  100%   0%  -    0%      0     0     0     0     0     0

  0%     0     0     0       0     0     0     24     32     0     0   >60  100%   0%  -    2%      0     0     0     0     0     0

johannbell
8,736 Views

Is the WAFL scan status progressing? It's normal for a large reclaim to take hours, but it should be progressing.

Alternatively check the aggregate space without the -g switch and see if it's incrementing (as it will take a long time for the reclaim to show in GB).
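That comparison can be scripted; a rough Python sketch (the parse assumes the KB-format "aggr show_space" output shown earlier in the thread, and the two sample lines here are illustrative, not real readings):

```python
import re

def avail_kb(show_space_text):
    # Pull the aggregate "Total space ... Avail" figure (KB) out of an
    # 'aggr show_space' listing: Allocated, Used, then Avail columns.
    m = re.search(r"Total space\s+\d+KB\s+\d+KB\s+(\d+)KB", show_space_text)
    return int(m.group(1))

# Two illustrative samples taken some minutes apart:
before = "Total space    9024035984KB    3451747160KB     348701216KB"
after  = "Total space    9024035984KB    3451747160KB     348701220KB"

print(avail_kb(after) - avail_kb(before))  # KB reclaimed between the samples
```

Diffing two readings like this gives a reclaim rate, which tells you whether the scan is actually returning blocks or genuinely stalled.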

nqlcadmin
8,736 Views

Yes it is:

wafl scan status

Volume vol0:

Scan id                   Type of scan     progress

       2    active bitmap rearrangement     fbn 1964 of 2955 w/ max_chain_len 3

wafl scan status -A

Aggregate aggr0:

Scan id                   Type of scan     progress

       1    active bitmap rearrangement     fbn 10943 of 19669 w/ max_chain_len 13

WAFL is progressing but the aggr space is staying the same:

aggr show_space -g

Aggregate 'aggr0'

    Total space    WAFL reserve    Snap reserve    Usable space       BSR NVLOG           A-SIS

         2449GB           244GB           110GB          2093GB             0GB             7GB

Space allocated to volumes in the aggregate

Volume                          Allocated            Used       Guarantee

vol0                                 25GB             1GB          volume

Aggregate                       Allocated            Used           Avail

Total space                          25GB             1GB           896GB

Snap reserve                        110GB             0GB           110GB

WAFL reserve                        244GB          1189GB             0GB

I've done this before with a CIFS share and it didn't take this long; it gave me the correct space back and I reallocated it to a new vol. Now I have no volumes and I'm using 1.1TB of nothing haha

johannbell
7,726 Views

As long as it's progressing, you should get the space back in time.

If you try the 'aggr show_space' and compare it again in, say, 30 minutes, it will likely have increased some. If you're waiting on this to be able to reuse the space, you could thin provision the new volume; otherwise I'd leave it overnight and check it in the morning.
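For the thin-provisioning route, a volume can be created with no space guarantee so it doesn't need the full size up front; a 7-Mode sketch (the volume name and size here are placeholders, not from this thread):

```
> vol create newvol -s none aggr0 500g
```

With "-s none" the volume only consumes aggregate space as data is written, so it can be created even while the reclaim is still running.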
