ONTAP Discussions

Volume is full and LUN cannot be accessed

PANTELIS_KOMATAS
13,088 Views

Dear all

We are facing a serious problem with our NetApp FAS2050 filer running version 7.3.2. A volume is shown as being full, and the size of the volume is 2 TB. The volume contains a LUN that is 1.5 TB big, but when we run df -h on the filer it shows the volume as absolutely full, with no space left at all. There are no snapshots on the volume to delete.

Since the volume has A-SIS on, it cannot grow beyond 2 TB. When we bring the LUN online, the VMware hosts with ESX 4.0 that access the LUN seem to hang, and the only way to remedy this is to offline the volume. So although the data is there, we do not have any way to get the data out of the LUN. Any ideas? We have tried to perform an ndmpcopy to another volume on the HA pair filer, but it fails with a "No space left on device" message. We have also tried the command sis undo <volume name>, but we get an error: Volume is in transition state.

Please help!!!!

1 ACCEPTED SOLUTION

radek_kubka
17,867 Views

The second column of your df -r output shows that used space is equal to the volume size, so changing the volume guarantee shouldn't change anything in the aggregate (if you don't have enough space, the operation would fail anyway).



radek_kubka
11,224 Views

Hi,

Although you have no snapshots, with dedupe being enabled fractional reserve may have kicked in.

Do you know what value the FR is set to at the moment? (If not, vol status -v will show it.)

Regards,

Radek

PANTELIS_KOMATAS
11,224 Views

Dear Radek

Thanks for your response. The fractional reserve is set to 100.

scottgelb
10,707 Views

You might just want to turn off the LUN guarantee if it is the only LUN in the volume. The LUN is smaller than the volume and there are no snaps.

PANTELIS_KOMATAS
10,707 Views

LUN guarantee was always off. I really do not understand where the 500 GB has gone, since the LUN is only 1.5 TB and the volume is 2 TB.

scottgelb
10,707 Views

df -r output?


PANTELIS_KOMATAS
15,561 Views

/vol/MBX01ARCHIVEvm/ 2147483648 2147483648          0 (1549302796) /vol/MBX01ARCHIVEvm/
/vol/MBX01ARCHIVEvm/.snapshot          0          0          0          0  /vol/MBX01ARCHIVEvm/.snapshot
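
For readers following along, the first df -r line above is easier to read once the kilobyte counts are converted. The sketch below is only an illustration: it assumes the columns follow the order of the df -r header shown in the simulator example later in this thread (kbytes, used, avail, reserved), and the helper name parse_df_r_line is made up for this example.

```python
KB_PER_GIB = 1024 * 1024
KB_PER_TIB = 1024 * KB_PER_GIB


def parse_df_r_line(line):
    """Split one 'df -r' data line into named integer fields (column order assumed)."""
    fields = line.split()
    return {
        "filesystem": fields[0],
        "kbytes": int(fields[1]),
        "used": int(fields[2]),
        "avail": int(fields[3]),
        # df -r prints the reserved figure in parentheses; strip them before converting
        "reserved": int(fields[4].strip("()")),
    }


row = parse_df_r_line(
    "/vol/MBX01ARCHIVEvm/ 2147483648 2147483648          0 (1549302796) /vol/MBX01ARCHIVEvm/"
)
size_tib = row["kbytes"] / KB_PER_TIB
used_tib = row["used"] / KB_PER_TIB
reserved_tib = row["reserved"] / KB_PER_TIB

print(f"size     {size_tib:.2f} TiB")
print(f"used     {used_tib:.2f} TiB")
print(f"reserved {reserved_tib:.2f} TiB")
print(f"avail    {row['avail'] / KB_PER_GIB:.0f} GiB")
```

Read this way, the volume is exactly 2 TiB, all of it counted as used, and the reserved figure is roughly 1.44 TiB, which is close to the size of the 1.5 TB LUN.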

radek_kubka
11,224 Views

Then change it to 0 & see what happens:

vol options volname fractional_reserve 0

PANTELIS_KOMATAS
11,225 Views

Volume guarantee is set to none, so fractional reserve cannot be changed.

radek_kubka
10,707 Views

OK, fair enough - it used to be the case with ONTAP versions prior to 7.3."something".

I always thought that although FR is fixed at 100%, it actually behaves like being set to 0% with volume guarantee being set to none.

*If* you have space in the aggregate, you may experiment, set volume guarantee to 'volume' & then set FR to 0%.

radek_kubka
11,224 Views

Try changing FR - I have a strange feeling that might be the answer.

PANTELIS_KOMATAS
16,078 Views

Radek

Since my aggregate has exactly 2.0 TB left available, if I set the volume guarantee to volume I will have a problem: the volume supposedly takes 1.5 TB already.

What do you think?

radek_kubka
17,868 Views

The second column of your df -r output shows that used space is equal to the volume size, so changing the volume guarantee shouldn't change anything in the aggregate (if you don't have enough space, the operation would fail anyway).

PANTELIS_KOMATAS
16,078 Views

Dear Radek

Thanks for your answer and patience. You really saved my life. I have set the volume guarantee to volume and the aggregate went to 1.5 TB available. Then I set fractional reserve to 0, and the volume now has 512 GB available at 75% full. You saved me.

Can you please describe a bit what the problem was with the fractional reserve setting?

radek_kubka
16,078 Views

Funnily enough, FR was designed to prevent LUNs from going offline when there are some snapshots in the volume & change rate is unexpectedly high. You can read a bit more about FR in Chris's blog post:

https://communities.netapp.com/groups/chris-kranz-hardware-pro/blog/2009/03/05/fractional-reservation--lun-overwrite

I think there was already a case discussed on Communities, when FR kicked in with no snapshots in the volume, but with dedupe enabled - I vaguely remember there is a hidden snapshot, or something like that, to be blamed.

aborzenkov
16,079 Views

Yes, I would love to understand what actually was going on here. Unfortunately, there is not enough information (for a start, we do not even know whether there was anything else on this volume) nor do we know the history of events.

aborzenkov
15,784 Views

Funnily enough, FR was designed to prevent LUNs from going offline

Well, OP did never mention LUN going offline. I did experiment with A-SIS and LUNs, and I can claim with confidence that in the state shown there is enough space for writes to the LUN, even though NetApp will loudly complain that the volume is 100% full. I am still not sure what the exact problem was. FR looks more like a red herring here.

FR kicked in with no snapshots in the volume, but with dedupe enabled - I vaguely remember there is hidden snapshot, or something like that to be blamed.

Yes. Volume with A-SIS behaves exactly like volume with snapshots w.r.t. LUN. The practical problem is - deduplication is post-process. And we have to make sure there is enough space to overwrite what had already been written, which goes into new blocks until the sis job runs next time. Even if it had been deduplicated. So we need to reserve an amount of space equal - surprise - to the logical amount of space the LUN has consumed so far.

Unfortunately, existing tools do not present a clear breakdown of space consumption. E.g. with space-reserved LUNs (the default), "df -s" simply lies. It computes space savings against the "used" column of df, which in the case of space reservation is reserved, not consumed, space. Here is an example:

simsim> df -r v1
Filesystem              kbytes       used      avail   reserved  Mounted on
/vol/v1/                 20480      15828       4652    (14412) /vol/v1/

simsim> df -s v1
Filesystem                used      saved       %saved
/vol/v1/                 15828      14328          48%

simsim> lun set reservation /vol/v1/lv disable

simsim> df -r v1
Filesystem              kbytes       used      avail   reserved  Mounted on
/vol/v1/                 20480        396      20084          0  /vol/v1/

simsim> df -s v1
Filesystem                used      saved       %saved
/vol/v1/                   396      14328          97%

As soon as we step into thin provisioning territory, everything becomes very confusing ...
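
The two df -s results above are consistent with %saved being computed as saved / (used + saved). That formula is inferred from the numbers in this example, not taken from NetApp documentation, and the function name percent_saved is hypothetical. A minimal Python sketch under that assumption:

```python
def percent_saved(used_kb, saved_kb):
    """Percentage saved, ASSUMING %saved = saved / (used + saved), rounded to a whole percent."""
    return round(100 * saved_kb / (used_kb + saved_kb))


# "used" with LUN space reservation enabled (includes reserved space)
with_reservation = percent_saved(15828, 14328)

# "used" after 'lun set reservation ... disable'
without_reservation = percent_saved(396, 14328)

print(with_reservation, without_reservation)
```

The saved figure is the same in both cases; only the "used" denominator changes, which is why the reported percentage moves from 48% to 97% without any data being touched.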

radek_kubka
15,784 Views
Well, OP did never mention LUN going offline.

Technically speaking - yes. But in very practical terms, the LUN was not accessible. If after freeing up space locked by FR the LUN is again accessible, then FR can be blamed, even if it is not a direct cause.

In this particular case FR was probably trying to claim more space than was available in the volume & maybe that's why LUN access went pear-shaped.

Volume with A-SIS behaves exactly like volume with snapshots w.r.t. LUN. The practical problem is - deduplication is post-process. And we have to make sure there is enough space to overwrite what had already been written, which goes into new blocks until the sis job runs next time.

That actually makes sense - but FR should be intelligent enough to take into account the dedupe ratio. E.g. if FR is set to 100%, it should not allow the volume to shrink below the size of a non-deduplicated LUN, but there is no need to claim any more space if there are no snaps.

aborzenkov
15,784 Views

Well ... without diagnostics from the time of the problem it makes little sense to speculate.

There is no way to take into account deduplication ratio for a simple reason - you cannot predict the future. Even if the existing data are all compressed into a single block, new data could be all different. So you need to ensure as much space to accommodate it. If you know your data - adjust FR; this has always been the case.

Does anyone know if it is possible to see how much of FR is in use?

radek_kubka
15,785 Views
There is no way to take into account deduplication ratio for a simple reason - you cannot predict the future.

Well, now it's purely academic. I get what FR does if dedupe is enabled - if set to 100%, it books the full, declared (non-deduped) LUN size, exactly the same as if there were a snapshot. But I think it is a waste of space *if* dedupe ratio is low / near-zero (and this can actually be measured, as we are talking about what is, not what will be).

Two examples:

- 2TB volume, 1TB LUN, de-dupe enabled (& run) with no savings => with FR=100%, volume is full; but if there are no snapshots, the LUN will never use more space in a volume than 1TB

- 2TB volume, 1TB LUN, de-dupe enabled (& run) with 0.9TB savings => with FR=100%, 1.1TB of space is used in the volume (so e.g. we can shrink the volume to 1.1TB)
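
The arithmetic behind these two examples can be sketched as follows. This is a simplified model that merely reproduces the figures above (space used = physical LUN blocks after dedupe, plus an overwrite reserve of FR% of the declared LUN size); it is not ONTAP's actual accounting, and the function name is made up for illustration.

```python
def volume_used_tb(lun_size_tb, dedupe_savings_tb, fr_percent):
    """Simplified model: physical blocks after dedupe plus the FR overwrite reserve."""
    physical = lun_size_tb - dedupe_savings_tb
    # Assumption: the reserve is sized from the declared (non-deduped) LUN size
    overwrite_reserve = lun_size_tb * fr_percent / 100
    return physical + overwrite_reserve


no_savings = volume_used_tb(1.0, 0.0, 100)    # first example
high_savings = volume_used_tb(1.0, 0.9, 100)  # second example
fr_zero = volume_used_tb(1.0, 0.0, 0)         # same LUN with FR set to 0

print(f"{no_savings:.1f} TB, {high_savings:.1f} TB, {fr_zero:.1f} TB")
```

In this model the 2TB volume in the first example is completely full at FR=100%, while the same LUN with FR=0 would leave half of the volume free.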

aborzenkov
11,645 Views

But I think it is a waste of space *if* dedupe ratio is low

Sure. FR is a waste of space. This is simply the question of more or less safe defaults. I prefer what NetApp does - make sure it is safe by default and if user wants to shoot himself in the foot - the gun is there.

but if there is no snapshots

OK, I apologize - I let myself be misled by your comments and so my answer was misleading as well. The actual problem has nothing to do with snapshots nor with deduplication ratio.

Please accept the simple fact - blocks that have been processed by the deduplication scan are frozen until the next scan. This means that if you have "1TB LUN, de-dupe enabled (& run) with no savings" - 1TB of your space is frozen for an unknown amount of time. So you need space for future writes. The default is a conservative 100%. It is not related to whether you are going to take any snapshot. Nor is it related to how well future data may be deduplicated, because it must be stored in "fat" form initially. Nor is it related to how low the existing deduplication ratio is, because the space is already gone.
