Network and Storage Protocols

iSCSI LUN going offline with message: volume space full

storage_india
12,540 Views

I have created a volume of 96 GB and inside it created a LUN of size 65 GB.

rsh filer0011 df -h exch_index_1

Filesystem               total       used      avail capacity  Mounted on

/vol/exch_index_1/        96GB       65GB       30GB      68%  /vol/exch_index_1/

/vol/exch_index_1/.snapshot       24GB        0KB       24GB       0%  /vol/exch_index_1/.snapshot

I guess the volume got formatted at the Windows end, so that is why it is showing 65 GB used. Please correct me if I am wrong.

I have set the volume option fractional_reserve to 0 and have snapshots enabled. The volume is also being SnapMirrored to a DR filer.

1) When the user deletes a few GB of data and adds fresh data, all the deleted data goes into the snapshot, the volume gets full, and the LUN is taken offline.

Wed Apr 24 11:33:03 IST [filer0011: scsitarget.lun.noSpace:error]: LUN '/vol/exch_index_1/exch_index_1' has run out of space.
Wed Apr 24 11:33:03 IST [filer0011: lun.offline:warning]: LUN /vol/exch_index_1/exch_index_1 has been taken offline

2) To avoid this, I disabled snapshot generation and broke the SnapMirror relationship with the DR volume. Now, when the user starts dumping fresh data again, it starts eating up the snapshot space (the SnapMirror snapshot starts growing), and once the available space in the volume has been eaten up by the snapshot, the volume gets full and the LUN goes offline again.

Below is the volume status when only the SnapMirror snapshot was present and the user was dumping fresh data.

rsh filer0011 df -h exch_index_1

Filesystem               total       used      avail capacity  Mounted on

/vol/exch_index_1/        96GB       65GB       30GB      68%  /vol/exch_index_1/

/vol/exch_index_1/.snapshot       24GB       15GB     9133MB      63%  /vol/exch_index_1/.snapshot

rsh filer0011 snap list exch_index_1
Volume exch_index_1
working...

  %/used       %/total  date          name
----------  ----------  ------------  --------
19% (19%)   13% (13%)  Apr 26 10:06  banfpbcpnap0010(0151729288)_exch_index_1R.5 (snapmirror)

To overcome this I removed the SnapMirror relationship and deleted the SnapMirror snapshot as well.

Now it is working fine, but this is not what I want.

I want to keep the SnapMirror relationship on, and I don't want the volume to go offline every now and then when the user deletes and adds more data.

What is causing the volume to fill up? Why is new data going into the snapshot and not into the actual volume? Is anything wrong with the fractional reserve setting (set to 0)? I read a full discussion saying this happens when fractional reserve is set to 100%.

Please help.

14 REPLIES

MichaelCade
12,390 Views

Hi,

You need to look at the snapshot reserve at the volume level. From the output above, if you take your 96 GB volume and put a thick-provisioned 65 GB LUN inside it, that leaves you 31 GB of free space in the volume. This will be used by snapshots if no reserve is set. From the above I can see that you have one snapshot of 15 GB, which would bring the total to 80 GB. However, check the snapshot reserve and let us know what it is, as that percentage takes its space out of the volume's total size.
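To check the current reserve from the filer, something like this should work (7-Mode syntax, using the volume name from your output; please verify against your ONTAP release):

rsh filer0011 snap reserve exch_index_1
(shows the percentage of the volume set aside for snapshots)

rsh filer0011 df -r exch_index_1
(shows used, available and reserved space for the volume)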

explained by Chris Kranz here - http://communities.netapp.com/groups/chris-kranz-hardware-pro/blog/2009/03/05/fractional-reservation--lun-overwrite

storage_india
12,390 Views

fractional_reserve is already set to 0 and snap reserve is 20%. Someone recommended that setting fractional_reserve to 0 alone will not help and that I also need to set the volume guarantee to none. I have set that.
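For reference, these are the 7-Mode commands I would use to check and set those options (a sketch only; verify the syntax on your release before running):

rsh filer0011 vol options exch_index_1
(lists the current settings, including fractional_reserve and guarantee)

rsh filer0011 vol options exch_index_1 fractional_reserve 0
rsh filer0011 vol options exch_index_1 guarantee none
(set fractional reserve to 0 and thin provision the volume)

rsh filer0011 snap reserve exch_index_1 20
(sets the snapshot reserve percentage; without a number it just displays the current value)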

johannbell
12,390 Views

Hi,

New data doesn't 'go into' the snapshot rather than the 'actual volume'.

The volume is a space which contains blocks. In your case, you have a LUN in there, which contains data presented to a server via iSCSI. Since you have snapshots enabled on this volume, it will periodically snapshot the volume (i.e., stop changing the blocks which already exist on the volume) and record any changed blocks in a new area. This isn't literally how it works, but you can think of it like that. This means that any changed blocks count towards the snapshot space, not just deleted blocks. So when your user deletes 20GB of data, then writes 20GB of new data, this creates a 40GB snapshot, not a 20GB snapshot.

Your volume offlines to protect itself, as it has nowhere to write new data.

To prevent this, you have a few options:

Stop creating snapshots (no good if you want to have snapshots)

Increase the overall volume size

Keep fewer snapshots (you can either change the retention schedule, or configure snapshot auto deletion)

Decrease the rate of change in the volume (no good if you want to keep using your volume normally).

Please do a 'snap list' to check if there are any older snapshots consuming space inside your volume.

Also, what is your snapshot schedule?

volume guarantee=none only thin provisions your volume and has no effect on its configured size. It will use less space in the aggregate (maybe), but won't actually be any larger/smaller as far as your LUN is concerned.

I would suggest:

Check the snapshot schedule isn't keeping too many snaps.

Work out how much snapshot space you'll need. Do this by calculating the size of your snapshots multiplied by the number you want to retain. E.g., if your snapshots are 15GB and you want to keep 6 of them, you'll need about 90GB of space for this.

Then you'll have to size your volume to make sure that you have enough space to contain the LUN, the total snapshot size you want to keep, and then some free space.

The snapshot reserve is an area for use by snapshots only; however, they will use space outside of this reserve as required. Given that your LUN can't grow by itself, you shouldn't need to configure this option here, and you can set it later once you know how large your snaps will be. The setting isn't necessary for LUN volumes.
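As a rough sketch of the checks and the resize described above (7-Mode syntax, names taken from this thread; the new size is an example only):

rsh filer0011 snap sched exch_index_1
(shows the current snapshot schedule and how many snapshots are retained)

rsh filer0011 snap list exch_index_1
(shows which snapshots are currently holding space)

rsh filer0011 vol size exch_index_1 170g
(example resize: 65 GB LUN + roughly 90 GB of retained snapshots + some headroom)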

VKALVEMULA
12,390 Views

I also have the same issue.

I created a LUN inside a qtree (with quotas enabled), no snapshots.

As soon as the quota filled up, it took the LUN down.

Do we have any solution so that the LUN never goes into offline mode?

peter_lehmann
12,390 Views

Best Practice...

qtree = good

quota = not good

no quota = no offline LUN, easy

VKALVEMULA
12,391 Views

It did solve the issue.

So what's the difference if we enable quotas? Why does it take the LUN offline?

We are creating a LUN inside a qtree!

Can you please share your thoughts on what the difference would be between creating a LUN inside a qtree and creating a LUN inside a qtree which has quotas turned on?

VKALVEMULA
12,391 Views

We are in trouble again... even though the LUN says it is full from the Windows side,

it still accepts the creation of small files (3 KB or 2 KB), and when we try to open those files the data is corrupted.

shaunjurr
9,327 Views

You can enable quotas on the volume/qtree, just don't set limits. There's really no point. The LUN size limits LUN data. A quota won't limit any other per user data because this isn't a CIFS or NFS exported filesystem.
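If you want usage tracking without limits, a tracking quota in /etc/quotas could look something like this (7-Mode quotas file format; the qtree name is just an example, and the dashes mean no limits are enforced):

/vol/exch_index_1/qtree1    tree    -    -
(tracking quota: reports usage for the qtree but enforces nothing)

rsh filer0011 quota on exch_index_1
(turn quotas on for the volume after editing /etc/quotas)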

You really need to read the documentation on SAN/block data administration.

S.

shaunjurr
12,391 Views

Hi,

Thin provision everything (no volume guarantees, no LUN space reservation) and stop using snap reserve (set it to 0). Configure SME not to complain about the thin provisioning.

Set up volume auto sizing and warning levels on your aggregates and live a quiet life.
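Roughly, that translates into something like the following (7-Mode sketch using the names from this thread; the autosize values are examples, so check the syntax on your release):

vol options exch_index_1 guarantee none
(no volume guarantee)

lun set reservation /vol/exch_index_1/exch_index_1 disable
(no LUN space reservation)

snap reserve exch_index_1 0
(stop using the snapshot reserve)

vol autosize exch_index_1 -m 150g -i 5g on
(let the volume grow in 5 GB steps up to 150 GB before it fills)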

S.

(If it hurts, stop doing it...)

VKALVEMULA
12,391 Views

That's not the solution I'm looking for.

shaunjurr
9,327 Views

Seeing as how what I suggested actually works and you don't seem to have a better solution, this seems to be more a matter of PEBKAC, from where I'm standing.

Enjoy.

VKALVEMULA
9,327 Views

We can keep alerts and monitor the LUNs from our end, but how can we stop customers from dumping data in the middle of the night and filling up those LUNs?

shaunjurr
9,327 Views

You can't. By using LUNs, you leave any such usage considerations to the server using the LUNs. Suggest that they implement mail quotas or the like.

S.

paul_wolf
9,327 Views

The issue isn't that the 'customers dump data' to the LUN, it's that the blocks in the LUN change and get moved from the active file system to the snapshot that exists due to the SnapMirror relationship. This is normal behavior and reflects the data change rate in the LUN. There are a couple of options available that can help mitigate this issue:

1) Volume autogrow - grows the underlying FlexVol when a certain usage % threshold is reached; the threshold is higher for larger FlexVol sizes (see the command sketch after this list).

2) Snap autodelete - deletes snapshots on an as-needed basis using a variety of trigger conditions, thresholds and delete options ('snap autodelete [volname]' for more options). The issue here is that the SnapMirror snapshot is locked, so this will not work as long as the SnapMirror relationship exists.

3) Thin provisioning - set the FlexVol to some very large value. (I typically set my FlexVols to twice the size of the configured LUNs, and I set fractional reserve to 0.) As this is thin provisioned, I have not hard-allocated the space in the aggregate. Now, the danger of thin provisioning is that you are promising space that either a) you hope never gets fully used or b) you can procure and/or free up capacity to add to the aggregate at a later date.

4) Deduplication - removes duplicate blocks in the FlexVol and updates the inode table to point to the single unique data block. This saves space and on average will yield about a 20-25% reduction in data on the FlexVol. Some data, like large image files and encrypted data, doesn't dedupe much if at all. Some data, like virtual machine OS disks, dedupes very, very well.
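A rough 7-Mode command sketch for options 1, 2 and 4 above (volume name taken from this thread; the sizes and percentages are examples only, and as noted autodelete cannot remove the locked SnapMirror snapshot):

vol autosize exch_index_1 -m 190g -i 10g on
(1 - autogrow in 10 GB steps up to a 190 GB maximum)

snap autodelete exch_index_1 on
snap autodelete exch_index_1 trigger volume
snap autodelete exch_index_1 target_free_space 20
(2 - delete eligible snapshots when the volume nears full, until 20% is free again)

sis on /vol/exch_index_1
sis start -s /vol/exch_index_1
df -s exch_index_1
(4 - enable deduplication, scan the existing blocks, then show the space saved)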
