Using Flexclone - how can I limit the storage given to a host?

I posed this question at the user's group meeting yesterday in Reston.

Scenario:

- Solaris host operates 2 TB database

- Project wants a copy of the 2 TB database for development / testing purposes

- 7 days of snapshots are kept on database, 1 snap / day

- snapvault copy created every 24 hours

- vol guarantees set to volume

- 100% lun space reservations

- Volume includes space for snaps (over 4TB total vol space)

Question: how can I use FlexClone, autogrow, volume space guarantees, snap reserve, etc., so that I can maintain operations on the production database while offering a writable copy to another development server for dev/test purposes? KEY OBJECTIVES - I must create and configure the writable FlexVol so that writes made by the dev/test server cannot exceed X amount (e.g., 200 GB). Production snaps and SnapVault must go unaltered so that the SLA for 7-day retention and nightly SnapVault is still met.

Jesse Thaloor - you were going to try a test and let us know?

Doug Gerber

Re: Using Flexclone - how can I limit the storage given to a host?

Doug,

I did some testing last night using an auto-grow configuration on the flex-cloned volume, and there is one thing I have to verify before putting out a definitive answer on this. Auto-grow appears to work, but it has raised some questions about how aggregate space is utilized by clones with auto-grow enabled and fractional reserve set to 0.

Stay tuned and I will have a full description in a day or so.

Thanks

Jesse

Re: Using Flexclone - how can I limit the storage given to a host?

So the Answer is YES, the functionality will work as shown below:

CAVEATS:

Data ONTAP 7.3.1 (I'm pretty sure this works the same in any release from 7.2.4 onward)

Fractional Reserve to ZERO (0)

Auto Grow set on the cloned volume

No other lun based options have been set (so they are all defaults)

(In the logs you will see two filers DR2 and Lincoln. The procedure was tested on two filers so the logs may come from two controllers but the results are the same)

See the whole walkthrough below:

# Create a new volume of size 1GB

DR2> vol create testvol aggr1 1g

# disable automatic snapshots

DR2> vol options testvol nosnap on

# set fractional reserve to 0

DR2> vol options testvol fractional_reserve 0

# check vol options

DR2> vol options testvol

nosnap=on, nosnapdir=off, minra=off, no_atime_update=off, nvfail=off,

ignore_inconsistent=off, snapmirrored=off, create_ucode=off,

convert_ucode=off, maxdirsize=31457, schedsnapname=ordinal,

fs_size_fixed=off, compression=off, guarantee=volume, svo_enable=off,

svo_checksum=off, svo_allow_rman=off, svo_reject_errors=off,

no_i2p=off, fractional_reserve=0, extent=off, try_first=volume_grow,

read_realloc=off, snapshot_clone_dependency=off

# set snap reserve to 0

DR2> snap reserve testvol 0

# Create a lun of size 900m
DR2> lun create -s 900m -t linux /vol/testvol/lun.0
# Map the lun to rhel5_iscsi as target 0
DR2> igroup create -i -t linux rhel5_iscsi
DR2> igroup add rhel5_iscsi iqn.2005-03.com.redhat:01.c197f5d80dd
DR2> lun map /vol/testvol/lun.0 rhel5_iscsi 0
# Mount the lun on a RHEL5 server
RHEL5> iscsiadm -m session -R
RHEL5> mke2fs /dev/sdb
RHEL5> mount /dev/sdb /mnt
# Aggregate space
DR2> df -A aggr1
Aggregate               kbytes       used      avail capacity 
aggr1                238105548    1058200  237047348       0% 
aggr1/.snapshot       12531868          0   12531868       0% 
# Fill the lun with 800m
RHEL5> dd if=/dev/zero of=/mnt/test.1 bs=8k count=100000
RHEL5> df /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb                907096    801932     59084  94% /mnt
# check space on the testvol on NetApp
DR2> df testvol
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/testvol/          1048576     923968     124608      88%  /vol/testvol/
/vol/testvol/.snapshot          0          0          0     ---%  /vol/testvol/.snapshot
# Check aggr space
DR2> df -A aggr1
Aggregate               kbytes       used      avail capacity 
aggr1                238105548    1058224  237047324       0% 
aggr1/.snapshot       12531868          0   12531868       0% 
# Now clone the volume
DR2> vol clone create testvolc -b testvol
Thu Feb 12 21:35:14 EST [DR2: wafl.volume.clone.created:info]: Volume clone testvolc of volume testvol was created successfully.
Creation of clone volume 'testvolc' has completed.
Thu Feb 12 21:35:14 EST [DR2: lun.newLocation.offline:warning]: LUN /vol/testvolc/lun.0 has been taken offline to prevent map conflicts after a copy or move operation.
# now change the autogrow on the clone
DR2> vol autosize testvolc -i 50m -m 1200m on
vol autosize: Flexible volume 'testvolc' autosizesettings UPDATED.
Thu Feb 12 21:36:40 EST [DR2: wafl.spacemgmnt.policyChg:info]: The space management policy for volume testvolc has changed: autosize volume growth increment 51200KB, autosize volume maximum size 1228800KB, autosize state enabled.
Thu Feb 12 21:36:46 EST [DR2: wafl.vol.autoSize.done:info]: Automatic increase size of volume 'testvolc' by 51200 kbytes done.
# check auto grow setting on testvolc
DR2> vol autosize testvolc
Volume autosize is currently ON for volume 'testvolc'.
The volume is set to grow to a maximum of 1200 MB, in increments of 50 MB.
# Map the cloned lun to the same server
DR2> lun map /vol/testvolc/lun.0 rhel5_iscsi 1
Thu Feb 12 21:39:14 EST [DR2: lun.map:info]: LUN /vol/testvolc/lun.0 was mapped to initiator group rhel5_iscsi=1
# Online the lun
DR2> lun online /vol/testvolc/lun.0
# Aggr space
DR2> df -A aggr1                             
Aggregate               kbytes       used      avail capacity 
aggr1                238105548    1333776  236771772       1% 
aggr1/.snapshot       12531868          0   12531868       0%
# Vol Name
DR2> df
Aggregate               kbytes       used      avail capacity 
/vol/testvol/          1048576     923824     124752      88%  /vol/testvol/
/vol/testvol/.snapshot          0         40          0     ---%  /vol/testvol/.snapshot
/vol/testvolc/         1099776     923868     175908      84%  /vol/testvolc/
/vol/testvolc/.snapshot          0         48          0     ---%  /vol/testvolc/.snapshot
# Now map this lun to RHEL5 and write about 200m changes to it using SIO
RHEL5> iscsiadm -m session -R
RHEL5> mount /dev/sdc /mnt2
RHEL5> df /mnt /mnt2
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb                907096    801932     59084  94% /mnt
/dev/sdc                907096    801932     59084  94% /mnt2
RHEL5> ls -l /mnt2
total 800804
drwx------ 2 root root     16384 Feb 12 15:55 lost+found
-rw-r--r-- 1 root root 819200000 Feb 12 15:58 test.1
# simulate a 100% write workload at 50% random
RHEL5> sio_linux 0 50 8k 0 600m 90 1 /mnt2/test.1
Version: 6.15
Read: 0 Rand: 50 BlkSz: 8192 BegnBlk: 0 EndBlk: 76800 Secs: 90 Threads: 1 Devs: 1  /mnt2/test.1
error accessing file: Input/output error
Filename: /mnt2/test.1 offset 322650112 -1/8192
# Messages on the filer when the above is running!
Fri Feb 13 13:43:32 GMT [Lincoln: wafl.vol.autoSize.done:info]: Automatic increase size of volume 'testvolc' by 51200 kbytes done.
Fri Feb 13 13:43:44 GMT [Lincoln: wafl.vol.full:notice]: file system on volume testvolc is full
Fri Feb 13 13:43:44 GMT [Lincoln: wafl.write.fail.spcres:warning]: Write failed to file with space reservations due to lack of disk space in volume testvolc (guarantee volume, inode 100, offset 117776384, len 8192).
Fri Feb 13 13:43:45 GMT [Lincoln: scsitarget.lun.noSpace:error]: LUN '/vol/testvolc/lun.0' has run out of space.
Fri Feb 13 13:43:45 GMT [Lincoln: wafl.vol.autoSize.done:info]: Automatic increase size of volume 'testvolc' by 51200 kbytes done.
Fri Feb 13 13:44:01 GMT [Lincoln: wafl.vol.autoSize.done:info]: Automatic increase size of volume 'testvolc' by 26624 kbytes done.
Fri Feb 13 13:44:15 GMT [Lincoln: wafl.vol.autoSize.fail:info]: Unable to grow volume 'testvolc' to recover space: Volume cannot be grown beyond maximum growth limit
Fri Feb 13 13:44:50 GMT [Lincoln: wafl.vol.full:notice]: file system on volume testvolc is full
Fri Feb 13 13:44:51 GMT [Lincoln: scsitarget.write.failureNoSpace:error]: Write to LUN /vol/testvolc/lun.0 failed due to lack of space.
Fri Feb 13 13:44:51 GMT [Lincoln: lun.offline:warning]: LUN /vol/testvolc/lun.0 has been taken offline
DR2> lun show /vol/testvolc/lun.0
        /vol/testvolc/lun.0          900m (943718400)     (r/w, offline, mapped)
DR2> df -A aggr1
Aggregate               kbytes       used      avail capacity 
aggr1                238105548    1463448  236642100       1% 
aggr1/.snapshot       12531868          0   12531868       0% 
DR2> df testvolc
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/testvolc/         1228800    1204068      24732      98%  /vol/testvolc/
/vol/testvolc/.snapshot          0     280136          0     ---%  /vol/testvolc/.snapshot
DR2> df testvol
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/testvol/          1048576     923856     124720      88%  /vol/testvol/
/vol/testvol/.snapshot          0         72          0     ---%  /vol/testvol/.snapshot
###### END #################
So there you have it.
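
To apply this to the original question, one way to pick the autosize ceiling is to work backwards from the write budget. The sketch below is only an illustration, not an official sizing rule. It assumes the same setup as this test (fractional reserve 0, snap reserve 0, no other snapshots on the clone) and assumes the room for changed blocks is roughly the autosize maximum minus the space already used when the clone is created. The function name and its parameters are hypothetical; the command format simply mirrors the `vol autosize` line used above.

```python
import math

def autosize_command(clone_name, used_kb_at_clone, write_budget_mb, increment_mb=50):
    """Build a 7-mode 'vol autosize' line that caps a clone's growth.

    Assumption (not an official rule): space available for changed blocks
    is about (autosize maximum - space used when the clone is created).
    """
    # Round the used space up to whole MB so the budget is not undercut.
    used_mb = math.ceil(used_kb_at_clone / 1024)
    max_mb = used_mb + write_budget_mb
    return f"vol autosize {clone_name} -i {increment_mb}m -m {max_mb}m on"

# Numbers from the test: 923868 KB used in the clone at creation time.
# A budget of 297 MB reproduces the 1200m ceiling used in the log.
print(autosize_command("testvolc", 923868, 297))
```

For a real database clone you would plug in the `used` figure from `df` on the freshly created clone and the write budget you want to allow (e.g., 200 GB expressed in MB), then verify the behavior on a test volume first.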


Re: Using Flexclone - how can I limit the storage given to a host?

Hi Doug and Jesse

Is it still best practice to set fractional space reserve to 100%, or is it a better practice to set fractional space reserve to 0% and use vol auto grow and automatic snapshot delete to ensure writes to a lun? 100% fractional reserve is tough to explain to clients, since they have trouble understanding why they have to set aside so much "extra" capacity when dealing with luns.

I understand that you can use a lower %, but you had better be sure what the change rate is or you'll run into problems. What are your thoughts on setting fractional space reserve to 0% and using the tools described above? Will automatically growing the volume and/or automatically deleting snapshots be enough to ensure space for overwrites?

For example, if I created a lun of 100GB, wrote 100GB of data to the lun and took a snapshot, I'd be fine for that initial snapshot (no extra space in the volume used). If every single block within that lun then changed, I'd need available space to write those changed blocks, since the original blocks are locked by the snapshot. Obviously, that's where the fractional space reserve comes into play: it guards against the worst-case scenario of every block in the lun changing.

I realize this is not likely the case with most apps and some lower % value could be chosen.
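
To put numbers on that example, here is a small sketch of the arithmetic. It is a simplified model that only restates the reasoning above: while a snapshot locks the original blocks, every changed block needs new space. The function name is hypothetical, and the 50% and 20% change rates are made-up values for comparison, not measurements.

```python
def overwrite_space_needed_gb(lun_gb, change_rate):
    """Space needed for overwrites while a snapshot locks the original blocks.

    change_rate is the fraction of the LUN's blocks expected to change
    (1.0 = every block, the worst case that 100% fractional reserve covers).
    """
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    return lun_gb * change_rate

lun_gb = 100  # the fully written 100GB lun from the example
for rate in (1.0, 0.5, 0.2):  # 0.5 and 0.2 are hypothetical change rates
    extra = overwrite_space_needed_gb(lun_gb, rate)
    print(f"change rate {rate:.0%}: {extra:.0f} GB for overwrites, "
          f"{lun_gb + extra:.0f} GB total")
```

The 100% row is the case fractional reserve at 100% protects against; the lower rows show how much capacity is at stake when the real change rate is smaller.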

A colleague of mine has told me not to worry about setting fractional reserve to 100%, or even to any number other than 0%, because of the ability to auto grow the volume and/or delete snapshots. Is this a safe way to architect volume sizing?

Cheers

Re: Using Flexclone - how can I limit the storage given to a host?

Ian,

See Block Management with Data ONTAP 7G: FlexVol, FlexClone, and Space Guarantees for the latest on block management. To summarize, keeping fractional reserve at 100% for the entire lifecycle of a LUN is no longer necessary. There are some risks associated with setting it below 100%, but as long as you know those risks, you can use autogrow/autodelete to hedge against them. In addition, parts of a LUN's lifecycle may still need 100% fractional reserve (for example, when a new application is deployed and there is no data on its change rate), but as the application matures, the reserve can be tuned down to the absolute minimum if necessary.

Thanks

Jesse

Re: Using Flexclone - how can I limit the storage given to a host?

Hi Jesse. Thanks for the TR. I understand that fractional reserve does not have to be set to 100%. Just to clarify, my question is: is it now a better practice to set the fractional reserve to 0% and use the auto grow and/or snap auto delete options to guarantee space for overwrites? I've been told that this is in fact now a better practice, but I haven't gotten an official answer yet.

It can be difficult to win deals against other storage vendors if 100% fractional reserve has to be included in the storage sizing. That can really amount to a lot of extra storage. It's difficult to assess a true fractional reserve figure, especially if there just isn't data to draw from to make that calculation. So, 100% is safe, but customers don't want to hear that they have to purchase X amount of storage now and can only reduce fractional reserve over time, once they have a better understanding of what the "true" figure should be. The sale is usually lost at that point. I know about selling the value add of a NetApp solution, but let's assume the deal is coming down to how much capacity the client needs to purchase.

So, once again, is it now best practice to set fractional reserve to 0% and use the mechanisms I've described (auto grow, auto delete) to guarantee overwrites?

Thanks

Re: Using Flexclone - how can I limit the storage given to a host?

Hi Jesse

I'm looking at that output and have a few questions.

# Aggregate space
DR2> df -A aggr1
Aggregate               kbytes       used      avail capacity
aggr1                238105548    1058200  237047348       0%
aggr1/.snapshot       12531868          0   12531868       0%
That looks like you have an aggr1 of 227GB, with 1GB used (for testvol). You then created a lun (lun.0) of 900MB, mapped it to the linux host and mounted it. You then wrote 800MB of data to lun.0. At this point the filer shows testvol with 1GB reserved and 902MB used. Aggregate space remains unchanged.
# check space on the testvol on NetApp
DR2> df testvol
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/testvol/          1048576     923968     124608      88%  /vol/testvol/
/vol/testvol/.snapshot          0          0          0     ---%  /vol/testvol/.snapshot
# Check aggr space
DR2> df -A aggr1
Aggregate               kbytes       used      avail capacity
aggr1                238105548    1058224  237047324       0%
aggr1/.snapshot       12531868          0   12531868       0%
You now clone the volume and turn on the autogrow options (50MB increments, up to no more than 1.2GB).
# Now clone the volume
DR2> vol clone create testvolc -b testvol
Thu Feb 12 21:35:14 EST [DR2: wafl.volume.clone.created:info]: Volume clone testvolc of volume testvol was created successfully.
Creation of clone volume 'testvolc' has completed.
Thu Feb 12 21:35:14 EST [DR2: lun.newLocation.offline:warning]: LUN /vol/testvolc/lun.0 has been taken offline to prevent map conflicts after a copy or move operation.
# now change the autogrow on the clone
DR2> vol autosize testvolc -i 50m -m 1200m on
vol autosize: Flexible volume 'testvolc' autosizesettings UPDATED.
Thu Feb 12 21:36:40 EST [DR2: wafl.spacemgmnt.policyChg:info]: The space management policy for volume testvolc has changed: autosize volume growth increment 51200KB, autosize volume maximum size 1228800KB, autosize state enabled.
Thu Feb 12 21:36:46 EST [DR2: wafl.vol.autoSize.done:info]: Automatic increase size of volume 'testvolc' by 51200 kbytes done.
Does the automatic increase happen right away just because you enabled it? In any case it looks like 50MB increase took place (leaving 1150MB).
You now map the cloned lun to the linux host and online it.
# Aggr space
DR2> df -A aggr1                           
Aggregate               kbytes       used      avail capacity
aggr1                238105548    1333776  236771772       1%
aggr1/.snapshot       12531868          0   12531868       0%
# Vol Name
DR2> df
Aggregate               kbytes       used      avail capacity
/vol/testvol/          1048576     923824     124752      88%  /vol/testvol/
/vol/testvol/.snapshot          0         40          0     ---%  /vol/testvol/.snapshot
/vol/testvolc/         1099776     923868     175908      84%  /vol/testvolc/
/vol/testvolc/.snapshot          0         48          0     ---%  /vol/testvolc/.snapshot
It looks like aggr1 used space has increased by ~270MB. Why did it increase so much? All you've done is create a clone. That clone grew by 50MB because of the auto grow option. An initial flexclone isn't supposed to use up any additional space in the aggregate until changes are made to the clone. So why the 270MB increase in aggr1? I can see the 50MB size difference between testvol and testvolc.
You now mount the lun on the linux host and write 200MB of changes. I'm assuming that these changes would be reflected in the snapshot that the parent and flexclone share? Would the overwrites take up free space in the volume? Since there was only 100MB left in the volume (1GB-900MB), would that have meant that the volume had to autogrow to accommodate the overwrites?
You then simulated additional writes to the flexclone volume (lun). So, the lun ran out of space since there was only 100MB (900-800) left to spare. It seems that the volume autogrew about an additional 125MB before running into the 1.2GB "extra growth" limit. I'm trying to add up all the times that the volume autogrew 50MB at a time and I don't get 1.2GB.
DR2> df -A aggr1
Aggregate               kbytes       used      avail capacity
aggr1                238105548    1463448  236642100       1%
aggr1/.snapshot       12531868          0   12531868       0%
DR2> df testvolc
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/testvolc/         1228800    1204068      24732      98%  /vol/testvolc/
/vol/testvolc/.snapshot          0     280136          0     ---%  /vol/testvolc/.snapshot
DR2> df testvol
Filesystem              kbytes       used      avail capacity  Mounted on
/vol/testvol/          1048576     923856     124720      88%  /vol/testvol/
/vol/testvol/.snapshot          0         72          0     ---%  /vol/testvol/.snapshot
In the end used space in the aggr1 increased 396MB. testvolc increased 126MB. Having some trouble wrapping my head around some of these numbers.
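
For reference, the deltas quoted in this post can be recomputed directly from the `df` output. The sketch below only does the subtraction and unit conversion; it does not explain why the aggregate grew. The variable names are my own, and the values are copied from the `used` and `kbytes` columns in the log.

```python
KB_PER_MB = 1024

# 'used' column of 'df -A aggr1', in KB, copied from the log
aggr_used_before_clone = 1058224
aggr_used_after_clone = 1333776
aggr_used_at_end = 1463448

# 'kbytes' column of 'df' for /vol/testvolc/, in KB
clone_size_after_first_grow = 1099776
clone_size_at_end = 1228800

def delta_mb(before_kb, after_kb):
    """Difference between two df readings, converted from KB to MB."""
    return (after_kb - before_kb) / KB_PER_MB

print(f"aggr1 after clone and first autogrow: "
      f"+{delta_mb(aggr_used_before_clone, aggr_used_after_clone):.0f} MB")
print(f"aggr1 at end of test: "
      f"+{delta_mb(aggr_used_before_clone, aggr_used_at_end):.0f} MB")
print(f"testvolc size after the first growth: "
      f"+{delta_mb(clone_size_after_first_grow, clone_size_at_end):.0f} MB")
```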

Re: Using Flexclone - how can I limit the storage given to a host?

Ian,

Frankly, I did not track the aggregate space, since the aggregate was being actively used by other applications and users at the same time. That accounts for the aggregate space discrepancy. I will try this same example on a dedicated aggregate in a few days and post the results back.

As for the auto grow option: yes, an increment in the storage occurs as soon as it is enabled for the first time. The volume grew 4 times in the process, including the first time.
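
The four growth steps can be reproduced with a simplified model of autosize: grow by the increment each time, and truncate the last step at the maximum. This is only a sketch of the arithmetic seen in the log messages, not a description of how Data ONTAP decides when to grow; the function name is hypothetical.

```python
def autosize_steps(start_kb, max_kb, increment_kb):
    """Return the growth steps a volume takes until it reaches max_kb.

    Simplified model: each step adds increment_kb, and the last step is
    truncated so the volume never exceeds max_kb.
    """
    steps = []
    size = start_kb
    while size < max_kb:
        step = min(increment_kb, max_kb - size)
        steps.append(step)
        size += step
    return steps

# testvolc: 1 GB at clone time, -m 1200m, -i 50m (all values in KB)
steps = autosize_steps(start_kb=1048576, max_kb=1228800, increment_kb=51200)
print(steps)       # [51200, 51200, 51200, 26624]
print(sum(steps))  # 180224 KB, i.e. 176 MB of growth from 1 GB to 1200 MB
```

The three 51200 KB steps and the final 26624 KB step match the wafl.vol.autoSize.done messages, so the 1.2GB figure is the ceiling for the whole volume, not the amount of extra growth.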

The space of the clone comes from the aggregate. When a clone is created, it uses the backing snapshot for reads. New writes to the clone take up space in the aggregate. The clone has about 150 MB (which is the first auto growth) to write to and an additional 150 MB (auto grow capacity available). So if writes to the flexclone exceed the 300 MB (the writes are not to new space in the filesystem but updates to blocks in an existing file), then the volume fills up and the lun goes offline. That is precisely what happened.
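
The roughly 300 MB figure can be cross-checked against the `df testvolc` output. The sketch below assumes, as in this test, fractional reserve 0 and snap reserve 0, so that the room for overwrites is simply the autosize maximum minus what the clone already used when it was created. In the `df` numbers the split between the two parts is not exactly 150/150, but the total comes out the same. Variable names are my own.

```python
KB_PER_MB = 1024

autosize_max_kb = 1228800    # -m 1200m
size_after_first_grow_kb = 1099776
used_at_clone_kb = 923868    # 'used' for /vol/testvolc/ right after cloning
used_at_end_kb = 1204068     # 'used' for /vol/testvolc/ when the lun went offline
avail_at_end_kb = 24732      # 'avail' for /vol/testvolc/ at the end

free_after_first_grow_kb = size_after_first_grow_kb - used_at_clone_kb
remaining_growth_kb = autosize_max_kb - size_after_first_grow_kb
budget_kb = free_after_first_grow_kb + remaining_growth_kb
consumed_kb = used_at_end_kb - used_at_clone_kb

print(f"free after first growth: {free_after_first_grow_kb / KB_PER_MB:.0f} MB")
print(f"remaining autogrow room: {remaining_growth_kb / KB_PER_MB:.0f} MB")
print(f"total overwrite budget:  {budget_kb / KB_PER_MB:.0f} MB")
print(f"consumed by the sio run: {consumed_kb / KB_PER_MB:.0f} MB")
print(f"left over at the end:    {(budget_kb - consumed_kb) / KB_PER_MB:.0f} MB")
```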

The numbers do add up to be correct.

Hope I cleared things up.

Thanks

Jesse