Hello all - I was wondering what you would recommend for the snap reserve setting on a volume holding LUNs that will be protected by SMVI. If you read all the TRs, they all say to set snap reserve to 0 for LUN datastores. But if you add SMVI into the mix, wouldn't you want a snap reserve?
Also, now that I think about it: what differences would there be between thick and thin LUNs?
For what it's worth, the 2 day NCIE-SAN prep class by Steve Botkin is great for this (covers thin provisioning, volume auto-grow, snapshot auto-delete, frac reserve, etc. in great detail with discussion).
And...I'd say the answer really depends on the customer....items like....
If doing thin provisioning, they're probably already trying to maximize space, so a good OM alerting setup and no snap reserve would probably make sense (you have to do thin provisioning in some form for LUNs anyway if using dedupe... part of why I like NFS)
If somewhat cautious, you can do some inspection of change rates (i.e., decide how many snapshots they want to keep, let that run for a week and see how much space is used, then set snap reserve somewhere above that)
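For that change-rate inspection, 7-mode's `snap delta` reports consumed space between snapshots, which gives you a number to size the reserve against. A rough sketch, assuming a hypothetical volume name and that your observed week of snapshots lands around 16% of the volume:

```shell
# Let a week of SMVI snapshots accumulate, then inspect the rate of change
# between them (vmfs_vol01 is a hypothetical volume name)
snap delta vmfs_vol01

# If the retained snapshots total, say, ~80 GB on a 500 GB volume (~16%),
# set snap reserve somewhat above the observed rate as a buffer
snap reserve vmfs_vol01 20
```

Verify the exact syntax against your ONTAP release; the point is just to measure before you pick a number rather than guessing.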
There are some other possibilities there, but they're not springing to mind right now. Mainly it's just a conversation to have... (higher space utilization but more admin work, versus lower space utilization but less admin watching).
Now....dunno if I shouldn't be answering this right now or not....
Andrew and Eric - Thank you both for your replies. It was a bit of thinking out loud for me. I am in the process of writing a blog article and as I was doing so, this thought popped into my head.
The more I work with all of this, the more I am of the mindset to thin provision everything and only manage the aggr space (with the proper tools of course!). Thick provisioning, while the most conservative way, seems like overkill to me for most instances.
Thank you again for your replies!
BTW Andrew, I'm jealous you went to Botkin's class! I wasn't able to make it because of a scheduling conflict. Maybe next time!
Hey Radek - Correct me if I'm wrong, but fractional reserve only kicks in on LUNs if "space reserved" is checked. What happens to frac reserve when it isn't checked? From what I've read, frac reserve is out the window at that point. I've gone over this a million times and read 10 million things on it, but I still don't have all of it in my head.
Let's take an example: I want to set up a volume to hold VMware VMs in LUNs, and I want to use SMVI and dedupe. Tell me if this is the optimal way to set it all up. Yes, you are running a risk of going offline; that is why you will be smart and use all the right tools to monitor it!
You would create the volume with no guarantee and to make it easy, let's put snap reserve at 0%. We'll go ahead and set an auto-grow/auto-delete policy on it and manage the space at the aggr level.
When creating the LUN, you would uncheck "space reserved". What happens to fractional reserve at this point? FilerView will say it is 100%, but that is misleading; the value is ignored. What if I modified it to something small, say 5%-10%, to give me a buffer? Again, will it matter in this scenario? I don't THINK so, but I don't know for sure; I haven't tested it. Again, this is all from what I've read. Let's assume it is ignored no matter what the value is.
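The two-step setup described above could look like the following from the 7-mode CLI. All names and sizes are hypothetical, and the comments reflect this thread's reading that fractional reserve only applies to space-reserved LUNs, so treat this as a sketch to verify against your ONTAP release:

```shell
# Volume: no space guarantee, snap reserve 0, autogrow + autodelete,
# space managed at the aggregate level
vol create vmfs_vol01 aggr1 500g
vol options vmfs_vol01 guarantee none
snap reserve vmfs_vol01 0
vol autosize vmfs_vol01 -m 750g -i 25g on
snap autodelete vmfs_vol01 on

# Per the discussion, fractional reserve is ignored for non-space-reserved
# LUNs; setting it to 0 explicitly avoids the misleading 100% display
vol options vmfs_vol01 fractional_reserve 0

# LUN: space reservation disabled ("space reserved" unchecked)
lun create -s 400g -t vmware -o noreserve /vol/vmfs_vol01/datastore1.lun
```

With guarantee none on the volume and noreserve on the LUN, neither consumes aggregate blocks until data is actually written, which is what pushes all the free space up to the aggr.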
Now that I've got a volume and a LUN and I'm set for maximum thin provisioning, I'm ready to set up my dedupe and snapshot schedules. Ideally, you want to run dedupe before a snapshot, because dedupe will only affect the active file system; it will not dedupe blocks locked in snapshots. Because of this, you want your dedupe and snapshot schedules to be as close together as possible. Say, once a day run dedupe, then take a snapshot.
What if the customer wants to take 3-4 snaps a day with SMVI? Can you run dedupe more than once a day? Even if you could, I certainly wouldn't recommend it because of the overhead during the actual process.
So, maybe you set up SMVI to take snapshots 3x a day but run dedupe every night just before the last one. I believe that way you are as efficient as possible, yet not overwhelming the controller by running dedupe a bunch of times.
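The nightly-dedupe-before-last-snap idea could be scheduled with the 7-mode `sis` commands; a sketch with a hypothetical volume and an assumed 23:00 run time chosen to finish before the last SMVI snapshot of the day:

```shell
# Enable dedupe on the volume and schedule it nightly at 23:00
sis on /vol/vmfs_vol01
sis config -s sun-sat@23 /vol/vmfs_vol01

# One-time scan of existing blocks after first enabling dedupe
sis start -s /vol/vmfs_vol01

# Check progress / savings
sis status /vol/vmfs_vol01
```

The exact hour is the tunable part: whatever gap you leave between the sis run and the SMVI job, blocks written after dedupe finishes get locked un-deduped in that snapshot until it expires.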
What do you think?
One more thing, after talking with Don Mann about this: snapshot autodelete will break SMVI, because you will be deleting the snaps outside of the application. But if you are out of space and you couldn't auto-grow, you already have big problems!
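If autodelete ends up enabled anyway as that last-resort valve, 7-mode does let you bias which snapshots it takes first. A hedged sketch; the `smvi` prefix is an assumption about how your SMVI snapshot names begin, and none of this makes autodelete "safe" for SMVI, it just makes its snaps the last to go:

```shell
# Delete oldest snapshots first when the volume trigger fires
snap autodelete vmfs_vol01 delete_order oldest_first

# Defer deleting snapshots whose names match the configured prefix
# (assumes SMVI-created snapshots are named with an "smvi" prefix)
snap autodelete vmfs_vol01 defer_delete prefix
snap autodelete vmfs_vol01 prefix smvi

snap autodelete vmfs_vol01 on
```

Even deferred, those snaps are still deleted if nothing else is left to free space, so SMVI's catalog can still end up out of sync exactly as described above.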
So in a nutshell, thin provisioning of both volumes & LUNs doesn't bring any negatives in my opinion (other than the need for careful aggregate free space monitoring).
Yeah, one more thing...
Some SnapManager products (not SMVI though) don't like disappearing snapshots, so setting snap autodelete at the volume level is not an option. A small fractional reserve is required (purely for free space headroom) to allow deleting snaps from within SnapManager, should the volume run out of space.
So the question is: what happens if we have this 'theoretical' 100% FR on a thinly provisioned volume with thinly provisioned LUNs?
Radek - Very good point! I will be doing an SMVI install sometime next month, so I'll be sure to test this scenario in my lab ASAP. We don't have SMVI in our lab yet, but we will be getting it soon. I am also curious what will happen if you did set an autodelete policy anyway. Sure, SMVI will freak out and probably need some repairs, but if your volume is full (and your aggr too, in this scenario) you have bigger problems. I would rather have SMVI break if my aggr is full than have everything go offline. Again, all in theory; I haven't tested all this.
In the meantime... Any smart people out there done this and have any idea what would happen?
I have been "preaching" this config for a while; we have it in prod, and we also implemented SMVI in the space of 12 days here in June, going from 0 daily snaps to 7 daily snaps. What we did: we set thin provisioning on the volume and LUN so that all free blocks get pushed into the aggr, which makes space management easier. We then added 1 SMVI snap every day to see what impact it had on the aggr (we turned off aggr snaps and the aggr snap reserve first). After 12 days and a bit of storage reshuffling we had 7 daily snaps. This enables us to back up 500+ hosts in 45 minutes, with restore times of 2 minutes for a host or a datastore. Pretty massive improvement coming from a backup solution (VCB) that never worked, with a backup window of 19 hours.
At the same time we also have vol autogrow and snap autodelete on. We have not had any issues with this since we implemented in June.
There is one caveat of course: if your volume hits its max growth size, then snaps will start to disappear. Snaps at this point should be considered backups, so when your backups disappear it's not the best situation. I recommend mirroring or vaulting your SMVI snaps offsite if you can.
Good stuff - it's priceless to see something working not in the lab but in a real-life, rather chunky environment.