Network and Storage Protocols

Fractional reserve and PAM recommendation for a Hyper-V environment

babar

Hi all,

I am designing a storage system for a Hyper-V environment.

The environment is as follows:

- Multiple independent Microsoft clusters, each consisting of four physical nodes
- The customer wants to assign five 4TB LUNs to each cluster and create around 50 virtual machines (VHDs) per LUN


We are planning to use SATA disks with two 512GB PAM cards. Is there any best practice for using PAM with Hyper-V?

How do I decide on the fractional reserve and snap reserve for each LUN in this case?

Thanks in advance.

Regards,

Babar

3 REPLIES

ekashpureff

Babar -

Two very interesting questions, and they are interrelated. It's a complicated set of subjects, and the answers apply to all virtualized environments.

Fractional reserve and snap reserve: in most cases for a SAN environment, snap reserve is set to zero, given that fractional reserve is a separate mechanism for doing the same thing.

The default fractional reserve of 100% is overkill. A 100% rate of change is rare in most day-to-day environments. This is most often true of datastores holding an OS, which does not change very much once it has been set up.

When fractional reserve or snap reserve are to be adjusted, I measure the rate of change with 'snap list' and 'snap delta' and adjust accordingly.
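
As a rough 7-Mode console sketch (the volume name 'vmvol' is just a placeholder, and the values you settle on should come from your own 'snap delta' measurements):

filer> snap list vmvol                          # existing snapshots and the space they hold
filer> snap delta vmvol                         # observed rate of change between snapshots
filer> snap reserve vmvol 0                     # snap reserve to 0% on a LUN volume
filer> vol options vmvol fractional_reserve 0   # back off from the 100% default once the rate of change is known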

There are many strategies for implementing thin provisioning of LUNs in a SAN environment. I'm fond of turning off all space reservations: set the space guarantee of the volume containing the LUNs to 'none'. I turn on autogrow and snapshot autodelete, with try_first set to grow the volume before deleting snapshots. (I prefer not to delete snapshots - they're taken for a reason!)
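
A minimal sketch of that setup on a 7-Mode controller (again, 'vmvol' and the sizes are placeholders; check the option names against your ONTAP release):

filer> vol options vmvol guarantee none          # no space guarantee on the volume holding the LUNs
filer> vol autosize vmvol -m 6t -i 100g on       # autogrow, up to a ceiling
filer> snap autodelete vmvol on                  # snapshot autodelete as the backstop
filer> vol options vmvol try_first volume_grow   # grow the volume before deleting any snapshots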

But there's another strategy you may wish to consider for optimal space utilization and cache (PAM) performance. Don't make 50 copies of these LUNs (VMs) - make 50 clones. The space savings from cloning virtual machines are huge (50 x 20GB = 1TB of copies, versus perhaps ~40GB for 50 clones). There's also an advantage in cache performance: when you're using clones, Data ONTAP only needs to cache the shared blocks once, instead of holding 50 copies of the same blocks in cache. Be sure to keep your data sets separate from your OS images to segregate your rate of change in snapshots.
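
One way to do that in 7-Mode is with LUN clones backed by a snapshot of the volume holding a master LUN. A hedged sketch (the paths, the snapshot and the master LUN are all made-up names):

filer> snap create vmvol gold                     # snapshot that will back the clones
filer> lun clone create /vol/vmvol/vm01.lun -b /vol/vmvol/master.lun gold
filer> lun clone create /vol/vmvol/vm02.lun -b /vol/vmvol/master.lun gold

...and so on. Each clone shares the master's blocks, so only its own changes consume new space - which is also why the shared blocks only need to be cached once.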

I hope this response has been helpful to you.

At your service,

Eugene Kashpureff

(P.S. I appreciate points for helpful or correct answers.)

fjohn

Hi Babar & Eugene,

"There are many strategies for implementing thin provisioning of LUNs in a SAN environment. I'm fond of turning off all space reservations - set the space guarantee of the volume containing the LUNs to 'none'. I turn on auto grow and snap auto delete with try-first set for auto grow. (I prefer not to delete snapshots - they're taken for a reason!)

But there's another strategy you may wish to consider for optimum space utilization and cache (PAM) performance. Don't make 50 copies of these LUNs (VMs). Make 50 clones. The space savings for cloning virtual machines is huge. ( 50 X 20G = 1TB ,  50 clones of 20G = ~40G ? ). But there's another advantage in cache performance. When you're using clones Data ONTAP only needs to cache the shared blocks once, instead of cache holding 50 copies of the same blocks. Be sure to keep your data sets seperate from your OS images to segregate your rate of change in snapshots"

I like to do both of these, with a twist.

First, even if I am not planning on overcommitting the storage, I like to remove the reservations and set the guarantee to "none". I don't use autogrow (I rely on monitoring to decide if/when I need to grow), and I do use snapshot autodelete. I put all my LUNs in qtrees, then remove the LUN space reservation and set a threshold quota on the qtree. The threshold quota is a soft quota that also sends an SNMP trap to the monitoring tool of my choice.

I also try to think out the layout: I put the OS on a VHD, the majority of the page file on a separate VHD, and application data on one or more additional VHDs. I group related OS VHDs on a LUN, and I can then clone that LUN within the volume to create a like layout. I turn dedupe on for that "OS" volume. The page file VHDs are grouped on LUNs that reside in a separate volume that is not deduped or overcommitted, but still has the same qtree/quota monitoring - I find that an easy way to figure out what the optimal page file sizes are. I create separate volumes for app data, with grouping and dedupe options depending on the nature of the data. Here's an example I started on the other day:

The volume Hyperv1 is 500GB and has no reserve or guarantee. I have enabled dedupe on the volume. Inside I have two qtrees, each containing a LUN. One is a 100GB thin LUN that contains a single VHD with a Windows 7 host at the moment. The other is a 200GB LUN that contains three VHDs: a Windows 2008 R2 domain controller, a Windows 2008 R2/Exchange 2010 Hub/CAS server, and a Windows 2008 R2/Exchange 2010 mailbox server. For the three Windows servers, I started with a sysprep image, then just cloned the LUN, did unattended/automated installs, and patched up all the hotfixes and service packs. With the Data ONTAP PowerShell Toolkit v1.2, I have great visibility into what's actually happening to my space:

Here I see my 500GB volume with 463.4GB of space available.
PS C:\> get-navol | ? {$_.name -like "*hyperv1*"}
Name                      State       TotalSize  Used  Available Dedupe  FilesUsed FilesTotal Aggregate
----                      -----       ---------  ----  --------- ------  --------- ---------- ---------
Hyperv1                   online       500.0 GB    7%   463.4 GB  True         125        16M aggr1

And I see the dedupe ratio (I haven't broken out the page files or app data into separate LUNs yet; when I do, the dedupe ratio on the base OS will go up substantially):
PS C:\> get-navolsis Hyperv1

LastOperationBegin : Sun Nov  7 01:29:24 GMT 2010
LastOperationEnd   : Sun Nov  7 01:44:59 GMT 2010
LastOperationError :
LastOperationSize  : 53206102016
PercentageSaved    : 26
Progress           : idle for 00:02:58
Schedule           : -
SizeSaved          : 14107312128
SizeShared         : 7712165888
State              : enabled
Status             : idle
Type               : regular

Here are my LUNs:
PS C:\> get-nalun
Path                                      TotalSize Protocol     Online Mapped  Thin  Comment
----                                      --------- --------     ------ ------  ----  -------
/vol/Hyperv1/VM1/Lun1.lun                  200.0 GB hyper_v       True   True   True
/vol/Hyperv1/VM2/Lun2.lun                  100.0 GB hyper_v       True   True   True

Here's the interesting part: because of the quotas on the qtrees, I also have visibility into the LUNs:
PS C:\> get-naquotareport
Volume                    Qtree                     Type  Disk Limit  Disk Used File Limit Files Used
------                    -----                     ----  ----------  --------- ---------- ----------
Hyperv1                   VM1                       tree                45.5 GB                     8
Hyperv1                   VM2                       tree                 4.2 GB                     8

A volume contains 0.5GB of metadata. In addition to that, my 300GB of LUNs are only using a combined 49.7GB, for a total of 50.2GB used in the volume. Since I have dedupe enabled and am getting 26% savings, I am only consuming 36.6GB from the volume.

I'm working on a function to do a daily email report in a better format, and I have been experimenting with those SNMP traps. I send the traps to SCOM/Appliance Watch Pro 2.1, where I can take action on them when my threshold is reached. I set the threshold at 75% of the declared size so that I have time to evaluate the situation and act before the volume fills up. Actually, if I keep getting 26% dedupe, I'll probably push the quota up to around 95% (so I still get the alert before the LUN "fills" due to the LUN size). When I get the alert in SCOM/Appliance Watch, I can log it, send an email, or fire off another script (grow the LUN, run the space reclaimer, whatever) depending on the evaluation logic I write.
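
For reference, a rough console sketch of the volume/qtree/quota layout above (names and sizes taken from this example, but treat the exact 7-Mode syntax as illustrative rather than definitive):

filer> vol create Hyperv1 -s none aggr1 500g                                 # volume with no space guarantee
filer> sis on /vol/Hyperv1                                                   # enable dedupe on the OS volume
filer> qtree create /vol/Hyperv1/VM1
filer> lun create -s 200g -t hyper_v -o noreserve /vol/Hyperv1/VM1/Lun1.lun
filer> quota on Hyperv1                                                      # activate quotas once /etc/quotas has the tree entries

The /etc/quotas entry that raises the SNMP trap at the 75% threshold (150GB on the 200GB LUN) would look roughly like:

/vol/Hyperv1/VM1    tree    -    -    150G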

In a VDI situation, where you have a gold image that you volume FlexClone, you don't keep data on the image; you keep it external in CIFS/roaming profiles/etc. It's great for deploying lots of exact copies. When it's time for patch Wednesday, you create a new gold image, FlexClone it, then rebase your VMs. Because volume FlexClones depend on that base snapshot until you do a split (which you wouldn't do if your intent is to dedupe, hence the new gold/rebase when you patch), this doesn't work so well for servers that you want to keep around but continue to patch. By using LUN clones, there is no base snapshot floating around, and I can patch away till doomsday, still run scheduled dedupe, and keep a decent dedupe ratio. Many application servers aren't so hot at being cloned, which is why I LUN clone a sysprep image and then finish up with an unattended/automated install of the app. You'll need to look closely at your situation and decide which type of clone will work best for you.
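
For contrast, the volume-level FlexClone route looks roughly like this (the clone and snapshot names are made up):

filer> snap create Hyperv1 gold                           # base snapshot the clone will depend on
filer> vol clone create Hyperv1_patch -s none -b Hyperv1 gold
filer> vol clone split start Hyperv1_patch                # only if you want to break the dependency on the base snapshot

The split removes the dependency on the base snapshot but also gives up the shared blocks, which is why, as above, LUN clones are the better fit for long-lived, continuously patched servers.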

If you haven't seen the Data ONTAP PowerShell Toolkit yet, it's over here: http://communities.netapp.com/community/interfaces_and_tools/data_ontap_powershell_toolkit. You may also want to stop by blogs.netapp.com/msenviro and see some of the stuff Alex is doing with Opalis integration. Last but not least, if you went to NetApp Insight last week in Las Vegas, or are going to the upcoming Insight events in Macau or Prague, check out session MS-41782 or download a copy of the slide deck.

John Fullbright (JohnFul on Twitter)

fjohn

When you enable FlexScale for the Flash Cache (PAM), you cache metadata and normal data blocks by default. That's the mix you want in a Hyper-V environment. Caching lopri (low-priority) blocks is disabled by default; you wouldn't enable it unless you had a need to cache long read chains (sequential reads). If you're running multiple VMs you won't see long read chains anyway - the read mix becomes more random.
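
On the console, those defaults correspond to the flexscale options; a quick sketch (7-Mode option names, worth double-checking on your release):

filer> options flexscale.enable on                # enable Flash Cache / PAM caching
filer> options flexscale.normal_data_blocks on    # cache metadata plus normal data blocks (the default mix)
filer> options flexscale.lopri_blocks off         # leave low-priority / long sequential read chains uncached (the default)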

JohnFul
