2017-10-23 02:28 AM
We have a FAS 2552 on which I have created two aggregates of 4.77TB. On one of these aggregates I have a 700GB volume, which is thick provisioned with no snapshots and 0% fractional reserve. I present this as a LUN to an ESXi platform. I create one thick-provisioned VM disk of 450GB.
At this point the LUN shows 450GB in use. Then I lay 450GB of data down, and I do this a couple of times. Then the LUN goes offline and the volume is at 100%! Deleting the data in the drive doesn't seem to fix this, so the LUN keeps going up and down.
Why is this? I'm confused, as the drive is thick provisioned, as is the volume. So I should only ever be using a maximum of 450GB! No snapshots.
I would say this is Monday morning blues, but I was looking at this problem on Friday and I have done all the documentation and training videos..... I could have forgotten the reason for this issue, as I did all the calculations in the spring. But then, if my understanding were right, this wouldn't be happening.....
2017-10-23 03:30 AM
By making everything thick provisioned you are only reserving the blocks in the aggregate; if the volume runs out of space, the LUN will go offline. In this case it looks like the data deleted in ESXi has not been released on the storage. There is a disconnect between the blocks in the file system and those on the storage. Without something like SnapDrive telling the storage which blocks have been deleted in the file system (via its Space Reclaimer feature), the storage is not aware of the deletions and so cannot free the blocks. This is not a problem with NFS datastores, since the storage owns the file system and so knows which blocks have been deleted.
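To make the "disconnect" concrete, here is a toy Python model, a sketch only and not how ONTAP or VMFS actually account for blocks. The class names are hypothetical, and it assumes (as a simplification) that the host file system always writes new data to fresh LBAs and that the LUN has no size cap. Deleting on the host frees nothing in the volume unless the delete is passed down as an UNMAP.

```python
# Toy model, illustrative only: real ONTAP/VMFS block accounting is far more involved.


class ToyLun:
    """Tracks which LBAs the volume has had to store blocks for."""

    def __init__(self) -> None:
        self.allocated: set[int] = set()

    def write(self, lbas: range) -> None:
        # Any host write to a never-written LBA consumes a block in the volume.
        self.allocated.update(lbas)

    def unmap(self, lbas: range) -> None:
        # Only an explicit "these blocks are free" message releases them.
        self.allocated.difference_update(lbas)


class ToyHostFs:
    """Host file system that (simplification) always writes new files to fresh LBAs."""

    def __init__(self, lun: ToyLun, send_unmap: bool) -> None:
        self.lun = lun
        self.send_unmap = send_unmap
        self.files: dict[str, range] = {}
        self.next_lba = 0

    def create(self, name: str, nblocks: int) -> None:
        lbas = range(self.next_lba, self.next_lba + nblocks)
        self.next_lba += nblocks
        self.files[name] = lbas
        self.lun.write(lbas)

    def delete(self, name: str) -> None:
        lbas = self.files.pop(name)
        # The host forgets the file either way; the storage only hears about it
        # if UNMAP (or a tool such as SnapDrive Space Reclaimer) is in play.
        if self.send_unmap:
            self.lun.unmap(lbas)

    def used(self) -> int:
        return sum(len(r) for r in self.files.values())


def run(send_unmap: bool) -> tuple[int, int]:
    """Write and delete a 450-unit data set three times; return (host used, volume used)."""
    lun = ToyLun()
    fs = ToyHostFs(lun, send_unmap)
    for i in range(3):
        fs.create(f"copy{i}", 450)
        fs.delete(f"copy{i}")
    return fs.used(), len(lun.allocated)


print(run(send_unmap=False))
print(run(send_unmap=True))
```

Without UNMAP the host reports nothing in use while the volume still holds every block ever written; with UNMAP the two views agree.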
You can, however, also enable the LUN option -space-allocation. This allows the OS to tell the storage which blocks have been deleted, so the storage can release/free them. However, enabling this feature is disruptive, since the LUN must be taken offline first. Also, it only works in a thin-provisioned environment and requires VMware ESX 5.0 or later.
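As a rough sketch of the procedure above, the Python below checks the two preconditions mentioned (thin-provisioned LUN, ESX 5.0 or later) and prints the clustered ONTAP CLI sequence: offline, modify, online. It only prints text and runs nothing. The SVM name and LUN path are hypothetical placeholders, and the exact syntax should be confirmed against the SAN Administration Guide for your ONTAP release.

```python
# Hypothetical helper: the SVM name and LUN path below are placeholders, not from the thread.


def check_preconditions(lun_is_thin: bool, esxi_version: tuple[int, int]) -> list[str]:
    """Return reasons space-allocation would not apply here (empty list means OK)."""
    problems = []
    if not lun_is_thin:
        problems.append("LUN is not thin provisioned")
    if esxi_version < (5, 0):
        problems.append("ESXi is older than 5.0")
    return problems


def space_allocation_commands(svm: str, lun_path: str) -> list[str]:
    """Clustered ONTAP CLI steps; the LUN is offline (disruptive) between steps 1 and 3."""
    return [
        f"lun offline -vserver {svm} -path {lun_path}",
        f"lun modify -vserver {svm} -path {lun_path} -space-allocation enabled",
        f"lun online -vserver {svm} -path {lun_path}",
    ]


if not check_preconditions(lun_is_thin=True, esxi_version=(6, 5)):
    for cmd in space_allocation_commands("svm1", "/vol/exch_vol/exch_lun"):
        print(cmd)
```

Plan a maintenance window for the offline step, since the host loses access to the datastore until the LUN is back online.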
Please see the SAN Administration Guide: https://library.netapp.com/ecm/ecm_download_file/ECMLP2492716 page 40.
Hope this helps.
2017-10-23 03:50 AM
Many thanks - that's where I was going in my thoughts. My only problem is thin vs thick. If you can't free up space in a thick-provisioned volume/LUN, then what is the point of them in the first place? This LUN is for MSExchange datastores, so I wanted to have the space allocated and 'reserved', knowing that I was only going to be using a 450GB VM disk. So for this to work I would have to have SnapDrive on the Exchange VM?? Would it therefore be better for all VMware LUNs to always be thin, with space reclamation...
P.S. Interestingly, I don't remember any of this coming up in the videos... unless I fell asleep.
2017-10-23 04:06 AM
I've just checked my thin-provisioned volumes too, and they don't have space allocation enabled either.... Surely, if this is the only way to get space back after deletion, it should be enabled by the wizard when a VMware volume is created???
2017-10-23 04:14 AM
So the main reason for thin provisioning is to maximise the storage, i.e. over-provisioning. However, if you wish to guarantee that your volumes/LUNs will not run out of space (i.e. thick provisioning), then just do not over-provision the aggregate; that way there will always be sufficient available capacity.
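"Not over-provisioning the aggregate" comes down to simple arithmetic: the total provisioned size of the volumes should not exceed the aggregate's capacity. A minimal Python sketch, assuming 4.77TB is treated as 4770 decimal GB for illustration; the function names are hypothetical.

```python
def overcommit_ratio(aggr_size_gb: float, volume_sizes_gb: list[float]) -> float:
    """Total provisioned volume size divided by aggregate capacity."""
    return sum(volume_sizes_gb) / aggr_size_gb


def is_overprovisioned(aggr_size_gb: float, volume_sizes_gb: list[float]) -> bool:
    # Above 1.0, thin volumes could together demand more space than the aggregate holds.
    return overcommit_ratio(aggr_size_gb, volume_sizes_gb) > 1.0


aggr_gb = 4770  # 4.77TB treated as decimal GB, an assumption for illustration
print(round(overcommit_ratio(aggr_gb, [700]), 3))
print(is_overprovisioned(aggr_gb, [700]))
```

With a single 700GB volume the ratio is well under 1.0, so this aggregate is nowhere near over-provisioned; note that this says nothing about whether the volume itself can fill.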
A thick-provisioned LUN will also not benefit from any storage efficiencies that are running. In a thin LUN and thin volume environment, these savings will be seen in the volume/aggregate, not at the LUN/host OS level. In a thin-provisioned environment you will need to actively monitor the volume (and aggregate) for free space, but can ignore the LUN. The volume autosize setting can also help here.
Assuming these are in-guest iSCSI presented LUNs, you can use SnapDrive to periodically run Space Reclaimer (out of hours) to release the deleted blocks on the file system, or enable the space-allocation option (supported in Microsoft Windows 2012 and above). Since you cannot schedule when space-allocation releases the blocks on the storage, it could have a performance impact if the system is busy or performance-critical.
2017-10-23 04:47 AM - edited 2017-10-23 04:47 AM
Many thanks for your help. So my thick provision is 450GB and I don't want any more. It is on an aggregate which currently has 4.7TB and no other volumes on it. However, the volume still used all of its allocated 700GB.
So the manual talks about thin provisioning and space allocation, which I understand better now; however, I'm still a bit misty on how thick provisioning works. Does it work the same way as thin for deletion of blocks.... and does enabling space allocation not work for VMware?
So I have a Windows server in a VM which has VM disks on the LUN in a Fibre Channel SAN environment. Would I need to install SnapDrive on the Windows VM?? Or will Windows 2016 in a VM be able to do it all if space allocation was enabled on a thick provision?
2017-10-23 05:56 AM
In your case it was the volume running out of space, because the blocks deleted on the host were not being released on the storage, that caused the LUN to go offline. We can ignore the LUN being 100% full; it was the volume reaching 100% full that was the problem. Since the blocks were deleted only in the host OS and not at the storage layer, as far as the volume was concerned it was full and could not accept any more writes, so it took the LUN offline to ensure no data loss. In every case you must monitor the usage of the volume.
A thick-provisioned volume only guarantees that its blocks have been reserved in the aggregate. A thick-provisioned LUN only guarantees that its blocks are reserved in the volume rather than consumed by snapshots (not relevant in your case), and that guarantee assumes the volume does not run out of space. If the volume is full, both thick- and thin-provisioned LUNs will go offline at the next write (or remain online but read-only if space-alloc has been enabled).
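The rule in the paragraph above can be written as a tiny decision function. This is a Python sketch of the behaviour as described in this thread, not ONTAP code; `LunState` and `lun_state_after_write` are hypothetical names. Following the post, thick versus thin does not change the outcome, so provisioning is deliberately not a parameter.

```python
from enum import Enum


class LunState(Enum):
    ONLINE = "online"
    READ_ONLY = "online, read-only"
    OFFLINE = "offline"


def lun_state_after_write(volume_free_gb: float, write_gb: float, space_alloc: bool) -> LunState:
    """Per the thread, thick and thin LUNs behave the same here; only the volume's free space matters."""
    if write_gb <= volume_free_gb:
        return LunState.ONLINE
    # Volume full: space-alloc lets the LUN stay online read-only instead of going offline.
    return LunState.READ_ONLY if space_alloc else LunState.OFFLINE


for alloc in (False, True):
    print(alloc, lun_state_after_write(volume_free_gb=0, write_gb=1, space_alloc=alloc).value)
```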
If you do not over-provision your aggregate (not a problem here, since it's the only volume), then there is no disadvantage to using a thin volume and thin LUN. You will benefit from automatic space reclamation of deleted blocks by enabling the space-alloc LUN option, meaning the volume should not fill again due to a large deletion of file system data. Also, all storage efficiency space savings will be realised in the volume/aggregate, meaning more available space in the volume before it fills.
You will need to actively monitor the free space in the volume; however, enabling volume autosize will help here by automatically resizing the volume if it is close to filling. Just remember that a full volume is not good for either a thick or a thin LUN. Space reclaim in SnapDrive is a scheduled process on the server, while space-alloc (if thin provisioned) happens automatically on the storage. As mentioned, space-alloc also allows the LUN to remain online in read-only mode in the volume-full scenario, rather than going offline.
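The idea behind volume autosize (grow the volume when usage crosses a threshold, up to a configured maximum) can be sketched in a few lines of Python. This models the concept only: the threshold and increment defaults are hypothetical values, not ONTAP defaults, and the real feature is configured on the storage (e.g. with the `volume autosize` command in clustered ONTAP), not by a script like this.

```python
def autosize_new_size(
    size_gb: float,
    used_gb: float,
    max_size_gb: float,
    grow_threshold_pct: float = 85.0,  # hypothetical value, not an ONTAP default
    increment_gb: float = 50.0,        # hypothetical value, not an ONTAP default
) -> float:
    """Return the volume size after one monitoring check."""
    used_pct = 100.0 * used_gb / size_gb
    if used_pct < grow_threshold_pct:
        return size_gb
    # Grow, but never beyond the configured maximum.
    return min(size_gb + increment_gb, max_size_gb)


print(autosize_new_size(700, 500, 1000))
print(autosize_new_size(700, 650, 1000))
print(autosize_new_size(1000, 1000, 1000))
```

The last call shows the limit of the approach: once the volume is at its maximum size it cannot grow any further, so monitoring free space in the volume is still required.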
2017-10-23 06:23 AM
OK - so if I just enable space allocation on all volumes, both thick and thin, then I should start seeing data reduction when the next storage efficiency run happens..??..
Windows 2016, even though it's in a VM, will still issue the SCSI commands, even though they go through the ESXi host, on a thick-provisioned volume? So no need for SnapDrive??
My other thought is: why isn't space-allocation enabled by default??? Why would anyone not want to get space back after deletion??
Sorry for all the annoying questions; I just like to get my thoughts straight before I make changes to systems.
2017-10-23 06:52 AM
OK, in order :-)
Storage efficiencies will only be seen if your LUN is thin provisioned, but they have nothing to do with space-allocation; sorry if I didn't make that clear. Storage efficiency saves space in the volume when it runs, either inline or post-process, while space-alloc reclaims space in the volume as the host OS deletes data in the file system.
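To show that the two mechanisms are independent, here is a toy Python model with hypothetical names: deduplication (standing in for storage efficiency) shrinks what is stored for identical data, while UNMAP/space-alloc only affects blocks the host has deleted. The dedupe part is simplified to "blocks with the same SHA-256 hash are stored once"; ONTAP's real fingerprinting and verification work differently.

```python
import hashlib


def volume_blocks_used(
    written: dict[int, bytes], deleted_lbas: set[int], dedupe: bool, unmap: bool
) -> int:
    """Count blocks the volume stores after host deletes, with each feature on or off."""
    # space-alloc/UNMAP: deleted LBAs are released only if the host's delete reaches the storage.
    kept = {lba: data for lba, data in written.items() if not (unmap and lba in deleted_lbas)}
    if dedupe:
        # Simplified dedupe: blocks with identical content (same SHA-256) are stored once.
        return len({hashlib.sha256(data).digest() for data in kept.values()})
    return len(kept)


written = {0: b"A", 1: b"A", 2: b"B", 3: b"C"}  # LBA -> block content
deleted = {3}  # the host deleted the file living at LBA 3
for dedupe in (False, True):
    for unmap in (False, True):
        print(f"dedupe={dedupe} unmap={unmap}:", volume_blocks_used(written, deleted, dedupe, unmap))
```

Each feature saves space on its own, and the savings only combine when both are enabled.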
SnapDrive is only supported on a physical RDM, not a virtual one. SnapDrive is recommended since it provisions and manages the LUNs following best practices. Or have I not quite understood, and you've provisioned the Windows drives from a single FC datastore? In that case, correct, SnapDrive is not required. So you're not using SnapManager or snapshots for backup/recovery?
Have you seen the Microsoft Exchange Server 2016/2013 and SnapManager for Exchange Best Practices Guide for Clustered Data ONTAP? It has details on the recommended layout.
Space-alloc is not enabled by default (I'm assuming) because of requirements to be supported by the host, as well as the possible performance impact on the storage.
Hope this helps.
2017-10-23 07:57 AM
Thanks for the link - I found one but it was way older than that.
Our infrastructure is simply a NetApp which presents LUNs to a VMware environment via FC SAN. On the host there is a Windows 2016 guest which uses a (VMware) virtual disk that sits on one of the presented LUNs. So I'm guessing that in a VMware environment you can only use the SCSI command approach, not SnapDrive, unless you present the LUN as an RDM, which I expect adds its own extra complexity.
So if I enable space-allocation and delete some data on the Exchange disk, I should see a reduction straight away. I'm assuming this is potentially not as efficient as an RDM solution?