We have a VMware datastore configured on our FAS2240, and our monitoring system is alerting on the usage of one particular volume. In System Manager the volume shows 86% utilised and the LUN 97%, and VSC reports the same figures. However, the datastore itself, i.e. what VMware sees, is only 31% used.
My question is, why would there be such a difference between these figures? Is there any way to tell what is utilising this space? There are no snapshots in play.
We have other volumes configured in the same way with similar VDI load and there is nowhere near as much difference between these figures.
OK, let's go back to basics. You said this in your original post:
The volume in question shows 86% utilised and the LUN at 97% within System Manager, this is also reflected within VSC. However, the datastore itself, i.e. what VMware is seeing, is only 31% used.
First of all, the volume is utilised because there is a LUN in it, and that LUN is space-reserved, so it doesn't matter that the LUN is not filled with data. Secondly, where are you getting the figure that the LUN itself is 97% full? The System Manager GUI?
VMware is seeing the LUN, not the volume, so it is reporting space utilisation (presumably correctly) within the LUN.
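To put rough numbers on the space-reservation point, here is a minimal Python sketch of how much of a volume a single LUN consumes with and without space reservation. It assumes a deliberately simplified model (one LUN per volume, no snapshot reserve, fractional reserve or metadata overhead), and the sizes are hypothetical, picked so the reserved case lands near the 86% in the original post. `volume_used_gb` and `percent` are illustrative helpers, not NetApp APIs.

```python
def volume_used_gb(lun_size_gb: float, lun_written_gb: float, space_reserved: bool) -> float:
    """Approximate space one LUN consumes in its containing volume.

    Simplified, hypothetical model: a space-reserved LUN claims its full size
    up front; a non-reserved (thin) LUN only consumes blocks that were written.
    Snapshot reserve, fractional reserve and WAFL metadata are ignored.
    """
    if space_reserved:
        return lun_size_gb
    return lun_written_gb


def percent(used: float, total: float) -> float:
    """Usage as a percentage, rounded to one decimal place."""
    return round(100.0 * used / total, 1)


# Hypothetical figures: a 500 GB LUN in a 580 GB volume, 155 GB written by VMFS.
VOLUME_GB = 580.0
LUN_GB = 500.0
WRITTEN_GB = 155.0

reserved = percent(volume_used_gb(LUN_GB, WRITTEN_GB, True), VOLUME_GB)
thin = percent(volume_used_gb(LUN_GB, WRITTEN_GB, False), VOLUME_GB)
print(f"space-reserved LUN: volume {reserved}% used")
print(f"thin LUN:           volume {thin}% used")
```

In the reserved case the volume figure says nothing about how much data VMware has actually written, which is why the volume and datastore percentages can diverge so far.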
It makes sense that the volume is utilised because a LUN is stored within it. I'm seeing the 97% LUN utilisation in System Manager and also in the Virtual Storage Console plugin for VMware vSphere, which is where the screenshot below was taken.
As can be seen, the datastore usage (what VMware sees) is only 52%, yet the LUN usage is close to 98%. That is what is baffling me.
Not that I'm aware of; I didn't know there was a way to do that within VMware. The volume is definitely thin provisioned on the NetApp side. I think I'll just shuffle some of the virtual machines around and keep an eye on it. It's very strange though.
Did you find any resolution? We are seeing the same symptoms on one of our datastores in a 3-month-old VMware deployment. At first I thought it was because snapshots were retaining data from some large servers that used to be on this datastore, but on closer inspection those snapshots are already gone and the numbers do not add up. I do not see why datastore usage is 33% while LUN usage is 93%.
A LUN will reach 100% utilization over time unless you run space reclamation. The vSphere ESX host writes blocks and deletes them from its own point of view, but the NetApp controller doesn't know which blocks are actually still in use and which have been freed. So you will end up at 100% utilization over time.
It is the same with LUNs attached to Windows, Linux or any other OS.
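A toy Python model may make this clearer. It is a sketch under stated assumptions, not how VMFS or Data ONTAP actually track blocks: the LUN is a set of numbered blocks, the host filesystem keeps its own record of which blocks are in use, and the array only ever learns about writes, never about deletes. `LunModel` and its methods are hypothetical names for illustration.

```python
class LunModel:
    """Toy block accounting for a LUN with no space reclamation (illustrative only)."""

    def __init__(self, total_blocks: int) -> None:
        self.total_blocks = total_blocks
        self.fs_in_use: set[int] = set()      # blocks the host filesystem (VMFS) considers allocated
        self.array_written: set[int] = set()  # blocks the storage array has ever seen written

    def write(self, blocks: range) -> None:
        # A write is visible to both the host filesystem and the array.
        for b in blocks:
            self.fs_in_use.add(b)
            self.array_written.add(b)

    def delete(self, blocks: range) -> None:
        # The host frees the blocks in its own metadata, but nothing tells
        # the array, so array_written is left unchanged.
        for b in blocks:
            self.fs_in_use.discard(b)

    def datastore_pct(self) -> float:
        """Usage as VMware reports it."""
        return 100.0 * len(self.fs_in_use) / self.total_blocks

    def lun_pct(self) -> float:
        """Usage as the array reports it."""
        return 100.0 * len(self.array_written) / self.total_blocks


lun = LunModel(total_blocks=1000)
lun.write(range(0, 600))    # provision some VMs: datastore 60%, LUN 60%
lun.delete(range(0, 400))   # VMs deleted or Storage vMotioned away
lun.write(range(600, 900))  # new VMs land on blocks the array hasn't seen before
print(f"datastore {lun.datastore_pct():.0f}%, LUN {lun.lun_pct():.0f}%")  # datastore 50%, LUN 90%
```

Here 400 blocks are deleted and 300 new blocks are written at fresh offsets, leaving the datastore at 50% while the LUN shows 90%. If VMFS happens to reuse freed blocks, the LUN figure does not grow; it only climbs when new data lands on previously untouched blocks, which is why it creeps towards 100% over time.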
Correct. This behavior is common when VMware deletes or moves VMs out of the datastore. Since VMware owns the filesystem (VMFS), it has no mechanism to tell the storage controller that space has been freed up by a deletion or a Storage vMotion.
Recent changes to VAAI have made it possible for VMware to notify the storage that space has been freed. This could be termed "hole punching" or space reclamation. Hole punching is easily done with NFS datastores using the Virtual Storage Console plugin, but today (changes soon?) with VMFS it must be done using VMware's vmkfstools.
Basically, even if the LUN is at 100% utilization, as long as the volume is not full we shouldn't be too concerned. If it is really bugging you, log into the ESXi/vMA CLI and use vmkfstools. Unfortunately VMware has not implemented an automatic method for doing this. Also keep in mind: only use the vmkfstools workaround if you are running ESXi 5.0U1 and a version of Data ONTAP that supports VAAI (8.0.1+).
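The effect of a reclamation pass can be sketched the same way. This is a minimal Python model under the same simplifying assumptions (numbered blocks, the host knows what it uses, the array knows what was ever written); it does not reproduce vmkfstools or its options, it only shows what happens to the numbers once the host tells the array which blocks are no longer in use. `reclaim` and `lun_pct` are hypothetical helpers. How much of the released space then shows up as free space in the volume will depend on whether the LUN is space-reserved.

```python
def reclaim(fs_in_use: set[int], array_written: set[int]) -> tuple[set[int], int]:
    """Model one space-reclamation pass (host reports unused blocks, array releases them).

    Returns the new array-side allocation and the number of blocks released.
    """
    stale = array_written - fs_in_use  # written at some point, no longer referenced by VMFS
    return array_written - stale, len(stale)


def lun_pct(blocks: set[int], total_blocks: int) -> float:
    """Share of the LUN the array considers allocated, as a percentage."""
    return 100.0 * len(blocks) / total_blocks


TOTAL = 1000
fs_in_use = set(range(400, 900))    # 50% of the LUN in use according to VMFS
array_written = set(range(0, 900))  # 90% of the LUN ever written, as the array sees it

after, released = reclaim(fs_in_use, array_written)
print(f"LUN before: {lun_pct(array_written, TOTAL)}%")                # LUN before: 90.0%
print(f"LUN after:  {lun_pct(after, TOTAL)}%, released {released}")  # LUN after:  50.0%, released 400
```

After the pass the array-side figure matches what VMware reports again, but only until more deletes happen, which is why reclamation has to be repeated periodically when it is not automatic.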
Bonus: you can adjust the thresholds at which you get storage alerts for particular volumes with these commands:
dfm volume list | grep -i <your volume name>
dfm volume set <volume ID> volumefullthreshold=95
dfm volume set <volume ID> volumenearlyfullthreshold=90
The global volume full thresholds can be configured in the GUI or CLI as well.
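To sanity-check what a given pair of thresholds would mean for a volume, here is a small Python sketch that classifies a usage percentage against nearly-full and full thresholds, defaulting to 90 and 95 as in the dfm commands above. It assumes a threshold fires when usage is greater than or equal to its value; that comparison is my assumption, not something taken from the DFM documentation, and `classify` and `VolumeState` are hypothetical names.

```python
from enum import Enum


class VolumeState(Enum):
    OK = "ok"
    NEARLY_FULL = "nearly full"
    FULL = "full"


def classify(used_pct: float, nearly_full_pct: float = 90.0, full_pct: float = 95.0) -> VolumeState:
    """Classify volume usage against per-volume alert thresholds.

    Assumes (hypothetically) that a threshold is crossed at >= its value.
    """
    if not 0 <= nearly_full_pct <= full_pct <= 100:
        raise ValueError("expected 0 <= nearly_full_pct <= full_pct <= 100")
    if used_pct >= full_pct:
        return VolumeState.FULL
    if used_pct >= nearly_full_pct:
        return VolumeState.NEARLY_FULL
    return VolumeState.OK


print(classify(86.0).value)                                      # ok
print(classify(86.0, nearly_full_pct=80.0, full_pct=90.0).value)  # nearly full
```

With 90/95 a volume sitting at 86% (like the one in the original post) would no longer raise an alert, whereas lower thresholds such as the hypothetical 80/90 in the second call would still flag it as nearly full.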