We have three volumes, each containing an FC LUN which is the SAN boot drive for an ESXi 4.1 host. So: 3 ESXi hosts, 3 LUNs, 3 volumes. Each volume was created at the same time as far as I know, and a vol status -v shows the same settings for all three.
One of the volumes is constantly filling up, and since the only thing in the volume is the LUN with ESXi on it, I thought it probably had something to do with that, maybe excessive logging or something. After a bit of troubleshooting which showed no particular issues, we switched off the ESXi host.
The volume is still filling up, however. Every time a snapshot is taken, the volume pretty much fills up, but the snapshots are tiny. If I delete the snapshot, the space is reclaimed.
The other two volumes do not exhibit this behaviour but regardless of that, the volume is filling up even with the ESXi host switched off, i.e. no activity on the LUN at all.
Can anybody suggest a place to start looking for the cause of this?
When provisioning LUNs, the volume hosting the LUN should have 2 * LUN size + snapshot space. By default the fractional reserve is 100% on volumes with a "volume" guarantee; that number can be adjusted if needed. What's the snap reserve on the volume holding the LUN? What's the output of df -g on that volume? What's the LUN size? And what does lun stats show for the LUN after zeroing the counters with lun stats -z?
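As a rough worked example, assuming a hypothetical 20 GB boot LUN with 100% fractional reserve and a 20% snap reserve, you'd need roughly 2 * 20 GB = 40 GB of usable space plus the 20% snap reserve, i.e. a volume of around 50 GB. The checks above would look something like this on a 7-Mode system (volume and LUN names are placeholders):

    snap reserve bootvol1
    df -g bootvol1
    lun show /vol/bootvol1/esx_boot.lun
    lun stats -z /vol/bootvol1/esx_boot.lun
    # ... let the host run for a while, then ...
    lun stats /vol/bootvol1/esx_boot.lun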
You might also want to verify that there are no CIFS or NFS shares on the volume hosting the LUN, and that no device other than the ESXi host is writing to your LUN.
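On a 7-Mode controller, a quick way to check that might be the following (names are placeholders; output formats vary by ONTAP release):

    cifs shares      # look for any share rooted in that volume
    exportfs         # look for any NFS export of that volume
    lun show -m      # confirm the LUN is mapped only to the one ESXi igroup
    igroup show      # confirm that igroup contains only that host's WWPNs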
I have some more info - apparently there used to be a lot more data in that LUN which has since been removed. It was ISO files stored in the local ESXi datastore. So I guess that at some point the space reserved by fractional reserve would have been higher. Is there some sort of "tidemark" set when using fractional reserve, so that even if the data in the LUN is reduced, the snapshots still reserve the same amount of space?
a) Fractional reserve plays its role exactly when you create a snapshot, not "at some point in the past". So every time you create a snapshot, it reserves exactly the amount of space that the LUN currently occupies on the NetApp.
b) When the host deletes data in its filesystem, the space is not freed from the NetApp's point of view; the filer has no visibility into the filesystem inside the LUN, so blocks the host considers free still count as used. So if the LUN's used space was huge once, it remains huge now.
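As a hypothetical illustration: if a 20 GB LUN once held 18 GB of ISOs, the filer still sees roughly 18 GB of used blocks after the host deletes them, and with 100% fractional reserve every snapshot then sets aside another ~18 GB in the volume. That is why the volume keeps filling up even with the host switched off.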
Unfortunately, the only way to reduce space consumption is to create another LUN, copy the data over, and destroy the existing one. While doing this, the new LUN can temporarily be set to no space reservation to reduce consumption.
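A minimal sketch of that approach on 7-Mode, with placeholder names and sizes (the actual data copy or ESXi reinstall happens from the host side):

    lun create -s 20g -t vmware /vol/bootvol1/esx_boot_new.lun
    lun set reservation /vol/bootvol1/esx_boot_new.lun disable
    lun map /vol/bootvol1/esx_boot_new.lun esx_host1_igroup
    # ... copy the data / reinstall ESXi from the host, then ...
    lun offline /vol/bootvol1/esx_boot.lun
    lun destroy /vol/bootvol1/esx_boot.lun
    lun set reservation /vol/bootvol1/esx_boot_new.lun enable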
Thanks for providing those extracts Peter. Your issue has to do with the fact that there is not enough space in the volume to accommodate your LUN, the fractional reserve (when used) and the snapshots. I've witnessed instances where the system didn't report the actual snapshot sizes in FilerView or in the snap delta output - with your storage at 97%, the LUN in the snapshot pulled storage from the fractional reserve to allow overwriting parts of the LUN that were already written. Increase the size of your volume so it has enough storage for 2 * LUN size + snapshot space, and that will take care of your current issue. If you run df -r on that volume, you'll clearly see how much of the reserve was used.
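For reference, growing the volume and checking the reserve on 7-Mode would look something like this (names and sizes are placeholders; size the volume per the 2 * LUN size + snapshot space guideline above, and note that vol size needs free space in the aggregate):

    df -r bootvol1           # the "reserved" column shows how much fractional reserve is in use
    vol size bootvol1 +20g   # grow the volume in place
    df -g bootvol1           # confirm the new capacity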
Thanks for both of your answers, I understand what is going on now. I think the key is that space freed inside the LUN is not returned to the volume, since the filer knows nothing about the file system on top.
I'd like to give you both points if possible; how can I do that? Do I use the 'helpful answer' option for both?