
Improve VMware vSphere Storage Efficiency with NVMe Deallocate in ONTAP 9.16.1

ChanceBingen
NetApp

In the world of enterprise block storage, ensuring maximum storage efficiency while delivering performance that meets the requirements of modern workloads is critical.

 

When it comes to reclaiming data blocks that are no longer needed (for example, when a host or guest file system has deleted some files), storage arrays and protocols have implemented various methods for signaling which space can be freed. Notable examples are the SCSI UNMAP command, which ONTAP has supported with iSCSI and FCP for many years; the SATA TRIM command, typically used on SATA-attached local SSDs; and now the NVMe deallocate command, which is a new feature in ONTAP 9.16.1RC1.

 

In this post, I’ll discuss a brief history of SCSI UNMAP and NVMe deallocate with ONTAP (we’ll skip SATA TRIM for today since it doesn’t apply to ONTAP block protocols), and then talk about how to use NVMe deallocate in a VMware vSphere or Cloud Foundation environment.

 

Note: ONTAP has supported TRIM (FSCTL_FILE_LEVEL_TRIM) over SMB (Server Message Block AKA CIFS [Common Internet File System]) since 8.3. Typically, this is used by Microsoft Hyper-V, but today we’re only talking about block protocols.

 

Since I’m going to be talking about the ESXi hypervisor in this article, you can read a bit more about Storage Space Reclamation in vSphere on VMware docs.

 

Feel free to skip the history lesson and scroll down to Using NVMe Deallocate with VMware vSphere.

 


 

The History of the SCSI UNMAP Command

The SCSI (Small Computer System Interface) UNMAP command was introduced as part of the SCSI SBC (SCSI Block Commands) update ratified on September 22, 2011. This spec revision introduced the UNMAP command as a way to inform the storage device about blocks that are no longer in use, allowing for more efficient storage management and space reclamation. Its inception can be traced back to the need for better management of storage space as thin provisioning was rapidly becoming the default on block storage arrays. Support for SCSI UNMAP was added in Data ONTAP 8.2 via the “space-allocation” LUN option. This option has historically been disabled on LUNs in ONTAP by default unless you created them with an application like ONTAP tools for VMware. However, starting in ONTAP 9.15.1 it is now enabled by default (yeah!).

 

Note: Contrary to a common misconception, it is not required to offline a LUN in ONTAP to change this setting.

 

Before the SCSI UNMAP command, storage devices couldn't tell if a LUN's blocks were free when a host deleted data. The UNMAP command optimized storage by reclaiming unused space, allowing others to use those resources.

 

The History of the NVMe Deallocate Command

NVMe (Non-Volatile Memory Express) is the next-generation device interface designed to take full advantage of the capabilities of modern solid-state storage, originally on a PCI Express (PCIe) bus – hence the “Express” part of the name. NVMe offers significant performance improvements over traditional SCSI interfaces due to its lower latency and higher throughput which result from a combination of optimized command set (NVMe has 30-40 highly optimized commands vs. hundreds of legacy commands in SCSI) and greatly increased IO parallelization.

 

The obvious benefits of NVMe led to the evolution of a new block protocol, NVMe-oF (NVMe over Fabrics, a generic name referring to NVMe over any data network). ONTAP was the first storage array to support NVMe-oF with the Fibre Channel transport, referred to as NVMe/FC, all the way back in ONTAP 9.4.

 

You can read more about NVMe/FC and NVMe/TCP in Implementing and Configuring Modern SANs with NVMe/FC | TR-4684 | NetApp

 

The NVMe deallocate command, introduced as part of the NVMe 1.2 specification [see page 149] in 2014, serves a similar purpose to the SCSI UNMAP command. It allows the operating system to inform the NVMe device about which data blocks are no longer in use and can be deallocated. However, the story doesn’t end here...

 

TP (Technical Proposal) 4040, also known as "Max Data Transfer for non-IO Commands (MDTS)" and ratified in the NVMe 2.0 specification, allowed for, among other things, the optimization of non-IO commands such as write-zeros. This is important if you want to fully utilize space reclamation in vSphere.

 

Benefits of SCSI UNMAP and NVMe Deallocate Commands

Improved Storage Efficiency

These commands optimize storage by reclaiming space that is no longer used by the host file system, which is critical in overcommitted environments using thin provisioning. Freeing up that space for active data improves overall storage efficiency.

 

Optimized Garbage Collection

Garbage collection identifies and erases unneeded data blocks to free up space. The SCSI UNMAP and NVMe deallocate commands help by specifying which blocks can be erased, optimizing garbage collection, and improving storage performance.

 

Using NVMe Deallocate with VMware vSphere

ONTAP 9.16.1RC1 and later, as well as vSphere 8.0U2 and later, automatically enable support for NVMe deallocate. However, many factors can prevent vSphere from reclaiming space effectively.

 

Remember when I mentioned TP4040? To fully reclaim space from a VMFS file system, vSphere requires you to manually enable TP4040. Unfortunately, this isn’t something that is automated via the vCenter GUI yet.

 

What this means is that you’ll need to enable it on your individual ESXi hosts. Once TP4040 support is enabled, it takes effect immediately and applies to any new namespaces mapped to the host (more on that later in the post).

 

There are a couple of ways to check the current value. For example, you can run esxcfg-advcfg -g /Scsi/NvmeUseDsmTp4040 as shown below. A zero means it is not enabled.

 

 

 

 

[root@esx1:~] esxcfg-advcfg -g /Scsi/NvmeUseDsmTp4040
[Default disabled]
Value of NvmeUseDsmTp4040 is 0

 

 

 

 

You can also use PowerShell as shown in this example, assuming you have VMware PowerCLI installed:

 

 

PS C:\> Get-AdvancedSetting -Entity esx1 -Name Scsi.NvmeUseDsmTp4040

Name                 Value                Type                 Description
----                 -----                ----                 -----------
Scsi.NvmeUseDsmTp40… 0                    VMHost

 

 

 

To enable it, run esxcfg-advcfg -s 1 /Scsi/NvmeUseDsmTp4040 as shown below. 

 

 

 

 

[root@esx1:~] esxcfg-advcfg -s 1 /Scsi/NvmeUseDsmTp4040
Value of NvmeUseDsmTp4040 is 1

 

 

 

 

Or with PowerShell:

 

 

 

 

PS C:\> Get-AdvancedSetting -Entity esx1 -Name Scsi.NvmeUseDsmTp4040 | Set-AdvancedSetting -Value 1 -Confirm:$false

Name                 Value                Type                 Description
----                 -----                ----                 -----------
Scsi.NvmeUseDsmTp40… 1                    VMHost
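If you prefer to stay in the ESXi shell, the check and the enable step can be combined so that the setting is only written when it is currently 0. The following is a minimal sketch rather than a tested tool. It assumes the output format shown earlier ("Value of NvmeUseDsmTp4040 is N"), and parse_advcfg_value is a hypothetical helper name chosen for illustration. The guard around esxcfg-advcfg means the script does nothing on a machine that is not an ESXi host.

```shell
#!/bin/sh
# Sketch: enable TP4040 only when it is currently disabled.
# OPTION is the advanced setting path used in this post.
OPTION=/Scsi/NvmeUseDsmTp4040

# Hypothetical helper: print the trailing number from a line such as
# "Value of NvmeUseDsmTp4040 is 0". Prints nothing if no line matches.
parse_advcfg_value() {
    printf '%s\n' "$1" | sed -n 's/^Value of .* is \([0-9][0-9]*\)$/\1/p'
}

# Only touch the host when the ESXi tool is actually present.
if command -v esxcfg-advcfg >/dev/null 2>&1; then
    current=$(parse_advcfg_value "$(esxcfg-advcfg -g "$OPTION")")
    case "$current" in
        1) echo "TP4040 already enabled" ;;
        0) esxcfg-advcfg -s 1 "$OPTION" ;;
        *) echo "Could not parse current value; leaving setting unchanged" >&2 ;;
    esac
fi
```

Leaving the setting alone when the output cannot be parsed is a deliberate choice here: it avoids changing a host whose tool output differs from what this sketch expects.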

 

 

 

 

Now, if you have a lot of hosts, this may be challenging. So, I wrote a PowerShell script to enable it in my lab.

You’re welcome to use this sample code and modify it as needed (no warranties or guarantees provided).

 

 

 

 

# Enable TP4040 support in vSphere
# By Chance Bingen, NetApp

# Load the VMware PowerCLI core module if it is not already loaded
Import-Module VMware.VimAutomation.Core

# Prompt for the vCenter Server
$vCenter = Read-Host -Prompt "What vCenter do you want to connect to?"
Write-Output "You entered: $vCenter"

# Prompt for vCenter Server credentials
$cred = Get-Credential

# Connect to the vCenter Server using the provided credentials
Connect-VIServer -Server $vCenter -Credential $cred

# Prompt for the datacenter
$Datacenter = Read-Host -Prompt "Which datacenter do you want to update?"
Write-Output "You entered: $Datacenter"

# Get every ESXi host in that datacenter
$vmhosts = Get-VMHost -Location $Datacenter

# Enable TP4040 support on each host
Write-Output "Now enabling TP4040 support in $Datacenter"
foreach ($vmhost in $vmhosts) {
    Get-AdvancedSetting -Entity $vmhost -Name Scsi.NvmeUseDsmTp4040 | Set-AdvancedSetting -Value 1 -Confirm:$false
}

# Disconnect from the vCenter Server
Disconnect-VIServer -Server $vCenter -Confirm:$false

 

 

 

 

If you have any existing NVMe namespaces mapped to the ESXi host, you will need to either put the host into maintenance mode and reboot it, or stop all I/O to those namespaces (typically by putting the host into maintenance mode) and then reclaim them using this syntax:

 

 

 

[root@esx1:~] esxcli storage core claiming reclaim -d <namespace_uuid>
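When a host has several namespaces mapped, the same reclaim command can be wrapped in a small loop. This is only a sketch under stated assumptions: reclaim_namespaces and the DRY_RUN variable are hypothetical names introduced here, the device identifiers are supplied by you, and I/O to those devices is assumed to be stopped already. With DRY_RUN=1 it prints the commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Sketch: reclaim several namespaces in one pass.
# Assumes I/O to these devices is already stopped (host in maintenance mode).
# Hypothetical helper; DRY_RUN=1 prints the commands instead of running them.
reclaim_namespaces() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "esxcli storage core claiming reclaim -d $dev"
        else
            esxcli storage core claiming reclaim -d "$dev" \
                || echo "reclaim failed for $dev" >&2
        fi
    done
}

# Example with placeholders (not real device names):
# DRY_RUN=1 reclaim_namespaces <namespace_uuid_1> <namespace_uuid_2>
```

A failure on one device is reported and the loop continues, so one busy namespace does not block the rest.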

 

 

 

 

Refer to Using the Reclaim Troubleshooting Command for more information on the reclaim command.

 

Conclusion

Administrators no longer have to struggle with reconciling how much free space is in a guest file system, a VMFS file system, and the NVMe namespace in ONTAP. Now, with deallocate support in ONTAP 9.16.1, space that is freed at the top of the stack is reclaimed all the way down. Well, mostly. Depending on file system metadata, file system alignment, file system allocation unit sizes, and other factors, you will typically get a little less than 100% back. But ONTAP's other storage efficiency features usually dedupe or compress that slack space away.

 

Keep in mind, however, that you may already have existing “dead space” inside of your VMDKs or guest file systems from before the upgrade to ONTAP 9.16.1, and some guest file systems may not reclaim space automatically. In cases like that, you may have to run utilities inside of the guest OS, like SDelete in Windows, as well as run vmkfstools -y /vmfs/volumes/[datastorename] from one of the ESXi hosts mounting the namespace.
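For the VMFS step, a thin wrapper can confirm that the datastore path exists before calling vmkfstools. This is a minimal sketch: reclaim_datastore and DATASTORE_ROOT are hypothetical names introduced for illustration, and the only command it runs is the vmkfstools -y invocation mentioned above.

```shell
#!/bin/sh
# Sketch: run the VMFS reclaim pass against one datastore by name.
# DATASTORE_ROOT defaults to the standard ESXi mount point.
DATASTORE_ROOT=${DATASTORE_ROOT:-/vmfs/volumes}

# Hypothetical helper: refuse to run if the datastore path is missing.
reclaim_datastore() {
    path="$DATASTORE_ROOT/$1"
    if [ -z "$1" ] || [ ! -d "$path" ]; then
        echo "datastore not found: $path" >&2
        return 1
    fi
    vmkfstools -y "$path"
}

# Example with a placeholder name:
# reclaim_datastore <datastorename>
```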
