Tech ONTAP Blogs

Use PowerShell Automation to Maximize Performance and Resilience of VMware Datastores on ASA systems

ChanceBingen
NetApp

With the release last year of our new generation of ASA controllers, you may have noticed that I updated the VMware vSphere with ONTAP Best Practices guide to recommend using the latency policy option on the round-robin Path Selection Plugin (PSP) with ASA systems.

 

You can also set this option when using the newer High-Performance Plug-in (HPP) on either SCSI LUNs or NVMe namespaces. In fact, with the HPP it's quite easy to just pop into the vCenter UI and click the option on the device.
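
If you'd rather use the command line, you can apply the same setting per device with esxcli through PowerCLI. Here's a minimal sketch, assuming ESXi 7.0 or later; the host name and device identifier are placeholders you'd replace with your own.

# Set the HPP path selection scheme to latency-based round robin on a single device
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esxi01.example.com") -V2
$arguments = $esxcli.storage.hpp.device.set.CreateArgs()
$arguments.device = "eui.0000000000000001000a098000000001"   # placeholder device identifier
$arguments.pss = "LB-Latency"
$esxcli.storage.hpp.device.set.Invoke($arguments)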

 

Starting with VMware Cloud Foundation (VCF) / vSphere 9.0, this configuration is now the default for NetApp LUNs. Additionally, changes to the default NVMe namespace configuration may be introduced in a future update to ESXi.

 

[Screenshot: selecting the path selection policy on a device in the vCenter UI]

 

But what if you aren't using the HPP? Or what if you're on a release that doesn't support using the HPP on SCSI devices, limiting you to the older Native Multipathing Plugin (NMP)? Especially if you have a lot of hosts to manage?

 

My friend and coworker, @ScottBell, and I worked up this little PowerShell script using VMware PowerCLI to create a new Storage Array Type Plug-in (SATP) rule that updates all of the hosts in a datacenter to use the latency policy.

 

It makes setting up demo labs so much faster and easier.

 

Before getting to the script, let me explain two things.

 

  1. Even though the new ASA systems no longer have aggregates and volumes to manage, with the HA pair(s) sharing a common Storage Availability Zone (SAZ) and all paths exposed symmetrically as ALUA active/optimized to all Storage Units (SUs; note that LUNs and namespaces are called SUs in the new ASA systems), one controller is still considered the owner of each SU. IO to that owning node may be ever so slightly faster, measured in microseconds, than IO to the other controller, and under heavy sustained load those microseconds can add up.
  2. VMware added the latency option to the round-robin PSP way back in the vSphere 6.7 days. For over a decade, Linux (and thus KVM) has used a latency-based round-robin mechanism called dm-service-time, while Microsoft Windows/Hyper-V has long offered similar options like LQD (Least Queue Depth) and LB (Least Blocks). A per-device example follows this list.

    The goal with all of the previously mentioned examples is to prefer the paths that perform the fastest, i.e., those with the lowest latency, the fewest outstanding commands or bytes, and so on.

    Historically, that has been very helpful in other use cases too, especially where you have asymmetric pathing (i.e., one path has more hops than another), or where a path goes flaky due to a slowly failing SFP, a slightly over-bent fibre cable, or something similar.

    Flaky is my highly technical term; feel free to use it.
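
For example, here's how the latency policy can be applied to a single device with the NMP, again using esxcli through PowerCLI. This is just a minimal sketch; the host name and device identifier are placeholders.

# Enable the latency-based round-robin policy on one NMP-claimed device
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esxi01.example.com") -V2
$arguments = $esxcli.storage.nmp.psp.roundrobin.deviceconfig.set.CreateArgs()
$arguments.type = "latency"
$arguments.device = "naa.600a098038314648593f517773363547"   # placeholder device identifier
$esxcli.storage.nmp.psp.roundrobin.deviceconfig.set.Invoke($arguments)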

 

That's where these types of multipath settings come in handy with ASA systems. Generally speaking, you will never notice the difference unless you are running a very storage-intensive workload that uses every bit of performance the system has to offer. That being said, even when you aren't pushing the systems very hard, the ability to automatically minimize the use of flaky paths that aren't quite dead yet is incredibly beneficial.

 

Over the years of my career, I have seen many an outage or performance degradation caused by paths that just weren't dead enough to fail out, yet still ended up stalling IOs in the host's queuing mechanism.

 


 

So, now that we’ve gotten that out of the way, onto the script.

 

Keep in mind that you will need to either unclaim/reclaim the LUNs or put the hosts into maintenance mode and do a rolling reboot. Usually you have to use the latter method, because the former requires that there be no IO at all on a path for it to be unclaimed. *Sometimes* I have had luck unclaiming/reclaiming with a host just in maintenance mode, but that has been inconsistent in my experience.
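
If you do want to try the unclaim/reclaim route on a host in maintenance mode, the esxcli calls look something like the sketch below. The host name and device identifier are placeholders, and the unclaim will fail if there is any outstanding IO on the device's paths.

$esxcli = Get-EsxCli -VMHost (Get-VMHost "esxi01.example.com") -V2

# Unclaim the device from its current plugin (fails if any IO is outstanding)
$unclaim = $esxcli.storage.core.claiming.unclaim.CreateArgs()
$unclaim.type = "device"
$unclaim.device = "naa.600a098038314648593f517773363547"   # placeholder device identifier
$esxcli.storage.core.claiming.unclaim.Invoke($unclaim)

# Re-run the claim rules so the device is claimed again under the new rule
$reclaim = $esxcli.storage.core.claiming.reclaim.CreateArgs()
$reclaim.device = "naa.600a098038314648593f517773363547"   # placeholder device identifier
$esxcli.storage.core.claiming.reclaim.Invoke($reclaim)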

 

# Set the latency policy for NetApp ONTAP LUNs in a vCenter environment by specifying the datacenter to update.
# By Chance Bingen and Scott Bell, NetApp, 2025.

# Load VMware PowerCLI core module if not already available
Import-Module VMware.VimAutomation.Core

# Uncomment the below Set-PowerCLIConfiguration command if you want to ignore invalid certificates
# Note: This is not recommended for production environments as it can expose you to security risks.
# Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false

# Prompt for the vCenter Server
$vCenter = Read-Host -Prompt "What vCenter do you want to connect to?"
Write-Output "You entered: $vCenter"

# Prompt for vCenter Server credentials
$cred = Get-Credential

# Connect to the vCenter Server using the provided credentials
Connect-VIServer -Server $vCenter -Credential $cred

# Prompt for the datacenter
$Datacenter = Read-Host -Prompt "Which datacenter do you want to update?"
Write-Output "You entered: $Datacenter"

# Gather all of the hosts in the specified datacenter
$vmhosts = Get-VMHost -Location $Datacenter

# Suppress PowerShell errors so the loop continues even if a host rejects the rule (for example, if it already exists)
$OldErrorActionPreference = $ErrorActionPreference
$ErrorActionPreference = "SilentlyContinue"

foreach ($vmhost in $vmhosts) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    # Capture the host name once for the status messages below
    $hostname = $vmhost.Name
    $arguments = $esxcli.storage.nmp.satp.rule.add.CreateArgs()
    $arguments.pspoption="policy=latency"
    $arguments.description="NetApp ONTAP Latency SATP Rule"
    $arguments.vendor="NETAPP"
    $arguments.type="vendor"
    $arguments.satp="VMW_SATP_ALUA"
    $arguments.claimoption="tpgs_on"
    $arguments.psp="VMW_PSP_RR"
    $arguments.option="reset_on_attempted_reserve"
    $arguments.model="LUN C-Mode"
    $esxcli.storage.nmp.satp.rule.add.Invoke($arguments) | Out-Null
    # Check if the command was successful
    if ($?) {
        # Output a message including the hostname
        Write-Host "Successfully added Latency policy to host: $hostname"
    } else {
        Write-Host "Failed to add Latency policy to host $hostname"
    }
}

# Reset $ErrorActionPreference to previous value
$ErrorActionPreference = $OldErrorActionPreference

# Disconnect from the vCenter Server
Disconnect-VIServer -Confirm:$false
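
Once the hosts have been rebooted or the devices reclaimed, you can verify that the rule landed on every host with a quick check like the sketch below, run while still connected to vCenter (i.e., before the Disconnect-VIServer call, or after reconnecting). It reuses the $vmhosts collection from the script above.

# List the new SATP rule on each host by matching on its description
foreach ($vmhost in $vmhosts) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    $esxcli.storage.nmp.satp.rule.list.Invoke() |
        Where-Object { $_.Description -eq "NetApp ONTAP Latency SATP Rule" } |
        Select-Object @{Name="Host";Expression={$vmhost.Name}}, Vendor, Model, PSPOptions
}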

 

One thing you might change, depending on your environment, is scoping it to a particular vSphere cluster instead of a whole datacenter. To do that, just change this:

 

$Datacenter = Read-Host -Prompt "Which datacenter do you want to update?"
Write-Output "You entered: $Datacenter"

 

To this:

 

$vcluster = Read-Host -Prompt "Which cluster do you want to update?"
Write-Output "You entered: $vcluster"

 

And change this:

 

$vmhosts = Get-VMhost -Location $Datacenter

 

To this:

 

$vmhosts = Get-Cluster -Name $vcluster | Get-VMhost

 

Of course, you may wonder about NVMe namespaces, since they exclusively use the HPP and not the NMP. Those are a bit easier to do because there's a native interface for it. But if you have a lot of hosts to manage, you may want to script that too. If you would like to see some examples of that, let us know.

 

If you have any questions, feel free to comment below, and we’ll be happy to answer them.

 

More resources can be found at:

 

ONTAP 9 – Learn About ONTAP SAN Configuration

https://docs.netapp.com/us-en/ontap/san-config/index.html

 

Learn about SAN host configurations

https://docs.netapp.com/us-en/ontap-sanhost/overview.html

 

VMware vSphere with ONTAP Best Practices Guide (Formerly TR-4597)

https://docs.netapp.com/us-en/ontap-apps-dbs/vmware/vmware-vsphere-overview.html

 

TR-4080: Best Practices for Modern SAN

https://www.netapp.com/media/10680-tr4080.pdf

 

TR-4684: Implementing and Configuring Modern SANs with NVMe/FC

https://www.netapp.com/pdf.html?item=/media/10681-tr4684pdf.pdf

 
