I doubt the hardware configuration you mentioned is NetApp supported or tested; please validate it. The usual bottlenecks in the stack are the disk type (since data has to be mirrored across nodes), the network pipes, and the CPU and memory allowed by the license type.
NetApp was pretty clear with us that, being "software", the hardware requirements are those stated as supported by the underlying hypervisor.
It seems odd that a software layer without any knowledge of the physical hardware would have hardware-specific requirements.
Can you send a link to the list of "supported" ONTAP Select hardware? I don't see any mention of one.
All of our hardware is new, fully listed on the VMware HCL, and the hosts are all vSAN Ready Nodes. Our switches are fully supported by NetApp (they are the same model we use as cluster switches for our physical NetApp cluster).
What hardware would you see as an "issue" for support?
On a side note, write performance is also terrible when using the built-in HP SSDs (which are rebranded Intel DC drives) on the internal RAID controller. Using the HP-branded Intel NVMe drive doesn't change the terrible write latency, better or worse.
Those requirements are quite general. They don't say anywhere that you must have "Network Adapter Brand X with Firmware X"; instead they are general, like "dual 40Gb adapters".
All of our hardware fully meets the requirements in those docs as far as I can see, and all of the hardware and VMware versions meet the NetApp interop and VMware HCL requirements.
This isn't old hardware. It's all brand-new HP vSAN Ready Nodes.
All of this hardware is supported and performs very well with both vSAN and physical NetApp arrays. VMware performance graphs show that the underlying latency from the nodes to the physical storage and network is very low (just as when using vSAN).
But the path from the datastore to the SVM shows terrible write latency. It doesn't seem like a hardware issue to me unless you see a specific item that would be a problem.
I agree with what you say, and technically it should work, but all the NetApp literature I've read talks about SSDs or HDDs, not flash cards, and only mentions 10Gbps cards, etc.
Therefore, I cannot comment any further on where the issue is.
What ESXi version have you installed on the physical nodes? Have you tagged the datastores as SSD within vSphere, and is the controller driver the right one? Have you changed the MTU size on the NICs? Have you checked the performance from NetApp System Manager to see where it is choking?
We can replicate the very poor write performance with standard SSD drives on the HP Smart Array controller, so it's not the NVMe PCIe card. When testing with the standard SSDs, write latency climbs to 200+ ms as soon as there is any steady workload.
These hosts were designed for vSAN, so they have a very fast PCIe SSD for the write cache and a bunch of standard SSDs for the capacity tier. We initially tested with the PCIe card since it can push 100,000+ IOPS at just a few milliseconds of latency.
One very interesting test: we deployed the ONTAP node to occupy only half of the PCIe card, then deployed a test VM directly on the PCIe datastore next to the ONTAP node. During testing, the ONTAP node pushed 98 MB/s of throughput at 210 ms average latency, while a VM on that same datastore did 1,050 MB/s with an average of 3 ms latency. We then ran both tests at the same time: ONTAP performance stayed the same and the standalone VM dropped only slightly, so there is plenty of headroom on those PCIe cards.
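For anyone wanting to reproduce this kind of side-by-side comparison without a dedicated benchmark tool, here's a rough sketch of a synchronous-write latency probe (the file name, block size, and iteration count are arbitrary assumptions on my part; fio or a similar tool is far more rigorous than this):

```python
import os
import tempfile
import time

def measure_write_latency_ms(path, block_size=64 * 1024, iterations=200):
    """Issue synchronous writes and record per-I/O latency in milliseconds."""
    buf = os.urandom(block_size)
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # hold the "ack" until the write is durable
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies

# Run the same probe against a file on each datastore under test and compare
# averages, as in the ONTAP-node-vs-standalone-VM comparison above.
lat = measure_write_latency_ms(os.path.join(tempfile.gettempdir(), "latprobe.bin"))
print(f"avg {sum(lat) / len(lat):.2f} ms, max {max(lat):.2f} ms")
```

Running the identical probe from a VM on the ONTAP-served datastore and from a VM on the raw PCIe datastore keeps the comparison apples-to-apples.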
All of these hosts are on ESXi 6.5 U1, and the drives are already identified by VMware as flash devices.
We already have 10Gb SFP+ cards in the hosts in addition to the 40Gb cards, so I'm going to reconfigure the hosts to use the HP integrated 10Gb cards to see if that changes anything, but my guess is it won't.
If there were a network latency issue with the 40Gb cards, then vSAN should see the same slowdown, but it shows single-digit latency instead of 200+ ms.
System Manager does not show the same terrible performance that the VMs see, and the node VM to the physical datastore shows very low latency. I'll have to dig into the node performance data to see if there are any newer metrics that can detail the latency between the nodes for the write ack.
The final update is that ONTAP Select is not for performance-sensitive workloads.
We tested many different configurations and used several different brands of SSDs, all of which are VMware HCL certified. We tested both HP and Dell servers (each with its own brand of controllers and drives).
No matter what we did, the latency between the VM and the SVM was very high under even a moderate load. The physical storage stats between the SVM and the physical datastore it runs on were always very low (sub-1 ms).
It seems the current version of ONTAP Select introduces a lot of latency within its stack. Perhaps that's why NetApp doesn't make many (if any) claims about performance levels.
Hopefully they will get the platform tuned in future versions. We love the idea of software-defined storage using the NetApp stack, but as it stands, the performance was too low even for a remote branch file server.
I think the latency issue in ONTAP Select is not related to the specific SSDs or hardware components involved, but to the long storage I/O path: write I/Os traverse two VMFS layers, plus the NetApp filesystem layer, plus the VMware network (to mirror the data).
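To illustrate why a stack of individually fast layers can still produce high write latency: each layer's service time adds up, and with synchronous mirroring the ack also waits on a network round trip. A toy model (every number below is invented for illustration; none are measurements):

```python
def write_ack_latency_ms(layer_ms, mirror_rtt_ms=0.0, queue_factor=1.0):
    """Sum per-layer service times for one acknowledged write.

    A synchronous mirror adds one network round trip before the ack;
    queue_factor crudely models queuing inflation under steady load
    (1.0 = unloaded). All inputs are hypothetical.
    """
    return (sum(layer_ms) + mirror_rtt_ms) * queue_factor

# VM writing straight to a VMFS datastore on PCIe flash: a short stack.
direct = write_ack_latency_ms([0.2, 0.5])

# VM -> inner VMFS inside ONTAP Select -> NetApp filesystem -> outer VMFS,
# plus a mirror RTT, with queuing inflation once steady load builds up.
select = write_ack_latency_ms([0.2, 1.0, 2.0, 0.5],
                              mirror_rtt_ms=0.3, queue_factor=20.0)

assert select > direct  # the extra layers dominate even on fast hardware
print(f"direct ~{direct:.1f} ms, layered ~{select:.1f} ms")
```

The point isn't the specific values, just that per-layer costs and the mirror round trip compound multiplicatively once queues form, which is consistent with fast devices underneath and 200+ ms at the top.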
Disclosure: I am an SE with Virtunet Systems; our software for ESXi caches hot data to in-host RAM/SSD. ONTAP Select customers have used it to reduce VM storage latencies. Here's a link specific to ONTAP Select.