VMware Solutions Discussions

vMotion issue when VM is on a VVOL datastore

sraudonis

Hi,

 

Some time ago I installed VSC 9.7.1 in my vSphere 7.0.1 environment and created a vVols datastore. I migrated VMs from NFS 4.1 datastores to the vVols datastore without problems.

 

But when doing a "compute" vMotion, the process gets stuck at 85% for 30 to 40 seconds. During this time the VM freezes.

 

I had a NetApp case open; they found nothing in the logs, but told me that they currently don't support 7.0.1. They aren't aware of any issues of this kind, so I should open a case with VMware. VMware did some research but didn't find anything in the logs either.

To escalate the case to the next level, I would have to bring my environment to a version supported by VMware, but in the VMware HCL the newest listed versions are:

  • ESXi 6.5 U3
  • VASA Provider 7.1P1 (end of support since Jan 2019)
  • ONTAP 9.4 (end of support since Jun 2019)

So VMware closed the case...

 

Today I updated my VSC to 9.7.1P1, but the vMotion issue is still present.

So I have an issue and can't get help!

Now the environment is supported according to the NetApp IMT, but that doesn't help when NetApp tells me to open a case with VMware, and VMware won't help me because NetApp has missed updating the HCL at VMware.

 

This is very, very bad!

 

NetApp, PLEASE update the HCL at VMware!

1 ACCEPTED SOLUTION

konnerth

Hi,

Engineering is working on certifying our VASA Provider, but ran into a minor failure with 9.7.1 that blocked certification.  I believe they are working to certify 9.7 now, and will address the issue so 9.8 can be certified.  I am sorry for the limited certification in the VCG.

Regarding your original issue (slow vMotion), we have been working with VMware on this and recently learned that there is a problem with NFS file locking that slows migration of VMs on vVols datastores.  The bug is 2668244 and is targeted to vSphere 7.0U2.  There are also several other vVols optimizations in 7.0U1 and U2 that should improve migration times.

By the way, we are still working with VMware on NFSv4.1 datastores.  We don't currently list support for v4.1 datastores in the NetApp IMT due to VMs being occasionally powered off during a storage failover.  We continue working with them on this and hope to resolve this soon.

---Karl


12 REPLIES

jcolonfzenpr

Can you share more about your setup? Hardware, type of connection, network speed?

 

I've found many issues with VMware and NFS 4.1 lately.

 

Did you test vVols using another connection protocol? NFS 3, FCP, or iSCSI?

 

Kindly.

Jon.

Jonathan Colón | Blog | Linkedin

Blissitt

As an aside, I believe that vVols only use NFSv3 (NFS 4.1 is not supported with vVols).  We don't even have 4 or 4.1 enabled on the SVMs that offer our vVols to hosts.

https://www.netapp.com/media/13555-tr4400.pdf

 

It took some effort especially with some previous NetApp versions, but we finally got vVols running stably with our NetApps.  The latest versions of ONTAP and NetApp's VASA seem to work pretty well.  I haven't run into the same problem jcolonfzenpr is having; that's an interesting one.

 

I doubt it's involved here, but never place the VSC/VASA/SRM VM on a vVol (and probably not the VCSA either).  Leave those on "regular" storage.

sraudonis

I used the "Provision Datastore" function from the VSC to create the vVols datastore. And yes, "nfs connected-clients show" confirms that the vVols datastore uses an NFS 3 connection.
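For anyone who wants to verify this on their own system, a quick check from the ONTAP cluster shell might look like the following (a sketch assuming ONTAP 9.7 or later, where the connected-clients view is available; the SVM name "svm_vvols" is a placeholder for your own SVM):

```shell
# On the ONTAP cluster shell: list active NFS client connections
# for the SVM backing the vVols datastore. The output includes a
# protocol column (e.g. nfs3, nfs4, nfs4.1) per client mount.
vserver nfs connected-clients show -vserver svm_vvols
```

If the ESXi hosts show up with protocol nfs3, the vVols datastore is indeed mounted over NFSv3, as Blissitt described above.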

 

If I remember right, I tested vVols over a year ago with vSphere 6.7 and didn't have the problem I currently have with vSphere 7.0.1.

 

As I wrote, the main problem is that NetApp has missed updating the VMware HCL. If it showed the same information as the NetApp IMT, I could open a case with VMware regarding this problem.

 

I already had a case open with NetApp, but they told me to contact VMware because they found nothing wrong in the config and logs...


sraudonis

Hello Karl,

 

Thank you very much for that information! And a happy New Year!

 

But I'm wondering: when I look into the IMT and search for VSC 9.7.1 or for the VASA Provider, the combination of VASA/VSC 9.7.1 / ONTAP 9.8 / vSphere 7.0.1 is shown as supported, with no additional note.

 

This is very confusing...

 

I'm aware of the problems with NFS 4.1, but I only have a single-node system here, so I can ignore them.

 

The bug ID you wrote, that isn't a NetApp ID, correct? Because I can't find that ID...

 

So I just tried to install 7.0.2 (December beta) on my hosts to see if the vMotion problem is already resolved in that version, but when booting that ISO, the H410C gives me a PSOD:

 

Failed at bora/modules/vmkernel/rdma/driver/rdma-driver.c:545 -- VMK_ASSERT(deviceAttr.nodeGuid != 0)

 

I will do some research or open a VMware beta case...

 

I'll keep you informed...

 

Kind regards

Stefan

 

sraudonis

I figured it out. I had to disable the RDMA driver:

 

esxcli system module set --enabled=false --module=nmlx5_rdma

 

Then I was able to install the update. It seems the driver in the beta version has a bug...
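For others who hit the same PSOD during the beta install, the full workaround and how to undo it afterwards might look like this (a sketch: the module name nmlx5_rdma is taken from the command above, and a reboot is needed for the module state change to take effect):

```shell
# Check the current state of the Mellanox modules on the host
esxcli system module list | grep nmlx5

# Disable the RDMA module, then reboot the host before installing
esxcli system module set --enabled=false --module=nmlx5_rdma
reboot

# After the update is installed, re-enable the module if needed
esxcli system module set --enabled=true --module=nmlx5_rdma
```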

 

Now i have 2 Hosts:

host1 = ESXi 7.0.1 (U1c)

host2 = ESXi 7.0.2 (december beta)

 

When I do a vMotion from host1 to host2, the vMotion takes 5 seconds. (Wow!)

But when going back from host2 to host1, it takes 40 seconds and the VM freezes most of that time.

 

So @konnerth, you are right, U2 will resolve the vMotion issue!

 

sraudonis

@konnerth, by the way, the latest 7.0 U1c (December patch) resolves an NFS 4.1 problem:

 

https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u1c.html

PR 2655181: Virtual machines on NFS 4.1 datastore might become unresponsive after an NFS server failover or failback

If a reclaim request repeats during an NFS server failover or failback operation, the open reclaim fails and causes virtual machines on NFS 4.1 datastores to become unresponsive.

This issue is resolved in this release. The fix analyzes reclaim responses in detail and allows retries only when necessary.

 

konnerth

Hi Stefan (@sraudonis),

 

Happy New Year!

 

Yes, we worked with VMware through much of 2020 to resolve that one, after they identified and fixed the 'datastore inaccessible' problem in earlier releases.  Yet, we identified one more infrequent problem in recent testing that we are working on right now.

 

---Karl

sraudonis

OK, in my testing environment I currently see no issues with NFS 4.1 datastores (non-vVols); it's running very well.

 

I discovered only a small problem in VSC 9.7.1 and 9.7.1P1 with the NFS host settings in a vSphere 7.0.1 environment, but I think Andreas already reported this to you yesterday, because he was able to see the same issue in his version. 🙂

Blissitt

For anyone who wants to upgrade to "VSC" 9.8, the product is now called "ONTAP tools for VMware vSphere."  Go to the NetApp Downloads section and you'll find it under the letter "O."

 

Version 9.7.1P1 may have a bug causing data loss and is not listed on NetApp's IMT as supported past VMware 7.0U1, so if you're running 9.7.1P1 I'd suggest you upgrade soon.

sraudonis

Currently I have uninstalled all this and am using only the NFS VAAI plugin...

 

It is easy to configure, works every time, and best of all is the snapshot offload.  I'm happy with that... 🙂

 

No need for the VSC when using NFS.
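If you go the VAAI-only route, you can check from the host that the plugin is installed and that hardware acceleration is active on the NFS datastores. A sketch, assuming a standard ESXi 7.x shell (the exact VIB name varies by plugin version):

```shell
# Verify the NetApp NAS VAAI plugin VIB is installed on the host
esxcli software vib list | grep -i netapp

# List NFS datastores; the "Hardware Acceleration" column should
# show "Supported" once the plugin is active on the mount
esxcli storage nfs list
```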

sraudonis

Two NetApp HCI H410C nodes connected via 25G to a Mellanox switch, and a single-node FAS2650 connected with 10G to the same switch. For vMotion there is a dedicated VLAN on the 25G network.

 

On the same NFS SVM I have NFS 4.1 datastores, and there I have no problems.

 

I have only tested NFS...

 

Currently I have moved all my VMs back to normal NFS 4.1 datastores.

 

After updating VSC to P1, I created a new test vVols datastore and moved one VM to it. Storage vMotion runs without problems, but when doing a vMotion to another host, the problem is there.

 

Kind regards

Stefan
