Solved: VMware DRS and NetApp snaps

storageguy · ‎2021-05-11

Good Morning,

Looking for some input. When DRS (Auto) is enabled on a datastore cluster, I assume VMDKs get moved between datastores. To NetApp that looks like data change, so I'd expect the snaps to get bigger. I ask because we noticed aggregate utilization went above 98%, and as soon as DRS was disabled and the snaps were deleted, it started to come down. While DRS was still enabled, deleting snaps did not bring aggr utilization down. So is enabling DRS good or bad?

Thanks in advance

jeras · ‎2021-05-13

Our recommendation is to use the default Storage DRS (SDRS) setting of "manual". This is discussed towards the bottom of this section of TR-4597, the VMware vSphere for ONTAP Best Practices Technical Report: https://docs.netapp.com/us-en/netapp-solutions/hybrid-cloud/vsphere_ontap_other_capabilities_for_vsphere.html#ontap-qos-and-vmware-sioc

When you have SDRS set to "auto" and VMDK files are copied to a new datastore by vSphere, SDRS deletes the VMDK files in the original location, which can cause your ONTAP Snapshot space to increase dramatically, as you have seen. Once SDRS deletes the files related to the VM that was moved, those data blocks are no longer part of the active file system on the original datastore. If an existing Snapshot copy still references them, they are not freed, and their space is now accounted for as Snapshot space in the original datastore volume.
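To make the accounting above concrete, here is a rough Python sketch. It is a toy model, not how ONTAP actually tracks blocks: the `Volume` class, the `migrate` helper, the datastore names and the block counts are all made up for illustration. It only shows why deleting the source files after a move shifts space into the Snapshot side instead of freeing it.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume: sets of block IDs, not real ONTAP accounting."""
    name: str
    active: set = field(default_factory=set)    # blocks in the active file system
    snapshot: set = field(default_factory=set)  # blocks pinned by the latest snapshot

    def take_snapshot(self) -> None:
        self.snapshot = set(self.active)

    def snapshot_only(self) -> int:
        # Blocks held only because a snapshot still references them
        return len(self.snapshot - self.active)

    def used(self) -> int:
        return len(self.active | self.snapshot)


def migrate(vmdk: set, source: Volume, dest: Volume) -> None:
    """Copy the VMDK's data to dest as new blocks, then delete it from source."""
    dest.active |= {(dest.name, block) for block in vmdk}
    source.active -= vmdk


vmdk = {("ds1", i) for i in range(100)}        # a 100-block VMDK (made-up size)
other = {("ds1", i) for i in range(100, 150)}  # 50 blocks of other VM data
ds1 = Volume("ds1", active=vmdk | other)
ds2 = Volume("ds2")

ds1.take_snapshot()
before = ds1.used() + ds2.used()
migrate(vmdk, ds1, ds2)
after = ds1.used() + ds2.used()

print(ds1.snapshot_only())  # 100 blocks now charged to snapshot space on ds1
print(before, after)        # 150 250
```

In this model the source volume's usage does not drop after the move, and if both datastores sit on the same aggregate, the combined usage goes up by the size of the moved VMDK until the old Snapshot copies are deleted.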

Among the considerations for keeping the default "manual" setting for SDRS are:

When SDRS moves VMDKs between datastores, any space savings from ONTAP cloning or deduplication are lost. You can rerun deduplication to regain these savings.
Moving VMDKs between datastores on the same aggregate has little benefit and SDRS does not have visibility into other workloads that might share the aggregate.
SDRS reacts to a measurement crossing a specific threshold and cannot tell whether the trigger is a temporary anomaly or performance spike in the ONTAP system. With the "manual" SDRS setting, you can investigate whether the trigger was just a performance spike in the ONTAP system or a situation that warrants moving the VM to a different datastore.
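As an illustration of the last point, this is one way you might separate a short spike from a sustained problem when reviewing a manual SDRS recommendation. It is only a sketch: the `is_sustained` function, the 15 ms threshold, the three-sample window and the latency values are assumptions for the example, not anything SDRS or ONTAP exposes.

```python
def is_sustained(latencies_ms, threshold_ms=15.0, min_consecutive=3):
    """True if latency stays above threshold_ms for min_consecutive samples in a row."""
    run = 0
    for value in latencies_ms:
        # Extend the current run of high samples, or reset it on a normal sample
        run = run + 1 if value > threshold_ms else 0
        if run >= min_consecutive:
            return True
    return False


spike = [4, 5, 32, 6, 5, 4]         # one bad sample, then back to normal
sustained = [4, 18, 21, 25, 19, 6]  # several bad samples in a row

print(is_sustained(spike))      # False
print(is_sustained(sustained))  # True
```

A single threshold crossing like the first series is the kind of event that would trigger a move under "auto" but that you would probably ignore after looking at it by hand.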

You can use "auto" for SDRS, but you'll need to keep an eye on ONTAP Snapshot space consumption, as you experienced. Consider using Active IQ Unified Manager to set an alert on Snapshot space growth so that you are notified of a sudden spike.
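If you want to see what such a growth alert amounts to, the idea can be sketched in a few lines of Python. This does not call Active IQ Unified Manager or any ONTAP API. The `snapshot_growth_alerts` function, the 50 GB limit and the sample readings are hypothetical, and you would have to collect the Snapshot usage numbers yourself.

```python
def snapshot_growth_alerts(samples, max_growth_gb=50.0):
    """Return (label, growth_gb) for each sample that grew more than max_growth_gb
    over the previous one. samples is a list of (label, snapshot_used_gb)."""
    alerts = []
    for (_, prev), (label, curr) in zip(samples, samples[1:]):
        growth = curr - prev
        if growth > max_growth_gb:
            alerts.append((label, growth))
    return alerts


# Made-up hourly readings of Snapshot space used on one datastore volume, in GB
samples = [("08:00", 120.0), ("09:00", 125.0), ("10:00", 310.0), ("11:00", 315.0)]

print(snapshot_growth_alerts(samples))  # [('10:00', 185.0)]
```

Alerting on the jump between readings rather than on an absolute level is what catches an SDRS move early, before the aggregate itself starts to fill.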
