We completed an ONTAP upgrade from 9.6p9 to 9.7p8 following Upgrade Advisor. Everything looked good, with no errors in any logs. We also checked vSphere (NFS datastores) and found no errors or issues there either. Neither OCUM nor the event log indicated any issues.
After a week or so, they experienced some performance issues with a few particular VMs among thousands of VMs, suspected that the upgrade caused them, and asked the storage team to look into it. ONTAP touches almost everything, so people can attribute any issue to it without presenting any evidence or indication; it feels like a "presumption of guilt" to me. Of course, I don't want to say that to them.
But what should I say? What should my professional response be? I hope some experts here can help me.
The IMT (https://mysupport.netapp.com/matrix/#welcome) checks interoperability between NetApp software, hardware, etc. and third-party products, for example between the ONTAP version and the VSC, or between ONTAP and VMware using iSCSI. There's a lot in there.
aiq.netapp.com -> check for health issues; you can also run Upgrade Advisor from here.
Unified Manager and System Manager should alert you if there are any errors.
You can also manually check the event log inside ONTAP.
Keep in mind that the volumes you are comparing might have different client-side workloads, but in the above output mainly look for any major outliers. You can also perform a takeover of the node that owns the disks for the problematic volumes, or an aggregate relocation, and compare the statistics from before the takeover with those gathered during the takeover or relocation. This would essentially rule out anything that is specific to that node's hardware.
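If you export the latency samples and want something more repeatable than eyeballing them, the comparison above can be sketched roughly like this. This is only a minimal sketch: the function names, the 2x threshold, and the input format (a dict of volume name to latency samples in ms) are my own assumptions, not anything ONTAP produces directly.

```python
from statistics import median


def flag_outliers(samples_by_volume, threshold=2.0):
    """Return volumes whose median latency exceeds `threshold` times
    the median of all per-volume medians (threshold is an arbitrary assumption)."""
    medians = {vol: median(vals) for vol, vals in samples_by_volume.items()}
    overall = median(medians.values())
    return sorted(vol for vol, m in medians.items() if m > threshold * overall)


def latency_shift(before_ms, during_ms):
    """Ratio of median latency during takeover/relocation to the median before it."""
    base = median(before_ms)
    if base <= 0:
        raise ValueError("baseline median latency must be positive")
    return median(during_ms) / base
```

A ratio from latency_shift well below 1 for the problem volumes would suggest the original node is worth a closer look; a ratio near 1 would point away from node hardware. Different client workloads can still skew either number.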
Another check along the same lines is to compare the statistics for any errors when the LIF lives on its home node and after you migrate the LIF to another node.
::> node run -node <NODENAME> -command "ifstat -a"
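As far as I know the ifstat counters are cumulative, so what matters is how much they grow between two captures rather than the absolute values. A minimal sketch of that comparison, assuming you have already copied the counters of interest into dicts by hand (the counter names below are hypothetical, and this does not attempt to parse real ifstat output):

```python
def error_deltas(before, after):
    """Return only the counters that increased between two snapshots,
    mapped to the size of the increase."""
    deltas = {}
    for name, value in after.items():
        growth = value - before.get(name, 0)
        if growth > 0:
            deltas[name] = growth
    return deltas


# Hypothetical counter names and values, for illustration only.
on_home_node = {"rx_crc_errors": 5, "tx_errors": 0}
after_migrate = {"rx_crc_errors": 5, "tx_errors": 3}
print(error_deltas(on_home_node, after_migrate))
```

Take one capture with the LIF on its home node and one after the migrate, over similar time windows, so the deltas are comparable.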