Hello everybody,
I'd like to ask whether any of you have experienced the same problem, and whether a solution or troubleshooting procedure is known for it.
We have two FAS2554 systems running ONTAP 8.3P2 in a cluster (4 nodes). Each node has one big aggregate (plus the ONTAP "service" aggregate), for a total of 4 aggregates for data storage.
One SVM is dedicated to providing iSCSI LUNs to Windows Server 2012 R2 virtual machines and is hosted on aggregate 3. The iSCSI LUNs are distributed across the 4 aggregates, with one volume per LUN.
I noticed that virtual machines using iSCSI LUNs hosted on aggregate 3 experience periods of very low disk access performance.
I installed Harvest+Graphite+Grafana and noticed that periodically aggregate 3 shows very low IOPS, throughput and latency. At the same time it shows very low reads from HDDs and very high reads from RAM. The behavior seems very regular, with anomaly periods appearing approximately every 100 minutes and lasting about 30 minutes.
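To put numbers on how regular the pattern is, something like the following Python sketch could be run against the IOPS series exported from Graphite. It is only a sketch: the function names, the 200 IOPS threshold, and the synthetic samples are assumptions made for illustration, not measurements from the cluster.

```python
from statistics import median


def find_low_windows(samples, threshold):
    """Return (start_ts, end_ts) pairs for contiguous runs below threshold.

    samples: list of (timestamp_seconds, value) sorted by timestamp.
    A None value (a gap in the exported data) ends the current run.
    """
    windows = []
    start = None
    last = None
    for ts, value in samples:
        low = value is not None and value < threshold
        if low:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows


def summarize(windows):
    """Summarize window durations and start-to-start spacing, in minutes."""
    durations = [(end - start) / 60 for start, end in windows]
    starts = [start for start, _ in windows]
    gaps = [(b - a) / 60 for a, b in zip(starts, starts[1:])]
    return {
        "count": len(windows),
        "median_duration_min": median(durations) if durations else None,
        "median_period_min": median(gaps) if gaps else None,
    }


# Synthetic data (assumption): one sample per minute for 300 minutes,
# with IOPS dropping for the last 30 minutes of every 100-minute cycle.
samples = [(m * 60, 50.0 if m % 100 >= 70 else 1500.0) for m in range(300)]
summary = summarize(find_low_windows(samples, threshold=200.0))
print(summary)
```

With real data, the threshold would need to be chosen from the graphs, and the measured period and duration could then be compared against scheduled jobs on the affected node.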
I've attached a couple of graphs taken from Grafana.
Does anybody have an idea of what's happening?
Many thanks in advance!
Regards