Periodic very low performance on cluster node

zamo2k · ‎2017-02-14

Hello everybody,

I'd like to ask whether any of you have experienced the same problem, and whether a solution or troubleshooting procedure is known for it.

We have two FAS2554 systems running ONTAP 8.3P2 in a cluster (4 nodes). Each node has one big aggregate (plus the ONTAP "service" aggregate), for a total of 4 aggregates for data storage.

One SVM is dedicated to providing iSCSI LUNs to Windows Server 2012 R2 virtual machines and is hosted on aggregate 3. The iSCSI LUNs are distributed across the 4 aggregates, with one volume per LUN.

I noticed that virtual machines using iSCSI LUNs hosted on aggregate 3 experience periods of very low disk access performance.

I installed Harvest+Graphite+Grafana and noticed that periodically aggregate 3 shows very low IOPS, throughput and latency. At the same time it shows very low reads from HDDs and very high reads from RAM. The behavior seems very regular, with anomaly periods appearing approximately every 100 minutes and lasting about 30 minutes.
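For anyone who wants to check the "every 100 minutes, lasting 30 minutes" pattern numerically rather than by eye, here is a minimal Python sketch. It assumes the IOPS series for the aggregate has already been exported from Graphite as a plain list of samples at a fixed step (one sample per minute here); the function names, the threshold, and the synthetic data are all hypothetical and only illustrate the idea.

```python
from statistics import mean


def find_low_periods(samples, threshold):
    """Return (start_index, length) for each run of consecutive samples below threshold."""
    periods = []
    start = None
    for i, value in enumerate(samples):
        if value < threshold:
            if start is None:
                start = i  # a new low period begins here
        elif start is not None:
            periods.append((start, i - start))
            start = None
    if start is not None:
        # the series ended while still inside a low period
        periods.append((start, len(samples) - start))
    return periods


def summarize(periods, step_minutes):
    """Average spacing between period starts and average duration, in minutes."""
    starts = [s for s, _ in periods]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "count": len(periods),
        "mean_interval_min": mean(gaps) * step_minutes if gaps else None,
        "mean_duration_min": (
            mean(length for _, length in periods) * step_minutes if periods else None
        ),
    }


# Synthetic example only (not real data): 70 minutes of normal IOPS
# followed by 30 minutes of very low IOPS, repeated three times.
synthetic_iops = ([1000] * 70 + [50] * 30) * 3
low_periods = find_low_periods(synthetic_iops, threshold=200)
summary = summarize(low_periods, step_minutes=1)
print(summary)
```

If the measured interval turns out to be stable, it may be worth comparing it against anything scheduled on that node at the same cadence.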

I attach a couple of graphs taken from Grafana.

Does anybody have an idea of what's happening?

Many thanks in advance!

Regards

andris · ‎2017-02-15

You might want to open a technical case with Support to look into this further - there could be a number of factors at work here.

On the other hand, 8.3P2 is long in the tooth - your time might be better served moving to the recommended 8.3.x release - currently 8.3.2P9.

Ref: Recommended Data ONTAP Releases on the NetApp Support Site

You can check out the bugs fixed between these two releases, if you like:

http://mysupport.netapp.com/NOW/cgi-bin/relcmp.on?notfirst=Go%21&rels=8.3P2%2C8.3.2P9&what=fix

Of particular interest... Bug 896685

zamo2k · ‎2017-02-16

Many thanks andris for your detailed answer, it gave me a starting point.

I'll contact NetApp immediately and update this thread at the end of the story.

I hope everything can be solved with an ONTAP upgrade.

Best regards