
Performance issue on our environment.

This is regarding a performance issue we are experiencing: our internal infrastructure seems extremely slow.
Our environment has two SAS aggregates with the following disk configuration: each aggregate has six RAID groups with a RAID group size of 19 (plus 2 spares), and they are all SAS disks.

IOPS figure estimation:
Maximum IOPS per aggregate:
6 RAID groups (rg0-rg5) of 19 disks each (17 data disks plus 2 parity, RAID-DP) = 114 disks in total.
With a 30%/70% read/write mix and a RAID penalty of 2 for RAID-DP, this gives roughly 9242 IOPS per aggregate (estimated with http://www.wmarow.com/strcalc/).
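For reference, the estimate above can be reproduced with the standard RAID-penalty formula (a rough sketch; the 140 IOPS-per-disk figure is my assumption for 10k SAS drives, and the wmarow calculator may use different per-disk values, so the result lands in the same ballpark rather than matching exactly):

```python
# Rough front-end IOPS estimate for an aggregate, using the standard
# RAID-penalty formula: effective = raw / (read_pct + write_pct * penalty).
# The 140 IOPS/disk figure is an assumed value for 10k RPM SAS drives.

def effective_iops(n_disks, iops_per_disk, read_pct, write_pct, raid_penalty):
    raw = n_disks * iops_per_disk                       # total back-end IOPS
    return raw / (read_pct + write_pct * raid_penalty)  # usable front-end IOPS

# 6 RAID-DP groups x 19 disks = 114 disks, 30/70 read/write, penalty 2
print(round(effective_iops(114, 140, 0.30, 0.70, 2)))  # ~9388, same ballpark as the 9242 above
```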

 

The problem we have is that users are complaining that VM performance is very slow.
I have seen the IOPS figure reach 25000 at one point, and performance was very poor at the time.
We have found that some VMs are driving more than 3000 IOPS each, which has contributed to this figure.
I would have thought the disks could still sustain up to the maximum IOPS tolerance estimated above (about 28K).
I've installed the OCPM (OnCommand Performance Manager) server, and it reports that one of the volumes is causing data contention on the node, but I cannot work out whether the issue is the disks or the filers.

Please let me know if you need any more information, and please advise how I can troubleshoot this issue.

I doubt we can do much about the disks unless we add a Flash Pool (we have spare SSD shelves), but first I need to make sure this is a disk performance problem and not a filer problem.


statistics show-periodic figures:
 cpu    total                     data     data     data  cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops  busy     recv     sent     busy     recv     sent     read    write
---- -------- -------- -------- ----- -------- -------- -------- -------- -------- -------- --------
 67%    10555    10555        0   15%   77.9MB    183MB       2%    101MB    102MB    274MB   90.0MB
 75%     9066     9066        0   13%    161MB    160MB       4%    124MB    123MB    176MB    113MB
 85%    12248    12248        0   11%    106MB    143MB       2%   79.0MB   78.1MB    261MB    230MB
 80%    11335    11335        0   10%    119MB    120MB       3%    103MB    102MB    258MB    197MB
 77%    14315    14315        0   14%    102MB    169MB       4%    132MB    128MB    184MB   96.2MB
cluster:summary: cluster.cluster: 4/13/2015 17:39:37
 cpu    total                     data     data     data  cluster  cluster  cluster     disk     disk
busy      ops  nfs-ops cifs-ops  busy     recv     sent     busy     recv     sent     read    write
---- -------- -------- -------- ----- -------- -------- -------- -------- -------- -------- --------
Minimums:
 67%     9066     9066        0   10%   77.9MB    120MB       2%   79.0MB   78.1MB    176MB   90.0MB
Averages for 5 samples:
 76%    11503    11503        0   12%    113MB    155MB       3%    108MB    106MB    231MB    145MB
Maximums:
 85%    14315    14315        0   15%    161MB    183MB       4%    132MB    128MB    274MB    230MB
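As a quick sanity check on the summary rows, the five total-ops samples can be recomputed directly (values copied from the show-periodic output above):

```python
# Total-ops samples copied from the statistics show-periodic output above.
ops_samples = [10555, 9066, 12248, 11335, 14315]

minimum = min(ops_samples)
maximum = max(ops_samples)
average = sum(ops_samples) // len(ops_samples)  # integer division matches the 11503 shown

print(minimum, average, maximum)  # matches the Minimums/Averages/Maximums rows
```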

Re: Performance issue on our environment.

You can run AWA (Automated Workload Analyzer) to see whether a Flash Pool would help.

 

Also, what type of controllers do you have, and do you have Flash Cache in them?

 

I assume you are serving NFS from cDOT; if so, you need to review the NFS read/write latency to the exported volumes.
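On cDOT 8.2 you can pull per-volume latency from the QoS counters; something along these lines (a sketch from memory; exact option names may vary slightly by release, and <vserver>/<volume> are placeholders):

```
qos statistics volume latency show -vserver <vserver> -volume <volume> -iterations 5
```

This breaks the latency down into network/data/disk components per volume, which helps answer exactly the "disks or filer" question.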

 

Re: Performance issue on our environment.

We have V3240 controllers with a 1024MB NVRAM size, and no Flash Cache installed.

 

I've put the output of AWA below:

 

### FP AWA Stats ###

ONTAP Version NetApp Release 8.2.2 Cluster-Mode: Fri Aug 22 01:46:52 PDT 2014
AWA Version 1
Layout Version 1
CM Version 1

Basic Information

Aggregate aggr1
Current-time Wed Apr 22 14:21:02 UTC 2015
Start-time Wed Apr 22 11:41:42 UTC 2015
Total runtime (sec) 9559
Interval length (sec) 600
Total intervals 16
In-core Intervals 1024

Summary of the past 16 intervals
max
Read Throughput 41.386 MB/s
Write Throughput 52.583 MB/s
Cacheable Read (%) 67 %
Cacheable Write (%) 15 %
Max Projected Cache Size 86 GiB
Projected Read Offload 52 %
Projected Write Offload 16 %

Summary Cache Hit Rate vs. Cache Size

Size       20%  40%  60%  80%  100%
Read Hit    10   41   43   45    52
Write Hit   10   10   10   11    16
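To put that table in concrete terms: the Size row is a percentage of the 86 GiB Max Projected Cache Size reported above, so the read-hit rates map to SSD capacities roughly like this (a small sketch using the numbers above):

```python
# Read-hit projections for aggr1, copied from the AWA table above.
# Size fractions are of the 86 GiB Max Projected Cache Size.
MAX_CACHE_GIB = 86
read_hit = {0.2: 10, 0.4: 41, 0.6: 43, 0.8: 45, 1.0: 52}  # % of reads hit in SSD

for frac, hit in read_hit.items():
    print(f"{frac * MAX_CACHE_GIB:5.1f} GiB of SSD cache -> ~{hit}% read hit")
```

The jump from 10% to 41% between roughly 17 GiB and 34 GiB suggests most of the cacheable read working set fits in about 35 GiB of SSD; adding cache beyond that buys relatively little.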

The entire results and output of Automated Workload Analyzer (AWA) are
estimates. The format, syntax, CLI, results and output of AWA may
change in future Data ONTAP releases. AWA reports the projected cache
size in capacity. It does not make recommendations regarding the
number of data SSDs required. Please follow the guidelines for
configuring and deploying Flash Pool; that are provided in tools and
collateral documents. These include verifying the platform cache size
maximums and minimum number and maximum number of data SSDs.

 

### FP AWA Stats End ###

 

 

Second filer 

 

### FP AWA Stats ###

Host corp-netapp-01 Memory 6106 MB
ONTAP Version NetApp Release 8.2.2 Cluster-Mode: Fri Aug 22 01:46:52 PDT 2014
AWA Version 1
Layout Version 1
CM Version 1

Basic Information

Aggregate aggr0
Current-time Wed Apr 22 14:21:50 UTC 2015
Start-time Wed Apr 22 11:42:05 UTC 2015
Total runtime (sec) 9585
Interval length (sec) 600
Total intervals 16
In-core Intervals 1024

Summary of the past 16 intervals
max
Read Throughput 62.208 MB/s
Write Throughput 49.940 MB/s
Cacheable Read (%) 58 %
Cacheable Write (%) 13 %
Max Projected Cache Size 186 GiB
Projected Read Offload 24 %
Projected Write Offload 14 %

Summary Cache Hit Rate vs. Cache Size

Size       20%  40%  60%  80%  100%
Read Hit     8    9   15   20    24
Write Hit    8    8    8    9    14

The entire results and output of Automated Workload Analyzer (AWA) are
estimates. The format, syntax, CLI, results and output of AWA may
change in future Data ONTAP releases. AWA reports the projected cache
size in capacity. It does not make recommendations regarding the
number of data SSDs required. Please follow the guidelines for
configuring and deploying Flash Pool; that are provided in tools and
collateral documents. These include verifying the platform cache size
maximums and minimum number and maximum number of data SSDs.

 

### FP AWA Stats End ###
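Comparing the two AWA runs, here is a rough back-of-envelope on how much of the peak throughput a Flash Pool would actually absorb, using the max figures reported above (an estimate only, like AWA itself):

```python
# Max throughput and projected offload percentages from the two AWA runs above.
aggrs = {
    "aggr1": {"read_mbs": 41.386, "write_mbs": 52.583, "read_off": 0.52, "write_off": 0.16},
    "aggr0": {"read_mbs": 62.208, "write_mbs": 49.940, "read_off": 0.24, "write_off": 0.14},
}

for name, a in aggrs.items():
    off_r = a["read_mbs"] * a["read_off"]    # MB/s of reads projected to hit SSD
    off_w = a["write_mbs"] * a["write_off"]  # MB/s of writes projected to be absorbed
    print(f"{name}: ~{off_r:.1f} MB/s reads + ~{off_w:.1f} MB/s writes offloaded from SAS")
```

aggr1 looks like the better Flash Pool candidate (roughly half its reads are cacheable), while aggr0 would see a much smaller benefit.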