First, some background: we are running ONTAP 9.5P14 on a 4-node AFF8080 cluster. I am moving volumes and shelves in order to reduce this to a 2-node cluster by the end of the month.
We have never used QoS policies because we never needed them with a 4-node AFF. Now that we are dropping to 2 nodes, I believe we may need to prevent a few test/dev volumes from dominating the CPU, so I am researching the QoS feature in detail for the first time. I am reading the basic documentation in the Documentation Center, but I have not been able to find a best practices guide for this feature. Is there one, and/or can anyone point me to a good blog post or other article with recommendations? I am not a performance expert and want to be sure I don't break anything. I would appreciate any recommendations. Thank you!
Thank you @darb0505 and @aladd! I am checking out the links. I will likely create a policy group with no limits set and then review statistics to determine what the limits should be. One more question on this: does the qos statistics performance show command provide cumulative average statistics, or only the current sample? If the latter, how do you get the detail needed to decide what to set the QoS policy to? Can it be obtained from ActiveIQ UM? (We are on 9.6.)
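For what it's worth, here is roughly what I have in mind, based on my reading so far (the SVM, volume, and policy-group names are placeholders, and I'd welcome corrections if I have the syntax wrong):

```
::> qos policy-group create -policy-group pg_observe -vserver svm1
    (no -max-throughput specified, so the group should impose no limit
     and just let me watch the workload)

::> volume modify -vserver svm1 -volume testdev01 -qos-policy-group pg_observe

::> qos statistics performance show -iterations 60
    (sample the output over a period of time, since each row appears to be
     a point-in-time sample rather than a cumulative average)
```

The idea is to attach the volumes to an unrestricted policy group first, collect statistics over a representative window, and only then pick an actual limit.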
Hi. This is great feedback. I'm one of the senior performance TSEs here in AMER, and I have also been working to improve our KB site.
I would say talking to the account team is definitely important here too. This is more an architecting question about how to design and use the storage; on the Support side, we address problems as we identify them.
A lot of customers use a three-tier approach, and some of them use QoS on noisy neighbors (bully/shark workloads). You can definitely set it and see. You can fire up a test volume with a synthetic workload to see what it is like. Be careful not to set limits too low (e.g. 5 IOPS / 5 MB/s when the application wants 40,000 IOPS / 10,000 MB/s); otherwise it will overwhelm the network layer. Set it and monitor with qos statistics volume latency show -volume <volume> -vserver <svm name> and qos statistics volume performance show -volume <volume> -vserver <svm name>. Use both commands.
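As a rough example of the "set it and see" flow (the policy-group, SVM, and volume names here are just placeholders, and the limit is illustrative, not a recommendation for your workload):

```
::> qos policy-group create -policy-group pg_testdev -vserver svm1 -max-throughput 5000iops

::> volume modify -vserver svm1 -volume testdev01 -qos-policy-group pg_testdev

::> qos statistics volume latency show -volume testdev01 -vserver svm1
::> qos statistics volume performance show -volume testdev01 -vserver svm1
```

Watch the latency output in particular: if latency climbs sharply after the limit is applied, the limit is too low for what the application is asking for, and you should raise it.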
Thank you @paul_stejskal! My only hesitation with "set it and see" is the one you raised, and that I've seen in the documentation: I don't want to inadvertently set the policy too low and cause a problem for the object in question. I do notice that AIUM provides specific QoS recommendations for volumes that trigger its performance thresholds, which is helpful. I will likely make it very conservative initially, to play it safe. Thank you again for the helpful links and info!