ONTAP Discussions

What's the impact of enabling FlexShare?

__jeremypage_3897

I have a 3070 (7.3.1P2D11) that runs pretty hard (sysstat -x shows a pretty constant low 50s) and I'd like to enable FlexShare, but I am concerned about the overhead. Basically I have a low-utilization volume that absolutely has to go to the front of the queue, and it's sharing spindles with our test data warehouse. Normally things are fine, but occasionally we'll do some massive data loads that push an extra 3-4k IOPS on that controller.

Think it's safe to enable? Can I just shut it off with no repercussions if it does seem to cause an issue?


rogilvieisc

Hi Jeremy,

My understanding/experience of FlexShare is that the policies you set up across your volumes will only come into effect when your filer is under high stress - it will then implement a 'quality of service' across your volumes.

i.e. you may have an Exchange volume with a high priority and a VMware dev/test volume set to low priority - everything will function normally when your system is happily bashing out IOPS, but when it is off its brain, busting out IOPS like crazy, FlexShare will kick in and make sure Exchange has priority over the VMware dev/test volume.
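
As a rough sketch of what that setup looks like on a 7-Mode filer (the volume names here are made up - substitute your own, and double-check the syntax against the FlexShare design guide on your release before running anything):

    priority on
    priority set volume exchange_vol level=High
    priority set volume vmware_dev level=Low
    priority show volume -v

The first command turns the priority scheduler on, the next two attach per-volume policies, and the last one lets you verify what's actually in effect.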

Once you've gone crazy with FlexShare, have a look at FlexScale too - tell it which volumes can live in the intelligent deduped cache. Awesome - as long as you're running 7.3.1.
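
If memory serves (treat this as an assumption and verify against the FlexScale/FlexShare docs for your release - and it only matters if you actually have a PAM card installed), that combination looks roughly like:

    options flexscale.enable on
    priority set volume exchange_vol cache=keep
    priority set volume vmware_dev cache=reuse

i.e. enable the external cache, then use the FlexShare per-volume cache policy to say which volumes should hang on to their cached blocks and which should give them up first.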

Also, even if your filer is running at 50% it doesn't mean FlexShare will kick in. But if it's deleting snapshots from a schedule, SnapMirroring, and then dedupe kicks in at 12am and you see 100% CPU and extremely high CP time (90+), FlexShare will start to strangle your low-priority vols, etc.

Hope that helps.

Ross

madden

FlexShare provides workload prioritization (rather than quality of service, which implies guarantees).  I suggest you read a really good whitepaper on the topic: FlexShare™ Design and Implementation Guide.  Once you've read it you should have the knowledge to implement a prioritization policy, and verify it's helping (and not artificially limiting) by monitoring the performance counters.  The priority scheduler can be enabled/disabled/adjusted on-the-fly, so if you see a behavior you don't like you can tweak the settings or turn it off to restore default (i.e. no workload prioritization) system behavior.
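
To give you an idea of what "on-the-fly" means in practice, the commands are along these lines (the volume name is just a placeholder; verify against the guide for your release):

    priority on
    priority set volume myvol level=High
    priority show volume -v myvol
    priority off

Nothing requires a reboot or a takeover/giveback - turning priority off simply returns the filer to its default scheduling behavior.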

__jeremypage_3897

I've read the paper, which is why I came here with the questions. I am not asking what it does, I am asking what the overhead involved is. For instance, it's rare that my clients don't get the data they want, but there are some clients that I'd prefer to have lower latency than others (i.e. be sent to the head of the queue). As I understand it, without FlexShare the filer works on a FIFO basis; with FlexShare, as long as it's not already servicing a request for data, it will bump things with higher priority to the head of the line. Obviously if there is no queuing that's a moot point.

I guess I did not phrase it well. If I start using FlexShare, what is the impact on system resources as a whole? I assume it will use some CPU; otherwise, should I see an impact? If I enabled FlexShare with everything getting even amounts of resources, would anything slow down?

I have a single volume that needs to be given priority no matter what else the filer is doing. The filer is sized so that under normal conditions that's not an issue, but I'd like to load it up a bit more and I'm concerned about this application.

madden

>>If I start using FlexShare, what is the impact on system resources as a whole? I assume it will use some CPU; otherwise, should I see an impact?

No, prioritization costs a negligible amount of CPU.

>>If I enabled FlexShare with everything getting even amounts of resources, would anything slow down?

Possibly.  The solution implemented by FlexShare includes both work-conserving and non-work-conserving queuing schemes.  Work conserving means resources are not allowed to go unused; non-work conserving means that resources are allowed to go unused.  Because some aspects are non-work conserving, a given request might not be scheduled even though resources are available.

For example, the queuing algorithm for read disk I/Os is non-work conserving.  As the I/O utilization of disks increases, so does the latency of each individual I/O.  Imagine an aggregate with volumes A (high priority) and B (low priority).  In order to ensure that vol B doesn't drive the utilization of the disks too high and noticeably affect the read I/O latency of vol A, we must allow (but not require) that resources can go unused.  Without priority enabled, the FIFO queuing algorithm wouldn't make any choice and the incoming requests would simply be fulfilled in the order they were received, up to using 100% of resources [that in turn might result in all requests having higher latency than desired].

From an implementation perspective, see the priorityqueue counter 'usr_read_limit', which defines the maximum number of outstanding usr reads for the volume; 'max_user_reads', which records the maximum number of usr reads that have ever been outstanding; and 'usr_read_limit_hit', which counts the number of times usr_read_limit has been reached.  If you were to apply FlexShare policies to your system today and you see the counter 'usr_read_limit_hit' (and other similar counters) incrementing, then those volumes are being throttled.  You'd have to analyze further whether the throttling was benefiting other [presumably higher priority] volumes or not.
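
To make that concrete, you can pull those counters with the stats command - something like this (the volume name is a placeholder, and the exact instance naming can vary by release, so treat it as a sketch):

    stats show priorityqueue:dw_test_vol
    stats show priorityqueue:dw_test_vol:usr_read_limit_hit

The first form dumps all the priorityqueue counters for the volume; the second shows just the one you care about. If usr_read_limit_hit climbs during your big data loads while the high-priority volume's latency stays where you want it, the throttling is doing its job.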

>>I have a single volume that needs to be given priority no matter what else the filer is doing. The filer is sized so that under normal conditions that's not an issue, but I'd like to load it up a bit more and I'm concerned about this application.

Yes, you could configure Data ONTAP to support these requirements.  Basically, configure the single volume with a higher priority and all other volumes with a lower priority, and set the balance between user vs. system priority.  Data ONTAP will allocate system limits based on the number of volumes, the priority applied, and the user vs. system settings; check the counters for the allocations made by the system and adjust as necessary.  Over time, periodically monitor the counters to ensure that throttling is occurring when (and only when) it is desired.  Also pay attention to the 'user' and 'system' priorities, because if set incorrectly you could be allocating far more resources to system processes than you really want (unless you do want to prioritize things like SnapMirror and dump).
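
As a hypothetical starting point for your case (the volume names are placeholders, and you should verify the exact level and system-priority syntax against the design guide for your release), that might look something like:

    priority on
    priority set volume critical_vol level=VeryHigh
    priority set default level=Medium
    priority set volume dw_test_vol level=Low

i.e. one volume well above everything else, a default for the rest, and the data warehouse volume explicitly low. Then watch the priorityqueue counters for a while to confirm that throttling only shows up during the big loads, and adjust the user vs. system balance if you find system work (SnapMirror, dump, etc.) getting more or less than you intend.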

Hope that helps.
