ONTAP Edge High CPU Utilization

jasonburrell

I just installed ONTAP Edge and it runs at 100% CPU across both CPUs as shown in vCenter, but when I run sysstat in the console it shows 0% to 1% CPU utilization. Is this normal behavior? Will ONTAP Edge always burn 2 cores?

14 REPLIES

ashrut

Yes, it is normal behavior.

jasonburrell

Thanks ashrut. This makes the product unusable for my environment. I hope this will be fixed at some point.

kkoelliker

Yes, a hog indeed. You can mitigate this by putting it into a resource pool and placing a limit on CPU consumption.

jasonburrell

Yeah, I ended up putting a limit on it and it works fine. It still seems like a silly workaround.

-Jason

coon

Hi Jason,

Just FYI, NetApp has not tested this with a limit like the one you've put on it. I want to be sure you realize that this is uncharted territory and anything could happen there. I wouldn't do that for production. I can reassure you that the developers are discussing a long-term solution for this (balancing the need to know we'll have resources for I/O with competing for resources in a virtual environment).

I'm curious to know how small of a physical server you'd want to throw at a task like that (hosting a full storage solution and the applications)? What performance expectations would you have with that small a server that 2 CPUs can't be dedicated to I/O?

Thanks!

-Gerald

egrigson1

I'd hardly describe this as 'normal behaviour'. It may be expected by NetApp, but for everyone else it's likely to be a problem for various reasons:

  1. If the VMware scheduler doesn't correctly understand the physical CPU usage, it will mess with the scheduler's allocations given to other VMs on the same host. Is it a recommendation to run this on a dedicated host?
  2. Any monitoring tools, scripts, alarms, etc. configured at the VMware level will be triggered. The only option I can see would be to explicitly exclude the Edge VSA, but that introduces a gap in visibility: if the Edge VSA really does fail, you'll need different tools to track it.
  3. Presumably, as no HLT instruction has been sent to the CPU, it's actually physically 'working', generating heat and consuming power even if the Edge VSA isn't doing anything with it?

This wasn't the case for the NetApp Simulator, which has been around for years. What's changed? NT4 running multiple processors used to have this issue too, but I assumed we'd left that kind of problem behind years ago. I tried putting this in my lab, which only has two hosts, and it seems to kill the pCPU. I believe other VMs were impacted but will recheck.

Ed

jgoldsch

Jason,

As others have stated, this is expected: Data ONTAP Edge requires two cores for operation. This is an architectural requirement for Data ONTAP, and reducing the resources available will lead to adverse behavior in WAFL.

The reason VMware shows the CPUs at 100%, but sysstat shows something which is more accurate with regard to the current load on the system, is how VMware scheduling interprets the activity of the CPU.  Data ONTAP does not perform the halt CPU instruction when idle and this makes the idle thread appear to be hot on a vCPU.  This leads to VMware always reporting 100%.  Our performance measurements found that performing the halt instruction when idle had a definite impact on total IOPS for various tests.  Not performing halt when idle is the same behavior as on physical NetApp filers.
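To illustrate the difference described above, here is a minimal Python sketch. It is not Data ONTAP code; `busy_idle` and `halting_idle` are hypothetical names used only for illustration. One "idle" loop spins until a deadline (analogous to never issuing halt), while the other blocks in the OS (analogous to halting). Both do no useful work, but only the spinning one is charged CPU time, which is roughly what the hypervisor sees from outside the guest.

```python
import time


def busy_idle(seconds):
    """Idle by spinning until the deadline; return the CPU seconds consumed.

    Analogy (an assumption, not ONTAP internals): an idle thread that
    never halts keeps the CPU busy even though no work is being done.
    """
    start_cpu = time.process_time()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # nothing to do, but the CPU stays occupied
    return time.process_time() - start_cpu


def halting_idle(seconds):
    """Idle by blocking in the OS; return the CPU seconds consumed.

    Analogy: an idle thread that halts gives the CPU back until woken.
    """
    start_cpu = time.process_time()
    time.sleep(seconds)
    return time.process_time() - start_cpu


if __name__ == "__main__":
    print(f"spinning idle used {busy_idle(0.5):.3f}s of CPU")
    print(f"halting idle used  {halting_idle(0.5):.3f}s of CPU")
```

The trade-off mentioned in the post is that the spinning variant can react to new work without a wake-up delay, at the cost of looking fully loaded to anything measuring CPU from the outside.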

It was unclear from your post if you are talking about the evaluation version of Data ONTAP Edge or the production version.  The production version enforces a CPU and memory resource reservation for the virtual machine.  This ensures that Data ONTAP Edge will always get two vCPUs and 4GB of memory.

I hope this helps.

adamvh

I tried to allocate more resources to the ONTAP-v virtual machine, increasing RAM from 4GB to 8GB and vCPUs from 2 to 4.

NetApp has checks on boot, and the virtual machine no longer boots. Would increasing resources make performance any better?

jgoldsch

Adam,

Data ONTAP enforces a specific configuration for this model of Data ONTAP-v: two CPUs, 4GB of memory. Increasing memory or the number of vCPUs isn't possible.

julianwood

I can understand from an ONTAP point of view why this was done: not issuing CPU halt instructions means more IOPS, since it seems that adding the extra halt instructions takes processing away from other things. This is how physical filers work, so they've taken the same code.

Unfortunately, this just isn't going to cut it in a production environment. ONTAP Edge is for remote offices and production workloads, and will probably be running on a single ESXi host without shared storage. A typical branch office server may have 2 CPUs with 4 cores per CPU. This means the VSA will be using a quarter of all CPU resources available on the host, which will dramatically reduce the capacity of the host for other VMs. And with no remote physical shared storage and only a single host, there will be no vMotion or DRS to move it anywhere.
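The "quarter of all resources" figure can be checked with a quick calculation. This is only a sketch: `vsa_core_share` is a hypothetical helper, and it assumes each vCPU keeps one physical core fully busy and that hyperthreading is not counted.

```python
def vsa_core_share(sockets, cores_per_socket, vsa_vcpus=2):
    """Fraction of the host's physical cores kept busy by the VSA.

    Assumes each always-busy vCPU occupies one whole physical core
    (no hyperthreading, no oversubscription).
    """
    total_cores = sockets * cores_per_socket
    return vsa_vcpus / total_cores


if __name__ == "__main__":
    share = vsa_core_share(sockets=2, cores_per_socket=4)
    print(f"VSA share of host cores: {share:.0%}")
```

For the 2-socket, 4-core example this gives 2 of 8 cores, i.e. 25%; on a single 4-core host the same VSA would take half.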

So, you may end up having to purchase additional ESXi hosts.

I still think the VSA is a fantastic step forward, but we see how difficult it can be to move things from the physical world, especially appliances where you assume complete control of your physical device, to the virtual world where sharing is caring.

Hopefully the boffins at NetApp are working on this and can silently insert those CPU halt instructions without affecting IOPS, or find another way, but currently this is certainly an issue.

coon

Julian,

I'd be curious for your feedback on performance expectations with that setup. It's a partial picture the way you've drawn it.

I have:

- 2 CPUs (4 cores per CPU)

- ONTAP running in a VM

- Something else running on there (15 VMs with different apps?)

- Performance requirements and expectations = ?

Since storage I/O usually has a cascading effect on performance (making the other virtual machines work harder retrying I/O and duplicating traffic as well as slowing down the application that the whole environment was set up for) would you want them to equally be able to compete for virtual resources?

madden

Hi Julian,

From a solution perspective I agree with you 100%, but knowing how Data ONTAP works, especially when performing a consistency point (CP), having sufficient resources and completing the CP in a timely manner is essential. If Data ONTAP were CPU-starved, I can imagine that CPs might not be able to complete in a timely manner, which has ill effects for both the Data ONTAP instance (quite possibly a panic) and storage clients. My guess is jgoldsch's comment about "reducing the resources available will lead to adverse behavior in WAFL" includes this and possibly other adverse behaviors. So the idea of setting a CPU limit using a resource pool is a risky and unsupported thing to do, at least outside the lab.

My guess is that some smaller guaranteed CPU allocation might be doable, but having no guarantee at all is probably a non-starter.

Maybe jgoldsch can comment?

alok858

I have deployed this evaluation (811_v_eval) in my test environment and am experiencing the exact same issue. Altering the resources won't let ONTAP boot. Alarms configured on my lab DC keep flooding in. Is there a fix available for this? Btw, I downloaded the OVA just a week ago.

Also, is there a solution for the 505 error? I can't seem to get in. I've tried from various operating systems, both 32-bit and 64-bit.

URL: http://10.X.X.245/na_admin/

Error 500

Servlets not enabled

coon

Alok,

The KB article I published linked here: https://communities.netapp.com/docs/DOC-23709 is still the most current word from NetApp on the CPU alarm.

Limiting Edge resources is not recommended because it can cause data loss.

NetApp continues to work on a compromise that allows the performance most people expect from Data ONTAP running in a virtual machine without the alarms.

I'm not familiar with the 505 error you're referring to, but depending on your experience, I'd steer you toward downloading the free OnCommand System Manager as a GUI for Edge.
