ONTAP Edge High CPU Utilization

I just installed ONTAP Edge, and vCenter shows it running at 100% on both CPUs, but when I run sysstat in the console it shows 0% to 1% CPU utilization.  Is this normal behavior?  Will ONTAP Edge always burn 2 cores?

Re: ONTAP Edge High CPU Utilization

Yes, it is normal behavior.

Re: ONTAP Edge High CPU Utilization

Thanks ashrut. This makes the product unusable for my environment.  I hope this will be fixed at some point.

Re: ONTAP Edge High CPU Utilization

Yes, a hog indeed.  You can mitigate this by putting it into a resource pool and placing a limit on CPU consumption.

Re: ONTAP Edge High CPU Utilization

Yeah, I ended up putting a limit on it and it works fine. It still seems like a silly workaround.

-Jason

Re: ONTAP Edge High CPU Utilization

Jason,

As others have stated, this is expected: Data ONTAP Edge requires two cores for operation.  This is an architectural requirement for Data ONTAP, and reducing the resources available will lead to adverse behavior in WAFL.

The reason VMware shows the CPUs at 100%, while sysstat shows a figure that more accurately reflects the current load on the system, is the way the VMware scheduler interprets CPU activity.  Data ONTAP does not execute the halt (HLT) CPU instruction when idle, which makes the idle thread appear to be hot on a vCPU.  As a result, VMware always reports 100%.  Our performance measurements found that executing the halt instruction when idle had a definite impact on total IOPS in various tests.  Not halting when idle is the same behavior as on physical NetApp filers.
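To make the idle-loop point concrete, here is a minimal user-space sketch in Python. It is only an analogy, not Data ONTAP code: `spin_idle` polls the clock the way a non-halting idle thread keeps a vCPU busy, while `halt_idle` blocks so the core can rest, roughly what issuing HLT allows. The function names and the 0.2 second interval are made up for illustration.

```python
import time


def cpu_fraction(idle_fn, wall_seconds=0.2):
    """Return (CPU time consumed) / (wall time elapsed) while running idle_fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    idle_fn(wall_seconds)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def spin_idle(seconds):
    """Idle by polling the clock: the core never rests (analogous to not issuing HLT)."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def halt_idle(seconds):
    """Idle by blocking: the OS can park the core (analogous to issuing HLT)."""
    time.sleep(seconds)


if __name__ == "__main__":
    print(f"spin idle: {cpu_fraction(spin_idle):.0%} CPU")
    print(f"halt idle: {cpu_fraction(halt_idle):.0%} CPU")
```

Both loops do no useful work, yet an outside observer such as the hypervisor would see the spinning one as a fully busy core, which matches the gap between the vCenter and sysstat numbers.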

It was unclear from your post if you are talking about the evaluation version of Data ONTAP Edge or the production version.  The production version enforces a CPU and memory resource reservation for the virtual machine.  This ensures that Data ONTAP Edge will always get two vCPUs and 4GB of memory.

I hope this helps.

Re: ONTAP Edge High CPU Utilization

I tried to allocate more resources to the ONTAP-v virtual machine, increasing RAM from 4GB to 8GB and vCPUs from 2 to 4.

It seems NetApp has checks on boot, and the virtual machine no longer boots. Would increasing resources make performance any better?

Re: ONTAP Edge High CPU Utilization

Adam,

Data ONTAP enforces a specific configuration for this model of Data ONTAP-v: two vCPUs and 4GB of memory.  Increasing the memory or the number of vCPUs isn't possible.

Re: ONTAP Edge High CPU Utilization

I'd hardly describe this as 'normal behaviour'. It may be expected by NetApp, but for everyone else it's likely to be a problem for various reasons:

  1. If the VMware scheduler doesn't correctly understand the physical CPU usage it will mess with the scheduler's allocations given to other VMs on the same host. Is it a recommendation to run this on a dedicated host?
  2. Any monitoring tools, scripts, alarms, etc. configured at the VMware level will be triggered. The only option I can see would be to explicitly exclude the Edge VSA, but that introduces a gap in visibility: if the Edge VSA really does fail, you'll need different tools to track it.
  3. Presumably as no HLT instruction has been sent to the CPU it's actually physically 'working', generating heat and consuming power even if the Edge VSA isn't doing anything with it?

This wasn't the case for the NetApp Simulator, which has been around for years. What's changed? NT4 running on multiple processors used to have this issue too, but I assumed we'd left that kind of problem behind years ago. I tried putting this in my lab, which only has two hosts, and it seems to kill the pCPU. I believe other VMs were impacted but will recheck.

Ed

Re: ONTAP Edge High CPU Utilization

I can understand from an ONTAP point of view why this was done: not issuing CPU halt instructions means more IOPS, since it seems the additional CPU requests take processing away from other things. This is how physical filers work, so they've taken the same code.

Unfortunately this just isn't going to cut it in a production environment. ONTAP Edge is for remote offices and production workloads, and will probably be running on a single ESXi host without shared storage. A typical branch office server may have 2 CPUs with 4 cores per CPU. This means the VSA will be using a quarter of all the CPU resources available on the host. Without remote physical shared storage, and with only a single host, there will be no vMotion or DRS to move it anywhere, which will dramatically reduce the capacity of the host for other VMs.
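The "quarter of the host" figure can be checked with a quick back-of-the-envelope calculation. This is a rough sketch that assumes each always-busy vCPU effectively occupies one physical core and ignores hyperthreading; `vsa_host_share` is just an illustrative helper name.

```python
def vsa_host_share(sockets, cores_per_socket, vsa_vcpus=2):
    """Fraction of a host's physical cores taken by vCPUs that never go idle."""
    total_cores = sockets * cores_per_socket
    return vsa_vcpus / total_cores


if __name__ == "__main__":
    # Branch office example from the post: 2 CPUs with 4 cores each.
    share = vsa_host_share(sockets=2, cores_per_socket=4)
    print(f"VSA share of host cores: {share:.0%}")
```

On a smaller host the share grows quickly: the same two vCPUs on a single 4-core CPU would take half of the cores.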

So you may end up having to purchase additional ESXi hosts.

I still think the VSA is a fantastic step forward, but we see how difficult it can be to move things from the physical world, especially appliances where you assume complete control of your physical device, to the virtual world where sharing is caring.

Hopefully the boffins at NetApp are working on this and can silently insert those CPU halt instructions without affecting IOPS, or find another way, but currently this is certainly an issue.