ONTAP Discussions

CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

ogdenclinic

We recently installed a new v3240.  I have created just four 2.5 TB LUNs connected to two physical Win 2008 R2 hosts.  There is hardly any I/O going on, as you can see from the output of sysstat -x below.  Dedupe is not configured for any of the volumes, snapmirror is not running, and compression is turned off.  We only use FCP--no NFS or iSCSI.  WTF is killing my CPU here?  The chassis fans are blowing at full bore.  Thanks.

sysstat -x -c 10 1 output:

CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
100%      0      0      0      75       1      1      32     24       0      0     1    100%    0%  -     3%      24     51      0     201    344       0      0
100%      0      0      0      38       1      1       0      8       0      0     1    100%    0%  -     1%       0     38      0     259     50       0      0
100%      0      0      0      41       1      1       0      0       0      0     1      -     0%  -     0%       0     41      0     128      1       0      0
100%      0      0      0      28       1      1      24     24       0      0     1    100%    0%  -     4%       0     28      0    1793     33       0      0
100%      0      0      0      31       1      1      48    296       0      0     1    100%   12%  Tf   12%       0     31      0     163     83       0      0
100%      0      0      0     115       1      1     276    252       0      0     1    100%   15%  :    10%       0    115      0     624    452       0      0
100%      0      0      0      46       1      0      24     28       0      0     1    100%    0%  -    13%       5     41      0     237     66       0      0
100%      0      0      0       6       1      0       0      4       0      0     1    100%    0%  -     2%       0      6      0      17      0       0      0
100%      0      0      0      93       1      1       0      0       0      0     1      -     0%  -     0%       0     93      0    1460     27       0      0
100%      0      0      0      13       1      0      24     24       0      0     1    100%    0%  -     4%       0     13      0      36     17       0      0

sysstat -m -c 10 1 output:

ANY  AVG  CPU0 CPU1 CPU2 CPU3
10%  60%   85%  74%  76%   5%
 9%  60%   85%  74%  76%   5%
14%  61%   85%  75%  77%   9%
12%  61%   85%  74%  77%   7%
10%  60%   85%  74%  76%   6%
15%  61%   85%  74%  76%   9%
11%  60%   85%  74%  77%   6%
12%  60%   85%  74%  77%   6%
15%  61%   85%  74%  77%   8%
14%  61%   85%  74%  77%   7%
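
For anyone who wants to compare the per-core numbers across samples, here is a minimal Python sketch that parses `sysstat -m` text like the output above and averages each column. The function names (`parse_sysstat_m`, `column_means`) are made up for illustration, and the parser assumes the header and every row have the same number of whitespace-separated fields, as in this sample.

```python
def parse_sysstat_m(text):
    """Parse `sysstat -m` output into a list of {column: percent} dicts.

    Assumes the first non-blank line is the header and every following
    line has one percentage per header column.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [
        {name: int(value.rstrip("%")) for name, value in zip(header, row)}
        for row in rows
    ]


def column_means(samples):
    """Average each column across all parsed samples."""
    return {
        name: sum(s[name] for s in samples) / len(samples)
        for name in samples[0]
    }


# First three rows of the sysstat -m output from the post above.
SAMPLE = """
ANY  AVG  CPU0 CPU1 CPU2 CPU3
10%  60%   85%  74%  76%   5%
 9%  60%   85%  74%  76%   5%
14%  61%   85%  75%  77%   9%
"""

if __name__ == "__main__":
    samples = parse_sysstat_m(SAMPLE)
    for name, mean in column_means(samples).items():
        print(f"{name}: {mean:.1f}%")
```

Pasting a longer capture into `SAMPLE` makes it easier to see whether the same cores stay busy over time.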

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

vmsjaak13

Try:

priv set advanced

then run:

ps

This might point you to the culprit.

If this continues, I'd create a call with NetApp support.

Regards,

Niek

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

thomas_glodde

https://now.netapp.com/NOW/download/software/ontap/8.0.1P3/

There are several CPU related bugs fixed, you might want to give P3 a shot.

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

ogdenclinic

We're indeed running P3.  I should have been more clear in my original post.

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

ogdenclinic

I ran ps and am now more confused than ever.  If I'm reading the output correctly--which is in the attached text file--it shows that idle threads are using all the CPU time.  I know idle threads should be using the CPU when the CPUs are idle, but they most certainly are not idle right now.  I'm going to open a support ticket regardless.

sysstat -c 5 -x 1

CPU    NFS   CIFS   HTTP   Total     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache  Cache    CP  CP  Disk   OTHER    FCP  iSCSI     FCP   kB/s   iSCSI   kB/s
                                       in    out    read  write    read  write    age    hit  time  ty  util                            in    out      in    out
99%      0      0      0      60       0      1       8     32       0      0    26    100%    0%  -     5%       0     60      0     250    181       0      0
100%      0      0      0      79       1      0       0      0       0      0    26    100%    0%  -     0%       0     79      0     294    382       0      0
100%      0      0      0      31       0      1      16      0       0      0    26    100%    0%  -     0%       0     31      0      34    156       0      0
100%      0      0      0      64       9     11       8     24       0      0    26    100%    0%  -     3%       3     61      0     115    255       0      0
100%      0      0      0     108       0      1       8      0       0      0    26    100%    0%  -     0%       0    108      0     181    648       0      0

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

shaunjurr

Hi,

If you have a chance to generate an NMI-panic via the "RLM" (can't remember what the new name on the 32xx series is at the moment) then you can get a coredump to send along with your case and then this should get cleared up in a much more concrete way by engineering.

It will "crash" the filer "on purpose" to get a stateful coredump of what is going on.  If you have a cluster, it will failover and you could, in theory, do this in a lightly loaded production environment if the host timeouts are set correctly.  YMMV  (your mileage may vary  😉  )

Good Luck.

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

dean

We're seeing this same behavior on 8.0.2. Have you found any more info on this?

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

c_beseler

I have the same issue on ONTAP 8.1RC3. I will try a cluster failover tonight. I hope the restart will fix the problem.

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

c_beseler

It didn't help. Same behavior after the restart. One CPU is constantly at 100%. No sis, no compression, nothing...

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

dimitrik

Just looking at CPU is not a reliable indicator of performance. We will use CPUs opportunistically for all kinds of things.

You have to tie CPU to I/O. Your examples show very little I/O. Almost none.

This reminds me of another thread where someone thought their 6280 was slow because of the CPU.

They were able to add 3x the workload before noticing an increase in latency - all the time, CPU was high...

Add some real load to the system and see how it works please.
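
One way to tie CPU to I/O is to put both on the same line. The following is a rough Python sketch, not a NetApp tool: it splits `sysstat -x` data rows like the ones in the original post by position and averages CPU, total ops, and throughput. The column names in `COLUMNS` are labels chosen here to match the 23 fields in that output, and the parser assumes every field is a non-blank token, which holds for the rows posted in this thread.

```python
# Labels for the 23 positional fields of a `sysstat -x` data row.
# These names are illustrative; they follow the header order shown
# in the original post.
COLUMNS = [
    "cpu", "nfs", "cifs", "http", "total_ops",
    "net_in", "net_out", "disk_read", "disk_write",
    "tape_read", "tape_write", "cache_age", "cache_hit",
    "cp_time", "cp_type", "disk_util", "other", "fcp", "iscsi",
    "fcp_in", "fcp_out", "iscsi_in", "iscsi_out",
]


def parse_row(line):
    """Split one data row into a {column: raw string} dict."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def summarize(rows):
    """Average CPU %, total ops/s, and FCP and disk kB/s over the rows."""
    n = len(rows)
    return {
        "avg_cpu_pct": sum(int(r["cpu"].rstrip("%")) for r in rows) / n,
        "avg_total_ops": sum(int(r["total_ops"]) for r in rows) / n,
        "avg_fcp_kb_s": sum(int(r["fcp_in"]) + int(r["fcp_out"]) for r in rows) / n,
        "avg_disk_kb_s": sum(int(r["disk_read"]) + int(r["disk_write"]) for r in rows) / n,
    }


# First three data rows of the sysstat -x output from the original post.
SAMPLE_ROWS = [
    "100%      0      0      0      75       1      1      32     24       0      0     1    100%    0%  -     3%      24     51      0     201    344       0      0",
    "100%      0      0      0      38       1      1       0      8       0      0     1    100%    0%  -     1%       0     38      0     259     50       0      0",
    "100%      0      0      0      41       1      1       0      0       0      0     1      -     0%  -     0%       0     41      0     128      1       0      0",
]

if __name__ == "__main__":
    summary = summarize([parse_row(line) for line in SAMPLE_ROWS])
    for key, value in summary.items():
        print(f"{key}: {value:.1f}")
```

On these three rows the summary shows CPU pegged while ops and throughput stay tiny, which is the mismatch being discussed here.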

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

c_beseler

I understand your thought, but let me give an example. Each controller of a FAS2040 has a dual-core CPU. On our second controller, one core is constantly at 100% load, and the other core seems to handle all requests to the NetApp. By the way, during this time Fabric Manager sends continuous "CPU too Busy" messages. Even so, the FAS2040 responds to all requests at "normal" speed, so your thought could be correct. But if I now start a dedupe job on just one volume on this controller, the remaining core jumps to 100% load and the NetApp stops responding in an acceptable time. Normally you can run a dedupe job on a controller with one core working on the job while the other does the rest. This is not acceptable. Our storage system has no heavy workload, and at night it has almost nothing to do. But as long as one core is constantly at 100% load, I cannot run a dedupe job, and that's an essential feature for us.

By the way: in the "ps" output I could see a process called "wafl_lopri" running at 100%.

Best Regards,

Claudius

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

dimitrik

Thanks for adding some info, Claudius.

In that case, please open a ticket with support - they need to figure out what's happening. What OS rev?

Regarding dedupe: The idea is that you run it when the system is not busy doing other things. So, I'd rather you run it before you start snapmirror sessions, or do any serious I/O, and, for efficiency reasons, before doing snaps.

However, even dedupe should back off if you want to make a request from the system.

Maybe you can try wafltop...

priv set -q diag

options stats.wafltop.config volume,process,message

wafltop start

(wait 5 min)

wafltop stop

options stats.wafltop.config off

You'll get an idea of what's happening in the system, and at the very least, if you can't understand what it's saying, the support personnel will get a head start.

Also be prepared to give support a perfstat pls.

D

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

c_beseler

OK, the support case is open. I'll report any progress.

Regards,

Claudius

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

c_beseler

NetApp technical support engineers are analyzing the data now. They said "It seems a newly discovered bug might apply here."

exciting 😉

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

dimitrik

Doesn't it feel great to be the first? Send me the support case # pls. Dimitri@netapp.com

Re: CPU getting killed on v3240 w/ONTAP 8.01 7-Mode

dgwilhelm

Are you running OnCommand? There are some reports in there that may help point to the culprit.
