Identifying the cause of High CPU on 1 Proc

russbbass · ‎2010-06-21

I am running sysstat -m and I am seeing one processor that always seems to be running at about 85% to 95%. I would love to identify the cause.

What are people using to identify the process that would be causing that?

Russ Bass

Univita Health

4x3160s

benjstarratt · ‎2010-06-21

One of the NetApp gurus told me that some processes are not as well threaded as others, which causes the asymmetric CPU utilization. I believe you can see what is really going on by running "priv set advanced" and then the ps command, but I haven't really tested that. I've also been told anecdotally that multithreading is better in 7.3.2 and later.

kris_boeckx · ‎2010-06-22

Hi,

Use this with care

priv set diag

sysstat -M 1

Now you can see what's going on on your filer. You will see several items (called domains). Probably the "Kahuna" domain is hogging your CPU ...

The Kahuna domain contains "all the rest" that is not listed separately (WAFL tasks, SnapMirror, deduplication and other system tasks are part of the Kahuna domain; we are still on ONTAP 7.2.x).

As of ONTAP 7.3, some tasks have been separated from the Kahuna domain. I don't know which tasks.

See for yourself and use with care.

If you don't understand what's going on on your filer, open a support case and they will ask for a perfstat. NetApp support can then see what's going on.

Greetings,

Boeckx Kris

Pidpa

amiller_1 · ‎2010-06-23

If you have Operations Manager set up, you can also use Performance Advisor inside the NetApp Management Console (NMC) to see something of a graphical breakdown of what sysstat gives you.

radek_kubka · ‎2010-06-23

Hi Russ and welcome to Communities!

What ONTAP version are you running? As already said in this thread, 7.3.2 onwards handles multithreading better than previous versions, so if you are on anything prior to 7.3.2, an ONTAP upgrade may solve the issue.

Regards,

Radek

russbbass · ‎2010-06-23

We are running 7.3.2. It always seems to be the 4th proc that is the highest.

The ps cmd doesn't show anything consuming significant amounts of CPU except the idle_thread* process.

Russ Bass

Univita Health

4x3160s

v7.3.2

russbbass · ‎2010-06-23

Kris Boeckx's suggestion does show Kahuna consuming 50-60% CPU, which corresponds to CPU3.

eric_barlier · ‎2010-06-23

Did you run

priv set diag

sysstat -m 1

Can you post some of the output here?

Eric

kris_boeckx · ‎2010-06-23

Hi Eric,

You have to use a capital "M" like "sysstat -M 1"

A "sysstat -m 1" will show you:

ANY AVG CPU0 CPU1 CPU2 CPU3

A "sysstat -M 1" will show you (for ONTAP versions prior to 7.3.2):

ANY1+ ANY2+ ANY3+ ANY4+ AVG CPU0 CPU1 CPU2 CPU3 Network Storage Raid Target Kahuna WAFL_Ex(Kahu) Cifs Exempt Intr Host Ops/s CP

Greets,

Kris
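Once some "sysstat -M 1" output has been captured, a small script can pick out the busiest domain. The following is only a minimal Python sketch: the header is the one quoted in the post above, but SAMPLE_ROW is a hypothetical row invented for illustration, and the assumptions that values are whitespace-separated with a trailing "%" and that every column other than ANY*, AVG, CPU*, Ops/s and CP is a domain may not match your ONTAP version.

```python
# Column header as quoted in the post above (ONTAP prior to 7.3.2).
HEADER = ("ANY1+ ANY2+ ANY3+ ANY4+ AVG CPU0 CPU1 CPU2 CPU3 Network Storage "
          "Raid Target Kahuna WAFL_Ex(Kahu) Cifs Exempt Intr Host Ops/s CP")

# HYPOTHETICAL sample row, invented for illustration; not real filer output.
SAMPLE_ROW = ("95% 60% 30% 10% 49% 35% 40% 30% 90% 5% 8% "
              "4% 1% 55% 12% 0% 6% 2% 0% 1200 40%")

# Assumption: these columns are summaries or counters, not domains.
NON_DOMAIN_PREFIXES = ("ANY", "AVG", "CPU")
NON_DOMAIN_NAMES = {"Ops/s", "CP"}


def parse_row(header, row):
    """Map each column name to its numeric value (percent signs dropped)."""
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError("row has %d fields, header has %d"
                         % (len(values), len(names)))
    return {name: float(value.rstrip("%"))
            for name, value in zip(names, values)}


def busiest_domain(sample):
    """Return (name, value) for the domain column with the highest value."""
    domains = {name: value for name, value in sample.items()
               if not name.startswith(NON_DOMAIN_PREFIXES)
               and name not in NON_DOMAIN_NAMES}
    name = max(domains, key=domains.get)
    return name, domains[name]


if __name__ == "__main__":
    print(busiest_domain(parse_row(HEADER, SAMPLE_ROW)))
```

With the invented row, Kahuna comes out on top, which mirrors what Russ reported; on a real filer, feed in lines from your own capture file instead.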

eric_barlier · ‎2010-06-24

Hi Kris,

I am aware of the difference between capital and small m. I personally prefer -m to -M, that's all really. For us to help further we'd need to know more about the controller type/model for starters, and also what type of workload is being served off this controller. If ESX is served, I'd be looking for misaligned file systems straight away; that'll cause Kahuna to work extra. It seems virtually everybody suffers from this 🙂

Cheers,

Eric

kris_boeckx · ‎2010-06-24

Hi,

These are the commands I use to troubleshoot performance issues:

Always start with "priv set diag".

sysstat -M -i 5

--> already explained

sysstat -x 5

--> shows you the different I/O ("Disk util" is an important one, and also "CPty"; see http://now.netapp.com/NOW/knowledge/docs/ontap/rel707/html/ontap/cmdref/man1/na_sysstat.1.html)

lun stats -i 5

--> shows you the read / write / latency figures of LUNs

stats show lun

--> shows you detailed info of every lun (you will want to capture this in an output file)

stats show volume

--> same as lun but now for the volumes (you will want to capture this in an output file)

reallocate status

--> shows if any reallocation jobs are running ("wafl scan status" shows you even more info)

If this is not enough, you can get some more info with statit:

A "statit -b" will start the data collection (wait a few minutes).

A "statit -e" will stop the collection and give you the result (you will want to capture this in an output file).

If you want to capture the output in a file, connect to the filer via PuTTY. In PuTTY, you can specify a capture file.

Greetings,

Kris Boeckx

davgardner · ‎2012-05-18

After setting "priv set diag", once you're done, is there a command you need to use to take it out of "priv set diag" mode?

kris_boeckx · ‎2012-05-18

Type "priv set" to go back to admin mode (= normal mode).

Also, when you exit the console session, the privilege level is reset to normal.

You can see the difference between "diag" and "admin" privilege by the "*" (asterisk) that appears after your filer's name in "diag" privilege mode.

SIRTECHIE42 · ‎2012-12-11

Kris,

Can you shed some light on why a controller will show a constant 99% CPU utilization with "sysstat", and then an average of 50-60% with "sysstat -m" (all individual CPUs posting no higher than 70%)?

There is some great info in this post!!

Thanks.

Brandon

shane_bradley · ‎2012-12-11

Basic sysstat shows the peak of the highest single core over the sample period; "sysstat -m" shows the average load per core.
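If that description is right, the gap between the two views is just the difference between a maximum and a mean. Here is a minimal Python sketch of the arithmetic; the SAMPLES values are hypothetical numbers invented for illustration (four sub-interval readings for four cores), not output from any filer, and the sketch makes no claim about how ONTAP samples internally.

```python
from statistics import mean

# HYPOTHETICAL busy percentages: one row per sub-interval reading,
# one column per core (CPU0..CPU3). Invented for illustration only.
SAMPLES = [
    [20, 35, 30, 99],
    [45, 30, 25, 60],
    [99, 40, 35, 30],
    [30, 25, 99, 40],
]


def peak_single_core(samples):
    """Highest value any single core reached in any reading."""
    return max(max(row) for row in samples)


def per_core_average(samples):
    """Average busy percentage of each core over all readings."""
    return [mean(core) for core in zip(*samples)]


def overall_average(samples):
    """Average of the per-core averages."""
    return mean(per_core_average(samples))


if __name__ == "__main__":
    print("peak of busiest core:", peak_single_core(SAMPLES))
    print("per-core averages:   ", per_core_average(SAMPLES))
    print("overall average:     ", overall_average(SAMPLES))
```

In this made-up data some core hits 99% in three of the four readings, so a "peak" style figure stays pinned near 99%, while no core averages above 70% over the whole period.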