FAS2040 CPU peak between 13:00 and 13:10

penguinssh · ‎2011-02-20

Hello guys,

We have a FAS2040 hosting VMware LUNs and Exchange LUNs over iSCSI. The environment is rather small and performance is good overall. However, we've noticed that every day both CPUs of the second controller peak at about 90-95%. This causes a drop in overall IOPS and affects all the LUNs. It occurs precisely between 13:00 and 13:10. Not before, not after.

No specific jobs are supposed to be running at that time on the NetApp side, the Exchange side, or the VMware side (including servers). I don't see any task that would do this other than one on the NetApp controller.

I've attached the output of some diagnostic commands run in priv set diag mode:

sysstat -x 2

sysstat -M 1

and so on.

I've determined that the Kahuna domain was taking about 70-80% of the CPU between 13:00 and 13:10. It normally takes about 10-15%. Now, I don't know where to look to determine what is running behind the Kahuna process.

Your help in pinpointing the issue, or the task that is running, would be greatly appreciated.

Thanks.

radek_kubka · ‎2011-02-20

Hi and welcome to the Communities!

Quick stab at your problem - any chance any dedupe scans are (rather awkwardly) scheduled to start at 13:00?

Regards,
Radek

penguinssh · ‎2011-02-20

Thanks for the welcome.

Nope, nothing is scheduled for deduplication at 13:00.

nastorage2*> sis status
Path              State    Status  Progress
/vol/Vol1Vmware   Enabled  Idle    Idle for 12:34:55
/vol/Vol2Vmware   Enabled  Idle    Idle for 10:51:14

Do you know if there is a way to check whether something is scheduled to run at a specific time on ONTAP? Apart from that, we have snapshots running at 12:00 PM, but they all finish around 12:02-12:05 and they don't cause this CPU peak!

radek_kubka · ‎2011-02-20

How about any snapmirror updates? Nothing?

penguinssh · ‎2011-02-20

Nope, nothing in SnapMirror or SnapVault.

Is there a way to see what processes are behind the Kahuna domain? Or to see any scheduled processes?

radek_kubka · ‎2011-02-20

"Is there a way to see what processes are behind Kahuna domain? Or see any scheduled processes?"

Not that I know of - but we will see what other folks say.

roman_verysell · ‎2011-02-20

Maybe reallocation is enabled and a 'reallocate schedule' is set for 13:00 (1:00 pm instead of 1:00 am)?

rorzmcgauze · ‎2011-02-21

I don't believe there is a way to see what it's doing, but Kahuna looks after WAFL, RAID tetris, clustering and admin commands, to name the main parts, as well as code that needs to run serially.

rorzmcgauze · ‎2011-02-21

Looks like something is hammering the disks at that time, and the system is having problems keeping up and is constantly doing consistency points. Also, the disk utilisation is very high, even for a SAS/FC system. It seems strange, though, that you are only seeing this during that specific time.

As this is a 2000-series system, have you downloaded DFM and used Operations Manager (the license is included with the system)? It includes the Performance Advisor module, which would help with monitoring the system.

penguinssh · ‎2011-02-21

Found it!

I looked through the NetApp Management Console and found that one of the LUNs bumps the total throughput from ~1,000,000 to ~50,000,000 bytes per second. That LUN is a VMware LUN hosting a SQL Server. For some reason, the DBA scheduled a transaction log backup and a database dump at that specific time.

I'll have to talk to the DBA about this. Thanks, all, for your help; it helped me understand more about NetApp controllers and how awesome they are.
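
For anyone chasing a similar spike, the per-LUN comparison described above can be roughly sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name find_spiking_luns, the sample format, and the LUN paths are hypothetical, and the numbers simply mirror the approximate figures from this thread rather than data exported from any NetApp tool.

```python
from statistics import median


def find_spiking_luns(samples, window, factor=10.0):
    """Return {lun: (baseline, peak)} for LUNs whose peak throughput inside
    `window` is at least `factor` times their median throughput outside it.

    samples: dict mapping a LUN name to a list of ("HH:MM", bytes_per_sec)
    window:  ("HH:MM", "HH:MM"), start inclusive, end exclusive
    """
    start, end = window
    spikes = {}
    for lun, points in samples.items():
        # Zero-padded "HH:MM" strings compare correctly as plain strings.
        inside = [v for t, v in points if start <= t < end]
        outside = [v for t, v in points if not (start <= t < end)]
        if not inside or not outside:
            continue  # not enough data to compare
        baseline = median(outside)
        peak = max(inside)
        if baseline > 0 and peak >= factor * baseline:
            spikes[lun] = (baseline, peak)
    return spikes


# Hypothetical samples; LUN names and values are made up for illustration.
samples = {
    "/vol/example_vol/lun_sql": [
        ("12:50", 1_000_000), ("13:00", 50_000_000),
        ("13:05", 48_000_000), ("13:15", 1_200_000),
    ],
    "/vol/example_vol/lun_exchange": [
        ("12:50", 800_000), ("13:00", 600_000),
        ("13:05", 500_000), ("13:15", 900_000),
    ],
}

spikes = find_spiking_luns(samples, ("13:00", "13:10"))
for lun, (baseline, peak) in spikes.items():
    print(f"{lun}: baseline {baseline:.0f} B/s, peak {peak:.0f} B/s "
          f"({peak / baseline:.0f}x)")
```

The median is used for the baseline so that a single odd sample outside the window does not hide the spike; the factor of 10 is an arbitrary threshold to adjust for your own environment.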