VMware Solutions Discussions
Hello guys,
We have a FAS2040 hosting VMware LUNs and Exchange LUNs over iSCSI. The environment is rather small and performance is good overall. However, we've noticed that every day both CPUs of the second controller peak at about 90-95%. This causes a drop in overall IOPS and affects all the LUNs. It occurs precisely between 13:00 and 13:10. Not before, not after.
No specific jobs are supposed to be running at that time on the NetApp side, the Exchange side, or the VMware side (including servers). I don't see any tasks that would do this other than one on the NetApp controller.
I've attached the output of some diagnostic commands run in priv set diag:
sysstat -x 2
sysstat -M 1
and so on.
I've determined that the Kahuna domain was taking about 70-80% of the CPU between 13:00 and 13:10. It normally takes about 10-15%. Now, I don't know where to look to determine what is running behind the Kahuna process.
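To make the "10-15% normally, 70-80% during the spike" comparison repeatable, here is a minimal sketch that finds the time window in which a CPU domain stays above a threshold. The samples are hypothetical values typed in by hand (this does not parse real sysstat -M output), and the `busy_window` helper and the 50% threshold are assumptions for illustration only.

```python
from datetime import time

# Hypothetical (time of day, Kahuna CPU %) samples, illustrative only.
# They mimic the pattern described above, not real controller output.
samples = [
    (time(12, 50), 12),
    (time(12, 55), 14),
    (time(13, 0), 72),
    (time(13, 5), 80),
    (time(13, 10), 75),
    (time(13, 15), 13),
]

def busy_window(samples, threshold):
    """Return (start, end) of the first contiguous run above threshold, or None."""
    start = end = None
    for t, pct in samples:
        if pct > threshold:
            if start is None:
                start = t      # first sample above the threshold
            end = t            # extend the window while still above it
        elif start is not None:
            break              # the run has ended
    return (start, end) if start is not None else None

window = busy_window(samples, threshold=50)
print(window)
```

Once the window is known, it can be compared against the schedules of backups, scans, and other jobs on the hosts using the storage.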
Your help in pinpointing where the issue is, or which task is running, would be greatly appreciated.
Thanks.
Hi and welcome to the Communities!
Quick stab at your problem - any chance any dedupe scans are (rather awkwardly) scheduled to start at 13:00?
Regards,
Radek
Thanks for the welcome.
Nope, nothing is set for deduplication at 13:00.
How about any SnapMirror updates? Nothing?
Nope, nothing in SnapMirror or SnapVault.
Is there a way to see what processes are behind the Kahuna domain? Or to see any scheduled processes?
Not that I know of - but we will see what other folks say.
Maybe reallocation is enabled and a 'reallocate schedule' is set for 13:00 (1:00 pm instead of 1:00 am)?
I don't believe there is a way to see what it's doing, but Kahuna looks after WAFL, RAID tetris, clustering and admin commands, to name the main parts, plus code that needs to run serially.
It looks like something is hammering the disks at that time, and the system is having problems keeping up and is constantly doing consistency points. The disk utilisation is also very high, even for a SAS/FC system. It seems strange, though, that you are only seeing this during that specific time.
As this is a 2000-series system, have you downloaded DFM and used Operations Manager (the license is included with the system)? It includes the Performance Advisor module, which would help with monitoring the system.
Found it!
I looked through the NetApp Management Console and found that one of the LUNs would bump the total throughput from ~1,000,000 to ~50,000,000 bytes per second. That LUN is a VMware LUN hosting a SQL Server. For some reason, the DBA scheduled a transaction log backup and a database dump at that specific time.
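For anyone chasing a similar spike, here is a rough sketch of the comparison: take per-LUN throughput at a quiet time and during the spike, then see which LUN accounts for the increase. Only the ~1,000,000 and ~50,000,000 bytes per second totals come from this thread. The LUN paths, the per-LUN split, and the `biggest_increase` helper are hypothetical.

```python
# Hypothetical per-LUN throughput in bytes/sec: (baseline, during spike).
# The split is invented so that the totals match the figures quoted above.
throughput = {
    "/vol/vmware/sql_lun": (300_000, 49_300_000),
    "/vol/vmware/app_lun": (200_000, 200_000),
    "/vol/exchange/db_lun": (500_000, 500_000),
}

def biggest_increase(throughput):
    """Return (lun, delta_bytes_per_sec) for the LUN whose throughput grew most."""
    deltas = {lun: peak - base for lun, (base, peak) in throughput.items()}
    lun = max(deltas, key=deltas.get)
    return lun, deltas[lun]

total_base = sum(base for base, _ in throughput.values())
total_peak = sum(peak for _, peak in throughput.values())
lun, delta = biggest_increase(throughput)

# Decimal megabytes: 1 MB = 1,000,000 bytes.
print(f"total: {total_base / 1e6:.0f} MB/s -> {total_peak / 1e6:.0f} MB/s "
      f"({total_peak / total_base:.0f}x)")
print(f"largest contributor: {lun} (+{delta / 1e6:.0f} MB/s)")
```

In decimal units the jump in the totals is roughly 1 MB/s to 50 MB/s, about a 50x increase.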
I'll have to talk to the DBA about this. Thanks all for your help, it helped me understand more about NetApp controllers and how awesome they are.