2010-11-11 01:55 AM
At my company we run Oracle on a VCS cluster, with the storage on a NetApp FAS3020 filer running Data ONTAP 7.2.4P7.
On one occasion some service groups on VCS went down, and at the same time the filer had high CPU usage.
I didn't capture the CPU usage at that time, but this is what I got afterwards:
What puzzles me is that there is no clear error in /etc/messages on the filer showing a problem with the filer, only the snapshot deletes.
The filer sometimes has CPU spikes; since they caused no issues, we just ignored them. Please help me investigate this.
2010-11-11 07:59 PM
Anyone have any ideas? The VCS log shows nothing abnormal. My question is probably: what can cause high CPU?
Some possibilities:
1. Overload on NFS --> how can I check and confirm there was an overload at that time?
2010-11-11 08:09 PM
It doesn't look that loaded down to me - just the normal noises in here.
I'd leave the options open - it may not have been the filer that caused the crash.
Performance is an end to end thing with complex environs in this day and age.
How about some 'sysstat -x 1' output?
I hope this response has been helpful to you.
At your service,
(P.S. I appreciate points for helpful or correct answers.)
2010-11-12 12:49 AM
Currently I am receiving this error on the console.
No changes have been made, and the filer is slow right now. Can anyone tell me what it is? I tried to google it and found nothing useful.
2010-11-12 01:12 AM
This is the output of the "ps" command, sorted by CPU.
2 RR i 60% 368 8% idle_thread1
601 RR k 41% 9640 29% wafl_hipri
1 RR i 23% 368 8% idle_thread0
251 RR n 13% 3552 10% nfsadmin
184 BR r 12% 4608 28% raidio_thread
76 RR n 7% 3768 11% 10/100/1000-VI/e0a
72 RR n 6% 3544 10% 10/100/1000-V/e1a
63 BR s 5% 2672 32% ispfc_main
73 BR n 5% 3316 10% 10/100/1000-V/e1b
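For anyone who wants to repeat the sort on their own capture, here is a minimal Python sketch. It assumes each row has the same seven whitespace-separated fields as the lines above and that the first percentage is the CPU share (it is the column the list is sorted on). The names `parse_ps`, `busiest` and `cpu_pct` are made up for illustration, and dropping the `idle_thread` entries assumes they represent idle time rather than real work.

```python
import re

# Rows copied from the "ps" output posted above.
PS_SAMPLE = """\
2 RR i 60% 368 8% idle_thread1
601 RR k 41% 9640 29% wafl_hipri
1 RR i 23% 368 8% idle_thread0
251 RR n 13% 3552 10% nfsadmin
184 BR r 12% 4608 28% raidio_thread
76 RR n 7% 3768 11% 10/100/1000-VI/e0a
72 RR n 6% 3544 10% 10/100/1000-V/e1a
63 BR s 5% 2672 32% ispfc_main
73 BR n 5% 3316 10% 10/100/1000-V/e1b
"""

# Assumed layout: id, two flag fields, CPU %, a number, a second %, name.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)%\s+(\d+)\s+(\d+)%\s+(\S+)\s*$"
)


def parse_ps(text):
    """Return one dict per row that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip headers, blank lines, anything unexpected
        rows.append(
            {"id": int(m.group(1)), "cpu_pct": int(m.group(4)), "name": m.group(7)}
        )
    return rows


def busiest(rows, skip_prefix="idle_thread"):
    """Sort by CPU share, highest first, leaving out the idle threads."""
    active = [r for r in rows if not r["name"].startswith(skip_prefix)]
    return sorted(active, key=lambda r: r["cpu_pct"], reverse=True)


if __name__ == "__main__":
    for row in busiest(parse_ps(PS_SAMPLE)):
        print(f'{row["cpu_pct"]:>3}%  {row["name"]}')
```

With the idle threads left out, wafl_hipri is the top consumer in this capture.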
2010-11-13 01:30 AM
Next time it happens, drop into priv set advanced and run statit -b, wait a little bit (30 seconds or so), then run statit -e and post the results.
Is this NFS connected? If so, nfsstat output would help.
On a side note, wouldn't 10k op/s be getting close to the max a FAS3020 can do? (I don't know, I haven't really worked with them much.)
2010-11-14 07:04 PM
Right now the CPU is running normally. In the statit output, which particular events should I focus on?
You mentioned that 10k op/s would be getting close to the max for a FAS3020. How can I calculate what the max is?
Sorry for the newbie question.
2010-11-14 09:34 PM
Never apologise for being a newbie; everyone was a newbie once. This is IT: face it, everyone becomes a newbie all over again every 3-5 years.
statit gathers a pile of useful stats for troubleshooting performance issues. It's not quite as hardcore as a perfstat, but it has a lot of good info for working out where a performance bottleneck may be.
Run it when you hit the next high CPU event.
There are a few questions to ask too: how often do they happen? Do they happen regularly? The last performance issue I saw was on a FAS2040; it turned out to be due to a DBA running a database dump into the snapinfo LUN at a certain time every day.
The op/s comment was more a guess based on what I've seen FAS3050s and 6080s do. How many ops a box can do depends on a lot of things, the least of which is size.
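To answer the "do they happen regularly?" question, one rough approach is to note the time of each spike and count them by hour of day. A minimal Python sketch follows; the function name `spikes_by_hour` and the timestamp format are assumptions, and the timestamps in the usage lines are placeholders rather than real data from this filer.

```python
from collections import Counter
from datetime import datetime


def spikes_by_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count spike timestamps per hour of day, most frequent hour first."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return hours.most_common()


if __name__ == "__main__":
    # Placeholder timestamps for illustration only; substitute the
    # spike times you actually recorded.
    example = ["2010-11-01 02:05", "2010-11-02 02:10", "2010-11-03 14:00"]
    for hour, count in spikes_by_hour(example):
        print(f"{hour:02d}:00  {count} spike(s)")
```

If most spikes fall in the same hour, a scheduled job (a backup, a dump, a snapshot schedule) is worth checking first; if they are spread evenly, the cause is more likely load driven.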
2010-11-15 05:41 PM
Thanks for the advice.
According to DFM, the CPU spiked a few times last month and this month, but not at the same times.
Yes, the filer hosts Oracle, which runs on VCS; I might need to check on Oracle after this.
Just another question: is there any possibility that a very large volume could cause the slowness and latency?
Just throwing out a wild guess, because I found one larger volume on there.
Thanks for the feedback.