ONTAP Discussions
I'm looking for suggestions on troubleshooting an issue we are seeing on our FAS3170 running DOT 7.3.7P3. It had been running fine, but today we noticed that one of the CPU cores is pegged at 100%, with Network% at 115% or higher. The filer is in an HA pair, and the partner is running fine while processing more ops.
Here's a snippet from sysstat:
sysstat -M 5
ANY1+ ANY2+ ANY3+ ANY4+ AVG CPU0 CPU1 CPU2 CPU3 Network Storage Raid Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host Ops/s CP
100% 36% 21% 15% 43% 25% 24% 23% 100% 114% 12% 8% 3% 5% 26%( 16%) 0% 0% 1% 3% 2% 0% 3397 0%
100% 45% 28% 19% 48% 28% 32% 32% 100% 115% 13% 11% 3% 11% 28%( 17%) 4% 0% 2% 5% 2% 0% 3415 32%
100% 42% 26% 19% 47% 29% 31% 28% 100% 116% 12% 11% 4% 7% 28%( 18%) 0% 0% 1% 6% 2% 0% 3258 69%
100% 39% 24% 18% 45% 25% 30% 27% 100% 115% 12% 8% 3% 7% 29%( 18%) 0% 0% 1% 6% 2% 0% 3029 0%
100% 46% 29% 20% 49% 29% 33% 34% 100% 115% 13% 11% 3% 12% 28%( 18%) 6% 0% 1% 6% 2% 0% 2966 31%
100% 47% 29% 21% 50% 34% 34% 31% 100% 118% 14% 13% 3% 9% 31%( 19%) 0% 0% 1% 7% 2% 0% 3832 100%
100% 38% 23% 16% 45% 26% 26% 26% 100% 116% 12% 7% 3% 7% 26%( 17%) 0% 0% 1% 4% 2% 0% 3918 10%
100% 36% 22% 16% 44% 24% 28% 25% 100% 115% 11% 8% 3% 6% 27%( 17%) 0% 0% 1% 4% 2% 0% 3537 0%
100% 67% 43% 30% 60% 44% 47% 51% 100% 121% 17% 20% 3% 22% 40%( 24%) 5% 0% 1% 10% 2% 0% 4809 89%
100% 55% 35% 24% 54% 37% 39% 38% 100% 118% 14% 16% 3% 13% 31%( 19%) 7% 0% 1% 10% 2% 0% 4218 61%
100% 53% 37% 27% 55% 40% 39% 39% 100% 120% 14% 13% 3% 12% 39%( 24%) 0% 0% 1% 14% 2% 0% 4108 63%
100% 54% 38% 29% 56% 40% 41% 42% 100% 125% 15% 11% 3% 12% 42%( 26%) 0% 0% 1% 11% 2% 0% 4752 0%
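For anyone who wants to spot this pattern without eyeballing the columns, here is a minimal Python sketch that parses `sysstat -M` text like the output above and reports which CPU columns stay at or above a threshold in every sample. The function names (`parse_sysstat_m`, `pegged_cpus`) and the 99% threshold are my own assumptions for illustration, not part of any NetApp tool; the header and the two sample rows are copied from the output above.

```python
import re

# Header and two sample rows copied from the sysstat -M output above.
HEADER = ("ANY1+ ANY2+ ANY3+ ANY4+ AVG CPU0 CPU1 CPU2 CPU3 Network Storage Raid "
          "Target Kahuna WAFL_Ex(Kahu) WAFL_XClean SM_Exempt Cifs Exempt Intr Host Ops/s CP")

SAMPLE = """\
100% 36% 21% 15% 43% 25% 24% 23% 100% 114% 12% 8% 3% 5% 26%( 16%) 0% 0% 1% 3% 2% 0% 3397 0%
100% 45% 28% 19% 48% 28% 32% 32% 100% 115% 13% 11% 3% 11% 28%( 17%) 4% 0% 2% 5% 2% 0% 3415 32%
"""


def parse_sysstat_m(header, text):
    """Return one dict per data row, keyed by the header column names."""
    cols = header.split()
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Join "26%( 16%)" into a single token so it lines up with WAFL_Ex(Kahu).
        fields = re.sub(r"\(\s+", "(", line).split()
        if len(fields) != len(cols):
            continue  # skip lines that do not match the header layout
        rows.append(dict(zip(cols, fields)))
    return rows


def pegged_cpus(rows, threshold=99):
    """Return CPU columns whose value is >= threshold in every row (hypothetical helper)."""
    if not rows:
        return []
    cpu_cols = [c for c in rows[0] if re.fullmatch(r"CPU\d+", c)]
    return [c for c in cpu_cols
            if all(int(r[c].rstrip("%")) >= threshold for r in rows)]


rows = parse_sysstat_m(HEADER, SAMPLE)
print(pegged_cpus(rows))
print([r["Network"] for r in rows])
```

On the two sample rows this flags only CPU3, which matches what the raw output shows alongside the Network column sitting above 100%.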
I've checked for the usual running sis processes and looked for zombie blocks.
statit showed that CPU3 spent 99% of its cycles in the nwk_legacy domain. KB3014084 says that nwk_legacy covers IP processing and NFS protocol processing, so I checked nfsstat next.
After clearing the counters and enabling per-client stats, we added up the per-volume NFS ops, and they come to roughly the same Ops/s that sysstat shows, which ranges between 3500 and 6000. Nothing that a FAS3170 can't handle.
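The cross-check described above is simple arithmetic, and it can be sketched as follows. This is only an illustration: the per-volume figures are hypothetical placeholders (not numbers from this filer), and the helper name `ops_reconcile` and the 15% tolerance are assumptions of mine.

```python
def ops_reconcile(per_volume_ops, sysstat_ops, tolerance=0.15):
    """Sum per-volume ops/s and report whether the total is within
    `tolerance` (as a fraction) of the sysstat Ops/s figure."""
    total = sum(per_volume_ops.values())
    relative_diff = abs(total - sysstat_ops) / sysstat_ops
    return total, relative_diff <= tolerance


# Hypothetical per-volume NFS ops/s; replace with the values from nfsstat.
per_volume = {"vol_a": 1800, "vol_b": 1200, "vol_c": 500}

# 3397 is the Ops/s value from the first sysstat sample above.
print(ops_reconcile(per_volume, 3397))
```

If the totals agree within the tolerance, the NFS counters account for the load sysstat reports, so the high CPU is not explained by unaccounted-for NFS traffic.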
DFM is still collecting stats, and the average filer network throughput is around 150mbps over the past day, which is lower than the 180-200mbps average it has seen over the past week or so.
So what could cause the CPU and Network util to be so high?
Thanks!
Hi,
We recommend that you open a support case to troubleshoot this issue.
Once the case is open, you may be asked to provide sysstat -x 1 output, as well as a perfstat capture of 20 iterations at 5-minute intervals ( https://www.youtube.com/watch?v=NzSKZYJJkz4 ).
Thanks
Hi deepuj
We ended up failing the workload over to the partner so we could reboot the node. The failover definitely took longer than usual because one of the CPU cores was so busy, but after the reboot and giveback, everything is back to normal.
Thanks!