I'm looking for suggestions on troubleshooting an issue we are seeing on our FAS3170 running DOT 7.3.7P3. It had been running fine, but today we noticed that one of the CPU cores is pegged at 100%, with Network% at 115% or higher. The filer is in an HA pair, and the partner is running fine while serving more ops.
I've already checked for the usual suspects: running sis processes and zombie blocks.
statit showed that CPU3 spent 99% of its cycles in the nwk_legacy domain. KB3014084 says that nwk_legacy includes IP processing and NFS protocol processing, so I checked nfsstat next.
After clearing the counters and enabling per-client stats, we added up the per-volume NFS ops, and the total is roughly the same as the ops/s shown by sysstat, which ranges between 3500 and 6000. That is nothing a FAS3170 can't handle.
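For anyone who wants to repeat that cross-check, here is a minimal Python sketch of the arithmetic: sum the per-volume NFS ops and see whether the total lands close to the sysstat figure. The function name, the 10% tolerance, the volume names, and the sample numbers are all made up for illustration; they are not output from our filer.

```python
def compare_ops(per_volume_ops, sysstat_ops, tolerance=0.10):
    """Sum per-volume ops/s and compare the total with the sysstat ops/s.

    Returns (total, within_tolerance). The 10% default tolerance is an
    arbitrary assumption, not a NetApp recommendation.
    """
    if sysstat_ops <= 0:
        raise ValueError("sysstat_ops must be positive")
    total = sum(per_volume_ops.values())
    relative_gap = abs(total - sysstat_ops) / sysstat_ops
    return total, relative_gap <= tolerance


# Hypothetical per-volume NFS ops/s (illustrative values only).
sample = {"vol_a": 2100, "vol_b": 1500, "vol_c": 900}

total, ok = compare_ops(sample, sysstat_ops=4600)
print(total, ok)
```

If the two numbers disagree by a wide margin, that would suggest traffic that is not being counted per volume, which is worth ruling out before looking elsewhere.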
DFM is still collecting stats, and the average filer network throughput over the past day is around 150 Mbps, which is lower than the 180-200 Mbps average it has seen over the past week or so.
So what could cause the CPU and network utilization to be so high?
We ended up failing the workload over to the partner so we could reboot the node. The failover definitely took longer than usual because one of the CPU cores was so busy, but after the reboot and giveback, everything is back to normal now.