Hi there,
I recently saw very high CPU load (around 90%) combined with disk utilization between 65% and 90% on one node of a clustered FAS2040 system.
The nodes are plain SnapVault destinations and serve no data, and with no deduplication running and only two SnapVault transfers active, I couldn't figure out why the system was under so much load.
Performance graphs show that the system normally just trots along at 2-3% CPU utilization, but the high load started abruptly at 08:00 this morning.
I checked the logs on the filer and found nothing suspicious in /etc/messages or any of the other logs, except for /etc/log/volread, which showed lines like the following, several per second:
Wed Jan 25 08:00:22 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
Wed Jan 25 08:00:36 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
Wed Jan 25 08:00:36 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
Wed Jan 25 08:00:36 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
<cut>
Wed Jan 25 11:53:14 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
Wed Jan 25 11:53:14 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
Wed Jan 25 11:53:16 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
Wed Jan 25 11:53:16 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
Wed Jan 25 11:53:16 CET Ignoring stale resume trigger (Backup UUID 00000000-0000-0000-0000-000000000000)
These are the first and last entries in that log (105321 lines in total!), and the system indeed seems to have returned to normal after 12:00.
Has anyone seen anything like this before? Could it happen again? I should add that this is a Data ONTAP 8.0.2P3 7-Mode release.
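To put "several per second" in perspective: 105321 lines between roughly 08:00 and 12:00 works out to about 7 messages per second on average. A quick sanity check with shell arithmetic (the 4-hour window is my approximation from the first and last timestamps):

```shell
# 105321 log lines over ~4 hours (14400 seconds) -> average messages per second
echo $((105321 / 14400))
# prints 7 (integer division; the true average is ~7.3/s)
```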
Thanks and best regards,
Peter