Hey Jason,
You can see how busy your disks are by looking at sysstat output, as in the example below.
In the output below, my Disk util peaked at 27%. If this number is high (say, 80%), then your disks (spindles/aggregates) are getting hammered.
That is when you would want to look at running a WAFL reallocate or rebuilding the aggregate.
I am not sure how to get this through SNMP, but you could write a script that uses rsh to log in to the filer, pulls the stats at regular intervals, and emails them to you.
ANTHONYS-FILER> sysstat -s -x 1
 CPU   NFS  CIFS  HTTP  Total    Net kB/s     Disk kB/s     Tape kB/s Cache Cache    CP  CP  Disk  FCP  iSCSI    FCP kB/s
                                 in   out   read  write   read  write   age   hit  time  ty  util                in   out
  0%     0     0     0      0     8     0    132      0      0      0   >60  100%    0%   -    6%    0      0     0     0
  0%     0     0     0      0     4     0    128      0      0      0   >60   98%    0%   -    2%    0      0     0     0
  1%     0     0     0      0     4     0      4      0      0      0   >60  100%    0%   -    4%    0      0     0     0
  0%     0     0     0      0     5     0    132      0      0      0   >60  100%    0%   -    7%    0      0     0     0
  0%     0     0     0      0     6     0    128      0      0      0   >60  100%    0%   -    6%    0      0     0     0
  1%     0     0     0      0     5     1    744    684      0      0   >60  100%   30%   T   27%    0      0     0     0
  0%     0     0     0      0     4     0    124      0      0      0   >60  100%    0%   -    4%    0      0     0     0
  0%     0     0     0      0     5     0    128      0      0      0   >60  100%    0%   -    4%    0      0     0     0
  0%     0     0     0      0    12     0    132      0      0      0   >60  100%    0%   -    4%    0      0     0     0
  0%     0     0     0      0     9     0    128      0      0      0   >60  100%    0%   -    5%    0      0     0     0
  0%     0     0     0      0    12     0    128      0      0      0   >60  100%    0%   -    5%    0      0     0     0
  0%     0     0     0      0    12     0    124      0      0      0   >60  100%    0%   -    8%    0      0     0     0
  1%     0     0     0      0     6     0    140      0      0      0   >60  100%    0%   -    6%    0      0     0     0
--
Summary Statistics ( 13 samples 1 secs/sample)
 CPU   NFS  CIFS  HTTP  Total    Net kB/s     Disk kB/s     Tape kB/s Cache Cache    CP  CP  Disk  FCP  iSCSI    FCP kB/s
                                 in   out   read  write   read  write   age   hit  time  ty  util                in   out
Min
  0%     0     0     0      0     4     0      4      0      0      0   >60   98%    0%   *    2%    0      0     0     0
Avg
  0%     0     0     0      0     7     0    167     52      0      0   >60  100%    2%   *    6%    0      0     0     0
Max
  1%     0     0     0      0    12     1    744    684      0      0   >60  100%   30%   *   27%    0      0     0     0
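If you go the script route, here is a rough Python sketch of the idea, not something I have run against a production filer. It assumes rsh access to the filer is already set up and that the output looks like the sample above. The function names and the 80% threshold are my own placeholders, and the emailing part is left out. It picks Disk util as the last percentage field on each sample row, which holds for the layout shown here but may need adjusting for other sysstat options or ONTAP versions.

```python
import subprocess


def parse_disk_util(sysstat_output):
    """Return the Disk util percentage from each sample row of sysstat -x output."""
    utils = []
    for line in sysstat_output.splitlines():
        if line.strip().startswith("Summary Statistics"):
            break  # Min/Avg/Max rows follow; they are not samples
        fields = line.split()
        # Sample rows begin with the CPU percentage, e.g. "0%"; skip headers and "--"
        if not fields or not fields[0].endswith("%") or not fields[0][:-1].isdigit():
            continue
        percentages = [f for f in fields if f.endswith("%")]
        # Assumption: Disk util is the last percentage column in the row
        utils.append(int(percentages[-1].rstrip("%")))
    return utils


def busy_samples(utils, threshold=80):
    """Return the samples at or above the threshold (80 is only a rule of thumb)."""
    return [u for u in utils if u >= threshold]


def fetch_sysstat(filer, samples=10):
    """Run sysstat on the filer over rsh and return its text output."""
    # -c limits the number of samples so the command exits on its own
    cmd = ["rsh", filer, "sysstat", "-c", str(samples), "-x", "1"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

You could run something like busy_samples(parse_disk_util(fetch_sysstat("ANTHONYS-FILER"))) from cron and send yourself a mail whenever the list comes back non-empty.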
Hope this helps.
Anthony Feigl