2016-09-23 10:18 AM
I'm running OCUM 5.2P1 on RHEL 5.11 against 4 HA Pairs of various hardware, all running 8.2.3P3 7-mode. I see this issue on one controller of one HA pair - all other controllers look fine.
I went into the management console to look at volume performance for this one controller, and the latest info displayed was from several months ago (Jul 31). All the physical components (aggrs, vifs, procs) for this controller have current data. All the components of all the other controllers DFM monitors (including volumes) have current data.
I ran dfm host diag for the controller, and everything looks fine. Perf Advisor Transport shows HTTPS Ok and a green data collection status. Since the controller is still getting other stats, I don't really suspect a communications issue anyway.
The enabled volume counters on this controller match those on a controller that is working.
I ran dfm perf data list for the controller, and all counter groups have current records except for the volume group, whose newest record is from July 31.
On July 31 we upgraded this HA pair from 8.1.2P3. It seems something happened during that upgrade (on just that one controller) that stopped volume stats from updating.
Anyone have any idea how I would re-enable those?
I appreciate any input...
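A minimal sketch of the staleness check described above, in case it helps to repeat it across controllers. It does not parse dfm perf data list output (the format is not assumed here); the newest-record timestamps are typed in by hand. The function name stale_groups, the one-day threshold, the group names other than volume, and the times of day are all illustrative assumptions.

```python
from datetime import datetime, timedelta

def stale_groups(newest_by_group, now, max_age=timedelta(days=1)):
    """Return the counter groups whose newest record is older than max_age."""
    return sorted(
        group for group, newest in newest_by_group.items()
        if now - newest > max_age
    )

# Hypothetical values copied by hand from the tool's output.
newest = {
    "volume": datetime(2016, 7, 31, 23, 0),
    "aggregate": datetime(2016, 9, 23, 10, 0),
    "processor": datetime(2016, 9, 23, 10, 0),
}

print(stale_groups(newest, now=datetime(2016, 9, 23, 10, 18)))
```

With the sample values above, only the volume group is reported as stale.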
2016-09-23 11:32 AM
I found a command to enable counters - dfm perf data enable. Even though perf data describe showed all counters enabled for the volume group, I tried enabling them again with dfm perf data enable <host> volume all, which failed for "unknown reasons". I found in dfmserver.log:
Failed to add columns to <path>/perf_103_3801_178: No space left on device
This was despite there being plenty of space in the filesystem. The file in question is the one that perf data describe shows as holding the volume data for this host, and it was 4.3G.
Thinking I'd hit a file size max, I reduced the retention for the group with dfm perf data modify -f -G volume -o <host> -r 20days (from 30 days), which worked - perf data describe shows the oldest record 10 days newer than before, and the file shrank to just over 3G. I ran dfm perf data enable again, which came back successful, but that group still doesn't seem to be updating - perf data describe still shows the newest record as July 31. I've restarted DFM.
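For the "No space left on device" message, one thing worth ruling out alongside free space is free inodes, since a filesystem can return that error when it runs out of inodes even with free blocks available. The sketch below reports the file's size and the free bytes and free inodes on the filesystem holding it, using only the Python standard library on a Unix host. The function name space_report is an assumption, and the path to pass in is the perf data file from the log (elided above as <path>). This is only a quick check; it does not by itself explain the DFM error.

```python
import os

def space_report(path):
    """Report the file's size plus free bytes and inodes on its filesystem."""
    st = os.stat(path)
    vfs = os.statvfs(path)  # Unix only
    return {
        "file_bytes": st.st_size,
        # Space and inodes available to unprivileged processes.
        "free_bytes": vfs.f_bavail * vfs.f_frsize,
        "free_inodes": vfs.f_favail,
    }

if __name__ == "__main__":
    import sys
    # Usage: python space_report.py /path/to/perf/data/file
    for key, value in space_report(sys.argv[1]).items():
        print(key, value)
```

A free_inodes value at or near zero would point at inode exhaustion rather than a file size limit.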