To monitor my Exchange 2003 MS clusters, I report on average LUN latency for the database storage group LUNs. I use DFM to collect the data, then export it to Excel to get an average value, which I record and report on each month. This shows the trend and gives me some insight into whether things are getting better or worse. Can you recommend any better methods or KPIs I should be looking at?
As far as the method used to collect and monitor performance trends, I honestly think it comes down to whatever works best. The method you're currently using works. The Exchange performance analyzer is another good way to get insight into performance, and System Center is another good way to get at the information.
As far as what to monitor, one of the key things is averaged read and write latency: both should stay under 20 ms, with spikes no higher than 50 ms. Another thing I like to watch is the RPC Averaged Latency. While this isn't completely disk related, it can be a good indicator of client-side performance. It should be kept under 50 ms. Once it starts to creep higher, clients may notice connection issues. The impact is especially visible for clients in online mode.
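As a rough illustration of the disk latency check, here is a minimal Python sketch. It assumes you have already exported the read or write latency samples (for example from DFM or perfmon) as a list of values in milliseconds. The function name summarize_latency and the sample values are made up for the example; only the 20 ms and 50 ms limits come from the guidance above.

```python
from statistics import mean

# Limits taken from the guidance above; adjust to suit your environment.
AVG_LIMIT_MS = 20.0    # average read/write latency should stay under this
SPIKE_LIMIT_MS = 50.0  # individual spikes should not exceed this


def summarize_latency(samples_ms):
    """Summarize a list of latency samples (in ms) against the two limits.

    Hypothetical helper for illustration; not part of any Exchange or DFM tool.
    """
    if not samples_ms:
        raise ValueError("no latency samples supplied")
    avg = mean(samples_ms)
    peak = max(samples_ms)
    return {
        "avg_ms": avg,
        "peak_ms": peak,
        "avg_ok": avg < AVG_LIMIT_MS,        # average under 20 ms
        "spikes_ok": peak <= SPIKE_LIMIT_MS,  # no spike above 50 ms
    }


if __name__ == "__main__":
    # Made-up sample data standing in for one LUN's exported read latency.
    read_ms = [8.0, 12.0, 15.0, 45.0, 10.0]
    print(summarize_latency(read_ms))
```

Running the same summary over each month's export gives the trend value you already report, plus a flag for any month where spikes went past 50 ms even though the average looked fine.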
The RPC Averaged Latency is a measure of how long it takes for a packet to be processed. Below is the description of the counter from Microsoft, taken from the Exchange Server Analyzer tool.
The RPC Averaged Latency performance counter records the average time, in milliseconds (ms), that it takes for the last 1024 packets to be processed. The latency represents how long it takes from the time the Store.exe process received the packet to the time it returned the packet. The RPC Averaged Latency performance counter does not include any network latency or any latency that is introduced by anything other than the Store.exe process. Although the RPC Averaged Latency performance counter data does not include network transit time, it does provide data about the shortest time period that client computers have waited for a response from the server. If the RPC Averaged Latency performance counter data is lower than 50 ms, the server can process the requests in a reasonable amount of time. If the counter stays greater than 50 ms for more than several seconds, this indicates that the server is having difficulty keeping up with the load. As a result, users may experience delays when accessing their e-mail. If average latency is greater than 100 ms, users will receive the following pop-up window from their Microsoft Office Outlook® client computers: "Retrieving data from Exchange Server."
The RPC Averaged Latency counter is monitored from perfmon as part of the MSExchangeIS object.
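If you log this counter at a fixed interval, the thresholds in Microsoft's description could be applied with something like the Python sketch below. The function name classify_rpc_latency, the "ok"/"warning"/"critical" labels, and the 5-second reading of "several seconds" are all assumptions for the example; only the 50 ms and 100 ms figures come from the description.

```python
def classify_rpc_latency(samples_ms, interval_s=1.0, sustain_s=5.0):
    """Classify a series of RPC Averaged Latency readings (in ms).

    samples_ms: readings taken every interval_s seconds.
    sustain_s:  assumed meaning of "several seconds" (5 s here; an assumption).
    Returns "critical" if any reading is above 100 ms, "warning" if readings
    stay above 50 ms for at least sustain_s seconds in a row, otherwise "ok".
    """
    if any(v > 100 for v in samples_ms):
        return "critical"

    # Number of consecutive samples that counts as a sustained excursion.
    needed = max(1, int(sustain_s / interval_s))
    run = 0
    for v in samples_ms:
        run = run + 1 if v > 50 else 0  # reset the run when latency recovers
        if run >= needed:
            return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up readings, one per second.
    print(classify_rpc_latency([12, 18, 60, 22, 15]))      # brief blip only
    print(classify_rpc_latency([55, 60, 62, 58, 70, 65]))  # sustained above 50 ms
    print(classify_rpc_latency([40, 120, 45]))             # above 100 ms
```

Treating a short blip above 50 ms as acceptable and only flagging sustained runs follows the wording of the description, which is concerned with the counter staying high rather than touching the limit once.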