I need to set up performance monitoring in Operations Manager to get a better handle on my storage systems' performance.
I thought I would start at the lowest level, with my disks.
I have an aggregate with 12 data disks (SATA, 7200 RPM). I am going on the assumption that 50 IOPS per disk is a good performance measure for this disk type. If these disks go over 50 IOPS, I want to know.
So in Operations Manager (via the NetApp Management Console), how can I set up a threshold for this? It looks like I can set up a threshold with "total_transfers" as the counter selection. Is this the same as IOPS? Should I set it to 50 for disk?
Am I on the right track?
Also, is there a way to do this at the aggregate level? For example, my maximum aggregate IOPS would be 600 (50 times 12).
You are on the right track. total_transfers for both the "disk" and "aggregate" counter groups refers to the total number of IOPS. However, creating thresholds and alarms on individual disk drives is going to require far too much management overhead. You'd be better off defining thresholds and alarms at the aggregate level. An easy guideline is to take your expected IOPS per disk drive and multiply it by the number of *data* disks in the aggregate. So in a 12-drive SATA aggregate (assuming two of the 12 drives hold parity, leaving 10 data disks), you'd expect the aggregate to deliver (10 data disks x n IOPS per disk = nnn IOPS for the aggregate). I think your estimate of 50 IOPS per SATA disk is a bit conservative. I calculate that these drives could do ~100 IOs per second (1 / (1 / (7200 RPM / 60) + .002 )). So if I were doing this, I'd create a threshold of (10 data disks x 100 IOPS per disk) = 1000 IOPS. If my aggregate is doing more than 1000 IOPS, I'd expect latency to start rising to unacceptable levels and I'd want to be alerted.
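To make the arithmetic above easy to repeat for other aggregates, here is a minimal Python sketch. The function names are made up for illustration, and the 0.002 term is simply copied from the formula in the post (the post does not say what it represents, so treat it as an assumed fixed overhead of 2 ms). This only computes a number to type into the threshold dialog; it does not talk to Operations Manager.

```python
def per_disk_iops(rpm, overhead_s=0.002):
    """Estimate IOPS for one drive as 1 / (time per rotation + fixed overhead).

    overhead_s is the 0.002 term from the post; its meaning is an assumption.
    """
    seconds_per_rotation = 1.0 / (rpm / 60.0)
    return 1.0 / (seconds_per_rotation + overhead_s)


def aggregate_iops_threshold(data_disks, iops_per_disk):
    """Aggregate threshold = number of data disks x expected IOPS per disk."""
    return data_disks * iops_per_disk


if __name__ == "__main__":
    estimate = per_disk_iops(7200)
    print(f"Per-disk estimate: {estimate:.1f} IOPS")  # a little under 100
    # Rounded to 100 IOPS per disk, with 10 data disks as in the post.
    print(f"Aggregate threshold: {aggregate_iops_threshold(10, 100)} IOPS")
```

With the original poster's more conservative numbers (12 data disks at 50 IOPS each), the same function gives the 600 IOPS figure from the question.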
Monitoring the latency of the volumes is your best metric for monitoring performance. If latency starts to rise above acceptable levels, it's the best indicator that something else is running too hot in the storage controller (IOPS of the aggregate, throughput of the interface, CPU utilization, etc.). "Acceptable levels" are unique to each customer's environment and each application, so you need to know what these latency values should be.
You can do your own calculation to find IOPS. The first thing you need to do is calculate your drive's rotational latency, which you can get from the drive's RPM. The maximum rotational delay (one full rotation) is 60 / RPM seconds (RPM is 7200 in your case), and the average delay (half a rotation) is 30 / RPM seconds. So the average rotational latency for a 7200 RPM drive is about 4.2 ms, and the maximum is about 8.3 ms. The next step is to plug this into the formula for drive IOPS: 1 / (average latency + average seek time), with both times in seconds. You can look up the manufacturer's info on the drive to get the seek time, and it is often printed on the drive label as well. I'll just guess that the seek time on that drive is about 8 ms. The formula then looks like 1 / (.0042 + .008), which yields about 82 IOPS. The worst case (meaning the drive always has to do a full rotation to read or write its data) would be about 61 IOPS.
Maximum Rotational Latency (seconds) = 60 / RPM
Average Rotational Latency (seconds) = 30 / RPM
IOPS Calculation = 1 / (average latency in seconds + average seek time in seconds)
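The three formulas above can be sketched in Python as follows. The function names are hypothetical, and the 8 ms seek time is only the guess from the post, not a manufacturer figure, so substitute the value from your drive's spec sheet.

```python
def rotational_latency_s(rpm, average=True):
    """Rotational latency in seconds: 30 / RPM on average, 60 / RPM worst case."""
    full_rotation_s = 60.0 / rpm
    return full_rotation_s / 2.0 if average else full_rotation_s


def disk_iops(rpm, seek_time_s, average=True):
    """IOPS = 1 / (rotational latency + seek time), both in seconds."""
    return 1.0 / (rotational_latency_s(rpm, average) + seek_time_s)


if __name__ == "__main__":
    seek_s = 0.008  # assumed 8 ms seek time, as guessed in the post
    print(f"Average latency: {rotational_latency_s(7200) * 1000:.1f} ms")
    print(f"Maximum latency: {rotational_latency_s(7200, average=False) * 1000:.1f} ms")
    print(f"Average-case IOPS: {disk_iops(7200, seek_s):.0f}")
    print(f"Worst-case IOPS: {disk_iops(7200, seek_s, average=False):.0f}")
```

For a 7200 RPM drive with that assumed seek time, this prints about 82 IOPS for the average case and about 61 IOPS for the worst case.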
I got 100 IOPS per SATA disk using one aggregate with 69 SATA disks (3 TB, 7200 RPM). My aggregate reached 8000 IOPS with many types of volumes configured.
I configured volumes for an Oracle RAC environment, volumes for VMware datastores, and exported many volumes over NFS for MySQL.
But when you configure a volume for your application, be sure not to create misaligned LUNs in your operating system, because a misaligned LUN causes poor performance on the aggregate and the controller.
In the past I did a consulting engagement at a customer that had an aggregate with 14 SATA disks serving LUNs over FCP. These LUNs were misaligned, which dropped throughput to 60 Mbps; when we configured correct LUN alignment in the OS, throughput rose to 120 Mbps.
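As a rough illustration of what "aligned" means here, the sketch below checks whether a partition's starting offset falls on a 4 KiB boundary. It assumes the storage uses 4 KiB blocks (as WAFL does) and 512-byte sectors; the function name is made up, and you would take the starting sector from your own OS partitioning tool. It is a sanity check on the arithmetic, not a replacement for NetApp's own alignment tools.

```python
BLOCK_BYTES = 4096   # assumed storage block size (4 KiB)
SECTOR_BYTES = 512   # assumed logical sector size reported by the OS


def is_aligned(start_sector, sector_bytes=SECTOR_BYTES, block_bytes=BLOCK_BYTES):
    """Return True if the partition's starting byte offset is a multiple of the block size."""
    return (start_sector * sector_bytes) % block_bytes == 0


if __name__ == "__main__":
    # Sector 63 is the classic legacy MBR start; sector 2048 is a 1 MiB offset.
    for sector in (63, 64, 2048):
        state = "aligned" if is_aligned(sector) else "MISALIGNED"
        print(f"start sector {sector}: {state}")
```

A partition starting at sector 63 begins at byte 32256, which is not a multiple of 4096, so each guest I/O can straddle two storage blocks; starting at sector 64 or 2048 avoids that.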