Active IQ Unified Manager Discussions
Hi - I'm looking to find (for each aggregate in a FAS3070 Filer) how many IOPS are being delivered at peak.
I'm working on the theory that each data disk should be capable of delivering approx 180 IOPS (they're 15k FC disks), hence I can create a "theoretical" performance limit for each aggregate, (180 x number of data disks), beyond which latency is likely to suffer.
The reason I want to know is that we are considering moving to 300GB disks, but there is little point in getting extra capacity if I can't practically make use of it (I get double the capacity but I get no more performance).
My storage administrator has produced an Operations Manager aggregate report (OPS Manager – ‘dfm report view aggregates-performance-summary’) showing a peak level of IOPS that would equate to approx 600 IOPS per data disk.
600 IOPS per disk doesn't sound realistic (and nobody is complaining of poor latency/performance).
Could it be that the aggregate performance report is including IOPS from cache hits? (In which case this is the wrong report for me, as I am purely interested in disk IOPS.)
Can anyone help?
Stu
P.S. Summary below:
Filer Name | Aggregate | # Data Disks | Theoretical IOPS | Monitored IOPS | % Utilised |
Filer1 | aggr1 | 70 | 12601 | 10748 | 85% |
Filer2 | aggr1 | 68 | 12241 | 8841 | 72% |
Filer3 | aggrebg | 47 | 8461 | 10239 | 121% |
Filer4 | aggr1 | 37 | 6661 | 21168 | 318% |
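The arithmetic behind the summary table can be sketched in a few lines of Python. This is only an illustration of the rule of thumb stated in the post (about 180 IOPS per 15k FC data disk); the function name `summarise` and the constant names are made up for this example, and the disk counts and monitored figures are copied from the table.

```python
# Rule of thumb from the post (an assumption, not a measured value):
# each 15k FC data disk delivers roughly 180 IOPS.
IOPS_PER_DISK = 180

# (filer, aggregate, data disks, monitored peak IOPS) copied from the table.
AGGREGATES = [
    ("Filer1", "aggr1", 70, 10748),
    ("Filer2", "aggr1", 68, 8841),
    ("Filer3", "aggrebg", 47, 10239),
    ("Filer4", "aggr1", 37, 21168),
]


def summarise(data_disks, monitored_iops, iops_per_disk=IOPS_PER_DISK):
    """Return the theoretical limit, % utilisation and implied per-disk IOPS."""
    theoretical = data_disks * iops_per_disk
    return {
        "theoretical": theoretical,
        "utilisation_pct": round(100 * monitored_iops / theoretical),
        "iops_per_disk": round(monitored_iops / data_disks),
    }


for filer, aggr, disks, monitored in AGGREGATES:
    s = summarise(disks, monitored)
    print(f"{filer} {aggr}: limit {s['theoretical']} IOPS, "
          f"{s['utilisation_pct']}% utilised, "
          f"{s['iops_per_disk']} IOPS per data disk")
```

The plain product gives 12600 for 70 disks, one less than the 12601 shown in the table, but the rounded percentages come out the same. For Filer4 the implied figure is about 572 IOPS per data disk, which is the "approx 600" that looks unrealistic.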
Harish,
I'm still unable to see the IOPS report that I created with DFM (Operations Manager). I don't even know where to look or how to troubleshoot this. I called NetApp support, and they are usually very good and patient, but I stumbled upon a guy who told me, "Are you asking me to train you how to use DFM?" Perhaps that was the treatment I got because I only have a demo license while I evaluate the product and decide whether we should buy it. I gave him an analogy about a fast car lacking an odometer, but soon gave up on trying to explain my frustration to him and turned to the online community. Let me know if you have any ideas on how to figure out why I'm unable to see the data in those reports.
Thanks again for responding to my rants.
Regards,
Ivan
Ivan,
Let us start from the basics; please bear with me. The following steps show how to set up the storage system and how to access performance information for its aggregates.
Please follow these steps and let me know if you see anything different that could explain why performance information is not being reported.
In the following examples, toaster is the name of my storage system (filer).
1. Add a storage system to DFM.
dfm host add -N toaster
2. Set login credentials for the storage system so that it starts
collecting performance data.
dfm host set toaster hostlogin=root hostpassword=XXX
3. Make sure the login is correctly set and DFM is able to communicate
with the storage system.
dfm host diag toaster
perfAdvisorTransport Passed
4. Wait for a while (say 30 minutes).
5. Make sure aggregate data is being collected.
dfm perf data describe "aggregate basic" toaster
Counter Group: Aggregate Basic
Host Name: toaster
File Name: perf_3_1891_8
Number Records: 32
Interval (secs): 60
Max Records: 10080
Used Space (bytes): 2048
Oldest Record: Wed Jan 13 16:10:08 2010
Newest Record: Wed Jan 13 16:41:10 2010
From the above output, I can see that 32 records have been collected.
6. Run a performance report (available by default) to view performance
data of aggregates.
dfm report view aggregates-performance-summary
Object ID Aggregate Storage System Total Ops/Sec Perf Threshold Violation Count Perf Threshold Violation Period (Sec)
Harish,
I followed your steps to a tee. I already had both of my filers added, but I went ahead and used a domain account instead of root. Both of my filers get a 'Passed' result for 'perfAdvisorEnabled' and 'perfAdvisorTransport' when I run dfm host diag. However, when I run 'dfm perf data describe "aggregate basic" {filer}', here's what I get:
X:\>dfm perf data describe "aggregate basic" chnetapp9
Counter Group: Aggregate Basic
Host Name: CHNETAPP9
File Name: perf_3_76_8
Number Records: 28
Interval (secs): 60
Max Records: 10080
Used Space (bytes): 6512
Oldest Record: Mon Jan 04 20:10:30 2010
Newest Record: Mon Jan 04 20:37:29 2010
X:\>dfm perf data describe "aggregate basic" chnetapp10
Counter Group: Aggregate Basic
Host Name: CHNETAPP10
File Name: perf_3_74_8
Number Records: 120
Interval (secs): 60
Max Records: 10080
Used Space (bytes): 20768
Oldest Record: Mon Jan 04 20:38:49 2010
Newest Record: Mon Jan 11 10:30:18 2010
As you can see, the first filer collected 28 records and the other one 120. The way I read this, data gets collected every 60 seconds on CHNETAPP10. How come I have only 120 records when the oldest one is from 1/4/2010? I'd expect far more than 120 if the interval is set to 60 seconds. Please advise.
I still do not see any data in aggregate reports, though.
X:\>dfm report view aggregates-performance-summary
Object ID Aggregate Storage System Total Ops/Sec Perf Threshold Violation Count Perf Threshold Violation Period (Sec)
--------- -------------------- -------------------------------- ------------- ------------------------------ -------------------------------------
397 ch9_450fc15k_aggr02 CHNETAPP9.domain.com
399 ch9_450fc15k_aggr01 CHNETAPP9.domain.com
401 ch9_300fc10k_aggr01 CHNETAPP9.domain.com
403 aggr0 CHNETAPP9.domain.com
92 ch10_450fc15k_aggr02 CHNETAPP10.domain.com
94 ch10_450fc15k_aggr01 CHNETAPP10.domain.com
96 aggr0 CHNETAPP10.domain.com
What am I missing here?
On a closer look, I see that the last time data was collected was on 1/4 and 1/11. So what's preventing perf data from being collected?
Thanks,
Ivan
For the first storage system (filer), data was collected for around 28 minutes, so 28 records is what is expected, and you should see performance data for this storage system. Are you sure that the "dfm report view aggregates-performance-summary" CLI does not report data for the aggregates of this storage system? If so, it appears to be a defect. Can you also check whether you can see aggregate data for this storage system from NMC? Let me know if you need help accessing NMC.
For the second storage system, only 120 samples were collected between Jan 4 and Jan 11, which is far fewer than expected. For a 7-day period at a 60-second interval, around 10,000 samples should have been collected. Some reasons I can think of for this discrepancy are:
1. DFM was not initiating data collection (the process is not running, or there is no space to store data).
2. DFM is unable to contact the storage system (the storage system is down, it is too busy, or there was a network problem).
Regards
Harish
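The expected-sample reasoning in the reply above can be checked with a short Python sketch. It takes the "Oldest Record", "Newest Record" and "Interval (secs)" values printed by 'dfm perf data describe' and estimates how many records a gap-free collection would have produced. The function name `expected_records` is made up for this example, and the timestamp format string is an assumption based on the output pasted in this thread.

```python
from datetime import datetime

# Assumed timestamp layout, matching e.g. "Mon Jan 04 20:38:49 2010".
# %a and %b rely on English day/month names (the default C locale).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def expected_records(oldest, newest, interval_secs):
    """Estimate the record count if one sample was stored every interval."""
    start = datetime.strptime(oldest, TS_FORMAT)
    end = datetime.strptime(newest, TS_FORMAT)
    elapsed = (end - start).total_seconds()
    # One record at the start plus one per elapsed interval.
    return round(elapsed / interval_secs) + 1


# Values copied from the two 'dfm perf data describe' outputs in the thread.
SYSTEMS = [
    ("CHNETAPP9", "Mon Jan 04 20:10:30 2010", "Mon Jan 04 20:37:29 2010", 60, 28),
    ("CHNETAPP10", "Mon Jan 04 20:38:49 2010", "Mon Jan 11 10:30:18 2010", 60, 120),
]

for host, oldest, newest, interval, actual in SYSTEMS:
    expected = expected_records(oldest, newest, interval)
    print(f"{host}: {actual} of ~{expected} expected records "
          f"({100 * actual / expected:.1f}%)")
```

For CHNETAPP9 the estimate is 28, matching what was collected. For CHNETAPP10 it is 9472, so the 120 stored records cover only a little over 1% of the period, which points to collection having stopped rather than running at a slower rate.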
Harish,
I’m sure there’s no data collected for either of the two filers. It looks like I was able to collect some of the data initially but that’s not working any more. “Top Aggregate” report in NMC does not have any data in it. I would expect that I can see at least some results in Operations Manager since it collected
1. How do I check if there’s a lack of space to store the data?
2. Wouldn’t I be able to see if DFM were unable to contact the storage system? When I ran ‘dfm host diag’ I saw no errors. I also have green status and “Good” login credentials for both storage systems in NMC > Setup Hosts.
Thanks,
Ivan
"Top Aggregates" report in NMC (or any other bar chart in NMC for that
matter) shows data collected in the last 10 minutes. If no data is available for the last 10 minutes, no data is displayed, as seen in your setup.
If there is no space to store data, a "Not enough space available; Stopping Performance Advisor" message will be logged in the server.log file.
If the server is unable to contact the filer, messages will also be logged in the server.log file. Is it possible to send us your server.log file?
Regards
Harish
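As a rough way to check for the out-of-space condition described above, the log can be searched for the quoted message. The sketch below is a generic text search, not a DFM tool: the function names are made up for this example, the message text is taken verbatim from the reply, and the location of server.log is not given in the thread, so the path must be supplied by whoever runs it.

```python
# Message text quoted in the reply above.
SPACE_MESSAGE = "Not enough space available; Stopping Performance Advisor"


def find_space_warnings(lines):
    """Return (line_number, text) for every line containing the message."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SPACE_MESSAGE in line
    ]


def scan_log(path):
    """Search a log file at a caller-supplied path (location not assumed here)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_space_warnings(handle)
```

Calling scan_log with the path to your own server.log returns an empty list if the message never appears. That rules out the disk-space cause, but it says nothing about connection problems, which would be logged with other messages.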