Re: Do IOPS on Aggregate Performance Report include IO from cache Hits?

husselbees · 2009-10-20

Hi - I'm looking to find (for each aggregate in a FAS3070 Filer) how many IOPS are being delivered at peak.

I'm working on the theory that each data disk should be capable of delivering approx 180 IOPS (they're 15k FC disks), so I can derive a "theoretical" performance limit for each aggregate (180 x number of data disks), beyond which latency is likely to suffer.

The reason I want to know is that we are considering moving to 300GB disks, but there is little point in getting extra capacity if I can't practically make use of it (I get double the capacity but I get no more performance).

My storage administrator has produced an Operations Manager aggregate report (OPS Manager – ‘dfm report view aggregates-performance-summary’) showing a peak level of IOPS that would equate to approx 600 IOPS per data disk.

600 IOPS per disk doesn't sound realistic (nobody is complaining of poor latency/performance)

Could it be that the aggregate performance report is including IOPS from Cache Hits??? (In which case this is the wrong report for me - I am purely interested in Disk IOPS).

Can anyone help?

Stu

P.S. Summary below:

Filer Name	Aggregate	# Data Disks	Theoretical IOPS maximum	Monitored IOPS	% utilised
Filer1	aggr1	70	12601	10748	85%
Filer2	aggr1	68	12241	8841	72%
Filer3	aggrebg	47	8461	10239	121%
Filer4	aggr1	37	6661	21168	318%
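
For anyone wanting to reproduce the arithmetic behind the summary table, here is a minimal Python sketch. It assumes the 180 IOPS-per-disk rule of thumb from the post; the names IOPS_PER_DISK, AGGREGATES, and utilisation are illustrative only. Note that plain 180 x disks gives ceilings one lower than those in the table (12600 rather than 12601, for example), so the table was presumably built with slightly different rounding.

```python
# Assumption taken from the post: roughly 180 IOPS per 15k FC data disk.
IOPS_PER_DISK = 180

# (filer, aggregate, data disks, monitored peak IOPS) copied from the summary table.
AGGREGATES = [
    ("Filer1", "aggr1", 70, 10748),
    ("Filer2", "aggr1", 68, 8841),
    ("Filer3", "aggrebg", 47, 10239),
    ("Filer4", "aggr1", 37, 21168),
]


def utilisation(data_disks, monitored_iops, iops_per_disk=IOPS_PER_DISK):
    """Return (theoretical ceiling, percent utilised, monitored IOPS per disk)."""
    ceiling = data_disks * iops_per_disk
    percent = 100.0 * monitored_iops / ceiling
    per_disk = monitored_iops / data_disks
    return ceiling, percent, per_disk


if __name__ == "__main__":
    for filer, aggr, disks, monitored in AGGREGATES:
        ceiling, percent, per_disk = utilisation(disks, monitored)
        print(f"{filer} {aggr}: ceiling={ceiling} "
              f"utilised={percent:.0f}% per-disk={per_disk:.0f}")
```

A per-disk figure far above the rule of thumb (as on Filer4) is what suggests the report may be counting more than physical disk reads and writes.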

ivan_huter · 2010-01-12

Harish,

I'm still unable to see the IOPS report that I created with DFM (Operations Manager). I don't even know where to look or how to troubleshoot this. I called NetApp support, and they are usually very good and patient, but I stumbled upon a guy who told me, "Are you asking me to train you how to use DFM?" Perhaps that was the treatment I got because I only have a demo license while I evaluate the product and decide whether we should buy it. I gave him an analogy about a fast car lacking an odometer, but soon gave up trying to explain my frustration to him and turned to the online community. Let me know if you have any ideas on how to figure out why I'm unable to see the data in those reports.

Thanks again for responding to my rants.

Regards,

Ivan

harish · 2010-01-13

Ivan,

Let us start from the basics; please bear with me. I ran the following steps, which detail how to set up the storage system and how to access performance information related to aggregates.

Please follow these steps and let me know if you encounter anything different that could cause performance information not to be reported.

In the following examples, toaster is the name of my storage system or filer.

1. Add a storage system to DFM.

dfm host add -N toaster

2. Set login credentials for the storage system so that DFM starts collecting performance data.

dfm host set toaster hostlogin=root hostpassword=XXX

3. Make sure the login is correctly set and DFM is able to communicate with the storage system.

dfm host diag toaster

perfAdvisorTransport Passed

4. Wait for a while (say 30 minutes).

5. Make sure aggregate data is being collected.

dfm perf data describe "aggregate basic" toaster

   Counter Group:      Aggregate Basic
   Host Name:          toaster
   File Name:          perf_3_1891_8
   Number Records:     32
   Interval (secs):    60
   Max Records:        10080
   Used Space (bytes): 2048
   Oldest Record:      Wed Jan 13 16:10:08 2010
   Newest Record:      Wed Jan 13 16:41:10 2010

From the above output, I can see that 32 records have been collected.

6. Run a performance report (available by default) to view performance data of aggregates.

dfm report view aggregates-performance-summary

Object ID Aggregate Storage System Total Ops/Sec Perf Threshold Violation Count Perf Threshold Violation Period (Sec)
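
If you want to check step 5 from a script rather than by eye, the output of dfm perf data describe can be read as simple "Key: value" lines. The following is a rough Python sketch under that assumption; parse_describe and SAMPLE are hypothetical names, and the sample text is just the output shown in step 5.

```python
# Output copied from step 5 above; a real run would capture this from the CLI.
SAMPLE = """\
Counter Group: Aggregate Basic
Host Name: toaster
File Name: perf_3_1891_8
Number Records: 32
Interval (secs): 60
Max Records: 10080
Used Space (bytes): 2048
Oldest Record: Wed Jan 13 16:10:08 2010
Newest Record: Wed Jan 13 16:41:10 2010
"""


def parse_describe(text):
    """Parse 'Key: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    info = parse_describe(SAMPLE)
    print(info["Host Name"], "has", int(info["Number Records"]), "records;",
          "newest:", info["Newest Record"])
```

Splitting on the first colon only keeps the timestamps intact, since they contain colons themselves. To feed it live data, the text could be captured with Python's subprocess.run(..., capture_output=True, text=True) around the same dfm command.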

ivan_huter · 2010-01-13

Harish,

I followed your steps to a tee. I already had both of my filers added, but I went ahead and set a different domain account instead of root. Both of my filers get a 'Passed' result for perfAdvisorEnabled and perfAdvisorTransport when I run dfm host diag. However, when I run 'dfm perf data describe "aggregate basic" {filer}', here's what I get:

X:\>dfm perf data describe "aggregate basic" chnetapp9
   Counter Group:      Aggregate Basic
   Host Name:          CHNETAPP9
   File Name:          perf_3_76_8
   Number Records:     28
   Interval (secs):    60
   Max Records:        10080
   Used Space (bytes): 6512
   Oldest Record:      Mon Jan 04 20:10:30 2010
   Newest Record:      Mon Jan 04 20:37:29 2010

X:\>dfm perf data describe "aggregate basic" chnetapp10
   Counter Group:      Aggregate Basic
   Host Name:          CHNETAPP10
   File Name:          perf_3_74_8
   Number Records:     120
   Interval (secs):    60
   Max Records:        10080
   Used Space (bytes): 20768
   Oldest Record:      Mon Jan 04 20:38:49 2010
   Newest Record:      Mon Jan 11 10:30:18 2010

As you can see, it looks like the first filer collected 28 records and the other one 120. The way I read this, data gets collected every 60 seconds on CHNETAPP10. How come I have only 120 records when the oldest one is from 1/4/2010? I'd expect far more than 120 if the interval is set to 60 seconds. Please advise.

I still do not see any data in aggregate reports, though.

X:\>dfm report view aggregates-performance-summary
Object ID Aggregate            Storage System                   Total Ops/Sec Perf Threshold Violation Count Perf Threshold Violation Period (Sec)
--------- -------------------- -------------------------------- ------------- ------------------------------ -------------------------------------
397       ch9_450fc15k_aggr02  CHNETAPP9.domain.com
399       ch9_450fc15k_aggr01  CHNETAPP9.domain.com
401       ch9_300fc10k_aggr01  CHNETAPP9.domain.com
403       aggr0                CHNETAPP9.domain.com
92        ch10_450fc15k_aggr02 CHNETAPP10.domain.com
94        ch10_450fc15k_aggr01 CHNETAPP10.domain.com
96        aggr0                CHNETAPP10.domain.com

What am I missing here?

ivan_huter · 2010-01-13

On a closer look, I see that the last time data was collected was on 1/4 and 1/11. So what's preventing perf data from being collected?

Thanks,

Ivan

harish · 2010-01-13

For the first storage system (or filer), data was collected for around 28 minutes. Hence 28 records is what is expected, and you should see performance data for this storage system. Are you sure that the "dfm report view aggregates-performance-summary" CLI does not report data for aggregates of this storage system? In that case it appears to be a defect. Can you also please check whether you can see aggregate data for this storage system from NMC? Let me know if you need help accessing NMC.

For the second storage system, only 120 samples were collected between Jan 4 and Jan 11, which is far fewer than expected. For a 7-day period, around 10,000 samples should have been collected. Some reasons I can think of for this discrepancy are:

1. DFM was not initiating data collection (the process is not running, or there is no space to store data).

2. DFM is unable to contact the storage system (either the storage system is down, it's too busy, or there was a network problem).

Regards

Harish
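
The "far fewer than expected" estimate can be made concrete with a short Python sketch. It assumes one record per interval between the Oldest Record and Newest Record timestamps printed by dfm perf data describe, and an English locale for the day and month names; expected_records and STAMP_FORMAT are illustrative names, not part of DFM.

```python
from datetime import datetime

# Timestamp layout as printed by the CLI, e.g. "Mon Jan 04 20:38:49 2010".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def expected_records(oldest, newest, interval_secs):
    """Records expected if one sample landed every interval from oldest to newest."""
    start = datetime.strptime(oldest, STAMP_FORMAT)
    end = datetime.strptime(newest, STAMP_FORMAT)
    elapsed = (end - start).total_seconds()
    return int(elapsed // interval_secs) + 1  # +1 counts the first sample


if __name__ == "__main__":
    # Values reported for CHNETAPP10 earlier in the thread.
    expected = expected_records("Mon Jan 04 20:38:49 2010",
                                "Mon Jan 11 10:30:18 2010", 60)
    actual = 120
    print(f"expected ~{expected}, got {actual} "
          f"({100.0 * actual / expected:.1f}% of expected)")
```

This only measures the gap between the first and last stored samples; it says nothing about why collection stalled, which is where the two causes listed above come in.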

ivan_huter · 2010-01-14

Harish,

I’m sure there’s no data collected for either of the two filers. It looks like I was able to collect some of the data initially, but that’s not working any more. The “Top Aggregate” report in NMC does not have any data in it. I would expect to see at least some results in Operations Manager, since it collected some data initially.

1. How do I check if there’s a lack of space to store the data?

2. Wouldn’t I be able to see if DFM is unable to contact the storage system? When I run ‘dfm host diag’ I see no errors. I also have green status and “Good” login credentials for both storage systems in NMC > Setup Hosts.

Thanks,

Ivan

harish · 2010-01-14

The "Top Aggregates" report in NMC (or any other bar chart in NMC, for that matter) shows data collected in the last 10 minutes. If no data is available for the last 10 minutes, no data is displayed, as seen in your setup.

If there is no space to store data, a "Not enough space available; Stopping Performance Advisor" message will be logged to the server.log file.

If the server is unable to contact the filer, messages will be logged to the server.log file. Is it possible to send us your server.log file?

Regards

Harish
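
As a quick first pass before sending the log, server.log can be searched for the out-of-space message quoted above. This is only a sketch: the location of server.log is not given in the thread, so the path below is a placeholder, and find_matches and scan_log are hypothetical helper names. Adding a filer hostname as an extra search term to catch contact failures is likewise an assumption about what those messages contain.

```python
# Message text quoted in the reply above.
SPACE_MESSAGE = "Not enough space available; Stopping Performance Advisor"


def find_matches(lines, needles):
    """Yield (line number, line) for lines containing any needle, ignoring case."""
    lowered = [needle.lower() for needle in needles]
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(needle in text for needle in lowered):
            yield number, line.rstrip("\n")


def scan_log(path, needles=(SPACE_MESSAGE,)):
    """Return all matching lines from the log file at the given path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_matches(handle, needles))


if __name__ == "__main__":
    # Placeholder path: point this at the server.log of your DFM installation.
    for number, line in scan_log("server.log", (SPACE_MESSAGE, "chnetapp10")):
        print(f"{number}: {line}")
```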