lun latency increasing

I have a LUN that holds the SQL data file, and it is showing increased latency.

The data from this month's storage report shows LUN latency increasing for the database data file LUNs. Past experience has shown that users become affected at about the 11 ms mark, and action should be taken before latency reaches that point. A NetApp support case has been opened to try to address the performance issue.

How do I show whether this is a caching or a fragmentation issue? I have been looking through the perfstat reports and everything looks healthy apart from the latency. I am not sure what "cp_dirty_allocation_blks" is, but it is 1000+.

Read Write   Read  Write Average   Queue  Lun
  Ops   Ops     kB     kB Latency  Length
    0     0     32      0    7.54    0.07 /vol/sqf02/diskf.lun
    0     0     37      0    7.00    1.00 /vol/sqf02/diske.lun


Read Write   Read  Write Average   Queue  Lun
  Ops   Ops     kB     kB Latency  Length
    1     0     80      0   11.31    0.08 /vol/sqf02/diskf.lun
    2     0    123      0   10.96    0.08 /vol/sqf02/diske.lun

CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
20%     0  1541     0    2140   376  5881  12354   7140     0     0    19   94%  16%  F   28%    595     2  9228  5290     1     0
26%     0  1162     0    2172   498  3993  14646  20264     0     0    18   95%  39%  3f  31%   1006     2 18393  8112     1     0

lun:sqf02/diske.lun-XXXXZZZZZ:display_name:/vol/sqf02/diske.lun
lun:sqf02/diske.lun-XXXXZZZZZ:read_ops:0/s
lun:sqf02/diske.lun-XXXXZZZZZ:write_ops:2/s
lun:sqf02/diske.lun-XXXXZZZZZ:other_ops:0/s
lun:sqf02/diske.lun-XXXXZZZZZ:read_data:36798b/s
lun:sqf02/diske.lun-XXXXZZZZZ:write_data:19872b/s
lun:sqf02/diske.lun-XXXXZZZZZ:queue_full:0/s
lun:sqf02/diske.lun-XXXXZZZZZ:avg_latency:22.17ms   <-------------------- Why?
lun:sqf02/diske.lun-XXXXZZZZZ:total_ops:3/s
lun:sqf02/diske.lun-XXXXZZZZZ:scsi_partner_ops:0/s
lun:sqf02/diske.lun-XXXXZZZZZ:scsi_partner_data:0b/s
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.0:98%
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.1:0%
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.2:0%
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.3:0%
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.4:0%
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.5:0%
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.6:0%
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.7:0%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.0:86%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.1:0%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.2:0%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.3:0%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.4:0%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.5:0%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.6:0%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.7:0%
lun:sqf02/diske.lun-XXXXZZZZZ:read_partial_blocks:1%
lun:sqf02/diske.lun-XXXXZZZZZ:write_partial_blocks:13%

Thanks if you know the answer

Bren
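For anyone wanting to track these counters over time, here is a minimal Python sketch that reads lines in the `object:instance:counter:value` form posted above and pulls out the latency and alignment figures. The function names and the 11 ms constant are my own (the limit comes from the "11 ms mark" in the post), and I am assuming that bucket 0 of the align histograms counts I/O that starts on a block boundary, which is how these histograms are usually read.

```python
import re

# Sample lines copied from the counter listing in the post above.
SAMPLE = """\
lun:sqf02/diske.lun-XXXXZZZZZ:display_name:/vol/sqf02/diske.lun
lun:sqf02/diske.lun-XXXXZZZZZ:avg_latency:22.17ms   <-------------------- Why?
lun:sqf02/diske.lun-XXXXZZZZZ:read_align_histo.0:98%
lun:sqf02/diske.lun-XXXXZZZZZ:write_align_histo.0:86%
lun:sqf02/diske.lun-XXXXZZZZZ:read_partial_blocks:1%
lun:sqf02/diske.lun-XXXXZZZZZ:write_partial_blocks:13%
"""

# Assumption: the point at which users notice, per the original post.
LATENCY_LIMIT_MS = 11.0


def parse_counters(text):
    """Return {counter_name: float} for every numeric lun counter line."""
    counters = {}
    for line in text.splitlines():
        parts = line.strip().split(":", 3)  # object, instance, counter, value
        if len(parts) != 4 or parts[0] != "lun":
            continue
        number = re.match(r"\d+(?:\.\d+)?", parts[3])
        if number:  # skips non-numeric values such as display_name
            counters[parts[2]] = float(number.group())
    return counters


def summarize(counters):
    """Collect the figures that matter for a latency and alignment check."""
    latency = counters.get("avg_latency")
    return {
        "latency_ms": latency,
        "latency_over_limit": latency is not None and latency > LATENCY_LIMIT_MS,
        "aligned_reads_pct": counters.get("read_align_histo.0"),
        "aligned_writes_pct": counters.get("write_align_histo.0"),
        "partial_writes_pct": counters.get("write_partial_blocks"),
    }


if __name__ == "__main__":
    print(summarize(parse_counters(SAMPLE)))
```

High bucket-0 percentages would suggest the I/O is mostly aligned. These counters on their own say nothing about fragmentation either way.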

Re: lun latency increasing

Sorry better image of graph...

Re: lun latency increasing

Hi Bren,

I'm not saying this is definitely a case of fragmentation, but can you check the LUN in question for it first?

reallocate measure [-l logfile] [-t threshold] [-i interval] [-o] pathname | /vol/volname
Start a measure-only reallocation on the LUN, large file or volume.
A measure-only reallocation job is similar to a normal reallocation job except that only the check phase is performed. This allows the optimization of the LUN, large file or volume to be tracked over time, or measured ad-hoc.

Regards,

Radek

Re: lun latency increasing

What sort of load does it put on the filer? Can I run it during the day? I have a SnapMirror of this volume. Can I run it on the remote site and still get a valid result?

Thanks

Bren

Re: lun latency increasing

Yeah, as per this thread (which you're already familiar with) reallocation is rather poorly documented:

http://communities.netapp.com/message/20969#20969

My (informed) guesses:

- reallocate should be run on the original LUN, not its mirror: the logical-to-physical layout may be different at the destination, so in my opinion different results are likely

- arguably there is some additional load on the filer (actual reads are undertaken), so running reallocate outside of peak hours seems to be a reasonable approach

Regards,
Radek

Re: lun latency increasing

If 1 is good and 5 is very bad:

"Allocation check on '/vol/test/diske.lun' is 5, hotspot 0 (threshold 4), consider running reallocate."

I think that is a fair clue as to what is wrong....

Thanks

Bren

Re: lun latency increasing

The actual scale goes up to 10:

http://now.netapp.com/NOW/knowledge/docs/ontap/rel732_vs/html/ontap/cmdref/man1/na_reallocate.1.htm

The threshold when a LUN, file or volume is considered unoptimized enough that a reallocation should be performed is given as a number from 3 (moderately optimized) to 10 (very unoptimized). [...] The default threshold is 4.

Having said that, I've heard stories of people getting fairly low numbers during measurement, yet seeing their performance vastly improve once they actually ran reallocation.
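If you want to log measurements over time, the result message quoted earlier in this thread is easy to pick apart. Below is a rough Python sketch, not anything from NetApp: the regular expression is built only from the one message posted here, so other ONTAP releases may word it differently, and `parse_measure` is a hypothetical helper name. I am also assuming that a value strictly above the threshold is what triggers the "consider running reallocate" hint; the thread does not show what happens when the two are equal.

```python
import re

# The message posted earlier in the thread, joined onto one line.
MESSAGE = ("Allocation check on '/vol/test/diske.lun' is 5, hotspot 0 "
           "(threshold 4), consider running reallocate.")

# Pattern derived from that single sample message (an assumption).
PATTERN = re.compile(
    r"Allocation check on '(?P<path>[^']+)' is (?P<level>\d+), "
    r"hotspot (?P<hotspot>\d+) \(threshold\s+(?P<threshold>\d+)\)"
)


def parse_measure(message):
    """Return the measured values as a dict, or None if the text does not match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    result = {"path": match.group("path")}
    for key in ("level", "hotspot", "threshold"):
        result[key] = int(match.group(key))
    # Assumption: strictly greater than the threshold means "unoptimized enough".
    result["above_threshold"] = result["level"] > result["threshold"]
    return result


if __name__ == "__main__":
    print(parse_measure(MESSAGE))
```

Given the stories above, a result at or below the threshold is probably best treated as inconclusive, not as proof that the layout is fine.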

Regards,
Radek

Re: lun latency increasing

Hi Bren,

Did you by any chance manage to verify that fragmentation was indeed the issue?

Did you do the actual reallocation, and did it reduce latency?

Kind regards,

Radek

Re: lun latency increasing

Still working on the issue with TSE. Have not found the root problem yet. Will post back with solution once it has been discovered.

Bren

Re: lun latency increasing

TSE have said to run reallocate against the LUNs. I have to wait until 7 April, due to change control, to get the results. Will post back if it is a success.

Thanks all for help

Bren