Re: Data assurance (T10Pi) performance impacts

Bugs · ‎2018-08-20

Is there a performance impact on volumes which have data assurance enabled? If there is a performance impact, how much is the performance impact.

I am seeing poor performance with RAID6 (8+2) LUNS accessed through fibre channel.

For each E5700 with 210 drives I am seeing close to 6GB/sec reads. Request size is 8MB and application is GPFS.

MitchBlackburn · ‎2018-08-20

There can be a slight performance degradation with data assurance turned on, but it should not be severe. Can you give me some more information? Drive type and percent read/write would help.

Bugs · ‎2018-08-20

These are regular SATA 7200 RPM 10TB drives. With 100% sequential reads, all 8MB block size with ~15 - 20 concurrent IOs per LUN, we are seeing about 300MB/sec for a single (8+2) RAID6 lun. We have 21 LUNs, so ~6GB/sec.

How can I identify the bottleneck. The bottleneck is not the HBAs or the servers or the network.

MitchBlackburn · ‎2018-08-20

How many ports on the back of each controller are connected to the network? Are you using 32 or 16 Gb? Is this system already in production?

Bugs · ‎2018-08-20

We have a total of 8 32Ge ports and 4 16ge ports and we are using all of them. Three servers connected to the storage, with two of the servers attached at (4 * 32Ge) and the third one attached at (4*16Ge).

Each of the 3 servers is connected at (4 * 25ge) network.

We have a similar setup but with different storage and are seeing 2X performance, which makes me think that the problem is purely with storage amd not with application(GPFS) or linux configuration or host/network connectivity.

Yes, system is in production but we can run short synthetic tests.

What kind of sequential streaming performance can be expect from a E5700 with 212 10TB drives? Note - Block size is fairly large (8MB)

DannyL · ‎2018-08-20

If you can, could you run a quick test to cache to make sure that your system is setup properly? Please let me know what number you hit on this test.

Thanks,

Danny

Bugs · ‎2018-08-20

Which cache are you referring to? Our controller cache is tiny (32GB). How can I run a test to cache?

DannyL · ‎2018-08-20

What io tool are you running? Depending on this you will reduce the maximum disk size of the lun youre running to to something small like 8MB.

Bugs · ‎2018-08-22

Reading from controller cache, I get pretty good speeds. I was able to get 17GB/sec reading from cache.

This is what I get with production traffic when I run iostat.

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util

dm-12 132.00 14.00 37.00 42.00 157.91 24.28 4722.94 4.73 60.00 126.70 1.24 12.70 100.30

The RAID6 (8+2) LUN is doing a total of 79 IOPS and the LUN is already saturated (%util ~ 100). Average request size is not that small. I am sure a (8+2) LUN can do more than this.

What other things can I look at to troubleshoot this further?

DannyL · ‎2018-08-22

It looks to me like you're configured properly, which is great. Could you get me the following information:

navigate to the queue folder on one of your DM devices. To do this: cd /sys/block/dm-12(for example)/queue/

Then grep all of the information in that directory. Run: > grep . *

Please send the me the grep output.

Could you also send me a support bundle? If possible, collect it once, throw it away, run your workload again and then collect the support bundle again so we get a clean log. To collect the support bundle open up your Santricity System Manager in a browser. Navigate to Support --> support center --> diagnostics --> collect support bundle.

Thanks,

Danny

Bugs · ‎2018-08-23

[root@queue]# grep . *
add_random:1
discard_granularity:0
discard_max_bytes:0
discard_zeroes_data:0
hw_sector_size:512
grep: iosched: Is a directory
iostats:1
logical_block_size:512
max_hw_sectors_kb:32767
max_integrity_segments:65535
max_sectors_kb:8192
max_segments:1024
max_segment_size:65536
minimum_io_size:4096
nomerges:0
nr_requests:128
optimal_io_size:0
physical_block_size:4096
read_ahead_kb:1024
rotational:1
rq_affinity:1
scheduler:noop [deadline] cfq
unpriv_sgio:0
write_same_max_bytes:33553920
[root@ queue]#

Bugs · ‎2018-08-23

Attaching the support data file.

DannyL · ‎2018-08-24

Could you send that support bundle to me directly, please. You can send it to daniel.landes@netapp.com

Thanks,

Danny

DannyL · ‎2018-08-27

Could you also get a screenshot of the segment size you're using for your volumes and your caching options.

You can get this information from SANtricity System Manager. Go to Storage --> Volume (highlight any volume) --> View/Edit Settings --> advanced tab

Thanks,

Danny

Data assurance (T10Pi) performance impacts

And the Legacy Continues! 🏆