Is there a performance impact on volumes that have Data Assurance enabled? If there is an impact, how much?
I am seeing poor performance with RAID6 (8+2) LUNs accessed over Fibre Channel.
For each E5700 with 210 drives I am seeing close to 6GB/sec of reads. The request size is 8MB and the application is GPFS.
There can be a slight performance degradation with data assurance turned on, but it should not be severe. Can you give me some more information? Drive type and percent read/write would help.
These are regular 7200 RPM 10TB SATA drives. With 100% sequential reads at an 8MB block size and ~15-20 concurrent IOs per LUN, we are seeing about 300MB/sec from a single (8+2) RAID6 LUN. We have 21 LUNs, so ~6GB/sec total.
How can I identify the bottleneck? It is not the HBAs, the servers, or the network.
We have a total of 8 x 32Gb ports and 4 x 16Gb ports, and we are using all of them. Three servers are connected to the storage: two attached at 4 x 32Gb and the third at 4 x 16Gb.
Each of the three servers is connected to the network at 4 x 25GbE.
We have a similar setup with different storage and are seeing 2x the performance, which makes me think the problem is purely with the storage and not with the application (GPFS), the Linux configuration, or the host/network connectivity.
Yes, the system is in production, but we can run short synthetic tests.
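For a short synthetic test that mirrors the workload described above (100% sequential reads, 8MB block size, ~15-20 outstanding IOs per LUN), something like the following fio job could be used. This is only a sketch: the device path /dev/mapper/lun0 is a placeholder, and the parameters are assumptions based on the numbers quoted in this thread.

```ini
; Sketch of an fio job approximating the described workload.
; /dev/mapper/lun0 is a hypothetical device path -- substitute your own LUN.
[global]
ioengine=libaio
direct=1          ; bypass the page cache so the array itself is measured
rw=read           ; 100% sequential reads
bs=8m             ; 8MB block size, matching the application
iodepth=16        ; roughly 15-20 concurrent IOs per LUN
runtime=60
time_based=1

[lun0]
filename=/dev/mapper/lun0
```

Running one job per LUN (or adding more [sections]) would approximate the 21-LUN aggregate test.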
What kind of sequential streaming performance can we expect from an E5700 with 212 10TB drives? Note: the block size is fairly large (8MB).
Reading from controller cache, I get pretty good speeds: I was able to get 17GB/sec.
This is what I get from iostat under production traffic:
Device:  rrqm/s  wrqm/s    r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm   %util
dm-12    132.00   14.00  37.00  42.00  157.91  24.28   4722.94      4.73  60.00   126.70     1.24  12.70  100.30
The RAID6 (8+2) LUN is doing a total of only 79 IOPS (r/s + w/s), and the LUN is already saturated (%util ~ 100). The average request size is not that small either. I am sure an (8+2) LUN can do more than this.
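As a sanity check on the avgrq-sz column, which iostat reports in 512-byte sectors, the observed value works out to roughly 2.3MB per request, well below the 8MB the application issues, which suggests the kernel block layer is splitting the IOs. A quick conversion, assuming the standard 512-byte sector unit:

```shell
# iostat's avgrq-sz is in 512-byte sectors; convert the observed value to MB
avg_mb=$(awk 'BEGIN { printf "%.2f", 4722.94 * 512 / 1048576 }')
echo "average request size: ${avg_mb} MB"   # prints: average request size: 2.31 MB
```

At 2.31MB per request, each 8MB application read is being split into three or four requests before it reaches the array.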
What other things can I look at to troubleshoot this further?
It looks to me like you're configured properly, which is great. Could you get me the following information:
Navigate to the queue directory for one of your DM devices, for example: cd /sys/block/dm-12/queue/
Then grep everything in that directory: grep . *
Please send me the grep output.
Could you also send me a support bundle? If possible, collect one, discard it, run your workload again, and then collect a second support bundle so we get a clean log. To collect the support bundle, open SANtricity System Manager in a browser and navigate to Support --> Support Center --> Diagnostics --> Collect Support Bundle.
[root@queue]# grep . *
grep: iosched: Is a directory
scheduler:noop [deadline] cfq
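Beyond the scheduler, the other queue parameters in that directory are worth checking, since they govern how large a request can get before the block layer splits it. A small sketch for pulling the ones most relevant to large sequential reads (dm-12 is just the example device name from the iostat output above; files that do not exist print n/a):

```shell
# Print the queue settings most relevant to large sequential reads.
# dm-12 is an example device name; substitute your own.
dev=dm-12
for f in scheduler nr_requests read_ahead_kb max_sectors_kb max_hw_sectors_kb; do
    val=$(cat "/sys/block/$dev/queue/$f" 2>/dev/null) || val="n/a"
    printf '%s: %s\n' "$f" "$val"
done
```

If max_sectors_kb turns out to be far below 8192, each 8MB application read is split into several smaller requests before reaching the array; it can generally be raised up to max_hw_sectors_kb by writing the new value into the same file.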
Could you also get a screenshot of the segment size you're using for your volumes, along with your caching options?
You can get this information from SANtricity System Manager: go to Storage --> Volumes, highlight any volume, then View/Edit Settings --> Advanced tab.