V-Series Array LUN Sizes

MRJORDANG · ‎2013-07-24

Hello,

The Best Practices for V-Series Guide states the following:

"On the storage array, you should build LUNs on numerous dedicated array RAID groups. Ideally, from a given RAID group, you should build two or more large LUNs. When assigning the array LUNs to array ports, you should have at least 16 LUNs per port to make use of all the available disk queues. Data ONTAP treats each LUN as a separate device, and each LUN is counted toward the maximum device count limitations for a given V-Series platform, so a few dozen large LUNs are generally better than hundreds of small LUNs."

We have a small Clariion CX4-120 carved into 2 RAID groups (RAID 5). Each RAID group consists of 7 disks. For the sake of simplicity, just assume we are using 2 Clariion array ports.

Does the above statement from the V-Series Best Practices Guide mean that we would be best off creating a minimum of 32 of the largest LUNs we have space for? (2 RAID groups, each with 16 LUNs; 16 LUNs per array port; 2 array ports.)

Thanks,

MRJG
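
For reference, the count in the question works out as follows. This is a minimal sketch of the arithmetic only, using the guide's 16-LUNs-per-port figure and the two ports and two RAID groups described above. The variable names are made up for the example, and an even split of LUNs across RAID groups is an assumption.

```python
# Figures from the post; names are hypothetical.
LUNS_PER_PORT = 16   # minimum suggested by the quoted guide
ARRAY_PORTS = 2
RAID_GROUPS = 2

total_luns = LUNS_PER_PORT * ARRAY_PORTS
# Assumes LUNs are spread evenly across the RAID groups.
luns_per_raid_group, leftover = divmod(total_luns, RAID_GROUPS)

print(f"total LUNs: {total_luns}")
print(f"LUNs per RAID group: {luns_per_raid_group} (leftover: {leftover})")
```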

isaacs · ‎2013-07-24

Most deployments are using many more disks than a single shelf. That recommendation is more geared towards them. Just make 8 LUNs total per aggr.

Thanks!

MRJORDANG · ‎2013-07-24

Thanks for the response! You are correct in that our CX4-120 only consists of one shelf (15 disks, one of which is used as a hot spare). Also, the shelf on the CX4-120 is full of SSD (EFD). We only plan to use the CX4 LUNs to build a single aggregate on the V-Series.

A few questions:

1. Why did you choose 8 total array LUNs for the aggregate?

2. EMC's best practices guide indicates that sharing disks between SPs can result in reduced performance. In response to that, we decided to carve up our single shelf of SSD (on the Clariion, a total of 14 data disks) into two separate RAID groups (2x7) instead of using just one (1x14). With two separate RAID groups, we will assign all LUNs from a RAID group to a single SP to eliminate the possibility of SP disk contention. I believe disk contention between SPs is a bigger issue with mechanical drives than with SSD, but the document we read didn't explicitly say so. With so few disks in our Clariion, is it a wise decision to split the disks into two separate RAID groups in an effort to avoid SP disk contention? Or would it be better to create a single RAID group consisting of all 14 disks, so each SP has more available disk throughput, with the caveat that there may be disk contention?

Not sure if you have experience in that area but I thought I'd ask. Thanks again for the response.

isaacs · ‎2013-07-25

Ah, that changes EVERYTHING! SSDs are a different beast. They don't have spindles, so we don't have to worry about spindle contention. It's just not an issue. There may still be issues with SPs getting in each other's way, however, so I would follow EMC Best Practices with respect to the number of RAID groups.

But we don't need to worry about having a lot of LUNs; we can just worry about LUN/disk queues.

The most disk queues that ONTAP can assign to a single device (spinning disk or array LUN) is 32. But DQs are not an infinite resource. Two things limit the number of them available to hand out.

1. Target ports typically have 1024 or 2048 disk queues, to be split up between all the LUNs shared on those ports.

2. Initiator ports typically have 256, or with DOT 8.2, 512 DQs.

So the limiting factor for us is the initiator port. Prior to DOT 8.2, we only had 256 DQs. Since we can use up to 32 per device, to make sure we use all of them we should have at least 256/32 devices, or 8 LUNs per port.
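
The queue arithmetic above can be sketched in a few lines. This is only an illustration of the division described in the post, not a NetApp tool. The function name `min_luns_per_port` and the constant name are made up for the example, and the queue counts are the figures quoted above.

```python
# Illustrative only: figures are those quoted in the post, names are hypothetical.
MAX_QUEUES_PER_DEVICE = 32  # most disk queues ONTAP assigns to one device


def min_luns_per_port(initiator_port_queues,
                      max_queues_per_device=MAX_QUEUES_PER_DEVICE):
    """Smallest number of LUNs needed to use every queue on an initiator port."""
    # Ceiling division, so a queue count that is not a multiple of 32
    # still gets enough LUNs to cover the remainder.
    return -(-initiator_port_queues // max_queues_per_device)


for label, queues in [("before DOT 8.2", 256), ("DOT 8.2", 512)]:
    luns = min_luns_per_port(queues)
    print(f"{label}: {queues} queues -> at least {luns} LUNs per port")
```

With 512 queues the same division gives 16 LUNs per port, which lines up with the "at least 16 LUNs per port" figure in the guide quoted at the start of the thread, although nothing in the thread confirms that this is where the guide's number comes from.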

MRJORDANG · ‎2013-07-25

Thanks Daniel! That explains it perfectly.

With respect to the topic of SP disk ownership and drive contention, this is the section of the EMC Clariion Best Practices for Performance and Availability guide that has me thinking about Clariion SP contention and multiple or single Clariion RAID groups. (screenshot because this PDF must not allow copy/paste).

http://www.emc.com/collateral/hardware/white-papers/h5773-clariion-best-practices-performance-availability-wp.pdf

The key points for me are:

Drives are dual ported and can accept IO from both SPs at the same time.
Dual ownership may result in less predictable drive behavior (higher response time) than single ownership because of deeper drive queue lengths.
Single SP ownership of a drive is preferred.

I am assuming all of the above points are valid in the context of SSD as well.

The catch for us is that we are not adding another shelf of disks and creating a second RAID group. That would make things very easy. Instead, we are considering carving the existing 15-disk shelf (SSD) into two RAID groups rather than creating a single 15-disk RAID group. That will cut the total IOPS/throughput available to a single SP in half, but the total available throughput will be the same.
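
The trade-off in the last sentence can be illustrated with a rough sketch. It assumes every drive contributes the same throughput and that throughput adds up linearly, which a real array will not do exactly. `PER_DRIVE_IOPS` is a placeholder value, not a measured or vendor figure, and the 14 drives are the shelf minus the hot spare mentioned earlier in the thread.

```python
# Rough model only. Assumes every drive contributes equal throughput and that
# throughput adds up linearly; RAID 5 parity overhead and SP limits are ignored.
PER_DRIVE_IOPS = 1000  # placeholder, NOT a measured or vendor figure
DRIVES = 14            # 15-disk shelf minus the hot spare


def sp_and_total_iops(raid_groups):
    """Return (IOPS reachable by one SP, total IOPS) for a given layout.

    One RAID group: both SPs share all drives (with possible contention).
    Two RAID groups: each SP owns one group outright.
    """
    drives_per_group = DRIVES // raid_groups
    per_sp = drives_per_group * PER_DRIVE_IOPS
    total = drives_per_group * raid_groups * PER_DRIVE_IOPS
    return per_sp, total


for groups in (1, 2):
    per_sp, total = sp_and_total_iops(groups)
    print(f"{groups} RAID group(s): per-SP {per_sp}, total {total}")
```

Under these assumptions the per-SP figure halves when going from one group to two while the total stays the same. What the model cannot show is whether the contention avoided by single ownership is worth that halving.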

A bit more information: the third-party aggregate made up of the Clariion SSD LUNs will be used to host a single workload. That is to say, the IO footprint of this workload is predictable most of the time, because there are not multiple hosts/workloads sharing the same LUNs. Of course, the LUNs will be split between SPs to distribute the IO evenly.

We tried the single 15-disk RAID group, weren't completely satisfied with the results, and then came across the Clariion Performance and Availability Guide. That is why we are now considering two RAID groups instead of one.

The other catch is that we have no Analyzer license for the Clariion array. So we have to open a case with EMC to see any performance data, which makes troubleshooting performance on the Clariion a week-long process.

isaacs · ‎2013-07-25

I'm thinking that caution is provided with spinning disks in mind. I know they have a communities site much like this. Perhaps it's worthwhile to pose that question there. I don't like to get in the habit of thinking I know better than them.

MRJORDANG · ‎2013-07-25

Good point. I'll check out the EMC forums as well. I posted here because the NetApp V-Series factors into the equation, as it is technically the client attached to the Clariion. The NetApp guides seem to contain V-Series-specific Clariion configurations, though there is little mention of RAID groups. My question seems more of the general Clariion RAID group variety, so perhaps the folks in the EMC forums will have some info.

Thanks again for the info. The note regarding the # of LUNs and queue depths was really helpful!

aborzenkov · ‎2013-07-24

I always wondered ... As Data ONTAP will stripe IO across all disks, dividing a single RAID group into a large number of LUNs should increase disk contention in that RAID group. Or am I missing something obvious?

martin_fisher · ‎2013-07-25

I would expect the same: with smaller single RAID groups and an even number of large LUNs, I would expect to see more disk contention. It would also depend on how busy the LUN itself is and the I/O required, for example by a SQL server.

The contention could be shifted from the SP to the disks instead, although SSDs should be able to cope a bit better than conventional disks.

isaacs · ‎2013-07-25

Exactly. But that is only true of spinning disks. With SSDs it is not an issue, as there is no mechanical arm moving to and fro to find sectors.