ONTAP Discussions

Performance impact of putting multiple LUNs into ONE volume vs. one LUN per dedicated volume on the same controller/aggregate

mhe
NetApp

All experts,

I am doing a volume/LUN layout design for a large hosting company.  One component is deciding how to map multiple LUNs to flexible volumes.

For example, given the same controller and aggregate, there are two ways to place 8 LUNs (sketched in CLI below):

  • Option 1: 8 LUNs in one shared volume
  • Option 2: 8 LUNs in 8 volumes, with a one-to-one LUN-to-volume mapping
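
To make the two layouts concrete, here is a rough 7-mode CLI sketch (the aggregate, volume, and LUN names, sizes, and ostype are hypothetical placeholders, not recommendations):

    # Option 1: one shared volume holding all 8 LUNs
    vol create vol_shared aggr1 4t
    lun create -s 400g -t linux /vol/vol_shared/lun1
    lun create -s 400g -t linux /vol/vol_shared/lun2
    # ...and so on through lun8

    # Option 2: one volume per LUN
    vol create vol_lun1 aggr1 500g
    lun create -s 400g -t linux /vol/vol_lun1/lun1
    vol create vol_lun2 aggr1 500g
    lun create -s 400g -t linux /vol/vol_lun2/lun2
    # ...and so on through vol_lun8/lun8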

Is there any performance impact either way? Do we gain any read or write performance by doing a one-to-one LUN-to-volume mapping?

I could not find any guideline, position, or "best practice" on this topic.

The reason I would like to put 8 LUNs in one shared volume is the 500-volume limit on FAS systems. Unless necessary, I don't want to increase the volume count.

Recently, BURT 724792 appeared, which seems to be related to putting multiple files in a shared volume and causing contention/bottlenecks. There is some contention/bottleneck around volume and aggregate affinities; see Jeff Steiner's email about this, quoted below.

So, the question is: is this BURT going to be fixed?

Architecturally, or performance-wise, what should the mapping between LUNs and volumes be? If we do need to distribute LUNs across multiple volumes, what is the threshold for doing so: for example, 500 IOPS, 500 writes/s, 500 reads/s, 80 MB/s of throughput, etc.?

Any feedback is appreciated. I have a meeting with the customer tomorrow.

Thanks,

Michael He

________________________________________________________________________________

Hi all. We had a case where a customer went from 7.3 to 8.1 (7-mode) and experienced serious latency problems. We’ve had a long-running support case and have learned some very surprising things about ONTAP on the new platforms.

ONTAP is getting better at parallelizing work, but in some cases it can cause a performance problem if your configuration prevents the parallelization from happening. The two key concepts are the volume and aggregate affinities. I’m sure the numbers will vary from platform to platform and with time, but for this example let’s say there are 5 volume affinities and 2 aggregate affinities.

If you have 10 databases, and the datafiles for each database are in one and only one volume, the workload would be distributed across the 5 volume affinities, with 2 volumes in each. In our recent case, one of the customer's databases was a very VERY active Oracle standby database running on a perfectly configured 10Gb network. That one volume is hammering the filer with random write activity, and it’s causing performance problems even though we are nowhere near the PQR numbers for the 6280 system.
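
Incidentally, if you want a rough feel for whether a single processing domain is saturated on 7-mode, there is a per-domain CPU breakdown available in advanced privilege. This is a diagnostic sketch only; the domains shown vary by release, and the output is best interpreted with support’s help:

    priv set advanced
    sysstat -M 1    # per-CPU-domain utilization; one pegged domain while others sit idle suggests serialization
    priv set admin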

Engineering believes that, in part, there is a bottleneck because we’re only using one of the available volume affinities for processing. We may need to spread the database over multiple volumes. Furthermore, engineering believes that, in part, there is a bottleneck because we’re only using one of the available aggregate affinities. Increasing the number of aggregates may be required to improve performance. This would be a significant change from the way we’ve normally managed databases, where we never want to take one big aggregate and make two smaller aggregates.
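
As a rough sketch of what that restructuring could look like on 7-mode (the aggregate and volume names, disk counts, and sizes are hypothetical placeholders, not a sizing recommendation):

    # Build a second aggregate, e.g. from 24 spare disks
    aggr create aggr2 -t raid_dp 24

    # Spread the hot database's datafiles over several volumes,
    # split across both aggregates so both aggregate affinities get work
    vol create db_data1 aggr1 1t
    vol create db_data2 aggr1 1t
    vol create db_data3 aggr2 1t
    vol create db_data4 aggr2 1t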

There’s one additional factor here too – workloads interfering with other workloads. The customer reported that when the standby latency spiked, they had problems on other databases too. We believe that the standby database was monopolizing the volume affinity where some other databases were running. This means that the workload on one volume (or aggregate) could interfere with the workloads on a subset of other volumes (or aggregates) but leave the others unaffected. It would depend on the luck of which volumes were associated with which affinities.

I doubt this will affect most customers, but if you have a standby database or any sort of project involving a small number of very active workloads you might need to consider creating more volumes than usual or more aggregates than usual. That might even include splitting a single big aggregate into two smaller aggregates.

If you want to see more details, it’s EPS BURT 724792 and case 2003883537. There are perfstats attached to the case too.

Michael He

Professional Services Consultant
PS-North America - Central

NetApp
847-598-4916 Direct
630-452-0027 Mobile
Michael.He@netapp.com
www.netapp.com

2 REPLIES

brianchama

I have read through the post and it makes me think. We have a customer who recently upgraded their core banking application from Oracle 9 to 10g. Before, they ran the system on Solaris 10, using UFS and a few LUNs for data files and logs. After migrating to the new system, which runs on Solaris 11 and LDOMs, they decided to use Oracle's ASM and created many LUNs (about 50) in the same volume.

The controller is a FAS3240 with SSDs (Flash Pool with SAS disks), but they get high latencies during peak hours. Do you think contention is happening at the volume level?
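
Would comparing per-volume latency counters be a reasonable first check? Something like this sketch is what I had in mind (vol_name is a placeholder for the shared ASM volume):

    sysstat -x 1                                # overall CPU, disk utilization, and latency picture
    stats show volume:vol_name:avg_latency      # per-volume latency counters
    stats show volume:vol_name:read_latency
    stats show volume:vol_name:write_latency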

Please assist

Regards

CHRISMAKI

Best practice is one LUN per volume, but I can't remember why. I've broken this before to take advantage of dedupe, which is at the volume level.
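
For example, on 7-mode, dedupe is enabled per volume, so co-locating similar LUNs in one volume lets their blocks dedupe against each other (a sketch; the volume name is a placeholder):

    sis on /vol/vol_shared        # enable dedupe on the shared volume
    sis start -s /vol/vol_shared  # scan existing blocks, not just new writes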
