Aggregate Design


Hello all. I am having a hard time understanding a core concept of the NetApp architecture relating to aggregates. My problem relates to aggregate design across active/active controllers and the proper way to design a solution with performance in mind. Take this scenario:

2 Controllers (let's say 3240s) in an A/A pair.

1 tray of 24 15K SAS disks, for simplicity. Ignore aggr0 for each controller, RAID groups, etc. for the moment.

I will assume 180 IOPS per drive.

I want to leverage both controllers for active I/O so I have to create an aggregate for ownership by each controller. I want to have luns active on each controller.

I will create aggr1 with 12 drives for a max IOPS of 2160 (12 * 180) owned by controller 1. I will create a flexvol and luns for SQL Server here.

I will create aggr2 with 12 drives for a max IOPS of 2160 (12 * 180) owned by controller 2. I will create a flexvol and luns for Exchange here.

Let's say that the SQL Server is pushing 2000 IOPS from the luns on controller 1.

Let's say that the Exchange Server is pushing 500 IOPS from the luns on controller 2.

If the SQL Server needs another 500 IOPS, my only recourse is to buy more disks and expand the aggregate. My problem is that the Exchange Server has 1660 IOPS (2160 - 500) of performance just sitting there doing nothing.
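To make the arithmetic above explicit, here is a minimal sketch in Python. It uses only the figures from this scenario (180 IOPS per drive, 12 drives per aggregate) and the same simplification of counting every drive as a data drive. The function names are made up for illustration and are not part of any NetApp tool.

```python
IOPS_PER_DRIVE = 180  # per-drive figure assumed in this scenario


def aggregate_max_iops(drives, iops_per_drive=IOPS_PER_DRIVE):
    """Theoretical ceiling for an aggregate, counting every drive as a data drive."""
    return drives * iops_per_drive


def headroom(drives, demand):
    """Unused IOPS in an aggregate; a negative value means a shortfall."""
    return aggregate_max_iops(drives) - demand


sql_headroom = headroom(12, 2000)             # aggr1 today
exchange_headroom = headroom(12, 500)         # aggr2 today
sql_after_growth = headroom(12, 2000 + 500)   # aggr1 if SQL needs 500 more

print(sql_headroom, exchange_headroom, sql_after_growth)
```

With these numbers aggr1 has 160 IOPS to spare, aggr2 has 1660, and the extra SQL load leaves aggr1 340 IOPS short even though the tray as a whole has capacity left.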

Why is it that I cannot create 1 large pool of storage across all 24 disks giving me 4320 IOPS and carve up my luns accordingly? Having a larger pool of disks to get the performance is what I am looking for (a la 3Par). In this case, I could create any number of luns and assign the ownership to controllers. As long as my total IOPS for SQL and Exchange do not exceed 4320 IOPS, I am ok.
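As a rough illustration of the comparison I am asking about, the sketch below checks the same workloads against two 12-drive aggregates and against one hypothetical 24-drive pool. It only models the arithmetic under my 180 IOPS per drive assumption; it says nothing about whether the platform allows such a pool, and the function names are invented for this example.

```python
IOPS_PER_DRIVE = 180  # per-drive figure assumed in this scenario


def max_iops(drives):
    """Theoretical ceiling, counting every drive as a data drive."""
    return drives * IOPS_PER_DRIVE


def split_fits(demand_by_aggr, drives_by_aggr):
    """True only if every aggregate can serve its own workload."""
    return all(
        demand_by_aggr[name] <= max_iops(drives_by_aggr[name])
        for name in demand_by_aggr
    )


def pooled_fits(demand_by_aggr, total_drives):
    """True if the combined workload fits one pool of all drives."""
    return sum(demand_by_aggr.values()) <= max_iops(total_drives)


# SQL grown to 2500 IOPS on aggr1, Exchange at 500 IOPS on aggr2
demand = {"aggr1": 2500, "aggr2": 500}
drives = {"aggr1": 12, "aggr2": 12}

split_ok = split_fits(demand, drives)
pooled_ok = pooled_fits(demand, 24)
print(split_ok, pooled_ok)
```

The split layout fails because aggr1 alone is over its 2160 ceiling, while the combined 3000 IOPS would sit comfortably under the 4320 of a single pool.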

The recommended solution from the NetApp engineers was to create a single large aggregate owned by one controller (so as to leverage all the possible spindles), and have this single controller own all the luns. The second controller would just sit there in case the first controller fails. They claim that the 15000 IOPS that I require would be sufficiently handled by 1 controller. My problem with this approach is that I bought a second controller that is sitting there doing nothing until the first fails.

Am I missing something?



You are correct. The limitation (or rather, how NetApp handles disk assignment in this case) is as you describe: each disk must be owned by one controller or the other, and an aggregate cannot be pooled across controllers. If you are spindle bound, you will need to combine the disks into one aggregate on one controller or get more disks. Depending on your workloads, and on whether you have FlashCache, the NetApp SEs can do sizing to see whether it will work with the disks split 12 and 12 between the two nodes.


Another option would be not to assign the disks "balanced" to the 2 controllers.

Assign 5 disks to the "Exchange" controller to get some space and roughly 500 IOPS.

Then assign the rest of the disks to the "SQL" controller to make sure you have the IOPS where you will most probably need them.
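A quick sketch of what such an unbalanced split might look like in numbers, using the 180 IOPS per drive figure from the original post. The parity_drives parameter is an assumption added here for illustration: RAID-DP uses two parity disks per RAID group, and subtracting them is one possible reading of "roughly 500" for 5 disks. Treat it as back-of-the-envelope arithmetic, not a sizing result.

```python
IOPS_PER_DRIVE = 180  # per-drive figure from the original post


def usable_iops(drives, parity_drives=0):
    """Rough ceiling for an aggregate after setting aside parity drives.

    parity_drives is a hypothetical knob for this sketch; 2 matches a
    single RAID-DP group, 0 matches the simplified model in the question.
    """
    return (drives - parity_drives) * IOPS_PER_DRIVE


TOTAL_DRIVES = 24
exchange_drives = 5
sql_drives = TOTAL_DRIVES - exchange_drives  # 19 drives left for SQL

# Simplified model from the question: every drive counts
exchange_raw = usable_iops(exchange_drives)
sql_raw = usable_iops(sql_drives)

# Assuming one RAID-DP group (2 parity drives) per aggregate
exchange_dp = usable_iops(exchange_drives, parity_drives=2)
sql_dp = usable_iops(sql_drives, parity_drives=2)

print(exchange_raw, sql_raw, exchange_dp, sql_dp)
```

Either way, the SQL side ends up with well over the 2500 IOPS from the example, and the Exchange side stays at or above its 500.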