I'm new to NetApp technology and in the learning phase. I have a customer using NetApp as their storage solution for our SQL Server based OLTP application. The database is in the 1-2 TB range of which about 90% is LOB data since we store binaries in the database and not outside. Now the standard installation configuration only has one filegroup, which means both row data and LOB data is in that filegroup which only has one file. I know the customer only has one NetApp aggregate, according to them its "best practice", and separate LUNs for OS, SQL Server application, our application database, the transaction logs and system databases. My first question is regarding the one aggregate configuration - since the transaction logs and database files are on the same aggregate they must share the same disks even though they are on different LUNs? Or does that depend on how the LUNs were created? Would a second aggregate be better for the transaction logs?
I'm also trying to convince the customer that migrating LOB column data to its own separate filegroup - with the files on dedicated disks - allowing for row data to reside on their own disks and not be fragmented by LOB pages would help performance. They are claiming that this change would not help performance whatsoever due to their NetApp design. I disagree but I'm not familiar with NetApp technology so have a hard time arguing with them.
it depends slighty on the size of your netapp head and the ammount of disks. Usualy you have a netapp cluster, eg FAS3140A and each head holds a set of disks. then yes, you can have the databases on one head and the log files on the other head.
Besides that, you usualy want to create the biggest aggregate possible to have the largest amount of raid groups & spindles in a given aggregate to get the smoothest performance. netapp usualy prefers 1 big aggregate over 2 smaller ones.
I would split row & LOB to two different luns but still on the same head & disks if possible. Having a seperate lun will give you a different lun queue for that io but still the same disk performance of a most biggest aggregate. Fragmentation will happen on every file system out there but unless the netapp isnt 99% full, fragmentation shouldnt be a huge concern. Netapp has way more "magic" in its disk subsystem & caching algorithms than usual UNIX or Windows servers.
Thanks for your reply. I can see that the performance of a bigger aggregate is something you want, but surely this must depend on the nature of the workload as well - relation between random access read/write of the database and sequential write of the transaction log if they must compete for access to the same disks. Also I wasnt concerned about the fragmentation on a file system level, but the internal fragmentation in the (now) single .mdf file that contains the database pages - both rowdata and LOB data. I would expect a performance boost having separate files for these types of pages - especially for queries that only access row data.
Another simple question, when you create a LUN on an aggregate, can you define which RAID groups to utilize so you can, even though maybe you shouldnt, define which disks a LUN uses? Or does the aggregate anonymize the disks complete as I think it does?
Concurrent, different IO characteristics are of course a concern, its a matter tho if you have the needed amount of disks. If you have 2-3 disk shelves, having one bigger aggregate is probably better than creating 2 aggregates using 1 1/2 disk shelves each. If we are talking about 10+ disk shelves, of course you can create (and probably must because of maximum aggregate size) different aggregates for your different IO needs.
A LUN gets setup in a FlexVol which is setup on an aggregate. The LUN will be properly spread over all disks & raidgroups from the underlying aggregate. So yes, the disks are "anonymous" for the LUN.
The way it works is:
physical disk -> raid group (means usualy at least 2 parity and 1 data disk, hopefuly more like 16-20 disks for each raid group) -> plex (used for synchronous mirroring, so its the netapp "RAID1") -> aggregate
The aggregate is your last "real" physical measurement. After that its all logical stuff done over the aggregate layers. The raid/plex/aggr layer can only grow btw, never be shrunk.
aggregate -> flexvol (Flexible Volumes/Partitions which can be grown or shrinked on the fly)-> qtree (just a plain logical divider, used for quotation and for asynchronous replication) -> lun (the logical disk as you know it)
I see that Thomas Glodde is saying that it is best to keep the backend as a single disk or have at most a couple of disks. And, I lean towards agreeing in terms of manageability (less calls in the middle of the night that one disk is full or almost full).
There are a couple of additional areas I will like to explore:
a) As you 're using MS SQL Server, I assume you 're running on MS Windows. It seems that if you are using HBA Cards, the max Queue Depth is 256 or so. In addition, it seems this max Queue Depth is based on each Disk; and to get more 'Queue Depth' you would have to expose more physical\logical disks. Does anyone know if this are Logical or Physical Disks?
b) Also, it is always helpful to pre-allocate MS SQL Server datafiles (before needing them). You might be able to reap a side-effect of (pre) allocating in big enough increments \ sizes to gain a semblance of having your data (whether regular, LOB, or log) in more contiguous chunks.
c) if you do not care much for pre-allocation, your datafile growth size might help, as well.
d) Also via "Local Security Policy", grant "Perform Volume Maintenance Tasks" to the Account that the "MS SQL Server" Service is running as. This affords you the benefits of "Instance File Initialization".
Now my follow-up questions:
a) is anyone aware of NetApp specific Performance Counters that can be used within MS Windows
b) Does NetApp have Native Performance \ Throughput Measurement Tools. The tools I have seen in the open-market tend to want to create their own traffic\data. But, it our case we will more likely want to measure throughput against our own normal business data \ traffic