We currently have a live SQL Server instance on a 2 node cluster with storage on Netapps. The aim is to get SMSQL implemented with Snap Mirror so we can have a DR solution. I have a number of questions.
1. How to SMSQL tran log backups work to provide point-in-time recovery?
As I understand it so far, SMSQL will take copies of the transaction log to the SnapInfo LUN. How does taking copies of a file, say every 15 minutes, allow point-in-time recovery? I'm confused because a standard SQL Server tran log backup carries out certain SQL operations such that LSNs are tracked and this will allow a tran log restore to be rolled forward to the the end or to a point-in-time. Tail backups can also be carried out to capture remaining transactions. I read plenty of Netapps documention that says that tran log 'backups' can be configured via SMSQL and point-in-time recovery is possible. I haven't found any documentation to say how this is actually done and how it works. Can someone explain or point me at a suitable document?
2. Best practices for high transaction OLTP databases - Aggregates, Volumes & LUNs etc.
If we are talking about a 1 to 2 Tb database supporting a busy e-commerce web application with around 75 million rows being written and deleted (archived) daily, which equates to about 10 Gb in and out, what would be a good design for the way the volumes and LUNs are configured? I've seen general requirements for separate LUNs for Data, Logs, Tempdb, System Databases and SnapInfo but I have not seen any information about how these LUNs could be configured at a lower level to provide better I/O performance? Also, as I understand it, the general recommendation is to put all the LUNs relating to a SQL instance on a single aggregate? What if there are two SQL named instances on a single server/cluster? Would the recommendation be for a separate aggregate for each instance? If the advice is always, 'just add more disks' for performance, what calculations can be done to estimate the number of disks required for the I/O profile in question?
3. DR and System Databases (SQL Server)
I have read that the reason the SQL Server system databases should be on their own LUN is that system database have to be 'streamed' by SMSQL (together with any user databases on the same LUN). What I can not find any information on is what this exactly means for recovering in DR. For example, SMSQL and Snap Mirror give us the fast volume/LUN level recovery of user databases in DR, that's great. However, because the system databases are streamed, does this imply that DR server are normally maintained as separate SQL Server instances such that a DBA has to implement procedures to either manually restore the system databases or, alternatively, script out the SQL Server 'server level objects' such as Logins and Jobs in order to bring the DR server into line with the live server? Or is there a feature in SMSQL that takes care of all this? What's best practice?
A bunch of good questions - and not that basic! Hopefully I can address some of them:
I have not seen any information about how these LUNs could be configured at a lower level to provide better I/O performance?
The general rule of thumb is: the more spindles, the better. It actually may be feasible to create more than one aggregate for resiliency (not to keep all your eggs in one basket) or to clearly separate different workloads on different spindles. The whole point about single aggregate is, that there is no direct performance benefit of having multiple aggregates, if the total spindle count is the same in both cases.
As for the sizing, you can either go easy way & assume 180-220 random IOPS per 15k disk, or do fully blown exercise by collecting stats from the existing environment (if there is any).
does this imply that DR server are normally maintained as separate SQL Server instances such that a DBA has to implement procedures to either manually restore the system databases or, alternatively, script out the SQL Server 'server level objects' such as Logins and Jobs in order to bring the DR server into line with the live server?
For physical environments I was always assuming this is the (painful) case. IMHO it is much easier in a virtual world thanks to things like VMware SRM, and arguably will be dead easy with SQL 2012 DAG-like functionality!