Tech ONTAP Articles
As you move toward the goal of 100% virtualization in your data center, careful attention to the virtualization of business-critical Microsoft® applications—including Microsoft Exchange, Microsoft SQL Server®, and Microsoft SharePoint® Server—becomes essential. To get from where you are today to an environment that delivers all the benefits of virtualization, including efficiency, improved availability, and decreased cost, you have to focus on virtualizing all layers of your infrastructure: virtualization software, servers, networks, and storage.

Figure 1) Key elements of the joint NetApp, VMware, and Cisco solution.

That's why NetApp joined forces with Cisco and VMware to create a complete solution for virtualizing Microsoft applications. The architecture combines the benefits of VMware® vSphere 4 virtual infrastructure, Cisco Nexus unified fabric, and NetApp® unified storage hardware and software. This flexible architecture allows you to virtualize a mixed-workload Microsoft application environment and deliver the full benefits of server, network, and storage virtualization. We've tested the performance of Microsoft applications on this solution to make sure there are no bottlenecks and that all performance metrics are well within Microsoft's published parameters.

This article briefly describes the reasons for virtualizing Microsoft applications and highlights the most important architecture and deployment considerations to help you get started. For full details on the joint solution, refer to the NetApp technical report "NetApp Solutions Guide: Microsoft Exchange Server, SQL Server, and SharePoint Server Mixed Workload on VMware vSphere 4, NetApp Unified Storage (FC,..."

Why Virtualize Microsoft Applications?

The reasons for virtualizing Microsoft applications with this solution are in large part the same reasons for virtualizing any application:
Despite these obvious benefits, two persistent concerns about virtualizing critical Microsoft applications come up again and again; both have been addressed:
Key Design Considerations

One of our key goals when architecting the joint solution was to provide clear design guidelines while leaving enough flexibility for you to create a solution tailored to the requirements of your environment. This section is structured around some of the key questions you will want to ask yourself as you move your Microsoft applications to a fully virtualized environment.

What storage protocol should I choose?

One of the great things about this solution—like all solutions that include NetApp storage—is that you have the flexibility to choose whatever storage protocol makes sense for your environment. We provide architecture guidelines for all protocols: FC, iSCSI, and NFS. A joint NetApp and VMware performance study demonstrates that all protocols perform within 10% of one another, so there is no performance-based reason to choose one protocol over another. If you already have Fibre Channel (FC) infrastructure, you can continue to use it. If not, NFS and/or iSCSI can easily meet your storage needs. I advise you to look at each protocol in terms of cost (capital and operational), manageability, scalability, and flexibility, and choose the one that fits your needs best. (A few more specific guidelines appear in the section on storage layout below.)

What NetApp software will I need?

We strongly recommend the use of a core set of four NetApp products:
In addition, you will want to install NetApp SnapDrive® and the application-specific SnapManager product inside each guest VM that hosts an Exchange Mailbox server, SQL Server, or SharePoint database and index server to provide application-consistent backup and granular restores of databases, logs, and so on. (Backup and DR are covered in more detail later.)

What storage layout should I use for different data components?

The storage layout you choose will depend in part on the storage protocol you have selected. Rather than trying to cover all possible storage layout and protocol options here, I'll simply focus on one of the most flexible IP-based storage layout options. If you are deploying from scratch or your infrastructure will support this approach, the layout shown in Figure 2, combining NFS and iSCSI, is the one I would suggest. For FC or iSCSI layouts, refer to TR-3785. (The approach in all cases—and the logic behind it—is similar in most respects.)

Figure 2) Storage layout using NFS data stores and iSCSI LUNs.

Here are the guidelines at a high level:
This approach is recommended over guest-connected LUNs using the Microsoft iSCSI software initiator because if you want to implement VMware vCenter Site Recovery Manager for disaster recovery—now or at some point in the future—the failover/failback process is much simpler with application data on iSCSI RDMs, and you'll get better support from VMware. You should also put all data stores and RDM LUNs on the same storage system if you are going to use VMware vCenter Site Recovery Manager.

To leverage the benefits of SnapDrive and the application-specific SnapManager tools for backup of your Exchange, SQL Server, and/or SharePoint data, you must place that data on LUNs: either FC RDMs, iSCSI RDMs as recommended above, or guest-connected LUNs using the Microsoft iSCSI software initiator. If, for some reason, you must configure your environment using VMFS or NFS data stores for application data, your best backup option is SMVI. SMVI is capable of producing consistent backups for all three applications, but with some limitations. Currently, because of limitations in the VMware VSS Requestor (VMware uses copy enumeration for shadow copy), SMVI cannot provide automatic transaction log truncation or backup verification; both have to be done manually. Also, the VMware VSS Requestor does not currently support application consistency for VMs running Windows® Server 2008. Therefore this option is limited to scenarios where granular transaction-level restore is not required (for example, point-in-time restore for SQL Server), manual backup verification can be performed after the backups, and alternate methods of transaction log truncation are possible, for example, with SQL Server databases in the simple recovery model (SQL Server provides an automated method for log truncation).

How do I perform application-consistent backup and recovery?

The best way to achieve application-consistent backups for Microsoft applications is to install SnapDrive and the appropriate SnapManager product (SnapManager for Microsoft Exchange, SnapManager for Microsoft SQL Server, or SnapManager for Microsoft SharePoint Server) inside the guest OS for each VM as needed. These tools deliver the specific capabilities needed for application-consistent backups, automated backup verification, and granular restores. For example, SnapManager for Exchange provides single mailbox recovery. You can learn more about these SnapManager tools in a previous Tech OnTap article.
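To pull the backup guidance above together in one place, here is a minimal, illustrative Python sketch (not a NetApp tool) that encodes the decision rule: application data presented as LUNs gets SnapDrive plus the matching SnapManager product, while data on VMFS or NFS data stores falls back to SMVI with the limitations noted earlier. The function and category names are my own shorthand.

# Hypothetical helper encoding the backup guidance discussed above:
# SnapDrive + SnapManager need the application data on LUNs (FC RDM,
# iSCSI RDM, or guest-connected iSCSI LUNs); data on VMFS/NFS
# datastores falls back to SMVI, with the noted limitations.

LUN_PRESENTATIONS = {"fc_rdm", "iscsi_rdm", "guest_iscsi_lun"}
DATASTORE_PRESENTATIONS = {"vmfs_datastore", "nfs_datastore"}

def recommend_backup_tool(app: str, data_presentation: str) -> str:
    """Return the suggested backup tool for one application's data disks."""
    snapmanager = {
        "exchange": "SnapManager for Microsoft Exchange",
        "sql": "SnapManager for Microsoft SQL Server",
        "sharepoint": "SnapManager for Microsoft SharePoint Server",
    }
    if data_presentation in LUN_PRESENTATIONS:
        return f"SnapDrive + {snapmanager[app]} (application-consistent, granular restore)"
    if data_presentation in DATASTORE_PRESENTATIONS:
        return ("SMVI (no automatic log truncation or backup verification; "
                "no VSS application consistency on Windows Server 2008 guests)")
    raise ValueError(f"unknown presentation: {data_presentation}")

if __name__ == "__main__":
    print(recommend_backup_tool("sql", "iscsi_rdm"))
    print(recommend_backup_tool("exchange", "nfs_datastore"))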
What's the best way to implement DR?

NetApp SMVI and the application-specific SnapManager products can provide replication and disaster recovery for VMs and the hosted Microsoft apps. Fully automated disaster recovery can be achieved by using VMware vCenter Site Recovery Manager in conjunction with these products. This combination provides complete failover workflow automation for complex environments, as described in the Tech OnTap article Using VMware Site Recovery Manager to Simplify DR.

Figure 3) Combining NetApp SnapManager, SnapMirror, and VMware Site Recovery Manager creates a complete data protection solution for backup/recovery and disaster recovery.

How do I implement multipathing?

If you want your environment to be robust, you must implement multipathing. For an FC-based architecture, I recommend the Asymmetric Logical Unit Access (ALUA) protocol and the round robin (RR) path selection policy. ALUA allows for the autonegotiation of paths between SCSI target devices and target ports, enabling dynamic reconfiguration. ALUA is enabled by default on ESX hosts. On NetApp storage arrays, ALUA should be enabled on the initiator groups, resulting in a more dynamic, or plug-and-play-like, SAN architecture. The RR path selection policy (PSP) provides path redundancy and bandwidth aggregation. Note that there is no need for a device-specific module (DSM) inside the guest VM.

For iSCSI, vSphere introduced support for multiple TCP sessions at the ESX host level for multipathing. You can configure two VMkernel ports and use the RR PSP to achieve plug-and-play multipathing. This provides multiple active paths, and no DSM is required inside the guest VM. Both the traditional and multiswitch trunking network designs can be used, as described in TR-3749. For NFS, multipathing can be achieved with both traditional and cross-stack switches; for details, see TR-3749. When using Cisco Nexus 10 Gigabit Ethernet (10GbE), only two 10GbE ports are required on the ESX host. Cisco virtual port channeling (vPC) provides redundancy, fault tolerance, and security.

Figure 4) Using Cisco Nexus vPC to connect ESX hosts and NetApp storage.

Are there benefits to using deduplication and thin provisioning?

One of the benefits of this configuration is that, no matter which protocol you choose, you can take advantage of NetApp storage efficiency capabilities (FlexClone, deduplication, and thin provisioning) to significantly reduce the amount of storage space you need. Typical virtual environments have many copies of the same OS and application binaries in different VMs, consuming large amounts of space on expensive shared storage. By using NetApp storage efficiency capabilities, you can achieve more than 50% storage savings on primary storage. Figure 5 illustrates the 92% space savings we achieved while validating the joint solution.

Figure 5) Space savings due to combining NetApp storage efficiency techniques.
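To make the way these savings compound a little more concrete, here is a small back-of-the-envelope sketch in Python. All of the ratios and sizes are illustrative assumptions of my own, not measured values from our validation; the point is simply that deduplication of near-identical OS/binary images and thin provisioning of application data add up quickly.

# Back-of-the-envelope estimate of effective storage consumption when
# many VMs share largely identical OS/application binary images.
# All sizes and ratios below are assumptions for illustration only.

def effective_consumed_gb(num_vms: int,
                          os_image_gb: float = 40.0,
                          allocated_data_gb: float = 200.0,
                          data_fill_ratio: float = 0.4,   # thin provisioning: only written blocks consume space
                          os_dedupe_ratio: float = 0.9    # near-identical guest OS/binaries dedupe very well
                          ) -> float:
    """Rough space actually consumed on the array, in GB."""
    # OS/binary blocks are highly redundant across VMs, so dedupe removes most copies.
    os_consumed = num_vms * os_image_gb * (1.0 - os_dedupe_ratio)
    # Thin-provisioned application data consumes only what has actually been written.
    data_consumed = num_vms * allocated_data_gb * data_fill_ratio
    return os_consumed + data_consumed

if __name__ == "__main__":
    vms = 20
    provisioned = vms * (40.0 + 200.0)
    consumed = effective_consumed_gb(vms)
    print(f"Provisioned: {provisioned:.0f} GB, consumed: {consumed:.0f} GB "
          f"({100 * (1 - consumed / provisioned):.0f}% savings)")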
How do I size my environment?

Sizing your environment includes sizing both the VMware data stores (containing the guest OS, application binaries, VM page file, and vswap file) and the LUNs hosting application databases and logs. NetApp has developed sizing tools to properly size your environment. Your NetApp systems engineer or reseller can help you size your environment based on information gathered from your site:

How do I validate the performance of my virtualized Microsoft application environment?

You can use the same set of performance validation tools from Microsoft and third-party vendors that are used in physical environments. These tools can help you determine whether performance is within Microsoft guidelines. To test this joint solution, we used the Microsoft Exchange Load Generation tool, the Microsoft SQLIOSim utility, and the AvePoint SharePoint Test Environment Creator and Usage Simulator to validate performance. Several load tests were conducted for these applications, all running at the same time. Performance validation methods and success criteria for each application are described in TR-3785.
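If you want to sanity-check your own load-test runs, one lightweight approach is to export the relevant Performance Monitor counters to CSV during the test and compare averages against your target thresholds. The Python sketch below shows the idea; the counter names, threshold values, and the file name loadgen_run.csv are placeholder assumptions, so substitute the actual success criteria from TR-3785 and Microsoft's published guidance.

# Minimal perfmon CSV post-processing sketch (assumes a PerfMon/relog CSV export
# with one column per counter). Counter names and thresholds are placeholders.
import csv
from statistics import mean

THRESHOLDS = {
    # counter column header (as exported)  : maximum acceptable average
    r"\LogicalDisk(E:)\Avg. Disk sec/Read":  0.020,  # seconds
    r"\Processor(_Total)\% Processor Time":  80.0,   # percent
}

def check_counters(csv_path: str) -> bool:
    """Return True if every tracked counter's average is within its threshold."""
    samples = {name: [] for name in THRESHOLDS}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            for name in THRESHOLDS:
                value = row.get(name, "").strip()
                if value:
                    samples[name].append(float(value))
    ok = True
    for name, limit in THRESHOLDS.items():
        if not samples[name]:
            print(f"MISSING  {name}")
            ok = False
            continue
        avg = mean(samples[name])
        status = "PASS" if avg <= limit else "FAIL"
        if status == "FAIL":
            ok = False
        print(f"{status}  {name}: avg={avg:.4f} (limit {limit})")
    return ok

if __name__ == "__main__":
    check_counters("loadgen_run.csv")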
Our tests validated that:

Conclusion

As you march toward your goal of a 100% virtualized data center, I hope the information provided in this article helps you understand the process of virtualizing Microsoft applications. This article covers only the high points of the joint solution for Microsoft application virtualization. You can get all the information you need to deploy this solution in the detailed, 50-page solutions guide, which covers all the configuration details based on the careful work done by NetApp, VMware, and Cisco. The guide covers FC, iSCSI, and NFS implementations.

In addition to the various links embedded in this article, other valuable resources include:

NetApp and VMware vSphere Storage Best Practices (TR-3749): best practices for implementing VMware with NetApp storage.
Using the Performance Acceleration Module with Exchange 2007 (TR-3767): describes how PAM can boost the number of Exchange users you can support without adding spindles.
SnapManager guides:
  SnapManager 5.0 for Microsoft Exchange Best Practices Guide (TR-3730)
  SnapManager for MOSS: Best Practices Guide (TR-3776)
  Protecting Exchange Server 2007 with NetApp SnapManager for Exchange (TR-3598)
  SnapManager for Virtual Infrastructure Best Practices (TR-3737)
SRM: NetApp and VMware vCenter SRM Best Practices (TR-3671)

Explore Quality of Service for the Cloud
Cisco, NetApp, and VMware have teamed to create a cloud architecture that delivers on the four pillars of multi-tenancy:
Read the Tech OnTap article from January 2010 to learn about this unique design, find out how it delivers quality of service from end to end, and download the Cisco Validated Design guide.

Case Study: Virtualizing Exchange 2007 with VMware
Many of you out there have been virtualizing Microsoft applications for years. This case study from 2008 describes the advantages that one NetApp customer saw after virtualizing an Exchange environment.
Hello,
Re this paragraph:
Locate your application data (databases, logs, and so on) on iSCSI raw device mapping (RDM) LUNs, directly created and connected inside the guest VM using NetApp SnapDrive software (version 6.2 or higher must be installed on guest OS).
Can the LUNs be mapped directly to an in-guest initiator or must they be actual VMware RDM?
Yes, LUNs can be mapped directly to an in-guest initiator, but using RDMs allows support for advanced features such as MSCS, SRM, and VMotion. You can also manage the storage from the ESX server instead of individually from the VMs.
Hi,
MSCS inside VMware guests is a very hairy topic & in fact it's not supported at all on iSCSI, no matter how disks are mapped: http://www.vmware.com/pdf/vsphere4/r40/vsp_40_mscs.pdf. Having said that, MSCS normally works just fine in both scenarios.
From my experience VMotion works OK with software initiator.
SRM is another hairy topic, because during failover it reverts target volumes to the latest, 'dirty' SnapMirror update. So for that reason any LUNs containing data should be failed over using relevant SnapManager products anyway.
What I've heard though is that the performance is much better when using iSCSI RDMs, rather than software initiators.
Regards,
Radek
The problem we are having when engineering this solution specifically for SQL Server is how the system databases are replicated to the DR site. We have vSphere 4.0 Update 1 at both locations and have SRM working well. We are using NFS datastores for the OS/binaries and ESX iSCSI software initiator RDM LUNs for the application data, as detailed in TR-3822. The issue I have come across that I cannot get an answer to is how to replicate my System databases (which reside on their own LUN). Because of the inability (I understand this is an MS limitation) to take a snapshot of the System databases (these are instead backed up to the SnapInfo LUN), I am unable to bring up those databases in a quiesced state on the DR side via SRM. I could manually take snapshots and then SnapMirror them over to the DR site, but this is not quiesced and would potentially be unusable. The answer NetApp has provided is to restore the System databases once over to the DR site, but because you have a non-quiesced copy of those databases, you are put into a "chicken or egg" situation. You need the system databases online to perform a restore of the system databases, but since they are not stable, you cannot start the SQL services to perform the restore.
The couple of recommendations NetApp made were to do a repair of the system databases on the DR side during a failover and then do a restore.... I would think this is far from an optimal DR solution and almost impossible to script. The other idea offered was to perform a database "pause" nightly, take a snapshot, and replicate that system LUN to the DR site. I am not sure of the effect of this database "pause" on the entire SQL instance. This also does not cleanly tie into SMSQL as you would expect.
As a result of these limitations, we are now considering a DoubleTake DR solution for our SQL Servers, which is a shame with all of this great SAN replication and virtual infrastructure.
The questions I have are:
Any assistance you can provide would be very greatly appreciated.
Thanks,
Joe
Here's another post by Abhinav that discusses the SRM / SQL Server DR scenario in great detail: VMware SRM and NetApp
I thought this would be another good reference for those looking for SQL HA/DR. Disaster Recovery solution for Microsoft SQL Server
Hi watan,
I've read through these and neither seems to address the issue of how you fail over a guest from one site to another with respect to the system databases. If you are bringing up the same guest in your DR site relying on NetApp SnapMirror replication, there is no way that I can see to replicate over the system databases in a guaranteed quiesced state as you do for all the user databases.
If you have any other recommendations, I would certainly appreciate it.
Thanks!
Joe
Guys,
This gap apparently has been just closed!
I can't find any further details at the moment, but literally this week at Partner Academy I saw a slide deck describing SRM integration with a SnapDrive instance at the failover site. This in turn allows rolling back to a 'clean' SnapManager snapshot.
Cool, isn't it?
Regards,
Radek
If you can get me any level of detail on this, beers are on me if you are ever in the philadelphia area!!!
You'll need to use SnapManager for SQL for this and you'll need to restore from SnapManager backup sets on the DR site if you run into any consistency issues.
As Radek mentioned, the next version of SnapDrive for Windows, 6.3, will have integration between SMVI, SMSQL, and SDW, which will allow for a cleaner way to recover. It is currently scheduled to ship at the end of July.
The issue I have using SMSQL to perform a restore of the system databases is that if they do not come up clean, I cannot start the MSSQL services to perform the restore from the Snapinfo directory to get the System databases back to form. The only thing I can think of would be to do a repair of SQL and rebuild the system databases then do a restore from the Snapinfo directory at this point. Do you agree that is what needs to be done if the system databases do not come back clean?
Our recommendation is to host the System DBs on a separate LUN (e.g. S:\) and perform a daily verified backup of the System DBs (along with SnapMirror update) using SMSQL. We also recommend performing verified backups when major changes are made to the System DBs.
This will ensure that you have application consistent snapshot of the system DBs to fall back upon in case the SQL Server fails to recover at the DR site.
After SRM failover, in case the SQL Server in the guest VM fails to start because of corruption with System DBs, our recommendation is to leverage SnapDrive in the guest VM and restore the LUN containing System DBs from the last verified backup of System DBs. Once the System DB restore is complete, start the SQL Server as you would normally do and verify the availability and functionality of the production DBs.
Worst case, if the System DB recovery is unsuccessful, rebuild the system DBs using the procedure described in this Microsoft KB: http://msdn.microsoft.com/en-us/library/dd207003.aspx
We will update both the TR-3785 and 3822 with this recommendation. Thanks for bringing this concern to our attention.
Hope this helps.
Regards,
Abhinav
I still think you are missing a big piece to this. Because SMSQL does not store the snapshot of the System database LUN in that separate volume as you describe but instead only updates the Snapinfo directory, you do not have the ability to restore on the destination side using SnapDrive. To go one step further, since snapshots are not stored in this volume at all, there is no System database volume on the destination side to restore from....only the Snapinfo directory which requires functional system databases to restore from....hence the chicken and the egg problem. The only solution I can think of is what I previously noted of repairing system databases and then restoring from the Snapinfo directory at that point which would make it a large challenge to automate this process for DR.
@jwhelan27
If the system DB is stored on its own LUN, which is on its own separate volume, then SMSQL will not replicate the system DBs to the DR site when it triggers a SnapMirror update at the end of a backup job. However, if the system DB is on a LUN stored in the volume that also contains the Production DBs (the DBs with your data in them), then it will be included in the replication of that volume. The snapshots on this volume will contain quiesced system DBs along with quiesced Production DBs. This is because the system DBs are quiesced at the same time that the Production DB is quiesced. The LUN with the system DBs is quiesced by SnapDrive at the same time as the LUNs containing the Production DB, so a quiesced system DB and LUN are captured in the same volume snapshot. The designs discussed here and in NetApp TR-3785 and TR-3822 describe storing the system DB in a volume with the Production databases.
Hope this helps clarify.
The reason the system LUNs are separated out from the user DBs is that we followed the best practice guide recommendations. I was also told by Sourav at NetApp that even if that configuration were implemented, the system DBs would get replicated but still would not be in a quiesced state due to the nature of what they are. If you tried to use that LUN without any kind of restore attempt from SnapInfo, there is still no guarantee that the system DBs are quiesced. This only assists in replicating the DBs to your DR site.
Just as an update to this subject.... I have been in communication directly with folks at NetApp regarding this specific item. So unfortunately it does look like I am correct that there is no way to replicate a guaranteed quiesced copy of the system LUN over to the DR site with the current NetApp tools. The latest version of SDW, 6.3, still in development, will not be addressing this issue. A standby SQL Server also has similar limitations, since the system databases are not replicated in those scenarios either; it is on the customer to script a procedure for regularly backing up all system database info (security items, stored procedures, etc.) that would be necessary to properly bring up these user databases on the DR side.
Just for clarification, if the system database is corrupt on the DR side, you cannot just use the SnapInfo data to restore it because of the “chicken and the egg problem”. In order to restore from SnapInfo, you need the SQL Server instance running, but in order to get the SQL Server instance running, you need to restore from SnapInfo. The only real way around this is to rebuild the system database as documented in http://msdn.microsoft.com/en-us/library/dd207003.aspx or to do a repair from the SQL Server 2008 media. Once you have the database back to an “out of the box” state, you should then be able to perform a restore with the SnapInfo data. The best recommendation NetApp engineers have at this time is to use SnapDrive to create a crash-consistent snapshot of the system database LUN (database status unknown, however), then mirror it. If the system databases come up corrupted, then you must perform the previous steps I just noted.
Hope this helps anyone running into this similar issue. If anyone finds a better workaround to this issue that ties nicely into VMware Site Recovery Manager, I would love to hear from you.
Just for future reference, here is another excellent document that discusses the NetApp + SQL solution.
Accelerating Development of Microsoft SQL Applications in Heterogeneous Environments