Tech ONTAP Blogs
For applications and SAN data services where downtime can significantly disrupt your business, it's vital to provide predictability, and the broad NetApp portfolio can help you create resilient, predictable solutions for your mission-critical business processes. Your assessment of predictability and resiliency should address crucial questions such as the types of maintenance and disaster scenarios your business must prepare for, the acceptable amount of data loss if a disaster occurs, and the required speed of recovery.
The recovery point objective (RPO) defines how much data, in terms of time, can be lost or the point to which you can recover your data. On the other hand, the recovery time objective (RTO) indicates the maximum allowable downtime or how swiftly data services must be restored. Ensuring zero RPO is essential to prevent any data loss, and it's equally important to achieve a very low RTO to promptly restore data services to achieve business continuity.
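The difference between the two objectives can be made concrete with a small sketch. The following Python snippet (an illustration with hypothetical timestamps, not NetApp code) computes the achieved RPO and RTO for an outage, and shows why synchronous replication yields zero RPO:

```python
from datetime import datetime, timedelta

def achieved_rpo(outage_time: datetime, last_replicated_write: datetime) -> timedelta:
    """RPO: data written after the last replicated write is lost."""
    return outage_time - last_replicated_write

def achieved_rto(outage_time: datetime, service_restored: datetime) -> timedelta:
    """RTO: how long the data service was unavailable."""
    return service_restored - outage_time

outage = datetime(2024, 1, 1, 12, 0, 0)

# Asynchronous replication: last transfer finished 5 minutes before the outage,
# so up to 5 minutes of writes are lost.
print(achieved_rpo(outage, outage - timedelta(minutes=5)))   # 0:05:00

# Synchronous replication (e.g. SnapMirror active sync): every acknowledged
# write already exists on both clusters, so nothing is lost.
print(achieved_rpo(outage, outage))                          # 0:00:00

# Automated failover restores service within seconds: near-zero RTO.
print(achieved_rto(outage, outage + timedelta(seconds=15)))  # 0:00:15
```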
This blog introduces the FlexPod SAN solution with Cisco UCS X-Series Direct and NetApp ASA NetApp Verified Architecture (NVA). It also shows how NetApp® SnapMirror® active sync can provide synchronous replication of application-consistent data to another ONTAP cluster, leveling up storage resiliency for a FlexPod® SAN solution to achieve zero RPO and near-zero RTO for an unplanned site disaster.
FlexPod is a best-practice converged infrastructure datacenter architecture that combines compute, networking, and storage components from Cisco® and NetApp®.
Shown in Figure 1 are some of the components utilized for creating FlexPod datacenter solutions. These components are connected and configured according to the best practices of both Cisco and NetApp, documented as Cisco Validated Designs (CVDs) or NetApp Verified Architectures (NVAs), to provide an ideal platform for running a variety of enterprise workloads with confidence.
Figure 1: FlexPod datacenter solution components
Each of the FlexPod component families shown (Cisco UCS, Cisco Nexus/MDS switches, and NetApp storage) provides platform and resource options to scale the infrastructure up or down according to application requirements, while supporting the features and functionality required under the FlexPod configuration and connectivity best practices. FlexPod solutions can also be replicated for environments that require multiple consistent deployments by rolling out additional FlexPod stacks.
The new NetApp ASA scale-out storage systems are simple, powerful, optimized for block deployments, and they support advanced data management and protection features. The ASA systems are all-flash SAN arrays that support IP-based and FC-based SAN protocols with symmetric active-active multipathing.
Table 1 highlights some of the key technical specifications of the new NetApp ASA storage systems. (For specification details of each ASA model and its supported limits, please refer to information in the NetApp ASA datasheet and NetApp Hardware Universe web site.)
Table 1: Select specifications for the new NetApp ASA storage systems
| Select specifications | ASA A1K | ASA A90 | ASA A70 | ASA A50 | ASA A30 | ASA A20 | ASA C30 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Form factor | 2 x 2U | 4U | 4U | 2U | 2U | 2U | 2U |
| Max cluster size | 12 nodes | 12 nodes | 12 nodes | 12 nodes | 8 nodes | 6 nodes | 8 nodes |
| Max raw capacity per HA pair | 2.67PB | 2.67PB | 2.67PB | 2.67PB | 2.21PB | 1.47PB | 1.47PB |
| PCIe expansion slots per HA pair | 18 | 18 | 18 | 8 | 8 | 8 | 8 |
| Max FC speed | 64Gbps | 64Gbps | 64Gbps | 64Gbps | 64Gbps | 64Gbps | 64Gbps |
| Max Ethernet speed | 400Gbps | 400Gbps | 400Gbps | 100Gbps | 100Gbps | 100Gbps | 100Gbps |
Shown in Figure 2 is a small FlexPod SAN infrastructure design using two Cisco UCS C225 M8 rack servers, two Cisco Nexus 93600CD-GX switches for IP-based SAN, and a highly available entry-level NetApp All-flash SAN Array (ASA) A20 dual-controller storage system.
Figure 2: Small scale FlexPod SAN infrastructure
This design utilizes redundant components and connectivity to provide a resilient and highly available infrastructure. The two network switches provide dual fabrics for the IP-based SAN deployment. Each server is connected to both SAN fabric A and fabric B. Similarly, each of the two storage controllers in the ASA A20 system is connected to both SAN fabrics.
When a medium-scale SAN solution is required, the FlexPod SAN solution with Cisco UCS X-Series Direct and the midrange NetApp ASA A50 is a great combination. Like its UCS Mini predecessor, the X-Series Direct integrates two Cisco UCS Fabric Interconnect 9108 100G modules into the back of the X9508 chassis, reducing the UCS rack space requirements and the cost of building medium-scale FlexPod SAN solutions.
By configuring some of the FI ports as appliance ports, you can directly attach the NetApp ASA storage systems to the FIs without requiring additional Ethernet switches or Fibre Channel switches in between UCS compute and NetApp storage. Figure 4 shows the medium-scale FlexPod IP-based SAN solution with X-Series Direct and NetApp ASA storage in a direct-attached / switchless configuration.
Figure 4: Medium-scale FlexPod SAN with UCS X-Series Direct and NetApp ASA
The UCS X-Series Direct design with a direct-attached ASA storage system allows you to scale your compute and storage resources to a medium scale. You can start with just a few compute nodes in the chassis. When you need more compute resources as your business grows, you can simply add compute nodes without adding any cables. You can add storage capacity by connecting external disk shelves to the storage controllers. Please refer to the FlexPod SAN solution with Cisco UCS X-Series Direct and NetApp ASA NVA for deployment and solution details.
FlexPod datacenter solutions are designed with security as the foundation and provide validated designs for large-scale business-critical applications so you can deploy the solutions with confidence. To see the latest available Cisco Validated Designs (CVDs) for FlexPod, please check out the solutions available from the FlexPod Design Guide web site.
Shown in Figure 3 is a FlexPod datacenter SAN infrastructure design using Cisco UCS X-Series Modular System with X9508 Chassis, 9108 Intelligent Fabric Modules (IFMs), X215c M8 Compute Nodes, UCS 6536 Fabric Interconnects (FIs), Nexus 9336C-FX2 switches, MDS 9132T switches, and a highly available high-end NetApp ASA A90 dual-controller storage system.
Figure 3: Large scale FlexPod SAN infrastructure
This large-scale FlexPod datacenter infrastructure is designed with scalability in mind and includes redundant components and connectivity for resiliency and high availability. The compute block can be scaled massively by adding additional X9508 chassis and compute nodes. Additional disk shelves and ASA A90 storage controller HA pairs can be added to increase storage capacity and storage performance to a massive scale. The solution includes both Nexus Ethernet switches as well as MDS Fibre Channel switches to support IP-based and FC-based storage protocols, including FC, iSCSI, Non-volatile Memory Express (NVMe) over FC (NVMe/FC), and NVMe over TCP (NVMe/TCP).
Using the NetApp Snapshot™ technology in NetApp ONTAP® software, your mission-critical SAN data services deployed on ASA systems can be rapidly restored with data from Snapshot-based backups.
To further protect against potential infrastructure disasters such as fires, hurricanes, or tornadoes, safeguard your data with multisite NetApp ASA clusters. You can synchronously replicate data in a storage cluster by implementing NetApp SnapMirror® active sync, previously known as SnapMirror Business Continuity, to replicate SAN data included in application-specific consistency groups.
SnapMirror active sync delivers zero RPO and near-zero RTO to achieve business continuity. It seamlessly provides failover support for mission-critical workloads such as Microsoft SQL Server and Oracle RAC databases and VMware vSphere Metro Storage Cluster (vMSC). The following subsections provide an overview of SnapMirror active sync solution components and how they work together to achieve business continuity. For additional details, please refer to the ONTAP SnapMirror active sync documentation.
To withstand a site failure, two NetApp ONTAP ASA clusters are deployed at a safe distance from each other to minimize the risk of simultaneous site failures due to power loss or natural disasters. These clusters are peered and configured with SnapMirror active sync for synchronous data replication and disaster recovery.
After you have established physical connectivity between the two clusters and configured intercluster logical interfaces (LIFs) for them to communicate with each other, you can add a cluster peer from the Protection > Overview screen in ONTAP System Manager. Figure 5 shows the Protection > Overview screen with the configured intercluster LIFs, cluster peer information, and a button to start the ONTAP Mediator configuration.
Figure 5: ONTAP System Manager cluster peers configuration
Along with the ONTAP ASA clusters, the ONTAP Mediator completes the quorum for the SnapMirror active sync solution. The ONTAP Mediator is deployed in a third failure domain as shown in Figure 6. It receives health information from the peered ONTAP clusters and nodes and orchestrates between them to determine their status. The health data helps clusters distinguish between various types of failures and decide whether to perform an automated failover.
Figure 6: SnapMirror active sync solution components
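The mediator's role in the quorum can be sketched in a few lines of logic. This hypothetical Python model (the function name and inputs are illustrative, not the actual ONTAP Mediator implementation) shows why a third failure domain lets the surviving cluster distinguish a peer-site failure from a mere intercluster link failure:

```python
def should_fail_over(can_reach_peer: bool,
                     can_reach_mediator: bool,
                     mediator_reports_peer_alive: bool) -> bool:
    """Decide, from one cluster's point of view, whether to take over
    I/O automatically after losing contact with its peer cluster."""
    if can_reach_peer:
        return False   # peer is healthy; nothing to do
    if not can_reach_mediator:
        return False   # cannot tell a site failure from our own isolation;
                       # refusing to act avoids split-brain
    if mediator_reports_peer_alive:
        return False   # only the intercluster link failed; peer still serves I/O
    return True        # the third site confirms the peer is down: fail over

# Site A powered off: site B cannot reach A, and the mediator agrees A is gone.
print(should_fail_over(can_reach_peer=False, can_reach_mediator=True,
                       mediator_reports_peer_alive=False))   # True

# Only the replication link failed; the mediator still sees both sites.
print(should_fail_over(False, True, True))                   # False

# Site B is itself isolated from everything: do nothing (avoid split-brain).
print(should_fail_over(False, False, False))                 # False
```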
Before installing the ONTAP Mediator, check out the Install or upgrade ONTAP Mediator documentation for the prerequisites, supported Linux versions, and procedures for installing it on the various supported Linux operating systems.
After the ONTAP Mediator is installed and configured, you can add it to the ONTAP clusters by using ONTAP System Manager or the ONTAP CLI. Figure 7 is a screenshot of the ONTAP System Manager Protection > Overview screen, which shows the configured ONTAP Mediator and the peered cluster.
Figure 7: ONTAP System Manager Mediator configuration
A consistency group comprises LUNs that provide a consistency guarantee for applications requiring protection for business continuity. For instance, a consistency group for a Microsoft SQL Server database might include LUNs for the databases as well as for the logs. A consistency group enables a simultaneous quiesce and snapshot of the entire dataset, providing a consistent restore point across all LUNs in the group.
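The consistency guarantee can be illustrated with a toy model. This Python sketch (illustrative only, not ONTAP code; the LUN names are invented) quiesces all LUNs before capturing their state, so the data and log LUNs share one restore point:

```python
class Lun:
    """Minimal stand-in for a LUN that tracks writes and a quiesced flag."""
    def __init__(self, name: str):
        self.name = name
        self.writes = 0
        self.quiesced = False

    def write(self):
        if self.quiesced:
            raise RuntimeError(f"{self.name} is quiesced")
        self.writes += 1

def consistency_group_snapshot(luns):
    for lun in luns:                # 1) pause new writes on every member LUN
        lun.quiesced = True
    snapshot = {lun.name: lun.writes for lun in luns}   # 2) capture all at once
    for lun in luns:                # 3) resume I/O
        lun.quiesced = False
    return snapshot

data, log = Lun("sql_data"), Lun("sql_log")
data.write(); data.write(); log.write()
snap = consistency_group_snapshot([data, log])
print(snap)   # {'sql_data': 2, 'sql_log': 1} -- one restore point for both LUNs
```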
From the Protection > Consistency groups view in ONTAP System Manager, you can create consistency groups for the various application-specific datasets. Figure 8 shows two consistency groups already created: one for the base VMware infrastructure and the other for the Microsoft SQL Server specific LUNs.
Figure 8 Protection > Consistency groups view in ONTAP System Manager
To allow hosts/clients to use the replicated LUNs, create igroups in the remote cluster, assign the replicated LUNs to the corresponding igroups, and set up the hosts/clients so they can discover the iSCSI target and the LUNs available from the remote cluster as additional paths.
As illustrated in Figure 9, enterprise applications can be deployed on a storage client using iSCSI-based LUNs hosted by two ONTAP ASA clusters configured for synchronous data replication between sites. A consistency group, shown in the diagram, includes three application-specific LUNs where data is synchronously replicated to the peered cluster by SnapMirror active sync.
Figure 9: SnapMirror active sync symmetric active-active SAN Multipathing
The LUNs in a SnapMirror active sync solution are configured to be accessible from both storage cluster sites, allowing the client OS to view all paths from both storage clusters with active-optimized status. The screenshot in Figure 10 shows one of the mapped iSCSI LUNs and the 8 total paths to the LUN, provided by the two peered ONTAP clusters, along with their respective iSCSI LIF IP addresses listed in the Target column.
Figure 10: VMware ESXi host LUN path information
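The behavior in Figures 9 and 10 can be modeled in a few lines. This hypothetical Python sketch (path names are invented for illustration) shows how a host with active-optimized paths through both clusters keeps usable paths after an entire site's paths go dead:

```python
# Eight active-optimized paths to one LUN: four through each site's cluster,
# mirroring the path count shown in Figure 10.
paths = {f"siteA-lif{i}": "active" for i in range(1, 5)}
paths.update({f"siteB-lif{i}": "active" for i in range(1, 5)})

def usable_paths(path_states: dict) -> list:
    """Return the paths the host can still issue I/O on."""
    return [p for p, state in path_states.items() if state == "active"]

print(len(usable_paths(paths)))        # 8 usable paths before any failure

# Simulate the site A storage cluster powering off: its four paths go dead.
for p in list(paths):
    if p.startswith("siteA"):
        paths[p] = "dead"

survivors = usable_paths(paths)
print(len(survivors))                  # 4 paths remain, all through site B
```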
A SnapMirror active sync relationship enables reading and writing of LUNs from both sites, supporting clustered applications. If a storage cluster disaster occurs, ONTAP directs the surviving site to continue all I/O operations for mission-critical applications such as Microsoft SQL Server and Oracle RAC databases and VMware vSphere Metro Storage Cluster (vMSC).
For the Microsoft SQL Server database validation, the HammerDB database testing tool was installed and configured to generate a database workload. For detailed configurations of the HammerDB workload used for this testing, please refer to the Microsoft SQL Server database validation section in the FlexPod SAN solution with Cisco UCS X-Series Direct and NetApp ASA NVA.
After creating consistency groups to protect the VMware infrastructure and the Microsoft SQL Server database using SnapMirror active sync relationships, and enabling the clients to discover the LUNs in those consistency groups from both ONTAP clusters, it is essential to perform failover testing to confirm proper configurations as well as application behavior for a failover event.
To test the SnapMirror active sync solution for a site maintenance scenario (that is, a planned failover), we can manually fail over a consistency group so that the source and destination relationship for the LUNs in the consistency group is reversed. A planned failover can be initiated for a selected consistency group from the Protection > Replication view in ONTAP System Manager, as shown in Figure 11.
Figure 11: Invoke manual failover of consistency group relationship in ONTAP System Manager
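Logically, a planned failover reverses the replication direction of the consistency group relationship while the host keeps its already-active paths to both sites. The following Python sketch (an illustration with invented cluster and group names, not the ONTAP implementation) captures that role swap:

```python
# Hypothetical model of a consistency-group replication relationship.
relationship = {
    "consistency_group": "sql_cg",
    "source": "clusterA",       # currently holds the authoritative copy
    "destination": "clusterB",  # receives synchronous replication
}

def planned_failover(rel: dict) -> dict:
    """Reverse the source/destination roles of the relationship, as a
    planned failover does; returns the updated relationship."""
    rel["source"], rel["destination"] = rel["destination"], rel["source"]
    return rel

planned_failover(relationship)
print(relationship["source"], relationship["destination"])   # clusterB clusterA
```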
As a result of the failover operation, I/O transactions paused briefly in the HammerDB Transaction Counter view and then resumed quickly. The failover operation was transparent to the database application, and no manual intervention was required for database transactions to resume, as shown in Figure 12.
Figure 12: HammerDB application transaction counter view for planned failover
To perform an unplanned failover of the SnapMirror active sync solution and simulate a site failure scenario as shown in Figure 13, we can issue a power-off command from the service processors of the site A storage cluster nodes to power off the site A storage cluster.
Figure 13: FlexPod SAN VMware solution site A storage cluster disaster simulation
The LUN path example in Figure 14 shows that the four LUN paths to site A storage cluster went dead after the site A storage cluster was powered off. Despite the site A storage cluster disaster, the selected LUN was still accessible from the site B storage cluster paths shown with Active(I/O) status.
Figure 14: ESXi host LUN paths unavailable for the site which goes down
The screenshot in Figure 15 captures the Transaction Counter view of the HammerDB application around the time the site A storage cluster nodes were powered off from their service processors. Because the deployed ONTAP Mediator was monitoring the solution, an automatic failover occurred after the health evaluation period, and HammerDB database transactions then resumed. From the HammerDB application's perspective, the site A storage cluster disaster was transparent to application operations: it only caused database transactions to pause briefly before resuming without human intervention.
Figure 15: HammerDB database transaction counter view for site A storage cluster outage
In summary, FlexPod SAN solutions are highly available because the solution designs include redundant components and redundant connectivity. The published FlexPod CVDs and NVAs provide detailed guidance for best-practice implementations of this highly available infrastructure, allowing customers to quickly achieve time to value with their FlexPod SAN deployments. Depending on the desired scale, small, medium, or large FlexPod architecture designs are available to address the required scalability and performance.
To increase the storage resiliency of the deployed FlexPod SAN solutions, SnapMirror active sync can be deployed to peer two ONTAP clusters together for synchronous data replication with application-specific consistency groups. As demonstrated in this blog, the FlexPod SAN VMware solution with SnapMirror active sync was able to achieve zero RPO and near-zero RTO for the mission-critical Microsoft SQL Server database services, the HammerDB application, and the underlying VMware virtual infrastructure.
Please refer to FlexPod SAN solution with Cisco UCS X-Series Direct and NetApp ASA NVA and NetApp SnapMirror active sync documentation for additional details.