By Yuvaraju Balaraman, NetApp Technical Marketing Engineer
Failover planning is a critical component of your disaster recovery plan, and a good plan can mean the difference between business success and failure. One reason businesses choose NetApp is that NetApp storage systems are built for high availability and non-disruptive operations, even when a storage node fails.
NetApp continues to innovate and deliver when it comes to data management simplicity and high availability. With NetApp® OnCommand® Performance Manager 7.0, you now have access to a new node failover planning feature that helps you estimate the performance impact if a node in a high-availability pair fails. It also lets you plan hardware maintenance windows that minimize the performance impact on your storage systems and on service delivery to your customers.
This new capability is available for NetApp storage systems running:
To get started, go to the Failover Planning page in OnCommand Performance Manager, which displays performance statistics for a selected time range. These include the statistics for the primary node, the statistics for the partner node, and the estimated statistics for the primary node after it takes over from the failed partner.
The failover example below shows that, after takeover, the primary node would be at 42% utilization with 65% performance capacity used. Based on these values, the primary node would be able to handle the workloads from both the primary and partner nodes.
Non-Disruptive Maintenance Planning
Failover planning also provides historical performance statistics, which help you identify the optimal time to initiate a failover and minimize the possibility of overloading the takeover node. The recommendation is to set the time range to the last 7 days, then use the performance capacity used charts in the Comparing pane to find a window for the hardware maintenance activity. This allows you to schedule hardware maintenance when the predicted performance of the takeover node is acceptable and the performance impact is minimal.
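The window search described above can be illustrated with a short Python sketch. It is not part of Performance Manager and does not call any NetApp API; it assumes you have noted the estimated takeover performance capacity used values from the chart yourself. The function name `best_window` and the sample values are hypothetical.

```python
def best_window(samples, window_len):
    """Return (start_label, peak) for the contiguous run of `window_len`
    samples whose highest estimated takeover value is the lowest.

    `samples` is a time-ordered list of (label, percent) pairs, where
    percent is the estimated takeover performance capacity used.
    """
    if window_len <= 0 or window_len > len(samples):
        raise ValueError("window_len must be between 1 and len(samples)")
    best = None
    for i in range(len(samples) - window_len + 1):
        window = samples[i:i + window_len]
        # The worst moment in the window decides how risky it is.
        peak = max(value for _, value in window)
        if best is None or peak < best[1]:
            best = (window[0][0], peak)
    return best


# Hypothetical hourly readings (illustrative only, not real measurements).
samples = [
    ("00:00", 70), ("01:00", 62), ("02:00", 58), ("03:00", 61),
    ("04:00", 75), ("05:00", 90), ("06:00", 126),
]

start, peak = best_window(samples, window_len=2)
print(f"Quietest 2-hour window starts at {start} (peak {peak}%)")
```

Ranking windows by their peak rather than their average is a deliberate choice in this sketch: a single spike above 100% during maintenance matters more than a low mean.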
The Comparing pane, shown below, displays the performance capacity used by the primary and partner nodes, as well as the estimated takeover statistics for the primary node.
Estimate the Performance Impact of Takeover
The Performance Capacity Used screen, shown below, charts the estimated takeover performance statistics for the primary node. In this example, the estimated takeover value was 126% at 11:30 AM on December 7th. If a failover had occurred at that time, the combined workloads running on the primary node would have experienced an increase in latency. This is an example of when not to schedule maintenance.
Performance Manager also provides a detailed Performance Capacity Used (Advanced) chart. In the screen shown below, you can see the performance capacity used for both the primary and partner nodes, as well as the amount of free performance capacity. With this information, you can identify potential performance issues if the partner node should fail. If the free performance capacity is at 0%, a failover will result in increased latency on the takeover node across the combined workloads.
The Performance Capacity Used (Advanced) chart breaks the values for each node into user protocols and background processes. This helps you see whether any system processes are running in the background, and what the impact would be if a maintenance window were scheduled during that time.
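As a rough illustration of the relationship between performance capacity used and free performance capacity, the Python sketch below approximates the takeover estimate as the sum of the two nodes' performance capacity used. That additive model is a simplifying assumption made here for illustration; the source does not describe how Performance Manager computes its estimate, and the function name `takeover_headroom` and the input values are hypothetical.

```python
def takeover_headroom(primary_used, partner_used):
    """Approximate the takeover node's load if it absorbs its partner.

    Inputs are performance capacity used percentages for each node.
    Returns (estimated_used, free_capacity, has_headroom).
    ASSUMPTION: the estimate is a simple sum, which is not necessarily
    how Performance Manager calculates it.
    """
    estimated = primary_used + partner_used
    # Free capacity cannot go below zero; at 0% the node is saturated.
    free = max(0.0, 100.0 - estimated)
    return estimated, free, free > 0.0


# Hypothetical readings: a comfortable case and an overloaded case.
for primary, partner in [(40, 25), (70, 56)]:
    estimated, free, ok = takeover_headroom(primary, partner)
    verdict = "acceptable" if ok else "expect increased latency"
    print(f"estimated {estimated}% used, {free}% free: {verdict}")
```

The second pair adds up to more than 100%, so the free capacity is reported as 0% and the sketch flags the same condition the chart warns about.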
We recommend that you watch the Node Failover Planning Using Performance Manager demonstration video to understand how to use the node failover planning feature.