During a recent conversation with a senior IT strategist about NetApp’s technology solutions and capabilities, I noticed a theme that inspired me to write this blog post. The strategist was eager to learn about our solutions, yet he kept asking, “Why are NetApp’s technologies better than other companies’?” He was particularly interested in snapshots. “Everyone claims to have snapshots,” he commented. “What makes NetApp’s snapshot solutions different from, and perhaps better than, other vendors’?”
In this blog post we’ll discuss snapshot basics: what a snapshot is, how it works, and how NetApp’s snapshots, which are based on redirect-on-write (RoW), compare with competing implementations based on copy-on-write (CoW). We’ll explore key architectural and operational considerations and highlight important differences. This information is meant as a guide for IT practitioners (IT managers, architects, system administrators) and anyone involved in evaluating and testing snapshots as the basis for a backup, replication, and operational/disaster recovery strategy.
What is a snapshot?
In general terms, a snapshot is a locally retained, read-only, point-in-time virtual copy of a file system or volume. Most snapshots are time- and space-efficient. When properly implemented, they enable faster operational recovery (OR) and help meet tighter recovery point objectives (RPOs), recovery time objectives (RTOs), and service level agreements (SLAs). Snapshots are not a replacement for backups, but they can be the foundation of a solid backup strategy. Research conducted by International Data Corporation (IDC) shows that enterprises increasingly rely on disk-based backup/restore software to cope with shrinking backup windows and to meet application availability requirements. According to IDC, 46% of backup and restore implementations are disk-based, with tape at 38%, a dramatic change from the past. This shift makes it necessary to understand the options: the capabilities and limitations of the different snapshot implementations.
Are all snapshots the same?
No. Snapshots differ in architectural design and implementation, and those differences affect space utilization, performance, reliability, scalability, ease of operations, and restoration capabilities. It’s important to understand how snapshot implementations are similar and how they differ, because the differences can affect your ability to meet your business and technical requirements.
RoW and CoW snapshot technologies both create time- and space-efficient snapshots… it’s what happens next, when changes are handled, that clear differentiation begins.
What are the two primary snapshot implementations?
Next, we’ll cover the two most widely adopted snapshot implementations: redirect-on-write and copy-on-write.
1. Redirect-on-Write snapshots (NetApp)
At the core of NetApp snapshots is WAFL (Write Anywhere File Layout), which is built into Data ONTAP, the software that runs on FAS storage controllers. NetApp developed the WAFL file system to enable high-performance, high-integrity storage systems. WAFL uses a set of pointers (metadata) to the individual blocks of data, so the file system always knows where everything is. By copying those pointers, rather than the data itself, WAFL can capture an instantaneous image of the entire file system. (Figure 1)
WAFL uses the “redirect-on-write” technique to handle changes made after a snapshot is taken. Redirect-on-write (RoW) is similar to copy-on-write (CoW) in that it’s time- and space-efficient. By design, RoW is optimized for write performance: all changes and updates are redirected to new blocks. CoW must write a copy of the original data to a reserved snapshot area (called a cache, LUN reserve, or snapshot pool, depending on the vendor) and then write the changed data; RoW writes only the changed data, to new blocks.
Creating a snapshot is efficient in both space (a few KB) and time (less than a second), because only volume metadata is copied to the snapshot. Snapshots track changes to the original volume; read requests are satisfied from the original volume.
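To make the metadata-only creation step concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not WAFL’s actual on-disk format: a volume is a list of physical blocks plus a hypothetical block map (`active`, from logical block number to physical block), and taking a snapshot copies only that map.

```python
class RowVolume:
    """Toy volume: physical blocks plus a logical -> physical block map.
    Hypothetical model for illustration; not WAFL's real data structures."""

    def __init__(self, data):
        self.blocks = list(data)                               # physical data blocks
        self.active = {lbn: lbn for lbn in range(len(data))}   # active file system pointers
        self.snapshots = {}                                    # snapshot name -> frozen pointer map

    def take_snapshot(self, name):
        # Copy the pointers (metadata) only; no data block is read or written.
        self.snapshots[name] = dict(self.active)

    def read(self, lbn, snapshot=None):
        # A fresh snapshot and the active view resolve to the same physical blocks.
        block_map = self.active if snapshot is None else self.snapshots[snapshot]
        return self.blocks[block_map[lbn]]


if __name__ == "__main__":
    vol = RowVolume(["A", "B", "C", "D"])
    vol.take_snapshot("snap1")
    print(len(vol.blocks))          # 4: creating the snapshot added no data blocks
    print(vol.read(1, "snap1"))     # B
```

In this model, taking a snapshot costs one copy of the pointer map and zero data I/O, and reads through the snapshot land on the same blocks as reads through the active volume.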
Any changes/updates to the original volume are performed as follows:
Step 1: The filesystem writes updates to new blocks. WAFL keeps track of available blocks, which allows changes to be made very efficiently. For example, as data blocks B and C are changed or updated, the pointers in the active file system are redirected to new blocks (B’, C’); however, the snapshot pointers still point to the original blocks, preserving the point-in-time image. (Figure 2)
In summary, a write to a volume/LUN takes:
• 1 write (1 x write I/O)
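The redirect step can be sketched in a few lines as well. This is a hypothetical toy, not Data ONTAP code: `row_write` appends the new data to a free block and repoints only the active map, leaving the snapshot’s map (and the original blocks B and C) untouched. Like the summary above, it counts data-block writes only and ignores metadata updates.

```python
def row_write(blocks, active_map, lbn, data):
    """Redirect-on-write update (toy model).

    Appends `data` as a new physical block and repoints only the active
    file system's map. Returns the number of data write I/Os performed.
    """
    blocks.append(data)                  # write B' to a free block: 1 write I/O
    active_map[lbn] = len(blocks) - 1    # redirect the active pointer only
    return 1


def view(blocks, block_map):
    """Resolve a pointer map into the data it currently references."""
    return [blocks[block_map[lbn]] for lbn in sorted(block_map)]


if __name__ == "__main__":
    blocks = ["A", "B", "C", "D"]
    active = {lbn: lbn for lbn in range(len(blocks))}
    snapshot = dict(active)              # pointer copy taken before the update

    ios = row_write(blocks, active, 1, "B'") + row_write(blocks, active, 2, "C'")
    print(view(blocks, active))          # ['A', "B'", "C'", 'D']
    print(view(blocks, snapshot))        # ['A', 'B', 'C', 'D']
    print(ios)                           # 2
```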
It is important to understand the limitations of non-NetApp snapshot implementations. Competing offerings typically read the old data and write it to a new location before writing out the new data. This is often described as a feature called “copy-on-write” (next section), but it adds considerably to system overhead: for each block of data changed in the copy-on-write process, there is one read and two writes, compared with a single write for NetApp.
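Using only the per-block counts stated in this post, and assuming each of N changed blocks is being overwritten for the first time since the snapshot was taken (caching and metadata I/O ignored), the difference works out as follows; N = 10,000 is an arbitrary example value.

```latex
\begin{aligned}
\mathrm{IO}_{\mathrm{RoW}} &= N \text{ writes} \\
\mathrm{IO}_{\mathrm{CoW}} &= N \text{ reads} + 2N \text{ writes} = 3N \text{ I/Os} \\
N = 10{,}000:\quad \mathrm{IO}_{\mathrm{RoW}} &= 10{,}000, \qquad \mathrm{IO}_{\mathrm{CoW}} = 30{,}000
\end{aligned}
```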
2. Copy-on-Write snapshots (Other vendors’ snapshots)
When “copy-on-write” snapshots are first created, only the metadata about where the original data is stored is copied. No physical copy of the data is done at the time the snapshot is created. Therefore, the creation of the snapshot is time- and space-efficient.
As blocks on the original volume change, the original data is copied into the pre-designated space (reserved storage capacity) set aside for the snapshot before it is overwritten. Each original block is copied only once, on the first write request to it after the snapshot was taken; this is why the technique is also called copy-on-first-write. This process ensures that the snapshot data is consistent with the exact time the snapshot was taken, and it is why the process is called “copy-on-write.”
After the snapshot is created, it tracks the changing blocks on the original volume as writes to the original volume are performed. Implementing copy-on-write snapshots requires configuring a pre-designated space (typically 10-20% of the volume/LUN size) to store the snapshots. A snapshot cache/reserve pool is initialized, and read requests are satisfied from the original volume. (Figure 3)
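The reserve requirement can be estimated up front. The sketch below applies the 10-20% rule of thumb quoted above (actual vendor guidance varies) and, as a simplifying assumption, checks whether the data expected to change for the first time while a single snapshot exists would fit, since each original block is copied into the reserve only once. Both function names are hypothetical.

```python
def cow_reserve_range_gib(volume_gib, low_pct=10, high_pct=20):
    """Return the (smallest, largest) CoW snapshot reserve in GiB for a
    volume/LUN, using the typical 10-20% range (hypothetical helper)."""
    return volume_gib * low_pct / 100, volume_gib * high_pct / 100


def fits_in_reserve(unique_changed_gib, reserve_gib):
    """Each original block is copied into the reserve only on its first
    overwrite, so only unique changed data consumes reserve space."""
    return unique_changed_gib <= reserve_gib


if __name__ == "__main__":
    low, high = cow_reserve_range_gib(2000)
    print(low, high)                      # 200.0 400.0
    print(fits_in_reserve(300, low))      # False
    print(fits_in_reserve(300, high))     # True
```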
Any changes/updates to the original volume are performed as follows:
Step 1: The filesystem reads in original data blocks (1 x read I/O) in preparation for the copy. In this example blocks B and C will be updated with new data. (Figure 4)
Step 2: Once the original data (B, C) has been read by the production filesystem/LUN, it is copied (1 x write I/O) into the storage pool set aside for the snapshot before the original data is overwritten, hence the name “copy-on-write.” (Figure 5)
Step 3: Write the new, modified data blocks (B’, C’) to the original block locations (1 x write I/O), and re-link the copied original blocks to the snapshot. (Figure 6)
In summary, a write (change/update) to a volume/LUN takes:
• 1 read (1 x read I/O) and
• 2 writes (2 x write I/O)
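Steps 1 to 3 can be modeled the same way. This is a hypothetical toy, not any vendor’s implementation: `CowVolume` overwrites production blocks in place, copies each original block into a `reserve` dict on its first overwrite after the snapshot, and counts data I/Os as in the summary above.

```python
class CowVolume:
    """Toy copy-on-write volume with a single snapshot (hypothetical model)."""

    def __init__(self, data):
        self.blocks = list(data)       # production blocks, overwritten in place
        self.reserve = {}              # snapshot reserve pool: lbn -> original data
        self.snapshot_active = False
        self.reads = 0
        self.writes = 0

    def take_snapshot(self):
        # Metadata only: start an empty reserve; no data is copied yet.
        self.reserve = {}
        self.snapshot_active = True

    def write(self, lbn, data):
        if self.snapshot_active and lbn not in self.reserve:
            original = self.blocks[lbn]    # Step 1: read original block (1 read I/O)
            self.reads += 1
            self.reserve[lbn] = original   # Step 2: copy it to the reserve (1 write I/O);
            self.writes += 1               # the reserve entry is the snapshot's link to it
        self.blocks[lbn] = data            # Step 3: overwrite in place (1 write I/O)
        self.writes += 1

    def read_snapshot(self, lbn):
        # Changed blocks come from the reserve; unchanged ones from the original volume.
        return self.reserve.get(lbn, self.blocks[lbn])


if __name__ == "__main__":
    vol = CowVolume(["A", "B", "C", "D"])
    vol.take_snapshot()
    vol.write(1, "B'")
    vol.write(2, "C'")
    print(vol.blocks)                                  # ['A', "B'", "C'", 'D']
    print([vol.read_snapshot(i) for i in range(4)])    # ['A', 'B', 'C', 'D']
    print(vol.reads, vol.writes)                       # 2 4
```

Compared with the redirect-on-write sketch, the same two updates cost two reads and four writes instead of two writes, and the snapshot view depends on both the reserve and the original volume.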
Note that each original data block is copied into snapshot storage only once, when the first write request to it is received; subsequent writes to the modified block are not copied to the snapshot reserve area (until a new snapshot is created). Copy-on-write snapshots affect performance on the original volume for as long as they exist, because write requests to the original volume must wait while the original data is “copied out” to the snapshot reserve pool. Like RoW snapshots, CoW snapshots require the original copy of the data to remain valid.
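Because of copy-on-first-write, the extra CoW cost depends on how many distinct blocks a workload touches while a snapshot exists. The hypothetical helper below counts data-block I/Os for a sequence of logical block writes issued after a single snapshot, using the per-block costs described in this post; it ignores caching and metadata.

```python
def data_ios(write_sequence, scheme):
    """Count data-block read/write I/Os for logical block writes issued
    after one snapshot (toy cost model based on the steps above)."""
    if scheme == "row":
        # Every write goes to a new block: 1 write I/O, no reads.
        return {"reads": 0, "writes": len(write_sequence)}
    if scheme == "cow":
        # Only the first write to each block triggers the read + copy-out.
        first_writes = len(set(write_sequence))
        return {"reads": first_writes,
                "writes": len(write_sequence) + first_writes}
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    workload = [1, 2, 1, 1, 2, 3]           # blocks 1 and 2 are rewritten
    print(data_ios(workload, "row"))        # {'reads': 0, 'writes': 6}
    print(data_ios(workload, "cow"))        # {'reads': 3, 'writes': 9}
```

When every block is written exactly once, the CoW totals reduce to N reads and 2N writes, matching the summary above; repeated writes to already-copied blocks cost CoW a single write each.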
IT Operational considerations:
In conclusion, there are clear and significant differences between CoW- and RoW-based snapshot implementations. It’s important to understand these differences and to evaluate which solution provides the most value in meeting your business and technical requirements.