Are All Snapshots Created Equal?

During a recent conversation with a senior IT strategist about NetApp's technology solutions and capabilities, I noticed a theme that inspired me to write this blog. The strategist indicated that he was eager to learn about our solutions, yet he kept asking, “Why are NetApp's technologies better than other companies’?” He was particularly interested in snapshots. “Everyone claims to have snapshots,” he commented. “What makes NetApp’s snapshot solutions different from, and perhaps better than, other vendors’?”


In this blog we’ll discuss snapshot basics: what a snapshot is, how it works, and how NetApp implements snapshots based on redirect-on-write (RoW), compared with competitive implementations based on a copy-on-write (CoW) architecture. We’ll explore key architectural and operational considerations and highlight important differences. This information is meant as a guide for IT practitioners (IT managers, architects, system administrators) and anyone involved in evaluating and testing snapshots as the basis for designing a backup, replication, and operational/disaster recovery strategy.


What is a snapshot?

In general terms, a snapshot is a locally retained, read-only, point-in-time virtual copy of a file system or volume. Most snapshots are time- and space-efficient. When properly implemented, they enable faster operational recovery (OR) and help meet tighter recovery point objectives (RPOs), recovery time objectives (RTOs), and service level agreements (SLAs). Snapshots are not a replacement for backups, but they can be foundational to implementing a solid backup strategy. Research conducted by International Data Corporation (IDC) shows that enterprises are increasingly relying on disk-based backup/restore software to meet their shrinking backup windows and application availability requirements. Interestingly, 46% of backup and restore implementations are disk based, followed by tape at 38% (according to IDC), which is a dramatic change from the past. This change underscores the necessity of understanding the options – the capabilities and limitations of the different snapshot implementations.


Are all snapshots the same?

No. Snapshots differ in architectural design and implementation, which has an impact on space utilization, performance, reliability, scalability, ease of operations, and restoration capabilities. It’s important to understand the similarities and differences of snapshots, as they may affect your ability to meet your business and technical requirements.


RoW and CoW snapshot technologies both create time- and space-efficient snapshots…

it’s in what happens next, when handling changes, that the clear differentiation begins


What are the two primary snapshot implementations?

Next, we’ll cover the two most widely adopted snapshot implementations: redirect-on-write and copy-on-write.


1. Redirect-on-Write snapshots (NetApp)

At the core of NetApp snapshots is WAFL (Write Anywhere File Layout), which is built into Data ONTAP, the software that runs on FAS storage controllers. The WAFL file system was developed by NetApp to enable high-performance, high-integrity storage systems. By using a set of pointers (metadata) to the individual blocks of data, the file system knows where everything is; by making a copy of those pointers, and not the data, an instantaneous image of the entire file system can be captured. (Figure 1)

WAFL leverages the “redirect-on-write” technique to keep track of changes to snapshots. Redirect-on-write (RoW) is similar to copy-on-write (CoW) in that it’s time and space efficient. By design, a RoW snapshot is optimized for write performance, so any changes/updates are redirected to new blocks. Instead of writing one copy of the original data to a snapshot reserved space (cache, LUN reserve, or snapshot pool – the name changes according to the vendor) plus a copy of the changed data, as CoW requires, RoW writes only the changed data to new blocks.

Creation of a snapshot is space-efficient (a few KBs) and time-efficient (less than a second); only volume metadata is copied to the snapshot. Snapshots track changes to the original volume; read requests are satisfied from the original volume.



Figure 1


Any changes/updates to the original volume are performed as follows:


Step 1: The filesystem writes updates to new blocks. WAFL keeps track of available blocks, which allows changes to be made very efficiently. For example, as data blocks (B, C) are changed/updated, pointers in the active file system are redirected to new blocks (B’, C’); however, the snapshot pointers still point to the original blocks, preserving that point-in-time image. (Figure 2)




In summary, a write to a volume/LUN takes:


• 1 write (1x write I/O)

It is important to understand the limitations of non-NetApp implementations of snapshot technology. Competitive offerings typically read and then write the old data to a new location before writing out the new data. This is often explained as a feature called “copy-on-write” (next section) but this feature adds dramatically to the system overhead. For each block of data changed in the copy-on-write process, there is a read and two writes, compared to a single write for NetApp.
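To make the RoW bookkeeping above concrete, here is a minimal toy model in Python. The class, the dict-of-pointers representation, and the I/O counter are illustrative assumptions only; they are not WAFL's actual on-disk structures:

```python
class RowVolume:
    """Toy model of redirect-on-write (RoW) snapshots (illustrative only)."""

    def __init__(self, initial):
        self.store = {}         # physical blocks: block id -> contents
        self.active = {}        # active file system pointers: name -> block id
        self.snapshots = {}     # snapshot name -> frozen copy of the pointer map
        self.next_id = 0
        self.update_writes = 0  # write I/Os issued for changes after creation
        for name, data in initial.items():
            self._alloc(name, data)

    def _alloc(self, name, data):
        # WAFL-style: data always lands in a fresh free block.
        self.store[self.next_id] = data
        self.active[name] = self.next_id
        self.next_id += 1

    def snapshot(self, snap_name):
        # Creating a snapshot copies only the pointers (metadata), not the data.
        self.snapshots[snap_name] = dict(self.active)

    def write(self, name, data):
        # Redirect-on-write: 1 write I/O to a new block; the snapshot's
        # pointer still references the original block, preserving the image.
        self.update_writes += 1
        self._alloc(name, data)

    def read(self, name, snap=None):
        pointers = self.snapshots[snap] if snap else self.active
        return self.store[pointers[name]]


vol = RowVolume({"B": "B-original", "C": "C-original"})
vol.snapshot("snap1")
vol.write("B", "B'")                 # redirected to a new block
vol.write("C", "C'")
print(vol.read("B"))                 # active FS sees B'
print(vol.read("B", snap="snap1"))   # snapshot still sees B-original
print(vol.update_writes)             # 2: one write per changed block
```

Even in this simplified form, the key property holds: updating a block after a snapshot costs a single write, and the snapshot's view is preserved simply because its pointers were never touched.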

2. Copy-on-Write snapshots (Other vendors’ snapshots)

When “copy-on-write” snapshots are first created, only the metadata about where the original data is stored is copied. No physical copy of the data is done at the time the snapshot is created. Therefore, the creation of the snapshot is time- and space-efficient.

As blocks on the original volume change, the original data is copied (moved over) into the pre-designated space (reserved storage capacity) set aside for the snapshot prior to the original data being overwritten. The original data blocks are copied just once at the first write request (after the snapshot was taken; this technique is also called copy-on-first-write). This process ensures that snapshot data is consistent with the exact time the snapshot was taken, and is why the process is called "copy-on-write."
After the initial creation of a snapshot, the snapshot copy tracks the changing blocks on the original volume as writes to the original volume are performed. The implementation of “copy-on-write” snapshots requires the configuration of a pre-designated space (typically 10-20% of the size of the volume/LUN) to store the snapshots. A snapshot cache/reserve pool gets initiated; read requests are satisfied from the original volume. (Figure 3)



Any changes/updates to the original volume are performed as follows: 



Step 1: The filesystem reads in original data blocks (1 x read I/O) in preparation for the copy. In this example blocks B and C will be updated with new data. (Figure 4)




Step 2: Once the original data (B, C) is read by the production filesystem/LUN, the data is copied (1 x write I/O) into the designated storage pool that is set aside for the snapshot before the original data is overwritten, hence the name “copy-on-write.” (Figure 5)




Step 3: Write the new and modified data blocks (B’, C’) to the original data block locations (1 x write I/O) and re-link the blocks to the original snapshot. (Figure 6)



In summary, a write (change/update) to a volume/LUN takes:


• 1 read (1 x read I/O) and

• 2 writes (2x write I/O)


Note that original data blocks are copied only once into the snapshot storage, when the first write request is received; subsequent writes to the modified block are not copied to the snapshot reserved area (until a new snapshot is created). Copy-on-write snapshots will impact performance on the original volume for as long as the snapshot exists, because write requests to the original volume must wait while the original data is being "copied out" to the snapshot reserved pool. CoW snapshots require the original copy of the data to be valid, similar to RoW implementations.
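The copy-on-first-write sequence in Steps 1-3 can be sketched the same way. Again, this is an illustrative toy model rather than any vendor's actual implementation; the counters simply tally the read and write I/Os described above:

```python
class CowVolume:
    """Toy model of copy-on-write (CoW) snapshots (illustrative only)."""

    def __init__(self, initial):
        self.blocks = dict(initial)  # original volume: name -> contents
        self.reserve = {}            # pre-designated snapshot reserve pool
        self.snap_taken = False
        self.update_reads = 0        # I/O issued on the update path only
        self.update_writes = 0

    def snapshot(self):
        # Metadata-only at creation time; the reserve pool starts empty.
        self.snap_taken = True
        self.reserve = {}

    def write(self, name, data):
        if self.snap_taken and name not in self.reserve:
            # Step 1: read the original block (1 read I/O).
            self.update_reads += 1
            original = self.blocks[name]
            # Step 2: copy it into the snapshot reserve (1 write I/O).
            self.update_writes += 1
            self.reserve[name] = original
        # Step 3: overwrite the block in place (1 write I/O).
        self.update_writes += 1
        self.blocks[name] = data

    def read(self, name, from_snap=False):
        if from_snap and name in self.reserve:
            return self.reserve[name]  # changed blocks live in the reserve
        return self.blocks[name]       # unchanged blocks: original volume


vol = CowVolume({"B": "B-original", "C": "C-original"})
vol.snapshot()
vol.write("B", "B'")                         # first change: 1 read + 2 writes
print(vol.update_reads, vol.update_writes)   # 1 2
vol.write("B", "B''")                        # later change: 1 write only
print(vol.update_reads, vol.update_writes)   # 1 3
print(vol.read("B", from_snap=True))         # B-original
```

The sketch also shows why the technique is called copy-on-first-write: the 1-read-plus-2-writes penalty applies only the first time a given block changes after a snapshot is taken.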


Architecture considerations:


  1. Plan for appropriate storage capacity: copy-on-write-based snapshots require the pre-allocation and provisioning of dedicated, reserved storage capacity for snapshots. This space is off-limits to other workloads, thus reducing usable storage on the system. RoW-based systems (NetApp) allow for more flexibility, and “snapshot reserve” is an optional setting.
  2. Consider RPO, RTO and SLA requirements when evaluating the appropriate snapshot implementation. RoW based snapshots are designed to meet stringent RPO/RTO requirements.
  3. Consider the performance impact of frequent snapshots on write intensive workloads (OLTP). Tight RPOs mean frequent (multiple times during the day, or even hourly) snapshots which in turn may have a performance impact on the systems. CoW based snapshots are sensitive to write intensive workloads.
  4. Consider the “data change rate” of targeted workloads. This is important to understand, as a high data change rate will mean high I/O (read/write) overhead for any changes/updates after snapshots are taken. Again, CoW-based snapshots are sensitive to high data change rates.
  5. Scalability of the solution – an important question to ask storage vendors is the maximum number of snapshots they support. This number will be important when designing your data protection scheme. Most vendors support up to 64 snapshots; NetApp supports 255.
  6. Consider how your snapshot design will fit into your overall data protection-and-recovery process and all its downstream processes (data replication, backup, DR…). Ideally you want fewer components and moving parts.
  7. Consider application awareness and integration. For example, does it integrate with my Oracle DB infrastructure? How about Exchange? SQL Server? ... For additional information about Oracle, refer to this NetApp Technical Report (TR)
  8. Compatibility with other systems: How well does this integrate with my existing environment? For example: can I leverage my existing backup infrastructure? How about replication? DR?
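As a rough illustration of considerations 1 and 4, a back-of-the-envelope sizing of a CoW snapshot reserve against the data change rate might look like this. The formula, the 1.5x safety factor, and the example numbers are assumptions for illustration, not vendor sizing guidance:

```python
def cow_reserve_gb(volume_gb, daily_change_rate, retention_days, safety=1.5):
    """Worst-case estimate: every changed block is copied into the
    reserve once while the snapshots covering it are retained."""
    return volume_gb * daily_change_rate * retention_days * safety

# Example: a 1000 GB LUN with a 3% daily change rate, snapshots kept 5 days.
needed = cow_reserve_gb(1000, 0.03, 5)
print(needed)  # 225.0 GB -- well beyond a 10-20% (100-200 GB) default reserve
```

The point of the exercise: a high change rate can blow past a rule-of-thumb reserve, which is exactly why the change rate and the reserve size have to be evaluated together.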


IT Operational considerations:


  1. Snapshot management:  It’s important to consider the management overhead of any snapshot implementation. For example, it’s fair to aim for a unified snapshot solution for both blocks and files. IT departments need one way to create and manage snapshots, whether it's files or blocks.
  2. Monitoring and alerting is another key area to consider. Make sure to implement the appropriate level of monitoring and alerting for the dedicated “snapshot” volume for CoW-based implementations; lack of space will prevent the creation of additional snapshots.
  3. Clean-up: Creating and defining a snapshot policy should include clean-up and removal of old snapshots. Avoid snapshot sprawl.
  4. License: Do you need a separate license for snapshots? If so, consider cost and maintenance.


In conclusion, you can see there are clear and significant differences between CoW- and RoW-based snapshot implementations. It’s important to understand these differences and evaluate which solution provides the most value and benefit in achieving your business and technical requirements.


Your article, while helpful with diagrams and attendant explanation, would be more credible if you correctly summarized the I/O operations involved with RoW vs CoW. An update operation under RoW involves 1 read and 1 write. Where is the read operation that lets you create B' and C' from B and C? An update operation under CoW involves 1 read and 2 writes. Your omission of a read operation under RoW creates a false impression.

An update operation using redirect on write does not require any additional read operation...there is no other "read operation to create B' and C'" - B' and C' are simply the new blocks written to storage. There may be some sort of allocate operation for new blocks, but the same is true of CoW implementations to create a snapshot cache/reserve pool....nothing is read from the snapshot blocks when using RoW.



Thanks for the note and comment regarding snapshot updates on RoW-based implementations. In the case of NetApp, my blog entry is accurate in that an update takes 1 write only. The beauty of WAFL is that it keeps track of the next available block(s), so as data changes, the updates get redirected to the new blocks, preserving the original snapshot in place. At any rate, the key point to keep in mind is that there are significant differences to consider when evaluating different snapshot technologies.




Thanks Jake, that's precisely how NetApp's snapshot implementation works.



You have mentioned that "RoW writes only the changed data to new blocks" (say, B', C'), and the active file system pointers get redirected to the new blocks B' and C'. At this point in time, only the snapshot holds the original blocks B and C. If I delete this snapshot, then there won't be any pointers associated with B and C. So, how can I have the original data in B and C?

Is there any technical difference between NetApp RoW snapshots and other vendors' RoW technology? Until recently I always thought this was NetApp proprietary technology, but after having several other vendor sessions it turns out they use exactly this same process – I believe EQL, Compellent, and anything running ZFS, to name-drop for comparison. I was ready to give my argument about how rubbish their snapshots are and how NetApp snapshots are the best, until it was pointed out the theory is identical, which left me unusually without anything to say.

The FUD argument (which I won't personally support, but might be important to clarify) is that RoW might be a quick way of creating snapshots, but as these are stored on primary storage and effectively the data is locked on disk at that stage, it can create performance deficiencies later on, whereas CoW can be (although not generally recommended) placed on a separate tier of disk. Even if on the same tier of disk, as it is logically separated, there are no performance issues created at a later date. As I say, I don't believe these claims personally, as a well-maintained NetApp system will continue its guaranteed performance levels, but it's important to understand what the CoW vendors are claiming. Or CoFW (copy-on-first-write), which is perhaps a minor distinction to make around the performance hit of using it, as you only get the 3x overhead the first time you overwrite a block.



If I understand your question correctly, once you delete a snapshot you are effectively removing the pointers to it, so access to that data won't be possible. If this was done in error, you can always retrieve the data from the backup copy (if it's been replicated).





Regarding your question about other vendors' implementation of RoW, I've focused on NetApp's own implementation, so I couldn't comment appropriately about other vendors. Even if others' RoW implementation is similar to NetApp's, it's a good idea to look at the big picture: management, app integration, unified support for blocks/files... you get my drift.

On the comment about performance, specifically about the "locality of where the snapshot is kept" (primary volume vs. dedicated snapshot cache), it's pretty well understood that performance could be an issue with CoW in the short term; long-term performance depends on other factors outside of the snapshot implementation (length of time snapshots are kept, number of snapshots kept, volume/LUN capacity...).

Thanks for the comments and questions, really appreciate it.


Adding to my query, I set a " 30 * * * " schedule for volume snapshots; that is, a snapshot will be created every 30 minutes. I have enabled SnapMirror, and snapshot.1 (say) is transferred to the destination. I'm preserving 1 snapshot at the source. After 30 minutes, another snapshot should be created as per the schedule. As I'm preserving only one snapshot at a time, the previous one (snapshot.1) gets deleted at the source and a new one gets created. And this new snapshot points to the blocks (B', C') which are pointed to by the active file system. So I'm losing the blocks (B, C) pointed to only by the previous snapshot (snapshot.1). Please clarify this.

thanks and regards,




Blocks B' and C' contain the new data, so effectively B and C get overwritten. Just curious, have you considered keeping more than one snapshot? This will give you more "point-in-time" copies of your data.



Do you mind if I just clarify this point a little? NetApp will never overwrite a data block; all data (changed or new) is written to free blocks. As a background process, data blocks that are no longer referenced (B and C in the above case) will be scrubbed and returned as free blocks. This is fundamental to guaranteeing data integrity: deleting a snapshot is quick and easy to do, but confirming that the data blocks are no longer referenced needs careful validation, and so it is performed as a background task.



Great point – thanks for clarifying and adding to the discussion.



1. In principle, all redirect-on-write snapshots should have similar benefits, though specific differences in implementation, especially around handling and caching of metadata, will determine the performance and usability of the solution. Ultimately, the effectiveness of the solution can be measured by how many people actually use the feature. Based on autosupport data, over 98% of NetApp customers use snapshots. In contrast, a number of years ago when I suggested using snapshot backups on a CX-3 in conjunction with Networker, the storage admins looked at me as if I'd just suggested using their array as an altar to sacrifice a goat. Suffice it to say, they ended up going with a split mirror... As far as I can see, snapshots with or without a similar tier of disk for the snap pool aren't generally recommended by most vendors for performance-sensitive applications.

2. Creating a separate physical pool for the CoW snapshots creates a management and efficiency problem; often these pools are poorly sized and undermanaged, leading to a host of performance and reliability problems.

3. The "creates performance deficiencies later on" argument was thoroughly disproved in the initial FAS3040 SPC-1 submission over three years ago, where multiple snapshots were continuously created and destroyed over the period of the test without any change in performance, and that submission didn't even include the use of reallocate on read. I've yet to see any evidence that using snapshots changes the rate or characteristics of WAFL ageing other than consuming additional space.

Another thing that other vendors don't talk much about is RTO for their snapshots. SnapRestore has an RTO of less than one minute no matter how much data is being recovered, and that's why I still think it's one of the best backup/recovery technologies I've come across in the last 10 years.


Leaving out the read is overcooking it a bit. The assumption in the above, in the implied change of B & C to B' & C', is that they are modified from cached copies. If they are not modified copies, they should be different letters (E' & F').

This is a best-case scenario, and other vendors also use cache to improve their implementations, but even in the worst-case scenario of un-cached data, NetApp RoW is better, with 1 x read (into cache to be modified) and 1 x write (modified data to the file system – new block) vs. 1 x read (cache), 1 x write (snapshot cache), and 1 x write (modified data to the file system). The NetApp RoW snapshot is far superior to CoW, but by overcooking it they attract unnecessary criticism.

While other vendors have RoW snapshots, they seldom have the complete package of RoW snapshots, block-sharing-aware cache, a RAID 6 implementation without a 50% performance drop, dedupe, etc. Also, none of them have more than 15 years of proven reliability. :-)

NetApp's snapshot technology was the one to beat in the DCIG 2011 Midrange Array Snapshot Software Buyer's Guide (accessible via free registration).



Couldn't agree with you more – even RoW architectures differ in how they implement snapshots, which will have an impact on performance and management. Thanks for your comments.



Great points, thanks Tom.


Can I conclude RoW is faster during writes, since it writes fewer blocks? But what about reads?

I'm a DBA and I lack knowledge about SAN/NAS solutions, but if I clearly understand the message, I have to conclude that RoW implies fragmentation, while this is less the case for CoW, since CoW writes the changes to the same blocks. So, I understand that with RoW there's more chance of random reads than sequential reads? That would be a pity, since my SQL Server likes sequential reads. That's why we reserve some extra free space in our database files during creation and why we do index rebuilds every night... but as I understand it, this hasn't much impact. I mean, SQL thinks the data is organized sequentially, but has no idea where NetApp puts its data, correct?

I'm happy with NetApp, I'm happy with my colleagues keeping it in good shape, and I'm waiting for the day that I can show I can restore our 1 TB database in 2 minutes. However, I keep looking for best practices to maximize performance in a SQL-NetApp environment.

Kind regards




Regarding your question about RoW read performance compared to CoW implementations and the potential fragmentation effects, it's worth noting that "fragmentation" is a universal problem that's been around for a while – remember the old days when you had to defrag your PC? On enterprise systems this could become a larger issue if not handled properly. You are correct that RoW is optimized for write performance, but in the case of Data ONTAP it's also designed to handle random and sequential reads in a very efficient and effective manner by constantly coalescing the free space and improving the data layout. The ability to do this effectively (managing free space and fragmentation) is even more important now because of the complexity and nuances of "storage virtualization" (snapshots, FlexClones, thin provisioning, deduplication); this is something that NetApp has done very well for a long time.

Hope this helps. Thanks for the note, and great job on keeping your systems running smoothly.



It would be great if you could list which snapshot technique each of your major competitors (EMC, Quantum...) uses.

It sounds like you're comparing it with an old technique. As far as I know, copy-on-write snapshots are not used by most major vendors these days.

I'm not commenting just to be against what you're saying here; I'm curious about the facts that make you believe your snapshot technique is better than others'.


The survey I linked above is a comparison of all the top vendors' snapshot technology, and the NetApp / IBM N series snapshots are the ones to beat for functionality and application support. I recommend downloading it, as it gives an unbiased assessment of the different features of each vendor's snapshots.

Good Morning Cesar,

Love your article & love the image below:


Thanks & Good w/e



Thank you Henry. Let me know if I can be of any help and good luck on your search.



Can a snapshot copy be used as a proper backup strategy? For example, consider a copy-on-write implementation: only the before image of the changed blocks will be copied to the snapshot. So let's say we have created a snapshot; what if the data is not changed? It won't be available in the snapshot, so how do we restore that data in case a failure happens to the actual volume? Thanks for the wonderful explanation above.



A snapshot is part of an overall backup strategy. Firstly, you have to ask, "What is a backup?" The answer is that it is a copy of data in a separate failure domain. The failure domain that causes the most problems is that of human error, where for example someone accidentally deletes a file or drops a tablespace. A snapshot protects against failures of that type very well. The next most common failure domain is that of the datacenter as a whole, where air-conditioning, plumbing, fire, and electrical accidents cause widespread outages, many of which destroy data not only on primary disk, but also on any secondary backups that are kept on tape or VTL. To protect against that failure domain you really must send your data offsite; the most efficient way of doing this is via periodic block-level replication via products such as NetApp SnapMirror or SnapVault. Regularly sending backup tapes offsite works too, though it's horrendously inefficient from a cost and operational point of view. If you don't send data offsite today, then snapshots alone will almost certainly exceed the reliability and recoverability of your current backup regime.

The failure domain that I didn't mention was mechanical disk drives. While disk failures are a regular event, this risk is pretty much entirely mitigated by the use of dual-parity RAID, hot spares, and good operational practices. The state of the art these days is such that you're more likely to be hit by a meteorite on your way home tonight than to suffer data loss via multi-disk failures. You could spend money protecting against that, but you're better off spending your money on something that is more likely to cause you problems.

Snapshots and replication to offsite locations together protect against pretty much every form of data loss, with the possible exception of malicious damage from people with administrative access. If that's a concern for you, enhancing this with occasional backup to offline media (i.e., a tape in a fireproof safe and not in an autochanger/robot) or using products like SnapLock will also remove that as a potential failure domain.



Hi,

I was a bit confused by the statement below:

"When “copy-on-write” snapshots are first created, only the metadata about where original data is stored is copied. No physical copy of the data is done at the time the snapshot is created"

So does this mean that we don't have an initial baseline snapshot for copy-on-write and redirect-on-write implementations? I think SnapMirror and SnapVault make an initial baseline copy, and changes will be stored in the snap reserve? I want to take an Oracle database snapshot on the same server where the database is stored, using NetApp SMO. When I take the first snapshot, will it capture the entire DB as a baseline, or the metadata alone, and start capturing the changes only during updates? If changes alone, then I need the original data blocks for all unchanged blocks? I am a bit confused here. Is the concept similar to the flashback feature in Oracle Database, where flashback logs are not valid if the source DB is corrupted?

On the topic of read performance, RoW techniques are write-optimized, so there is a read performance impact, particularly with regard to sequential reads. The way to mitigate read performance issues is to at least have all metadata cached, so that the system doesn't require multiple I/Os to access a set of data blocks.

That, combined with atomic writes (and therefore a much lower chance of data corruption) and 50% better write performance, should minimise the impact on the DB and lead to better availability in the long term. Any performance benefit of CoW would be eliminated any time the database gets corrupted by a failed write.

Hi Jack,

Copy-on-write is other vendors. NetApp uses redirect-on-write, where the original copy of the data is retained for the snapshot retention period, so it is effectively a local baseline. Each snapshot is effectively a separate read-only copy of the data, separate from the live file system and other snapshots.

That is the reason we can set different retention periods for different snapshots (Scheduled, VSM etc.)

hi Tom,

So it will be like this

     Snapshot 1 will be the baseline copy

     Snapshot 2 will be the delta (changing the pointers to the snapshot reserve)

Correct me if I am wrong


OK, some clarifying points:

  1. ONTAP does redirect on EVERY write. Not just for snaps. Effectively, NetApp snaps are a by-product of how the system writes, anyway. Every few seconds a snap is created.
  2. The fact that ONTAP redirects every write is the big differentiation vs other vendors that redirect on first write only but, underneath the covers, still suffer from having to update in-place blocks, and all the RAID penalties this entails (6 I/Os for every RAID6 write, for example).
  3. Redirection on every write can result in fragmentation. ONTAP has automatic, real-time ways for dealing with this (free space reallocate, read_realloc). Those optimizations only intervene when necessary in order to optimize the block layout. These features are off by default but are easily enabled and can now work on all data (in the past, reallocation didn't work fully with deduplicated data).
  4. Competing vendors that offer redirect on every write (like ZFS) don't have the block optimization capabilities ONTAP has, which results in degraded performance long-term.
  5. The read_realloc is especially interesting for Database files. For example, consider the case of a DB doing a sequential read after random write (many DB workloads are like that, especially analytics). ONTAP will realize the read attempted was sequential but the blocks had been written randomly in the past by the DB. ONTAP will actually move the blocks in such a way as to make future queries faster (resulting in an intentional fragmentation since the data is moved from where it was to a new place). This is way cool stuff.


Thank you cesaro for the detailed explanation.

Could you help me understand the following questions?

1. How does WAFL decide whether a request is a new write or a modification of existing data?

      1a. If it is a new write, Data ONTAP will write the data to a free block (one with zeros) and create a pointer (update the root inode about this new block) for that block. Please confirm.

      1b. If it is a modification of existing data, how does Data ONTAP identify that it is a modification of existing data and write the modified data into a new block? Also, how does Data ONTAP destroy the pointer (AFS pointer) for the old block and create a new pointer for the modified data?

Is it because Data ONTAP treats the modification request as a delete-and-recreate operation? Obviously, when any data is deleted, that block is retained by the snapshot and the modified (recreated) data will be written into a new block. Please confirm.

2. Why does Data ONTAP have a limit of 255 snapshots? Please explain.

Hi. Post lacks images. Can you fix this issue?

Unable to view the images. Can anyone please fix the issue.


Also have a query.

When the new block is written to the snapshot space, how is it used during read operations as part of the AFS?

Is it that the new block is redirected from the snapshot space to the AFS, or is it that the pointers from the AFS point to the block in the snapshot reserve space?