I have several basic NetApp Snapshot questions that I’ve been unable to find the answers to on the internet, on NetApp’s website, or through NetApp Support (Case # 2001517538). I’m trying to find a definitive guide/document on how NetApp Snapshots work and how to calculate storage requirements. I have found documents around these subjects, but they explain the process very poorly and leave me with more questions than answers.

Question 1: I’m told that the very first volume snapshot is not actually a snapshot of the entire disk, but is instead a snapshot of the file metadata with pointers to the actual data. So if I have a 2 GB volume with 1 GB of consumed space, my initial snapshot of that volume will be a fraction of the 1 GB of consumed space, because the snapshot only captures the metadata with pointers. If this is true, and I had to complete a volume-level restore from my first snapshot, then I would need both the snapshot and the original data located on the volume, because my first snapshot only contains metadata with pointers to the actual data on the volume. Is that correct?

If my above statement is correct, does that mean that in order to complete a volume-level restore, at any point, you need the snapshot along with the original data on the volume? If that’s true, then hypothetically, if I had a volume on shelf1 and took a snapshot of that volume, and the snapshot resided on shelf5, and shelf1 was completely destroyed (for some strange reason), my snapshot located on shelf5 would be useless, because it only contains metadata and not the actual data that resides on shelf1.

Is there a document that explains all this in detail? I have several other questions, but I’ll hold off on posting them until I get an answer to the questions above. Thanks in advance.
First, keep in mind that snapshots are not intended to protect you against massive disk or shelf failures. For that you will need some kind of backup or replication, either via your backup vendor of choice or one of NetApp's products. Snapshots protect you against incorrect file deletions, modifications, or corruption. This can be on a per-file basis or for an entire volume.
Now that we are hopefully clearer there, how do snapshots work? When you first take a snapshot, it consumes no space, as it is just a copy of the metadata of the volume. However, you can recover any or all of the data in that volume, assuming the underlying disks/aggregate are intact. This is because after a snapshot is taken, no block that exists at that time can be altered, as changes are written to new blocks (data or metadata, it doesn't matter). The space consumed by a snapshot will be the number of 4KB blocks that are changed over the life of that snapshot.
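To make the mechanism above concrete, here is a minimal toy model in Python. It is only a sketch of the idea (writes always go to new blocks, a snapshot is a frozen copy of the block map), not NetApp's actual implementation; the class `ToyVolume` and all of its method names are made up for illustration.

```python
BLOCK_KB = 4  # block size mentioned above


class ToyVolume:
    """Hypothetical model: a volume is a map from logical block to physical block id."""

    def __init__(self):
        self.active = {}      # logical block number -> physical block id
        self.snapshots = {}   # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, logical_block):
        # Every write lands in a fresh physical block; existing blocks are never altered.
        self.active[logical_block] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only the block map (metadata) is copied, never the data blocks themselves.
        self.snapshots[name] = dict(self.active)

    def snapshot_space_kb(self, name):
        # Blocks the snapshot still points to that the active file system no longer uses.
        held = set(self.snapshots[name].values()) - set(self.active.values())
        return len(held) * BLOCK_KB

    def restore(self, name):
        # A volume-level restore just makes the frozen block map active again.
        self.active = dict(self.snapshots[name])


vol = ToyVolume()
for block in range(256):          # write 256 blocks of 4 KB = 1 MB of data
    vol.write(block)
vol.snapshot("first")
space_at_creation = vol.snapshot_space_kb("first")
for block in range(10):           # overwrite 10 blocks after the snapshot
    vol.write(block)
space_after_changes = vol.snapshot_space_kb("first")
print(space_at_creation, space_after_changes)
```

In this model the snapshot costs nothing when it is taken and grows by one 4KB block for each block that is overwritten afterwards. It also shows why the snapshot depends on the same physical blocks as the volume: the restore only swaps block maps, so it cannot survive the loss of the underlying disks.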
How much space is that? I can't tell you, as it will depend on your change rate as well as the lifespan of the snapshot in question. The good news is that tracking changes at the 4KB block level is typically more efficient than the whole-file incrementals that most backup products do. The other good thing is that volume size and snap reserve (whether a snap reserve makes sense depends on the type of data) are totally virtual, in that they can be grown or shrunk on the fly. That, along with autogrow and autodelete policies, makes this part of space management easier than it sounds.
So where to start? Pick a snap reserve value like 10% and watch it for a few weeks. It will become clear if you need to tweak it. Start higher if you want to be more conservative, and use the policies described above.
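If you want a rough number to compare against that starting value, a back-of-the-envelope estimate can be sketched like this. The 2% daily change rate and 7-day retention are purely hypothetical inputs, and the formula assumes every day changes a different set of blocks, which makes it a pessimistic bound rather than a prediction.

```python
def estimate_snapshot_space_gb(used_gb, daily_change_rate, retention_days):
    """Pessimistic estimate: assumes no block is changed more than once
    while the snapshots are retained."""
    return used_gb * daily_change_rate * retention_days


def as_percent_of_volume(space_gb, volume_gb):
    return 100.0 * space_gb / volume_gb


# The 2 GB volume with 1 GB used from the original question;
# the change rate and retention below are assumed values.
used_gb = 1.0
volume_gb = 2.0
snap_gb = estimate_snapshot_space_gb(used_gb, daily_change_rate=0.02, retention_days=7)
snap_pct = as_percent_of_volume(snap_gb, volume_gb)
print(f"estimated snapshot space: {snap_gb:.2f} GB ({snap_pct:.1f}% of the volume)")
```

With these assumed inputs the estimate comes to about 0.14 GB, roughly 7% of the volume, so a 10% starting value would leave some headroom. Your real change rate is what matters, which is why watching the volume for a few weeks is still the better guide.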
That's my first cut at helping. Feel free to post follow-ups and I'll do my best.
Thanks for the response. Attached to this post is a document I got from NetApp’s website titled “Accessing Volume Size”, which states: “The first Snapshot copy lock uses disk space equal to the LUN object size itself (therefore, double the requirement). Additional Snapshot copies increase the amount of required disk space.” But you posted: “When you first take a snapshot, it consumes no space, as it is just a copy of the metadata of the volume. However, you can recover any or all of the data in that volume, assuming the underlying disks/aggregate are intact. This is because after a snapshot is taken, no block that exists at that time can be altered, as changes are written to new blocks (data or metadata, it doesn't matter). The space consumed by a snapshot will be the number of 4KB blocks that are changed over the life of that snapshot.” So there’s a contradiction between the two statements: one says a snapshot takes virtually no space, and the other says a snapshot is equal to the LUN object size. Can you please clarify the two? I’ve attached the document for your review.
Ok, I think I know the confusion here. LUNs are a slightly different case, although it doesn’t have to be that way (and for most customers these days it isn’t).
What happens by default (and I think this default will change) is that LUN space reservations kick in at the first snapshot. This is based on the old idea of 2x+deltas, which is not typically used at customer sites these days. Way back when LUNs were first introduced it was a best practice, but that changed long ago.
So in SAN environments these days, I recommend turning off all LUN space reservations and managing space with volume autogrow and snap autodelete. These policies are a much more efficient way to manage volume space in a SAN volume.
Basically, there is a problem that ANY SAN vendor that offers snapshots has to solve: what happens when your snapshot space runs out? The host thinks that there is space left, but there is no more room because of snapshots. In the early days, we would reserve 100% of the space of the LUN up front, and then when the space filled, we would stop allowing new snapshots and use that reserved space to handle any overwrites. However, that is really not space efficient, so subsequently (and by that I mean 5+ years ago now), we introduced the ideas of volume autogrow and snap autodelete to handle space issues.
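The difference between the two approaches can be sketched with a small Python model. This is only an illustration of the sizing arithmetic and of a grow-then-delete policy; the function names, the fixed grow step, and the choice to grow first and delete the oldest snapshots second are all assumptions of this sketch, not a description of how the filer implements it.

```python
def size_with_full_reservation(lun_gb, delta_gb):
    # The old "2x + deltas" rule: the LUN, a 100% overwrite reserve, plus changes.
    return 2 * lun_gb + delta_gb


def size_without_reservation(lun_gb, delta_gb):
    # Without the reserve, only the LUN plus the changed blocks need space.
    return lun_gb + delta_gb


def relieve_space_pressure(lun_gb, snapshot_gbs, volume_gb, max_volume_gb, grow_step_gb):
    """Hypothetical policy: grow the volume first, then delete the oldest snapshots.

    snapshot_gbs lists snapshot sizes, oldest first.
    Returns the resulting volume size and the snapshots that survive.
    """
    snapshot_gbs = list(snapshot_gbs)
    # Step 1: autogrow in fixed increments, never past the configured maximum.
    while (lun_gb + sum(snapshot_gbs) > volume_gb
           and volume_gb + grow_step_gb <= max_volume_gb):
        volume_gb += grow_step_gb
    # Step 2: if that was not enough, autodelete snapshots, oldest first.
    while lun_gb + sum(snapshot_gbs) > volume_gb and snapshot_gbs:
        snapshot_gbs.pop(0)
    return volume_gb, snapshot_gbs


old_rule_gb = size_with_full_reservation(lun_gb=100, delta_gb=30)
thin_gb = size_without_reservation(lun_gb=100, delta_gb=30)
final_volume_gb, surviving = relieve_space_pressure(
    lun_gb=100, snapshot_gbs=[10, 10, 10],
    volume_gb=110, max_volume_gb=120, grow_step_gb=5,
)
print(old_rule_gb, thin_gb, final_volume_gb, surviving)
```

In this example the old rule asks for 230 GB where 130 GB would hold the LUN and its changes. When the 110 GB volume runs short, the policy grows it to its 120 GB cap and then drops the oldest snapshot so the rest fits.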
So if you don’t use LUN space reservations, snapshots work as I described.