2012-02-16 09:36 AM - edited 2015-12-18 03:01 AM
The company I work for has decided on a strategy that I am not familiar with and am therefore very hesitant about. Let me try to explain...
They want to take a daily snapshot of each FS and keep each snapshot for 45 days. Every 30 days they will do a backup to tape for long-term retention. They will not do any traditional daily backups to tape or disk. They will rely only on the daily snapshots for DR and for recovering from everyday file deletions. It is also my understanding that each snapshot will be replicated to an offsite facility.
I have grown up with EMC so I am not familiar with the NetApp snapshot. Is this strategy a good one or should we rethink this?
2012-02-16 10:25 AM
That is quite safe as long as you have enough room (snap reserve space) to store your deltas for 45 days; otherwise the deltas will eat into the volume's own capacity. Then replicate the volume or qtree to a DR site nightly. I follow the same strategy in my work environment for all the virtual infrastructure, and for physical servers we use TSM. I hope this helps somewhat. If you still have questions, consult your NetApp PSE and they will guide you. Good luck!
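For a rough feel of how much room 45 days of deltas could take, here is a back-of-the-envelope sketch in Python. The 1000 GB volume size and the 1% daily change rate are made-up numbers, and the function names are only for illustration. It assumes every day's changed blocks are unique, so it is a pessimistic estimate; measure your real change rate before sizing anything.

```python
def snapshot_space_gb(volume_gb, daily_change_pct, retention_days):
    """Rough upper bound on space held by snapshots.

    Assumes each day's changed blocks are unique, i.e. no block is
    overwritten twice within the retention window.
    """
    return volume_gb * (daily_change_pct / 100.0) * retention_days


def reserve_pct_needed(volume_gb, daily_change_pct, retention_days):
    """The same estimate expressed as a percentage of the volume size."""
    held = snapshot_space_gb(volume_gb, daily_change_pct, retention_days)
    return 100.0 * held / volume_gb


# Hypothetical example: 1000 GB volume, 1% daily change, 45 daily snapshots.
space = snapshot_space_gb(1000, 1.0, 45)
pct = reserve_pct_needed(1000, 1.0, 45)
print(f"{space:.0f} GB held by snapshots, about {pct:.0f}% of the volume")
```

If the estimate comes out larger than your snap reserve, the snapshots will start using space from the active part of the volume, which is the situation described above.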
2012-02-16 10:35 AM
That's not the worst idea ever. Where I work we do dailies, but otherwise our setup is essentially the same as the rest of the proposed architecture change. The daily snapshots are very trustworthy and an excellent solution if budgetary concerns are driving this; however, replicating snapshots to another filer or using SnapVault is a great idea. There are a ton of NetApp technologies you can deploy that you may already be licensed for. Look at a solution called Syncsort (SyncSort.com): it can leverage all of those from a single pane of glass and make things a little easier on you, the admin. You get 251 individual snapshots per volume at any time and job scheduling just like NetBackup or CommVault. We're doing all our VMware ESXi 5, SharePoint, SQL, Exchange 2010 and CIFS data this way. But if you can't go that route, the solution above is sound; just confirm that the SnapVaulted daily snapshots are good and accessible.
2012-02-16 11:16 AM
If you have grown up with EMC snapshots, there are some big differences. The snapshot reserve is "tagged" onto the production volume, and snapshots can use production space if they outgrow the reserve.
Also, EMC snapshots, as of DART 5.5 and 5.6 (the last two I used), did not rely on the existing file system. NetApp snapshots do, so corruption of the file system makes the snapshots useless. The upside of that design is that they are nearly instantaneous, can be de-duped, and only use space for changed blocks. But I suggest you look into SnapVault to back up the data off of the production file system (while still not having to go to tape).
2012-02-16 11:17 AM
There are many NetApp tools at your disposal relying on the snapshot technology for faster data retrieval during restore: SnapMirror, OSSV, SMVI, SnapVault just to name a few. Which specific strategy to follow would depend on your environment. SAN/NAS protocols? (quiescing on the app level is needed when leveraging the snapshot technology on data hosted on LUNs unless you are using an application that takes care of it during the snapshot creation).
Based on your SLA, you could create flexible volumes that hold 45 nightly/daily snapshots and configure those volumes at the primary site to be replicated to a secondary site using SnapMirror on an hourly basis (leverage deduplication, compression and throttling as needed). When configuring the volumes, look into setting the "nosnapdir" option to on; otherwise you will have double copies of your data on tape (that is, the active file systems plus the snapshots of the active file systems, which means higher tape usage). Please note that an interval of 24 hours between snapshots on the primary site seems too long in my opinion; you might consider taking some hourly snapshots during the day on top of the nightly/daily snapshots.
Best of luck with your implementation.
2012-02-16 12:35 PM
Thanks, everyone, for your responses. My main concern, which stems from my lack of knowledge of how a snapshot works, is what happens if the underlying FS (this is all NAS data) gets corrupted. If we are keeping snapshots for 45 days and doing a backup to tape every 30 days, what happens if the FS gets corrupted on day 29? Would we be able to restore from the day 28 snapshot? Or would we have to go all the way back to 30 days ago and restore from tape, losing 29 days' worth of changed data?
2012-02-16 08:32 PM
A NetApp snapshot is a COMPLETE point-in-time copy of the data EXACTLY as it was when the snapshot was taken. This means that if you take a snapshot on day 28 and corruption occurs on day 28.1, you can roll back safely to day 28 and your data will be exactly as it was when the snap was taken. The corruption can't really "roll back" and affect previous snapshot data unless that data was corrupt to begin with.
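To make the day 29 scenario concrete, here is a small Python sketch that picks the newest recovery point taken before the corruption. It is only a model of the schedule described in this thread (nightly snapshots kept 45 days, tape every 30 days); the function names and the integer day numbering are assumptions for illustration, and it assumes the snapshots themselves are still readable.

```python
def available_snapshots(today, retention_days=45):
    """Days whose nightly snapshot still exists on `today`.

    Assumes one snapshot at the end of each day, starting on day 1,
    kept for `retention_days` days.
    """
    first = max(1, today - retention_days)
    return list(range(first, today))


def latest_recovery_point(corruption_day, snapshot_days, tape_days):
    """Return (source, day) for the newest copy taken before the corruption."""
    candidates = [("snapshot", d) for d in snapshot_days if d < corruption_day]
    candidates += [("tape", d) for d in tape_days if d < corruption_day]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])


# Tape backup on day 0, corruption during day 29.
tape_days = [0]
with_snaps = latest_recovery_point(29, available_snapshots(29), tape_days)
tape_only = latest_recovery_point(29, [], tape_days)

print(with_snaps, "-> days of changes lost:", 29 - with_snaps[1])
print(tape_only, "-> days of changes lost:", 29 - tape_only[1])
```

With the snapshots available you go back to the day 28 snapshot; the 29-day loss only applies if tape is the only copy left.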
2012-02-17 04:09 AM
Just to expand on that reply, perhaps the piece missing from your knowledge is that NetApp snapshots are absolutely read-only (don't confuse with writable FlexClones - they are based on read-only snapshots ;-)
Hence if you were to get data corruption somehow, the corrupt blocks would be written as new data blocks, and the snapshot would remain a safe/sane place to recover from.
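If it helps to picture why new (even corrupt) writes leave a snapshot untouched, here is a toy model in Python. It is a deliberately simplified illustration and not how WAFL is actually implemented; the class and method names are invented for the example. The idea it shows is that a snapshot is a frozen set of block pointers, and writes always go to newly allocated blocks.

```python
class ToyVolume:
    """Toy model: writes never overwrite a stored block, they allocate a new one."""

    def __init__(self):
        self.store = {}      # block id -> data
        self.active = {}     # logical block number -> block id (live file system)
        self.snapshots = {}  # snapshot name -> frozen copy of the pointer map
        self._next_id = 0

    def write(self, lbn, data):
        # New data lands in a new block; the old block is left as it was.
        self.store[self._next_id] = data
        self.active[lbn] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Copies the pointers only, not the data blocks.
        self.snapshots[name] = dict(self.active)

    def read(self, lbn, snapshot=None):
        pointers = self.snapshots[snapshot] if snapshot else self.active
        return self.store[pointers[lbn]]

    def restore(self, name):
        # Point the live file system back at the snapshot's blocks.
        self.active = dict(self.snapshots[name])


vol = ToyVolume()
vol.write(0, "payroll v1")
vol.snapshot("day28")
vol.write(0, "garbage")            # "corruption" after the snapshot

print(vol.read(0))                 # live file system sees the bad data
print(vol.read(0, "day28"))        # snapshot still sees the old block
vol.restore("day28")
print(vol.read(0))                 # live file system is back to day 28
```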
In the very rare case where you somehow hit some underlying WAFL corruption, there are support tools that would be used to recover, and your backups (snapshots) should not be impacted. I hesitate to say "will never" because that would bring down the gods of misfortune on us all, but almost all NetApp customers use snapshots as their recovery point, and are (in my experience!) all very happy with the workings!
As others have said, there are other mechanisms to consider, such as SnapMirror, SnapVault and MetroCluster, that can extend the protection. From your original description, my only concern would be that you had outlined a good strategy to protect data against many potential data loss scenarios, but not against the loss of your primary site: fire, flood or worse could all impact that primary copy, and many customers now consider some off-site DR strategy a "must-have". For that, one or more of the other mechanisms mentioned would generally be recommended!