Backups are working nearly flawlessly on two separate NFS volumes. I'm snapping each VM nightly and keeping two weeks' worth of snaps through VSC Backup & Recovery.
Both volumes are fine and VSC itself is operating fine (unless disk usage is too high, in which case I have problems on any given VM, but that's not the issue I want to discuss here).
The problem I'm running into is with two separate VMs, one on each of the datastores. Most of the VMs generate snapshots of maybe a few hundred MB each per snap, up to maybe 1-2 GB. These two problem children are snapping 10-15 GB each nightly. I can't figure out why there is so much change within the snapshots, so I'm trying to track it down.
Are there any tools or methods that you use or recommend to determine where disk changes are being made for any given VM? One of them used to have a SQL database server running locally on the disk, but I have since moved the databases to a SQL server that is being managed by SMSQL.
I can tell by looking at the snapshots on the volume. The screenshot of my snapshots with space usage is attached; the two snapshots with their usage boxed in red are from the same VM. So, on a nightly basis, that VM is generating enough change to make a snapshot that large. I'm just trying to determine what exactly is changing on its disks so much. If you notice, the other snapshots are relatively small in size, and they include snapshots of an Exchange server (that has its data store on the local disk vs. on a SMEX-managed LUN), 2 SQL servers, a highly used file server, and a couple of others.
I'm just going off those nightly snapshots and trying to figure out what on the local disks is changing so much as to generate such large snapshots.
And no, the VM isn't leaving VMware snapshots behind. At least, none that are visible in the vSphere client.
So is there only the one VM in the datastore? Those snapshots are at the volume level, which means changes made by any of the VMs contained within the volume will contribute to the snapshot size, not just the one VM that you quiesced. If you have multiple VMs in the volume, you would likely want to back them all up at the same time in order to minimize the number of snapshots taken on the volume.
Isn't the whole concept of VSC supposed to mean that you can back up individual VMs without snapping the whole volume? These snapshots are not made at the filer level; they are made within the vSphere Client at a VM level. If this isn't the case, then I think NetApp needs to do a better job of making that clear. My understanding is that using the VSC plugin in vCenter, and Backup and Restore within the plugin, will only snapshot the individual VM that is being backed up, not the entire volume. I can do a VM-level Single File Restore and that doesn't seem to cause any issues. I have also been able to restore a single VM without screwing up all the other VMs in the volume, so I'm inclined to think that my understanding is the correct one.
I'm sorry if that wasn't clear to you. It is true that you can perform VM-level restores and even Single File restores, but the backups created are at the volume level. By selecting a single VM in the VSC you will quiesce only that one VM, but the snapshot happens at the volume level.
Because of this, I recommend customers build their VSC backup jobs at the datastore level rather than at the VM level. This way, all the VMs will be backed up with a single snapshot on the storage controller, including any new VMs that are added to the datastore, without you needing to create or modify the backup job. You still get VM-level restores.
Of course, having more than one backup job is fine: additional snapshots on the volume don't necessarily require more space or impact performance at all, as long as you stay under the 255-snapshot limit. It does mean it is more difficult to determine which VM is generating the changed blocks, though.
Ok, I guess that does make sense that a snapshot would be from a volume level rather than a VM level. I really wasn't sure how it was being accomplished but just took it at face value that my VMs were individually backed up. Just a bit confusing is all, I guess.
I'll make sure that I remove the individual jobs and create it at the volume level instead. It certainly will make management of the snapshots easier.
Thank you for your patience in my slowness to understand. I appreciate the information.
Ok, so now that I'm clear on what is actually being snapped, is there a way that I could go through each VM individually and try to determine what has changed from one snap to the next? I did find a tool which visually maps out the drives and allows you to save that visual mapping. Then you can take a separate mapping later on (after 24 hours or so) and compare the two mappings, so you can see what kind of data change you are looking at.
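If you don't mind something simpler than a visual mapping tool, the same before/after idea can be done with a small script run inside the guest OS: record the size and modification time of every file, run it again a day later, and diff the two recordings to see which files account for the change. This is just a rough sketch I've used for this kind of hunting, not anything VSC- or NetApp-specific, and it only catches file-level changes (it won't see things like pagefile or defrag churn directly):

```python
import os

def scan_tree(root):
    """Walk a directory tree and record (size, mtime) for every file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or unreadable; skip it
            state[path] = (st.st_size, st.st_mtime)
    return state

def diff_scans(before, after):
    """Return files that are new or changed since 'before',
    plus the total bytes those files account for now."""
    changed = []
    total_bytes = 0
    for path, (size, mtime) in sorted(after.items()):
        if before.get(path) != (size, mtime):
            changed.append(path)
            total_bytes += size
    return changed, total_bytes
```

Run `scan_tree("C:\\")` (or `/` on Linux), pickle the result, and repeat after the next backup window; `diff_scans` then tells you which files moved and roughly how many bytes they represent, which you can compare against the snapshot size.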
It could explain my issue altogether, though, if one machine had a VMware snapshot open while another was triggering a volume snapshot at the same time. Basically, I had my VMs snapping at staggered intervals, so one could be quiescing with a VMware snapshot while another had already completed that, snapped the volume, and then deleted the VMware snapshot. So, in essence, my volume snapshots may be larger in part because of that whole staggered approach.
Again, thank you for the information. That may make things a ton easier.