Hi Pierre
pierrek wrote:
--> an agent running on the OpenStack controller calls the nova command that takes a snapshot of a set of instances
First question: this is a qcow2 snapshot, I guess? Where is it stored? Does it make sense to take a snapshot of the ephemeral storage of a running instance?
For the OpenStack installation I used devstack and mostly kept the default configuration to get up and running quickly (there was a 2-day time limit for the challenge). The only thing I configured was cinder with the iSCSI driver, so that cinder volumes were stored on a NetApp volume. IIRC, with this configuration the instance snapshots were stored in the file system the installation was running on, under /opt/openstack/nova/instances/, and they were in qcow2 format. I'm not very knowledgeable about the correct way to back up VMs consistently, so I can't say whether it makes sense to snapshot the ephemeral storage. In my plug-in and scenario I didn't take instance snapshots, since in my setup they were not stored on the NetApp storage. But I don't see any reason why this shouldn't work; it should mostly be just another, similar API call.
pierrek wrote:
Second question: does this snapshot also work for the vda disk provided by cinder?
This is actually what I did. I suspended the VMs in OpenStack that had a cinder volume attached. I then took cinder volume snapshots of all volumes, both attached and unattached. I don't claim this is the right way to do snapshot-based VM backups; there is probably more to it, like syncing and freezing the file system, but in my controlled environment this approach was good enough. Remember: 2-day time limit for the challenge.
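For illustration, the suspend, snapshot, resume sequence could be scripted roughly as below. This is a minimal sketch, not the plug-in I actually wrote: it assumes the unified `openstack` CLI is installed and credentials are sourced, and the function name `snapshot_with_suspend` and the `tag` naming scheme are hypothetical. The `--force` flag is there because, at least in older releases, cinder refuses to snapshot an attached ("in-use") volume without it. Also note that suspending and snapshot creation are asynchronous in OpenStack, so a real plug-in should wait until each server is SUSPENDED and each snapshot is `available` before resuming; that polling is left out here.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes one argv list and executes it (swappable for dry runs/tests).
Runner = Callable[[List[str]], None]


def run_cli(argv: List[str]) -> None:
    """Execute one CLI command; raises CalledProcessError on failure."""
    subprocess.run(argv, check=True)


def snapshot_with_suspend(servers: Sequence[str], volumes: Sequence[str],
                          tag: str, run: Runner = run_cli) -> List[List[str]]:
    """Hypothetical helper: suspend servers, snapshot all volumes, resume.

    Returns the commands issued, in order. Does NOT wait for the async
    suspend/snapshot operations to finish (see the caveat above).
    """
    issued: List[List[str]] = []

    def do(argv: List[str]) -> None:
        issued.append(argv)
        run(argv)

    suspended: List[str] = []
    try:
        for server in servers:
            do(["openstack", "server", "suspend", server])
            suspended.append(server)
        for volume in volumes:
            # Snapshot attached and unattached volumes alike; --force
            # allows snapshotting a volume that is still in use.
            do(["openstack", "volume", "snapshot", "create",
                "--volume", volume, "--force", f"{tag}-{volume}"])
    finally:
        # Resume whatever was suspended, even if a snapshot call failed.
        for server in suspended:
            do(["openstack", "server", "resume", server])
    return issued
```

Passing a fake runner, e.g. `snapshot_with_suspend(["vm1"], ["vol-a"], "bk", run=print)`, gives a dry run that just prints the commands in order.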
pierrek wrote:
--> When the nova snapshot is made, we freeze the state of the instances in a NetApp snapshot (on both the cinder NFS volume and the volume used to store instance ephemeral storage).
--> Finally the nova snapshot is removed using a command executed by the SC agent on the OpenStack controller.
This approach is loosely inspired by what the VSC does in a VMware context, but is it valid, even just conceptually? What do you think?
Again, I can't say whether this is a valid approach for consistent snapshots, but from what I learned while doing the proof of concept, it should be reasonably straightforward to create an agent plug-in that orchestrates the API calls required for the outlined approach. How would a restore be orchestrated?
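To make the outlined approach (nova snapshot, then NetApp snapshot, then removal of the nova snapshot) a bit more concrete, here is a rough sketch of the orchestration part, again assuming the unified `openstack` CLI. `orchestrate_backup` and `take_storage_snapshot` are hypothetical names, and the storage-side snapshot is only a callback because I don't want to guess the NetApp or SC agent API here; this is not the actual plug-in interface. The `try`/`finally` makes sure the temporary nova snapshots are removed even if the storage snapshot fails.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes one argv list and executes it (swappable for dry runs/tests).
Runner = Callable[[List[str]], None]


def run_cli(argv: List[str]) -> None:
    """Execute one CLI command; raises CalledProcessError on failure."""
    subprocess.run(argv, check=True)


def orchestrate_backup(servers: Sequence[str], tag: str,
                       take_storage_snapshot: Callable[[], None],
                       run: Runner = run_cli) -> None:
    """Hypothetical flow: nova snapshot -> storage snapshot -> cleanup."""
    created: List[str] = []
    try:
        # Step 1: nova snapshot of each instance; --wait blocks until done.
        # Image names are assumed unique so that deleting by name is safe.
        for server in servers:
            image = f"{tag}-{server}"
            run(["openstack", "server", "image", "create",
                 "--name", image, "--wait", server])
            created.append(image)
        # Step 2: freeze that state on the storage side (placeholder for
        # e.g. a NetApp snapshot of the cinder NFS and ephemeral volumes).
        take_storage_snapshot()
    finally:
        # Step 3: remove the nova snapshots, even if step 1 or 2 failed.
        for image in created:
            run(["openstack", "image", "delete", image])
```

A restore would presumably need the reverse: revert or clone the storage snapshot first, then re-register or boot from the captured state, which this sketch does not cover.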
If you plan to work on this I would be very interested to collaborate.
Thanks,
Thomas