Tech ONTAP Blogs
Authors: Ore Adesina, Diane Patton, NetApp
In today's cloud-native landscape, data protection is paramount, especially for stateful applications deployed on Kubernetes. Many enterprise customers use dynamic volume creation and Container Storage Interface (CSI) provisioners to create individual volumes for their workloads. Although CSI-compliant provisioners can take on-demand snapshots, the CSI specification does not cover snapshot schedules, backups, or backup schedules. Therefore, when a volume is created by a dynamic provisioner such as NetApp® Trident™, another overlay solution must add data protection to it.
Because Google Cloud NetApp Volumes is a first-party Google Cloud service, it fully integrates with other Google Cloud services, forming a complete and comprehensive solution. The creation of a new volume is automatically logged in Google Cloud Observability and Logging. The log entry can be filtered and then sent to a topic in Google Pub/Sub for distribution elsewhere. In this case, we send the log entry to Google Cloud Run functions via Eventarc, thereby invoking a script to add data protection to the newly created volume. The following figure illustrates this solution.
This blog post explores how several Google Cloud services can work in tandem to automatically add robust data protection capabilities to new volumes created dynamically. As an example, we delve into the mechanisms that allow Google Cloud Logging to communicate the existence of new volumes on NetApp Volumes to a Pub/Sub topic. We can then subscribe Google Cloud Run functions to that topic and invoke a script on a trigger in Google Cloud Run functions to turn on data protection for the new volume in NetApp Volumes.
The Container Storage Interface (CSI) is a specification for exposing arbitrary block and file storage systems to container orchestrators such as Kubernetes.
Unlike monolithic applications deployed on bare metal or virtual machines, applications deployed on Kubernetes are generally deployed as microservices, meaning that they are deployed as small, separate, interconnected pieces. These pieces (for example, Kubernetes pods) deploy quickly and are constantly being created and deleted to scale the application up and down and to provide seamless upgrades. They may also need persistent volumes.
CSI provisioners are easy to use and automatically create persistent volumes (PVs) on demand when a new Kubernetes persistent volume claim (PVC) is discovered—generally when it is deployed by a developer. Some CSI provisioners create an entire backend volume to attach to a Kubernetes PV. This means that volumes are created on the fly as needed, not by humans or predetermined scripts. Therefore, if these volumes need protection, they must first be discovered before a backup policy can be applied. This is where the other Google Cloud services come in.
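To make this concrete, a PVC manifest along the following lines is all a developer deploys; Trident then creates the backing volume automatically. This is an illustrative sketch (the claim name, size, and storage class are assumptions matching the example output shown later in this post):

```yaml
# Hypothetical PVC; the storage class assumes a Trident backend
# configured for Google Cloud NetApp Volumes (Flex service level).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flex-pvc-rwx1
spec:
  accessModes:
    - ReadWriteMany        # RWX: shareable across pods over NFS
  resources:
    requests:
      storage: 10Gi
  storageClassName: gcnv-flex-k8s-nfs
```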
In summary, CSI provisioners enable highly automated and dynamic storage management in Kubernetes environments, creating volumes as needed for microservices. However, the specification doesn't natively handle advanced data protection features like scheduled snapshots or backups, so integration with other services is required for a complete solution.
Google Cloud Run functions, along with Pub/Sub and Eventarc, can play a crucial role in dynamically provisioning data protection for dynamic volumes by automating the interaction between volume log entries and data protection. This automation ensures that as new volumes are dynamically created, the necessary data protection policies are applied automatically and consistently.
This section describes how to set up automation to apply predetermined backup policies to new Trident-created volumes. A similar process can be used to attach snapshot schedules or other configuration changes to dynamically created volumes.
In the console, locate the Pub/Sub service and the option to create a new topic. You need to provide a name for the new topic and specify any relevant configurations, such as access controls or message retention policies. Once configured, you can confirm the creation, and the new Pub/Sub topic is ready to receive messages, as shown in the following screenshot.
If you don’t create a default subscription to the topic, you need to create a subscription manually.
As an example, we created a default subscription, which we’ll edit in Step 5.
In your Google Cloud project, go to Observability and Logging and click Logs Explorer. When Trident creates a new volume, it is logged in Cloud Logging, as shown in the following example.
In this case, we see the principal used for the volume, the region, and the volume name. Trident volumes are created with “pvc” as a prefix, and we can see the authentication information, so we know that the volume was created by Trident. Any of the attributes shown in the example can be filtered on to ensure that the correct new volumes are selected and attached to the correct backup policy. Be sure to include operation.last="true" in your inclusion filter to prevent duplicate log entries.
On the left, select Log Router. You will route the logs of the Trident-created volumes to the newly created Pub/Sub topic.
On the Log Router page, click Create a New Sink. In the new page that opens, select Cloud Pub/Sub Topic for the sink service and select the new topic you just created in Pub/Sub.
Create an inclusion or exclusion filter by using the Logging query language to include the unique logs of the Trident-created volumes in the sink. (Hint: Gemini can help with this.)
We used the following inclusion filter for the example; yours may look different. We put the entire filter under line 1 in the Build inclusion filter pane. We are filtering only for new volumes created by Trident in zone us-east1-b:
resource.type="audited_resource"
AND operation.last="true"
AND protoPayload.serviceName="netapp.googleapis.com"
AND protoPayload.methodName="google.cloud.netapp.v1.NetApp.CreateVolume"
AND protoPayload.resourceName =~ "^projects/REDACTED/locations/us-east1-b/volumes/pvc-"
AND protoPayload.authenticationInfo.principalSubject="ServiceAccount:trident-gke-cloud-identity@cvs-pm-host-1p.iam.gserviceaccount.com"
AND severity="NOTICE"
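To see what this filter actually selects, the same conditions can be mirrored in a few lines of Python applied to a parsed LogEntry. This is illustrative only (Cloud Logging evaluates the real filter server-side), the project ID and volume name are made up, and the principalSubject check is omitted for brevity:

```python
import re

def matches_filter(entry: dict) -> bool:
    """Mirror of the inclusion filter above, applied to a parsed LogEntry dict.

    Illustrative only; the principalSubject condition from the real
    filter is omitted here for brevity.
    """
    p = entry.get("protoPayload", {})
    return (
        entry.get("resource", {}).get("type") == "audited_resource"
        and entry.get("operation", {}).get("last") is True
        and p.get("serviceName") == "netapp.googleapis.com"
        and p.get("methodName") == "google.cloud.netapp.v1.NetApp.CreateVolume"
        and re.match(r"^projects/[^/]+/locations/us-east1-b/volumes/pvc-",
                     p.get("resourceName", "")) is not None
        and entry.get("severity") == "NOTICE"
    )

# Example LogEntry (project ID and volume name are illustrative).
example = {
    "resource": {"type": "audited_resource"},
    "operation": {"last": True},
    "severity": "NOTICE",
    "protoPayload": {
        "serviceName": "netapp.googleapis.com",
        "methodName": "google.cloud.netapp.v1.NetApp.CreateVolume",
        "resourceName": "projects/my-project/locations/us-east1-b/volumes/pvc-a852414d",
    },
}
print(matches_filter(example))  # → True
```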
Google Cloud Run functions provide an ideal serverless platform for executing the automation necessary for dynamic data protection. The event-driven nature of the platform allows it to react to changes in your Kubernetes and Google Cloud environment, making it perfect for automating data protection. We will use Eventarc to trigger a Cloud Run functions script based on the filtered logs with Pub/Sub as the transport layer to send the log to Cloud Run. We will need to pull out the volume name from the log to attach the policy.
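Extracting the volume name from the delivered message is the first job of the function. A Pub/Sub push delivery wraps the base64-encoded LogEntry in a JSON envelope, so a minimal sketch looks like this (the project ID and volume name below are illustrative):

```python
import base64
import json

def extract_volume_name(envelope: dict) -> str:
    """Pull the full volume resource name out of a Pub/Sub push envelope.

    The envelope's message.data field is a base64-encoded Cloud Logging
    LogEntry whose protoPayload.resourceName identifies the new volume,
    e.g. projects/PROJECT/locations/us-east1-b/volumes/pvc-....
    """
    data = base64.b64decode(envelope["message"]["data"]).decode("utf-8")
    log_entry = json.loads(data)
    return log_entry["protoPayload"]["resourceName"]

# Simulate a push delivery with an illustrative LogEntry.
entry = {"protoPayload": {"resourceName":
    "projects/my-project/locations/us-east1-b/volumes/pvc-a852414d"}}
envelope = {"message": {"data": base64.b64encode(
    json.dumps(entry).encode("utf-8")).decode("utf-8")}}
print(extract_volume_name(envelope))
# → projects/my-project/locations/us-east1-b/volumes/pvc-a852414d
```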
You may want to create a new service account to run Eventarc with limited roles, adopting the principle of least privilege. As an example, we created a new service account called cloud-run-account and assigned it the roles Cloud Functions Invoker, Eventarc Event Receiver, Google Cloud NetApp Volumes Admin, and Pub/Sub Subscriber.
In Google Cloud Run, click Create a Service, and select “Use an inline editor to create a function.”
Enter a service name and select your region and runtime. Select Pub/Sub Trigger from the Trigger drop-down menu.
The following screen opens. Name the trigger and select the region and Pub/Sub topic created in Step 1.
Service URL Path can remain blank because this process is entirely within Google Cloud.
Select a service account that this process will run from, with appropriate roles. Optionally add a label and then save the trigger.
For more information about creating triggers from Pub/Sub events, see the documentation.
Next, open “Containers, volumes, networking and security” and enter any relevant information. (Hint: if you plan to use environment variables in your function, you can enter them here.)
Click Create to open the service.
Write a script to attach a backup policy to the volume. Be sure that you already have a backup policy and a backup vault configured in NetApp Volumes.
Open the Cloud Run service that you just created and go to the Source tab, where you can enter your code to attach the backup policy.
After entering all relevant information, click Deploy to deploy the function.
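The heart of such a script is a volumes.patch call against the NetApp Volumes API that sets the volume's backupConfig. A minimal sketch of building that request is shown below; the policy and vault resource names are illustrative assumptions, and a real function would send the result as an authenticated PATCH (for example, with google-auth credentials):

```python
import json

NETAPP_API = "https://netapp.googleapis.com/v1"

def build_patch_request(volume_name: str, policy: str, vault: str):
    """Build the URL and JSON body for a NetApp Volumes volumes.patch call
    that attaches a backup policy and backup vault to an existing volume.

    volume_name is the full resource name taken from the log entry;
    policy and vault are resource names of a backup policy and backup
    vault you created beforehand in NetApp Volumes.
    """
    url = f"{NETAPP_API}/{volume_name}?updateMask=backupConfig"
    body = {"backupConfig": {"backupPolicies": [policy], "backupVault": vault}}
    return url, json.dumps(body)

# Illustrative resource names; substitute your own project and region.
url, body = build_patch_request(
    "projects/my-project/locations/us-east1-b/volumes/pvc-a852414d",
    "projects/my-project/locations/us-east1-b/backupPolicies/my-policy",
    "projects/my-project/locations/us-east1-b/backupVaults/my-vault",
)
print(url)
```

The same update can equally be made with the google-cloud-netapp client library; the REST form is shown here only because it keeps the sketch self-contained.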
Go back to the Pub/Sub topic you created in Step 1 and edit your subscription. Change the delivery type from pull to push, and enter the endpoint URL of the Cloud Run service you just created.
If you want to create a specific service account for the subscription, be sure that it includes the roles/iam.serviceAccountTokenCreator role. For more information about authentication, see the documentation.
Now, whenever Trident creates a new PV and backend volume, the specified backup policy and backup vault are added to it automatically.
$ kubectl create -f pvcsampleflexrwx1.yaml -n test
persistentvolumeclaim/flex-pvc-rwx1 created
~$ kubectl get pvc -n test
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
flex-pvc-rwx1 Bound pvc-a852414d-979d-4315-902b-a1623c7771e3 10Gi RWX gcnv-flex-k8s-nfs <unset> 12s
~$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-a852414d-979d-4315-902b-a1623c7771e3 10Gi RWX Delete Bound test/flex-pvc-rwx1 gcnv-flex-k8s-nfs <unset> 9s
Looking at the volume in the Google Cloud console, we see that the backup policy has been automatically applied!
The integration of Google Cloud NetApp Volumes with other Google Cloud services provides a robust and automated solution for data protection of dynamically provisioned volumes in Kubernetes environments. By leveraging Google Cloud Logging, Pub/Sub, Eventarc, and Cloud Run, organizations can be sure that their stateful applications running on Kubernetes are resilient and that their data is consistently protected, even with the ephemeral nature of microservices and dynamic volume creation. This proactive approach to data protection minimizes manual intervention, reduces the risk of data loss, and enhances the overall reliability of cloud-native deployments. Try it out! Let us know what you think.