Tech ONTAP Blogs

Repeatable cloud infrastructure: Using Terraform for Kubernetes with 1st party NetApp cloud storage

MichaelHaigh
NetApp

 

Introduction

 

The cloud infrastructure landscape demands tools that offer speed, scalability, and consistency. Infrastructure as code (IaC), spearheaded by Terraform, empowers teams to automate and manage infrastructure with precision and ease, using code to provision and maintain resources across diverse cloud environments.

 

This blog introduces a GitHub repository designed to fast-track the deployment of Kubernetes clusters across the leading cloud providers—AWS, Azure, and Google Cloud. Each provider is paired with NetApp's first-party cloud storage—FSx for NetApp ONTAP, Azure NetApp Files, and Google Cloud NetApp Volumes. The code in each directory is tailored for NetApp® customers and partners, facilitating a quick start with Kubernetes and first-party NetApp cloud storage, leveraging the efficiency of Terraform and the reliability of NetApp storage solutions.

 

With dedicated directories for each cloud provider, the repository sets up Kubernetes clusters configured with NetApp Trident™ container storage interface (CSI), complete with the necessary back ends and storage classes. It showcases how users can quickly deploy NetApp storage solutions in a Kubernetes context, with the flexibility to adapt the code to their specific needs. Join us as we delve into the details of this repository and illustrate how it can serve as a launchpad for your cloud-native storage initiatives with Kubernetes and NetApp.

 

Local repository setup

 

The NetApp 1st Party Cloud Storage and Kubernetes Terraform IaC repository is provided under the MIT license, which means that you’re free to use the code without restriction. If you want to use just a portion of the code, forking or cloning the repository is probably not necessary; instead, you can just copy the necessary components.

 

If you instead want to modify the code, but plan to keep the overall structure, it is recommended that you fork the repository. If you’re not sure how you’ll use the code, or if you just want to get hands-on with Kubernetes and NetApp first-party storage, simply clone the repository and change into the created directory:

 

git clone https://github.com/MichaelHaigh/netapp-1p-k8s-terraform.git
cd netapp-1p-k8s-terraform

 

Before we can deploy our NetApp first-party cloud storage and Kubernetes cluster, we need to configure the credentials that Terraform uses.

 

Credential configuration

 

Although cloud platforms vary in how access credentials and service accounts are created, the Terraform code in this repository works in a similar fashion on each platform by reading the credentials from a file path on the host system. Within each cloud directory the default.tfvars file contains one or more variables with which you can customize the location and name of this file.

 

However, for each cloud there are a number of other ways to authenticate (see AWS, Azure, and Google Cloud), so feel free to modify the code to use a different method.

 

AWS

 

The AWS code has a variable called aws_cred_file that defines the location for the AWS access key file:

 

$ grep aws_cred_file fsxn-eks/default.tfvars 
aws_cred_file            = "~/.aws/aws-terraform.json"

 

This file must be in the following format:

 

$ cat ~/.aws/aws-terraform.json
{
        "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}

 

To generate these keys for your user, see this AWS page.
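As a point of reference, here is a minimal sketch (not necessarily the repository's exact code) of how Terraform can consume a key file in this format, using the aws_cred_file and aws_region variables from default.tfvars:

# Read the JSON key file and pass the values to the AWS provider.
locals {
  aws_creds = jsondecode(file(pathexpand(var.aws_cred_file)))
}

provider "aws" {
  region     = var.aws_region
  access_key = local.aws_creds.aws_access_key_id
  secret_key = local.aws_creds.aws_secret_access_key
}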

 

Azure

 

The Azure code has a variable called sp_creds that defines the location for the Azure service principal file:

 

$ grep sp_creds anf-aks/default.tfvars 
sp_creds   = "~/.azure/azure-sp-tme-demo2-terraform.json"

 

This file must be in the following format:

 

$ cat ~/.azure/azure-sp-tme-demo2-terraform.json
{
        "subscriptionId": "acb5685a-dead-4d22-beef-ad9330cd14b4",
        "appId": "c16a3d0b-dead-4a32-beef-576623b3706c",
        "displayName": "azure-sp-terraform",
        "password": "11F8Q~4deadbeefNOBbOtnOfN3~FRhrsD9N0SaCP",
        "tenant": "d26875b4-dead-456e-beef-bafc77f348b5"
}

 

To create an Azure service principal and generate the needed password, see this Terraform document.
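As a rough illustration (the repository's actual code may differ), the service principal JSON can be decoded and passed to the azurerm provider like this:

# Read the service principal JSON and authenticate the azurerm provider with it.
locals {
  sp = jsondecode(file(pathexpand(var.sp_creds)))
}

provider "azurerm" {
  features {}
  subscription_id = local.sp.subscriptionId
  client_id       = local.sp.appId
  client_secret   = local.sp.password
  tenant_id       = local.sp.tenant
}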

 

Google Cloud

 

The Google Cloud code has four variables that must be updated to match your environment:

 

$ grep -e sa -e gcp_project  gcnv-gke/default.tfvars
sa_creds           = "~/.gcp/astracontroltoolkitdev-terraform-sa-f8e9.json"
gcp_sa             = "terraform-sa@astracontroltoolkitdev.iam.gserviceaccount.com"
gcp_project        = "astracontroltoolkitdev"
gcp_project_number = "239048101169"

 

To gather the service account key and email, click the service account name on the service account page of the console. You can gather your Google Cloud project name and number from the welcome page of the console.
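For illustration only (not necessarily the repository's exact code), the google provider can consume the service account key file directly:

# Authenticate the google provider with the service account key file.
provider "google" {
  credentials = file(pathexpand(var.sa_creds))
  project     = var.gcp_project
  region      = var.gcp_region
}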

 

Authorized networks

 

The bottom of the default.tfvars file in each cloud provider directory contains an authorized_networks variable, which is a list of network entries. To access the resources that Terraform deploys, your IP address must be present in this list. If you’re not sure of your IP address, run the following command:

 

curl http://checkip.amazonaws.com

 

Two sample networks are present, one for a range of company VPN addresses and another for a home address. Feel free to modify these examples or to add any number of additional entries.
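The attribute names vary slightly between clouds, so treat the following as an illustrative sketch rather than the literal contents of the file:

# Hypothetical entries; check default.tfvars in your cloud directory for the exact attribute names.
authorized_networks = [
  {
    display_name = "company-vpn"
    cidr_block   = "198.51.100.0/24"
  },
  {
    display_name = "home-office"
    cidr_block   = "203.0.113.27/32"
  },
]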

 

Other variables

 

There are a handful of other variables in each of the default.tfvars files that are worth reviewing at a high level, because they may need to be updated for your environment. In the AWS deployment:

 

  • aws_region: The region to deploy resources.
  • availability_zones_count: The number of availability zones to deploy resources into (must match the length of the eks_p*_subnet_cidrs variables in the VPC Settings section).
  • creator_tag: Change to your username or unique identifier, because it’s used for tagging and identification purposes.
  • # VPC Settings: These variables control the virtual network and Kubernetes address space. You can leave them as is, unless you plan to peer the deployed virtual network with existing networks that currently have an overlapping address space.
  • # EKS Settings: These variables control the Kubernetes version, number of nodes, and node types to deploy.
  • eks_addons: A list of EKS add-ons for Terraform to install in the cluster automatically (see the example after this list). Optionally add more, or update the versions of the existing add-ons if needed.
  • # FSxN Settings: These variables control the FSx for NetApp ONTAP storage and throughput capacities.
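For reference, the eks_addons variable has a structure along these lines; the add-on names below are real EKS add-ons, but the attribute names and version strings are illustrative, so check fsxn-eks/default.tfvars for the authoritative format:

# Illustrative shape of the eks_addons variable; versions shown are examples only.
eks_addons = [
  {
    name    = "vpc-cni"
    version = "v1.18.3-eksbuild.2"
  },
  {
    name    = "aws-ebs-csi-driver"
    version = "v1.35.0-eksbuild.1"
  },
]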

 

In the Azure deployment:

 

  • azr_region: The region to deploy resources (see the example after this list).
  • creator_tag: Change to your username or unique identifier, because it’s used for tagging and identification purposes.
  • # VNet Settings: These variables control the virtual network and Kubernetes address space. You can leave them as is, unless you plan to peer the deployed virtual network with existing networks that currently have an overlapping address space.
  • # AKS Cluster Settings: These variables control the Kubernetes and Trident version deployed. They may need to be updated to more recent versions, depending on when you cloned the repository.
  • # Node Pool Settings: These variables control the Kubernetes node pool settings (number and type of nodes).
  • # ANF Settings: These variables control the Azure NetApp Files service level and capacity pool size (in TiB).
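For example, a lightly customized anf-aks/default.tfvars might set values like these (illustrative values only):

azr_region  = "eastus2"
creator_tag = "jdoe"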

 

In the Google Cloud deployment:

 

  • gcp_region: The region to deploy resources.
  • gcp_zones: A list of zones to deploy resources (affects the *_node_count variables).
  • creator_label: Change to your username or unique identifier, because it’s used for tagging and identification purposes.
  • # VPC Settings: These variables control the virtual network and Kubernetes address space. You can leave them as is, unless you plan to peer the deployed virtual network with existing networks that currently have an overlapping address space.
  • # GKE Cluster Settings: These variables control the Kubernetes and Trident version deployed. You may need to update them to more recent versions, depending on when you cloned the repository.
  • gke_private_cluster: Leave as true so that the Kubernetes nodes have only internal IP addresses and firewall rules permit access only from the IP addresses defined in the authorized_networks variable (see the example after this list). Setting it to false gives the nodes public IPs and permits access from the entire internet (0.0.0.0/0), which is not recommended.
  • # Node Pool Settings: These variables control the Kubernetes node pool settings (number and type of nodes).
  • # GCNV Settings: These variables control the Google Cloud NetApp Volumes service level and pool size (in TiB).
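For example (illustrative values only, using the variable names listed above):

gcp_region          = "us-east4"
gcp_zones           = ["us-east4-a", "us-east4-b"]
creator_label       = "jdoe"
gke_private_cluster = true   # nodes get internal IPs only; access limited to authorized_networks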

 

Now that we’ve covered the available variables and configurations, we’re ready to initialize the environment.

 

Terraform init

 

The provider versions in the main.tf file in each cloud directory are constrained by the ~> operator, as shown in this example:

 

$ head -13 anf-aks/main.tf
terraform {
  required_version = ">= 0.12"
  required_providers {
    azuread = {
      source = "hashicorp/azuread"
      version = "~> 2.53.1"
    }
    azurerm = {
      source = "hashicorp/azurerm"
      version = "~> 4.1.0"
    }
  }
}

 

This operator sets both a lower and an upper bound on the provider version, which ensures code compatibility while still allowing bug fixes and other patch-level updates. Depending on your use case and on how much time has passed since the repository was last updated, it may be beneficial to either:

 

  • Change this operator to >= (see the example after this list)
  • Update the version number of the required providers to the most recent release
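For example, loosening the azurerm constraint shown above would look like this:

    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.1.0"
    }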

 

Either change allows you to use the most up-to-date provider code; however, you run the risk that the underlying Terraform code will need updates due to incompatibilities with newer provider versions. If you’re just experimenting, no modifications are needed.

If you’re not already in the directory of your chosen cloud provider, change into it now. Then initialize Terraform, which downloads and installs the specified providers:

 

terraform init

 

We’re now ready to deploy our infrastructure.

 

Plan and apply

 

We first run Terraform plan, which enables us to view the proposed deployment and make sure that there aren’t any issues with the updates made so far. (See the next section if you’re curious about the workspace component.)

 

terraform plan -var-file="$(terraform workspace show).tfvars"

 

Make sure that there are no errors reported and that you aren’t prompted to input any variables. Scroll through the output. If the infrastructure looks as expected, then run Terraform apply:

 

terraform apply -var-file="$(terraform workspace show).tfvars"

 

Enter yes at the prompt and then wait for the infrastructure to be deployed. This can take anywhere between 10 and 60 minutes, depending on the cloud and options selected.

 

The final step in each cloud deployment is installing the Trident back-end configuration and Kubernetes storage classes. Although the precise steps vary by cloud, the overall workflow is:

 

  1. Gather kubeconfig credentials for the deployed Kubernetes cluster.
  2. Download (if it’s not already present on the local file system) and install NetApp Trident (Azure and Google Cloud only; AWS uses the Trident add-on).
  3. Create one or more Trident back ends.
  4. Create one or more Kubernetes storage classes (a simplified sketch of steps 3 and 4 follows this list).
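To make steps 3 and 4 concrete, here is a simplified sketch of how a Trident back end and a storage class can be defined with the hashicorp/kubernetes provider; the repository's actual resources, driver settings, and names differ by cloud, and the LIF, SVM, and secret values below are placeholders:

# Simplified sketch with placeholder values; the repository's actual code differs by cloud.
resource "kubernetes_manifest" "trident_backend" {
  manifest = {
    apiVersion = "trident.netapp.io/v1"
    kind       = "TridentBackendConfig"
    metadata = {
      name      = "backend-fsx-ontap-nas"
      namespace = "trident"
    }
    spec = {
      version           = 1
      storageDriverName = "ontap-nas"
      managementLIF     = "198.51.100.10"   # placeholder: SVM management LIF
      svm               = "svm0"            # placeholder: SVM name
      credentials = {
        name = "fsxn-svm-credentials"       # placeholder: Kubernetes secret with SVM credentials
      }
    }
  }
}

resource "kubernetes_storage_class_v1" "fsx_netapp_file" {
  metadata {
    name = "fsx-netapp-file"
  }
  storage_provisioner    = "csi.trident.netapp.io"
  reclaim_policy         = "Delete"
  allow_volume_expansion = true
  parameters = {
    backendType = "ontap-nas"
  }
}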

 

To verify that everything was created correctly, you can run the following two commands from your terminal—the deployed Kubernetes cluster should already be your current context:

 

kubectl -n trident get tbc
kubectl get sc

 

The exact output will vary by cloud and deployment configuration, but here are AWS, Azure, and Google Cloud examples.

 

AWS:

 

$ kubectl -n trident get tbc
NAME                    BACKEND NAME            BACKEND UUID                           PHASE   STATUS
backend-fsx-ontap-nas   backend-fsx-ontap-nas   bddd862e-7af0-4584-8e80-5a1d22089450   Bound   Success
backend-fsx-ontap-san   backend-fsx-ontap-san   0a25e388-51cb-439c-a55f-eb8ad0ca1801   Bound   Success
$ kubectl get sc
NAME                        PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
fsx-netapp-block            csi.trident.netapp.io   Delete          Immediate              true                   10m
fsx-netapp-file (default)   csi.trident.netapp.io   Delete          Immediate              true                   10m
gp2                         kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  36m

 

Azure:

 

$ kubectl -n trident get tbc
NAME                             BACKEND NAME                     BACKEND UUID                           PHASE   STATUS
backend-aks-default-netapppool   backend-aks-default-netapppool   34956d74-70de-4dab-a4ee-06bc4852305c   Bound   Success
$ kubectl get sc
NAME                                    PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azure-netapp-files-standard (default)   csi.trident.netapp.io   Delete          Immediate              true                   34s
azurefile                               file.csi.azure.com      Delete          Immediate              true                   30m
azurefile-csi                           file.csi.azure.com      Delete          Immediate              true                   30m
azurefile-csi-premium                   file.csi.azure.com      Delete          Immediate              true                   30m
azurefile-premium                       file.csi.azure.com      Delete          Immediate              true                   30m
default                                 disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   30m
managed                                 disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   30m
managed-csi                             disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   30m
managed-csi-premium                     disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   30m
managed-premium                         disk.csi.azure.com      Delete          WaitForFirstConsumer   true                   30m

 

Google Cloud:

 

$ kubectl -n trident get tbc
NAME                                BACKEND NAME                        BACKEND UUID                           PHASE   STATUS
backend-gke-default-standard-pool   backend-gke-default-standard-pool   7aa74b9e-e1ee-4751-bf1b-d97a0a674bb1   Bound   Success
$ kubectl get sc
NAME                             PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
netapp-gcnv-standard (default)   csi.trident.netapp.io   Delete          Immediate              true                   17m
premium-rwo                      pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   22m
standard                         kubernetes.io/gce-pd    Delete          Immediate              true                   22m
standard-rwo                     pd.csi.storage.gke.io   Delete          WaitForFirstConsumer   true                   22m

 

Workspaces

 

In the commands in the previous section, you probably noticed the -var-file argument, which uses the terraform workspace show command to select the variable file that matches the active workspace. Workspaces enable separate instances of state data within the same working directory, so that you can have multiple deployments running at the same time.

 

All of the code in this repository has been designed to natively support workspaces. Perhaps you’ve deployed an environment in the eastern United States, and now realize that you need a second environment in the western U.S. for business continuity and disaster recovery. Simply run the following command to create a new Terraform workspace—providing a descriptive, unique workspace name:

 

terraform workspace new <workspace-name>

 

Then copy the default.tfvars variable file to match your new workspace name:

 

cp default.tfvars <workspace-name>.tfvars

 

Open the <workspace-name>.tfvars file and modify the region variable to a western U.S. region. Optionally update any other variables; for instance, perhaps the default node count can be lower for a disaster recovery workload. Then create your new environment with the same apply command that we previously ran:

 

terraform apply -var-file="$(terraform workspace show).tfvars"
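Under the hood, the code builds resource names from the active workspace, along these lines (a hypothetical resource, not the repository's exact code):

# The active workspace name becomes part of each resource name.
resource "azurerm_resource_group" "aks" {
  name     = "${terraform.workspace}-aks-rg"
  location = var.azr_region
}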

 

The deployed resources therefore contain <workspace-name> in their names, rather than default from the default workspace. If you need to switch back to the default workspace, run:

 

terraform workspace select default

 

Finally, to view all available workspaces, run:

 

terraform workspace list

 

Destroy

 

When your Kubernetes and NetApp first-party cloud storage has reached the end of its lifecycle, Terraform makes it easy to clean up the deployed resources. Before running the following command, be sure to clean up any resources deployed outside of Terraform, such as Kubernetes-created resources like elastic IP addresses and persistent volumes, or NetApp volume Snapshot™ copies.

 

terraform destroy -var-file="$(terraform workspace show).tfvars"

 

Make sure that the infrastructure displayed in the output matches the components you want to destroy, and then enter yes at the prompt. This process takes around 10 to 60 minutes, depending on the cloud environment and infrastructure choices.

 

Conclusion

 

In summary, the GitHub repository we've delved into is a testament to the power of combining Terraform's infrastructure as code with NetApp's cloud storage for Kubernetes environments. The combination offers a streamlined pathway for NetApp partners and customers to deploy robust, scalable Kubernetes clusters across the major cloud platforms with the added performance of NetApp first-party storage.

 

The repository's modular design allows customization and scalability, meeting diverse infrastructure needs. With Terraform, you can automate your infrastructure provisioning, manage multiple environments with workspaces, and easily decommission resources when necessary, showcasing flexibility and control over your cloud resources.

 

Whether you're starting fresh or optimizing your cloud infrastructure, this repository stands as a foundational asset for your cloud-native journey. As the cloud landscape evolves, employing such resources will be key to maintaining a competitive edge in the fast-paced world of technology. We encourage you to engage with the community, share your insights, and contribute to the ongoing enhancement of cloud-native storage and management practices.
