Tech ONTAP Blogs

Best practices for deploying Amazon FSx for NetApp ONTAP with Terraform

ckeith
NetApp

 

If you’re looking for a code-based way to run your infrastructure, it’s easy to deploy Amazon FSx for NetApp ONTAP (FSx for ONTAP) using the AWS Terraform Provider FSx Resource. But are you sure you know what all the arguments are doing?

 

This post explains the best practices for deploying FSx for ONTAP using the AWS Terraform provider FSx resources. The goal is to highlight important implementation, configuration, and access considerations when using these resources. By following these best practices, a developer or administrator will be able to deploy FSx for ONTAP with the correct configuration for their specific needs.

 

Overview of FSx for ONTAP and Terraform

FSx for ONTAP brings the power of NetApp® ONTAP® to AWS as a fully managed AWS-native storage service. It supports storage virtual machines (SVMs), logical volumes, and advanced features such as high availability (HA) with various forms of data protection (NetApp Snapshot™ copies, backups, and cross-region replication) as well as many cost saving features like deduplication, compression, cloning, and tiering.

 

These components provide a robust foundation for critical workloads with support for service continuity and disaster recovery. The ONTAP architecture enables you to easily replicate on-premises environments to the cloud while maintaining familiarity with NetApp tools.

 

AWS Terraform resources that support FSx for ONTAP file systems

The AWS Terraform provider has several Terraform resources that allow you to create and maintain an FSx for ONTAP file system. These resources include:

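  • aws_fsx_ontap_file_system: Creates and manages the file system itself, including its deployment type, capacity, throughput, and networking settings.
  • aws_fsx_ontap_storage_virtual_machine: The SVMs keep data isolated, enabling multi-tenant environments and granular storage management. Check out the documentation to learn more.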
  • aws_fsx_ontap_volume: Manages data volumes, optimizing logical storage with security and tiering options. For more information, refer to this documentation. 
  • aws_fsx_backup: This resource allows you to manage the AWS backups of your FSx for ONTAP volumes. For more info, see this documentation.

Best practices for FSx for ONTAP and Terraform deployment

This section details the various parameters for the resources that were just introduced.  For each one, I’ll explain what the parameter does, and give you best-practice recommendations on how to set it.

 

FSx for ONTAP File System

The following details relate to parameters of the aws_fsx_ontap_file_system resource. The parameter name is listed between the parentheses. 

 

Deployment type and high availability (HA) considerations (deployment_type)

An FSx for ONTAP file system is always deployed as a pair of nodes in a high availability (HA) configuration, so that if one of the nodes becomes unavailable for any reason, the other node picks up operations, typically before any I/Os to the file system have failed. The deployment_type parameter defines two things: whether the file system is deployed in a single Availability Zone (AZ) or across multiple AZs, and which generation of file system (1 or 2) is used.

Supported settings are:

  • SINGLE_AZ_1 – Single-AZ with generation 1 systems.
  • MULTI_AZ_1 – Multi-AZ with generation 1 systems.
  • SINGLE_AZ_2 – Single-AZ with generation 2 systems.
  • MULTI_AZ_2 – Multi-AZ with generation 2 systems.

Best practices

  • When deciding between multi-AZ and single-AZ, multi-AZ is recommended for critical workloads that require the highest level of fault tolerance.
  • The choice between generations depends on your performance requirements, with future growth in mind, since you can’t easily move between generations. If you expect to need more than a generation 1 file system can deliver, choose generation 2. Otherwise, a generation 1 system will be more appropriate.
  • Note that there are cost differences between single-AZ and multi-AZ deployments, as well as between generation 1 and generation 2 systems. Read this for more info on the cost differences.
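
As a minimal sketch of how the deployment type appears in a configuration, the block below declares a multi-AZ, generation 1 file system. The variable subnet_ids and the resource label example are hypothetical names chosen for illustration, and the capacity and throughput values are placeholders rather than recommendations.

```hcl
# Hypothetical input: two subnet IDs, one in each Availability Zone.
variable "subnet_ids" {
  type = list(string)
}

resource "aws_fsx_ontap_file_system" "example" {
  # Multi-AZ placement on a generation 1 system.
  deployment_type = "MULTI_AZ_1"

  # Multi-AZ deployments take two subnets; one of them is the preferred subnet.
  subnet_ids          = var.subnet_ids
  preferred_subnet_id = var.subnet_ids[0]

  storage_capacity                = 1024 # GiB of SSD storage
  throughput_capacity_per_ha_pair = 128  # MBps
}
```

Switching to a single-AZ deployment would mean changing deployment_type to SINGLE_AZ_1 or SINGLE_AZ_2 and passing a single subnet.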

 

File System Credentials (fsx_admin_password)

This parameter sets the ONTAP administrative password for the fsxadmin account which you can use to administer your file system using either RESTful APIs or the ONTAP command-line interface.

 

Best practice

  • It’s not necessary to set a password here immediately. You can do that later using the API or AWS console. The reason you would wait is because otherwise, Terraform would store the password in its state database. It’s okay if you set a password immediately, but the best practice is to retrieve the password from an AWS secret, so you don’t hardcode it in the Terraform script. The sample below follows this best practice.
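
One way this could look, assuming the password already exists as a plain-text value in an AWS Secrets Manager secret, is sketched below. The variables fsxadmin_secret_id and subnet_id are hypothetical, and the other file system settings are placeholders.

```hcl
# Hypothetical inputs: the name or ARN of an existing secret, and a subnet ID.
variable "fsxadmin_secret_id" {
  type = string
}

variable "subnet_id" {
  type = string
}

# Look up the current version of the secret when Terraform runs.
data "aws_secretsmanager_secret_version" "fsxadmin" {
  secret_id = var.fsxadmin_secret_id
}

resource "aws_fsx_ontap_file_system" "example" {
  deployment_type                 = "SINGLE_AZ_1"
  subnet_ids                      = [var.subnet_id]
  preferred_subnet_id             = var.subnet_id
  storage_capacity                = 1024
  throughput_capacity_per_ha_pair = 128

  # The password comes from the secret, not from a literal in the .tf file.
  fsx_admin_password = data.aws_secretsmanager_secret_version.fsxadmin.secret_string
}
```

This keeps the password out of the configuration file, but the value is still recorded in the Terraform state, so the state itself should be stored securely.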

 

Encryption Key (kms_key_id)

AWS Key Management Service (KMS) manages the encryption keys that protect your data at rest. FSx for ONTAP supports native integration with KMS to ensure that all data is encrypted per security policies.

 

Best practice:

  • If a customer has a security requirement to manage their own encryption keys, then they can provide the ARN to theirs here. Otherwise, most customers just use the AWS managed key. Note that all data is encrypted.

 

Storage Type (storage_type)

In FSx for ONTAP, the storage type is always 'SSD'. Note that some of the other FSx services, such as FSx for Windows, give you different options here, but FSx for ONTAP only supports SSD for the primary storage. However, with its tiering capabilities, FSx for ONTAP can seamlessly move dormant data to capacity storage as a cost-saving option.

 

Disk IOPS Cap (disk_iops_configuration)

This sets the maximum IOPS of the backend disk for the file system. By default, you get 3 IOPS per GB of provisioned SSD storage.

 

Best practice:

  • The default setting is typically fine for most workloads. Workloads that have especially heavy performance demands should have this value set higher. The maximum IOPS is dependent on the throughput_capacity setting, with the ceiling being 160,000 IOPS for Generation-1 file systems and 200,000 IOPS for Generation-2. You can find the limits for each throughput_capacity setting here.
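
To illustrate, with the default of 3 IOPS per GB, a file system with 1024 GiB of SSD storage gets 3 × 1024 = 3,072 IOPS. The sketch below overrides that default with an explicit value. The variable subnet_id is hypothetical, and the figure of 10,000 IOPS is an arbitrary example, not a sizing recommendation.

```hcl
# Hypothetical input: the subnet that hosts the file system.
variable "subnet_id" {
  type = string
}

resource "aws_fsx_ontap_file_system" "example" {
  deployment_type                 = "SINGLE_AZ_1"
  subnet_ids                      = [var.subnet_id]
  preferred_subnet_id             = var.subnet_id
  storage_capacity                = 1024
  throughput_capacity_per_ha_pair = 512

  # Replace the automatic 3 IOPS per GB with a fixed, user-chosen value.
  disk_iops_configuration {
    mode = "USER_PROVISIONED"
    iops = 10000
  }
}
```

Leaving the disk_iops_configuration block out, or setting mode to AUTOMATIC, keeps the default behavior.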

 

Number of HA Pairs (ha_pairs)

With this setting you choose the number of HA pairs deployed within a single file system. In this way, you can have a very large amount of aggregated IOPS, throughput, and storage capacity under a single namespace. The maximum is 12 HA pairs. To have more than one HA pair, you must use a generation two based file system deployed in a single-AZ.

 

Best practice

  • This is based purely on your performance or capacity requirements. If you need more than 6 GB/s of throughput, more than 200,000 IOPS, or more than 512 TiB of SSD storage in a single file system, then increase this to the number of HA pairs you’ll need to achieve your requirement.
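
A scale-out configuration might be sketched as follows. It assumes a generation 2, single-AZ file system with two HA pairs; the variable subnet_id, the resource label scale_out, and the sizing values are illustrative assumptions only.

```hcl
# Hypothetical input: the single subnet that hosts all HA pairs.
variable "subnet_id" {
  type = string
}

resource "aws_fsx_ontap_file_system" "scale_out" {
  # More than one HA pair requires a generation 2, single-AZ deployment.
  deployment_type     = "SINGLE_AZ_2"
  subnet_ids          = [var.subnet_id]
  preferred_subnet_id = var.subnet_id

  ha_pairs                        = 2
  throughput_capacity_per_ha_pair = 3072 # MBps per pair, 6144 MBps in aggregate
  storage_capacity                = 2048 # GiB of SSD in total (assumed spread across the pairs)
}
```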

 

Throughput Capacity (throughput_capacity_per_ha_pair)

This parameter controls read and write throughput in MB/s to the backend disks. Unlike IOPS, throughput is crucial for workloads that handle large data transfers, such as distributed file systems.

 

This parameter will set your file system’s throughput capacity in MBps. The valid values you can use here are 128, 256, 512, 1024, 2048, and 4096 for Generation-1 file systems and 384, 768, 1536, 3072, and 6144 for Generation-2 file systems.

 

Note that the throughput_capacity_per_ha_pair field supersedes the previous throughput_capacity field. It is recommended to use the newer throughput_capacity_per_ha_pair field, since it is valid for file systems with a single HA pair as well as those with multiple HA pairs.

 

This parameter also affects the maximum client-side network bandwidth, as well as the CPU capacity for the file system.

 

Best practice

  • Your file system’s performance depends heavily on this setting. Set it to the throughput that you expect your workload will need from the FSx for ONTAP file system. Note that this parameter can be changed after deployment, so it is a good idea to monitor the “Performance” tab, under the “Monitoring & Performance” tab in the AWS console while the file system is under heavy load to see if this parameter should be adjusted (up or down).

 

Capacity (storage_capacity)

The storage capacity is defined in GiB and corresponds to the total high-performance SSD space allocated in the FSx for ONTAP file system. Note that this parameter can be adjusted after deployment, but only by raising it, not lowering it.

 

Valid values range from 1024 (1 TiB) to 196608 (192 TiB) for Generation-1 file systems, up to 524288 (512 TiB) for Generation-2 file systems with a single HA pair, and up to 1048576 (1 PiB) for Generation-2 file systems with multiple HA pairs.

 

Note that the more storage you provision here, the higher your IOPS will be, though it is possible to force a specific IOPS value using the disk_iops_configuration parameter.

 

Best practices

  • Since you can’t lower the amount of SSD space once you provision it, it’s a best practice to avoid overprovisioning. 
  • If you plan to tier data, you can keep this parameter smaller than the total amount of space you need. Thin provisioning will create volumes that equal the size you need and tiering will move data from the SSDs to the capacity tier, freeing up the high-performance SSDs for frequently accessed data.

 

Management and Data Endpoint IP Address Range (endpoint_ip_address_range)

This parameter defines the IP address range from which the NAS and management endpoints will be allocated. If you don’t provide an endpoint IP address range, AWS will select an IP address from within the 198.19.0.0/16 range.

 

Note that this parameter is only used for multi-AZ type deployments. For single-AZ type deployments, the endpoints will be allocated from the subnet’s address range.

 

Best practice:

  • Note that the IP address range must be assigned to your VPC, but for routing reasons, it must not be assigned to any subnets within that VPC. So, it is common practice to add an IP address range to your VPC, without assigning that range to any subnets.

 

Routing (route_table_ids)

To be able to route traffic to the endpoint_ip_address_range, your route tables must be modified for that specific address range. This parameter allows you to specify the route table IDs you want AWS to modify to properly route traffic to the file system endpoints. Note that it only applies to multi-AZ deployments.
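
The two multi-AZ networking settings could be combined as in the sketch below. Note that the AWS provider spells the routing argument route_table_ids. The variables and the address range are hypothetical; the range is assumed to be one that has been added to the VPC but not assigned to any subnet.

```hcl
# Hypothetical inputs: two subnets in different AZs, and the route tables
# used by the subnets where clients live.
variable "subnet_ids" { type = list(string) }
variable "route_table_ids" { type = list(string) }

resource "aws_fsx_ontap_file_system" "multi_az" {
  deployment_type                 = "MULTI_AZ_1"
  subnet_ids                      = var.subnet_ids
  preferred_subnet_id             = var.subnet_ids[0]
  storage_capacity                = 1024
  throughput_capacity_per_ha_pair = 128

  # Range from which the NAS and management endpoints are allocated.
  # Placeholder value: replace with a range that fits your VPC design.
  endpoint_ip_address_range = "10.100.255.0/26"

  # Route tables that AWS updates with routes to the endpoints.
  route_table_ids = var.route_table_ids
}
```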

 

Network Security (security_group_ids)

This is a list of security group IDs, which will be applied to network interfaces to provide access to the file system. These security groups will apply to all network interfaces.

 

Best practice

 

Backup Window (daily_automatic_backup_start_time)

This parameter sets the start time for automatic daily backups to run.

 

Best practices:

  • During any testing phases, it is recommended to disable automatic backups to reduce costs, but in production environments, regular backups are crucial for disaster recovery.
  • Determine the ideal time for backups to take place. Consider that creating backups doesn’t cause any downtime, but for FSx for ONTAP systems under heavy load there might be a slight performance dip during the backup process.

 

Backup Longevity (automatic_backup_retention_days)

This parameter determines the length of time—in days—that you want to retain the automatic backups, with valid ranges from 0 to 90. Setting this to 0 disables automatic backups.

 

Note that any time you delete an FSx for ONTAP file system, by default AWS creates a final backup for all the volumes in that file system. These backups will exist in perpetuity, so unless you want costs to pile up, you should delete them manually.

Best practice:

  • Setting your backup retention period is mostly a business decision based on how critical the data is that is stored on the file system, and how far back you think you'll ever be willing to go to retrieve data. On the other hand, consider if you need backups for the system you’re creating—test systems, for instance, probably don’t need backups, and this parameter should be set to 0.

 

Maintenance Window (weekly_maintenance_start_time)

AWS reserves the right to perform maintenance on your FSx ONTAP file system, at the most, once a week. This argument allows you to determine the day of the week and time that maintenance can take place. Note that since every FSx for ONTAP system is deployed as an HA pair, you shouldn’t experience any outages during a maintenance event, which in most cases is just upgrading the operating system.

 

Note that because of the way the CIFS protocol works, all CIFS connections will break twice during a maintenance event (failover and failback). Most CIFS clients know how to handle this without the risk of data loss (i.e., they reconnect and retry their last transmission), but some clients don’t, so it is something to keep in mind. The NFS and iSCSI protocols handle failover events without losing any data.

 

Best practice:

  • Set the time of the weekly maintenance during the least critical time of the workloads using the FSx for ONTAP file system.
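
The backup and maintenance scheduling arguments described above might be set together as in this sketch. The times and the 30-day retention are illustrative assumptions, and subnet_id is a hypothetical variable.

```hcl
# Hypothetical input: the subnet that hosts the file system.
variable "subnet_id" {
  type = string
}

resource "aws_fsx_ontap_file_system" "example" {
  deployment_type                 = "SINGLE_AZ_1"
  subnet_ids                      = [var.subnet_id]
  preferred_subnet_id             = var.subnet_id
  storage_capacity                = 1024
  throughput_capacity_per_ha_pair = 128

  # Daily backups start at 02:00 UTC and are kept for 30 days.
  # Setting the retention to 0 disables automatic backups.
  daily_automatic_backup_start_time = "02:00"
  automatic_backup_retention_days   = 30

  # Weekly maintenance in d:HH:MM format (UTC), where d is the day of the
  # week from 1 (Monday) to 7 (Sunday). Here: Sunday at 04:00.
  weekly_maintenance_start_time = "7:04:00"
}
```

For a test system, setting automatic_backup_retention_days to 0 and omitting daily_automatic_backup_start_time would follow the recommendation to skip backups.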

FSx for ONTAP SVMs

The following details relate to parameters of the aws_fsx_ontap_storage_virtual_machine resource. The parameter name is listed between the parentheses. 

 

Active Directory (active_directory_configuration)

This parameter allows you to integrate FSx for ONTAP with Microsoft Active Directory (AD), enabling centralized authentication and permission management to access data stored in FSx for ONTAP.

 

Best practice

  • This is an ideal solution for corporate environments that use AD to control user access.
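
A rough sketch of joining an SVM to a self-managed AD domain is shown below. The domain name, DNS addresses, service account, NetBIOS name, and variable names are all hypothetical placeholders.

```hcl
# Hypothetical inputs: the target file system and the AD join password.
variable "file_system_id" {
  type = string
}

variable "ad_join_password" {
  type      = string
  sensitive = true
}

resource "aws_fsx_ontap_storage_virtual_machine" "ad_joined" {
  file_system_id = var.file_system_id
  name           = "svm_ad"

  active_directory_configuration {
    # Computer account name the SVM registers in the domain.
    netbios_name = "SVMAD"

    self_managed_active_directory_configuration {
      dns_ips     = ["10.0.0.111", "10.0.0.222"]
      domain_name = "corp.example.com"
      username    = "fsx_join"
      password    = var.ad_join_password
    }
  }
}
```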

 

File System ID (file_system_id)

This parameter allows you to specify the ID of the FSx for ONTAP file system on which this SVM will be created.

 

SVM Name (name)

This parameter allows you to set the name of the SVM. The name can be up to 47 characters long and is restricted to numbers, letters, and the underscore character.

 

Best practice

  • Try to keep it simple. Be careful too: the name might be included in telemetry reports sent outside of your company, so don’t embed company secrets in the name.

 

File Permissions Security Style (root_volume_security_style)

Specifies the root volume security style. There are three values that can be used here: UNIX, NTFS, or MIXED. This root security style will be automatically applied to all the volumes that you create under this SVM, unless explicitly specified when creating the volume. The default value is UNIX.

 

Note that this setting does not affect the SVM’s ability to offer a volume using multiple protocols (i.e., NFS, CIFS, S3). See the security_style setting in the volume section below for more details.

 

Best practice

  • Set this parameter to align with the security style that most of the volumes underneath it will require. Note that MIXED is highly discouraged and should only be used in a specific use case and is not related to the ability to provide multiprotocol access.

 

AWS Tags (tags)

This parameter lists the possible tags you can associate with the SVM.

 

Keep in mind that if the provider definition in the Terraform configuration file has a default_tags block, any tags defined here with the same key will override the values defined in the default_tags block.
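
The interaction between default_tags and resource-level tags could look like the following sketch, which also sets the SVM name and root volume security style. The region, tag keys and values, and the variable file_system_id are hypothetical.

```hcl
# Hypothetical input: the file system that will host the SVM.
variable "file_system_id" {
  type = string
}

provider "aws" {
  region = "us-east-1"

  # Tags applied to every taggable resource created by this provider.
  default_tags {
    tags = {
      Environment = "poc"
      Owner       = "storage-team"
    }
  }
}

resource "aws_fsx_ontap_storage_virtual_machine" "example" {
  file_system_id             = var.file_system_id
  name                       = "svm_app1"
  root_volume_security_style = "UNIX"

  # "Environment" here overrides the default_tags value for this SVM only;
  # "Owner" is still inherited from default_tags.
  tags = {
    Environment = "production"
  }
}
```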

 

Credentials (svm_admin_password)

This password is assigned to the vsadmin user who will manage the SVM.

 

Best practice

  • It’s a best practice to avoid specifying passwords in a Terraform configuration file, since any password you enter will be stored in a plain text file (either the configuration file or the Terraform state file). As an alternative, you can omit using a specific password here, and instead associate the SVM with an AWS Secrets Manager secret with a rotating password function, such as the one defined here. That makes sure people who are unauthorized won’t be able to see the password.

FSx for ONTAP Volumes

The following details relate to parameters of the aws_fsx_ontap_volume resource. The parameter name is listed between the parentheses. 

 

Aggregate to Allocate Space From (aggregate_configuration)

This parameter relates to the aggregate configurations used by FLEXGROUP volumes.

 

Best practice

  • The best practice is to follow AWS’s default of creating eight constituents per aggregate and using all the aggregates in the file system (there will be one aggregate per HA pair).
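
For illustration, a FlexGroup volume spread over two aggregates with eight constituents each might be declared as below. The aggregate names aggr1 and aggr2 are assumptions for a file system with two HA pairs, and svm_id and the volume name are hypothetical.

```hcl
# Hypothetical input: the SVM that will own the volume.
variable "svm_id" {
  type = string
}

resource "aws_fsx_ontap_volume" "flexgroup" {
  name                       = "fg_data"
  storage_virtual_machine_id = var.svm_id
  volume_style               = "FLEXGROUP"
  junction_path              = "/fg_data"

  # Eight constituents on each of the two aggregates (16 in total).
  aggregate_configuration {
    aggregates                 = ["aggr1", "aggr2"]
    constituents_per_aggregate = 8
  }

  # 16 constituents x 100 GiB minimum each = 1600 GiB = 1638400 MiB.
  size_in_megabytes = 1638400
}
```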

 

SVM Assignment (storage_virtual_machine_id)

Use this parameter to specify the SVM where the volume will be created.

 

Volume Name (name)

Use this parameter to specify the volume name. It can be up to 203 characters long. Only alphanumeric and the underscore (_) characters are allowed.

 

Best practice

  • Try to keep it simple. Avoid putting company confidential information in the name.

 

File Permissions Security Style (security_style)

This parameter sets the security style for the volume being created. Options include NTFS, UNIX or MIXED. This is useful for controlling access permissions based on the client operating system that will be using the volume.

 

Best practice

  • Avoid using the MIXED security style. It does not imply multi-protocol use. It sets the security style for the volume on a per-file basis, based on the type of client that made the most recent permissions change (i.e., through either a UNIX or SMB client). That almost always leads to management challenges. You should pick the security style based on the type of client that will be writing to the volume most frequently.

 

AWS Tags (tags)

This parameter allows you to assign AWS tags to the volume.

 

SnapLock Configuration (bypass_snaplock_enterprise_retention)

Setting this parameter to true gives SnapLock® admins the ability to delete FSx for ONTAP SnapLock Enterprise volumes that contain write-once, read-many (WORM) files.

 

Best practice

  • The decision here really comes down to business factors more than anything else, so make sure it aligns with your organization’s retention requirements.

 

Copy AWS Tags when backing up (copy_tags_to_backups)

This parameter indicates whether the volume’s tags should be copied to its backups.

 

Best practice

  • More of a business decision, but why risk losing tags that might be needed if you need to restore from a backup?

 

Junction Path (junction_path)

This parameter determines the location in the SVM namespace where the volume is mounted. The junction_path must start with a forward slash, for example, /vol9.

 

Best practice

  • Usually, it is a good idea to specify a junction path, unless you are creating the volume to hold LUNs. Otherwise, you can’t access the volume until it has been given a junction path.
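
Putting the volume settings covered so far together, a basic FlexVol volume might be sketched like this. Note that the AWS provider spells the tag-copy argument copy_tags_to_backups. The names, size, and tag values are hypothetical.

```hcl
# Hypothetical input: the SVM that will own the volume.
variable "svm_id" {
  type = string
}

resource "aws_fsx_ontap_volume" "app_data" {
  name                       = "app_data"
  storage_virtual_machine_id = var.svm_id

  # Mount point in the SVM namespace; must begin with a forward slash.
  junction_path = "/app_data"

  # Chosen for the client type that writes most often (NFS clients here).
  security_style = "UNIX"

  # 100 GiB, thin provisioned.
  size_in_megabytes = 102400

  tags = {
    Application = "app1"
  }

  # Carry the tags above over to backups of this volume.
  copy_tags_to_backups = true
}
```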

 

Volume Type (ontap_volume_type)

This parameter determines the ONTAP volume type. It can be set to RW (read-write) for standard volumes or DP (data protection) for replication volumes.

 

Best practice

  • Only set it to DP if you’re planning on making it a SnapMirror® destination volume.

 

Volume Size in bytes (size_in_bytes)

This parameter will determine your volume’s size in bytes. Note that size_in_bytes or size_in_megabytes can be used, though size_in_bytes is required if your volume is going to be larger than 2 petabytes. FLEXGROUP volumes have a minimum size of 100GiB per constituent.

 

Best practice

  • The smallest volume is 20MB for a FlexVol. All volumes are created without a guarantee (i.e. thin provisioned), so the size is more like a quota since it doesn’t consume space until something is put in it.

 

Volume Size in Megabytes (size_in_megabytes)

This parameter determines your volume’s size in megabytes. Note that size_in_megabytes can be used when your volume is smaller than 2 PB. As with size_in_bytes, there is a 100GiB minimum per constituent for FLEXGROUP volumes.

 

Best practice

  • The smallest volume is 20MB for a FlexVol. All volumes are created without a guarantee (i.e. thin provisioned) so the size is more like a quota.

 

Perform Final Backup (skip_final_backup)

This parameter determines whether a backup is made of a volume before it is deleted.

 

Best practice

  • If your volume isn’t handling critical data, this value can be set to true so you don’t accidentally create a backup that won’t be automatically deleted.

 

SnapLock Configuration (snaplock_configuration)

The parameter determines whether SnapLock is enabled for this FSx for ONTAP volume.

 

Best practice

  • This will depend on your business needs for WORM storage. Organizations without strict regulatory and indelibility requirements may want to leave this null.

 

Snapshot Policy (snapshot_policy)

This parameter allows you to set the snapshot policy associated with the volume.

 

Best practice

  • More of a business decision. Snapshot creation doesn’t cause any performance degradation and doesn’t consume too much space as long as the volume doesn’t change a lot on a daily basis. Having said that, it’s always a good idea to be aware of your overall disk space.

 

Storage Efficiencies (storage_efficiency_enabled)

This parameter allows you to enable ONTAP storage efficiencies (data compression, deduplication, and compaction).

 

Best practice

  • Recommend enabling this, unless you know you won’t benefit from the efficiencies (e.g. a volume with databases on it).

 

Tiering Policy (tiering_policy)

This parameter sets the tiering policy for the volume. The tiering policy allows inactive data to be automatically moved to a lower-cost capacity pool, reducing operational costs. Options include AUTO, SNAPSHOT_ONLY, ALL, and NONE. When setting to AUTO, you can specify the number of days before “cold data” (data that hasn’t been accessed) is moved to the capacity tier. In environments where storage cost is a critical factor, enabling automatic tiering is essential.

 

Best practice

It’s highly recommended to enable tiering in most cases to reduce the amount of costly SSD storage consumed. It provides for seamless tiering of cold blocks to a cost-effective capacity pool. Recommendations are:

  • AUTO for ideal cost/performance for production workloads
  • ALL for backup destinations.
  • SNAPSHOT_ONLY for latency-sensitive mission critical workloads.
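
As a sketch of the AUTO recommendation, the volume below tiers blocks that have not been accessed for 31 days and also enables storage efficiencies. The volume name, size, and the variable svm_id are hypothetical, and the cooling period is an example value.

```hcl
# Hypothetical input: the SVM that will own the volume.
variable "svm_id" {
  type = string
}

resource "aws_fsx_ontap_volume" "tiered" {
  name                       = "tiered_vol"
  storage_virtual_machine_id = var.svm_id
  junction_path              = "/tiered_vol"
  size_in_megabytes          = 102400

  # Compression, deduplication, and compaction.
  storage_efficiency_enabled = true

  tiering_policy {
    name           = "AUTO"
    cooling_period = 31 # days without access before data moves to the capacity tier
  }
}
```

For a backup destination, the same block with name set to ALL and no cooling_period would match the recommendation above.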

 

Volume Type (volume_style)

This parameter allows you to specify the type of volume. Valid values are FLEXVOL and FLEXGROUP.

 

Best practice

  • The volume style should be considered in conjunction with your scaling strategy. For a scale-up file system, FLEXVOL is usually the best choice. The only advantage with FlexGroup is that these volumes can be quite large. With scale-out, FlexGroup becomes the default setting. Note, these volumes are typically more performant, unless you have a lot of metadata operations.

 

Convenience Module

NetApp created a Terraform module that helps with the deployment of an FSx for ONTAP file system using Terraform. The module simplifies the deployment by doing the following:

  • Creates the FSx for ONTAP file system.
  • Creates a default SVM.
  • Creates an initial volume.
  • Creates a Security Group with all the default storage-related network ports opened.
  • Creates an AWS Secrets Manager secret, with a rotating password, that will control the password for the default FSx for ONTAP admin user (fsxadmin).
  • Creates an AWS Secrets Manager secret, with a rotating password, that will control the password for the admin user (vsadmin) of the SVM that is created.

Cleaning up after a PoC

When completing a PoC, be sure to perform proper cleanup of resources. This typically starts by running ‘terraform destroy’. Note, this will fail if you created volumes after Terraform has created the file system, so be sure to delete those volumes first. Once everything has been deleted, confirm that you don’t have any backups left over. Automatic backups and persistent volume backups can continue to incur costs even after the primary file system is removed.

Conclusion

With these detailed practices, you’ll be well equipped to deploy FSx for ONTAP with Terraform with the optimal configurations for your workload while maintaining security, performance, and cost controls.

 

To learn more, head to the GitHub repository and see the FSx for ONTAP deployment documentation on AWS.

 

 
