Accelerating Remote Work: Harnessing FlexCache in AWS WorkSpaces for Data Locality

Shaun_P · 2024-04-09

Supporting a robust Virtual Desktop Infrastructure (VDI) for a worldwide workforce presents complex challenges. IT teams must dynamically scale remote user access to meet fluctuating business demands and strategically locate applications and data, whether on-premises or in the cloud. In this context, NetApp's FlexCache, a remote caching feature within Amazon FSx for NetApp ONTAP (FSx for ONTAP), is pivotal for achieving seamless scalability and optimal data placement. This blog post will discuss how FlexCache can significantly improve data access performance across multi-regional AWS WorkSpaces deployments.

Customer Success Story

A global energy company's transition to remote work during the COVID-19 pandemic exemplifies the challenges of data locality. The company's datacenters, previously near their engineers and geoscientists, suddenly became distant from the remote workforce, leading to a spike in support cases due to degraded data access performance. Collaborating with AWS and NetApp, the company successfully implemented a scalable, secure VDI solution using Amazon WorkSpaces in conjunction with FSx for ONTAP. With the implementation of FlexCache, data retrieval times were significantly reduced, empowering their remote workforce to access and analyze seismic data with near-on-premises efficiency.

Amazon WorkSpaces is a fully managed Desktop-as-a-Service offering that provides cloud-based desktops running Microsoft Windows, Amazon Linux 2, or Ubuntu, accessible from any location and device. FSx for ONTAP offers fully managed shared storage on the AWS Cloud, complete with the popular data access and management capabilities of ONTAP.

FlexCache Technical Overview

FlexCache is an intelligent remote caching feature within FSx for ONTAP, designed to optimize file distribution across NFS and CIFS/SMB protocols while minimizing WAN latency and bandwidth costs. This solution automates the caching process, dynamically adapting to user access patterns and reducing the need for manual cache management, thus ensuring peak performance with minimal administrative effort.

Engineered for scalability, FlexCache allows the creation of multiple caches from a single origin volume, enabling efficient data caching across various AWS Regions and Availability Zones — key to supporting the dynamic needs of Amazon WorkSpaces. It employs a 'write-around' process to maintain high levels of data consistency and freshness, with write operations being directed to the origin volume for coordinated updates and cache invalidation. This process guarantees that all users have access to the latest data, regardless of their location.
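The write-around behaviour described above can be illustrated with a toy model. This is a conceptual sketch only, not how ONTAP implements FlexCache internally: the `Origin` and `Cache` classes, their methods, and the file name are hypothetical and exist purely to show reads being served locally after the first fetch, and writes being forwarded to the origin, which then invalidates stale cached copies.

```python
class Origin:
    """Toy stand-in for an origin volume: holds the authoritative data."""

    def __init__(self):
        self.blocks = {}
        self.caches = []

    def write(self, key, value):
        # Writes always land on the origin first...
        self.blocks[key] = value
        # ...and every registered cache drops its now-stale copy.
        for cache in self.caches:
            cache.invalidate(key)


class Cache:
    """Toy stand-in for a cache volume: starts empty, fills on demand."""

    def __init__(self, origin):
        self.origin = origin
        self.blocks = {}
        self.hits = 0
        self.misses = 0
        origin.caches.append(self)

    def read(self, key):
        if key in self.blocks:
            self.hits += 1
        else:
            # Cache miss: fetch from the origin (the slow, remote path).
            self.misses += 1
            self.blocks[key] = self.origin.blocks[key]
        return self.blocks[key]

    def write(self, key, value):
        # Write-around: forward to the origin instead of updating locally.
        self.origin.write(key, value)

    def invalidate(self, key):
        self.blocks.pop(key, None)


origin = Origin()
origin.blocks["map.tif"] = "v1"
region2 = Cache(origin)
region3 = Cache(origin)

region2.read("map.tif")            # miss: fetched from the origin
region2.read("map.tif")            # hit: served locally
region3.write("map.tif", "v2")     # the write goes around the cache to the origin
latest = region2.read("map.tif")   # miss again: the cached copy was invalidated
print(latest, region2.hits, region2.misses)
```

In this model a reader in one Region sees the new version on its next read after a writer in another Region updates the file, at the cost of one extra fetch from the origin.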

FlexCache is well suited to read-intensive workloads such as:

EDA - Electronic Design Automation (Chip Manufacturing)
Film, TV and Streaming media rendering
Data-intensive scientific computing like AI, machine learning, and deep learning
Unstructured NAS data
DevOps and software-build environments
Any workload where cloud-bursting is required, for example bursting to AWS to test GenAI solutions like Amazon Bedrock

For a more comprehensive understanding of FlexCache and its capabilities, refer to the FlexCache Technical Report.

Solution Overview

In the architecture outlined in Figure 1, Amazon WorkSpaces users in Regions 1 and 2 require rapid access to seismic data crucial for their modelling and visualization tasks. This data, consisting of frequently updated surface resolution maps, resides within the FSx for ONTAP file system located in Region 1.

With a setup and configuration optimized according to AWS and NetApp best practices, FlexCache can significantly improve the performance of data access in a multi-regional AWS WorkSpaces environment. This walkthrough assumes that the environment in Figure 1 has been configured as per AWS best practices for VPC, network setup, and WorkSpace deployments. For detailed information about VPC design and basic environment setup, see the Appendix at the end of this blog.

Figure 1. VPC design for a multi-regional AWS WorkSpaces and FSx for ONTAP deployment

The subsequent sections will delve into the creation of a FlexCache volume bridging Region 1 and 2, aimed at delivering low-latency data access to Linux users through NFS. Equivalent configurations are applicable to Windows users utilizing CIFS/SMB.

Figure 2 shows the workflow to create a FlexCache volume. The high-level steps include:

1. Establish a Cluster Peer Relationship to facilitate inter-cluster communication.
2. Create an SVM (Storage Virtual Machine) Peer Relationship to manage storage resources and permissions across clusters.
3. Configure a FlexCache Volume on the remote cluster that will serve the frequently accessed data.
4. Optionally, pre-populate the FlexCache Volume to prime the cache with data and accelerate initial access times for users.
5. Monitor data transfer from the FlexCache volume to evaluate the performance improvements realized through this configuration.

Figure 2. FlexCache volume creation workflow

Create a cluster peer relationship

As the FlexCache volume will be created in a different file system, a cluster peer relationship needs to be created between the two file systems. Inter-cluster traffic between the two file systems must be allowed.

On the destination file system in Region 2 (FsxIdREGION2), create a peer relationship with the source file system in Region 1 (FsxIdREGION1) using the ONTAP CLI.

FsxIdREGION2::> cluster peer create -generate-passphrase -offer-expiration 2days -initial-allowed-vserver-peers *
Notice:
        Passphrase: +Wv3SiXFcugb40suv9qZgnTK
        Expiration Time: 10/02/2024 10:10:00 +00:00
        Initial Allowed Vserver Peers: *
        Intercluster LIF IP: 172.31.30.166
        Peer Cluster Name: Clus_gnTK (temporary generated)

        Warning: make a note of the passphrase - it cannot be displayed again.

On the source file system FsxIdREGION1, accept and authenticate the cluster peer request using the inter-cluster endpoint IP addresses of the destination file system FsxIdREGION2 depicted in Figure 3.

Figure 3. Administration tab of destination file system FsxIdREGION2 in AWS Console

FsxIdREGION1::> cluster peer create -peer-addrs 172.31.30.166,172.31.7.72
Notice: Use a generated passphrase or choose a passphrase of 8 or more
        characters. To ensure the authenticity of the peering relationship, use
        a phrase or sequence of characters that would be hard to guess.

Enter the passphrase: <ENTER PASSPHRASE GENERATED IN THE PREVIOUS STEP>
Confirm the passphrase: <ENTER PASSPHRASE GENERATED IN THE PREVIOUS STEP>

Notice: Clusters "FsxIdREGION1" and "FsxIdREGION2" are peered.

FsxIdREGION1::> cluster peer show
Peer Cluster Name         Cluster Serial Number Availability   Authentication
------------------------- --------------------- -------------- --------------
FsxIdREGION2              1-88-000000           Available      ok

Create an SVM peer relationship

On the source file system FsxIdREGION1, create a Storage Virtual Machine (SVM) peer relationship between the source SVM (origin_SVM) and the FlexCache SVM (flexcache_SVM).

FsxIdREGION1::> vserver peer create -vserver origin_SVM -peer-vserver flexcache_SVM -peer-cluster FsxIdREGION2 -applications flexcache
Info: [Job 44] 'vserver peer create' job queued

Accept the SVM peering request on the destination file system FsxIdREGION2.

FsxIdREGION2::> vserver peer accept -vserver flexcache_SVM -peer-vserver origin_SVM
Info: [Job 57] 'vserver peer accept' job queued

FsxIdREGION2::> vserver peer show
            Peer        Peer                           Peering        Remote
Vserver     Vserver     State        Peer Cluster      Applications   Vserver
----------- ----------- ------------ ----------------- -------------- ---------
flexcache_SVM
            origin_SVM  peered       FsxIdREGION1      flexcache      origin_SVM

Create a FlexCache volume

When sizing a FlexCache volume, a best practice is to consider the working set — the subset of data actively used during a specific task or process. To determine the optimal size for the FlexCache volume, calculate the size of the working set and add an overhead of approximately 25% to accommodate any unpredictable increases in data access patterns. If the working set size is not well-defined, a general rule of thumb is to allocate 10-15% of the origin volume size to the FlexCache volume. For further guidance on sizing FlexCache volumes, refer to the FlexCache Technical Report.

For an origin volume of 1 TB, containing a working set of surface resolution maps that totals 31 GB, calculating the ideal FlexCache volume size involves adding a 25% buffer to the working set size:

Start with the working set size: 31 GB
Calculate 25% of the working set size for the overhead: 31 GB * 0.25 = 7.75 GB
Add the overhead to the working set size to determine the total FlexCache volume size: 31 GB + 7.75 GB = 38.75 GB

Round the FlexCache volume size up to 40 GB to simplify volume management and provisioning. This generous sizing ensures that read operations are serviced primarily by the cache, which reduces how often data must be retrieved from the origin volume and improves overall access speed.
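The sizing rule above can be expressed as a small helper. This is a minimal sketch rather than a NetApp tool: the function name `flexcache_size_gb`, its parameters, and the 5 GB rounding step are assumptions made for illustration. Only the 25% overhead and the 10-15% fallback range come from the guidance above.

```python
import math


def flexcache_size_gb(working_set_gb=None, origin_gb=None,
                      overhead=0.25, fallback_fraction=0.15, round_to_gb=5):
    """Suggest a FlexCache volume size in GB.

    If the working set is known, add the overhead on top of it.
    Otherwise fall back to a fraction of the origin volume size
    (0.15 here, the upper end of the 10-15% rule of thumb).
    The result is rounded up to a multiple of round_to_gb (an assumption).
    """
    if working_set_gb is not None:
        raw = working_set_gb * (1 + overhead)
    elif origin_gb is not None:
        raw = origin_gb * fallback_fraction
    else:
        raise ValueError("provide working_set_gb or origin_gb")
    return math.ceil(raw / round_to_gb) * round_to_gb


# Worked example from the text: 31 GB working set, 25% overhead.
print(flexcache_size_gb(working_set_gb=31))   # 38.75 GB rounded up to 40

# Fallback when the working set is unknown, taking 1 TB as 1000 GB.
print(flexcache_size_gb(origin_gb=1000))
```

Rounding up rather than to the nearest step keeps the cache at least as large as the calculated requirement.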

FsxIdREGION2::> volume flexcache create -volume flexcache_vol -vserver flexcache_svm -origin-vserver origin_SVM -origin-volume origin_vol -size 40GB -junction-path /flexcache_vol -aggr-list aggr1 -foreground true
[Job 394] Job is queued: Create FlexCache volume flexcache_vol.
[Job 394] Job succeeded: Successful.

FsxIdREGION2::> volume flexcache show
Vserver Volume      Size       Origin-Vserver Origin-Volume Origin-Cluster
------- ----------- ---------- -------------- ------------- --------------
flexcache_svm
        flexcache_vol
                    40GB       origin_SVM     origin_vol    FsxIdREGION1

Prepopulate the FlexCache volume

A newly created FlexCache volume will initially contain no data, which means that the first client read request will experience a performance penalty as data must be fetched from the origin volume. However, from the perspective of the client that has mounted the FlexCache volume, the volume will appear exactly like the origin volume, regardless of whether anything is cached. To mitigate this initial performance hit, it is recommended to pre-warm the cache with the necessary data, a process known as pre-population in ONTAP.

On the destination file system FsxIdREGION2, run the prepopulate command at the advanced privilege level and pass the path of the directory to be prepopulated in path-list (/seismic_data).

FsxIdREGION2::> set -privilege advanced
Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y

FsxIdREGION2::*> volume flexcache prepopulate start -cache-vserver flexcache_svm -cache-volume flexcache_vol -path-list /seismic_data -isRecursion true
[JobId 439]: FlexCache prepopulate job queued.

FsxIdREGION2::*> job show -id 439 -instance
                      Job ID: 439
              Owning Vserver: flexcache_svm
                        Name: FlexCache prepopulate job for volume "flexcache_vol" in Vserver "flexcache_svm".
                 Description: FLEXCACHE PREPOPULATE JOB
                    Priority: Medium
                        Node: FsxIdREGION2-02
                    Affinity: Cluster
                    Schedule: @now
                  Queue Time: 10/02 10:28:47
                  Start Time: 10/02 10:28:47
                    End Time: 10/02 11:05:39
              Drop-dead Time: -
                  Restarted?: false
                       State: Success
                 Status Code: 0
           Completion String: Successful.Total number of files read by FlexCache prepopulate job are "1".
                    Job Type: FLEXCACHE PREPOPULATE
                Job Category: FLEXCACHE
                        UUID: abc123ab-c123-abc1-23ab-c123abc123ab
          Execution Progress: -
                   User Name: fsxadmin
Restart Is Delayed by Module: -
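The Start Time and End Time fields in the job output give a rough idea of how long the cache takes to warm. As a minimal sketch, the elapsed time can be computed as follows. The helper name `elapsed_seconds` is hypothetical, and the code assumes both timestamps fall on the same day, as they do in the output above.

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Return the seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Time-of-day values copied from the Start Time and End Time fields.
seconds = elapsed_seconds("10:28:47", "11:05:39")
minutes, rem = divmod(seconds, 60)
print(f"{seconds} s ({minutes} min {rem} s)")   # 2212 s (36 min 52 s)
```

In this run the prepopulate job took just under 37 minutes, a one-time cost paid before users touch the data.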

Examine FlexCache performance

To assess the performance enhancements provided by FlexCache, a comparative analysis of read operations can be conducted between the origin volume and the FlexCache volume within an AWS WorkSpaces client situated in Region 2.

Begin by mounting both the origin and FlexCache volumes on a WorkSpaces client in Region 2.

sphua@U-111:~$ mkdir origin_vol
sphua@U-111:~$ sudo mount -t nfs -o nconnect=16 10.0.4.172:/origin_vol ./origin_vol

sphua@U-111:~$ mkdir flexcache_vol
sphua@U-111:~$ sudo mount -t nfs -o nconnect=16 172.31.255.213:/flexcache_vol ./flexcache_vol

Once both volumes are successfully mounted, measure the read performance from the origin volume using the dd command.

sphua@U-111:~/origin_vol/seismic_data$ ls -lh
total 31G
-rwxrwxrwx 1 sphua domain users 31G Feb 16  2024 surface_res_map.tif

sphua@U-111:~/origin_vol/seismic_data$ dd if=/home/sphua/origin_vol/seismic_data/surface_res_map.tif of=/dev/null bs=1M
31000+0 records in
31000+0 records out
32505856000 bytes (33 GB, 30 GiB) copied, 3709 s, 8.8 MB/s

The output shows that the read from the origin volume averaged 8.8 MB/s and took 3709 seconds to complete. Next, evaluate the read performance from the FlexCache volume using the same dd command.

sphua@U-111:~/flexcache_vol/seismic_data$ ls -lh
total 31G
-rwxrwxrwx 1 sphua domain users 31G Feb 16  2024 surface_res_map.tif

sphua@U-111:~/flexcache_vol/seismic_data$ dd if=/home/sphua/flexcache_vol/seismic_data/surface_res_map.tif of=/dev/null bs=1M
31000+0 records in
31000+0 records out
32505856000 bytes (33 GB, 30 GiB) copied, 4.00398 s, 8.1 GB/s

The result shows a significant performance increase, with the pre-populated FlexCache volume achieving an average read rate of 8.1 GB/s and completing in just 4 seconds. That is roughly a 99.9% reduction in read time, or a throughput more than 900 times that of the origin volume, highlighting the potential efficiency gains achievable with FlexCache. However, actual performance will vary based on factors such as network conditions, file size, and available system resources.
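The figures reported by dd can be cross-checked with a few lines of arithmetic. The sketch below takes the byte count and elapsed times from the two dd runs as-is. The variable names are illustrative, and decimal units (1 MB = 10^6 bytes) are used to match dd's own reporting.

```python
bytes_read = 32_505_856_000   # bytes copied in both dd runs
origin_seconds = 3709         # elapsed time reading from the origin volume
cache_seconds = 4.00398       # elapsed time reading from the FlexCache volume

origin_mb_s = bytes_read / origin_seconds / 1e6
cache_gb_s = bytes_read / cache_seconds / 1e9
speedup = origin_seconds / cache_seconds
time_reduction_pct = (1 - cache_seconds / origin_seconds) * 100

print(f"origin throughput: {origin_mb_s:.1f} MB/s")        # 8.8 MB/s
print(f"cache throughput:  {cache_gb_s:.1f} GB/s")         # 8.1 GB/s
print(f"speedup:           {speedup:.0f}x")                # 926x
print(f"time reduction:    {time_reduction_pct:.1f}%")     # 99.9%
```

Read this way, the elapsed time falls by about 99.9%, which corresponds to a throughput roughly 926 times higher for this single large sequential read.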

Conclusion

In summary, FlexCache is a powerful tool within Amazon FSx for NetApp ONTAP that tackles the complex challenges of data locality by delivering data closer to users with enhanced throughput. Through strategic data caching, organizations can overcome geographical constraints and significantly boost productivity for dispersed teams and remote users.

Embrace the power of FlexCache and witness a transformative uplift in your organization's operational efficiency. Start by initiating a pilot program today, and let the compelling results speak for themselves. The advantages outlined in this blog post are just the beginning of what FlexCache can offer.

I hope this exploration into FlexCache technology has been informative and helpful. The FlexCache Technical Report is the definitive source for the technical intricacies and best practices of FlexCache. If you have any questions or require additional guidance on optimizing your FlexCache deployment, don't hesitate to reach out for professional support.

Appendix

A. VPC design for multi-regional WorkSpaces and FSx for ONTAP deployment

Designing a VPC for a multi-regional WorkSpaces and FSx for ONTAP environment involves considering many factors, such as the location of existing infrastructure, data, applications, and users. Refer to this whitepaper on the best practices for deploying Amazon WorkSpaces.

Figure 1 shows a possible multi-regional WorkSpaces and FSx for ONTAP environment implemented across two AWS Regions, four Availability Zones (AZ), and an existing on-premises network. This design benefits a geographically dispersed user base, as users can connect to their nearest WorkSpace Region. This setup allows users to connect using lower latency network connections than they would have if WorkSpaces were implemented within a single global Region.

The corporate data center hosts the Active Directory Domain Controller and a proxy server, ensuring global internet connectivity for all WorkSpaces. AWS Transit Gateway service is used to link all the geographically dispersed VPCs and on-premises network into a single network. This approach also allows the shared use of a common AWS Direct Connect connection from AWS back to the on-premises network across all Regions and the implementation of a Site-to-Site VPN as a redundant connection in the event of loss of connectivity when using AWS Direct Connect. The use of an AWS Transit Gateway addresses the transitive routing limitation imposed when using VPC peering and simplifies the addition of new regions into this environment.

B. Basic Environment Setup

To implement the environment as shown in Figure 1, complete the following high-level steps:

1. Create an AD Connector to the on-premises AD in a dedicated VPC (WorkSpaces VPC) in Region 1.
2. Launch Amazon WorkSpaces in the WorkSpaces VPC in Region 1 with the AD Connector created above.
3. Launch a multi-AZ FSx for ONTAP filesystem in a dedicated VPC (FSx VPC) in Region 1.
4. Create a Transit Gateway in Region 1 and attach both the WorkSpaces VPC and FSx VPC to the gateway.
5. Add routes between the transit gateway and the VPCs.
6. Repeat steps 1 to 5 in Region 2.
7. Create a peering attachment between the Transit Gateways in Region 1 and 2, and add a static route to the transit gateway route table that points to the peering attachment.
8. Associate the Direct Connect Gateway with the Transit Gateway in Region 1 and create a transit virtual interface to the Direct Connect Gateway.
9. Create a transit gateway VPN attachment to the Transit Gateway in Region 1.

Shaun Phua is a Cloud Solutions Engineer at NetApp specializing in Amazon FSx for NetApp ONTAP file services. In this role, Shaun collaborates with customers to design and build secure, scalable, hybrid cloud storage solutions across their on-premises datacenters and the cloud. Outside of work, Shaun loves travelling and trying new foods 🥘 and drinks 🍻!