Tech ONTAP Blogs
Supporting a robust Virtual Desktop Infrastructure (VDI) for a worldwide workforce presents complex challenges. IT teams must dynamically scale remote user access to meet fluctuating business demands and strategically locate applications and data, whether on-premises or in the cloud. In this context, NetApp's FlexCache, a remote caching feature within Amazon FSx for NetApp ONTAP (FSx for ONTAP), is pivotal for achieving seamless scalability and optimal data placement. This blog post discusses how FlexCache can significantly improve data access performance across multi-regional Amazon WorkSpaces deployments.
A global energy company's transition to remote work during the COVID-19 pandemic exemplifies the challenges of data locality. The company's datacenters, previously near their engineers and geoscientists, suddenly became distant from the remote workforce, leading to a spike in support cases due to degraded data access performance. Collaborating with AWS and NetApp, the company successfully implemented a scalable, secure VDI solution using Amazon WorkSpaces in conjunction with FSx for ONTAP. With FlexCache in place, data retrieval times decreased significantly, empowering their remote workforce to access and analyze seismic data with near-on-premises efficiency.
Amazon WorkSpaces is a fully managed Desktop as a Service offering that provides cloud-based desktops running Microsoft Windows, Amazon Linux 2, or Ubuntu, accessible from any location and device. FSx for ONTAP offers fully managed shared storage on the AWS Cloud, complete with the popular data access and management capabilities of ONTAP.
FlexCache is an intelligent remote caching feature within FSx for ONTAP, designed to optimize file distribution across NFS and CIFS/SMB protocols while minimizing WAN latency and bandwidth costs. This solution automates the caching process, dynamically adapting to user access patterns and reducing the need for manual cache management, thus ensuring peak performance with minimal administrative effort.
Engineered for scalability, FlexCache allows the creation of multiple caches from a single origin volume, enabling efficient data caching across various AWS Regions and Availability Zones — key to supporting the dynamic needs of Amazon WorkSpaces. It employs a 'write-around' process to maintain high levels of data consistency and freshness, with write operations being directed to the origin volume for coordinated updates and cache invalidation. This process guarantees that all users have access to the latest data, regardless of their location.
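The write-around behavior described above can be illustrated with a small toy model. This is only a sketch of the general caching pattern, not ONTAP's implementation: the class names `Origin` and `Cache`, the in-memory dictionaries, and the `origin_fetches` counter are all hypothetical and exist purely for illustration.

```python
class Origin:
    """Hypothetical stand-in for the origin volume (the authoritative copy)."""

    def __init__(self):
        self.files = {}
        self.caches = []

    def write(self, path, data):
        self.files[path] = data
        # Coordinated update: every registered cache drops its stale copy.
        for cache in self.caches:
            cache.invalidate(path)


class Cache:
    """Hypothetical stand-in for a cache volume in another Region."""

    def __init__(self, origin):
        self.origin = origin
        self.blocks = {}
        self.origin_fetches = 0
        origin.caches.append(self)

    def read(self, path):
        if path not in self.blocks:
            # Cache miss: fetch from the origin and keep a local copy.
            self.origin_fetches += 1
            self.blocks[path] = self.origin.files[path]
        return self.blocks[path]

    def write(self, path, data):
        # Write-around: the write bypasses the cache and goes to the origin.
        self.origin.write(path, data)

    def invalidate(self, path):
        self.blocks.pop(path, None)


origin = Origin()
cache = Cache(origin)
path = "/seismic_data/surface_res_map.tif"

origin.write(path, b"v1")
first = cache.read(path)    # miss: fetched from the origin
second = cache.read(path)   # hit: served locally
cache.write(path, b"v2")    # goes to the origin, which invalidates the cache
third = cache.read(path)    # miss again: the fresh copy is fetched
print(first, second, third, cache.origin_fetches)
```

In this model only two of the three reads reach the origin, and the read after the write returns the new data, which is the property the write-around process is meant to provide.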
FlexCache is well suited to read-intensive workloads such as:
For a more comprehensive understanding of FlexCache and its capabilities, refer to the FlexCache Technical Report.
In the architecture outlined in Figure 1, Amazon WorkSpaces users spanning Regions 1 and 2 require rapid access to seismic data crucial for their modelling and visualization tasks. This data, consisting of frequently updated surface resolution maps, resides within the FSx for ONTAP file system located in Region 1.
With a setup and configuration optimized according to AWS and NetApp best practices, FlexCache can significantly improve the performance of data access in a multi-regional Amazon WorkSpaces environment. This walkthrough assumes that the environment in Figure 1 has been configured according to AWS best practices for VPC, network setup, and WorkSpaces deployments. For detailed information about VPC design and basic environment setup, see the Appendix at the end of this blog.
The subsequent sections will delve into the creation of a FlexCache volume bridging Region 1 and 2, aimed at delivering low-latency data access to Linux users through NFS. Equivalent configurations are applicable to Windows users utilizing CIFS/SMB.
Figure 2 shows the workflow to create a FlexCache volume. The high-level steps include:
As the FlexCache volume will be created in a different file system, a cluster peer relationship needs to be created between the two file systems. Inter-cluster traffic between the two file systems must be allowed.
On the destination file system in Region 2 (FsxIdREGION2), create a peer relationship with the source file system in Region 1 (FsxIdREGION1) using the ONTAP CLI.
FsxIdREGION2::> cluster peer create -generate-passphrase -offer-expiration 2days -initial-allowed-vserver-peers *
Notice:
Passphrase: +Wv3SiXFcugb40suv9qZgnTK
Expiration Time: 10/02/2024 10:10:00 +00:00
Initial Allowed Vserver Peers: *
Intercluster LIF IP: 172.31.30.166
Peer Cluster Name: Clus_gnTK (temporary generated)
Warning: make a note of the passphrase - it cannot be displayed again.
On the source file system FsxIdREGION1, accept and authenticate the cluster peer request using the inter-cluster endpoint IP addresses of the destination file system FsxIdREGION2 depicted in Figure 3.
FsxIdREGION1::> cluster peer create -peer-addrs 172.31.30.166,172.31.7.72
Notice: Use a generated passphrase or choose a passphrase of 8 or more
characters. To ensure the authenticity of the peering relationship, use
a phrase or sequence of characters that would be hard to guess.
Enter the passphrase: <ENTER PASSPHRASE GENERATED IN THE PREVIOUS STEP>
Confirm the passphrase: <ENTER PASSPHRASE GENERATED IN THE PREVIOUS STEP>
Notice: Clusters "FsxIdREGION1" and "FsxIdREGION2" are peered.
FsxIdREGION1::> cluster peer show
Peer Cluster Name         Cluster Serial Number Availability   Authentication
------------------------- --------------------- -------------- --------------
FsxIdREGION2              1-88-000000           Available      ok
On the source file system FsxIdREGION1, create a Storage Virtual Machine (SVM) peer relationship between the source SVM (origin_SVM) and the FlexCache SVM (flexcache_SVM).
FsxIdREGION1::> vserver peer create -vserver origin_SVM -peer-vserver flexcache_SVM -peer-cluster FsxIdREGION2 -applications flexcache
Info: [Job 44] 'vserver peer create' job queued
Accept the SVM peering request on the destination file system FsxIdREGION2.
FsxIdREGION2::> vserver peer accept -vserver flexcache_SVM -peer-vserver origin_SVM
Info: [Job 57] 'vserver peer accept' job queued
FsxIdREGION2::> vserver peer show
            Peer        Peer                           Peering        Remote
Vserver     Vserver     State        Peer Cluster      Applications   Vserver
----------- ----------- ------------ ----------------- -------------- ---------
flexcache_SVM
            origin_SVM  peered       FsxIdREGION1      flexcache      origin_SVM
When sizing a FlexCache volume, a best practice is to consider the working set — the subset of data actively used during a specific task or process. To determine the optimal size for the FlexCache volume, calculate the size of the working set and add an overhead of approximately 25% to accommodate any unpredictable increases in data access patterns. If the working set size is not well-defined, a general rule of thumb is to allocate 10-15% of the origin volume size to the FlexCache volume. For further guidance on sizing FlexCache volumes, refer to the FlexCache Technical Report.
For an origin volume of 1 TB containing a working set of surface resolution maps that totals 31 GB, the ideal FlexCache volume size is the working set plus a 25% buffer: 31 GB × 1.25 = 38.75 GB.
Round the FlexCache volume size up to 40 GB to simplify volume management and provisioning. This aggressive sizing strategy ensures that read operations are serviced primarily by the cache, reducing how often data must be retrieved from the origin volume and improving overall access speed.
FsxIdREGION2::> volume flexcache create -volume flexcache_vol -vserver flexcache_svm -origin-vserver origin_SVM -origin-volume origin_vol -size 40GB -junction-path /flexcache_vol -aggr-list aggr1 -foreground true
[Job 394] Job is queued: Create FlexCache volume flexcache_vol.
[Job 394] Job succeeded: Successful.
FsxIdREGION2::> volume flexcache show
Vserver Volume      Size       Origin-Vserver Origin-Volume Origin-Cluster
------- ----------- ---------- -------------- ------------- --------------
flexcache_svm
        flexcache_vol
                    40GB       origin_SVM     origin_vol    FsxIdREGION1
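The sizing guidance used for this volume can be written down as a short calculation. The sketch below is a minimal illustration of the two rules of thumb from this post. The function names are hypothetical, rounding up to a multiple of 10 GB is an assumption chosen to reproduce the 40 GB figure, and 1 TB is treated as 1000 GB for simplicity.

```python
import math


def size_from_working_set(working_set_gb, overhead=0.25, round_to_gb=10):
    """Working set plus overhead, then rounded up (rounding step is an assumption)."""
    raw_gb = working_set_gb * (1 + overhead)
    rounded_gb = math.ceil(raw_gb / round_to_gb) * round_to_gb
    return raw_gb, rounded_gb


def size_from_origin(origin_gb, low=0.10, high=0.15):
    """Fallback rule of thumb: 10-15% of the origin volume size."""
    return origin_gb * low, origin_gb * high


raw_gb, rounded_gb = size_from_working_set(31)
low_gb, high_gb = size_from_origin(1000)  # assumption: 1 TB taken as 1000 GB

print(f"working-set rule: {raw_gb} GB, provision {rounded_gb} GB")
print(f"origin-size rule: about {low_gb:.0f} to {high_gb:.0f} GB")
```

For the 31 GB working set this gives 38.75 GB, rounded up to 40 GB. If the working set were unknown, the fallback rule would suggest roughly 100 to 150 GB for the same 1 TB origin, which shows how much capacity a well-defined working set can save.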
A newly created FlexCache volume will initially contain no data, which means that the first client read request will experience a performance penalty as data must be fetched from the origin volume. However, from the perspective of the client that has mounted the FlexCache volume, the volume will appear exactly like the origin volume, regardless of whether anything is cached. To mitigate this initial performance hit, it is recommended to pre-warm the cache with the necessary data, a process known as pre-population in ONTAP.
On the destination file system FsxIdREGION2, run the prepopulate command at the advanced privilege level, passing the path of the directory to prepopulate (/seismic_data) in -path-list.
FsxIdREGION2::> set -privilege advanced
Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y
FsxIdREGION2::*> volume flexcache prepopulate start -cache-vserver flexcache_svm -cache-volume flexcache_vol -path-list /seismic_data -isRecursion true
[JobId 439]: FlexCache prepopulate job queued.
FsxIdREGION2::*> job show -id 439 -instance
Job ID: 439
Owning Vserver: flexcache_svm
Name: FlexCache prepopulate job for volume "flexcache_vol" in Vserver "flexcache_svm".
Description: FLEXCACHE PREPOPULATE JOB
Priority: Medium
Node: FsxIdREGION2-02
Affinity: Cluster
Schedule: @now
Queue Time: 10/02 10:28:47
Start Time: 10/02 10:28:47
End Time: 10/02 11:05:39
Drop-dead Time: -
Restarted?: false
State: Success
Status Code: 0
Completion String: Successful.Total number of files read by FlexCache prepopulate job are "1".
Job Type: FLEXCACHE PREPOPULATE
Job Category: FLEXCACHE
UUID: abc123ab-c123-abc1-23ab-c123abc123ab
Execution Progress: -
User Name: fsxadmin
Restart Is Delayed by Module: -
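The job output above also gives a rough sense of how long pre-population takes. The sketch below derives the job duration from the Start Time and End Time fields and estimates an average transfer rate. It rests on several assumptions that are not stated in the job output: the year is taken to be 2024 (the year shown in the cluster peering output), and the single file read is assumed to be the 32,505,856,000-byte surface_res_map.tif measured with dd later in this post. The helper names are hypothetical.

```python
from datetime import datetime

ASSUMED_YEAR = 2024           # assumption: 'job show' prints no year
FILE_BYTES = 32_505_856_000   # assumption: size of the one prepopulated file


def parse_job_time(stamp, year=ASSUMED_YEAR):
    """Parse a 'MM/DD HH:MM:SS' timestamp as printed by 'job show'."""
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")


def job_duration_seconds(start, end):
    """Elapsed seconds between two 'job show' timestamps."""
    return (parse_job_time(end) - parse_job_time(start)).total_seconds()


duration = job_duration_seconds("10/02 10:28:47", "10/02 11:05:39")
rate_mb_s = FILE_BYTES / duration / 1e6

print(f"prepopulate took {duration:.0f} s, about {rate_mb_s:.1f} MB/s")
```

Under these assumptions the job ran for 2212 seconds (about 37 minutes), which is a one-time cost paid before users start reading from the cache.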
To assess the performance gain from FlexCache, compare read operations against the origin volume and the FlexCache volume from an Amazon WorkSpaces client in Region 2.
Begin by mounting both the origin and FlexCache volumes on a WorkSpaces client in Region 2.
sphua@U-111:~$ mkdir origin_vol
sphua@U-111:~$ sudo mount -t nfs -o nconnect=16 10.0.4.172:/origin_vol ./origin_vol
sphua@U-111:~$ mkdir flexcache_vol
sphua@U-111:~$ sudo mount -t nfs -o nconnect=16 172.31.255.213:/flexcache_vol ./flexcache_vol
Once both volumes are successfully mounted, measure the read performance from the origin volume using the dd command.
sphua@U-111:~/origin_vol/seismic_data$ ls -lh
total 31G
-rwxrwxrwx 1 sphua domain users 31G Feb 16 2024 surface_res_map.tif
sphua@U-111:~/origin_vol/seismic_data$ dd if=/home/sphua/origin_vol/seismic_data/surface_res_map.tif of=/dev/null bs=1M
31000+0 records in
31000+0 records out
32505856000 bytes (33 GB, 30 GiB) copied, 3709 s, 8.8 MB/s
The output shows that the read from the origin volume averaged 8.8 MB/s and took 3709 seconds (about 62 minutes) to complete. Next, evaluate the read performance from the FlexCache volume using the same dd command.
sphua@U-111:~/flexcache_vol/seismic_data$ ls -lh
total 31G
-rwxrwxrwx 1 sphua domain users 31G Feb 16 2024 surface_res_map.tif
sphua@U-111:~/flexcache_vol/seismic_data$ dd if=/home/sphua/flexcache_vol/seismic_data/surface_res_map.tif of=/dev/null bs=1M
31000+0 records in
31000+0 records out
32505856000 bytes (33 GB, 30 GiB) copied, 4.00398 s, 8.1 GB/s
The result shows a significant performance increase, with the pre-populated FlexCache volume achieving an average read rate of 8.1 GB/s and completing in just 4 seconds. That is more than 900 times the throughput of the origin volume read, or a 99.9% reduction in read time, highlighting the potential efficiency gains achievable with FlexCache. However, actual performance may vary based on factors such as network conditions, file size, and available system resources.
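The comparison can be reproduced from the two dd results with a few lines of arithmetic. The sketch below uses only the byte count and elapsed times reported by dd in this post; the function and variable names are hypothetical.

```python
BYTES_READ = 32_505_856_000   # bytes copied, as reported by both dd runs
ORIGIN_SECONDS = 3709.0       # elapsed time reading from the origin volume
CACHE_SECONDS = 4.00398       # elapsed time reading from the FlexCache volume


def throughput_mb_s(num_bytes, seconds):
    """Average throughput in decimal megabytes per second, as dd reports it."""
    return num_bytes / seconds / 1e6


origin_rate = throughput_mb_s(BYTES_READ, ORIGIN_SECONDS)
cache_rate = throughput_mb_s(BYTES_READ, CACHE_SECONDS)
speedup = ORIGIN_SECONDS / CACHE_SECONDS
time_reduction = 1 - CACHE_SECONDS / ORIGIN_SECONDS

print(f"origin: {origin_rate:.1f} MB/s, cache: {cache_rate / 1000:.1f} GB/s")
print(f"speedup: {speedup:.0f}x, read time reduced by {time_reduction:.1%}")
```

Expressing the gain both as a speedup factor and as a reduction in read time avoids ambiguity: the cached read is several hundred times faster, while the elapsed time shrinks by just under 100%.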
In summary, FlexCache is a powerful tool within Amazon FSx for NetApp ONTAP that tackles the complex challenges of data locality by delivering data closer to users with enhanced throughput. Through strategic data caching, organizations can overcome geographical constraints and significantly boost productivity for dispersed teams and remote users.
Embrace the power of FlexCache and witness a transformative uplift in your organization's operational efficiency. Start by initiating a pilot program today, and let the compelling results speak for themselves. The advantages outlined in this blog post are just the beginning of what FlexCache can offer.
I hope this exploration into FlexCache technology has been informative and helpful. The FlexCache Technical Report is the definitive source for the technical intricacies and best practices of FlexCache. If you have any questions or require additional guidance on optimizing your FlexCache deployment, don't hesitate to reach out for professional support.
Designing a VPC for a multi-regional WorkSpaces and FSx for ONTAP environment involves considering many factors, such as the location of existing infrastructure, data, applications, and users. Refer to this whitepaper on the best practices for deploying Amazon WorkSpaces.
Figure 1 shows a possible multi-regional WorkSpaces and FSx for ONTAP environment implemented across two AWS Regions, four Availability Zones (AZ), and an existing on-premises network. This design benefits a geographically dispersed user base, as users can connect to their nearest WorkSpace Region. This setup allows users to connect using lower latency network connections than they would have if WorkSpaces were implemented within a single global Region.
The corporate data center hosts the Active Directory Domain Controller and a proxy server, ensuring global internet connectivity for all WorkSpaces. AWS Transit Gateway service is used to link all the geographically dispersed VPCs and on-premises network into a single network. This approach also allows the shared use of a common AWS Direct Connect connection from AWS back to the on-premises network across all Regions and the implementation of a Site-to-Site VPN as a redundant connection in the event of loss of connectivity when using AWS Direct Connect. The use of an AWS Transit Gateway addresses the transitive routing limitation imposed when using VPC peering and simplifies the addition of new regions into this environment.
To implement the environment as shown in Figure 1, complete the following high-level steps:
Shaun Phua is a Cloud Solutions Engineer at NetApp specializing in Amazon FSx for NetApp ONTAP file services. In this role, Shaun collaborates with customers to design and build secure, scalable, hybrid cloud storage solutions across their on-premises datacenters and the cloud. Outside of work, Shaun loves travelling and trying new foods 🥘 and drinks 🍻!