In February 2010, an epic snowstorm struck Washington, D.C., and the surrounding area, virtually shutting down the U.S. capital for four days. The shutdown cost the U.S. government nearly $70 million a day in lost productivity. While this number is staggering, it could have been as large as $100 million per day. Fortunately, 30% of government workers were able to telecommute, allowing them to continue working during the storm.
In 2001, the U.S. Congress began passing laws that mandated telework solutions for federal employees as a way to reduce gridlock on Washington, D.C., roadways. It quickly became apparent that teleworking was also a way to increase productivity, reduce requirements for office space, increase hiring flexibility, and decrease the government's carbon footprint.
A 2005 statute requires the Department of Commerce to provide a telework solution for every worker who qualifies. My company, Project Performance Corporation, was awarded the contract by one large agency within the Department of Commerce to help meet the telework mandate. Today, the solution we put in place with help from our partners VMware and NetApp supports over 3,000 teleworkers. Based in part on our efforts, this agency has received a number of awards from The Telework Exchange—a public-private partnership—including a 2009 award for best use of innovation and technology.
In this article, I want to describe how we were able to achieve these results using a virtual desktop infrastructure (VDI) solution from VMware in conjunction with NetApp® storage and data management.
Challenges and Requirements
The agency we were working for presented some unique challenges and requirements. For example, the workers have their own union with specific service-level agreements (SLAs) in place. This meant that the solution for each user had to look and feel like the user’s existing desktop, and that any degradation in performance would violate the SLAs. It also meant that network performance was critical: we had to take extra care to ensure that bandwidth and port counts were adequate to support the expected number of simultaneous users.
Because of the requirement to look and feel like the existing environment, there were initially many desktop baselines. We could not simply pick a single baseline configuration to clone or use to build each desktop. Some existing baselines contained legacy applications with hard-coded references to the desktop C:\ drive, which meant that a significant amount of additional storage had to be allocated for each C:\ drive.
Finally, in addition to providing the same look and feel, we had to provide collaboration tools to make it easy for remote users to work together, as well as offer adequate support and training.
We initially considered four possible solution alternatives:
Figure 1) Scorecard showing the relative ranking of various approaches versus requirements.
Because of the aggressive mandate for getting a solution in place, we ultimately decided on a two-stage rollout. We started with rack and stack as a cheap and easy way to get the process started—despite the complications it created in terms of server room space, extra network infrastructure, and other similar requirements.
Ultimately, we converted to a full VDI implementation using VMware and NetApp. This required additional training as well as some infrastructure changes, but was much better able to address the full project requirements. Because PPC already had substantial experience with VDI deployments on VMware and NetApp, we were able to quickly create a complete lifecycle management plan that addressed all of the agency’s specific requirements.
NetApp was selected as the storage solution for a variety of reasons, including:
The current VDI solution architecture is illustrated in Figure 2. (Because this project began in 2005, the infrastructure has evolved over time from VMware ESX 2.x to the current ESX 3.x.)
Figure 2) VDI configuration. Two NetApp systems provide VMware virtual desktop storage via SAN and home directory access via NAS. VMotion™ allows individual virtual desktops to be transparently moved between ESX servers.
We currently use 16 VMware ESX servers in a “farm,” each supporting 14 desktops, for a total of 224 desktops per farm. We deploy multiple farms to support the required number of concurrent teleworkers. VMotion allows us to transparently move running desktops between ESX servers as needed.
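As a rough illustration of the sizing arithmetic above, the following Python sketch computes desktops per farm and the number of farms needed for a given number of concurrent users. The 16 servers and 14 desktops per server come from this article; the function names, the optional spare_servers parameter (servers held in reserve as VMotion headroom), and the use of 3,000 as a concurrency figure are assumptions made for illustration only, not a description of how the agency actually sizes its farms.

```python
import math

SERVERS_PER_FARM = 16     # from the article
DESKTOPS_PER_SERVER = 14  # from the article


def desktops_per_farm(servers=SERVERS_PER_FARM, per_server=DESKTOPS_PER_SERVER):
    """Total virtual desktops one farm can host."""
    return servers * per_server


def farms_needed(concurrent_users, servers=SERVERS_PER_FARM,
                 per_server=DESKTOPS_PER_SERVER, spare_servers=0):
    """Smallest number of farms that can host concurrent_users desktops.

    spare_servers is a hypothetical knob: servers per farm kept empty
    so running desktops have somewhere to move during maintenance.
    """
    usable = (servers - spare_servers) * per_server
    if usable <= 0:
        raise ValueError("no usable capacity per farm")
    return math.ceil(concurrent_users / usable)


if __name__ == "__main__":
    print(desktops_per_farm())   # 224
    # Assumes, for illustration, that all 3,000 teleworkers connect at once.
    print(farms_needed(3000))    # 14
```

Because peak concurrency is normally lower than the total number of teleworkers, the 3,000 figure here is a worst-case input rather than a measured value.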
The two NetApp systems illustrated in Figure 2 are actually shared by all the active farms. In other words, two storage systems support the entire environment, providing SAN storage for use by VMware and its virtual desktops as well as CIFS storage for home directory access. We currently have access partitioned so that one NetApp system supports Fibre Channel SAN and the other supports CIFS. This split is not required; a single system can do both, if desired. Even so, there is a significant management advantage to supporting both types of storage access on a single platform rather than using two different platforms to meet the storage need.
We recently completed an upgrade in which we replaced NetApp FAS980 systems with NetApp FAS6080 clusters. This prepared the infrastructure to scale beyond 3,000 users in the future.
More details of the back-end storage configuration are shown in Figure 3.
Figure 3) Storage details. Disk-to-disk backups are performed to secondary storage at the same site. Deduplication is used to reduce capacity required on secondary storage. For DR, backups are replicated to a NetApp V-Series system front-ending an IBM DS4000 storage array.
Continuous operation is another mandate that the solution must meet. We currently perform disk-to-disk backups using NetApp SnapVault® software between our primary storage systems and secondary storage. We run NetApp deduplication on secondary storage, which reduces total backup storage requirements by 80%. We then replicate this secondary storage system to a NetApp V-Series system at our DR site that is front-ending an IBM DS4000. (NetApp V-Series makes the full suite of advanced NetApp data management capabilities available on your existing third-party storage.) Because the source storage for replication is deduplicated, the DR site sees the same level of storage savings, and the required WAN bandwidth is substantially reduced.
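To make the effect of the 80% figure concrete, here is a minimal Python sketch of the capacity arithmetic. The 80% savings rate is the one reported above; the function name and the 100 TB of logical backup data are hypothetical values chosen only to show the calculation, and the sketch assumes the DR copy needs the same physical capacity as the deduplicated secondary copy.

```python
def backup_capacity(logical_tb, dedup_savings=0.80):
    """Estimate physical capacity needed for backups after deduplication.

    logical_tb is the backup data size before deduplication (hypothetical
    input). dedup_savings is the fraction removed by deduplication; 0.80
    is the figure reported in the article for secondary storage.
    """
    if not 0 <= dedup_savings < 1:
        raise ValueError("dedup_savings must be in [0, 1)")
    physical_tb = logical_tb * (1 - dedup_savings)
    return {
        "secondary_tb": physical_tb,  # deduplicated SnapVault destination
        "dr_tb": physical_tb,         # replica of the deduplicated data
        "saved_per_copy_tb": logical_tb - physical_tb,
    }


if __name__ == "__main__":
    # 100 TB is an illustrative figure, not one taken from the deployment.
    for name, value in backup_capacity(100).items():
        print(f"{name}: {value:.1f}")
```

With these inputs each copy needs roughly one-fifth of the logical capacity, which is also why less data has to cross the WAN to the DR site.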
We have also added NetApp Performance Acceleration Modules (PAMs). These intelligent caches improve the end-user experience, accelerate backup and antivirus scans, and make our infrastructure more resistant to boot storms. Learn more in TR-3705.
This solution has been extremely successful and is set to scale beyond the 3,000-user requirement that was initially established. Teleworkers work an average of four days per week from home. To encourage adoption, the agency initially offered incentives to eligible employees. Today approximately 80% of eligible workers in the agency choose telework.
Despite this success, we continue to look for ways to improve the resiliency, performance, and efficiency of the solution. One important initiative is to improve provisioning by using NetApp Rapid Cloning Utility (RCU) to efficiently clone new virtual desktops. This approach can dramatically reduce the storage required for thousands of copies of the same desktop operating system. Enabling NetApp deduplication on our primary storage systems will further boost overall storage efficiency and reduce the amount of primary storage needed.
We are also considering replacing our current Fibre Channel environment with NFS on NetApp storage. This would not only eliminate the need to maintain a separate Fibre Channel infrastructure, but would also streamline management by making it easy to expand or shrink storage volumes, and could possibly allow more virtual desktops per ESX server.
Our ultimate goal is to evolve the infrastructure to a full cloud model in which desktops are provided as a service and the user neither knows nor cares where his or her desktop is coming from.
Because of the success of this program and others like it, the U.S. Congress recently approved increased funding for the government’s telework initiative, so it is clear that all telework programs, including this one, will continue to expand.
Got opinions about supporting telecommuters with VDI? Ask questions, exchange ideas, and share your thoughts online in NetApp Communities.