Insight 2016 Day 3 – Customers Discuss Self-Service Storage, HPC, and HCI

By Larry Freeman, Senior Technologist, NetApp

 

What do students in Iowa have in common with scientists in Australia and public officials in Richmond, VA? Their lives are all made better by IT teams who understand the value of data and how important it is to each person. Following the same theme as Wednesday's post, in today's Insight 2016 sessions I continued to listen to our customers' stories and learned more about the reasons they chose NetApp over other vendors' data storage products. So, here are three more use cases from today's sessions:

 

University of Iowa

Kevin Keyser, System Architect at the University of Iowa, explained how this university and medical research institution, with 14,000 employees, 36,000 students, and more than 500 medical researchers, migrated to a self-service file recovery service covering over 100,000 user home directories. Moving to a self-service model reduced helpdesk tickets by 80% and shrank the university's large Tivoli Storage Manager (TSM) tape-based backup and recovery environment. Self-service file restore is particularly important when you consider that the university adds 7,000 new users (incoming students) each fall, while retaining the files of departing students for another 3-4 years.

 

Kevin explained that the self-service restore model uses the Microsoft Volume Shadow Copy Service (VSS) in conjunction with NetApp Snapshot copies taken in the background. To recover a file, users simply right-click their home drive and select "Previous Versions" from the context menu. NetApp supports VSS application-consistent shadow copies on both SAN and NAS aggregates, and the capability is often used in conjunction with SnapManager for Hyper-V. More information on integrating NetApp Snapshot copies with VSS can be found here.
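
For readers who want to script the same kind of lookup, here is a minimal sketch in Python. It assumes an ONTAP home share that exposes the default hidden snapshot directory ("~snapshot" over CIFS, ".snapshot" over NFS); the share path and file name below are hypothetical, purely for illustration.

    # List the snapshot copies of a single file on a NetApp home share.
    # Assumption: the share exposes ONTAP's hidden snapshot directory
    # ("~snapshot" over CIFS, ".snapshot" over NFS); paths are hypothetical.
    import os

    HOME_SHARE = r"\\filer01\home\jdoe"      # hypothetical mapped home directory
    SNAPSHOT_DIR = os.path.join(HOME_SHARE, "~snapshot")
    TARGET_FILE = "thesis/chapter1.docx"     # file the user wants to recover

    def list_previous_versions(snapshot_dir, relative_path):
        """Return (snapshot_name, modified_time, size) for each copy found."""
        versions = []
        for snap in sorted(os.listdir(snapshot_dir)):
            candidate = os.path.join(snapshot_dir, snap, relative_path)
            if os.path.isfile(candidate):
                info = os.stat(candidate)
                versions.append((snap, info.st_mtime, info.st_size))
        return versions

    if __name__ == "__main__":
        for name, mtime, size in list_previous_versions(SNAPSHOT_DIR, TARGET_FILE):
            print(f"{name}: {size} bytes, last modified {mtime}")

Restoring a version is then just a copy from the chosen snapshot path back into the live directory, which is functionally what the "Previous Versions" dialog does for the user.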

 

National Computational Infrastructure (NCI)

NCI, Australia’s national research computing service, supports 3,000 researchers and 600+ projects in a variety of scientific research areas.  During another of today’s Insight sessions, Daniel Rodwell, Manager of Data Services at NCI, explained how his team built an 8PB Lustre file system using E-Series and EF-Series storage arrays from NetApp.

 

NCI operates Raijin, a 1.2-petaflop Fujitsu supercomputer, served by several Lustre file systems. The latest file system to move into production, gdata3, was the main topic of Daniel's session.

As Daniel described, NCI put forth many requirements for the gdata3 storage environment, including 120GB/sec sustained reads, 80GB/sec sustained writes, 99.99% system availability, fully redundant HA controller pairs, and the ability to scale beyond 10PB, all of which were fully met by the NetApp storage arrays.

 

Now in full production, the gdata3 file system uses multiple E5600 arrays for persistent data and a single EF550 all-flash array for file system metadata. Both systems are configured with 40Gb InfiniBand, the standard inter-node and storage connectivity used throughout NCI. Interestingly, Brendon Eliot of NetApp Australia noted that prior to NCI, NetApp had 0% experience with High-Performance Computing (HPC) in Australia, but had a 100% commitment to make this project successful, leveraging NetApp's HPC expertise with customers such as Lawrence Livermore National Laboratory and UCLA. For more information on NetApp HPC solutions, visit this page.

  

County of Henrico

Anchored in Richmond, VA, the County of Henrico is home to around 300,000 people. Brian Viscuso, IT Manager of Systems Engineering for the county, discussed the county's transition to a NetApp FlexPod architecture and the benefits it realized.

 

As background, networking within the county's IT infrastructure was becoming cumbersome, with multiple hops sending TCP/IP packets on unnecessarily long routes between servers and storage. The county did not use any block-based SAN storage; all of its storage was served as network-attached storage (NAS).

 

Knowing something needed to change, Brian and his team investigated hyperconverged infrastructure (HCI) but quickly abandoned that path when they discovered that, in order to scale HCI systems, they would have to sacrifice compute for storage, or vice versa, creating unfavorable application software licensing costs.
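
To see why coupled scaling hurts, consider a toy cost model (all numbers below are hypothetical and only illustrate the argument): if capacity can only be added by adding whole HCI nodes, every extra terabyte also drags in CPU sockets that per-socket application licenses must cover.

    # Toy comparison of licensing cost when storage scaling is coupled to
    # compute (HCI-style nodes) versus added independently (converged stack).
    # All figures are hypothetical and serve only to illustrate the scaling argument.

    LICENSE_PER_SOCKET = 3500          # $ per CPU socket, hypothetical
    SOCKETS_PER_NODE = 2
    TB_PER_HCI_NODE = 20               # usable TB added per HCI node, hypothetical

    def hci_license_cost(extra_tb):
        """Adding capacity means adding whole nodes, so sockets (and licenses) grow too."""
        nodes_needed = -(-extra_tb // TB_PER_HCI_NODE)   # ceiling division
        return nodes_needed * SOCKETS_PER_NODE * LICENSE_PER_SOCKET

    def converged_license_cost(extra_tb):
        """Storage shelves are added independently; the licensed socket count stays flat."""
        return 0

    for tb in (40, 100, 200):
        print(f"+{tb} TB -> HCI licenses ${hci_license_cost(tb):,}, "
              f"converged licenses ${converged_license_cost(tb):,}")

The exact dollar figures are invented, but the shape of the curve is the point: with coupled scaling, license spend grows with capacity even when no additional compute is needed.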

 

Instead, Brian and his team leveraged their existing investment in NetApp FAS storage and Cisco Nexus networking gear, and replaced their Dell servers with Cisco UCS servers. Building on the existing environment was key: Henrico County now had the basis for a true FlexPod infrastructure and could follow published best practices for a validated, converged architecture. In addition, with FlexPod they could add compute, networking, or storage resources independently, without paying application software penalties or sinking costs into resources that were not needed.

 

The results: fewer hops between storage and servers, a 40%-50% improvement in latency, and county website server response times that dropped from an average of 1.8-2 seconds to 550 milliseconds. On average, external page load times dropped by about 1 second. More importantly, "featured" pages with rich images that formerly loaded in an average of 2.6 seconds now load in just 86 milliseconds.

 

The reduced latencies were largely a result of putting servers and storage on the same networking plane, which enabled the use of jumbo frames. A side benefit was the reduction in ESX footprint from 22 to 12 servers. With 20 fewer CPU sockets, Henrico County saw substantial savings in software license fees and also removed 5 tons of HVAC load from the data center. For more information, here is a list of validated FlexPod architectures.

 

For more information on NetApp Insight 2016, select the link below to listen to our daily recap podcast.