The NetApp Kilo-Client 3G

by NetApp Staff on 2010-03-04 10:08 AM

Since 2006, Tech OnTap has been chronicling the evolution of the NetApp Kilo-Client—NetApp’s large-scale engineering test environment. For this article, Tech OnTap asked Brad Flanary of the NetApp RTP Engineering Support Systems team to describe the goals and technologies behind the next planned iteration of this important and innovative facility. [Tech OnTap editor.]

The NetApp® Kilo-Client is a test environment that allows NetApp to quickly configure and boot a large number of physical and/or virtual clients to run tests against NetApp storage hardware and software. The first iteration of the Kilo-Client was deployed in 2005 (as described in an early Tech OnTap article). That iteration initially offered 1,120 physical clients that booted over iSCSI instead of from local disk.

By mid-2007, the Kilo-Client had evolved to include 1,700 physical clients that could boot over iSCSI, FC, or NFS and could be deployed as physical clients running Windows® or Linux® or in virtualized VMware® environments. A Tech OnTap article that appeared at that time focused on the techniques we used to rapidly provision physical servers and virtual environments using NetApp FlexClone® and other NetApp technologies.

This configuration has served NetApp well; a few more servers have been added since the last article was published to support heavy virtualization. Now, almost three years later, the lease on the original server equipment is due to expire, so it’s time to evolve the configuration once again to keep up with the latest technology and cloud computing developments.

This article focuses on the third-generation Kilo-Client design, which, when built, will allow us to:

  • Perform tests using up to 75,000 virtual clients at one time (making the name Kilo-Client increasingly inaccurate).
  • Test a broader range of network configurations, including 10-Gigabit Ethernet and Fibre Channel over Ethernet (FCoE).
  • Deploy hundreds or thousands of clients in a matter of hours.

We’ll begin by describing the new requirements we faced, discuss our hardware evaluation, and then describe the design of Kilo-Client 3G, which will go live in the first half of this year. We’ll also discuss the unique design of the NetApp data center facility where the Kilo-Client is housed.

Gathering Requirements

Based on meetings with our internal customers, as well as requests that the current configuration is unable to meet, we began to form an idea of what was needed in the next-generation Kilo-Client. However, to be certain, we started the refresh process with a detailed survey of our existing internal customers plus other potential Kilo-Client users within NetApp. The survey we used, along with its results, is shown in Figure 1. (You’ll notice that some questions target virtualization, because we specifically wanted to learn whether our customers’ needs could be met by virtual rather than physical clients.)


Figure 1) Kilo-Client survey and results.

Major findings included:

  • Most of our customers could be serviced by virtual rather than physical hardware.

  • There was a high demand for 10-Gigabit Ethernet.

  • There was demand for FCoE in the near future (the survey was conducted several months ago, and that demand is now arriving).

This survey process was extremely valuable. It confirmed our suspicion that most of our customers could be serviced with virtual rather than physical hardware. This is consistent with the current move in the IT industry toward increased virtualization and cloud computing. It’s also consistent with a recent drive toward more server virtualization within NetApp. (A Tech OnTap article from April 2009 described the physical-to-virtual migration at the NetApp engineering lab in Bangalore, India.)

Evaluating Hardware

With a sense of our requirements for the new Kilo-Client, our next step was to start evaluating server hardware. We sent an RFP to a number of server vendors to obtain products for evaluation. Our testing focused on several things:

  • Ability to support converged network adapters (CNAs) capable of handling both FCoE and 10GbE (a recent Tech OnTap article covers CNAs in more detail)
  • Support for virtualization
  • Performance
  • Ability to scale up and down as required

We evaluated each server on the performance it could deliver through a CNA, how well it supported virtual machines at large scale, and how well it ran a battery of standard benchmarks.

We quickly discovered that, for our needs, servers based on Intel® Nehalem-microarchitecture processors dramatically outperformed those based on the older Intel Core™ microarchitecture processors (Dunnington). The two server models we chose both use Nehalem processors.

On the network side, we recently deployed a Cisco Nexus infrastructure in our new Global Dynamic Laboratory (GDL). That network infrastructure will continue to be used to meet the FCoE and IP needs of the Kilo-Client. Brocade switching will be used for Fibre Channel.

The Planned Kilo-Client 3G Deployment

Servers:

  • 468 Fujitsu RX200 S5 servers, each with 48GB of memory and two 2.26GHz Intel Xeon E5520 (Nehalem) processors (these are 4-core, 8-thread processors, delivering 8 cores and 16 threads per system)
  • 160 Cisco UCS servers (same processor configuration as the Fujitsu servers):
    • 48 with 48GB memory
    • 112 with 24GB memory

In total, this will deliver 628 clients with 5,024 cores. These will replace three pods of the original Kilo-Client, or 728 physical clients with 1,456 cores. These clients will primarily run virtual servers but can also be deployed as physical clients. At a possible density of 120 VMs per physical server, we will be able to deliver up to 75,360 VMs from the Kilo-Client.
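The capacity figures above follow from simple per-server arithmetic. The short Python sketch below reproduces them; the constant and function names are illustrative, and 120 VMs per server is the article's stated possible density rather than a measured limit.

```python
# Reproduce the Kilo-Client 3G capacity figures from the planned server list.
SERVER_COUNTS = {"Fujitsu RX200 S5": 468, "Cisco UCS": 160}

CORES_PER_SERVER = 8       # two 4-core Xeon E5520 processors per system
THREADS_PER_SERVER = 16    # two hardware threads per core
VMS_PER_SERVER = 120       # possible density quoted in the article

# Previous-generation hardware being replaced (three pods).
OLD_CLIENTS, OLD_CORES = 728, 1456


def capacity(server_counts, vms_per_server=VMS_PER_SERVER):
    """Return total servers, cores, threads, and maximum VMs for a server mix."""
    servers = sum(server_counts.values())
    return {
        "servers": servers,
        "cores": servers * CORES_PER_SERVER,
        "threads": servers * THREADS_PER_SERVER,
        "max_vms": servers * vms_per_server,
    }


totals = capacity(SERVER_COUNTS)
for name, value in totals.items():
    print(f"{name}: {value:,}")
print(f"cores vs. replaced pods: {totals['cores'] / OLD_CORES:.1f}x")
```

Changing `vms_per_server` shows how directly the 75,000-VM figure depends on the consolidation ratio.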

The approximately 1,000 other clients from the previous-generation Kilo-Client will stay in place and continue to be used for testing. They will be phased out and returned as they come off lease.

Networking:

  • Core: Nexus 7018 (16 I/O modules, backplane scalable to 15Tbps)
  • Aggregation: Nexus 5010 and 5020
  • Access: Nexus 2148T (FEX)
  • Fibre Channel: Brocade DCX Director and 5320 Edge switches

Storage:

  • FC Boot: 4 NetApp FAS 3170 storage systems
  • NFS Boot: 16 NetApp FAS 3170 storage systems
  • Other storage: a complete selection of the latest NetApp storage platforms and disks

We typically boot 500 VMs per NFS datastore. We use SnapMirror® to replicate golden images from a central repository to each boot storage system as needed.
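At 500 VMs per NFS datastore, the number of datastores a full-scale deployment needs is easy to estimate. The sketch below assumes, hypothetically, that every VM boots from a datastore on one of the 16 NFS boot systems and that datastores are spread as evenly as possible; the article does not describe how datastores are actually distributed.

```python
import math

VMS_PER_DATASTORE = 500   # typical figure from the article
NFS_BOOT_SYSTEMS = 16     # FAS 3170 systems dedicated to NFS boot


def datastore_plan(total_vms, vms_per_datastore=VMS_PER_DATASTORE,
                   boot_systems=NFS_BOOT_SYSTEMS):
    """Return (datastores needed, most datastores on any one boot system).

    Assumes an even spread of datastores across boot systems (hypothetical).
    """
    datastores = math.ceil(total_vms / vms_per_datastore)
    per_system = math.ceil(datastores / boot_systems)
    return datastores, per_system


print(datastore_plan(75_360))  # (151, 10)
```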

Booting Physical Hardware and Virtual Machines

The real key to the Kilo-Client is its ability to perform fast, flexible, and space-efficient booting. As in any cloud infrastructure, we have to be able to quickly repurpose any number of clients for any task, physical or virtual. The Kilo-Client uses a combination of FC and FCoE boot to boot each physical server, and NFS boot to support virtual machines on servers configured to run virtualization.

We chose FC boot for physical booting because it has proven very reliable in the existing Kilo-Client infrastructure. In most large server installations, a physical server boots the same boot image every time. It might boot Linux or Windows in a physical environment or VMware ESX in a virtual one, but it’s always the same. That’s not the case for the Kilo-Client. One of our servers might boot Linux one day, VMware the next day, and Windows the day after that. We use FC boot in combination with our dynamic LUN cloning capability to rapidly and efficiently boot our physical and virtual servers.

As described in previous articles, we maintain a set of "golden" boot images (as Fibre Channel LUNs) for each operating system and application stack we use. Using NetApp SnapMirror® and FlexClone, we can quickly create hundreds of clones, one for each physical server being configured for a test. Only host-specific "personalization" needs to be added to the core image for each provisioned server. This approach gives us near-instantaneous image provisioning with a near-zero storage footprint.
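The space efficiency of this approach comes from copy-on-write cloning: every clone shares the golden image's blocks and stores only the blocks it changes, such as its personalization. The Python model below illustrates that idea only; it is not the FlexClone or SnapMirror interface, and the class names, image name, and "hostname" block are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenImage:
    """Read-only base image; blocks maps a block key to its contents."""
    name: str
    blocks: dict


@dataclass
class Clone:
    """Writable clone that stores only the blocks that differ from its parent."""
    parent: GoldenImage
    delta: dict = field(default_factory=dict)

    def read(self, key):
        # Fall back to the shared golden image when this clone hasn't written the block.
        return self.delta.get(key, self.parent.blocks.get(key))

    def write(self, key, data):
        # Copy-on-write: the golden image itself is never modified.
        self.delta[key] = data


def provision(image, hostnames):
    """Create one clone per host and apply host-specific personalization."""
    clones = {}
    for host in hostnames:
        clone = Clone(image)
        clone.write("hostname", host)   # hypothetical personalization block
        clones[host] = clone
    return clones


base = GoldenImage("rhel-golden", {i: f"os-block-{i}" for i in range(1000)})
base.blocks["hostname"] = "template"
clones = provision(base, [f"kc-{n:03d}" for n in range(300)])
private_blocks = sum(len(c.delta) for c in clones.values())
print(len(base.blocks), private_blocks)   # 1001 shared blocks, 300 private ones
```

However many hosts are provisioned, the shared blocks are stored once; only the per-host deltas grow with the clone count.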

The process of booting virtual machines builds on the same steps:

  • Boot VMware ESX on each host for the test.
  • Register those hosts dynamically in VMware Virtual Center (vCenter™).
  • Prepare the correct network settings and datastores for virtual machines.
  • Use the NetApp Rapid Cloning Utility (RCU) to clone the appropriate number and types of virtual machines. RCU registers the VMs in vCenter automatically.
  • Dynamically register the servers in DNS and DHCP and boot the virtual machines.
  • Check to make sure everything is correct.
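The six steps above form a strict pipeline in which each stage depends on the previous one. The article's automation is written in Perl against NetApp and VMware tools; the Python sketch below shows only the control flow, and every step function is a hypothetical placeholder that records what a real implementation would do rather than calling ESX, vCenter, RCU, DNS, or DHCP.

```python
from typing import Callable, List

# Each step receives a shared context dict. These are hypothetical stand-ins
# for the real tooling; here they only update the context and log a message.
Step = Callable[[dict], None]


def boot_esx(ctx):
    ctx["log"].append(f"boot ESX on {len(ctx['hosts'])} hosts")


def register_hosts(ctx):
    ctx["registered"] = list(ctx["hosts"])
    ctx["log"].append("register hosts in vCenter")


def prepare_network_and_datastores(ctx):
    ctx["log"].append("prepare network settings and datastores")


def clone_vms(ctx):
    ctx["vms"] = [f"{ctx['template']}-{i:04d}" for i in range(ctx["vm_count"])]
    ctx["log"].append(f"clone {ctx['vm_count']} VMs from {ctx['template']}")


def register_dns_dhcp_and_boot(ctx):
    ctx["booted"] = set(ctx["vms"])
    ctx["log"].append("register DNS/DHCP and boot VMs")


def verify(ctx):
    missing = set(ctx["vms"]) - ctx["booted"]
    if missing:
        raise RuntimeError(f"{len(missing)} VMs failed to boot")
    ctx["log"].append("verify deployment")


PIPELINE: List[Step] = [
    boot_esx, register_hosts, prepare_network_and_datastores,
    clone_vms, register_dns_dhcp_and_boot, verify,
]


def deploy(hosts, template, vm_count):
    ctx = {"hosts": hosts, "template": template, "vm_count": vm_count, "log": []}
    for step in PIPELINE:
        step(ctx)   # an exception stops the run; later steps depend on earlier ones
    return ctx
```

For example, `deploy([f"esx{n:02d}" for n in range(5)], "win2k8-golden", 500)` runs all six stages and returns the context with its step log. A real `verify` step would query the hosts rather than trust in-memory state.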

Complete Automation. Over the past several years we’ve created Perl scripts that work in conjunction with NetApp and VMware tools to automate the steps above, so that we can routinely deploy 500 to 1,000 virtual machines in 2 to 3 hours. (This time includes both the physical booting process and the VM booting process. It differs from some of the other deployments described in Tech OnTap, in which time to deployment is measured from servers already running VMware.)

Maximum Space Efficiency. The other unique piece of the process is that, because we use FlexClone to clone “golden images” rather than make copies, very little storage is required. We routinely deploy 500 virtual machines using just 500GB of storage space (1GB per client) and can use even less space if necessary.
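To put the 1GB-per-client figure in perspective, compare it with giving every VM a full copy of its boot image. The 20GB full-image size below is a hypothetical value chosen for illustration, not a figure from the article, and the golden image itself is not counted.

```python
VM_COUNT = 500
CLONE_GB_PER_VM = 1     # article: about 1GB per client with FlexClone
FULL_IMAGE_GB = 20      # hypothetical size of an uncloned boot image

cloned_gb = VM_COUNT * CLONE_GB_PER_VM
full_copy_gb = VM_COUNT * FULL_IMAGE_GB
savings = 1 - cloned_gb / full_copy_gb   # fraction of space avoided by cloning

print(f"clones: {cloned_gb} GB, full copies: {full_copy_gb} GB, saved: {savings:.0%}")
```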

With the new infrastructure, we’ll be able to configure up to 75,000 virtual machines for very large tests. Once we have all the new hardware in place, we’ll be able to report how quickly this can be done. We should note that, in general, the clients that make up the Kilo-Client are divided into multiple smaller groups, each running its own tests in parallel.
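Carving a shared client pool into smaller groups that test in parallel is essentially a reservation problem. The article does not describe its allocation policy, so the sketch below is a hypothetical allocator that hands out consecutive slices of a client list in request order and rejects requests the pool cannot satisfy; the group names are invented for the example.

```python
def carve(pool, requests):
    """Split a pool of client names into named test groups.

    pool: list of client names available for testing.
    requests: dict mapping test-group name -> number of clients wanted.
    Returns (allocation dict, list of clients left unassigned).
    Raises ValueError if the pool is too small for all requests.
    """
    needed = sum(requests.values())
    if needed > len(pool):
        raise ValueError(f"requested {needed} clients, only {len(pool)} available")
    allocation, start = {}, 0
    for group, count in requests.items():   # dicts preserve insertion order
        allocation[group] = pool[start:start + count]
        start += count
    return allocation, pool[start:]


clients = [f"vm{n:05d}" for n in range(75_000)]
groups, free = carve(clients, {"nfs-scale": 40_000, "cifs-regression": 20_000})
print({name: len(members) for name, members in groups.items()}, len(free))
```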

Physical Layout. The previous-generation Kilo-Client design was based on “pods” that colocated servers, networking, and boot storage. This approach made sense in a design in which hardware was in close proximity and manual setup and teardown might be required.

We’ve rethought and reengineered the pod approach for the new Kilo-Client. The new design concentrates all boot infrastructure in one location. Servers and storage systems will now be grouped into pods that include just the switching (IP and FC) that each pod requires. This will make the pods easy to replicate, and it will be easy to grow and scale the Kilo-Client in any dimension by adding another pod of the appropriate type. (In other words, we can add a pod of servers, a pod of storage, and so on.) Since manual setup and teardown are no longer required (or desired), new pods can and will be deployed anywhere in the data center as more space is needed, so that the data center itself operates with maximum efficiency.
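Because each pod bundles only its own switching, capacity grows by appending pods of whatever type is in short supply. The model below is a hypothetical illustration of that idea; the pod names and sizes are invented for the example and do not describe the actual Kilo-Client pods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodType:
    """A replicable unit: hardware plus just the IP/FC switching it needs."""
    name: str
    servers: int = 0
    storage_systems: int = 0


# Hypothetical pod definitions for illustration only.
SERVER_POD = PodType("server", servers=40)
STORAGE_POD = PodType("storage", storage_systems=8)


def totals(pods):
    """Total capacity across deployed pods (boot infrastructure is central, not per pod)."""
    return {
        "servers": sum(p.servers for p in pods),
        "storage_systems": sum(p.storage_systems for p in pods),
    }


deployed = [SERVER_POD, SERVER_POD, STORAGE_POD]
deployed.append(SERVER_POD)   # grow compute without touching storage
print(totals(deployed))
```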

Our Global Dynamic Laboratory

The Kilo-Client is physically located in the NetApp Global Dynamic Laboratory, an innovative new data center at the NetApp facility in Research Triangle Park, North Carolina. The Kilo-Client will be part of NetApp Engineering’s Shared Test Initiative (STI), which will provide multiple test beds and will focus heavily on automation for deployment, test execution, and results gathering. STI will help bridge these resources so that they can be shared dynamically across all of our labs.

The GDL was designed with efficiency and automation in mind. It includes 36 cold rooms, each with approximately 60 cabinets, for a total of 2,136 racks.

Critical design elements for a modern data center such as GDL include:

  • How much power you can deliver per rack, since today’s hardware consumes more power in a smaller footprint
  • How much space you need per rack to provide adequate cooling
  • How efficiently you can use power: the current benchmark for power efficiency is a power usage effectiveness (PUE) of 2.0

For GDL, power and cooling distribution is based on an average of 12 kW per rack, for a total of 720 kW per cold room. Power distribution within a single rack can reach 42 kW. Using our proprietary pressure-control technology, we are able to cool up to 42 kW in a single cabinet, or any combination of loads, as long as the total cooling load in a cold room does not exceed 720 kW.
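The two constraints in this paragraph, at most 42 kW in any one cabinet and at most 720 kW in a cold room, can be expressed as a simple admission check for a proposed load plan. The sketch below illustrates the rule only; it is not the GDL's proprietary pressure-control system, and the function name is hypothetical.

```python
RACK_LIMIT_KW = 42.0        # maximum power/cooling per cabinet
ROOM_LIMIT_KW = 720.0       # maximum total cooling load per cold room
AVERAGE_DESIGN_KW = 12.0    # design average per rack

# 60 racks at the 12 kW design average exactly fill a cold room's budget.
assert AVERAGE_DESIGN_KW * 60 == ROOM_LIMIT_KW


def check_room(rack_loads_kw):
    """Return a list of problems with a proposed cold-room load plan (empty if OK)."""
    problems = [f"rack {i}: {kw} kW exceeds {RACK_LIMIT_KW} kW"
                for i, kw in enumerate(rack_loads_kw) if kw > RACK_LIMIT_KW]
    total = sum(rack_loads_kw)
    if total > ROOM_LIMIT_KW:
        problems.append(f"room total {total} kW exceeds {ROOM_LIMIT_KW} kW")
    return problems
```

For example, ten fully loaded 42 kW cabinets plus fifty 6 kW cabinets reach exactly 720 kW and pass, while eighteen 42 kW cabinets (756 kW) exceed the room limit even though every cabinet is individually within bounds.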

GDL uses a combination of technologies to run at maximum power  efficiency, including:

  • Outside air is used for cooling whenever possible
  • Pressure-controlled cooling limits energy used by fans and pumps
  • Elevated air temperatures (70–80°F versus the typical 50–60°F) and elevated chilled-water temperatures
  • Waste heat is reclaimed for offices and other uses

These and other techniques allow the GDL to achieve an annualized PUE estimated at about 1.2. Compared with operating at a PUE of 2.0, this translates into operating savings for the GDL of over $7 million per year and a corresponding avoidance of 93,000 tons of CO2. You can learn more about the NetApp approach to data center efficiency in a recent white paper.
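PUE is total facility energy divided by the energy delivered to IT equipment, so the relative benefit of moving from 2.0 to 1.2 can be computed without knowing the GDL's load. The dollar and CO2 figures above also depend on IT load, electricity price, and grid carbon intensity, which the article does not give, so this sketch computes only the relative reductions; the function names are illustrative.

```python
def facility_energy(it_energy, pue):
    """Total facility energy for a given IT energy, since PUE = total / IT."""
    return it_energy * pue


def overhead_energy(it_energy, pue):
    """Energy spent on cooling, power distribution, and other non-IT loads."""
    return it_energy * (pue - 1)


def reductions(pue, baseline_pue=2.0, it_energy=1.0):
    """Fractional savings in total and overhead energy versus a baseline PUE."""
    total = 1 - facility_energy(it_energy, pue) / facility_energy(it_energy, baseline_pue)
    overhead = 1 - overhead_energy(it_energy, pue) / overhead_energy(it_energy, baseline_pue)
    return total, overhead


total_cut, overhead_cut = reductions(1.2)
print(f"total facility energy: -{total_cut:.0%}, non-IT overhead: -{overhead_cut:.0%}")
```

At the same IT load, a PUE of 1.2 uses 40% less total energy than a PUE of 2.0 and cuts non-IT overhead by 80%.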

Conclusion

The next-generation NetApp Kilo-Client will take full advantage of the latest server hardware, networking technology, and NetApp storage hardware and software to create a flexible, automated test bed for tests that require a large number of virtual or physical clients. When completed, the Kilo-Client will be able to deliver 75,000+ virtual clients and to use Gigabit Ethernet, 10-Gigabit Ethernet, Fibre Channel, or FCoE end to end.

While the next-generation Kilo-Client will greatly expand the capabilities of the existing version, it will ultimately reduce the physical server count.

Got opinions about the NetApp Kilo-Client?
Ask questions, exchange ideas, and share your thoughts online in NetApp Communities.


Brad Flanary
Engineering Systems Manager
NetApp

Brad joined NetApp in 2006 and currently leads a team of six engineers responsible for the NetApp Dynamic Data Center, the RTP Engineering Data Center, and NetApp’s global engineering lab networks. Prior to joining NetApp, Brad spent almost seven years at Cisco Systems as a LAN switching specialist. In total, he has over 13 years of experience in large-scale LAN and data center design.



The Kilo-Client Team
NetApp

The Engineering Support Systems team is made up of Brandon Agee, John Haas, Aaron Carter, Greg Cox, Eric Johnston and Jonathan Davis.
