At PeakColo, we specialize in providing turnkey cloud infrastructure designed specifically for value-added resellers (VARs) and managed service providers (MSPs), including such industry leaders as Sayers, BitRefinery, Data Fortress, Parsec Data Management, and Lewan.
Our premier WhiteCloud infrastructure as a service (IaaS) offering lets a VAR or MSP become a fully branded VMware vCloud® services provider in a matter of hours without the upfront capital investment that would otherwise be required.
Over the last three years, we’ve experienced 100% growth every year. What fuels this growth, and what makes our services attractive to our customers, is the tremendous performance, availability, and feature set that we deliver.
By building our infrastructure on NetApp® Data ONTAP® 8 Cluster-Mode, we’ve been able to:
- Deliver thousands of IOPS of storage performance—whether you need 1TB or 1PB
- Achieve 100% SLAs
- Deliver advanced capabilities such as deduplication, compression, thin provisioning, and replication
- Create a secure, multi-tenant environment
In this article I’ll describe how we built an agile IT infrastructure—compute, networks, and storage—that addresses our unique challenges and explain how that infrastructure delivers tremendous advantages to our customers and to us. I’ll also talk about new technologies such as Flash Pool, Infinite Volume, and parallel Network File System (pNFS) that we’re excited about deploying in the future. I think you’ll find that you can benefit from many of the same technologies we use, whether you deploy them in your own data centers or contract for them in the cloud.
For a typical IT infrastructure, planning downtime for hardware maintenance can be extremely difficult; when you have hundreds of tenants sharing the same infrastructure, it becomes impossible. In the first two iterations of our infrastructure—before NetApp and especially Data ONTAP 8 were introduced—our SANs were purely a physical construct. Even though we had multiple storage systems, a problem that required maintenance on a particular storage system inevitably affected multiple customers. That made achieving a 100% SLA impossible.
From the standpoint of provisioning, a customer might ask for a new chunk of storage (for example, 0.5PB or 1PB), and we had no way to bring that online without disrupting the customer's ongoing operations.
We were also limited in our ability to share a SAN among multiple customers. iSCSI was our protocol of choice, but we couldn’t offer it in a secure, multi-tenant fashion. We had to control access to physical hosts, which limited the types of service we could offer and also limited the value we could pass on to our reseller partners. They themselves needed the ability to support multi-tenancy for their own customers and a way to ensure compliance with regulations such as HIPAA and standards such as PCI.
Finally, it’s the nature of our business that we never have more than limited visibility into the demands of the various workloads running on our infrastructure at any given time. We really needed an infrastructure that could elastically adapt to spikes in demand from different workloads—for example, virtual desktop infrastructure (VDI) boot storms—and that would also allow us to load balance workloads to accommodate longer term trends.
Design of Our Cloud Architecture
We currently have five Type-II SSAE 16/SOC 1 data centers: four in the United States and one in the United Kingdom. Our cloud architecture is designed throughout with bandwidth and redundancy as top priorities. Key components include:
- Blended Internet connections for performance and reliability
- Carrier-class Brocade CER and VDX networking components
- Servers based on the Open Compute Platform
- NetApp FAS3200 series storage systems running Data ONTAP 8.1.1 and operating in Cluster-Mode
Figure 1) Overview of PeakColo architecture.
All our data centers use a blended Internet access approach. We load balance across the top 16 Internet carriers to increase flexibility, reduce costs, and deliver the best possible performance and reliability. We typically achieve latencies of 40 milliseconds or less in the continental United States.
We use carrier-class Brocade VDX and CER networking components in our networks:
- Brocade CER 1U routers let us distribute Internet access across three or four carriers.
- Brocade VDX switches allow us to constantly expand our networks without worrying about the limitations of older spanning tree protocols.
The needs of a cloud provider routinely exceed the limits of enterprise-class hardware. We chose Brocade because it gives us great modularity and scalability and a roadmap for the future.
The recent acquisition of Nicira by VMware underscores the important role that software-defined networking (SDN) is likely to play in the future. Brocade’s commitment to the OpenFlow protocol gives us confidence that we’ll be able to fully leverage SDN as the standard matures.
PeakColo extends the idea of SDN a step further with our patent-pending Layer 2 process that we use to cross-connect customers’ Layer 2 resources into our cloud environment. Use cases for this are hybrid cloud deployments where an enterprise might want to keep its existing firewall, AS400, legacy storage and tape, or other physical resources, but also leverage cloud components and services from PeakColo.
I discuss the network technologies we use in more detail in a recent interview.
On the server side, we use our own Open Compute Platform servers, each with dual 10 Gigabit Ethernet (10GbE) network interface cards (NICs). The NICs are active-active and carry both user and data traffic. We use cross-fabric link aggregation (LAG) to provide load balancing and eliminate single points of failure. In total, we have about 2,500 such servers in our five data centers right now.
We chose Open Compute because it lets us source servers from multiple vendors to fulfill our procurement cycles and not get locked into a single vendor. We are able to purchase servers with the same defined set of components, parts, and drivers and have them shipped preinstalled with VMware®. Open Compute platforms are supported by VMware and everybody else, so we know that when we deploy new servers, they are going to work just like the existing ones without any surprises.
For storage, PeakColo uses NetApp FAS3240 and NetApp FAS3270 systems exclusively. These are configured in clusters of four nodes using Data ONTAP 8 Cluster-Mode. We currently have two NetApp clusters deployed and will be deploying two more very soon. If you aren’t familiar with Cluster-Mode, you can read more about it in an article from last month’s issue of Tech OnTap®. There is also an article on Cluster-Mode block performance in this issue, as well as a recent article on Cluster-Mode NAS performance and scaling.
We chose Data ONTAP 8 Cluster-Mode because no other storage technology out there came close to delivering the same level of scalability, flexibility, performance, and features. NetApp marketing describes this architecture as intelligent, immortal, and infinite. While those sound like pretty bold claims, the technology delivers for us. It’s intelligent in terms of storage efficiency, and near infinite in terms of scaling. Support for nondisruptive operations allows us to meet 100% SLAs, and we think it will also let us get much more life out of each storage system we deploy. I’ll talk more about this later.
Each of our customers maps to a separate Vserver on a storage cluster; this is the key to our multi-tenant environment and enables many of our key capabilities. A Vserver is a secure, virtualized storage container that includes its own administration, security, IP addresses, and namespace. A Vserver can include volumes on multiple nodes in the cluster and is not tied to any particular node. We can move Vservers as necessary to do maintenance or rebalance load without disrupting workloads running on those Vservers.
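To make the Vserver idea concrete, here is a minimal Python sketch of a tenant container whose volumes can live on several cluster nodes. It is only an illustrative model: the class names, fields, tenant name, and addresses are hypothetical, and it does not use or represent any NetApp API. It also models a move at the volume level, which is an assumption made for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    node: str  # cluster node currently hosting this volume


@dataclass
class Vserver:
    """Toy model of a tenant container with its own admin, addresses, volumes."""
    tenant: str
    admin_user: str
    ip_addresses: list[str]
    volumes: list[Volume] = field(default_factory=list)

    def nodes_in_use(self) -> set[str]:
        # A Vserver is not tied to one node; its volumes may span several.
        return {v.node for v in self.volumes}

    def move_volume(self, volume_name: str, target_node: str) -> None:
        # Relocation changes where the data lives, not the tenant's view:
        # same Vserver, same addresses, same volume name.
        for v in self.volumes:
            if v.name == volume_name:
                v.node = target_node
                return
        raise KeyError(volume_name)


vs = Vserver(
    tenant="example-msp",          # hypothetical tenant
    admin_user="msp-admin",        # hypothetical delegated administrator
    ip_addresses=["192.0.2.10"],   # address from the documentation range
    volumes=[Volume("vol_vdi", "node1"), Volume("vol_db", "node3")],
)
before = vs.nodes_in_use()
vs.move_volume("vol_vdi", "node2")
after = vs.nodes_in_use()
print(sorted(before), "->", sorted(after))
```

The point of the model is that the tenant-facing identity (name, administrator, IP addresses) stays fixed while the physical placement underneath it changes.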
Figure 2) Data ONTAP 8 Cluster-Mode uses Vservers to provide multi-tenancy and enable nondisruptive operations (NDO).
Each of our clusters has a mix of SSD, SAS, and SATA disks, and all nodes have Flash Cache. Our customers contract for the amounts and types of storage they need from each tier. We use six 10GbE connections per storage system (including redundant cluster interconnects) to deliver the necessary connectivity and throughput.
Figure 3) NetApp connectivity in the PeakColo architecture uses six 10GbE connections per storage system.
As part of its vCloud initiative, VMware established the VMware Service Provider Program (VSPP) as a framework to allow service providers like us to consume and offer VMware virtualization solutions in a way that aligns with our business models. We are a premier-level VSPP partner.
Our WhiteCloud service can deliver a fully branded and dedicated solution based on vCloud Director to our customers. We can also provide other virtualization platforms such as Hyper-V™ and Citrix XenServer or a mix of physical and virtual servers. We are able to do this because all servers (physical and virtual) connect back to a Vserver and a dedicated VLAN on a NetApp cluster, providing the necessary multi-tenancy support. Because of the performance we deliver, we have many customers that use us to support VDI solutions such as XenDesktop.
Operational Advantages for PeakColo
Using Data ONTAP 8 Cluster-Mode as the foundation of our architecture helped us address our infrastructure challenges and gives us significant operational advantages.
The ability to perform maintenance on our NetApp clusters without disrupting users is essential to delivering on our 100% SLA. Maintenance activities such as firmware and software upgrades and hardware upgrades and replacements are performed by first moving active Vservers off a cluster node so that customers are not disrupted. When all nodes require upgrading, this is done in a round-robin fashion. For storage provisioning, we can bring new storage online without disruption and transparently migrate a customer’s data to that new storage.
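The round-robin maintenance pattern can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not our actual tooling: the function name, the "least-loaded node" placement rule, and the tenant and node names are all hypothetical.

```python
def rolling_maintenance(placement, nodes):
    """Upgrade every node in turn, evacuating its workloads first.

    placement: dict mapping workload name -> node name (mutated in place).
    nodes: list of node names in the cluster.
    Returns a list of (node, workloads_moved) steps in the order performed.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to evacuate one")
    steps = []
    for node in nodes:
        others = [n for n in nodes if n != node]
        moved = []
        for workload in sorted(w for w, n in placement.items() if n == node):
            # Count what each remaining node hosts, then pick the
            # least-loaded one (ties broken by node name).
            load = {n: 0 for n in others}
            for n in placement.values():
                if n in load:
                    load[n] += 1
            target = min(others, key=lambda n: (load[n], n))
            placement[workload] = target
            moved.append(workload)
        # The node now hosts nothing, so it can be upgraded without any
        # tenant depending on it.
        assert node not in placement.values()
        steps.append((node, moved))
    return steps


cluster_nodes = ["node1", "node2", "node3", "node4"]
placement = {"tenant_a": "node1", "tenant_b": "node1",
             "tenant_c": "node2", "tenant_d": "node4"}
steps = rolling_maintenance(placement, cluster_nodes)
for node, moved in steps:
    print(f"upgraded {node} after moving {moved}")
```

Every node is visited exactly once, and at the moment a node is upgraded no workload is placed on it, which is the property that lets maintenance proceed without a customer-visible outage.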
The same ability to move active Vservers also provides a convenient way of doing load balancing. OnCommand® System Manager makes it easy for our administrators to see what’s happening across all Vservers in order to make load-balancing decisions.
Multi-Tenancy, Feature Pass-Through, and Delegated Management
Because multi-tenancy is built into Cluster-Mode through the ability to create Vservers, we are able to better share infrastructure between our customers for greater infrastructure efficiency without sacrificing customer isolation. In addition, we can delegate the management of a Vserver to a customer (if they want) and pass control of the full NetApp feature set—including deduplication, compression, thin provisioning, backup, replication, and more—through to them.
Since our customers are service providers themselves, this is important. We can deliver true IaaS to them—giving them full management control over their infrastructure instead of just managed IaaS where we retain control over most of the bells and whistles.
Many of our customers are NetApp VARs themselves, so they already know how to manage NetApp storage and understand the value of the NetApp feature set. For new customers, we provide some pretty intensive training to make sure they understand how and when to take advantage of NetApp features. Deduplication is enabled on all NetApp volumes (except for a few volumes containing geospatial data, where compression was a bigger space saver), for a total space savings of about 70%. Those space savings translate to significant cost savings for PeakColo and our customers.
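As a quick worked example of what a 70% space savings means, the sketch below converts logical capacity into the physical capacity actually consumed. The function name and the sample capacities are illustrative assumptions; only the 70% figure comes from the text above.

```python
def physical_tb(logical_tb: float, savings: float) -> float:
    """Physical capacity consumed after a fractional space savings."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return logical_tb * (1.0 - savings)


# Sample logical capacities in TB (hypothetical), at the ~70% savings cited.
for logical in (10, 100, 500):
    used = physical_tb(logical, 0.70)
    print(f"{logical} TB logical -> {used:.1f} TB physical")
```

In other words, at roughly 70% savings a tenant storing 10TB of logical data consumes about 3TB of physical disk.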
Preserving Existing Investments
Most scale-out storage uses specialized building blocks. Another great thing about Cluster-Mode is that it uses the same building blocks as Data ONTAP 7 and Data ONTAP 8, 7-Mode. We already had a number of systems running 7-Mode, which could be repurposed and used in our Cluster-Mode clusters. We accomplished this by migrating the data from the 7-Mode systems to an existing cluster using tools such as VMware Storage vMotion and then joining the hardware to the cluster. This means that if you’re not ready for NetApp clustering today, you can start with NetApp 7-Mode and convert to Cluster-Mode when you need it.
Keeping Storage Longer
As a service provider, we want to get the longest possible life and the maximum utilization from our infrastructure investments. Customer performance requirements, however, traditionally drive a fairly rapid upgrade cycle in which you have to replace storage systems every two to three years.
Cluster-Mode will allow us to hold onto storage hardware longer. NetApp clusters don’t have to be built from identical building blocks; cluster nodes can be heterogeneous. This means we’ll be able to add the latest generation of storage nodes to our existing clusters as we need them. Then we can migrate Vservers that need the highest performance to the new nodes, while retaining the older hardware in the cluster as another tier of storage that we can offer our customers. We anticipate holding onto storage systems five to seven years.
Advantages for PeakColo Customers
Using our cloud infrastructure, PeakColo can create a fully branded vCloud Director solution with a virtual SAN and 10TB to 500TB of storage in just four to eight hours. We think that gives a customer a lot of market value in a very short time. Some of our VAR and MSP customers contract for a single Vserver, which they share among their own customers. Others that have customers that need, for instance, HIPAA or PCI compliance might have a Vserver for each individual customer. Service providers that want to offer disaster recovery services contract for Vservers in multiple PeakColo data centers.
Possibly the biggest advantage that we deliver to our customers is the full value of NetApp cluster performance combined with the NetApp feature set. Even for a customer with a modest requirement (say, just 10TB of storage), we are able to deliver dramatic I/O performance as well as deduplication and other storage efficiency technologies that actually reduce the amount of storage for which they have to pay. They also get the full advantages of NetApp Snapshot™ and all the other NetApp data management and data protection features. It’s as if they purchased a multinode NetApp cluster of their own.
Our ability to deliver a high level of I/O performance is a key differentiator for us. We’ve never lost a prospective customer after they do testing or a proof of concept with us; the clear performance advantage we provide just blows people away.
Data ONTAP 8.1.1 includes several new technologies that we’re excited about and currently investigating for future deployment.
Infinite Volume and pNFS
NetApp Infinite Volume technology provides a compound volume in which data is distributed across multiple constituent volumes spread across all the nodes of the cluster. This dramatically increases the throughput that can be delivered from a single volume. We think when you combine the capabilities of Cluster-Mode with the pNFS capability of NFS version 4.1, it has the potential to be a market changer, allowing us to drastically increase speed while dramatically increasing usability in comparison to specialized, parallelized file systems such as GPFS and Lustre. We’ll be able to pass that value down to our existing VARs and MSPs that serve science, engineering, and other big data markets such as Hadoop. Advantages include:
- Simplified infrastructure. The total infrastructure for pNFS is simpler in comparison to those of other parallel file systems, which require many dedicated servers in addition to storage.
- Manageability. Typically, pNFS includes multiple file servers that have to be managed separately. Cluster-Mode will let us manage all components as a single entity.
- Nondisruptive operations. A pNFS installation on a NetApp cluster will be able to benefit from nondisruptive operations for maintenance and load balancing just like any other workload.
We’re already using pNFS with one customer and are investigating ways we can combine it with Infinite Volume in the future to create new high-performance services.
As a service provider, we are able to exercise almost no control over the workloads that run on our infrastructure. Problems such as misaligned volumes or VMs, boot and login storms, and similar events can occur without warning, so any technology that allows our infrastructure to better adapt to these kinds of unexpected events is welcome.
We see NetApp Flash Pool technology as a potentially important tool to help us better accommodate these unexpected events and also as another value-add that will give us the ability to create and offer new tiers of storage. Flash Pool is part of the NetApp virtual storage tiering (VST) technology, which adapts automatically to keep hot datasets on high-performance storage. It allows you to create aggregates of NetApp disks that combine traditional disk drives with SSDs. Random write and read data is automatically cached on SSDs to accelerate performance.
We’re currently testing the performance of Flash Pools that combine high-capacity SATA disks with SSDs, and we hope to offer them as a new tier of storage to our customers in the future.
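The general idea of keeping hot data on a fast tier can be illustrated with a toy cache. The sketch below uses a plain least-recently-used (LRU) policy as a stand-in; it is not NetApp's Flash Pool algorithm, and the class name, capacity, and block numbers are hypothetical.

```python
from collections import OrderedDict


class ToyHotBlockCache:
    """LRU cache standing in for the SSD tier; misses go to 'disk'."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int) -> None:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return
        self.misses += 1                       # served from the disk tier
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used


cache = ToyHotBlockCache(capacity_blocks=4)
# Simulated boot storm: many desktops reading the same few OS blocks.
for _ in range(50):
    for block in (1, 2, 3):
        cache.read(block)
hit_rate = cache.hits / (cache.hits + cache.misses)
print(f"hits={cache.hits} misses={cache.misses} hit_rate={hit_rate:.2f}")
```

Because a boot storm rereads a small working set, only the first read of each block goes to disk and nearly all later reads are served from the fast tier, which is why caching of this kind helps with sudden, unplanned spikes.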
For PeakColo, continued growth means continuing to deliver the performance, scalability, and capabilities that our VAR and MSP customers need. We’re confident that we’ve chosen the best technology partners to sustain our growth and accommodate important industry trends such as the adoption of flash-based storage, software-defined networking, and big data. NetApp Data ONTAP 8 Cluster-Mode is delivering the capabilities we need to succeed now and to meet our future challenges. The flexibility of Cluster-Mode is invaluable to us as a cloud service provider, allowing us to be much more agile than our competitors.