Five Tests for Cloud Ready Infrastructure

Alex Jauch -- Architect, Microsoft Private Cloud

As an architect of our private cloud solution, I spend quite a bit of time working on making the components of that solution “Cloud Ready.”  What’s funny about this work is that there really isn’t a definition of that term.  What makes infrastructure “Cloud Ready” anyway?  What types of infrastructure are better or worse for private cloud?

    

One way to think about this is to go back to core principles.  What is our definition of cloud?  As we have discussed before in this blog, we use the NIST model of cloud computing for our cloud definition framework.  Its five essential characteristics are on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service.  This means that our infrastructure must have features that support those five characteristics.  Taking a look one layer deeper into actual implementations, the infrastructure that supports these characteristics has to have all the normal infrastructure characteristics.  At Microsoft we used to call these the “abilities.”  Supportability, Scalability, Stability, etc.

However, there are implications to aspects like self-service, rapid elasticity and measured service that require additional feature sets that not all infrastructure has.  For example, you cannot perform self-service without some degree of automation.  There is an assumption in the architectural model that the infrastructure provisioning process can to some extent be automated.  In addition, the rapid elasticity and resource pooling characteristics assume an ability to operate at scale.  That is to say, you are not going to run a cloud on one server.  You’re going to have dozens or hundreds.  This also implies remote operations.  Anything that requires you to log in to a local console is going to be just too cumbersome.

  

So, this implies that there are some critical features that you need to build a cloud.  Here are five tests to see how well your infrastructure supports these key characteristics:

  1. Automation.  One of the foundations of cloud is self-service.  This implies automation.  Any infrastructure you deploy in support of your cloud efforts will benefit from a high degree of automation.
  2. Measured Service.  Because cloud solutions must be carefully managed to ensure you don’t over-subscribe, you need to be sure that your infrastructure components support a robust performance management interface.  Ideally, all the components of your solution use the same management infrastructure so you can establish a single view of your cloud.
  3. Rapid Provisioning.  Because clouds need to appear infinite, you really need to be able to perform provisioning tasks quickly.  Some of this is achieved via the automation bullet above, but in some cases expensive operations like copying a large file can slow things down enough to make the solution a poor candidate for cloud.
  4. Scale.  Because of the complexity of cloud-style architectures, there is a lower sizing boundary below which you really don’t want to go.  Below that boundary, the cost per VM just gets too high and you can’t really justify all the complex systems needed to support things like self-service provisioning.  That break-even point varies depending on your business requirements, but it’s safe to say that a solution that supports fewer than 100 VMs is going to have a tough time producing a positive ROI.
  5. Availability.  I only put this one last because it’s really nothing new.  Yes, this is hugely important, but in most cases it’s already a consideration for your infrastructure.  Most data centers already operate on a “no single point of failure” rule, but I will repeat it here because it is so vital.
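
The scale test is at heart an amortization argument: the fixed cost of the cloud management layer gets spread across however many VMs you run.  Here is a minimal Python sketch of that arithmetic.  The function name and every dollar figure are hypothetical, chosen only to show the shape of the curve; only the idea of a roughly 100 VM lower bound comes from the list above.

```python
def cost_per_vm(fixed_platform_cost: float, per_vm_cost: float, vm_count: int) -> float:
    """Amortize the fixed platform cost over the VMs, then add the per-VM cost."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return fixed_platform_cost / vm_count + per_vm_cost


# Hypothetical inputs: these are illustrative numbers, not real pricing.
FIXED_PLATFORM_COST = 200_000.0  # self-service portal, automation, monitoring, etc.
PER_VM_COST = 500.0              # incremental compute and storage per VM

if __name__ == "__main__":
    for vm_count in (50, 100, 1000):
        cost = cost_per_vm(FIXED_PLATFORM_COST, PER_VM_COST, vm_count)
        print(f"{vm_count:>5} VMs -> {cost:,.2f} per VM")
```

With these made-up inputs the per-VM cost drops from 4,500 at 50 VMs to 700 at 1,000 VMs.  Where your own break-even sits depends entirely on the real numbers you plug in.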

 

As an example, let’s take this set of rules and apply them to our own Private Cloud offering.  How well do we eat our own dog food?

  1. Automation.  Our solution has two automation options.  In our case, 100% of core server and storage operations can be performed via PowerShell.  This is the automation tool most commonly used by Windows shops, so we focus a great deal of our solution development on PowerShell.  However, many of our customers are also moving to System Center Orchestrator, which is Microsoft’s orchestration platform.  For this reason, we also support a native Orchestrator Integration Pack (OIP) for System Center Orchestrator, which allows common operations like mounting a LUN or creating a sub-LUN clone to be done without writing code.
  2. Measured Service.  As part of our solution, we include OnCommand Plug-In for Microsoft (OCPM) 3.0.  One of the core components we get from that product is tight integration with Microsoft’s System Center Operations Manager (SCOM).  This tight SCOM integration allows us to present a “single pane of glass” view to administrators that includes information about Windows Servers, NetApp storage controllers and our Cisco UCS blade centers.  This complete view gives admins a much richer picture of their total service offering than separate management tools.
  3. Rapid Provisioning.  Our solution includes several examples of this.  For VM provisioning, we use sub-LUN cloning to allow administrators to create a new VM clone very quickly in support of a user request.  We also support FlexClone-based boot LUNs for the UCS chassis, which means we can provision a new blade into the environment very quickly and via a fully scripted process.  These features, along with the rest of the solution offering, allow us to support very fast provisioning SLAs that aren’t possible with traditional methods such as WDS or other PXE-based on-demand deployment solutions.
  4. Scale.  Our default configuration targets about 1000 VMs.  We have a minimum size of four blades and a maximum size of sixteen blades in our core infrastructure.  This, along with the storage, gives us enough capacity to support the requirements of most mid-sized companies and is of sufficient scale to provide a full cloud experience.
  5. Availability.  As per NetApp best practice, our solution has no single point of failure.  The storage has fully redundant controllers, redundant storage network paths, redundant network switches, redundant blade chassis, etc.  For NetApp, this is a normal requirement for all solutions, but it applies to private cloud very nicely as well.

As you can see, we are very focused on ensuring that our solutions are a full, complete platform for private cloud.  We will continue to improve and there is some great stuff in store for us this year in this space, but we feel that we’re proceeding from a very strong technical foundation that is enabling our future offerings and technologies.  I would encourage you to examine your private cloud plans and measure your planned or existing infrastructure against this set of requirements.
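
If you want to run that comparison in a repeatable way, the five tests can be written down as a simple checklist.  The Python sketch below is only one way to do it: the `Infrastructure` fields, the `five_tests` function and all of the thresholds are assumptions made for illustration (only the 100 VM lower bound and the "no single point of failure" rule come from the tests above), so adjust them to your own requirements.

```python
from dataclasses import dataclass


@dataclass
class Infrastructure:
    """Hypothetical description of a candidate platform."""
    scriptable_operations_pct: float  # share of core operations that can be automated
    unified_monitoring: bool          # one management view across all components
    provision_minutes: float          # typical time to deliver a new VM
    vm_capacity: int                  # VMs the solution is sized to support
    single_points_of_failure: int     # known non-redundant components


def five_tests(
    infra: Infrastructure,
    min_scriptable_pct: float = 90.0,     # assumed threshold
    max_provision_minutes: float = 15.0,  # assumed threshold
    min_vm_capacity: int = 100,           # lower bound suggested by the scale test
) -> dict[str, bool]:
    """Return a pass/fail result for each of the five tests."""
    return {
        "automation": infra.scriptable_operations_pct >= min_scriptable_pct,
        "measured_service": infra.unified_monitoring,
        "rapid_provisioning": infra.provision_minutes <= max_provision_minutes,
        "scale": infra.vm_capacity >= min_vm_capacity,
        "availability": infra.single_points_of_failure == 0,
    }


if __name__ == "__main__":
    # A made-up candidate, not a description of any real product.
    candidate = Infrastructure(
        scriptable_operations_pct=60.0,
        unified_monitoring=True,
        provision_minutes=45.0,
        vm_capacity=250,
        single_points_of_failure=0,
    )
    for test_name, passed in five_tests(candidate).items():
        print(f"{test_name:<20}{'pass' if passed else 'FAIL'}")
```

A pass/fail table like this is deliberately crude, but it makes the gaps visible; in this made-up example the candidate falls short on automation and rapid provisioning.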