Self-service Configuration for Auto-scaling Cloud Applications

Before the cloud, deploying an application into production meant downloading the pre-requisite software, application code, installing pre-requisite software, installing application code, configuring it from CLI/UI, switch configuration files to use production setup (and not test setup), tweak them to access production databases/servers etc etc. If your operations guy was organized, he would have a lengthy check-list of things to do. And god bless you, if your application requires Microsoft Sharepoint, SQL Server, BizTalk etc. They had several pages of check-lists and instructions on how to set them up. Your luck falls out very quickly if your environment had two or more o these servers to be installed – with all the dependencies, service packs and what not.

But if you are one of those forward looking people and burnt your fingers earlier, you would have probably automated several post-installation configuration procedures. Cloud brings this culture of automation to masses. If you are directly using IaaS service, pre-configured VMs takes care of many of the OS and application stack installation issues (If you are running your application on top of a PaaS (Platform as a Service) service, one donot even need to worry about VMs, as the PaaS platform takes care of them. But PaaS is different issue to talk about – probably for another post).  Now, you only worry about automating the provision your application instance. And this is the crux of this post.

If you really want to take advantage of load-balancing and auto-scaling in cloud, human-driven automation of application provisioning is not good enough. And here are some reasons as to why:

  • You want to dynamically add more instances (with some constraints and bounds) as the load increases, but the last thing you want is to wake up your operations guy in the middle of the night and ask him to run the automated script on the new instances before it can go live.
  • You have deployed HA using Active-Active  or Active-Passive setup and you need the instances to come back up online automatically right after the failure and switch-over.
  • Sometimes instances go down due to bugs or memory leaks. You need new instances to be brought up to continue to handle the traffic as if nothing happened.
  • You want to make sure that your system is ready to deal with any unknown failures – as part of this, your QA/Test infrastructure needs to bring-down various application instances randomly and see if the system recovers.

To achieve the above, you need to design  your application instances to obtain the provisioning/configuration information dynamically once it has come up.  There are couple of ways to do this, in the listed priority order:

1. Get it from known location –  In this design approach, an application instance reaches out to the central configuration repository to pull-in the necessary configuration.  This central configuration repository could be your own server serving the configuration or it could be built on top of other highly available cloud services such as Amazon’s SimpleDB.

Discovering the central repository itself could happen via a limited broadcast message (within a sub-net) or by embedding the repository server identification information (as a DNS name for e.g.,) in the application instance image itself.  Most high-traffic sites in the cloud are designed this way.

2. Baked cookies – This is the easiest approach to begin with.  Use one of the VM cloning methods provided by your cloud service provider to create a golden image of freshly configured instance. And use this golden image to spin-up new instances. The only down side with this approach is that with every new patch or new version of your application, you need to re-create the golden images.

Many well-known and high-traffic sites like Zynga and Netflix use similar techniques. Back in apigee days working with Netflix as our customer, I remember they using a mix of both the approaches for different instance types (second approach for soft appliance and the first approach for the application server instances).

Once you have figured out how you can bring up your instances without requiring manual/human-driven automated scripts for provisioning, solving the above use cases is trivial matter of  working with your cloud vendor’s load balancer ( for e.g., Amazon’s ELB) or cloud management system (such as RightScale) to configure for auto-scaling and high availability.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s