AWS recently announced that Amazon ECS now supports a DRAINING state for container instances, which can be used to drain an instance in preparation for maintenance or cluster scale down. According to AWS, the draining state prevents new tasks from being started on the container instance and notifies the service scheduler to move tasks that are running on the instance to other instances in the cluster. This is great news, and we expect it to save a lot of time and scripting when it comes to updating container instances or removing them from a cluster.
Prior to this announcement, we had to take a number of manual steps to remove a container instance that was running tasks from a cluster without downtime. As fervent adopters of AWS automation wherever possible, we were glad to see this blog post by Chris Barclay, which outlines how to automate container instance draining with Auto Scaling groups and AWS Lambda using the new ECS feature. With automation as a best practice in mind, let’s walk through a couple of use cases where automated container instance draining in Amazon ECS would have been helpful.
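To make the new state concrete, here is a minimal sketch of putting a single container instance into draining with the ECS API. It assumes `ecs` is a boto3 ECS client (for example, `boto3.client("ecs")`) passed in by the caller; the function name, cluster name, and ARN are hypothetical and only for illustration.

```python
def drain_container_instance(ecs, cluster, container_instance_arn):
    """Ask ECS to put one container instance into the DRAINING state.

    `ecs` is assumed to be a boto3 ECS client supplied by the caller.
    Returns the status ECS reports for the instance, or None if the
    response contains no updated instance (for example, on a failure).
    """
    response = ecs.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    updated = response.get("containerInstances", [])
    return updated[0]["status"] if updated else None
```

Setting the status back to "ACTIVE" with the same call returns the instance to normal scheduling.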
Use Case: Scale Down Cluster Size
We recently worked with a large specialty retailer who has an AWS microservices architecture. The company has two layers of load balancers that front the microservices application. However, handling in-memory sessions when scaling down the number of containers in service was problematic as the two layers of load balancers needed to support session draining. The process for doing so was manual and required lengthy wait times.
To ensure that our load balancers stopped sending new requests to instances that were being scaled down while keeping existing connections open, we needed connection draining, which lets a load balancer complete in-flight requests without accepting new ones. Our solution for this organization was to automate the process through scripting and a combination of AWS ELB and third-party tools.
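For the ELB side, connection draining is a load balancer attribute. The sketch below is not the retailer's actual script; it only shows how the setting could be enabled on a Classic Load Balancer, assuming `elb` is a boto3 Classic ELB client (`boto3.client("elb")`). The function name and the 300-second default are our own illustrative choices.

```python
def enable_connection_draining(elb, load_balancer_name, timeout_seconds=300):
    """Turn on connection draining for a Classic Load Balancer.

    `elb` is assumed to be a boto3 Classic ELB client supplied by the caller.
    The timeout is how long the load balancer keeps in-flight requests open
    on a deregistering instance; Classic ELB accepts 1 to 3600 seconds.
    """
    if not 1 <= timeout_seconds <= 3600:
        raise ValueError("timeout_seconds must be between 1 and 3600")
    attributes = {
        "ConnectionDraining": {"Enabled": True, "Timeout": timeout_seconds}
    }
    elb.modify_load_balancer_attributes(
        LoadBalancerName=load_balancer_name,
        LoadBalancerAttributes=attributes,
    )
    return attributes
```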
However, now with an ECS state for container instance draining, combined with Lambda and Auto Scaling groups, we can fully automate the process. When an instance is about to be terminated, the Auto Scaling group invokes a lifecycle hook, which in turn invokes a Lambda function. The function sets the ECS container instance state to DRAINING and then checks whether any tasks are left on the container instance. If tasks are still running, it posts a message to SNS so that the Lambda function is called again. The function repeats this check until no tasks are running on the container instance or the heartbeat timeout on the lifecycle hook is reached. Afterward, control is returned to the Auto Scaling lifecycle hook, and the instance terminates.
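The flow above can be sketched as a single function that performs one pass of the loop. This is our own simplified illustration, not the code from Chris Barclay's post. It assumes `ecs`, `sns`, and `autoscaling` are boto3 clients passed in by the Lambda handler, that `message` is the already-parsed lifecycle hook notification (with `EC2InstanceId`, `LifecycleHookName`, and `AutoScalingGroupName` fields), and that the cluster has few enough instances that pagination can be ignored. The function names and the "waiting"/"completed" return values are hypothetical.

```python
import json


def find_container_instance(ecs, cluster, ec2_instance_id):
    """Return the ECS container instance record for an EC2 instance, or None."""
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if not arns:
        return None
    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )
    for instance in described["containerInstances"]:
        if instance["ec2InstanceId"] == ec2_instance_id:
            return instance
    return None


def handle_drain_step(ecs, sns, autoscaling, message, cluster, topic_arn):
    """Run one pass of the drain loop and report what happened."""
    instance = find_container_instance(ecs, cluster, message["EC2InstanceId"])
    if instance is not None:
        arn = instance["containerInstanceArn"]
        # Only request DRAINING once; later passes just re-check the tasks.
        if instance["status"] != "DRAINING":
            ecs.update_container_instances_state(
                cluster=cluster, containerInstances=[arn], status="DRAINING"
            )
        tasks = ecs.list_tasks(cluster=cluster, containerInstance=arn)["taskArns"]
        if tasks:
            # Tasks remain: re-publish the message so the function runs again.
            sns.publish(TopicArn=topic_arn, Message=json.dumps(message))
            return "waiting"
    # Nothing left to drain: hand control back to the lifecycle hook.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=message["LifecycleHookName"],
        AutoScalingGroupName=message["AutoScalingGroupName"],
        LifecycleActionResult="CONTINUE",
        InstanceId=message["EC2InstanceId"],
    )
    return "completed"
```

The heartbeat timeout is not handled in code here; Auto Scaling enforces it on the lifecycle hook, so the instance still terminates if draining takes too long. A production version would likely also add a short delay between passes to avoid a tight republish loop.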
Not only would this level of automation have saved this retailer time and effort in scaling down its cluster size, but automation serves to decrease risk as it helps remove inadvertent human error and bakes security processes in from the outset.
Use Case: New Amazon Machine Images Rollout or other System Update
Whether to introduce a new Amazon Machine Image, change instance types, or make some other system update, organizations will often want to use container instance draining to simplify these operational activities. The key for many of our customers in these cases is that they want to make these changes without interruption to their services and, ultimately, to their customers.
For example, we work with a large manufacturer who built a new AMI with Hortonworks Data Platform (HDP) services that connected back to Mongo at the company. (For additional background on this company’s AWS story, please read the case study here.) The AWS and HDP teams needed to work together to decide how to handle the service draining in order to update services with the new AMI.
Rather than a lengthy team exercise with a great deal of scripting in order to automate the process, this manufacturer would have benefited greatly from this new feature. Additionally, as Chris Barclay points out in his blog, AWS CloudFormation and CodePipeline could also be used to automate this process.
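As a rough idea of what draining before an AMI rollout could look like in a script, the sketch below drains old container instances one at a time and waits for each to empty before moving on. It is our own illustration, not this manufacturer's process. It assumes `ecs` is a boto3 ECS client passed in by the caller; the function names, polling interval, and poll limit are hypothetical, and replacing the drained instances with ones built from the new AMI is left out.

```python
import time


def wait_until_drained(ecs, cluster, container_instance_arn,
                       poll_seconds=15, max_polls=40, sleep=time.sleep):
    """Poll ECS until the instance has no running or pending tasks.

    Returns True once the instance is empty, False if max_polls is reached.
    """
    for _ in range(max_polls):
        described = ecs.describe_container_instances(
            cluster=cluster, containerInstances=[container_instance_arn]
        )
        instance = described["containerInstances"][0]
        if instance["runningTasksCount"] == 0 and instance["pendingTasksCount"] == 0:
            return True
        sleep(poll_seconds)
    return False


def drain_for_replacement(ecs, cluster, container_instance_arns, **wait_options):
    """Drain instances one by one; stop at the first that fails to empty.

    Returns the ARNs that drained fully and are safe to replace.
    """
    drained = []
    for arn in container_instance_arns:
        ecs.update_container_instances_state(
            cluster=cluster, containerInstances=[arn], status="DRAINING"
        )
        if not wait_until_drained(ecs, cluster, arn, **wait_options):
            break
        drained.append(arn)
    return drained
```

Draining one instance at a time keeps the rest of the cluster available to receive the relocated tasks, which is what allows the update to proceed without interrupting services.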
If you are looking to explore new ways to grow your automation, check out our resource page on AWS automation. And for additional tips, AWS best practices, and commentary delivered regularly to your inbox, click the button below to sign up for our blog.