Jenkins, Elixir, and ECS: CICD and Living on the Operational Edge

Richard Duarte
Tripping Engineering
6 min read · Jul 19, 2018


If you Google for the terms “Jenkins,” “Elixir,” and “ECS,” you’ll be met with disappointment, and a few posts from my colleague, Khaja Minhajuddin, who is also at Tripping.com and dealing with this apparently uncommon tech stack.

I really like Jenkins. He's an old, reliable friend: ugly, with some fatal flaws, but he gets the job done. I've deployed Jenkins for my automation needs at several of my past gigs, and Tripping is no different. It offers enough plugins to make common tasks largely plug-n-play, as well as the flexibility to do things that are well out of reach of the existing plugins in the Jenkins ecosystem. Elixir and ECS are high on the list of things that fall outside that ecosystem.

We’re sticking with AWS Elastic Container Service (ECS) currently because we’re a small team, and it lets me deploy our microservices relatively easily. It makes my life simpler as far as operational management goes, even if the ECS service can be a bit opaque to manage.

There aren't many plugins for ECS with Jenkins, and they're mostly centered around running Jenkins build agents in ECS. There are currently zero Jenkins plugins that mention Elixir. Deployments into ECS aren't really covered by anything that already exists with the flexibility that I need, and there aren't really any resources out there that cover our specific Jenkins/Elixir/ECS deployment stack, so let's dive into what I noodled together.

My Requirements

  • Our Elixir app running in Docker on ECS
  • Push-button building/testing/deployments in Jenkins into ECS
  • Push-button rollbacks in case of failure
  • Zero downtime deployments
  • Our Elixir applications run using Erlang releases (enables us to remote shell into our app if needed for debugging/development)

Testing

  • Any time something is pushed to a Pull Request or to the Master branch, the Testing Jenkins job triggers.
  • If we’ve specifically pushed something to the master branch, we trigger the job to build our Docker image.
  • We use docker-compose to run Unit/Integration tests in isolation on our Jenkins nodes.
  • Using docker-compose gives us the ability to run all the services together and run multiple jobs concurrently
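
One way to get that isolation is to give every build its own Compose project name, so containers and networks from concurrent jobs never collide. The sketch below is illustrative rather than our actual job config: it assumes Jenkins' built-in BUILD_TAG variable, a hypothetical Compose service named test that runs the test suite, and a made-up COMPOSE override for swapping the binary.

```shell
#!/bin/sh
# Sketch only: the "test" service name and the COMPOSE override are
# hypothetical placeholders, not part of our real setup.

# Turn a Jenkins BUILD_TAG (e.g. "jenkins-api-42") into a unique,
# safe project name made of lowercase letters and digits only.
compose_project_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9'
}

# Run the test service in its own project, then always tear it down.
# The function body is a subshell, so its variables stay local.
run_isolated_tests() (
  compose=${COMPOSE:-docker-compose}
  project=$(compose_project_name "$1")
  status=0
  $compose -p "$project" run --rm test || status=$?
  # Remove this build's containers, networks, and volumes.
  $compose -p "$project" down -v
  exit "$status"
)
```

In a Jenkins shell step this would be called as run_isolated_tests "$BUILD_TAG", and the job fails whenever the test container exits non-zero.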

Building the Docker image

  • Our docker images are built by Jenkins automagically
  • First, we inject some environment information for staging/testing vs production and run docker build and push to ECR (AWS Elastic Container Registry)
  • To ensure our production environment has less of a chance to become tainted by a bad release, I ensure we keep two separate ECR repos — one for production ONLY, and the other for whatever else
  • This Docker image contains our compiled Elixir app as an Erlang release (We use distillery to build our releases). This way our app can get started right off the bat without the need to compile when it is started.
  • To support my rollback scenario — I always want to know exactly which version of the code is being pushed up to ECR. To manage that, when I run docker build I’ve automated adding tags
  • Our build step looks something like:
docker build --tag $ECR_URL:$GIT_REFERENCE --tag $ECR_URL:latest .
  • Where $GIT_REFERENCE is the output of git rev-parse --short HEAD in the repo where the Docker image is being built, e.g. $GIT_REFERENCE = a905828
  • This lets me easily auto-deploy our latest container, and also go back through our commit history and deploy any version of our code, since each tag maps directly to a commit in GitHub
  • This job triggers any time something is pushed to the master branch of our Elixir GitHub repo, so we always have a version built and ready to go.
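
Put together, the tagging and pushing step might look like the sketch below. It assumes the Jenkins node has already logged in to ECR and that the Dockerfile sits at the root of the checkout; build_and_push is a hypothetical helper name, not our actual job script.

```shell
#!/bin/sh
# Sketch only: assumes `docker login` against ECR has already happened.

# Build once, tag twice (short commit SHA and latest), push both tags,
# and print the SHA so later steps can record what was built.
build_and_push() (
  ecr_url=$1
  git_ref=$(git rev-parse --short HEAD) || exit 1

  docker build --tag "$ecr_url:$git_ref" --tag "$ecr_url:latest" . || exit 1
  # Push the SHA tag first so a rollback target exists before latest moves.
  docker push "$ecr_url:$git_ref" || exit 1
  docker push "$ecr_url:latest" || exit 1
  echo "$git_ref"
)
```

Calling build_and_push "$ECR_URL" from the repo root leaves two tags in ECR that point at the same image, which is what makes latest convenient and the SHA tag safe to roll back to.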

Deployment

  • Our deployments are also managed by Jenkins
  • The most important parameter to my deployment job into ECS is the $TAG field. This maps to which version of the container I’d like to deploy from ECR
  • By default, this field is set to latest — but I have the flexibility to deploy any older version sitting in ECR by using the $GIT_REFERENCE as the TAG if I wish, thus allowing us to deploy back to any point in time very quickly.
  • I have a set of scripts that generalize our microservice Docker/Elixir deployment into ECS. Since we're only running 2 ECS clusters (one production, one non-production), and all these services are running behind a minimal number of Application Load Balancers (ALBs), the scripts were ripe for generalizing, so all of our Elixir microservices share the same ECS deployment scripts, with just a few changes in parameters (this is explained more below, under Create and Register a Task Definition)
  • Our ECS tasks are set up in a way that allows us to run multiple containers of the same service on a single EC2 instance (Dynamic Port Mapping). The documentation on this is… lacking. To make it work, set hostPort=0 and containerPort=9999 (or whatever the port number is for your service). The containerDefinitions in your task_definition.json should look something like:
{
  "containerDefinitions": [
    {
      "portMappings": [
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": $YOUR_PORT
        }
      ],
      ....
    }
  ]
}
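
With hostPort set to 0, ECS picks a host port for each container at launch, so it can help to check which port a running task actually received. This is a minimal sketch using aws ecs describe-tasks; the function name and its three arguments are placeholders for illustration.

```shell
#!/bin/sh
# Sketch only: region, cluster name, and task ARN come from the caller.

# Print one "containerPort hostPort" pair per line for a running task.
show_port_bindings() {
  aws ecs describe-tasks \
    --region "$1" \
    --cluster "$2" \
    --tasks "$3" \
    --query 'tasks[].containers[].networkBindings[].[containerPort,hostPort]' \
    --output text
}
```

The ALB target group tracks these ports for you, so this is mostly useful when debugging a container directly on the instance.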

The Deployment Script

Create and Register a Task Definition

  • Each task has its own task definition. Each task definition is stored as an .erb template file, so that I can inject configuration details into the task as needed (this keeps things like region, port, environment, and most importantly, the $GIT_REFERENCE that maps to our $TAG in ECR, configurable at deploy time). An excerpt of our service_task_def.json.erb is as follows (pay special attention to the image portion, as that's how I handle easily deploying any version of our code base from ECR into ECS):
# service_task_def.json.erb
{
  "containerDefinitions": [
    {
      "portMappings": [
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": <%= port %>
        }
      ],
      "cpu": 1024,
      "memory": 1024,
      "image": "99999999.dkr.ecr.<%= region %>.amazonaws.com/<%= repo %>:<%= tag %>",
      "essential": true,
      "environment": [
        {
          "name": "MIX_ENV",
          "value": "<%= environment %>"
        },
        {
          "name": "PORT",
          "value": "<%= port %>"
        }
      ],
      "name": "<%= service %>-<%= environment %>"
    }
  ],
  "placementConstraints": [],
  "family": "<%= repo %>",
  "volumes": []
}
  • To inject data into the service_task_def.json.erb file:
erb port="$PORT" region="$REGION" \
repo="$SERVICE-$ENVIRONMENT" service="$SERVICE" \
environment="$ENVIRONMENT" tag="$TAG" \
service_task_def.json.erb > "$_GENERATED_TASK_DEF_JSON_FILE_PATH"
  • Now register the newly generated task definition JSON file with ECS:
aws ecs register-task-definition --region $REGION --cli-input-json file://$_GENERATED_TASK_DEF_JSON_FILE_PATH
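
A malformed template is easy to miss until register-task-definition rejects it, so it may be worth parsing the rendered file before registering it. The sketch below is not our actual script: it assumes Ruby is on the PATH (erb ships with it), reuses the $PORT, $REGION, $SERVICE, $ENVIRONMENT, and $TAG variables from the job, and uses hypothetical function names.

```shell
#!/bin/sh
# Sketch only: validate_json and render_and_register are hypothetical helpers.

# Exit non-zero if the file is not valid JSON.
validate_json() {
  ruby -rjson -e 'JSON.parse(File.read(ARGV[0]))' "$1"
}

# Render the ERB template, check the result, then register it with ECS.
render_and_register() (
  out=$1
  erb port="$PORT" region="$REGION" repo="$SERVICE-$ENVIRONMENT" \
    service="$SERVICE" environment="$ENVIRONMENT" tag="$TAG" \
    service_task_def.json.erb > "$out" || exit 1
  # Stop here rather than sending a broken document to the ECS API.
  validate_json "$out" || { echo "invalid JSON in $out" >&2; exit 1; }
  aws ecs register-task-definition --region "$REGION" \
    --cli-input-json "file://$out"
)
```

Failing before the API call keeps the error message local and obvious in the Jenkins console.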

Update the service in ECS

We'll tell ECS to use our newly registered task definition:

aws ecs update-service --region $REGION --cluster $CLUSTERNAME --service $SERVICE-$ENVIRONMENT --task-definition $SERVICE-$ENVIRONMENT

Then we'll ask our script to hang around until the service is stable. During this time, the service is draining connections out of the old containers and pushing new connections to the new containers. This is what allows a zero-downtime deployment.

aws ecs wait services-stable --region $REGION --cluster $CLUSTERNAME --services $SERVICE-$ENVIRONMENT

By waiting for the service to become stable, we can tell in Jenkins whether a deployment succeeded at the infrastructure level: if the service doesn't stabilize, the wait command exits non-zero and the job shows a failure in Jenkins.
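
When the wait fails, the Jenkins log is more useful if it also shows why. This is a minimal sketch, assuming the same $REGION, $CLUSTERNAME, $SERVICE, and $ENVIRONMENT variables as the commands above; wait_for_stable is a hypothetical helper and the five-event limit is arbitrary.

```shell
#!/bin/sh
# Sketch only: wait_for_stable is a hypothetical wrapper around the waiter.

wait_for_stable() (
  service="$SERVICE-$ENVIRONMENT"
  if aws ecs wait services-stable --region "$REGION" \
       --cluster "$CLUSTERNAME" --services "$service"; then
    echo "$service is stable"
    exit 0
  fi
  # The waiter gave up: print the first few service events for context.
  echo "$service did not stabilize; service events:" >&2
  aws ecs describe-services --region "$REGION" \
    --cluster "$CLUSTERNAME" --services "$service" \
    --query 'services[0].events[:5].message' --output text >&2
  exit 1
)
```

The events usually say things like a task failing health checks or there being no instance with enough memory, which saves a trip to the ECS console.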

The final step is to do some simple curls to known endpoints for our newly launched ECS microservices to ensure we’re getting 200 HTTP Responses from our service.
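
That check can be as small as the sketch below. The retry count, the RETRY_DELAY knob, and the function name are assumptions made for illustration, and the URL is whatever known endpoint your service exposes.

```shell
#!/bin/sh
# Sketch only: check_endpoint and RETRY_DELAY are hypothetical names.

# Succeed as soon as the URL answers 200, retrying a few times first.
check_endpoint() (
  url=$1
  attempts=${2:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --write-out prints just the status code; the body is discarded.
    code=$(curl --silent --output /dev/null --max-time 10 \
      --write-out '%{http_code}' "$url") || code=000
    [ "$code" = "200" ] && exit 0
    echo "attempt $i/$attempts: $url returned $code" >&2
    i=$((i + 1))
    sleep "${RETRY_DELAY:-3}"
  done
  exit 1
)
```

A non-zero exit from check_endpoint fails the Jenkins job the same way a failed wait does, so a bad release is flagged even when ECS itself looks healthy.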

Failure Recovery

If at any point the deployment fails, or we start seeing something in our monitoring that we don't like, we redeploy: instead of using latest as our $TAG, we point it at the $GIT_REFERENCE of our last known good deploy.
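
Before redeploying an older build, it can be worth confirming that the tag really exists in ECR, since a typo in $TAG would otherwise only surface when ECS fails to pull the image. This is a sketch with hypothetical helper names; the repository name is passed in by the caller and $REGION is assumed to be set by the job.

```shell
#!/bin/sh
# Sketch only: resolve_tag and tag_exists are hypothetical helpers.

# Default to "latest"; otherwise expect a short or full commit SHA.
resolve_tag() (
  tag=${1:-latest}
  case "$tag" in
    latest) ;;
    *[!0-9a-f]*) echo "not a commit SHA: $tag" >&2; exit 1 ;;
  esac
  echo "$tag"
)

# Succeed only if the tag is present in the given ECR repository.
tag_exists() {
  aws ecr describe-images --region "$REGION" \
    --repository-name "$1" \
    --image-ids "imageTag=$2" > /dev/null 2>&1
}
```

A rollback then becomes: resolve the tag, check tag_exists, and run the normal deployment with that $TAG.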

In a nutshell

This deployment strategy is neither new nor novel; deploying containers to ECS has been done many times before. But I do think it's a good strategy for managing Elixir deployments for microservices. By sticking to strict conventions for tagging builds in ECR, keeping our task definitions generic and templated, and keeping our deployment scripts generic, we're able to keep our Elixir deployments simple. This strategy would easily work for non-Elixir apps as well, of course, but I think Elixir works well in ECS since it's lightweight, lends itself to scaling (as does ECS), and makes it easy to have lots of Elixir projects all sharing a single ECS node.

I would also recommend this approach to other DevOps folks who suddenly find themselves in the Elixir ecosystem and feel out of sorts. Instead of working with edeliver or rebuilding your deployment logic in Ansible/Chef/etc. to deal with yet another language, packaging the app as a language-agnostic Docker container and keeping the deployment generic is a much easier proposition to manage.

Shout out to Khaja Minhajuddin for helping out with this article!
