AWS Step Functions: the Deployment Orchestrator that CodePipeline should have been

Luc van Donkersgoed

I have been very vocal about AWS CodePipeline’s limitations and design choices. It generally doesn’t align with my requirements for a deployment pipeline. Yet deployments need to take place, so what are our alternatives? I’ve taken to using AWS Step Functions for my deployment pipelines, with great results!

In this article I want to highlight three powerful features in a Step Functions pipeline that are hard or impossible to achieve in CodePipeline. The first is parallel builds, the second is dynamic source branches and the third is conditional execution branches. I will first describe why these features are important, followed by an example implementation. To give you an idea of what’s coming, here is a diagram of a fully implemented pipeline in Step Functions:

[Diagram: Step Functions deployment pipeline]

Parallel builds

A single CodePipeline pipeline can only run one execution at a time. This introduces three limitations:

  1. Deploying to multiple non-conflicting environments (e.g. dev and acceptance) at the same time is impossible.
  2. If CodePipeline is building commit A1, and during that build commits A2, A3 and A4 are pushed to the repository, CodePipeline will continue with A4 when it has completed A1. Commits A2 and A3 will never be built.
  3. If there is a manual approval step in the pipeline, CodePipeline will block every other execution until the execution waiting for approval has completed.

There are many use cases where these limitations are problematic. Examples include an application with many dev environments, or a pipeline that should run unit or integration tests on every single commit, branch or pull request.

Step Functions State Machines are designed to run in parallel. Invoke the same state machine 5 times in the same second, and it will run 5 parallel builds. The service quotas suggest there is no practical limit to the number of concurrent executions.
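As a minimal sketch of this fan-out (the helper function and environment names are my own illustration, not from the article; `start_execution` is the real boto3 Step Functions call):

```python
# Illustrative helper: build one execution input document per target
# environment. Each StartExecution call creates an independent
# execution, so five calls yield five parallel builds.
import json

def build_execution_inputs(environments):
    """Return one Step Functions input (JSON string) per environment."""
    return [json.dumps({"environment": env}) for env in environments]

# With boto3 (requires AWS credentials and a real state machine ARN),
# the fan-out would look roughly like:
#
#   sfn = boto3.client("stepfunctions")
#   for payload in build_execution_inputs(["dev-alice", "dev-bob", "acceptance"]):
#       sfn.start_execution(stateMachineArn=DEPLOY_ARN, input=payload)
```

Because every `start_execution` call spawns its own execution, deployments to non-conflicting environments proceed concurrently instead of queueing behind each other.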

One downside: executing parallel deployments to the same environment can lead to issues. For example, it’s impossible to run two CloudFormation deployments on the same stack at once - one of the two will fail because the stack is in the UPDATE_IN_PROGRESS state. This can be fixed with a queue and a semaphore, which is covered in my article Building a multi-value semaphore with DynamoDB.

Dynamic source branches

Another hard-to-solve problem with CodePipeline is deploying any or every branch of a Git repository. The CodePipeline source action is designed for a static branch, configured once. There are many scenarios where you might want to deploy a hotfix/* or feature/* branch to your environments. With CodePipeline this means you need to update your pipeline configuration before running the build.

In Step Functions this can be solved by moving the responsibility for checking out Git branches to the build phase. This might need some explanation. In our solution we don’t have a source action - we just trigger the pipeline through an API call. The pipeline has sensible defaults for checkouts, for example the develop or main branch. When the API is called without parameters, the build step (implemented in CodeBuild) checks out the repository’s default branch, then builds and deploys it. This is not a big departure from the CodePipeline solution, although the order of operations is a bit different.

The two solutions diverge when you call the API with override parameters, for example:

{
    "override_branches": {
        "api_v1": "hotfix/emergency_patch"
    }
}

These parameters are picked up by CodeBuild, which checks out this alternative branch and deploys it. This capability can easily be extended to trigger a build for every commit or pull request.
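The branch-resolution logic that the build step might run can be sketched as follows (a hypothetical helper, assuming `develop` as the default branch and the payload shape shown above):

```python
# Sketch of branch resolution inside the build step: use the override
# from the pipeline input when present, otherwise fall back to the
# repository's default branch. Names are illustrative assumptions.
import json

DEFAULT_BRANCH = "develop"  # assumed sensible default

def resolve_branch(event_json: str, project: str) -> str:
    """Return the Git branch to check out for `project`."""
    event = json.loads(event_json or "{}")
    overrides = event.get("override_branches", {})
    return overrides.get(project, DEFAULT_BRANCH)

payload = '{"override_branches": {"api_v1": "hotfix/emergency_patch"}}'
resolve_branch(payload, "api_v1")  # → "hotfix/emergency_patch"
resolve_branch(payload, "api_v2")  # → "develop"
```

The build step would then run `git checkout` on the resolved branch before building and deploying, so no pipeline reconfiguration is needed per branch.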

Conditional execution branches

The third powerful feature in Step Functions is the ability to implement Choice states. These take a certain input, and based on their value they lead the State Machine execution down a certain path. A simple example is starting the State Machine with a skip_api_v1: true input. This value will be parsed by the Choice step, which will skip the deployment of API v1, while still deploying other components. This introduces a new level of flexibility: by default all components or micro services will be deployed, but if required you can choose to deploy a subset of services. This might speed up deployments or allow you to avoid interacting with sensitive environments unless strictly required.
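As a rough sketch in Amazon States Language (the state names and targets here are hypothetical, not taken from the production pipeline), such a Choice state could look like this:

```json
"Skip API v1?": {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.skip_api_v1",
            "BooleanEquals": true,
            "Next": "Deploy API v2"
        }
    ],
    "Default": "Deploy API v1"
}
```

When `skip_api_v1` is true, the execution jumps past the API v1 deployment; otherwise it falls through to the Default path and deploys it.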

Bringing it all together

At the top of the article we displayed the following diagram:

[Diagram: Step Functions deployment pipeline]

This is a snapshot of an actual pipeline we use in production. It can be run multiple times in parallel (not shown in this diagram), it can be executed with a custom Git branch configuration, and it uses choice flags to dynamically change its execution paths. The State Machine is started through an API which accepts JSON data like this:

{
    "environment": "acceptance",
    "skip_console_v1": true,
    "skip_console_v2": false,
    "skip_api_v1": true,
    "skip_api_v2": false,
    "override_branches": [
        {
            "project": "project-api-v2",
            "branch": "hotfix/important_patch"
        }
    ]
}

The environment key configures which environment the application will be deployed to. This can be acceptance, test, production or a personal development environment.

The four skip_* flags configure which components should be deployed. As you can see we are skipping the v1 steps, which matches the execution paths in the diagram above.

Finally, we are overriding the branch to be checked out for project-api-v2.
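Putting the pieces together, the payload above can be reduced to a deployment plan. The helper below is my own illustration (the component and flag names mirror the example payload; the function itself is not from the article):

```python
# Illustrative: derive which components to deploy, for which
# environment, and with which branch overrides, from the pipeline's
# input payload. Component names follow the example payload.
import json

COMPONENTS = ["console_v1", "console_v2", "api_v1", "api_v2"]

def deployment_plan(payload_json: str) -> dict:
    """Turn a pipeline input payload into a concrete deployment plan."""
    event = json.loads(payload_json)
    return {
        "environment": event["environment"],
        # A component is deployed unless its skip_* flag is true.
        "deploy": [c for c in COMPONENTS if not event.get(f"skip_{c}", False)],
        # Map each overridden project to its requested branch.
        "branches": {
            o["project"]: o["branch"]
            for o in event.get("override_branches", [])
        },
    }

example = """{
    "environment": "acceptance",
    "skip_console_v1": true,
    "skip_console_v2": false,
    "skip_api_v1": true,
    "skip_api_v2": false,
    "override_branches": [
        {"project": "project-api-v2", "branch": "hotfix/important_patch"}
    ]
}"""
plan = deployment_plan(example)
```

For the example payload, the resulting plan deploys only console_v2 and api_v2 to acceptance, checking out hotfix/important_patch for project-api-v2.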

Conclusion

There are many more benefits to State Machines which were not discussed in this article. Examples include parallel execution branches with a shared result state, combining and filtering inputs and outputs, integrations with a ton of other services, human approval and human choice states, wait loops and much more.

As always, there are also tradeoffs when choosing one solution over another. State Machines are more complex to build and maintain, and therefore require more specialized knowledge. Also, CodePipeline’s Source action, although limited, is a very powerful way to trigger a pipeline. Which service to use depends on your use case, requirements, and context.

My personal approach is to use CodePipeline only for the most basic applications. As soon as the deployment mechanism consists of multiple components, requires dynamic branches or has a future in which any of these features might be required, I tend to choose Step Functions.

Comparing CodePipeline with S3, the holy grail of cloud infrastructure

ACM Queue recently released a great interview with Werner Vogels, titled “A Second Conversation with Werner Vogels”. The article looks back at the history and success of S3. A strong recurring theme regarding the broad adoption of S3 is the team’s philosophy of keeping it simple. In the words of Werner Vogels:

A little before we started S3, we began to realize that what we were doing might radically change the way that software was being built and services were being used. But we had no idea how that would evolve, so it was more important to build small, nimble tools that customers could build on (or we could build on ourselves) instead of having everything and the kitchen sink ready at that particular moment. […] Could we have built a complex system? Probably. But if you build a complex system, it’s much harder to evolve and change, because you make a lot of long-term decisions in a complex system. It doesn’t hurt that much if you make long-term decisions on very simple interfaces, because you can build on top of them. Complex systems are much harder to evolve.

I believe this gets at the core difference between CodePipeline and Step Functions: CodePipeline is a complex system, but most of all it is opinionated. It assumes a certain way of working and structures its components around that. If you have a different way of working, you’ll have a hard time shaping CodePipeline to do what you want. Step Functions, on the other hand, seems to assume nothing. It doesn’t tell you how to use it or how to design for it. It provides the most basic building blocks, like Task, Choice, Map and Parallel states. It’s up to you how to use those components. And because the building blocks are so simple, the possibilities are endless.

I share posts like these and smaller news articles on Twitter, follow me there for regular updates! If you have questions or remarks, or would just like to get in touch, you can also find me on LinkedIn.