As long as there are changes, we deploy our web-frontend to production once an hour through the workday without fear. And if something important is wrong, the deployment is automatically blocked and prevented from ever reaching production.

There are some prerequisites to do this safely. These are the ones we have focused on, but they might vary between organizations.

  • High test coverage and multiple security nets. A high focus on automated testing on all levels and a well split up code base allows us to run various suites of unit tests, integration tests, visual tests using Chromatic and Storybook and full end-to-end tests as needed, depending on what has changed. In total, tens of thousands of tests are run several times every day.
  • A notion of what is critical functionality, so we can run a smaller end-to-end test suite before every single deploy. We use Cypress, but there are probably others that'd work too. We simply decided we do not need to test everything end-to-end before each release. The rest can be tested more asynchronously to the release pipeline. This, however, must be up to you. The important bit is that it is automated.
  • The whole deployment process was, before automatically pushing to production, just clicking a button. Zero human interaction except hitting that button.
  • Having a dedicated, controlled test environment to use for this. This might not be necessary everywhere, but we have a lot of time-sensitive test data, so we need to be able to control what the environment looks like before the tests are run. Also fully automated.

So, how?

In short, this is what we do. I'll go into detail on each step below.

  1. Build and package the release.
  2. Deploy the release to the environment we want to test in, and prevent further promotion of the release in Octopus Deploy.
  3. Run the tests. If they fail, leave the release locked and post a message to Slack.
  4. On success, remove the release block and deploy to production.

To orchestrate this, we've built a small program we trigger from TeamCity which deals with steps 2-4. We wrote ours in C#, and it uses the Octopus.Client library which makes it easier to interact with Octopus Deploy. If this doesn't float your boat, Octopus Deploy has other offerings or allows you to go directly to the REST-API from your platform of choice. This is what it looks like in the TeamCity project list for us.

Shows a list of three builds in TeamCity. They are called "CI1 - Deploy to Test3 and prevent progression", "CI2 - Smoke tests" and "CI3 - Promote and deploy to production"
The "CI" naming is purely for order.
ℹ️
An advantage to writing the orchestrator yourself is that you can add business rules to it. For example, we restrict deployments on certain days which is not practical to configure elsewhere. We can even return error messages explaining exactly why it refuses to deploy, which makes the error self-explanatory, saving people time spent investigating a non-issue.

1. Build the release

Build a single package that can be deployed into any environment. This ensures that it is the same code that is tested and pushed to production. For us, this builds, runs some tests and packages the application and sends it to Octopus Deploy as a NuGet package, causing a new release to be created there in the corresponding project.

2. Deploy the release for testing

Corresponds to the CI1-step in the picture above. This step looks up the currently newest release in our OctopusDeploy project and deploys it into our "Test3" environment. It then raises a defect with a suitable message on that release through the Octopus client, so it cannot be deployed to production before it is unlocked.

ℹ️
Note: Blocking manual production deployments like this depends on your deployment lifecycle, and will only be possible if your production environment is in another, later phase of deployment.

It also writes the current release-version to a text file called releaseVersion.txt on the root of the repository, so TeamCity can publish it as a build artifact for the next steps to depend on, like shown below:

3. Run the tests

Corresponds to the CI2-step above. Runs our smoke tests - the tests we have deemed necessary to feel safe deploying to production. In our case this is around 60 full end-to-end tests, and they take around 8 minutes to complete.

This step has both a snapshot and artifact dependency on the previous step. This causes it to auto-start the step before when triggered, as well as allowing us to propagate the releaseVersion.txt file on to the next step.

Screenshot showing how the CI2 build configuration has a snapshot dependency and an artifact dependency (pointing to the releaseVersion.txt file) on CI1. Important to note: When CI1 fails, let this one fail too.
Note how the Snapshot dependency is set to Mark build as failed to start on all kinds of failures.

In addition to this, make sure to add releaseVersion.txt on a separate line under Artifact Paths in General Settings on this build configuration. This enables the next step, CI3, to pick up this version number.

4. Remove the release block and deploy to production

This corresponds to the CI3-step above. If any tests in the previous step failed, this step should also be marked as failed immediatley to avoid pushing a faulty release to production. This is what our Dependencies-configuration looks like:

Screenshot showing how the CI3 build configuration has a snapshot dependency and an artifact dependency (pointing to the releaseVersion.txt file) on CI2. Important to note: When CI2 fails, let this one fail too.
Note how the Snapshot dependency is set to Mark build as failed to start on all kinds of failures.

When all the tests are green, the little orchestrator here reads the releaseVersion.txt to find what release we've just tested, removes the defect, allowing the release to progress, promotes and deploys the release to production.

ℹ️
The reason we bring with us the releaseVersion.txt file is to be sure we're working with the same release in all the the steps. The whole chain can take up to 12-14 minutes to complete, including waiting time for deployments, and new releases can be created during this time.

This build configuration also has a scheduled trigger on it. For us, this is once an hour monday to friday, between 08:00 and 15:00. Because of the dependency chain we have set up, this is the only step that requires a trigger, as it will pull CI2 along with it, which in turn pulls CI1 along with it. If any one of them fail, the whole chain breaks down, and no faulty releases land in production.