Testing in Production: Routing Traffic During a Release

10 May 2017 on releasemanagement, devops

DevOps is a journey that every team should at least have started by now. Most of the engagements I have been on in the last year or so have been in the build/release automation space. There are still several practices that I think teams must invest in to remain competitive – unit testing during builds and integration testing during releases are crucial foundations for more advanced DevOps, which I’ve blogged about (a lot) before. However, Application Performance Monitoring (APM) is also something that I believe is becoming more and more critical to successful DevOps teams. And one application of monitoring is hypothesis driven development.

Hypothesis Driven Development using App Service Slots

There are some prerequisites for hypothesis driven development: you need to have metrics that you can measure (I highly, highly recommend using Application Insights to gather the metrics) and you have to have a hypothesis that you can quickly test. Testing in production is the best way to do this – but how do you manage that?

If you’re deploying to Azure App Services, then it’s pretty simple: create a deployment slot on the Web App that you can deploy the “experimental” version of your code to and divert a small percentage of traffic from the real prod site to the experimental slot. Then monitor your metrics. If you’re happy, swap the slots, instantly promoting the experiment. If it does not work, then you’ve failed fast – and you can back out.

Sounds easy. But how do you do all of that in an automated pipeline? Well, you can already deploy to a slot using VSTS and you can already swap slots using OOB tasks. What’s missing is the ability to route a percentage of traffic to a slot.

Route Traffic Task

To quote Professor Farnsworth, “Good news everyone!” There is now a VSTS task in my extension pack that allows you to configure a percentage of traffic to a slot during your release – the Route Traffic task. To use it, just deploy the new version of the site to a slot and then drop in a Route Traffic task to route a percentage of traffic to the staging site. At this point, you can approve or reject the experiment – in both cases, take the traffic percentage down to 0 to the slot (so that 100% traffic goes to the production slot) ad then if the experiment is successful, swap the slots.

What He Said – In Pictures

To illustrate that, here’s an example. In this release I have DEV and QA environments (details left out for brevity), and then I’ve split prod into Prod-blue, blue-cleanup and Prod-success. There is a post-approval set on Prod-blue. For both success and failure of the experiment, approve the Prod-blue environment. At this stage, blue-cleanup automatically runs, turning the traffic routing to 0 for the experimental slot. Then Prod-success starts, but it has a pre-approval set that you can approve only if the experiment is successful: it swaps the slots.

Here is the entire release in one graphic:

In Prod-blue, the incoming build is deployed to the “blue” slot on the web app:

Next, the Route Traffic task routes a percentage of traffic to the blue slot (in this case, 23%):

If you now open the App Service in the Azure Portal, click on “Testing in Production” to view the traffic routing:

Now it’s time to monitor the two slots to check if the experiment is successful. Once you’ve determined the result, you can approve the Prod-blue environment, which automatically triggers the blue-cleanup environment, which updates the traffic routing to route 0% traffic to the blue slot (effectively removing the traffic route altogether).

Then the Prod-success environment is triggered with a manual pre-deployment configured – reject to end the experiment (if it failed) or approve to execute the swap slot task to make the experimental site production.

Whew! We were able to automate an experiment fairly easily using the Route Traffic task!

Conclusion

Using my new Route Traffic task, you can easily configure traffic routing into your pipeline to conduct true A/B testing. Happy hypothesizing!

Testing in Production: Routing Traffic During a Release

Hypothesis Driven Development using App Service Slots

Route Traffic Task

What He Said – In Pictures

Conclusion

Colin's ALM Corner

Error

Hypothesis Driven Development using App Service Slots

Route Traffic Task

What He Said – In Pictures

Conclusion

Templates (for web app):

Error