Continuous Delivery encompasses all aspects of software development and is a holistic approach, which I define as: “going from idea to valuable, working software in the hands of users.”
Continuous Delivery is achieved by working so that our software is always in a releasable state.
Dave Farley, Continuous Delivery Pipelines - How To Build Better Software Faster, p. 4-5
This definition tells you what the final goal of Continuous Delivery (CD) is. It neither tells you whether you are moving towards the goal nor how close to the goal you are. Only measuring can tell you. The DORA metrics have become the standard metrics for Continuous Delivery.
Stability
The Change Failure Rate is the percentage of code changes (commits) that lead to failed runs of the CD pipeline or to bug reports from customers.
The Failure Recovery Time is the time it takes teams to repair pipeline failures or fix bugs.
Throughput
The Deployment Frequency measures how often teams release software to customers in a given period of time.
The Lead Time for Changes is the time from a code change (commit) to the customer receiving a release with this change.
The precondition for collecting these metrics is a working CD pipeline. So, set up a minimal pipeline for embedded HMIs as described here and make it compute the metrics automatically. Show the current metrics and their changes over time on a dashboard visible to every stakeholder in the company.
The four metrics measure the software delivery performance of a team. A higher software delivery performance predicts a higher organisational and non-commercial performance of your company. Regarding positive effects on organisational performance:
Analysis over several years shows that high-performing organizations were consistently twice as likely to exceed goals [in profitability, market share, productivity, ROI and resilience to economic cycles] as low performers. This demonstrates that your organization’s software delivery capacity can in fact provide a competitive advantage to your business.
Nicole Forsgren et al., Accelerate - Building and Scaling High Performing Technology Organizations, p. 24
Regarding positive effects on non-commercial performance:
We found that high performers were also twice as likely to exceed objectives in quantity of goods and services, operating efficiency, customer satisfaction, quality of products or services, and achieving organization or mission goals.
Nicole Forsgren et al., Accelerate - Building and Scaling High Performing Technology Organizations, p. 24
Teams can only achieve high performance if they improve both on stability and on throughput. It is not enough to do well on either stability or throughput. Two negative feedback loops illustrate this point.
You could easily increase the deployment frequency and reduce the lead time for changes by releasing code changes without tests. However, you would soon receive more bug reports from your customers and spend more time on bug fixing.
You could reduce the failure rate nearly to 0% by investing an enormous amount of time in testing. However, you would hardly ever release changes to your customers.
The researchers of the 2021 State of DevOps Report identified three performance levels for software development organisations in their representative data. The groups are characterised by different limits for the four metrics.
                       Low                 Mid               High
Change Failure Rate    <= 15%              <= 15%            <= 5%
Failure Recovery Time  <= 1 week           <= 1 day          <= 1 hour
Deployment Frequency   Monthly or less     Daily to weekly   On demand
Lead Time for Changes  1 week to 6 months  <= 1 week         <= 1 hour
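The levels from the table above can be checked mechanically. The following sketch maps a team's four metrics to a level; the function name, signature and the treatment of "on demand" (assumed to hold if the other high-level limits hold) are my own assumptions.

```python
from datetime import timedelta


def performance_level(change_failure_rate: float,
                      recovery_time: timedelta,
                      deployments_per_week: float,
                      lead_time: timedelta) -> str:
    """Map the four DORA metrics to the 2021 State of DevOps levels.

    Thresholds follow the table in the text; "on demand" deployment is
    assumed satisfied when the other high-level limits are met.
    """
    if (change_failure_rate <= 0.05
            and recovery_time <= timedelta(hours=1)
            and lead_time <= timedelta(hours=1)):
        return "high"
    if (change_failure_rate <= 0.15
            and recovery_time <= timedelta(days=1)
            and deployments_per_week >= 1        # daily to weekly
            and lead_time <= timedelta(weeks=1)):
        return "mid"
    if (change_failure_rate <= 0.15
            and recovery_time <= timedelta(weeks=1)
            and lead_time <= timedelta(days=180)):  # up to 6 months
        return "low"
    return "below low"
```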
For embedded HMIs, I recommend setting the following limits as team goals.
Change failure rate <= 1%. This is achievable if you use TDD diligently. I regard TDD as a main driver of high performance. TDD increases the testability of your software, making it easier to write acceptance, integration and system tests. Writing more and better tests reduces the number of failures, which in turn reduces the time spent on rework - a.k.a. debugging and bug fixing.
Failure recovery time <= 1.5 hours. If a harvester stands still in the field, if a professional cooking appliance in a fast-food restaurant burns the burgers, or if a robot in a car assembly line breaks down, you will have many angry customers losing a lot of money. You had better be able to get a fix to your customers almost instantly. Most likely, you won’t be able to run the full system stage of your pipeline (<= 3 hours) within that window. Hence, your commit stage (<= 5 minutes) and acceptance stage (<= 1 hour) must have high-quality test suites so that you can release the bug fix with high confidence.
Deployment frequency: twice per day. You can release software to your customers whenever you want. This enables you to get a bug fix into the hands of your customers within 1.5 hours. On-demand deployment doesn’t mean that you release every code change to your customers. It means that you are capable of doing so successfully - if needed. So, once you have your fleet management solution for OTA updates in place and have tried it out successfully hundreds of times, it’s a walk in the park when your customer really needs a new release.
Lead time for changes <= 3 hours. A lead time of 3 hours allows you to run the full pipeline. The lead time starts when you integrate a change into the main branch and ends when the pipeline produces a good release candidate. The lead time shouldn’t be longer than the run time of the pipeline.
Consequently, your code must be “done” when merged. It must be tested, reviewed and documented so that it will pass the CD pipeline with high probability. There is no time to do a code review after a merge request. The code review must happen before - e.g., by pair programming or thorough testing.
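The stage budgets mentioned above (commit <= 5 minutes, acceptance <= 1 hour, full system stage <= 3 hours) are easy to monitor automatically. This sketch flags stages that blow their budget; the dictionary and function are illustrative assumptions, not from any existing CI tool.

```python
from datetime import timedelta

# Stage time budgets from the text: commit <= 5 min,
# acceptance <= 1 h, full system stage <= 3 h.
STAGE_BUDGETS = {
    "commit": timedelta(minutes=5),
    "acceptance": timedelta(hours=1),
    "system": timedelta(hours=3),
}


def over_budget(stage_runtimes: dict[str, timedelta]) -> list[str]:
    """Return the pipeline stages that exceeded their time budget.

    Unknown stages are never flagged (they have no budget).
    """
    return [stage for stage, runtime in stage_runtimes.items()
            if runtime > STAGE_BUDGETS.get(stage, timedelta.max)]
```

A pipeline could run such a check after every build and alert the team when a stage drifts over its budget.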
I have watched teams discuss what constitutes a bug and what severity a bug has. It’s certainly a bug if the pipeline fails. The persons responsible for the bug drop their work and make the pipeline pass again. It’s also a bug if the observable behaviour of an interface on any level changes.
Mismatches between customer expectations and your implementation are bugs. You should assess the impact on the customer (e.g., extra costs and efforts), how many customers are affected and how often the bug occurs. Then, you promptly triage the bugs into three categories: fix right away, fix soon or don’t fix at all. Be pragmatic: a few more bugs don’t hurt. They tell you what to improve.
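The three-way triage described above can be sketched as a small decision function. The concrete thresholds below are my own illustrative assumptions; each team would calibrate them from its own customer impact data.

```python
def triage(impact: str, affected_customers: int,
           occurrences_per_week: float) -> str:
    """Triage a bug into the three categories from the text.

    `impact` is "high", "medium" or "low"; the numeric thresholds
    are illustrative assumptions, not from the book.
    """
    if impact == "high" or affected_customers > 10:
        return "fix right away"   # e.g., harvester stands still in the field
    if impact == "medium" or occurrences_per_week >= 1:
        return "fix soon"         # schedule into the next iterations
    return "don't fix"            # note it; it tells you what to improve
```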
My goal for the lead time assumes that developers work on a local copy of the main branch and push one or more commits to the remote repository. The CD pipeline learns about the code change for the first time when pushed. The push determines the start for the lead time. The lead time doesn’t contain the time developers need to write the code. Instead of the push time, you could also use the time of the merge request.
If developers create a branch tracked by the remote repository before they start writing code, the CD pipeline can use the creation time as the start for the lead time. Then, I would limit the lead time to less than 2 days. Developers have up to 1.5 days for a change and the pipeline half a day.
My goal for the lead time also assumes that there are no manual steps after the release candidate passed the pipeline. Typical manual steps include manual testing, collecting regulatory documentation, writing release notes and uploading the release candidate to the fleet management server. This time is part of the lead time. You should strive to automate these steps and reduce the lead time.
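Putting the last three paragraphs together, the lead time is simply the span from whichever start event you pick (push, merge request, or branch creation) to a good release candidate, plus any manual steps that remain. The function below is a trivial but explicit sketch of that definition; the names are my own.

```python
from datetime import datetime, timedelta


def lead_time(start: datetime, candidate_ready: datetime,
              manual_steps: timedelta = timedelta()) -> timedelta:
    """Lead time for a change.

    `start` is when the pipeline first learns about the change (push,
    merge request, or branch creation, per the text); `candidate_ready`
    is when a good release candidate exists; `manual_steps` is the
    remaining manual work (manual testing, release notes, upload),
    which the text counts as part of the lead time.
    """
    return candidate_ready - start + manual_steps
```

Automating the manual steps shrinks the `manual_steps` term towards zero, which is exactly the reduction the text recommends.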
Many organisations do not even reach the low level at the beginning. Guided by the principles of Continuous Delivery, they move towards the low-level metrics, then the mid-level metrics and finally the high-level metrics. This process takes months or even years. The risk of failing is considerable, especially if management is not fully behind it. However, the return on investing in Continuous Delivery is huge, including higher profitability, higher customer satisfaction and higher employee satisfaction. In summary, Continuous Delivery is a risk worth taking!