Episode 32: Better Built By Burkhard
My Favourite Books of 2022: Enabling Continuous Delivery
2022 is quickly coming to an end. I am looking back at my most profitable and most exhausting year in my 9.5 years as a solopreneur. I helped a 50-person organisation change how they develop software by introducing Continuous Delivery. After strong resistance at the beginning, we got a Continuous Delivery pipeline up and running. Overcoming the resistance would have involved much less coercion and fighting if I had known Derby’s 7 Rules for Positive Productive Change earlier in the project.
We reduced the average time for building the software and running the unit tests from 50 minutes to 10 minutes. The pipeline could handle 50 integrations to the main branch in an 8-hour workday. We made it mandatory for developers to cover every code change with tests. These two measures showed positive effects quickly. Broken builds and lengthy integrations were a thing of the past. The number of bugs came down significantly. I discuss these changes in my post Can We Use Trunk-Based Development for Legacy Software?
We identified the dependencies between the seven teams and classified them as “OK”, “slowing-down” or “blocking”. Negative dependencies between teams were reflected by negative dependencies between software components. This is Conway’s law in action. We followed the recommendations of the book Team Topologies and applied the reverse Conway manoeuvre.
We decoupled the teams and defined clean team APIs mirroring our target software architecture. It became clear that the organisation must inspect the team and software dependencies every three months and adapt them as needed. Repeating this inspect-and-adapt cycle leads to separable teams with single-threaded leadership, the successor of the two-pizza teams.
Applying the principles and practices of Continuous Delivery relentlessly leads to loosely coupled teams mirroring a loosely coupled architecture. Such teams do not need the heavy coordination effort of SAFe, which is waste in agile lingo. As a process innovation, Continuous Delivery provides an order of magnitude more value to customers than SAFe. And it does so faster, too. This is the topic of Deming’s essay The Need for Change.
Without these five books, I wouldn’t have been able to help my customer effectively with their transformation towards Continuous Delivery. I read all of them cover to cover, and they lie on my desk for quick lookup. I hope you find these books as valuable as I did.
You may be surprised that my list of favourite books doesn’t include any technical books, say, about C++, CMake, Yocto or Linux. I read these, too - at least partially. Understanding all the latest C++ features doesn’t improve your performance as a developer. But understanding how TDD, trunk-based development, continuous integration and all the other Continuous Delivery practices play together through positive feedback loops does. Big time!
I wish you a Merry Christmas and a Happy New Year. All the best, Burkhard 💜
Matthew Skelton, Manuel Pais: Team Topologies - Organizing Business and Technology Teams for Fast Flow
Software development has both technical and social aspects. If an organisation focuses only on one aspect, its performance can improve only a little. If it focuses on both, its performance can improve a lot. Conway’s law provides the link between the social and technical aspects.
Ruth Malan provides […] the modern version of Conway’s law: “If the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins.” Malan reminds us that the organization is constrained to produce designs that match or mimic the real, on-the-ground communication structure of the organization.
Matthew Skelton & Manuel Pais, p. 17
In other words, the architecture of the software is the mirror image of the architecture of the organisation. In my introductory example, all parts of the software depend heavily on the database. The reason is simple: only two developers know the database well enough to make non-trivial changes. So, all five teams depend on these two people. This is more or less acceptable for the three teams on the same floor, but it is a major blocker for the two teams at a different site.
[…] anyone who makes decisions about the shape and placement of engineering teams is strongly influencing the software systems architecture. […] in the words of Ruth Malan: “if we have managers deciding which services will be built, by which teams, we implicitly have managers deciding on the system architecture.”
Matthew Skelton & Manuel Pais, p. 23
System architects with the necessary social skills are therefore excellent candidates for making these decisions. They identify components that can be decoupled to a large extent from the rest of the software by well-defined interfaces. Then, they carve out a team from the organisation that focuses solely on developing this component. They repeat these steps until they arrive at a loosely coupled system architecture that is the mirror image of a loosely coupled organisation architecture. This method is known as the reverse Conway manoeuvre (pp. 18-21).
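To make Conway’s law a little more tangible, here is a minimal Python sketch. It assumes a hypothetical ownership map from components to teams and a hypothetical list of component dependencies, loosely modelled on the database example above. Every dependency that crosses a team boundary is a place where teams must coordinate; the reverse Conway manoeuvre tries to shrink this list or turn its entries into well-defined interfaces.

```python
from collections import Counter

# Hypothetical ownership map: which team owns which component.
OWNERS = {
    "database": "db-experts",
    "ui": "team-a",
    "reporting": "team-a",
    "device-control": "team-b",
}

# Hypothetical component dependencies as (user, provider) pairs.
DEPENDENCIES = [
    ("ui", "database"),
    ("reporting", "database"),
    ("device-control", "database"),
    ("ui", "reporting"),
]


def cross_team_dependencies(owners, dependencies):
    """Return the component dependencies whose two ends belong to different teams."""
    return [
        (user, provider)
        for user, provider in dependencies
        if owners[user] != owners[provider]
    ]


def bottleneck_teams(owners, dependencies):
    """Count how many cross-team dependencies point at each providing team."""
    return Counter(
        owners[provider] for _, provider in cross_team_dependencies(owners, dependencies)
    )


if __name__ == "__main__":
    for user, provider in cross_team_dependencies(OWNERS, DEPENDENCIES):
        print(f"{OWNERS[user]} waits for {OWNERS[provider]}: {user} -> {provider}")
    print(bottleneck_teams(OWNERS, DEPENDENCIES).most_common(1))  # [('db-experts', 3)]
```

In this made-up example, the dependency of ui on reporting stays inside team-a, while all three database dependencies land on the same small group of experts: the bottleneck that the reverse Conway manoeuvre would target first.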
Similarly to software components, teams require a well-defined interface. The team API defines the responsibilities and how the team interacts with other teams. Skelton and Pais identify four team types or team topologies: stream-aligned teams (e.g., feature and product teams), enabling teams, complicated-subsystem teams and platform teams (see Chapter 5, The Four Fundamental Team Topologies of my full review).
I’ll focus on enabling teams, as they are ideally suited for introducing Continuous Delivery to an organisation.
Often [enabling teams] are focused on more specific areas, such as build engineering, continuous delivery, deployments, or test automation for particular client technology (e.g., desktop, mobile, web). For example, the enabling team might set up a walking skeleton of a deployment pipeline or a basic test framework combining automation tools and some initial scenarios and examples.
Matthew Skelton & Manuel Pais, p. 87
The enabling team helps the feature teams apply the principles and practices of Continuous Delivery, based on an evolving pipeline. The job of the enabling team is done when the feature teams are doing fine on their own. The enabling team then disbands, and its members move back into the feature teams. A successful introduction of Continuous Delivery is close to impossible without a dedicated enabling team.
The end goal of an enabling team is to increase the autonomy of stream-aligned teams by growing their capabilities with a focus on their problems first, not the solutions per se. If an enabling team does its job well, the [stream-aligned team] should no longer need the help […] after a few weeks or months; there should not be a permanent dependency on an enabling team.
Matthew Skelton & Manuel Pais, p. 87 (emphasis mine)
For a full review of the book see Episode 27 of my newsletter.
Dave Farley: Continuous Delivery Pipelines - How to Build Better Software Faster
Farley divides the Continuous Delivery pipeline into the Commit Stage (Chapter 8) and the Acceptance Stage (Chapter 10). I’ll extend his definitions to applications running on embedded devices.
The Commit Stage is triggered when a developer pushes changes to the main branch.
It runs all unit tests on the host workstation.
It runs static analysis.
It cross-builds the applications for the target device and creates a binary archive for the standard OTA update: the release candidate.
The Commit Stage should complete in less than 5 minutes. When the Commit Stage runs into an error, it rejects the change, resets the main branch to the last working commit and doesn’t create a binary archive. When the Commit Stage passes, developers can be pretty sure that the Acceptance Stage won’t find any problems. The Commit Stage provides the binary archive for the Acceptance Stage to scrutinise.
The Acceptance Stage starts running when the Commit Stage has produced a good release candidate.
It deploys the release candidate on the target device using the standard OTA update procedure. This ensures that the update procedure has been tested numerous times before customers run an update.
It runs all acceptance, performance, integration and system tests on the target device.
The Acceptance Stage should complete in less than an hour. If it fails, the release candidate is rejected. If it passes, it is highly unlikely that the release candidate will show any bugs in production. From this point on, the release candidate must not be changed anymore. The CD pipeline uploads it to the cloud, from where customers download and install it on their systems.
The CD pipeline automatically calculates the DORA metrics in every run (Chapter 18). The DORA metrics are used in the annual Accelerate State of DevOps report to measure the performance of software development teams.
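The two stages can be sketched as a small Python driver. This is a minimal sketch under stated assumptions, not Farley’s implementation: each step is a hypothetical callable that returns True on success and would have to be wired to real tools (unit tests, static analysis, cross-build, OTA deployment, on-target tests). Resetting the main branch and uploading to the cloud are only logged here, and the archive naming scheme is made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# A step is a (name, check) pair. The check returns True on success.
# Real checks would call the test runner, static analyser, cross-compiler, etc.
Step = Tuple[str, Callable[[str], bool]]


@dataclass(frozen=True)  # frozen: a release candidate must not change once created
class ReleaseCandidate:
    commit: str
    archive: str  # OTA update archive; the naming scheme is hypothetical


def first_failure(steps: Sequence[Step], subject: str) -> Optional[str]:
    """Run the steps in order and return the name of the first failing one, if any."""
    for name, check in steps:
        if not check(subject):
            return name
    return None


def commit_stage(commit: str, steps: Sequence[Step]) -> Optional[ReleaseCandidate]:
    """Unit tests and static analysis on the host, then cross-build."""
    failed = first_failure(steps, commit)
    if failed is not None:
        # Reject the change; no binary archive is created.
        print(f"commit stage: {commit} rejected at '{failed}', resetting main branch")
        return None
    return ReleaseCandidate(commit, f"update-{commit}.tar")


def acceptance_stage(rc: ReleaseCandidate, steps: Sequence[Step]) -> bool:
    """OTA deployment to the target device, then on-target tests."""
    failed = first_failure(steps, rc.archive)
    if failed is not None:
        print(f"acceptance stage: {rc.archive} rejected at '{failed}'")
        return False
    print(f"acceptance stage: uploading {rc.archive} for customers")
    return True


if __name__ == "__main__":
    passes = lambda _subject: True
    commit_steps = [("unit tests", passes), ("static analysis", passes), ("cross-build", passes)]
    acceptance_steps = [("OTA deployment", passes), ("acceptance tests", passes)]
    rc = commit_stage("abc123", commit_steps)
    if rc is not None:
        acceptance_stage(rc, acceptance_steps)
```

Chaining the stages this way means the Acceptance Stage only ever sees an archive produced by a passing Commit Stage, and the frozen dataclass mirrors the rule that a release candidate is not changed after it has been built.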
Change Failure Rate measures how often the CD pipeline fails at different steps, e.g., when running the cross-build, unit tests, acceptance tests or static analysis.
Failure Recovery Time measures “the amount of time when our software is not in a releasable state” (p. 117), that is, the time needed to repair a failure of the CD pipeline.
Deployment Frequency measures how often a commit (code change) passes the CD pipeline, that is, how often a release candidate is fit for production.
Lead Time for Changes measures the time it takes a commit (code change) to be deployed into production.
Change Failure Rate and Failure Recovery Time are the Quality or Stability metrics. Deployment Frequency and Lead Time for Changes are the Efficiency or Throughput metrics.
We do not selectively choose between these measures - all four together create the value. After all, being fast but failing frequently, or being extremely stable and very slow, are not good outcomes. […]
Year on year, [the State of DevOps reports] have identified a strong correlation between high measures of Throughput and Stability, and High-Performing Teams. These reports also show that high performance is achieved by striving for better Throughput AND Stability, not one at the expense of the other.
Dave Farley, p. 118 (bold emphasis mine)
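Here is a minimal Python sketch of how a pipeline might compute the four metrics from its run history. It follows the definitions given above rather than the official DORA definitions, and it makes two assumptions: a passing run counts as a deployment into production, and the recovery time runs from the first failed run to the next passing run. The Run record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Run:
    committed: datetime  # when the commit was pushed to the main branch
    finished: datetime   # when the pipeline run ended
    passed: bool         # True if the release candidate made it to production


def change_failure_rate(runs: List[Run]) -> float:
    """Fraction of pipeline runs that failed at any step."""
    return sum(not r.passed for r in runs) / len(runs)


def failure_recovery_times(runs: List[Run]) -> List[timedelta]:
    """Durations during which the software was not in a releasable state."""
    times: List[timedelta] = []
    broken_since = None
    for r in sorted(runs, key=lambda run: run.finished):
        if not r.passed and broken_since is None:
            broken_since = r.finished
        elif r.passed and broken_since is not None:
            times.append(r.finished - broken_since)
            broken_since = None
    return times


def deployments_per_day(runs: List[Run], days: float) -> float:
    """How many release candidates per day were fit for production."""
    return sum(r.passed for r in runs) / days


def mean_lead_time(runs: List[Run]) -> timedelta:
    """Average time from commit to deployment, over passing runs only."""
    lead_times = [r.finished - r.committed for r in runs if r.passed]
    return sum(lead_times, timedelta()) / len(lead_times)


if __name__ == "__main__":
    t0 = datetime(2022, 12, 1, 9, 0)
    h = timedelta(hours=1)
    runs = [
        Run(t0, t0 + h, True),
        Run(t0 + 2 * h, t0 + 3 * h, False),
        Run(t0 + 4 * h, t0 + 5 * h, True),
        Run(t0 + 6 * h, t0 + 7 * h, True),
    ]
    print(change_failure_rate(runs))          # 0.25
    print(failure_recovery_times(runs))       # [datetime.timedelta(seconds=7200)]
    print(deployments_per_day(runs, days=1))  # 3.0
    print(mean_lead_time(runs))               # 1:00:00
```

Computing all four numbers from the same run history keeps Throughput and Stability side by side, so neither can be optimised at the expense of the other without it showing up.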
Esther Derby: 7 Rules for Positive, Productive Change - Micro Shifts, Macro Results
Derby recommends understanding the informal communication paths in an organisation (Chapter 5, Attend to Networks) and using these networks to bring about change. Instead of coercing people to change, she lets the attraction and scarcity principles do their steady but unstoppable work. Here is how she defines the attraction principle.
Don’t try to convince people who aren’t eager. Work with those who already are. This is useful for testing and refining ideas. […] people will notice that something interesting is going on and want to know more. They may want to join in - because all the cool kids are doing it. Success will attract more people to try out a new idea.
Esther Derby, p. 93 (bold emphasis mine)
When you want to introduce Continuous Delivery to an organisation (a huge change, by the way), you start implementing this change with the people in favour of it. You listen to the sceptics, but you don’t try to convince them. If the attraction of something provably better doesn’t weaken the resistance of the sceptics, the fear of missing out (FOMO) will.
The scarcity principle reflects the human tendency to value what is rare over what is abundant. Rather than mandate that everyone participate, make participation selective. I’ve used both applications and opt-in interviews. The investment of time and energy to become part of an effort also increases investment in that effort.
Esther Derby, p. 94 (bold emphasis mine)
The attraction and scarcity principles were an eye-opener for me. In my projects, I spend too much time trying to persuade people of best practices like TDD, refactoring, continuous integration and loosely coupled architecture. Although my arguments are sound and backed by empirical evidence, my success rate needs improvement. I am pretty sure that following the attraction and scarcity principles will provide this improvement and save me a lot of energy.
Creating autonomous teams, also called self-organising or empowered teams, is one of the explicit goals of Continuous Delivery. Many teams think this gives them free rein to do whatever they want. This is wrong. Teams must still stay within the boundaries set by leadership (Chapter 7, Guide, and Allow for Variation).
Boundary stories give people concrete examples of desirable and undesirable outcomes without over specifying the means. I’ve seen many wonderful, creative solutions emerge when people [are guided by] appropriate boundaries, and have complete freedom within those bounds.
Esther Derby, p. 126 (bold emphasis mine)
The five key principles of Continuous Delivery are such boundaries. You can capture these boundaries with the DORA metrics. For example, you could define that the organisation should deploy the software multiple times per week, that the lead time for changes should be less than 2 days, that the change failure rate should be less than 10% and that the failure recovery time should be less than 1 day.
As the above metrics are those of a high-performing team, you would start with more relaxed boundaries and tighten them gradually over time. The teams can work however they want, but they will be hard pressed to stay within the boundaries if they don’t follow the principles and practices of Continuous Delivery. So, teams have “bounded autonomy”.
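The bounded autonomy described above could be checked automatically after every pipeline run. The following Python sketch encodes the example boundaries from the text, reading “multiple times per week” as at least two deployments per week. The relaxed starting values and the linear tightening schedule are hypothetical choices for illustration, not recommendations from Derby.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class Boundaries:
    min_deployments_per_week: float
    max_lead_time: timedelta          # lead time must stay below this
    max_change_failure_rate: float    # e.g. 0.10 for 10%
    max_recovery_time: timedelta


@dataclass(frozen=True)
class Measured:
    deployments_per_week: float
    lead_time: timedelta
    change_failure_rate: float
    recovery_time: timedelta


# Example boundaries of a high-performing organisation, as given in the text.
TARGET = Boundaries(2, timedelta(days=2), 0.10, timedelta(days=1))
# Hypothetical relaxed starting point for an organisation new to Continuous Delivery.
START = Boundaries(0.25, timedelta(days=30), 0.40, timedelta(days=7))


def tighten(start: Boundaries, target: Boundaries, progress: float) -> Boundaries:
    """Interpolate linearly from start (progress 0) to target (progress 1)."""
    p = min(max(progress, 0.0), 1.0)

    def lerp(a, b):
        return a * (1 - p) + b * p

    return Boundaries(
        lerp(start.min_deployments_per_week, target.min_deployments_per_week),
        lerp(start.max_lead_time, target.max_lead_time),
        lerp(start.max_change_failure_rate, target.max_change_failure_rate),
        lerp(start.max_recovery_time, target.max_recovery_time),
    )


def violations(m: Measured, b: Boundaries) -> List[str]:
    """Names of the boundaries that the measured metrics fall outside of."""
    result = []
    if m.deployments_per_week < b.min_deployments_per_week:
        result.append("deployment frequency")
    if m.lead_time >= b.max_lead_time:
        result.append("lead time for changes")
    if m.change_failure_rate >= b.max_change_failure_rate:
        result.append("change failure rate")
    if m.recovery_time >= b.max_recovery_time:
        result.append("failure recovery time")
    return result


if __name__ == "__main__":
    this_quarter = Measured(1, timedelta(days=5), 0.2, timedelta(days=2))
    print(violations(this_quarter, START))        # []
    print(len(violations(this_quarter, TARGET)))  # 4
```

The check only says where a team is outside its boundaries, not how to get back inside them; how the team works stays its own decision.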
Colin Bryar, Bill Carr: Working Backwards - Insights, Stories, and Secrets from Inside Amazon
In the late nineties and early noughties, Amazon faced explosive growth in revenue, customers and employees. The software running their website grew into a big ball of mud, where nearly every piece of code depended on a single database containing all their business information. An innocent code change could bring down their entire business.
It required that we first untangle our monolithic software architecture and the organizational structures that had grown alongside it, then replace both, step by step, with systems designed to support rapid innovation.
Colin Bryar & Bill Carr, p. 54
In other words, Amazon had experienced Conway’s law first hand. The tangled mess of team dependencies was mirrored by the tangled mess of software dependencies. Amazon did what most companies do in this situation. They introduced heavyweight change processes and required more and better coordination and communication between the teams. They hired countless cross-team coordinators.
At last we realized that all this cross-team communication didn’t really need refinement at all - it needed elimination. […] We finally grasped the true identity of our problem: the ever-expanding cost of coordination among teams.
Colin Bryar & Bill Carr, p. 61 (emphasis mine)
This realisation gave birth to the microservices architecture made popular by AWS.
[Jeff Bezos] suggested that each software team should build and clearly document a set of application program interfaces (APIs) for all their systems/services. […] Jeff’s vision was that we needed to focus on loosely coupled interaction via machines through well-defined APIs rather than via humans through emails and meetings. This would free each team to act autonomously and move faster.
Colin Bryar & Bill Carr, p. 61 (emphasis mine)
Then, Amazon applied the reverse Conway manoeuvre. They defined team APIs along the lines of the service APIs. Conway’s law would ensure that the hundreds of loosely coupled teams - better known as two-pizza teams - created the desired microservices architecture.
Two-pizza teams had fewer than ten people, required hardly any coordination with other teams, owned their part of the business, were self-funding and were “led by a multi-disciplined top-flight leader [who] must have deep technical expertise, know how to hire world-class software engineers and product managers, and possess excellent business judgement”. Such all-round leaders were incredibly rare and, as Amazon found out, not a must-have.
The biggest predictor of a team’s success was not whether it was small but whether it had a leader with the appropriate skills, authority, and experience to staff and manage a team whose sole focus was to get the job done.
Colin Bryar & Bill Carr, p. 75 (emphasis mine)
Leading such a team is not a part-time job: it is the only thing the leader works on. In analogy to a single-threaded program, which works on only one thing at a time, Amazon came up with the name “single-threaded leader”. “Separable teams” can work autonomously with hardly any coordination between them. Hence, the new name for two-pizza teams is “separable teams with single-threaded leaders”. How can you identify separable teams?
A good rule of thumb to see if a team has sufficient autonomy is deployment - can the team build and roll out their changes without coupling, coordination, and approvals from other teams? If the answer is no, then one solution is to carve out a small piece of functionality that can be autonomous and repeat.
Colin Bryar & Bill Carr, p. 77 (emphasis mine)
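Bryar and Carr’s rule of thumb can be turned into a simple checklist. The sketch below is a minimal illustration under the assumption that a team writes down every step of its build-and-rollout path together with the other team, if any, whose coordination or approval that step needs. The step names and teams are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeploymentStep:
    name: str
    # Team whose coordination or approval this step needs; None if no other team is involved.
    needs_team: Optional[str] = None


def external_couplings(team: str, steps: List[DeploymentStep]) -> List[str]:
    """Steps on the build-and-rollout path that need another team."""
    return [s.name for s in steps if s.needs_team not in (None, team)]


def is_separable(team: str, steps: List[DeploymentStep]) -> bool:
    """Rule of thumb: can the team build and roll out its changes on its own?"""
    return not external_couplings(team, steps)


if __name__ == "__main__":
    steps = [
        DeploymentStep("build"),
        DeploymentStep("database schema migration", needs_team="db-experts"),
        DeploymentStep("rollout", needs_team="team-a"),
    ]
    print(is_separable("team-a", steps))        # False
    print(external_couplings("team-a", steps))  # ['database schema migration']
```

Each step that external_couplings reports is a candidate for the next small piece of functionality to carve out, after which the check is repeated.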
Joyce Nilsson Orsini (Editor): The Essential Deming - Leadership Principles from the Father of Quality
Building quality in is one of the five key principles of Continuous Delivery. The State of DevOps reports give ample evidence that high-performing teams achieve both better quality and higher speed (see Farley above). Deming understood this 40 years ago.
[As a consequence of] better quality […] costs go down and productivity goes up […]. Better quality at lower price has a chance to capture a market. Cutting costs without improvement of quality is futile.
Edwards Deming, p. 38
This quote is from Chapter 2 titled “Quality is Made in the Boardroom”. Quality is one of the boundaries (see Derby above) that leaders must set. Leaders are responsible for setting and checking the boundaries. Then, they should step back and let the developers unleash their creativity to get better software into the hands of customers faster.
There are […] four ways to improve quality of product and service:
Innovation in product and service
Innovation in process
Improvement of existing product and service
Improvement of existing process
Edwards Deming, p. 41
Points 3 and 4 optimise the efficiency of an existing product or process. Improving the accuracy of image recognition by 1% or reducing the failure rate by 3% through better tests are characteristic examples. In contrast, the innovation of points 1 and 2 increases quality by an order of magnitude or makes new things possible.
The use of deep neural networks was a game changer for image recognition. Special cameras can detect with higher accuracy than humans whether chips have manufacturing defects, perfumes are fakes or human cells are cancerous. Using Continuous Delivery for software development is a quality leap over using Scrum or SAFe.
When you try to change a complex system like a software development organisation, you will often wish for the following. At least, I do. And it’s good to be in good company.
Anytime you say something, people will give you ten reasons why you can’t do it. What I want to hear is the one reason you’re going to do it.
Edwards Deming, p. 37