CS 489: Software Delivery
Shane McIntosh
Estimated study time: 1 hr 17 min
Sources and References
Primary textbook — None required
Supplementary texts
- Jez Humble and David Farley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Addison-Wesley, 2010.
- Nicole Forsgren, Jez Humble, and Gene Kim, Accelerate: The Science of Lean Software and DevOps, IT Revolution Press, 2018.
- Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy (eds.), Site Reliability Engineering: How Google Runs Production Systems, O’Reilly Media, 2016.
- Gene Kim, Jez Humble, Patrick Debois, and John Willis, The DevOps Handbook, IT Revolution Press, 2016.
- Gene Kim, Kevin Behr, and George Spafford, The Phoenix Project, IT Revolution Press, 2013.
- Brendan Burns, Joe Beda, and Kelsey Hightower, Kubernetes: Up and Running, O’Reilly Media, 3rd edition, 2022.
- Kief Morris, Infrastructure as Code, O’Reilly Media, 2nd edition, 2020.
- Titus Winters, Tom Manshreck, and Hyrum Wright, Software Engineering at Google: Lessons Learned from Programming Over Time, O’Reilly Media, 2020.
- Michael T. Nygard, Release It! Design and Deploy Production-Ready Software, Pragmatic Bookshelf, 2nd edition, 2018.
Online resources
- DORA Team, Accelerate State of DevOps Report 2024, Google Cloud, 2024.
- MIT, The Missing Semester of Your CS Education, 2020.
- Bazel Documentation, https://bazel.build
- Terraform Documentation, https://developer.hashicorp.com/terraform
- Kubernetes Documentation, https://kubernetes.io/docs
- Engineering blogs from Google, Netflix, Meta, Shopify, Spotify, and GitHub.
Chapter 1: Foundations of Software Delivery
Software delivery is the discipline of getting code from a developer’s machine into the hands of users reliably, repeatedly, and rapidly. It sounds deceptively simple, yet the history of software engineering is littered with spectacular delivery failures: missed launch windows, corrupted deployments, multi-day outages triggered by a single misconfigured line, and releases so painful that teams dreaded them. This chapter sets the stage by examining why delivery is hard, how the industry arrived at its current practices, and how we measure whether we are actually getting better.
1.1 The Software Delivery Challenge
In the early decades of commercial software, delivery was an event. Teams spent months or years writing code, then handed a golden master to a release engineering group who burned it onto physical media or uploaded it to an FTP server. The feedback loop from writing a line of code to learning whether it worked in the real world could stretch to a year or more. Bugs discovered in production meant expensive patch cycles, and the cost of a botched release was enormous because another attempt might be months away.
Several forces made this model increasingly untenable. First, the internet compressed user expectations. Web applications could be updated at any time, and users began to expect rapid iteration. Second, competitive pressure rewarded speed: the company that shipped a feature first captured the market. Third, system complexity exploded. Modern applications are not monolithic binaries but constellations of microservices, third-party APIs, managed databases, and client-side code running on an astounding variety of devices. Coordinating a release across dozens of interdependent services is a fundamentally different problem from shipping a single executable.
The consequence is that modern software delivery is not a phase at the end of a project. It is a continuous, automated, and deeply technical discipline that begins the moment a developer creates a branch and does not end until the change is safely running in production and being monitored.
1.2 From Waterfall to Agile to DevOps
The waterfall model, formalized by Winston Royce in 1970 (who, ironically, presented it as a flawed strawman), treated software development as a sequence of distinct phases: requirements, design, implementation, verification, and maintenance. Each phase completed before the next began. Delivery was the tail end, an afterthought separated from development by a wall of documentation and sign-offs.
Agile methodologies, emerging in the early 2000s from the Agile Manifesto, shortened the feedback cycle by organizing work into sprints of one to four weeks. Teams delivered working software incrementally. However, many organizations adopted agile for development while leaving operations untouched, creating a new bottleneck: development teams could produce features every two weeks, but the operations team still deployed quarterly.
The DevOps movement emerged around 2008-2009 to dissolve this wall between development and operations. The term comes from Patrick Debois, who organized the first DevOpsDays conference in Ghent, Belgium, in 2009. DevOps is not a specific tool or job title but a cultural and technical philosophy built on several pillars:
- Shared ownership: Developers are responsible for the operability of their code, and operators participate in design decisions.
- Automation: Every repeatable process, from building to testing to deploying to monitoring, should be automated.
- Measurement: You cannot improve what you do not measure. Instrument everything.
- Rapid feedback: Shorten the time between making a change and learning its impact.
Companies that embraced DevOps early, such as Flickr (famous for its 2009 presentation “10+ Deploys Per Day”), Amazon, Etsy, and Netflix, demonstrated that frequent, low-risk deployments were not only possible but safer than infrequent, high-ceremony releases.
1.3 DORA Metrics and Measuring Delivery Performance
How do you know if your delivery pipeline is any good? For years, this question produced hand-waving and anecdotes. The DORA (DevOps Research and Assessment) team, led by Dr. Nicole Forsgren, Jez Humble, and Gene Kim, changed that by applying rigorous statistical methods to survey data from tens of thousands of professionals worldwide.
Their research, published in the annual Accelerate State of DevOps Report and the book Accelerate, identified four key metrics that predict both software delivery performance and organizational performance:
Deployment Frequency: How often does your organization deploy code to production? Elite teams deploy on demand, often multiple times per day. Low performers deploy between once per month and once every six months.
Lead Time for Changes: How long does it take for a commit to reach production? Elite teams achieve lead times of less than one day. Low performers measure lead times in months.
Mean Time to Restore (MTTR): When a service incident occurs, how quickly can you restore service? Elite teams recover in under one hour. Low performers take between one week and one month.
Change Failure Rate: What percentage of deployments cause a failure in production that requires remediation? Elite teams maintain a change failure rate around 0-15%. Low performers see rates of 46-60%.
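Two of these four metrics can be computed directly from deployment records. As a sketch, assuming a hypothetical log with one line per production deployment and an ok/failed outcome, deployment count and change failure rate fall out of a few lines of shell:

```shell
cd "$(mktemp -d)"

# Hypothetical deployment log: one line per production deploy.
cat > deploys.log <<'EOF'
2025-01-02 ok
2025-01-03 ok
2025-01-05 failed
2025-01-06 ok
2025-01-08 ok
EOF

total=$(grep -c '' deploys.log)            # total deployments in the window
failed=$(grep -c ' failed$' deploys.log)   # deployments that needed remediation
cfr=$(awk -v f="$failed" -v t="$total" 'BEGIN { printf "%d", 100 * f / t }')
echo "deployments: $total, change failure rate: ${cfr}%"
```

Lead time and MTTR need timestamps from both ends of the pipeline (commit time, deploy time, incident open and close), which is why mature teams derive these metrics from their CI/CD and incident tooling rather than by hand.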
The 2024 DORA report, based on responses from over 39,000 professionals, revealed several striking findings. Elite-level teams, those with sub-day lead times, on-demand deployment, a roughly 5% failure rate, and sub-hour recovery, represented only about 19% of respondents. The gap between elite and low performers was staggering: as large as 182x in deployment frequency and 127x in change lead time. The report also found that transformational leadership is a key driver of high performance, with leaders who communicate a clear vision and actively support their teams reducing burnout while boosting both job satisfaction and delivery metrics.
Perhaps most provocatively, the 2024 report found that AI adoption in coding was associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability. The researchers attributed this not to AI being inherently harmful but to the tendency for AI-assisted development to produce larger changesets, and larger batch sizes introduce more risk. Thirty-nine percent of respondents reported little to no trust in AI-generated code.
These metrics matter because they are not vanity metrics. The DORA research demonstrates a statistically significant relationship between delivery performance and organizational outcomes including profitability, market share, and employee satisfaction. In other words, getting better at delivery is not just an engineering concern; it is a business imperative.
Chapter 2: Version Control Systems
Version control is the bedrock of software delivery. Every other practice in this course, from building to testing to deploying, depends on the assumption that there is a single, authoritative, well-understood history of all changes to the codebase. This chapter examines the internal mechanics of Git, the dominant version control system, and explores the branching strategies, hooks, and repository structures that high-performing teams use in practice.
2.1 Git Internals: The Object Model
Git is fundamentally a content-addressable filesystem with a version control interface layered on top. Understanding its internal data model demystifies many of its behaviors and makes advanced operations far less intimidating.
2.1.1 The Four Object Types
Every piece of data Git stores is an object, identified by the SHA-1 hash of its contents (recent Git versions can also initialize repositories with SHA-256 object names). There are four object types:
Blob: A blob stores the contents of a single file. It does not store the filename, permissions, or any metadata, just the raw bytes. If two files in different directories have identical contents, Git stores only one blob.
Tree: A tree represents a directory. It contains a list of entries, each of which is a mode (file permissions), a type (blob or tree), a SHA-1 hash, and a filename. Trees can reference other trees, forming a recursive structure that represents the entire directory hierarchy of a project at a given point in time.
Commit: A commit object points to a single tree (the root directory of the project at that moment), zero or more parent commits (zero for the initial commit, one for a normal commit, two or more for a merge), and metadata: the author, the committer, a timestamp, and a commit message.
Tag: An annotated tag object points to another object (usually a commit) and includes a tagger, a date, and a message. Lightweight tags are simply refs and do not create tag objects.
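You can inspect all of this directly with Git's plumbing commands. A minimal sketch (the repository, file name, and message are arbitrary): make one commit, then ask git cat-file for the type and contents of each object behind it:

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev
echo 'hello, git' > greeting.txt
git add greeting.txt
git commit -qm 'initial commit'

# Resolve the three objects behind this one commit.
commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
blob=$(git rev-parse HEAD:greeting.txt)

git cat-file -t "$commit"    # commit
git cat-file -t "$tree"      # tree
git cat-file -t "$blob"      # blob
git cat-file -p "$blob"      # hello, git
```

Running git cat-file -p on the tree object prints exactly the entries described above: a mode, a type, a SHA-1 hash, and a filename per line.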
2.1.2 The Directed Acyclic Graph (DAG)
Commits form a directed acyclic graph (DAG). Each commit points backward to its parent(s), forming a chain of history. Branches and tags are simply pointers (refs) to specific commits in this graph. The HEAD ref points to the current branch, which in turn points to the latest commit on that branch. Understanding the DAG is essential for reasoning about merges, rebases, and cherry-picks.
When you create a new commit, Git does not copy the entire tree. Instead, it creates a new tree object that reuses unchanged subtrees and blobs from the parent. This structural sharing makes Git extremely space-efficient even for large repositories.
2.1.3 Packfiles and Compression
Initially, each object is stored as a separate file in .git/objects, a format called loose objects. Over time, Git runs garbage collection (git gc) and packs objects into packfiles. A packfile stores objects as deltas against similar objects, dramatically reducing storage. Git uses heuristics to find good delta bases, often achieving compression ratios of 10:1 or better. This is why a Git clone of a large project often downloads far less data than you might expect.
2.1.4 The Index (Staging Area)
The index, also called the staging area, is a binary file at .git/index that serves as a buffer between the working directory and the next commit. When you run git add, you are copying a snapshot of the file into the index. When you run git commit, Git creates a tree from the contents of the index. This three-state model (working directory, index, repository) gives developers fine-grained control over what goes into each commit.
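A quick experiment (repository and file names are arbitrary) makes the three states visible: after staging one version and then editing the file again, HEAD, the index, and the working directory each hold a different snapshot:

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo v1 > file.txt
git add file.txt
git commit -qm 'commit v1'

echo v2 > file.txt
git add file.txt          # the index now holds a snapshot of v2
echo v3 > file.txt        # the working directory moves on to v3

head_ver=$(git show HEAD:file.txt)   # v1 -- the last commit
staged=$(git show :file.txt)         # v2 -- the staged snapshot
working=$(cat file.txt)              # v3 -- the working directory
echo "$head_ver $staged $working"    # v1 v2 v3
```

The :path syntax names the index copy of a file; this is the snapshot that git commit would turn into a tree, regardless of later edits in the working directory.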
2.2 Branching Strategies
A branching strategy is a team’s convention for when and how to create, merge, and delete branches. The choice of strategy profoundly affects deployment frequency, code review practices, merge conflict rates, and overall delivery velocity.
2.2.1 GitFlow
GitFlow, proposed by Vincent Driessen in 2010, is a branching model with two permanent branches (main and develop) and three categories of temporary branches:
- Feature branches diverge from develop and merge back into develop when the feature is complete.
- Release branches diverge from develop when enough features have accumulated for a release. Final testing and stabilization happen on the release branch, which is then merged into both main (tagged with a version number) and back into develop.
- Hotfix branches diverge from main to fix critical production bugs and are merged into both main and develop.
GitFlow was well-suited to a world of infrequent, planned releases. It remains relevant for products that ship versioned releases (desktop software, mobile apps with app store review cycles, embedded systems) and for teams that need strict separation between “what is released” and “what is in development.” However, for teams practicing continuous delivery, GitFlow introduces unnecessary overhead. The long-lived develop branch accumulates drift from main, merge conflicts pile up, and the ritual of release branches creates delays.
A 2023 JetBrains Developer Survey found that only about 22% of teams still use GitFlow, down significantly from its peak. The trend is clear: teams that deploy continuously are moving toward simpler models.
2.2.2 GitHub Flow
GitHub Flow is a dramatically simpler model: there is one permanent branch (main), and all work happens on short-lived feature branches. When a feature branch is ready, the developer opens a pull request, the code is reviewed and tested, and the branch is merged into main. Deployment happens from main, either automatically or on demand.
GitHub Flow works well for SaaS applications where there is only one version in production at a time. Its simplicity reduces cognitive overhead and encourages small, frequent merges. Many modern teams in 2025, especially those building web applications and cloud-native systems, use GitHub Flow or a close variant as a pragmatic balance between control and speed.
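The whole lifecycle fits in a handful of commands. The sketch below simulates the flow inside a single local repository (in practice the push, review, and CI steps happen on the hosting platform); branch and file names are invented, and git init -b assumes Git 2.28 or later:

```shell
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git config user.email dev@example.com
git config user.name Dev
echo 'app v1' > app.txt
git add app.txt
git commit -qm 'initial release'

# 1. Branch off main for one small, focused change.
git switch -qc add-greeting
echo 'greeting feature' > greeting.txt
git add greeting.txt
git commit -qm 'feat: add greeting'

# 2. In real life: push the branch, open a PR, get review and green CI.
# 3. Merge into main and delete the short-lived branch.
git switch -q main
git merge -q --no-ff -m 'Merge pull request: add-greeting' add-greeting
git branch -qd add-greeting
```

Because the branch lived for hours rather than weeks, the merge is trivial and main stays continuously deployable.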
2.2.3 Trunk-Based Development
Trunk-based development (TBD) takes the simplification further. All developers commit directly to a single branch (the trunk, usually main). Feature branches, if they exist at all, are extremely short-lived (hours, not days or weeks) and are merged at least once a day. Integration happens continuously, not in batches.
TBD is the branching model most strongly correlated with high delivery performance in the DORA research. Google, Meta, and Netflix all practice variants of trunk-based development. At Google, over 25,000 engineers commit to a single monolithic repository, and the trunk receives thousands of commits per day.
The key enablers of TBD are:
- Feature flags: Incomplete features are hidden behind flags so that half-finished code can be committed to trunk without affecting users. LaunchDarkly, Unleash, and Flagsmith are popular feature flag management platforms.
- Comprehensive automated testing: Since every commit immediately integrates with everyone else’s work, the test suite must be fast and reliable enough to catch regressions within minutes.
- Small batch sizes: Developers make small, incremental changes rather than large feature branches. This reduces merge conflicts and makes each change easier to review, test, and (if necessary) revert.
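The feature-flag idea is simple enough to sketch in a few lines of shell; a real system would read flag state from a management service such as LaunchDarkly or Unleash rather than an environment variable, and the function and flag names here are invented:

```shell
# Evaluate a flag from the environment (a stand-in for a flag service).
flag_enabled() {
  eval "val=\${FLAG_$1:-false}"
  [ "$val" = true ]
}

render_banner() {
  if flag_enabled NEW_BANNER; then
    echo 'new banner'     # code path merged to trunk but dark
  else
    echo 'old banner'     # current behavior for all users
  fi
}

before=$(render_banner)   # flag unset: old path runs
FLAG_NEW_BANNER=true
after=$(render_banner)    # flag on: new path runs, with no redeploy
echo "$before -> $after"  # old banner -> new banner
```

The half-finished code ships to production on every trunk commit, but no user sees it until the flag flips; flipping it back is the rollback.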
2.2.4 Choosing a Strategy
There is no universally correct branching model. The right choice depends on your deployment model, team size, regulatory constraints, and organizational culture. However, the empirical evidence is clear: teams that integrate more frequently, with smaller batch sizes, achieve better delivery outcomes. If your branching strategy creates long-lived branches and large merges, you are probably leaving performance on the table.
2.3 Git Hooks
Git hooks are scripts that Git executes at specific points in its workflow. They are stored in .git/hooks/ (or a configured directory) and provide a mechanism for enforcing quality gates, automating tasks, and integrating Git with other tools.
2.3.1 Client-Side Hooks
pre-commit: Runs before a commit is created. Commonly used to lint code, check formatting, scan for secrets (tools like detect-secrets or truffleHog), and verify that tests pass. The pre-commit framework (pre-commit.com) provides a language-agnostic hook management system with a large ecosystem of existing hooks.
prepare-commit-msg: Runs after the default commit message is created but before the editor opens. Useful for injecting information like the branch name or a ticket number into the commit message.
commit-msg: Runs after the developer writes the commit message. Used to enforce message conventions such as Conventional Commits (e.g., feat:, fix:, chore:). The tool commitlint is widely used for this purpose.
pre-push: Runs before a push to a remote. Can be used to run a heavier test suite or verify that the branch is up to date.
2.3.2 Server-Side Hooks
pre-receive: Runs on the server before any refs are updated. The most powerful enforcement point. Can reject pushes that do not meet policy (e.g., unsigned commits, pushes to protected branches, changes that fail static analysis).
update: Similar to pre-receive but runs once per ref being updated, allowing per-branch policies.
post-receive: Runs after a push is accepted. Commonly used to trigger CI pipelines, send notifications, or update dashboards.
2.3.3 Hooks in Practice
At companies with mature delivery pipelines, hooks are a first line of defense. Shopify, for example, uses pre-commit hooks to catch common issues before code even reaches CI, saving both developer time and CI compute. Server-side pre-receive hooks enforce branch protection rules, required review policies, and security scanning at the platform level, independent of any individual developer’s local configuration.
2.4 Monorepos vs. Polyrepos
A monorepo is a single repository that contains all of an organization’s code. A polyrepo (or multi-repo) structure uses a separate repository for each project or service.
2.4.1 The Monorepo Approach
Google is the most famous monorepo practitioner. Their repository contains billions of lines of code, and virtually all of Google’s software is stored in it. The advantages include:
- Atomic cross-project changes: A single commit can update a library and all of its consumers simultaneously, eliminating version skew.
- Simplified dependency management: There is one version of every library, always at head. Diamond dependency problems are impossible.
- Unified tooling: One build system, one test framework, one CI configuration.
- Code discoverability: All code is searchable in one place.
Meta also uses large monorepos, historically managed with Mercurial (and more recently with Sapling, Meta's open-source successor to it). Microsoft moved the Windows codebase into a Git monorepo using their VFS for Git technology.
The challenges of monorepos are significant: standard Git does not scale well to repositories with millions of files. Google built their own version control system (Piper) with a virtual filesystem (CitC). Meta contributed heavily to Mercurial’s scalability. Microsoft built VFS for Git (now called Scalar) to handle the Windows repository. More recently, Git’s built-in sparse checkout and partial clone features have made large monorepos more feasible with standard Git.
2.4.2 The Polyrepo Approach
Most organizations use polyrepos, often because they started that way and the switching cost is high. Polyrepos provide clear ownership boundaries, simpler CI configuration per project, and the ability to use different languages and toolchains in different repos without conflict. However, they make cross-project changes difficult (requiring coordinated pull requests across multiple repos), can lead to version drift between shared libraries, and require more sophisticated dependency management.
2.4.3 Hybrid Approaches
Many organizations adopt a middle ground. They might have a monorepo for their core platform services and separate repos for isolated components, or they might use a tool like Turborepo or Nx to get monorepo-like experiences across a polyrepo structure.
2.5 Code Review Workflows
Code review is the process of having other developers examine changes before they are merged. It is one of the most effective quality gates in software delivery.
Pull requests (GitHub, Bitbucket) or merge requests (GitLab) are the dominant code review mechanism for teams using GitHub Flow or similar branching models. A developer pushes a branch, opens a PR, and one or more reviewers approve the changes before merge.
Gerrit, developed by Google, implements a patch-based review workflow. Each commit is reviewed independently, and reviews happen before code is merged into the mainline. Gerrit is used by the Android Open Source Project and several other large open-source communities.
Phabricator, originally developed at Meta, provided a review workflow based on “diffs” (similar to Gerrit’s patches) and was widely used at Meta, Uber, and other companies before being sunset.
Google’s internal code review tool, Critique, enforces that every change to the monorepo is reviewed by at least one other engineer. Their research has found that code review catches about 15% of bugs that would otherwise reach production, but its primary value lies in knowledge sharing, maintaining code consistency, and ensuring that more than one person understands every change.
Chapter 3: Build Systems
A build system is the automated process that transforms source code into executable artifacts. At its simplest, a build system runs a compiler. At scale, it orchestrates thousands of compilation, linking, testing, and packaging steps across a distributed cluster of machines. This chapter progresses from foundational concepts through low-level tools (Make), abstraction-based systems (Maven, Gradle), and framework-driven systems (Bazel), culminating in artifact management practices.
3.1 Build System Fundamentals
3.1.1 The Dependency Graph
Every build system, at its core, constructs and traverses a dependency graph. Nodes represent artifacts (source files, object files, libraries, executables), and edges represent dependencies between them. A correct build executes steps in a topologically sorted order, ensuring that every input to a step is up to date before the step runs.
3.1.2 Incremental Builds
A naive build system recompiles everything from scratch every time. An incremental build system detects which outputs are out of date with respect to their inputs and only rebuilds those. The simplest heuristic is timestamp comparison (used by Make): if a source file is newer than its output, rebuild. More sophisticated systems use content hashing: compute a hash of all inputs (source files, compiler flags, environment variables) and rebuild only if the hash has changed.
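The difference between the two heuristics is easy to demonstrate. A sketch of content-based invalidation (the file names are invented): hash every declared input together into a cache key, so timestamp churn is ignored but any real change forces a rebuild:

```shell
cd "$(mktemp -d)"
echo 'source v1' > main.src
echo '-O2' > flags.cfg

# Cache key = hash of all declared inputs (sources AND flags).
cache_key() { cat main.src flags.cfg | sha256sum | cut -d' ' -f1; }

key1=$(cache_key)
touch main.src                  # timestamp bump: Make would rebuild here...
key2=$(cache_key)               # ...but the content hash is unchanged
echo 'source v2' > main.src     # a genuine edit
key3=$(cache_key)

if [ "$key1" = "$key2" ]; then echo 'timestamp-only change: cache hit'; fi
if [ "$key1" != "$key3" ]; then echo 'content change: rebuild'; fi
```

Hashing the flags file alongside the sources matters: changing compiler options must invalidate outputs even when no source file changed, a case timestamp schemes miss entirely.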
3.1.3 Reproducibility
A build is reproducible if running it with the same inputs always produces the same outputs, regardless of when or where it is run. Reproducibility is critical for debugging (can you reproduce a production bug with the exact binary that is deployed?), security (can you verify that a binary was built from the claimed source?), and caching (can you safely reuse a cached output?).
Threats to reproducibility include timestamps embedded in binaries, non-deterministic linking order, floating dependency versions, and differences in compiler versions or system libraries. Hermetic build systems like Bazel address these threats by controlling the entire build environment.
3.1.4 Build vs. Task
An important conceptual distinction exists between build systems and task runners. A build system understands dependencies between artifacts and computes the minimal set of actions needed to produce an up-to-date output. A task runner simply executes a predefined sequence of commands. Many tools blur this line. Make is a build system that is often used as a task runner. npm scripts are a task runner that people sometimes use as a build system. Using the right tool for the right job matters at scale.
3.2 Low-Level Build Systems: Make
Make, created by Stuart Feldman at Bell Labs in 1976, is one of the oldest and most widely used build tools. It remains the standard build system for C and C++ projects on Unix-like systems and is the foundation on which many higher-level tools are built.
3.2.1 Makefile Syntax
A Makefile consists of rules, each of which has a target, prerequisites, and a recipe:
target: prerequisites
	recipe
The recipe lines must be indented with a tab character (not spaces). This is perhaps the most notorious syntax decision in computing history.
A concrete example:
CC = gcc
CFLAGS = -Wall -O2

main: main.o utils.o
	$(CC) $(CFLAGS) -o main main.o utils.o

main.o: main.c utils.h
	$(CC) $(CFLAGS) -c main.c

utils.o: utils.c utils.h
	$(CC) $(CFLAGS) -c utils.c

clean:
	rm -f main *.o
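Running make against a Makefile like this shows the dependency logic in action. The toy below substitutes cp for the compiler so the example runs anywhere; the file names are invented, and the recipe's leading tab is written with printf:

```shell
demo=$(mktemp -d); cd "$demo"
# A one-rule Makefile; the recipe line must start with a tab.
printf 'greeting.txt: name.txt\n\tcp name.txt greeting.txt\n' > Makefile
echo hello > name.txt

make                 # builds greeting.txt from name.txt
make                 # nothing to do: target is newer than its prerequisite
touch name.txt       # bump the timestamp without changing contents
make                 # rebuilds anyway -- Make only compares timestamps
cat greeting.txt     # hello
```

The third invocation illustrates the timestamp fragility discussed below: the contents never changed, yet Make re-ran the recipe.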
3.2.2 Pattern Rules and Automatic Variables
Make supports pattern rules that use % as a wildcard:
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
Here, $< is the first prerequisite, $@ is the target, and $^ would be all prerequisites. These automatic variables reduce repetition but make Makefiles harder to read for newcomers.
3.2.3 Phony Targets
A phony target is a target that does not correspond to a file. The clean target above is a classic example. Without declaring it phony, Make would skip the recipe if a file named clean happened to exist:
.PHONY: clean
clean:
	rm -f main *.o
3.2.4 Limitations of Make
Make was designed for a world of C programs on a single Unix machine. Its limitations become apparent in modern software development:
- No built-in dependency discovery: Make does not understand the semantics of any programming language. The developer must manually declare that main.o depends on utils.h. Forgetting a dependency leads to stale builds.
- Timestamp-based invalidation: Make uses file modification times to determine staleness. This is fragile: touching a file without changing its contents triggers unnecessary rebuilds, while resetting timestamps (e.g., after a git checkout) can cause Make to skip needed rebuilds.
- No native support for sandboxing or hermeticity: Make executes recipes as shell commands with full access to the file system and environment. Builds are not reproducible by default.
- Poor scalability: Make runs on a single machine. While -j enables parallel execution of independent targets, there is no built-in support for distributed builds.
Despite these limitations, Make remains valuable for small to medium projects and is universally available. Many developers also use Make as a lightweight task runner, wrapping more complex build commands in a simple make build, make test, make deploy interface.
3.3 Abstraction-Based Build Systems: Maven and Gradle
As Java became dominant in enterprise software, the community needed build systems that understood Java’s conventions: source directory structures, classpath management, dependency resolution from remote repositories, and standard lifecycle phases.
3.3.1 Maven
Apache Maven, released in 2004, introduced several ideas that transformed build system design:
Convention over configuration: Maven defines a standard directory layout (src/main/java, src/test/java, etc.) and a standard build lifecycle. If you follow the conventions, you need almost zero configuration.
The POM (Project Object Model): Every Maven project is described by a pom.xml file that specifies the project’s coordinates (groupId, artifactId, version), its dependencies, and any plugin configurations.
Dependency resolution: Maven downloads dependencies from remote repositories (Maven Central, corporate Nexus instances) and manages transitive dependencies automatically. This was revolutionary in the early 2000s.
Build lifecycle: Maven defines a fixed sequence of phases: validate, compile, test, package, verify, install, deploy. Plugins bind goals to phases. Running mvn package executes all phases up to and including package.
Maven’s strength is its standardization. A developer who has seen one Maven project can navigate any Maven project. Its weakness is rigidity. Deviating from Maven’s conventions requires writing plugins, which is heavyweight. The XML-based POM files are verbose and hard to read at scale. Dependency conflicts (“dependency hell”) are a perennial challenge, particularly diamond dependencies where two libraries require incompatible versions of a third.
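For reference, a near-minimal pom.xml looks like the sketch below. The group and artifact coordinates are invented for illustration; the Guava dependency shows the coordinate triple that Maven resolves from Maven Central:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Coordinates identifying this artifact (hypothetical names) -->
  <groupId>edu.cs489</groupId>
  <artifactId>demo-app</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>

  <!-- Transitive dependencies of Guava are resolved automatically -->
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>33.0.0-jre</version>
    </dependency>
  </dependencies>
</project>
```

Even this small example hints at the verbosity complaint: roughly twenty lines of XML to express a name and one dependency.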
3.3.2 Gradle
Gradle, which reached its 1.0 release in 2012 (early versions date back to 2007), aimed to combine Maven’s dependency management and convention-based approach with the flexibility of a real programming language. Build scripts are written in Groovy or Kotlin DSL, enabling conditional logic, loops, and custom task definitions without leaving the build file.
Key features of Gradle:
Task graph: Gradle builds a directed acyclic graph of tasks and executes them in dependency order. Unlike Maven’s fixed lifecycle, Gradle’s task graph is fully customizable.
Incremental builds: Gradle tracks the inputs and outputs of each task. If neither has changed, the task is skipped. This is more sophisticated than Make’s timestamp approach.
Build cache: Gradle can cache task outputs locally or in a shared remote cache. If another developer (or a CI server) has already built the same inputs, Gradle can reuse the output without re-executing the task.
Multi-project builds: Gradle handles monorepo-style multi-project builds natively, with fine-grained control over inter-project dependencies.
Gradle is the standard build system for Android development and is widely used in the JVM ecosystem. Its flexibility is both a strength and a weakness: Gradle build scripts can become complex, and debugging build logic written in Groovy can be challenging. The Kotlin DSL, now the default for new projects, provides better IDE support and type safety.
3.4 Framework-Driven Build Systems: Bazel
Bazel is Google’s open-source build system, derived from their internal system Blaze, which has been in use at Google since 2006. Bazel represents a fundamentally different philosophy from Make, Maven, or Gradle: it prioritizes correctness, reproducibility, and scalability above all else.
3.4.1 Core Principles
Hermeticity: Bazel builds are designed to be hermetic, meaning they depend only on their declared inputs and produce the same outputs regardless of the state of the host machine. Bazel achieves this by running build actions in sandboxes that restrict file system access.
Content-based invalidation: Bazel hashes the contents of all inputs (source files, compiler toolchain, build flags) to determine whether an action needs to be re-executed. This is strictly more correct than timestamp-based invalidation.
Language-agnostic: Bazel supports multiple languages (Java, C++, Go, Python, Rust, and many more) through a rule system. Rules define how to build artifacts in a particular language.
3.4.2 BUILD Files and Starlark
Bazel projects declare their build targets in BUILD files (or BUILD.bazel files). These files use Starlark, a restricted dialect of Python designed for configuration. Starlark is intentionally limited: no I/O, no global mutable state, no unbounded computation. This makes BUILD files analyzable and safe to execute.
A typical BUILD file:
java_library(
    name = "user-service",
    srcs = glob(["src/main/java/**/*.java"]),
    deps = [
        "//common/logging",
        "//common/database",
        "@maven//:com_google_guava_guava",
    ],
    visibility = ["//services:__subpackages__"],
)

java_test(
    name = "user-service-test",
    srcs = glob(["src/test/java/**/*.java"]),
    deps = [
        ":user-service",
        "@maven//:junit_junit",
    ],
)
3.4.3 Remote Caching and Remote Execution
Bazel’s most powerful features for large-scale development are remote caching and remote execution:
Remote caching: Bazel can store the outputs of build actions in a shared cache (e.g., a cloud storage bucket or a dedicated cache server). When any developer or CI machine needs to build the same inputs, Bazel fetches the cached output instead of re-executing the action. This can reduce build times by an order of magnitude for large projects.
Remote execution: Bazel can distribute build actions across a cluster of worker machines. The Remote Execution API (an open standard developed alongside Bazel) defines a protocol for clients to submit actions to a remote execution service. The service manages a pool of worker machines, schedules actions, and returns results. This allows build parallelism to scale far beyond a single machine’s core count.
At BazelCon 2024, the Bazel team announced Bazel 8, a long-term support release bringing significant enhancements to modularity, performance, and dependency management. Bazel 7 had already enabled Build without the Bytes (BwoB) by default for remote builds: BwoB downloads only the outputs of the requested top-level targets instead of every intermediate output, dramatically reducing network bandwidth and local disk usage. The Bazel 9.0 release, planned for late 2025, includes support for asynchronous execution to further increase remote execution parallelism.
Google’s internal build system processes millions of builds executing millions of test cases and producing petabytes of build outputs from billions of lines of source code every day. The scale is staggering, and the remote execution architecture is what makes it possible.
3.4.4 Buck2 and Pants
Bazel is not the only framework-driven build system. Buck2 (Meta) is a Rust-based rewrite of the original Buck build system, designed for Meta’s monorepo and emphasizing performance and incrementality. Pants (originally developed at Twitter) targets Python, Go, Java, and Scala projects and provides a more gentle onboarding experience than Bazel for smaller teams.
3.5 Artifact Management
Once a build system produces artifacts (JAR files, Docker images, npm packages, binaries), those artifacts need to be stored, versioned, and distributed. Artifact management is the practice of doing this reliably.
3.5.1 Repository Managers
- JFrog Artifactory and Sonatype Nexus are the two dominant universal artifact repository managers. They can proxy remote repositories (Maven Central, npm registry, PyPI), host private artifacts, and enforce security policies.
- Container registries (Docker Hub, Amazon ECR, Google Artifact Registry, Harbor) store and distribute container images.
- Language-specific registries (npm for JavaScript, PyPI for Python, crates.io for Rust) serve as canonical distribution points for open-source packages.
3.5.2 Semantic Versioning
Semantic versioning (SemVer) provides a convention for version numbers: MAJOR.MINOR.PATCH. Incrementing MAJOR signals breaking changes, MINOR signals backward-compatible new features, and PATCH signals backward-compatible bug fixes. SemVer enables dependency managers to express compatibility constraints (e.g., ^1.2.3 means “any version >= 1.2.3 and < 2.0.0”).
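The compatibility rule behind a caret constraint is mechanical enough to sketch. The following Python functions are illustrative (`parse` and `satisfies_caret` are our names, not part of any real dependency manager), covering the common MAJOR >= 1 case:

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split "MAJOR.MINOR.PATCH" into a comparable tuple of ints."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)

def satisfies_caret(version: str, constraint: str) -> bool:
    """Check a caret constraint like "^1.2.3": >= 1.2.3 and < 2.0.0,
    i.e. same MAJOR, so no breaking changes per the SemVer contract.

    NOTE: real npm semantics treat ^0.x.y specially (0.x minor bumps
    count as breaking); this sketch covers only MAJOR >= 1.
    """
    base = parse(constraint.lstrip("^"))
    v = parse(version)
    return v >= base and v[0] == base[0]
```

Tuple comparison gives the lexicographic ordering SemVer requires: `(1, 10, 0)` correctly sorts after `(1, 9, 2)`, which naive string comparison would get wrong.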
In practice, SemVer is an imperfect social contract. Libraries sometimes introduce breaking changes in minor versions, and the definition of “breaking” can be surprisingly nuanced (Hyrum’s Law: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody”).
3.5.3 Software Bills of Materials (SBOMs)
Modern supply chain security practices increasingly require SBOMs, machine-readable inventories of all components in a software artifact. Formats include SPDX and CycloneDX. SBOMs enable vulnerability scanning, license compliance, and provenance tracking. The SLSA (Supply-chain Levels for Software Artifacts) framework defines a maturity model for artifact integrity, from basic build provenance (SLSA Level 1) to hermetic, reproducible builds with signed provenance (SLSA Level 3+).
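To make the idea of a machine-readable inventory concrete, here is a minimal sketch that assembles a CycloneDX-style JSON document in Python. The field names follow the CycloneDX specification, but this is a simplified illustration, not a compliant generator (real SBOMs carry required metadata, component hashes, and more), and `make_sbom` is our own name:

```python
import json

def make_sbom(components: list[dict]) -> str:
    """Assemble a minimal CycloneDX-style JSON SBOM from a list of
    {"name", "version", "purl"} dicts (purl = package URL)."""
    bom = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": c["name"],
                "version": c["version"],
                "purl": c.get("purl"),
            }
            for c in components
        ],
    }
    return json.dumps(bom, indent=2)
```

A vulnerability scanner consuming this document can match each component's `purl` and version against a CVE database without ever inspecting the artifact itself.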
Chapter 4: Continuous Integration and Continuous Delivery
Continuous integration (CI) and continuous delivery (CD) are the practices of automatically building, testing, and preparing software for release every time a developer commits a change. Together, they form the backbone of modern software delivery pipelines. This chapter covers CI/CD principles, the architecture of CI servers, pipeline design patterns, and the role of testing.
4.1 Continuous Integration Principles
CI was popularized by Kent Beck as part of Extreme Programming in the late 1990s and codified by Martin Fowler in his influential article (originally published in 2000 and substantially revised in 2006). The core principles are:
Maintain a single source repository: All developers work against the same codebase. This does not preclude branches, but branches should be short-lived and integrate frequently.
Automate the build: The build should be a single command that anyone can run. It should produce a deployable artifact.
Make the build self-testing: The build should include automated tests. If the tests fail, the build fails.
Everyone commits to the mainline every day: The longer you wait to integrate, the more painful it becomes. Frequent integration keeps merge conflicts small and catches regressions quickly.
Every commit triggers a build: An automated system should build and test every change within minutes of it being committed.
Fix broken builds immediately: A broken build is a stop-the-line event. The team’s top priority is to restore the build to a green state.
Keep the build fast: If the build takes too long, developers will stop waiting for it. A primary build should complete in under ten minutes.
The goal of CI is to provide rapid feedback. A developer should know within minutes whether their change integrates cleanly, compiles, and passes the test suite. This tight feedback loop is what enables the small batch sizes and frequent integration that the DORA research correlates with high performance.
4.2 CI Server Architectures
A CI server watches a repository for changes, triggers builds, runs tests, and reports results. The market has evolved from self-hosted monoliths to cloud-native, pipeline-as-code platforms.
4.2.1 Jenkins
Jenkins (originally Hudson, forked in 2011) is the granddaddy of CI servers. It is open source, self-hosted, and extensible through a massive plugin ecosystem (over 1,800 plugins). Jenkins can build almost anything but requires significant operational investment. You must manage the server, install and update plugins, configure security, and scale worker nodes. Jenkinsfile, introduced with Jenkins Pipeline, allows pipeline definitions to be stored in the repository alongside the code, which was a major improvement over the earlier UI-based job configuration.
4.2.2 GitHub Actions
GitHub Actions is a CI/CD platform integrated directly into GitHub. Workflows are defined in YAML files stored in .github/workflows/. Actions provides a marketplace of reusable workflow steps, built-in support for matrix builds (testing across multiple OS and language versions), and tight integration with GitHub’s pull request and deployment features. For teams already using GitHub, Actions reduces the operational overhead of running a separate CI server.
4.2.3 GitLab CI
GitLab CI is built into GitLab and configured via a .gitlab-ci.yml file at the root of the repository. It provides a complete DevOps platform (source control, CI/CD, container registry, monitoring) in a single application. Its Auto DevOps feature can automatically detect the project type and configure a sensible pipeline.
4.2.4 Buildkite and CircleCI
Buildkite takes a hybrid approach: the control plane is cloud-hosted, but agents (the machines that run builds) are self-hosted. This gives teams the scalability of a SaaS platform with the security and customizability of self-hosted agents. Shopify uses Buildkite to run builds across thousands of agents.
CircleCI is a cloud-native CI/CD platform known for its fast startup times, Docker-native workflows, and orbs (reusable configuration packages).
4.3 Pipeline Design
A delivery pipeline is the automated manifestation of your path to production. Good pipeline design balances speed, reliability, and cost.
4.3.1 Stages and Gates
A typical pipeline progresses through stages:
- Build: Compile the code and produce an artifact.
- Unit tests: Run fast, isolated tests.
- Integration tests: Test interactions between components.
- Security scanning: Static analysis (SAST), dependency vulnerability scanning (SCA), and secret detection.
- Staging deployment: Deploy to a production-like environment.
- End-to-end tests: Run tests against the staging environment.
- Production deployment: Deploy to production (possibly via a canary or blue-green strategy).
Each stage acts as a gate: if it fails, the pipeline stops and the change does not progress further. This ensures that only changes that pass all quality checks reach production.
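The gating behavior can be expressed as a short loop. This Python sketch is illustrative (the `run_pipeline` function and the result dict shape are our own, not any CI platform's API): stages run in order, and the first failure short-circuits everything after it.

```python
def run_pipeline(stages):
    """Run (name, stage_fn) pairs in order; each acts as a gate.

    The first failing stage stops the pipeline, so later stages
    (including deployment) never execute for a bad change.
    """
    completed = []
    for name, stage_fn in stages:
        if not stage_fn():
            return {"status": "failed", "failed_stage": name, "completed": completed}
        completed.append(name)
    return {"status": "passed", "completed": completed}
```

Ordering stages cheapest-first (build, then unit tests, then integration tests) means most bad changes fail fast, before consuming expensive staging resources.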
4.3.2 Parallelism and Fan-In/Fan-Out
Pipeline stages that do not depend on each other can run in parallel. A common pattern is fan-out/fan-in: the build stage produces an artifact, then multiple test stages (unit tests, integration tests, security scanning, linting) run in parallel against that artifact, and a deployment stage fans in, waiting for all test stages to complete before proceeding.
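The fan-out/fan-in shape maps naturally onto ordinary concurrency primitives. A hedged Python sketch (the `fan_out_fan_in` function is our own illustration, using a thread pool to stand in for parallel CI runners):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out_fan_in(build_fn, checks, deploy_fn):
    """Build once, fan out independent checks against the artifact in
    parallel, then fan in: deploy only if every check passed."""
    artifact = build_fn()
    with ThreadPoolExecutor() as pool:
        # pool.map preserves order and blocks until all checks finish —
        # this blocking is the fan-in.
        results = list(pool.map(lambda check: check(artifact), checks))
    if all(results):
        return deploy_fn(artifact)
    return None  # at least one check failed; do not deploy
```

Because every check runs against the *same* built artifact, the deployment stage ships exactly the bytes that were tested — rebuilding between test and deploy would reintroduce a window for divergence.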
4.3.3 Pipeline as Code
Modern CI/CD platforms define pipelines in configuration files stored in the repository. This is pipeline as code, and it provides the same benefits as infrastructure as code: version control, code review, auditability, and reproducibility. Examples include Jenkinsfile (Groovy-based), .github/workflows/*.yml (YAML), and .gitlab-ci.yml (YAML).
4.4 Continuous Delivery vs. Continuous Deployment
These terms are often confused. Continuous delivery means that every change that passes the pipeline is deployable, but deployment to production requires a manual approval step. Continuous deployment means that every change that passes the pipeline is automatically deployed to production without human intervention.
Continuous delivery is a prerequisite for continuous deployment. Most organizations practice continuous delivery; fewer practice continuous deployment. The choice depends on risk tolerance, regulatory requirements, and organizational culture. A hospital’s electronic health records system might use continuous delivery with manual gates for compliance reasons, while a consumer web application might use continuous deployment.
Release trains are a compromise used by some organizations: changes are batched and deployed on a fixed schedule (e.g., every Tuesday). This provides predictability but sacrifices the ability to ship urgent changes quickly (unless escape hatches like hotfix processes exist).
4.5 Testing in CI/CD
4.5.1 The Test Pyramid
The test pyramid, introduced by Mike Cohn, suggests that a healthy test suite has many fast, isolated unit tests at the base, fewer integration tests in the middle, and very few slow, expensive end-to-end tests at the top. The reasoning is economic: unit tests are cheap to write, fast to run, and precise in their failure messages. End-to-end tests are expensive, slow, and produce failures that are difficult to diagnose.
In practice, many organizations have an “ice cream cone” anti-pattern: too many end-to-end tests, too few unit tests. This leads to slow, flaky pipelines.
4.5.2 Flaky Tests
A flaky test is one that passes and fails intermittently without any change to the code under test. Flaky tests are a major source of waste in delivery pipelines. Developers lose trust in the test suite, start ignoring failures, and eventually bypass the pipeline entirely.
Google has invested heavily in flaky test management. Their approach includes automatically quarantining tests that flake above a threshold, tracking flakiness rates per test, and providing tooling for developers to identify and fix the root causes (often race conditions, dependency on test execution order, or sensitivity to timing).
4.5.3 Test Impact Analysis
Test impact analysis (TIA) is the practice of running only the tests that are affected by a given change, rather than the entire test suite. By analyzing code coverage data and the dependency graph, TIA systems can dramatically reduce CI run times. Microsoft’s TIA system reduced test execution time by 90% while still catching 99.9% of regressions.
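In its simplest form, TIA is a set intersection between changed files and each test's coverage footprint. The sketch below is a deliberate simplification in Python (`affected_tests` is our own name; production systems like Microsoft's also track transitive build-graph dependencies, not just direct coverage):

```python
def affected_tests(changed_files, test_deps):
    """Select only tests whose covered files intersect the change.

    `test_deps` maps test name -> set of files it exercises, e.g.
    derived from per-test coverage data collected on earlier runs.
    """
    changed = set(changed_files)
    return sorted(t for t, deps in test_deps.items() if deps & changed)
```

The trade-off is safety versus speed: stale coverage data can cause a genuinely affected test to be skipped, which is why TIA deployments typically still run the full suite periodically (e.g., nightly) as a backstop.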
4.6 Pipeline Security
Modern pipelines are both a security asset (they enforce quality gates) and a security target (they have access to production credentials). Key security practices include:
- Least-privilege CI credentials: CI jobs should have only the permissions they need. Avoid storing long-lived production credentials in CI; use short-lived tokens (e.g., OIDC federation with cloud providers).
- Signed commits and artifacts: Use GPG-signed commits and Sigstore/cosign for artifact signing to establish provenance.
- Dependency scanning: Tools like Dependabot, Renovate, and Snyk automatically detect known vulnerabilities in dependencies and open pull requests with updates.
Chapter 5: Waste and Acceleration in Delivery Pipelines
Delivery pipelines are not free. They consume compute resources, developer time, and organizational attention. This chapter applies lean thinking to software delivery, identifying sources of waste and techniques for acceleration.
5.1 Lean Principles Applied to Software Delivery
Lean manufacturing, pioneered by Toyota, aims to maximize customer value while minimizing waste. The lean philosophy has been adapted to software development by Mary and Tom Poppendieck in Lean Software Development and to delivery pipelines by the DevOps movement.
A value stream map traces the journey of a change from idea to production, identifying every step, handoff, and wait time. In many organizations, the actual work (writing code, running tests) takes hours, but the elapsed time from commit to production takes days or weeks because of queues, approvals, and manual steps.
5.2 Types of Waste in Delivery Pipelines
Lean identifies several categories of waste. Applied to delivery pipelines:
Waiting: A developer’s pull request sits in a queue for code review for two days. The CI pipeline queues for 30 minutes because all runners are busy. A deployment waits for a change advisory board that meets weekly.
Handoffs: The code is written by one team, reviewed by another, deployed by a third, and monitored by a fourth. Each handoff introduces delay, context loss, and potential miscommunication.
Rework: A bug is discovered in production that should have been caught by a test. A deployment fails because the staging environment does not match production. A PR is rejected after days of review because it does not meet architectural standards that were not communicated.
Overprocessing: Running the entire test suite (including irrelevant end-to-end tests) on every commit. Requiring three levels of manual approval for a one-line documentation change. Maintaining elaborate branching strategies that add ceremony without value.
Context switching: A developer is interrupted to investigate a flaky test failure, loses 20 minutes of context on their current task, and then discovers the failure is unrelated to their change.
Partially done work: Long-lived feature branches represent inventory, work that has been started but not delivered. The value of the code is zero until it reaches users.
5.3 Acceleration Techniques
5.3.1 Build Caching
Both local and remote build caches can dramatically reduce build times. If a developer’s change touches only one module of a multi-module project, a build cache ensures that only that module is rebuilt. Gradle’s build cache, Bazel’s remote cache, and tools like Turborepo for JavaScript monorepos all implement this pattern. Organizations report 60-90% reductions in average build times after implementing remote caching.
5.3.2 Test Parallelization and Sharding
Splitting a test suite across multiple machines (sharding) reduces wall-clock test time. If your test suite takes 60 minutes on one machine, running it on 10 machines can reduce it to roughly 6 minutes (plus overhead for distribution and result aggregation). Tools like Buildkite’s parallelism feature, CircleCI’s test splitting, and Bazel’s built-in test sharding make this straightforward.
5.3.3 Incremental Analysis
Rather than running all linters and static analysis tools on the entire codebase, incremental analysis tools operate only on the files changed in a given commit. Facebook’s Infer, for example, can run in “diff mode,” analyzing only changed files and their transitive dependents.
5.3.4 Merge Queues
A merge queue (available in GitHub, GitLab, and Mergify) serializes merges to the mainline, running CI on each change rebased on top of the previous one. This prevents the “semantic merge conflict” problem where two changes each pass CI independently but fail when combined. Merge queues also batch changes together when possible, running a single CI pipeline for multiple merged changes.
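The serialization logic can be simulated in a few lines. This is an illustrative Python model (names and data shapes are ours, not any platform's API): each queued change is tested on top of the mainline *plus* every previously accepted change, which is exactly what catches semantic conflicts that independent CI runs miss.

```python
def merge_queue(mainline, queue):
    """Process (name, passes_ci) pairs in order.

    Each change is validated against a candidate state that already
    includes every previously accepted change, so two changes that
    pass CI independently but conflict together are caught here.
    """
    candidate = list(mainline)
    merged, rejected = [], []
    for name, passes_ci in queue:
        trial = candidate + [name]
        if passes_ci(trial):
            candidate = trial
            merged.append(name)
        else:
            rejected.append(name)
    return candidate, merged, rejected
```

In the test below, each change passes CI on its own, but combining them breaks the build (one renames a function, the other calls the old name); the queue admits the first and rejects the second.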
5.3.5 Ephemeral Environments
Spinning up a fresh environment for each pull request (using containers or cloud infrastructure) eliminates the bottleneck of shared staging environments. Services like Vercel Preview Deployments, Netlify Deploy Previews, and custom Kubernetes-based systems create these environments automatically.
5.4 Case Studies of Pipeline Optimization
5.4.1 Shopify
Shopify’s monolith, one of the largest Ruby on Rails applications in the world, had a CI pipeline that took over 30 minutes. They invested in test parallelization, intelligent test selection (running only tests affected by the change), and build caching. The result was a CI pipeline that completes in under 10 minutes, enabling their thousands of developers to maintain a rapid commit-to-production cycle.
5.4.2 Google
Google’s TAP (Test Automation Platform) runs millions of tests per day. To manage this scale, Google heavily invests in test impact analysis (only running affected tests), aggressive caching and reuse of test results, and prioritization of tests based on historical failure rates. Their target is that every commit receives feedback within minutes, despite the enormous scale of the codebase.
5.4.3 Netflix
Netflix’s pipeline is designed for maximum velocity. Changes flow from commit through automated testing and canary analysis to production in hours, not days. Their CI system integrates with Spinnaker (their open-source continuous delivery platform) for automated canary deployments, where a small percentage of traffic is routed to the new version and metrics are compared against the baseline before full rollout.
Chapter 6: Containerization and Container Orchestration
Containers have fundamentally changed how software is packaged, distributed, and run. This chapter covers the Linux kernel features that make containers possible, Docker as the dominant container runtime, and Kubernetes as the standard orchestration platform.
6.1 Container Fundamentals
A container is a lightweight, isolated execution environment that shares the host operating system’s kernel. Unlike virtual machines, which include a full guest OS, containers package only the application and its dependencies. This makes them fast to start (milliseconds, not minutes), efficient in resource usage, and highly portable.
6.1.1 Linux Namespaces
Containers are built on Linux namespaces, a kernel feature that partitions system resources so that each partition appears to be an independent system. The key namespaces are:
| Namespace | Isolates |
|---|---|
| PID | Process IDs (each container sees its own PID 1) |
| NET | Network interfaces, routing tables, firewall rules |
| MNT | Mount points (each container has its own filesystem view) |
| UTS | Hostname and domain name |
| IPC | Inter-process communication resources |
| USER | User and group IDs (allows root inside container without root on host) |
6.1.2 Cgroups
Control groups (cgroups) limit and account for the resource usage (CPU, memory, I/O, network) of a group of processes. While namespaces provide isolation (what a process can see), cgroups provide resource control (how much a process can use). Together, they create the illusion of a dedicated machine.
6.1.3 Union Filesystems
Container images are composed of layers. Each layer represents a set of filesystem changes (files added, modified, or deleted). Layers are stacked using a union filesystem (e.g., OverlayFS) that presents them as a single coherent filesystem. This layering has two major benefits:
- Sharing: Multiple containers can share common base layers (e.g., an Ubuntu base image), reducing storage and download time.
- Caching: When building an image, unchanged layers are cached and reused, making rebuilds fast.
6.1.4 The OCI Specification
The Open Container Initiative (OCI) defines open standards for container image formats and runtime specifications. This ensures interoperability: an image built with Docker can run on any OCI-compliant runtime (containerd, CRI-O, Podman).
6.2 Docker
Docker, released in 2013, did not invent containers (LXC, Solaris Zones, and FreeBSD Jails predate it by years). Docker’s contribution was making containers accessible by providing a user-friendly CLI, a standardized image format, and a public registry (Docker Hub).
6.2.1 Dockerfiles
A Dockerfile is a text file that describes how to build a container image, layer by layer:
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --production
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
Each instruction (FROM, RUN, COPY, etc.) creates a new layer. The order of instructions matters for caching: instructions that change frequently (like COPY . .) should come last so that earlier layers can be cached.
6.2.2 Multi-Stage Builds
Multi-stage builds use multiple FROM statements to create intermediate build environments and copy only the final artifacts into the production image:
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o server .
FROM alpine:3.19
COPY --from=builder /app/server /server
CMD ["/server"]
This produces a minimal production image that contains only the compiled binary and the Alpine base, not the Go compiler, source code, or build dependencies. Multi-stage builds are essential for keeping images small (which reduces attack surface, download time, and storage cost).
6.2.3 Image Optimization and Security
Best practices for production Docker images:
- Use minimal base images: Alpine Linux (~5MB), distroless images (Google’s images that contain only the application runtime, no shell or package manager), or scratch (an empty image, for fully static binaries).
- Run as non-root: The `USER` instruction should specify a non-root user. Running as root inside a container is a security risk because container escapes can grant host-level root access.
- Scan for vulnerabilities: Tools like Trivy, Snyk Container, and Docker Scout scan images for known CVEs in OS packages and application dependencies.
- Pin versions: Use specific image tags (e.g., `node:20.11.0-alpine`) rather than mutable tags like `latest` to ensure reproducibility.
6.3 Container Registries
A container registry stores and distributes container images. Teams need registries for both public and private images:
- Docker Hub: The default public registry. Free for public images, with rate limits on pulls.
- Amazon ECR, Google Artifact Registry, Azure Container Registry: Cloud-provider registries tightly integrated with their respective platforms.
- Harbor: An open-source registry with enterprise features like vulnerability scanning, image signing, and replication.
- GitHub Container Registry (ghcr.io): Integrated with GitHub, useful for open-source projects.
6.4 Kubernetes Fundamentals
Kubernetes (K8s), originally developed at Google and open-sourced in 2014, is the dominant container orchestration platform. It automates the deployment, scaling, and management of containerized applications.
6.4.1 Architecture
A Kubernetes cluster consists of a control plane and worker nodes:
- Control plane: Runs the API server (the cluster’s front door), etcd (a distributed key-value store that holds all cluster state), the scheduler (assigns pods to nodes), and controller managers (reconciliation loops that ensure desired state matches actual state).
- Worker nodes: Run the kubelet (an agent that manages pods on the node), a container runtime (containerd), and kube-proxy (manages networking rules).
6.4.2 Core Resources
| Resource | Purpose |
|---|---|
| Pod | The smallest deployable unit; one or more containers sharing network and storage |
| Deployment | Declares a desired state for pods (replica count, image version) and manages rollouts |
| Service | A stable network endpoint that load-balances traffic across pods |
| ConfigMap | Stores non-sensitive configuration data as key-value pairs |
| Secret | Stores sensitive data (passwords, tokens) with base64 encoding (not encryption by default) |
| StatefulSet | Manages stateful applications with stable network identities and persistent storage |
| Ingress | Manages external HTTP/HTTPS access to services |
6.4.3 Declarative Configuration
Kubernetes is declarative: you describe the desired state of the system (e.g., “I want 3 replicas of my web server running version 2.1”), and Kubernetes continuously reconciles the actual state to match. If a pod crashes, the controller restarts it. If a node fails, the scheduler moves pods to healthy nodes. This reconciliation loop is the heart of Kubernetes’ reliability.
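One pass of such a reconciliation loop can be sketched in Python. This is a drastic simplification of what a Deployment controller does (the function name and pod-naming scheme are ours): compare declared replica count to observed pods and compute the actions needed to converge.

```python
def reconcile(desired_replicas, running_pods, next_id):
    """One reconciliation pass: converge observed pods to the declared
    replica count, returning the new pod list and the actions taken."""
    actions = []
    pods = list(running_pods)
    while len(pods) < desired_replicas:       # too few: create pods
        pods.append(f"pod-{next_id}")
        actions.append(("create", pods[-1]))
        next_id += 1
    while len(pods) > desired_replicas:       # too many: delete pods
        actions.append(("delete", pods.pop()))
    return pods, actions
```

The crucial property is that the loop is driven by *state*, not *events*: whether a pod crashed, a node failed, or an operator deleted something by hand, the next pass observes the discrepancy and converges, with no memory of how the drift occurred.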
6.4.4 Kubernetes Networking and Service Mesh
Every pod in a Kubernetes cluster gets its own IP address. Pods can communicate with each other directly, without NAT. Services provide stable virtual IPs that route traffic to the appropriate pods.
For more sophisticated networking requirements (mutual TLS, traffic shaping, observability, circuit breaking), organizations deploy a service mesh:
- Istio: The most feature-rich service mesh, providing traffic management, security (mutual TLS), and observability through sidecar proxies (Envoy).
- Linkerd: A lighter-weight alternative to Istio, focused on simplicity and performance.
6.4.5 Helm and GitOps
Helm is a package manager for Kubernetes. Helm charts are templated Kubernetes manifests that can be parameterized and versioned, making it easier to deploy complex applications and manage configuration across environments.
GitOps extends the “everything as code” principle to Kubernetes operations. The desired state of the cluster is stored in a Git repository, and a GitOps operator continuously syncs the cluster to match:
- ArgoCD: Watches a Git repository for changes and automatically applies them to the cluster. Provides a web UI for visualizing the state of deployments.
- Flux: A CNCF project that provides GitOps for Kubernetes, with support for Helm charts, Kustomize, and plain manifests.
GitOps provides an audit trail (every change is a Git commit), rollback capability (revert the Git commit), and a familiar workflow (pull requests for infrastructure changes).
Chapter 7: Infrastructure as Code
Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure through machine-readable configuration files rather than interactive configuration tools or manual processes. This chapter covers IaC principles, major tools, testing strategies, and the GitOps workflow.
7.1 IaC Principles
7.1.1 Declarative vs. Imperative
- Declarative IaC describes the desired end state (“I want a virtual machine with 4 CPUs, 16 GB RAM, running Ubuntu 22.04 in us-east-1”), and the tool figures out how to achieve it. Terraform and Kubernetes manifests are declarative.
- Imperative IaC describes the steps to reach the desired state (“Create a VM, then configure its network, then install these packages”). Bash scripts and some Ansible playbooks are imperative.
Declarative IaC is generally preferred because it is idempotent: running the same configuration multiple times produces the same result without side effects. If the infrastructure already matches the desired state, nothing happens.
7.1.2 Idempotency
An operation is idempotent if performing it multiple times has the same effect as performing it once. IaC tools must be idempotent to be safe. If you accidentally run your IaC tool twice, it should not create duplicate resources. This property is easy to achieve with declarative tools (compare desired state to actual state, compute the diff, apply only the changes) but requires discipline with imperative tools.
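The diff-and-apply mechanism is worth seeing concretely. This Python sketch models a declarative tool's core loop in miniature (the `plan`/`apply` names echo Terraform's vocabulary, but the code is our own illustration, with resources as simple dicts):

```python
def plan(desired, actual):
    """Compute the diff a declarative tool would apply."""
    create = {k: v for k, v in desired.items() if k not in actual}
    update = {k: v for k, v in desired.items()
              if k in actual and actual[k] != v}
    delete = [k for k in actual if k not in desired]
    return create, update, delete

def apply(desired, actual):
    """Apply the plan. Idempotent: once actual == desired, the plan is
    empty and re-running changes nothing."""
    create, update, delete = plan(desired, actual)
    new_state = {**actual, **create, **update}
    for k in delete:
        new_state.pop(k)
    return new_state
```

Because `apply` only acts on the computed diff, running it twice is harmless: the second run sees an empty plan. An imperative script ("create a VM") lacks this property unless every step is hand-written to check for prior existence.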
7.1.3 Immutable Infrastructure
Immutable infrastructure is the practice of never modifying a running server. Instead of SSHing into a server to install a security patch, you build a new image with the patch applied, deploy new servers from that image, and decommission the old servers. This eliminates configuration drift (the slow divergence between servers that were ostensibly configured identically) and makes rollback trivial (just redeploy the previous image).
Netflix pioneered this approach with their “Bake and Deploy” model: AMIs (Amazon Machine Images) are baked with all application code and configuration, and deployments simply swap out the old AMI for the new one.
7.2 Terraform
Terraform, created by HashiCorp and first released in 2014, is the most widely adopted IaC tool. It manages infrastructure across hundreds of cloud providers and services through a plugin system.
7.2.1 HCL (HashiCorp Configuration Language)
Terraform configurations are written in HCL, a declarative language designed for infrastructure:
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}

resource "aws_security_group" "web_sg" {
  name = "web-sg"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
7.2.2 The Terraform Workflow
Terraform follows a plan-and-apply workflow:
- `terraform init`: Downloads provider plugins and initializes the backend.
- `terraform plan`: Compares the desired state (configuration files) to the actual state (recorded in the state file and queried from the cloud provider) and produces an execution plan showing what will be created, modified, or destroyed.
- `terraform apply`: Executes the plan after human confirmation.
This workflow provides a critical safety net: you can review exactly what Terraform will do before it does it. In a CI/CD pipeline, `terraform plan` runs on every pull request, and `terraform apply` runs after merge with appropriate approvals.
7.2.3 State Management
Terraform records the current state of managed infrastructure in a state file. This file is the source of truth for what Terraform has created. Key considerations:
- Remote state: In a team environment, the state file must be stored remotely (S3 + DynamoDB locking, Terraform Cloud, or a similar backend) so that team members do not overwrite each other’s changes.
- State locking: Concurrent `terraform apply` operations on the same state can corrupt it. Remote backends provide locking to prevent this.
- State sensitivity: The state file may contain sensitive values (database passwords, API keys). It should be encrypted at rest and access-controlled.
7.2.4 Modules and Workspaces
Modules are reusable, composable units of Terraform configuration. A module encapsulates a set of related resources (e.g., a “VPC module” that creates a VPC, subnets, route tables, and NAT gateways) with input variables and output values. Organizations build internal module registries to enforce standardized infrastructure patterns.
Workspaces provide a mechanism for managing multiple environments (dev, staging, production) from the same configuration, each with its own state file. However, many teams prefer separate configuration directories per environment for greater isolation.
7.3 Configuration Management
While Terraform excels at provisioning infrastructure (creating VMs, databases, load balancers), configuration management tools handle what happens inside a server: installing packages, writing configuration files, managing services.
7.3.1 Ansible
Ansible (Red Hat) uses push-based configuration management. A control machine SSHes into target servers and executes tasks defined in YAML playbooks. No agent is required on the target. Ansible’s simplicity (just SSH and Python) makes it popular for smaller environments and for tasks that do not justify the complexity of a full configuration management system.
7.3.2 Chef and Puppet
Chef and Puppet use pull-based models. An agent runs on each server, periodically checking in with a central server to retrieve its desired configuration and converging the local state to match. This model scales well (the servers pull their own configuration rather than being pushed to) but requires more infrastructure to operate.
7.3.3 The Shift Toward Immutable Infrastructure
In modern cloud-native architectures, configuration management is increasingly replaced by immutable infrastructure. Rather than configuring a running server, you bake the configuration into the image (using Packer, Docker, or cloud-native image builders) and deploy new instances. Configuration management tools still have a role in organizations with legacy infrastructure or in building those images, but the trend is clearly toward immutability.
7.4 Cloud-Native IaC
7.4.1 AWS CDK
The AWS Cloud Development Kit (CDK) allows developers to define infrastructure using general-purpose programming languages (TypeScript, Python, Java, Go, C#). CDK synthesizes CloudFormation templates from the code and deploys them. The advantage is that developers can use familiar programming constructs (loops, conditionals, functions, classes) and benefit from IDE support (autocompletion, type checking).
7.4.2 Pulumi
Pulumi takes a similar approach but is cloud-agnostic, supporting AWS, Azure, GCP, and Kubernetes. Pulumi programs are written in TypeScript, Python, Go, C#, or Java and interact with a state management backend similar to Terraform’s.
The programming-language-based approach to IaC is gaining traction because it eliminates the need to learn a domain-specific language (HCL, CloudFormation YAML) and allows infrastructure definitions to be tested with standard testing frameworks.
7.5 IaC Testing and Validation
Infrastructure code should be tested like application code. Levels of testing include:
Static analysis: Tools like tflint, checkov, and tfsec analyze Terraform configurations for errors, security misconfigurations, and best-practice violations without actually provisioning anything.
Policy as code: Open Policy Agent (OPA) and HashiCorp Sentinel allow organizations to define policies (e.g., “all S3 buckets must have encryption enabled,” “no resources may be created outside approved regions”) that are evaluated against Terraform plans before apply runs.
Unit tests: Terratest (a Go library by Gruntwork) provisions real infrastructure in a test account, validates that it behaves correctly, and tears it down. While slower than static analysis, this catches issues that static tools miss.
Integration tests: Testing the interaction between multiple infrastructure components (e.g., verifying that a load balancer correctly routes traffic to instances in an auto-scaling group).
7.6 GitOps Workflow for Infrastructure
In a mature GitOps workflow for infrastructure:
- A developer proposes an infrastructure change by opening a pull request that modifies Terraform configuration.
- CI automatically runs terraform plan and posts the plan as a comment on the PR.
- Reviewers examine both the code change and the plan output.
- After approval and merge, CI runs terraform apply.
- A drift detection job periodically runs terraform plan against the live infrastructure to detect manual changes (drift) and alerts the team.
This workflow provides auditability (every change is a Git commit with a reviewer), safety (plan before apply), and accountability. Tools like Atlantis, Spacelift, and Terraform Cloud automate this workflow.
7.7 Multi-Cloud and Platform Engineering
Large organizations often operate across multiple cloud providers, either by choice (avoiding vendor lock-in, regulatory requirements) or by circumstance (mergers, acquisitions). Terraform’s provider model makes multi-cloud management feasible from a single toolchain, though abstractions across providers inevitably leak.
Platform engineering teams build internal developer platforms (IDPs) that abstract away infrastructure complexity. Instead of writing Terraform directly, a developer might click “create a new service” in a portal that generates a standardized Terraform configuration, provisions a CI/CD pipeline, sets up monitoring, and configures DNS. This is the promise of platform engineering: make the paved road so smooth that developers naturally follow it.
Chapter 8: Deployment Strategies and Resiliency
Getting code to production is only half the battle. It must also stay running, recover gracefully from failures, and degrade under load rather than collapse. This chapter covers deployment strategies that minimize risk, resilience patterns that prevent cascading failures, and observability practices that enable rapid diagnosis.
8.1 Deployment Strategies
8.1.1 Rolling Updates
A rolling update gradually replaces old instances with new ones. At any point during the rollout, some instances are running the old version and some are running the new version. Kubernetes Deployments implement rolling updates by default, with configurable parameters for the maximum number of unavailable pods and the maximum number of extra pods during the transition.
Rolling updates are simple and work well when the old and new versions are compatible (both versions may serve traffic simultaneously). They can be slow for large deployments and do not provide an easy rollback mechanism beyond “roll forward to the old version.”
8.1.2 Blue-Green Deployments
In a blue-green deployment, two identical production environments exist: blue (the current live environment) and green (the idle environment). A new version is deployed to the idle environment, tested, and then traffic is switched from blue to green by updating the load balancer or DNS. Rollback is instantaneous: switch traffic back to the original environment.
Blue-green deployments require double the infrastructure (both environments must be fully provisioned). They work best for stateless applications where the switch can be atomic. Database schema changes complicate blue-green deployments because both versions may need to read from the same database.
8.1.3 Canary Releases
A canary release routes a small percentage of production traffic (e.g., 1-5%) to the new version while the majority continues to be served by the old version. The new version is monitored for errors, latency increases, and other anomalies. If the canary looks healthy, traffic is gradually shifted until 100% of traffic is on the new version. If problems are detected, the canary is killed and all traffic returns to the old version.
Netflix’s Spinnaker automates canary analysis by comparing metrics between the canary and a baseline (a fresh deployment of the old version receiving the same percentage of traffic) using statistical tests. Kayenta, Spinnaker’s canary analysis engine, produces a score that determines whether the canary should be promoted or rolled back.
8.1.4 Feature Flags
Feature flags (also called feature toggles) decouple deployment from release. Code is deployed to production with the feature hidden behind a flag. The flag can be turned on for specific users, a percentage of traffic, or specific regions without a new deployment. This allows:
- Dark launches: Deploy a feature and enable it for internal users or a small beta group before general availability.
- Kill switches: Instantly disable a problematic feature without a rollback.
- A/B testing: Route different user segments to different feature variants.
Feature flag management platforms (LaunchDarkly, Unleash, Flagsmith, Split) provide SDKs, targeting rules, and audit logs. The discipline required is to clean up stale flags; a codebase littered with abandoned flags becomes a maintenance burden.
8.2 Chaos Engineering
Chaos engineering is the discipline of experimenting on a distributed system to build confidence in the system’s ability to withstand turbulent conditions in production. It was pioneered by Netflix, who reasoned that the best way to know if your system can survive failure is to actually cause failures, in production, during business hours.
8.2.1 The Simian Army
Netflix’s suite of chaos tools, collectively known as the Simian Army, includes:
- Chaos Monkey: Randomly terminates production instances to ensure that services can tolerate instance failure.
- Chaos Kong: Simulates the failure of an entire AWS region to test multi-region failover.
- Latency Monkey: Introduces artificial latency into service-to-service calls to test timeout and retry behavior.
The results have been dramatic. Netflix reports that Chaos Monkey and related practices reduced their Mean Time to Recovery by approximately 65%, shifting incident resolution from hours to minutes. By 2025, Netflix serves over 300 million paid subscribers globally with remarkable uptime, a testament to the effectiveness of proactive failure testing.
8.2.2 Chaos Engineering Principles
The Chaos Engineering community has codified several principles:
- Start with a hypothesis: Define what “steady state” looks like (e.g., request success rate above 99.9%) and hypothesize that the system will maintain steady state during the experiment.
- Vary real-world events: Simulate failures that actually happen: instance crashes, network partitions, disk full, DNS failures, clock skew.
- Run experiments in production: Staging environments do not replicate production’s complexity, traffic patterns, or data. (Start with staging if you are new to chaos engineering, but the goal is production.)
- Automate experiments to run continuously: A one-time experiment proves resilience at one point in time. Continuous experimentation catches regressions.
- Minimize blast radius: Start small. Kill one instance, not ten. Use circuit breakers and abort conditions.
8.2.3 Tools Beyond Netflix
- Gremlin: A commercial chaos engineering platform that provides a controlled way to inject failures (CPU pressure, memory exhaustion, network blackhole, DNS failure) into servers, containers, and Kubernetes pods.
- LitmusChaos: A CNCF project for Kubernetes-native chaos engineering.
- AWS Fault Injection Simulator: A managed service for running chaos experiments on AWS infrastructure.
Recent developments in 2025 show chaos engineering tools increasingly incorporating artificial intelligence for proactive resilience testing, including failure prediction, intelligent selection of experiments, and automated validation of cloud resilience.
8.3 Resilience Patterns
Michael Nygard’s Release It! and the broader distributed systems literature provide patterns for building resilient services:
8.3.1 Circuit Breakers
A circuit breaker wraps a remote call and monitors for failures. When failures exceed a threshold, the circuit “opens” and subsequent calls fail immediately (or return a fallback response) without attempting the remote call. After a timeout, the circuit enters a “half-open” state and allows a few test calls. If they succeed, the circuit closes; if they fail, it opens again.
Circuit breakers prevent cascading failures: without them, a slow or failing downstream service causes callers to pile up threads and connections, eventually failing themselves, propagating the failure upstream.
Libraries: Netflix Hystrix (now in maintenance mode), Resilience4j (its successor), Polly (.NET), and Alibaba’s Sentinel (Java).
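The closed → open → half-open state machine described above can be sketched in a few lines of Python. This is an illustrative toy, not a substitute for Resilience4j or a similar library; the class name, thresholds, and states are invented for the example:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds before half-open
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # allow a trial call through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        else:
            # Any success resets the failure count and closes the circuit.
            self.failures = 0
            self.state = "closed"
            return result
```

Real implementations add fallback responses, per-endpoint metrics, and sliding-window failure rates rather than a simple counter.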
8.3.2 Bulkheads
A bulkhead isolates components so that a failure in one does not bring down the others. Named after the watertight compartments in a ship, bulkheads in software take forms like:
- Thread pool isolation: Each downstream dependency gets its own thread pool. If one dependency slows down, only its pool is exhausted; other calls are unaffected.
- Process isolation: Running components in separate processes or containers.
- Data isolation: Using separate databases or schemas for different services.
8.3.3 Retries with Backoff and Jitter
Retrying failed requests is essential for handling transient failures, but naive retries can cause thundering herd problems: if a service goes down and 1,000 clients simultaneously retry, the service is overwhelmed when it comes back up. Exponential backoff (doubling the wait time between retries) with jitter (adding a random component to the wait time) spreads out retries and prevents synchronized spikes.
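A sketch of the "full jitter" variant of this idea, in which each delay is drawn uniformly between zero and an exponentially growing (but capped) ceiling. The base, cap, and attempt count are illustrative defaults:

```python
import random

def backoff_delays(base=0.1, cap=10.0, attempts=6):
    """Full-jitter exponential backoff: for attempt k, sleep a random
    duration in [0, min(cap, base * 2**k)] before retrying."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0, ceiling))
    return delays
```

Because every client draws its own random delay, retries from a fleet of clients spread out instead of arriving in synchronized waves.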
8.3.4 Timeouts
Every remote call should have a timeout. Without timeouts, a slow downstream service causes callers to hang indefinitely, consuming resources and eventually failing. Timeouts should be set based on measured latency percentiles (e.g., set the timeout to the 99.9th percentile of normal response times).
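As a minimal illustration of deriving a timeout from measured latencies, the following uses a nearest-rank percentile; the sample values and the 1.5× headroom factor are invented for the example:

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q percent
    of the values at or below it (q in (0, 100])."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Example: set the timeout with headroom above the observed tail latency.
latencies_ms = [12, 15, 14, 13, 250, 16, 14, 13, 15, 12]  # measured samples
timeout_ms = 1.5 * percentile(latencies_ms, 99.0)
```

In practice the percentile comes from a metrics system over a large window of healthy traffic, not a handful of samples.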
8.4 Observability
Observability is the ability to understand the internal state of a system from its external outputs. In modern distributed systems, observability is built on three pillars:
8.4.1 Metrics
Metrics are numerical measurements collected at regular intervals. They answer questions like “What is the request rate?”, “What is the 99th percentile latency?”, “How much memory is the service using?”
- Prometheus is the de facto standard for metrics collection in cloud-native environments. It uses a pull-based model (scraping metrics endpoints at regular intervals) and provides a powerful query language (PromQL).
- Grafana provides visualization and dashboarding for Prometheus and other data sources.
The RED method (Rate, Errors, Duration) provides a starting point for service-level metrics. The USE method (Utilization, Saturation, Errors) is useful for infrastructure-level metrics.
8.4.2 Logs
Logs are discrete, timestamped records of events. Structured logging (JSON format with consistent fields) makes logs machine-parseable and queryable. Centralized log aggregation (ELK stack: Elasticsearch, Logstash, Kibana; or Grafana Loki) enables searching across all services.
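As a sketch, structured logging with Python's standard logging module might look like the following; the field names are illustrative, and real systems typically add request IDs, service names, and trace context:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""
    def format(self, record):
        return json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("order placed")  # emits one machine-parseable JSON line
```

One-JSON-object-per-line output is exactly what aggregators like Elasticsearch or Loki expect to ingest and index.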
8.4.3 Traces
Distributed traces follow a request as it traverses multiple services. Each service adds a span (a named, timed operation) to the trace. The assembled trace shows the full path of a request, where time was spent, and where errors occurred.
OpenTelemetry is the CNCF project that provides a unified standard for metrics, logs, and traces. It offers SDKs for instrumenting applications and a collector for processing and exporting telemetry data to backends like Jaeger, Zipkin, Datadog, or Grafana Tempo.
8.5 Incident Management and Post-Mortems
8.5.1 SLOs, SLIs, and SLAs
Google’s SRE book formalizes the relationship between reliability targets and business objectives:
- Service Level Indicator (SLI): A quantitative measure of service behavior (e.g., the proportion of requests that complete in less than 200ms).
- Service Level Objective (SLO): A target value for an SLI (e.g., 99.9% of requests complete in less than 200ms, measured over a 30-day rolling window).
- Service Level Agreement (SLA): A contractual commitment with consequences for missing the SLO (e.g., refunds, credits).
Error budgets operationalize SLOs. If your SLO is 99.9% availability, your error budget is 0.1%, approximately 43 minutes of downtime per month. When the error budget is healthy, the team ships fast. When the error budget is nearly exhausted, the team slows down and focuses on reliability.
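The error-budget arithmetic above is simple enough to verify directly:

```python
def error_budget_minutes(slo, window_days=30):
    """Allowed downtime per window for an availability SLO given as a
    fraction (e.g., 0.999 for 99.9%)."""
    return (1.0 - slo) * window_days * 24 * 60

# 99.9% over 30 days -> 43.2 minutes of allowed downtime per month.
```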
8.5.2 Blameless Post-Mortems
When incidents occur, high-performing organizations conduct blameless post-mortems. The goal is to understand what happened, why it happened, and what can be done to prevent similar incidents, without assigning personal blame. The reasoning is that blame discourages honest reporting and learning.
A post-mortem document typically includes:
- A timeline of events
- The impact (duration, affected users, revenue impact)
- Root cause analysis (often using the “5 Whys” technique)
- Action items (with owners and deadlines)
- Lessons learned
Google, Atlassian, PagerDuty, and other companies publish post-mortem templates and encourage sharing post-mortems widely within the organization.
Chapter 9: Experimentation and Progressive Delivery
Deploying software is not the end of the story. High-performing organizations use experimentation to make data-driven decisions about what features to ship, how to configure them, and when to roll back. This chapter covers A/B testing, feature flag management, progressive delivery, and experimentation platforms.
9.1 A/B Testing Fundamentals
An A/B test (also called a randomized controlled experiment) compares two or more variants of a system to determine which performs better on a defined metric.
9.1.1 Experimental Design
- Hypothesis: State what you expect. (“Changing the checkout button from gray to green will increase the conversion rate by at least 2%.”)
- Control and treatment: Users are randomly assigned to the control group (existing version) or the treatment group (new variant).
- Metrics: Define primary (conversion rate), secondary (revenue per user), and guardrail metrics (page load time, error rate). Guardrail metrics ensure that an improvement in the primary metric does not come at the cost of degrading something else.
- Sample size and duration: Statistical power analysis determines how many users and how long you need to run the experiment to detect the expected effect size with confidence. Running an experiment for too short a time risks missing a real effect (Type II error); stopping early when results look good inflates the false positive rate (Type I error, also called “peeking”).
- Statistical significance: Typically, results are considered significant if the p-value is below 0.05, i.e., a difference at least as large as the one observed would occur less than 5% of the time if there were no true effect. More sophisticated organizations use Bayesian methods or sequential testing to enable earlier stopping.
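A rough per-group sample-size estimate for a two-proportion test can be sketched with the standard normal-approximation formula. This is a back-of-the-envelope version with z-values hardcoded for a two-sided α of 0.05 and 80% power; a real power analysis should use a statistics library:

```python
import math

def sample_size_per_group(p_base, mde):
    """Approximate users needed per group to detect an absolute lift of
    `mde` over baseline conversion rate `p_base`, assuming a two-sided
    alpha of 0.05 (z = 1.96) and 80% power (z = 0.8416)."""
    z_alpha, z_beta = 1.96, 0.8416
    p_new = p_base + mde
    # Sum of the Bernoulli variances under the two rates.
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil(((z_alpha + z_beta) ** 2) * variance / (mde ** 2))

# e.g., detecting a 2-point lift over a 10% baseline needs ~3,800 users
# per group -- small effects at low baselines require large samples.
```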
9.1.2 Common Pitfalls
- Selection bias: Non-random assignment corrupts results. Ensure the randomization unit (usually user ID) produces balanced groups.
- Novelty effects: Users may interact differently with a new feature simply because it is new, not because it is better. Run experiments long enough for the novelty to wear off.
- Multiple testing: Running many simultaneous experiments increases the chance of false positives. Correction methods (Bonferroni, Benjamini-Hochberg) adjust significance thresholds.
- Interference: In social products, a user’s behavior depends on what their friends see. If the control and treatment groups interact, the treatment effect may leak.
9.2 Feature Flags and Feature Management
We discussed feature flags briefly in the deployment strategies section. Here we examine them as the enabling mechanism for experimentation and progressive delivery.
A feature flag is a conditional branch in code that checks a configuration value (the flag) to determine which code path to execute. In its simplest form:
if feature_flags.is_enabled("new_checkout_flow", user_id=user.id):
    return new_checkout_flow(request)
else:
    return old_checkout_flow(request)
Feature management platforms (LaunchDarkly, Unleash, Flagsmith, Split) provide:
- Targeting rules: Enable a feature for specific users, user segments (e.g., beta testers, employees, users in a specific region), or a percentage of traffic.
- Gradual rollouts: Ramp a feature from 0% to 100% over hours or days.
- Kill switches: Instantly disable a feature if it causes problems.
- Audit logs: Track who changed a flag, when, and why.
- Integration with experimentation: Assign users to experiment variants via flags.
The discipline of feature flag management includes regular cleanup of stale flags (flags that are fully rolled out or abandoned) and treating flag configurations with the same rigor as code (review, version control, testing).
9.3 Progressive Delivery
Progressive delivery is an umbrella term for deployment strategies that gradually expose new code to users while monitoring for problems. It encompasses canary releases, A/B testing, and automated rollback, unified by the principle that you should never expose 100% of your users to untested code.
A progressive delivery pipeline might look like:
- Deploy the new version alongside the old version.
- Route 1% of traffic to the new version (canary).
- Monitor canary metrics for 15 minutes.
- If healthy, increase to 5%, then 25%, then 50%, then 100%.
- At any stage, if error rates or latency exceed thresholds, automatically roll back to the old version.
Tools like Flagger (for Kubernetes), Argo Rollouts, and Netflix’s Spinnaker/Kayenta automate this process.
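The staged ramp described above can be sketched as a loop. Here set_traffic and metrics_healthy are hypothetical callbacks standing in for whatever your service mesh or rollout controller provides:

```python
import time

def progressive_rollout(set_traffic, metrics_healthy,
                        stages=(1, 5, 25, 50, 100), soak_seconds=900):
    """Ramp traffic to the new version stage by stage, rolling back to
    0% if the health check fails at any stage. Returns True on full
    rollout, False on rollback."""
    for pct in stages:
        set_traffic(pct)          # shift this share of traffic to the canary
        time.sleep(soak_seconds)  # soak while metrics accumulate
        if not metrics_healthy():
            set_traffic(0)        # automatic rollback to the old version
            return False
    return True
```

Tools like Flagger and Argo Rollouts implement essentially this loop, but with statistical analysis of the canary's metrics rather than a single boolean check.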
9.4 Experimentation Platforms at Scale
9.4.1 Google
Google’s experimentation platform supports overlapping experiments: a single user can be in multiple experiments simultaneously without the experiments interfering with each other. This is achieved by partitioning the traffic space into orthogonal “layers,” each controlling a different aspect of the product. Google runs thousands of simultaneous experiments on Search alone.
9.4.2 Netflix
Netflix runs experiments on nearly every aspect of its product, from recommendation algorithms to the size of thumbnail images to the wording of error messages. Their internal experimentation platform integrates tightly with their deployment pipeline, so that deploying a new feature and assigning it to an experiment are a single workflow.
9.4.3 Microsoft
Microsoft’s ExP (Experimentation Platform) is one of the largest in the world, running thousands of concurrent experiments across products like Bing, Office, and Xbox. Their research has demonstrated that most ideas (roughly two-thirds) do not improve the metrics they target, underscoring the importance of testing rather than relying on intuition.
9.5 Data-Driven Deployment Decisions
The convergence of experimentation and deployment creates a powerful feedback loop:
- A developer proposes a change.
- The change is deployed behind a feature flag.
- An experiment is created, routing a percentage of traffic to the new variant.
- Automated analysis determines whether the change improves the target metrics without degrading guardrail metrics.
- If successful, the change is fully rolled out and the flag is removed.
- If unsuccessful, the change is reverted and the team learns from the data.
This process replaces HiPPO (Highest Paid Person’s Opinion) decision-making with evidence. It requires investment in experimentation infrastructure, metric pipelines, and a culture that values data over intuition.
Chapter 10: The Modern Software Delivery Landscape
The field of software delivery continues to evolve rapidly. This final chapter surveys emerging trends and technologies that are shaping the future of how we build, ship, and run software.
10.1 Platform Engineering and Internal Developer Platforms
Platform engineering is the discipline of building and maintaining internal developer platforms (IDPs) that provide self-service capabilities to development teams. Rather than each team managing its own infrastructure, CI/CD pipelines, and monitoring, a platform team builds a paved road that other teams follow.
10.1.1 The Rise of Platform Engineering
The 2024 State of Platform Engineering report found that over 65% of enterprises have either built or adopted an internal developer platform. Companies using IDPs report delivering updates up to 40% faster while cutting operational overhead nearly in half. However, the 2024 DORA report adds nuance: organizations amid a platform engineering initiative may see temporary dips in performance as the platform is built and adoption ramps up. The benefits accrue over time as the platform matures.
10.1.2 Backstage
Backstage, originally developed at Spotify and donated to the CNCF, is the most popular open-source framework for building developer portals. At Spotify, Backstage powered over 2,000 microservices, 300+ websites, and 4,000 data pipelines before it was open-sourced. By 2024, over 2 million developers across 3,400+ organizations used Backstage, including Airbnb, LinkedIn, Twilio, and American Airlines.
Backstage provides:
- Software catalog: A centralized inventory of all services, libraries, and infrastructure, with ownership, documentation, and API information.
- Software templates: Standardized project scaffolding that creates a new service with CI/CD, monitoring, and documentation pre-configured.
- TechDocs: Documentation-as-code, rendered from Markdown files stored alongside the source code.
- Plugin ecosystem: Extensible through plugins that integrate with tools like Kubernetes, PagerDuty, CircleCI, and hundreds of others.
The key insight of platform engineering is that an IDP is a product, not a project. The platform team has internal customers (development teams) and must provide a compelling developer experience. A portal that developers do not use is a failed investment.
10.1.3 Alternatives and Commercial Offerings
Other notable platforms include Port, Cortex, Roadie (managed Backstage), and Humanitec. The CNCF launched a Certified Backstage Associate (CBA) certification in late 2024, reflecting the maturity and adoption of the ecosystem.
10.2 Supply Chain Security
Software supply chain attacks (SolarWinds, Log4Shell, XZ Utils) have elevated supply chain security from a niche concern to a board-level issue. Modern delivery pipelines must address the integrity of every component in the chain from source code to running binary.
10.2.1 SLSA Framework
SLSA (Supply-chain Levels for Software Artifacts, pronounced “salsa”) defines a set of incrementally adoptable security guidelines:
| Level | Requirements |
|---|---|
| SLSA 1 | Build process is documented and produces provenance metadata |
| SLSA 2 | Build service is hosted and generates authenticated provenance |
| SLSA 3 | Build platform is hardened and provenance is non-falsifiable |
10.2.2 Sigstore
Sigstore provides free, easy-to-use tools for signing and verifying software artifacts. Its components include:
- Cosign: Signs and verifies container images and other artifacts.
- Fulcio: A certificate authority that issues short-lived certificates tied to OIDC identities (e.g., your GitHub identity).
- Rekor: A transparency log that provides a tamper-evident record of signing events.
10.2.3 SBOMs and Dependency Management
Generating SBOMs (Software Bills of Materials) at build time and scanning them for known vulnerabilities is becoming a regulatory requirement (the U.S. Executive Order 14028 mandates SBOMs for software sold to the federal government). Tools like Syft, Grype, and Trivy automate SBOM generation and vulnerability scanning.
Dependency management tools like Dependabot (GitHub), Renovate, and Snyk automatically detect outdated or vulnerable dependencies and open pull requests with updates. At scale, organizations need policies for how quickly different severity levels of vulnerabilities must be patched.
10.3 AI/ML in Delivery Pipelines
Artificial intelligence is beginning to transform delivery pipelines, though the 2024 DORA data suggests the impact is still nuanced.
10.3.1 Intelligent Test Selection
Machine learning models can predict which tests are most likely to fail for a given change, enabling test suites to be prioritized or pruned. Facebook’s Predictive Test Selection system uses historical test failure data and code change patterns to select a subset of tests for each diff, reducing CI compute by orders of magnitude while maintaining a high defect detection rate.
10.3.2 Automated Code Review
AI-powered code review tools (GitHub Copilot, CodeRabbit, Amazon CodeGuru) can identify bugs, security vulnerabilities, and style issues. These tools augment human reviewers rather than replacing them, flagging potential issues that human reviewers might miss due to fatigue or unfamiliarity with the codebase.
10.3.3 Pipeline Optimization
ML models can predict build times, identify resource bottlenecks, and recommend pipeline configuration changes. They can also detect anomalies in pipeline metrics (sudden increases in build time, flakiness spikes) and alert teams proactively.
10.4 Regulatory Compliance in Delivery
For organizations in regulated industries (healthcare, finance, government), delivery pipelines must satisfy compliance requirements.
10.4.1 Compliance as Code
Rather than manual audits and paper checklists, compliance requirements are encoded as automated checks in the delivery pipeline:
- SOC 2: Requires evidence of change management controls. A CI/CD pipeline with mandatory code review, automated testing, and deployment audit logs provides this evidence automatically.
- HIPAA: Requires access controls and audit trails for systems handling protected health information. IaC with encrypted state, role-based access, and immutable audit logs satisfies these requirements.
- FedRAMP: Requires continuous monitoring and vulnerability management. Automated security scanning in the pipeline, combined with infrastructure compliance tools (AWS Config, Azure Policy), provides continuous assurance.
Policy-as-code tools (Open Policy Agent, HashiCorp Sentinel, AWS Config Rules) enforce compliance rules at the infrastructure level, preventing non-compliant resources from being created.
10.5 The Future of Software Delivery
Several trends are shaping the next wave of software delivery:
AI-augmented development and delivery: Despite the 2024 DORA findings showing current AI adoption correlating with slightly lower performance, the trajectory is toward AI that understands codebases deeply enough to generate tests, suggest deployments, and automate incident response. The key will be managing batch sizes and maintaining human judgment.
Serverless and edge computing: The unit of deployment continues to shrink from servers to containers to functions to edge workers. Each step reduces operational overhead but introduces new challenges for testing, debugging, and observability.
WebAssembly (Wasm): Wasm is emerging as a portable, sandboxed runtime that can run in browsers, on servers, and at the edge. It may eventually complement or replace containers for certain workloads.
Platform engineering maturity: The platform engineering movement will continue to professionalize, with more standardized tooling, certifications, and best practices. The goal is to make the right thing the easy thing for developers.
Zero-trust delivery: Extending zero-trust security principles to the entire delivery pipeline, from developer workstations through CI/CD to production, ensuring that no component is implicitly trusted.
The constant across all of these trends is the principle at the heart of this entire course: software delivery is not a solved problem but a continuously improving practice. The tools change, the platforms evolve, and the scale increases, but the fundamentals endure. Measure your performance. Automate everything you can. Shorten feedback loops. Build resilient systems. And above all, keep shipping.