CS 246E: Object-Oriented Software Development — Modern C++ Extension

Estimated study time: 3 hr 48 min

Sources and References

Stroustrup, Bjarne. A Tour of C++, 3rd ed. (2022). Addison-Wesley. Primary narrative source for C++20 features.

Meyers, Scott. Effective Modern C++ (2014). O’Reilly. Canonical reference for C++11/14 idioms; items cited by number.

Williams, Anthony. C++ Concurrency in Action, 2nd ed. (2019). Manning. Primary source for Subpart C (Concurrency).

Sutter, Herb. Exceptional C++ and More Exceptional C++ (1999, 2001). Addison-Wesley. Advanced idioms: pimpl, type erasure, SFINAE.

Josuttis, Nicolai. The C++ Standard Library, 2nd ed. (2012). Addison-Wesley. Standard library deep-dives.

Alexandrescu, Andrei. Modern C++ Design (2001). Addison-Wesley. Policy-based design (Chapter 59).

cppreference.com — Online reference; all API signatures and constraints verified here.


Problem 35: I Want a Maybe Value

Back in Problem 9 we confronted the same question that haunts every API designer: what should a function return when it has no meaningful answer? We ruled out sentinel integers — there is no magic int that universally means “absent” — and settled on exceptions as the right tool. But exceptions carry a cost and a connotation: they signal something went wrong. Sometimes, though, a missing value is perfectly normal. A configuration file may or may not define a timeout. A map lookup may or may not find a key. A command-line parser may or may not see a particular flag. None of these is an error; each is just a value that might not be there.

We need a type that honestly says so.

The Problem with Sentinels

Suppose we write a function that parses an integer from a string:

int parse_int(const std::string &s) {
    // ...
}

What should parse_int("abc") return? Returning -1 fails immediately: -1 is a valid integer. Returning std::numeric_limits<int>::min() fails for the same reason. There is no integer value we can reserve as an “absence” marker.

The pair approach. We can widen the return type:

std::pair<bool, int> parse_int(const std::string &s) {
    try {
        return {true, std::stoi(s)};
    } catch (...) {
        return {false, 0};
    }
}

The caller must check the bool before using the int:

auto [ok, val] = parse_int("42");
if (ok) std::cout << val;

This works, but nothing in the type system forces the caller to check ok before reading val. If they forget, they silently use a garbage 0. The types lie: the int in the pair appears to always exist, even when it does not.

std::optional: The Type-Safe Maybe

C++17 introduces std::optional<T>, a type that either holds a value of type T or holds nothing. “Nothing” is spelled std::nullopt and has its own distinct type, std::nullopt_t, so it can never be confused with a real T.

#include <optional>

std::optional<int> parse_int(const std::string &s) {
    try {
        return std::stoi(s);   // wraps int in optional automatically
    } catch (...) {
        return std::nullopt;   // explicitly empty
    }
}

The caller now cannot accidentally use the value without checking:

auto result = parse_int("42");
if (result) {               // operator bool: true iff has value
    std::cout << *result;   // dereference to get the int
}
int val = result.value_or(0);  // 0 if absent

Constructing and Accessing an optional

std::optional<int> empty;           // default: no value
std::optional<int> opt{42};         // has value 42
auto opt2 = std::make_optional(42); // same; factory deduces T

opt = 7;               // reassign to new value
opt = std::nullopt;    // clear it
opt.reset();           // same as above

The four ways to retrieve the value:

std::optional<int> opt{42};

opt.has_value();    // true; equivalent to bool(opt)

*opt;               // 42; like dereferencing a pointer -- UB if empty
opt.value();        // 42; throws std::bad_optional_access if empty
opt.value_or(-1);   // 42; returns -1 (the default) if empty
opt->some_method(); // arrow operator, like a pointer; UB if empty

Note. operator* and operator-> follow pointer semantics: they give direct access but do not check. They are the right choice inside an if (opt) guard. Outside such a guard, prefer value() so an accidental empty dereference throws an exception rather than producing undefined behaviour.
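To make the note concrete, here is a small sketch showing that value() converts the mistake into a catchable exception rather than undefined behaviour (value_throws_when_empty is an illustrative name, not part of the text above):

```cpp
#include <optional>

// Returns true iff calling value() on an empty optional throws
// std::bad_optional_access -- which it always does.
bool value_throws_when_empty() {
    std::optional<int> empty;        // no value

    // *empty here would be undefined behaviour: anything could happen.
    // empty.value() instead throws, and the mistake is recoverable:
    try {
        empty.value();
    } catch (const std::bad_optional_access &) {
        return true;
    }
    return false;
}
```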

Worked Example: A Configuration Reader

using Config = std::unordered_map<std::string, std::string>;

std::optional<int> get_int(const Config &cfg, const std::string &key) {
    auto it = cfg.find(key);
    if (it == cfg.end()) return std::nullopt;  // key absent
    return parse_int(it->second);              // may also be nullopt
}

The caller chains the two checks naturally:

Config cfg = {{"timeout", "5"}, {"retries", "bad"}};

if (auto t = get_int(cfg, "timeout")) {
    set_timeout(*t);
}
if (auto r = get_int(cfg, "retries")) {
    set_retries(*r);    // "bad" fails parse_int; this branch not taken
}

The optional propagates absence: get_int returns std::nullopt either because the key is missing or because the value is not a valid integer, and the caller handles both cases with one if check.

Aside (CS 146 connection: Haskell’s Maybe and Racket’s false). In CS 146 you met Maybe in Haskell: Maybe Int is either Nothing or Just 42. std::optional is exactly the same algebraic type dressed in C++ syntax. The analogy extends to the accessor names: Haskell’s fromMaybe defaultVal maybeVal is opt.value_or(defaultVal). In Racket, the idiom is to return #f for “absent” and a value otherwise — a dynamically typed convention that the type system cannot enforce. optional makes the same contract statically checkable: std::optional<int> cannot be implicitly used as an int, so using the value without explicitly unwrapping it is a compile error, not a runtime surprise.

Monadic Operations (C++23)

C++23 adds three methods that let you chain optional-producing operations without explicit if checks. They mirror the >>= combinator from Haskell’s Maybe monad.

Monadic optional methods (C++23)
  • opt.and_then(f) — if opt has a value, calls f(*opt) where f itself returns an optional; if opt is empty, returns std::nullopt without calling f.
  • opt.transform(f) — if opt has a value, calls f(*opt) and wraps the result in an optional; otherwise propagates std::nullopt.
  • opt.or_else(f) — if opt is empty, calls f() (which must return an optional<T>); otherwise passes through opt unchanged.

// C++23
auto timeout_ms =
    get_int(cfg, "timeout")
    .transform([](int secs) { return secs * 1000; })  // scale if present
    .value_or(5000);                                   // default 5 s

Each step is evaluated only if the previous step produced a value. The distinction between and_then and transform mirrors flatMap vs map in functional programming: transform applies a function that returns a plain value; and_then applies a function that itself returns an optional, for use when that sub-step can also fail.

When Not to Use optional

std::optional<T> stores its value inline — there is no heap allocation. The size is roughly sizeof(T) + 1 (plus alignment padding). This makes it cheap for small types like int or double.

Two situations where optional is the wrong tool:

  1. Large types. std::optional<HugeMatrix> reserves the full HugeMatrix storage on the stack even when the optional is empty. Consider std::unique_ptr<HugeMatrix> instead — a null pointer costs only sizeof(void*).

  2. Signalling genuine errors. If a function can fail for a reason the caller should act on — a network timeout, a permissions violation, a malformed file — the absence of a value should carry why it is absent. C++23’s std::expected<T, E> (Problem 37) does exactly that: it holds either a value of type T or an error of type E. optional carries only the absence; expected carries a diagnosis.

Note. Prefer std::optional over nullable raw pointers for “maybe a value” semantics in value types. A T* that means “optionally a T” has a second meaning: “a T owned elsewhere”. Those two ideas are better kept separate. Use optional<T> when you own the potential value; use raw or smart pointers when the ownership question is the point.
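The size argument in point 1 can be checked directly. A sketch, with HugeMatrix as a stand-in type invented for illustration:

```cpp
#include <optional>
#include <memory>

struct HugeMatrix { double cells[1000][1000]; };  // ~8 MB stand-in type

// An empty optional still reserves the full payload inline;
// an empty unique_ptr is (on mainstream implementations) one pointer.
static_assert(sizeof(std::optional<HugeMatrix>) >= sizeof(HugeMatrix));
static_assert(sizeof(std::unique_ptr<HugeMatrix>) <
              sizeof(std::optional<HugeMatrix>));
```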


Problem 36: Sum Types Without Inheritance

Problem 18 gave us heterogeneous data through inheritance. Store a vector<unique_ptr<Book>>, use virtual dispatch, and each element behaves as its true type at runtime. It works. But it costs: every object carries a hidden vptr, every element lives on the heap, every call to a virtual method goes through an indirection. Worse, to add a new operation over the hierarchy you must touch every class — or reach for dynamic_cast (Problem 22).

Sometimes the set of alternatives is closed and known at compile time. We do not want a vtable. We do not want heap allocation. We want value semantics and exhaustive case analysis that the compiler can enforce.

C’s Approach: Tagged Union

C programmers have always had heterogeneous data. The idiom is a struct that pairs an enum tag with a union of the possible payloads:

enum MediaKind { SONG, PODCAST, AUDIOBOOK };

struct Media {
    MediaKind kind;
    union {
        Song      song;
        Podcast   podcast;
        Audiobook book;
    };
};

Media m;
m.kind = SONG;
m.song = makeSong(/*...*/);

// Later:
switch (m.kind) {
    case SONG:      playSong(m.song);       break;
    case PODCAST:   playPodcast(m.podcast); break;
    case AUDIOBOOK: playBook(m.book);       break;
}

The problems are familiar. Nothing stops you from writing m.podcast when m.kind == SONG: the union does not track which member is active. If you add a new MediaKind to the enum, every switch in the codebase silently becomes incomplete. And the union cannot hold types with non-trivial constructors or destructors without heroic manual management.

std::variant: The Type-Safe Tagged Union

C++17 introduces std::variant<T1, T2, ...>, which holds exactly one value from the listed alternatives at a time. The tag is implicit and always consistent. The alternatives may have constructors and destructors; the variant manages their lifetimes automatically.

#include <variant>

using BookVar = std::variant<Text, Comic, Book>;

BookVar v = Comic{"Watchmen", "Moore", 400, "Rorschach"};

The variant stores the value inline — no heap allocation. Its size is approximately max(sizeof(T1), sizeof(T2), ...) plus the tag.

Inspecting a variant

Which alternative is active?

v.index();                          // 1 -- Comic is the second alternative
std::holds_alternative<Comic>(v);   // true
std::holds_alternative<Book>(v);    // false

Extracting the value.

// By type -- throws std::bad_variant_access if wrong alternative active:
Comic &c = std::get<Comic>(v);

// By index -- same behaviour:
Comic &c2 = std::get<1>(v);

// Non-throwing: returns nullptr if wrong alternative:
Comic *cp = std::get_if<Comic>(&v);   // takes a pointer to the variant
if (cp) { /* safe to use cp */ }

Note. std::get<T> and std::get_if<T> are free function templates, not member functions. The template argument is the type (or zero-based index) you want to extract.

std::visit: Exhaustive Case Analysis

Calling std::get<T> by hand is error-prone for the same reason the C switch was: you must remember to handle every alternative. The right tool is std::visit, which calls a visitor — any callable — with the currently active value.

std::visit([](auto &val) {
    std::cout << val.getTitle() << "\n";
}, v);

The generic lambda receives the active alternative as its argument. If all alternatives share a common interface, a single generic lambda is all you need.

The Overload Trick

When alternatives require different handling, define a visitor with one overload per type. C++ has no built-in “overloaded lambda” syntax, but a small helper makes it easy:

// Define once, reuse everywhere:
template<typename... Ts>
struct overloaded : Ts... { using Ts::operator()...; };

// C++17 deduction guide (lets you write overloaded{f, g} without <...>):
template<typename... Ts> overloaded(Ts...) -> overloaded<Ts...>;

std::visit(overloaded{
    [](const Text  &t) { std::cout << "Text: "  << t.getTopic() << "\n"; },
    [](const Comic &c) { std::cout << "Comic: " << c.getHero()  << "\n"; },
    [](const Book  &b) { std::cout << "Book\n"; }
}, v);

If you forget to handle one of the alternatives, the code will not compile. The compiler enforces exhaustiveness — exactly what the C switch could not do.

Aside (CS 146 connection: algebraic data types and pattern matching). In CS 146 you saw Haskell’s algebraic data types:

data Shape = Circle Double | Rect Double Double

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

Shape is exactly a sum type: a value is either a Circle carrying a radius or a Rect carrying a width and height. Pattern matching is exhaustive by construction; the compiler warns if you miss a case.

std::variant is C++’s version of the same idea. std::visit corresponds to pattern matching. The overloaded helper corresponds to a function with multiple equations, one per constructor. Haskell’s Either a b (either Left a or Right b) maps to std::variant<A, B>.
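The Haskell Shape above translates almost line-for-line. A sketch (the C++ names deliberately mirror the Haskell constructors; the overloaded helper is the one defined earlier):

```cpp
#include <variant>

template<typename... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<typename... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// data Shape = Circle Double | Rect Double Double
struct Circle { double r; };
struct Rect   { double w, h; };
using Shape = std::variant<Circle, Rect>;

// One "equation" per constructor, as in the Haskell version.
double area(const Shape &s) {
    return std::visit(overloaded{
        [](const Circle &c) { return 3.141592653589793 * c.r * c.r; },
        [](const Rect   &r) { return r.w * r.h; }
    }, s);
}
```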

Worked Example: A JSON Value

JSON has exactly six value types: null, boolean, number, string, array, and object. That is a closed, compile-time-known set — a perfect fit for variant.

#include <variant>
#include <string>
#include <vector>
#include <map>
#include <memory>

struct JsonNull {};

struct JsonValue;  // forward declaration needed for the recursive types

using JsonArray  = std::vector<std::shared_ptr<JsonValue>>;
using JsonObject = std::map<std::string, std::shared_ptr<JsonValue>>;

struct JsonValue {
    std::variant<JsonNull, bool, double,
                 std::string, JsonArray, JsonObject> data;
};

A pretty-printer becomes a single std::visit:

void print(const JsonValue &jv) {
    std::visit(overloaded{
        [](JsonNull)             { std::cout << "null"; },
        [](bool b)               { std::cout << (b ? "true" : "false"); },
        [](double d)             { std::cout << d; },
        [](const std::string &s) { std::cout << '"' << s << '"'; },
        [](const JsonArray &arr) {
            std::cout << "[";
            for (auto &el : arr) { print(*el); std::cout << ","; }
            std::cout << "]";
        },
        [](const JsonObject &obj) {
            std::cout << "{";
            for (auto &[k, v] : obj) {
                std::cout << '"' << k << "\":";
                print(*v);
                std::cout << ",";
            }
            std::cout << "}";
        }
    }, jv.data);
}

Adding a new operation (a JSON serialiser, a validator) means writing one more std::visit call — no touching the JsonValue struct. Adding a new alternative (say, a special JsonInt) means updating the variant alias and recompiling; the compiler will flag every visit that does not handle the new case.
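As a sketch of "one more operation, no changes to JsonValue": a type-name query, written as a fresh std::visit over the same variant (type_name is an illustrative addition, not part of the JSON example above; the block restates the definitions so it stands alone):

```cpp
#include <variant>
#include <string>
#include <vector>
#include <map>
#include <memory>

template<typename... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<typename... Ts> overloaded(Ts...) -> overloaded<Ts...>;

struct JsonNull {};
struct JsonValue;
using JsonArray  = std::vector<std::shared_ptr<JsonValue>>;
using JsonObject = std::map<std::string, std::shared_ptr<JsonValue>>;
struct JsonValue {
    std::variant<JsonNull, bool, double,
                 std::string, JsonArray, JsonObject> data;
};

// A brand-new operation, added without touching JsonValue:
std::string type_name(const JsonValue &jv) {
    return std::visit(overloaded{
        [](JsonNull)              { return std::string{"null"}; },
        [](bool)                  { return std::string{"boolean"}; },
        [](double)                { return std::string{"number"}; },
        [](const std::string &)   { return std::string{"string"}; },
        [](const JsonArray &)     { return std::string{"array"}; },
        [](const JsonObject &)    { return std::string{"object"}; }
    }, jv.data);
}
```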

Note. Variant vs. inheritance: a decision guide. Use std::variant when the set of alternatives is closed (you control it and it does not grow at runtime), when you want value semantics (stack allocation, copyable, no heap), or when you need exhaustive case analysis enforced at compile time. Use inheritance and virtual dispatch when the set of types is open (third-party code may add new subclasses), when objects have independent lifetimes managed through pointers, or when the polymorphic interface is the stable part and new operations are rare. The two tools are complementary, not competing.

Problem 37: Errors Without Exceptions

Problem 35’s std::optional<T> tells the caller that a value might not be there. That is enough when absence is normal: a config key is missing, a map lookup finds nothing. But sometimes the caller needs to know why the value is absent. A network request failed — was it a timeout, a permissions error, or a malformed URL? We have been throwing exceptions for this, but exceptions have costs and connotations.

Back in Problem 9 we reached for exceptions as the right tool for “something went wrong.” Problem 15 reminded us of the exception-safety obligations that entails. But exceptions interrupt normal control flow, carry overhead from stack-unwinding infrastructure, and in some environments (embedded systems, certain real-time kernels) they are disabled entirely.

The C Approach: Error Codes

C programmers handle fallible operations with return codes:

// Returns 0 on success, errno value on failure.
int read_file(const char *path, char *buf, size_t *n);

// Caller:
char buf[4096]; size_t n;
int err = read_file("data.txt", buf, &n);
if (err != 0) {
    fprintf(stderr, "error: %s\n", strerror(err));
    return err;
}

The problems are well known. The return value is overloaded: it carries either the success result or the error code, but the type does not say which. Nothing in the type system forces the caller to check for an error before using the result. errno is hidden mutable state — thread-local on modern implementations, but easily clobbered by intervening calls. And the successful value must be returned through an output parameter, which inverts the natural reading order.

std::expected: The Type-Safe Result

C++23 introduces std::expected<T, E> (paper P0323). A value of this type holds either a T (the expected, successful result) or an E (the unexpected error). The type is explicit; the caller cannot use a T without first checking whether the result is an error.

#include <expected>   // C++23
#include <system_error>
#include <string>
#include <fstream>

std::expected<std::string, std::error_code>
read_file(const std::string &path) {
    std::ifstream f{path};
    if (!f) return std::unexpected{
        std::make_error_code(std::errc::no_such_file_or_directory)};
    return std::string{std::istreambuf_iterator<char>{f}, {}};
}

Notice that returning a T directly constructs the success case. Returning std::unexpected{err} constructs the error case. std::unexpected is a thin wrapper that distinguishes an error value from a success value when they might otherwise have the same type.

Constructing and Accessing

// Constructing:
std::expected<int, std::string> ok  = 42;
std::expected<int, std::string> err = std::unexpected{"oops"};

// Checking:
ok.has_value();   // true
bool(ok);         // same -- operator bool

// Extracting the value (UB or throw if in error state):
int v  = *ok;             // like optional: UB if error
int v2 = ok.value();      // throws std::bad_expected_access if error
int v3 = ok.value_or(0);  // returns 0 if error

// Extracting the error:
std::string e = err.error();  // UB if in value state

Look familiar? The interface is deliberately parallel to std::optional: has_value, operator bool, value, value_or, operator*. Once you know optional you already know most of expected.

Note. std::bad_expected_access<E> is the exception thrown by value() when the expected holds an error. It carries the error value so the handler can inspect it.

Monadic Operations

Like std::optional (Problem 35), C++23’s expected supports and_then, transform, and or_else for chaining fallible operations without explicit if checks.

Monadic expected methods (C++23)
  • e.and_then(f) — if e has a value, calls f(*e); f must return an expected<U, E>. Propagates the error unchanged if e is in error state.
  • e.transform(f) — if e has a value, calls f(*e) and wraps the result in expected; otherwise propagates the error.
  • e.or_else(f) — if e is in error state, calls f(e.error()); f must return expected<T, G>. Passes through a successful e unchanged.

Worked Example: A File-Reading Pipeline

Suppose we want to read a configuration file, extract a particular line, and parse an integer from it. Any step can fail.

using Result = std::expected<std::string, std::error_code>;

Result read_line(const std::string &path, int n);
std::expected<int, std::error_code> parse_int(const std::string &s);

// Chain them without a single explicit if:
auto timeout =
    read_line("config.txt", 3)
    .and_then(parse_int)
    .transform([](int secs) { return secs * 1000; })
    .value_or(5000);   // default if any step failed

Each step runs only if the previous step succeeded. If read_line fails, the and_then and transform are skipped and the error propagates unchanged to value_or, which supplies the fallback.

Aside (CS 245 connection: preconditions, postconditions, and partial functions). In CS 245 you wrote Hoare triples: {P} S {Q}, asserting that if precondition P holds before statement S executes, postcondition Q holds afterwards.

A function that can fail is a partial function: it has a postcondition but only under certain preconditions. The traditional C approach — set a global error flag, return a sentinel — is equivalent to saying the function always terminates with Q, but Q includes a hidden “or the error flag was set” disjunct that callers routinely ignore.

std::expected<T, E> makes the disjunction explicit in the type. The postcondition is: either the return value is a T satisfying the happy-path postcondition, or the return value is an E describing why the preconditions could not be met. Hoare logic has a name for this: a total correctness specification with a distinguished failure mode.

When to Use expected vs. Exceptions vs. optional

  • Use std::optional<T> when absence is normal and uninformative: a map lookup, an optional config value. There is nothing to explain.

  • Use std::expected<T, E> when failure is part of normal control flow but the caller needs to know why: parsing user input, I/O that routinely encounters missing files or permission errors, library APIs that must work without exceptions enabled.

  • Use exceptions for truly exceptional conditions: programming errors, resource exhaustion, situations the caller cannot reasonably recover from locally. Exceptions propagate automatically; expected must be explicitly threaded through every call site.

Note. std::expected is a C++23 feature. At the time of writing, GCC 12+, Clang 16+, and MSVC 19.36+ provide implementations under <expected>. The design is based on proposal P0323 by Vicente Botet and JF Bastien, which drew heavily on boost::outcome and Rust’s Result<T, E>. Rust programmers will find the monadic API immediately familiar.

Choosing the Error Type

The error type E in std::expected<T, E> is up to you. There are several common choices:

  • std::error_code — the standard type from <system_error>, carrying an integer code plus a category (std::system_category(), std::generic_category(), etc.). Good for I/O and OS errors. Composable across library boundaries because categories are extensible.

  • A plain enum class — simple, fast, zero overhead. The right choice when the set of failure modes is small and fixed.

  enum class ParseError { empty_input, invalid_char, overflow };

  std::expected<int, ParseError> parse_int(std::string_view s);

  • std::string — carries a human-readable message. Convenient but expensive (heap allocation). Prefer error_code plus a message function for library code.

  • A std::variant of error types — when a function can fail in structurally different ways that the caller will handle differently. This combines the ideas from Problem 36 and this one.

Example: Propagating errors up the call stack

With exceptions you get automatic propagation: an uncaught exception unwinds the stack until a handler is found. With expected you propagate manually. The idiom is to check and re-wrap:

using Err = std::error_code;

std::expected<Config, Err> load_config(const std::string &path) {
    auto text = read_file(path);        // returns expected<string, Err>
    if (!text) return std::unexpected{text.error()};  // propagate

    auto cfg = parse_config(*text);     // returns expected<Config, Err>
    if (!cfg) return std::unexpected{cfg.error()};    // propagate

    return *cfg;
}

The manual check is more verbose than exceptions, but it is also more explicit: every call site where a failure can originate is visible in the code. The monadic and_then from the previous section eliminates the boilerplate when errors pass through unchanged.

Problem 38: Compile-Time If

Back in Problem 24 we dispatched at compile time by overloading on iterator category tags. The technique works: the compiler sees the tag type, selects the right overload, and the wrong branch is never instantiated. But it is verbose. Every compile-time branch requires a separately named helper function and an explicit traits query. Read the code a month later and the intent is buried under machinery.

There is a simpler way.

The Old Way: Tag Dispatch and SFINAE

Recall the advance example from Problem 24. We wrote three overloads of doAdvance, one for each iterator category, and dispatched by constructing a tag object:

template<typename Iter>
Iter advance(Iter it, ptrdiff_t n) {
    return doAdvance(it, n,
        typename iterator_traits<Iter>::iterator_category{});
}

Alternatively, we might have tried SFINAE — Substitution Failure Is Not An Error — where we write enable conditions on template parameters to selectively disable overloads. That is even harder to read.

Both approaches share a flaw: they scatter the logic for one operation across multiple function templates. What we really want is to write the alternatives side-by-side, in one function body.

if constexpr

C++17 introduces if constexpr, which evaluates its condition at compile time. The discarded branch is not instantiated. It just needs to parse.

#include <type_traits>

template<typename T>
void serialize(const T &val) {
    if constexpr (std::is_integral_v<T>) {
        std::cout << "(int) " << val;
    } else if constexpr (std::is_floating_point_v<T>) {
        std::cout << "(float) " << val;
    } else {
        std::cout << "(other) " << val.toString();
    }
}

When T is int, the compiler instantiates only the first branch. The val.toString() in the third branch is never compiled — so it does not matter whether int has a toString member.

When T is std::string, the compiler instantiates only the third branch. std::is_integral_v<std::string> is false, so the first branch is discarded.

Note. The discarded branch must still be syntactically valid C++. The compiler parses it and checks basic syntax. What it does not do is instantiate the branch or check that its names are valid for the concrete type T. The rule: the discarded branch must parse, but need not type-check for the given T.

The Critical Constraint: Must Depend on a Template Parameter

if constexpr is only useful inside a template. The condition must be a compile-time constant expression, and skipping instantiation of the discarded branch only buys anything when that condition depends on a template parameter. A runtime value is rejected outright:

// This is fine -- condition depends on T:
template<typename T>
void f(T x) {
    if constexpr (sizeof(T) > 4) { /* large type path */ }
    else                          { /* small type path */ }
}

// This is NOT if constexpr's job -- condition is a runtime value:
void g(bool flag) {
    if constexpr (flag) { }  // error: flag is not a constant expression
}

Comparison with #ifdef

#ifdef also selects code at compile time, but it is a preprocessor tool: it knows nothing about types, templates, or scopes.

// #ifdef: coarse, textual, not type-aware
#ifdef PLATFORM_64
    using size_type = uint64_t;
#else
    using size_type = uint32_t;
#endif

// if constexpr: fine-grained, inside templates, type-aware
template<typename T>
auto make_buffer() {
    if constexpr (sizeof(T) <= 8) {
        return SmallBuffer<T>{};
    } else {
        return LargeBuffer<T>{};
    }
}

if constexpr respects scoping, participates in type-checking of the taken branch, and interacts correctly with auto return type deduction. #ifdef does none of these things.

Aside (CS 146 connection: conditional at the type level). In a typed functional language like Haskell or ML, type-directed dispatch happens through type classes (Haskell) or modules (ML): different instances of a class provide different behaviour for different types. The selection happens at compile time, not runtime.

In Racket, cond and match are ordinary runtime conditionals — Racket is dynamically typed so there is no compile-time type to inspect.

if constexpr occupies the middle ground: it is a runtime if in syntax and scoping, but its condition is evaluated at compile time using the type system. It is as close as C++ gets to Haskell-style compile-time pattern matching on type properties.

Worked Example: to_string

Suppose we want a generic to_string that handles arithmetic types with std::to_string and everything else by calling a member str() method:

#include <string>
#include <type_traits>

template<typename T>
std::string to_string(const T &val) {
    if constexpr (std::is_arithmetic_v<T>) {
        return std::to_string(val);  // works for int, float, etc.
    } else {
        return val.str();            // user-defined types supply str()
    }
}

Without if constexpr we would need two overloads separated by enable_if conditions — messy boilerplate just to say “pick one of two implementations based on a type property.”
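For contrast, here is roughly what that enable_if boilerplate looks like (a sketch; to_string_old and Named are names invented for this comparison):

```cpp
#include <string>
#include <type_traits>

// Pre-C++17: two overloads, each removed by SFINAE when its condition fails.
template<typename T,
         std::enable_if_t<std::is_arithmetic<T>::value, int> = 0>
std::string to_string_old(const T &val) {
    return std::to_string(val);
}

template<typename T,
         std::enable_if_t<!std::is_arithmetic<T>::value, int> = 0>
std::string to_string_old(const T &val) {
    return val.str();
}

// Illustrative user-defined type supplying str():
struct Named { std::string str() const { return "named"; } };
```

Two function templates, two mirrored enable conditions, and the logic for one operation split across both — exactly the scattering if constexpr eliminates.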

Example: Recursive template with if constexpr

if constexpr also replaces base-case specialisations in recursive templates. A classic example: printing a parameter pack.

template<typename Head, typename... Tail>
void print_all(Head h, Tail... tail) {
    std::cout << h;
    if constexpr (sizeof...(tail) > 0) {
        std::cout << ", ";
        print_all(tail...);
    }
    // No separate base-case specialisation needed.
}

When the pack is empty, sizeof...(tail) == 0 and the recursive call is in the discarded branch, so it is never instantiated. Previously you would need a separate print_all() overload for zero arguments.

A Note on consteval and constexpr Functions

if constexpr selects branches at compile time. A related but distinct feature is constexpr functions: functions that may be evaluated at compile time (when their arguments are compile-time constants) and at runtime otherwise. C++20 adds consteval for functions that must be evaluated at compile time.

The relationship:

  • constexpr function: compile-time if possible, runtime otherwise.
  • consteval function: compile-time only; calling it with a runtime argument is an error.
  • if constexpr: selects a branch at compile time inside a template.

They are complementary. A constexpr function can contain an if constexpr; an if constexpr condition often calls constexpr predicates from <type_traits>.

Problem 39: Unpack My Tuple

std::pair<T1, T2> gives us two heterogeneous values in one object. Problem 17 used it constantly — std::map stores key-value pairs, std::minmax_element returns a pair of iterators. But what about three values? Or four? Writing nested pairs (pair<int, pair<string, bool>>) is possible but unreadable. Enter std::tuple.

std::tuple: N Heterogeneous Values

std::tuple<T1, T2, ..., TN> generalises pair to an arbitrary number of elements. The types need not be related.

#include <tuple>
#include <string>

// Explicit construction:
std::tuple<int, std::string, double> t{42, "hello", 3.14};

// Factory (deduces types):
auto t2 = std::make_tuple(42, std::string{"hello"}, 3.14);

Accessing Elements

Elements are accessed by index or by type using the free function template std::get:

std::get<0>(t);    // 42      (int)
std::get<1>(t);    // "hello" (std::string)
std::get<2>(t);    // 3.14    (double)

// By type -- only unambiguous if the type appears exactly once:
std::get<std::string>(t);  // "hello"

std::tuple_size<decltype(t)>::value gives the number of elements; std::tuple_element<0, decltype(t)>::type gives the type of element 0. Together with get, these traits form the tuple protocol — the same one structured bindings rely on below.
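Both traits are queryable directly, entirely at compile time:

```cpp
#include <tuple>
#include <string>
#include <type_traits>

using T3 = std::tuple<int, std::string, double>;

static_assert(std::tuple_size<T3>::value == 3);
static_assert(std::is_same_v<std::tuple_element<1, T3>::type, std::string>);
```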

std::tie: Unpacking into Existing Variables

Before C++17, the standard way to unpack a tuple into named variables was std::tie:

int    n;
std::string s;
double d;

std::tie(n, s, d) = t;   // n=42, s="hello", d=3.14

// Ignore fields you don't need:
std::tie(n, std::ignore, d) = t;

std::tie creates a tuple of references to its arguments. Assigning a tuple to it copies each element into the referenced variable.

Structured Bindings (C++17)

C++17 introduces a much cleaner syntax for unpacking. It works on tuples, pairs, structs, and arrays.

auto [n, s, d] = t;    // n is int, s is std::string, d is double

The types are deduced; you just name the pieces. This is the same auto [ok, val] syntax from Problem 35’s pair example — now generalised to any fixed-size heterogeneous aggregate.

Structured Bindings on Pairs and Maps

Problem 17 showed that iterating over a std::map is clunky because each element is a std::pair<const Key, Value>:

std::map<std::string, int> scores{{"Alice", 90}, {"Bob", 85}};

// Pre-C++17: verbose
for (auto &kv : scores) {
    std::cout << kv.first << ": " << kv.second << "\n";
}

// C++17 structured binding: clear and natural
for (auto &[name, score] : scores) {
    std::cout << name << ": " << score << "\n";
}

Structured Bindings on Plain Structs

Structured bindings work on any aggregate (a struct with no user-provided constructor, no virtual functions, and no private members), binding the variables to the fields in declaration order:

struct Point { int x; int y; };
Point p{3, 4};

auto [x, y] = p;   // x == 3, y == 4

For non-aggregate types, structured bindings require a get<N> specialisation and std::tuple_size / std::tuple_element traits — the same protocol that std::pair and std::tuple already satisfy.

Aside (CS 146 connection: tuples and pattern matching). Haskell and Racket both have tuples. In Haskell, (Int, String, Double) is the type of a 3-tuple; let (n, s, d) = t is the idiom for unpacking it.

In Racket, (match t [(list n s d) ...]) is the pattern-match equivalent. These are all the same idea: name the pieces of a compound value at the use site without first defining a dedicated record type.

Structured bindings in C++17 bring the same convenience to statically typed C++ code. The compiler deduces the types; you just give the pieces names that make the code readable.

std::apply: Unpack as Function Arguments

Sometimes you want to pass a tuple’s elements as separate arguments to a function. std::apply does this:

#include <tuple>
#include <functional>

auto add = [](int a, int b, int c) { return a + b + c; };

auto args = std::make_tuple(1, 2, 3);
int result = std::apply(add, args);   // same as add(1, 2, 3): result == 6

std::apply is useful when you build argument lists dynamically and then call a function with them — a pattern that appears in generic dispatchers, test frameworks, and serialisation libraries.

Worked Example: Multiple Return Values

A common use of tuple is returning multiple values from a function without defining a dedicated struct.

// Returns {mean, variance, count} -- all computed in one pass.
std::tuple<double, double, int>
stats(const std::vector<double> &data) {
    if (data.empty()) return {0.0, 0.0, 0};
    double sum = 0, sum2 = 0;
    for (double x : data) { sum += x; sum2 += x * x; }
    int n = static_cast<int>(data.size());
    double mean = sum / n;
    double var  = sum2 / n - mean * mean;
    return {mean, var, n};
}

// Caller unpacks with a structured binding:
auto [mean, var, n] = stats(data);
std::cout << "mean=" << mean << " var=" << var << " n=" << n << "\n";

Note. Prefer a named struct over a tuple when the fields have semantic meaning that a reader should not have to deduce from position. tuple<double, double, int> says nothing about which double is the mean and which is the variance. A struct named Stats with fields mean, var, and count is self-documenting. The tuple approach is convenient for short-lived, internal helper functions where names would be over-engineering; it becomes a liability when the type crosses API boundaries.

Tuple Comparison and Structured Binding References

Tuples support lexicographic comparison with ==, <, and the rest, provided the element types do. This makes them immediately usable in sorted containers and with algorithms that require ordering.

auto a = std::make_tuple(1, std::string{"abc"}, 3.0);
auto b = std::make_tuple(1, std::string{"abd"}, 3.0);
// (std::string, not a raw literal: "abc" would deduce const char*,
//  and comparing tuples of pointers compares addresses, not contents)

a < b;   // true: compares element by element; first difference is "abc" < "abd"
a == b;  // false

Structured bindings always bind names to the pieces of a hidden object. With auto, that hidden object is a copy of the right-hand side; with auto &, it is a reference to the original.

std::tuple<int, std::string> t{42, "hi"};

auto  [a, b] = t;   // a and b are copies; modifying a does not change t
auto &[c, d] = t;   // c and d are references; modifying c changes t
const auto &[e, f] = t;  // read-only references

The reference form is especially important in the range-for over a std::map: writing for (auto [k, v] : m) copies every key-value pair. Writing for (const auto &[k, v] : m) avoids the copies.

Example: Sorting a vector of records by secondary key

A classic use of tuple comparison: sort records by a composite key without writing a custom comparator.

struct Student { std::string name; int grade; double gpa; };

std::vector<Student> students = { /* ... */ };

// Sort by (grade descending, gpa descending, name ascending):
std::sort(students.begin(), students.end(), [](const Student &a, const Student &b) {
    return std::make_tuple(-a.grade, -a.gpa, a.name)
         < std::make_tuple(-b.grade, -b.gpa, b.name);
});

Negating the numeric fields reverses their ordering. The tuple comparison handles the tie-breaking automatically, left to right. No custom comparator logic to write or debug.

Problem 40: Constraining Templates

Templates accept any type that satisfies the operations the code uses. That flexibility is their strength, but it has a sharp edge. Instantiate std::sort on a type that has no operator< and the compiler produces a wall of error messages deep inside the standard library implementation — none of which mention the actual mistake at the call site. Problem 24 gave us tag dispatch as one solution. Problem 25 mentioned SFINAE as another. Both work, but both are verbose and hard to read. C++20 gives us a better one: concepts.

The Problem: Opaque Template Errors

struct NoCmp { int x; };   // no operator<

std::vector<NoCmp> v = { {3}, {1}, {2} };
std::sort(v.begin(), v.end());   // a screenful of errors

The errors name internal helper types and lambdas deep in the <algorithm> header. The user’s mistake — providing an incomparable type — is nowhere in the message.

A concept is a compile-time predicate on types. Attaching one to a template parameter produces a clean diagnostic at the call site when the constraint is not satisfied.

Writing a Concept

#include <concepts>

template<typename T>
concept Comparable = requires(T a, T b) {
    { a < b } -> std::convertible_to<bool>;
};

Read this as: T satisfies Comparable if, given two values a and b of type T, the expression a < b is well-formed and its result is convertible to bool.

requires Expressions

The body of a requires expression is a list of requirements. There are four kinds:

template<typename T>
concept Printable = requires(T x, std::ostream &os) {
    // Simple requirement: expression must be well-formed:
    os << x;

    // Type requirement: T must have a nested type:
    typename T::value_type;

    // Compound requirement: well-formed AND result satisfies a concept:
    { x.size() } -> std::convertible_to<std::size_t>;

    // Nested requirement: a further concept must hold:
    requires std::copyable<T>;
};

Using Concepts

There are three syntactic styles. Choose the one that reads most clearly for your context.

As a template parameter constraint.

template<Comparable T>
T max(T a, T b) { return a < b ? b : a; }

With a requires clause.

template<typename T>
    requires Comparable<T>
T max(T a, T b) { return a < b ? b : a; }

Abbreviated function template (C++20).

// 'auto' makes it a template; 'Comparable auto' constrains it:
Comparable auto max(Comparable auto a, Comparable auto b) {
    return a < b ? b : a;
}

The abbreviated form is the most concise, but beware: each Comparable auto parameter introduces its own template parameter, so a and b may deduce different types, unlike the first two forms, where both must be T. The requires-clause form is the most flexible: its condition can be an arbitrary boolean expression, not just a single concept name.

Standard Library Concepts

C++20’s <concepts> header provides a rich library of built-in concepts:

std::same_as<T, U>       // T and U are the same type
std::derived_from<T, B>  // T is derived from B
std::convertible_to<T,U> // T is implicitly convertible to U
std::integral<T>         // T is an integral type
std::floating_point<T>   // T is a floating-point type
std::copyable<T>         // T can be copy-constructed and copy-assigned
std::movable<T>          // T can be move-constructed and move-assigned
std::equality_comparable<T>  // T has operator==
std::totally_ordered<T>      // T has all six comparison operators
std::invocable<F, Args...>   // F is callable with Args...

The <iterator> header adds iterator concepts: std::input_iterator, std::random_access_iterator, std::sentinel_for, and so on.

How Concepts Replace SFINAE

The old SFINAE way to write a constrained max (using std::enable_if):

template<typename T,
         typename = std::enable_if_t<
             std::is_same_v<
                 decltype(std::declval<T>() < std::declval<T>()),
                 bool>>>
T max(T a, T b) { return a < b ? b : a; }

The concept version:

template<Comparable T>
T max(T a, T b) { return a < b ? b : a; }

Same semantics. Dramatically better error messages. And the intent is explicit in the code.

Aside (CS 246 / Problem 24 connection: tag dispatch was a hand-rolled concept). Problem 24’s iterator tags were C++’s original compile-time dispatch mechanism. Each tag (random_access_iterator_tag, etc.) is essentially a hand-written concept: a type-level predicate that classifies iterator types. The overloaded doAdvance functions are hand-written concept-driven dispatch.

Concepts subsume all of this. The standard library’s iterator concepts (std::random_access_iterator, std::bidirectional_iterator, …) now express these distinctions directly, and the advance algorithm can be written with a requires clause or constrained overloads instead of tag objects.

Worked Example: Constrained max

#include <concepts>

template<typename T>
concept LessThanComparable = requires(T a, T b) {
    { a < b } -> std::convertible_to<bool>;
};

template<LessThanComparable T>
const T &max(const T &a, const T &b) { return a < b ? b : a; }

struct NoCmp { int x; };

int main() {
    max(3, 5);          // OK: int satisfies LessThanComparable
    max(NoCmp{}, NoCmp{});  // error: NoCmp does not satisfy LessThanComparable
    // Error message names the concept -- clear and helpful.
}

Note. Concepts are about what a type must do, not what it is. std::integral<T> checks whether T is an integral type by querying a type trait. Comparable<T> checks whether T supports < — it would be satisfied by any user-defined type that provides the operator, regardless of its inheritance hierarchy. This is duck typing made explicit and statically verified: if it supports <, it satisfies Comparable.

Problem 41: Lazy Pipelines

Problem 17 gave us std::transform, std::copy_if, std::accumulate, and friends. They work. But chaining them is painful. Each step allocates a new intermediate container. Each step requires an output iterator. Composing three transformations means three temporary vectors, three passes over the data, and three iterator pairs to manage.

C++20 introduces the ranges library, which solves all of these problems at once.

The Pain of Chaining Algorithms

Suppose we want the squares of all even numbers in a vector.

std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

// Step 1: filter evens -- allocates a new vector
std::vector<int> evens;
std::copy_if(v.begin(), v.end(), std::back_inserter(evens),
             [](int n) { return n % 2 == 0; });

// Step 2: square them -- allocates another new vector
std::vector<int> squares;
std::transform(evens.begin(), evens.end(), std::back_inserter(squares),
               [](int n) { return n * n; });

// squares == {4, 16, 36, 64, 100}

Two temporary vectors, two passes, lots of boilerplate. With ten steps the code would be unreadable.

Views: Lazy Adapters

C++20’s std::views (also accessible as std::ranges::views) provides lazy range adapters. A view does not compute anything when constructed. It wraps the underlying range and produces elements on demand, one at a time, as you iterate. No intermediate containers. No extra passes.

#include <ranges>
#include <vector>
#include <iostream>

std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

auto result = v
    | std::views::filter([](int n) { return n % 2 == 0; })
    | std::views::transform([](int n) { return n * n; });

for (int x : result) {
    std::cout << x << " ";   // 4 16 36 64 100
}

The | pipe operator composes views left-to-right. result is not a vector — it is a lightweight view object that remembers the composition. Elements are produced lazily as the range-for loop advances.

The View Vocabulary

The standard library ships a rich set of views. Here are the most common:

views::filter(pred)       // yield only elements satisfying pred
views::transform(fn)      // apply fn to each element
views::take(n)            // yield at most n elements
views::drop(n)            // skip first n elements
views::reverse            // iterate backwards (requires bidirectional range)
views::iota(a, b)         // generates a, a+1, ..., b-1 on demand
views::iota(a)            // infinite sequence a, a+1, a+2, ...
views::keys               // from a range of pairs, yield only the keys
views::values             // from a range of pairs, yield only the values
views::enumerate          // (C++23) yield {index, element} pairs
views::zip(r1, r2)        // (C++23) yield pairs of elements side-by-side

views::iota is particularly powerful because it generates a range without an underlying container at all:

// Print the first 5 squares, no vector needed:
for (int x : std::views::iota(1, 6)
             | std::views::transform([](int n){ return n * n; })) {
    std::cout << x << " ";   // 1 4 9 16 25
}

Ranges Algorithms

The <algorithm> header gains a parallel set of algorithms in the std::ranges:: namespace. These take a range directly instead of a pair of begin/end iterators.

std::vector<int> v = {3, 1, 4, 1, 5, 9, 2, 6};

std::ranges::sort(v);              // sort in-place

auto it = std::ranges::find(v, 5); // returns iterator, not bool

std::ranges::copy(v, std::ostream_iterator<int>{std::cout, " "});

The range overloads are strictly more convenient: one argument instead of two, and they work directly with views.

// The even elements, viewed lazily:
auto evens = v | std::views::filter([](int n){ return n % 2 == 0; });
// Note: a filter view cannot be sorted directly: ranges::sort needs a
// random-access range, and filter provides at most bidirectional iteration.
// For in-place work on views, use ranges::for_each or materialise first.
std::ranges::for_each(evens, [](int x){ std::cout << x << " "; });

Materialising a View

A view is not a container. To store the results in a vector, you must materialise the view. In C++23, std::ranges::to does this cleanly:

// C++23:
auto squares_of_evens = v
    | std::views::filter([](int n){ return n % 2 == 0; })
    | std::views::transform([](int n){ return n * n; })
    | std::ranges::to<std::vector>();

// C++20 workaround (no std::ranges::to yet): iterate the view and push_back
auto view = v
    | std::views::filter([](int n){ return n % 2 == 0; })
    | std::views::transform([](int n){ return n * n; });
std::vector<int> out;
for (int x : view) out.push_back(x);

Aside (CS 146 connection: lazy evaluation and infinite lists). In Haskell, all evaluation is lazy by default. [1..] is an infinite list of integers; take 5 (map (*2) [1..]) produces [2,4,6,8,10] without ever generating the full infinite list. The pipeline is evaluated only as far as take demands.

std::views::iota(1) is an infinite range. Composing it with views::take(5) is the exact C++ equivalent of take 5 [1..]. Nothing is computed until the for-loop (or a materialising algorithm) demands elements.

This is lazy evaluation in a normally eager language. Ranges bring it to C++ in a controlled, opt-in way, restricted to sequence processing rather than applied globally.

Worked Example: Top 5 Words by Length

Take a string of words, find the five longest, and print them in order of length.

#include <ranges>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
#include <sstream>
#include <iterator>

int main() {
    std::string sentence = "the quick brown fox jumps over a lazy dog";
    std::istringstream iss{sentence};
    std::vector<std::string> words{
        std::istream_iterator<std::string>{iss}, {}};

    // Sort words longest-first:
    std::ranges::sort(words, [](auto &a, auto &b){
        return a.size() > b.size();
    });

    // Take the top 5 and print them:
    for (auto &w : words | std::views::take(5)) {
        std::cout << w << " (" << w.size() << ")\n";
    }
    // Prints five lines such as "quick (5)": three 5-letter words, then
    // two 4-letter words. Order within equal lengths is unspecified,
    // because std::ranges::sort is not stable.
}

No temporary containers for the top-5 step. views::take(5) wraps the sorted vector lazily; the range-for loop reads the first five elements and stops.

Note. The ranges library decouples iterators from sentinels. In the classic STL, begin() and end() must have the same type. In ranges, the sentinel (end marker) may have a different type than the iterator, as long as they can be compared with !=. This enables ranges over null-terminated C strings (where the sentinel is a comparison against '\0') and other non-symmetric iteration patterns that the old model could not express cleanly. The concepts std::ranges::range and std::sentinel_for formalise this — we met concept syntax in Problem 40.

Problem 42: Non-Owning Views

Back in Problem 3 we built a resizable array from scratch, and we already knew how C passes arrays around: a pointer to the first element plus a separate size_t for the length. Every function that needed to look at a contiguous sequence had to accept two parameters and trust the caller to pass a matching pair. Get it wrong — pass the wrong length, forget to subtract one, slice the wrong half — and you get undefined behaviour with no compiler warning. Strings were even worse: a const char* carries no length at all, so every substring operation either allocates a fresh copy or resorts to pointer arithmetic.

There is a better way. Modern C++ gives us two such types, std::string_view (C++17) and std::span (C++20), that bundle the pointer and the length together, add a rich interface on top, and crucially do so without owning the underlying data.

The Ownership Distinction

Problem 14 drew a sharp line between owning and non-owning handles to memory. A std::unique_ptr<T> owns its pointee: when the smart pointer dies, the object dies with it. A raw pointer T* may or may not own what it points to — the type gives you no clue.

The standard library offers two unambiguously non-owning types:

View types
  • A view is a lightweight, non-owning reference to data that lives elsewhere. The view does not allocate, does not copy, and does not extend the lifetime of the underlying storage. The programmer is responsible for ensuring the underlying data outlives the view.
  • std::string_view — a view into a character sequence.
  • std::span<T> — a view into a contiguous sequence of T.

std::string_view

std::string_view (from <string_view>) represents a window into some existing character data. It stores just two things: a pointer to the first character and a length. It is always cheap to copy.

Construction

#include <string_view>

// From a string literal -- no allocation
std::string_view sv1 = "hello, world";

// From a std::string -- sv2 points into s's buffer
std::string s = "hello, world";
std::string_view sv2 = s;

// From a pointer + length -- useful for buffers
const char *buf = "hello, world";
std::string_view sv3{buf, 5};   // "hello"

All three constructions are O(1): no characters are copied.

Operations

string_view supports almost every read-only operation you expect from std::string: size(), empty(), operator[], front(), back(), find(), rfind(), starts_with() (C++20), ends_with() (C++20), and range-based for. The two operations most worth highlighting are substr and remove_prefix:

std::string_view sv = "2024-01-15";

// substr returns another view -- still no allocation
std::string_view year  = sv.substr(0, 4);   // "2024"
std::string_view month = sv.substr(5, 2);   // "01"

// remove_prefix / remove_suffix adjust the view in place
sv.remove_prefix(5);   // sv is now "01-15"
sv.remove_suffix(3);   // sv is now "01"

Compare this with std::string::substr, which heap-allocates a brand-new string every time. string_view::substr just adjusts the pointer and length. No allocation, no copy.

The Golden Rule

Note. The underlying data must outlive the view. A string_view is a pointer into memory it does not own. If that memory is freed or moved — if a std::string is destroyed, or resized, or moved to a new allocation — the view becomes a dangling pointer.

std::string_view danger() {
    std::string s = "temporary";
    return s;     // s is destroyed here; caller gets a dangling view -- UB
}

Never return a string_view that refers to a local variable. Never store a string_view as a class data member unless you can prove the string it refers to outlives the object containing the view.

string_view as a Parameter Type

The most important use of string_view is as a function parameter. Before C++17, a function that just read a string had to choose between:

void f(const std::string &s);  // forces allocation if caller passes a literal
void f(const char *s);         // no length; no std::string interface

Neither is ideal. string_view accepts both:

void f(std::string_view sv);   // accepts literals, strings, substrings

f("hello");              // no allocation -- literal becomes a string_view
f(some_std_string);      // points into the string's buffer; no copy
f(sv.substr(0, 5));      // another view; still no allocation

It works. One parameter type, all callers, no heap traffic.

std::span

std::span<T> (from <span>, standardized in C++20; pre-C++20 codebases can use the Guidelines Support Library's near-identical gsl::span) does for contiguous sequences of arbitrary type what string_view does for characters. It stores a pointer and a length. It does not own the data.

Fixed-size vs Dynamic Extent

span comes in two flavours:

int arr[5] = {1, 2, 3, 4, 5};

// Dynamic extent: length tracked at runtime
std::span<int> s1{arr};

// Fixed extent: length baked into the type
// (note: span<int, 5> is not default-constructible; span<int> is)
std::span<int, 5> s2{arr};

The fixed-extent form span<T, N> is a zero-overhead wrapper: the compiler knows the size at compile time and can optimize loops and bounds checks accordingly, without storing the length at runtime. The dynamic-extent form span<T> is the more commonly useful one.

Construction

int arr[5] = {1, 2, 3, 4, 5};
std::vector<int> v = {10, 20, 30};

std::span<int>    s1{arr};           // from C array
std::span<int>    s2{v};             // from vector
std::span<int, 5> s3{arr};          // fixed-extent from C array
std::span<int>    s4{arr + 1, 3};   // pointer + length: {2, 3, 4}

Like string_view, all constructions are O(1).

Subspans

std::span<int> s{v};

auto first3 = s.first(3);        // first 3 elements
auto last2  = s.last(2);         // last 2 elements
auto middle = s.subspan(1, 2);   // starting at index 1, length 2

Each returns a new span into the same underlying storage. No copying.

Replacing Pointer+Length Parameters

The old C style:

// Two parameters that must stay synchronized -- easy to mismatch
void process(const int *data, size_t len);

The C++ view style:

// Single parameter; accepts arrays, vectors, sub-ranges, all of it
void process(std::span<const int> data) {
    for (int x : data) { /* ... */ }
}

int arr[10] = {};
std::vector<int> v(20);
process(arr);                // OK
process(v);                  // OK
process({v.data(), 5});      // first 5 elements of v

The size is carried along automatically. The mismatch bug is gone.

Aside (Rust connection: borrows and slices). Rust formalizes views at the language level. A Rust string slice &str is exactly a string_view: a fat pointer (data address plus length) that borrows its data from some owner. Rust’s slice type &[T] is exactly a span<T>. The difference is that Rust’s borrow checker enforces the golden rule at compile time: a slice cannot outlive the owner it borrows from. C++ has no such static enforcement — the rule is a programmer responsibility, not a language guarantee. Rust’s approach eliminates the class of dangling-view bugs at the cost of learning the borrow checker; C++’s approach gives you the performance without the restriction.

Worked Example: Pattern Search in a Byte Buffer

Suppose we write a function that searches for a short byte pattern inside a larger buffer. With the old interface:

// Old style: four parameters, two pairs to keep in sync
bool contains(const uint8_t *buf, size_t buf_len,
              const uint8_t *pat, size_t pat_len);

With span:

#include <span>
#include <algorithm>
#include <cstdint>

bool contains(std::span<const uint8_t> buf,
              std::span<const uint8_t> pat) {
    if (pat.size() > buf.size()) return false;
    for (size_t i = 0; i + pat.size() <= buf.size(); ++i) {
        if (std::equal(pat.begin(), pat.end(),
                       buf.begin() + i))
            return true;
    }
    return false;
}

Two parameters. The sizes are always synchronized because they are part of the type. Any contiguous range — a C array, a std::vector, a section of a memory-mapped file — can be passed without modification.

Example: Constness and spans

Note the use of span<const uint8_t> rather than span<uint8_t>. The const here is on the element type, not the span itself. A span<const T> is a read-only view: the elements cannot be modified through it. A span<T> is a read-write view.

A span<T> implicitly converts to span<const T> (just as T* converts to const T*). The reverse does not compile. Use span<const T> for input parameters, span<T> only when the function needs to write back into the caller’s buffer.

Note. Never store a span or string_view as a class data member unless you have ironclad proof that the referred-to data outlives every instance of the class. In practice that means the referred-to data should be static or live in a scope that strictly encloses every instance. When in doubt, store a std::string or std::vector instead — they own their data and the lifetime question dissolves.

Problem 43: A Real Date Library

For most of this course, “time” meant “performance.” Back in Problem 1 we used clocks briefly, timing how long operations took. But <chrono> is much more than a stopwatch. It contains a carefully designed system of types for durations, time points, and clocks — a system that prevents the classic bugs of mixing seconds with milliseconds or subtracting timestamps from different clocks. C++20 extends it further with calendar types that let you write actual dates. The full picture deserves proper coverage.

The Core Idea: Types Prevent Bugs

The classic duration bug in C:

// Old C style -- no idea what units any of these are
long timeout   = 5;      // seconds? milliseconds?
long heartbeat = 200;    // milliseconds? microseconds?
if (elapsed > timeout + heartbeat) { /* ... */ }  // silent unit mismatch

<chrono> makes the units part of the type. Adding a std::chrono::seconds to a std::chrono::milliseconds compiles and gives the right answer. Adding a steady_clock::time_point to a system_clock::time_point does not compile. The type system catches the bug before the program runs.

Duration Types

A std::chrono::duration<Rep, Period> represents a span of time. Rep is the numeric representation (usually long long) and Period is a std::ratio that relates the tick to a second. You almost never write this by hand. The standard library provides all the common specializations:

#include <chrono>
using namespace std::chrono;

nanoseconds  ns{1};      // 1 nanosecond
microseconds us{1};      // 1 microsecond
milliseconds ms{200};    // 200 milliseconds
seconds      s{5};       // 5 seconds
minutes      m{2};       // 2 minutes
hours        h{1};       // 1 hour

Duration Arithmetic

Durations support the arithmetic you expect, with automatic unit promotion:

seconds s{5};
milliseconds ms{200};

auto total = s + ms;   // total has type milliseconds, value 5200
auto diff  = s - ms;   // milliseconds{4800}
auto triple = s * 3;   // seconds{15}

s += minutes{1};       // s is now seconds{65}

Converting to a coarser unit requires an explicit duration_cast — the compiler will not silently lose precision:

milliseconds ms{5200};
seconds s = duration_cast<seconds>(ms);   // s{5}: truncates toward zero
// seconds s = ms;   // ERROR: implicit conversion would silently lose precision

User-Defined Literals

C++14 adds duration literals so you can write durations inline without constructing types explicitly:

using namespace std::chrono_literals;

auto timeout   = 5s;     // std::chrono::seconds{5}
auto heartbeat = 200ms;  // std::chrono::milliseconds{200}
auto delay     = 1h + 30min;  // std::chrono::minutes{90}

auto sleep_for = 50us;   // microseconds
auto precision = 10ns;   // nanoseconds

The using namespace std::chrono_literals is necessary to bring the literal suffixes into scope. They live in their own sub-namespace precisely so you can import them without pulling in the entire std::chrono namespace.

Clock Types

A clock provides a now() function that returns a time point, and a period that tells you the resolution. C++ provides three:

Standard clocks
  • std::chrono::steady_clock — monotonic clock. It never goes backwards, never jumps forward. Use this for measuring elapsed time.
  • std::chrono::system_clock — the wall clock. Tracks real-world calendar time. May be adjusted by NTP; can go backwards. Use this to get the current date and time.
  • std::chrono::high_resolution_clock — the highest-resolution clock available. May be an alias for either of the above. Avoid it if you care which clock you are using.

Time Points

clock::now() returns a time_point<clock>. Time points can be subtracted to get a duration:

auto t1 = std::chrono::steady_clock::now();
auto t2 = std::chrono::steady_clock::now();
auto elapsed = t2 - t1;   // type: steady_clock::duration

Subtracting two time points from different clocks does not compile. That is the whole point.

time_since_epoch() gives the duration between the clock’s epoch (an implementation-defined point in time, typically 1970-01-01 for system_clock) and the time point:

auto now     = std::chrono::system_clock::now();
auto epoch_s = std::chrono::duration_cast<std::chrono::seconds>(
                   now.time_since_epoch());
// epoch_s.count() is the Unix timestamp

Measuring Elapsed Time

The canonical timing pattern:

#include <chrono>
#include <iostream>
using namespace std::chrono;

auto t0 = steady_clock::now();

do_work();

auto elapsed = duration_cast<milliseconds>(steady_clock::now() - t0);
std::cout << "Elapsed: " << elapsed.count() << " ms\n";

.count() extracts the raw numeric value in the duration’s units. It is the only place where you cross the boundary from the type-safe world back to a plain integer.

Note. Use steady_clock for measuring elapsed time. It is monotonic: it advances at a constant rate and never jumps. system_clock is the wrong tool here because it can be adjusted mid-measurement by the operating system (NTP sync, daylight saving, etc.).

A Timer RAII Class

Problem 14 showed that any resource that needs cleanup belongs in an RAII wrapper. Time measurement is no different: the “resource” is the start time, and the “cleanup” is logging the elapsed duration.

#include <chrono>
#include <iostream>
#include <string>
using namespace std::chrono;

class Timer {
    std::string   label_;
    steady_clock::time_point start_;
public:
    explicit Timer(std::string label)
        : label_{std::move(label)}, start_{steady_clock::now()} {}

    ~Timer() {
        auto ms = duration_cast<milliseconds>(
            steady_clock::now() - start_).count();
        std::cout << label_ << ": " << ms << " ms\n";
    }
};

Usage:

void expensive_function() {
    Timer t{"expensive_function"};
    // ... work happens here ...
}   // Timer destructor fires: "expensive_function: 47 ms"

No matter how expensive_function exits — normal return, exception, early return from a nested scope — the timer logs.

C++20 Calendar Types

C++20 adds calendar and time-zone support. The key new types are year_month_day, sys_days, and year_month_day_last.

#include <chrono>
using namespace std::chrono;

// Constructing a date: year/month/day syntax
year_month_day d1 = 2024y / January / 15;
year_month_day d2 = 2024y / 1 / 15;    // same date

// Querying
int y = static_cast<int>(d1.year());   // 2024
unsigned m = static_cast<unsigned>(d1.month());  // 1
unsigned day = static_cast<unsigned>(d1.day());  // 15

// Is it a valid date?
d1.ok();    // true; would be false for 2024/February/30

sys_days converts a year_month_day to a time_point that can be used with system_clock. The bridge goes the other way too:

// time_point -> calendar date
auto today = floor<days>(system_clock::now());  // truncate to whole day
year_month_day ymd{today};
std::cout << ymd.year() << "-" << ymd.month() << "-" << ymd.day() << "\n";

// calendar date -> time_point (for arithmetic with durations)
sys_days target = 2024y / December / 25;
auto days_away  = target - today;
std::cout << days_away.count() << " days until Christmas\n";

The Last Day of the Month

A classic calendar calculation:

year_month_day last_day_of_jan = 2024y / January / last;
// last_day_of_jan.day() == 31

year_month_day last_day_of_feb = 2024y / February / last;
// 2024 is a leap year; last_day_of_feb.day() == 29

The standard library handles leap years and varying month lengths for you.

Aside (CS 343 connection: timed waits). In CS 343 you will use std::this_thread::sleep_for() to pause a thread for a specified duration. It accepts a chrono duration directly:

#include <thread>
#include <chrono>
using namespace std::chrono_literals;

std::this_thread::sleep_for(100ms);   // sleep for 100 milliseconds
std::this_thread::sleep_until(
    std::chrono::steady_clock::now() + 1s);  // sleep until 1 s from now

The type safety extends into threading: you cannot accidentally pass a plain integer where a duration is expected, and the units are explicit in the code.

Note. For wall-clock time (“what time is it right now?”) use system_clock. For measuring elapsed real time (“how long did this take?”) use steady_clock. The two purposes are different and deserve different clocks. high_resolution_clock is a shorthand for whichever of the two has finer resolution on the current platform; it is fine for quick experiments but says nothing about monotonicity.

Problem 44: A Real Filesystem API

Problem 1 opened files with std::ifstream and std::ofstream. That covers reading and writing the contents of files. But what about the world around those files? Creating directories. Listing what is in a folder. Checking whether a file exists before trying to open it. Copying, renaming, removing. In Problem 2 we briefly thought about separate compilation and the .cc/.h file structure of a project — all of which assumes you are navigating a directory tree. Doing any of that with portable code used to require calling POSIX functions (opendir, stat, mkdir) or Windows APIs, and writing separate implementations for each platform.

C++17 changes that. <filesystem> is a full, portable API for the directory structure around your files.

std::filesystem::path

Everything in <filesystem> revolves around std::filesystem::path. A path represents a location in the filesystem: it might name an existing file, a directory, or something that does not exist yet. It knows nothing about permissions or contents; it is just a name.

#include <filesystem>
namespace fs = std::filesystem;

fs::path p1 = "/usr/local/include";   // absolute path (POSIX)
fs::path p2 = "src/main.cc";          // relative path
fs::path p3 = "C:\\Users\\Alice";     // Windows path also works

Composing Paths

The / operator appends a component:

fs::path base = "/home/alice";
fs::path full = base / "projects" / "myapp" / "src";
// full == "/home/alice/projects/myapp/src"

Look familiar? It is operator overloading to make path manipulation read like shell notation.

Decomposing Paths

fs::path p = "/home/alice/report.pdf";

p.filename();      // "report.pdf"
p.stem();          // "report"
p.extension();     // ".pdf"
p.parent_path();   // "/home/alice"
p.root_path();     // "/"
p.is_absolute();   // true

p.string();        // native string representation

These all return path objects (except string(), which returns std::string, and the boolean queries).

Querying the Filesystem

fs::path p = "/home/alice/data.csv";

fs::exists(p);             // does anything at this path exist?
fs::is_regular_file(p);    // is it a regular file?
fs::is_directory(p);       // is it a directory?
fs::is_symlink(p);         // is it a symbolic link?

fs::file_size(p);          // size in bytes (throws if not a regular file)
fs::last_write_time(p);    // returns a filesystem::file_time_type

All of these query the filesystem on every call — they are not cached. If you need to make multiple queries about the same file, it is more efficient to use a fs::directory_entry, which caches the stat results.

Iterating a Directory

Flat Iteration

std::filesystem::directory_iterator yields one directory_entry for each item directly inside a directory (not recursively):

for (const fs::directory_entry &entry : fs::directory_iterator{"."}) {
    std::cout << entry.path().filename() << "\n";
}

The order is unspecified. Use std::sort on the paths if you need them in alphabetical order.

Recursive Iteration

std::filesystem::recursive_directory_iterator walks the entire subtree:

for (const fs::directory_entry &entry :
         fs::recursive_directory_iterator{"."}) {
    if (entry.is_regular_file()) {
        std::cout << entry.path() << "\n";
    }
}

Creating and Removing

fs::create_directory("output");           // create one directory
fs::create_directories("a/b/c");          // create whole path, like mkdir -p

fs::remove("output/old.txt");             // remove a single file
fs::remove_all("output");                 // remove directory and all contents

fs::rename("old_name.txt", "new_name.txt");  // rename or move

fs::copy("src.txt", "dst.txt");              // copy a file
fs::copy("src_dir", "dst_dir",
         fs::copy_options::recursive);       // copy directory tree

Error Handling: Two Forms

Most <filesystem> functions have two overloads. The default throws std::filesystem::filesystem_error on failure. An overload that takes an std::error_code& stores the error instead of throwing:

// Throwing form: use when failure is unexpected
fs::file_size("/etc/passwd");   // throws filesystem_error if it fails

// Non-throwing form: use when failure is routine
std::error_code ec;
auto sz = fs::file_size("/etc/passwd", ec);
if (ec) {
    std::cerr << "Error: " << ec.message() << "\n";
}

The non-throwing form is useful in performance-sensitive loops or when you expect certain failures (a file that might or might not exist).

Aside (CS 246 / CS 136 connection: POSIX underpinnings). Under the hood, <filesystem> on Linux is implemented using the same POSIX calls you saw in CS 136: stat(2), opendir(3)/readdir(3), mkdir(2), rename(2), and unlink(2). On Windows it uses GetFileAttributes, FindFirstFile, CreateDirectory, etc. The <filesystem> API gives you a single interface that compiles and behaves consistently on both. If you ever need a capability it does not expose — file locking, extended attributes, device numbers — you still have to drop down to the platform API.

Worked Example: Total Size of Source Files

A function that recursively totals the size of all .cpp files in a directory tree:

#include <filesystem>
#include <iostream>
namespace fs = std::filesystem;

uintmax_t cpp_total_size(const fs::path &root) {
    uintmax_t total = 0;
    std::error_code ec;
    for (const auto &entry :
             fs::recursive_directory_iterator{
                 root, fs::directory_options::skip_permission_denied, ec}) {
        if (ec) break;   // root itself could not be opened
        if (entry.is_regular_file() &&
            entry.path().extension() == ".cpp") {
            total += entry.file_size();
        }
    }
    return total;
}

int main() {
    auto bytes = cpp_total_size(".");
    std::cout << bytes << " bytes in .cpp files\n";
}

Note the use of entry.file_size() rather than fs::file_size(entry.path()). Both give the same result, but the directory_entry version is cheaper because the stat information was fetched during iteration and is already cached.

Example: Filtering by date

Combine last_write_time() with <chrono> (Problem 43) to find files modified in the last 24 hours:

auto cutoff = fs::file_time_type::clock::now()
              - std::chrono::hours{24};
for (const auto &entry : fs::recursive_directory_iterator{"."}) {
    if (entry.is_regular_file() &&
        entry.last_write_time() > cutoff) {
        std::cout << entry.path() << "\n";
    }
}

Paths Are Not Files

One subtlety worth internalizing: a fs::path object is just a string with path-manipulation methods. It does not verify that the path names an existing file. You can construct a path to something that does not exist, compose new paths from it, and only then call fs::exists() to check:

fs::path target = fs::path{"/nonexistent/dir"} / "file.txt";
// Constructing the path: always succeeds
// Querying it:
if (!fs::exists(target)) {
    std::cout << "no such file\n";
}

The / operator does not touch the disk. It purely manipulates the string representation of the path, using the platform’s separator character. This matters because constructing a path is cheap and can be done offline from any I/O; the I/O only happens when you call a querying or modifying function.

Canonical Paths

fs::canonical(p) resolves all symbolic links and .. components and returns an absolute path that refers unambiguously to the same filesystem object:

fs::path p = "src/../include/./foo.h";
fs::path c = fs::canonical(p);   // e.g. "/home/alice/include/foo.h"
// c has no ../ or ./ components, and no symlink indirections

fs::weakly_canonical(p) is like canonical but does not require the path to exist: it resolves as much as it can and leaves the rest as-is.

Note. Path comparison is operating-system sensitive. On Windows, filesystem paths are case-insensitive: C:\Users\Alice and C:\USERS\ALICE refer to the same file, but operator== on fs::path does a lexicographic comparison and would consider them different. To test whether two paths refer to the same filesystem object regardless of case or symlinks, use fs::equivalent(p1, p2), which resolves both paths through the OS and compares inodes (on POSIX) or file identifiers (on Windows).

Problem 45: Type-Safe Formatting

Problem 1 taught us std::cout and the stream output model. Stream output is type safe — you cannot accidentally print an integer with a %s format specifier — but it is awkward for anything beyond the simplest output. Aligning a table means calling std::setw, std::left, std::setprecision, and remembering to std::setfill if you want something other than spaces. Worse, many of those settings are sticky: they persist across subsequent output unless you explicitly reset them. You set the flags, print one value, and then spend the next ten minutes wondering why all subsequent output is still left-aligned.

The alternative is printf-style formatting. It is concise and familiar from CS 136. It is also a trap: passing the wrong argument for a format specifier is undefined behaviour, and the compiler is not required to catch it.

C++20 gives us the best of both: std::format.

The reveal: what was wrong before

Here is what a formatted table looked like with stream manipulators before C++20:

#include <iostream>
#include <iomanip>

void old_style(const std::string &name, double val) {
    std::cout << std::left  << std::setw(20) << name
              << std::right << std::setw(10) << std::fixed
              << std::setprecision(2) << val << "\n";
    // Did you remember to reset std::fixed and std::setprecision afterward?
    // The next caller's output will be affected if you didn't.
}

The flags set on std::cout persist across calls. std::fixed and std::setprecision(2) stay in effect until something else changes them. This makes output functions non-composable: calling one function can silently alter the formatting of another.

The Basic Idea

std::format takes a format string with replacement fields (marked by curly braces) and a list of arguments. It returns a std::string:

#include <format>

std::string s = std::format("{} + {} = {}", 1, 2, 3);
// s == "1 + 2 = 3"

Every argument is formatted using its type’s formatter. There is no %d/%s distinction; the type of the argument determines how it is rendered. And because the format string is normally a compile-time constant, it is parsed at compile time: too few arguments, or a specifier that does not match its argument’s type, makes the program ill-formed. A format string known only at runtime must go through std::vformat, which throws std::format_error on a mismatch.

Format Specifiers

Inside the curly braces you can put a format specification after a colon. The syntax is:

{[index]:[fill][align][sign][#][0][width][.precision][type]}

Most of the fields are optional. Here are the most useful ones:

Width and Alignment

std::format("{:10}", 42);       // "        42"  (right-align, width 10)
std::format("{:<10}", 42);      // "42        "  (left-align, width 10)
std::format("{:^10}", 42);      // "    42    "  (center, width 10)
std::format("{:*^10}", 42);     // "****42****"  (fill with '*', center)
std::format("{:010}", 42);      // "0000000042"  (zero-pad, width 10)

Precision and Floating Point

std::format("{:.3f}", 3.14159);        // "3.142"  (3 decimal places)
std::format("{:>10.3f}", 3.14159);     // "     3.142"  (right-align, width 10)
std::format("{:.6g}", 0.000123456);    // "0.000123456"  (general format)
std::format("{:e}", 12345.678);        // "1.234568e+04"  (scientific)

Integer Bases

std::format("{:d}", 255);    // "255"     (decimal, the default)
std::format("{:x}", 255);    // "ff"      (lowercase hex)
std::format("{:X}", 255);    // "FF"      (uppercase hex)
std::format("{:#x}", 255);   // "0xff"    (hex with 0x prefix)
std::format("{:b}", 255);    // "11111111" (binary)
std::format("{:o}", 255);    // "377"     (octal)

Positional Arguments

Arguments are used in order by default, but you can address them by index:

std::format("{0} {1} {0}", "ab", "cd");   // "ab cd ab"
std::format("{1} {0}", "world", "hello"); // "hello world"

std::print (C++23)

Constructing a std::string and then writing it is sometimes wasteful. C++23 adds std::print and std::println which format and write in a single call:

#include <print>

std::print("Hello, {}!\n", "world");    // no intermediate string
std::println("x = {}", 42);             // println appends '\n' automatically

// Write to a specific stream
std::print(std::cerr, "Error: {}\n", message);

std::print also formats the entire message before writing it, so each call’s output reaches the stream as one contiguous chunk. That makes it safer than interleaving std::cout << calls from multiple threads, where another thread’s output can land between the pieces of a single chained statement.

std::format_to

When you need to write into an existing buffer or iterator without creating a temporary string, use std::format_to:

#include <format>
#include <iterator>

std::string buf;
buf.reserve(64);

std::format_to(std::back_inserter(buf), "x = {:>8.3f}", 3.14159);
// buf == "x =    3.142"

std::format_to_n writes at most n characters, avoiding buffer overflows when writing into a fixed-size array.

Custom Formatters

You can teach std::format how to format your own types by specializing std::formatter<T>. The specialization must provide two methods: parse (which reads the format specification) and format (which produces the output):

struct Point { int x, y; };

template<>
struct std::formatter<Point> {
    // parse: accept only the empty spec; the returned iterator must point at '}'
    constexpr auto parse(std::format_parse_context &ctx) {
        return ctx.begin();
    }
    // format: produce the output
    auto format(const Point &p, std::format_context &ctx) const {
        return std::format_to(ctx.out(), "({}, {})", p.x, p.y);
    }
};

Point pt{3, 4};
std::string s = std::format("Point: {}", pt);  // "Point: (3, 4)"

Aside (CS 146 connection: format strings across languages). Python’s f-strings (f"{x:.2f}") were a direct inspiration for std::format. Racket’s format procedure (~a, ~s) is similar in spirit but dynamically typed: the format string is not checked against the argument types until runtime. Python’s f-strings resolve the format at compile time (the expression is evaluated, not matched to a spec string), so type errors surface immediately. std::format sits between the two: the format string is a runtime value, but the argument types are checked at compile time (in conforming implementations), giving type safety without requiring interpolation syntax.

A Locale Note

std::format does not use the global locale by default. Numbers always use . as the decimal separator, regardless of the user’s locale setting. This is intentional: locale-sensitive output was one of the least loved aspects of std::cout. To opt into locale formatting, add the L flag:

std::format("{:L}", 1234567.89);    // locale-specific: "1,234,567.89" in en_US
std::format("{}", 1234567.89);      // always:          "1234567.89"

Worked Example: A Table Formatter

A function that prints a two-column table with fixed-width columns:

#include <format>
#include <iostream>
#include <vector>
#include <string>

void print_table(const std::vector<std::pair<std::string,double>> &rows) {
    std::cout << std::format("{:<20} {:>10}\n", "Name", "Value");
    std::cout << std::string(32, '-') << "\n";
    for (const auto &[name, val] : rows) {
        std::cout << std::format("{:<20} {:>10.2f}\n", name, val);
    }
}

Sample output:

Name                      Value
--------------------------------
Alice                     92.50
Bob                       87.33
Carol                    100.00

Compare writing this with stream manipulators. The std::format version states the layout at the call site, is self-resetting (the format string is consumed per-call, not sticky), and reads as a single coherent expression.

Dynamic Width and Precision

Width and precision can themselves come from arguments, using {:{n}} or {:.{n}} syntax where n is the index of the width argument:

int width = 12;
int prec  = 4;
// {:*<{}.{}} means: fill '*', left-align, width from arg, precision from arg
std::format("{:*<{}.{}f}", 3.14159, width, prec);
// "3.1416******"

This is useful when the column widths are not known at compile time — for example, when building a table where the widths are computed by scanning the data first.

Example: Dynamic column width

A common idiom: measure the longest entry, then format all entries to that width.

std::vector<std::string> names = {"Alice", "Bob", "Christopher"};
size_t col = 0;
for (const auto &n : names) col = std::max(col, n.size());

for (const auto &n : names) {
    std::cout << std::format("{:<{}}", n, col) << "\n";
}
// Alice
// Bob
// Christopher   (all padded to width 11)

Note. std::format was introduced in C++20. Compiler support arrived gradually: GCC 13, Clang 14, and MSVC 19.29 have complete implementations. For earlier compilers the fmtlib library (https://fmt.dev) provides the same interface as a third-party header. Many codebases used fmtlib before C++20 standardized it; the APIs are nearly identical. If you are writing production code that must support older compilers, fmtlib is the right answer today.


Problem 46: Strings That Match

Problem 1 read input line by line with std::getline and character by character with >>. For simple inputs that works. But sometimes input has structure — dates in a log file, email addresses in a CSV, URLs embedded in HTML — and writing a hand-rolled parser for each format is tedious and error-prone. The natural tool for recognising structured text patterns is regular expressions. C++11 adds them to the standard library.

Quick Regex Reference

The ECMAScript dialect used by default supports the following meta-characters:

ECMAScript regex meta-characters (subset)
  • . — any character except newline.
  • \d / \w / \s — digit / word character / whitespace.
  • [abc], [a-z], [^abc] — character class, range, negated class.
  • *, +, ? — zero-or-more, one-or-more, zero-or-one (greedy by default; append ? for lazy: *?, +?).
  • {n}, {n,}, {n,m} — exactly n, at least n, between n and m repetitions.
  • (...) — capture group. (?:...) — non-capturing group.
  • | — alternation: cat|dog matches either.
  • ^, $ — start and end of string (or line with multiline flag).

Constructing a Pattern

std::regex (from <regex>) compiles a pattern string into an internal representation that can be matched against strings:

#include <regex>

std::regex date_re{R"(\d{4}-\d{2}-\d{2})"};         // ISO date: 2024-01-15
std::regex email_re{R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"};

Raw string literals (R"(...)") are convenient here because the regex syntax uses many backslashes and the raw form avoids double-escaping.

The default syntax is ECMAScript (the same flavour as JavaScript). Other syntaxes are available as flags:

std::regex re{"^hello$", std::regex::icase};          // case-insensitive
std::regex posix_re{"[[:alpha:]]+", std::regex::extended}; // POSIX ERE syntax

Matching

std::regex_match: Does the Whole String Match?

std::regex_match requires the entire input string to match the pattern. It returns true or false:

std::regex date_re{R"(\d{4}-\d{2}-\d{2})"};

std::regex_match("2024-01-15", date_re);    // true
std::regex_match("2024-1-15",  date_re);    // false -- only one month digit
std::regex_match("prefix 2024-01-15", date_re);  // false -- extra prefix

std::regex_search: Does Any Part Match?

std::regex_search succeeds if the pattern matches anywhere inside the input:

std::string log = "[2024-01-15] Server started on port 8080";
std::regex date_re{R"(\d{4}-\d{2}-\d{2})"};

std::regex_search(log, date_re);   // true -- found the date inside

Capturing Subgroups with std::smatch

Parentheses in the pattern define capture groups. The match results are stored in a std::smatch (string match) object:

std::string s = "2024-01-15";
std::regex date_re{R"((\d{4})-(\d{2})-(\d{2}))"};
std::smatch m;

if (std::regex_match(s, m, date_re)) {
    std::string whole = m[0];   // "2024-01-15" (the entire match)
    std::string year  = m[1];   // "2024"
    std::string month = m[2];   // "01"
    std::string day   = m[3];   // "15"
}

m[0] is always the full match. m[1], m[2], etc. are the captured groups in left-to-right order. Each element is a std::ssub_match which converts implicitly to std::string.

std::cmatch works the same way but for const char* instead of std::string.

Replacement

std::regex_replace returns a new string with all matches substituted:

std::string text = "cat and cat and cat";
std::regex cat_re{"cat"};

// Replace every match
std::string r = std::regex_replace(text, cat_re, "dog");
// r == "dog and dog and dog"

// Back-references in the replacement: $1 refers to capture group 1
std::regex date_re{R"((\d{4})-(\d{2})-(\d{2}))"};
std::string rearranged = std::regex_replace(
    "2024-01-15", date_re, "$3/$2/$1");
// rearranged == "15/01/2024"

Iterating Over All Matches

std::sregex_iterator produces one match result per occurrence:

std::string text = "foo@bar.com and baz@qux.org are valid";
std::regex email_re{R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"};

auto begin = std::sregex_iterator{text.begin(), text.end(), email_re};
auto end   = std::sregex_iterator{};

for (auto it = begin; it != end; ++it) {
    std::smatch m = *it;
    std::cout << m[0] << "\n";
}
// foo@bar.com
// baz@qux.org

Look familiar? It is the same iterator protocol from Problem 8 and Problem 17. sregex_iterator is an input iterator; you can pass it to the algorithms in <algorithm> just like any other iterator pair.

A Performance Warning

Note. std::regex compiles the pattern at runtime. Compilation is expensive: the constructor parses the pattern and builds the internal matching machinery, and that construction typically dwarfs the cost of a single search. If you use a regex inside a loop, declare it const outside the loop so it is compiled only once:

// SLOW: recompiles the pattern on every iteration
for (const auto &line : lines) {
    if (std::regex_search(line, std::regex{R"(\d+)"})) { ... }
}

// FAST: compile once, search many times
const std::regex num_re{R"(\d+)"};
for (const auto &line : lines) {
    if (std::regex_search(line, num_re)) { ... }
}

This is the single most common std::regex performance mistake.

Regex Flags

Several flags can be combined with | when constructing a std::regex:

// icase: case-insensitive matching
std::regex ci_re{"hello", std::regex::icase};
std::regex_search("HELLO world", ci_re);   // true

// multiline: ^ and $ match start/end of each line, not just the whole string
std::regex ml_re{"^start", std::regex::multiline};

// nosubs: do not store sub-expression matches (faster if you don't need them)
std::regex fast_re{R"(\d+)", std::regex::nosubs | std::regex::optimize};

The std::regex::optimize flag hints to the implementation that the pattern will be used many times, so it should spend extra time upfront constructing an efficient automaton.

Worked Example: Extracting URLs from Text

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> extract_urls(const std::string &text) {
    // simplified URL pattern: http(s)://...
    const std::regex url_re{
        R"(https?://[^\s<>"{}|\\^`\[\]]+)"
    };
    std::vector<std::string> urls;
    auto begin = std::sregex_iterator{text.begin(), text.end(), url_re};
    for (auto it = begin; it != std::sregex_iterator{}; ++it) {
        urls.push_back((*it)[0]);
    }
    return urls;
}

The URL pattern is intentionally simple. Real URL parsing is more complex, but this covers the common case. Note that the const std::regex is declared inside the function body — it is still constructed once per call to extract_urls. If extract_urls is called in a tight loop, move the regex to a static const local to get the familiar “construct once per program” guarantee.

Aside (CS 241 connection: regular languages and DFAs). In CS 241 you studied the formal theory of regular languages. A regular expression describes a regular language; matching a string against the regex is equivalent to running the corresponding DFA. std::regex implements this at runtime: the constructor builds the automaton, and the search functions run it. The ECMAScript dialect adds back-references (\1, \2), which are not part of the strict formal definition of regular expressions — back-references can express non-regular patterns, so they are implemented with backtracking rather than a pure DFA, which can cause exponential worst-case behaviour on pathological inputs.

Example: Parsing log lines

A typical use: extract the timestamp and severity from a structured log line.

// Log format: "[2024-01-15 08:32:01] ERROR: disk full"
const std::regex log_re{
    R"(\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (\w+): (.+))"
};
std::string line = "[2024-01-15 08:32:01] ERROR: disk full";
std::smatch m;
if (std::regex_match(line, m, log_re)) {
    std::string ts       = m[1];  // "2024-01-15 08:32:01"
    std::string severity = m[2];  // "ERROR"
    std::string message  = m[3];  // "disk full"
}

Note. std::regex is the right tool for recognising patterns. For simple prefix or suffix checks, string_view methods from Problem 42 are faster and clearer: sv.starts_with("http") is better than constructing and running a regex. For structured formats with nested grammar (JSON, XML, C++ source), a regex is the wrong tool entirely — use a proper parser.


Problem 47: Numbers Drawn From a Distribution

At some point every program needs a random number. The C function rand() has been in the language since C89, and you have probably used it before. But it is broken in several ways. The randomness is poor — most implementations use a linear congruential generator whose low-order bits cycle with very short periods. The range is implementation-defined (RAND_MAX could be 32767 on some platforms). It is seeded globally, so you cannot have two independent streams of random numbers running at the same time. And rand() % N introduces a bias whenever RAND_MAX + 1 is not a multiple of N. In Problem 17 we saw that std::random_shuffle — which was removed in C++17 — was based on rand() and inherited all of these problems.

C++11 provides a proper replacement in <random>. It separates two orthogonal concerns: generating random bits, and mapping those bits to values in a desired distribution.

Why rand() is Broken

The classic C idiom for a random integer in the range [0, N) is:

#include <cstdlib>
srand(time(nullptr));   // seed once at startup
int r = rand() % N;     // NOT uniformly distributed!

The modulo introduces a bias. If RAND_MAX + 1 is not divisible by N, the values near the bottom of the range come up slightly more often than those near the top. With RAND_MAX = 32767 and N = 100, there are 328 ways to land on each of 0–67 but only 327 ways to land on each of 68–99 — a small but real bias. For a card shuffle this matters: repeated over many deals, the skew makes some orderings measurably more likely than others.

The fix is not to use modulo. The C++11 distributions handle the bias correctly by using rejection sampling: if the engine produces a value in the biased tail, the distribution discards it and tries again. This is exactly what std::uniform_int_distribution does internally.

The Two Concepts: Engines and Distributions

Engine and Distribution
  • A random engine produces a stream of pseudo-random bits. It is seeded, stateful, and deterministic given the same seed.
  • A distribution takes raw bits from an engine and transforms them into values drawn from a specific probability distribution (uniform, normal, etc.).

The key insight is that the same engine can drive many different distributions without bias between them. In the old rand() world, the entire program shared one global stream. With <random>, each distribution request draws from a well-defined engine, with no interference.

Engines

The workhorse is std::mt19937, an implementation of the Mersenne Twister algorithm. It has a period of 2^19937 - 1, excellent statistical properties, and is the right default for simulation and games. There is also a 64-bit version, std::mt19937_64, for generating 64-bit random values.

Seeding

#include <random>

// Deterministic seed: same output every run (useful for testing)
std::mt19937 rng{42};

// Non-deterministic seed: different output on each run
std::random_device rd;
std::mt19937 rng2{rd()};

std::random_device reads from the OS’s entropy source (/dev/urandom on Linux, BCryptGenRandom on Windows). It is not a pseudo-random engine; it is a gateway to true randomness. Using it to seed the Mersenne Twister is the standard idiom.

Note. On some embedded or virtualized platforms, std::random_device may not have access to a hardware entropy source and falls back to a deterministic sequence. Check rd.entropy() — if it returns 0.0, the device is deterministic and you should use a different seeding strategy.

Distributions

A distribution object is constructed with its parameters, then called with the engine to generate a value:

std::mt19937 rng{std::random_device{}()};

// Uniform integer in [1, 6] inclusive
std::uniform_int_distribution<int> die{1, 6};
int roll = die(rng);

// Uniform real in [0.0, 1.0)
std::uniform_real_distribution<double> unit{0.0, 1.0};
double u = unit(rng);

// Normal distribution with mean 0, standard deviation 1
std::normal_distribution<double> gauss{0.0, 1.0};
double x = gauss(rng);

// Bernoulli: true with probability p
std::bernoulli_distribution coin{0.5};
bool heads = coin(rng);

The engine is passed by reference to operator(). The distribution reads as many bits as it needs and updates the engine’s state. Two distributions can share an engine with no interaction between them.

Other Useful Distributions

The standard library provides many others: std::poisson_distribution, std::exponential_distribution, std::discrete_distribution (a weighted choice over a finite set), and std::piecewise_constant_distribution. They all follow the same dist(engine) pattern.

Discrete Distribution: Weighted Choice

A common game-programming pattern is a weighted random selection from a finite menu. std::discrete_distribution takes a list of weights and produces indices according to those weights:

// Loot table: 60% common, 30% rare, 10% legendary
std::discrete_distribution<int> loot{{60.0, 30.0, 10.0}};
std::mt19937 rng{std::random_device{}()};

int tier = loot(rng);   // 0, 1, or 2 with probabilities 0.6, 0.3, 0.1

The weights do not need to sum to 1; the distribution normalizes them.

Shuffling

std::shuffle (from <algorithm>) replaces the removed std::random_shuffle:

#include <algorithm>
#include <vector>

std::vector<int> deck = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
std::mt19937 rng{std::random_device{}()};

std::shuffle(deck.begin(), deck.end(), rng);
// deck is now in a uniformly random permutation

std::random_shuffle was removed in C++17 because it used rand(). std::shuffle takes an explicit engine, so the source of randomness is under your control.

Thread Safety

Note. Random engines are not thread-safe. Calling dist(rng) from two threads simultaneously on the same engine is a data race. The fix is to give each thread its own engine:

thread_local std::mt19937 local_rng{std::random_device{}()};

void worker() {
    std::uniform_int_distribution<int> dist{1, 100};
    int v = dist(local_rng);   // safe: local_rng is per-thread
}

The thread_local storage class (introduced in C++11) gives each thread its own copy of the variable, initialized the first time that thread touches it.

Worked Example: Monte Carlo Estimation of π

A classic application: estimate π by sampling random points in the unit square and checking whether they fall inside the unit circle.

#include <random>
#include <cmath>
#include <iostream>

double estimate_pi(long long trials) {
    std::mt19937_64 rng{std::random_device{}()};
    std::uniform_real_distribution<double> dist{-1.0, 1.0};
    long long inside = 0;
    for (long long i = 0; i < trials; ++i) {
        double x = dist(rng), y = dist(rng);
        if (x*x + y*y <= 1.0) ++inside;
    }
    return 4.0 * static_cast<double>(inside) / trials;
}

int main() {
    std::cout << estimate_pi(10'000'000) << "\n";  // ~3.1416
}

The 10'000'000 uses C++14 digit separators for readability. The uniform_real_distribution covers [-1, 1) on both axes, so the unit circle fits inside the sampling square with area 4; multiplying the hit rate by 4 gives the estimate of π.


Aside (CS 466 / STAT courses connection: pseudo-random sampling). The quality of Monte Carlo estimates depends on the quality of the underlying random number generator. A poor generator introduces correlations between samples that can bias results in subtle, hard-to-detect ways. This is exactly why std::mt19937 is the standard recommendation: its statistical properties have been rigorously tested (Diehard battery, TestU01), and it passes nearly all standard randomness tests; its known failures are a couple of linear-complexity tests in TestU01's BigCrush battery, which rarely matter outside cryptanalysis. In numerical methods courses you will encounter quasi-random (low-discrepancy) sequences (Halton, Sobol) that converge faster than pseudo-random numbers for certain integration problems; C++ does not provide these in the standard library, but the engine/distribution separation makes it easy to plug in your own engine.

Example: Reproducible experiments

For scientific simulations, you often want results to be reproducible: run the same seed, get the same sequence. Fixed seeds make bugs debuggable.

// Set via command line: ./sim --seed 42
// Or use random_device for one-off runs
unsigned seed = 42;
std::mt19937 rng{seed};
std::cout << "seed: " << seed << "\n";  // log it so you can reproduce

std::uniform_int_distribution<int> d{1, 100};
for (int i = 0; i < 5; ++i)
    std::cout << d(rng) << " ";  // deterministic: same every run with seed 42

This pattern is standard in research code: log the seed alongside results so that any run can be exactly replicated.

Note. For cryptographic purposes — generating session tokens, passwords, keys — do not use std::mt19937. The Mersenne Twister is a predictable generator: after observing 624 consecutive outputs, an attacker can reconstruct the internal state and predict all future outputs. For cryptographic randomness, use std::random_device directly (it reads from the OS’s cryptographically secure source) or a platform-specific API (BCryptGenRandom on Windows, getrandom(2) on Linux).


Problem 48: Heterogeneous Storage Revisited

Problem 36 gave us std::variant for storing one of a closed set of types — closed meaning the set is fixed at compile time. Problem 35 gave us std::optional for the degenerate case of one type that may or may not be present. Between those two tools, a great deal of heterogeneous storage is covered. But not all of it.

Sometimes the set of types is genuinely open. A scripting language interpreter stores values that can be integers, floats, strings, lists, or any user-defined type — and users can add new types. A plugin system passes messages between components that were compiled independently and know nothing about each other’s type hierarchy. A property bag on a game entity holds arbitrary attributes whose types are set at data-load time, not compile time. For those cases, C++17 provides std::any.

The void* Comparison

Before C++17, the idiomatic (if unsafe) way to store an arbitrary value was void*:

// Old C-style heterogeneous storage
void *value = nullptr;
int x = 42;
value = &x;              // any pointer fits

// To get the value back:
int *p = static_cast<int *>(value);   // hope you remembered the type!

There is nothing wrong if you actually remember the type. But nothing stops you from casting to the wrong type, and the result is undefined behaviour with no diagnostic. std::any wraps the same concept with type tracking:

std::any value = 42;
// std::any_cast<std::string>(value);  // throws bad_any_cast -- caught!
int v = std::any_cast<int>(value);     // OK

The type is remembered at assignment time and checked at extraction time.

std::any

std::any (from <any>) can hold a value of any copy-constructible type. It is the type-safe version of void*.

#include <any>

std::any a = 42;          // holds an int
a = std::string{"hello"}; // now holds a std::string
a = 3.14;                 // now holds a double

At any moment, an any holds at most one value. Assigning a new value destroys the previous one.

Constructing

std::any a1;                          // empty
std::any a2 = 42;                     // holds int{42}
std::any a3 = std::make_any<int>(42); // same; make_any is explicit about T
std::any a4 = std::string{"world"};   // holds a std::string

a1.has_value();   // false
a2.has_value();   // true
a2.reset();       // a2 is now empty

Accessing: any_cast

Retrieving the stored value requires knowing its type. You declare what you expect, and the cast either succeeds or fails:

std::any a = 42;

// Cast by value: throws std::bad_any_cast if the type is wrong
int v = std::any_cast<int>(a);      // v == 42

// Cast by reference: also throws if type is wrong
std::any_cast<int &>(a) = 99;       // modifies the stored value in place

// Cast by pointer: returns nullptr instead of throwing
int *p = std::any_cast<int>(&a);    // p != nullptr
std::string *sp = std::any_cast<std::string>(&a);  // sp == nullptr

The pointer form is the right choice when you are probing for a type you are not sure about. Catching std::bad_any_cast is the right choice when a type mismatch would be a genuine programming error.

Querying the Type

a.type() returns a const std::type_info& identifying the stored type. You can compare it with typeid:

std::any a = 42;
if (a.type() == typeid(int)) {
    std::cout << "Holds an int\n";
}

This is dynamic type introspection — the runtime equivalent of what std::variant’s std::holds_alternative<T> does statically.

Comparison: optional, variant, any

Here is the decision table:

Choosing between optional, variant, and any
  • std::optional<T> — "maybe a T". One type, may or may not be present. Overhead is one presence flag plus alignment padding on top of sizeof(T); no heap allocation. Use when a function might not return a value, or a configuration option might not be set.
  • std::variant<T1,...,Tn> — "exactly one of a closed set of types". The set is fixed at compile time. No heap allocation; the variant is as large as the largest member. The compiler enforces exhaustive handling with std::visit. Use when the set of types is known and finite.
  • std::any — "any copy-constructible type". The set of types is open and can grow at runtime. Typically heap-allocates the stored value (though implementations use a small-buffer optimization for small types). Use when the types genuinely cannot be predicted at compile time.

The costs matter. optional<T> and variant<T1,...> are stack-only; no heap, no indirection. any almost certainly heap-allocates anything larger than a pointer or two (small-buffer optimization varies by implementation but is not guaranteed).

The Small-Buffer Optimization

Most implementations of std::any include a small-buffer optimization (SBO): if the stored value is small enough (typically up to sizeof(void*) bytes, though the exact threshold is implementation-defined), it is stored inline inside the any object with no heap allocation. Larger values are heap-allocated.

std::any a = 42;          // int: small -- likely stored inline, no heap
std::any b = std::string{"hello world"};  // string: larger -- heap-allocates

You cannot rely on any particular threshold. If you care about allocation, prefer std::variant or design your own type-erased wrapper with a known SBO policy.

Type Erasure Connection

std::any is a concrete instance of the type erasure pattern (which we will examine more formally in Problem 56). It uses an internal virtual-dispatch mechanism to call the correct copy constructor, destructor, and typeid for whatever type it happens to hold, without the user having to name that type. You are already familiar with this idea: std::function<Ret(Args...)> from Problem 17 is another type-erased wrapper, there over callables rather than arbitrary values.

Worked Example: A Property Bag

A common use in game programming: entity components with heterogeneous attributes stored by name.

#include <any>
#include <string>
#include <unordered_map>
#include <stdexcept>

class PropertyBag {
    std::unordered_map<std::string, std::any> props_;
public:
    template<typename T>
    void set(const std::string &key, T &&value) {
        props_[key] = std::forward<T>(value);
    }

    template<typename T>
    T get(const std::string &key) const {
        auto it = props_.find(key);
        if (it == props_.end())
            throw std::out_of_range{"key not found: " + key};
        return std::any_cast<T>(it->second);  // throws bad_any_cast if wrong T
    }

    bool has(const std::string &key) const {
        return props_.count(key) > 0;
    }
};

Usage:

PropertyBag entity;
entity.set("name", std::string{"Dragon"});
entity.set("hp", 250);
entity.set("speed", 3.5);

std::string name = entity.get<std::string>("name");  // "Dragon"
int hp           = entity.get<int>("hp");            // 250

Each attribute can be a different type. New attribute types — textures, AI behaviour parameters, custom component classes — can be added later without changing PropertyBag at all. That openness is exactly what std::any is for.

Aside (CS 247 / CS 346 connection: scripting and plugin systems). Dynamic languages like Lua and Python use this pattern pervasively: every value is essentially a tagged union (the language’s runtime equivalent of any) that carries its type at runtime. When embedding Lua in a C++ game engine, the bridge between the two worlds often uses a structure exactly like PropertyBag: a map from string names to std::any values that Lua scripts can read and write. The type safety boundary is enforced at the any_cast site.

Example: Extending PropertyBag with type inspection

Sometimes you want to enumerate all stored properties without knowing their types in advance. std::any::type() enables this:

void dump(const std::any &a) {
    // Dispatch on the dynamic type of one stored property.
    // (A real implementation would iterate over PropertyBag's internal
    // map and call this for each entry; note that any_cast<std::any>
    // on a stored int would throw, so we take the any directly.)
    if (a.type() == typeid(int))
        std::cout << "int: " << std::any_cast<int>(a) << "\n";
    else if (a.type() == typeid(std::string))
        std::cout << "str: " << std::any_cast<std::string>(a) << "\n";
    else
        std::cout << "unknown type: " << a.type().name() << "\n";
}

The type().name() call gives an implementation-defined mangled name. Useful for debugging; not suitable for production logic.

Note. The decision guide in one sentence: prefer std::variant when you know the types at compile time; prefer inheritance and virtual functions when the types are known but the behaviour is what varies; use std::any only when the types genuinely cannot be predicted at compile time and you are prepared to pay for the heap allocation and the runtime type-check on every access. any gives you maximum flexibility at the cost of all compile-time guarantees. Use it sparingly.


Problem 49: I Want Two Things at Once

We have built a complete Vector, mastered template programming, and studied every ownership model C++ offers. There is one more dimension we have not touched: time. A modern computer has several processor cores sitting idle while our programs execute on one. Every loop, every computation, every wait-for-IO is an opportunity we leave on the table.

The problem: I want to do two things simultaneously.

What Is a Thread?

A thread is an independent sequence of instructions executing within a shared address space. Multiple threads in the same process see the same global variables and heap-allocated objects, but each thread has its own call stack, its own program counter, and its own set of registers.

C++11 standardises thread creation through std::thread, declared in <thread>. Before C++11, portable concurrency required either the POSIX thread library (pthreads) or platform-specific APIs — each with a different interface and error-handling model.

Creating and Starting a Thread

#include <thread>
#include <iostream>

void say_hello() {
    std::cout << "Hello from a thread!\n";
}

int main() {
    std::thread t{say_hello};   // create thread; it starts immediately
    std::cout << "Hello from main!\n";
    t.join();                   // wait for t to finish
}

The std::thread constructor takes a callable (function, lambda, or functor) and zero or more arguments to pass to it. The thread starts executing the callable immediately upon construction — there is no separate “start” call.

void sum_range(const std::vector<int> &v,
               size_t lo, size_t hi, long long &result) {
    result = 0;
    for (size_t i = lo; i < hi; ++i) result += v[i];
}

int main() {
    std::vector<int> v(10'000'000, 1);
    long long left = 0, right = 0;
    size_t mid = v.size() / 2;

    std::thread t{sum_range,
                  std::ref(v), 0, mid, std::ref(left)};
    sum_range(v, mid, v.size(), right);
    t.join();

    std::cout << left + right << '\n';    // 10000000
}

std::ref(v) wraps v in a reference wrapper because std::thread’s constructor copies its arguments by default — passing v directly would silently copy the entire vector into the thread.

Thread Lifecycle: join and detach

Every std::thread object that is joinable — meaning its thread is still running or has finished but has not been collected — must have exactly one of two things done to it before its destructor runs:

  1. t.join() — the calling thread blocks until t’s thread finishes, then marks t as non-joinable. Use when you need the result or want to wait for completion.

  2. t.detach() — the calling thread lets t’s thread run independently in the background. The thread’s resources are released when the thread finishes. t is then non-joinable. Use sparingly: once detached, you cannot wait for the thread or know when it finishes.

If a std::thread is destroyed while still joinable, its destructor calls std::terminate — the same hard abort we saw in Problem 9 when two exceptions were simultaneously active.

{
    std::thread t{some_work};
    // ... do something else ...
    // if we return here without join() or detach(): std::terminate!
}

Aside (CS 343 connection: μC++ tasks vs std::thread). In CS 343 you use μC++, a language extension built on top of C++ that introduces _Task as a first-class concept. A _Task is automatically started when declared and automatically joined at the end of its scope — the compiler enforces the lifecycle that std::thread leaves to the programmer. std::thread is lower-level and more explicit. The advantage is portability and generality; the disadvantage is that the programmer is responsible for every join and detach. C++20’s std::jthread (Problem 54) recovers the automatic-join behaviour that μC++ provides natively.

The Exception Hazard

Look familiar?

void f() {
    std::thread t{background_work};
    do_some_work();    // might throw
    t.join();          // never reached if exception is thrown
                       // t's destructor: std::terminate!
}

This is the exact same problem as Problem 14 with raw pointers. do_some_work() throws, the exception propagates out of f, t.join() is skipped, and t’s destructor panics. The program terminates abruptly rather than handling the exception.

In Problem 14 we fixed the leak by wrapping the pointer in a class whose destructor calls delete. The fix here is identical in spirit: wrap the thread in a class whose destructor calls join.

Thread RAII

class JoinThread {
    std::thread t_;
public:
    explicit JoinThread(std::thread t) : t_{std::move(t)} {}

    ~JoinThread() {
        if (t_.joinable()) t_.join();
    }

    // Threads cannot be copied -- ownership is unique
    JoinThread(const JoinThread &)            = delete;
    JoinThread &operator=(const JoinThread &) = delete;

    // Moving transfers ownership
    JoinThread(JoinThread &&) = default;
    JoinThread &operator=(JoinThread &&other) {
        if (t_.joinable()) t_.join();   // collect our current thread first:
        t_ = std::move(other.t_);       // assigning over a joinable
        return *this;                   // std::thread calls std::terminate
    }
};

Now f is safe regardless of how it exits:

void f() {
    JoinThread t{std::thread{background_work}};
    do_some_work();    // may throw
}   // t's destructor joins the thread -- even on exception

The pattern is RAII: the resource (a running thread) is acquired in the constructor and released in the destructor. The destructor runs unconditionally when the scope exits, whether by normal return or by exception propagation.

Note. C++20 ships std::jthread in <thread>, which does exactly this: it joins automatically in its destructor. Unless you need to target pre-C++20 compilers, prefer std::jthread over the JoinThread wrapper above. Problem 54 covers std::jthread in full, including its cooperative-cancellation mechanism via std::stop_token.

Passing Arguments: the Reference Pitfall

void update(int &x) { ++x; }

int val = 0;
std::thread t{update, val};    // ERROR: ill-formed -- does not compile

std::thread’s constructor copies its arguments and hands the copies to the callable as rvalues. A non-const lvalue reference parameter such as int & cannot bind to an rvalue, so the call above fails to compile on a conforming implementation. (A const int & parameter would compile, but the thread would silently see the copy, not the original.) Use std::ref to force pass-by-reference:

std::thread t{update, std::ref(val)};   // passes reference to val
t.join();
// val is now 1

The object referenced must outlive the thread. A reference to a local variable dangles as soon as that stack frame is destroyed, even if the thread is still running. Lambdas make the ownership explicit: capturing by value gives the thread its own copy; capturing by reference reintroduces the lifetime risk.

Example: Parallel sum with lambdas.
std::vector<int> v(10'000'000, 1);
long long left = 0, right = 0;
size_t mid = v.size() / 2;

JoinThread t{std::thread{[&]() {
    for (size_t i = 0; i < mid; ++i) left += v[i];
}}};
for (size_t i = mid; i < v.size(); ++i) right += v[i];
// t auto-joins here

std::cout << left + right << '\n';

What We Have Not Solved Yet

Splitting work across threads is only half the story. The parallel sum above works because the two halves write to different variables (left and right). What if two threads write to the same variable at the same time?

int counter = 0;

JoinThread t1{std::thread{[&]() {
    for (int i = 0; i < 1'000'000; ++i) ++counter;
}}};
for (int i = 0; i < 1'000'000; ++i) ++counter;
// result is NOT guaranteed to be 2000000

The increment ++counter is not a single hardware instruction; it is a read, a modify, and a write. If two threads interleave those operations, some increments are lost. This is a data race and produces undefined behaviour. Problem 50 introduces mutexes — locks that prevent two threads from interleaving on shared data.

Problem 50: Sharing Without Stepping on Toes

Problem 49 ended with an unsolved problem: two threads writing to the same integer produced undefined behaviour. The increment was not atomic. We need a way to say “only one thread at a time may touch this data.”

The root cause is a data race: two threads access the same memory location simultaneously, at least one of them writes, and there is no synchronisation between them. The C++ standard says a data race is undefined behaviour — not “probably a wrong answer”, but literally UB, as bad as a dangling pointer or an out-of-bounds array access. The compiler is allowed to assume data races do not occur and to generate code that is entirely wrong if they do. The fix is to introduce explicit synchronisation.

Data Race
  • A data race occurs when two threads access the same memory location concurrently, at least one access is a write, and the accesses are not ordered by a synchronisation operation. Data races are undefined behaviour in C++11 and later.

std::mutex — The Fundamental Lock

The word “mutex” is short for mutual exclusion. A mutex is an object with two operations: lock and unlock. At most one thread holds the lock at any given moment. A thread that calls lock() while another thread holds the lock sleeps until the lock is released.

#include <mutex>
#include <thread>

int counter = 0;
std::mutex mtx;

void increment_a_million() {
    for (int i = 0; i < 1'000'000; ++i) {
        mtx.lock();
        ++counter;        // only one thread at a time reaches here
        mtx.unlock();
    }
}

The region between lock() and unlock() is a critical section. Only one thread can be inside it at a time. The data race is gone.

But there is an obvious problem. If anything between lock() and unlock() exits early — an exception, a return, a continue — the mutex stays locked forever. Every other thread trying to acquire it will block indefinitely. This is the exact same exception-safety trap from Problem 14, and the fix is the exact same idea: RAII.

std::lock_guard — RAII for Mutexes

std::lock_guard<std::mutex> locks the mutex in its constructor and unlocks it in its destructor. It cannot be copied or moved. That is its entire implementation.

void increment_a_million() {
    for (int i = 0; i < 1'000'000; ++i) {
        std::lock_guard<std::mutex> guard{mtx};
        ++counter;
    }   // guard's destructor releases mtx -- even on exception
}

In C++17, class template argument deduction (CTAD) lets you omit the template argument: std::lock_guard guard{mtx}. The mutex type is deduced automatically.

std::unique_lock — Flexible Locking

std::unique_lock<std::mutex> is the more powerful sibling of lock_guard. It supports everything lock_guard does, and then some:

  • Deferred locking: construct with std::defer_lock and call .lock() later.
  • Try-locking: .try_lock() returns false instead of blocking if the mutex is already held.
  • Timed locking: .try_lock_for(duration) (requires a timed mutex such as std::timed_mutex).
  • Manual early unlock: call .unlock() before the scope ends.
  • Moveable: lock_guard cannot be moved; unique_lock can — this is required by std::condition_variable in Problem 53.

std::unique_lock<std::mutex> lk{mtx};  // locked immediately
// ... work under lock ...
lk.unlock();                           // release early
// ... lock-free computation ...
lk.lock();                             // re-acquire
// destructor releases again at end of scope

The general rule: use lock_guard when you just need a scope-locked mutex and nothing fancy. Switch to unique_lock when you need flexibility, particularly for condition variables.

A Thread-Safe Counter Class

The idiomatic pattern is to hide the mutex alongside the data it protects. That way a caller cannot accidentally access the data without holding the lock.

class Counter {
    int value_ = 0;
    mutable std::mutex mtx_;
public:
    void increment() {
        std::lock_guard guard{mtx_};
        ++value_;
    }
    void add(int n) {
        std::lock_guard guard{mtx_};
        value_ += n;
    }
    int get() const {
        std::lock_guard guard{mtx_};
        return value_;
    }
};

The mutex is mutable so that const member functions (like get) can still lock it. A const operation conceptually does not mutate the observable value, but acquiring a lock is a mechanical necessity — not a logical state change — so mutable is the right tool.

Note. Minimise the scope of the lock. Hold it only while actually accessing shared data, not during lengthy computations that do not touch shared state. A wide lock increases contention and makes other threads wait unnecessarily.

// Prefer: compute outside the lock, write inside it
auto result = expensive_computation();  // no lock needed
{
    std::lock_guard guard{mtx};
    shared_data = result;   // lock only for the write
}

Deadlock

A deadlock occurs when two or more threads each wait for a resource held by another, so none can ever proceed. The classic two-mutex deadlock:

std::mutex m1, m2;

// Thread A:            // Thread B:
m1.lock();             m2.lock();
// --- context switch ---
m2.lock(); // sleeps   m1.lock(); // sleeps -- deadlock!

Thread A holds m1 and waits for m2. Thread B holds m2 and waits for m1. Neither can ever proceed.

The simplest defence is lock ordering: always acquire multiple mutexes in the same global order everywhere in the program. If every thread acquires m1 before m2, the cycle above cannot form.

When a global order is not easy to enforce, use std::lock, which acquires multiple locks atomically using a deadlock-avoidance algorithm:

std::unique_lock<std::mutex> lk1{m1, std::defer_lock};
std::unique_lock<std::mutex> lk2{m2, std::defer_lock};
std::lock(lk1, lk2);   // acquires both -- deadlock-free

C++17 provides std::scoped_lock, which folds this pattern into a single RAII object:

// C++17 -- the cleanest way to lock two mutexes at once
std::scoped_lock lk{m1, m2};   // acquires both; destructor releases both

Prefer scoped_lock over the defer_lock dance whenever you are on C++17 or later.

Readers-Writer Locks

Sometimes many threads read shared data while writes are rare. A plain mutex forces readers to take turns, which wastes concurrency. C++17 introduces std::shared_mutex (in <shared_mutex>): many threads may hold the lock in shared (read) mode simultaneously, but exclusive (write) mode blocks until every shared holder has released.

#include <shared_mutex>
#include <map>

std::shared_mutex rw;
std::map<int, std::string> table;

std::string lookup(int key) {
    std::shared_lock reader{rw};      // shared: many readers OK
    auto it = table.find(key);
    return it != table.end() ? it->second : "";
}

void insert(int key, std::string val) {
    std::unique_lock writer{rw};      // exclusive: all readers blocked
    table[key] = std::move(val);
}

Use std::shared_lock for read paths and std::unique_lock for write paths.

Aside (CS 343 connection: monitors in μC++). In CS 343 you used the _Monitor class. A _Monitor implicitly acquires a single mutual-exclusion lock on every public member function entry and releases it on exit. You never write lock() or unlock() — the language inserts them for you.

The Counter class above is the C++ equivalent of a manual monitor: a mutex field coupled with lock guards at every method boundary. The C++ version is more verbose but also more flexible — you choose which portion of each method runs under the lock, you can use readers-writer locks, and you can have several independent mutexes protecting different parts of the same object’s state.

Aside (CS 245 connection: mutual exclusion as a safety invariant). CS 245 taught Hoare logic: assertions {P} S {Q} that describe what a statement S guarantees. A critical section has a natural Hoare-logic framing. Let alone mean “I am the sole thread currently executing the critical section.” Then locking establishes alone: {¬alone} lock() {alone}. Unlocking surrenders it: {alone} unlock() {¬alone}. Mutual exclusion is simply a safety invariant — the kind of invariant Hoare logic was designed to reason about.

Example: The counter from Problem 49, fixed
#include <iostream>
#include <thread>
#include <mutex>

int counter = 0;
std::mutex mtx;

void add_million() {
    for (int i = 0; i < 1'000'000; ++i) {
        std::lock_guard guard{mtx};
        ++counter;
    }
}

int main() {
    std::thread t{add_million};
    add_million();
    t.join();
    std::cout << counter << '\n';   // always 2000000
}

Without the mutex, lost updates typically produce a value somewhere between 1,000,000 and 2,000,000, and because a data race is undefined behaviour, even that range is not guaranteed. With the mutex, the output is always exactly 2000000.


Problem 51: I Want a Result Eventually

Problem 49 and Problem 50 gave us threads and mutexes — the lowest level of C++ concurrency. We can launch work on another thread, and we can prevent two threads from stepping on shared data. But the programming model is cumbersome: spin up a thread, set up a shared variable, protect it with a mutex, join the thread, read the variable. That is a lot of ceremony for the most common concurrency use case: “compute this value on another thread, and give it to me when you’re done.”

For that pattern, C++11 gives us std::future.

The Old Way

Before futures, getting a result back from a thread looked like this:

int result = 0;
std::mutex mtx;
bool ready = false;

std::thread t{[&]() {
    int r = expensive_computation();
    std::lock_guard guard{mtx};
    result = r;
    ready = true;
}};

// ... do other work ...

{
    std::lock_guard guard{mtx};
    while (!ready) { /* spin */ }
}
std::cout << result << '\n';
t.join();

This is verbose and wrong: the reader takes the lock and then spins on ready, so the worker can never acquire mtx to publish the result, and the program deadlocks. (Dropping the lock and spinning on a plain bool would instead be a data race.) The proper low-level fix is a condition variable (Problem 53); the higher-level fix is a future. Futures replace all of this with two objects: a future (the read end) and a promise (the write end).

std::future — A Handle to a Value Not Yet Computed

std::future
  • A std::future<T> represents a value of type T that will be produced asynchronously at some point in the future. You obtain the value by calling .get(), which blocks until the value is ready.

The key operations:

  • .get(): blocks until the value is available, then returns it (by value or by reference, depending on T). Can only be called once — calling it a second time is undefined behaviour.
  • .wait(): blocks until the value is ready, but does not consume it. Useful when you want to synchronise without extracting the result.
  • .valid(): returns true if this future is associated with a shared state (i.e., some result will eventually be produced).
  • .wait_for(duration): waits up to a given duration; returns a std::future_status indicating whether the result is ready, deferred, or timed out.

std::async — The Easiest Entry Point

std::async is the simplest way to get a future. It takes a callable and optional arguments, runs the callable (possibly on a new thread), and returns a std::future holding the return value:

#include <future>
#include <iostream>

int expensive() {
    // ... lengthy computation ...
    return 42;
}

int main() {
    std::future<int> f = std::async(std::launch::async, expensive);
    // main thread continues here while expensive() runs
    std::cout << "waiting...\n";
    int result = f.get();   // blocks until expensive() finishes
    std::cout << result << '\n';
}

std::async accepts a launch policy as its first argument:

  • std::launch::async: run the callable on a new thread immediately.
  • std::launch::deferred: do not start the callable at all; run it lazily on the calling thread the first time .get() or .wait() is called.
  • No policy (default): equivalent to std::launch::async | std::launch::deferred; the implementation may do either. Prefer to specify an explicit policy. Despite the function's name, std::async with the default policy is not guaranteed to create a thread, which can cause surprising behaviour if you are counting on parallelism.

Note. A std::future returned by std::async with launch::async has a special destructor: it blocks until the background thread finishes. This is unlike std::thread, where the destructor calls std::terminate if the thread is still running. It also means that a discarded future — one not stored in a variable — blocks on the spot:

// Surprise: this does NOT run in the background!
std::async(std::launch::async, do_work);
// The temporary future's destructor blocks here until do_work() returns.

Always store the returned future if you want the work to proceed concurrently.

Exception Propagation

One of the best features of futures: if the asynchronous function throws an exception, that exception is stored in the shared state and re-thrown by .get().

std::future<int> f = std::async(std::launch::async, []() -> int {
    throw std::runtime_error{"oops"};
    return 0;
});

try {
    int v = f.get();   // re-throws std::runtime_error
} catch (const std::runtime_error &e) {
    std::cout << "caught: " << e.what() << '\n';
}

The exception crosses the thread boundary automatically. With a raw thread and a shared variable you would have to implement this yourself.

std::promise — The Write End

std::async is convenient but opaque — you cannot control exactly when and how the value gets produced. std::promise<T> gives you that control directly. A promise is the write end of the future/promise channel:

std::promise<int> prom;
std::future<int> fut = prom.get_future();

std::thread t{[&prom]() {
    int result = expensive_computation();
    prom.set_value(result);   // wakes up fut.get()
}};

int v = fut.get();   // blocks until set_value is called
t.join();

The key operations on a promise:

  • .get_future(): returns the associated future (call at most once).
  • .set_value(v): stores v and unblocks anyone waiting on the future.
  • .set_exception(e): stores the std::exception_ptr e (typically obtained from std::current_exception()); .get() will re-throw it.

Note. If a promise is destroyed without ever calling set_value or set_exception, the shared state is set to a std::future_error with std::future_errc::broken_promise. This prevents .get() from blocking forever.

std::packaged_task — Wrapping a Callable

std::packaged_task<R(Args...)> wraps any callable matching the signature R(Args...) and fulfils a promise with whatever that callable returns when invoked. It bridges the gap between a plain callable and the future/promise machinery:

std::packaged_task<int()> task{[]() {
    return expensive_computation();
}};
std::future<int> fut = task.get_future();

std::thread t{std::move(task)};  // run task on a new thread
int v = fut.get();
t.join();

packaged_task is useful when you have an existing callable and want to hand it off to a thread pool or a scheduler that expects a plain no-argument callable — the future gives you a way to collect the result later.

Polling Without Blocking: wait_for and wait_until

Sometimes you want to check whether a future is ready without committing to an indefinite wait. .wait_for(duration) and .wait_until(time_point) return a std::future_status that tells you what happened:

using namespace std::chrono_literals;

std::future<int> f = std::async(std::launch::async, slow_work);

// Poll up to 100ms at a time, doing other work in between:
while (f.wait_for(100ms) != std::future_status::ready) {
    do_other_work();
}
int result = f.get();

The three possible status values are:

  • std::future_status::ready: the value is available; call .get() to retrieve it.
  • std::future_status::timeout: the duration elapsed but the value is not yet ready.
  • std::future_status::deferred: the task was launched with launch::deferred and has not started yet; it will run when you call .get() or .wait().

std::shared_future — Multiple Readers

std::future has move-only semantics and allows only one call to .get(). If you need to share the result with multiple threads, convert it to a std::shared_future<T> using .share():

std::shared_future<int> sf =
    std::async(std::launch::async, compute).share();

// Multiple threads can call .get() on the same shared_future:
std::thread t1{[sf]() { std::cout << sf.get() << '\n'; }};
std::thread t2{[sf]() { std::cout << sf.get() << '\n'; }};
t1.join(); t2.join();

Each thread gets a copy of the shared_future (hence the by-value capture). All copies share the same underlying state and all calls to .get() return the same value once it is ready.

Aside (CS 343 connection: futures and actors in μC++). In CS 343 you saw how _Actor tasks communicate by passing messages. An actor sends a message and a reply comes back asynchronously — exactly what a future/promise pair encodes. set_value corresponds to sending the reply message; .get() corresponds to waiting for it.

The difference is that C++ futures are general-purpose: any callable can produce any value, and you collect it with a typed future<T>. The μC++ actor model is more structured but also more opinionated about the communication topology.

Example: Parallel file hashing

Hash four files concurrently using std::async, then collect all results.

#include <future>
#include <string>
#include <vector>
#include <iostream>

std::string hash_file(const std::string &path);  // defined elsewhere

int main() {
    std::vector<std::string> paths = {
        "a.bin", "b.bin", "c.bin", "d.bin"
    };

    // Launch all four hashes in parallel
    std::vector<std::future<std::string>> futs;
    for (auto &p : paths)
        futs.push_back(
            std::async(std::launch::async, hash_file, p));

    // Collect results in order
    for (size_t i = 0; i < paths.size(); ++i)
        std::cout << paths[i] << ": " << futs[i].get() << '\n';
}

Each hash_file call runs on its own thread. The main thread collects results as soon as each one is ready. The total wall-clock time approaches the maximum of the four individual hashing times rather than their sum.


Problem 52: The Memory Model

Problem 50 fixed the data race with a mutex. But a mutex is heavy: every lock() and unlock() may involve a system call, a context switch, or at minimum a hardware memory barrier. For a simple counter that is incremented millions of times per second, that overhead matters. C++11 provides a lighter tool for these situations: atomic operations.

Before we can use atomic operations correctly, we need to understand why they exist — and that requires understanding the C++ memory model.

The Problem: Hardware and Compilers Reorder Instructions

Consider this code, executed by two threads simultaneously:

// Global variables, both initially 0
int data = 0;
bool ready = false;

// Thread 1 (producer):          // Thread 2 (consumer):
data = 42;                        while (!ready) {}
ready = true;                     assert(data == 42);

Intuitively, the assertion should always pass. Thread 2 spins until ready is true, which only becomes true after data = 42. So by the time Thread 2 exits the loop, data should be 42.

In practice, on weakly ordered hardware (ARM, PowerPC) and under aggressive compiler optimisation, this assertion can fail. The compiler may reorder the two stores in Thread 1 (they touch independent variables), and the CPU may similarly reorder them in its write buffer, so Thread 2 may observe ready == true before the write to data has propagated. Worse, this program contains a data race on both variables, which makes its behaviour formally undefined under the C++ memory model: the compiler is even entitled to hoist the load of ready out of the loop, turning it into an infinite loop.

This is not a corner case. It is the normal behaviour of modern out-of-order processors. The C++ memory model exists to give us tools to reason about, and prevent, this kind of problem.

Happens-Before
  • An operation A happens-before operation B if the effects of A are guaranteed to be visible to any thread that performs B. A synchronisation operation (a mutex lock/unlock, or an appropriate atomic operation) establishes a happens-before edge between threads. Without such an edge, the order in which one thread observes another thread's writes is unspecified.

Without any synchronisation, the compiler and CPU may reorder stores and loads in any way that does not violate the behaviour of a single-threaded program. Multi-threaded behaviour is completely unspecified unless you introduce synchronisation.

std::atomic — Indivisible Operations with Defined Ordering

std::atomic<T> (in <atomic>) makes operations on a value both indivisible (no thread can observe a partial write) and subject to controlled ordering guarantees.

For simple types, usage looks almost identical to ordinary variables:

#include <atomic>

std::atomic<int> counter{0};

void increment_a_million() {
    for (int i = 0; i < 1'000'000; ++i)
        ++counter;   // atomic increment: no mutex needed
}

std::atomic<int> is a specialisation for integral types that typically compiles down to a single hardware instruction (e.g., lock xadd on x86). No OS call. No context switch. Just an atomic read-modify-write.

The explicit member functions on std::atomic<T>:

  • .load(): read the current value.
  • .store(v): write a new value.
  • .fetch_add(n): atomically add n, return the old value.
  • .exchange(v): atomically replace with v, return the old value.
  • .compare_exchange_weak(expected, desired): if the current value equals expected, replace it with desired and return true; otherwise write the current value into expected and return false. “Weak” means it may spuriously fail.
  • .compare_exchange_strong(expected, desired): same but never spuriously fails.

Memory Order Options

Every atomic operation accepts an optional std::memory_order parameter that controls what ordering guarantees the operation provides. The options, from weakest to strongest:

memory_order_relaxed
  • No ordering guarantee beyond atomicity. The operation is indivisible, but no happens-before edge is established. Use only when you need atomicity for bookkeeping (e.g., a simple hit counter) and the value has no dependency on other shared data.
memory_order_acquire / memory_order_release
  • The release/acquire pair is the fundamental synchronisation mechanism. A release store (on the writer) establishes a happens-before edge to any subsequent acquire load (on the reader) that observes the stored value. Everything the writer did before the release is visible to the reader after the acquire.
memory_order_seq_cst
  • Sequentially consistent ordering: the default and the strongest. All seq_cst operations appear to execute in a single total order visible to all threads. This is the easiest model to reason about and the one you should use unless you have a specific reason to do otherwise.

The Publish/Consume Pattern

The producer/consumer example from earlier, fixed correctly with atomics:

int data = 0;
std::atomic<bool> ready{false};

// Thread 1 (producer):
data = 42;                              // plain store -- fine here
ready.store(true,                       // release store: guarantees
            std::memory_order_release); // data=42 is visible after

// Thread 2 (consumer):
while (!ready.load(std::memory_order_acquire)) {}
// acquire load: everything before the release is now visible
assert(data == 42);   // guaranteed to pass

The release store on ready and the acquire load on ready form a synchronisation pair. When Thread 2’s acquire load reads true, a happens-before edge exists from Thread 1’s write of data = 42 to Thread 2’s read of data.

A Spinlock with Compare-Exchange

Compare-exchange is the building block for lock-free data structures. A spinlock is the simplest example: instead of calling the OS, a thread spins in a tight loop until it can atomically flip a flag from false to true:

class Spinlock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        bool expected = false;
        // Spin until we successfully flip false -> true
        while (!locked_.compare_exchange_weak(
                   expected, true,
                   std::memory_order_acquire,
                   std::memory_order_relaxed))
            expected = false;  // reset: CAS wrote current value into expected
    }
    void unlock() {
        locked_.store(false, std::memory_order_release);
    }
};

A spinlock is appropriate only for very short critical sections where the wait time is reliably shorter than the overhead of a real OS mutex. For anything longer, use std::mutex.

std::atomic_flag — The Truly Lock-Free Primitive

std::atomic<bool> is almost always lock-free on modern hardware, but the standard only guarantees lock-freedom for std::atomic_flag. It is the minimal atomic type: it has no load() method, only test_and_set() (atomically sets the flag and returns the old value) and clear().

std::atomic_flag lock = ATOMIC_FLAG_INIT;  // since C++20, plain {} also
                                           // initialises the flag to clear

void spin_lock() {
    while (lock.test_and_set(std::memory_order_acquire)) {}
}
void spin_unlock() {
    lock.clear(std::memory_order_release);
}

C++20 adds .test() so you can inspect the flag without setting it, and .wait() / .notify_one() / .notify_all() for efficient waiting (backed by futex on Linux rather than a spin loop).

std::atomic_flag is the right building block when you need a portable guarantee of lock-freedom. For everything else, std::atomic<int> or a real mutex is cleaner.

is_lock_free and ATOMIC_xxx_LOCK_FREE

You can ask at runtime whether a particular atomic type is lock-free:

std::atomic<int> a;
std::cout << a.is_lock_free() << '\n';   // typically 1 on modern hw

// Compile-time check (macro):
static_assert(ATOMIC_INT_LOCK_FREE == 2,
    "atomic<int> must be lock-free on this platform");

The macro values are 0 (never), 1 (sometimes, depends on alignment), or 2 (always). For large T (e.g., a 128-byte struct), std::atomic<T> may fall back to a hidden internal mutex; the operations are still safe and correct, just no longer truly lock-free.

Aside (CS 241 connection: hardware memory barriers). In CS 241 (Foundations of Sequential Programs) you saw how the processor executes load and store instructions, and how the memory hierarchy works. On x86, most operations have acquire/release semantics for free because x86 has a relatively strong memory model (TSO — Total Store Order). On ARM and POWER, the hardware memory model is much weaker and explicit barrier instructions (dmb, isb, dsb on ARM; lwsync, sync on POWER) are required to enforce ordering.

On ARMv8, an acquire load compiles to a single ldar instruction and a release store to stlr; on older ARM, a plain ldr/str must be paired with an explicit dmb barrier. On x86, neither emits an extra instruction beyond the load or store itself. The C++ memory model abstracts these hardware differences behind a portable vocabulary.

Aside (CS 343 connection: μC++ and the simpler model). μC++ wraps all shared state in monitors and tasks with well-defined communication points. The programmer never sees acquire/release semantics directly because the runtime inserts the right barriers at every monitor entry and exit.

std::atomic exposes the machine model directly. This is powerful — you can write lock-free data structures that are faster than anything a monitor can achieve — but it is also dangerous. Weak memory orders are an expert tool. Unless you are building lock-free infrastructure, use memory_order_seq_cst (the default) or just use a mutex.

Example: Lock-free reference counter

A simplified shared-pointer reference count using atomics. This is essentially what std::shared_ptr does internally.

struct RefCounted {
    std::atomic<int> refcount{1};

    void retain() {
        refcount.fetch_add(1, std::memory_order_relaxed);
        // relaxed: we only need atomicity, not ordering
    }
    void release() {
        // release: destructor must see all prior writes
        if (refcount.fetch_sub(1, std::memory_order_release) == 1) {
            // acquire fence: ensure we see all prior releases
            std::atomic_thread_fence(std::memory_order_acquire);
            delete this;
        }
    }
};

The retain uses relaxed because incrementing a counter has no ordering dependency on anything else. The release uses release so that all writes made before the decrement are visible to the thread that eventually runs delete this. The acquire fence before the delete ensures those writes are observed. This is the pattern used in the C++ standard library’s own shared_ptr.

Note. Unless you are writing lock-free data structures or high-performance synchronisation primitives, use memory_order_seq_cst (the default) or use a mutex. The weaker orderings — particularly relaxed and the acquire/release pair — require careful reasoning about every synchronisation edge in your program. Getting them wrong produces intermittent bugs that only manifest on non-x86 hardware.


Problem 53: Producer-Consumer

Futures (Problem 51) are great for one-shot work: launch a computation, collect its result. But many real programs have a different shape: one part of the program continuously generates work items and another part continuously processes them. Think of a web server reading requests from a socket and handing them to worker threads, or a compiler’s lexer feeding tokens to a parser. We need two threads to communicate not once but indefinitely, with one side potentially outrunning the other.

The data structure in the middle is a bounded blocking queue: a fixed-capacity queue that makes the producer wait when the queue is full, and makes the consumer wait when the queue is empty. Building this correctly requires a new synchronisation primitive: the condition variable.

The Naive Approach and Why It Fails

Let us try to build a blocking queue with just a mutex and a spin loop:

template<typename T>
class NaiveQueue {
    std::deque<T> buf_;
    std::mutex mtx_;
    size_t limit_;
public:
    NaiveQueue(size_t limit) : limit_{limit} {}

    void push(T val) {
        while (true) {
            std::lock_guard guard{mtx_};
            if (buf_.size() < limit_) { buf_.push_back(std::move(val)); return; }
        }  // spin forever when full -- wastes CPU
    }
};

This is wrong in two ways. First, when the queue is full the producer spins in a tight loop, repeatedly locking and unlocking the mutex; the consumer can still acquire the lock between iterations, but it must fight the producer for it while the producer burns an entire core accomplishing nothing. Second, the waste is inherent to polling: no matter where we spin, we consume CPU time that could be given to the consumer. What we want is for the producer to sleep when the queue is full and be woken up when space becomes available. That is exactly what a condition variable provides.

std::condition_variable

Condition Variable
  • A std::condition_variable (in <condition_variable>) allows a thread to sleep until a predicate becomes true. It must be used together with a std::unique_lock<std::mutex>: the thread atomically releases the lock and goes to sleep, then re-acquires the lock before returning to the caller.

The core operations:

  • cv.wait(lock, pred): atomically releases lock and sleeps; wakes up when cv.notify_one() or cv.notify_all() is called and pred() returns true. Re-acquires the lock before returning.
  • cv.notify_one(): wakes up one sleeping thread (if any).
  • cv.notify_all(): wakes up all sleeping threads.

The predicate version of wait is essential because of spurious wakeups: the operating system is permitted to wake a sleeping thread for no reason. Without the predicate, the thread would proceed even though the condition might still be false. With the predicate, wait loops internally until it is satisfied:

// cv.wait(lock, pred) is exactly equivalent to:
while (!pred()) cv.wait(lock);

Always use the predicate form. Never use bare cv.wait(lock).

A Complete Bounded Blocking Queue

#include <mutex>
#include <condition_variable>
#include <deque>

template<typename T>
class BoundedQueue {
    std::mutex mtx_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> buf_;
    const size_t limit_;
public:
    explicit BoundedQueue(size_t limit) : limit_{limit} {}

    void push(T val) {
        std::unique_lock lk{mtx_};
        not_full_.wait(lk, [this]{ return buf_.size() < limit_; });
        buf_.push_back(std::move(val));
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock lk{mtx_};
        not_empty_.wait(lk, [this]{ return !buf_.empty(); });
        T val = std::move(buf_.front());
        buf_.pop_front();
        not_full_.notify_one();
        return val;
    }
};

Walk through push: we acquire the lock, then sleep on not_full_ until there is space. When wait returns we still hold the lock, so we can safely push. After pushing we notify one thread waiting on not_empty_ — typically the consumer.

Walk through pop: mirror image. We sleep on not_empty_ until there is something to consume. After consuming we notify one thread waiting on not_full_ — typically the producer.

Note that we use std::unique_lock, not std::lock_guard. This is required: cv.wait must be able to unlock the mutex while sleeping, and lock_guard does not support that.

The Producer-Consumer Pattern

With BoundedQueue in place, the producer-consumer pattern is clean:

BoundedQueue<int> q{100};

// Producer thread: generates work
std::thread producer{[&q]() {
    for (int i = 0; i < 10'000; ++i)
        q.push(i);
}};

// Consumer thread: processes work
std::thread consumer{[&q]() {
    for (int i = 0; i < 10'000; ++i) {
        int val = q.pop();
        process(val);
    }
}};

producer.join();
consumer.join();

The producer never burns CPU when the queue is full; it sleeps. The consumer never burns CPU when the queue is empty; it sleeps too. The OS scheduler wakes them exactly when there is something to do.

Shutdown: Telling the Consumer to Stop

The example above works because both threads know the exact iteration count. In real code, the producer finishes at an unpredictable time and the consumer must be told to stop. A common approach is a sentinel value:

// Producer signals done with a sentinel
const int DONE = -1;

std::thread producer{[&q]() {
    for (int i = 0; i < n; ++i) q.push(i);
    q.push(DONE);   // sentinel
}};

std::thread consumer{[&q]() {
    while (true) {
        int val = q.pop();
        if (val == DONE) break;
        process(val);
    }
}};

For multiple consumers, push one sentinel per consumer. A cleaner alternative is to add a close() method to the queue (Problem 54 shows this with std::stop_token).

Multiple Consumers: Thread Pools

Sometimes a single consumer cannot keep up with the producer. The natural solution is a pool of consumer threads that all pop from the same queue:

BoundedQueue<int> q{512};
const int NWORKERS = 4;
std::vector<std::thread> workers;

for (int i = 0; i < NWORKERS; ++i) {
    workers.emplace_back([&q]() {
        while (true) {
            int val = q.pop();
            if (val == -1) { q.push(-1); break; } // propagate sentinel
            process(val);
        }
    });
}
// producer fills q; all workers drain it concurrently

Each worker sleeps on not_empty_ when the queue is empty. When the producer pushes an item, notify_one() wakes exactly one sleeping worker, which then typically processes the item (an already-running worker may occasionally grab it first, in which case the woken worker re-checks its predicate and goes back to sleep). At shutdown, the sentinel trick above passes the signal from worker to worker automatically.

The notify_one() / notify_all() choice matters here. If you call notify_all() after every push, all four workers wake up, four compete for the lock, three find the queue empty (or the item already taken) and go back to sleep. That is wasted context switching. notify_one() is strictly better when only one waiter can make progress. Use notify_all() only after a close() or when you genuinely want every waiter to re-evaluate:

template<typename T>
void BoundedQueue<T>::push(T val) {
    std::unique_lock lk{mtx_};
    not_full_.wait(lk, [this]{ return buf_.size() < limit_; });
    buf_.push_back(std::move(val));
    not_empty_.notify_one();   // wake exactly one consumer
}

template<typename T>
void BoundedQueue<T>::close() {   // assumes a bool closed_ member (Problem 54)
    std::lock_guard guard{mtx_};
    closed_ = true;
    not_empty_.notify_all();   // wake ALL consumers so they can exit
    not_full_.notify_all();    // wake any blocked producers too
}

Note. Use notify_one when there is at most one thread that can usefully proceed — for instance, pushing one item wakes one consumer. Use notify_all when every waiting thread needs to re-check its predicate — for instance, closing the queue means every consumer should wake up and discover that the queue is done.

Aside (CS 343 connection: bounded buffer in μC++). CS 343 introduces the bounded buffer as a canonical monitor example. In μC++, a _Monitor with condition variables named NotFull and NotEmpty maps one-to-one to our BoundedQueue: wait(NotFull) corresponds to not_full_.wait(lk, ...), and signal(NotEmpty) corresponds to not_empty_.notify_one().

The C++ version differs in one important way: μC++ condition variables use signal-and-urgent-wait semantics (the signalling thread re-enters the monitor only after the signalled thread exits), while std::condition_variable uses signal-and-continue semantics (the signalling thread keeps running; the signalled thread competes for the lock normally). This is why the predicate re-check in wait is mandatory in C++ but sometimes optional in μC++.

Example: Image-processing pipeline

One thread loads images from disk; another processes them. The queue decouples the I/O speed from the processing speed.

BoundedQueue<Image> q{8};   // buffer up to 8 images

std::thread loader{[&q]() {
    for (auto &path : image_paths) {
        q.push(load_image(path));   // sleeps when queue is full
    }
    q.push(Image{});   // empty sentinel
}};

std::thread processor{[&q]() {
    while (true) {
        Image img = q.pop();        // sleeps when queue is empty
        if (img.empty()) break;
        process_image(img);
    }
}};

loader.join();
processor.join();

If loading is faster than processing, the loader sleeps when the 8-slot buffer fills up. If processing is faster, the processor sleeps waiting for the next image. The queue acts as a shock absorber between the two rates.


Problem 54: Joinable Threads Done Right

In Problem 49 we noticed that std::thread has a nasty destructor: if the thread is still joinable when the destructor runs, the program calls std::terminate. We fixed it by writing JoinThread, a small RAII wrapper that calls join() in its destructor. We also noted at the time that C++20 ships std::jthread, which does exactly that — and more. Problem 53 left us with a second unsolved problem: gracefully stopping a long-running worker thread. std::jthread’s cooperative cancellation mechanism, built around std::stop_token, solves that too.

std::jthread — std::thread with Automatic Join

std::jthread (in <thread> since C++20) is a drop-in replacement for std::thread with one critical difference: its destructor automatically requests a stop and joins the thread. You no longer need JoinThread:

#include <thread>

void f() {
    std::jthread t{background_work};
    do_some_work();   // may throw
}   // t's destructor: requests stop, then joins -- no std::terminate

Every std::jthread owns a std::stop_source internally. Its destructor calls request_stop() on that source before joining. If the thread’s callable ignores stop requests, the destructor simply waits for the callable to finish normally — so std::jthread is backwards compatible with any callable that does not take a std::stop_token.

std::stop_token and Cooperative Cancellation

std::jthread would not be much more than a RAII wrapper if that were all it did. Its real power is cooperative cancellation: a way for one thread to politely ask another thread to stop, and a way for the worker thread to check whether it has been asked.

The mechanism has three parts:

  • std::stop_source: the write end. Holds the shared cancellation state. Call .request_stop() to signal that the worker should stop.
  • std::stop_token: the read end. Handed to the worker thread. Call .stop_requested() to check whether a stop has been requested.
  • std::stop_callback: registers a function to be called automatically when a stop is requested. Useful for waking up a sleeping thread.
#include <thread>
#include <stop_token>

void worker(std::stop_token stoken) {
    while (!stoken.stop_requested()) {
        do_one_unit_of_work();
    }
    // clean up and exit gracefully
}

int main() {
    std::jthread t{worker};
    std::this_thread::sleep_for(std::chrono::seconds{5});
    // jthread destructor calls request_stop() then join()
    // -- or call t.request_stop() manually earlier
}

When the callable’s first parameter is std::stop_token, std::jthread automatically passes its internal token as that first argument. You do not need to thread the token through yourself.

Cooperative Cancellation
  • Cooperative cancellation means the requesting thread sets a flag and the worker thread voluntarily checks that flag and exits. Nobody is forcefully interrupted. Contrast with preemptive cancellation, where the OS or runtime kills the thread at an arbitrary point.

Why Cooperative, Not Preemptive?

Forcefully terminating a thread is dangerous. The thread might be holding a mutex (now locked forever), in the middle of updating a data structure (invariant broken), or halfway through a file write (data corrupted). There is no safe point for the runtime to step in and pull the rug out.

Cooperative cancellation is safe because the worker chooses its own stopping points. It checks the token between logical operations, after releasing any locks and restoring any invariants. The calling code that sets the stop flag simply waits; the worker exits cleanly on its own schedule.

void safe_worker(std::stop_token st) {
    while (!st.stop_requested()) {
        {
            std::lock_guard guard{mtx};
            process_next_item();         // lock released here
        }
        // invariant restored -- safe to check the token here
    }
    finalise();   // always runs, cleans up resources
}

Using stop_source Independently of jthread

std::stop_source and std::stop_token are ordinary library types. You can use them independently of std::jthread — for instance, to broadcast a cancellation signal to a whole group of threads:

std::stop_source source;

// Distribute the same token to multiple workers:
auto token = source.get_token();

std::vector<std::jthread> workers;
for (int i = 0; i < 4; ++i)
    workers.emplace_back([token]() {   // capture by value!
        while (!token.stop_requested())
            do_work();
    });

// Later: request all four workers to stop
source.request_stop();
// jthread destructors join (workers have already started exiting)

Capturing the token by value is important. std::stop_token is a lightweight reference-counted handle to the shared stop state — copying it is cheap and each copy refers to the same underlying flag.

std::condition_variable_any and Integrated Stop Waiting

C++20 adds a new overload of wait to std::condition_variable_any (the generalised variant that works with any BasicLockable, not just std::mutex) that accepts a std::stop_token directly. The wait exits either when the condition is satisfied or when a stop is requested, without needing a stop_callback:

std::condition_variable_any cv;
std::mutex mtx;
std::deque<int> buf;

void consumer(std::stop_token st) {
    std::unique_lock lk{mtx};
    // Waits until buf non-empty OR stop requested:
    while (cv.wait(lk, st, [&]{ return !buf.empty(); })) {
        int v = buf.front(); buf.pop_front();
        lk.unlock();
        process(v);
        lk.lock();
    }
    // Here: stop was requested (or spurious exit handled by predicate)
}

When a stop is requested, the three-argument wait wakes up and returns false (the predicate was not satisfied — the stop fired instead). This eliminates the need for a separate stop_callback to poke the condition variable.

std::stop_callback — Waking a Sleeping Thread

A worker that sleeps on a condition variable will not notice a stop request until it wakes up. std::stop_callback bridges this gap: register a callback that notifies the condition variable when a stop is requested:

void queue_worker(std::stop_token st, BoundedQueue<int> &q) {
    // On stop, close the queue: this notifies the queue's condition
    // variable and wakes a pop() blocked on an empty queue
    std::stop_callback on_stop{st, [&q]{ q.close(); }};

    while (auto item = q.pop())   // blocks; returns std::nullopt once closed
        process(*item);
}

The callback runs on whatever thread calls request_stop() — it must be thread-safe and fast. Notifying a condition variable or setting a flag is ideal; doing substantial work inside a stop_callback is not.

Solving the Bounded Queue Shutdown Problem

Problem 53 ended with a workaround: push a sentinel value to tell the consumer to stop. Sentinels are fragile — they require the type to have a designated “poison” value, and with multiple consumers you need one sentinel per consumer. With std::jthread and std::stop_token we can solve this cleanly by giving the queue a close() method:

template<typename T>
class BoundedQueue {
    // ... (fields as in Problem 53) ...
    bool closed_ = false;
public:
    void close() {
        std::lock_guard guard{mtx_};
        closed_ = true;
        not_empty_.notify_all();
        not_full_.notify_all();
    }

    std::optional<T> pop() {
        std::unique_lock lk{mtx_};
        not_empty_.wait(lk, [this]{
            return !buf_.empty() || closed_;
        });
        if (buf_.empty()) return std::nullopt;
        T val = std::move(buf_.front());
        buf_.pop_front();
        not_full_.notify_one();
        return val;
    }
};

Now the producer uses a std::jthread with a stop callback that closes the queue:

BoundedQueue<int> q{64};

std::jthread consumer{[&q]() {
    while (auto item = q.pop())
        process(*item);
}};

std::jthread producer{[&q](std::stop_token st) {
    std::stop_callback sc{st, [&q]{ q.close(); }};
    for (int i = 0; !st.stop_requested(); ++i)
        q.push(i);
}};

std::this_thread::sleep_for(std::chrono::seconds{2});
// Destructors run in reverse declaration order: the producer is
// destroyed first (request_stop() fires the callback, closing the
// queue), then the consumer joins once the queue is drained.
// Declaring the consumer first matters: if it were destroyed first,
// its join() would block forever while the producer kept running.

Aside (CS 343 connection: cooperative tasks in μC++). μC++ tasks respond to signals cooperatively. A task can poll for a signal at defined checkpoints — exactly the same idea as stop_requested(). The μC++ framework also provides a uBaseTask::cancel() mechanism, which sets a flag that the task checks at its own discretion.

std::stop_token is the C++20 standardisation of the same idea. The difference is that μC++ builds the signal-checking framework into the task abstraction at the language level, while C++ exposes it as a library type you opt into explicitly.

Example: Worker thread with stop token

A worker that processes items from a queue, stopping gracefully when asked:

#include <thread>
#include <stop_token>

void run_worker(BoundedQueue<Task> &q) {
    std::jthread worker{[&q](std::stop_token st) {
        std::stop_callback sc{st, [&q]{ q.close(); }};
        while (auto task = q.pop())
            task->execute();
        // exits here when queue is closed and empty
    }};

    // ... submit tasks to q from the main thread ...
    std::this_thread::sleep_for(std::chrono::seconds{10});
    // jthread destructor: request_stop() wakes the worker,
    // then join() waits for it to finish cleanly
}

No sentinel values. No manual join() call. No std::terminate hazard. The thread lifecycle is entirely managed by RAII.

Note. A stop request is not preemptive. Requesting a stop merely sets a flag in the shared state; the worker thread must check the flag itself. If the worker never checks, the stop request is ignored and the thread runs to completion normally — which is fine, because std::jthread’s destructor simply joins without timing out.


Problem 55: Parallel Algorithms

We have been using std::sort, std::transform, std::for_each, std::accumulate, and friends since Problem 17. All of that ran on a single thread, processing one element at a time. Problems 49 through 54 taught us the machinery of concurrency: threads, mutexes, futures, atomics, condition variables. Now C++17 brings those two worlds together with one small addition: an execution policy parameter. Pass the right policy to a standard algorithm and the library can distribute the work across all available CPU cores.

Execution Policies

C++17 adds three execution policies to <execution>:

  • std::execution::seq: sequential execution, exactly as before. This is the default when no policy is specified and is provided mainly for uniformity — so you can parameterise on policy without a special case.
  • std::execution::par: parallel execution. The algorithm may spread work across a thread pool. Individual operations still run one at a time within a thread, but operations from different iterations may run concurrently.
  • std::execution::par_unseq: parallel and vectorised (SIMD). The algorithm may also interleave operations within a single thread using SIMD instructions. This is the fastest policy but the most restrictive: the callable must not acquire locks, call non-SIMD-safe functions, or access non-local memory that might alias.

Using a policy is syntactically trivial: pass it as the first argument.

#include <algorithm>
#include <execution>
#include <numeric>     // std::iota
#include <vector>

std::vector<int> v(10'000'000);
std::iota(v.begin(), v.end(), 0);

// Sequential (default behaviour):
std::sort(v.begin(), v.end());

// Parallel:
std::sort(std::execution::par, v.begin(), v.end());

// Parallel + vectorised:
std::sort(std::execution::par_unseq, v.begin(), v.end());

The results are semantically identical to the sequential version: the sorted order is the same, elements are moved correctly, and exception safety is preserved (with some caveats discussed later).

std::reduce vs std::accumulate

std::accumulate processes elements strictly left-to-right, in order. That ordering prevents parallelisation: each step depends on the previous result.

std::reduce (also C++17) requires that the combining operation is both associative and commutative. When those conditions hold, the library can split the range into chunks, reduce each chunk in parallel, and then combine the partial results:

#include <numeric>
#include <execution>

std::vector<double> v(10'000'000, 1.0);

// Sequential sum -- left-to-right, always:
double sum_seq = std::accumulate(v.begin(), v.end(), 0.0);

// Parallel sum -- chunks combined in any order:
double sum_par = std::reduce(
    std::execution::par, v.begin(), v.end(), 0.0);

The two results may differ by tiny floating-point rounding errors because floating-point addition is not perfectly associative. For integer types they will be identical. std::transform_reduce combines a map step and a reduce step in a single parallel pass. Here is a parallel dot product:

#include <numeric>
#include <execution>

std::vector<double> a(N), b(N);
// ... fill a and b ...

// Sequential dot product:
double dot_seq = std::inner_product(a.begin(), a.end(), b.begin(), 0.0);

// Parallel dot product:
double dot_par = std::transform_reduce(
    std::execution::par,
    a.begin(), a.end(),   // first range
    b.begin(),            // second range
    0.0,                  // identity for the reduction
    std::plus<>{},        // reduce operation (sum)
    std::multiplies<>{});  // transform operation (multiply pairwise)

std::transform_reduce is strictly more general than std::reduce: you can supply any binary transform and any binary reduction, not just plus and identity. Use it whenever you need a map-reduce in one pass without materialising an intermediate vector.

Thread Safety of the Callable

The algorithms guarantee that the algorithm itself is correct. They do not make your callable thread-safe for you. If the callable reads or writes shared mutable state, you get the same data races as with manual thread code:

int bad_count = 0;

// DATA RACE: multiple threads increment bad_count without a lock
std::for_each(std::execution::par, v.begin(), v.end(),
              [&](int x) { if (x > 0) ++bad_count; });

// Correct: use an atomic
std::atomic<int> good_count{0};
std::for_each(std::execution::par, v.begin(), v.end(),
              [&](int x) { if (x > 0) ++good_count; });

The callable must not acquire any lock at all under par_unseq: SIMD lanes sharing a stack frame cannot sleep waiting for a mutex. par (without unseq) does allow mutex acquisition, but beware of deadlock if the same mutex is also acquired outside the algorithm.

Exception Handling Under Parallel Policies

Under the sequential policy, exceptions propagate out of the algorithm normally. Under parallel policies, if any iteration throws an exception, the algorithm calls std::terminate. The rationale: in a parallel context, multiple exceptions might be in flight at once and there is no sensible way to combine them into a single exception to propagate.

If your callable can throw and you need to handle those exceptions, either catch them inside the callable and encode failure in the return value, or use the sequential policy.

When Parallel Wins and When It Loses

Parallelism has overhead: splitting the range, scheduling threads on a pool, and combining results all cost time. The parallel version only wins when the work saved exceeds that overhead.

Parallel wins when:

  • The dataset is large — millions of elements, not thousands.
  • The per-element work is CPU-bound (computation, not I/O).
  • There are no dependencies between iterations.
  • The machine has multiple idle cores.

Parallel loses when:

  • The dataset is small — thread-pool overhead dominates.
  • The work is I/O-bound — all threads are waiting on disk or network, not on the CPU.
  • Iterations share mutable state heavily — contention on locks or atomics serialises the work.

The only reliable way to know is to measure. A parallel sort of 100 elements is slower than the sequential version on every machine. A parallel sort of 100,000,000 elements is typically 2–8x faster depending on core count and memory bandwidth.

Not All Algorithms Parallelise Equally

Nearly every standard algorithm in <algorithm> and <numeric> that takes a pair of iterators gained a parallel overload in C++17 (std::accumulate is a notable exception; std::reduce is its parallel-friendly replacement). But the algorithms are not all equally amenable to parallelism. Here is a rough guide:

  • Excellent candidates: std::sort, std::for_each, std::transform, std::reduce, std::transform_reduce, std::find_if (early exit still works in parallel — the library abandons remaining threads once one finds a match), std::count_if.
  • Good but check your callable: std::copy_if, std::generate, std::fill — these are memory-bandwidth bound; parallelism helps on NUMA systems but may not on single-socket machines where all cores share the same memory bus.
  • Sequential semantics required: std::accumulate (use std::reduce instead), any algorithm that processes a forward-linked list (the library cannot split at arbitrary points), anything where the callable has a side-effect on a shared counter without atomics.

The C++ standard guarantees that parallel algorithms produce the same result as their sequential counterparts for side-effect-free callables — up to the floating-point reassociation in reductions noted above. The only other observable difference is wall-clock time. This guarantee is what makes the one-word upgrade from sequential to parallel safe: you do not need to rethink the algorithm, only ensure the callable is thread-safe.

Aside (CS 343 connection: task-based parallelism). In CS 343, μC++ tasks communicate by message-passing: each task is an independent thread with its own mailbox. Parallel algorithms use a different model: a thread pool picks up work items from a shared queue. The two models have the same expressive power but different tradeoffs. Message-passing is cleaner for workflows with complex data dependencies; thread pools are more efficient for homogeneous, independent work like applying the same function to every element of an array.

Aside (CS 343 / CS 370 connection: associativity and parallelism). The requirement that std::reduce’s operation be associative is not arbitrary. CS 370 (Numerical Computation) teaches that floating-point operations are not truly associative due to rounding. std::accumulate gives a reproducible result because it always evaluates left-to-right. std::reduce is allowed to evaluate in any tree-shaped order, so its floating-point result is platform-dependent. For numerical applications, use std::accumulate if reproducibility matters more than speed, or use par with a compensated summation algorithm.

Example: Parallel image processing

Apply a filter to every pixel in an image using a parallel for_each:

#include <algorithm>
#include <cstdint>
#include <execution>
#include <vector>

struct Pixel { uint8_t r, g, b, a; };

void apply_filter(std::vector<Pixel> &image) {
    std::for_each(
        std::execution::par_unseq,
        image.begin(), image.end(),
        [](Pixel &p) {
            // Convert to greyscale in place -- no shared state
            uint8_t grey = static_cast<uint8_t>(
                0.299f*p.r + 0.587f*p.g + 0.114f*p.b);
            p.r = p.g = p.b = grey;
        });
}

Each pixel is processed independently, so par_unseq is correct and safe. On a modern CPU with 8 cores and SIMD, this can process a 24-megapixel image tens of times faster than the sequential version.

Note. Availability of parallel execution policies varies by compiler and platform. MSVC supports them out of the box on Windows. GCC and Clang require linking against Intel’s TBB library (-ltbb). If TBB is not present, the parallel policies silently fall back to sequential execution on some implementations. You can check for support at compile time with the feature macro __cpp_lib_execution.

#if defined(__cpp_lib_execution) && __cpp_lib_execution >= 201902L
    std::sort(std::execution::par, v.begin(), v.end());
#else
    std::sort(v.begin(), v.end());
#endif

Problem 56: Type Erasure

Problem 20 gave us abstract classes: a common base type so that unrelated concrete classes can be manipulated through a single pointer. Problem 27 gave us CRTP for zero-overhead static polymorphism. Problem 28 used virtual clone to copy objects whose type we do not know at compile time. All of these techniques require the concrete types to share a base class.

But std::function<void(int)> can hold a plain function pointer, a lambda, a functor, or a std::bind result — and none of those types share a base class. Yet a single std::function object holds any of them and calls it with identical syntax. How? The answer is type erasure.

Type erasure is also what makes std::any work: it stores a value of any type and lets you retrieve it later by casting. It is what makes std::variant in combination with std::visit dispatch correctly over unrelated types. And — if you squint — it is what virtual dispatch has always been doing: the vtable pointer inside a polymorphic object is the erased type, recorded at construction.

Type Erasure

Any technique that allows objects of unrelated concrete types to be used through a uniform interface without imposing a common base class on those concrete types. The concrete type is recorded at construction time and “erased” from the static type that the outside world sees.

The problem: storing arbitrary callables

Suppose you are writing an event system. You want to store a list of callbacks to fire when an event occurs. The callbacks all have signature void(int), but they come from many sources:

void onEvent(int x) { /* free function */ }
struct Handler { void operator()(int x) { /* functor */ } };
auto lam = [/* captures */](int x) { /* lambda */ };

These types are completely unrelated. You cannot put them in a vector<SomeBase*> because no such base exists. But you want exactly that: a container of things you can call with an int.

Step 1: hand-rolling type erasure

The trick is to introduce the hierarchy ourselves — not on the callable types (which we do not own), but on a private wrapper type we control.

struct CallableBase {
    virtual void operator()(int) = 0;
    virtual ~CallableBase() = default;
};

template<typename F>
struct CallableImpl : CallableBase {
    F f;
    CallableImpl(F f) : f{std::move(f)} {}
    void operator()(int x) override { f(x); }
};

Now write a public wrapper that stores a unique_ptr<CallableBase> and constructs the right CallableImpl on the fly inside its constructor:

class Function {
    std::unique_ptr<CallableBase> ptr;
public:
    template<typename F>
    Function(F f)
        : ptr{std::make_unique<CallableImpl<F>>(std::move(f))} {}

    void operator()(int x) { (*ptr)(x); }
};

The concrete type F is captured inside CallableImpl<F> on the heap. The Function object itself only ever holds a CallableBase* — the type has been erased.

Function f1{onEvent};          // wraps a free function
Function f2{Handler{}};        // wraps a functor
Function f3{[](int x){ std::cout << x; }};  // wraps a lambda

std::vector<Function> callbacks;
callbacks.push_back(std::move(f1));
callbacks.push_back(std::move(f2));
callbacks.push_back(std::move(f3));
for (auto &cb : callbacks) cb(42);   // dispatches correctly for all three

This is what std::function does

std::function<void(int)> is the standard-library version of exactly that pattern, generalized over any signature. It additionally handles:

  • Copy: std::function is copyable, which means CallableBase needs a virtual clone() method — exactly the pattern from Problem 28.
  • The empty state: a default-constructed std::function stores nothing; calling it throws std::bad_function_call.
  • Target query: f.target<F>() returns a pointer to the stored F, or nullptr if the type does not match.

The small buffer optimization

One subtlety: the naive implementation heap-allocates a new CallableImpl for every std::function construction, even for a tiny stateless lambda. Most standard-library implementations avoid this with a small buffer optimization (SBO): an inline byte array of, say, 24 bytes. If the callable fits, it is placement-newed into the buffer; otherwise the implementation falls back to a heap allocation. The switch is invisible to the caller.

The same trick appears in std::string (short-string optimization) and std::any. The pattern is: cheap in the common case, correct in all cases.

Note. A lambda that captures a large object by value may exceed the SBO buffer and trigger a heap allocation. If you care, keep captures small or capture a pointer to the heavy data rather than the data itself.

Type erasure in the standard library

C++ provides three canonical type-erasure tools:

  1. std::function<Sig> (C++11) erases the type of any callable with a compatible signature. Copyable, with SBO.

  2. std::any (C++17) erases the type of an arbitrary value. Store anything copy-constructible; retrieve it with std::any_cast<T>. Think of it as a type-safe void*.

  3. Virtual dispatch is the original type-erasure mechanism. The vtable pointer stored in every polymorphic object is the erased type information. The compiler generates the CallableImpl analogue — the vtable and its entries — automatically.

Aside (CS 247 connection: interface vs. representation). Type erasure cleanly separates interface (CallableBase’s operator()) from representation (the concrete F stored in CallableImpl<F>). This is the same separation that abstract classes encode; type erasure just applies it to types you do not own and therefore cannot retroactively give a common base class to.

C++23: std::move_only_function

std::function requires the stored callable to be copy-constructible, because std::function itself is copyable. But lambdas that capture a unique_ptr are move-only — they cannot be copied.

C++23 adds std::move_only_function<Sig>: a type-erased callable that only requires movability, not copyability. It is non-copyable itself, but otherwise works identically to std::function and can store any move-only callable. For event handlers stored once and never duplicated, this is the right choice.

Worked example: implementing Function from scratch

Example: A complete simplified Function with copy support

Here is a self-contained implementation that shows every piece of the machinery in one place.

template<typename Sig> class Function;   // primary template left undefined

template<typename R, typename... Args>
class Function<R(Args...)> {
    struct Base {
        virtual R call(Args...) = 0;
        virtual std::unique_ptr<Base> clone() const = 0;
        virtual ~Base() = default;
    };
    template<typename F>
    struct Impl : Base {
        F f;
        Impl(F f) : f{std::move(f)} {}
        R call(Args... args) override { return f(args...); }
        std::unique_ptr<Base> clone() const override {
            return std::make_unique<Impl<F>>(f);
        }
    };
    std::unique_ptr<Base> ptr;
public:
    template<typename F>
    Function(F f) : ptr{std::make_unique<Impl<F>>(std::move(f))} {}
    Function(const Function &o) : ptr{o.ptr->clone()} {}
    Function(Function &&)           = default;
    Function &operator=(Function o) { std::swap(ptr, o.ptr); return *this; }
    R operator()(Args... args)      { return ptr->call(args...); }
};

This is roughly 25 lines. The real std::function adds std::bad_function_call for the null case, target-type introspection, and the SBO, but the core mechanism is exactly this.

Type erasure vs. templates vs. virtual dispatch

It is worth comparing the three approaches side by side.

Templates (Problem 40 concepts). When you write a template function that accepts any callable, the compiler generates a separate instantiation for each concrete type. The call is direct — no indirection — and the optimizer can inline the body. The downside: all types must be known at compile time, and storing a heterogeneous collection requires some other mechanism.

Virtual dispatch (Problem 18). When you define an abstract base class with a pure-virtual operator(), each callable type must inherit from it. You own the types, or you can add the base class. Collections of base pointers work. The cost: one vtable lookup per call, and the types must be in a single inheritance hierarchy.

Type erasure (std::function). You do not own the types; they need not share any base. Collections work. The cost: heap allocation per construction (possibly avoided by SBO), and one indirect call per invocation.

In summary: use templates when all types are compile-time known and performance is paramount. Use virtual dispatch when you own the types. Use type erasure when you need to store heterogeneous callables from code you do not control.

Note. Type erasure has real runtime overhead: a heap allocation at construction time (unless SBO applies) and a virtual call or function-pointer indirection at each invocation. When performance is critical and the set of callable types is known at compile time, prefer templates or concepts (Problem 40). Use std::function when you need to store heterogeneous callables at runtime: event systems, callbacks passed across module boundaries, plugin architectures.


Problem 57: Hiding the Implementation

Problem 2 established the golden rule of separate compilation: declarations live in headers, definitions live in .cc files. That rule minimizes what needs to be recompiled when an implementation detail changes. But there is a gap. Even if you follow the rule perfectly, the private members of a class must appear in the header because the compiler needs to know the object’s total size. And if any private member changes — its type, or the type of a type it depends on — every translation unit that includes the header must recompile. On a large project, a one-line change to a private field can trigger a complete rebuild.

The pimpl idiom (pointer to implementation) closes that gap.

The problem in concrete terms

Suppose you are writing a Widget class for a GUI library:

// widget.h
#include "font.h"         // Font is a heavy header
#include "texture.h"      // Texture is heavier still
#include "event_queue.h"  // and this one changes often

class Widget {
public:
    void draw();
    void setFont(const Font &f);
private:
    Font         font_;
    Texture      bg_;
    EventQueue   events_;
    int          x_, y_, w_, h_;
};

Every file that writes #include "widget.h" also transitively includes font.h, texture.h, and event_queue.h. When the development team changes event_queue.h — maybe to add a new field — every translation unit in the project recompiles, even if none of them ever touch the event queue directly.

The pimpl idiom

The fix is to replace all private data with a single opaque pointer to a forward-declared implementation struct. The header reveals nothing about what is inside the implementation.

// widget.h -- the public header
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();
    Widget(Widget &&) noexcept;
    Widget &operator=(Widget &&) noexcept;

    void draw();
    void setFont(const std::string &name);

private:
    struct Impl;                      // forward declaration only
    std::unique_ptr<Impl> impl_;
};

The header has no includes for font.h, texture.h, or event_queue.h. Clients that include this header see only the public interface and a unique_ptr to a struct they cannot look inside. The compiler knows the size of unique_ptr<Impl> (it is one pointer) without knowing what Impl contains.

The source file defines everything:

// widget.cc -- the implementation file
#include "widget.h"
#include "font.h"
#include "texture.h"
#include "event_queue.h"

struct Widget::Impl {
    Font       font;
    Texture    bg;
    EventQueue events;
    int        x, y, w, h;
};

Widget::Widget() : impl_{std::make_unique<Impl>()} {}

Widget::~Widget() = default;   // unique_ptr<Impl> destructor needs Impl complete
Widget::Widget(Widget &&) noexcept            = default;
Widget &Widget::operator=(Widget &&) noexcept = default;

void Widget::draw()                 { impl_->bg.render(impl_->x, impl_->y); }
void Widget::setFont(const std::string &name) { impl_->font = Font{name}; }

Now when event_queue.h changes, only widget.cc recompiles. All the clients that include widget.h are untouched.

Why the destructor must be defined in the source file

This is the most common pimpl mistake. If you write:

// widget.h  -- WRONG
class Widget {
    struct Impl;
    std::unique_ptr<Impl> impl_;
public:
    ~Widget() = default;   // disaster waiting to happen
};

the = default definition is generated at the point of declaration in the header. At that point, Impl is incomplete — the compiler has seen only the forward declaration. When unique_ptr<Impl>’s destructor tries to call delete on its stored pointer, it needs to call Impl::~Impl(), and it cannot do so if Impl is incomplete. This is a compile error.

The fix is to declare the destructor in the header and define it in the source file, after the full definition of Impl is visible:

// widget.h
class Widget {
    ...
    ~Widget();           // declared, not defined here
};

// widget.cc
#include "widget.h"      // includes the forward declaration
struct Widget::Impl { ... };    // now Impl is complete
Widget::~Widget() = default;    // now safe: Impl is complete here

The same reasoning applies to the move constructor and move assignment operator: they also need to destroy or transfer the unique_ptr, which requires Impl to be complete. Declare them in the header; default them in the source.

Note. Copy operations are omitted in the example above because unique_ptr is not copyable. If you want Widget to be copyable, you must write explicit copy operations in the source file that do a deep copy of the Impl: impl_ = std::make_unique<Impl>(*other.impl_). This is perfectly safe as long as Impl has a working copy constructor, which it usually does.

Connection to Problem 14

The pimpl idiom depends critically on std::unique_ptr. A raw pointer would work — the technique predates modern C++ — but you would need a manually written destructor, copy operations, and move operations, all correct and all in the source file. unique_ptr gets you the destructor for free (once you obey the “define in source” rule) and expresses the ownership intent clearly. Raw-pointer pimpl is the kind of code that leaks under exceptions.

Aside (CS 247 connection: information hiding). Information hiding is a first principle of software design: expose what clients need to use, conceal everything else. Pimpl takes information hiding further than the private keyword alone — private prevents access but the names of private members still appear in the header. Pimpl prevents even the names from leaking, giving the implementor freedom to change the representation without touching any client file.

Worked example: a Logger class

Example: Logger with hidden file handle and ring buffer

// logger.h
#include <memory>
#include <string>

class Logger {
public:
    explicit Logger(const std::string &path);
    ~Logger();
    Logger(Logger &&) noexcept;
    Logger &operator=(Logger &&) noexcept;

    void log(const std::string &msg);
    void flush();

private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

// logger.cc  --  includes heavy headers that callers never see
#include "logger.h"
#include <fstream>
#include <deque>

struct Logger::Impl {
    std::ofstream             file;
    std::deque<std::string>   buffer;
    static constexpr size_t   kFlushAt = 64;
};

Logger::Logger(const std::string &path)
    : impl_{std::make_unique<Impl>()} {
    impl_->file.open(path, std::ios::app);
}

Logger::~Logger()                             = default;
Logger::Logger(Logger &&) noexcept            = default;
Logger &Logger::operator=(Logger &&) noexcept = default;

void Logger::log(const std::string &msg) {
    impl_->buffer.push_back(msg);
    if (impl_->buffer.size() >= Impl::kFlushAt) flush();
}
void Logger::flush() {
    for (auto &m : impl_->buffer) impl_->file << m << '\n';
    impl_->buffer.clear();
    impl_->file.flush();
}

The <fstream> and <deque> headers are invisible to callers. They do not pay their compile-time cost.

Trade-offs

Pimpl buys build-time isolation at two runtime costs: one extra heap allocation per Widget (for the Impl), and one extra pointer dereference on every method call. For a GUI widget called at most a few hundred times per frame, this is irrelevant. For a hot inner-loop data structure called millions of times per second, it is worth reconsidering.

The idiom is most valuable for library headers — headers that many clients include, that are developed by a separate team, and that change independently of the public interface. For internal implementation files that are only included by a handful of .cc files, the compile-time savings are small and the indirection cost might not be worth it.

Note. The pimpl idiom also has a nice side effect on exception safety. Swapping two unique_ptr<Impl> values is noexcept. This means you can implement the strong exception guarantee on any mutating method by making your changes on a copy of the Impl and swapping only on success — as discussed in the patterns chapter.


Problem 58: SFINAE Revisited

Problem 24 used tag dispatch to choose between iterator implementations at compile time. Problem 40 showed that concepts do the same job more cleanly in C++20. But millions of lines of pre-C++20 code use an older technique: SFINAE. Library headers from before 2020 are full of it. You need to be able to read it.

What SFINAE means

SFINAE stands for Substitution Failure Is Not An Error. When the compiler tries to instantiate a function template, it substitutes the template arguments into the function’s signature. If that substitution produces an ill-formed type or expression, the compiler does not emit a hard error — it simply removes that template from the overload set and moves on. The next candidate gets a chance.

Example: A simple failure
template<typename T>
typename T::value_type front(const T &c) { return c.front(); }

template<typename T>
T front(const T *arr) { return arr[0]; }

int raw[] = {1, 2, 3};
front(std::vector<int>{1, 2, 3});   // uses first overload
front(raw);                         // uses second: int* has no ::value_type

When T = int*, substituting into typename T::value_type produces an ill-formed type. SFINAE silently discards the first overload; the second wins.

SFINAE applies only in the immediate context of template argument substitution — the function's signature as the deduced arguments are plugged in. An error that appears later, inside the function body or in a non-deduced context, is still a hard error.

std::enable_if: gating on a condition

std::enable_if<B, T> is a type trait with a member type equal to T when B is true, and no member type when B is false. The “no member” case triggers SFINAE.

#include <type_traits>

// Only participates in overload resolution when T is an integral type
template<typename T>
std::enable_if_t<std::is_integral_v<T>, T> square(T x) { return x * x; }

// Only participates when T is a floating-point type
template<typename T>
std::enable_if_t<std::is_floating_point_v<T>, T> square(T x) { return x * x; }

When you call square(3), the compiler considers both overloads. For the first, std::is_integral_v<int> is true, so std::enable_if_t<true, int> resolves to int — valid. For the second, std::is_floating_point_v<int> is false, so std::enable_if_t<false, int> has no type member — SFINAE discards it. The first overload wins cleanly.

std::enable_if

std::enable_if<B, T = void> has a nested type member equal to T if and only if B is true. The shorthand std::enable_if_t<B, T> is typename std::enable_if<B, T>::type.

Common type traits

The <type_traits> header provides a large family of predicates. The most commonly used ones are:

  • std::is_integral_v<T> — true for int, long, bool, etc.
  • std::is_floating_point_v<T> — true for float, double.
  • std::is_same_v<T, U> — true when T and U are the same type.
  • std::is_base_of_v<Base, Derived> — true when Derived inherits from Base.
  • std::is_constructible_v<T, Args...> — true when T(Args...) is valid.
  • std::is_copy_constructible_v<T>, std::is_move_constructible_v<T>.
  • std::is_const_v<T>, std::is_reference_v<T>, std::is_pointer_v<T>.

Each of these is a constexpr bool computed entirely at compile time.

std::declval: getting a value in unevaluated context

SFINAE expressions often need to test whether a type supports a particular operation. But you cannot construct an arbitrary T just to test it — it might have no default constructor, or might have side effects. std::declval<T>() solves this: it pretends to return a T&& in an unevaluated context (inside sizeof, decltype, etc.) without actually constructing anything.

// decltype(*std::declval<const T &>().begin()) is the type you get by
// dereferencing begin() on a const T -- the expression is unevaluated,
// so no T is ever constructed
template<typename T>
auto first(const T &c) -> decltype(*std::declval<const T &>().begin()) {
    return *c.begin();
}

The void_t trick: detecting member existence

std::void_t<...> (C++17) is a type alias that maps any sequence of well-formed types to void. Its value is that the attempt to form the types triggers SFINAE.

The idiom uses partial template specialization to detect whether a type has a particular member or operation:

// Primary template: T does not have begin() by default
template<typename T, typename = void>
struct has_begin : std::false_type {};

// Specialization: enabled only when declval<T>().begin() is well-formed
template<typename T>
struct has_begin<T, std::void_t<decltype(std::declval<T>().begin())>>
    : std::true_type {};

static_assert( has_begin<std::vector<int>>::value);
static_assert(!has_begin<int>::value);

When T = std::vector<int>, the expression declval<std::vector<int>>().begin() is well-formed. void_t of a well-formed type is void. The specialization matches and has_begin<vector<int>> inherits from std::true_type.

When T = int, declval<int>().begin() is ill-formed. SFINAE discards the specialization; the primary template (with std::false_type) wins.

SFINAE vs. tag dispatch

Problem 24 used tag dispatch: pass a tag object to an overloaded helper function and let overload resolution pick the right one. SFINAE is a different mechanism: exclude overloads from the candidate set before overload resolution even begins.

In practice, either technique works. Tag dispatch is often clearer because the selection logic lives in the function bodies and is readable prose. SFINAE is more powerful because it can gate on arbitrary compile-time expressions, not just a type’s tag.
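As a side-by-side, here is a minimal tag-dispatch version of the integral test. The kind / detail::kind_impl names are ours, not from Problem 24; the mechanics — one public template, a tag object selecting a helper overload — are the point:

```cpp
#include <type_traits>

// Tag dispatch: the type of the tag object (true_type or false_type)
// picks the helper via ordinary overload resolution.
namespace detail {
    template<typename T>
    const char *kind_impl(T, std::true_type)  { return "integral"; }
    template<typename T>
    const char *kind_impl(T, std::false_type) { return "not integral"; }
}

template<typename T>
const char *kind(T x) {
    // std::is_integral<T>{} is an empty tag object; it is optimized away
    return detail::kind_impl(x, std::is_integral<T>{});
}
```

kind(3) resolves to the true_type helper and kind(3.0) to the false_type helper, with no enable_if anywhere in sight.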

How concepts (Problem 40) replace all of this

In C++20, the entire pattern becomes:

template<std::integral T>
T square(T x) { return x * x; }

template<std::floating_point T>
T square(T x) { return x * x; }

The requires clause is semantically equivalent to enable_if but readable without a decoder ring. The compiler error messages are also dramatically better: instead of a wall of template instantiation failures, you get “constraint not satisfied”.

Worked example: detecting operator<< for ostream

Example: A streamable trait

We want a type trait is_streamable<T> that is true when std::cout << t is a valid expression.

#include <iostream>
#include <type_traits>

template<typename T, typename = void>
struct is_streamable : std::false_type {};

template<typename T>
struct is_streamable<T,
    std::void_t<decltype(std::declval<std::ostream &>()
                         << std::declval<T>())>>
    : std::true_type {};

template<typename T>
std::enable_if_t<is_streamable<T>::value>
printIfPossible(const T &v) {
    std::cout << v << '\n';
}

template<typename T>
std::enable_if_t<!is_streamable<T>::value>
printIfPossible(const T &) {
    std::cout << "(not printable)\n";
}

is_streamable<int>::value is true; for a type with no operator<< (say, struct Opaque {};), is_streamable<Opaque>::value is false. The right printIfPossible overload is selected without a runtime branch.

SFINAE on function return types vs. dummy parameters

There are two common places to put enable_if_t: in the return type, and as a defaulted non-type template parameter.

// Style 1: in the return type
template<typename T>
std::enable_if_t<std::is_integral_v<T>, T> square(T x) { return x*x; }

// Style 2: as a defaulted non-type parameter
template<typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
T square(T x) { return x*x; }

Style 1 is cleaner for functions with a meaningful return type. Style 2 is needed when the return type is void or is fixed regardless of the condition. Both achieve the same SFINAE effect. You will see both in real library headers; knowing that they are equivalent makes reading them less painful.

Conditional base classes with std::conditional

One more tool worth knowing: std::conditional<B, T, F> selects type T when B is true and type F when B is false. It is if-then-else for types.

// A buffer that is either thread-safe or not, chosen at compile time.
// DummyMutex is a no-op stand-in satisfying the BasicLockable interface.
struct DummyMutex {
    void lock() {}
    void unlock() {}
};

template<bool ThreadSafe>
class Buffer {
    // Use a real mutex only in the thread-safe variant
    using LockType = std::conditional_t<ThreadSafe,
                                        std::mutex,
                                        DummyMutex>;
    LockType lock_;
    std::vector<char> data_;
public:
    void write(const char *p, size_t n) {
        std::lock_guard g{lock_};  // no-op if DummyMutex
        data_.insert(data_.end(), p, p + n);
    }
};

std::conditional_t is used pervasively in the standard library’s own implementation to select between two strategies at compile time. Together with enable_if and void_t, it forms the core toolkit of pre-concepts template metaprogramming.

Note. In new code targeting C++20 or later, use concepts. Read SFINAE to understand pre-C++20 library headers — the standard library’s own implementation of std::vector, std::optional, and std::variant is full of it. Being able to parse enable_if_t and void_t is a practical skill for anyone who needs to understand or debug template-heavy library code.


Problem 59: Policy-Based Design

Problem 27 used CRTP to inject behavior at compile time, avoiding vtable overhead. Problem 29 showed how multiple inheritance assembles behaviors from a collection of mixin classes. There is a third compositional technique that sits between the two: policy-based design, where each behavior dimension is expressed as a template parameter that provides a statically-known interface.

The term was popularized by Andrei Alexandrescu in Modern C++ Design (2001). The ideas remain relevant today, especially in library code where generality and zero overhead must coexist.

The idea: behavior as a template parameter

Instead of hardcoding a behavior in a class, you parameterize the class on a policy class that provides that behavior. The host class calls into the policy through a known interface; the policy class is a regular class, not a virtual base. The binding happens at compile time — no vtable.

Policy Class

A class that provides a specific behavior to a host class via a statically-known interface (a set of methods and/or types that the host calls). Policies are chosen at instantiation time by passing them as template arguments. The host class knows only that a policy provides the required interface; it does not know — or care — which concrete policy it is.

A simple example: Logger with output and filter policies

Consider a Logger class. There are two independent behavior dimensions:

  • Output policy: where messages go (stdout, file, syslog).
  • Filter policy: which messages to pass (all, errors only, debug).

We could handle this with inheritance or runtime switches. Or we can parameterize:

struct StdoutOutput {
    void write(const std::string &msg) { std::cout << msg << '\n'; }
};

struct FileOutput {
    std::ofstream file;
    FileOutput(const std::string &path) : file{path, std::ios::app} {}
    void write(const std::string &msg) { file << msg << '\n'; }
};

struct AllMessages {
    bool accept(const std::string &) { return true; }
};

struct ErrorsOnly {
    bool accept(const std::string &msg) {
        return msg.find("[ERROR]") != std::string::npos;
    }
};

template<typename OutputPolicy, typename FilterPolicy>
class Logger : private OutputPolicy, private FilterPolicy {
public:
    using OutputPolicy::OutputPolicy;   // inherit constructors if needed

    void log(const std::string &msg) {
        if (FilterPolicy::accept(msg))
            OutputPolicy::write(msg);
    }
};

// Instantiate concrete loggers at zero overhead
using ConsoleLogger = Logger<StdoutOutput, AllMessages>;
using ProductionLog = Logger<FileOutput,   ErrorsOnly>;

The calls to FilterPolicy::accept(msg) and OutputPolicy::write(msg) are direct method calls on the base classes — no virtual dispatch, no indirection. The optimizer can inline them.

Private inheritance as composition

Why private inheritance rather than data members? Both work. Private inheritance enables the empty base optimization (EBO): if a policy class has no data members, the compiler is permitted to give it zero size when used as a base class, but not when used as a data member. Before C++20 and [[no_unique_address]], EBO was the only way to pay zero bytes for a stateless policy.

// with data members: each empty policy still occupies at least 1 byte,
//   so two empty policies add at least 2 bytes (plus padding)
// with private inheritance: sizeof(Logger) can be 1 byte (EBO)
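A minimal demonstration of EBO. EmptyA and EmptyB are illustrative; the sizes shown hold on mainstream compilers (EBO is mandatory for standard-layout classes like AsBases):

```cpp
// Empty policies as bases cost zero bytes; as plain data members they
// cost at least one byte each, plus padding before the pointer.
struct EmptyA {};
struct EmptyB {};

struct AsBases : EmptyA, EmptyB { void *p; };   // EBO: bases take no space
struct AsMembers { EmptyA a; EmptyB b; void *p; };

static_assert(sizeof(AsBases) == sizeof(void *));
static_assert(sizeof(AsMembers) > sizeof(AsBases));
```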

In C++20 you can write:

template<typename OutputPolicy, typename FilterPolicy>
class Logger {
    [[no_unique_address]] OutputPolicy out_;
    [[no_unique_address]] FilterPolicy filt_;
public:
    void log(const std::string &msg) {
        if (filt_.accept(msg)) out_.write(msg);
    }
};

The [[no_unique_address]] attribute (Problem 61) tells the compiler it is allowed to give the member zero size if the type is empty. This achieves EBO without private inheritance, which is usually cleaner.

Mixing policies with CRTP

Policies and CRTP compose naturally. Suppose a policy needs to call back into the host class:

template<typename Derived>
struct PrefixFilter {
    bool accept(const std::string &msg) {
        // ask the host what prefix to require
        return msg.find(static_cast<Derived*>(this)->prefix()) == 0;
    }
};

template<typename OutputPolicy>
class MyLogger
    : private OutputPolicy
    , private PrefixFilter<MyLogger<OutputPolicy>> {
    // the policy must be a friend to downcast through the private base
    friend struct PrefixFilter<MyLogger<OutputPolicy>>;
public:
    std::string prefix() const { return "[App]"; }
    void log(const std::string &msg) {
        if (PrefixFilter<MyLogger>::accept(msg))
            OutputPolicy::write(msg);
    }
};

The policy calls static_cast<Derived*>(this)->prefix() — the same upcast that CRTP uses. No virtual call.

The cost: code size

Each combination of policies creates a distinct instantiation. If you have three output policies and three filter policies, you have nine potential Logger types. Each one that you actually instantiate gets its own copy of the compiled log method. The binary grows.

For a library used in many configurations, this can matter. The trade-off is always: zero overhead vs. binary size. In practice, if constexpr and explicit instantiation can reduce duplication, and the inliner often merges identical bodies anyway.

Constraining policies with concepts

In C++20, you can express what a policy must provide as a concept (Problem 40):

template<typename P, typename Msg>
concept OutputPolicyConcept = requires(P p, const Msg &m) {
    { p.write(m) } -> std::same_as<void>;
};

template<typename OutputPolicy, typename FilterPolicy>
    requires OutputPolicyConcept<OutputPolicy, std::string>
class Logger : private OutputPolicy, private FilterPolicy { ... };

Now if you accidentally pass a type that lacks write, the error message names the unsatisfied concept rather than burying the failure inside a template instantiation chain.

Aside (CS 247 connection: open/closed principle). Policy-based design is one of the purest expressions of the Open/Closed Principle: the Logger host class is closed for modification but open for extension — you extend it by providing new policy classes, not by editing the Logger template itself.

Worked example: configurable allocator

Example: A fixed-size vs. heap allocator policy
template<size_t N>
struct FixedArena {
    alignas(std::max_align_t) char buf[N];
    size_t used = 0;
    void *allocate(size_t n) {
        if (used + n > N) throw std::bad_alloc{};
        void *p = buf + used; used += n; return p;
    }
    void deallocate(void *, size_t) {}  // no-op: arena is freed all at once
};

struct HeapAlloc {
    void *allocate(size_t n)     { return ::operator new(n); }
    void  deallocate(void *p, size_t) { ::operator delete(p); }
};

template<typename T, typename AllocPolicy>
class Pool : private AllocPolicy {
    // ... uses AllocPolicy::allocate / deallocate
public:
    T *get()  { return static_cast<T*>(AllocPolicy::allocate(sizeof(T))); }
    void put(T *p) { AllocPolicy::deallocate(p, sizeof(T)); }
};

Pool<int, FixedArena<1024>> stackPool;   // all allocations from a 1 KB buffer
Pool<int, HeapAlloc>         heapPool;   // normal heap

Policy-based design in the standard library

The standard library itself uses policy-based design throughout. The most visible example is the allocator policy in containers:

template<typename T, typename Allocator = std::allocator<T>>
class vector { /* ... */ };

// Use a custom pool allocator:
vector<int, PoolAllocator<int>> poolVec;
// Use the default heap allocator:
vector<int> heapVec;

The allocator interface (allocate, deallocate, construct, destroy) is the policy contract. Any type that satisfies it — the standard heap allocator, a pool allocator, a debug-tracking allocator, an arena allocator — can be plugged in. The vector class itself is unchanged.

Another example: std::basic_string<CharT, Traits, Allocator>. The Traits parameter is a character-traits policy that specifies how to compare, copy, and assign characters. std::char_traits<char> is the default; a case-insensitive string can be built by providing a custom traits class.
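A sketch of that case-insensitive traits class. ci_traits and ci_string are our names; only the comparison members are replaced, and std::char_traits<char> supplies the rest. This assumes simple tolower-style case folding:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// A character-traits policy that compares case-insensitively.
struct ci_traits : std::char_traits<char> {
    static bool eq(char a, char b) {
        return std::tolower((unsigned char)a) == std::tolower((unsigned char)b);
    }
    static bool lt(char a, char b) {
        return std::tolower((unsigned char)a) < std::tolower((unsigned char)b);
    }
    static int compare(const char *s1, const char *s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
};

using ci_string = std::basic_string<char, ci_traits>;
// ci_string{"Hello"}.compare("HELLO") == 0, even though the bytes differ
```

The basic_string template is unchanged; swapping in ci_traits changes how every comparison inside it behaves.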

Aside (CS 247 connection: the single responsibility principle). Policy-based design enforces SRP at the type level: each policy class is responsible for exactly one axis of behavior. The host class is responsible only for the high-level algorithm. If you later find that you need a different output strategy, you replace the output policy — not the host class. SRP is not just a principle for methods; it applies to template parameters too.

Note. Policy-based design is most useful in library code that must be both general and zero-overhead: allocators, smart pointers, hash tables. For application-level code where you have one or two concrete configurations and runtime flexibility matters more than the last few nanoseconds, virtual dispatch is usually the right trade-off — it is simpler to understand and to change.


Problem 60: Reference Qualifiers on Methods

Problem 5 introduced rvalue references on function parameters to distinguish moving from copying. Problem 6 qualified member functions with const to distinguish read-only from mutating access. Both of those qualifiers describe something about the object being passed in or held. There is a third dimension: the value category of *this. A method can be called on an lvalue (a named, persistent object) or on an rvalue (a temporary that is about to be destroyed). Reference qualifiers let you distinguish those two cases.

Motivation: references to temporaries

Consider a person type with a name accessor:

class Person {
    std::string name_;
public:
    std::string &name() { return name_; }
};

The unqualified name() works fine when called on a named Person:

Person p{"Alice"};
p.name() = "Bob";   // fine: p is a persistent object

But it also compiles silently in a trap:

getPerson().name() = "Bob";   // getPerson() returns a temporary!

The assignment writes into a string that belongs to a temporary Person. That temporary is destroyed at the semicolon — the write is lost. This is a bug that the compiler does not warn about in the unqualified case.

Ref-qualifiers

C++11 added ref-qualifiers: & and && placed after the parameter list (and after const if present) to restrict which value category of *this the method accepts.

class Person {
    std::string name_;
public:
    // lvalue version: returns a reference into the (persistent) object
    std::string &name() &  { return name_; }

    // rvalue version: the object is expiring -- extract the string by move
    std::string  name() && { return std::move(name_); }
};

Now:

Person p{"Alice"};
p.name() = "Bob";                    // calls & overload -- fine
std::string n = getPerson().name();  // calls && overload -- moves out

If you try getPerson().name() = "Bob", the && overload returns a value (std::string), not a reference. Assigning to that value is a compile error — the bug is caught.

Ref-qualifier

A & or && qualifier placed after the parameter list of a member function restricts that overload to be called only on lvalue expressions (&) or only on rvalue expressions (&&) of *this. If no ref-qualifier is present, the function may be called on either.

Interaction with const

Ref-qualifiers stack with const. You can have up to four overloads:

class Widget {
public:
    void process() &;         // lvalue, non-const
    void process() const &;   // lvalue, const
    void process() &&;        // rvalue, non-const
    void process() const &&;  // rvalue, const (rare)
};

In practice you rarely need all four. The common combinations are:

  • const & and &&: the const & overload handles both const lvalues and (in the absence of the && overload) rvalues. Add the && overload specifically to handle the temporary case differently.
  • Just &: forbid calling the method on a temporary at all.
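The second pattern — just & — looks like this in practice. Registry is a hypothetical class; the & qualifiers turn what would be a dangling reference into a compile error:

```cpp
#include <string>
#include <utility>
#include <vector>

// names() hands out a reference into the object, so the & qualifier
// forbids calling it (and add) on a temporary at all.
class Registry {
    std::vector<std::string> names_;
public:
    void add(std::string n) & { names_.push_back(std::move(n)); }
    const std::vector<std::string> &names() & { return names_; }
};

// Registry r; r.add("alice"); r.names();   // fine: r persists
// Registry{}.names();                      // compile error: no overload for rvalue
```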

The standard library uses this

std::optional<T> overloads value() on all four combinations so that dereferencing an optional temporary can move the contained value out:

std::optional<std::string> maybeGet();

std::string s = maybeGet().value();  // calls value() &&, moves the string

Without the && overload, this would copy the string out of a temporary that is about to be destroyed — wasteful. With it, the move is free.

The fluent builder pattern

Ref-qualifiers are natural in fluent builder APIs, where you chain calls:

class QueryBuilder {
    std::string sql_;
public:
    QueryBuilder &select(const std::string &cols) & {
        sql_ += "SELECT " + cols + " ";
        return *this;                  // chain on lvalue
    }
    QueryBuilder &&select(const std::string &cols) && {
        sql_ += "SELECT " + cols + " ";
        return std::move(*this);       // chain on rvalue
    }
    QueryBuilder &from(const std::string &table) & {
        sql_ += "FROM " + table; return *this;
    }
    QueryBuilder &&from(const std::string &table) && {
        sql_ += "FROM " + table; return std::move(*this);
    }
    std::string build() && { return std::move(sql_); }
};

QueryBuilder qb;
qb.select("*").from("users");   // lvalue chain: modifies qb in place

std::string q = QueryBuilder{}.select("id").from("orders").build();
//  rvalue chain: each step moves the builder forward; build() extracts

Worked example: a Resource wrapper

Example: Extracting a resource from an rvalue
class Resource {
    int *data_;
    size_t size_;
public:
    explicit Resource(size_t n)
        : data_{new int[n]{}}, size_{n} {}
    ~Resource() { delete[] data_; }

    // lvalue: return a view -- do not take ownership
    const int *data() const &  { return data_; }
    size_t     size() const &  { return size_; }

    // rvalue: move the pointer out -- caller takes ownership
    int  *release() && {
        int *p = data_;
        data_ = nullptr;
        size_ = 0;
        return p;
    }
};

Resource makeResource(size_t n) { return Resource{n}; }

const int *view = makeResource(10).data();  // compiles (const & binds rvalues)
                                            // -- but view dangles after this line!
int *owned = makeResource(10).release();    // moves out; caller must delete[]

The const & overloads of data() and size() are available on both lvalues and rvalues (since a const & ref-qualifier binds both). The && overload of release() is only callable on a temporary — which is the right restriction, since you would not want to silently drain a named Resource.

The interaction with move semantics

Ref-qualifiers and move semantics (Problem 5) collaborate naturally. When a member function is called on an rvalue object, the object is expiring — it will not be used again. The && overload can therefore steal from *this using std::move(member_) without risk.

The standard library applies this broadly. For std::string, the free operator+ overload whose left-hand operand is a string&& can reuse the temporary's buffer instead of allocating a new one — the same idea applied to a parameter rather than to *this. std::optional::value() returns T&& from a temporary optional, enabling move-out with no extra copy. These are not micro-optimizations: they are the difference between allocating and not allocating in common expressions like string concatenation.

std::string a = "Hello, ";
std::string b = std::move(a) + "world";   // rvalue + has && overload:
                                           // no new allocation -- reuses a's buffer

Aside (CS 343 connection: ownership transfer). The && ref-qualifier formalizes a transfer of ownership at the language level. In CS 343, resource handoff between coroutines was managed explicitly through invariant protocols. Ref-qualifiers let the type system enforce those protocols: a method decorated with && can only be invoked when the object is about to be abandoned, giving the method permission to plunder it.

Note. An important caveat: within a set of overloads that share a name and parameter list, either every overload must carry a ref-qualifier or none may. Mixing a ref-qualified overload with an unqualified one is ill-formed. Plan out which qualifications you need before writing the first overload.


Problem 61: Attributes and Contracts

C++11 introduced a new syntax for attaching standardized annotations to declarations: [[attribute]]. These are not comments. Comments are invisible to the compiler; attributes participate in the grammar and the compiler can — and sometimes must — act on them. They are a disciplined way to communicate intent that both humans and tools can read.

[[nodiscard]]: you should not ignore this

The most practically important attribute. When a function is declared [[nodiscard]], the compiler emits a warning if the caller discards the return value without using it.

[[nodiscard]] bool write(const char *buf, size_t len);
[[nodiscard]] std::unique_ptr<Widget> makeWidget();
write(buf, n);          // warning: ignoring return value of 'write'
makeWidget();           // warning: ignoring return value of 'makeWidget'
bool ok = write(buf, n);    // no warning
auto w  = makeWidget();     // no warning

This catches a class of bugs that are otherwise silent. The canonical example is an error-code return: if you call write and ignore whether it succeeded, you have silently swallowed a potential failure.

In C++20, you can add a message:

[[nodiscard("ignoring the error code silently swallows failures")]]
bool write(const char *buf, size_t len);

The message appears in the compiler’s warning output, directing the programmer to the reason, not just the fact.

When to use [[nodiscard]]. Put it on:

  • Functions that return error codes or status values.
  • Functions that return a resource handle — the returned unique_ptr is the ownership; ignoring it immediately leaks the resource.
  • Pure computations with no observable side effects, where calling without using the result is almost certainly a mistake.

Aside (CS 245 connection: Hoare triples on the call site). In CS 245, a Hoare triple {P} f {Q} specifies that after f runs, the postcondition Q holds. [[nodiscard]] is a machine-checked postcondition on the call site: it asserts that the caller must use the returned value. It is the closest C++ gets to enforcing a Hoare triple without a separate verification tool.

[[maybe_unused]]: silence intentional unused warnings

The compiler warns about unused variables and parameters. Sometimes an unused variable is intentional: a value computed only in a NDEBUG build, or a parameter that a particular override does not need but the interface requires.

void logIfDebug([[maybe_unused]] const std::string &msg) {
#ifndef NDEBUG
    std::cerr << msg << '\n';
#endif
    // In release builds, msg is unused -- but that is intentional
}

template<typename T>
void process(T &value, [[maybe_unused]] int flags) {
    // flags used only in some specializations
    value.process();
}

[[maybe_unused]] suppresses the warning without removing the variable and without resorting to the classic “cast to void” workaround ((void)flags;).

[[likely]] and [[unlikely]]: branch predictor hints

Modern processors speculatively execute branches. When a branch is almost always taken one way, hinting the compiler can improve branch-prediction accuracy and instruction cache usage.

int safeDivide(int a, int b) {
    if (b == 0) [[unlikely]] return 0;   // rarely zero -- cold path
    return a / b;                        // hot path
}

void processPackets(const Packet *p, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (p[i].isValid()) [[likely]] {
            handle(p[i]);
        }
    }
}

Note the placement: the attribute goes on the controlled statement after the condition, not inside the parentheses.

These attributes are optimizer hints only. They never change the observable semantics of the program; they only influence code layout. Use them sparingly, and only after profiling confirms that a branch is a bottleneck.

[[deprecated]]: retiring an API

When you replace part of your API, you want to warn callers that the old interface is going away without breaking them immediately.

[[deprecated("use sendMessage(Packet) instead")]]
void send(const char *buf, size_t len);

void sendMessage(Packet p);  // the new API

Any call site that uses send receives a compiler warning naming the replacement. The old function continues to compile and link; no existing code breaks. The deprecation gives callers time to migrate.

[[fallthrough]]: intentional switch fall-through

The compiler warns when a case falls through to the next without a break or return, because unintentional fall-through is a common bug. When the fall-through is intentional, annotate it:

switch (event) {
    case MOUSE_DOWN:
        recordPress();
        [[fallthrough]];    // intentional: MOUSE_DOWN also triggers the click handler
    case MOUSE_CLICK:
        handleClick();
        break;
    case MOUSE_UP:
        recordRelease();
        break;
}

The attribute silences the warning and makes the intent clear to the next reader.

[[no_unique_address]]: enabling the empty base optimization on members

Problem 59 used private inheritance to get the empty base optimization for stateless policies. In C++20, [[no_unique_address]] achieves the same result for data members:

struct EmptyFilter {};   // stateless policy: zero bytes

template<typename Alloc, typename Filter>
struct Container {
    [[no_unique_address]] Alloc alloc_;    // zero bytes if Alloc is empty
    [[no_unique_address]] Filter filter_;  // zero bytes if Filter is empty
    int *data_;
    size_t size_;
};

// With two empty policies, sizeof(Container<EmptyFilter,EmptyFilter>) is
//   sizeof(int*) + sizeof(size_t) on typical implementations: both policy
//   members overlap data_'s storage (same-type empty members still need
//   distinct addresses from each other)

Without [[no_unique_address]], every member must have a distinct address even if its type is empty, so an empty member occupies at least 1 byte of its own (plus any padding that follows).
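The size difference can be checked directly. These asserts hold on GCC and Clang; MSVC ignores the standard spelling of the attribute unless the vendor form [[msvc::no_unique_address]] is used:

```cpp
// The same empty member with and without the attribute.
struct Empty {};

struct Without { Empty e; void *p; };                        // e needs its own byte
struct With    { [[no_unique_address]] Empty e; void *p; };  // e overlaps p's storage

static_assert(sizeof(With) == sizeof(void *));
static_assert(sizeof(Without) > sizeof(With));
```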

Worked example: annotating a file I/O class

Example: Annotated FileWriter
class FileWriter {
public:
    [[nodiscard("file may not have opened")]]
    bool open(const std::string &path);

    [[nodiscard("check for short write or error")]]
    bool write(const void *buf, size_t len);

    void close();

    [[deprecated("use write(buf, len) instead; this overload copies unnecessarily")]]
    bool writeString(const std::string &s);
};

void example(FileWriter &fw) {
    char buf[4] = {};      // some data to write

    fw.open("out.bin");    // warning: ignored return value
    fw.write(buf, 4);      // warning: ignored return value

    bool ok = fw.open("out.bin");   // fine
    if (!ok) return;
    if (!fw.write(buf, 4)) { /* handle error */ }
}

Applying [[nodiscard]] to types

In C++17 and later, [[nodiscard]] can be applied not just to functions but to entire types. If a type is marked [[nodiscard]], any function that returns it by value produces a warning whenever the return value is discarded, even if the function itself is not marked.

[[nodiscard]] struct ErrorCode {
    int code;
    bool ok() const { return code == 0; }
};

ErrorCode doSomething();   // no [[nodiscard]] on the function itself

void caller() {
    doSomething();   // warning: return value of type ErrorCode should not be ignored
}

This is useful for error-code types, optional result types, and resource handles — any type where ignoring the return is almost certainly a bug. You annotate the type once and every function returning it automatically gains the warning.

Note. Attributes are not macros. They participate in the language grammar and are defined by the standard. Vendor-specific attributes exist but must be written in a vendor namespace to avoid name collisions: [[gnu::noinline]], [[clang::warn_unused_result]], [[msvc::forceinline]]. Standard attributes (nodiscard, maybe_unused, likely, etc.) work across all conforming compilers.


Problem 62: User-Defined Literals

Problem 43 used duration literals to write 5s and 200ms instead of std::chrono::seconds{5} and std::chrono::milliseconds{200}. These look like magic compiler syntax, but they are not. They are user-defined literals: a language feature that lets you attach meaning to suffix notation on numeric and string constants. And you can create them for your own types.

The syntax: operator""_suffix

A user-defined literal is defined as a function named operator"" followed by the suffix. The suffix determines the function that the compiler calls when it sees the literal in source code.

There are four forms, corresponding to the four kinds of literal:

// Integer literal: 42_km
Distance operator""_km(unsigned long long x);

// Floating-point literal: 3.14_km
Distance operator""_km(long double x);

// String literal: "Paris"_city
City operator""_city(const char *s, size_t n);

// Character literal: 'A'_enc
Encoded operator""_enc(char c);

The compiler chooses which overload to call based on the type of the literal. Integer constants call the unsigned long long overload; floating-point constants call the long double overload.

The underscore convention

Standard library literals use no underscore: s (string or seconds), ms (milliseconds), i (imaginary), sv (string_view), etc. User-defined literals must start with an underscore. This avoids collisions: a future standard library version might define km for the same purpose, and your code would not break because your suffix is _km.

In practice, put user-defined literals in a namespace and require a using namespace to activate them — exactly how the standard library does it:

namespace my_units {
    inline Distance operator""_km(long double x) { return Distance{static_cast<double>(x) * 1000.0}; }
    inline Distance operator""_m (long double x) { return Distance{static_cast<double>(x)};          }
    inline Distance operator""_cm(long double x) { return Distance{static_cast<double>(x) / 100.0};  }
}

using namespace my_units;
auto d = 5.0_km + 200.0_m;   // type-safe arithmetic

A type-safe unit system

The real power of user-defined literals is not the pretty syntax alone — it is that they let you construct a strongly-typed value from a raw number, preventing unit mismatches.

class Distance {
    double metres_;
public:
    explicit Distance(double m) : metres_{m} {}
    double metres() const { return metres_; }
    Distance operator+(Distance o) const {
        return Distance{metres_ + o.metres_};
    }
    bool operator<(Distance o) const { return metres_ < o.metres_; }
};
namespace units {
    Distance operator""_km(long double x) { return Distance{static_cast<double>(x) * 1000.0}; }
    Distance operator""_m (long double x) { return Distance{static_cast<double>(x)};          }
    Distance operator""_cm(long double x) { return Distance{static_cast<double>(x) / 100.0};  }
}

using namespace units;

Distance marathon = 42.195_km;
Distance sprint   = 100.0_m;
Distance total    = marathon + sprint;    // type-safe: both are Distance
// double wrong = 42.195 + 100.0;        // dimensionless -- easy to misuse

The literal syntax makes the intent clear at the point of use. The compiler enforces correctness: you cannot accidentally add metres to kilograms.

The standard library literals

You have already used several:

  • "hello"s → std::string{"hello"} (from <string>, using namespace std::string_literals)
  • "hello"sv → std::string_view{"hello"} (from <string_view>, using namespace std::string_view_literals)
  • 5s, 200ms, 1h → std::chrono durations (from <chrono>, using namespace std::chrono_literals)
  • 4.2i → std::complex<double>, 3.0if → std::complex<float> (from <complex>, using namespace std::complex_literals)

Compile-time literals with consteval

In C++20, you can declare the operator consteval to force evaluation at compile time. This is useful for validated literals: parsing and validating a string at compile time so that ill-formed literals are a compile error, not a runtime exception.

consteval int parsePort(const char *s, size_t n) {
    int port = 0;
    for (size_t i = 0; i < n; ++i) {
        if (s[i] < '0' || s[i] > '9')
            throw "port must be all digits";  // compile error if taken
        port = port * 10 + (s[i] - '0');
    }
    if (port < 1 || port > 65535)
        throw "port out of range";
    return port;
}

consteval int operator""_port(const char *s, size_t n) {
    return parsePort(s, n);
}

constexpr int http  = "80"_port;    // OK: 80
constexpr int bad   = "99999"_port; // compile error: port out of range

The validation happens entirely at compile time. Any port number that would fail at runtime is instead a compile error.

Aside (CS 146 connection: reader macros and syntax extension). In Racket, reader macros let you extend the syntax of the language itself: #42 can be made to mean anything you define. Haskell’s Template Haskell does the same with compile-time code generation. User-defined literals in C++ are a narrower but safer version of the same idea: you can attach new meaning to literal syntax without arbitrary metaprogramming.

Worked example: money type with currency suffix

Example: USD and EUR literals
enum class Currency { USD, EUR };

class Money {
    long long cents_;
    Currency  currency_;
public:
    Money(long long cents, Currency c) : cents_{cents}, currency_{c} {}
    long long cents()    const { return cents_; }
    Currency  currency() const { return currency_; }
    Money operator+(Money o) const {
        if (currency_ != o.currency_) throw std::runtime_error{"currency mismatch"};
        return Money{cents_ + o.cents_, currency_};
    }
};

namespace money_literals {
    Money operator""_usd(long double amount) {
        return Money{(long long)(amount * 100 + 0.5), Currency::USD};
    }
    Money operator""_eur(long double amount) {
        return Money{(long long)(amount * 100 + 0.5), Currency::EUR};
    }
}

using namespace money_literals;
Money price  = 9.99_usd;
Money tax    = 0.80_usd;
Money total  = price + tax;   // fine: both USD
// Money bad = price + 5.0_eur;  // runtime error: currency mismatch

Template literal operators

For numeric literals, there is one further form, the literal operator template, which receives the digits as individual characters at compile time, enabling arbitrary compile-time processing:

template<char... Digits>
constexpr int operator""_bin() {
    // convert binary digits to int at compile time
    int result = 0;
    for (char c : {Digits...}) {
        result = result * 2 + (c - '0');
    }
    return result;
}

constexpr int x = 1010_bin;   // == 10 at compile time
constexpr int y = 1111_bin;   // == 15

The template receives every character of the literal as a separate template argument, before any type conversion. This makes it possible to validate or transform arbitrary numeric literals at compile time — binary, hexadecimal with digit separators, or any custom format. This form is unusual in practice but occasionally very useful for domain-specific numeric encodings.

Namespace discipline for literal operators

Because a using-directive activates every literal suffix in a namespace at once, carelessly named suffixes create collision risks. The idiomatic pattern is to define literals inside a namespace and provide an inline namespace for opt-in activation:

namespace units {
    inline namespace literals {
        Distance operator""_km(long double x);
        Distance operator""_m (long double x);
    }
}

// Users can opt in at the file level:
using namespace units::literals;

// Or import the whole units namespace, which automatically includes literals:
using namespace units;

The standard library uses exactly this pattern: chrono_literals, string_literals, and string_view_literals are inline namespaces nested inside the inline namespace std::literals. This is why using namespace std::literals brings all the standard literals into scope at once, while using namespace std::chrono_literals brings only the chrono ones.

Note. User-defined literals should be used for pure type decoration and unit conversion. They are cleanest when the operator body is a single constructor call. Avoid complex processing in a literal operator — if the conversion is non-trivial, provide a named constructor or factory function instead and use the literal only as a convenience shorthand.


Problem 63: Modules

Problem 2 built the separate-compilation model: declarations in headers, definitions in .cc files, include guards to prevent double inclusion. This model has served C++ for forty years — and it is terrible. It is slow, order-dependent, and it leaks macros across translation units. C++20 fixes it.

What is wrong with #include

The preprocessor’s #include directive is textual inclusion. When you write #include <vector>, the preprocessor pastes the entire contents of vector into your translation unit. On a typical implementation, #include <iostream> alone adds tens of thousands of lines to your file. A non-trivial header tree can add hundreds of thousands. Every .cc file in your project pays this cost independently.

Three specific problems:

  1. Speed. The compiler re-parses those lines for every translation unit. Precompiled headers help, but they are fragile — the wrong #define before the inclusion breaks the cache.

  2. Macro leakage. A macro defined in foo.h is visible in everything that includes foo.h, directly or transitively. This is global namespace pollution with no opt-out.

  3. Order sensitivity. Including headers in the wrong order can change behavior. A macro defined in one header can alter the meaning of a name in the next. This makes builds fragile and hard to reproduce.

The module model

A C++20 module is a named, self-contained unit that explicitly declares what it exports. The compiler processes it once and stores a precompiled interface. Importers get the exported declarations without any of the implementation, the macros, or the non-exported names.

A module interface unit declares the module and its exports:

// mylib.ixx  (or mylib.cppm -- compiler-dependent extension)
export module mylib;
import std;   // C++23 (or #include <string> in a global module fragment)

export void greet(const std::string &name);
export int  add(int a, int b);

// internal helper -- NOT exported; invisible to importers
static void log(const std::string &msg) { /* ... */ }

A module implementation unit provides the definitions:

// mylib.cc
module mylib;   // claims ownership -- no 'export' here
import std;     // needed for std::cout and std::string

void greet(const std::string &name) {
    log("greeting " + name);
    std::cout << "Hello, " << name << '\n';
}

int add(int a, int b) { return a + b; }

A consumer simply imports:

// main.cc
import mylib;

int main() {
    greet("world");
    return add(1, 2);
}

No include guard. No macro leakage — any #define inside the module implementation is invisible to the importer. The compiler processes the module interface once and reuses the result for every importer.

What modules provide

  • Isolation. Macros defined in a module do not leak. Names not explicitly exported are not visible to importers.

  • Speed. The module interface is compiled once to a binary module interface (BMI) file. Importers read the BMI — much faster than re-parsing thousands of lines of source text.

  • Correct dependency tracking. The module system knows exactly what each module depends on. Build systems can rebuild only what changed.

  • No order sensitivity. A module’s semantics do not depend on what macros happened to be defined when it was imported.
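To make the compile-once model concrete, here is roughly what a manual module build looks like with Clang. Exact flags and file extensions vary by compiler and version, so treat this as a sketch, not a recipe:

```shell
# Sketch of a manual Clang module build (flags vary by version).
# 1. Compile the interface once into a BMI:
clang++ -std=c++20 --precompile mylib.cppm -o mylib.pcm
# 2. Every file that imports mylib reads the BMI instead of re-parsing source:
clang++ -std=c++20 -fmodule-file=mylib=mylib.pcm -c mylib.cc -o mylib.o
clang++ -std=c++20 -fmodule-file=mylib=mylib.pcm -c main.cc  -o main.o
clang++ mylib.o main.o -o app
```

In practice a build system (CMake 3.28+, for example) generates these steps and the dependency ordering for you.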

Module partitions

Large modules can be split into partitions:

// mylib-utils.ixx
export module mylib:utils;   // a partition of mylib
import std;
export std::string trim(std::string s);
// mylib.ixx
export module mylib;
export import :utils;        // re-export the partition
export void greet(const std::string &name);

Partitions let you organize a module across multiple files without splitting it into separate modules. Importers see a single import mylib;.

Header units: the compatibility bridge

You cannot rewrite every third-party header overnight. The header unit bridge lets you import legacy headers through the module syntax:

import <vector>;       // treat <vector> as a header unit
import "mylegacy.h";   // treat a local header as a header unit

Header units are compiled as if they were modules. Macros still leak from header units (unlike true modules), but they gain some of the speed benefits of precompilation.

C++23: import std

C++23 defines standard library modules. Instead of including dozens of headers, you write:

import std;   // the entire C++23 standard library in one import

This replaces every #include for standard library types. In exchange you get fast compilation and no macro pollution from the standard library implementation.

Aside (CS 241 connection: separate compilation and linking). CS 241 showed how the assembler and linker combine separately compiled object files. Modules extend that model: the BMI file is like a header that has already gone through the compiler front end. The module interface is the ABI boundary — what names are exported, what their types are — and the linker only needs to resolve the exported symbols.

Worked example: convert the Node/List from Problem 2

Example: Node and List as a module

Problem 2 gave us a singly-linked list split across node.h, node.cc, list.h, list.cc. Here is the module equivalent.

// list.ixx  -- module interface
export module list;
import std;

export struct Node {
    int data;
    Node *next;
};

export class List {
    Node *head = nullptr;
public:
    List() = default;
    ~List();
    void push_front(int x);
    void pop_front();
    bool empty() const { return head == nullptr; }
    int  front()  const;
};
// list.cc  -- module implementation
module list;

List::~List() { while (!empty()) pop_front(); }

void List::push_front(int x) {
    head = new Node{x, head};
}

void List::pop_front() {
    if (empty()) return;
    Node *tmp = head;
    head = head->next;
    delete tmp;
}

int List::front() const { return head->data; }
// main.cc
import list;
import std;   // 'import list' does not re-export std
int main() {
    List l;
    l.push_front(3); l.push_front(2); l.push_front(1);
    while (!l.empty()) {
        std::cout << l.front() << ' ';
        l.pop_front();
    }
}

No include guards, no forward-declaration headers, no risk of including node.h before list.h.

Compiler support and adoption

Modules are supported in MSVC 2019 16.8+ (the most complete early implementation), GCC 11+ (maturing through GCC 14), and Clang 16+. Build system support (CMake, Meson, Bazel) has matured significantly since 2022. As of 2024, most large projects are still header-based, but new projects and active modernization efforts are adopting modules. Expect the transition to accelerate as toolchain support matures.

Note. Modules and #include can coexist in the same project. Migration is incremental: start by converting your most-included internal headers to modules, and leave third-party headers as header units or plain #includes. You do not need to convert everything at once.


Problem 64: Coroutines I: Generators

Problem 8 built a custom iterator for Vector by hand: a class with operator++, operator*, begin(), and end(). That was a substantial amount of code to express a simple concept: produce a sequence of values one at a time. Problem 41 gave us lazy views over existing ranges. C++20 goes further with coroutines: a way to write an iterator as a simple loop body, letting the language machinery handle the suspension and resumption.

What is a coroutine?

A regular function runs from entry to return in one shot. Once it returns, it is gone; its stack frame is destroyed. A coroutine can suspend execution mid-way, yield a value to its caller, and later be resumed from exactly where it left off.

// Regular function
int makeValue() {
    int x = compute();
    return x;     // called once, done
}

// Coroutine
std::generator<int> makeValues() {
    while (true) {
        int x = compute();
        co_yield x;   // suspend here, give x to caller; resume later
    }
}

The co_yield keyword is what makes makeValues a coroutine. When the caller asks for the next value, the coroutine runs until the next co_yield, hands the value back, and freezes — all its local state intact — until the caller asks again.

Coroutine

A function that can suspend its execution at a co_yield (or co_await) expression, returning a value to the caller, and be resumed from that point later. The coroutine’s local variables survive across suspensions; they are stored in a coroutine frame on the heap.

co_yield: the generator pattern

The most intuitive use of coroutines is generating sequences. Here is an infinite Fibonacci generator:

#include <generator>   // C++23: std::generator
#include <iostream>
#include <ranges>

std::generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) {
        co_yield a;
        auto c = a + b;
        a = b;
        b = c;
    }
}

int main() {
    for (int n : fibonacci() | std::views::take(10)) {
        std::cout << n << ' ';
    }
    // prints: 0 1 1 2 3 5 8 13 21 34
}

Without coroutines, producing Fibonacci lazily requires a class with state fields a and b, an operator++ that advances them, and an operator* that returns the current value. The coroutine version holds that state as ordinary local variables. The structure of the computation is visible in the code rather than distributed across iterator methods.

Under the hood: the coroutine frame

When the compiler sees a function containing co_yield, it transforms the function into a state machine. The local variables that must survive across suspension points are lifted into a heap-allocated coroutine frame (also called the coroutine state).

The state machine looks roughly like this:

// What the compiler generates (conceptually):
struct fibonacci_frame {
    int a, b;       // locals that survive suspension
    int suspend_point;  // which co_yield we last suspended at
    int yielded_value;
};

Each call to operator++ on the generator’s iterator resumes the coroutine, running it until the next co_yield. The caller reads the yielded value; the frame stays alive. When the coroutine runs off the end (or hits co_return), the frame is destroyed.

The heap allocation is the main cost of coroutines. In practice, compilers often eliminate it via heap elision when the coroutine’s lifetime is clearly bounded — but this is an optimization, not a guarantee.

Lazy infinite sequences

The coroutine runs only when asked. If you take only the first 10 Fibonacci numbers, the coroutine suspends after the 10th co_yield and is then destroyed. The remaining (infinite) iterations never execute. This is the same laziness that Haskell provides by default — sequences are demand-driven.

Aside (CS 146 connection: lazy lists and streams). In CS 146, Racket’s streams and Haskell’s lazy lists provide infinite sequences evaluated on demand. co_yield is the C++ analogue of cons on a stream: produce one value now, defer the rest. The coroutine frame is the thunk: the suspended computation waiting to produce the next element.

Range integration

std::generator<T> models std::ranges::input_range. That means it works with all the range algorithms and views from Problem 41:

auto squares = [](int n) -> std::generator<int> {
    for (int i = 0; i < n; ++i) co_yield i * i;
};

auto big = squares(100)
         | std::views::filter([](int x){ return x > 50; })
         | std::views::take(5);

for (int x : big) std::cout << x << ' ';
// prints: 64 81 100 121 144

The pipeline is fully lazy: squares produces values only as the range adaptor asks for them.

Worked example: lazily reading lines from a file

Example: A line-reading generator
std::generator<std::string> readLines(const std::string &path) {
    std::ifstream file{path};
    std::string line;
    while (std::getline(file, line)) {
        co_yield line;
    }
}   // file is closed when the coroutine frame is destroyed

void printFirstN(const std::string &path, int n) {
    for (auto &line : readLines(path) | std::views::take(n)) {
        std::cout << line << '\n';
    }
    // file is opened lazily; once we have n lines, iteration stops
    // and the coroutine frame (and therefore the file) is destroyed
}

The file is held open inside the coroutine frame and closed automatically when the frame is destroyed — whether by taking all lines, or by stopping early. This is RAII (Problem 14) applied across a suspension boundary.

Recursive generators

C++23’s std::generator supports co_yield std::ranges::elements_of(subrange), which efficiently yields every element of a sub-generator without flattening it manually. This enables recursive data structure traversal:

struct TreeNode {
    int val;
    std::unique_ptr<TreeNode> left, right;
};

std::generator<int> inorder(const TreeNode *node) {
    if (!node) co_return;
    co_yield std::ranges::elements_of(inorder(node->left.get()));
    co_yield node->val;
    co_yield std::ranges::elements_of(inorder(node->right.get()));
}

// Usage:
for (int v : inorder(root.get())) std::cout << v << ' ';

Without elements_of, you would need to manually iterate over the sub-generator inside a loop. With it, the structure of the recursive algorithm matches the structure of the code.

A note on compiler support

std::generator is C++23. In C++20 you have the co_yield keyword but must provide the promise_type and resumable-handle infrastructure yourself, or use a library (cppcoro, libcoro). Problem 65 goes into that infrastructure. For practical use in 2024, std::generator ships in GCC 14+ and Clang 17+ (with libc++); MSVC support is arriving in recent VS 2022 updates.

Generator vs. iterator: a comparison

It is illuminating to compare the hand-written iterator approach from Problem 8 with the coroutine approach for the same task.

// Iterator-based: a lot of boilerplate
class FibIterator {
    int a = 0, b = 1;
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    int operator*()  const { return a; }
    FibIterator &operator++() { auto c=a+b; a=b; b=c; return *this; }
    bool operator==(std::default_sentinel_t) const { return false; }
};

// Coroutine-based: intent is clear, boilerplate is gone
std::generator<int> fibonacci() {
    int a = 0, b = 1;
    while (true) { co_yield a; auto c=a+b; a=b; b=c; }
}

The coroutine version is shorter and reads more naturally as “loop and yield”. The iterator version requires understanding the iterator protocol and how to encode the generator’s internal state as class members. For producing sequences, coroutines are almost always the better abstraction.

Note. Coroutines are powerful but not free. Each generator instance pays for one heap allocation (the frame), one level of indirection per resume, and the overhead of the coroutine machinery. For hot-path sequences where performance is critical, a hand-written iterator or a precomputed array is faster. Coroutines shine when the sequence logic is complex, recursive, or comes from I/O — exactly the cases where a hand-written iterator would be painful to write and maintain.


Problem 65: Coroutines II: Async Tasks

Problem 64 showed co_yield for generators: a coroutine that suspends itself to hand a value back to the caller. The other coroutine keyword is co_await. Where co_yield produces a value for the caller, co_await waits for an external event — an I/O completion, a timer, a network response — and resumes the coroutine when it is ready, without blocking any thread.

The two keywords use the same underlying coroutine mechanism — the same compiler transformation, the same frame allocation, the same coroutine handle — but they address different problems. co_yield is a pull model: the caller pulls values from the generator. co_await is a push model: the scheduler pushes control back to the coroutine when an event arrives. Both are expressions inside a coroutine body; both cause the coroutine to suspend and transfer control elsewhere.

The problem with blocking I/O and thread-per-request

Problem 51 introduced std::async, which runs a task on a thread from the thread pool and returns a std::future<T> that you can block on. This works fine for a handful of parallel tasks, but it breaks down at scale.

Suppose you are writing a server that handles 10,000 concurrent client connections. Each connection spends most of its time waiting: waiting for the client to send data, waiting for a database query, waiting for a downstream service. With one thread per connection, you need 10,000 threads. Each thread reserves a stack (typically 1-8 MB, so tens of gigabytes of address space in total), and the OS must schedule all 10,000 of them. Context-switch overhead and memory pressure grow with the thread count; thread-per-connection simply does not scale.

The solution: a small pool of threads, with many tasks multiplexed onto them. A task runs until it needs to wait, then suspends and hands the thread to another task. When the event arrives, the task is resumed on some thread. This is the event-loop model — and co_await is how C++20 expresses it.

co_await: suspend until ready

Task<std::string> fetchPage(std::string url) {
    auto response = co_await asyncHttpGet(url);  // suspend, don't block thread
    co_return response.body;                     // resume here when response arrives
}

When the coroutine hits co_await asyncHttpGet(url), it:

  1. Asks the awaitable object (asyncHttpGet(url)) whether the result is already available (await_ready).
  2. If not, suspends the coroutine and schedules resumption when the HTTP response arrives (await_suspend).
  3. Returns control to the caller (the scheduler), which can run other tasks.
  4. When the network completes, the scheduler resumes this coroutine and the expression co_await ... evaluates to the response (await_resume).

No thread is blocked during the wait. The thread is free to run other tasks.

The awaitable concept

Any type can be awaited if it satisfies the awaitable concept: it must provide three methods.

struct MyAwaitable {
    // Called immediately: if true, don't even suspend
    bool await_ready() { return result_already_available(); }

    // Called on suspension: register resumption with the scheduler
    void await_suspend(std::coroutine_handle<> h) {
        schedule_callback_on_completion([h]{ h.resume(); });
    }

    // Called on resumption: provides the value of the co_await expression
    Result await_resume() { return get_result(); }
};

An async framework (Boost.Asio, cppcoro, libuv via a C++ wrapper) provides ready-made awaitables for network I/O, timers, and file operations. You write coroutines that co_await those awaitables; the framework’s scheduler handles the event loop.

Task: the promise type for async coroutines

A coroutine’s behavior is controlled by a promise type. For the Task<T> type used in the example above, the promise type must specify:

  • What to do when the coroutine starts (immediately suspend? run eagerly?).
  • Where to store the result (in the Task object, for the caller to retrieve via co_await task).
  • What to do when the coroutine finishes or throws.

Writing a correct Task<T> promise type is non-trivial. The minimum viable implementation is around 40 lines. For production, use an existing implementation from cppcoro or a similar library:

// With cppcoro: tasks are lazy coroutines; co_await runs each step to completion
cppcoro::task<int> expensiveCompute();   // assume these are coroutines themselves
cppcoro::task<int> anotherCompute();

cppcoro::task<int> compute() {
    int a = co_await expensiveCompute();
    int b = co_await anotherCompute();
    co_return a + b;
}

int main() {
    cppcoro::sync_wait(compute());   // run the event loop until compute() finishes
}

Why coroutines beat threads for I/O

  • Memory: a coroutine frame is typically a few hundred bytes (just the live locals). A thread stack is 1-8 MB. You can run millions of coroutines concurrently.

  • Scheduling: coroutines switch cooperatively at co_await points. Thread switches require a trip through the OS kernel and saving and restoring the full register state. A coroutine resume costs roughly as much as a function call.

  • Composability: coroutines can co_await other coroutines. Structured concurrency — a parent that waits for its children — is expressible directly. With threads, you need explicit synchronization.

Structured concurrency and co_await on multiple tasks

A parent coroutine can launch children and wait for all of them:

Task<void> parallel() {
    auto t1 = fetchPage("https://a.example/");
    auto t2 = fetchPage("https://b.example/");
    auto [r1, r2] = co_await whenAll(std::move(t1), std::move(t2));
    process(r1, r2);
}

whenAll is a utility that suspends the parent until all children complete. Neither child blocks a thread; the event loop keeps running other work.

Aside (CS 343 connection: coroutines and stackless design). CS 343’s μC++ uses coroutines with their own private stack (stackful coroutines): each coroutine can call arbitrarily deep into helper functions and suspend at any point in the call stack. C++20 coroutines are stackless: they can only suspend at top-level co_await expressions, not inside arbitrary called functions. The tradeoff is that stackless coroutines are cheaper (the frame is small and precisely sized) but require that every suspending call be itself a coroutine. μC++’s model is more flexible; C++20’s is more efficient.

Worked example: wrapping a callback-based API

Many existing async libraries use callbacks. Here is how to wrap one in a co_await-able type:

// Suppose we have: void asyncRead(int fd, char *buf, size_t n, Callback cb);

struct ReadAwaitable {
    int fd; char *buf; size_t n; ssize_t result;
    bool await_ready() { return false; }  // always suspend -- I/O is never instant
    void await_suspend(std::coroutine_handle<> h) {
        asyncRead(fd, buf, n, [this, h](ssize_t r) mutable {
            result = r;
            h.resume();   // resume the coroutine on completion
        });
    }
    ssize_t await_resume() { return result; }
};

Task<void> readFile(int fd) {
    char buf[4096];
    ssize_t n = co_await ReadAwaitable{fd, buf, sizeof(buf)};
    process(buf, n);
}

The callback captures the coroutine handle h and calls h.resume() when I/O completes. From the coroutine’s perspective, co_await is a simple expression that returns the number of bytes read.

co_return: producing the final value

A coroutine that returns a single value uses co_return, not return:

Task<int> computeAsync(int x) {
    int partial = co_await slowStep(x);
    co_return partial * 2;   // the Task<int> completes with this value
}

Task<void> useIt() {
    int result = co_await computeAsync(21);   // result == 42
    std::cout << result << '\n';
}

co_return expr stores expr in the promise object’s result storage, then finalizes the coroutine. The caller that co_awaits this Task<int> receives 42 as the value of the co_await expression. This is how coroutines form chains: each co_await extracts the co_return’d value from the inner coroutine.

The execution model: one thread, many tasks

The event-loop model enabled by co_await typically runs on a single thread (or a small, fixed thread pool). The scheduler maintains a ready queue of coroutine handles waiting to resume. When I/O completes, the callback enqueues the handle; the scheduler dequeues and calls h.resume().

This is fundamentally different from the thread-per-task model: the scheduler is in user space, context switches are just function calls, and there is no OS involvement. A single thread can drive thousands of concurrent network connections without any lock contention.

Note. The coroutine infrastructure — Task<T>, the scheduler/executor, awaitable wrappers for I/O — is not yet fully standardized. C++23 provides std::generator (Problem 64) but not a standard async executor or Task type. Full networking support is still being designed. In the meantime, use a library: cppcoro (small and readable, excellent for learning), Boost.Asio (battle-tested, complex), or your platform’s native async framework (Windows I/O completion ports, Linux io_uring). The co_await syntax and the awaitable protocol are stable; the ecosystem around them is maturing.


Problem 66: State Pattern

The patterns chapter covered three design patterns: Factory Method for delegating object creation, Decorator for composing behavior, and Visitor for double dispatch. Here is one we skipped that comes up constantly in UI programming, protocol parsing, and game logic: the State pattern.

Motivation: behavior that depends on mode

Consider a network connection that can be in four states: CLOSED, LISTENING, ESTABLISHED, and TIME_WAIT. The behavior of send(), receive(), and close() depends entirely on which state the connection is currently in:

State          send()   receive()   close()
CLOSED         error    error       no-op
LISTENING      error    ok          stop
ESTABLISHED    ok       ok          begin TIME_WAIT
TIME_WAIT      error    error       wait, then CLOSED

The naive implementation:

class Connection {
    enum State { CLOSED, LISTENING, ESTABLISHED, TIME_WAIT };
    State state_ = CLOSED;
public:
    void send(const Data &d) {
        switch (state_) {
            case ESTABLISHED: /* actually send */ break;
            case CLOSED:      throw ConnectionError{"not connected"}; break;
            case LISTENING:   throw ConnectionError{"write on listen socket"}; break;
            case TIME_WAIT:   throw ConnectionError{"connection closing"}; break;
        }
    }
    void receive(Data &d) {
        switch (state_) {
            // ... another switch over all four states
        }
    }
    void close() {
        // ... yet another switch
    }
};

Every method contains the same switch over the state. Adding a new state (say, SYN_SENT) requires reopening every method. Adding a new method requires writing yet another complete switch. The coupling grows multiplicatively: S states times M methods means S × M cases to maintain.

The State pattern

State Pattern

Encapsulate each state as a separate class that implements the object’s behavior for that state. The context object delegates every behavior method to the current state object. Transitions are performed by replacing the current state object with a new one.

The structure has three roles: the Context (Connection), which stores the current state object and forwards every behavior call to it; the abstract State (ConnState), which declares one pure virtual method per behavior; and the Concrete States (Closed, Listening, Established, TimeWait), which implement the behaviors and perform the transitions.

Implementation

class Connection;   // forward declaration

struct ConnState {
    virtual void send    (Connection &, const Data &) = 0;
    virtual void receive (Connection &, Data &)       = 0;
    virtual void close   (Connection &)               = 0;
    virtual ~ConnState() = default;
};
class Connection {
    std::unique_ptr<ConnState> state_;
public:
    Connection();                          // starts in Closed state
    void setState(std::unique_ptr<ConnState> s) { state_ = std::move(s); }
    void send(const Data &d) { state_->send(*this, d); }
    void receive(Data &d)    { state_->receive(*this, d); }
    void close()             { state_->close(*this); }
};

Each concrete state handles its own transitions by calling ctx.setState(...):

struct Established : ConnState {
    void send(Connection &, const Data &d) override {
        /* actually transmit d */
    }
    void receive(Connection &, Data &d) override {
        /* read into d */
    }
    void close(Connection &ctx) override {
        /* initiate TCP TIME_WAIT */
        ctx.setState(std::make_unique<TimeWait>());
    }
};

struct Closed : ConnState {
    void send   (Connection &, const Data &) override {
        throw ConnectionError{"not connected"};
    }
    void receive(Connection &, Data &) override {
        throw ConnectionError{"not connected"};
    }
    void close(Connection &) override { /* no-op */ }
};

Adding a new state is a matter of adding one new class — the existing states are not modified. Adding a new behavior to every state requires adding one method to the abstract ConnState and implementing it in each concrete state — that is still S additions, but they are localized to the state classes rather than scattered across switch statements.

Worked example: a traffic light

Example: Traffic light FSM
class TrafficLight;

struct LightState {
    virtual void advance(TrafficLight &) = 0;
    virtual std::string colour() const  = 0;
    virtual ~LightState() = default;
};

class TrafficLight {
    std::unique_ptr<LightState> state_;
public:
    TrafficLight();
    void setState(std::unique_ptr<LightState> s) { state_ = std::move(s); }
    void advance() { state_->advance(*this); }
    std::string colour() const { return state_->colour(); }
};
struct Red : LightState {
    std::string colour() const override { return "RED"; }
    void advance(TrafficLight &tl) override;   // -> Green
};
struct Green : LightState {
    std::string colour() const override { return "GREEN"; }
    void advance(TrafficLight &tl) override;   // -> Yellow
};
struct Yellow : LightState {
    std::string colour() const override { return "YELLOW"; }
    void advance(TrafficLight &tl) override;   // -> Red
};

void Red::advance   (TrafficLight &tl) { tl.setState(std::make_unique<Green>()); }
void Green::advance (TrafficLight &tl) { tl.setState(std::make_unique<Yellow>()); }
void Yellow::advance(TrafficLight &tl) { tl.setState(std::make_unique<Red>()); }
TrafficLight::TrafficLight() : state_{std::make_unique<Red>()} {}

Aside (CS 241 connection: finite automata). The State pattern is an object-oriented encoding of a finite automaton. CS 241 studied DFAs: a set of states, an alphabet, a transition function, and a current state. The pattern maps directly: ConnState subclasses are the states, the behavior methods are the alphabet, and setState is the transition function. Object-oriented state machines are particularly clean when the transition logic is complex or when states carry data.

The flyweight note

If state objects carry no data (as in the traffic light example), there is no reason to allocate a fresh one on every transition. You can share a single instance per state class:

// Pre-allocate shared singletons
static Red    theRed;
static Green  theGreen;
static Yellow theYellow;

void Red::advance(TrafficLight &tl) {
    // setStateRaw: a non-owning variant of setState that stores a raw pointer
    tl.setStateRaw(&theGreen);  // no allocation
}

This is the Flyweight pattern applied to states. Use unique_ptr when states hold data that varies per context; use shared singletons when states are purely behavioral.

Note. The State pattern shines when the number of states is moderate (3–10) and the number of behaviors is also moderate. For very large FSMs (parsers, game AI) consider a data-driven approach: a table of (current_state, event) -> (next_state, action) pairs, interpreted at runtime. The table is easier to specify, serialize, and visualize than a class hierarchy with dozens of concrete states.


Problem 67: Command Pattern

The Visitor pattern from the patterns chapter separated algorithms from data structures. The State pattern in Problem 66 separated behavior from mode. The Command pattern separates the invocation of an operation from its execution: the request becomes an object, and that object can be stored, queued, logged, or reversed.

Motivation

Think about three different requirements that all have the same shape:

  • A text editor must support Ctrl-Z and Ctrl-Y — undo and redo.
  • A robot arm receives movement instructions that must execute in sequence and can be replayed.
  • A web server queues incoming requests and processes them on a thread pool.

In each case, an operation needs to travel: from the site where it is invoked to the site where it is executed, possibly with a detour through a history or queue. A raw function call cannot travel. A Command object can.

Command Pattern

Encapsulate a request as an object with an execute() method. The object carries everything needed to perform the request: the receiver, the arguments, and (for undoable commands) the information to reverse the operation.

The Command interface

class Command {
public:
    virtual void execute() = 0;
    virtual void undo()    = 0;   // not all commands need undo, but most do
    virtual ~Command()     = default;
};

Concrete commands store the receiver and the arguments they were invoked with:

class InsertTextCommand : public Command {
    TextBuffer &buf_;
    size_t      pos_;
    std::string text_;
public:
    InsertTextCommand(TextBuffer &buf, size_t pos, std::string text)
        : buf_{buf}, pos_{pos}, text_{std::move(text)} {}

    void execute() override { buf_.insert(pos_, text_); }
    void undo()    override { buf_.erase(pos_, text_.size()); }
};

class DeleteTextCommand : public Command {
    TextBuffer  &buf_;
    size_t       pos_;
    size_t       len_;
    std::string  deleted_;   // filled in by execute() so undo() can restore it
public:
    DeleteTextCommand(TextBuffer &buf, size_t pos, size_t len)
        : buf_{buf}, pos_{pos}, len_{len} {}

    void execute() override {
        deleted_ = buf_.substr(pos_, len_);
        buf_.erase(pos_, deleted_.size());
    }
    void undo() override { buf_.insert(pos_, deleted_); }
};

The CommandHistory: the Invoker

The Invoker keeps track of executed commands and manages undo/redo. It owns the commands via unique_ptr — the commands are resources with lifetimes that the history controls.

class CommandHistory {
    std::stack<std::unique_ptr<Command>> done_;
    std::stack<std::unique_ptr<Command>> undone_;
public:
    void execute(std::unique_ptr<Command> cmd) {
        cmd->execute();
        done_.push(std::move(cmd));
        // A new command clears the redo stack
        while (!undone_.empty()) undone_.pop();
    }

    void undo() {
        if (done_.empty()) return;
        auto cmd = std::move(done_.top()); done_.pop();
        cmd->undo();
        undone_.push(std::move(cmd));
    }

    void redo() {
        if (undone_.empty()) return;
        auto cmd = std::move(undone_.top()); undone_.pop();
        cmd->execute();
        done_.push(std::move(cmd));
    }
};

Macro commands

A MacroCommand holds a sequence of commands and executes them as a unit. Undo reverses them in reverse order:

class MacroCommand : public Command {
    std::vector<std::unique_ptr<Command>> cmds_;
public:
    void add(std::unique_ptr<Command> c) { cmds_.push_back(std::move(c)); }
    void execute() override {
        for (auto &c : cmds_) c->execute();
    }
    void undo() override {
        for (auto it = cmds_.rbegin(); it != cmds_.rend(); ++it)
            (*it)->undo();
    }
};

A find-and-replace operation, for example, is a MacroCommand containing one DeleteTextCommand and one InsertTextCommand for each replacement site. Undoing the macro undoes all replacements atomically.

Commands for queuing

Commands are trivially queueable across threads. A producer pushes commands into a std::queue<unique_ptr<Command>> (with appropriate synchronization from Problem 49); a consumer dequeues and calls execute(). The consumer does not need to know what type of command it is running.

// Producer thread:
queue.push(std::make_unique<RenderFrameCommand>(scene, framebuf));

// Consumer (render thread):
while (auto cmd = queue.pop()) {
    cmd->execute();
}

Aside (CS 247 connection: the open/closed principle). The Command pattern is a textbook application of OCP. Adding a new operation (say, SetFontCommand) means adding one new class. The invoker (CommandHistory), the existing commands, and the receiver (TextBuffer) are all closed for modification. The new command is open: it can do anything its execute and undo implementations specify.

Worked example: drawing application with undo

Example: Stroke commands
class Canvas { /* stores a list of strokes */ };

class AddStrokeCommand : public Command {
    Canvas      &canvas_;
    Stroke       stroke_;
public:
    AddStrokeCommand(Canvas &c, Stroke s)
        : canvas_{c}, stroke_{std::move(s)} {}
    void execute() override { canvas_.addStroke(stroke_); }
    void undo()    override { canvas_.removeLastStroke(); }
};

// In the UI event handler:
void onMouseRelease(Canvas &canvas, CommandHistory &hist, Stroke s) {
    hist.execute(std::make_unique<AddStrokeCommand>(canvas, std::move(s)));
}

Logging and serialization

Because a Command is an object, it can do more than execute — it can describe itself. Adding a pure virtual describe() method to the interface lets an audit log record every action:

class Command {
public:
    virtual void execute() = 0;
    virtual void undo()    = 0;
    virtual std::string describe() const = 0;  // human-readable description
    virtual ~Command() = default;
};

class InsertTextCommand : public Command {
    /* ... */
    std::string describe() const override {
        return "InsertText at " + std::to_string(pos_) + ": \"" + text_ + "\"";
    }
};

The command history can then write an audit trail:

void CommandHistory::execute(std::unique_ptr<Command> cmd) {
    std::clog << "[CMD] " << cmd->describe() << '\n';
    cmd->execute();
    done_.push(std::move(cmd));
    // (redo-stack clearing as before)
}

Extending the Command interface to include serialization (toJson(), toBinary()) enables persistent undo history that survives application restarts — every action is saved to disk as a command object and replayed on reload.

Note. For small commands that do not need undo, a lambda (or any callable) is often simpler than a full Command class. A std::function<void()> stored in a queue is a lightweight command. Use the full class hierarchy when you need undo state, serialization, or the ability to compose commands into macros.


Problem 68: Mediator and Chain of Responsibility

Problem 29 used the Observer pattern for one-to-many notification: one publisher fires an event, many subscribers receive it. Sometimes the communication structure is more complex. We end this survey of design patterns with two more GoF patterns for routing messages.

Part I: Mediator — many-to-many communication

The problem

An airport control system has many aircraft, and each aircraft must coordinate with others: maintain separation, queue for runways, coordinate approaches. If every aircraft communicates directly with every other, each aircraft needs a reference to every other one. With n aircraft, that is O(n²) communication links. Adding a new aircraft means updating every existing one.

The solution: centralize through a coordinator

The Mediator pattern introduces a central coordinator object. Participants communicate with the mediator only; the mediator routes messages to the right recipients. The participant-to-participant graph collapses to a participant-to-mediator star: O(n) links instead of O(n²).

class Component;

class Mediator {
public:
    virtual void notify(Component *sender, const std::string &event) = 0;
    virtual ~Mediator() = default;
};

class Component {
protected:
    Mediator *mediator_ = nullptr;
public:
    void setMediator(Mediator *m) { mediator_ = m; }
};
class Aircraft : public Component {
    std::string id_;
public:
    explicit Aircraft(std::string id) : id_{std::move(id)} {}
    void requestLanding() {
        std::cout << id_ << " requests landing\n";
        mediator_->notify(this, "landing_request");
    }
    void clearToLand() { std::cout << id_ << " cleared to land\n"; }
    const std::string &id() const { return id_; }
};
class ControlTower : public Mediator {
    std::vector<Aircraft *> aircraft_;
    Aircraft *runway_ = nullptr;
public:
    void addAircraft(Aircraft *a) {
        aircraft_.push_back(a);
        a->setMediator(this);
    }
    void notify(Component *sender, const std::string &event) override {
        if (event == "landing_request") {
            auto *plane = static_cast<Aircraft *>(sender);
            if (!runway_) { runway_ = plane; plane->clearToLand(); }
            else { std::cout << plane->id() << " holds: runway busy\n"; }
        } else if (event == "landed") {
            runway_ = nullptr;
        }
    }
};

No Aircraft ever holds a reference to another Aircraft. All coordination happens in ControlTower::notify. Adding a new aircraft type (helicopter, drone) requires no changes to the existing participants.

Mediator Pattern

Define an object that encapsulates how a set of objects interact. Participants hold a reference to the mediator, not to each other. The mediator centralizes the coordination logic, reducing the coupling between participants from O(n²) to O(n).

When to use

Use the Mediator when many objects communicate in complex ways and you find yourself adding cross-references between them. Also useful for GUI form logic: a form has many widgets (text boxes, checkboxes, buttons) that must react to each other. A FormMediator that handles all widget interactions is cleaner than giving every widget a reference to every other.

Part II: Chain of Responsibility — pass until handled

The problem

An HTTP server receives a request. Before it reaches the business logic, the request must pass through: authentication (is the user logged in?), rate limiting (too many requests?), logging (record every access), and finally the handler. Any step can reject the request and send an error response without passing it further.

The solution: a chain of handlers

Chain of Responsibility Pattern

Assemble a chain of handler objects. Each handler either handles the request or passes it to the next handler in the chain. The sender does not know which handler will process its request.

struct Request {
    std::string user;
    std::string path;
    bool        authenticated = false;
};

class Handler {
    Handler *next_ = nullptr;
public:
    Handler &setNext(Handler &h) { next_ = &h; return h; }
    virtual bool handle(Request &req) {
        if (next_) return next_->handle(req);
        return false;
    }
    virtual ~Handler() = default;
};

class AuthHandler : public Handler {
public:
    bool handle(Request &req) override {
        if (req.user.empty()) { std::cout << "401\n"; return true; }
        req.authenticated = true;
        return Handler::handle(req);
    }
};
class RateLimiter : public Handler {
    std::unordered_map<std::string, int> hits_;
public:
    bool handle(Request &req) override {
        if (++hits_[req.user] > 100) {
            std::cout << "429 Too Many Requests\n"; return true;
        }
        return Handler::handle(req);
    }
};

class LoggingHandler : public Handler {
public:
    bool handle(Request &req) override {
        std::cout << "LOG: " << req.user << " -> " << req.path << '\n';
        return Handler::handle(req);
    }
};

Assembling the chain:

AuthHandler   auth;
RateLimiter   limiter;
LoggingHandler logger;

auth.setNext(limiter).setNext(logger);

Request req{"alice", "/api/data"};
auth.handle(req);   // auth -> limiter -> logger -> done

Modern C++ variant: function chain

A more idiomatic modern version uses std::function:

using Middleware = std::function<bool(Request &, std::function<bool(Request &)>)>;

Middleware logging = [](Request &req, auto next) {
    std::cout << "LOG: " << req.path << '\n';
    return next(req);
};

Middleware auth = [](Request &req, auto next) {
    if (req.user.empty()) { std::cout << "401\n"; return false; }
    return next(req);
};

Each middleware takes the request and a next callable; it can call next to pass control, or return early to stop the chain. The composition order is explicit and the chain can be assembled at runtime by wrapping functions.

Aside (CS 348 connection: middleware and query processing). Database query processing follows the same pattern. A query passes through a chain of operators: parse, optimize, validate permissions, cache lookup, execute, format results. Each stage either transforms the query and passes it on, or returns an error. The Chain of Responsibility is the structural backbone of every query pipeline.

Comparing the three routing patterns

Pattern           Publishers   Handlers
Observer          one          many (all receive)
Mediator          many         one coordinator
Chain of Resp.    one          one (first to claim it)

Observer broadcasts. Mediator centralizes. Chain of Responsibility passes until someone claims the request. Knowing which to reach for depends on the cardinality and ownership of the communication.

Note. The Chain of Responsibility can be combined with the Command pattern: each “request” travelling the chain is a Command object, and rejected commands can be logged or queued for retry. Design patterns compose; knowing several gives you a vocabulary for building complex routing structures from simple pieces.


Problem 69: Sanity Tools

Throughout this course we have accumulated a taxonomy of ways things can go wrong: undefined behaviour from out-of-bounds array access (Problem 9), memory leaks from exceptions thrown before a delete (Problem 14), and data races from unsynchronized thread access (Problems 49–55). We have learned how to avoid these problems by design. But design is not enough. Tests, sanitizers, and fuzzing catch what careful design misses.

Sanitizers: runtime error detectors

Sanitizers are compiler transformations that instrument your code to detect errors at the moment they occur, rather than at the moment they cause a crash — which may be much later and in completely unrelated code.

AddressSanitizer (ASan)

ASan detects memory errors: use-after-free, heap buffer overflow, stack buffer overflow, use-after-scope, and double-free. (Reads of uninitialized memory are detected by a separate tool, MemorySanitizer, not by ASan.)

// bug.cc
int main() {
    int *arr = new int[5];
    arr[7] = 42;   // heap buffer overflow: index 7 is out of bounds
    delete[] arr;
}

Compile and run:

g++ -fsanitize=address -g bug.cc -o bug
./bug

Output (abbreviated):

==12345==ERROR: AddressSanitizer: heap-buffer-overflow
WRITE of size 4 at 0x602000000030 thread T0
    #0 main bug.cc:3

ASan reports the exact file, line, and type of violation. Without it, this program silently corrupts heap metadata and crashes unpredictably later.

ASan has approximately a 2× slowdown and 2× memory overhead. Use it in all debug and CI builds.

UndefinedBehaviorSanitizer (UBSan)

UBSan detects a broad class of undefined behaviour: signed integer overflow, null pointer dereference, invalid shifts, misaligned memory access, division by zero, and many others.

int x = INT_MAX;
x += 1;    // signed overflow -- undefined behaviour
g++ -fsanitize=undefined -g ub.cc -o ub && ./ub
# ub.cc:2: runtime error: signed integer overflow: 2147483647 + 1
#   cannot be represented in type 'int'

UBSan is cheap — often under a 1.5× slowdown — and should be enabled in all debug builds alongside ASan. The two sanitizers combine cleanly:

g++ -fsanitize=address,undefined -g mycode.cc -o mycode

ThreadSanitizer (TSan)

TSan detects data races: concurrent read and write (or write and write) to the same memory location from different threads without synchronization.

int counter = 0;                       // shared, unsynchronized
std::thread t1([&] { ++counter; });    // data race on counter!
std::thread t2([&] { ++counter; });
t1.join(); t2.join();
g++ -fsanitize=thread -g race.cc -o race && ./race
# WARNING: ThreadSanitizer: data race
#   Write of size 4 at 0x... by thread T1:   #0 main+0x...
#   Previous write of size 4 at 0x... by thread T2

Note. ASan and TSan cannot be combined — they both intercept memory allocation and would conflict. Run TSan in a separate build from ASan+UBSan.

Unit testing with gtest

Sanitizers find how things go wrong at runtime. Unit tests specify what the code is supposed to do. Google Test (gtest) is the most widely used C++ testing library:

#include <gtest/gtest.h>
#include "vector.h"

TEST(VectorTest, PushBackGrowsSize) {
    Vector<int> v;
    v.push_back(1);
    v.push_back(2);
    EXPECT_EQ(v.size(), 2u);
}

TEST(VectorTest, ElementsAccessible) {
    Vector<int> v;
    v.push_back(42);
    EXPECT_EQ(v[0], 42);
}

TEST(VectorTest, ThrowsOnOutOfBounds) {
    Vector<int> v;
    EXPECT_THROW(v.at(0), std::out_of_range);
}

Each TEST macro defines an independent test case. EXPECT_EQ, EXPECT_THROW, and their siblings emit a failure message and continue running; ASSERT_* variants abort the test on failure. A test binary runs all registered tests and prints a summary.

The key workflow: run the tests, check sanitizers, fix failures, repeat. Catching a bug in a unit test is far cheaper than catching it in production.

Catch2: a lightweight alternative

Catch2 is a single-header alternative to gtest that uses a more expressive style:

#include <catch2/catch.hpp>
#include "vector.h"

TEST_CASE("Vector push_back", "[Vector]") {
    Vector<int> v;
    REQUIRE(v.size() == 0);
    v.push_back(1);
    REQUIRE(v.size() == 1);
    CHECK(v[0] == 1);
}

Both frameworks integrate with CMake’s ctest runner and with CI systems.

Fuzzing: automated input generation

Unit tests require you to think of the cases in advance. Fuzzing does not. A fuzzer generates random, malformed, or mutated input and feeds it to your code, letting the sanitizers catch any crash or undefined behaviour it triggers.

The de facto C++ fuzzing tool is libFuzzer, which ships with Clang (GCC users can reach for AFL++ instead). You provide a single function:

// fuzz_target.cc
#include <cstddef>
#include <cstdint>
#include "my_parser.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    try {
        MyParser::parse(data, size);
    } catch (const ParseError &) {
        // expected -- not a bug
    }
    return 0;
}

Compile with -fsanitize=fuzzer,address and run. The fuzzer starts with small random inputs and mutates ones that exercise new code paths, guided by coverage instrumentation. A parser that has survived a billion fuzz inputs is much more trustworthy than one with a dozen hand-written test cases.

Aside (CS 447 connection: testing and verification). Fuzzing is automated random testing — a cousin of the random testing techniques in CS 447. The key insight, the same as in property-based testing, is that manually written test cases can only test what the author imagined. Randomized testing explores the space the author did not imagine. Sanitizers turn “did it crash?” into “did it have any undefined behaviour?”, making fuzzing a much stronger correctness check.

The recommended workflow:

  1. Write code.
  2. Write unit tests that specify the expected behavior.
  3. Compile with -fsanitize=address,undefined -g and run the tests. Fix every sanitizer report — they are all real bugs.
  4. In a separate build, run with -fsanitize=thread if your code is concurrent.
  5. For security-sensitive parsers or protocol handlers, add a fuzz target and run it for at least a few hours.
  6. Only disable sanitizers for performance-measurement or release builds.

This is not a testing course. But the tools above are simple enough to adopt immediately: -fsanitize=address,undefined is a one-line change to your build command. Every C++ developer should have sanitizers enabled by default in their debug build.

And with that, this extension comes to a close. The 69 problems in these notes have traced a path from raw arrays and manual memory management through templates, smart pointers, move semantics, concurrency, ranges, coroutines, and the full suite of design patterns. C++ is a large and demanding language, but its demands are principled: every complexity is there to give you control that no other mainstream language provides.

The rest is practice.

Back to top