CS 246E: Object-Oriented Software Development — Modern C++ Extension
Estimated study time: 3 hr 48 min
Sources and References
Stroustrup, Bjarne — A Tour of C++, 3rd ed. (2022). Addison-Wesley. Primary narrative source for C++20 features.
Meyers, Scott — Effective Modern C++ (2014). O’Reilly. Canonical reference for C++11/14 idioms; items cited by number.
Williams, Anthony — C++ Concurrency in Action, 2nd ed. (2019). Manning. Primary source for Subpart C (Concurrency).
Sutter, Herb — Exceptional C++ and More Exceptional C++ (1999, 2001). Addison-Wesley. Advanced idioms: pimpl, type erasure, SFINAE.
Josuttis, Nicolai — The C++ Standard Library, 2nd ed. (2012). Addison-Wesley. Standard library deep-dives.
Alexandrescu, Andrei — Modern C++ Design (2001). Addison-Wesley. Policy-based design (Chapter 59).
cppreference.com — Online reference; all API signatures and constraints verified here.
Problem 35: I Want a Maybe Value
Back in Problem 9 we confronted the same question that haunts every API designer: what should a function return when it has no meaningful answer? We ruled out sentinel integers — there is no magic int that universally means “absent” — and settled on exceptions as the right tool. But exceptions carry a cost and a connotation: they signal something went wrong. Sometimes, though, a missing value is perfectly normal. A configuration file may or may not define a timeout. A map lookup may or may not find a key. A command-line parser may or may not see a particular flag. None of these is an error; each is just a value that might not be there.
We need a type that honestly says so.
The Problem with Sentinels
Suppose we write a function that parses an integer from a string:
int parse_int(const std::string &s) {
// ...
}
What should parse_int("abc") return? Returning -1 fails immediately: -1 is a valid integer. Returning std::numeric_limits<int>::min() fails for the same reason. There is no integer value we can reserve as an “absence” marker.
The pair approach. We can widen the return type:
std::pair<bool, int> parse_int(const std::string &s) {
try {
return {true, std::stoi(s)};
} catch (...) {
return {false, 0};
}
}
The caller must check the bool before using the int:
auto [ok, val] = parse_int("42");
if (ok) std::cout << val;
This works, but nothing in the type system forces the caller to check ok before reading val. If they forget, they silently use a garbage 0. The types lie: the int in the pair appears to always exist, even when it does not.
std::optional: The Type-Safe Maybe
C++17 introduces std::optional<T>, a type that either holds a value of type T or holds nothing. “Nothing” is spelled std::nullopt and has its own distinct type, std::nullopt_t, so it can never be confused with a real T.
#include <optional>
std::optional<int> parse_int(const std::string &s) {
try {
return std::stoi(s); // wraps int in optional automatically
} catch (...) {
return std::nullopt; // explicitly empty
}
}
The caller now cannot accidentally use the value without checking:
auto result = parse_int("42");
if (result) { // operator bool: true iff has value
std::cout << *result; // dereference to get the int
}
int val = result.value_or(0); // 0 if absent
Constructing and Accessing an optional
std::optional<int> empty; // default: no value
std::optional<int> opt{42}; // has value 42
auto opt2 = std::make_optional(42); // same; factory deduces T
opt = 7; // reassign to new value
opt = std::nullopt; // clear it
opt.reset(); // same as above
The four ways to retrieve the value:
std::optional<int> opt{42};
opt.has_value(); // true; equivalent to bool(opt)
*opt; // 42; like dereferencing a pointer -- UB if empty
opt.value(); // 42; throws std::bad_optional_access if empty
opt.value_or(-1); // 42; returns -1 (the default) if empty
opt->some_method(); // arrow operator, like a pointer; UB if empty
Note. operator* and operator-> follow pointer semantics: they give direct access but do not check. They are the right choice inside an if (opt) guard. Outside such a guard, prefer value() so an accidental empty dereference throws an exception rather than producing undefined behaviour.
Worked Example: A Configuration Reader
using Config = std::unordered_map<std::string, std::string>;
std::optional<int> get_int(const Config &cfg, const std::string &key) {
auto it = cfg.find(key);
if (it == cfg.end()) return std::nullopt; // key absent
return parse_int(it->second); // may also be nullopt
}
The caller chains the two checks naturally:
Config cfg = {{"timeout", "5"}, {"retries", "bad"}};
if (auto t = get_int(cfg, "timeout")) {
set_timeout(*t);
}
if (auto r = get_int(cfg, "retries")) {
set_retries(*r); // "bad" fails parse_int; this branch not taken
}
The optional propagates absence: get_int returns an empty optional either because the key is missing or because the value is not a valid integer, and the caller handles both cases with one if check.
Aside (CS 146 connection: Haskell’s Maybe and Racket’s false). In CS 146 you met Maybe in Haskell: Maybe Int is either Nothing or Just 42. std::optional is exactly the same algebraic type dressed in C++ syntax. The analogy extends to the accessor names: Haskell’s fromMaybe defaultVal maybeVal is opt.value_or(defaultVal). In Racket, the idiom is to return #f for “absent” and a value otherwise — a dynamically typed convention that the type system cannot enforce. optional makes the same contract statically checkable: the type std::optional<int> cannot be implicitly used as an int, so forgetting the check is a compile error, not a runtime surprise.
Monadic Operations (C++23)
C++23 adds three methods that let you chain optional-producing operations without explicit if checks. They mirror the >>= combinator from Haskell’s Maybe monad.
opt.and_then(f) — if opt has a value, calls f(*opt), where f itself returns an optional; if opt is empty, returns std::nullopt without calling f.
opt.transform(f) — if opt has a value, calls f(*opt) and wraps the result in an optional; otherwise propagates std::nullopt.
opt.or_else(f) — if opt is empty, calls f() (which must return an optional<T>); otherwise passes through opt unchanged.
// C++23
auto timeout_ms =
get_int(cfg, "timeout")
.transform([](int secs) { return secs * 1000; }) // scale if present
.value_or(5000); // default 5 s
Each step is evaluated only if the previous step produced a value. The distinction between and_then and transform mirrors flatMap vs map in functional programming: transform applies a function that returns a plain value; and_then applies a function that itself returns an optional, for use when that sub-step can also fail.
When Not to Use optional
std::optional<T> stores its value inline — there is no heap allocation. The size is roughly sizeof(T) + 1 (plus alignment padding). This makes it cheap for small types like int or double.
Two situations where optional is the wrong tool:
Large types. std::optional<HugeMatrix> reserves the full HugeMatrix storage on the stack even when the optional is empty. Consider std::unique_ptr<HugeMatrix> instead — a null pointer costs only sizeof(void*).
Signalling genuine errors. If a function can fail for a reason the caller should act on — a network timeout, a permissions violation, a malformed file — the absence of a value should carry why it is absent. C++23’s std::expected<T, E> (Problem 37) does exactly that: it holds either a value of type T or an error of type E. optional carries only the absence; expected carries a diagnosis.
Note. Prefer std::optional over nullable raw pointers for “maybe a value” semantics in value types. A T* that means “optionally a T” has a second meaning: “a T owned elsewhere”. Those two ideas are better kept separate. Use optional<T> when you own the potential value; use raw or smart pointers when the ownership question is the point.
Problem 36: Sum Types Without Inheritance
Problem 18 gave us heterogeneous data through inheritance. Store a vector<unique_ptr<Book>>, use virtual dispatch, and each element behaves as its true type at runtime. It works. But it costs: every object carries a hidden vptr, every element lives on the heap, every call to a virtual method goes through an indirection. Worse, to add a new operation over the hierarchy you must touch every class — or reach for dynamic_cast (Problem 22).
Sometimes the set of alternatives is closed and known at compile time. We do not want a vtable. We do not want heap allocation. We want value semantics and exhaustive case analysis that the compiler can enforce.
C’s Approach: Tagged Union
C programmers have always had heterogeneous data. The idiom is a struct that pairs an enum tag with a union of the possible payloads:
enum MediaKind { SONG, PODCAST, AUDIOBOOK };
struct Media {
MediaKind kind;
union {
Song song;
Podcast podcast;
Audiobook book;
};
};
Media m;
m.kind = SONG;
m.song = makeSong(/*...*/);
// Later:
switch (m.kind) {
case SONG: playSong(m.song); break;
case PODCAST: playPodcast(m.podcast); break;
case AUDIOBOOK: playBook(m.book); break;
}
The problems are familiar. Nothing stops you from writing m.podcast when m.kind == SONG: the union does not track which member is active. If you add a new MediaKind to the enum, every switch in the codebase silently becomes incomplete. And the union cannot hold types with non-trivial constructors or destructors without heroic manual management.
std::variant: The Type-Safe Tagged Union
C++17 introduces std::variant<T1, T2, ...>, which holds exactly one value from the listed alternatives at a time. The tag is implicit and always consistent. The alternatives may have constructors and destructors; the variant manages their lifetimes automatically.
#include <variant>
using BookVar = std::variant<Text, Comic, Book>;
BookVar v = Comic{"Watchmen", "Moore", 400, "Rorschach"};
The variant stores the value inline — no heap allocation. Its size is approximately max(sizeof(T1), sizeof(T2), ...) plus the tag.
Inspecting a variant
Which alternative is active?
v.index(); // 1 -- Comic is the second alternative
std::holds_alternative<Comic>(v); // true
std::holds_alternative<Book>(v); // false
Extracting the value.
// By type -- throws std::bad_variant_access if wrong alternative active:
Comic &c = std::get<Comic>(v);
// By index -- same behaviour:
Comic &c2 = std::get<1>(v);
// Non-throwing: returns nullptr if wrong alternative:
Comic *cp = std::get_if<Comic>(&v); // takes a pointer to the variant
if (cp) { /* safe to use cp */ }
Note. std::get<T> and std::get_if<T> are free function templates, not member functions. The template argument is the type (or zero-based index) you want to extract.
std::visit: Exhaustive Case Analysis
Calling std::get<T> by hand is error-prone for the same reason the C switch was: you must remember to handle every alternative. The right tool is std::visit, which calls a visitor — any callable — with the currently active value.
std::visit([](auto &val) {
std::cout << val.getTitle() << "\n";
}, v);
The generic lambda receives the active alternative as its argument. If all alternatives share a common interface, a single generic lambda is all you need.
The Overload Trick
When alternatives require different handling, define a visitor with one overload per type. C++ has no built-in “overloaded lambda” syntax, but a small helper makes it easy:
// Define once, reuse everywhere:
template<typename... Ts>
struct overloaded : Ts... { using Ts::operator()...; };
// C++17 deduction guide (lets you write overloaded{f, g} without <...>):
template<typename... Ts> overloaded(Ts...) -> overloaded<Ts...>;
std::visit(overloaded{
[](const Text &t) { std::cout << "Text: " << t.getTopic() << "\n"; },
[](const Comic &c) { std::cout << "Comic: " << c.getHero() << "\n"; },
[](const Book &b) { std::cout << "Book\n"; }
}, v);
If you forget to handle one of the alternatives, the code will not compile. The compiler enforces exhaustiveness — exactly what the C switch could not do.
Aside (CS 146 connection: algebraic data types and pattern matching). In CS 146 you saw Haskell’s algebraic data types:
data Shape = Circle Double | Rect Double Double
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h
Shape is exactly a sum type: a value is either a Circle carrying a radius or a Rect carrying a width and height. Pattern matching is exhaustive by construction; the compiler warns if you miss a case. std::variant is C++’s version of the same idea. std::visit corresponds to pattern matching. The overloaded helper corresponds to a function with multiple equations, one per constructor. Haskell’s Either a b (either Left a or Right b) maps to std::variant<A, B>.
Worked Example: A JSON Value
JSON has exactly six value types: null, boolean, number, string, array, and object. That is a closed, compile-time-known set — a perfect fit for variant.
#include <variant>
#include <string>
#include <vector>
#include <map>
#include <memory>
struct JsonNull {};
struct JsonValue; // forward declaration needed for the recursive types
using JsonArray = std::vector<std::shared_ptr<JsonValue>>;
using JsonObject = std::map<std::string, std::shared_ptr<JsonValue>>;
struct JsonValue {
std::variant<JsonNull, bool, double,
std::string, JsonArray, JsonObject> data;
};
A pretty-printer becomes a single std::visit:
void print(const JsonValue &jv) {
std::visit(overloaded{
[](JsonNull) { std::cout << "null"; },
[](bool b) { std::cout << (b ? "true" : "false"); },
[](double d) { std::cout << d; },
[](const std::string &s) { std::cout << '"' << s << '"'; },
[](const JsonArray &arr) {
std::cout << "[";
for (auto &el : arr) { print(*el); std::cout << ","; }
std::cout << "]";
},
[](const JsonObject &obj) {
std::cout << "{";
for (auto &[k, v] : obj) {
std::cout << '"' << k << "\":";
print(*v);
std::cout << ",";
}
std::cout << "}";
}
}, jv.data);
}
Adding a new operation (a JSON serialiser, a validator) means writing one more std::visit call — no touching the JsonValue struct. Adding a new alternative (say, a special JsonInt) means updating the variant alias and recompiling; the compiler will flag every visit that does not handle the new case.
Note. Variant vs. inheritance: a decision guide. Use std::variant when the set of alternatives is closed (you control it and it does not grow at runtime), when you want value semantics (stack allocation, copyable, no heap), or when you need exhaustive case analysis enforced at compile time. Use inheritance and virtual dispatch when the set of types is open (third-party code may add new subclasses), when objects have independent lifetimes managed through pointers, or when the polymorphic interface is the stable part and new operations are rare. The two tools are complementary, not competing.
Problem 37: Errors Without Exceptions
Problem 35’s std::optional<T> tells the caller that a value might not be there. That is enough when absence is normal: a config key is missing, a map lookup finds nothing. But sometimes the caller needs to know why the value is absent. A network request failed — was it a timeout, a permissions error, or a malformed URL? We have been throwing exceptions for this, but exceptions have costs and connotations.
Back in Problem 9 we reached for exceptions as the right tool for “something went wrong.” Problem 15 reminded us of the exception-safety obligations that entails. But exceptions interrupt normal control flow, carry overhead from stack-unwinding infrastructure, and in some environments (embedded systems, certain real-time kernels) they are disabled entirely.
The C Approach: Error Codes
C programmers handle fallible operations with return codes:
// Returns 0 on success, errno value on failure.
int read_file(const char *path, char *buf, size_t *n);
// Caller:
char buf[4096]; size_t n;
int err = read_file("data.txt", buf, &n);
if (err != 0) {
fprintf(stderr, "error: %s\n", strerror(err));
return err;
}
The problems are well known. The return value is overloaded: it carries both the success result and the error code, but not at the same time. Nothing in the type system forces the caller to check for an error before using the result. Errno is a global, so it is not thread-safe. And the successful value must be returned through an output parameter, which inverts the natural reading order.
std::expected: The Type-Safe Result
C++23 introduces std::expected<T, E> (paper P0323). A value of this type holds either a T (the expected, successful result) or an E (the unexpected error). The type is explicit; the caller cannot use a T without first checking whether the result is an error.
#include <expected> // C++23
#include <system_error>
#include <string>
#include <fstream>
std::expected<std::string, std::error_code>
read_file(const std::string &path) {
std::ifstream f{path};
if (!f) return std::unexpected{
std::make_error_code(std::errc::no_such_file_or_directory)};
return std::string{std::istreambuf_iterator<char>{f}, {}};
}
Notice that returning a T directly constructs the success case. Returning std::unexpected{err} constructs the error case. std::unexpected is a thin wrapper that distinguishes an error value from a success value when they might otherwise have the same type.
Constructing and Accessing
// Constructing:
std::expected<int, std::string> ok = 42;
std::expected<int, std::string> err = std::unexpected{"oops"};
// Checking:
ok.has_value(); // true
bool(ok); // same -- operator bool
// Extracting the value (UB or throw if in error state):
int v = *ok; // like optional: UB if error
int v2 = ok.value(); // throws std::bad_expected_access if error
int v3 = ok.value_or(0); // returns 0 if error
// Extracting the error:
std::string e = err.error(); // UB if in value state
Look familiar? The interface is deliberately parallel to std::optional: has_value, operator bool, value, value_or, operator*. Once you know optional you already know most of expected.
Note. std::bad_expected_access<E> is the exception thrown by value() when the expected holds an error. It carries the error value so the handler can inspect it.
Monadic Operations
Like std::optional (Problem 35), C++23’s expected supports and_then, transform, and or_else for chaining fallible operations without explicit if checks.
e.and_then(f) — if e has a value, calls f(*e); f must return an expected<U, E>. Propagates the error unchanged if e is in the error state.
e.transform(f) — if e has a value, calls f(*e) and wraps the result in an expected; otherwise propagates the error.
e.or_else(f) — if e is in the error state, calls f(e.error()); f must return an expected<T, G>. Passes through a successful e unchanged.
Worked Example: A File-Reading Pipeline
Suppose we want to read a configuration file, extract a particular line, and parse an integer from it. Any step can fail.
using Result = std::expected<std::string, std::error_code>;
Result read_line(const std::string &path, int n);
std::expected<int, std::error_code> parse_int(const std::string &s);
// Chain them without a single explicit if:
auto timeout =
read_line("config.txt", 3)
.and_then(parse_int)
.transform([](int secs) { return secs * 1000; })
.value_or(5000); // default if any step failed
Each step runs only if the previous step succeeded. If read_line fails, the and_then and transform are skipped and the error propagates unchanged to value_or, which supplies the fallback.
Aside (CS 245 connection: preconditions, postconditions, and partial functions). In CS 245 you wrote Hoare triples: {P} S {Q}, asserting that if precondition P holds before statement S executes, postcondition Q holds afterwards.
A function that can fail is a partial function: it has a postcondition but only under certain preconditions. The traditional C approach — set a global error flag, return a sentinel — is equivalent to saying the function always terminates with Q, but Q includes a hidden “or the error flag was set” disjunct that callers routinely ignore.
std::expected<T, E> makes the disjunction explicit in the type. The postcondition is: either the return value is a T satisfying the happy-path postcondition, or the return value is an E describing why the preconditions could not be met. Hoare logic has a name for this: a total correctness specification with a distinguished failure mode.
When to Use expected vs. Exceptions vs. optional
Use std::optional<T> when absence is normal and uninformative: a map lookup, an optional config value. There is nothing to explain.
Use std::expected<T, E> when failure is part of normal control flow but the caller needs to know why: parsing user input, I/O that routinely encounters missing files or permission errors, library APIs that must work without exceptions enabled.
Use exceptions for truly exceptional conditions: programming errors, resource exhaustion, situations the caller cannot reasonably recover from locally. Exceptions propagate automatically; expected must be explicitly threaded through every call site.
Note. std::expected is a C++23 feature. At the time of writing, GCC 12+, Clang 16+, and MSVC 19.36+ provide implementations under <expected>. The design is based on proposal P0323 by Vicente Botet and JF Bastien, which drew heavily on Boost.Outcome and Rust’s Result<T, E>. Rust programmers will find the monadic API immediately familiar.
Choosing the Error Type
The error type E in std::expected<T, E> is up to you. There are several common choices:
std::error_code — the standard type from <system_error>, carrying an integer code plus a category (std::system_category(), std::generic_category(), etc.). Good for I/O and OS errors. Composable across library boundaries because categories are extensible.
A plain enum class — simple, fast, zero overhead. The right choice when the set of failure modes is small and fixed:
enum class ParseError { empty_input, invalid_char, overflow };
std::expected<int, ParseError> parse_int(std::string_view s);
std::string — carries a human-readable message. Convenient but expensive (heap allocation). Prefer error_code plus a message function for library code.
A std::variant of error types — when a function can fail in structurally different ways that the caller will handle differently. This combines the ideas from Problem 36 and this one.
With exceptions you get automatic propagation: an uncaught exception unwinds the stack until a handler is found. With expected you propagate manually. The idiom is to check and re-wrap:
using Err = std::error_code;
std::expected<Config, Err> load_config(const std::string &path) {
auto text = read_file(path); // returns expected<string, Err>
if (!text) return std::unexpected{text.error()}; // propagate
auto cfg = parse_config(*text); // returns expected<Config, Err>
if (!cfg) return std::unexpected{cfg.error()}; // propagate
return *cfg;
}
The manual check is more verbose than exceptions, but it is also more explicit: every call site where a failure can originate is visible in the code. The monadic and_then from the previous section eliminates the boilerplate when errors pass through unchanged.
Problem 38: Compile-Time If
Back in Problem 24 we dispatched at compile time by overloading on iterator category tags. The technique works: the compiler sees the tag type, selects the right overload, and the wrong branch is never instantiated. But it is verbose. Every compile-time branch requires a separately named helper function and an explicit traits query. Read the code a month later and the intent is buried under machinery.
There is a simpler way.
The Old Way: Tag Dispatch and SFINAE
Recall the advance example from Problem 24. We wrote three overloads of doAdvance, one for each iterator category, and dispatched by constructing a tag object:
template<typename Iter>
Iter advance(Iter it, ptrdiff_t n) {
return doAdvance(it, n,
typename iterator_traits<Iter>::iterator_category{});
}
Alternatively, we might have tried SFINAE — Substitution Failure Is Not An Error — where we write enable conditions on template parameters to selectively disable overloads. That is even harder to read.
Both approaches share a flaw: they scatter the logic for one operation across multiple function templates. What we really want is to write the alternatives side-by-side, in one function body.
if constexpr
C++17 introduces if constexpr, which evaluates its condition at compile time. The discarded branch is not instantiated. It just needs to parse.
#include <type_traits>
template<typename T>
void serialize(const T &val) {
if constexpr (std::is_integral_v<T>) {
std::cout << "(int) " << val;
} else if constexpr (std::is_floating_point_v<T>) {
std::cout << "(float) " << val;
} else {
std::cout << "(other) " << val.toString();
}
}
When T is int, the compiler instantiates only the first branch. The val.toString() in the third branch is never compiled — so it does not matter whether int has a toString member.
When T is std::string, the compiler instantiates only the third branch. std::is_integral_v<std::string> is false, so the first branch is discarded.
Note. The discarded branch must still be syntactically valid C++. The compiler parses it and checks basic syntax. What it does not do is instantiate the branch or check that its names are valid for the concrete type T. The rule: the discarded branch must parse, but need not type-check for the given T.
The Critical Constraint: Must Depend on a Template Parameter
if constexpr earns its keep inside templates. The condition must be a constant expression the compiler can evaluate, and it is only interesting when it depends on a template parameter; a condition known without any template context is better written as an ordinary if.
// This is fine -- condition depends on T:
template<typename T>
void f(T x) {
if constexpr (sizeof(T) > 4) { /* large type path */ }
else { /* small type path */ }
}
// This is NOT if constexpr's job -- condition is a runtime value:
void g(bool flag) {
if constexpr (flag) { } // error: flag is not a constant expression
}
Comparison with #ifdef
#ifdef also selects code at compile time, but it is a preprocessor tool: it knows nothing about types, templates, or scopes.
// #ifdef: coarse, textual, not type-aware
#ifdef PLATFORM_64
using size_type = uint64_t;
#else
using size_type = uint32_t;
#endif
// if constexpr: fine-grained, inside templates, type-aware
template<typename T>
auto make_buffer() {
if constexpr (sizeof(T) <= 8) {
return SmallBuffer<T>{};
} else {
return LargeBuffer<T>{};
}
}
if constexpr respects scoping, participates in type-checking of the taken branch, and interacts correctly with auto return type deduction. #ifdef does none of these things.
Aside (CS 146 connection: conditional at the type level). In a typed functional language like Haskell or ML, type-directed dispatch happens through type classes (Haskell) or modules (ML): different instances of a class provide different behaviour for different types. The selection happens at compile time, not runtime.
In Racket, cond and match are ordinary runtime conditionals — Racket is dynamically typed, so there is no compile-time type to inspect. if constexpr occupies the middle ground: it is a runtime if in syntax and scoping, but its condition is evaluated at compile time using the type system. It is as close as C++ gets to Haskell-style compile-time pattern matching on type properties.
Worked Example: to_string
Suppose we want a generic to_string that handles arithmetic types with std::to_string and everything else by calling a member str() method:
#include <string>
#include <type_traits>
template<typename T>
std::string to_string(const T &val) {
if constexpr (std::is_arithmetic_v<T>) {
return std::to_string(val); // works for int, float, etc.
} else {
return val.str(); // user-defined types supply str()
}
}
Without if constexpr we would need two overloads separated by enable_if conditions — messy boilerplate just to say “pick one of two implementations based on a type property.”
if constexpr also replaces base-case specialisations in recursive templates. A classic example: printing a parameter pack.
template<typename Head, typename... Tail>
void print_all(Head h, Tail... tail) {
std::cout << h;
if constexpr (sizeof...(tail) > 0) {
std::cout << ", ";
print_all(tail...);
}
// No separate base-case specialisation needed.
}
When the pack is empty, sizeof...(tail) == 0 and the recursive call is in the discarded branch, so it is never instantiated. Previously you would need a separate print_all() overload for zero arguments.
A Note on consteval and constexpr Functions
if constexpr selects branches at compile time. A related but distinct feature is constexpr functions: functions that may be evaluated at compile time (when their arguments are compile-time constants) and at runtime otherwise. C++20 adds consteval for functions that must be evaluated at compile time.
The relationship:
constexpr function — compile-time if possible, runtime otherwise.
consteval function — compile-time only; calling it with a runtime argument is an error.
if constexpr — selects a branch at compile time inside a template.
They are complementary. A constexpr function can contain an if constexpr; an if constexpr condition often calls constexpr predicates from <type_traits>.
Problem 39: Unpack My Tuple
std::pair<T1, T2> gives us two heterogeneous values in one object. Problem 17 used it constantly — std::map stores key-value pairs, std::minmax_element returns a pair of iterators. But what about three values? Or four? Writing nested pairs (pair<int, pair<string, bool>>) is possible but unreadable. Enter std::tuple.
std::tuple: N Heterogeneous Values
std::tuple<T1, T2, ..., TN> generalises pair to an arbitrary number of elements. The types need not be related.
#include <tuple>
#include <string>
// Explicit construction:
std::tuple<int, std::string, double> t{42, "hello", 3.14};
// Factory (deduces types):
auto t2 = std::make_tuple(42, std::string{"hello"}, 3.14);
Accessing Elements
Elements are accessed by index or by type using the free function template std::get:
std::get<0>(t); // 42 (int)
std::get<1>(t); // "hello" (std::string)
std::get<2>(t); // 3.14 (double)
// By type -- only unambiguous if the type appears exactly once:
std::get<std::string>(t); // "hello"
std::tuple_size<decltype(t)>::value gives the number of elements; std::tuple_element<0, decltype(t)>::type gives the type of element 0. These are the machinery that std::get relies on internally.
std::tie: Unpacking into Existing Variables
Before C++17, the standard way to unpack a tuple into named variables was std::tie:
int n;
std::string s;
double d;
std::tie(n, s, d) = t; // n=42, s="hello", d=3.14
// Ignore fields you don't need:
std::tie(n, std::ignore, d) = t;
std::tie creates a tuple of references to its arguments. Assigning a tuple to it copies each element into the referenced variable.
Structured Bindings (C++17)
C++17 introduces a much cleaner syntax for unpacking. It works on tuples, pairs, structs, and arrays.
auto [n, s, d] = t; // n is int, s is std::string, d is double
The types are deduced; you just name the pieces. This is the same auto [ok, val] syntax from Problem 35’s pair example — now generalised to any fixed-size heterogeneous aggregate.
Structured Bindings on Pairs and Maps
Problem 17 showed that iterating over a std::map is clunky because each element is a std::pair<const Key, Value>:
std::map<std::string, int> scores{{"Alice", 90}, {"Bob", 85}};
// Pre-C++17: verbose
for (auto &kv : scores) {
std::cout << kv.first << ": " << kv.second << "\n";
}
// C++17 structured binding: clear and natural
for (auto &[name, score] : scores) {
std::cout << name << ": " << score << "\n";
}
Structured Bindings on Plain Structs
Structured bindings work on any aggregate (a struct with no user-provided constructor, no virtual functions, and no private members), binding the variables to the fields in declaration order:
struct Point { int x; int y; };
Point p{3, 4};
auto [x, y] = p; // x == 3, y == 4
For non-aggregate types, structured bindings require a get<N> specialisation and std::tuple_size / std::tuple_element traits — the same protocol that std::pair and std::tuple already satisfy.
Aside (CS 146 connection: tuples and pattern matching). Haskell and Racket both have tuples. In Haskell, (Int, String, Double) is the type of a 3-tuple; let (n, s, d) = t is the idiom for unpacking it. In Racket, (match t [(list n s d) ...]) is the pattern-match equivalent. These are all the same idea: name the pieces of a compound value at the use site without first defining a dedicated record type. Structured bindings in C++17 bring the same convenience to statically typed C++ code. The compiler deduces the types; you just give the pieces names that make the code readable.
std::apply: Unpack as Function Arguments
Sometimes you want to pass a tuple’s elements as separate arguments to a function. std::apply does this:
#include <tuple>
#include <functional>
auto add = [](int a, int b, int c) { return a + b + c; };
auto args = std::make_tuple(1, 2, 3);
int result = std::apply(add, args); // same as add(1, 2, 3): result == 6
std::apply is useful when you build argument lists dynamically and then call a function with them — a pattern that appears in generic dispatchers, test frameworks, and serialisation libraries.
Worked Example: Multiple Return Values
A common use of tuple is returning multiple values from a function without defining a dedicated struct.
// Returns {mean, variance, count} -- all computed in one pass.
std::tuple<double, double, int>
stats(const std::vector<double> &data) {
if (data.empty()) return {0.0, 0.0, 0};
double sum = 0, sum2 = 0;
for (double x : data) { sum += x; sum2 += x * x; }
int n = static_cast<int>(data.size());
double mean = sum / n;
double var = sum2 / n - mean * mean;
return {mean, var, n};
}
// Caller unpacks with a structured binding:
auto [mean, var, n] = stats(data);
std::cout << "mean=" << mean << " var=" << var << " n=" << n << "\n";
Note. Prefer a named struct over a tuple when the fields have semantic meaning that a reader should not have to deduce from position. tuple&lt;double, double, int&gt; says nothing about which double is the mean and which is the variance. A struct named Stats with fields mean, var, and count is self-documenting. The tuple approach is convenient for short-lived, internal helper functions where names would be over-engineering; it becomes a liability when the type crosses API boundaries.
Tuple Comparison and Structured Binding References
Tuples support lexicographic comparison with ==, <, and the rest, provided the element types do. This makes them immediately usable in sorted containers and with algorithms that require ordering.
// Note: std::string, not a raw string literal -- comparing two
// const char* with < would compare pointer values, not characters.
auto a = std::make_tuple(1, std::string{"abc"}, 3.0);
auto b = std::make_tuple(1, std::string{"abd"}, 3.0);
a < b; // true: compares element by element; first difference is "abc" < "abd"
a == b; // false
Structured bindings introduce names, not copies, by default when the right-hand side is an lvalue. Using auto & gives you references; auto gives copies.
std::tuple<int, std::string> t{42, "hi"};
auto [a, b] = t; // a and b are copies; modifying a does not change t
auto &[c, d] = t; // c and d are references; modifying c changes t
const auto &[e, f] = t; // read-only references
The reference form is especially important in the range-for over a std::map: writing for (auto [k, v] : m) copies every key-value pair. Writing for (const auto &[k, v] : m) avoids the copies.
A classic use of tuple comparison: sort records by a composite key without writing a custom comparator.
struct Student { std::string name; int grade; double gpa; };
std::vector<Student> students = { /* ... */ };
// Sort by (grade descending, gpa descending, name ascending):
std::sort(students.begin(), students.end(), [](const Student &a, const Student &b) {
return std::make_tuple(-a.grade, -a.gpa, a.name)
< std::make_tuple(-b.grade, -b.gpa, b.name);
});
Negating the numeric fields reverses their ordering. The tuple comparison handles the tie-breaking automatically, left to right. No custom comparator logic to write or debug.
Problem 40: Constraining Templates
Templates accept any type that satisfies the operations the code uses. That flexibility is their strength, but it has a sharp edge. Instantiate std::sort on a type that has no operator< and the compiler produces a wall of error messages deep inside the standard library implementation — none of which mention the actual mistake at the call site. Problem 24 gave us tag dispatch as one solution. Problem 25 mentioned SFINAE as another. Both work, but both are verbose and hard to read. C++20 gives us a better one: concepts.
The Problem: Opaque Template Errors
struct NoCmp { int x; }; // no operator<
std::vector<NoCmp> v = { {3}, {1}, {2} };
std::sort(v.begin(), v.end()); // a screenful of errors
The errors name internal helper types and lambdas deep in the <algorithm> header. The user’s mistake — providing an incomparable type — is nowhere in the message.
A concept is a compile-time predicate on types. Attaching one to a template parameter produces a clean diagnostic at the call site when the constraint is not satisfied.
Writing a Concept
#include <concepts>
template<typename T>
concept Comparable = requires(T a, T b) {
{ a < b } -> std::convertible_to<bool>;
};
Read this as: T satisfies Comparable if, given two values a and b of type T, the expression a < b is well-formed and its result is convertible to bool.
requires Expressions
The body of a requires expression is a list of requirements. There are four kinds:
template<typename T>
concept Printable = requires(T x, std::ostream &os) {
// Simple requirement: expression must be well-formed:
os << x;
// Type requirement: T must have a nested type:
typename T::value_type;
// Compound requirement: well-formed AND result satisfies a concept:
{ x.size() } -> std::convertible_to<std::size_t>;
// Nested requirement: a further concept must hold:
requires std::copyable<T>;
};
Using Concepts
There are three syntactic styles. The first two are exactly equivalent; the third is close but subtly different. Choose the one that reads most clearly for your context.
As a template parameter constraint.
template<Comparable T>
T max(T a, T b) { return a < b ? b : a; }
With a requires clause.
template<typename T>
requires Comparable<T>
T max(T a, T b) { return a < b ? b : a; }
Abbreviated function template (C++20).
// 'auto' makes it a template; 'Comparable auto' constrains it:
Comparable auto max(Comparable auto a, Comparable auto b) {
return a < b ? b : a;
}
The abbreviated form is the most concise, but note that each Comparable auto parameter introduces its own template parameter, so a and b may deduce to different types — the first two forms force them to match. The requires-clause form is the most flexible: its condition can be an arbitrary boolean expression, not just a single concept name.
Standard Library Concepts
C++20’s <concepts> header provides a rich library of built-in concepts:
std::same_as<T, U> // T and U are the same type
std::derived_from<T, B> // T is derived from B
std::convertible_to<T,U> // T is implicitly convertible to U
std::integral<T> // T is an integral type
std::floating_point<T> // T is a floating-point type
std::copyable<T> // T can be copy-constructed and copy-assigned
std::movable<T> // T can be move-constructed and move-assigned
std::equality_comparable<T> // T has operator==
std::totally_ordered<T> // T has all six comparison operators
std::invocable<F, Args...> // F is callable with Args...
The <iterator> header adds iterator concepts: std::input_iterator, std::random_access_iterator, std::sentinel_for, and so on.
How Concepts Replace SFINAE
The old SFINAE way to write a constrained max (using std::enable_if):
template<typename T,
typename = std::enable_if_t<
std::is_same_v<
decltype(std::declval<T>() < std::declval<T>()),
bool>>>
T max(T a, T b) { return a < b ? b : a; }
The concept version:
template<Comparable T>
T max(T a, T b) { return a < b ? b : a; }
Same semantics. Dramatically better error messages. And the intent is explicit in the code.
Aside (CS 246 / Problem 24 connection: tag dispatch was a hand-rolled concept). Problem 24's iterator tags were C++'s original compile-time dispatch mechanism. Each tag (random_access_iterator_tag, etc.) is essentially a hand-written concept: a type-level predicate that classifies iterator types. The overloaded doAdvance functions are hand-written concept-driven dispatch. Concepts subsume all of this. The standard library's iterator concepts (std::random_access_iterator, std::bidirectional_iterator, …) now express these distinctions directly, and the advance algorithm can be written with a requires clause or constrained overloads instead of tag objects.
Worked Example: Constrained max
#include <concepts>
template<typename T>
concept LessThanComparable = requires(T a, T b) {
{ a < b } -> std::convertible_to<bool>;
};
template<LessThanComparable T>
const T &max(const T &a, const T &b) { return a < b ? b : a; }
struct NoCmp { int x; };
int main() {
max(3, 5); // OK: int satisfies LessThanComparable
max(NoCmp{}, NoCmp{}); // error: NoCmp does not satisfy LessThanComparable
// Error message names the concept -- clear and helpful.
}
Note. Concepts are about what a type must do, not what it is.
std::integral&lt;T&gt; checks whether T is an integral type by querying a type trait. Comparable&lt;T&gt; checks whether T supports &lt; — it would be satisfied by any user-defined type that provides the operator, regardless of its inheritance hierarchy. This is duck typing made explicit and statically verified: if it supports &lt;, it satisfies Comparable.
Problem 41: Lazy Pipelines
Problem 17 gave us std::transform, std::copy_if, std::accumulate, and friends. They work. But chaining them is painful. Each step allocates a new intermediate container. Each step requires an output iterator. Composing three transformations means three temporary vectors, three passes over the data, and three iterator pairs to manage.
C++20 introduces the ranges library, which solves all of these problems at once.
The Pain of Chaining Algorithms
Suppose we want the squares of all even numbers in a vector.
std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
// Step 1: filter evens -- allocates a new vector
std::vector<int> evens;
std::copy_if(v.begin(), v.end(), std::back_inserter(evens),
[](int n) { return n % 2 == 0; });
// Step 2: square them -- allocates another new vector
std::vector<int> squares;
std::transform(evens.begin(), evens.end(), std::back_inserter(squares),
[](int n) { return n * n; });
// squares == {4, 16, 36, 64, 100}
Two temporary vectors, two passes, lots of boilerplate. With ten steps the code would be unreadable.
Views: Lazy Adapters
C++20’s std::views (also accessible as std::ranges::views) provides lazy range adapters. A view does not compute anything when constructed. It wraps the underlying range and produces elements on demand, one at a time, as you iterate. No intermediate containers. No extra passes.
#include <ranges>
#include <vector>
#include <iostream>
std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
auto result = v
| std::views::filter([](int n) { return n % 2 == 0; })
| std::views::transform([](int n) { return n * n; });
for (int x : result) {
std::cout << x << " "; // 4 16 36 64 100
}
The | pipe operator composes views left-to-right. result is not a vector — it is a lightweight view object that remembers the composition. Elements are produced lazily as the range-for loop advances.
The View Vocabulary
The standard library ships a rich set of views. Here are the most common:
views::filter(pred) // yield only elements satisfying pred
views::transform(fn) // apply fn to each element
views::take(n) // yield at most n elements
views::drop(n) // skip first n elements
views::reverse // iterate backwards (requires bidirectional range)
views::iota(a, b) // generates a, a+1, ..., b-1 on demand
views::iota(a) // infinite sequence a, a+1, a+2, ...
views::keys // from a range of pairs, yield only the keys
views::values // from a range of pairs, yield only the values
views::enumerate // (C++23) yield {index, element} pairs
views::zip(r1, r2) // (C++23) yield pairs of elements side-by-side
views::iota is particularly powerful because it generates a range without an underlying container at all:
// Print the first 5 squares, no vector needed:
for (int x : std::views::iota(1, 6)
| std::views::transform([](int n){ return n * n; })) {
std::cout << x << " "; // 1 4 9 16 25
}
Ranges Algorithms
The <algorithm> header gains a parallel set of algorithms in the std::ranges:: namespace. These take a range directly instead of a pair of begin/end iterators.
std::vector<int> v = {3, 1, 4, 1, 5, 9, 2, 6};
std::ranges::sort(v); // sort in-place
auto it = std::ranges::find(v, 5); // returns iterator, not bool
std::ranges::copy(v, std::ostream_iterator<int>{std::cout, " "});
The range overloads are strictly more convenient: one argument instead of two, and they work directly with views.
// Sort only the even elements (copy to a temp view, sort that):
auto evens = v | std::views::filter([](int n){ return n % 2 == 0; });
// Note: a filter view is at most bidirectional, not random-access,
// so std::ranges::sort cannot operate on it directly.
// For in-place work on views, use ranges::for_each or materialize first.
std::ranges::for_each(evens, [](int x){ std::cout << x << " "; });
Materialising a View
A view is not a container. To store the results in a vector, you must materialise the view. In C++23, std::ranges::to does this cleanly:
// C++23:
auto squares_of_evens = v
| std::views::filter([](int n){ return n % 2 == 0; })
| std::views::transform([](int n){ return n * n; })
| std::ranges::to<std::vector>();
// C++20 workaround (no std::ranges::to yet): iterate the view, push_back
auto view = v | std::views::filter([](int n){ return n % 2 == 0; })
              | std::views::transform([](int n){ return n * n; });
std::vector<int> out;
for (int x : view) out.push_back(x);
Aside (CS 146 connection: lazy evaluation and infinite lists). In Haskell, all evaluation is lazy by default. [1..] is an infinite list of integers; take 5 (map (*2) [1..]) produces [2,4,6,8,10] without ever generating the full infinite list. The pipeline is evaluated only as far as take demands.
std::views::iota(1) is an infinite range. Composing it with views::take(5) is the exact C++ equivalent of take 5 [1..]. Nothing is computed until the for-loop (or a materialising algorithm) demands elements. This is lazy evaluation in a normally eager language. Ranges bring it to C++ in a controlled, opt-in way, restricted to sequence processing rather than applied globally.
Worked Example: Top 5 Words by Length
Take a string of words, find the five longest, and print them in order of length.
#include <ranges>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
#include <sstream>
#include <iterator>
int main() {
std::string sentence = "the quick brown fox jumps over a lazy dog";
std::istringstream iss{sentence};
std::vector<std::string> words{
std::istream_iterator<std::string>{iss}, {}};
// Sort words longest-first:
std::ranges::sort(words, [](auto &a, auto &b){
return a.size() > b.size();
});
// Take the top 5 and print them:
for (auto &w : words | std::views::take(5)) {
std::cout << w << " (" << w.size() << ")\n";
}
// Prints five lines like "quick (5)"; the order among equal-length
// words is unspecified because std::ranges::sort is not stable.
}
No temporary containers for the top-5 step. views::take(5) wraps the sorted vector lazily; the range-for loop reads the first five elements and stops.
Note. The ranges library decouples iterators from sentinels. In the classic STL, begin() and end() must have the same type. In ranges, the sentinel (end marker) may have a different type than the iterator, as long as the two can be compared. This enables ranges over null-terminated C strings (where the sentinel is a comparison against '\0') and other non-symmetric iteration patterns that the old model could not express cleanly. The concepts std::ranges::range and std::sentinel_for formalise this — we met concept syntax in Problem 40.
Problem 42: Non-Owning Views
Back in Problem 3 we built a resizable array from scratch, and we already knew how C passes arrays around: a pointer to the first element plus a separate size_t for the length. Every function that needed to look at a contiguous sequence had to accept two parameters and trust the caller to pass a matching pair. Get it wrong — pass the wrong length, forget to subtract one, slice the wrong half — and you get undefined behaviour with no compiler warning. Strings were even worse: a const char* carries no length at all, so every substring operation either allocates a fresh copy or resorts to pointer arithmetic.
There is a better way. C++17 and C++20 give us two types that bundle the pointer and the length together, add a rich interface on top, and crucially do so without owning the underlying data.
The Ownership Distinction
Problem 14 drew a sharp line between owning and non-owning handles to memory. A std::unique_ptr<T> owns its pointee: when the smart pointer dies, the object dies with it. A raw pointer T* may or may not own what it points to — the type gives you no clue.
The standard library now provides two unambiguously non-owning types:
- A view is a lightweight, non-owning reference to data that lives elsewhere. The view does not allocate, does not copy, and does not extend the lifetime of the underlying storage. The programmer is responsible for ensuring the underlying data outlives the view.
- std::string_view (C++17) — a view into a character sequence.
- std::span&lt;T&gt; (C++20) — a view into a contiguous sequence of T.
std::string_view
std::string_view (from <string_view>) represents a window into some existing character data. It stores just two things: a pointer to the first character and a length. It is always cheap to copy.
Construction
#include <string_view>
// From a string literal -- no allocation
std::string_view sv1 = "hello, world";
// From a std::string -- sv2 points into s's buffer
std::string s = "hello, world";
std::string_view sv2 = s;
// From a pointer + length -- useful for buffers
const char *buf = "hello, world";
std::string_view sv3{buf, 5}; // "hello"
All three constructions are O(1): no characters are copied.
Operations
string_view supports almost every read-only operation you expect from std::string: size(), empty(), operator[], front(), back(), find(), rfind(), starts_with() (C++20), ends_with() (C++20), and range-based for. The two operations most worth highlighting are substr and remove_prefix:
std::string_view sv = "2024-01-15";
// substr returns another view -- still no allocation
std::string_view year = sv.substr(0, 4); // "2024"
std::string_view month = sv.substr(5, 2); // "01"
// remove_prefix / remove_suffix adjust the view in place
sv.remove_prefix(5); // sv is now "01-15"
sv.remove_suffix(3); // sv is now "01"
Compare this with std::string::substr, which heap-allocates a brand-new string every time. string_view::substr just adjusts the pointer and length. No allocation, no copy.
The Golden Rule
Note. The underlying data must outlive the view. A string_view is a pointer into memory it does not own. If that memory is freed or moved — if a std::string is destroyed, or resized, or moved to a new allocation — the view becomes a dangling pointer.
std::string_view danger() {
std::string s = "temporary";
return s; // s is destroyed here; caller gets a dangling view -- UB
}
Never return a string_view that refers to a local variable. Never store a string_view as a class data member unless you can prove the string it refers to outlives the object containing the view.
string_view as a Parameter Type
The most important use of string_view is as a function parameter. Before C++17, a function that just read a string had to choose between:
void f(const std::string &s); // forces allocation if caller passes a literal
void f(const char *s); // no length; no std::string interface
Neither is ideal. string_view accepts both:
void f(std::string_view sv); // accepts literals, strings, substrings
f("hello"); // no allocation -- literal becomes a string_view
f(some_std_string); // points into the string's buffer; no copy
f(sv.substr(0, 5)); // another view; still no allocation
It works. One parameter type, all callers, no heap traffic.
std::span
std::span&lt;T&gt; (from &lt;span&gt;, standardized in C++20; pre-standard versions circulated via the Guidelines Support Library's gsl::span) does for contiguous sequences of arbitrary type what string_view does for characters. It stores a pointer and a length. It does not own the data.
Fixed-size vs Dynamic Extent
span comes in two flavours:
// Dynamic extent: length not known at compile time
std::span<int> s1;
// Fixed extent: length baked into the type
std::span<int, 5> s2;
The fixed-extent form span<T, N> is a zero-overhead wrapper: the compiler knows the size at compile time and can optimize loops and bounds checks accordingly, without storing the length at runtime. The dynamic-extent form span<T> is the more commonly useful one.
Construction
int arr[5] = {1, 2, 3, 4, 5};
std::vector<int> v = {10, 20, 30};
std::span<int> s1{arr}; // from C array
std::span<int> s2{v}; // from vector
std::span<int, 5> s3{arr}; // fixed-extent from C array
std::span<int> s4{arr + 1, 3}; // pointer + length: {2, 3, 4}
Like string_view, all constructions are O(1).
Subspans
std::span<int> s{v};
auto first3 = s.first(3); // first 3 elements
auto last2 = s.last(2); // last 2 elements
auto middle = s.subspan(1, 2); // starting at index 1, length 2
Each returns a new span into the same underlying storage. No copying.
Replacing Pointer+Length Parameters
The old C style:
// Two parameters that must stay synchronized -- easy to mismatch
void process(const int *data, size_t len);
The C++ view style:
// Single parameter; accepts arrays, vectors, sub-ranges, all of it
void process(std::span<const int> data) {
for (int x : data) { /* ... */ }
}
int arr[10] = {};
std::vector<int> v(20);
process(arr); // OK
process(v); // OK
process({v.data(), 5}); // first 5 elements of v
The size is carried along automatically. The mismatch bug is gone.
Aside (Rust connection: borrows and slices). Rust formalizes views at the language level. A Rust string slice &amp;str is exactly a string_view: a fat pointer (data address plus length) that borrows its data from some owner. Rust's slice type &amp;[T] is exactly a span&lt;T&gt;. The difference is that Rust's borrow checker enforces the golden rule at compile time: a slice cannot outlive the owner it borrows from. C++ has no such static enforcement — the rule is a programmer responsibility, not a language guarantee. Rust's approach eliminates the class of dangling-view bugs at the cost of learning the borrow checker; C++'s approach gives you the performance without the restriction.
Worked Example: Pattern Search in a Byte Buffer
Suppose we write a function that searches for a short byte pattern inside a larger buffer. With the old interface:
// Old style: four parameters, two pairs to keep in sync
bool contains(const uint8_t *buf, size_t buf_len,
const uint8_t *pat, size_t pat_len);
With span:
#include <span>
#include <algorithm>
#include <cstdint>
bool contains(std::span<const uint8_t> buf,
std::span<const uint8_t> pat) {
if (pat.size() > buf.size()) return false;
for (size_t i = 0; i + pat.size() <= buf.size(); ++i) {
if (std::equal(pat.begin(), pat.end(),
buf.begin() + i))
return true;
}
return false;
}
Two parameters. The sizes are always synchronized because they are part of the type. Any contiguous range — a C array, a std::vector, a section of a memory-mapped file — can be passed without modification.
Note the use of span<const uint8_t> rather than span<uint8_t>. The const here is on the element type, not the span itself. A span<const T> is a read-only view: the elements cannot be modified through it. A span<T> is a read-write view.
A span<T> implicitly converts to span<const T> (just as T* converts to const T*). The reverse does not compile. Use span<const T> for input parameters, span<T> only when the function needs to write back into the caller’s buffer.
Note. Never store a span or string_view as a class data member unless you have ironclad proof that the referred-to data outlives every instance of the class. In practice that means the referred-to data should be static or live in a scope that strictly encloses every instance. When in doubt, store a std::string or std::vector instead — they own their data and the lifetime question dissolves.
Problem 43: A Real Date Library
For most of this course, “time” meant “performance.” Back in Problem 1 we used clocks briefly, timing how long operations took. But <chrono> is much more than a stopwatch. It contains a carefully designed system of types for durations, time points, and clocks — a system that prevents the classic bugs of mixing seconds with milliseconds or subtracting timestamps from different clocks. C++20 extends it further with calendar types that let you write actual dates. The full picture deserves proper coverage.
The Core Idea: Types Prevent Bugs
The classic duration bug in C:
// Old C style -- no idea what units any of these are
long timeout = 5; // seconds? milliseconds?
long heartbeat = 200; // milliseconds? microseconds?
if (elapsed > timeout + heartbeat) { /* ... */ } // silent unit mismatch
&lt;chrono&gt; makes the units part of the type. Adding a std::chrono::seconds to a std::chrono::milliseconds compiles and gives the right answer. Subtracting a steady_clock::time_point from a system_clock::time_point does not compile. The type system catches the bug before the program runs.
Duration Types
A std::chrono::duration<Rep, Period> represents a span of time. Rep is the numeric representation (usually long long) and Period is a std::ratio that relates the tick to a second. You almost never write this by hand. The standard library provides all the common specializations:
#include <chrono>
using namespace std::chrono;
nanoseconds ns{1}; // 1 nanosecond
microseconds us{1}; // 1 microsecond
milliseconds ms{200}; // 200 milliseconds
seconds s{5}; // 5 seconds
minutes m{2}; // 2 minutes
hours h{1}; // 1 hour
Duration Arithmetic
Durations support the arithmetic you expect, with automatic unit promotion:
seconds s{5};
milliseconds ms{200};
auto total = s + ms; // total has type milliseconds, value 5200
auto diff = s - ms; // milliseconds{4800}
auto triple = s * 3; // seconds{15}
s += minutes{1}; // s is now seconds{65}
Converting to a coarser unit requires an explicit duration_cast — the compiler will not silently lose precision:
milliseconds ms{5200};
seconds s = duration_cast<seconds>(ms); // s{5}: truncates toward zero
// seconds s = ms; // ERROR: narrowing conversion not allowed implicitly
User-Defined Literals
C++14 adds duration literals so you can write durations inline without constructing types explicitly:
using namespace std::chrono_literals;
auto timeout = 5s; // std::chrono::seconds{5}
auto heartbeat = 200ms; // std::chrono::milliseconds{200}
auto delay = 1h + 30min; // std::chrono::minutes{90}
auto sleep_for = 50us; // microseconds
auto precision = 10ns; // nanoseconds
The using namespace std::chrono_literals is necessary to bring the literal suffixes into scope. They live in their own sub-namespace precisely so you can import them without pulling in the entire std::chrono namespace.
Clock Types
A clock provides a now() function that returns a time point, and a period that tells you the resolution. C++ provides three:
- std::chrono::steady_clock — monotonic clock. It never goes backwards, never jumps forward. Use this for measuring elapsed time.
- std::chrono::system_clock — the wall clock. Tracks real-world calendar time. May be adjusted by NTP; can go backwards. Use this to get the current date and time.
- std::chrono::high_resolution_clock — the highest-resolution clock available. May be an alias for either of the above. Avoid it if you care which clock you are using.
Time Points
clock::now() returns a time_point<clock>. Time points can be subtracted to get a duration:
auto t1 = std::chrono::steady_clock::now();
auto t2 = std::chrono::steady_clock::now();
auto elapsed = t2 - t1; // type: steady_clock::duration
Subtracting two time points from different clocks does not compile. That is the whole point.
time_since_epoch() gives the duration between the clock's epoch and the time point. The epoch is implementation-defined for steady_clock; for system_clock, C++20 specifies it to be the Unix epoch, 1970-01-01 00:00:00 UTC:
auto now = std::chrono::system_clock::now();
auto epoch_s = duration_cast<seconds>(now.time_since_epoch());
// epoch_s.count() is the Unix timestamp
Measuring Elapsed Time
The canonical timing pattern:
#include <chrono>
#include <iostream>
using namespace std::chrono;
auto t0 = steady_clock::now();
do_work();
auto elapsed = duration_cast<milliseconds>(steady_clock::now() - t0);
std::cout << "Elapsed: " << elapsed.count() << " ms\n";
.count() extracts the raw numeric value in the duration’s units. It is the only place where you cross the boundary from the type-safe world back to a plain integer.
Note. Use steady_clock for measuring elapsed time. It is monotonic: it advances at a constant rate and never jumps. system_clock is the wrong tool here because it can be adjusted mid-measurement by the operating system (NTP sync, daylight saving, etc.).
A Timer RAII Class
Problem 14 showed that any resource that needs cleanup belongs in an RAII wrapper. Time measurement is no different: the “resource” is the start time, and the “cleanup” is logging the elapsed duration.
#include <chrono>
#include <iostream>
#include <string>
using namespace std::chrono;
class Timer {
std::string label_;
steady_clock::time_point start_;
public:
explicit Timer(std::string label)
: label_{std::move(label)}, start_{steady_clock::now()} {}
~Timer() {
auto ms = duration_cast<milliseconds>(
steady_clock::now() - start_).count();
std::cout << label_ << ": " << ms << " ms\n";
}
};
Usage:
void expensive_function() {
Timer t{"expensive_function"};
// ... work happens here ...
} // Timer destructor fires: "expensive_function: 47 ms"
No matter how expensive_function exits — normal return, exception, early return from a nested scope — the timer logs.
C++20 Calendar Types
C++20 adds calendar and time-zone support. The key new types are year_month_day, sys_days, and year_month_day_last.
#include <chrono>
using namespace std::chrono;
// Constructing a date: year/month/day syntax
year_month_day d1 = 2024y / January / 15;
year_month_day d2 = 2024y / 1 / 15; // same date
// Querying
int y = static_cast<int>(d1.year()); // 2024
unsigned m = static_cast<unsigned>(d1.month()); // 1
unsigned day = static_cast<unsigned>(d1.day()); // 15
// Is it a valid date?
d1.ok(); // true; would be false for 2024/February/30
sys_days converts a year_month_day to a time_point that can be used with system_clock. The bridge goes the other way too:
// time_point -> calendar date
auto today = floor<days>(system_clock::now()); // truncate to whole day
year_month_day ymd{today};
std::cout << ymd.year() << "-" << ymd.month() << "-" << ymd.day() << "\n";
// calendar date -> time_point (for arithmetic with durations)
sys_days target = 2024y / December / 25;
auto days_away = target - today;
std::cout << days_away.count() << " days until Christmas\n";
The Last Day of the Month
A classic calendar calculation:
year_month_day last_day_of_jan = 2024y / January / last;
// last_day_of_jan.day() == 31
year_month_day last_day_of_feb = 2024y / February / last;
// 2024 is a leap year; last_day_of_feb.day() == 29
The standard library handles leap years and varying month lengths for you.
Aside (CS 343 connection: timed waits). In CS 343 you will use std::this_thread::sleep_for() to pause a thread for a specified duration. It accepts a chrono duration directly:
#include &lt;thread&gt;
#include &lt;chrono&gt;
using namespace std::chrono_literals;
std::this_thread::sleep_for(100ms); // sleep for 100 milliseconds
std::this_thread::sleep_until(
std::chrono::steady_clock::now() + 1s); // sleep until 1 s from now
The type safety extends into threading: you cannot accidentally pass a plain integer where a duration is expected, and the units are explicit in the code.
Note. For wall-clock time ("what time is it right now?") use system_clock. For measuring elapsed real time ("how long did this take?") use steady_clock. The two purposes are different and deserve different clocks. high_resolution_clock is a shorthand for whichever of the two has finer resolution on the current platform; it is fine for quick experiments but says nothing about monotonicity.
Problem 44: A Real Filesystem API
Problem 1 opened files with std::ifstream and std::ofstream. That covers reading and writing the contents of files. But what about the world around those files? Creating directories. Listing what is in a folder. Checking whether a file exists before trying to open it. Copying, renaming, removing. In Problem 2 we briefly thought about separate compilation and the .cc/.h file structure of a project — all of which assumes you are navigating a directory tree. Doing any of that with portable code used to require calling POSIX functions (opendir, stat, mkdir) or Windows APIs, and writing separate implementations for each platform.
C++17 changes that. <filesystem> is a full, portable API for the directory structure around your files.
std::filesystem::path
Everything in <filesystem> revolves around std::filesystem::path. A path represents a location in the filesystem: it might name an existing file, a directory, or something that does not exist yet. It knows nothing about permissions or contents; it is just a name.
#include <filesystem>
namespace fs = std::filesystem;
fs::path p1 = "/usr/local/include"; // absolute path (POSIX)
fs::path p2 = "src/main.cc"; // relative path
fs::path p3 = "C:\\Users\\Alice"; // Windows path also works
Composing Paths
The / operator appends a component:
fs::path base = "/home/alice";
fs::path full = base / "projects" / "myapp" / "src";
// full == "/home/alice/projects/myapp/src"
Look familiar? It is operator overloading to make path manipulation read like shell notation.
Decomposing Paths
fs::path p = "/home/alice/report.pdf";
p.filename(); // "report.pdf"
p.stem(); // "report"
p.extension(); // ".pdf"
p.parent_path(); // "/home/alice"
p.root_path(); // "/"
p.is_absolute(); // true
p.string(); // native string representation
These all return path objects (except string(), which returns std::string, and the boolean queries).
Querying the Filesystem
fs::path p = "/home/alice/data.csv";
fs::exists(p); // does anything at this path exist?
fs::is_regular_file(p); // is it a regular file?
fs::is_directory(p); // is it a directory?
fs::is_symlink(p); // is it a symbolic link?
fs::file_size(p); // size in bytes (throws if not a regular file)
fs::last_write_time(p); // returns a filesystem::file_time_type
All of these query the filesystem on every call — they are not cached. If you need to make multiple queries about the same file, it is more efficient to use a fs::directory_entry, which caches the stat results.
Iterating a Directory
Flat Iteration
std::filesystem::directory_iterator yields one directory_entry for each item directly inside a directory (not recursively):
for (const fs::directory_entry &entry : fs::directory_iterator{"."}) {
std::cout << entry.path().filename() << "\n";
}
The order is unspecified. Use std::sort on the paths if you need them in alphabetical order.
Recursive Iteration
std::filesystem::recursive_directory_iterator walks the entire subtree:
for (const fs::directory_entry &entry :
fs::recursive_directory_iterator{"."}) {
if (entry.is_regular_file()) {
std::cout << entry.path() << "\n";
}
}
Creating and Removing
fs::create_directory("output"); // create one directory
fs::create_directories("a/b/c"); // create whole path, like mkdir -p
fs::remove("output/old.txt"); // remove a single file
fs::remove_all("output"); // remove directory and all contents
fs::rename("old_name.txt", "new_name.txt"); // rename or move
fs::copy("src.txt", "dst.txt"); // copy a file
fs::copy("src_dir", "dst_dir",
fs::copy_options::recursive); // copy directory tree
Error Handling: Two Forms
Most <filesystem> functions have two overloads. The default throws std::filesystem::filesystem_error on failure. An overload that takes an std::error_code& stores the error instead of throwing:
// Throwing form: use when failure is unexpected
fs::file_size("/etc/passwd"); // throws filesystem_error if it fails
// Non-throwing form: use when failure is routine
std::error_code ec;
auto sz = fs::file_size("/etc/passwd", ec);
if (ec) {
std::cerr << "Error: " << ec.message() << "\n";
}
The non-throwing form is useful in performance-sensitive loops or when you expect certain failures (a file that might or might not exist).
Aside (CS 246 / CS 136 connection: POSIX underpinnings). Under the hood, <filesystem> on Linux is implemented using the same POSIX calls you saw in CS 136: stat(2), opendir(3)/readdir(3), mkdir(2), rename(2), and unlink(2). On Windows it uses GetFileAttributes, FindFirstFile, CreateDirectory, etc. The <filesystem> API gives you a single interface that compiles and behaves consistently on both. If you ever need a capability it does not expose — file locking, extended attributes, device numbers — you still have to drop down to the platform API.
Worked Example: Total Size of Source Files
A function that recursively totals the size of all .cpp files in a directory tree:
#include <filesystem>
#include <iostream>
namespace fs = std::filesystem;
uintmax_t cpp_total_size(const fs::path &root) {
uintmax_t total = 0;
std::error_code ec;
fs::recursive_directory_iterator it{root, ec};
if (ec) return 0; // root itself could not be opened
for (const auto &entry : it) {
if (entry.is_regular_file() &&
entry.path().extension() == ".cpp") {
total += entry.file_size();
}
}
return total;
}
int main() {
auto bytes = cpp_total_size(".");
std::cout << bytes << " bytes in .cpp files\n";
}
Note the use of entry.file_size() rather than fs::file_size(entry.path()). Both give the same result, but the directory_entry version is cheaper because the stat information was fetched during iteration and is already cached.
Combine last_write_time() with <chrono> (Problem 43) to find files modified in the last 24 hours:
auto cutoff = fs::file_time_type::clock::now()
- std::chrono::hours{24};
for (const auto &entry : fs::recursive_directory_iterator{"."}) {
if (entry.is_regular_file() &&
entry.last_write_time() > cutoff) {
std::cout << entry.path() << "\n";
}
}
Paths Are Not Files
One subtlety worth internalizing: a fs::path object is just a string with path-manipulation methods. It does not verify that the path names an existing file. You can construct a path to something that does not exist, compose new paths from it, and only then call fs::exists() to check:
fs::path target = "/nonexistent/dir" / "file.txt";
// Constructing the path: always succeeds
// Querying it:
if (!fs::exists(target)) {
std::cout << "no such file\n";
}
The / operator does not touch the disk. It purely manipulates the string representation of the path, using the platform’s separator character. This matters because constructing a path is cheap and can be done offline from any I/O; the I/O only happens when you call a querying or modifying function.
Canonical Paths
fs::canonical(p) resolves all symbolic links and .. components and returns an absolute path that refers unambiguously to the same filesystem object:
fs::path p = "src/../include/./foo.h";
fs::path c = fs::canonical(p); // e.g. "/home/alice/include/foo.h"
// c has no ../ or ./ components, and no symlink indirections
fs::weakly_canonical(p) is like canonical but does not require the path to exist: it resolves as much as it can and leaves the rest as-is.
Note. Path comparison is operating-system sensitive. On Windows, filesystem paths are case-insensitive: C:\Users\Alice and C:\USERS\ALICE refer to the same file, but operator== on fs::path does a lexicographic comparison and would consider them different. To test whether two paths refer to the same filesystem object regardless of case or symlinks, use fs::equivalent(p1, p2), which resolves both paths through the OS and compares inodes (on POSIX) or file identifiers (on Windows).
Problem 45: Type-Safe Formatting
Problem 1 taught us std::cout and the stream output model. Stream output is type safe — you cannot accidentally print an integer with a %s format specifier — but it is awkward for anything beyond the simplest output. Aligning a table means calling std::setw, std::left, std::setprecision, and remembering to std::setfill if you want something other than spaces. Worse, many of those settings are sticky: they persist across subsequent output unless you explicitly reset them. You set the flags, print one value, and then spend the next ten minutes wondering why all subsequent output is still left-aligned.
The alternative is printf-style formatting. It is concise and familiar from CS 136. It is also a trap: passing the wrong argument for a format specifier is undefined behaviour, and the compiler is not required to catch it.
C++20 gives us the best of both: std::format.
The reveal: what was wrong before
Here is what a formatted table looked like with stream manipulators before C++20:
#include <iostream>
#include <iomanip>
void old_style(const std::string &name, double val) {
std::cout << std::left << std::setw(20) << name
<< std::right << std::setw(10) << std::fixed
<< std::setprecision(2) << val << "\n";
// Did you remember to reset std::fixed and std::setprecision afterward?
// The next caller's output will be affected if you didn't.
}
The flags set on std::cout persist across calls. std::fixed and std::setprecision(2) stay in effect until something else changes them. This makes output functions non-composable: calling one function can silently alter the formatting of another.
The Basic Idea
std::format takes a format string with replacement fields (marked by curly braces) and a list of arguments. It returns a std::string:
#include <format>
std::string s = std::format("{} + {} = {}", 1, 2, 3);
// s == "1 + 2 = 3"
Every argument is formatted using its type’s formatter. There is no %d/%s distinction; the type of the argument determines how it is rendered. Passing the wrong number or types of arguments is a compile error: since P2216R3 (adopted as a defect report against C++20), the format string is checked at compile time. If the format string is only known at runtime, use std::vformat, which reports mismatches by throwing std::format_error instead.
Format Specifiers
Inside the curly braces you can put a format specification after a colon. The syntax is:
{[index]:[fill][align][sign][#][0][width][.precision][type]}
Most of the fields are optional. Here are the most useful ones:
Width and Alignment
std::format("{:10}", 42); // "        42" (right-align, the default for numbers; width 10)
std::format("{:<10}", 42); // "42        " (left-align, width 10)
std::format("{:^10}", 42); // "    42    " (center, width 10)
std::format("{:*^10}", 42); // "****42****" (fill with '*', center)
std::format("{:010}", 42); // "0000000042" (zero-pad, width 10)
Precision and Floating Point
std::format("{:.3f}", 3.14159); // "3.142" (3 decimal places)
std::format("{:>10.3f}", 3.14159); // "     3.142" (right-align, width 10)
std::format("{:.6g}", 0.000123456); // "0.000123456" (general format)
std::format("{:e}", 12345.678); // "1.234568e+04" (scientific)
Integer Bases
std::format("{:d}", 255); // "255" (decimal, the default)
std::format("{:x}", 255); // "ff" (lowercase hex)
std::format("{:X}", 255); // "FF" (uppercase hex)
std::format("{:#x}", 255); // "0xff" (hex with 0x prefix)
std::format("{:b}", 255); // "11111111" (binary)
std::format("{:o}", 255); // "377" (octal)
Positional Arguments
Arguments are used in order by default, but you can address them by index:
std::format("{0} {1} {0}", "ab", "cd"); // "ab cd ab"
std::format("{1} {0}", "world", "hello"); // "hello world"
std::print (C++23)
Constructing a std::string and then writing it is sometimes wasteful. C++23 adds std::print and std::println which format and write in a single call:
#include <print>
std::print("Hello, {}!\n", "world"); // no intermediate string
std::println("x = {}", 42); // println appends '\n' automatically
// Write to a specific stream
std::print(std::cerr, "Error: {}\n", message);
std::print formats its entire output before handing it to the stream in a single write, so output from concurrent calls is far less prone to interleaving than a chain of std::cout << calls, each of which is a separate write (though the standard stops short of guaranteeing atomicity).
std::format_to
When you need to write into an existing buffer or iterator without creating a temporary string, use std::format_to:
#include <format>
#include <iterator>
std::string buf;
buf.reserve(64);
std::format_to(std::back_inserter(buf), "x = {:>8.3f}", 3.14159);
// buf == "x =    3.142" ("3.142" right-aligned in a field of width 8)
std::format_to_n writes at most n characters, avoiding buffer overflows when writing into a fixed-size array.
Custom Formatters
You can teach std::format how to format your own types by specializing std::formatter<T>. The specialization must provide two methods: parse (which reads the format specification) and format (which produces the output):
struct Point { int x, y; };
template<>
struct std::formatter<Point> {
// parse: read any format spec (we ignore it; just consume it)
constexpr auto parse(std::format_parse_context &ctx) {
return ctx.begin();
}
// format: produce the output
auto format(const Point &p, std::format_context &ctx) const {
return std::format_to(ctx.out(), "({}, {})", p.x, p.y);
}
};
Point pt{3, 4};
std::string s = std::format("Point: {}", pt); // "Point: (3, 4)"
Aside (CS 146 connection: format strings across languages). Python’s f-strings (f"{x:.2f}") were a direct inspiration for std::format. Racket’s format procedure (~a, ~s) is similar in spirit but dynamically typed: the format string is not checked against the argument types until runtime. Python’s f-strings are parsed at compile time (the embedded expressions are real expressions, not strings matched against a spec), but Python’s dynamic typing means a bad format specification still surfaces only at runtime. std::format sits at the static end of the spectrum: the format string is written as a literal and checked against the argument types at compile time, giving type safety without requiring interpolation syntax.
A Note on Locales
std::format does not use the global locale by default. Numbers always use . as the decimal separator, regardless of the user’s locale setting. This is intentional: locale-sensitive output was one of the least loved aspects of std::cout. To opt into locale formatting, add the L flag:
std::format("{:L}", 1234567.89); // locale-specific: "1,234,567.89" in en_US
std::format("{}", 1234567.89); // always: "1234567.89"
Worked Example: A Table Formatter
A function that prints a two-column table with fixed-width columns:
#include <format>
#include <iostream>
#include <vector>
#include <string>
void print_table(const std::vector<std::pair<std::string,double>> &rows) {
std::cout << std::format("{:<20} {:>10}\n", "Name", "Value");
std::cout << std::string(32, '-') << "\n";
for (const auto &[name, val] : rows) {
std::cout << std::format("{:<20} {:>10.2f}\n", name, val);
}
}
Sample output:
Name                      Value
--------------------------------
Alice                     92.50
Bob                       87.33
Carol                    100.00
Compare writing this with stream manipulators. The std::format version states the layout at the call site, is self-resetting (the format string is consumed per-call, not sticky), and reads as a single coherent expression.
Dynamic Width and Precision
Width and precision can themselves come from arguments, using {:{n}} or {:.{n}} syntax where n is the index of the width argument:
int width = 12;
int prec = 4;
// {:*<{}.{}} means: fill '*', left-align, width from arg, precision from arg
std::format("{:*<{}.{}f}", 3.14159, width, prec);
// "3.1416******"
This is useful when the column widths are not known at compile time — for example, when building a table where the widths are computed by scanning the data first.
A common idiom: measure the longest entry, then format all entries to that width.
std::vector<std::string> names = {"Alice", "Bob", "Christopher"};
size_t col = 0;
for (const auto &n : names) col = std::max(col, n.size());
for (const auto &n : names) {
std::cout << std::format("{:<{}}", n, col) << "\n";
}
// Alice
// Bob
// Christopher (all padded to width 11)
Note. std::format was introduced in C++20. Compiler support arrived gradually: GCC 13, Clang 14, and MSVC 19.29 have complete implementations. For earlier compilers the fmtlib library (https://fmt.dev) provides the same interface as a third-party header. Many codebases used fmtlib before C++20 standardized it; the APIs are nearly identical. If you are writing production code that must support older compilers, fmtlib is the right answer today.
Problem 46: Strings That Match
Problem 1 read input line by line with std::getline and character by character with >>. For simple inputs that works. But sometimes input has structure — dates in a log file, email addresses in a CSV, URLs embedded in HTML — and writing a hand-rolled parser for each format is tedious and error-prone. The natural tool for recognising structured text patterns is regular expressions. C++11 adds them to the standard library.
Quick Regex Reference
The ECMAScript dialect used by default supports the following meta-characters:
- . — any character except newline.
- \d / \w / \s — digit / word character / whitespace.
- [abc], [a-z], [^abc] — character class, range, negated class.
- *, +, ? — zero-or-more, one-or-more, zero-or-one (greedy by default; append ? for lazy: *?, +?).
- {n}, {n,}, {n,m} — exactly n, at least n, between n and m repetitions.
- (...) — capture group.
- (?:...) — non-capturing group.
- | — alternation: cat|dog matches either.
- ^, $ — start and end of string (or line with multiline flag).
Constructing a Pattern
std::regex (from <regex>) compiles a pattern string into an internal representation that can be matched against strings:
#include <regex>
std::regex date_re{R"(\d{4}-\d{2}-\d{2})"}; // ISO date: 2024-01-15
std::regex email_re{R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"};
Raw string literals (R"(...)") are convenient here because the regex syntax uses many backslashes and the raw form avoids double-escaping.
The default syntax is ECMAScript (the same flavour as JavaScript). Other syntaxes are available as flags:
std::regex re{"^hello$", std::regex::icase}; // case-insensitive
std::regex posix_re{"[[:alpha:]]+", std::regex::extended}; // POSIX ERE syntax
Matching
std::regex_match: Does the Whole String Match?
std::regex_match requires the entire input string to match the pattern. It returns true or false:
std::regex date_re{R"(\d{4}-\d{2}-\d{2})"};
std::regex_match("2024-01-15", date_re); // true
std::regex_match("2024-1-15", date_re); // false -- only one month digit
std::regex_match("prefix 2024-01-15", date_re); // false -- extra prefix
std::regex_search: Does Any Part Match?
std::regex_search succeeds if the pattern matches anywhere inside the input:
std::string log = "[2024-01-15] Server started on port 8080";
std::regex date_re{R"(\d{4}-\d{2}-\d{2})"};
std::regex_search(log, date_re); // true -- found the date inside
Capturing Subgroups with std::smatch
Parentheses in the pattern define capture groups. The match results are stored in a std::smatch (string match) object:
std::string s = "2024-01-15";
std::regex date_re{R"((\d{4})-(\d{2})-(\d{2}))"};
std::smatch m;
if (std::regex_match(s, m, date_re)) {
std::string whole = m[0]; // "2024-01-15" (the entire match)
std::string year = m[1]; // "2024"
std::string month = m[2]; // "01"
std::string day = m[3]; // "15"
}
m[0] is always the full match. m[1], m[2], etc. are the captured groups in left-to-right order. Each element is a std::ssub_match which converts implicitly to std::string.
std::cmatch works the same way but for const char* instead of std::string.
Replacement
std::regex_replace returns a new string with all matches substituted:
std::string text = "cat and cat and cat";
std::regex cat_re{"cat"};
// Replace every match
std::string r = std::regex_replace(text, cat_re, "dog");
// r == "dog and dog and dog"
// Back-references in the replacement: $1 refers to capture group 1
std::regex date_re{R"((\d{4})-(\d{2})-(\d{2}))"};
std::string rearranged = std::regex_replace(
"2024-01-15", date_re, "$3/$2/$1");
// rearranged == "15/01/2024"
Iterating Over All Matches
std::sregex_iterator produces one match result per occurrence:
std::string text = "foo@bar.com and baz@qux.org are valid";
std::regex email_re{R"([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})"};
auto begin = std::sregex_iterator{text.begin(), text.end(), email_re};
auto end = std::sregex_iterator{};
for (auto it = begin; it != end; ++it) {
std::smatch m = *it;
std::cout << m[0] << "\n";
}
// foo@bar.com
// baz@qux.org
Look familiar? It is the same iterator protocol from Problem 8 and Problem 17. sregex_iterator is an input iterator; you can pass it to the algorithms in <algorithm> just like any other iterator pair.
A Performance Warning
Note. std::regex compiles the pattern at runtime. Compilation is expensive: it builds a finite automaton from the pattern string, which takes time proportional to the size of the pattern. If you use a regex inside a loop, declare it const outside the loop so it is compiled only once:
// SLOW: recompiles the pattern on every iteration
for (const auto &line : lines) {
    if (std::regex_search(line, std::regex{R"(\d+)"})) { ... }
}
// FAST: compile once, search many times
const std::regex num_re{R"(\d+)"};
for (const auto &line : lines) {
    if (std::regex_search(line, num_re)) { ... }
}
This is the single most common std::regex performance mistake.
Regex Flags
Several flags can be combined with | when constructing a std::regex:
// icase: case-insensitive matching
std::regex ci_re{"hello", std::regex::icase};
std::regex_search("HELLO world", ci_re); // true
// multiline: ^ and $ match start/end of each line, not just the whole string
std::regex ml_re{"^start", std::regex::multiline};
// nosubs: do not store sub-expression matches (faster if you don't need them)
std::regex fast_re{R"(\d+)", std::regex::nosubs | std::regex::optimize};
The std::regex::optimize flag hints to the implementation that the pattern will be used many times, so it should spend extra time upfront constructing an efficient automaton.
Worked Example: Extracting URLs from Text
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> extract_urls(const std::string &text) {
// simplified URL pattern: http(s)://...
const std::regex url_re{
R"(https?://[^\s<>"{}|\\^`\[\]]+)"
};
std::vector<std::string> urls;
auto begin = std::sregex_iterator{text.begin(), text.end(), url_re};
for (auto it = begin; it != std::sregex_iterator{}; ++it) {
urls.push_back((*it)[0]);
}
return urls;
}
The URL pattern is intentionally simple. Real URL parsing is more complex, but this covers the common case. Note that the const std::regex is declared inside the function body — it is still constructed once per call to extract_urls. If extract_urls is called in a tight loop, move the regex to a static const local to get the familiar “construct once per program” guarantee.
Aside (CS 241 connection: regular languages and DFAs). In CS 241 you studied the formal theory of regular languages. A regular expression describes a regular language; matching a string against the regex is equivalent to running the corresponding DFA. std::regex implements this at runtime: the constructor builds the automaton, and the search functions run it. The ECMAScript dialect adds back-references (\1, \2), which are not part of the strict formal definition of regular expressions — back-references can express non-regular patterns, so they are implemented with backtracking rather than a pure DFA, which can cause exponential worst-case behaviour on pathological inputs.
A typical use: extract the timestamp and severity from a structured log line.
// Log format: "[2024-01-15 08:32:01] ERROR: disk full"
const std::regex log_re{
R"(\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (\w+): (.+))"
};
std::string line = "[2024-01-15 08:32:01] ERROR: disk full";
std::smatch m;
if (std::regex_match(line, m, log_re)) {
std::string ts = m[1]; // "2024-01-15 08:32:01"
std::string severity = m[2]; // "ERROR"
std::string message = m[3]; // "disk full"
}
Note. std::regex is the right tool for recognising patterns. For simple prefix or suffix checks, string_view methods from Problem 42 are faster and clearer: sv.starts_with("http") is better than constructing and running a regex. For structured formats with nested grammar (JSON, XML, C++ source), a regex is the wrong tool entirely — use a proper parser.
Problem 47: Numbers Drawn From a Distribution
At some point every program needs a random number. The C function rand() has been in the language since C89, and you have probably used it before. But it is broken in several ways. The randomness is poor — most implementations use a linear congruential generator whose low-order bits cycle with very short periods. The range is implementation-defined (RAND_MAX could be 32767 on some platforms). It is seeded globally, so you cannot have two independent streams of random numbers running at the same time. And rand() % N introduces a bias whenever RAND_MAX + 1 is not a multiple of N. In Problem 17 we saw that std::random_shuffle — which was removed in C++17 — was based on rand() and inherited all of these problems.
C++11 provides a proper replacement in <random>. It separates two orthogonal concerns: generating random bits, and mapping those bits to values in a desired distribution.
Why rand() is Broken
The classic C idiom for a random integer in the range [0, N) is:
#include <cstdlib>
srand(time(nullptr)); // seed once at startup
int r = rand() % N; // NOT uniformly distributed!
The modulo introduces a bias. If RAND_MAX + 1 is not divisible by N, the values near the bottom of the range come up slightly more often than those near the top. With RAND_MAX = 32767 and N = 100, we have 32768 = 327 × 100 + 68, so the remainders 0–67 occur 328 times each across the engine’s range while 68–99 occur only 327 times each — a small but real bias. For a card shuffle this matters: the bias compounds across every draw, so some orderings of the deck come up measurably more often than others.
The fix is not to use modulo. The C++11 distributions handle the bias correctly, typically by rejection sampling: if the engine produces a value in the biased tail, the distribution discards it and tries again. This is essentially what std::uniform_int_distribution does internally (the precise algorithm is implementation-defined).
The Two Concepts: Engines and Distributions
- A random engine produces a stream of pseudo-random bits. It is seeded, stateful, and deterministic given the same seed.
- A distribution takes raw bits from an engine and transforms them into values drawn from a specific probability distribution (uniform, normal, etc.).
The key insight is that the same engine can drive many different distributions without bias between them. In the old rand() world, the entire program shared one global stream. With <random>, each distribution request draws from a well-defined engine, with no interference.
Engines
The workhorse is std::mt19937, an implementation of the Mersenne Twister algorithm. It has a period of 2^19937 - 1, excellent statistical properties, and is the right default for simulation and games. There is also a 64-bit version, std::mt19937_64, for generating 64-bit random values.
Seeding
#include <random>
// Deterministic seed: same output every run (useful for testing)
std::mt19937 rng{42};
// Non-deterministic seed: different output on each run
std::random_device rd;
std::mt19937 rng2{rd()};
std::random_device reads from the OS’s entropy source (/dev/urandom on Linux, BCryptGenRandom on Windows). It is not a pseudo-random engine; it is a gateway to true randomness. Using it to seed the Mersenne Twister is the standard idiom.
Note. On some embedded or virtualized platforms, std::random_device may not have access to a hardware entropy source and falls back to a deterministic sequence. Check rd.entropy() — if it returns 0.0, the device is deterministic and you should use a different seeding strategy.
Distributions
A distribution object is constructed with its parameters, then called with the engine to generate a value:
std::mt19937 rng{std::random_device{}()};
// Uniform integer in [1, 6] inclusive
std::uniform_int_distribution<int> die{1, 6};
int roll = die(rng);
// Uniform real in [0.0, 1.0)
std::uniform_real_distribution<double> unit{0.0, 1.0};
double u = unit(rng);
// Normal distribution with mean 0, standard deviation 1
std::normal_distribution<double> gauss{0.0, 1.0};
double x = gauss(rng);
// Bernoulli: true with probability p
std::bernoulli_distribution coin{0.5};
bool heads = coin(rng);
The engine is passed by reference to operator(). The distribution reads as many bits as it needs and updates the engine’s state. Two distributions can share an engine with no interaction between them.
Other Useful Distributions
The standard library provides many others: std::poisson_distribution, std::exponential_distribution, std::discrete_distribution (a weighted choice over a finite set), and std::piecewise_constant_distribution. They all follow the same dist(engine) pattern.
Discrete Distribution: Weighted Choice
A common game-programming pattern is a weighted random selection from a finite menu. std::discrete_distribution takes a list of weights and produces indices according to those weights:
// Loot table: 60% common, 30% rare, 10% legendary
std::discrete_distribution<int> loot{{60.0, 30.0, 10.0}};
std::mt19937 rng{std::random_device{}()};
int tier = loot(rng); // 0, 1, or 2 with probabilities 0.6, 0.3, 0.1
The weights do not need to sum to 1; the distribution normalizes them.
Shuffling
std::shuffle (from <algorithm>) replaces the removed std::random_shuffle:
#include <algorithm>
#include <vector>
std::vector<int> deck = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
std::mt19937 rng{std::random_device{}()};
std::shuffle(deck.begin(), deck.end(), rng);
// deck is now in a uniformly random permutation
std::random_shuffle was removed in C++17 because it used rand(). std::shuffle takes an explicit engine, so the source of randomness is under your control.
Thread Safety
Note. Random engines are not thread-safe. Calling dist(rng) from two threads simultaneously on the same engine is a data race. The fix is to give each thread its own engine:
thread_local std::mt19937 local_rng{std::random_device{}()};
void worker() {
    std::uniform_int_distribution<int> dist{1, 100};
    int v = dist(local_rng); // safe: local_rng is per-thread
}
The thread_local storage class (introduced in C++11) gives each thread its own copy of the variable, initialized the first time that thread touches it.
Worked Example: Monte Carlo Estimation of π
A classic application: estimate π by sampling random points in the unit square and checking whether they fall inside the unit circle.
#include <random>
#include <cmath>
#include <iostream>
double estimate_pi(long long trials) {
std::mt19937_64 rng{std::random_device{}()};
std::uniform_real_distribution<double> dist{-1.0, 1.0};
long long inside = 0;
for (long long i = 0; i < trials; ++i) {
double x = dist(rng), y = dist(rng);
if (x*x + y*y <= 1.0) ++inside;
}
return 4.0 * static_cast<double>(inside) / trials;
}
int main() {
std::cout << estimate_pi(10'000'000) << "\n"; // ~3.1416
}
The 10'000'000 uses C++14 digit separators for readability. The uniform_real_distribution covers the half-open range [-1, 1) on both axes, so the points fall in a square of area 4 that contains the unit circle; the circle occupies π/4 of that square, so multiplying the hit rate by 4 estimates π.
Aside (CS 466 / STAT courses connection: pseudo-random sampling). The quality of Monte Carlo estimates depends on the quality of the underlying random number generator. A poor generator introduces correlations between samples that can bias results in subtle, hard-to-detect ways. This is exactly why std::mt19937 is the standard recommendation: its statistical properties have been rigorously tested (Diehard battery, TestU01) and it passes nearly all standard randomness tests. In numerical methods courses you will encounter quasi-random (low-discrepancy) sequences (Halton, Sobol) that converge faster than pseudo-random numbers for certain integration problems; C++ does not provide these in the standard library, but the engine/distribution separation makes it easy to plug in your own engine.
For scientific simulations, you often want results to be reproducible: run the same seed, get the same sequence. Fixed seeds make bugs debuggable.
// Set via command line: ./sim --seed 42
// Or use random_device for one-off runs
unsigned seed = 42;
std::mt19937 rng{seed};
std::cout << "seed: " << seed << "\n"; // log it so you can reproduce
std::uniform_int_distribution<int> d{1, 100};
for (int i = 0; i < 5; ++i)
std::cout << d(rng) << " "; // deterministic: same every run with seed 42
This pattern is standard in research code: log the seed alongside results so that any run can be exactly replicated.
Note. For cryptographic purposes — generating session tokens, passwords, keys — do not use std::mt19937. The Mersenne Twister is a predictable generator: after observing 624 consecutive outputs, an attacker can reconstruct the internal state and predict all future outputs. For cryptographic randomness, use std::random_device directly (it reads from the OS’s cryptographically secure source) or a platform-specific API (BCryptGenRandom on Windows, getrandom(2) on Linux).
Problem 48: Heterogeneous Storage Revisited
Problem 36 gave us std::variant for storing one of a closed set of types — closed meaning the set is fixed at compile time. Problem 35 gave us std::optional for the degenerate case of one type that may or may not be present. Between those two tools, a great deal of heterogeneous storage is covered. But not all of it.
Sometimes the set of types is genuinely open. A scripting language interpreter stores values that can be integers, floats, strings, lists, or any user-defined type — and users can add new types. A plugin system passes messages between components that were compiled independently and know nothing about each other’s type hierarchy. A property bag on a game entity holds arbitrary attributes whose types are set at data-load time, not compile time. For those cases, C++17 provides std::any.
The void* Comparison
Before C++17, the idiomatic (if unsafe) way to store an arbitrary value was void*:
// Old C-style heterogeneous storage
void *value = nullptr;
int x = 42;
value = &x; // any pointer fits
// To get the value back:
int *p = static_cast<int *>(value); // hope you remembered the type!
There is nothing wrong if you actually remember the type. But nothing stops you from casting to the wrong type, and the result is undefined behaviour with no diagnostic. std::any wraps the same concept with type tracking:
std::any value = 42;
// std::any_cast<std::string>(value); // throws bad_any_cast -- caught!
int v = std::any_cast<int>(value); // OK
The type is remembered at assignment time and checked at extraction time.
std::any
std::any (from <any>) can hold a value of any copy-constructible type. It is the type-safe version of void*.
#include <any>
std::any a = 42; // holds an int
a = std::string{"hello"}; // now holds a std::string
a = 3.14; // now holds a double
At any moment, an any holds at most one value. Assigning a new value destroys the previous one.
Constructing
std::any a1; // empty
std::any a2 = 42; // holds int{42}
std::any a3 = std::make_any<int>(42); // same; make_any is explicit about T
std::any a4 = std::string{"world"}; // holds a std::string
a1.has_value(); // false
a2.has_value(); // true
a2.reset(); // a2 is now empty
Accessing: any_cast
Retrieving the stored value requires knowing its type. You declare what you expect, and the cast either succeeds or fails:
std::any a = 42;
// Cast by value: throws std::bad_any_cast if the type is wrong
int v = std::any_cast<int>(a); // v == 42
// Cast by reference: also throws if type is wrong
std::any_cast<int &>(a) = 99; // modifies the stored value in place
// Cast by pointer: returns nullptr instead of throwing
int *p = std::any_cast<int>(&a); // p != nullptr
std::string *sp = std::any_cast<std::string>(&a); // sp == nullptr
The pointer form is the right choice when you are probing for a type you are not sure about. Catching std::bad_any_cast is the right choice when a type mismatch would be a genuine programming error.
Querying the Type
a.type() returns a const std::type_info& identifying the stored type. You can compare it with typeid:
std::any a = 42;
if (a.type() == typeid(int)) {
std::cout << "Holds an int\n";
}
This is dynamic type introspection — the runtime equivalent of what std::variant’s std::holds_alternative<T> does statically.
Comparison: optional, variant, any
Here is the decision table:
- std::optional<T> — "maybe a T". One type, may or may not be present. Minimal overhead beyond sizeof(T): one flag plus padding. Use when a function might not return a value, or a configuration option might not be set.
- std::variant<T1,...,Tn> — "exactly one of a closed set of types". The set is fixed at compile time. No heap allocation; the variant is as large as the largest member plus a discriminator. The compiler enforces exhaustive handling with std::visit. Use when the set of types is known and finite.
- std::any — "any copy-constructible type". The set of types is open and can grow at runtime. Typically heap-allocates the stored value (though implementations use a small-buffer optimization for small types). Use when the types genuinely cannot be predicted at compile time.
The costs matter. optional<T> and variant<T1,...> are stack-only; no heap, no indirection. any almost certainly heap-allocates anything larger than a pointer or two (small-buffer optimization varies by implementation but is not guaranteed).
The Small-Buffer Optimization
Most implementations of std::any include a small-buffer optimization (SBO): if the stored value is small enough (on the order of one to a few pointers' worth of bytes — the exact threshold, and whether a nothrow move constructor is also required, is implementation-defined), it is stored inline inside the any object with no heap allocation. Larger values are heap-allocated.
std::any a = 42; // int: small -- likely stored inline, no heap
std::any b = std::string{"hello world"}; // string: larger -- heap-allocates
You cannot rely on any particular threshold. If you care about allocation, prefer std::variant or design your own type-erased wrapper with a known SBO policy.
Type Erasure Connection
std::any is a concrete instance of the type erasure pattern (which we will examine more formally in Problem 56). It uses an internal virtual-dispatch mechanism to call the correct copy constructor, destructor, and typeid for whatever type it happens to hold, without the user having to name that type. You are already familiar with this idea: std::function<Ret(Args...)> from Problem 17 is another type-erased wrapper, there over callables rather than arbitrary values.
Worked Example: A Property Bag
A common use in game programming: entity components with heterogeneous attributes stored by name.
#include <any>
#include <string>
#include <unordered_map>
#include <stdexcept>
class PropertyBag {
std::unordered_map<std::string, std::any> props_;
public:
template<typename T>
void set(const std::string &key, T &&value) {
props_[key] = std::forward<T>(value);
}
template<typename T>
T get(const std::string &key) const {
auto it = props_.find(key);
if (it == props_.end())
throw std::out_of_range{"key not found: " + key};
return std::any_cast<T>(it->second); // throws bad_any_cast if wrong T
}
bool has(const std::string &key) const {
return props_.count(key) > 0;
}
};
Usage:
PropertyBag entity;
entity.set("name", std::string{"Dragon"});
entity.set("hp", 250);
entity.set("speed", 3.5);
std::string name = entity.get<std::string>("name"); // "Dragon"
int hp = entity.get<int>("hp"); // 250
Each attribute can be a different type. New attribute types — textures, AI behaviour parameters, custom component classes — can be added later without changing PropertyBag at all. That openness is exactly what std::any is for.
Aside (CS 247 / CS 346 connection: scripting and plugin systems). Dynamic languages like Lua and Python use this pattern pervasively: every value is essentially a tagged union (the language’s runtime equivalent of any) that carries its type at runtime. When embedding Lua in a C++ game engine, the bridge between the two worlds often uses a structure exactly like PropertyBag: a map from string names to std::any values that Lua scripts can read and write. The type safety boundary is enforced at the any_cast site.
Sometimes you want to enumerate all stored properties without knowing their types in advance. std::any::type() enables this:
void dump(const std::any &a) {
// In a real implementation, PropertyBag would expose iteration over its
// internal map; here we show how to dispatch on the dynamic type of one value.
if (a.type() == typeid(int))
std::cout << "int: " << std::any_cast<int>(a) << "\n";
else if (a.type() == typeid(std::string))
std::cout << "str: " << std::any_cast<std::string>(a) << "\n";
else
std::cout << "unknown type: " << a.type().name() << "\n";
}
The type().name() call gives an implementation-defined mangled name. Useful for debugging; not suitable for production logic.
Note. The decision guide in one sentence: prefer std::variant when you know the types at compile time; prefer inheritance and virtual functions when the types are known but the behaviour is what varies; use std::any only when the types genuinely cannot be predicted at compile time and you are prepared to pay for the heap allocation and the runtime type-check on every access. any gives you maximum flexibility at the cost of all compile-time guarantees. Use it sparingly.
Problem 49: I Want Two Things at Once
We have built a complete Vector, mastered template programming, and studied every ownership model C++ offers. There is one more dimension we have not touched: time. A modern computer has several processor cores sitting idle while our programs execute on one. Every loop, every computation, every wait-for-IO is an opportunity we leave on the table.
The problem: I want to do two things simultaneously.
What Is a Thread?
A thread is an independent sequence of instructions executing within a shared address space. Multiple threads in the same process see the same global variables and heap-allocated objects, but each thread has its own call stack, its own program counter, and its own set of registers.
C++11 standardises thread creation through std::thread, declared in <thread>. Before C++11, portable concurrency required either the POSIX thread library (pthreads) or platform-specific APIs — each with a different interface and error-handling model.
Creating and Starting a Thread
#include <thread>
#include <iostream>
void say_hello() {
std::cout << "Hello from a thread!\n";
}
int main() {
std::thread t{say_hello}; // create thread; it starts immediately
std::cout << "Hello from main!\n";
t.join(); // wait for t to finish
}
The std::thread constructor takes a callable (function, lambda, or functor) and zero or more arguments to pass to it. The thread starts executing the callable immediately upon construction — there is no separate “start” call.
void sum_range(const std::vector<int> &v,
size_t lo, size_t hi, long long &result) {
result = 0;
for (size_t i = lo; i < hi; ++i) result += v[i];
}
int main() {
std::vector<int> v(10'000'000, 1);
long long left = 0, right = 0;
size_t mid = v.size() / 2;
std::thread t{sum_range,
std::ref(v), 0, mid, std::ref(left)};
sum_range(v, mid, v.size(), right);
t.join();
std::cout << left + right << '\n'; // 10000000
}
std::ref(v) wraps v in a reference wrapper because std::thread’s constructor copies its arguments by default — passing v directly would silently copy the entire vector into the thread.
Thread Lifecycle: join and detach
Every std::thread object that is joinable — meaning its thread is still running or has finished but has not been collected — must have exactly one of two things done to it before its destructor runs:
- t.join() — the calling thread blocks until t’s thread finishes, then marks t as non-joinable. Use when you need the result or want to wait for completion.
- t.detach() — the calling thread lets t’s thread run independently in the background. The thread’s resources are released when the thread finishes. t is then non-joinable. Use sparingly: once detached, you cannot wait for the thread or know when it finishes.
If a std::thread is destroyed while still joinable, its destructor calls std::terminate — the same hard abort we saw in Problem 9 when two exceptions were simultaneously active.
{
std::thread t{some_work};
// ... do something else ...
// if we return here without join() or detach(): std::terminate!
}
Aside (CS 343 connection: μC++ tasks vs std::thread). In CS 343 you use μC++, a language extension built on top of C++ that introduces _Task as a first-class concept. A _Task is automatically started when declared and automatically joined at the end of its scope — the compiler enforces the lifecycle that std::thread leaves to the programmer. std::thread is lower-level and more explicit. The advantage is portability and generality; the disadvantage is that the programmer is responsible for every join and detach. C++20’s std::jthread (Problem 54) recovers the automatic-join behaviour that μC++ provides natively.
The Exception Hazard
Look familiar?
void f() {
std::thread t{background_work};
do_some_work(); // might throw
t.join(); // never reached if exception is thrown
// t's destructor: std::terminate!
}
This is the exact same problem as Problem 14 with raw pointers. do_some_work() throws, the exception propagates out of f, t.join() is skipped, and t’s destructor panics. The program terminates abruptly rather than handling the exception.
In Problem 14 we fixed the leak by wrapping the pointer in a class whose destructor calls delete. The fix here is identical in spirit: wrap the thread in a class whose destructor calls join.
Thread RAII
class JoinThread {
std::thread t_;
public:
explicit JoinThread(std::thread t) : t_{std::move(t)} {}
~JoinThread() {
if (t_.joinable()) t_.join();
}
// Threads cannot be copied -- ownership is unique
JoinThread(const JoinThread &) = delete;
JoinThread &operator=(const JoinThread &) = delete;
// Moving transfers ownership
JoinThread(JoinThread &&) = default;
JoinThread &operator=(JoinThread &&) = default;
};
Now f is safe regardless of how it exits:
void f() {
JoinThread t{std::thread{background_work}};
do_some_work(); // may throw
} // t's destructor joins the thread -- even on exception
The pattern is RAII: the resource (a running thread) is acquired in the constructor and released in the destructor. The destructor runs unconditionally when the scope exits, whether by normal return or by exception propagation.
Note. C++20 ships std::jthread in <thread>, which does exactly this: it joins automatically in its destructor. Unless you need to target pre-C++20 compilers, prefer std::jthread over the JoinThread wrapper above. Problem 54 covers std::jthread in full, including its cooperative-cancellation mechanism via std::stop_token.
Passing Arguments: the Reference Pitfall
void update(int &x) { ++x; }
int val = 0;
std::thread t{update, val}; // ERROR: does not compile
std::thread’s constructor copies (decay-copies) its arguments and hands them to the callable as rvalues. An int & parameter cannot bind to an rvalue copy, so a conforming implementation rejects this call at compile time — deliberately, because the thread would otherwise silently mutate its private copy rather than val. Use std::ref to force pass-by-reference:
std::thread t{update, std::ref(val)}; // passes reference to val
t.join();
// val is now 1
The object referenced must outlive the thread. A reference to a local variable dangles as soon as that variable’s stack frame is destroyed, even though the thread may still be running. Lambdas make the ownership explicit — capturing by value gives the thread its own copy; capturing by reference reintroduces the lifetime risk.
std::vector<int> v(10'000'000, 1);
long long left = 0, right = 0;
size_t mid = v.size() / 2;
JoinThread t{std::thread{[&]() {
for (size_t i = 0; i < mid; ++i) left += v[i];
}}};
for (size_t i = mid; i < v.size(); ++i) right += v[i];
// t auto-joins here
std::cout << left + right << '\n';
What We Have Not Solved Yet
Splitting work across threads is only half the story. The parallel sum above works because the two halves write to different variables (left and right). What if two threads write to the same variable at the same time?
int counter = 0;
JoinThread t1{std::thread{[&]() {
for (int i = 0; i < 1'000'000; ++i) ++counter;
}}};
for (int i = 0; i < 1'000'000; ++i) ++counter;
// result is NOT guaranteed to be 2000000
The increment ++counter is not a single hardware instruction; it is a read, a modify, and a write. If two threads interleave those operations, some increments are lost. This is a data race and produces undefined behaviour. Problem 50 introduces mutexes — locks that prevent two threads from interleaving on shared data.
Problem 50: Sharing Without Stepping on Toes
Problem 49 ended with an unsolved problem: two threads writing to the same integer produced undefined behaviour. The increment was not atomic. We need a way to say “only one thread at a time may touch this data.”
The root cause is a data race: two threads access the same memory location simultaneously, at least one of them writes, and there is no synchronisation between them. The C++ standard says a data race is undefined behaviour — not “probably a wrong answer”, but literally UB, as bad as a dangling pointer or an out-of-bounds array access. The compiler is allowed to assume data races do not occur and to generate code that is entirely wrong if they do. The fix is to introduce explicit synchronisation.
- A data race occurs when two threads access the same memory location concurrently, at least one access is a write, and the accesses are not ordered by a synchronisation operation. Data races are undefined behaviour in C++11 and later.
std::mutex — The Fundamental Lock
The word “mutex” is short for mutual exclusion. A mutex is an object with two operations: lock and unlock. At most one thread holds the lock at any given moment. A thread that calls lock() while another thread holds the lock sleeps until the lock is released.
#include <mutex>
#include <thread>
int counter = 0;
std::mutex mtx;
void increment_a_million() {
for (int i = 0; i < 1'000'000; ++i) {
mtx.lock();
++counter; // only one thread at a time reaches here
mtx.unlock();
}
}
The region between lock() and unlock() is a critical section. Only one thread can be inside it at a time. The data race is gone.
But there is an obvious problem. If anything between lock() and unlock() exits early — an exception, a return, a continue — the mutex stays locked forever. Every other thread trying to acquire it will block indefinitely. This is the exact same exception-safety trap from Problem 14, and the fix is the exact same idea: RAII.
std::lock_guard — RAII for Mutexes
std::lock_guard<std::mutex> locks the mutex in its constructor and unlocks it in its destructor. It cannot be copied or moved. That is its entire implementation.
void increment_a_million() {
for (int i = 0; i < 1'000'000; ++i) {
std::lock_guard<std::mutex> guard{mtx};
++counter;
} // guard's destructor releases mtx -- even on exception
}
In C++17, class template argument deduction (CTAD) lets you omit the template argument: std::lock_guard guard{mtx}. The mutex type is deduced automatically.
std::unique_lock — Flexible Locking
std::unique_lock<std::mutex> is the more powerful sibling of lock_guard. It supports everything lock_guard does, and then some:
- Deferred locking: construct with std::defer_lock and call .lock() later.
- Try-locking: .try_lock() returns false instead of blocking if the mutex is already held.
- Timed locking: .try_lock_for(duration) (requires a timed mutex type such as std::timed_mutex).
- Manual early unlock: call .unlock() before the scope ends.
- Moveable: lock_guard cannot be moved; unique_lock can — this is required by std::condition_variable in Problem 53.
std::unique_lock<std::mutex> lk{mtx}; // locked immediately
// ... work under lock ...
lk.unlock(); // release early
// ... lock-free computation ...
lk.lock(); // re-acquire
// destructor releases again at end of scope
The general rule: use lock_guard when you just need a scope-locked mutex and nothing fancy. Switch to unique_lock when you need flexibility, particularly for condition variables.
A Thread-Safe Counter Class
The idiomatic pattern is to hide the mutex alongside the data it protects. That way a caller cannot accidentally access the data without holding the lock.
class Counter {
int value_ = 0;
mutable std::mutex mtx_;
public:
void increment() {
std::lock_guard guard{mtx_};
++value_;
}
void add(int n) {
std::lock_guard guard{mtx_};
value_ += n;
}
int get() const {
std::lock_guard guard{mtx_};
return value_;
}
};
The mutex is mutable so that const member functions (like get) can still lock it. A const operation conceptually does not mutate the observable value, but acquiring a lock is a mechanical necessity — not a logical state change — so mutable is the right tool.
Note. Minimise the scope of the lock. Hold it only while actually accessing shared data, not during lengthy computations that do not touch shared state. A wide lock increases contention and starves other threads unnecessarily.
// Prefer: compute outside the lock, write inside it
auto result = expensive_computation(); // no lock needed
{
std::lock_guard guard{mtx};
shared_data = result; // lock only for the write
}
Deadlock
A deadlock occurs when two or more threads each wait for a resource held by another, so none can ever proceed. The classic two-mutex deadlock:
std::mutex m1, m2;
// Thread A: // Thread B:
m1.lock(); m2.lock();
// --- context switch ---
m2.lock(); // sleeps m1.lock(); // sleeps -- deadlock!
Thread A holds m1 and waits for m2. Thread B holds m2 and waits for m1. Neither can ever proceed.
The simplest defence is lock ordering: always acquire multiple mutexes in the same global order everywhere in the program. If every thread acquires m1 before m2, the cycle above cannot form.
When a global order is not easy to enforce, use std::lock, which acquires multiple locks atomically using a deadlock-avoidance algorithm:
std::unique_lock<std::mutex> lk1{m1, std::defer_lock};
std::unique_lock<std::mutex> lk2{m2, std::defer_lock};
std::lock(lk1, lk2); // acquires both -- deadlock-free
C++17 provides std::scoped_lock, which folds this pattern into a single RAII object:
// C++17 -- the cleanest way to lock two mutexes at once
std::scoped_lock lk{m1, m2}; // acquires both; destructor releases both
Prefer scoped_lock over the defer_lock dance whenever you are on C++17 or later.
Readers-Writer Locks
Sometimes many threads read shared data while writes are rare. A plain mutex forces readers to take turns, which wastes concurrency. C++17 introduces std::shared_mutex (in <shared_mutex>): many threads may hold the lock in shared (read) mode simultaneously, but exclusive (write) mode blocks until every shared holder has released.
#include <shared_mutex>
#include <map>
std::shared_mutex rw;
std::map<int, std::string> table;
std::string lookup(int key) {
std::shared_lock reader{rw}; // shared: many readers OK
auto it = table.find(key);
return it != table.end() ? it->second : "";
}
void insert(int key, std::string val) {
std::unique_lock writer{rw}; // exclusive: all readers blocked
table[key] = std::move(val);
}
Use std::shared_lock for read paths and std::unique_lock for write paths.
Aside (CS 343 connection: monitors in μC++). In CS 343 you used the _Monitor class. A _Monitor implicitly acquires a single mutual-exclusion lock on every public member function entry and releases it on exit. You never write lock() or unlock() — the language inserts them for you. The Counter class above is the C++ equivalent of a manual monitor: a mutex field coupled with lock guards at every method boundary. The C++ version is more verbose but also more flexible — you choose which portion of each method runs under the lock, you can use readers-writer locks, and you can have several independent mutexes protecting different parts of the same object’s state.
Aside (CS 245 connection: mutual exclusion as a safety invariant). CS 245 taught Hoare logic: assertions {P} S {Q} that describe what a statement S guarantees. A critical section has a natural Hoare-logic framing. Let alone mean “I am the sole thread currently executing the critical section.” Then locking establishes alone: {¬alone} lock() {alone}. Unlocking surrenders it: {alone} unlock() {¬alone}. Mutual exclusion is simply a safety invariant — the kind of invariant Hoare logic was designed to reason about.
#include <iostream>
#include <thread>
#include <mutex>
int counter = 0;
std::mutex mtx;
void add_million() {
for (int i = 0; i < 1'000'000; ++i) {
std::lock_guard guard{mtx};
++counter;
}
}
int main() {
std::thread t{add_million};
add_million();
t.join();
std::cout << counter << '\n'; // always 2000000
}
Without the mutex, the output is unpredictable: lost updates can drive it well below 2,000,000, and because a data race is undefined behaviour the standard makes no guarantee about the result at all. With the mutex, the output is always exactly 2000000.
Problem 51: I Want a Result Eventually
Problem 49 and Problem 50 gave us threads and mutexes — the lowest level of C++ concurrency. We can launch work on another thread, and we can prevent two threads from stepping on shared data. But the programming model is cumbersome: spin up a thread, set up a shared variable, protect it with a mutex, join the thread, read the variable. That is a lot of ceremony for the most common concurrency use case: “compute this value on another thread, and give it to me when you’re done.”
For that pattern, C++11 gives us std::future.
The Old Way
Before futures, getting a result back from a thread looked like this:
int result = 0;
std::mutex mtx;
bool ready = false;
std::thread t{[&]() {
int r = expensive_computation();
std::lock_guard guard{mtx};
result = r;
ready = true;
}};
// ... do other work ...
for (;;) { // busy-wait: repeatedly lock, check, unlock
std::lock_guard guard{mtx};
if (ready) break;
}
std::cout << result << '\n';
t.join();
This is verbose and wasteful — the busy-wait loop burns CPU, and doing it properly requires a condition variable (Problem 53). Futures replace all of this with two objects: a future (the read end) and a promise (the write end).
std::future — A Handle to a Value Not Yet Computed
- A std::future<T> represents a value of type T that will be produced asynchronously at some point in the future. You obtain the value by calling .get(), which blocks until the value is ready.
The key operations:
- .get(): blocks until the value is available, then returns it (by value or by reference, depending on T). Can only be called once — calling it a second time is undefined behaviour.
- .wait(): blocks until the value is ready, but does not consume it. Useful when you want to synchronise without extracting the result.
- .valid(): returns true if this future is associated with a shared state (i.e., some result will eventually be produced).
- .wait_for(duration): waits up to a given duration; returns a std::future_status indicating whether the result is ready, deferred, or timed out.
std::async — The Easiest Entry Point
std::async is the simplest way to get a future. It takes a callable and optional arguments, runs the callable (possibly on a new thread), and returns a std::future holding the return value:
#include <future>
#include <iostream>
int expensive() {
// ... lengthy computation ...
return 42;
}
int main() {
std::future<int> f = std::async(std::launch::async, expensive);
// main thread continues here while expensive() runs
std::cout << "waiting...\n";
int result = f.get(); // blocks until expensive() finishes
std::cout << result << '\n';
}
std::async accepts a launch policy as its first argument:
- std::launch::async: run the callable on a new thread immediately.
- std::launch::deferred: do not start the callable at all; run it lazily on the calling thread the first time .get() or .wait() is called.
- No policy (default): the implementation chooses; it may do either. Prefer to specify an explicit policy: the function is named async, but the default launch policy is not guaranteed to create a thread, which can cause surprising behaviour if you are counting on parallelism.
Note. A std::future returned by std::async with launch::async has a special destructor: it blocks until the background thread finishes. This is unlike std::thread, where the destructor calls std::terminate if the thread is still running. It also means that a discarded future — one not stored in a variable — blocks on the spot:
// Surprise: this does NOT run in the background!
std::async(std::launch::async, do_work);
// The temporary future's destructor blocks here until do_work() returns.
Always store the returned future if you want the work to proceed concurrently.
Exception Propagation
One of the best features of futures: if the asynchronous function throws an exception, that exception is stored in the shared state and re-thrown by .get().
std::future<int> f = std::async(std::launch::async, []() -> int {
throw std::runtime_error{"oops"};
return 0;
});
try {
int v = f.get(); // re-throws std::runtime_error
} catch (const std::runtime_error &e) {
std::cout << "caught: " << e.what() << '\n';
}
The exception crosses the thread boundary automatically. With a raw thread and a shared variable you would have to implement this yourself.
std::promise — The Write End
std::async is convenient but opaque — you cannot control exactly when and how the value gets produced. std::promise<T> gives you that control directly. A promise is the write end of the future/promise channel:
std::promise<int> prom;
std::future<int> fut = prom.get_future();
std::thread t{[&prom]() {
int result = expensive_computation();
prom.set_value(result); // wakes up fut.get()
}};
int v = fut.get(); // blocks until set_value is called
t.join();
The key operations on a promise:
- .get_future(): returns the associated future (call at most once).
- .set_value(v): stores v and unblocks anyone waiting on the future.
- .set_exception(e): stores the exception pointer e; .get() will re-throw it.
Note. If a promise is destroyed without ever calling set_value or set_exception, a std::future_error with code std::future_errc::broken_promise is stored in the shared state. This prevents .get() from blocking forever — it throws instead.
std::packaged_task — Wrapping a Callable
std::packaged_task<F> wraps a callable of type F and fulfils a promise with whatever that callable returns when invoked. It bridges the gap between a plain callable and the future/promise machinery:
std::packaged_task<int()> task{[]() {
return expensive_computation();
}};
std::future<int> fut = task.get_future();
std::thread t{std::move(task)}; // run task on a new thread
int v = fut.get();
t.join();
packaged_task is useful when you have an existing callable and want to hand it off to a thread pool or a scheduler that expects a plain no-argument callable — the future gives you a way to collect the result later.
Polling Without Blocking: wait_for and wait_until
Sometimes you want to check whether a future is ready without committing to an indefinite wait. .wait_for(duration) and .wait_until(time_point) return a std::future_status that tells you what happened:
using namespace std::chrono_literals;
std::future<int> f = std::async(std::launch::async, slow_work);
// Poll up to 100ms at a time, doing other work in between:
while (f.wait_for(100ms) != std::future_status::ready) {
do_other_work();
}
int result = f.get();
The three possible status values are:
- std::future_status::ready: the value is available; call .get() to retrieve it.
- std::future_status::timeout: the duration elapsed but the value is not yet ready.
- std::future_status::deferred: the task was launched with launch::deferred and has not started yet; it will run when you call .get() or .wait().
std::shared_future — Multiple Readers
std::future has move-only semantics and allows only one call to .get(). If you need to share the result with multiple threads, convert it to a std::shared_future<T> using .share():
std::shared_future<int> sf =
std::async(std::launch::async, compute).share();
// Multiple threads can call .get() on the same shared_future:
std::thread t1{[sf]() { std::cout << sf.get() << '\n'; }};
std::thread t2{[sf]() { std::cout << sf.get() << '\n'; }};
t1.join(); t2.join();
Each thread gets a copy of the shared_future (hence the by-value capture). All copies share the same underlying state and all calls to .get() return the same value once it is ready.
Aside (CS 343 connection: futures and actors in μC++). In CS 343 you saw how _Actor tasks communicate by passing messages. An actor sends a message and a reply comes back asynchronously — exactly what a future/promise pair encodes. set_value corresponds to sending the reply message; .get() corresponds to waiting for it. The difference is that C++ futures are general-purpose: any callable can produce any value, and you collect it with a typed future<T>. The μC++ actor model is more structured but also more opinionated about the communication topology.
Worked Example: Hashing Files in Parallel
Hash four files concurrently using std::async, then collect all results.
#include <future>
#include <string>
#include <vector>
#include <iostream>
std::string hash_file(const std::string &path); // defined elsewhere
int main() {
std::vector<std::string> paths = {
"a.bin", "b.bin", "c.bin", "d.bin"
};
// Launch all four hashes in parallel
std::vector<std::future<std::string>> futs;
for (auto &p : paths)
futs.push_back(
std::async(std::launch::async, hash_file, p));
// Collect results in order
for (size_t i = 0; i < paths.size(); ++i)
std::cout << paths[i] << ": " << futs[i].get() << '\n';
}
Each hash_file call runs on its own thread. The main thread collects results as soon as each one is ready. The total wall-clock time approaches the maximum of the four individual hashing times rather than their sum.
Problem 52: The Memory Model
Problem 50 fixed the data race with a mutex. But a mutex is heavy: every lock() and unlock() may involve a system call, a context switch, or at minimum a hardware memory barrier. For a simple counter that is incremented millions of times per second, that overhead matters. C++11 provides a lighter tool for these situations: atomic operations.
Before we can use atomic operations correctly, we need to understand why they exist — and that requires understanding the C++ memory model.
The Problem: Hardware and Compilers Reorder Instructions
Consider this code, executed by two threads simultaneously:
// Global variables, both initially 0
int data = 0;
bool ready = false;
// Thread 1 (producer): // Thread 2 (consumer):
data = 42; while (!ready) {}
ready = true; assert(data == 42);
Intuitively, the assertion should always pass. Thread 2 spins until ready is true, which only becomes true after data = 42. So by the time Thread 2 exits the loop, data should be 42.
In practice, on certain hardware architectures (ARM, PowerPC) and under aggressive compiler optimisation, this assertion can fail. The compiler may reorder the two stores in Thread 1 (because they touch independent variables) and the CPU may similarly reorder the stores in its write buffer. Thread 2 may observe ready = true before the write to data has propagated.
This is not a corner case. It is the normal behaviour of modern out-of-order processors. The C++ memory model exists to give us tools to reason about, and prevent, this kind of problem.
- An operation A happens-before operation B if the effects of A are guaranteed to be visible to any thread that performs B. A synchronisation operation (a mutex lock/unlock, or an appropriate atomic operation) establishes a happens-before edge between threads. Without such an edge, the order in which one thread observes another thread's writes is unspecified.
Without any synchronisation, the compiler and CPU may reorder stores and loads in any way that does not violate the behaviour of a single-threaded program. Multi-threaded behaviour is completely unspecified unless you introduce synchronisation.
std::atomic — Indivisible Operations with Defined Ordering
std::atomic<T> (in <atomic>) makes operations on a value both indivisible (no thread can observe a partial write) and subject to controlled ordering guarantees.
For simple types, usage looks almost identical to ordinary variables:
#include <atomic>
std::atomic<int> counter{0};
void increment_a_million() {
for (int i = 0; i < 1'000'000; ++i)
++counter; // atomic increment: no mutex needed
}
std::atomic<int> is a specialisation for integral types that typically compiles down to a single hardware instruction (e.g., lock xadd on x86). No OS call. No context switch. Just an atomic read-modify-write.
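To see that the atomic increment really eliminates lost updates, here is a minimal two-thread driver (a sketch; the thread count and iteration count are arbitrary):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> counter{0};

void increment_a_million() {
    for (int i = 0; i < 1'000'000; ++i)
        ++counter; // atomic read-modify-write: no lost updates
}

// Run two incrementers concurrently and return the final count.
int run_two_incrementers() {
    counter = 0;
    std::thread t1{increment_a_million};
    std::thread t2{increment_a_million};
    t1.join();
    t2.join();
    return counter.load(); // exactly 2'000'000, every run
}
```

With a plain int the result would vary from run to run; with std::atomic<int> it is exactly 2,000,000 every time.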
The explicit member functions on std::atomic<T>:
- .load(): read the current value.
- .store(v): write a new value.
- .fetch_add(n): atomically add n, return the old value.
- .exchange(v): atomically replace with v, return the old value.
- .compare_exchange_weak(expected, desired): if the current value equals expected, replace it with desired and return true; otherwise write the current value into expected and return false. “Weak” means it may spuriously fail.
- .compare_exchange_strong(expected, desired): same, but never spuriously fails.
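A single-threaded walkthrough of these member functions (a sketch to show the return-value conventions, in particular how a failed compare-exchange writes the current value back into expected):

```cpp
#include <atomic>
#include <cassert>

int atomic_api_walkthrough() {
    std::atomic<int> a{10};
    assert(a.load() == 10);

    a.store(20);

    int old = a.fetch_add(5);            // returns the value BEFORE the add
    assert(old == 20 && a.load() == 25);

    old = a.exchange(100);               // replace, return the old value
    assert(old == 25 && a.load() == 100);

    int expected = 100;
    bool ok = a.compare_exchange_strong(expected, 7);  // matches: swap happens
    assert(ok && a.load() == 7);

    expected = 999;                      // does NOT match the current value (7)
    ok = a.compare_exchange_strong(expected, 42);      // fails...
    assert(!ok && expected == 7);        // ...and writes 7 back into expected

    return a.load();                     // still 7: the failed CAS changed nothing
}
```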
Memory Order Options
Every atomic operation accepts an optional std::memory_order parameter that controls what ordering guarantees the operation provides. The options, from weakest to strongest:
- memory_order_relaxed: no ordering guarantee beyond atomicity. The operation is indivisible, but no happens-before edge is established. Use only when you need atomicity for bookkeeping (e.g., a simple hit counter) and the value has no dependency on other shared data.
- memory_order_release / memory_order_acquire: the release/acquire pair is the fundamental synchronisation mechanism. A release store (on the writer) establishes a happens-before edge to any subsequent acquire load (on the reader) that observes the stored value. Everything the writer did before the release is visible to the reader after the acquire.
- memory_order_seq_cst: sequentially consistent ordering, the default and the strongest. All seq_cst operations appear to execute in a single total order visible to all threads. This is the easiest model to reason about and the one you should use unless you have a specific reason to do otherwise.
The Publish/Consume Pattern
The producer/consumer example from earlier, fixed correctly with atomics:
int data = 0;
std::atomic<bool> ready{false};
// Thread 1 (producer):
data = 42; // plain store -- fine here
ready.store(true, // release store: guarantees data=42
std::memory_order_release); // is visible after the acquire load
// Thread 2 (consumer):
while (!ready.load(std::memory_order_acquire)) {}
// acquire load: everything before the release is now visible
assert(data == 42); // guaranteed to pass
The release store on ready and the acquire load on ready form a synchronisation pair. When Thread 2’s acquire load reads true, a happens-before edge exists from Thread 1’s write of data = 42 to Thread 2’s read of data.
A Spinlock with Compare-Exchange
Compare-exchange is the building block for lock-free data structures. A spinlock is the simplest example: instead of calling the OS, a thread spins in a tight loop until it can atomically flip a flag from false to true:
class Spinlock {
std::atomic<bool> locked_{false};
public:
void lock() {
bool expected = false;
// Spin until we successfully flip false -> true
while (!locked_.compare_exchange_weak(
expected, true,
std::memory_order_acquire,
std::memory_order_relaxed))
expected = false; // reset: CAS wrote current value into expected
}
void unlock() {
locked_.store(false, std::memory_order_release);
}
};
A spinlock is appropriate only for very short critical sections where the wait time is reliably shorter than the overhead of a real OS mutex. For anything longer, use std::mutex.
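Because Spinlock provides lock() and unlock(), it satisfies the BasicLockable requirements and works directly with std::lock_guard. A sketch (the Spinlock definition is repeated so the example is self-contained; the counts are arbitrary):

```cpp
#include <atomic>
#include <mutex>    // std::lock_guard
#include <thread>

class Spinlock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        bool expected = false;
        while (!locked_.compare_exchange_weak(
                   expected, true,
                   std::memory_order_acquire,
                   std::memory_order_relaxed))
            expected = false;  // CAS wrote the current value into expected
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

Spinlock spin;
int shared_total = 0;  // plain int: the spinlock provides the ordering

int run_spinlock_demo() {
    auto work = [] {
        for (int i = 0; i < 100'000; ++i) {
            std::lock_guard<Spinlock> guard{spin};  // BasicLockable: just works
            ++shared_total;                         // very short critical section
        }
    };
    std::thread t1{work}, t2{work};
    t1.join();
    t2.join();
    return shared_total;  // 200'000: no lost updates
}
```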
std::atomic_flag — The Truly Lock-Free Primitive
std::atomic<bool> is almost always lock-free on modern hardware, but the standard only guarantees lock-freedom for std::atomic_flag. It is the minimal atomic type: it has no load() method, only test_and_set() (atomically sets the flag and returns the old value) and clear().
std::atomic_flag lock = ATOMIC_FLAG_INIT;
void spin_lock() {
while (lock.test_and_set(std::memory_order_acquire)) {}
}
void spin_unlock() {
lock.clear(std::memory_order_release);
}
C++20 adds .test() so you can inspect the flag without setting it, and .wait() / .notify_one() / .notify_all() for efficient waiting (backed by futex on Linux rather than a spin loop).
std::atomic_flag is the right building block when you need a portable guarantee of lock-freedom. For everything else, std::atomic<int> or a real mutex is cleaner.
is_lock_free and ATOMIC_xxx_LOCK_FREE
You can ask at runtime whether a particular atomic type is lock-free:
std::atomic<int> a;
std::cout << a.is_lock_free() << '\n'; // typically 1 on modern hw
// Compile-time check (macro):
static_assert(ATOMIC_INT_LOCK_FREE == 2,
"atomic<int> must be lock-free on this platform");
The macro values are 0 (never), 1 (sometimes, depends on alignment), or 2 (always). For large T (e.g., a 128-byte struct), std::atomic<T> may fall back to a hidden internal mutex; the operations are still safe and correct, just no longer truly lock-free.
Aside (CS 241 connection: hardware memory barriers). In CS 241 (Foundations of Sequential Programs) you saw how the processor executes load and store instructions, and how the memory hierarchy works. On x86, most operations have acquire/release semantics for free because x86 has a relatively strong memory model (TSO — Total Store Order). On ARM and POWER, the hardware memory model is much weaker and explicit barrier instructions (dmb, isb, dsb on ARM; lwsync, sync on POWER) are required to enforce ordering.
std::memory_order_acquire compiles to ld + dmb ld on ARM; std::memory_order_release compiles to dmb st + str. On x86, neither emits an extra instruction beyond the load or store itself. The C++ memory model abstracts these hardware differences behind a portable vocabulary.
Aside (CS 343 connection: μC++ and the simpler model). μC++ wraps all shared state in monitors and tasks with well-defined communication points. The programmer never sees acquire/release semantics directly because the runtime inserts the right barriers at every monitor entry and exit.
std::atomic exposes the machine model directly. This is powerful — you can write lock-free data structures that are faster than anything a monitor can achieve — but it is also dangerous. Weak memory orders are an expert tool. Unless you are building lock-free infrastructure, use memory_order_seq_cst (the default) or just use a mutex.
A simplified shared-pointer reference count using atomics. This is essentially what std::shared_ptr does internally.
struct RefCounted {
std::atomic<int> refcount{1};
void retain() {
refcount.fetch_add(1, std::memory_order_relaxed);
// relaxed: we only need atomicity, not ordering
}
void release() {
// release: destructor must see all prior writes
if (refcount.fetch_sub(1, std::memory_order_release) == 1) {
// acquire fence: ensure we see all prior releases
std::atomic_thread_fence(std::memory_order_acquire);
delete this;
}
}
};
The retain uses relaxed because incrementing a counter has no ordering dependency on anything else. The release uses release so that all writes made before the decrement are visible to the thread that eventually runs delete this. The acquire fence before the delete ensures those writes are observed. This is the pattern used in the C++ standard library’s own shared_ptr.
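A driver for the pattern (a sketch; the destroyed flag is an observation hook added here, not part of the idiom): several threads drop their references concurrently, and exactly one of them performs the delete.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<bool> destroyed{false}; // observation hook for this sketch only

struct RefCounted {
    std::atomic<int> refcount{1};
    void retain()  { refcount.fetch_add(1, std::memory_order_relaxed); }
    void release() {
        if (refcount.fetch_sub(1, std::memory_order_release) == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
            delete this;
        }
    }
    ~RefCounted() { destroyed.store(true); }
};

bool share_across_threads() {
    RefCounted *obj = new RefCounted;   // refcount == 1 (creator's reference)
    std::vector<std::thread> users;
    for (int i = 0; i < 4; ++i) {
        obj->retain();                  // take a reference BEFORE handing it out
        users.emplace_back([obj] { obj->release(); });
    }
    for (auto &t : users) t.join();
    obj->release();                     // drop the creator's reference
    return destroyed.load();            // true: exactly one release hit zero
}
```

Note the retain happens on the creating thread, before the worker is launched; handing out a pointer first and retaining later would race against a concurrent release.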
Note. Unless you are writing lock-free data structures or high-performance synchronisation primitives, use memory_order_seq_cst (the default) or use a mutex. The weaker orderings — particularly relaxed and the acquire/release pair — require careful reasoning about every synchronisation edge in your program. Getting them wrong produces intermittent bugs that only manifest on non-x86 hardware.
Problem 53: Producer-Consumer
Futures (Problem 51) are great for one-shot work: launch a computation, collect its result. But many real programs have a different shape: one part of the program continuously generates work items and another part continuously processes them. Think of a web server reading requests from a socket and handing them to worker threads, or a compiler’s lexer feeding tokens to a parser. We need two threads to communicate not once but indefinitely, with one side potentially outrunning the other.
The data structure in the middle is a bounded blocking queue: a fixed-capacity queue that makes the producer wait when the queue is full, and makes the consumer wait when the queue is empty. Building this correctly requires a new synchronisation primitive: the condition variable.
The Naive Approach and Why It Fails
Let us try to build a blocking queue with just a mutex and a spin loop:
template<typename T>
class NaiveQueue {
std::deque<T> buf_;
std::mutex mtx_;
size_t limit_;
public:
NaiveQueue(size_t limit) : limit_{limit} {}
void push(T val) {
while (true) {
std::lock_guard guard{mtx_};
if (buf_.size() < limit_) { buf_.push_back(std::move(val)); return; }
} // spin forever when full -- wastes CPU
}
};
This is wrong in two ways. First, the producer hammers the mutex in a tight loop, acquiring and releasing it on every iteration, so the consumer must fight for every chance to acquire the mutex and drain the queue. Second, even if we spin outside the lock, we waste CPU time that could be given to the consumer. What we want is for the producer to sleep when the queue is full and be woken up when space becomes available. That is exactly what a condition variable provides.
std::condition_variable
- A std::condition_variable (in <condition_variable>) allows a thread to sleep until a predicate becomes true. It must be used together with a std::unique_lock<std::mutex>: the thread atomically releases the lock and goes to sleep, then re-acquires the lock before returning to the caller.
The core operations:
- cv.wait(lock, pred): atomically releases lock and sleeps; wakes up when cv.notify_one() or cv.notify_all() is called and pred() returns true. Re-acquires the lock before returning.
- cv.notify_one(): wakes up one sleeping thread (if any).
- cv.notify_all(): wakes up all sleeping threads.
The predicate version of wait is essential because of spurious wakeups: the operating system is permitted to wake a sleeping thread for no reason. Without the predicate, the thread would proceed even though the condition might still be false. With the predicate, wait loops internally until it is satisfied:
// cv.wait(lock, pred) is exactly equivalent to:
while (!pred()) cv.wait(lock);
Always use the predicate form. Never use bare cv.wait(lock).
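A minimal handoff using the predicate form (a sketch with hypothetical names): one thread publishes a value under the mutex and notifies; the waiter sleeps until the predicate holds, immune to spurious wakeups.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool ready = false;   // the predicate's data -- always touched under the mutex
int payload = 0;

int wait_for_payload() {
    std::thread producer{[] {
        {
            std::lock_guard guard{m};
            payload = 42;
            ready = true;     // update predicate data while holding the lock
        }
        cv.notify_one();      // notify after releasing the lock
    }};
    int result;
    {
        std::unique_lock lk{m};
        cv.wait(lk, [] { return ready; }); // loops internally on spurious wakeups
        result = payload;     // lock is held again here: safe to read
    }
    producer.join();
    return result;
}
```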
A Complete Bounded Blocking Queue
#include <mutex>
#include <condition_variable>
#include <deque>
template<typename T>
class BoundedQueue {
std::mutex mtx_;
std::condition_variable not_full_;
std::condition_variable not_empty_;
std::deque<T> buf_;
const size_t limit_;
public:
explicit BoundedQueue(size_t limit) : limit_{limit} {}
void push(T val) {
std::unique_lock lk{mtx_};
not_full_.wait(lk, [this]{ return buf_.size() < limit_; });
buf_.push_back(std::move(val));
not_empty_.notify_one();
}
T pop() {
std::unique_lock lk{mtx_};
not_empty_.wait(lk, [this]{ return !buf_.empty(); });
T val = std::move(buf_.front());
buf_.pop_front();
not_full_.notify_one();
return val;
}
};
Walk through push: we acquire the lock, then sleep on not_full_ until there is space. When wait returns we still hold the lock, so we can safely push. After pushing we notify one thread waiting on not_empty_ — typically the consumer.
Walk through pop: mirror image. We sleep on not_empty_ until there is something to consume. After consuming we notify one thread waiting on not_full_ — typically the producer.
Note that we use std::unique_lock, not std::lock_guard. This is required: cv.wait must be able to unlock the mutex while sleeping, and lock_guard does not support that.
The Producer-Consumer Pattern
With BoundedQueue in place, the producer-consumer pattern is clean:
BoundedQueue<int> q{100};
// Producer thread: generates work
std::thread producer{[&q]() {
for (int i = 0; i < 10'000; ++i)
q.push(i);
}};
// Consumer thread: processes work
std::thread consumer{[&q]() {
for (int i = 0; i < 10'000; ++i) {
int val = q.pop();
process(val);
}
}};
producer.join();
consumer.join();
The producer never burns CPU when the queue is full; it sleeps. The consumer never burns CPU when the queue is empty; it sleeps too. The OS scheduler wakes them exactly when there is something to do.
Shutdown: Telling the Consumer to Stop
The example above works because both threads know the exact iteration count. In real code, the producer finishes at an unpredictable time and the consumer must be told to stop. A common approach is a sentinel value:
// Producer signals done with a sentinel
const int DONE = -1;
std::thread producer{[&q]() {
for (int i = 0; i < n; ++i) q.push(i);
q.push(DONE); // sentinel
}};
std::thread consumer{[&q]() {
while (true) {
int val = q.pop();
if (val == DONE) break;
process(val);
}
}};
For multiple consumers, push one sentinel per consumer. A cleaner alternative is to add a close() method to the queue (Problem 54 shows this with std::stop_token).
Multiple Consumers: Thread Pools
Sometimes a single consumer cannot keep up with the producer. The natural solution is a pool of consumer threads that all pop from the same queue:
BoundedQueue<int> q{512};
const int NWORKERS = 4;
std::vector<std::thread> workers;
for (int i = 0; i < NWORKERS; ++i) {
workers.emplace_back([&q]() {
while (true) {
int val = q.pop();
if (val == -1) { q.push(-1); break; } // propagate sentinel
process(val);
}
});
}
// producer fills q; all workers drain it concurrently
Each worker sleeps on not_empty_ when the queue is empty. When the producer pushes an item, notify_one() wakes exactly one worker — the one that will process the item. At shutdown, the sentinel trick above passes the signal from worker to worker automatically.
The notify_one() / notify_all() choice matters here. If you call notify_all() after every push, all four workers wake up, four compete for the lock, three find the queue empty (or the item already taken) and go back to sleep. That is wasted context switching. notify_one() is strictly better when only one waiter can make progress. Use notify_all() only after a close() or when you genuinely want every waiter to re-evaluate:
void BoundedQueue<T>::push(T val) {
std::unique_lock lk{mtx_};
not_full_.wait(lk, [this]{ return buf_.size() < limit_; });
buf_.push_back(std::move(val));
not_empty_.notify_one(); // wake exactly one consumer
}
void BoundedQueue<T>::close() {
std::lock_guard guard{mtx_};
closed_ = true;
not_empty_.notify_all(); // wake ALL consumers so they can exit
not_full_.notify_all(); // wake any blocked producers too
}
Note. Use notify_one when there is at most one thread that can usefully proceed — for instance, pushing one item wakes one consumer. Use notify_all when every waiting thread needs to re-check its predicate — for instance, closing the queue means every consumer should wake up and discover that the queue is done.
Aside (CS 343 connection: bounded buffer in μC++). CS 343 introduces the bounded buffer as a canonical monitor example. In μC++, a _Monitor with condition variables named NotFull and NotEmpty maps one-to-one to our BoundedQueue: wait(NotFull) corresponds to not_full_.wait(lk, ...), and signal(NotEmpty) corresponds to not_empty_.notify_one(). The C++ version differs in one important way: μC++ condition variables use signal-and-urgent-wait semantics (the signalling thread re-enters the monitor only after the signalled thread exits), while std::condition_variable uses signal-and-continue semantics (the signalling thread keeps running; the signalled thread competes for the lock normally). This is why the predicate re-check in wait is mandatory in C++ but sometimes optional in μC++.
One thread loads images from disk; another processes them. The queue decouples the I/O speed from the processing speed.
BoundedQueue<Image> q{8}; // buffer up to 8 images
std::thread loader{[&q]() {
for (auto &path : image_paths) {
q.push(load_image(path)); // sleeps when queue is full
}
q.push(Image{}); // empty sentinel
}};
std::thread processor{[&q]() {
while (true) {
Image img = q.pop(); // sleeps when queue is empty
if (img.empty()) break;
process_image(img);
}
}};
loader.join();
processor.join();
If loading is faster than processing, the loader sleeps when the 8-slot buffer fills up. If processing is faster, the processor sleeps waiting for the next image. The queue acts as a shock absorber between the two rates.
Problem 54: Joinable Threads Done Right
In Problem 49 we noticed that std::thread has a nasty destructor: if the thread is still joinable when the destructor runs, the program calls std::terminate. We fixed it by writing JoinThread, a small RAII wrapper that calls join() in its destructor. We also noted at the time that C++20 ships std::jthread, which does exactly that — and more. Problem 53 left us with a second unsolved problem: gracefully stopping a long-running worker thread. std::jthread’s cooperative cancellation mechanism, built around std::stop_token, solves that too.
std::jthread — std::thread with Automatic Join
std::jthread (in <thread> since C++20) is a drop-in replacement for std::thread with one critical difference: its destructor automatically requests a stop and joins the thread. You no longer need JoinThread:
#include <thread>
void f() {
std::jthread t{background_work};
do_some_work(); // may throw
} // t's destructor: requests stop, then joins -- no std::terminate
Every std::jthread owns a std::stop_source internally. Its destructor calls request_stop() on that source before joining. If the thread’s callable ignores stop requests, the destructor simply waits for the callable to finish normally — so std::jthread is backwards compatible with any callable that does not take a std::stop_token.
std::stop_token and Cooperative Cancellation
std::jthread would not be much more than a RAII wrapper if that were all it did. Its real power is cooperative cancellation: a way for one thread to politely ask another thread to stop, and a way for the worker thread to check whether it has been asked.
The mechanism has three parts:
- std::stop_source: the write end. Holds the shared cancellation state. Call .request_stop() to signal that the worker should stop.
- std::stop_token: the read end. Handed to the worker thread. Call .stop_requested() to check whether a stop has been requested.
- std::stop_callback: registers a function to be called automatically when a stop is requested. Useful for waking up a sleeping thread.
#include <thread>
#include <stop_token>
void worker(std::stop_token stoken) {
while (!stoken.stop_requested()) {
do_one_unit_of_work();
}
// clean up and exit gracefully
}
int main() {
std::jthread t{worker};
std::this_thread::sleep_for(std::chrono::seconds{5});
// jthread destructor calls request_stop() then join()
// -- or call t.request_stop() manually earlier
}
When the callable’s first parameter is std::stop_token, std::jthread automatically passes its internal token as that first argument. You do not need to thread the token through yourself.
- Cooperative cancellation means the requesting thread sets a flag and the worker thread voluntarily checks that flag and exits. Nobody is forcefully interrupted. Contrast with preemptive cancellation, where the OS or runtime kills the thread at an arbitrary point.
Why Cooperative, Not Preemptive?
Forcefully terminating a thread is dangerous. The thread might be holding a mutex (now locked forever), in the middle of updating a data structure (invariant broken), or halfway through a file write (data corrupted). There is no safe point for the runtime to step in and pull the rug out.
Cooperative cancellation is safe because the worker chooses its own stopping points. It checks the token between logical operations, after releasing any locks and restoring any invariants. The calling code that sets the stop flag simply waits; the worker exits cleanly on its own schedule.
void safe_worker(std::stop_token st) {
while (!st.stop_requested()) {
{
std::lock_guard guard{mtx};
process_next_item(); // lock released here
}
// invariant restored -- safe to check the token here
}
finalise(); // always runs, cleans up resources
}
Using stop_source Independently of jthread
std::stop_source and std::stop_token are ordinary library types. You can use them independently of std::jthread — for instance, to broadcast a cancellation signal to a whole group of threads:
std::stop_source source;
// Distribute the same token to multiple workers:
auto token = source.get_token();
std::vector<std::jthread> workers;
for (int i = 0; i < 4; ++i)
workers.emplace_back([token]() { // capture by value!
while (!token.stop_requested())
do_work();
});
// Later: request all four workers to stop
source.request_stop();
// jthread destructors join (workers have already started exiting)
Capturing the token by value is important. std::stop_token is a lightweight reference-counted handle to the shared stop state — copying it is cheap and each copy refers to the same underlying flag.
std::condition_variable_any and Integrated Stop Waiting
C++20 adds a new overload of wait to std::condition_variable_any (the generalised variant that works with any BasicLockable, not just std::mutex) that accepts a std::stop_token directly. The wait exits either when the condition is satisfied or when a stop is requested, without needing a stop_callback:
std::condition_variable_any cv;
std::mutex mtx;
std::deque<int> buf;
void consumer(std::stop_token st) {
std::unique_lock lk{mtx};
// Waits until buf non-empty OR stop requested:
while (cv.wait(lk, st, [&]{ return !buf.empty(); })) {
int v = buf.front(); buf.pop_front();
lk.unlock();
process(v);
lk.lock();
}
// Here: stop was requested (or spurious exit handled by predicate)
}
When a stop is requested, the three-argument wait wakes up and returns false (the predicate was not satisfied — the stop fired instead). This eliminates the need for a separate stop_callback to poke the condition variable.
std::stop_callback — Waking a Sleeping Thread
A worker that sleeps on a condition variable will not notice a stop request until it wakes up. std::stop_callback bridges this gap: register a callback that notifies the condition variable when a stop is requested:
void queue_worker(std::stop_token st, BoundedQueue<int> &q) {
// When stop is requested, wake us up even if queue is empty
std::stop_callback on_stop{st, [&q]{ q.close(); }};
while (!st.stop_requested()) {
auto item = q.try_pop(); // returns std::optional<int>
if (!item) break;
process(*item);
}
}
The callback runs on whatever thread calls request_stop() — it must be thread-safe and fast. Notifying a condition variable or setting a flag is ideal; doing substantial work inside a stop_callback is not.
Solving the Bounded Queue Shutdown Problem
Problem 53 ended with a workaround: push a sentinel value to tell the consumer to stop. Sentinels are fragile — they require the type to have a designated “poison” value, and with multiple consumers you need one sentinel per consumer. With std::jthread and std::stop_token we can solve this cleanly by giving the queue a close() method:
template<typename T>
class BoundedQueue {
// ... (fields as in Problem 53) ...
bool closed_ = false;
public:
void close() {
std::lock_guard guard{mtx_};
closed_ = true;
not_empty_.notify_all();
not_full_.notify_all();
}
std::optional<T> pop() {
std::unique_lock lk{mtx_};
not_empty_.wait(lk, [this]{
return !buf_.empty() || closed_;
});
if (buf_.empty()) return std::nullopt;
T val = std::move(buf_.front());
buf_.pop_front();
not_full_.notify_one();
return val;
}
};
Now the producer uses a std::jthread with a stop callback that closes the queue:
BoundedQueue<int> q{64};
std::jthread producer{[&q](std::stop_token st) {
std::stop_callback sc{st, [&q]{ q.close(); }};
for (int i = 0; !st.stop_requested(); ++i)
q.push(i);
}};
std::jthread consumer{[&q]() {
while (auto item = q.pop())
process(*item);
}};
std::this_thread::sleep_for(std::chrono::seconds{2});
// Destructors call request_stop() + join() in reverse order:
// consumer first, then producer -- or request them manually
Aside (CS 343 connection: cooperative tasks in μC++). μC++ tasks respond to signals cooperatively. A task can poll for a signal at defined checkpoints — exactly the same idea as stop_requested(). The μC++ framework also provides a uBaseTask::cancel() mechanism, which sets a flag that the task checks at its own discretion.
std::stop_token is the C++20 standardisation of the same idea. The difference is that μC++ builds the signal-checking framework into the task abstraction at the language level, while C++ exposes it as a library type you opt into explicitly.
A worker that processes items from a queue, stopping gracefully when asked:
#include <thread>
#include <stop_token>
void run_worker(BoundedQueue<Task> &q) {
std::jthread worker{[&q](std::stop_token st) {
std::stop_callback sc{st, [&q]{ q.close(); }};
while (auto task = q.pop())
task->execute();
// exits here when queue is closed and empty
}};
// ... submit tasks to q from the main thread ...
std::this_thread::sleep_for(std::chrono::seconds{10});
// jthread destructor: request_stop() wakes the worker,
// then join() waits for it to finish cleanly
}
No sentinel values. No manual join() call. No std::terminate hazard. The thread lifecycle is entirely managed by RAII.
Note. A stop request is a flag, not an interrupt: request_stop() sets a flag in the shared state, and the worker thread must check std::stop_token::stop_requested() itself. If the worker never checks, the stop request is ignored and the thread runs to completion normally — which is fine, because std::jthread’s destructor simply joins without timing out.
Problem 55: Parallel Algorithms
We have been using std::sort, std::transform, std::for_each, std::accumulate, and friends since Problem 17. All of that ran on a single thread, processing one element at a time. Problems 49 through 54 taught us the machinery of concurrency: threads, mutexes, futures, atomics, condition variables. Now C++17 brings those two worlds together with one small addition: an execution policy parameter. Pass the right policy to a standard algorithm and the library can distribute the work across all available CPU cores.
Execution Policies
C++17 adds three execution policies to <execution>:
- std::execution::seq: sequential execution, exactly as before. This is the default when no policy is specified and is provided mainly for uniformity — so you can parameterise on policy without a special case.
- std::execution::par: parallel execution. The algorithm may spread work across a thread pool. Individual operations still run one at a time within a thread, but operations from different iterations may run concurrently.
- std::execution::par_unseq: parallel and vectorised (SIMD). The algorithm may also interleave operations within a single thread using SIMD instructions. This is the fastest policy but the most restrictive: the callable must not acquire locks, call non-SIMD-safe functions, or access non-local memory that might alias.
Using a policy is syntactically trivial: pass it as the first argument.
#include <algorithm>
#include <execution>
#include <vector>
std::vector<int> v(10'000'000);
std::iota(v.begin(), v.end(), 0);
// Sequential (default behaviour):
std::sort(v.begin(), v.end());
// Parallel:
std::sort(std::execution::par, v.begin(), v.end());
// Parallel + vectorised:
std::sort(std::execution::par_unseq, v.begin(), v.end());
The results are semantically identical to the sequential version: the sorted order is the same, elements are moved correctly, and exception safety is preserved (with some caveats discussed later).
std::reduce vs std::accumulate
std::accumulate processes elements strictly left-to-right, in order. That ordering prevents parallelisation: each step depends on the previous result.
std::reduce (also C++17) requires that the combining operation is both associative and commutative. When those conditions hold, the library can split the range into chunks, reduce each chunk in parallel, and then combine the partial results:
#include <numeric>
#include <execution>
std::vector<double> v(10'000'000, 1.0);
// Sequential sum -- left-to-right, always:
double sum_seq = std::accumulate(v.begin(), v.end(), 0.0);
// Parallel sum -- chunks combined in any order:
double sum_par = std::reduce(
std::execution::par, v.begin(), v.end(), 0.0);
The two results may differ by tiny floating-point rounding errors because floating-point addition is not perfectly associative. For integer types they will be identical. std::transform_reduce combines a map step and a reduce step in a single parallel pass. Here is a parallel dot product:
#include <numeric>
#include <execution>
std::vector<double> a(N), b(N);
// ... fill a and b ...
// Sequential dot product:
double dot_seq = std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
// Parallel dot product:
double dot_par = std::transform_reduce(
std::execution::par,
a.begin(), a.end(), // first range
b.begin(), // second range
0.0, // identity for the reduction
std::plus<>{}, // reduce operation (sum)
std::multiplies<>{}); // transform operation (multiply pairwise)
std::transform_reduce is strictly more general than std::reduce: you can supply any binary transform and any binary reduction, not just plus and identity. Use it whenever you need a map-reduce in one pass without materialising an intermediate vector.
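A common map-reduce shape is counting elements that satisfy a predicate: transform each element to a 0/1 flag, then sum the flags. A sketch (shown with the sequential overload so it compiles without a parallel backend; passing std::execution::par as the first argument parallelises it):

```cpp
#include <numeric>
#include <vector>

// Count even numbers: map each element to a 0/1 flag, then sum the flags.
long count_even(const std::vector<int> &v) {
    return std::transform_reduce(
        v.begin(), v.end(),
        0L,                                           // identity for the sum
        std::plus<>{},                                // reduce: add the flags
        [](int x) { return x % 2 == 0 ? 1L : 0L; });  // transform: predicate -> flag
}
```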
Thread Safety of the Callable
The algorithms guarantee that the algorithm itself is correct. They do not make your callable thread-safe for you. If the callable reads or writes shared mutable state, you get the same data races as with manual thread code:
int bad_count = 0;
// DATA RACE: multiple threads increment bad_count without a lock
std::for_each(std::execution::par, v.begin(), v.end(),
[&](int x) { if (x > 0) ++bad_count; });
// Correct: use an atomic
std::atomic<int> good_count{0};
std::for_each(std::execution::par, v.begin(), v.end(),
[&](int x) { if (x > 0) ++good_count; });
The callable also must not acquire a non-recursive mutex while running under par_unseq: interleaved SIMD-style execution on a single thread cannot block waiting for a lock. par (without unseq) does allow mutex acquisition, but beware of deadlock if the same mutex is also acquired outside the algorithm.
Exception Handling Under Parallel Policies
When an algorithm is called without an execution policy, exceptions propagate out of it normally. Under any execution policy (including std::execution::seq), an exception escaping the callable causes the algorithm to call std::terminate. The rationale: in a parallel context, multiple exceptions might be in flight at once and there is no sensible way to combine them into a single exception to propagate.
If your callable can throw and you need to handle those exceptions, either catch them inside the callable and encode failure in the return value, or call the algorithm without an execution policy.
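As a sketch of the catch-inside-the-callable option, here is a hypothetical parse_all helper that encodes failure as std::optional. Because no exception ever escapes the callable, the same lambda would work unchanged under a parallel policy:

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Parse each string; encode failure as std::nullopt instead of throwing.
// With std::execution::par added as the first argument, the callable would
// work unchanged, because no exception ever escapes it.
std::vector<std::optional<int>> parse_all(const std::vector<std::string> &in) {
    std::vector<std::optional<int>> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
        [](const std::string &s) -> std::optional<int> {
            try {
                return std::stoi(s);   // may throw std::invalid_argument
            } catch (...) {
                return std::nullopt;   // failure becomes an ordinary value
            }
        });
    return out;
}
```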
When Parallel Wins and When It Loses
Parallelism has overhead: splitting the range, scheduling threads on a pool, and combining results all cost time. The parallel version only wins when the work saved exceeds that overhead.
Parallel wins when:
- The dataset is large — millions of elements, not thousands.
- The per-element work is CPU-bound (computation, not I/O).
- There are no dependencies between iterations.
- The machine has multiple idle cores.
Parallel loses when:
- The dataset is small — thread-pool overhead dominates.
- The work is I/O-bound — all threads are waiting on disk or network, not on the CPU.
- Iterations share mutable state heavily — contention on locks or atomics serialises the work.
The only reliable way to know is to measure. A parallel sort of 100 elements is slower than the sequential version on every machine. A parallel sort of 100,000,000 elements is typically 2–8x faster depending on core count and memory bandwidth.
Not All Algorithms Parallelise Equally
Most standard algorithms in <algorithm> and <numeric> that take a pair of iterators gained a parallel overload in C++17. But they are not all equally amenable to parallelism. Here is a rough guide:
- Excellent candidates: std::sort, std::for_each, std::transform, std::reduce, std::transform_reduce, std::find_if (early exit still works in parallel — the library abandons remaining threads once one finds a match), std::count_if.
- Good, but check your callable: std::copy_if, std::generate, std::fill. These are memory-bandwidth bound; parallelism helps on NUMA systems but may not on single-socket machines where all cores share the same memory bus.
- Sequential semantics required: std::accumulate (use std::reduce instead), any algorithm that processes a forward-linked list (the library cannot split it at arbitrary points), anything whose callable has a side effect on a shared counter without atomics.
For side-effect-free callables, the C++ standard guarantees that a parallel algorithm produces the same result as its sequential counterpart (for std::reduce, the same result up to the reassociation the operation permits, which is why floating-point sums can differ in the last bits). The only other observable difference is wall-clock time. This guarantee is what makes the one-word upgrade from sequential to parallel safe: you do not need to rethink the algorithm, only ensure the callable is thread-safe.
Aside (CS 343 connection: task-based parallelism). In CS 343, μC++ tasks communicate by message-passing: each task is an independent thread with its own mailbox. Parallel algorithms use a different model: a thread pool picks up work items from a shared queue. The two models have the same expressive power but different tradeoffs. Message-passing is cleaner for workflows with complex data dependencies; thread pools are more efficient for homogeneous, independent work like applying the same function to every element of an array.
Aside (CS 343 / CS 370 connection: associativity and parallelism). The requirement that std::reduce’s operation be associative is not arbitrary. CS 370 (Numerical Computation) teaches that floating-point operations are not truly associative due to rounding. std::accumulate gives a reproducible result because it always evaluates left-to-right. std::reduce is allowed to evaluate in any tree-shaped order, so its floating-point result is platform-dependent. For numerical applications, use std::accumulate if reproducibility matters more than speed, or use par with a compensated summation algorithm.
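A minimal sketch of one compensated summation algorithm, Kahan summation, which carries the rounding error of each addition in a separate compensation term:

```cpp
#include <vector>

// Kahan (compensated) summation: tracks the rounding error lost at each
// addition in a separate term, recovering most of the precision that a
// plain left-to-right or tree-shaped sum would discard.
double kahan_sum(const std::vector<double> &v) {
    double sum = 0.0;
    double c   = 0.0;            // running compensation for lost low-order bits
    for (double x : v) {
        double y = x - c;        // fold the previous error into the new term
        double t = sum + y;      // low bits of y may be lost here...
        c = (t - sum) - y;       // ...but are recovered into c
        sum = t;
    }
    return sum;
}
```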
Worked example: a parallel image filter
Apply a filter to every pixel in an image using a parallel for_each:
#include <algorithm>
#include <execution>
struct Pixel { uint8_t r, g, b, a; };
void apply_filter(std::vector<Pixel> &image) {
std::for_each(
std::execution::par_unseq,
image.begin(), image.end(),
[](Pixel &p) {
// Convert to greyscale in place -- no shared state
uint8_t grey = static_cast<uint8_t>(
0.299f*p.r + 0.587f*p.g + 0.114f*p.b);
p.r = p.g = p.b = grey;
});
}
Each pixel is processed independently, so par_unseq is correct and safe. On a modern CPU with 8 cores and SIMD, this can process a 24-megapixel image tens of times faster than the sequential version.
Note. Availability of parallel execution policies varies by compiler and platform. MSVC supports them out of the box on Windows. GCC and Clang require linking against Intel’s TBB library (-ltbb). If TBB is not present, the parallel policies silently fall back to sequential execution on some implementations. You can check for support at compile time with the feature macro __cpp_lib_execution (201603L is the C++17 value that introduced the parallel policies):
#if defined(__cpp_lib_execution) && __cpp_lib_execution >= 201603L
    std::sort(std::execution::par, v.begin(), v.end());
#else
    std::sort(v.begin(), v.end());
#endif
Problem 56: Type Erasure
Problem 20 gave us abstract classes: a common base type so that unrelated
concrete classes can be manipulated through a single pointer. Problem 27 gave
us CRTP for zero-overhead static polymorphism. Problem 28 used virtual
clone to copy objects whose type we do not know at compile time. All of
these techniques require the concrete types to share a base class.
But std::function<void(int)> can hold a plain function pointer, a lambda,
a functor, or a std::bind result — and none of those types share a base
class. Yet a single std::function object holds any of them and calls it
with identical syntax. How? The answer is type erasure.
Type erasure is also what makes std::any work: it stores a value of
any type and lets you retrieve it later by casting. It is what makes
std::variant in combination with std::visit dispatch correctly over
unrelated types. And — if you squint — it is what virtual dispatch has
always been doing: the vtable pointer inside a polymorphic object is the
erased type, recorded at construction.
Definition (type erasure). Any technique that allows objects of unrelated concrete types to be used through a uniform interface without imposing a common base class on those concrete types. The concrete type is recorded at construction time and “erased” from the static type that the outside world sees.
The problem: storing arbitrary callables
Suppose you are writing an event system. You want to store a list of callbacks
to fire when an event occurs. The callbacks all have signature void(int),
but they come from many sources:
void onEvent(int x) { /* free function */ }
struct Handler { void operator()(int x) { /* functor */ } };
auto lam = [capture](int x) { /* lambda */ };
These types are completely unrelated. You cannot put them in a
vector<SomeBase*> because no such base exists. But you want exactly
that: a container of things you can call with an int.
Step 1: hand-rolling type erasure
The trick is to introduce the hierarchy ourselves — not on the callable types (which we do not own), but on a private wrapper type we control.
struct CallableBase {
virtual void operator()(int) = 0;
virtual ~CallableBase() = default;
};
template<typename F>
struct CallableImpl : CallableBase {
F f;
CallableImpl(F f) : f{std::move(f)} {}
void operator()(int x) override { f(x); }
};
Now write a public wrapper that stores a unique_ptr<CallableBase> and
constructs the right CallableImpl on the fly inside its constructor:
class Function {
std::unique_ptr<CallableBase> ptr;
public:
template<typename F>
Function(F f)
: ptr{std::make_unique<CallableImpl<F>>(std::move(f))} {}
void operator()(int x) { (*ptr)(x); }
};
The concrete type F is captured inside CallableImpl<F> on the heap.
The Function object itself only ever holds a CallableBase* — the
type has been erased.
Function f1{onEvent}; // wraps a free function
Function f2{Handler{}}; // wraps a functor
Function f3{[](int x){ std::cout << x; }}; // wraps a lambda
std::vector<Function> callbacks;
callbacks.push_back(std::move(f1));
callbacks.push_back(std::move(f2));
callbacks.push_back(std::move(f3));
for (auto &cb : callbacks) cb(42); // dispatches correctly for all three
This is what std::function does
std::function<void(int)> is the standard-library version of exactly that
pattern, generalized over any signature. It additionally handles:
- Copy: std::function is copyable, which means CallableBase needs a virtual clone() method — exactly the pattern from Problem 28.
- The empty state: a default-constructed std::function stores nothing; calling it throws std::bad_function_call.
- Target query: f.target<F>() returns a pointer to the stored F, or nullptr if the type does not match.
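The empty state and the target query can be demonstrated directly; this sketch needs only <functional>:

```cpp
#include <functional>

// Returns true iff invoking a default-constructed std::function throws
// std::bad_function_call -- the "empty state" in action.
bool calling_empty_throws() {
    std::function<int(int)> f;                  // empty: holds no target
    try { f(1); }
    catch (const std::bad_function_call &) { return true; }
    return false;
}

// target<F>() yields a pointer to the stored callable only when F is
// exactly the stored type. A lambda's closure type is not a function
// pointer, so querying for one returns nullptr.
bool lambda_is_not_function_pointer() {
    std::function<int(int)> f = [](int x) { return x + 1; };
    return f.target<int(*)(int)>() == nullptr;
}
```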
The small buffer optimization
One subtlety: the naive implementation heap-allocates a new CallableImpl
for every std::function construction, even for a tiny stateless lambda.
Most standard-library implementations avoid this with a small buffer
optimization (SBO): an inline byte array of, say, 24 bytes. If the callable
fits, it is placement-newed into the buffer; otherwise the implementation falls
back to a heap allocation. The switch is invisible to the caller.
The same trick appears in std::string (short-string optimization) and
std::any. The pattern is: cheap in the common case, correct in all
cases.
Note. A lambda that captures a large object by value may exceed the SBO buffer and trigger a heap allocation. If you care, keep captures small or capture a pointer to the heavy data rather than the data itself.
Type erasure in the standard library
C++ provides three canonical type-erasure tools:
- std::function<Sig> (C++11) erases the type of any callable with a compatible signature. Copyable, with SBO.
- std::any (C++17) erases the type of an arbitrary value. Store anything copy-constructible; retrieve it with std::any_cast<T>. Think of it as a type-safe void*.
- Virtual dispatch is the original type-erasure mechanism. The vtable pointer stored in every polymorphic object is the erased type information. The compiler generates the CallableImpl analogue — the vtable and its entries — automatically.
Aside (CS 247 connection: interface vs. representation). Type erasure cleanly separates interface (CallableBase’s operator()) from representation (the concrete F stored in CallableImpl<F>). This is the same separation that abstract classes encode; type erasure just applies it to types you do not own and therefore cannot retroactively give a common base class to.
C++23: std::move_only_function
std::function requires the stored callable to be copy-constructible,
because std::function itself is copyable. But lambdas that capture a
unique_ptr are move-only — they cannot be copied.
C++23 adds std::move_only_function<Sig>: a type-erased callable that only
requires movability, not copyability. It is non-copyable itself, but otherwise
works identically to std::function and can store any move-only callable.
For event handlers stored once and never duplicated, this is the right choice.
Worked example: implementing Function from scratch
Here is a self-contained implementation that shows every piece of the machinery in one place.
#include <memory>  // std::unique_ptr, std::make_unique
#include <utility> // std::move, std::swap
template<typename Sig> class Function; // primary template left undefined
template<typename R, typename... Args>
class Function<R(Args...)> {
struct Base {
virtual R call(Args...) = 0;
virtual std::unique_ptr<Base> clone() const = 0;
virtual ~Base() = default;
};
template<typename F>
struct Impl : Base {
F f;
Impl(F f) : f{std::move(f)} {}
R call(Args... args) override { return f(args...); }
std::unique_ptr<Base> clone() const override {
return std::make_unique<Impl<F>>(f);
}
};
std::unique_ptr<Base> ptr;
public:
template<typename F>
Function(F f) : ptr{std::make_unique<Impl<F>>(std::move(f))} {}
Function(const Function &o) : ptr{o.ptr->clone()} {}
Function(Function &&) = default;
Function &operator=(Function o) { std::swap(ptr, o.ptr); return *this; }
R operator()(Args... args) { return ptr->call(args...); }
};
This is roughly 25 lines. The real std::function adds
std::bad_function_call for the null case, target-type introspection, and
the SBO, but the core mechanism is exactly this.
Type erasure vs. templates vs. virtual dispatch
It is worth comparing the three approaches side by side.
Templates (Problem 40 concepts). When you write a template function that accepts any callable, the compiler generates a separate instantiation for each concrete type. The call is direct — no indirection — and the optimizer can inline the body. The downside: all types must be known at compile time, and storing a heterogeneous collection requires some other mechanism.
Virtual dispatch (Problem 18). When you define an abstract base class
with a pure-virtual operator(), each callable type must inherit from it.
You own the types, or you can add the base class. Collections of base pointers
work. The cost: one vtable lookup per call, and the types must be in a single
inheritance hierarchy.
Type erasure (std::function). You do not own the types; they
need not share any base. Collections work. The cost: heap allocation per
construction (possibly avoided by SBO), and one indirect call per invocation.
In summary: use templates when all types are compile-time known and performance is paramount. Use virtual dispatch when you own the types. Use type erasure when you need to store heterogeneous callables from code you do not control.
Note. Type erasure has real runtime overhead: a heap allocation at construction time (unless SBO applies) and a virtual call or function-pointer indirection at each invocation. When performance is critical and the set of callable types is known at compile time, prefer templates or concepts (Problem 40). Use std::function when you need to store heterogeneous callables at runtime: event systems, callbacks passed across module boundaries, plugin architectures.
Problem 57: Hiding the Implementation
Problem 2 established the golden rule of separate compilation: declarations live
in headers, definitions live in .cc files. That rule minimizes what needs
to be recompiled when an implementation detail changes. But there is a gap.
Even if you follow the rule perfectly, the private members of a class must appear
in the header because the compiler needs to know the object’s total size. And
if any private member changes — its type, or the type of a type it depends on
— every translation unit that includes the header must recompile. On a large
project, a one-line change to a private field can trigger a complete rebuild.
The pimpl idiom (pointer to implementation) closes that gap.
The problem in concrete terms
Suppose you are writing a Widget class for a GUI library:
// widget.h
#include "font.h" // Font is a heavy header
#include "texture.h" // Texture is heavier still
#include "event_queue.h" // and this one changes often
class Widget {
public:
void draw();
void setFont(const Font &f);
private:
Font font_;
Texture bg_;
EventQueue events_;
int x_, y_, w_, h_;
};
Every file that writes #include "widget.h" also transitively includes
font.h, texture.h, and event_queue.h. When the development
team changes event_queue.h — maybe to add a new field — every
translation unit in the project recompiles, even if none of them ever touch the
event queue directly.
The pimpl idiom
The fix is to replace all private data with a single opaque pointer to a forward-declared implementation struct. The header reveals nothing about what is inside the implementation.
// widget.h -- the public header
#include <memory>
class Widget {
public:
Widget();
~Widget();
Widget(Widget &&) noexcept;
Widget &operator=(Widget &&) noexcept;
void draw();
void setFont(const std::string &name);
private:
struct Impl; // forward declaration only
std::unique_ptr<Impl> impl_;
};
The header has no includes for font.h, texture.h, or
event_queue.h. Clients that include this header see only the public
interface and a unique_ptr to a struct they cannot look inside. The
compiler knows the size of unique_ptr<Impl> (it is one pointer) without
knowing what Impl contains.
The source file defines everything:
// widget.cc -- the implementation file
#include "widget.h"
#include "font.h"
#include "texture.h"
#include "event_queue.h"
struct Widget::Impl {
Font font;
Texture bg;
EventQueue events;
int x, y, w, h;
};
Widget::Widget() : impl_{std::make_unique<Impl>()} {}
Widget::~Widget() = default; // unique_ptr<Impl> destructor needs Impl complete
Widget::Widget(Widget &&) noexcept = default;
Widget &Widget::operator=(Widget &&) noexcept = default;
void Widget::draw() { impl_->bg.render(impl_->x, impl_->y); }
void Widget::setFont(const std::string &name) { impl_->font = Font{name}; }
Now when event_queue.h changes, only widget.cc recompiles. All the
clients that include widget.h are untouched.
Why the destructor must be defined in the source file
This is the most common pimpl mistake. If you write:
// widget.h -- WRONG
class Widget {
struct Impl;
std::unique_ptr<Impl> impl_;
public:
~Widget() = default; // disaster waiting to happen
};
the = default definition is generated at the point of declaration
in the header. At that point, Impl is incomplete — the compiler has
seen only the forward declaration. When unique_ptr<Impl>’s destructor
tries to call delete on its stored pointer, it needs to call
Impl::~Impl(), and it cannot do so if Impl is incomplete. This
is a compile error.
The fix is to declare the destructor in the header and define it in the source
file, after the full definition of Impl is visible:
// widget.h
class Widget {
...
~Widget(); // declared, not defined here
};
// widget.cc
#include "widget.h" // includes the forward declaration
struct Widget::Impl { ... }; // now Impl is complete
Widget::~Widget() = default; // now safe: Impl is complete here
The same reasoning applies to the move constructor and move assignment operator:
they also need to destroy or transfer the unique_ptr, which requires
Impl to be complete. Declare them in the header; default them in the
source.
Note. Copy operations are omitted in the example above because unique_ptr is not copyable. If you want Widget to be copyable, you must write explicit copy operations in the source file that do a deep copy of the Impl: impl_ = std::make_unique<Impl>(*other.impl_). This is perfectly safe as long as Impl has a working copy constructor, which it usually does.
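A minimal single-file sketch of a copyable pimpl class (the Config name is hypothetical); in real code the class definition would live in the header and everything after it in the source file:

```cpp
#include <memory>
#include <string>

// Copyable pimpl: the copy operations deep-copy the Impl.
class Config {
public:
    explicit Config(std::string name);
    ~Config();
    Config(const Config &other);            // deep copy, defined below
    Config &operator=(const Config &other);
    const std::string &name() const;
private:
    struct Impl;                            // forward declaration only
    std::unique_ptr<Impl> impl_;
};

struct Config::Impl {
    std::string name;
};

Config::Config(std::string name) : impl_{std::make_unique<Impl>()} {
    impl_->name = std::move(name);
}
Config::~Config() = default;                // safe: Impl is complete here

// Deep copy: clone the Impl rather than sharing the pointer.
Config::Config(const Config &other)
    : impl_{std::make_unique<Impl>(*other.impl_)} {}

Config &Config::operator=(const Config &other) {
    if (this != &other) *impl_ = *other.impl_;  // copy into existing Impl
    return *this;
}

const std::string &Config::name() const { return impl_->name; }
```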
Connection to Problem 14
The pimpl idiom depends critically on std::unique_ptr. A raw pointer
would work — the technique predates modern C++ — but you would need a
manually written destructor, copy operations, and move operations, all correct
and all in the source file. unique_ptr gets you the destructor for free
(once you obey the “define in source” rule) and expresses the ownership intent
clearly. Raw-pointer pimpl is the kind of code that leaks under exceptions.
Aside (CS 247 connection: information hiding). Information hiding is a first principle of software design: expose what clients need to use, conceal everything else. Pimpl takes information hiding further than the private keyword alone — private prevents access, but the names of private members still appear in the header. Pimpl prevents even the names from leaking, giving the implementor freedom to change the representation without touching any client file.
Worked example: a Logger class
// logger.h
#include <memory>
#include <string>
class Logger {
public:
explicit Logger(const std::string &path);
~Logger();
Logger(Logger &&) noexcept;
Logger &operator=(Logger &&) noexcept;
void log(const std::string &msg);
void flush();
private:
struct Impl;
std::unique_ptr<Impl> impl_;
};
// logger.cc -- includes heavy headers that callers never see
#include "logger.h"
#include <fstream>
#include <deque>
struct Logger::Impl {
std::ofstream file;
std::deque<std::string> buffer;
static constexpr size_t kFlushAt = 64;
};
Logger::Logger(const std::string &path)
: impl_{std::make_unique<Impl>()} {
impl_->file.open(path, std::ios::app);
}
Logger::~Logger() = default;
Logger::Logger(Logger &&) noexcept = default;
Logger &Logger::operator=(Logger &&) noexcept = default;
void Logger::log(const std::string &msg) {
impl_->buffer.push_back(msg);
if (impl_->buffer.size() >= Impl::kFlushAt) flush();
}
void Logger::flush() {
for (auto &m : impl_->buffer) impl_->file << m << '\n';
impl_->buffer.clear();
impl_->file.flush();
}
The <fstream> and <deque> headers are invisible to callers. They
do not pay their compile-time cost.
Trade-offs
Pimpl buys build-time isolation at two runtime costs: one extra heap allocation
per Widget (for the Impl), and one extra pointer dereference on
every method call. For a GUI widget called at most a few hundred times per
frame, this is irrelevant. For a hot inner-loop data structure called millions
of times per second, it is worth reconsidering.
The idiom is most valuable for library headers — headers that many
clients include, that are developed by a separate team, and that change
independently of the public interface. For internal implementation files that
are only included by a handful of .cc files, the compile-time savings are
small and the indirection cost might not be worth it.
Note. The pimpl idiom also has a nice side effect on exception safety. Swapping two unique_ptr<Impl> values is noexcept. This means you can implement the strong exception guarantee on any mutating method by making your changes on a copy of the Impl and swapping only on success — as discussed in the patterns chapter.
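A sketch of that copy-and-swap pattern, using a hypothetical Document class:

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Strong exception guarantee via copy-and-swap on the Impl: either the
// whole operation succeeds, or the Document is left unchanged.
class Document {
    struct Impl { std::vector<std::string> lines; };
    std::unique_ptr<Impl> impl_ = std::make_unique<Impl>();
public:
    std::size_t size() const { return impl_->lines.size(); }

    void addLineChecked(const std::string &line) {
        auto copy = std::make_unique<Impl>(*impl_);    // work on a copy
        if (line.empty())
            throw std::invalid_argument{"empty line"}; // original untouched
        copy->lines.push_back(line);                   // may throw bad_alloc
        impl_.swap(copy);                              // noexcept commit point
    }
};
```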
Problem 58: SFINAE Revisited
Problem 24 used tag dispatch to choose between iterator implementations at compile time. Problem 40 showed that concepts do the same job more cleanly in C++20. But millions of lines of pre-C++20 code use an older technique: SFINAE. Library headers from before 2020 are full of it. You need to be able to read it.
What SFINAE means
SFINAE stands for Substitution Failure Is Not An Error. When the compiler tries to instantiate a function template, it substitutes the template arguments into the function’s signature. If that substitution produces an ill-formed type or expression, the compiler does not emit a hard error — it simply removes that template from the overload set and moves on. The next candidate gets a chance.
template<typename T>
typename T::value_type front(const T &c) { return c.front(); }
template<typename T>
T front(const T *arr) { return arr[0]; }
front(std::vector<int>{1, 2, 3}); // uses first overload
front(new int[3]{1, 2, 3}); // uses second: int* has no ::value_type
When T = int*, substituting into typename T::value_type produces
an ill-formed type. SFINAE silently discards the first overload; the second
wins.
SFINAE applies only during template argument substitution, and only to failures in the immediate context of the function signature. An error inside the function body, or in a non-template function, is still a hard error.
std::enable_if: gating on a condition
std::enable_if<B, T> is a type trait with a member type equal to
T when B is true, and no member type when
B is false. The “no member” case triggers SFINAE.
#include <type_traits>
// Only participates in overload resolution when T is an integral type
template<typename T>
std::enable_if_t<std::is_integral_v<T>, T> square(T x) { return x * x; }
// Only participates when T is a floating-point type
template<typename T>
std::enable_if_t<std::is_floating_point_v<T>, T> square(T x) { return x * x; }
When you call square(3), the compiler considers both overloads. For the
first, std::is_integral_v<int> is true, so
std::enable_if_t<true, int> resolves to int — valid. For the
second, std::is_floating_point_v<int> is false, so
std::enable_if_t<false, int> has no type member — SFINAE
discards it. The first overload wins cleanly.
std::enable_if<B, T = void> has a nested type member equal to
T if and only if B is true. The shorthand
std::enable_if_t<B, T> is typename std::enable_if<B, T>::type.
Common type traits
The <type_traits> header provides a large family of predicates. The most
commonly used ones are:
- std::is_integral_v<T> — true for int, long, bool, etc.
- std::is_floating_point_v<T> — true for float, double.
- std::is_same_v<T, U> — true when T and U are the same type.
- std::is_base_of_v<Base, Derived> — true when Derived inherits from Base.
- std::is_constructible_v<T, Args...> — true when T(Args...) is valid.
- std::is_copy_constructible_v<T>, std::is_move_constructible_v<T>.
- std::is_const_v<T>, std::is_reference_v<T>, std::is_pointer_v<T>.
Each of these is a constexpr bool computed entirely at compile time.
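Because each trait is a compile-time constant, misuse can be caught with static_assert rather than at run time; a few spot checks:

```cpp
#include <string>
#include <type_traits>

// Each trait evaluates at compile time; a failing static_assert is a
// compile error, not a runtime failure.
static_assert(std::is_integral_v<bool>);
static_assert(!std::is_integral_v<double>);
static_assert(std::is_floating_point_v<double>);
static_assert(std::is_same_v<int, signed int>);

struct Base {};
struct Derived : Base {};
static_assert(std::is_base_of_v<Base, Derived>);

static_assert(std::is_constructible_v<std::string, const char *>);
static_assert(std::is_pointer_v<int *> && !std::is_pointer_v<int>);
```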
std::declval: getting a value in unevaluated context
SFINAE expressions often need to test whether a type supports a particular
operation. But you cannot construct an arbitrary T just to test it —
it might have no default constructor, or might have side effects.
std::declval<T>() solves this: it pretends to return a T&&
in an unevaluated context (inside sizeof, decltype, etc.) without
actually constructing anything.
// decltype(std::declval<T>().begin()) is the type of T's begin() return value
// This expression is unevaluated -- no T is ever constructed
template<typename T>
auto first(const T &c) -> decltype(*std::declval<T>().begin()) {
return *c.begin();
}
The void_t trick: detecting member existence
std::void_t<...> (C++17) is a type alias that maps any sequence of well-formed types to void. Its usefulness lies in the attempt itself: if any of the types is ill-formed, forming the alias triggers SFINAE.
The idiom uses partial template specialization to detect whether a type has a particular member or operation:
// Primary template: T does not have begin() by default
template<typename T, typename = void>
struct has_begin : std::false_type {};
// Specialization: chosen only when declval<T>().begin() is well-formed
template<typename T>
struct has_begin<T, std::void_t<decltype(std::declval<T>().begin())>>
: std::true_type {};
static_assert( has_begin<std::vector<int>>::value);
static_assert(!has_begin<int>::value);
When T = std::vector<int>, the expression
declval<std::vector<int>>().begin() is well-formed. void_t of a
well-formed type is void. The specialization matches and
has_begin<vector<int>> inherits from std::true_type.
When T = int, declval<int>().begin() is ill-formed. SFINAE
discards the specialization; the primary template (with std::false_type)
wins.
SFINAE vs. tag dispatch
Problem 24 used tag dispatch: pass a tag object to an overloaded helper function and let overload resolution pick the right one. SFINAE is a different mechanism: exclude overloads from the candidate set before overload resolution even begins.
In practice, either technique works. Tag dispatch is often clearer because the selection logic lives in the function bodies and is readable prose. SFINAE is more powerful because it can gate on arbitrary compile-time expressions, not just a type’s tag.
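For comparison, here is the integral/floating-point square example redone with tag dispatch; both helper bodies are identical here purely to show the mechanism:

```cpp
#include <type_traits>

// Tag dispatch: one public entry point forwards to an overloaded helper,
// selected by an extra tag argument constructed at compile time.
template<typename T>
T square_impl(T x, std::true_type /*is_integral*/)  { return x * x; }

template<typename T>
T square_impl(T x, std::false_type /*is_integral*/) { return x * x; }

template<typename T>
T square(T x) {
    // std::is_integral<T> inherits from true_type or false_type, so the
    // temporary below selects the helper via ordinary overload resolution.
    return square_impl(x, std::is_integral<T>{});
}
```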
How concepts (Problem 40) replace all of this
In C++20, the entire pattern becomes:
template<std::integral T>
T square(T x) { return x * x; }
template<std::floating_point T>
T square(T x) { return x * x; }
The requires clause is semantically equivalent to enable_if but
readable without a decoder ring. The compiler error messages are also
dramatically better: instead of a wall of template instantiation failures, you
get “constraint not satisfied”.
Worked example: detecting operator<< for ostream
We want a type trait is_streamable<T> that is true when
std::cout << t is a valid expression.
#include <iostream>
#include <type_traits>
template<typename T, typename = void>
struct is_streamable : std::false_type {};
template<typename T>
struct is_streamable<T,
std::void_t<decltype(std::declval<std::ostream &>()
<< std::declval<T>())>>
: std::true_type {};
template<typename T>
std::enable_if_t<is_streamable<T>::value>
printIfPossible(const T &v) {
std::cout << v << '\n';
}
template<typename T>
std::enable_if_t<!is_streamable<T>::value>
printIfPossible(const T &) {
std::cout << "(not printable)\n";
}
is_streamable<int>::value is true; for an Opaque class with no operator<<, is_streamable<Opaque>::value is false. The right printIfPossible overload is selected without a runtime branch.
SFINAE on function return types vs. dummy parameters
There are two common places to put enable_if_t: in the return type, and
as a defaulted non-type template parameter.
// Style 1: in the return type
template<typename T>
std::enable_if_t<std::is_integral_v<T>, T> square(T x) { return x*x; }
// Style 2: as a defaulted non-type parameter
template<typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
T square(T x) { return x*x; }
Style 1 is cleaner for functions with a meaningful return type. Style 2 is
needed when the return type is void or is fixed regardless of the
condition. Both achieve the same SFINAE effect. You will see both in real
library headers; knowing that they are equivalent makes reading them less
painful.
Conditional base classes with std::conditional
One more tool worth knowing: std::conditional<B, T, F> selects type
T when B is true and type F when B is
false. It is if-then-else for types.
// An empty lockable whose lock()/unlock() do nothing -- stands in for a mutex
struct DummyMutex {
    void lock() {}
    void unlock() {}
};
// A buffer that is either thread-safe or not, chosen at compile time
template<bool ThreadSafe>
class Buffer {
    // Use a real mutex only in the thread-safe variant
    using LockType = std::conditional_t<ThreadSafe,
                                        std::mutex,
                                        DummyMutex>;
LockType lock_;
std::vector<char> data_;
public:
void write(const char *p, size_t n) {
std::lock_guard g{lock_}; // no-op if DummyMutex
data_.insert(data_.end(), p, p + n);
}
};
std::conditional_t is used pervasively in the standard library’s own
implementation to select between two strategies at compile time. Together with
enable_if and void_t, it forms the core toolkit of pre-concepts
template metaprogramming.
Note. In new code targeting C++20 or later, use concepts. Read SFINAE to understand pre-C++20 library headers — the standard library’s own implementation of std::vector, std::optional, and std::variant is full of it. Being able to parse enable_if_t and void_t is a practical skill for anyone who needs to understand or debug template-heavy library code.
Problem 59: Policy-Based Design
Problem 27 used CRTP to inject behavior at compile time, avoiding vtable overhead. Problem 29 showed how multiple inheritance assembles behaviors from a collection of mixin classes. There is a third compositional technique that sits between the two: policy-based design, where each behavior dimension is expressed as a template parameter that provides a statically-known interface.
The term was popularized by Andrei Alexandrescu in Modern C++ Design (2001). The ideas remain relevant today, especially in library code where generality and zero overhead must coexist.
The idea: behavior as a template parameter
Instead of hardcoding a behavior in a class, you parameterize the class on a policy class that provides that behavior. The host class calls into the policy through a known interface; the policy class is a regular class, not a virtual base. The binding happens at compile time — no vtable.
Definition (policy class). A class that provides a specific behavior to a host class via a statically-known interface (a set of methods and/or types that the host calls). Policies are chosen at instantiation time by passing them as template arguments. The host class knows only that a policy provides the required interface; it does not know — or care — which concrete policy it is.
A simple example: Logger with output and filter policies
Consider a Logger class. There are two independent behavior dimensions:
- Output policy: where messages go (stdout, file, syslog).
- Filter policy: which messages to pass (all, errors only, debug).
We could handle this with inheritance or runtime switches. Or we can parameterize:
struct StdoutOutput {
void write(const std::string &msg) { std::cout << msg << '\n'; }
};
struct FileOutput {
std::ofstream file;
FileOutput(const std::string &path) : file{path, std::ios::app} {}
void write(const std::string &msg) { file << msg << '\n'; }
};
struct AllMessages {
bool accept(const std::string &) { return true; }
};
struct ErrorsOnly {
bool accept(const std::string &msg) {
return msg.find("[ERROR]") != std::string::npos;
}
};
template<typename OutputPolicy, typename FilterPolicy>
class Logger : private OutputPolicy, private FilterPolicy {
public:
using OutputPolicy::OutputPolicy; // inherit constructors if needed
void log(const std::string &msg) {
if (FilterPolicy::accept(msg))
OutputPolicy::write(msg);
}
};
// Instantiate concrete loggers at zero overhead
using ConsoleLogger = Logger<StdoutOutput, AllMessages>;
using ProductionLog = Logger<FileOutput, ErrorsOnly>;
The calls to FilterPolicy::accept(msg) and
OutputPolicy::write(msg) are direct method calls on the base classes —
no virtual dispatch, no indirection. The optimizer can inline them.
Private inheritance as composition
Why private inheritance rather than data members? Both work. Private
inheritance enables the empty base optimization (EBO): if a policy class
has no data members, the compiler is permitted to give it zero size when used
as a base class, but not when used as a data member. Before C++20 and
[[no_unique_address]], EBO was the only way to pay zero bytes for a
stateless policy.
// with data member: sizeof(Logger) = sizeof(StdoutOutput) + sizeof(ErrorsOnly)
// but both are empty structs, so this could be 2 bytes
// with private inheritance: sizeof(Logger) could be 1 byte (EBO)
In C++20 you can write:
template<typename OutputPolicy, typename FilterPolicy>
class Logger {
[[no_unique_address]] OutputPolicy out_;
[[no_unique_address]] FilterPolicy filt_;
public:
void log(const std::string &msg) {
if (filt_.accept(msg)) out_.write(msg);
}
};
The [[no_unique_address]] attribute (Problem 61) tells the compiler it
is allowed to give the member zero size if the type is empty. This achieves
EBO without private inheritance, which is usually cleaner.
Mixing policies with CRTP
Policies and CRTP compose naturally. Suppose a policy needs to call back into the host class:
template<typename Derived>
struct PrefixFilter {
bool accept(const std::string &msg) {
// ask the host what prefix to require
return msg.find(static_cast<Derived*>(this)->prefix()) == 0;
}
};
template<typename OutputPolicy>
class MyLogger
: private OutputPolicy
, private PrefixFilter<MyLogger<OutputPolicy>> {
public:
std::string prefix() const { return "[App]"; }
void log(const std::string &msg) {
if (PrefixFilter<MyLogger>::accept(msg))
OutputPolicy::write(msg);
}
};
The policy calls static_cast<Derived*>(this)->prefix() — the same
downcast that CRTP uses. No virtual call.
The cost: code size
Each combination of policies creates a distinct instantiation. If you have
three output policies and three filter policies, you have nine potential
Logger types. Each one that you actually instantiate gets its own copy
of the compiled log method. The binary grows.
For a library used in many configurations, this can matter. The trade-off
is always: zero overhead vs. binary size. In practice, if constexpr
and explicit instantiation can reduce duplication, and the inliner often
merges identical bodies anyway.
Constraining policies with concepts
In C++20, you can express what a policy must provide as a concept (Problem 40):
template<typename P, typename Msg>
concept OutputPolicyConcept = requires(P p, const Msg &m) {
{ p.write(m) } -> std::same_as<void>;
};
template<typename OutputPolicy, typename FilterPolicy>
requires OutputPolicyConcept<OutputPolicy, std::string>
class Logger : private OutputPolicy, private FilterPolicy { ... };
Now if you accidentally pass a type that lacks write, the error message
names the unsatisfied concept rather than burying the failure inside a template
instantiation chain.
Aside (CS 247 connection: open/closed principle). Policy-based design is one of the purest expressions of the Open/Closed Principle: the Logger host class is closed for modification but open for extension — you extend it by providing new policy classes, not by editing the Logger template itself.
Worked example: configurable allocator
template<size_t N>
struct FixedArena {
alignas(std::max_align_t) char buf[N];
size_t used = 0;
void *allocate(size_t n) {
if (used + n > N) throw std::bad_alloc{};
void *p = buf + used; used += n; return p;
}
void deallocate(void *, size_t) {} // no-op: arena is freed all at once
};
struct HeapAlloc {
void *allocate(size_t n) { return ::operator new(n); }
void deallocate(void *p, size_t) { ::operator delete(p); }
};
template<typename T, typename AllocPolicy>
class Pool : private AllocPolicy {
// ... uses AllocPolicy::allocate / deallocate
public:
T *get() { return static_cast<T*>(AllocPolicy::allocate(sizeof(T))); }
void put(T *p) { AllocPolicy::deallocate(p, sizeof(T)); }
};
Pool<int, FixedArena<1024>> stackPool; // all allocations from a 1 KB buffer
Pool<int, HeapAlloc> heapPool; // normal heap
Policy-based design in the standard library
The standard library itself uses policy-based design throughout. The most visible example is the allocator policy in containers:
template<typename T, typename Allocator = std::allocator<T>>
class vector { /* ... */ };
// Use a custom pool allocator:
vector<int, PoolAllocator<int>> poolVec;
// Use the default heap allocator:
vector<int> heapVec;
The allocator interface (allocate, deallocate, and the associated
member types, accessed uniformly through std::allocator_traits) is the
policy contract. Any type that satisfies it — the standard heap allocator, a
pool allocator, a debug-tracking allocator, an arena allocator — can be
plugged in. The vector class itself is unchanged.
Another example: std::basic_string<CharT, Traits, Allocator>. The
Traits parameter is a character-traits policy that specifies how to
compare, copy, and assign characters. std::char_traits<char> is the
default; a case-insensitive string can be built by providing a custom traits
class.
Aside (CS 247 connection: the single responsibility principle). Policy-based design enforces SRP at the type level: each policy class is responsible for exactly one axis of behavior. The host class is responsible only for the high-level algorithm. If you later find that you need a different output strategy, you replace the output policy — not the host class. SRP is not just a principle for methods; it applies to template parameters too.
Note. Policy-based design is most useful in library code that must be both general and zero-overhead: allocators, smart pointers, hash tables. For application-level code where you have one or two concrete configurations and runtime flexibility matters more than the last few nanoseconds, virtual dispatch is usually the right trade-off — it is simpler to understand and to change.
Problem 60: Reference Qualifiers on Methods
Problem 5 introduced rvalue references on function parameters to
distinguish moving from copying. Problem 6 qualified member functions with
const to distinguish read-only from mutating access. Both of those
qualifiers describe something about the object being passed in or held. There
is a third dimension: the value category of *this. A method can
be called on an lvalue (a named, persistent object) or on an rvalue (a
temporary that is about to be destroyed). Reference qualifiers let you
distinguish those two cases.
Motivation: references to temporaries
Consider a person type with a name accessor:
class Person {
std::string name_;
public:
explicit Person(std::string n) : name_{std::move(n)} {}
std::string &name() { return name_; }
};
The unqualified name() works fine when called on a named Person:
Person p{"Alice"};
p.name() = "Bob"; // fine: p is a persistent object
But it also compiles silently in a trap:
getPerson().name() = "Bob"; // getPerson() returns a temporary!
The assignment writes into a string that belongs to a temporary Person.
That temporary is destroyed at the semicolon — the write is lost. This is
a bug that the compiler does not warn about in the unqualified case.
Ref-qualifiers
C++11 added ref-qualifiers: & and && placed after the
parameter list (and after const if present) to restrict which value
category of *this the method accepts.
class Person {
std::string name_;
public:
explicit Person(std::string n) : name_{std::move(n)} {}
// lvalue version: returns a reference into the (persistent) object
std::string &name() & { return name_; }
// rvalue version: the object is expiring -- extract the string by move
std::string name() && { return std::move(name_); }
};
Now:
Person p{"Alice"};
p.name() = "Bob"; // calls & overload -- fine
std::string n = getPerson().name(); // calls && overload -- moves out
If you try getPerson().name() = "Bob", the && overload returns a
value (std::string), not a reference. Assigning to that value is a
compile error — the bug is caught.
A & or && qualifier placed after the parameter list of a member
function restricts that overload to be called only on lvalue expressions
(&) or only on rvalue expressions (&&) of *this. If no
ref-qualifier is present, the function may be called on either.
Interaction with const
Ref-qualifiers stack with const. You can have up to four overloads:
class Widget {
public:
void process() &; // lvalue, non-const
void process() const &; // lvalue, const
void process() &&; // rvalue, non-const
void process() const &&; // rvalue, const (rare)
};
In practice you rarely need all four. The common combinations are:
- const & and &&: the const & overload handles both const lvalues and (in the absence of the && overload) rvalues. Add the && overload specifically to handle the temporary case differently.
- Just &: forbid calling the method on a temporary at all.
The standard library uses this
std::optional<T> overloads value() on all four combinations so that
dereferencing an optional temporary can move the contained value out:
std::optional<std::string> maybeGet();
std::string s = maybeGet().value(); // calls value() &&, moves the string
Without the && overload, this would copy the string out of a temporary
that is about to be destroyed — wasteful. With it, the move is free.
The fluent builder pattern
Ref-qualifiers are natural in fluent builder APIs, where you chain calls:
class QueryBuilder {
std::string sql_;
public:
QueryBuilder &select(const std::string &cols) & {
sql_ += "SELECT " + cols + " ";
return *this; // chain on lvalue
}
QueryBuilder &&select(const std::string &cols) && {
sql_ += "SELECT " + cols + " ";
return std::move(*this); // chain on rvalue
}
QueryBuilder &from(const std::string &table) & {
sql_ += "FROM " + table; return *this;
}
QueryBuilder &&from(const std::string &table) && {
sql_ += "FROM " + table; return std::move(*this);
}
std::string build() && { return std::move(sql_); }
};
QueryBuilder qb;
qb.select("*").from("users"); // lvalue chain: modifies qb in place
std::string q = QueryBuilder{}.select("id").from("orders").build();
// rvalue chain: each step moves the builder forward; build() extracts
Worked example: a Resource wrapper
class Resource {
int *data_;
size_t size_;
public:
explicit Resource(size_t n)
: data_{new int[n]{}}, size_{n} {}
~Resource() { delete[] data_; }
// lvalue: return a view -- do not take ownership
const int *data() const & { return data_; }
size_t size() const & { return size_; }
// rvalue: move the pointer out -- caller takes ownership
int *release() && {
int *p = data_;
data_ = nullptr;
size_ = 0;
return p;
}
};
Resource makeResource(size_t n) { return Resource{n}; }
const int *view = makeResource(10).data(); // compiles (const & binds rvalues) -- but view dangles after this statement!
// but:
int *owned = makeResource(10).release(); // moves out; caller must delete[]
The const & overloads of data() and size() are available on
both lvalues and rvalues (since a const & ref-qualifier binds both).
The && overload of release() is only callable on a temporary —
which is the right restriction, since you would not want to silently drain a
named Resource.
The interaction with move semantics
Ref-qualifiers and move semantics (Problem 5) collaborate naturally. When a
member function is called on an rvalue object, the object is expiring — it
will not be used again. The && overload can therefore steal from
*this using std::move(member_) without risk.
The standard library applies this broadly. String concatenation has
operator+ overloads that take the left operand by rvalue reference, so a
temporary string's buffer is reused instead of allocating a new one (these
are free-function overloads on the parameter rather than ref-qualified
members, but the principle is the same). std::optional::value()
returns T&& from a temporary optional, enabling move-out with no extra
copy. These are not micro-optimizations: they are the difference between
allocating and not allocating in common expressions like string concatenation.
std::string a = "Hello, ";
std::string b = std::move(a) + "world"; // rvalue + has && overload:
// no new allocation -- reuses a's buffer
Aside (CS 343 connection: ownership transfer). The && ref-qualifier formalizes a transfer of ownership at the language level. In CS 343, resource handoff between coroutines was managed explicitly through invariant protocols. Ref-qualifiers let the type system enforce those protocols: a method decorated with && can only be invoked when the object is about to be abandoned, giving the method permission to plunder it.
Note. An important caveat: for a given member function name and parameter list, you cannot mix ref-qualified and unqualified overloads. If one declaration has a ref-qualifier, they all must; declaring both a qualified and an unqualified overload with the same parameter list is ill-formed. Plan out which qualifications you need before writing the first overload.
Problem 61: Attributes and Contracts
C++11 introduced a new syntax for attaching standardized annotations to
declarations: [[attribute]]. These are not comments. Comments are
invisible to the compiler; attributes participate in the grammar and the
compiler can — and sometimes must — act on them. They are a disciplined
way to communicate intent that both humans and tools can read.
[[nodiscard]]: you should not ignore this
The most practically important attribute. When a function is declared
[[nodiscard]], the compiler emits a warning if the caller discards the
return value without using it.
[[nodiscard]] bool write(const char *buf, size_t len);
[[nodiscard]] std::unique_ptr<Widget> makeWidget();
write(buf, n); // warning: ignoring return value of 'write'
makeWidget(); // warning: ignoring return value of 'makeWidget'
bool ok = write(buf, n); // no warning
auto w = makeWidget(); // no warning
This catches a class of bugs that are otherwise silent. The canonical example
is an error-code return: if you call write and ignore whether it
succeeded, you have silently swallowed a potential failure.
In C++20, you can add a message:
[[nodiscard("ignoring the error code silently swallows failures")]]
bool write(const char *buf, size_t len);
The message appears in the compiler’s warning output, directing the programmer to the reason, not just the fact.
When to use [[nodiscard]]. Put it on:
- Functions that return error codes or status values.
- Functions that return a resource handle — the returned unique_ptr is the owning handle; ignoring it immediately leaks the resource.
- Pure computations with no observable side effects, where calling without using the result is certainly a mistake.
Aside (CS 245 connection: Hoare triples on the call site). In CS 245, a Hoare triple {P} f {Q} specifies that after f runs, the postcondition Q holds. [[nodiscard]] is a machine-checked postcondition on the call site: it asserts that the caller must use the returned value. It is the closest C++ gets to enforcing a Hoare triple without a separate verification tool.
[[maybe_unused]]: silence intentional unused warnings
The compiler warns about unused variables and parameters. Sometimes an unused
variable is intentional: a value used only in debug builds (dead code once
NDEBUG is defined), or a parameter that a particular override does not need
but the interface requires.
void logIfDebug([[maybe_unused]] const std::string &msg) {
#ifndef NDEBUG
std::cerr << msg << '\n';
#endif
// In release builds, msg is unused -- but that is intentional
}
template<typename T>
void process(T &value, [[maybe_unused]] int flags) {
// flags used only in some specializations
value.process();
}
[[maybe_unused]] suppresses the warning without removing the variable
and without resorting to the classic “cast to void” workaround
((void)flags;).
[[likely]] and [[unlikely]]: branch predictor hints
Modern processors speculatively execute branches. When a branch is almost always taken one way, hinting the compiler can improve branch-prediction accuracy and instruction cache usage.
int safeDivide(int a, int b) {
if (b == 0) [[unlikely]] return 0; // rarely zero -- cold path
return a / b; // hot path
}
void processPackets(const Packet *p, size_t n) {
for (size_t i = 0; i < n; ++i) {
if (p[i].isValid()) [[likely]] {
handle(p[i]);
}
}
}
These attributes are optimizer hints only. They never change the observable semantics of the program; they only influence code layout. Use them sparingly, and only after profiling confirms that a branch is a bottleneck.
[[deprecated]]: retiring an API
When you replace part of your API, you want to warn callers that the old interface is going away without breaking them immediately.
[[deprecated("use sendMessage(Packet) instead")]]
void send(const char *buf, size_t len);
void sendMessage(Packet p); // the new API
Any call site that uses send receives a compiler warning naming the
replacement. The old function continues to compile and link; no existing code
breaks. The deprecation gives callers time to migrate.
[[fallthrough]]: intentional switch fall-through
The compiler warns when a case falls through to the next without a
break or return, because unintentional fall-through is a common
bug. When the fall-through is intentional, annotate it:
switch (event) {
case MOUSE_DOWN:
recordPress();
[[fallthrough]]; // intentional: MOUSE_DOWN also triggers the click handler
case MOUSE_CLICK:
handleClick();
break;
case MOUSE_UP:
recordRelease();
break;
}
The attribute silences the warning and makes the intent clear to the next reader.
[[no_unique_address]]: enabling the empty base optimization on members
Problem 59 used private inheritance to get the empty base optimization for
stateless policies. In C++20, [[no_unique_address]] achieves the same
result for data members:
struct EmptyFilter {}; // stateless policy: zero bytes
template<typename Alloc, typename Filter>
struct Container {
[[no_unique_address]] Alloc alloc_; // zero bytes if Alloc is empty
[[no_unique_address]] Filter filter_; // zero bytes if Filter is empty
int *data_;
size_t size_;
};
// With two empty policies: sizeof(Container<EmptyFilter,EmptyFilter>) == sizeof(int*)+sizeof(size_t)
Without [[no_unique_address]], every member must have a distinct address
even if it is an empty type, so the compiler must pad it to at least 1 byte.
Worked example: annotating a file I/O class
class FileWriter {
public:
[[nodiscard("file may not have opened")]]
bool open(const std::string &path);
[[nodiscard("check for short write or error")]]
bool write(const void *buf, size_t len);
void close();
[[deprecated("use write(buf, len) instead; this overload copies unnecessarily")]]
bool writeString(const std::string &s);
};
void example(FileWriter &fw, const void *buf) {
fw.open("out.bin"); // warning: ignored return value
fw.write(buf, 4); // warning: ignored return value
bool ok = fw.open("out.bin"); // fine
if (!ok) return;
if (!fw.write(buf, 4)) { /* handle error */ }
}
Applying [[nodiscard]] to types
In C++17 and later, [[nodiscard]] can be applied not just to functions
but to entire types. If a type is marked [[nodiscard]], any function
that returns it by value produces a warning whenever the return value is
discarded, even if the function itself is not marked.
struct [[nodiscard]] ErrorCode {
int code;
bool ok() const { return code == 0; }
};
ErrorCode doSomething(); // no [[nodiscard]] on the function itself
void caller() {
doSomething(); // warning: return value of type ErrorCode should not be ignored
}
This is useful for error-code types, optional result types, and resource handles — any type where ignoring the return is almost certainly a bug. You annotate the type once and every function returning it automatically gains the warning.
Note. Attributes are not macros. They participate in the language grammar and are defined by the standard. Vendor-specific attributes exist but must be written in a vendor namespace to avoid name collisions:
[[gnu::noinline]],[[clang::warn_unused_result]],[[msvc::forceinline]]. Standard attributes (nodiscard,maybe_unused,likely, etc.) work across all conforming compilers.
Problem 62: User-Defined Literals
Problem 43 used duration literals to write 5s and 200ms instead of
std::chrono::seconds{5} and std::chrono::milliseconds{200}. These
look like magic compiler syntax, but they are not. They are user-defined
literals: a language feature that lets you attach meaning to suffix notation on
numeric and string constants. And you can create them for your own types.
The syntax: operator"" _suffix
A user-defined literal is defined as a function named operator"" followed
by the suffix. The suffix determines the function that the compiler calls when
it sees the literal in source code.
There are four forms, corresponding to the four kinds of literal:
// Integer literal: 42_km
Distance operator""_km(unsigned long long x);
// Floating-point literal: 3.14_km
Distance operator""_km(long double x);
// String literal: "Paris"_city
City operator""_city(const char *s, size_t n);
// Character literal: 'A'_enc
Encoded operator""_enc(char c);
The compiler chooses which overload to call based on the type of the literal.
Integer constants call the unsigned long long overload; floating-point
constants call the long double overload.
The underscore convention
Standard library literals use no underscore: s (string or seconds),
ms (milliseconds), i (imaginary), sv (string_view), etc.
User-defined literals must start with an underscore. This avoids
collisions: a future standard library version might define km for the same
purpose, and your code would not break because your suffix is _km.
In practice, put user-defined literals in a namespace and require a
using namespace to activate them — exactly how the standard library
does it:
namespace my_units {
inline Distance operator""_km(long double x) { return Distance{double(x) * 1000.0}; }
inline Distance operator""_m (long double x) { return Distance{double(x)}; }
inline Distance operator""_cm(long double x) { return Distance{double(x) / 100.0}; }
}
using namespace my_units;
auto d = 5.0_km + 200.0_m; // type-safe arithmetic
A type-safe unit system
The real power of user-defined literals is not the pretty syntax alone — it is that they let you construct a strongly-typed value from a raw number, preventing unit mismatches.
class Distance {
double metres_;
public:
explicit Distance(double m) : metres_{m} {}
double metres() const { return metres_; }
Distance operator+(Distance o) const {
return Distance{metres_ + o.metres_};
}
bool operator<(Distance o) const { return metres_ < o.metres_; }
};
namespace units {
Distance operator""_km(long double x) { return Distance{(double)x * 1000.0}; }
Distance operator""_m (long double x) { return Distance{(double)x}; }
Distance operator""_cm(long double x) { return Distance{(double)x / 100.0}; }
}
using namespace units;
Distance marathon = 42.195_km;
Distance sprint = 100.0_m;
Distance total = marathon + sprint; // type-safe: both are Distance
// double wrong = 42.195 + 100.0; // dimensionless -- easy to misuse
The literal syntax makes the intent clear at the point of use. The compiler enforces correctness: you cannot accidentally add metres to kilograms.
The standard library literals
You have already used several:
"hello"s—std::string{"hello"}(from<string>,using namespace std::string_literals)"hello"sv—std::string_view{"hello"}(from<string_view>,using namespace std::string_view_literals)5s,200ms,1h—std::chronodurations (from<chrono>,using namespace std::chrono_literals)3.0if,4.2i—std::complex<double>(from<complex>,using namespace std::complex_literals)
Compile-time literals with consteval
In C++20, you can declare the operator consteval to force evaluation at
compile time. This is useful for validated literals: parsing and validating a
string at compile time so that ill-formed literals are a compile error, not a
runtime exception.
consteval int parsePort(const char *s, size_t n) {
int port = 0;
for (size_t i = 0; i < n; ++i) {
if (s[i] < '0' || s[i] > '9')
throw "port must be all digits"; // compile error if taken
port = port * 10 + (s[i] - '0');
}
if (port < 1 || port > 65535)
throw "port out of range";
return port;
}
consteval int operator""_port(const char *s, size_t n) {
return parsePort(s, n);
}
constexpr int http = "80"_port; // OK: 80
constexpr int bad = "99999"_port; // compile error: port out of range
The validation happens entirely at compile time. Any port number that would fail at runtime is instead a compile error.
Aside (CS 146 connection: reader macros and syntax extension). In Racket, reader macros let you extend the syntax of the language itself: #42 can be made to mean anything you define. Haskell’s Template Haskell does the same with compile-time code generation. User-defined literals in C++ are a narrower but safer version of the same idea: you can attach new meaning to literal syntax without arbitrary metaprogramming.
Worked example: money type with currency suffix
enum class Currency { USD, EUR };
class Money {
long long cents_;
Currency currency_;
public:
Money(long long cents, Currency c) : cents_{cents}, currency_{c} {}
long long cents() const { return cents_; }
Currency currency() const { return currency_; }
Money operator+(Money o) const {
if (currency_ != o.currency_) throw std::runtime_error{"currency mismatch"};
return Money{cents_ + o.cents_, currency_};
}
};
namespace money_literals {
Money operator""_usd(long double amount) {
return Money{(long long)(amount * 100 + 0.5), Currency::USD};
}
Money operator""_eur(long double amount) {
return Money{(long long)(amount * 100 + 0.5), Currency::EUR};
}
}
using namespace money_literals;
Money price = 9.99_usd;
Money tax = 0.80_usd;
Money total = price + tax; // fine: both USD
// Money bad = price + 5.0_eur; // runtime error: currency mismatch
Template literal operators
For integer literals, there is an additional template form that receives the digits as individual characters at compile time, enabling arbitrary compile-time processing:
template<char... Digits>
constexpr int operator""_bin() {
// convert binary digits to int at compile time
int result = 0;
for (char c : {Digits...}) {
result = result * 2 + (c - '0');
}
return result;
}
constexpr int x = 1010_bin; // == 10 at compile time
constexpr int y = 1111_bin; // == 15
The template receives every character of the literal as a separate template argument, before any type conversion. This makes it possible to validate or transform arbitrary numeric literals at compile time — binary, hexadecimal with digit separators, or any custom format. This form is unusual in practice but occasionally very useful for domain-specific numeric encodings.
Namespace discipline for literal operators
Because suffixes occupy a flat global namespace, user-defined literals without namespaces create collision risks. The idiomatic pattern is to define literals inside a namespace and provide an inline namespace for opt-in activation:
namespace units {
inline namespace literals {
Distance operator""_km(long double x);
Distance operator""_m (long double x);
}
}
// Users can opt in at the file level:
using namespace units::literals;
// Or import the whole units namespace, which automatically includes literals:
using namespace units;
The standard library uses exactly this pattern: chrono_literals,
string_literals, and string_view_literals are all inline namespaces
nested inside the inline namespace std::literals. This is why
using namespace std::literals brings all the standard literals into
scope at once, while using namespace std::chrono_literals brings only
the chrono ones.
Note. User-defined literals should be used for pure type decoration and unit conversion. They are cleanest when the operator body is a single constructor call. Avoid complex processing in a literal operator — if the conversion is non-trivial, provide a named constructor or factory function instead and use the literal only as a convenience shorthand.
Problem 63: Modules
Problem 2 built the separate-compilation model: declarations in headers,
definitions in .cc files, include guards to prevent double inclusion.
This model has served C++ for forty years — and it is terrible. It is slow,
order-dependent, and it leaks macros across translation units. C++20 fixes it.
What is wrong with #include
The preprocessor’s #include directive is textual inclusion. When you
write #include <vector>, the preprocessor pastes the entire contents of
vector into your translation unit. On a typical implementation,
#include <iostream> alone adds tens of thousands of lines to your file.
A non-trivial header tree can add hundreds of thousands. Every .cc file
in your project pays this cost independently.
Three specific problems:
- Speed. The compiler re-parses those lines for every translation unit. Precompiled headers help, but they are fragile — the wrong #define before the inclusion breaks the cache.
- Macro leakage. A macro defined in foo.h is visible in everything that includes foo.h, directly or transitively. This is global namespace pollution with no opt-out.
- Order sensitivity. Including headers in the wrong order can change behavior. A macro defined in one header can alter the meaning of a name in the next. This makes builds fragile and hard to reproduce.
The module model
A C++20 module is a named, self-contained unit that explicitly declares what it exports. The compiler processes it once and stores a precompiled interface. Importers get the exported declarations without any of the implementation, the macros, or the non-exported names.
A module interface unit declares the module and its exports:
// mylib.ixx (or mylib.cppm -- compiler-dependent extension)
module; // global module fragment: legacy #includes go here
#include <string>
export module mylib;
export void greet(const std::string &name);
export int add(int a, int b);
// internal helper -- not exported, so invisible to importers
// (not 'static': static would also hide it from the implementation unit)
void log(const std::string &msg) { /* ... */ }
A module implementation unit provides the definitions:
// mylib.cc
module; // global module fragment
#include <iostream>
#include <string>
module mylib; // claims ownership -- no 'export' here
void greet(const std::string &name) {
log("greeting " + name);
std::cout << "Hello, " << name << '\n';
}
int add(int a, int b) { return a + b; }
A consumer simply imports:
// main.cc
import mylib;
int main() {
greet("world");
return add(1, 2);
}
No include guard. No macro leakage — any #define inside the module
implementation is invisible to the importer. The compiler processes the
module interface once and reuses the result for every importer.
What modules provide
- Isolation. Macros defined in a module do not leak. Names not explicitly exported are not visible to importers.
- Speed. The module interface is compiled once to a binary module interface (BMI) file. Importers read the BMI — much faster than re-parsing thousands of lines of source text.
- Correct dependency tracking. The module system knows exactly what each module depends on. Build systems can rebuild only what changed.
- No order sensitivity. A module’s semantics do not depend on what macros happened to be defined when it was imported.
Module partitions
Large modules can be split into partitions:
// mylib-utils.ixx
export module mylib:utils; // a partition of mylib
export std::string trim(std::string s);
// mylib.ixx
export module mylib;
export import :utils; // re-export the partition
export void greet(const std::string &name);
Partitions let you organize a module across multiple files without splitting it
into separate modules. Importers see a single import mylib;.
Header units: the compatibility bridge
You cannot rewrite every third-party header overnight. The header unit bridge lets you import legacy headers through the module syntax:
import <vector>; // treat <vector> as a header unit
import "mylegacy.h"; // treat a local header as a header unit
Header units are compiled as if they were modules. Macros still leak from header units (unlike true modules), but they gain some of the speed benefits of precompilation.
C++23: import std
C++23 defines standard library modules. Instead of including dozens of headers, you write:
import std; // the entire C++23 standard library in one import
This replaces every #include for standard library types. In exchange you
get fast compilation and no macro pollution from the standard library
implementation.
Aside (CS 241 connection: separate compilation and linking). CS 241 showed how the assembler and linker combine separately compiled object files. Modules extend that model: the BMI file is like a header that has already gone through the compiler front end. The module interface is the API boundary — what names are exported, what their types are — and the linker only needs to resolve the exported symbols.
Worked example: convert the Node/List from Problem 2
Problem 2 gave us a singly-linked list split across node.h,
node.cc, list.h, list.cc. Here is the module equivalent.
// list.ixx -- module interface
export module list;
import std;
export struct Node {
int data;
Node *next;
};
export class List {
Node *head = nullptr;
public:
List() = default;
~List();
void push_front(int x);
void pop_front();
bool empty() const { return head == nullptr; }
int front() const;
};
// list.cc -- module implementation
module list;
List::~List() { while (!empty()) pop_front(); }
void List::push_front(int x) {
head = new Node{x, head};
}
void List::pop_front() {
if (empty()) return;
Node *tmp = head;
head = head->next;
delete tmp;
}
int List::front() const { return head->data; } // precondition: !empty()
// main.cc
import list;
import std; // imports are not transitive: main.cc needs std::cout itself
int main() {
List l;
l.push_front(3); l.push_front(2); l.push_front(1);
while (!l.empty()) {
std::cout << l.front() << ' ';
l.pop_front();
}
}
No include guards, no forward-declaration headers, no risk of including
node.h before list.h.
Compiler support and adoption
Modules are supported — with varying completeness — in GCC 11+, Clang 16+, and MSVC 2019 16.8+ (historically the most complete implementation). Build system support (CMake 3.28+, Meson, Bazel) has matured significantly since 2022. As of 2024, most large projects are still header-based, but new projects and active modernization efforts are adopting modules. Expect the transition to accelerate as toolchain support matures.
Note. Modules and #include can coexist in the same project. Migration is incremental: start by converting your most-included internal headers to modules, and leave third-party headers as header units or plain #includes. You do not need to convert everything at once.
Problem 64: Coroutines I: Generators
Problem 8 built a custom iterator for Vector by hand: a class with
operator++, operator*, begin(), and end(). That was a
substantial amount of code to express a simple concept: produce a sequence of
values one at a time. Problem 41 gave us lazy views over existing ranges.
C++20 goes further with coroutines: a way to write an iterator as a
simple loop body, letting the language machinery handle the suspension and
resumption.
What is a coroutine?
A regular function runs from entry to return in one shot. Once it returns, it is gone; its stack frame is destroyed. A coroutine can suspend execution mid-way, yield a value to its caller, and later be resumed from exactly where it left off.
// Regular function
int makeValue() {
int x = compute();
return x; // called once, done
}
// Coroutine
std::generator<int> makeValues() {
while (true) {
int x = compute();
co_yield x; // suspend here, give x to caller; resume later
}
}
The co_yield keyword is what makes makeValues a coroutine. When
the caller asks for the next value, the coroutine runs until the next
co_yield, hands the value back, and freezes — all its local state
intact — until the caller asks again.
Definition (coroutine). A function that can suspend its execution at a co_yield (or co_await) expression, returning a value to the caller, and be resumed from that point later. The coroutine's local variables survive across suspensions; they are stored in a coroutine frame, typically allocated on the heap.
co_yield: the generator pattern
The most intuitive use of coroutines is generating sequences. Here is an infinite Fibonacci generator:
#include <generator> // C++23: std::generator
#include <iostream>
#include <ranges> // std::views::take, used below
std::generator<int> fibonacci() {
int a = 0, b = 1;
while (true) {
co_yield a;
auto c = a + b;
a = b;
b = c;
}
}
int main() {
for (int n : fibonacci() | std::views::take(10)) {
std::cout << n << ' ';
}
// prints: 0 1 1 2 3 5 8 13 21 34
}
Without coroutines, producing Fibonacci lazily requires a class with state
fields a and b, an operator++ that advances them, and an
operator* that returns the current value. The coroutine version holds
that state as ordinary local variables. The structure of the computation is
visible in the code rather than distributed across iterator methods.
Under the hood: the coroutine frame
When the compiler sees a function containing co_yield, it transforms the
function into a state machine. The local variables that must survive across
suspension points are lifted into a heap-allocated coroutine frame
(also called the coroutine state).
The state machine looks roughly like this:
// What the compiler generates (conceptually):
struct fibonacci_frame {
int a, b; // locals that survive suspension
int suspend_point; // which co_yield we last suspended at
int yielded_value;
};
Each call to operator++ on the generator’s iterator resumes the
coroutine, running it until the next co_yield. The caller reads the
yielded value; the frame stays alive. When the coroutine runs off the end (or
hits co_return), the frame is destroyed.
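To make the transformation concrete, here is a hand-written equivalent of the fibonacci frame — no coroutines, just the locals lifted into members and "resume" turned into a method call (the names here are ours, for illustration):

```cpp
// Hand-rolled analogue of the compiler-generated fibonacci frame:
// the locals a, b become members; "resume until the next co_yield"
// becomes a call to next().
struct FibFrame {
    int a = 0, b = 1;
    int next() {
        int yielded = a;   // the value co_yield would hand back
        int c = a + b;     // then run forward to the next suspension point
        a = b;
        b = c;
        return yielded;
    }
};
```

This is exactly the state a hand-written iterator (Problem 8) would carry; the coroutine version lets the compiler derive it for you.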
The heap allocation is the main cost of coroutines. In practice, compilers often eliminate it via heap elision when the coroutine’s lifetime is clearly bounded — but this is an optimization, not a guarantee.
Lazy infinite sequences
The coroutine runs only when asked. If you take only the first 10
Fibonacci numbers, the coroutine suspends after the 10th co_yield and
is then destroyed. The remaining (infinite) iterations never execute. This
is the same laziness that Haskell provides by default — sequences are
demand-driven.
Aside (CS 146 connection: lazy lists and streams). In CS 146, Racket’s streams and Haskell’s lazy lists provide infinite sequences evaluated on demand.
co_yield is the C++ analogue of cons on a stream: produce one value now, defer the rest. The coroutine frame is the thunk: the suspended computation waiting to produce the next element.
Range integration
std::generator<T> models std::ranges::input_range. That means it
works with all the range algorithms and views from Problem 41:
auto squares = [](int n) -> std::generator<int> {
for (int i = 0; i < n; ++i) co_yield i * i;
};
auto big = squares(100)
| std::views::filter([](int x){ return x > 50; })
| std::views::take(5);
for (int x : big) std::cout << x << ' ';
// prints: 64 81 100 121 144
The pipeline is fully lazy: squares produces values only as the range
adaptor asks for them.
Worked example: lazily reading lines from a file
std::generator<std::string> readLines(std::string path) { // by value: a reference parameter can dangle once the caller's argument dies
std::ifstream file{path};
std::string line;
while (std::getline(file, line)) {
co_yield line;
}
} // file is closed when the coroutine frame is destroyed
void printFirstN(const std::string &path, int n) {
for (auto &line : readLines(path) | std::views::take(n)) {
std::cout << line << '\n';
}
// file is opened lazily; once we have n lines, iteration stops
// and the coroutine frame (and therefore the file) is destroyed
}
The file is held open inside the coroutine frame and closed automatically when the frame is destroyed — whether by taking all lines, or by stopping early. This is RAII (Problem 14) applied across a suspension boundary.
Recursive generators
C++23’s std::generator supports co_yield std::ranges::elements_of(subrange),
which efficiently yields every element of a sub-generator without flattening it
manually. This enables recursive data structure traversal:
struct TreeNode {
int val;
std::unique_ptr<TreeNode> left, right;
};
std::generator<int> inorder(const TreeNode *node) {
if (!node) co_return;
co_yield std::ranges::elements_of(inorder(node->left.get()));
co_yield node->val;
co_yield std::ranges::elements_of(inorder(node->right.get()));
}
// Usage:
for (int v : inorder(root.get())) std::cout << v << ' ';
Without elements_of, you would need to manually iterate over the
sub-generator inside a loop. With it, the structure of the recursive algorithm
matches the structure of the code.
A note on compiler support
std::generator is C++23. In C++20 you have the co_yield keyword
but must provide the promise_type and resumable-handle infrastructure
yourself, or use a library (cppcoro, libcoro). Problem 65 goes into that
infrastructure. As of 2024, std::generator ships in GCC 14's libstdc++; support in libc++ and in MSVC's STL was still maturing — check cppreference's compiler support table for the current status.
Generator vs. iterator: a comparison
It is illuminating to compare the hand-written iterator approach from Problem 8 with the coroutine approach for the same task.
// Iterator-based: a lot of boilerplate
class FibIterator {
int a = 0, b = 1;
public:
using iterator_category = std::input_iterator_tag;
using value_type = int;
using difference_type = std::ptrdiff_t;
int operator*() const { return a; }
FibIterator &operator++() { auto c=a+b; a=b; b=c; return *this; }
bool operator==(std::default_sentinel_t) const { return false; }
};
// Coroutine-based: intent is clear, boilerplate is gone
std::generator<int> fibonacci() {
int a = 0, b = 1;
while (true) { co_yield a; auto c=a+b; a=b; b=c; }
}
The coroutine version is shorter and reads more naturally as “loop and yield”. The iterator version requires understanding the iterator protocol and how to encode the generator’s internal state as class members. For producing sequences, coroutines are almost always the better abstraction.
Note. Coroutines are powerful but not free. Each generator instance pays for one heap allocation (the frame), one level of indirection per resume, and the overhead of the coroutine machinery. For hot-path sequences where performance is critical, a hand-written iterator or a precomputed array is faster. Coroutines shine when the sequence logic is complex, recursive, or comes from I/O — exactly the cases where a hand-written iterator would be painful to write and maintain.
Problem 65: Coroutines II: Async Tasks
Problem 64 showed co_yield for generators: a coroutine that suspends
itself to hand a value back to the caller. The other coroutine keyword is
co_await. Where co_yield produces a value for the caller,
co_await waits for an external event — an I/O completion, a
timer, a network response — and resumes the coroutine when it is ready,
without blocking any thread.
The two keywords use the same underlying coroutine mechanism — the same
compiler transformation, the same frame allocation, the same coroutine handle
— but they address different problems. co_yield is a pull model:
the caller pulls values from the generator. co_await is a push model:
the scheduler pushes control back to the coroutine when an event arrives.
Both are expressions inside a coroutine body; both cause the coroutine to
suspend and transfer control elsewhere.
The problem with blocking I/O and thread-per-request
Problem 51 introduced std::async, which runs a task on a thread from the
thread pool and returns a std::future<T> that you can block on. This
works fine for a handful of parallel tasks, but it breaks down at scale.
Suppose you are writing a server that handles 10,000 concurrent client connections. Each connection spends most of its time waiting: waiting for the client to send data, waiting for a database query, waiting for a downstream service. With one thread per connection, you need 10,000 threads. Each thread reserves a stack (typically 1–8 MB), so that is 10–80 GB of stack address space, plus the OS overhead of scheduling 10,000 threads. Context-switch cost and cache pressure climb with the thread count; well before this scale, thread-per-connection stops being practical.
The solution: a small pool of threads, with many tasks multiplexed onto
them. A task runs until it needs to wait, then suspends and hands the thread
to another task. When the event arrives, the task is resumed on some thread.
This is the event-loop model — and co_await is how C++20 expresses it.
co_await: suspend until ready
Task<std::string> fetchPage(std::string url) {
auto response = co_await asyncHttpGet(url); // suspend, don't block thread
co_return response.body; // resume here when response arrives
}
When the coroutine hits co_await asyncHttpGet(url), it:
- Asks the awaitable object (the result of asyncHttpGet(url)) whether the result is already available (await_ready).
- If not, suspends the coroutine and schedules resumption for when the HTTP response arrives (await_suspend).
- Returns control to the caller (the scheduler), which can run other tasks.
- When the network operation completes, the scheduler resumes this coroutine, and the co_await expression evaluates to the response (await_resume).
No thread is blocked during the wait. The thread is free to run other tasks.
The awaitable concept
Any type can be awaited if it satisfies the awaitable concept: it must provide three methods.
struct MyAwaitable {
// Called immediately: if true, don't even suspend
bool await_ready() { return result_already_available(); }
// Called on suspension: register resumption with the scheduler
void await_suspend(std::coroutine_handle<> h) {
schedule_callback_on_completion([h]{ h.resume(); });
}
// Called on resumption: provides the value of the co_await expression
Result await_resume() { return get_result(); }
};
An async framework (Boost.Asio, cppcoro, libuv via a C++ wrapper) provides
ready-made awaitables for network I/O, timers, and file operations. You write
coroutines that co_await those awaitables; the framework’s scheduler
handles the event loop.
Task: the promise type for async coroutines
A coroutine’s behavior is controlled by a promise type. For the
Task<T> type used in the example above, the promise type must specify:
- What to do when the coroutine starts (immediately suspend? run eagerly?).
- Where to store the result (in the Task object, for the caller to retrieve via co_await task).
- What to do when the coroutine finishes or throws.
Writing a correct Task<T> promise type is non-trivial. The minimum
viable implementation is around 40 lines. For production, use an existing
implementation from cppcoro or a similar library:
// With cppcoro (compute's steps are themselves cppcoro::task<int>
// coroutines; stepA/stepB are illustrative names):
cppcoro::task<int> compute() {
int a = co_await stepA();
int b = co_await stepB();
co_return a + b;
}
int main() {
cppcoro::sync_wait(compute()); // drive the coroutine to completion
}
Why coroutines beat threads for I/O
Memory: a coroutine frame is typically a few hundred bytes (just the live locals). A thread stack reserves 1–8 MB. You can run millions of coroutines concurrently.
Scheduling: coroutines switch cooperatively at co_await points. Thread switches go through the OS scheduler and must save and restore register and kernel state. A coroutine resume is roughly the cost of a function call.
Composability: coroutines can co_await other coroutines. Structured concurrency — a parent that waits for its children — is expressible directly. With threads, you need explicit synchronization.
Structured concurrency and co_await on multiple tasks
A parent coroutine can launch children and wait for all of them:
Task<void> parallel() {
auto t1 = fetchPage("https://a.example/");
auto t2 = fetchPage("https://b.example/");
auto [r1, r2] = co_await whenAll(std::move(t1), std::move(t2));
process(r1, r2);
}
whenAll is a utility that suspends the parent until all children complete.
Neither child blocks a thread; the event loop keeps running other work.
Aside (CS 343 connection: coroutines and stackless design). CS 343's μC++ uses coroutines with their own private stack (stackful coroutines): each coroutine can call arbitrarily deep into helper functions and suspend at any point in the call stack. C++20 coroutines are stackless: they can only suspend at co_await (or co_yield) expressions in the coroutine body itself, not inside ordinary called functions. The tradeoff is that stackless coroutines are cheaper (the frame is small and precisely sized) but require that every suspending call be itself a coroutine. μC++'s model is more flexible; C++20's is more efficient.
Worked example: wrapping a callback-based API
Many existing async libraries use callbacks. Here is how to wrap one in a
co_await-able type:
// Suppose we have: void asyncRead(int fd, char *buf, size_t n, Callback cb);
struct ReadAwaitable {
int fd; char *buf; size_t n; ssize_t result;
bool await_ready() { return false; } // always suspend -- I/O is never instant
void await_suspend(std::coroutine_handle<> h) {
asyncRead(fd, buf, n, [this, h](ssize_t r) mutable {
result = r;
h.resume(); // resume the coroutine on completion
});
}
ssize_t await_resume() { return result; }
};
Task<void> readFile(int fd) {
char buf[4096];
ssize_t n = co_await ReadAwaitable{fd, buf, sizeof(buf)};
process(buf, n);
}
The callback captures the coroutine handle h and calls h.resume()
when I/O completes. From the coroutine’s perspective, co_await is a
simple expression that returns the number of bytes read.
co_return: producing the final value
A coroutine that returns a single value uses co_return, not return:
Task<int> computeAsync(int x) {
int partial = co_await slowStep(x);
co_return partial * 2; // the Task<int> completes with this value
}
Task<void> useIt() {
int result = co_await computeAsync(21); // result == 42
std::cout << result << '\n';
}
co_return expr stores expr in the promise object’s result storage,
then finalizes the coroutine. The caller that co_awaits this
Task<int> receives 42 as the value of the co_await expression.
This is how coroutines form chains: each co_await extracts the
co_return’d value from the inner coroutine.
The execution model: one thread, many tasks
The event-loop model enabled by co_await typically runs on a single
thread (or a small, fixed thread pool). The scheduler maintains a ready queue
of coroutine handles waiting to resume. When I/O completes, the callback enqueues
the handle; the scheduler dequeues and calls h.resume().
This is fundamentally different from the thread-per-task model: the scheduler is in user space, context switches are just function calls, and there is no OS involvement. A single thread can drive thousands of concurrent network connections without any lock contention.
Note. The coroutine infrastructure — Task<T>, the scheduler/executor, awaitable wrappers for I/O — is not yet fully standardized. C++23 provides std::generator (Problem 64) but not a standard async executor or Task type. Full networking support is still being designed. In the meantime, use a library: cppcoro (header-only, excellent for learning), Boost.Asio (battle-tested, complex), or your platform's native async framework (Windows I/O completion ports, Linux io_uring). The co_await syntax and the awaitable protocol are stable; the ecosystem around them is maturing.
Problem 66: State Pattern
The patterns chapter covered three behavioral patterns: Factory Method for delegating object creation, Decorator for composing behavior, and Visitor for double dispatch. Here is one we skipped that comes up constantly in UI programming, protocol parsing, and game logic: the State pattern.
Motivation: behavior that depends on mode
Consider a network connection that can be in four states: CLOSED, LISTENING,
ESTABLISHED, and TIME_WAIT. The behavior of send(), receive(),
and close() depends entirely on which state the connection is currently
in:
| State | send() | receive() | close() |
|---|---|---|---|
| CLOSED | error | error | no-op |
| LISTENING | error | ok | stop |
| ESTABLISHED | ok | ok | begin TIME_WAIT |
| TIME_WAIT | error | error | wait, then CLOSED |
The naive implementation:
class Connection {
enum State { CLOSED, LISTENING, ESTABLISHED, TIME_WAIT };
State state_ = CLOSED;
public:
void send(const Data &d) {
switch (state_) {
case ESTABLISHED: /* actually send */ break;
case CLOSED: throw ConnectionError{"not connected"}; break;
case LISTENING: throw ConnectionError{"write on listen socket"}; break;
case TIME_WAIT: throw ConnectionError{"connection closing"}; break;
}
}
void receive(Data &d) {
switch (state_) {
// ... another switch over all four states
}
}
void close() {
// ... yet another switch
}
};
Every method contains the same switch over the state. Adding a new state (say, SYN_SENT) requires reopening every method. Adding a new method requires writing yet another complete switch. The cost grows multiplicatively: S states times M methods is S × M cases, every one of which must be kept in sync.
The State pattern
Encapsulate each state as a separate class that implements the object’s behavior for that state. The context object delegates every behavior method to the current state object. Transitions are performed by replacing the current state object with a new one.
The structure has three roles: the Context (here, Connection), which holds a pointer to the current state and delegates every behavior call to it; the abstract State (ConnState), which declares one virtual method per behavior; and the Concrete States (Closed, Established, ...), one class per mode.
Implementation
class Connection; // forward declaration
struct ConnState {
virtual void send (Connection &, const Data &) = 0;
virtual void receive (Connection &, Data &) = 0;
virtual void close (Connection &) = 0;
virtual ~ConnState() = default;
};
class Connection {
std::unique_ptr<ConnState> state_;
public:
Connection(); // starts in Closed state
void setState(std::unique_ptr<ConnState> s) { state_ = std::move(s); }
void send(const Data &d) { state_->send(*this, d); }
void receive(Data &d) { state_->receive(*this, d); }
void close() { state_->close(*this); }
};
Each concrete state handles its own transitions by calling ctx.setState(...). One caution: setState destroys the current state object — the very object whose method is executing — so a state must call setState last and must not touch its own members afterwards:
struct Established : ConnState {
void send(Connection &, const Data &d) override {
/* actually transmit d */
}
void receive(Connection &, Data &d) override {
/* read into d */
}
void close(Connection &ctx) override {
/* initiate TCP TIME_WAIT */
ctx.setState(std::make_unique<TimeWait>());
}
};
struct Closed : ConnState {
void send (Connection &, const Data &) override {
throw ConnectionError{"not connected"};
}
void receive(Connection &, Data &) override {
throw ConnectionError{"not connected"};
}
void close(Connection &) override { /* no-op */ }
};
Adding a new state is a matter of adding one new class — the existing states
are not modified. Adding a new behavior to every state requires adding one
method to the abstract ConnState and implementing it in each concrete
state — that is still S additions, but they are localized to the state
classes rather than scattered across switch statements.
Worked example: a traffic light
class TrafficLight;
struct LightState {
virtual void advance(TrafficLight &) = 0;
virtual std::string colour() const = 0;
virtual ~LightState() = default;
};
class TrafficLight {
std::unique_ptr<LightState> state_;
public:
TrafficLight();
void setState(std::unique_ptr<LightState> s) { state_ = std::move(s); }
void advance() { state_->advance(*this); }
std::string colour() const { return state_->colour(); }
};
struct Red : LightState {
std::string colour() const override { return "RED"; }
void advance(TrafficLight &tl) override; // -> Green
};
struct Green : LightState {
std::string colour() const override { return "GREEN"; }
void advance(TrafficLight &tl) override; // -> Yellow
};
struct Yellow : LightState {
std::string colour() const override { return "YELLOW"; }
void advance(TrafficLight &tl) override; // -> Red
};
void Red::advance (TrafficLight &tl) { tl.setState(std::make_unique<Green>()); }
void Green::advance (TrafficLight &tl) { tl.setState(std::make_unique<Yellow>()); }
void Yellow::advance(TrafficLight &tl) { tl.setState(std::make_unique<Red>()); }
TrafficLight::TrafficLight() : state_{std::make_unique<Red>()} {}
Aside (CS 241 connection: finite automata). The State pattern is an object-oriented encoding of a finite automaton. CS 241 studied DFAs: a set of states, an alphabet, a transition function, and a current state. The pattern maps directly:
ConnStatesubclasses are the states, the behavior methods are the alphabet, andsetStateis the transition function. Object-oriented state machines are particularly clean when the transition logic is complex or when states carry data.
The flyweight note
If state objects carry no data (as in the traffic light example), there is no reason to allocate a fresh one on every transition. You can share a single instance per state class:
// Pre-allocate shared singletons
static Red theRed;
static Green theGreen;
static Yellow theYellow;
void Red::advance(TrafficLight &tl) {
tl.setStateRaw(&theGreen); // no allocation; assumes TrafficLight also offers a non-owning setStateRaw(LightState *)
}
This is the Flyweight pattern applied to states. Use unique_ptr when
states hold data that varies per context; use shared singletons when states are
purely behavioral.
Note. The State pattern shines when the number of states is moderate (3—10) and the number of behaviors is also moderate. For very large FSMs (parsers, game AI) consider a data-driven approach: a table of (current_state, event) -> (next_state, action) pairs, interpreted at runtime. The table is easier to specify, serialize, and visualize than a class hierarchy with dozens of concrete states.
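The data-driven alternative mentioned in the note can be sketched in a few lines. Here it is for the traffic light; the enum and function names are ours, for illustration:

```cpp
#include <map>
#include <utility>

// Table-driven FSM: the transition function is data, not a class hierarchy.
enum class Light { Red, Green, Yellow };
enum class Event { Advance };

const std::map<std::pair<Light, Event>, Light> transitions{
    {{Light::Red,    Event::Advance}, Light::Green},
    {{Light::Green,  Event::Advance}, Light::Yellow},
    {{Light::Yellow, Event::Advance}, Light::Red},
};

Light step(Light current, Event e) {
    return transitions.at({current, e}); // throws std::out_of_range if undefined
}
```

The table could equally be loaded from a file, which is exactly what makes this approach attractive for large, evolving state machines.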
Problem 67: Command Pattern
The Visitor pattern from the patterns chapter separated algorithms from data structures. The State pattern in Problem 66 separated behavior from mode. The Command pattern separates the invocation of an operation from its execution: the request becomes an object, and that object can be stored, queued, logged, or reversed.
Motivation
Think about three different requirements that all have the same shape:
- A text editor must support Ctrl-Z and Ctrl-Y — undo and redo.
- A robot arm receives movement instructions that must execute in sequence and can be replayed.
- A web server queues incoming requests and processes them on a thread pool.
In each case, an operation needs to travel: from the site where it is invoked to the site where it is executed, possibly with a detour through a history or queue. A raw function call cannot travel. A Command object can.
Encapsulate a request as an object with an execute() method. The object
carries everything needed to perform the request: the receiver, the arguments,
and (for undoable commands) the information to reverse the operation.
The Command interface
class Command {
public:
virtual void execute() = 0;
virtual void undo() = 0; // not all commands need undo, but most do
virtual ~Command() = default;
};
Concrete commands store the receiver and the arguments they were invoked with:
class InsertTextCommand : public Command {
TextBuffer &buf_;
size_t pos_;
std::string text_;
public:
InsertTextCommand(TextBuffer &buf, size_t pos, std::string text)
: buf_{buf}, pos_{pos}, text_{std::move(text)} {}
void execute() override { buf_.insert(pos_, text_); }
void undo() override { buf_.erase(pos_, text_.size()); }
};
class DeleteTextCommand : public Command {
TextBuffer &buf_;
size_t pos_;
size_t len_;
std::string deleted_; // stored on execute() so undo() can restore it
public:
DeleteTextCommand(TextBuffer &buf, size_t pos, size_t len)
: buf_{buf}, pos_{pos}, len_{len} {}
void execute() override {
deleted_ = buf_.substr(pos_, len_);
buf_.erase(pos_, deleted_.size());
}
void undo() override { buf_.insert(pos_, deleted_); }
};
The CommandHistory: the Invoker
The Invoker keeps track of executed commands and manages undo/redo. It
owns the commands via unique_ptr — the commands are resources with
lifetimes that the history controls.
class CommandHistory {
std::stack<std::unique_ptr<Command>> done_;
std::stack<std::unique_ptr<Command>> undone_;
public:
void execute(std::unique_ptr<Command> cmd) {
cmd->execute();
done_.push(std::move(cmd));
// A new command clears the redo stack
while (!undone_.empty()) undone_.pop();
}
void undo() {
if (done_.empty()) return;
auto cmd = std::move(done_.top()); done_.pop();
cmd->undo();
undone_.push(std::move(cmd));
}
void redo() {
if (undone_.empty()) return;
auto cmd = std::move(undone_.top()); undone_.pop();
cmd->execute();
done_.push(std::move(cmd));
}
};
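The invoker can be exercised end-to-end. Here is a compact, self-contained run; the trivial AddCommand receiver is ours for illustration, and the history class repeats the one above so the snippet compiles standalone:

```cpp
#include <memory>
#include <stack>

struct Command {
    virtual void execute() = 0;
    virtual void undo() = 0;
    virtual ~Command() = default;
};

// Trivial receiver: an int counter. execute adds, undo subtracts.
class AddCommand : public Command {
    int &target_;
    int delta_;
public:
    AddCommand(int &t, int d) : target_{t}, delta_{d} {}
    void execute() override { target_ += delta_; }
    void undo() override { target_ -= delta_; }
};

class CommandHistory {
    std::stack<std::unique_ptr<Command>> done_, undone_;
public:
    void execute(std::unique_ptr<Command> cmd) {
        cmd->execute();
        done_.push(std::move(cmd));
        while (!undone_.empty()) undone_.pop(); // new action clears redo
    }
    void undo() {
        if (done_.empty()) return;
        auto cmd = std::move(done_.top()); done_.pop();
        cmd->undo();
        undone_.push(std::move(cmd));
    }
    void redo() {
        if (undone_.empty()) return;
        auto cmd = std::move(undone_.top()); undone_.pop();
        cmd->execute();
        done_.push(std::move(cmd));
    }
};
```

Executing two AddCommands, undoing, and redoing walks the counter up and down exactly as Ctrl-Z/Ctrl-Y would walk a text buffer.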
Macro commands
A MacroCommand holds a sequence of commands and executes them as a unit.
Undo reverses them in reverse order:
class MacroCommand : public Command {
std::vector<std::unique_ptr<Command>> cmds_;
public:
void add(std::unique_ptr<Command> c) { cmds_.push_back(std::move(c)); }
void execute() override {
for (auto &c : cmds_) c->execute();
}
void undo() override {
for (auto it = cmds_.rbegin(); it != cmds_.rend(); ++it)
(*it)->undo();
}
};
A find-and-replace operation, for example, is a MacroCommand containing
one DeleteTextCommand and one InsertTextCommand for each replacement
site. Undoing the macro undoes all replacements atomically.
Commands for queuing
Commands are trivially queueable across threads. A producer pushes commands into a std::queue<std::unique_ptr<Command>> (with appropriate synchronization from Problem 49); a consumer dequeues and calls execute(). The consumer
does not need to know what type of command it is running.
// Producer thread:
queue.push(std::make_unique<RenderFrameCommand>(scene, framebuf));
// Consumer (render thread):
while (auto cmd = queue.pop()) {
cmd->execute();
}
UML structure. (Class diagram not reproduced: the Client creates a ConcreteCommand bound to a Receiver; the Invoker holds Commands through the abstract Command interface and calls execute()/undo() without knowing the concrete type.)
Aside (CS 247 connection: the open/closed principle). The Command pattern is a textbook application of OCP. Adding a new operation (say,
SetFontCommand) means adding one new class. The invoker (CommandHistory), the existing commands, and the receiver (TextBuffer) are all closed for modification. The new command is open: it can do anything itsexecuteandundoimplementations specify.
Worked example: drawing application with undo
class Canvas { /* stores a list of strokes */ };
class AddStrokeCommand : public Command {
Canvas &canvas_;
Stroke stroke_;
public:
AddStrokeCommand(Canvas &c, Stroke s)
: canvas_{c}, stroke_{std::move(s)} {}
void execute() override { canvas_.addStroke(stroke_); }
void undo() override { canvas_.removeLastStroke(); }
};
// In the UI event handler:
void onMouseRelease(Canvas &canvas, CommandHistory &hist, Stroke s) {
hist.execute(std::make_unique<AddStrokeCommand>(canvas, std::move(s)));
}
Logging and serialization
Because a Command is an object, it can do more than execute — it can describe
itself. Adding a pure virtual describe() method to the interface lets an
audit log record every action:
class Command {
public:
virtual void execute() = 0;
virtual void undo() = 0;
virtual std::string describe() const = 0; // human-readable description
virtual ~Command() = default;
};
class InsertTextCommand : public Command {
/* ... */
std::string describe() const override {
return "InsertText at " + std::to_string(pos_) + ": \"" + text_ + "\"";
}
};
The command history can then write an audit trail:
void CommandHistory::execute(std::unique_ptr<Command> cmd) {
std::clog << "[CMD] " << cmd->describe() << '\n';
cmd->execute();
done_.push(std::move(cmd));
}
Extending the Command interface to include serialization (toJson(),
toBinary()) enables persistent undo history that survives application
restarts — every action is saved to disk as a command object and replayed on
reload.
Note. For small commands that do not need undo, a lambda (or any callable) is often simpler than a full Command class. A std::function<void()> stored in a queue is a lightweight command. Use the full class hierarchy when you need undo state, serialization, or the ability to compose commands into macros.
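The lightweight-callable variant is tiny. A sketch (names ours): a FIFO queue of std::function commands and a loop that drains it.

```cpp
#include <functional>
#include <queue>

// Drain a queue of lightweight commands in FIFO order.
// No undo, no class hierarchy -- just deferred invocation.
void drain(std::queue<std::function<void()>> &q) {
    while (!q.empty()) {
        q.front()();   // execute the command
        q.pop();
    }
}
```

Anything callable — lambdas with captures, bound member functions — can go in the queue, which is exactly why this form suffices when undo and serialization are not needed.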
Problem 68: Mediator and Chain of Responsibility
Problem 29 used the Observer pattern for one-to-many notification: one publisher fires an event, many subscribers receive it. Sometimes the communication structure is more complex. We end this survey of design patterns with two more GoF patterns for routing messages.
Part I: Mediator — many-to-many communication
The problem
An airport control system has many aircraft, and each aircraft must coordinate with others: maintain separation, queue for runways, coordinate approaches. If every aircraft communicates directly with every other, each aircraft needs a reference to every other one. With n aircraft, that is O(n²) communication links. Adding a new aircraft means updating every existing one.
The solution: centralize through a coordinator
The Mediator pattern introduces a central coordinator object. Participants communicate with the mediator only; the mediator routes messages to the right recipients. The participant-to-participant graph collapses to a participant-to-mediator star: O(n) links instead of O(n²).
class Component;
class Mediator {
public:
virtual void notify(Component *sender, const std::string &event) = 0;
virtual ~Mediator() = default;
};
class Component {
protected:
Mediator *mediator_ = nullptr;
public:
void setMediator(Mediator *m) { mediator_ = m; }
};
class Aircraft : public Component {
std::string id_;
public:
explicit Aircraft(std::string id) : id_{std::move(id)} {}
void requestLanding() {
std::cout << id_ << " requests landing\n";
mediator_->notify(this, "landing_request");
}
void clearToLand() { std::cout << id_ << " cleared to land\n"; }
const std::string &id() const { return id_; }
};
class ControlTower : public Mediator {
std::vector<Aircraft *> aircraft_;
Aircraft *runway_ = nullptr;
public:
void addAircraft(Aircraft *a) {
aircraft_.push_back(a);
a->setMediator(this);
}
void notify(Component *sender, const std::string &event) override {
if (event == "landing_request") {
auto *plane = static_cast<Aircraft *>(sender);
if (!runway_) { runway_ = plane; plane->clearToLand(); }
else { std::cout << plane->id() << " holds: runway busy\n"; }
} else if (event == "landed") {
runway_ = nullptr;
}
}
};
No Aircraft ever holds a reference to another Aircraft. All
coordination happens in ControlTower::notify. Adding a new aircraft type
(helicopter, drone) requires no changes to the existing participants.
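A driver for the classes above might look like the sketch below. Two members here are additions for the sake of a complete, checkable example: reportLanded(), since the original snippet handles a "landed" event but never shows who sends it, and a runwayBusy() accessor for inspecting the tower's state.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Compact restatement of the mediator example, plus a driver.
class Component;
class Mediator {
public:
    virtual void notify(Component *sender, const std::string &event) = 0;
    virtual ~Mediator() = default;
};
class Component {
protected:
    Mediator *mediator_ = nullptr;
public:
    void setMediator(Mediator *m) { mediator_ = m; }
};
class Aircraft : public Component {
    std::string id_;
public:
    explicit Aircraft(std::string id) : id_{std::move(id)} {}
    void requestLanding() { mediator_->notify(this, "landing_request"); }
    void reportLanded()   { mediator_->notify(this, "landed"); }  // addition
    const std::string &id() const { return id_; }
};
class ControlTower : public Mediator {
    std::vector<Aircraft *> aircraft_;
    Aircraft *runway_ = nullptr;
public:
    void addAircraft(Aircraft *a) { aircraft_.push_back(a); a->setMediator(this); }
    bool runwayBusy() const { return runway_ != nullptr; }       // addition
    void notify(Component *sender, const std::string &event) override {
        auto *plane = static_cast<Aircraft *>(sender);
        if (event == "landing_request") {
            if (!runway_) { runway_ = plane; std::cout << plane->id() << " cleared to land\n"; }
            else          { std::cout << plane->id() << " holds: runway busy\n"; }
        } else if (event == "landed") {
            runway_ = nullptr;
        }
    }
};
```

A driver that adds two aircraft and has both request landing sees the first cleared and the second held; once the first reports "landed", the second's retry is cleared. All of that routing lives in one place: ControlTower::notify.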
Define an object that encapsulates how a set of objects interact. Participants hold a reference to the mediator, not to each other. The mediator centralizes the coordination logic, reducing the coupling between participants from O(n²) to O(n).
When to use
Use the Mediator when many objects communicate in complex ways and you find
yourself adding cross-references between them. Also useful for GUI form logic:
a form has many widgets (text boxes, checkboxes, buttons) that must react to
each other. A FormMediator that handles all widget interactions is
cleaner than giving every widget a reference to every other.
Part II: Chain of Responsibility — pass until handled
The problem
An HTTP server receives a request. Before it reaches the business logic, the request must pass through: authentication (is the user logged in?), rate limiting (too many requests?), logging (record every access), and finally the handler. Any step can reject the request and send an error response without passing it further.
The solution: a chain of handlers
Assemble a chain of handler objects. Each handler either handles the request or passes it to the next handler in the chain. The sender does not know which handler will process its request.
struct Request {
std::string user;
std::string path;
bool authenticated = false;
};
class Handler {
Handler *next_ = nullptr;
public:
Handler &setNext(Handler &h) { next_ = &h; return h; }
virtual bool handle(Request &req) {
if (next_) return next_->handle(req);
return false;
}
virtual ~Handler() = default;
};
class AuthHandler : public Handler {
public:
bool handle(Request &req) override {
if (req.user.empty()) { std::cout << "401\n"; return true; }
req.authenticated = true;
return Handler::handle(req);
}
};
class RateLimiter : public Handler {
std::unordered_map<std::string, int> hits_;
public:
bool handle(Request &req) override {
if (++hits_[req.user] > 100) {
std::cout << "429 Too Many Requests\n"; return true;
}
return Handler::handle(req);
}
};
class LoggingHandler : public Handler {
public:
bool handle(Request &req) override {
std::cout << "LOG: " << req.user << " -> " << req.path << '\n';
return Handler::handle(req);
}
};
Assembling the chain:
AuthHandler auth;
RateLimiter limiter;
LoggingHandler logger;
auth.setNext(limiter).setNext(logger);
Request req{"alice", "/api/data"};
auth.handle(req); // auth -> limiter -> logger -> done
Modern C++ variant: function chain
A more idiomatic modern version uses std::function:
using Middleware = std::function<bool(Request &, std::function<bool(Request &)>)>;
Middleware logging = [](Request &req, auto next) {
std::cout << "LOG: " << req.path << '\n';
return next(req);
};
Middleware auth = [](Request &req, auto next) {
if (req.user.empty()) { std::cout << "401\n"; return false; }
return next(req);
};
Each middleware takes the request and a next callable; it can call
next to pass control, or return early to stop the chain. The composition
order is explicit and the chain can be assembled at runtime by wrapping
functions.
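The runtime assembly mentioned above can be sketched with a compose helper that wraps the middlewares, right to left, around a terminal handler so the first element of the chain runs first. The names (compose, makePipeline) are illustrative, and Request is simplified here:

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Request {
    std::string user;
    std::string path;
};

using Next       = std::function<bool(Request &)>;
using Middleware = std::function<bool(Request &, Next)>;

// Wrap middlewares right-to-left around a terminal handler, so that
// chain[0] is the outermost (first to run).
Next compose(std::vector<Middleware> chain, Next terminal) {
    Next next = std::move(terminal);
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        next = [mw = *it, next](Request &req) { return mw(req, next); };
    }
    return next;
}

Next makePipeline() {
    Middleware auth = [](Request &req, Next next) {
        if (req.user.empty()) { std::cout << "401\n"; return false; }
        return next(req);
    };
    Middleware logging = [](Request &req, Next next) {
        std::cout << "LOG: " << req.path << '\n';
        return next(req);
    };
    // auth runs first, then logging, then the business logic.
    return compose({auth, logging}, [](Request &) { return true; });
}
```

The resulting pipeline is itself just a Next, so pipelines can be nested or swapped out at runtime.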
Aside (CS 348 connection: middleware and query processing). Database query processing follows the same pattern. A query passes through a chain of operators: parse, optimize, validate permissions, cache lookup, execute, format results. Each stage either transforms the query and passes it on, or returns an error. The Chain of Responsibility is the structural backbone of a typical query pipeline.
Comparing the three routing patterns
| Pattern | Publishers | Handlers |
|---|---|---|
| Observer | one | many (all receive) |
| Mediator | many | one coordinator |
| Chain of Resp. | one | one (first to claim it) |
Observer broadcasts. Mediator centralizes. Chain of Responsibility passes until someone claims the request. Knowing which to reach for depends on the cardinality and ownership of the communication.
Note. The Chain of Responsibility can be combined with the Command pattern: each “request” travelling the chain is a Command object, and rejected commands can be logged or queued for retry. Design patterns compose; knowing several gives you a vocabulary for building complex routing structures from simple pieces.
Problem 69: Sanity Tools
Throughout this course we have accumulated a taxonomy of ways things can go
wrong: undefined behaviour from out-of-bounds array access (Problem 9), memory
leaks from exceptions thrown before a delete (Problem 14), and data races
from unsynchronized thread access (Problems 49–55). We have learned how to
avoid these problems by design. But design is not enough. Tests, sanitizers,
and fuzzing catch what careful design misses.
Sanitizers: runtime error detectors
Sanitizers are compiler transformations that instrument your code to detect errors at the moment they occur, rather than at the moment they cause a crash — which may be much later and in completely unrelated code.
AddressSanitizer (ASan)
ASan detects memory errors: use-after-free, heap buffer overflow, stack buffer overflow, use-after-return, and double-free. (Reads of uninitialized memory are caught by the separate MemorySanitizer, not by ASan.)
// bug.cc
int main() {
int *arr = new int[5];
arr[7] = 42; // heap buffer overflow: index 7 is out of bounds
delete[] arr;
}
Compile and run:
g++ -fsanitize=address -g bug.cc -o bug
./bug
Output (abbreviated):
==12345==ERROR: AddressSanitizer: heap-buffer-overflow
WRITE of size 4 at 0x602000000030 thread T0
#0 main bug.cc:4
ASan reports the exact file, line, and type of violation. Without it, this program silently corrupts heap metadata and crashes unpredictably later.
ASan has approximately a 2× slowdown and 2× memory overhead. Use it in all debug and CI builds.
UndefinedBehaviorSanitizer (UBSan)
UBSan detects a broad class of undefined behaviour: signed integer overflow, null pointer dereference, invalid shifts, misaligned memory access, division by zero, and many others.
int x = INT_MAX;
x += 1; // signed overflow -- undefined behaviour
g++ -fsanitize=undefined -g ub.cc -o ub && ./ub
# ub.cc:2: runtime error: signed integer overflow: 2147483647 + 1
# cannot be represented in type 'int'
UBSan is cheap — often under a 1.5× slowdown — and should be enabled in all debug builds alongside ASan. The two sanitizers combine cleanly:
g++ -fsanitize=address,undefined -g mycode.cc -o mycode
ThreadSanitizer (TSan)
TSan detects data races: concurrent read and write (or write and write) to the same memory location from different threads without synchronization.
int counter = 0;
std::thread t1([] { ++counter; }); // data race on counter!
std::thread t2([] { ++counter; });
t1.join(); t2.join();
g++ -fsanitize=thread -g race.cc -o race && ./race
# WARNING: ThreadSanitizer: data race
# Write of size 4 at 0x... by thread T1: #0 main+0x...
# Previous write of size 4 at 0x... by thread T2
Note. ASan and TSan cannot be combined — they both intercept memory allocation and would conflict. Run TSan in a separate build from ASan+UBSan.
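For completeness: the race above disappears once the shared counter is a std::atomic (or is protected by a mutex). A sketch:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Same increment as the racy example, but counter is std::atomic<int>:
// each ++ is now a single atomic read-modify-write, so there is no data
// race and TSan reports nothing.
std::atomic<int> counter{0};

void runBothThreads() {
    std::thread t1([] { ++counter; });
    std::thread t2([] { ++counter; });
    t1.join();
    t2.join();
}
```

After runBothThreads(), counter is reliably 2, whereas the racy version has undefined behaviour.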
Unit testing with gtest
Sanitizers find how things go wrong at runtime. Unit tests specify what the code is supposed to do. Google Test (gtest) is the most widely used C++ testing library:
#include <gtest/gtest.h>
#include "vector.h"
TEST(VectorTest, PushBackGrowsSize) {
Vector<int> v;
v.push_back(1);
v.push_back(2);
EXPECT_EQ(v.size(), 2u);
}
TEST(VectorTest, ElementsAccessible) {
Vector<int> v;
v.push_back(42);
EXPECT_EQ(v[0], 42);
}
TEST(VectorTest, ThrowsOnOutOfBounds) {
Vector<int> v;
EXPECT_THROW(v.at(0), std::out_of_range);
}
Each TEST macro defines an independent test case. EXPECT_EQ,
EXPECT_THROW, and their siblings emit a failure message and continue
running; ASSERT_* variants abort the test on failure. A test binary
runs all registered tests and prints a summary.
The key workflow: run the tests, check sanitizers, fix failures, repeat. Catching a bug in a unit test is far cheaper than catching it in production.
Catch2: a lightweight alternative
Catch2 is a lightweight alternative to gtest (its v2 releases, matching the include below, ship as a single header) with a more expressive style:
#include <catch2/catch.hpp>
#include "vector.h"
TEST_CASE("Vector push_back", "[Vector]") {
Vector<int> v;
REQUIRE(v.size() == 0);
v.push_back(1);
REQUIRE(v.size() == 1);
CHECK(v[0] == 1);
}
Both frameworks integrate with CMake’s ctest runner and with CI systems.
Fuzzing: automated input generation
Unit tests require you to think of the cases in advance. Fuzzing does not. A fuzzer generates random, malformed, or mutated input and feeds it to your code, letting the sanitizers catch any crash or undefined behaviour it triggers.
The de facto C++ fuzzing standard is libFuzzer, which ships with Clang (GCC users typically reach for AFL++ instead). You provide a single function:
// fuzz_target.cc
#include "my_parser.h"
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
try {
MyParser::parse(data, size);
} catch (const ParseError &) {
// expected -- not a bug
}
return 0;
}
Compile with clang++ -fsanitize=fuzzer,address and run. The fuzzer starts with
small random inputs and mutates ones that exercise new code paths, guided by
coverage instrumentation. A parser that has survived a billion fuzz inputs is
much more trustworthy than one with a dozen hand-written test cases.
Aside (CS 447 connection: testing and verification). Fuzzing is automated random testing — a cousin of the random testing techniques in CS 447. The key insight, the same as in property-based testing, is that manually written test cases can only test what the author imagined. Randomized testing explores the space the author did not imagine. Sanitizers turn “did it crash?” into “did it have any undefined behaviour?”, making fuzzing a much stronger correctness check.
The recommended workflow
- Write code.
- Write unit tests that specify the expected behavior.
- Compile with -fsanitize=address,undefined -g and run the tests. Fix every sanitizer report — they are all real bugs.
- In a separate build, run with -fsanitize=thread if your code is concurrent.
- For security-sensitive parsers or protocol handlers, add a fuzz target and run it for at least a few hours.
- Only disable sanitizers for performance-measurement or release builds.
This is not a testing course. But the tools above are simple enough to adopt
immediately: -fsanitize=address,undefined is a one-line change to your
build command. Every C++ developer should have sanitizers enabled by default
in their debug build.
And with that, this extension comes to a close. The 69 problems in these notes have traced a path from raw arrays and manual memory management through templates, smart pointers, move semantics, concurrency, ranges, coroutines, and the full suite of design patterns. C++ is a large and demanding language, but its demands are principled: every complexity is there to give you control that no other mainstream language provides.
The rest is practice.