CO 456: Game Theory

Martin Pei


Based on lecture notes by Sibelius Peng — PDF source

Sources and References

Primary textbooks — Osborne, M.J. An Introduction to Game Theory (Oxford University Press, 2004); Roughgarden, T. Twenty Lectures on Algorithmic Game Theory (Cambridge University Press, 2016).

Supplementary texts — Nisan, N., Roughgarden, T., Tardos, É., and Vazirani, V. (eds.), Algorithmic Game Theory (Cambridge University Press, 2007); Karlin, A. and Peres, Y., Game Theory, Alive (American Mathematical Society, 2017); Berlekamp, E., Conway, J.H., and Guy, R., Winning Ways for Your Mathematical Plays (A K Peters, 2001).

Online resources — Roughgarden, T., CS364A: Algorithmic Game Theory, Stanford University, Fall 2013, https://timroughgarden.org/f13/f13.html

Lecture notes — Peng, S. CO 456 Lecture Notes (Fall 2020), https://pdf.sibeliusp.com/1209/co456.pdf

Chapter 1: Combinatorial Games

Impartial Games and the Structure of Nim

The theory of combinatorial games asks a precise and beautiful question: given a two-player game with no hidden information and no element of chance, which player has a winning strategy? This question, deceptively elementary in its formulation, turns out to be surprisingly deep. The answer connects XOR arithmetic, inductive proofs, and an elegant structural theorem due to Roland Sprague and Patrick Michael Grundy.

The canonical starting point is the Game of Nim. We are given a collection of piles of chips. Two players alternate turns; on each turn, a player removes at least one chip from a single pile. The player who cannot move — because all piles are empty — loses. With two piles of equal size, the second player simply mirrors every move of the first on the opposite pile, guaranteeing a win. With two piles of unequal size, say 5 and 7, the first player removes 2 from the larger pile to equalize them, and from then on plays the mirroring role of the second player.

Lemma 1.1. In a two-pile Nim game with pile sizes \(m\) and \(n\), the first player has a winning strategy if and only if \(m \neq n\).

This two-pile result is easy to handle directly. The general multi-pile case requires a more structural approach.

An impartial game is a two-player, zero-sum, perfect-information game satisfying the following axioms: there are exactly two players; there is a set of positions with a designated starting position; on each turn the player chooses from the available moves, which depend only on the current position and not on which player is moving (this is the impartial condition); players alternate; there is complete information; there are no chance elements; the player with no available move loses; and the rules guarantee that every play terminates. Nim satisfies all of these. Other examples include the subtraction game (one pile, each move removes 1, 2, or 3 chips), the rook game (a rook on an \(m \times n\) board can only move left or upward, and the player who cannot move loses), and Green Hackenbush (a graph attached to a floor where moves delete edges and any component disconnected from the floor vanishes). What Nim does not capture alone, as we will see, is arbitrary combinations of these games — but the Sprague-Grundy theorem will.

Lemma 1.2. In any impartial game, exactly one of the two players has a winning strategy.

The proof is a clean induction on the simplicity of the game (a game \(H\) is simpler than \(G\) if \(H\) is reachable from \(G\) in finitely many moves). If the current position has no legal moves, player I loses immediately, so player II wins. Otherwise, if player I can reach some option where she wins (by induction), she moves there. If no such option exists, then for every option available to player I, player II wins by induction. This dichotomy is exhaustive: every impartial game is either a winning game (player I has a winning strategy) or a losing game (player II has a winning strategy).

Game Sums and Equivalence

Two impartial games \(G\) and \(H\) can be combined into a single game. The game sum \(G + H\) consists of both games played simultaneously: on each turn, a player picks one of the two component games and makes a legal move there. The game ends when neither component has any legal move. Nim with piles \((a, b, c)\) is the game sum \(\ast a + \ast b + \ast c\), where \(\ast n\) denotes a single-pile Nim game of size \(n\).

Game sums form an abelian monoid: closure, associativity, commutativity all hold, and the identity element is \(\ast 0\), the game with no legal moves (since adding a position with no options to any game \(G\) leaves \(G + \ast 0 = G\) unchanged).

What do we mean when we say two games are “the same” for strategic purposes? Two games \(G\) and \(H\) are equivalent, written \(G \equiv H\), if for every game \(J\), the game \(G + J\) has the same outcome as \(H + J\) — both are winning games or both are losing games.

The most striking structural fact about equivalence is the copycat principle.

Lemma 1.8 (Copycat Principle). For any impartial game \(G\), we have \(G + G \equiv \ast 0\).

The proof is inductive. If both components are identical, whatever move player I makes in one copy (say reaching \(G'\)), player II makes the exact same move in the other copy, arriving at \(G' + G'\). By induction, this is a losing game for the next player to move — which is player I. So \(G + G\) is always a losing game, hence \(G + G \equiv \ast 0\).

This has a useful corollary: every game is its own inverse in the world of game equivalence, and \(G \equiv H\) if and only if \(G + H \equiv \ast 0\). The equivalence classes of impartial games under \(\equiv\) form an abelian group under the sum operation — a group in which every element is its own inverse.

Nimbers and the XOR Formula

Given a game \(G\), we say \(n\) is the nimber (or Grundy value) of \(G\) if \(G \equiv \ast n\). The nimber of \(\ast n\) is \(n\) itself. Any losing game has nimber 0 by Theorem 1.6: \(G\) is a losing game if and only if \(G \equiv \ast 0\).

For a multi-pile Nim game, the nimber of the sum is computed via a beautiful combinatorial operation.

Corollary 1.12. \(\ast b_1 + \ast b_2 + \cdots + \ast b_n \equiv \ast(b_1 \oplus b_2 \oplus \cdots \oplus b_n)\), where \(\oplus\) denotes the bitwise XOR operation.

That is, the nimber of a multi-pile Nim game is the XOR of all the pile sizes. To compute \(n_1 \oplus n_2\), write both numbers in binary and add without carrying. For example, \(11 = 1011_2\) and \(13 = 1101_2\), so \(11 \oplus 13 = 0110_2 = 6\). The game \(\ast 11 + \ast 13\) is therefore equivalent to \(\ast 6\), a winning game for player I.

The deeper fact behind this formula — Theorem 1.11 — says that a Nim pile of size \(n = 2^{a_1} + 2^{a_2} + \cdots\) (binary expansion) decomposes as \(\ast n \equiv \ast 2^{a_1} + \ast 2^{a_2} + \cdots\). The proof proceeds by showing that the two games have equivalent option sets (Lemma 1.10), using Lemma 1.14 (which proves that XOR of numbers below \(2^a\) stays below \(2^a\)) as the key inductive ingredient.

Finding winning moves. Given a Nim position \(\ast b_1 + \cdots + \ast b_n\) with nimber \(s > 0\), the winning move is to find some pile \(b_i\) such that \(b_i \oplus s < b_i\) (Lemma 1.13 guarantees such a pile exists — specifically, any pile containing the highest set bit of \(s\)), then remove \(b_i - (b_i \oplus s)\) chips from that pile, leaving a position with nimber 0.
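This recipe translates directly into code; a short Python sketch (mine, not from the notes):

```python
def winning_move(piles):
    """Return (pile_index, chips_to_remove) for a winning Nim move,
    or None if the position already has nimber 0 (a losing game)."""
    s = 0
    for b in piles:
        s ^= b  # Corollary 1.12: the nimber is the XOR of the pile sizes
    if s == 0:
        return None  # every move hands the opponent a winning game
    for i, b in enumerate(piles):
        if b ^ s < b:  # any pile containing the highest set bit of s
            return (i, b - (b ^ s))  # leave a position with nimber 0
```

For example, `winning_move([11, 13])` removes 2 chips from the pile of 13, leaving the mirrored position \(\ast 11 + \ast 11\) with nimber 0.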

The Sprague-Grundy Theorem

The minimum excluded integer (mex) of a set \(S \subseteq \mathbb{N}_0\) is \(\operatorname{mex}(S) = \min(\mathbb{N}_0 \setminus S)\), the smallest non-negative integer not in \(S\). The function mex is the bridge between general impartial games and Nim games.

Theorem 1.15. Let \(G\) be an impartial game, and let \(S\) be the set of nimbers of all options of \(G\). Then \(G \equiv \ast(\operatorname{mex}(S))\).

The proof shows that \(G + \ast(\operatorname{mex}(S)) \equiv \ast 0\) by verifying that any move player I makes leads to a position that player II can answer with a move to a losing game. This is the essential recursive structure: nimbers propagate backward through the game tree using mex.

Theorem 1.16 (Sprague-Grundy Theorem). Every impartial game is equivalent to a single-pile Nim game \(\ast n\) for some \(n \geq 0\).

The proof follows immediately from induction and Theorem 1.15: compute the nimbers of the options recursively, then apply mex. This theorem is the central result of combinatorial game theory: no matter how complicated the game, no matter how many moves each player can make, the entire game is equivalent to a single pile of chips. The rich structure of subtraction games, rook games, and Hackenbush graphs all collapse to a single integer.

To find the nimber of a game recursively, assign 0 to all positions with no moves. Then work backward: the nimber of a position is the mex of the nimbers of all positions reachable from it. For the subtraction game (remove 1, 2, or 3 chips), this produces the pattern \(0, 1, 2, 3, 0, 1, 2, 3, \ldots\) — the nimber of a position with \(n\) chips is \(n \mod 4\). The game is losing (nimber 0) precisely when \(n \equiv 0 \pmod{4}\). For combined games, the nimber of the sum is the XOR of the component nimbers, so strategy for arbitrary combinations of impartial games reduces to computing a handful of nimbers and a single XOR.
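The backward computation fits in a few lines of Python; here it is for the subtraction game, assuming the move set \(\{1, 2, 3\}\) from the example above:

```python
from functools import lru_cache

def mex(values):
    """Minimum excluded integer: the smallest n >= 0 not in the set."""
    n = 0
    while n in values:
        n += 1
    return n

@lru_cache(maxsize=None)
def grundy(chips):
    """Nimber of the subtraction game (remove 1, 2, or 3 chips):
    a position with no moves gets 0, and every other position gets
    the mex of its options' nimbers (Theorem 1.15)."""
    return mex({grundy(chips - m) for m in (1, 2, 3) if m <= chips})
```

Running `grundy` on successive pile sizes reproduces the pattern \(0, 1, 2, 3, 0, 1, 2, 3, \ldots\).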

Chapter 2: Strategic Games

Nash Equilibrium: Stability Under Self-Interest

Combinatorial games assume sequential play in a game with perfect information. Strategic games — sometimes called normal-form games — model a very different kind of interaction: players choose simultaneously, each knowing the rules but not what the others will do, and each acting purely to maximize their own payoff.

A strategic game is defined by a finite set of players \(N = \{1, \ldots, n\}\), a strategy set \(S_i\) for each player, and a utility function \(u_i : S_1 \times \cdots \times S_n \to \mathbb{R}\) describing the payoff player \(i\) receives given the joint choices of all players. A strategy profile is a vector \(s = (s_1, \ldots, s_n) \in S_1 \times \cdots \times S_n\). Players are assumed to be rational and self-interested, seeking to maximize their own utility.

The Prisoner’s Dilemma illustrates the core tension. Two suspects can each either cooperate (“share”) or defect (“steal”). Regardless of what the other player does, each player strictly prefers to defect: if the other shares, stealing gains more; if the other steals, stealing loses less. Both defecting is the rational individual outcome, even though mutual cooperation would leave both better off. The social optimum and the individually rational outcome diverge — a fundamental theme throughout game theory.

The right solution concept for strategic games is the Nash equilibrium, named for John Nash who proved its existence in 1951. A strategy profile \(s^* \in S\) is a Nash equilibrium if no player can strictly improve their payoff by unilaterally deviating: \(u_i(s^*) \geq u_i(s_i', s^*_{-i})\) for all \(s_i' \in S_i\) and all \(i \in N\). Here \(s^*_{-i}\) denotes the strategies of all players except \(i\), which are held fixed. A Nash equilibrium is a stable strategy profile: once reached, no individual has an incentive to deviate.

The Bach or Stravinsky game shows that a game can have multiple Nash equilibria. Both (Bach, Bach) and (Stravinsky, Stravinsky) are Nash equilibria — in each, neither player wants to deviate and miss the concert with the other. This multiplicity raises the question of which equilibrium will actually be played, a problem Nash’s theorem does not settle.

Best Response Functions

Given the other players' choices \(s_{-i}\), the best response set of player \(i\) collects the strategies that maximize her utility:

\[ B_i(s_{-i}) = \bigl\{ s_i' \in S_i : u_i(s_i', s_{-i}) \geq u_i(s_i, s_{-i}) \text{ for all } s_i \in S_i \bigr\}. \]

Lemma 2.1. A strategy profile \(s^* \in S\) is a Nash equilibrium if and only if \(s_i^* \in B_i(s^*_{-i})\) for every \(i \in N\).

This lemma converts the Nash equilibrium condition into a fixed-point condition: every player must be playing a best response to the others. To find Nash equilibria of finite games, one can iterate over the strategy sets, mark best responses, and look for profiles where all players are simultaneously best responding.
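For small games the marking procedure can be carried out mechanically; a Python sketch, with Prisoner's-Dilemma-style payoffs chosen for illustration (the numbers are not from the notes):

```python
def pure_nash(u1, u2):
    """All pure Nash equilibria of a two-player finite game given by
    payoff matrices u1[i][j], u2[i][j]: profiles (i, j) where each
    strategy is a best response to the other (Lemma 2.1)."""
    m, n = len(u1), len(u1[0])
    return [(i, j)
            for i in range(m) for j in range(n)
            if u1[i][j] == max(u1[k][j] for k in range(m))    # i best vs j
            and u2[i][j] == max(u2[i][l] for l in range(n))]  # j best vs i

# Strategy 0 = share, 1 = steal; illustrative payoffs.
dilemma_u1 = [[3, 0], [5, 1]]
dilemma_u2 = [[3, 5], [0, 1]]
```

`pure_nash(dilemma_u1, dilemma_u2)` returns `[(1, 1)]`: mutual defection is the unique equilibrium, even though the profile (0, 0) pays both players more.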

Cournot’s Oligopoly Model

In Cournot's model, \(n\) firms each choose a production quantity \(q_i \geq 0\). The market price is \(P(q) = \alpha - \sum_j q_j\) when this is positive (and 0 otherwise), and producing each unit costs \(c\), so firm \(i\)'s profit is

\[ u_i(q) = q_i \cdot P(q) - c q_i = q_i \bigl(\alpha - c - \textstyle\sum_{j \neq i} q_j - q_i\bigr) \]

when the price is positive. This is a quadratic in \(q_i\), maximized at the best response \(B_i(q_{-i}) = (\alpha - c - \sum_{j \neq i} q_j)/2\) when this quantity is positive (and at \(q_i = 0\) otherwise).

In the two-firm case, the symmetric Nash equilibrium satisfies \(q_1^* = q_2^* = (\alpha - c)/3\). As the number of firms grows, each firm’s output shrinks but the aggregate supply \(\sum_j q_j^* = n(\alpha - c)/(n+1)\) rises, pushing the market price \(P(q^*) \to c\) as \(n \to \infty\). Competition drives price toward marginal cost — a result familiar from microeconomics, here derived from game-theoretic first principles. Collusion (the two firms acting as a monopoly) would yield higher per-firm profits \((\alpha - c)^2/8\) versus the Nash equilibrium’s \((\alpha - c)^2/9\), but truthful coordination is not incentive-compatible.
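Best-response dynamics converge here: repeatedly replacing each firm's quantity with its best response, one firm at a time, drives the quantities to the Nash equilibrium. A sketch, with parameter values chosen for illustration:

```python
def cournot_equilibrium(n=2, alpha=10.0, c=1.0, sweeps=200):
    """Iterate the Cournot best response
    B_i(q_{-i}) = (alpha - c - sum_{j != i} q_j) / 2, clamped at 0,
    updating firms one at a time until the quantities settle."""
    q = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            q[i] = max(0.0, (alpha - c - (sum(q) - q[i])) / 2)
    return q
```

With two firms, \(\alpha = 10\) and \(c = 1\), each quantity converges to \((\alpha - c)/3 = 3\); with \(n\) firms the limit is \((\alpha - c)/(n + 1)\). Updating one firm at a time matters: simultaneous updates can oscillate when \(n \geq 3\).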

Dominance and Iterated Elimination

Strategy \(s_i^{(1)}\) strictly dominates \(s_i^{(2)}\) if \(u_i(s_i^{(1)}, s_{-i}) > u_i(s_i^{(2)}, s_{-i})\) for every opponent strategy profile \(s_{-i}\). A strictly dominated strategy will never appear in any Nash equilibrium (Lemma 2.3), since a rational player always prefers the dominating strategy. When a strictly dominating strategy exists, the game is easy to analyze — the unique Nash equilibrium has every player choosing their dominating strategy.

Iterated Elimination of Strictly Dominated Strategies (IESDS) leverages this observation repeatedly. After removing a strictly dominated strategy, other strategies that were not previously dominated may become so. One keeps eliminating until no more eliminations are possible.

Theorem 2.4. If IESDS reduces the game to a single strategy profile \(s^*\), then \(s^*\) is the unique Nash equilibrium.

The facility location game (two firms choosing among towns arranged on a highway) illustrates this beautifully: extreme locations are strictly dominated, their elimination makes new extremes dominated, and iterated elimination converges to the unique equilibrium at the center — the two firms competing for the median voter.

Weak dominance relaxes strict dominance: \(s_i^{(1)}\) weakly dominates \(s_i^{(2)}\) if \(u_i(s_i^{(1)}, s_{-i}) \geq u_i(s_i^{(2)}, s_{-i})\) for all \(s_{-i}\), with strict inequality for at least one. Iterated elimination of weakly dominated strategies (IEWDS) can also identify Nash equilibria, but is less powerful: different elimination orders may yield different surviving profiles, and the result is a Nash equilibrium but not necessarily unique.

Second-Price Auctions

A stark application of strategic game analysis is the second-price auction (Vickrey auction). A seller has one item; \(n\) buyers simultaneously submit bids; the highest bidder wins and pays the second-highest bid (not their own). Counterintuitively, bidding your true valuation is the right strategy regardless of what others do.

Theorem 2.8. In the second-price auction, bidding one’s true valuation \(v_i\) is a weakly dominant strategy for each player \(i\).

The proof analyzes two cases. If \(v_i\) is a winning bid, any deviation that preserves winning produces the same payment, while a deviation that causes a loss yields utility 0, which is no better. If \(v_i\) is a losing bid, any deviation that keeps losing produces utility 0, while a deviation that turns the bid into a winning bid yields non-positive utility (the winner pays the highest competing bid, which is at least \(v_i\)). In both cases truthful bidding weakly dominates all alternatives, and the dominance is strict against at least one profile of competing bids.
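The case analysis can be checked exhaustively on small instances; a brute-force sketch in Python (the bid grids and tie-breaking in favor of the lowest index are illustrative assumptions):

```python
def second_price(bids):
    """Highest bid wins (ties go to the lowest index) and pays the
    second-highest bid."""
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price

def utility(value, bid, other_bids):
    """Player 0's utility when bidding `bid` with valuation `value`."""
    winner, price = second_price([bid] + list(other_bids))
    return value - price if winner == 0 else 0
```

Checking every valuation and every deviation over a small grid of competitor bids confirms that `utility(v, v, others) >= utility(v, b, others)` throughout, which is exactly weak dominance of truthful bidding.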

The second-price auction makes strategic play trivially easy: you do not need to know your competitors’ valuations or beliefs. Simply bid what the item is worth to you. This dominant strategy incentive compatibility property — that truthfulness is optimal regardless of others’ actions — becomes a central design criterion in Chapter 4.

Mixed Strategies and Nash’s Theorem

Many games — including Rock-Paper-Scissors and Matching Pennies — have no Nash equilibrium in pure strategies. Extending the strategy space to probability distributions over pure strategies resolves this.

A mixed strategy for player \(i\) is a probability distribution \(x_i\) over \(S_i\); write \(x_i^{s}\) for the probability assigned to \(s \in S_i\), \(\Delta_i\) for the simplex of such distributions, and \(\Delta = \Delta_1 \times \cdots \times \Delta_n\). The expected utility of a mixed profile \(x \in \Delta\) is

\[ u_i(x) = \sum_{s \in S_1 \times \cdots \times S_n} u_i(s) \prod_{j=1}^n x_j^{s_j}. \]

A mixed Nash equilibrium is a profile \(\bar{x} \in \Delta\) such that \(u_i(\bar{x}) \geq u_i(x_i, \bar{x}_{-i})\) for all \(x_i \in \Delta_i\) and all \(i\). The best response characterization of Lemma 2.1 extends: \(\bar{x}\) is a Nash equilibrium if and only if each \(\bar{x}_i\) is a best response to \(\bar{x}_{-i}\).

A critical tool for computing mixed equilibria is the Support Characterization Theorem (Theorem 2.10), derived from LP duality applied to the best response problem. Given \(\bar{x}_{-i}\), the best mixed response for player \(i\) maximizes \(u_i(x_i, \bar{x}_{-i})\) over \(\Delta_i\), a linear program in \(x_i\). By duality, the maximum expected utility from any mixed strategy equals the maximum expected utility from any pure strategy. Moreover, complementary slackness gives a clean characterization: \(x_i^s > 0\) only if \(s\) achieves the maximum utility against \(\bar{x}_{-i}\). In other words, a player randomizes only over strategies that are simultaneously best responses. To find a mixed Nash equilibrium, one looks for strategy supports where all strategies in each player’s support have equal expected utility.
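In a 2×2 game with a full-support equilibrium, the equal-utility condition pins down the mixing probabilities directly. A sketch (the Bach-or-Stravinsky payoffs below are a common textbook convention, assumed here rather than taken from the notes):

```python
def mixed_2x2(u1, u2):
    """Full-support mixed equilibrium of a 2x2 game via the support
    characterization: each player mixes so that BOTH of the opponent's
    pure strategies earn the same expected utility.
    Returns (p, q) = (P[row 0], P[column 0])."""
    # Row player's p makes the column player indifferent between columns:
    p = (u2[1][1] - u2[1][0]) / (u2[0][0] - u2[1][0] - u2[0][1] + u2[1][1])
    # Column player's q makes the row player indifferent between rows:
    q = (u1[1][1] - u1[0][1]) / (u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1])
    return p, q

# Bach or Stravinsky: payoff 2 for one's preferred concert, 1 otherwise.
bos_u1 = [[2, 0], [0, 1]]
bos_u2 = [[1, 0], [0, 2]]
```

`mixed_2x2(bos_u1, bos_u2)` gives \(p = 2/3\), \(q = 1/3\): each player leans toward their own favorite. For Matching Pennies the same formula returns \(p = q = 1/2\).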

The Voter Participation Game

The Downs paradox notes that the probability one vote changes an election’s outcome is negligible, yet the cost of voting is positive — so rational voters should abstain. Yet people vote in large numbers. Game theory offers a resolution through equilibrium analysis.

In a symmetric model with \(a = b \geq 2\) supporters for each of two candidates, voting cost \(c \in (0, 1)\), and payoffs 2, 1, 0 for win, tie, and loss, the unique symmetric pure Nash equilibrium is for everyone to vote: any supporter who abstains while all others vote can switch to voting and gain, moving from a certain loss to a tie. In a more realistic asymmetric model where party A has \(a > b\) supporters, a mixed Nash equilibrium arises in which each B supporter votes with probability \(p = c^{1/(b-1)}\), chosen so that every B supporter is indifferent between voting and abstaining. Crucially, as the cost \(c\) of voting increases, the probability \(p\) increases — more people vote when voting is more costly, because higher costs eliminate all but the most committed voters from the pool, making each remaining voter more pivotal. This counterintuitive prediction is a signature of equilibrium reasoning.

Two-Player Zero-Sum Games and LP Duality

A strategic game is zero-sum if \(\sum_i u_i(s) = 0\) for every strategy profile \(s\). In the two-player case with payoff matrix \(A \in \mathbb{R}^{m \times n}\) (so player I gets \(A_{ij}\) and player II gets \(-A_{ij}\) in profile \((i,j)\)), finding the equilibrium decomposes into two linear programs.

Player I chooses a mixed strategy \(x_1\) to maximize her guaranteed payoff; since some pure reply of player II is always among his best replies, the guarantee is a minimum over columns:

\[ \max_{x_1 \geq 0,\, \sum x_1^i = 1}\, \min_{j} (x_1)^T A_{\cdot j}. \]

Symmetrically, player II chooses \(x_2\) to minimize his worst-case loss, a maximum over rows:

\[ \min_{x_2 \geq 0,\, \sum x_2^j = 1}\, \max_{i} A_{i \cdot} x_2. \]

Both optimization problems are converted to linear programs, and these LPs turn out to be duals of each other. By LP duality, both have optimal solutions with equal objective values — the same equilibrium payoff — and the optimal mixed strategies form a Nash equilibrium. This proves:

Theorem 2.11. Every finite two-player zero-sum game has a mixed Nash equilibrium, computable in polynomial time via linear programming.

The result extends von Neumann’s minimax theorem (1928): \(\max_{x_1} \min_j (x_1)^T A_{\cdot j} = \min_{x_2} \max_i A_{i \cdot} x_2\). The two-player zero-sum case is thus very tractable. In sharp contrast, computing Nash equilibria in general-sum two-player games is PPAD-complete — a hardness result proven only decades later.
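The inner minimization in the maximin program runs over pure strategies only: against a fixed mixed strategy, some pure column is already a worst case. A small Python sketch of the guarantee a row strategy provides (the matrices are illustrative):

```python
def guarantee(A, x):
    """Worst-case expected payoff to the row player who commits to the
    mixed strategy x: it suffices to minimize over the column player's
    PURE strategies."""
    m, n = len(A), len(A[0])
    return min(sum(x[i] * A[i][j] for i in range(m)) for j in range(n))

pennies = [[1, -1], [-1, 1]]
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock-paper-scissors
```

Uniform mixing guarantees payoff 0 in both games, matching the common value that the two dual LPs compute for these symmetric games.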

Nash’s Theorem and Brouwer’s Fixed Point

Theorem 2.12 (Nash, 1951). Every finite strategic game has at least one mixed Nash equilibrium.

The proof uses Brouwer’s fixed-point theorem, one of the most famous results in topology. Brouwer’s theorem states that any continuous function \(f : X \to X\) from a compact convex set \(X\) in finite-dimensional Euclidean space to itself has a fixed point \(x_0\) with \(f(x_0) = x_0\).

For each player \(i\) and pure strategy \(s \in S_i\), let \(\Phi_i^s(x) = \max\bigl\{0,\; u_i(s, x_{-i}) - u_i(x)\bigr\}\) measure the gain, if any, from switching to \(s\). Nash's map shifts probability toward profitable pure deviations:

\[ f(x)_i^s = \frac{x_i^s + \Phi_i^s(x)}{1 + \sum_{s' \in S_i} \Phi_i^{s'}(x)}. \]

The function \(f\) is continuous (since each \(\Phi_i^{s}\) is continuous) and maps the compact convex set \(\Delta\) to itself, so Brouwer's theorem yields a fixed point \(\hat{x}\) with \(f(\hat{x}) = \hat{x}\). At the fixed point, each player \(i\) puts positive probability on some pure strategy \(s\) whose expected utility against \(\hat{x}_{-i}\) is at most \(u_i(\hat{x})\) (the average cannot exceed every term), and for that \(s\) we have \(\Phi_i^s(\hat{x}) = 0\). The fixed-point equation for that \(s\) then reads \(\hat{x}_i^s = \hat{x}_i^s / (1 + \sum_{s'} \Phi_i^{s'}(\hat{x}))\), which forces \(\Phi_i^{s'}(\hat{x}) = 0\) for all \(s'\), and hence for all players. So no player can strictly benefit from switching to any pure strategy, and \(\hat{x}\) is a Nash equilibrium. The proof is non-constructive: it guarantees existence without providing an algorithm to find the equilibrium, mirroring the challenge of computing Nash equilibria efficiently.

Chapter 3: Cooperative Games

Coalition Formation and Characteristic Functions

Strategic games assume players act individually and simultaneously. Cooperative games model a different world: players can form coalitions, pool their resources, and negotiate binding agreements. The question shifts from “what strategy should I play?” to “which coalition should I join, and how should we divide the resulting value?”

A cooperative game is a pair \((N, v)\) where \(N = \{1, \ldots, n\}\) is a set of players and \(v : 2^N \to \mathbb{R}\) is a characteristic function assigning a value \(v(S)\) to each coalition \(S \subseteq N\), with \(v(\emptyset) = 0\) and \(v(S) \geq 0\).

Three natural classes of games arise. A game is monotone if \(S \subseteq T\) implies \(v(S) \leq v(T)\): larger coalitions produce at least as much value. A game is superadditive if for disjoint \(S, T\): \(v(S) + v(T) \leq v(S \cup T)\) — merging two coalitions always creates at least as much value as the sum of their separate productions. Superadditivity implies that players are always better off forming larger coalitions. A game is convex (or supermodular) if \(v(S) + v(T) \leq v(S \cup T) + v(S \cap T)\) for all \(S, T\), equivalently if the marginal contribution of any player \(i\) to a coalition is non-decreasing as the coalition grows: \(v(T \cup \{i\}) - v(T) \geq v(S \cup \{i\}) - v(S)\) whenever \(S \subseteq T\). Convexity implies superadditivity.

An outcome of a cooperative game is a pair \((\pi, x)\) of a coalition structure (a partition \(\pi\) of \(N\)) and a payoff vector \(x \in \mathbb{R}^N\) with \(x \geq 0\) and \(x(C) \leq v(C)\) for each coalition \(C \in \pi\). An outcome is efficient if \(x(C) = v(C)\) for every \(C \in \pi\) — the full value of each coalition is distributed.

Shapley Values: Fairness via Averaging

Two competing criteria guide what payoff vector to select: fairness (the payoff should reflect actual contributions) and stability (no subgroup should have an incentive to break away). The Shapley value, introduced by Lloyd Shapley in 1953, addresses fairness.

Fix an ordering \(\sigma = (\sigma_1, \ldots, \sigma_n)\) of the players. The marginal contribution of the player arriving in position \(i\) is

\[ \Delta_\sigma(\sigma_i) = v(\{\sigma_1, \ldots, \sigma_i\}) - v(\{\sigma_1, \ldots, \sigma_{i-1}\}). \]

This measures how much value player \(\sigma_i\) adds when joining the coalition of all predecessors in the ordering \(\sigma\). Different orderings give different marginal contributions. Shapley’s idea is to average over all \(n!\) orderings:

\[ \phi_i = \frac{1}{n!} \sum_{\sigma \in S_N} \Delta_\sigma(i). \]

This averaging over orderings corresponds to imagining players arriving in a uniformly random order and each receiving the marginal value they add upon arrival. The result is the unique allocation satisfying four natural axioms:

Efficiency (Proposition 3.2): \(\sum_{i \in N} \phi_i = v(N)\). The Shapley values distribute the entire value of the grand coalition, proved by observing that for any permutation \(\sigma\), the sum of marginal contributions telescopes to \(v(N) - v(\emptyset) = v(N)\).

Symmetry (Proposition 3.3): If players \(i\) and \(j\) make identical contributions to every coalition — \(v(C \cup \{i\}) = v(C \cup \{j\})\) for all \(C \subseteq N \setminus \{i,j\}\) — then \(\phi_i = \phi_j\). The proof constructs a bijection on \(S_N\) by swapping the positions of \(i\) and \(j\), showing the marginal contributions of \(i\) and \(j\) are interchanged.

Dummy Player (Proposition 3.4): If player \(i\) contributes nothing to any coalition — \(v(S \cup \{i\}) = v(S)\) for all \(S\) — then \(\phi_i = 0\).

Additivity (Proposition 3.5): If two games \((N, v_1)\) and \((N, v_2)\) are combined additively to form \((N, v_3)\) with \(v_3(S) = v_1(S) + v_2(S)\), then \(\phi_i^{(3)} = \phi_i^{(1)} + \phi_i^{(2)}\).

A deep theorem — not proved in the course — states that the Shapley value is the unique function from games to payoff vectors satisfying all four axioms simultaneously. In the ice cream game (three players with \(\$3, \$4, \$5\), buying sizes \(1\text{L}, 1.5\text{L}, 2\text{L}\) at costs \(\$6, \$9, \$11\)), the Shapley values work out to \(\phi_A = 1/2\), \(\phi_B = \phi_C = 3/4\), reflecting that \(B\) and \(C\) contribute equally, and each more than \(A\).
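The defining formula translates directly into a brute-force computation over all \(n!\) orderings (feasible only for small \(n\)); a sketch, tested on a three-player majority game where \(v(S) = 1\) exactly when \(|S| \geq 2\):

```python
from itertools import permutations
from math import factorial

def shapley(n, v):
    """Shapley values phi_i: average over all n! arrival orders of the
    marginal value player i adds upon arrival."""
    phi = [0.0] * n
    for sigma in permutations(range(n)):
        before = frozenset()
        for player in sigma:
            phi[player] += v(before | {player}) - v(before)
            before = before | {player}
    return [p / factorial(n) for p in phi]

# Three-player majority game: a coalition of two or more has value 1.
majority = lambda S: 1.0 if len(S) >= 2 else 0.0
```

By symmetry each player's Shapley value is \(1/3\), and the three values sum to \(v(N) = 1\), illustrating the efficiency axiom.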

The Core: Stability of Coalitions

While Shapley values measure fairness, the core captures stability. An outcome \((\pi, x)\) is in the core if no coalition \(S \subseteq N\) has an incentive to break away: \(x(S) = \sum_{i \in S} x_i \geq v(S)\) for all \(S \subseteq N\). In other words, every coalition is paid at least what it could generate on its own.

Propositions 3.6 and 3.7 establish that any outcome in the core must be efficient and must use a coalition structure that maximizes total value. These are necessary conditions. Whether they are sufficient — whether a core outcome actually exists — is the central question.

The three-player majority game (\(v(S) = 1\) for \(|S| \geq 2\), else 0) has an empty core: any two-player coalition can generate value 1, but in any allocation with \(x_1 + x_2 + x_3 = 1\) some player has \(x_i \geq 1/3\) (by pigeonhole), so the coalition of the other two receives at most \(2/3 < 1\), violating the core condition.

Theorem 3.9 (Bondareva-Shapley, 1963). A superadditive game has a non-empty core if and only if it is balanced.

A game is balanced if for every balancing weight — a vector \((y_C)_{C \subseteq N}\) with \(y_C \geq 0\) and \(\sum_{C: i \in C} y_C = 1\) for all \(i\) — one has \(\sum_{C \subseteq N} y_C v(C) \leq v(N)\). Geometrically, a balancing weight is a fractional coalition structure where each player’s time is fully allocated across coalitions, and the condition says the grand coalition generates at least as much total value as any such fractional arrangement.

The proof derives from LP duality: formulate the core non-emptiness as a linear feasibility problem, take the dual, and observe that the dual conditions precisely characterize balanced games. The Bondareva-Shapley theorem gives a complete characterization but is computationally expensive to check directly (exponentially many constraints). For convex games, however, a simpler result applies.

Proposition 3.12. Every convex game has a non-empty core.

The core element is constructed explicitly: any vector of marginal contributions \((\Delta_\sigma(1), \ldots, \Delta_\sigma(n))\) for any permutation \(\sigma\) lies in the core. The convexity condition ensures that each marginal contribution is large enough to satisfy the core inequalities — the key calculation shows that for any coalition \(C = \{i_1 < i_2 < \cdots < i_k\}\), the sum of marginal contributions at positions \(i_1, \ldots, i_k\) in \(\sigma\) is at least \(v(C)\), using the convexity condition to bound each summand below by the corresponding marginal contribution of players joining in the order \(i_1, i_2, \ldots, i_k\). Furthermore, the set of all core payoff vectors is precisely the convex hull of all marginal contribution vectors over all permutations, and the Shapley value (as their average) therefore lies in the core of every convex game.
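Both the marginal-vector construction and the core inequalities are easy to verify by brute force on small examples. A sketch using the illustrative convex game \(v(S) = |S|^2\) (my example, not from the notes):

```python
from itertools import combinations, permutations

def marginal_vector(n, v, sigma):
    """(Delta_sigma(1), ..., Delta_sigma(n)): what each player adds
    when joining their predecessors in the ordering sigma."""
    x, before = [0.0] * n, set()
    for player in sigma:
        x[player] = v(before | {player}) - v(before)
        before.add(player)
    return x

def in_core(n, v, x, tol=1e-9):
    """Core test: x(S) >= v(S) for every coalition S, with the grand
    coalition's value fully distributed."""
    if abs(sum(x) - v(set(range(n)))) > tol:
        return False
    return all(sum(x[i] for i in S) >= v(set(S)) - tol
               for k in range(1, n)
               for S in combinations(range(n), k))

square = lambda S: len(S) ** 2  # convex: marginal gains grow with |S|
```

For this game every marginal vector, for every ordering of the players, passes the core test, exactly as Proposition 3.12 predicts.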

Matching Games and LP Duality

A matching game is defined by a graph \(G = (V, E)\) with edge weights \(w : E \to \mathbb{R}_{\geq 0}\). Players are the vertices \(N = V\). The value \(v(S)\) of a coalition \(S\) is the maximum weight of a matching using vertices in \(S\). The weight \(w_{uv}\) represents the value produced when players \(u\) and \(v\) collaborate.

For a superadditive matching game the natural coalition structure is the grand coalition, so the core reduces to the set

\[ \mathcal{C}_0 = \bigl\{ x \in \mathbb{R}^N : x(N) = v(N),\; x_u + x_v \geq w_{uv}\; \forall uv \in E,\; x \geq 0 \bigr\}. \]

The vectors satisfying the inequality constraints are exactly the feasible solutions of the linear program

\[ \min x(N) \quad \text{s.t.}\; x_u + x_v \geq w_{uv}\; \forall uv \in E,\; x \geq 0, \]

so \(\mathcal{C}_0\) is non-empty precisely when this LP has optimal value \(v(N)\). Its dual is the fractional matching LP

\[ \max \sum_{uv \in E} w_{uv} y_{uv} \quad \text{s.t.}\; \sum_{v :\, uv \in E} y_{uv} \leq 1\; \forall u \in N,\; y \geq 0, \]

whose integral feasible solutions are exactly the matchings of \(G\), with maximum integral value \(v(N)\). By LP duality, the primal optimum equals \(v(N)\) if and only if the dual attains its optimum at an integral solution. Thus:

Proposition 3.14. The core of a matching game is non-empty if and only if the fractional maximum-weight matching LP has an integral optimal solution.

For bipartite graphs, every feasible LP has an integral optimal solution (by total unimodularity of the constraint matrix), so bipartite matching games always have non-empty cores. For general graphs, this is a non-trivial condition; an efficient algorithm exists but requires Edmonds’ matching theory.

Network Bargaining: Stable and Balanced Outcomes

Network bargaining generalizes matching games by asking not only which pairs form but how they split the resulting value. A graph \(G\) with edge weights \(w\) represents a network of potential bilateral deals; an outcome \((M, x)\) consists of a matching \(M\) (the deals actually struck) and a payoff vector \(x\) with \(x_u + x_v = w_{uv}\) for every matched edge \(uv\), and \(x_v = 0\) for unmatched vertices.

Player \(u\)'s outside option is the best deal available from a partner outside the matching, or 0 if no such deal is profitable:

\[ \alpha_u = \max\bigl(\{0\} \cup \{w_{uv} - x_v : uv \in E \setminus M\}\bigr). \]

An outcome is stable if \(\alpha_u \leq x_u\) for every player \(u\): no one can improve by defecting to an unmatched partner. The connection to the core of the matching game is immediate: a stable outcome satisfies \(x_u + x_v \geq w_{uv}\) for every edge, placing \(x\) in \(\mathcal{C}_0\).

Nash’s bargaining theory suggests a stronger condition. In a matched pair, after both players secure their outside options, the remaining surplus should be split equally. An outcome is balanced if \(x_u - \alpha_u = x_v - \alpha_v \geq 0\) for every matched edge \(uv\): both players receive the same amount above their outside option.

Theorem 3.16. A network bargaining game has a balanced outcome if and only if it has a stable outcome.

The equivalence means stability (which is a linear feasibility condition) is equivalent to equal-split bargaining. The Kleinberg-Tardos theorem further characterizes when stable outcomes exist in terms of the graph structure: in unit-weight graphs, a balanced outcome exists if and only if no two inessential vertices are adjacent, where a vertex is inessential if it is missed by some maximum matching. More generally (Theorem 3.18), if a balanced outcome exists, it must use the maximum-weight matching, and it can be found in polynomial time.
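Checking a proposed outcome against these definitions is mechanical; a Python sketch (the three-vertex path with unit weights is an illustrative instance):

```python
def outside_options(edges, w, M, x):
    """alpha_u: the best of 0 and w_uv - x_v over non-matching edges uv."""
    alpha = {u: 0.0 for u in x}
    for (u, v) in edges:
        if (u, v) in M or (v, u) in M:
            continue
        alpha[u] = max(alpha[u], w[(u, v)] - x[v])
        alpha[v] = max(alpha[v], w[(u, v)] - x[u])
    return alpha

def is_stable(edges, w, M, x, tol=1e-9):
    """Stable: nobody's outside option beats their current payoff."""
    alpha = outside_options(edges, w, M, x)
    return all(alpha[u] <= x[u] + tol for u in x)

def is_balanced(edges, w, M, x, tol=1e-9):
    """Balanced: stable, and every matched pair splits the surplus
    above the two outside options equally."""
    alpha = outside_options(edges, w, M, x)
    return is_stable(edges, w, M, x, tol) and all(
        abs((x[u] - alpha[u]) - (x[v] - alpha[v])) <= tol for (u, v) in M)

# Path a - b - c with unit weights; b is essential, a and c are not.
edges = [("a", "b"), ("b", "c")]
w = {("a", "b"): 1.0, ("b", "c"): 1.0}
```

Matching \(\{ab\}\) with payoffs \(x = (0, 1, 0)\) is both stable and balanced: \(b\)'s outside option \(w_{bc} - x_c = 1\) forces \(x_b = 1\). The inessential vertices \(a\) and \(c\) are not adjacent, consistent with the Kleinberg-Tardos characterization, while the equal split \(x = (1/2, 1/2, 0)\) fails stability.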

Chapter 4: Mechanism Design

Ideal Auctions: DSIC, Welfare, and Efficiency

Mechanism design inverts the usual game theory question. Rather than analyzing an existing game, we design one: we choose the rules so that self-interested players, pursuing their own goals, collectively produce an outcome we want. The paradigmatic setting is an auction.

In a single-item auction, a seller wishes to allocate an indivisible good to one of \(n\) buyers, each with a private valuation \(v_i \geq 0\). The seller chooses an allocation rule (who wins) and a payment rule (who pays what). The seller wants the design to have three properties: dominant strategy incentive compatibility (DSIC) means that truthful bidding is a weakly dominant strategy and yields non-negative utility; welfare-maximizing means that when players bid truthfully, the player with the highest valuation wins; and efficient means the rules can be implemented in polynomial time.

Theorem 4.1. The second-price auction is ideal — it is DSIC, welfare-maximizing, and efficient — and moreover it is the unique single-item ideal auction.

The uniqueness claim follows from Myerson’s Lemma (proved below): once we fix a monotone allocation rule, the payment rule is uniquely determined by the DSIC constraint.
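The DSIC property of the second-price auction can be sanity-checked by brute force over a small grid of integer bids (an illustrative sketch; the tie-breaking rule is an assumption):

```python
# Brute-force check (illustrative, not part of the proof): in a second-price
# auction over a small grid of valuations and bids, truthful bidding is a
# weakly dominant strategy for player 0. Ties go to the lowest index (assumed).

def second_price(bids):
    """Return (winner index, payment) for the highest bidder."""
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    payment = max(b for i, b in enumerate(bids) if i != winner)
    return winner, payment

def utility(v, bids, player):
    winner, payment = second_price(bids)
    return v - payment if winner == player else 0.0

grid = range(6)  # valuations and bids in {0, ..., 5}
dominant = all(
    utility(v, [v, b1], 0) >= utility(v, [b, b1], 0)
    for v in grid for b in grid for b1 in grid
)
print(dominant)  # True: the truthful bid v never does worse than any deviation b
```

The check enumerates every (valuation, deviation, opponent bid) triple; no deviation ever strictly improves on bidding truthfully.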

Modern internet advertising runs on sponsored search auctions. A search engine has \(k\) advertising slots, each with a click-through rate \(\alpha_j \in [0,1]\) (the probability a user clicks on an ad in slot \(j\)), with \(\alpha_1 \geq \alpha_2 \geq \cdots \geq \alpha_k\). Advertiser \(i\)’s value per click is \(v_i\); their total value from slot \(j\) is \(v_i \alpha_j\). Social welfare is \(\sum_i v_i \alpha_{r(i)}\) where \(r(i)\) is the slot assigned to advertiser \(i\).

To maximize welfare, assign the highest slot to the highest-bidding advertiser, second slot to the second highest, and so on — this greedy assignment maximizes \(\sum_i v_i \alpha_{\sigma(i)}\) over all permutations \(\sigma\) when valuations are sorted in the same order as the \(\alpha_j\)’s. Implementing this requires only sorting, so the allocation rule is efficient.

The naive generalized second-price auction (where slot \(j\)’s winner pays the next bidder’s bid times \(\alpha_j\)) is not DSIC: an advertiser may profit by underbidding to obtain a cheaper lower slot. Myerson’s Lemma reveals the correct payment rule.

Myerson’s Lemma: The Characterization of DSIC Auctions

The cleanest framework for mechanism design is the single-parameter environment: there are \(n\) players, each with a private type \(v_i \geq 0\) (a single real number), and a feasible allocation set \(X \subseteq \mathbb{R}^n\). A direct revelation mechanism collects bids \(b \in \mathbb{R}^n\), applies an allocation rule \(x : \mathbb{R}^n \to X\), and charges payments via \(p : \mathbb{R}^n \to \mathbb{R}^n\). Player \(i\)'s utility is \(u_i(b) = v_i x_i(b) - p_i(b)\).

An allocation rule \(x\) is monotone if for every player \(i\) and fixed competing bids \(b_{-i}\), the function \(z \mapsto x_i(z, b_{-i})\) is non-decreasing: bidding higher never reduces what you receive. The second-price rule (give to highest bidder) is monotone; the rule “second-highest bidder wins” is not.

Theorem 4.2 (Myerson’s Lemma). An allocation rule \(x\) is implementable (there exists a DSIC payment rule) if and only if \(x\) is monotone. Moreover, for any monotone \(x\), there is a unique DSIC payment rule satisfying \(p_i = 0\) whenever \(b_i = 0\).

To see that monotonicity is necessary, fix player \(i\) and competing bids \(b_{-i}\), and abbreviate \(x(z) = x_i(z, b_{-i})\) and \(p(z) = p_i(z, b_{-i})\). For \(0 \leq z \leq y\), applying the DSIC constraint once with true value \(z\) (deviating to \(y\)) and once with true value \(y\) (deviating to \(z\)) gives

\[ z\bigl(x(y) - x(z)\bigr) \leq p(y) - p(z) \leq y\bigl(x(y) - x(z)\bigr), \]

and comparing the outer terms yields \((y - z)\bigl(x(y) - x(z)\bigr) \geq 0\), which forces \(x(y) \geq x(z)\). Monotonicity is necessary.

For sufficiency, suppose \(x\) is monotone and, for fixed \(b_{-i}\), the function \(z \mapsto x_i(z, b_{-i})\) is a step function with jumps of height \(h_j\) at breakpoints \(\bar{z}_j\). The unique candidate payment rule is

\[ p_i(z, b_{-i}) = \sum_{j : \bar{z}_j \leq z} \bar{z}_j h_j. \]

Geometrically, this is the area to the left of the allocation step-function: the payment is the integral \(\int_0^z t \cdot dx_i(t, b_{-i})\) under the allocation curve. The DSIC property of this payment rule is verified by picture: underbidding loses some value but also loses some payment, with net utility loss; overbidding sometimes wins at a cost exceeding the value, with net utility loss.

For the second-price auction, the allocation function for player \(i\) is a single jump from 0 to 1 at \(z = B = \max_{j \neq i} b_j\). The payment formula gives \(p_i = B \cdot 1 = B\), the second-highest bid — recovering the familiar result.

For sponsored search, with the competing bids \(b_1 \geq b_2 \geq \cdots\) sorted in decreasing order, the allocation curve of a bidder jumps by \(\alpha_j - \alpha_{j+1}\) as their bid crosses \(b_j\), so the payment formula gives

\[ p_i(z, b_{-i}) = \sum_{\{j : b_j < z\}} b_j (\alpha_j - \alpha_{j+1}), \]

where we set \(\alpha_{k+1} = 0\). This is the unique truthful payment rule for sponsored search, different from the generalized second-price auction.
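This payment rule is mechanical to implement. The sketch below (hypothetical helper name; the function sorts the competing bids itself) computes a bidder's total payment and, with a single slot of click-through rate 1, recovers the second-price rule:

```python
# Hedged sketch of the sponsored-search payment formula: competing bids are
# sorted as b_1 >= b_2 >= ..., click-through rates alpha_1 >= ... >= alpha_k,
# and alpha_{k+1} = 0, as in the text.

def sponsored_search_payment(z, others, alpha):
    """Total payment of a bidder bidding z against competing bids `others`."""
    others = sorted(others, reverse=True)
    alpha_ext = list(alpha) + [0.0]          # append alpha_{k+1} = 0
    pay = 0.0
    for j in range(min(len(alpha), len(others))):
        b_j = others[j]                      # j-th highest competing bid (0-indexed)
        if b_j < z:                          # crossing b_j gains alpha_j - alpha_{j+1}
            pay += b_j * (alpha_ext[j] - alpha_ext[j + 1])
    return pay

# One slot with alpha = [1.0]: the formula recovers the second-price payment.
print(sponsored_search_payment(10.0, [7.0, 3.0], [1.0]))       # 7.0
# Two slots: Myerson's payment 7*(1-0.5) + 3*(0.5-0) = 5.0, below the GSP price 7.
print(sponsored_search_payment(10.0, [7.0, 3.0], [1.0, 0.5]))  # 5.0
```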

The Knapsack Auction: Approximation When Optimality is Hard

Not every ideal mechanism exists. The knapsack auction illustrates an unavoidable tension. A TV station has \(T\) seconds of advertising time; advertiser \(i\) has an ad of length \(t_i\) seconds and value \(v_i\). The set of feasible allocations \(X = \{x \in \{0,1\}^n : \sum_i t_i x_i \leq T\}\) makes this a binary knapsack problem. Maximizing social welfare over integer allocations is NP-hard, so no polynomial-time algorithm exists unless P = NP.

An ideal auction requires efficiency (polynomial time) and welfare maximization simultaneously; unless P = NP, these cannot both be achieved here. The solution is to keep efficiency and approximate welfare. Myerson's Lemma does not require exact welfare maximization: any monotone allocation rule admits a unique DSIC payment rule. So we design an efficient approximation algorithm whose allocation rule is monotone.

The proposed algorithm sorts advertisers by value density \(v_i/t_i\) in decreasing order. It takes the greedy prefix \(S = \{1, \ldots, i\}\) where \(i\) is the largest index with \(t_1 + \cdots + t_i \leq T\), then compares: if \(v(S) = v_1 + \cdots + v_i \geq v_{i+1}\), the winners are \(S\); otherwise, the sole winner is \(\{i+1\}\). The extra comparison prevents the failure case where one very large item is worth more than all the small greedy items combined.
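The comparison-augmented greedy rule above can be sketched in a few lines (an illustrative implementation; names are hypothetical):

```python
# Hedged sketch of the greedy-by-density knapsack rule with the extra
# comparison step described in the text.

def knapsack_winners(values, sizes, T):
    """Greedy-by-density prefix vs. the first non-fitting item, whichever is worth more."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / sizes[i], reverse=True)
    prefix, used = [], 0
    for i in order:
        if used + sizes[i] <= T:
            prefix.append(i)
            used += sizes[i]
        else:
            # i is the first item that does not fit: compare v(S) with v_{i+1}
            if sum(values[j] for j in prefix) >= values[i]:
                return prefix
            return [i]
    return prefix  # everything fits

# One big item worth 10 vs. a greedy prefix worth 6: the comparison step fires.
print(knapsack_winners([6.0, 5.0, 10.0], [5, 5, 9], 10))  # [2]
```

In the example the density order is item 0, item 2, item 1; the prefix \(\{0\}\) is worth 6, so the single item 2 (worth 10) wins, within a factor 2 of the optimum 11.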

Theorem 4.6. This approximation algorithm achieves at least \(\frac{1}{2} \cdot \text{OPT}\).

The proof uses the LP relaxation: the fractional knapsack optimum fills items by density until the capacity is exhausted, so its value satisfies \(v^* \leq v(S) + v_{i+1}\). Whichever of \(S\) or \(\{i+1\}\) the algorithm picks, its value is \(\max(v(S), v_{i+1}) \geq \tfrac{1}{2}\bigl(v(S) + v_{i+1}\bigr) \geq v^*/2 \geq \text{OPT}/2\).

The algorithm’s allocation rule is monotone (raising a bid only helps or maintains your position), so Myerson’s Lemma provides a unique DSIC payment rule. The payment for each winning advertiser can be found by binary search: lower the bid until the algorithm stops selecting you. This threshold bid is the payment.

Arrow’s Impossibility Theorem

Mechanism design for elections faces harder impossibility results. A social welfare function \(F : L^n \to L\) aggregates the total orderings of \(n\) voters over a set \(A\) of candidates (\(|A| \geq 3\)) into a single societal ordering. Two good properties are unanimity (if every voter prefers \(a\) to \(b\), so does society) and independence of irrelevant alternatives (IIA: society’s ranking of \(a\) vs \(b\) depends only on how voters rank \(a\) vs \(b\), not on how either is ranked relative to other candidates). The bad property is dictatorship: one voter’s preferences completely override the others.

Theorem 4.7 (Arrow, 1951). Every social welfare function over at least 3 candidates satisfying unanimity and IIA is a dictatorship.

The proof is a masterpiece of logical construction. First, one shows that if all voters rank candidate \(b\) either first or last, society must do the same (otherwise unanimity and IIA produce a contradiction via transitivity). Then, consider a sequence of preference profiles where \(b\) starts at everyone’s bottom and is moved to the top one voter at a time. At some step — say voter \(i\)’s move — society’s ranking of \(b\) flips from bottom to top. This voter \(i\) turns out to be the dictator: for any two candidates \(a\) and \(c\) other than \(b\), IIA and the properties of the sequence force society’s ranking of \(a\) vs \(c\) to agree with voter \(i\)’s, and then a further argument with a fourth candidate \(e\) shows the same voter is the dictator for every pair involving \(b\).

Arrow’s theorem does not say all voting systems are bad — it says no system simultaneously satisfies two modest-sounding axiomatic properties without becoming a dictatorship. One escape is to restrict the domain of preferences (e.g., single-peaked preferences, where all voters’ ideal points lie on a line, allow majority voting to work properly). Another is to introduce monetary transfers.

The Gibbard-Satterthwaite Theorem

A social choice function \(f : L^n \to A\) selects a single candidate. Good properties: strategy-proofness (no voter can manipulate the outcome by misrepresenting preferences) and surjectivity (every candidate can in principle win). The bad property remains dictatorship.

Theorem 4.8 (Gibbard 1973, Satterthwaite 1975). Any surjective strategy-proof social choice function over at least 3 candidates is a dictatorship.

The proof reduces to Arrow’s theorem: given a strategy-proof \(f\), construct a social welfare function \(F\) by using \(f\) to rank pairs of candidates (the idea being that \(f\) “decides” which of \(a, b\) is preferred by moving both to the top of every voter’s list and seeing which \(f\) selects), then show \(F\) satisfies unanimity and IIA, invoke Arrow’s theorem to get a dictatorial \(F\), and conclude \(f\) is dictatorial.

Together, Arrow’s and Gibbard-Satterthwaite’s theorems paint a bleak picture for democratic mechanism design. The escape routes are constrained domains (e.g., single-peaked preferences), randomized social choice functions (where probabilistic dictatorship-free mechanisms exist), or monetary transfers (turning elections into auctions, trading money for outcomes).

Chapter 5: Beyond This Course

Natural Next Topics

Game theory as presented in CO 456 covers the essential foundations — combinatorial games, strategic equilibrium, cooperative stability, and incentive-compatible mechanism design. But these foundations open onto a vast landscape of active research and deep results that a one-semester course can only gesture toward. This chapter sketches the most natural extensions, organizes the follow-up literature, and surveys open problems that motivate current work in the field.

Extensive-Form Games and Sequential Rationality

The strategic games of Chapter 2 capture simultaneous decisions. Real interactions are almost always sequential: players see each other’s prior moves, update beliefs, and make decisions at branches of a game tree. Extensive-form games (also called game trees) model this structure explicitly, representing the game as a tree of decision nodes with information sets capturing what each player knows when it is her turn to act.

The equilibrium concept for extensive-form games is subgame perfect equilibrium (SPE), introduced by Reinhard Selten (who shared the 1994 Nobel Prize with Nash and Harsanyi). An SPE requires that the restriction of the strategy profile to every subgame is itself a Nash equilibrium. This rules out “non-credible threats” — Nash equilibria that depend on threats a player would never actually carry out. The algorithm for finding SPE is backward induction: solve the game tree from the leaves upward, at each node choosing the action that maximizes the current player’s payoff given optimal play downstream. Backward induction applies cleanly to finite games; in infinite horizon games, it requires more care and connects directly to the theory of repeated games.

Repeated Games and the Folk Theorem

A profound question in game theory is: how does behavior change when a strategic game is played repeatedly? The prisoner’s dilemma has a unique Nash equilibrium with both players defecting, yielding a socially inferior outcome. But if the same two players interact indefinitely, cooperation might be sustained by the threat of future punishment. This is the domain of repeated games.

In an infinitely repeated game with discount factor \(\delta \in (0,1)\), each player values a stream of per-period payoffs \((r_1, r_2, \ldots)\) as \((1-\delta)\sum_{t=1}^\infty \delta^{t-1} r_t\). The discount factor captures the time value of future payoffs: \(\delta\) close to 1 means players are patient and care about the long run.

The Folk Theorem (attributed informally to many economists in the 1950s-60s, formalized rigorously by Aumann and Shapley, Rubinstein, Fudenberg and Maskin) characterizes the set of all payoff vectors achievable as subgame perfect equilibria of infinitely repeated games. Remarkably, for sufficiently patient players (\(\delta\) close enough to 1), any feasible and individually rational payoff vector can be sustained as an SPE. “Individually rational” means each player gets at least their minmax value — the payoff they can guarantee themselves even against adversarial opponents. For the prisoner’s dilemma, this means mutual cooperation (the socially efficient outcome) is achievable as an SPE via “grim trigger” strategies: cooperate until someone defects, then defect forever. The Folk Theorem provides a sweeping explanation for how cooperation emerges in long-run relationships: patience enables trust.
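Under the normalized discounting above, the grim-trigger threshold for the prisoner's dilemma can be computed directly (a hedged worked example; the payoff values T, R, P are assumed standard textbook numbers, not taken from the course):

```python
# Assumed prisoner's dilemma payoffs: T=5 (temptation), R=3 (mutual
# cooperation), P=1 (mutual defection).
T, R, P = 5.0, 3.0, 1.0

# Normalized payoffs: cooperating forever yields R per period; a one-shot
# deviation yields (1 - delta)*T + delta*P (T once, then mutual defection
# forever under grim trigger). Cooperation is sustainable iff
#   R >= (1 - delta)*T + delta*P,  i.e.  delta >= (T - R) / (T - P).
delta_min = (T - R) / (T - P)
print(delta_min)  # 0.5: players with delta >= 1/2 can sustain cooperation
```

The closed form \(\delta \geq (T - R)/(T - P)\) makes the Folk Theorem's message concrete: sufficiently patient players find the punishment threat credible enough to cooperate.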

The Folk Theorem has many variants and generalizations: games with imperfect monitoring (players observe outcomes but not actions, raising statistical inference questions), communication games, and games with private information. Each variant makes the characterization of equilibrium payoff sets more subtle.

Price of Anarchy and Network Games

The price of anarchy (PoA) measures how much selfish behaviour degrades a system relative to centralized optimization:

\[ \text{PoA} = \frac{\text{worst-case Nash equilibrium cost}}{\text{optimal cost}}. \]

The foundational model is Selfish Routing. Given a network with source \(s\) and sink \(t\), a large population of drivers each seeks to minimize their own travel time, where travel times depend on congestion. At Nash equilibrium, no driver can reduce their own travel time by unilaterally switching routes. Roughgarden and Tardos proved in 2002 that for networks with linear latency functions, the price of anarchy is at most \(4/3\): selfish routing costs at most 33% more than optimal coordinated routing, regardless of the network topology.

Braess’s Paradox (1968) demonstrates that adding a road can make everyone worse off: in a classic four-node network, opening a zero-latency shortcut causes equilibrium travel time to increase from 1.5 to 2 units. This paradox has been observed empirically in transportation networks (removal of roads in Seoul and Stuttgart improved traffic flow) and has counterparts in electrical networks (the Braess paradox for electrical circuits). It illustrates that intuitive network improvements may backfire in selfish systems.
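The paradox is pure arithmetic and can be checked directly (a hedged sketch of the classic network; route and variable names are hypothetical):

```python
# Classic Braess network with unit total traffic: edges s->v and w->t have
# latency equal to their flow x, edges s->w and v->t have constant latency 1,
# and the added shortcut v->w has latency 0.

def latencies(f_svt, f_swt, f_svwt):
    """Latency of each s-t route, given the flow on the three routes."""
    x_sv = f_svt + f_svwt          # congestible edge s->v
    x_wt = f_swt + f_svwt          # congestible edge w->t
    return {"s-v-t": x_sv + 1, "s-w-t": 1 + x_wt, "s-v-w-t": x_sv + x_wt}

# Without the shortcut: traffic splits evenly, both used routes cost 1.5.
before = latencies(0.5, 0.5, 0.0)
# With the shortcut: everyone takes s->v->w->t, and no route is strictly
# cheaper (all routes cost 2), so this is the equilibrium.
after = latencies(0.0, 0.0, 1.0)
print(before["s-v-t"], after["s-v-w-t"])  # 1.5 2.0
```

The check confirms that at the post-shortcut equilibrium every route costs 2, so no driver can deviate profitably, even though everyone was better off before the new road existed.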

Price-of-anarchy analysis has extended to many settings: scheduling games (jobs choosing machines), market sharing games, network formation, and congestion games with more general latency functions. A central result, the “smoothness framework” of Roughgarden (2009), provides a unified approach to bounding the price of anarchy across many game classes and extends to settings with no-regret learners, where the time-average play of agents using no-regret algorithms converges toward coarse correlated equilibria with provably bounded inefficiency.

No-Regret Learning and Correlated Equilibria

Nash equilibrium analysis assumes players somehow arrive at equilibrium play. In practice, players learn from repeated interactions. Online learning and no-regret algorithms provide a model for this. An algorithm has no external regret if its time-average regret, the gap between its average payoff and the average payoff of the best fixed action in hindsight, converges to 0 as the number of rounds grows.

A classic no-regret algorithm is multiplicative weights update (MWU): maintain a weight for each action, play proportionally to the weights, and after each round multiply each action's weight by \(e^{\epsilon \cdot \text{payoff}}\), boosting better-performing actions. MWU achieves external regret \(O(\sqrt{T \ln |A|})\) after \(T\) rounds, regardless of the opponent's strategy.
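A minimal MWU implementation makes the guarantee concrete (a sketch with assumed parameters, such as eps = 0.1 and random payoffs; not from the notes):

```python
import math
import random

# Hedged sketch of multiplicative weights update against an arbitrary
# sequence of payoff vectors in [0, 1].

def mwu(payoff_rounds, eps=0.1):
    """Play MWU over the given rounds; return total expected payoff."""
    n = len(payoff_rounds[0])
    w = [1.0] * n
    total = 0.0
    for r in payoff_rounds:
        s = sum(w)
        p = [wi / s for wi in w]                       # current mixed strategy
        total += sum(pi * ri for pi, ri in zip(p, r))  # expected payoff this round
        w = [wi * math.exp(eps * ri) for wi, ri in zip(w, r)]  # boost good actions
    return total

random.seed(0)
T, n = 2000, 4
payoffs = [[random.random() for _ in range(n)] for _ in range(T)]
best_fixed = max(sum(r[i] for r in payoffs) for i in range(n))
regret = (best_fixed - mwu(payoffs)) / T
print(regret < 0.2)  # time-average regret is small, consistent with the bound
```

Against 2000 rounds of random payoffs the per-round regret is well below the \(O(\sqrt{\ln|A|/T})\) ceiling; the same code works against any adversarial payoff sequence.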

When all players in a game run no-regret algorithms, the time-average joint distribution of play converges to the set of coarse correlated equilibria; if players additionally have no internal (swap) regret, it converges to the set of correlated equilibria: distributions over strategy profiles where, conditional on learning your own recommended action, you have no incentive to deviate. Every Nash equilibrium is a correlated equilibrium (via independent sampling), but not vice versa; correlated equilibria can be more efficient and are computationally easier to find (they are characterized by linear inequalities, making the set convex and computable in polynomial time). Aumann’s work on correlated equilibria (1974, 1987) showed them to be a natural equilibrium concept arising from shared signals or common knowledge of rationality.

Revenue-Maximizing Mechanism Design

Myerson’s Lemma tells us how to implement any monotone allocation rule with DSIC payments, but it does not specify which allocation rule to use. CO 456 focused on welfare maximization (allocating to the buyer with the highest value). A seller with no intrinsic valuation for the item may care more about revenue maximization.

Roger Myerson’s 1981 paper (for which he received the 2007 Nobel Prize) solved optimal auction design completely for single-item auctions with independent private values. The optimal mechanism is characterized by the concept of virtual valuations: bidder \(i\)’s virtual valuation is \(\psi_i(v_i) = v_i - \frac{1 - F_i(v_i)}{f_i(v_i)}\), where \(F_i, f_i\) are the distribution and density of \(v_i\). The revenue-optimal mechanism allocates to the bidder with the highest non-negative virtual valuation (and withholds the item if all virtual valuations are negative, corresponding to a reserve price). When bidders are symmetric (\(F_i = F\) for all \(i\)), the optimal auction is a second-price auction with a reserve price equal to the monopoly price.
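For a concrete instance, consider valuations drawn uniformly from \([0, 1]\) (a standard textbook computation, not taken from the notes):

```python
# Virtual valuation for the uniform [0, 1] distribution: F(v) = v, f(v) = 1,
# so psi(v) = v - (1 - F(v)) / f(v) = 2v - 1.

def psi_uniform(v):
    return v - (1.0 - v) / 1.0   # v - (1 - F(v)) / f(v)

# The monopoly reserve price solves psi(r) = 0, giving r = 1/2: with
# symmetric uniform bidders, the revenue-optimal auction is second-price
# with reserve 0.5.
reserve = 0.5
print(psi_uniform(reserve))  # 0.0
print(psi_uniform(0.8))      # ~0.6: a bidder with value 0.8 has positive virtual value
```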

Beyond the single-item symmetric case, optimal auction design is largely open or computationally intractable. Multi-item mechanism design — simultaneous auctions, bundled items, combinatorial auctions — is a frontier area where theoretical understanding is incomplete and practical auction formats (like the FCC spectrum auctions) represent major engineering challenges.

Matching Markets and Stable Matchings

A different flavor of mechanism design concerns matching markets, where the goal is not to allocate goods via prices but to match participants who have preferences over each other. The medical residency problem (hospitals and residents), college admissions, school choice, and kidney exchange are canonical examples.

The Gale-Shapley deferred acceptance algorithm (1962) finds a stable matching in polynomial time: a matching admitting no blocking pair, i.e., no hospital and resident who both prefer each other to their current assignments. When residents propose, the algorithm produces the resident-optimal stable matching: each resident receives the best partner attainable in any stable matching. Remarkably, the mechanism is strategy-proof for the proposing side (each resident maximizes their outcome by reporting preferences truthfully), though not for hospitals.
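Deferred acceptance itself fits in a few lines (an illustrative sketch with hypothetical toy preferences; assumes complete preference lists, equal numbers, and one seat per hospital):

```python
# Hedged sketch of resident-proposing deferred acceptance.

def gale_shapley(res_prefs, hosp_prefs):
    """Return the resident-optimal stable matching as {resident: hospital}."""
    rank = {h: {r: i for i, r in enumerate(prefs)} for h, prefs in hosp_prefs.items()}
    next_prop = {r: 0 for r in res_prefs}      # next hospital index each resident will try
    held = {}                                  # hospital -> tentatively held resident
    free = list(res_prefs)
    while free:
        r = free.pop()
        h = res_prefs[r][next_prop[r]]
        next_prop[r] += 1
        if h not in held:
            held[h] = r                        # hospital tentatively accepts
        elif rank[h][r] < rank[h][held[h]]:    # hospital prefers the new proposer
            free.append(held[h])
            held[h] = r
        else:
            free.append(r)                     # proposal rejected, try next hospital
    return {r: h for h, r in held.items()}

res_prefs = {"r1": ["A", "B"], "r2": ["A", "B"]}
hosp_prefs = {"A": ["r2", "r1"], "B": ["r1", "r2"]}
print(gale_shapley(res_prefs, hosp_prefs))  # r2 gets A, r1 gets B: no blocking pair
```

In this toy instance both residents want hospital A, but A prefers r2, so r1's proposal is rejected and r1 settles for B; the result has no blocking pair.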

The Gale-Shapley algorithm is now used in the National Resident Matching Program (matching ~40,000 residents per year in the US), school choice programs in New York and Boston, and many other large-scale matching markets. Roth and Shapley received the 2012 Nobel Prize for developing the theory of stable allocations and the practice of market design. Current research extends the theory to settings with couples (who must be placed jointly), complementarities in preferences (which can break stability), and dynamic markets where participants enter and leave over time.

Follow-up Courses and Reading

Game theory spans many disciplines and connects to computer science, economics, operations research, and pure mathematics. The following courses and resources deepen the material from different angles.

UWaterloo Courses

CO 759: Topics in Combinatorial Optimization covers advanced topics that directly extend CO 456, including network flow theory underlying matching games, polyhedral combinatorics useful for the Bondareva-Shapley analysis, and algorithmic mechanism design problems beyond the knapsack case. The course varies by year; check the current offering.

CO 671: Semidefinite Programming connects to game theory via LP/SDP formulations of equilibrium problems, the computation of Nash equilibria via semidefinite relaxations, and robust mechanism design. Semidefinite programming provides stronger bounds for network game problems and has applications in hardness-of-approximation results for Nash equilibria.

CS 486: Introduction to Artificial Intelligence covers game-theoretic aspects from the AI perspective, including adversarial search (minimax with alpha-beta pruning, the algorithmic version of the zero-sum LP), multi-agent systems, and auction design as applied to AI systems like advertising platforms.

PMATH 432: Real Analysis and PMATH 465: Algebraic Topology provide the mathematical foundations (fixed-point theorems, compactness, continuous functions) that underpin Nash’s theorem. The algebraic topology of simplicial complexes connects to Sperner’s lemma, which gives a combinatorial proof of Brouwer’s fixed-point theorem and underlies path-following algorithms for approximating fixed points.

External Courses

Stanford CS364A: Algorithmic Game Theory (Tim Roughgarden, lecture notes at timroughgarden.org) is the gold standard course on the algorithmic aspects of mechanism design. It covers material directly following CO 456: multi-parameter mechanism design, VCG mechanisms, revenue maximization, price of anarchy, and no-regret learning. The lecture notes are freely available and excellent.

Stanford CS364B: Advanced Mechanism Design (Roughgarden) goes deeper into multi-item auctions, approximation in mechanism design, and the limits of efficient truthful mechanisms.

MIT 6.853 / 6.890: Topics in Algorithmic Game Theory (available via MIT OpenCourseWare) covers similar ground with different emphasis, including extensive work on equilibrium computation complexity (PPAD hardness) and matching theory.

Cambridge Part III: Advanced Topics in Game Theory covers cooperative game theory at a graduate level, connecting to mathematical economics, social choice, and axiomatic bargaining theory.

Standard Texts

Osborne, M.J., An Introduction to Game Theory (Oxford University Press, 2004) is the standard undergraduate reference for strategic and extensive-form games, covering Nash equilibrium, extensive-form games, evolutionary games, and repeated games. Clear, rigorous, and accessible.

Roughgarden, T., Twenty Lectures on Algorithmic Game Theory (Cambridge University Press, 2016) is the companion text to CS364A, covering mechanism design, price of anarchy, and equilibrium computation. The book corresponds closely to the Mechanism Design chapter of CO 456 and its extensions.

Nisan, N., Roughgarden, T., Tardos, É., and Vazirani, V. (eds.), Algorithmic Game Theory (Cambridge University Press, 2007) is the graduate-level comprehensive reference. Individual chapters cover every major topic: equilibrium computation, mechanism design, auction theory, fair division, network games, and online learning. Available freely at the authors’ websites.

Karlin, A. and Peres, Y., Game Theory, Alive (AMS, 2017) covers combinatorial game theory, Nash equilibrium, mechanism design, and evolutionary game theory in a style that combines mathematical rigor with concrete algorithms. An excellent bridge between CO 456 and deeper study.

Berlekamp, E., Conway, J.H., and Guy, R., Winning Ways for Your Mathematical Plays (A K Peters, 2001) is the encyclopedic treatment of combinatorial game theory, extending far beyond impartial games to partisan games, surreal numbers, and the full theory of combinatorial game equivalence. Essential for anyone captivated by Chapter 1.

Shoham, Y. and Leyton-Brown, K., Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (Cambridge University Press, 2009) covers game theory from the AI and multiagent systems perspective, including extensive treatment of learning in games, communication, and practical auction platforms.

Open Problems and Active Research

Game theory is far from a closed field. The following open problems and active research directions connect directly to material in CO 456 and represent some of the deepest unsolved questions in the area.

The Complexity of Computing Nash Equilibria

Chapter 2 noted that computing Nash equilibria is “hard” in general, while providing a polynomial-time algorithm for two-player zero-sum games. The precise complexity-theoretic statement is far sharper and deeper.

PPAD-completeness of Nash equilibria (Daskalakis, Goldberg, Papadimitriou, 2006; Chen, Deng, Teng, 2009): Computing a Nash equilibrium of a two-player general-sum game is PPAD-complete. PPAD (Polynomial Parity Arguments on Directed graphs) is a complexity class defined by Papadimitriou in 1994; membership in PPAD means a solution is guaranteed to exist by the same kind of parity argument that underlies Nash’s fixed-point proof, while PPAD-completeness means Nash is among the hardest problems in the class: every PPAD problem reduces to it. This is a landmark negative result: there is no polynomial-time algorithm for general Nash equilibrium computation unless every problem in PPAD is polynomial-time solvable, which is considered unlikely.

Yet this hardness is worst-case. Most games arising in practice have additional structure (zero-sum payoffs, small-support equilibria, large populations, coordination) that makes equilibrium computation tractable. An active research direction is identifying natural structural conditions under which Nash equilibria can be computed exactly, or approximated, in polynomial time. For two-player general-sum games (and, equivalently, three-player zero-sum games) with \(n\) strategies per player, \(\varepsilon\)-approximate equilibria can be found in quasi-polynomial time \(n^{O(\log n)}\) for constant \(\varepsilon\), but no polynomial-time algorithm is known, and improving these bounds is an open problem.

Optimal Mechanism Design Beyond Single-Item Auctions

Myerson’s 1981 characterization of optimal single-item auctions is complete and beautiful. Its multi-item generalization is far more complex and largely open.

For two items and two buyers, Manelli and Vincent (2007) showed that the revenue-optimal mechanism may need to offer infinitely many bundles at different prices — the optimal mechanism is not a simple generalization of the Myerson auction. Hart and Nisan (2012) showed that bundling can extract more revenue than selling items separately, but the optimal mechanism may extract exponentially more. Understanding even the structure of optimal mechanisms for two items is an active area.

For additive valuations (buyer \(i\) values bundle \(S\) as \(\sum_{j \in S} v_{ij}\)), Daskalakis, Deckelbaum, and Tzamos (2017) and subsequent work have characterized optimal mechanisms in some special cases. The general multi-dimensional Myerson problem remains wide open: no closed-form characterization of optimal auction formats exists for more than one item.

Computational complexity of revenue-optimal mechanisms: Even approximately computing the optimal mechanism for multi-item auctions is believed hard. Conitzer and Sandholm (2003) showed certain formulations are NP-hard. A major open problem is characterizing the boundary between tractable and intractable mechanism design.

Truthful Approximation Algorithms

Mechanism design and approximation algorithms come into tension: the best approximation algorithm for a problem may not be monotone (implementable by a truthful mechanism), and the best truthful mechanism may produce a poor approximation.

The knapsack auction illustrates this: the greedy density-sorted algorithm achieves a \(1/2\)-approximation and is monotone. But the best-known polynomial-time approximation for the knapsack problem achieves a \((1-\varepsilon)\)-approximation for any \(\varepsilon > 0\) via a fully polynomial-time approximation scheme (FPTAS). The standard FPTAS is not monotone. A natural question: is there a monotone \((1-\varepsilon)\)-approximation for the knapsack auction?

For combinatorial auctions (multiple items, general valuations), the best known truthful mechanism achieves an \(O(\sqrt{m})\) approximation where \(m\) is the number of items, while the best non-truthful mechanism achieves a constant approximation. Whether truthful mechanisms must sacrifice approximation quality — a “price of truthfulness” — is a central open problem in algorithmic mechanism design.

Nisan and Ronen (2001) introduced the VCG-based mechanism as a general framework for truthful combinatorial auctions. The VCG mechanism is truthful and welfare-optimal when the social welfare optimization problem is solvable in polynomial time, but fails to provide revenue guarantees. Extending truthful auction design beyond welfare maximization, especially in the revenue-maximizing direction, is a major frontier.

No-Regret Learning and Convergence to Equilibrium

When players run no-regret algorithms in a repeated game, their time-average play converges to the set of coarse correlated equilibria; with the stronger guarantee of no internal regret, it converges to the set of correlated equilibria (Hart and Mas-Colell, 2000). A natural question is whether this convergence can be strengthened: does no-regret play converge to Nash equilibrium?

For zero-sum games, the answer is yes via the minimax theorem and LP duality: both players running multiplicative weights update converge to a Nash equilibrium. For general games, no-regret learning can cycle or converge to correlated equilibria that are not Nash equilibria. The gap between correlated and Nash equilibria — and whether natural dynamics can bridge it — is an active research area.

Last-iterate convergence: It is known that the time-average of iterates converges to equilibrium, but the actual current iterate may cycle. Recent work (Daskalakis, Panageas, 2018; Mertikopoulos et al., 2018) has shown that certain gradient-based learning algorithms (optimistic gradient descent/ascent) achieve last-iterate convergence in zero-sum games. Extending last-iterate convergence to general-sum games is an open problem, with implications for machine learning (where gradient descent on GANs corresponds to zero-sum game dynamics).

Equilibrium selection in learning dynamics: Multiple Nash equilibria exist in most games, and different no-regret algorithms may converge to different equilibria. Which equilibrium is selected by a given learning process, and whether this selection is robust to perturbations, are questions at the intersection of game theory, dynamical systems, and machine learning.

Algorithmic Fairness in Mechanism Design

As automated mechanisms are deployed at scale — allocating housing, screening job applicants, matching students to schools — the interaction between incentive compatibility and algorithmic fairness has become a pressing research area.

A mechanism may be truthful and welfare-optimal yet systematically disadvantage certain demographic groups. For example, a welfare-maximizing school choice algorithm may reproduce historical inequalities if valuations and priorities reflect socioeconomic disparities. Incorporating fairness constraints (demographic parity, equalized odds, individual fairness) into mechanism design without destroying truthfulness or efficiency is technically challenging and not fully understood.

Fundamental tensions arise: Vickrey-Clarke-Groves (VCG) mechanisms are simultaneously efficient and truthful but may produce inequitable allocations. Allocations that are fair under distributional criteria may require non-monotone allocation rules, which are incompatible with DSIC by Myerson’s Lemma. Research at the boundary of mechanism design and algorithmic fairness — including work by Dwork, Hardt, Pitassi, Reingold, Zemel and many others — seeks tractable notions of fairness compatible with incentive constraints. This is one of the youngest and fastest-growing areas in the field, with deep connections to machine learning fairness and policy applications.

Matching Market Design at Scale

The Gale-Shapley algorithm and its variants have transformed practical market design. Current open problems include: How should matching algorithms handle complementarities in preferences (hospitals preferring pairs of residents who are a couple) without losing stability guarantees? How do matching markets behave dynamically when participants arrive over time (e.g., online platforms)? Can price mechanisms be used alongside stability concepts to handle markets where preferences change with prices?

For dynamic matching markets, the theory is largely open. Results by Akbarpour, Li, and Gharan (2020) on dynamic kidney exchange and Baccara, Lee, Monnet, and Weill on dynamic labor markets provide early frameworks, but general dynamic matching theory remains underdeveloped. On the platform economy side, understanding how Uber, Airbnb, and similar two-sided markets achieve approximately stable matchings with real-time algorithmic market clearing is both a theoretical and practical frontier.

Shapley and Shubik’s foundational work on assignment games (1972) showed that stable matchings in markets with transferable utility correspond exactly to optimal solutions of the linear assignment problem. This connection to LP and network flow theory underlies the computational tractability of matching markets with prices, but the extension to non-transferable utility settings — where money cannot substitute for preference — remains technically demanding.

CO 456 provides the conceptual language and analytical tools to engage with all of these directions: the Shapley axioms speak directly to fairness in cooperative games; Myerson’s Lemma is the starting point for every auction design question; Nash’s theorem is the existence result against which all approximation results are measured; and the LP duality framework that runs through Chapter 3 is the computational backbone of matching and stability. The subject as a whole rewards continued study: every result here has a deeper version, and every open problem is both technically challenging and practically important.
