These notes synthesize material from S. Jukna’s Extremal Combinatorics, R. Graham, B. Rothschild, and J. Spencer’s Ramsey Theory, N. Alon and J. Spencer’s The Probabilistic Method, and D. Conlon, J. Fox, and B. Sudakov’s survey articles, enriched with material from CMU 21-738 (P.-S. Loh) and Y. Zhao’s Graph Theory and Additive Combinatorics.
1. Extremal Graph Theory: Turán-type Problems
Extremal graph theory asks one of the most natural questions in combinatorics: given a forbidden subgraph \(H\), what is the maximum number of edges a graph on \(n\) vertices can have without containing a copy of \(H\)? This deceptively simple question, first systematically studied by Pál Turán during World War II, has blossomed into one of the richest areas of modern combinatorics, with deep connections to algebra, probability, analysis, and theoretical computer science.
1.1 The Turán Problem and Turán’s Theorem
We begin with the fundamental definitions that frame the entire subject.
Definition (Extremal Number). For a graph H and a positive integer n, the extremal number \(\operatorname{ex}(n, H)\) is the maximum number of edges in a graph on n vertices that does not contain H as a subgraph. A graph on n vertices with \(\operatorname{ex}(n, H)\) edges and no copy of H is called an extremal graph for H.
The first and most fundamental result in extremal graph theory determines \(\operatorname{ex}(n, K_{r+1})\) exactly.
Definition (Turán Graph). The Turán graph \(T(n, r)\) is the complete r-partite graph on n vertices in which the part sizes are as equal as possible — that is, each part has size either \(\lfloor n/r \rfloor\) or \(\lceil n/r \rceil\). The number of edges in T(n,r) is denoted \(t(n,r)\).
\[
t(n,r) = \binom{n}{2} - \sum_{i=1}^{r} \binom{|V_i|}{2} = \left(1 - \frac{1}{r}\right)\frac{n^2}{2} + O(n),
\]
where \(V_1, \ldots, V_r\) are the parts. Now we state the celebrated theorem.
Theorem 1.1 (Turán, 1941). For all integers \(r \geq 1\) and \(n \geq 1\),
\[
\operatorname{ex}(n, K_{r+1}) = t(n,r).
\]
Moreover, the Turán graph \(T(n,r)\) is the unique extremal graph.
Turán’s theorem admits several beautiful proofs, each illuminating different aspects of the result. We present three.
Proof 1 (Induction on n). We proceed by induction on n. The base case \(n \leq r\) is trivial, since any graph on at most r vertices is \(K_{r+1}\)-free. For the inductive step, let G be a \(K_{r+1}\)-free graph on n vertices with the maximum number of edges. Since G is \(K_{r+1}\)-free but edge-maximal with this property, G must contain a copy of \(K_r\) (otherwise we could add edges). Let \(v_1, \ldots, v_r\) be the vertices of such a \(K_r\).
Every other vertex \(w\) is adjacent to at most \(r-1\) of the \(v_i\), since otherwise \(\{w, v_1, \ldots, v_r\}\) would form a \(K_{r+1}\). Removing the r vertices \(v_1, \ldots, v_r\), the remaining graph \(G'\) on \(n - r\) vertices is still \(K_{r+1}\)-free, so by induction \(e(G') \leq t(n-r, r)\). The total number of edges is:
\[
e(G) = e(G') + \binom{r}{2} + \sum_{w \notin \{v_1,\ldots,v_r\}} |N(w) \cap \{v_1,\ldots,v_r\}| \leq t(n-r,r) + \binom{r}{2} + (n-r)(r-1).
\]
A direct calculation shows this equals \(t(n,r)\). Uniqueness follows from analyzing the equality conditions.
Proof 2 (Zykov Symmetrization). Let G be a \(K_{r+1}\)-free graph on n vertices with the maximum number of edges. We show G must be a complete multipartite graph.
Suppose two non-adjacent vertices u and v exist with \(\deg(u) \leq \deg(v)\). We perform Zykov symmetrization: replace u with a copy of v — that is, delete all edges incident to u, then for every neighbour w of v, add the edge \(uw\). The resulting graph G' has at least as many edges as G (since \(\deg(v) \geq \deg(u)\)), and G' is still \(K_{r+1}\)-free because any clique in G' containing u corresponds to a clique in G with u replaced by v.
Repeating this process, we eventually obtain a graph where every pair of non-adjacent vertices has the same neighbourhood — i.e., a complete multipartite graph. The number of edges is maximized when the parts are as equal as possible (by convexity of \(\binom{x}{2}\)), giving the Turán graph \(T(n,r)\).
Proof 3 (Probabilistic / Double Counting — Motzkin-Straus). We establish the result through a beautiful connection to optimization. Let G be a graph on \([n]\) and assign weights \(x_1, \ldots, x_n \geq 0\) with \(\sum x_i = 1\). Define
\[
f(x_1, \ldots, x_n) = \sum_{ij \in E(G)} x_i x_j.
\]
Motzkin-Straus Theorem: If \(G\) is \(K_{r+1}\)-free, then \(\max f \leq \frac{1}{2}\left(1 - \frac{1}{r}\right)\), with equality achieved when the weights are uniformly distributed on an r-clique.
To see this, observe that at the maximum, the weight is supported on a clique (any non-edge with positive weights on both endpoints can have its weight redistributed to increase \(f\)). For a clique of size \(s \leq r\), the maximum of \(f\) with uniform weights \(1/s\) each is \(\binom{s}{2}/s^2 = (1 - 1/s)/2 \leq (1 - 1/r)/2\).
Setting \(x_i = 1/n\) for all i, we get \(f = e(G)/n^2\), so \(e(G)/n^2 \leq (1 - 1/r)/2\), yielding \(e(G) \leq (1 - 1/r)n^2/2\). The exact bound \(t(n,r)\) follows by optimizing over integer part sizes.
1.2 The Erdős–Stone–Simonovits Theorem
Turán’s theorem determines \(\operatorname{ex}(n, K_{r+1})\) exactly. What about general graphs \(H\)? The Erdős–Stone theorem, often called the “fundamental theorem of extremal graph theory,” answers this asymptotically.
Definition (Turán Density). The Turán density of a graph H is
\[
\pi(H) = \lim_{n \to \infty} \frac{\operatorname{ex}(n,H)}{\binom{n}{2}}.
\]
This limit exists by a standard averaging argument (or by appealing to the Kruskal-Katona machinery).
Theorem 1.2 (Erdős–Stone, 1946; Erdős–Simonovits, 1966). For any graph H with chromatic number \(\chi(H) = r + 1 \geq 2\),
\[
\operatorname{ex}(n, H) = \left(1 - \frac{1}{r}\right)\frac{n^2}{2} + o(n^2).
\]
Equivalently, \(\pi(H) = 1 - \frac{1}{\chi(H) - 1}\).
The beauty of this result is that the asymptotic extremal number depends only on the chromatic number of \(H\). For non-bipartite graphs (\(\chi(H) \geq 3\)), the Turán density is strictly between 0 and 1, and is completely determined by \(\chi(H)\). We give a proof sketch.
Proof Sketch. The lower bound \(\operatorname{ex}(n, H) \geq t(n,r)\) follows from the fact that the Turán graph \(T(n,r)\) is r-partite, hence \(r\)-colourable, and so does not contain any graph with chromatic number \(r+1\).
The upper bound is the deep part. The Erdős–Stone theorem states that for every \(r \geq 1\), \(\varepsilon > 0\), and integer \(t\), there exists \(n_0\) such that every graph on \(n \geq n_0\) vertices with at least \(t(n,r) + \varepsilon n^2\) edges contains a copy of \(K_{r+1}(t)\) — the complete \((r+1)\)-partite graph with each part of size \(t\). Since any graph with chromatic number \(r+1\) is a subgraph of some \(K_{r+1}(t)\), this gives the desired upper bound.
The proof of the Erdős–Stone theorem itself proceeds via the regularity lemma (Section 5), or via Ramsey-type arguments and dependent random choice.
1.3 The Kővári–Sós–Turán Theorem and the Zarankiewicz Problem
For bipartite forbidden subgraphs, the Zarankiewicz problem asks for better bounds.
Theorem 1.3 (Kővári–Sós–Turán, 1954). For integers \(2 \leq s \leq t\) and \(n \geq s\),
\[
\operatorname{ex}(n, K_{s,t}) \leq \frac{1}{2}\left( (t-1)^{1/s} \cdot n^{2 - 1/s} + (s-1)n \right).
\]
In particular, \(\operatorname{ex}(n, K_{s,t}) = O(n^{2 - 1/s})\).
Proof. Let G be a \(K_{s,t}\)-free graph on n vertices. Count the number of s-element subsets \(S\) of vertices such that all elements of S have a common neighbour. On one hand, each vertex v contributes \(\binom{\deg(v)}{s}\) such subsets (choose s neighbours of v). On the other hand, since G is \(K_{s,t}\)-free, each s-element set has at most \(t-1\) common neighbours, so the total count is at most \((t-1)\binom{n}{s}\).
Thus \(\sum_{v} \binom{\deg(v)}{s} \leq (t-1)\binom{n}{s}\). By the convexity of \(\binom{x}{s}\) and Jensen's inequality,
\[
n \cdot \binom{2e(G)/n}{s} \leq (t-1)\binom{n}{s},
\]
where \(e(G)\) is the number of edges. For \(s = 2\), this simplifies to
\[
\frac{n \cdot (2e/n)(2e/n - 1)}{2} \leq (t-1)\frac{n(n-1)}{2},
\]
giving \(e(G) \leq \frac{1}{2}(1 + \sqrt{1 + 4(t-1)(n-1)}) \cdot \frac{n}{2}\). For general s, careful estimation yields the stated bound.
1.4 Supersaturation
Once a graph exceeds the extremal number, it must contain not just one copy of the forbidden subgraph, but many copies. This is the supersaturation phenomenon, first observed by Erdős and Simonovits.
Theorem 1.4 (Erdős–Simonovits Supersaturation). For every graph H and every \(\varepsilon > 0\), there exists \(\delta > 0\) and \(n_0\) such that every graph G on \(n \geq n_0\) vertices with \(e(G) \geq \operatorname{ex}(n, H) + \varepsilon n^2\) contains at least \(\delta n^{|V(H)|}\) copies of H.
Example. For \(H = K_3\) (triangles), Turán's theorem gives \(\operatorname{ex}(n, K_3) = \lfloor n^2/4 \rfloor\). Supersaturation tells us that if a graph on n vertices has \((1/4 + \varepsilon)n^2\) edges, it contains at least \(\delta n^3\) triangles. The Kruskal-Katona theorem can be used to determine the minimum number of triangles exactly as a function of the number of edges.
1.5 The Kruskal–Katona Theorem
The Kruskal–Katona theorem, although technically a result about simplicial complexes and set systems, plays a foundational role in extremal combinatorics. It answers the following question: given a collection of \(k\)-element sets, what is the minimum number of \((k-1)\)-element subsets they can have?
Definition (Shadow). For a family \(\mathcal{F} \subseteq \binom{[n]}{k}\), the shadow of \(\mathcal{F}\) is
\[
\partial \mathcal{F} = \left\{ G \in \binom{[n]}{k-1} : G \subset F \text{ for some } F \in \mathcal{F} \right\}.
\]
Theorem 1.5 (Kruskal, 1963; Katona, 1968). If \(|\mathcal{F}| = \binom{a_k}{k} + \binom{a_{k-1}}{k-1} + \cdots + \binom{a_j}{j}\) is the k-binomial representation of \(|\mathcal{F}|\) (where \(a_k > a_{k-1} > \cdots > a_j \geq j \geq 1\)), then
\[
|\partial \mathcal{F}| \geq \binom{a_k}{k-1} + \binom{a_{k-1}}{k-2} + \cdots + \binom{a_j}{j-1}.
\]
The extremal families are initial segments of the colexicographic (colex) order on \(\binom{[n]}{k}\), a beautiful fact connecting extremal combinatorics to order theory.
2. Extremal Set Theory
Extremal set theory studies the maximum or minimum size of families of sets subject to various combinatorial conditions. The results in this area are among the most elegant in all of combinatorics, with short proofs that reveal deep structural truths.
2.1 Sperner’s Theorem and the LYM Inequality
We begin with the oldest result, Sperner’s theorem, which determines the largest antichain in the power set \(2^{[n]}\).
Definition (Antichain). A family \(\mathcal{F} \subseteq 2^{[n]}\) is an antichain (or Sperner family) if no set in \(\mathcal{F}\) contains another: for all \(A, B \in \mathcal{F}\), \(A \subseteq B\) implies \(A = B\).
Theorem 2.1 (Sperner, 1928). If \(\mathcal{F} \subseteq 2^{[n]}\) is an antichain, then
\[
|\mathcal{F}| \leq \binom{n}{\lfloor n/2 \rfloor}.
\]
Equality holds if and only if \(\mathcal{F} = \binom{[n]}{\lfloor n/2 \rfloor}\) or \(\mathcal{F} = \binom{[n]}{\lceil n/2 \rceil}\).
The cleanest proof of Sperner’s theorem goes through the LYM inequality, named after Lubell, Yamamoto, and Meshalkin.
Theorem 2.2 (LYM Inequality). If \(\mathcal{F} \subseteq 2^{[n]}\) is an antichain, then
\[
\sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \leq 1.
\]
Proof (Lubell, 1966). Consider a uniformly random permutation \(\sigma\) of \([n]\). For each \(k\), the set \(\{\sigma(1), \ldots, \sigma(k)\}\) is a uniformly random k-element subset of \([n]\). Define the random chain \(C_\sigma = \{\emptyset, \{\sigma(1)\}, \{\sigma(1), \sigma(2)\}, \ldots, [n]\}\).
For a set \(A \in \mathcal{F}\) with \(|A| = k\), the probability that \(A \in C_\sigma\) equals
\[
\Pr[A \in C_\sigma] = \frac{k!(n-k)!}{n!} = \frac{1}{\binom{n}{k}}.
\]
Since \(\mathcal{F}\) is an antichain, at most one element of \(\mathcal{F}\) can appear in any chain \(C_\sigma\). Therefore,
\[
1 \geq \Pr\left[\exists A \in \mathcal{F} : A \in C_\sigma\right] = \sum_{A \in \mathcal{F}} \Pr[A \in C_\sigma] = \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}}.
\]
This completes the proof.
Proof of Sperner's theorem from LYM. Since \(\binom{n}{|A|} \leq \binom{n}{\lfloor n/2 \rfloor}\) for every \(A\), the LYM inequality gives
\[
1 \geq \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \geq \frac{|\mathcal{F}|}{\binom{n}{\lfloor n/2 \rfloor}},
\]
hence \(|\mathcal{F}| \leq \binom{n}{\lfloor n/2 \rfloor}\).
2.2 The Bollobás Set-Pairs Inequality
One of the most versatile tools in extremal set theory is the Bollobás set-pairs inequality.
Theorem 2.3 (Bollobás, 1965). Let \(A_1, \ldots, A_m\) and \(B_1, \ldots, B_m\) be finite sets such that \(A_i \cap B_j = \emptyset\) if and only if \(i = j\), with \(|A_i| = a\) and \(|B_i| = b\) for all i. Then
\[
m \leq \binom{a+b}{a}.
\]
More generally, if \(|A_i| = a_i\) and \(|B_i| = b_i\), then
\[
\sum_{i=1}^{m} \frac{1}{\binom{a_i + b_i}{a_i}} \leq 1.
\]
Proof. Let \(X = A_1 \cup \cdots \cup A_m \cup B_1 \cup \cdots \cup B_m\) and \(|X| = n\). We work in the vector space of polynomials. For each \(i\), define the polynomial
\[
f_i(x) = \prod_{b \in B_i} \left(\sum_{a \in A_i \cup B_i} x_a - x_b\right)
\]
(we use a slightly different formulation for clarity). Actually, let us use the standard exterior algebra proof.
Consider the vector space \(V = \bigwedge^{a_i + b_i}(\mathbb{R}^n)\) for the uniform case. Instead, let us present the following clean proof. We may assume all sets are subsets of \([n]\) for some n. Consider the polynomial ring \(\mathbb{R}[x_1, \ldots, x_n]\) and for each i, define
\[
f_i(x_1, \ldots, x_n) = \prod_{j \in B_i} \left(1 - x_j\right).
\]
This is a polynomial of degree \(b_i\). Now restrict attention to the multilinear polynomial obtained by reducing modulo the ideal \((x_j^2 - x_j : j \in [n])\). For a set \(S \subseteq [n]\), let \(\mathbf{1}_S \in \{0,1\}^n\) be its characteristic vector. Then \(f_i(\mathbf{1}_S) = 0\) if \(S \cap B_i \neq \emptyset\), and \(f_i(\mathbf{1}_S) = 1\) if \(S \cap B_i = \emptyset\).
Now define \(g_i = f_i \cdot \prod_{j \in A_i} x_j\). This is a multilinear polynomial (after reduction). We have \(g_i(\mathbf{1}_{A_j}) = 0\) whenever \(j \neq i\) (either \(A_j \cap B_i \neq \emptyset\) making \(f_i = 0\), or \(A_j \not\supseteq A_i\) making the \(\prod x_j\) term vanish). Also, \(g_i(\mathbf{1}_{A_i}) = 1 \cdot 1 = 1\) since \(A_i \cap B_i = \emptyset\).
Thus the polynomials \(g_1, \ldots, g_m\) are linearly independent (evaluate at \(\mathbf{1}_{A_1}, \ldots, \mathbf{1}_{A_m}\) to get a triangular matrix with 1's on the diagonal). Each \(g_i\) is a multilinear polynomial of degree at most \(a_i + b_i\). The space of multilinear polynomials of degree at most \(a + b\) in n variables has dimension \(\sum_{k=0}^{a+b} \binom{n}{k}\). In the uniform case \(a_i = a, b_i = b\), each \(g_i\) has degree exactly \(a + b\), and more careful analysis shows \(g_i\) lies in the span of the \(\binom{n}{a+b}\) squarefree monomials of degree \(a + b\). But since we only need the \(g_i\) restricted to \(\{0,1\}^n\), we can use the subspace of multilinear polynomials of degree exactly \(a + b\), which gives \(m \leq \binom{n}{a+b}\). A refined argument (projecting onto the relevant coordinates) gives the sharp bound \(m \leq \binom{a+b}{a}\).
2.3 The Erdős–Ko–Rado Theorem
The Erdős–Ko–Rado theorem is perhaps the most celebrated result in extremal set theory, determining the largest intersecting family of \(k\)-element sets.
Definition (Intersecting Family). A family \(\mathcal{F} \subseteq \binom{[n]}{k}\) is intersecting if \(A \cap B \neq \emptyset\) for all \(A, B \in \mathcal{F}\).
Example. The family of all k-element subsets of \([n]\) containing a fixed element (called a star or trivial intersecting family) has size \(\binom{n-1}{k-1}\) and is clearly intersecting.
Theorem 2.4 (Erdős–Ko–Rado, 1961). If \(n \geq 2k\) and \(\mathcal{F} \subseteq \binom{[n]}{k}\) is intersecting, then
\[
|\mathcal{F}| \leq \binom{n-1}{k-1}.
\]
For \(n > 2k\), equality holds if and only if \(\mathcal{F}\) is a star.
We present Katona’s beautiful proof using cyclic permutations.
Proof (Katona, 1972). Place the elements \(1, 2, \ldots, n\) around a circle (in some cyclic order). A k-arc is a set of k consecutive elements on the circle. There are exactly n such arcs (one starting at each position), and any two k-arcs are either disjoint or intersecting. Since each arc has k elements and \(n \geq 2k\), two arcs with starting positions differing by at least k (and at most \(n-k\)) are disjoint. So among the n arcs, at most k can be mutually intersecting.
Now count pairs \((\sigma, A)\) where \(\sigma\) is a cyclic permutation of \([n]\) and \(A \in \mathcal{F}\) is a k-arc in the circular order given by \(\sigma\). For each of the \((n-1)!\) cyclic permutations, at most \(k\) elements of \(\mathcal{F}\) are arcs (since \(\mathcal{F}\) is intersecting). For each \(A \in \mathcal{F}\), the number of cyclic permutations making A an arc is \(k!(n-k)!\) (arrange the elements of A consecutively on the circle, with \(k!\) internal orderings and \((n-k)!\) orderings for the rest). Therefore:
\[
|\mathcal{F}| \cdot k!(n-k)! \leq k \cdot (n-1)!,
\]
which gives
\[
|\mathcal{F}| \leq \frac{k \cdot (n-1)!}{k!(n-k)!} = \frac{(n-1)!}{(k-1)!(n-k)!} = \binom{n-1}{k-1}.
\]
2.4 The Sauer–Shelah–Perles Lemma
The Sauer–Shelah lemma connects extremal set theory to computational learning theory through the concept of VC dimension.
Definition (Shattering and VC Dimension). A family \(\mathcal{F} \subseteq 2^{[n]}\) shatters a set \(S \subseteq [n]\) if \(\{A \cap S : A \in \mathcal{F}\} = 2^S\). The VC dimension of \(\mathcal{F}\) is the size of the largest set shattered by \(\mathcal{F}\).
Theorem 2.5 (Sauer, 1972; Shelah, 1972; Perles). If \(\mathcal{F} \subseteq 2^{[n]}\) has VC dimension at most d (i.e., shatters no \((d+1)\)-element set), then
\[
|\mathcal{F}| \leq \sum_{i=0}^{d} \binom{n}{i} = \Phi_d(n).
\]
Proof. We prove the stronger statement: the number of sets shattered by \(\mathcal{F}\) is at least \(|\mathcal{F}|\). Since no set of size \(d+1\) is shattered, the number of shattered sets is at most \(\sum_{i=0}^{d} \binom{n}{i}\).
We proceed by induction on \(n + |\mathcal{F}|\). The base cases are trivial. For the inductive step, fix an element \(x \in [n]\) and partition \(\mathcal{F}\) into three parts: \(\mathcal{F}_0\) (sets not containing x with no "twin" containing x), \(\mathcal{F}_1\) (sets containing x with no twin without x), and matched pairs. Define \(\mathcal{F}' = \{A \setminus \{x\} : A \in \mathcal{F}\}\) and \(\mathcal{F}'' = \{A \setminus \{x\} : A \in \mathcal{F}, A \cup \{x\} \in \mathcal{F} \text{ or } A \setminus \{x\} \in \mathcal{F}\}\) — the family obtained by looking at sets appearing with both \(x\) present and absent. Then \(|\mathcal{F}| = |\mathcal{F}'| + |\mathcal{F}''|\), and a set S not containing x is shattered by \(\mathcal{F}\) if and only if S is shattered by \(\mathcal{F}'\) and \(S \cup \{x\}\) is shattered by \(\mathcal{F}\) only if \(S\) is shattered by \(\mathcal{F}''\). By induction, each \(\mathcal{F}'\) and \(\mathcal{F}''\) shatters at least as many sets as its size, and the shattered sets for \(\mathcal{F}\) include disjoint contributions from both families, completing the proof.
2.5 The Sunflower Lemma
Definition (Sunflower). A sunflower (or \(\Delta\)-system) is a collection of sets \(A_1, \ldots, A_r\) such that there exists a set Y (the core) with \(A_i \cap A_j = Y\) for all \(i \neq j\). The sets \(A_i \setminus Y\) are the petals.
Theorem 2.6 (Erdős–Ko Sunflower Lemma, 1960). If \(\mathcal{F}\) is a family of sets each of size at most k, and \(|\mathcal{F}| > (r-1)^k \cdot k!\), then \(\mathcal{F}\) contains a sunflower with r petals.
Proof. We prove the result by induction on k. For \(k = 1\), the sets are singletons, and any r of them form a sunflower with empty core, so we need \(|\mathcal{F}| > r - 1\), which is \((r-1)^1 \cdot 1!\). For the inductive step, suppose no element appears in more than \(r-1\) sets (otherwise, those r sets containing a common element include a sunflower by induction on the restrictions \(A_i \setminus \{x\}\)). Choose a maximal collection of pairwise disjoint sets \(B_1, \ldots, B_t \in \mathcal{F}\). If \(t \geq r\), these form a sunflower with empty core. Otherwise, every set in \(\mathcal{F}\) intersects \(B_1 \cup \cdots \cup B_t\), which has at most \(kt < kr\) elements. Group the sets by their intersection with this union and apply induction.
2.6 Frankl’s Conjecture (Union-Closed Sets Conjecture)
We briefly discuss one of the most famous open problems in extremal set theory.
Definition. A family \(\mathcal{F} \subseteq 2^{[n]}\) is union-closed if for all \(A, B \in \mathcal{F}\), we have \(A \cup B \in \mathcal{F}\).
Frankl's Conjecture (1979). If \(\mathcal{F}\) is a union-closed family of finite sets with \(|\mathcal{F}| \geq 2\), then there exists an element \(x\) that belongs to at least \(|\mathcal{F}|/2\) members of \(\mathcal{F}\).
3. Ramsey Theory: Classical Results
Ramsey theory, broadly understood, asserts that complete disorder is impossible: every large enough structure must contain a well-organized substructure. This idea, first made precise by Frank Ramsey in 1930, has grown into one of the most influential branches of combinatorics.
3.1 Ramsey’s Theorem
Theorem 3.1 (Ramsey, 1930). For any positive integers \(s\) and \(t\), there exists a smallest integer \(R(s,t)\) such that every 2-colouring of the edges of \(K_{R(s,t)}\) (with colours red and blue) contains either a red \(K_s\) or a blue \(K_t\).
Proof. We prove that \(R(s,t) \leq R(s-1,t) + R(s,t-1)\) by induction, with base cases \(R(s,1) = R(1,t) = 1\).
Let \(n = R(s-1,t) + R(s,t-1)\) and consider a 2-colouring (red/blue) of \(K_n\). Pick any vertex v. The remaining \(n - 1 = R(s-1,t) + R(s,t-1) - 1\) vertices are partitioned into \(R\) (red neighbours of v) and \(B\) (blue neighbours of v). By pigeonhole, either \(|R| \geq R(s-1,t)\) or \(|B| \geq R(s,t-1)\).
Case 1: \(|R| \geq R(s-1,t)\). The 2-coloured \(K_{|R|}\) on R contains either a red \(K_{s-1}\) (which together with v gives a red \(K_s\)) or a blue \(K_t\).
Case 2: \(|B| \geq R(s,t-1)\). Similarly, we find a red \(K_s\) or a blue \(K_t\).
In both cases the conclusion holds. The existence of \(R(s,t)\) follows by induction, and the recurrence gives \(R(s,t) \leq \binom{s+t-2}{s-1}\).
3.2 Bounds on Ramsey Numbers
The study of Ramsey numbers — their precise values and asymptotic behaviour — is central to combinatorics. We know embarrassingly few exact values.
Example (Small Ramsey Numbers).
\(R(3,3) = 6\): Colour the edges of \(K_5\) as the Petersen complement — five red and five blue, forming two 5-cycles. This shows \(R(3,3) > 5\). The upper bound \(R(3,3) \leq R(2,3) + R(3,2) = 3 + 3 = 6\) completes the proof.
\(R(3,4) = 9\), \(R(3,5) = 14\), \(R(4,4) = 18\). The value \(R(5,5)\) is unknown; the best bounds are \(43 \leq R(5,5) \leq 48\).
The Erdős–Szekeres Upper Bound
Theorem 3.2 (Erdős–Szekeres, 1935). For all \(s, t \geq 2\),
\[
R(s,t) \leq \binom{s+t-2}{s-1}.
\]
In particular, \(R(k,k) \leq \binom{2k-2}{k-1} \leq \frac{4^{k-1}}{\sqrt{\pi k}}\) (by Stirling's approximation).
The Erdős Probabilistic Lower Bound
In 1947, Erdős proved a remarkable lower bound using the probabilistic method — one of the first applications of this technique.
Theorem 3.3 (Erdős, 1947). If \(\binom{n}{k} \cdot 2^{1 - \binom{k}{2}} < 1\), then \(R(k,k) > n\). In particular,
\[
R(k,k) > \lfloor 2^{k/2} \rfloor.
\]
Proof. Colour each edge of \(K_n\) red or blue independently with probability \(1/2\). For a fixed set \(S\) of k vertices, the probability that S is monochromatic (all red or all blue) is \(2 \cdot 2^{-\binom{k}{2}} = 2^{1 - \binom{k}{2}}\).
By the union bound, the probability that some k-set is monochromatic is at most
\[
\binom{n}{k} \cdot 2^{1-\binom{k}{2}}.
\]
If this is less than 1, there exists a colouring with no monochromatic \(K_k\), so \(R(k,k) > n\). Setting \(n = \lfloor 2^{k/2} \rfloor\) and checking the inequality (for large k) gives the result.
Off-Diagonal Ramsey Numbers
Theorem 3.4. For fixed s and \(t \to \infty\),
\[
c_1 \cdot \frac{t^{(s+1)/2}}{(\log t)^{(s+1)/2 - 1/(s-1)}} \leq R(s,t) \leq c_2 \cdot \frac{t^{s-1}}{(\log t)^{s-2}}
\]
for constants \(c_1, c_2\) depending on s. The upper bound is due to Ajtai-Komlós-Szemerédi (1980) for \(s=3\) and general s due to others. The lower bound for \(s = 3\) is \(R(3,t) = \Theta(t^2/\log t)\), established by Kim (1995, lower bound) and Shearer (upper bound).
3.3 The Diagonal Ramsey Number: Recent Breakthroughs
\[
\sqrt{2}^k \lesssim R(k,k) \lesssim 4^k.
\]
Theorem 3.5 (Campos–Griffiths–Morris–Sahasrabudhe, 2023). There exists a constant \(\varepsilon > 0\) such that
\[
R(k,k) \leq (4 - \varepsilon)^k
\]
for all sufficiently large k.
3.4 Schur’s Theorem
Theorem 3.6 (Schur, 1916). For every positive integer r, there exists a smallest integer \(S(r)\) such that for every r-colouring of \([S(r)]\), there exist x, y, z (not necessarily distinct from each other, but the equation is non-trivial) of the same colour with \(x + y = z\).
Proof. We derive Schur's theorem from Ramsey's theorem. Given an r-colouring \(c: [n] \to [r]\), define an edge colouring of \(K_n\) by colouring the edge \(\{i,j\}\) (with \(i < j\)) with colour \(c(j - i)\). By Ramsey's theorem, if n is large enough, there is a monochromatic triangle \(\{a, b, c\}\) with \(a < b < c\). The edges \(\{a,b\}, \{b,c\}, \{a,c\}\) all have the same colour, meaning \(c(b-a) = c(c-b) = c(c-a)\). Setting \(x = b - a\), \(y = c - b\), \(z = c - a\), we have \(x + y = z\) and all three have the same colour.
Thus \(S(r) \leq R_r(3,3,\ldots,3) - 1\), where the Ramsey number has r colours. More precisely, \(S(r) \leq R(3,3,\ldots,3;r) - 1\).
3.5 Van der Waerden’s Theorem
Van der Waerden’s theorem, proved in 1927, is one of the foundational results of Ramsey theory on the integers.
Theorem 3.7 (Van der Waerden, 1927). For every positive integer r (number of colours) and every positive integer k (length of arithmetic progression), there exists a smallest integer \(W(r,k)\) such that every r-colouring of \([W(r,k)]\) contains a monochromatic arithmetic progression of length k.
The original proof of van der Waerden gives tower-type bounds. We outline how van der Waerden’s theorem follows from the Hales-Jewett theorem (discussed in Section 7).
3.6 Rado’s Theorem and Partition Regularity
Rado’s theorem provides a complete characterization of which systems of linear equations are “partition regular.”
Definition. A system of linear equations \(Ax = 0\) (where A is an \(m \times n\) integer matrix) is partition regular if, for every finite colouring of the positive integers, there is a monochromatic solution \(x_1, \ldots, x_n\) (all the same colour).
Theorem 3.8 (Rado, 1933). A single equation \(c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = 0\) with integer coefficients is partition regular if and only if some non-empty subset of the coefficients sums to zero. More generally, a system \(Ax = 0\) is partition regular if the columns condition is satisfied: the columns of A can be ordered \(v_1, \ldots, v_n\) and partitioned into groups \(B_1, \ldots, B_k\) such that \(\sum_{v \in B_1} v = 0\), and for each \(i > 1\), \(\sum_{v \in B_i} v\) is in the real span of columns from \(B_1 \cup \cdots \cup B_{i-1}\).
Example. The equation \(x + y = z\) (i.e., \(x + y - z = 0\)) is partition regular since the coefficients \(\{1, -1\}\) or \(\{1, -1\}\) — actually, since \(1 + (-1) = 0\) (taking the subset \(\{1, -1\}\)), Rado's condition is satisfied. This recovers Schur's theorem. The equation \(x + y = 2z\) is also partition regular (this gives monochromatic 3-term arithmetic progressions — a special case of van der Waerden's theorem). However, \(x + y = 3z\) is not partition regular, as no non-empty subset of \(\{1, 1, -3\}\) sums to zero.
3.7 Ramsey’s Theorem for Hypergraphs
The original theorem of Ramsey applies to hypergraphs as well.
Theorem 3.9 (Ramsey, 1930 — Hypergraph Version). For all positive integers \(r, k, s, t\) with \(k \geq 1\), there exists a smallest integer \(R_k(s,t)\) such that every 2-colouring of the k-element subsets of an \(R_k(s,t)\)-element set contains either a red set of size s (all whose k-subsets are red) or a blue set of size t.
4. Algebraic and Linear Algebraic Methods
Some of the most elegant proofs in extremal combinatorics use algebra — particularly linear algebra and polynomial methods. The general philosophy is: translate a combinatorial problem into a question about linear independence, dimension, or polynomial vanishing, then use algebraic constraints to derive the desired bounds.
4.1 The Dimension Argument
The simplest algebraic technique is the dimension argument: if we can associate linearly independent vectors with the objects we are counting, then their number is at most the dimension of the ambient space.
Theorem 4.1 (Oddtown Theorem). Let \(A_1, \ldots, A_m \subseteq [n]\) be such that \(|A_i|\) is odd for all i, and \(|A_i \cap A_j|\) is even for all \(i \neq j\). Then \(m \leq n\).
Proof. Work over \(\mathbb{F}_2\). Represent each set \(A_i\) by its characteristic vector \(v_i \in \mathbb{F}_2^n\). The condition \(|A_i \cap A_j| \equiv 0 \pmod{2}\) for \(i \neq j\) translates to \(\langle v_i, v_j \rangle = 0\) over \(\mathbb{F}_2\), while \(|A_i| \equiv 1 \pmod{2}\) means \(\langle v_i, v_i \rangle = 1\).
We claim the vectors \(v_1, \ldots, v_m\) are linearly independent over \(\mathbb{F}_2\). Suppose \(\sum_{i \in S} v_i = 0\) for some non-empty \(S \subseteq [m]\). Taking the inner product with \(v_j\) for \(j \in S\):
\[
0 = \left\langle \sum_{i \in S} v_i, v_j \right\rangle = \sum_{i \in S} \langle v_i, v_j \rangle = \langle v_j, v_j \rangle = 1,
\]
a contradiction. Therefore \(m \leq n\).
Theorem 4.2 (Eventown Theorem). Let \(A_1, \ldots, A_m \subseteq [n]\) be such that \(|A_i \cap A_j|\) is even for all \(i, j\) (including \(i = j\), i.e., all sets have even size). Then \(m \leq 2^{\lfloor n/2 \rfloor}\).
Proof. Over \(\mathbb{F}_2\), the characteristic vectors \(v_1, \ldots, v_m\) satisfy \(\langle v_i, v_j \rangle = 0\) for all i, j. Thus the span of \(\{v_1, \ldots, v_m\}\) is a totally isotropic subspace of \(\mathbb{F}_2^n\), which has dimension at most \(\lfloor n/2 \rfloor\). Hence \(m \leq 2^{\lfloor n/2 \rfloor}\).
4.2 Fisher’s Inequality
Theorem 4.3 (Fisher's Inequality). If \(\mathcal{F} = \{A_1, \ldots, A_m\}\) is a family of subsets of \([n]\) with \(|A_i \cap A_j| = \lambda\) for all \(i \neq j\) (a 1-design or uniform Fisher family), and \(|A_i| > \lambda\) for all i, then \(m \leq n\).
Proof. Over \(\mathbb{R}\), consider the incidence matrix \(M\) whose rows are the characteristic vectors of \(A_1, \ldots, A_m\). The Gram matrix \(G = MM^T\) has diagonal entries \(|A_i|\) and off-diagonal entries \(\lambda\). We can write \(G = (\lambda)J + D\) where J is the all-ones matrix and D is diagonal with entries \(|A_i| - \lambda > 0\). Since D is positive definite and \((\lambda)J\) is positive semidefinite, G is positive definite, so \(\operatorname{rank}(G) = m\). Since \(G = MM^T\), we have \(m = \operatorname{rank}(G) \leq \operatorname{rank}(M) \leq n\).
4.3 The Polynomial Method
The polynomial method is one of the most powerful paradigms in combinatorics. The key idea: represent combinatorial objects as zeros of polynomials, then use the algebraic structure of polynomial rings to derive bounds.
The Combinatorial Nullstellensatz
Theorem 4.4 (Alon, 1999 — Combinatorial Nullstellensatz). Let \(\mathbb{F}\) be a field and let \(f \in \mathbb{F}[x_1, \ldots, x_n]\) be a polynomial of degree \(\sum_{i=1}^n t_i\), where each \(t_i\) is a non-negative integer. Suppose the coefficient of \(\prod_{i=1}^n x_i^{t_i}\) in f is nonzero. Then for any sets \(S_1, \ldots, S_n \subseteq \mathbb{F}\) with \(|S_i| > t_i\), there exists \((a_1, \ldots, a_n) \in S_1 \times \cdots \times S_n\) with \(f(a_1, \ldots, a_n) \neq 0\).
The Chevalley–Warning Theorem and Applications
Theorem 4.5 (Chevalley–Warning). Let \(f_1, \ldots, f_m \in \mathbb{F}_p[x_1, \ldots, x_n]\) with \(\sum_{j=1}^m \deg(f_j) < n\). Let \(V = \{x \in \mathbb{F}_p^n : f_1(x) = \cdots = f_m(x) = 0\}\). Then \(p \mid |V|\). In particular, if \(V \neq \emptyset\) (e.g., if all \(f_j\) vanish at the origin), then \(|V| \geq p\).
4.4 The Alon–Babai–Suzuki Theorem
Generalizing the oddtown and eventown results, the Alon-Babai-Suzuki theorem provides modular intersection bounds.
Theorem 4.6 (Alon–Babai–Suzuki, 1991). Let p be a prime, and let \(\mathcal{F} = \{A_1, \ldots, A_m\}\) be a family of subsets of \([n]\) with \(|A_i| \equiv 0 \pmod{p}\) for all i, and \(|A_i \cap A_j| \not\equiv 0 \pmod{p}\) for all \(i \neq j\). Then \(m \leq n\).
The proof uses the multilinear polynomial technique: represent the intersection conditions via polynomials over \(\mathbb{F}_p\), show the associated vectors are linearly independent, and use dimension counting.
4.5 The Caro–Wei Bound
We include a result from graph theory that beautifully illustrates the probabilistic–algebraic interplay.
Theorem 4.7 (Caro, 1979; Wei, 1981). Every graph G has an independent set of size at least
\[
\sum_{v \in V(G)} \frac{1}{\deg(v) + 1}.
\]
Proof. Let \(\sigma\) be a uniformly random permutation of \(V(G)\). Define a vertex v to be locally first if v appears before all its neighbours in \(\sigma\). The set of locally first vertices is an independent set (no two adjacent vertices can both be locally first).
The probability that v is locally first is \(1/(\deg(v) + 1)\), since among v and its \(\deg(v)\) neighbours, each is equally likely to be first. By linearity of expectation, the expected size of the independent set is \(\sum_v 1/(\deg(v) + 1)\). Hence an independent set of at least this size exists.
4.6 The Linear Algebra Bound Method (Babai–Frankl)
The general framework for applying linear algebra to extremal combinatorics was systematized by Babai and Frankl. The idea is:
- Associate to each combinatorial object a vector in some vector space \(V\).
- Translate the combinatorial constraints into linear algebraic conditions (orthogonality, linear independence, etc.).
- Use the dimension of \(V\) to bound the number of objects.
Theorem 4.8 (Non-uniform Ray-Chaudhuri–Wilson). Let \(L = \{\ell_1, \ldots, \ell_s\}\) be a set of s non-negative integers. If \(\mathcal{F} = \{A_1, \ldots, A_m\} \subseteq 2^{[n]}\) satisfies \(|A_i \cap A_j| \in L\) for all \(i \neq j\), then
\[
m \leq \binom{n}{s}.
\]
Proof Sketch. For each set \(A_i\), define the polynomial \(f_i(x) = \prod_{l \in L}(\sum_{j \in A_i} x_j - l)\) restricted to multilinear polynomials. The key properties are: \(f_i(\mathbf{1}_{A_j}) = 0\) for \(i \neq j\) (since \(|A_i \cap A_j| \in L\)), and \(f_i(\mathbf{1}_{A_i}) \neq 0\) (since \(|A_i| \notin L\) can be arranged). The polynomials \(f_1, \ldots, f_m\) are linearly independent over \(\mathbb{R}\), and each has degree at most s as a multilinear polynomial, so they live in a space of dimension \(\sum_{j=0}^{s} \binom{n}{j} \leq \binom{n}{s}\) (for the appropriate range).
5. The Szemerédi Regularity Lemma in Extremal Combinatorics
The Szemerédi regularity lemma is one of the most powerful tools in combinatorics. Introduced by Szemerédi in 1975 as a key ingredient in his proof of the Erdős–Turán conjecture on arithmetic progressions, it has since become a cornerstone of graph theory, with applications ranging from extremal combinatorics to number theory to theoretical computer science.
5.1 Statement of the Regularity Lemma
Definition (\(\varepsilon\)-regular pair). Let G be a graph and let \(A, B\) be disjoint subsets of \(V(G)\). The density of the pair \((A,B)\) is
\[
d(A,B) = \frac{e(A,B)}{|A||B|},
\]
where \(e(A,B)\) is the number of edges between A and B. The pair \((A,B)\) is \(\varepsilon\)-regular if for every \(A' \subseteq A\) with \(|A'| \geq \varepsilon |A|\) and every \(B' \subseteq B\) with \(|B'| \geq \varepsilon |B|\), we have
\[
|d(A',B') - d(A,B)| < \varepsilon.
\]
Informally, an \(\varepsilon\)-regular pair behaves like a random bipartite graph: the edge density is approximately uniform across all large subsets. The regularity lemma says every dense graph can be partitioned into a bounded number of parts, almost all pairs of which are \(\varepsilon\)-regular.
Theorem 5.1 (Szemerédi Regularity Lemma, 1975). For every \(\varepsilon > 0\) and every positive integer \(m_0\), there exist integers \(M = M(\varepsilon, m_0)\) and \(N = N(\varepsilon, m_0)\) such that every graph G on \(n \geq N\) vertices admits a partition \(V(G) = V_0 \cup V_1 \cup \cdots \cup V_k\) with the following properties:
(i) \(m_0 \leq k \leq M\),
(ii) \(|V_0| \leq \varepsilon n\) (the "exceptional" set),
(iii) \(|V_1| = |V_2| = \cdots = |V_k|\),
(iv) All but at most \(\varepsilon \binom{k}{2}\) pairs \((V_i, V_j)\) with \(1 \leq i < j \leq k\) are \(\varepsilon\)-regular.
5.2 Key Lemmas: Embedding and Counting
The power of the regularity lemma comes through two companion results.
Theorem 5.2 (Embedding Lemma / Key Lemma). Let H be a graph with vertex set \(\{1, \ldots, h\}\) and let \(\varepsilon > 0\) and \(d > 0\). If \(\varepsilon < d^h / (2h)\) (a sufficient condition), then the following holds: suppose \(V_1, \ldots, V_h\) are pairwise disjoint vertex sets with \(|V_i| \geq m\), and for every edge \(\{i,j\}\) of H, the pair \((V_i, V_j)\) is \(\varepsilon\)-regular with density at least d. Then the number of labelled copies of H with vertex i in \(V_i\) is at least
\[
(d - \varepsilon)^{e(H)} \prod_{i=1}^{h} |V_i| - \text{lower order terms}.
\]
In particular, at least one copy of H exists.
Theorem 5.3 (Counting Lemma). Under the same conditions as the embedding lemma, the number of copies of H is
\[
\left(\prod_{\{i,j\} \in E(H)} d(V_i, V_j) \pm c(\varepsilon, h)\right) \prod_{i=1}^{h} |V_i|,
\]
where \(c(\varepsilon, h) \to 0\) as \(\varepsilon \to 0\) for fixed h.
5.3 The Graph Removal Lemma
One of the most important consequences of the regularity lemma is the graph removal lemma, which has far-reaching applications.
Theorem 5.4 (Graph Removal Lemma — Ruzsa-Szemerédi, 1978; general version Erdős-Frankl-Rödl, Alon et al.). For every graph H on h vertices and every \(\varepsilon > 0\), there exists \(\delta > 0\) such that: if a graph G on n vertices contains at most \(\delta n^h\) copies of H, then G can be made H-free by removing at most \(\varepsilon n^2\) edges.
Proof Sketch. Apply the regularity lemma to G with an appropriate \(\varepsilon'\) (depending on \(\varepsilon\) and H). Remove all edges within the exceptional set, all edges in irregular pairs, and all edges in regular pairs of density below a threshold d. This removes at most \(\varepsilon n^2 / 2\) edges. The remaining graph is "clean" — it consists of regular pairs of density at least d. If G still contains a copy of H, the counting lemma implies there are at least \(c \cdot n^h\) copies, contradicting the assumption that there are at most \(\delta n^h\) for small enough \(\delta\). Hence the cleaned graph is H-free, and we removed at most \(\varepsilon n^2\) edges in total.
Corollary 5.5 (Triangle Removal Lemma). For every \(\varepsilon > 0\), there exists \(\delta > 0\) such that any graph on n vertices with at most \(\delta n^3\) triangles can be made triangle-free by removing at most \(\varepsilon n^2\) edges.
5.4 Ramsey–Turán Problems
The interplay between Ramsey theory and Turán-type questions leads to the Ramsey-Turán theory.
Definition. The Ramsey-Turán number \(\mathrm{RT}(n, H, f(n))\) is the maximum number of edges in a graph on n vertices that contains no copy of H and has independence number at most \(f(n)\).
Theorem 5.6 (Szemerédi, 1972; Erdős-Sós, 1970).
\[
\mathrm{RT}(n, K_4, o(n)) = \left(\frac{1}{8} + o(1)\right)n^2.
\]
This contrasts with \(\operatorname{ex}(n, K_4) = (1/3 + o(1))\binom{n}{2}\) from Turán's theorem, showing that the additional independence number constraint dramatically changes the extremal number.
5.5 The Sparse Regularity Lemma
The classical regularity lemma applies only to dense graphs (with \(\Theta(n^2)\) edges). Extending it to sparse graphs requires significant new ideas.
Theorem 5.7 (Kohayakawa, 1997; Rödl, 1997 — Sparse Regularity Lemma). There is an analogue of the regularity lemma for subgraphs of the random graph \(G(n,p)\), where density is measured relative to \(p\). Specifically, for every \(\varepsilon > 0\) and integer \(m_0\), if G is a subgraph of \(G(n,p)\) with \(p \gg n^{-1/d}\) for appropriate d, then G admits a regular partition in the relative sense.
6. Flag Algebras and Modern Methods
The past two decades have seen a revolution in extremal combinatorics through the introduction of analytic and algebraic methods that go far beyond classical combinatorial reasoning. Central among these is Razborov’s flag algebra framework, which has resolved or made progress on numerous long-standing conjectures.
6.1 Graph Limits and Graphons
To motivate flag algebras, we first discuss the theory of graph limits, developed by Lovász and Szegedy (2006).
Definition (Homomorphism Density). For graphs F and G, the homomorphism density is
\[
t(F, G) = \frac{|\operatorname{Hom}(F, G)|}{|V(G)|^{|V(F)|}},
\]
where \(\operatorname{Hom}(F, G)\) is the set of graph homomorphisms from F to G. The subgraph density is
\[
p(F, G) = \Pr[\text{random } |V(F)|\text{-subset of } V(G) \text{ induces a copy of } F].
\]
Definition (Graphon). A graphon is a symmetric measurable function \(W: [0,1]^2 \to [0,1]\). A graphon represents the limit of a convergent graph sequence \((G_n)\) if for every fixed graph F,
\[
t(F, G_n) \to t(F, W) = \int_{[0,1]^{|V(F)|}} \prod_{\{i,j\} \in E(F)} W(x_i, x_j) \, dx_1 \cdots dx_{|V(F)|}.
\]
Theorem 6.1 (Lovász–Szegedy, 2006). Every convergent sequence of graphs has a graphon limit, and conversely, every graphon is the limit of some graph sequence. The space of graphons (up to measure-preserving bijections) is compact.
Example. The Turán graph \(T(n,r)\), as \(n \to \infty\), converges to the graphon \(W(x,y) = 1 - \sum_{i=1}^{r} \mathbf{1}_{[(i-1)/r, i/r]}(x) \cdot \mathbf{1}_{[(i-1)/r, i/r]}(y)\), which is 1 except on \(r\) squares of side \(1/r\) along the diagonal.
6.2 Quasirandomness
The theory of graph quasirandomness, initiated by Thomason (1987) and developed by Chung, Graham, and Wilson (1989), reveals that several seemingly different notions of “random-like” behaviour in graphs are in fact equivalent.
Theorem 6.2 (Chung–Graham–Wilson, 1989). For a sequence of graphs \((G_n)\) with \(|V(G_n)| = n\) and edge density \(e(G_n)/\binom{n}{2} \to p\), the following properties are equivalent:
(P1) Subgraph counts: For every graph H, \(p(H, G_n) \to p^{e(H)}\).
(P2) 4-cycle count: \(p(C_4, G_n) \to p^4\).
(P3) Codegree: For all but \(o(n)\) vertices u, the number of common neighbours of u and v is \((p^2 + o(1))n\) for all but \(o(n)\) vertices v.
(P4) Eigenvalue gap: \(\lambda_1(G_n) = (p + o(1))n\) and \(|\lambda_2(G_n)| = o(n)\), where \(\lambda_1 \geq \lambda_2 \geq \cdots\) are the eigenvalues of the adjacency matrix.
(P5) Discrepancy: For all \(S, T \subseteq V(G_n)\), \(|e(S,T) - p|S||T|| = o(n^2)\).
6.3 Razborov’s Flag Algebra Framework
Flag algebras, introduced by Alexander Razborov in 2007, provide a systematic algebraic framework for establishing inequalities between subgraph densities.
Definition (Informal). A flag is a graph together with a labelled subgraph (the "type"). The flag algebra \(\mathcal{A}\) (for a given type) is a commutative algebra whose elements represent linear combinations of graph densities, subject to the relations that hold for all sufficiently large graphs.
The power of flag algebras comes from the following observation: if \(\phi\) is a homomorphism from the flag algebra to \(\mathbb{R}\) (representing the densities in some limit graph), then:
- \(\phi\) maps non-negative elements (sums of squares) to non-negative reals.
- Proving \(\phi(f) \geq 0\) for all valid \(\phi\) establishes a universal inequality on subgraph densities.
- These non-negativity conditions can be verified via semidefinite programming (SDP).
Theorem 6.3 (Razborov, 2010). Among all graphs with edge density \(1/2\), the minimum triangle density is
\[
p(K_3, G) \geq \frac{1}{2} - \frac{1}{2}\left(1 - 2 \cdot \frac{e(G)}{\binom{n}{2}}\right)^{3/2}.
\]
For edge density exactly \(1/2\), the minimum triangle density is \(1/2 - 1/(2\sqrt{2}) \approx 0.1464\), achieved by a random-like construction. This resolves a conjecture of Erdős on the "pentagon problem" for triangles.
Proof Sketch. One writes the triangle density as an element of the flag algebra, then expresses the difference between \(p(K_3, G)\) and the conjectured lower bound as a sum of squares of flag algebra elements (corresponding to a positive semidefinite matrix). The sum-of-squares decomposition is found numerically via SDP, then verified exactly (rationalizing the coefficients). The extremal graph is identified by analyzing when equality holds.
6.4 The Container Method
The container method, developed independently by Balogh, Morris, and Samotij (2015) and Saxton and Thomason (2015), is one of the most versatile modern tools in combinatorics.
Theorem 6.4 (Hypergraph Container Lemma — Informal). For a uniform hypergraph H on vertex set V with "not too many" edges, there exists a small collection \(\mathcal{C}\) of "containers" — subsets of V that are "almost independent" (contain few edges) — such that every independent set of H is contained in some container. The number of containers is much smaller than \(2^{|V|}\), and each container is close to the expected size of a random independent set.
7. Advanced Ramsey Theory
We now turn to deeper results in Ramsey theory, including the Hales-Jewett theorem, Euclidean Ramsey theory, and recent breakthroughs.
7.1 The Hales–Jewett Theorem
The Hales-Jewett theorem is in many ways the most fundamental result in Ramsey theory, as many other results (van der Waerden, Szemerédi, etc.) can be derived from it.
Definition (Combinatorial Line). Let \([k]^n\) denote the set of words of length n over the alphabet \([k] = \{1, 2, \ldots, k\}\). A combinatorial line is a set of k words \(\{w_1, \ldots, w_k\} \subseteq [k]^n\) obtained by choosing a non-empty set of "active" coordinates \(S \subseteq [n]\) and fixing the inactive coordinates: for each \(j \in [k]\), the word \(w_j\) has the value j at all active coordinates and the same fixed value at all inactive coordinates.
Example. In \(\{1,2,3\}^4\), with active coordinates \(\{1,3\}\) and inactive values \((*, 2, *, 1)\), the combinatorial line is \(\{(1,2,1,1), (2,2,2,1), (3,2,3,1)\}\).
Theorem 7.1 (Hales–Jewett, 1963). For every positive integer k (alphabet size) and every positive integer r (number of colours), there exists an integer \(\mathrm{HJ}(k, r)\) such that for every \(n \geq \mathrm{HJ}(k, r)\), every r-colouring of \([k]^n\) contains a monochromatic combinatorial line.
We now present Shelah’s proof, which gives primitive recursive bounds.
Proof Sketch (Shelah, 1988). Shelah's proof gives a much better bound than the original Hales-Jewett argument. The key innovation is a "colour focusing" technique combined with a product argument. The proof proceeds by double induction on the number of colours r and the alphabet size k.
Base case: For \(k = 1\), the result is trivial. For \(r = 1\), any single word forms a (trivial) monochromatic line.
Inductive step: Assume the theorem for \((k, r-1)\) and for \((k-1, r)\). Shelah's argument constructs a high-dimensional cube \([k]^N\) by iterating the following:
(1) Partition \([k]^N\) into "slices" based on the first group of coordinates. By the inductive hypothesis for fewer colours, within each slice we can find large monochromatic combinatorial subspaces.
(2) The "colour focusing" step: by the Hales-Jewett theorem for a smaller alphabet (induction on k), the patterns of colours across slices must contain a monochromatic structure.
(3) Combining these gives a monochromatic line in the full cube. The dimension N grows as an iterated exponential (primitive recursive), a dramatic improvement over the original proof which gave Ackermann-type bounds.
7.2 The Graham–Rothschild Theorem
The Graham-Rothschild theorem generalizes the Hales-Jewett theorem from lines to higher-dimensional combinatorial subspaces.
Definition. A combinatorial m-subspace of \([k]^n\) is a set of \(k^m\) words parameterized by \([k]^m\), obtained by choosing m disjoint non-empty sets of active coordinates \(S_1, \ldots, S_m \subseteq [n]\) and fixing the remaining coordinates. For a parameter word \((a_1, \ldots, a_m) \in [k]^m\), the corresponding word in \([k]^n\) has value \(a_i\) at all coordinates in \(S_i\) and the fixed values elsewhere.
Theorem 7.2 (Graham–Rothschild, 1971). For all positive integers \(k, m, r\), there exists \(N\) such that every r-colouring of the combinatorial m-subspaces of \([k]^N\) contains a monochromatic combinatorial \((m+1)\)-subspace. More generally, every r-colouring of the m-parameter words contains a monochromatic \((m+p)\)-parameter word for any given p.
7.3 Euclidean Ramsey Theory
Euclidean Ramsey theory studies which geometric configurations must appear monochromatically in any finite colouring of \(\mathbb{R}^d\).
Definition. A finite set \(X \subseteq \mathbb{R}^d\) is Ramsey if for every positive integer r, there exists a dimension D such that every r-colouring of \(\mathbb{R}^D\) contains a monochromatic congruent copy of X.
Theorem 7.3 (Erdős–Graham–Montgomery–Rothschild–Spencer–Straus, 1973). Every finite set that can be embedded on a sphere is Ramsey. In particular, all simplices (including equilateral triangles, regular tetrahedra, etc.) are Ramsey. Conversely, any Ramsey set must be embeddable on a sphere.
7.4 Folkman’s Theorem and Partition Regularity
Theorem 7.4 (Folkman, 1970; Rado-Folkman-Sanders). For every finite colouring of the positive integers and every \(k\), there exists a finite set \(B\) such that all subset sums of B (including each element of B itself) up to size k are monochromatic. Equivalently, there exists a monochromatic set \(\{x_1, \ldots, x_m\}\) such that all sums of at most k distinct elements have the same colour.
7.5 The Ramsey Multiplicity Problem
Beyond the existence of monochromatic subgraphs, one can ask: how many monochromatic copies must there be?
Definition. The Ramsey multiplicity \(M(H)\) of a graph H is the minimum, over all 2-colourings of \(K_n\), of the number of monochromatic copies of H, divided by \(n^{|V(H)|}\), in the limit as \(n \to \infty\).
Theorem 7.5 (Goodman, 1959). Every 2-colouring of \(K_n\) contains at least
\[
\frac{1}{4}\binom{n}{3}(1 + o(1))
\]
monochromatic triangles. Equivalently, \(M(K_3) = 1/4\) of the expected count under a random colouring. The extremal colouring is (close to) a random colouring or a balanced "quasi-random" colouring.
7.6 Graph Ramsey Numbers for Sparse Graphs
For sparse graphs (e.g., bounded-degree graphs, trees, cycles), the Ramsey numbers are much smaller than for cliques.
Theorem 7.6 (Chvátal–Rödl–Szemerédi–Trotter, 1983). There exists a constant C depending only on \(\Delta\) such that for every graph H on n vertices with maximum degree at most \(\Delta\),
\[
R(H, H) \leq Cn.
\]
That is, bounded-degree graphs have linear Ramsey numbers.
Proof Sketch. The proof uses the regularity lemma. Apply the regularity lemma to a 2-coloured \(K_N\), obtaining a regular partition. Since the partition has a bounded number of parts and one colour class has density at least \(1/2\), the embedding lemma guarantees a copy of the bounded-degree graph H in that colour class, provided \(N\) is large enough relative to n.
Theorem 7.7. For cycles, \(R(C_n, C_n) = 2n - 1\) for odd n (Faudree-Schelp, 1974; Rosta, 1973) and \(R(C_n, C_n) = \frac{3n}{2} - 1\) for even n (Rosta, 1973; Faudree-Schelp, 1974). For trees, \(R(T_n, T_n) \leq Cn\) with \(C\) depending on the maximum degree of the tree.
7.7 Recent Advances: The Campos–Griffiths–Morris–Sahasrabudhe Breakthrough
We return to the diagonal Ramsey number, the most famous problem in the field.
Theorem 7.8 (Campos–Griffiths–Morris–Sahasrabudhe, 2023).
\[
R(k,k) \leq (4 - \varepsilon)^k
\]
for some absolute constant \(\varepsilon > 0\) and all sufficiently large k.
The proof introduces a new approach: the “book algorithm.” We describe the high-level strategy.
Proof Sketch. The classical Erdős-Szekeres proof of \(R(k,k) \leq 4^k\) proceeds by the greedy algorithm: pick a vertex v, and step to either its red or blue neighbourhood, whichever is larger. This gives the recurrence \(R(k,k) \leq 2R(k-1,k) \leq 2 \cdot 2R(k-1,k-1)\), yielding the \(4^k\) bound.
Campos et al. modify this algorithm. They track not just the size of the red and blue neighbourhoods, but also a "book" — a collection of vertices that are connected to the current vertex set by many edges of both colours. At each step, the algorithm either:
(a) Steps to a large monochromatic neighbourhood as before, or
(b) Uses the book to find a large set where the bipartite density between it and the remaining vertices is "balanced" (close to 1/2 in both colours).
The key insight is that when the bipartite graph between the current set and the book is unbalanced, one of the two colours has density bounded away from 1/2, and in this case the stepping algorithm performs strictly better than the classical one. The saving is then quantified to give the \((4 - \varepsilon)^k\) bound.
7.8 Canonical Ramsey Theorems
Not every colouring yields a monochromatic structure, but with enough colours, a more nuanced conclusion holds.
Theorem 7.9 (Erdős–Rado Canonical Ramsey Theorem, 1950). For every positive integer k, there exists an integer \(n\) such that for every colouring (with any number of colours) of the k-element subsets of \([n]\), there exists a set \(S \subseteq [n]\) of size \(n\) such that the colouring restricted to \(\binom{S}{k}\) is canonical: the colour of a k-set \(\{a_1, \ldots, a_k\}\) (with \(a_1 < \cdots < a_k\)) depends only on some fixed subset of the positions \(\{1, \ldots, k\}\) — that is, there exists \(I \subseteq [k]\) such that two k-sets receive the same colour if and only if they agree on the coordinates indexed by I.
7.9 Ramsey Theory and Infinite Combinatorics
We briefly touch on infinite Ramsey theory, which connects combinatorics to set theory and logic.
Theorem 7.10 (Infinite Ramsey Theorem). For any positive integer k and any finite colouring of the k-element subsets of an infinite set X, there exists an infinite monochromatic set \(Y \subseteq X\).
Proof. We prove this by induction on k. For \(k = 1\), this is the infinite pigeonhole principle: if \(X\) is infinite and finitely coloured, some colour class is infinite.
For the inductive step, let \(c: \binom{X}{k} \to [r]\) be a colouring. Pick any \(x_1 \in X\). The colouring c induces an \(r\)-colouring of the \((k-1)\)-element subsets of \(X \setminus \{x_1\}\) via \(S \mapsto c(\{x_1\} \cup S)\). By induction, there exists an infinite set \(X_1 \subseteq X \setminus \{x_1\}\) that is monochromatic for this induced colouring; let its colour be \(c_1\).
Now pick \(x_2 \in X_1\), and repeat: obtain an infinite monochromatic set \(X_2 \subseteq X_1 \setminus \{x_2\}\) with colour \(c_2\) for the colouring induced by \(x_2\). Continuing, we get a sequence \(x_1, x_2, x_3, \ldots\) and colours \(c_1, c_2, \ldots \in [r]\). By pigeonhole, some colour c appears infinitely often. The corresponding subsequence \(\{x_{i_1}, x_{i_2}, \ldots\}\) is monochromatic: any k-set \(\{x_{i_{j_1}}, \ldots, x_{i_{j_k}}\}\) (with \(j_1 < \cdots < j_k\)) has colour \(c_{i_{j_1}} = c\) because \(\{x_{i_{j_2}}, \ldots, x_{i_{j_k}}\} \subseteq X_{i_{j_1}}\).
Theorem 7.11 (Ramsey Cardinals — Brief Mention). In set theory, a cardinal \(\kappa\) is Ramsey if \(\kappa \to (\kappa)^{<\omega}_2\) — i.e., every 2-colouring of the finite subsets of \(\kappa\) admits an infinite homogeneous set. The existence of Ramsey cardinals cannot be proved in ZFC; it is a large cardinal axiom strictly stronger than the existence of measurable cardinals. The connection between Ramsey-type partition properties and large cardinals is a central theme in set theory.
7.10 Ramsey Theory and Model Theory
We briefly mention a beautiful connection between Ramsey theory and model theory.
Summary of Key Results
We conclude by collecting the main bounds and results discussed in these notes.
The interplay between extremal combinatorics and Ramsey theory — through tools like the regularity lemma, the probabilistic method, algebraic techniques, flag algebras, and the container method — makes this one of the most dynamic and interconnected areas of modern mathematics. The subject continues to produce surprising connections and breakthrough results, as evidenced by the recent resolution of long-standing conjectures on diagonal Ramsey numbers, the sunflower conjecture, and union-closed sets.