CO 741: Extremal Combinatorics and Ramsey Theory
Estimated study time: 1 hr 32 min
Table of contents
These notes synthesize material from S. Jukna’s Extremal Combinatorics, R. Graham, B. Rothschild, and J. Spencer’s Ramsey Theory, N. Alon and J. Spencer’s The Probabilistic Method, and D. Conlon, J. Fox, and B. Sudakov’s survey articles, enriched with material from CMU 21-738 (P.-S. Loh) and Y. Zhao’s Graph Theory and Additive Combinatorics.
1. Extremal Graph Theory: Turán-type Problems
Extremal graph theory asks one of the most natural questions in combinatorics: given a forbidden subgraph \(H\), what is the maximum number of edges a graph on \(n\) vertices can have without containing a copy of \(H\)? This deceptively simple question, first systematically studied by Pál Turán during World War II, has blossomed into one of the richest areas of modern combinatorics, with deep connections to algebra, probability, analysis, and theoretical computer science.
1.1 The Turán Problem and Turán’s Theorem
We begin with the fundamental definitions that frame the entire subject.
The first and most fundamental result in extremal graph theory determines \(\operatorname{ex}(n, K_{r+1})\) exactly.
One can verify that
\[ t(n,r) = \binom{n}{2} - \sum_{i=1}^{r} \binom{|V_i|}{2} = \left(1 - \frac{1}{r}\right)\frac{n^2}{2} + O(n), \]where \(V_1, \ldots, V_r\) are the parts. Now we state the celebrated theorem.
Moreover, the Turán graph \(T(n,r)\) is the unique extremal graph.
Turán’s theorem admits several beautiful proofs, each illuminating different aspects of the result. We present three.
Every other vertex \(w\) is adjacent to at most \(r-1\) of the \(v_i\), since otherwise \(\{w, v_1, \ldots, v_r\}\) would form a \(K_{r+1}\). Removing the r vertices \(v_1, \ldots, v_r\), the remaining graph \(G'\) on \(n - r\) vertices is still \(K_{r+1}\)-free, so by induction \(e(G') \leq t(n-r, r)\). The total number of edges is: \[ e(G) = e(G') + \binom{r}{2} + \sum_{w \notin \{v_1,\ldots,v_r\}} |N(w) \cap \{v_1,\ldots,v_r\}| \leq t(n-r,r) + \binom{r}{2} + (n-r)(r-1). \]
A direct calculation shows this equals \(t(n,r)\). Uniqueness follows from analyzing the equality conditions.
Suppose two non-adjacent vertices u and v exist with \(\deg(u) \leq \deg(v)\). We perform Zykov symmetrization: replace u with a copy of v — that is, delete all edges incident to u, then for every neighbour w of v, add the edge \(uw\). The resulting graph G' has at least as many edges as G (since \(\deg(v) \geq \deg(u)\)), and G' is still \(K_{r+1}\)-free because any clique in G' containing u corresponds to a clique in G with u replaced by v.
Repeating this process, we eventually obtain a graph where every pair of non-adjacent vertices has the same neighbourhood — i.e., a complete multipartite graph. The number of edges is maximized when the parts are as equal as possible (by convexity of \(\binom{x}{2}\)), giving the Turán graph \(T(n,r)\).
Motzkin-Straus Theorem: If \(G\) is \(K_{r+1}\)-free, then \(\max f \leq \frac{1}{2}\left(1 - \frac{1}{r}\right)\), with equality achieved when the weights are uniformly distributed on an r-clique.
To see this, observe that at the maximum, the weight is supported on a clique (any non-edge with positive weights on both endpoints can have its weight redistributed to increase \(f\)). For a clique of size \(s \leq r\), the maximum of \(f\) with uniform weights \(1/s\) each is \(\binom{s}{2}/s^2 = (1 - 1/s)/2 \leq (1 - 1/r)/2\).
Setting \(x_i = 1/n\) for all i, we get \(f = e(G)/n^2\), so \(e(G)/n^2 \leq (1 - 1/r)/2\), yielding \(e(G) \leq (1 - 1/r)n^2/2\). The exact bound \(t(n,r)\) follows by optimizing over integer part sizes.
1.2 The Erdős–Stone–Simonovits Theorem
Turán’s theorem determines \(\operatorname{ex}(n, K_{r+1})\) exactly. What about general graphs \(H\)? The Erdős–Stone theorem, often called the “fundamental theorem of extremal graph theory,” answers this asymptotically.
This limit exists by a standard averaging argument (or by appealing to the Kruskal-Katona machinery).
Equivalently, \(\pi(H) = 1 - \frac{1}{\chi(H) - 1}\).
The beauty of this result is that the asymptotic extremal number depends only on the chromatic number of \(H\). For non-bipartite graphs (\(\chi(H) \geq 3\)), the Turán density is strictly between 0 and 1, and is completely determined by \(\chi(H)\). We give a proof sketch.
The upper bound is the deep part. The Erdős–Stone theorem states that for every \(r \geq 1\), \(\varepsilon > 0\), and integer \(t\), there exists \(n_0\) such that every graph on \(n \geq n_0\) vertices with at least \(t(n,r) + \varepsilon n^2\) edges contains a copy of \(K_{r+1}(t)\) — the complete \((r+1)\)-partite graph with each part of size \(t\). Since any graph with chromatic number \(r+1\) is a subgraph of some \(K_{r+1}(t)\), this gives the desired upper bound.
The proof of the Erdős–Stone theorem itself proceeds via the regularity lemma (Section 5), or via Ramsey-type arguments and dependent random choice.
1.3 The Kővári–Sós–Turán Theorem and the Zarankiewicz Problem
For bipartite forbidden subgraphs, the Zarankiewicz problem asks for better bounds.
In particular, \(\operatorname{ex}(n, K_{s,t}) = O(n^{2 - 1/s})\).
Thus \(\sum_{v} \binom{\deg(v)}{s} \leq (t-1)\binom{n}{s}\). By the convexity of \(\binom{x}{s}\) and Jensen's inequality, \[ n \cdot \binom{2e(G)/n}{s} \leq (t-1)\binom{n}{s}, \]
where \(e(G)\) is the number of edges. For \(s = 2\), this simplifies to
\[ \frac{n \cdot (2e/n)(2e/n - 1)}{2} \leq (t-1)\frac{n(n-1)}{2}, \]giving \(e(G) \leq \frac{1}{2}(1 + \sqrt{1 + 4(t-1)(n-1)}) \cdot \frac{n}{2}\). For general s, careful estimation yields the stated bound.
1.4 Supersaturation
Once a graph exceeds the extremal number, it must contain not just one copy of the forbidden subgraph, but many copies. This is the supersaturation phenomenon, first observed by Erdős and Simonovits.
1.5 The Kruskal–Katona Theorem
The Kruskal–Katona theorem, although technically a result about simplicial complexes and set systems, plays a foundational role in extremal combinatorics. It answers the following question: given a collection of \(k\)-element sets, what is the minimum number of \((k-1)\)-element subsets they can have?
The extremal families are initial segments of the colexicographic (colex) order on \(\binom{[n]}{k}\), a beautiful fact connecting extremal combinatorics to order theory.
2. Extremal Set Theory
Extremal set theory studies the maximum or minimum size of families of sets subject to various combinatorial conditions. The results in this area are among the most elegant in all of combinatorics, with short proofs that reveal deep structural truths.
2.1 Sperner’s Theorem and the LYM Inequality
We begin with the oldest result, Sperner’s theorem, which determines the largest antichain in the power set \(2^{[n]}\).
Equality holds if and only if \(\mathcal{F} = \binom{[n]}{\lfloor n/2 \rfloor}\) or \(\mathcal{F} = \binom{[n]}{\lceil n/2 \rceil}\).
The cleanest proof of Sperner’s theorem goes through the LYM inequality, named after Lubell, Yamamoto, and Meshalkin.
For a set \(A \in \mathcal{F}\) with \(|A| = k\), the probability that \(A \in C_\sigma\) equals \[ \Pr[A \in C_\sigma] = \frac{k!(n-k)!}{n!} = \frac{1}{\binom{n}{k}}. \]
Since \(\mathcal{F}\) is an antichain, at most one element of \(\mathcal{F}\) can appear in any chain \(C_\sigma\). Therefore,
\[ 1 \geq \Pr\left[\exists A \in \mathcal{F} : A \in C_\sigma\right] = \sum_{A \in \mathcal{F}} \Pr[A \in C_\sigma] = \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}}. \]This completes the proof.
hence \(|\mathcal{F}| \leq \binom{n}{\lfloor n/2 \rfloor}\).
2.2 The Bollobás Set-Pairs Inequality
One of the most versatile tools in extremal set theory is the Bollobás set-pairs inequality.
More generally, if \(|A_i| = a_i\) and \(|B_i| = b_i\), then
\[ \sum_{i=1}^{m} \frac{1}{\binom{a_i + b_i}{a_i}} \leq 1. \](we use a slightly different formulation for clarity). Actually, let us use the standard exterior algebra proof.
Consider the vector space \(V = \bigwedge^{a_i + b_i}(\mathbb{R}^n)\) for the uniform case. Instead, let us present the following clean proof. We may assume all sets are subsets of \([n]\) for some n. Consider the polynomial ring \(\mathbb{R}[x_1, \ldots, x_n]\) and for each i, define
This is a polynomial of degree \(b_i\). Now restrict attention to the multilinear polynomial obtained by reducing modulo the ideal \((x_j^2 - x_j : j \in [n])\). For a set \(S \subseteq [n]\), let \(\mathbf{1}_S \in \{0,1\}^n\) be its characteristic vector. Then \(f_i(\mathbf{1}_S) = 0\) if \(S \cap B_i \neq \emptyset\), and \(f_i(\mathbf{1}_S) = 1\) if \(S \cap B_i = \emptyset\).
Now define \(g_i = f_i \cdot \prod_{j \in A_i} x_j\). This is a multilinear polynomial (after reduction). We have \(g_i(\mathbf{1}_{A_j}) = 0\) whenever \(j \neq i\) (either \(A_j \cap B_i \neq \emptyset\) making \(f_i = 0\), or \(A_j \not\supseteq A_i\) making the \(\prod x_j\) term vanish). Also, \(g_i(\mathbf{1}_{A_i}) = 1 \cdot 1 = 1\) since \(A_i \cap B_i = \emptyset\).
Thus the polynomials \(g_1, \ldots, g_m\) are linearly independent (evaluate at \(\mathbf{1}_{A_1}, \ldots, \mathbf{1}_{A_m}\) to get a triangular matrix with 1’s on the diagonal). Each \(g_i\) is a multilinear polynomial of degree at most \(a_i + b_i\). The space of multilinear polynomials of degree at most \(a + b\) in n variables has dimension \(\sum_{k=0}^{a+b} \binom{n}{k}\). In the uniform case \(a_i = a, b_i = b\), each \(g_i\) has degree exactly \(a + b\), and more careful analysis shows \(g_i\) lies in the span of the \(\binom{n}{a+b}\) squarefree monomials of degree \(a + b\). But since we only need the \(g_i\) restricted to \(\{0,1\}^n\), we can use the subspace of multilinear polynomials of degree exactly \(a + b\), which gives \(m \leq \binom{n}{a+b}\). A refined argument (projecting onto the relevant coordinates) gives the sharp bound \(m \leq \binom{a+b}{a}\).
2.3 The Erdős–Ko–Rado Theorem
The Erdős–Ko–Rado theorem is perhaps the most celebrated result in extremal set theory, determining the largest intersecting family of \(k\)-element sets.
For \(n > 2k\), equality holds if and only if \(\mathcal{F}\) is a star.
We present Katona’s beautiful proof using cyclic permutations.
Now count pairs \((\sigma, A)\) where \(\sigma\) is a cyclic permutation of \([n]\) and \(A \in \mathcal{F}\) is a k-arc in the circular order given by \(\sigma\). For each of the \((n-1)!\) cyclic permutations, at most \(k\) elements of \(\mathcal{F}\) are arcs (since \(\mathcal{F}\) is intersecting). For each \(A \in \mathcal{F}\), the number of cyclic permutations making A an arc is \(k!(n-k)!\) (arrange the elements of A consecutively on the circle, with \(k!\) internal orderings and \((n-k)!\) orderings for the rest). Therefore: \[ |\mathcal{F}| \cdot k!(n-k)! \leq k \cdot (n-1)!, \]
which gives
\[ |\mathcal{F}| \leq \frac{k \cdot (n-1)!}{k!(n-k)!} = \frac{(n-1)!}{(k-1)!(n-k)!} = \binom{n-1}{k-1}. \]2.4 The Sauer–Shelah–Perles Lemma
The Sauer–Shelah lemma connects extremal set theory to computational learning theory through the concept of VC dimension.
We proceed by induction on \(n + |\mathcal{F}|\). The base cases are trivial. For the inductive step, fix an element \(x \in [n]\) and partition \(\mathcal{F}\) into three parts: \(\mathcal{F}_0\) (sets not containing x with no "twin" containing x), \(\mathcal{F}_1\) (sets containing x with no twin without x), and matched pairs. Define \(\mathcal{F}' = \{A \setminus \{x\} : A \in \mathcal{F}\}\) and \(\mathcal{F}'' = \{A \setminus \{x\} : A \in \mathcal{F}, A \cup \{x\} \in \mathcal{F} \text{ or } A \setminus \{x\} \in \mathcal{F}\}\) — the family obtained by looking at sets appearing with both \(x\) present and absent. Then \(|\mathcal{F}| = |\mathcal{F}'| + |\mathcal{F}''|\), and a set S not containing x is shattered by \(\mathcal{F}\) if and only if S is shattered by \(\mathcal{F}'\) and \(S \cup \{x\}\) is shattered by \(\mathcal{F}\) only if \(S\) is shattered by \(\mathcal{F}''\). By induction, each \(\mathcal{F}'\) and \(\mathcal{F}''\) shatters at least as many sets as its size, and the shattered sets for \(\mathcal{F}\) include disjoint contributions from both families, completing the proof.
2.5 The Sunflower Lemma
2.6 Frankl’s Conjecture (Union-Closed Sets Conjecture)
We briefly discuss one of the most famous open problems in extremal set theory.
3. Ramsey Theory: Classical Results
Ramsey theory, broadly understood, asserts that complete disorder is impossible: every large enough structure must contain a well-organized substructure. This idea, first made precise by Frank Ramsey in 1930, has grown into one of the most influential branches of combinatorics.
3.1 Ramsey’s Theorem
Let \(n = R(s-1,t) + R(s,t-1)\) and consider a 2-colouring (red/blue) of \(K_n\). Pick any vertex v. The remaining \(n - 1 = R(s-1,t) + R(s,t-1) - 1\) vertices are partitioned into \(R\) (red neighbours of v) and \(B\) (blue neighbours of v). By pigeonhole, either \(|R| \geq R(s-1,t)\) or \(|B| \geq R(s,t-1)\).
Case 1: \(|R| \geq R(s-1,t)\). The 2-coloured \(K_{|R|}\) on R contains either a red \(K_{s-1}\) (which together with v gives a red \(K_s\)) or a blue \(K_t\).
Case 2: \(|B| \geq R(s,t-1)\). Similarly, we find a red \(K_s\) or a blue \(K_t\).
In both cases the conclusion holds. The existence of \(R(s,t)\) follows by induction, and the recurrence gives \(R(s,t) \leq \binom{s+t-2}{s-1}\).
3.2 Bounds on Ramsey Numbers
The study of Ramsey numbers — their precise values and asymptotic behaviour — is central to combinatorics. We know embarrassingly few exact values.
\(R(3,3) = 6\): Colour the edges of \(K_5\) as the Petersen complement — five red and five blue, forming two 5-cycles. This shows \(R(3,3) > 5\). The upper bound \(R(3,3) \leq R(2,3) + R(3,2) = 3 + 3 = 6\) completes the proof.
\(R(3,4) = 9\), \(R(3,5) = 14\), \(R(4,4) = 18\). The value \(R(5,5)\) is unknown; the best bounds are \(43 \leq R(5,5) \leq 48\).
The Erdős–Szekeres Upper Bound
In particular, \(R(k,k) \leq \binom{2k-2}{k-1} \leq \frac{4^{k-1}}{\sqrt{\pi k}}\) (by Stirling’s approximation).
The Erdős Probabilistic Lower Bound
In 1947, Erdős proved a remarkable lower bound using the probabilistic method — one of the first applications of this technique.
By the union bound, the probability that some k-set is monochromatic is at most \[ \binom{n}{k} \cdot 2^{1-\binom{k}{2}}. \]
If this is less than 1, there exists a colouring with no monochromatic \(K_k\), so \(R(k,k) > n\). Setting \(n = \lfloor 2^{k/2} \rfloor\) and checking the inequality (for large k) gives the result.
Off-Diagonal Ramsey Numbers
for constants \(c_1, c_2\) depending on s. The upper bound is due to Ajtai-Komlós-Szemerédi (1980) for \(s=3\) and general s due to others. The lower bound for \(s = 3\) is \(R(3,t) = \Theta(t^2/\log t)\), established by Kim (1995, lower bound) and Shearer (upper bound).
3.3 The Diagonal Ramsey Number: Recent Breakthroughs
For 75 years, the best bounds on diagonal Ramsey numbers were essentially:
\[ \sqrt{2}^k \lesssim R(k,k) \lesssim 4^k. \]for all sufficiently large k.
3.4 Schur’s Theorem
Thus \(S(r) \leq R_r(3,3,\ldots,3) - 1\), where the Ramsey number has r colours. More precisely, \(S(r) \leq R(3,3,\ldots,3;r) - 1\).
3.5 Van der Waerden’s Theorem
Van der Waerden’s theorem, proved in 1927, is one of the foundational results of Ramsey theory on the integers.
The original proof of van der Waerden gives tower-type bounds. We outline how van der Waerden’s theorem follows from the Hales-Jewett theorem (discussed in Section 7).
3.6 Rado’s Theorem and Partition Regularity
Rado’s theorem provides a complete characterization of which systems of linear equations are “partition regular.”
3.7 Ramsey’s Theorem for Hypergraphs
The original theorem of Ramsey applies to hypergraphs as well.
4. Algebraic and Linear Algebraic Methods
Some of the most elegant proofs in extremal combinatorics use algebra — particularly linear algebra and polynomial methods. The general philosophy is: translate a combinatorial problem into a question about linear independence, dimension, or polynomial vanishing, then use algebraic constraints to derive the desired bounds.
4.1 The Dimension Argument
The simplest algebraic technique is the dimension argument: if we can associate linearly independent vectors with the objects we are counting, then their number is at most the dimension of the ambient space.
We claim the vectors \(v_1, \ldots, v_m\) are linearly independent over \(\mathbb{F}_2\). Suppose \(\sum_{i \in S} v_i = 0\) for some non-empty \(S \subseteq [m]\). Taking the inner product with \(v_j\) for \(j \in S\): \[ 0 = \left\langle \sum_{i \in S} v_i, v_j \right\rangle = \sum_{i \in S} \langle v_i, v_j \rangle = \langle v_j, v_j \rangle = 1, \]
a contradiction. Therefore \(m \leq n\).
4.2 Fisher’s Inequality
4.3 The Polynomial Method
The polynomial method is one of the most powerful paradigms in combinatorics. The key idea: represent combinatorial objects as zeros of polynomials, then use the algebraic structure of polynomial rings to derive bounds.
The Combinatorial Nullstellensatz
The Chevalley–Warning Theorem and Applications
4.4 The Alon–Babai–Suzuki Theorem
Generalizing the oddtown and eventown results, the Alon-Babai-Suzuki theorem provides modular intersection bounds.
The proof uses the multilinear polynomial technique: represent the intersection conditions via polynomials over \(\mathbb{F}_p\), show the associated vectors are linearly independent, and use dimension counting.
4.5 The Caro–Wei Bound
We include a result from graph theory that beautifully illustrates the probabilistic–algebraic interplay.
The probability that v is locally first is \(1/(\deg(v) + 1)\), since among v and its \(\deg(v)\) neighbours, each is equally likely to be first. By linearity of expectation, the expected size of the independent set is \(\sum_v 1/(\deg(v) + 1)\). Hence an independent set of at least this size exists.
4.6 The Linear Algebra Bound Method (Babai–Frankl)
The general framework for applying linear algebra to extremal combinatorics was systematized by Babai and Frankl. The idea is:
- Associate to each combinatorial object a vector in some vector space \(V\).
- Translate the combinatorial constraints into linear algebraic conditions (orthogonality, linear independence, etc.).
- Use the dimension of \(V\) to bound the number of objects.
5. The Szemerédi Regularity Lemma in Extremal Combinatorics
The Szemerédi regularity lemma is one of the most powerful tools in combinatorics. Introduced by Szemerédi in 1975 as a key ingredient in his proof of the Erdős–Turán conjecture on arithmetic progressions, it has since become a cornerstone of graph theory, with applications ranging from extremal combinatorics to number theory to theoretical computer science.
5.1 Statement of the Regularity Lemma
where \(e(A,B)\) is the number of edges between A and B. The pair \((A,B)\) is \(\varepsilon\)-regular if for every \(A' \subseteq A\) with \(|A'| \geq \varepsilon |A|\) and every \(B' \subseteq B\) with \(|B'| \geq \varepsilon |B|\), we have
\[ |d(A',B') - d(A,B)| < \varepsilon. \]Informally, an \(\varepsilon\)-regular pair behaves like a random bipartite graph: the edge density is approximately uniform across all large subsets. The regularity lemma says every dense graph can be partitioned into a bounded number of parts, almost all pairs of which are \(\varepsilon\)-regular.
(i) \(m_0 \leq k \leq M\),
(ii) \(|V_0| \leq \varepsilon n\) (the "exceptional" set),
(iii) \(|V_1| = |V_2| = \cdots = |V_k|\),
(iv) All but at most \(\varepsilon \binom{k}{2}\) pairs \((V_i, V_j)\) with \(1 \leq i < j \leq k\) are \(\varepsilon\)-regular.
5.2 Key Lemmas: Embedding and Counting
The power of the regularity lemma comes through two companion results.
In particular, at least one copy of H exists.
where \(c(\varepsilon, h) \to 0\) as \(\varepsilon \to 0\) for fixed h.
5.3 The Graph Removal Lemma
One of the most important consequences of the regularity lemma is the graph removal lemma, which has far-reaching applications.
5.4 Ramsey–Turán Problems
The interplay between Ramsey theory and Turán-type questions leads to the Ramsey-Turán theory.
This contrasts with \(\operatorname{ex}(n, K_4) = (1/3 + o(1))\binom{n}{2}\) from Turán’s theorem, showing that the additional independence number constraint dramatically changes the extremal number.
5.5 The Sparse Regularity Lemma
The classical regularity lemma applies only to dense graphs (with \(\Theta(n^2)\) edges). Extending it to sparse graphs requires significant new ideas.
6. Flag Algebras and Modern Methods
The past two decades have seen a revolution in extremal combinatorics through the introduction of analytic and algebraic methods that go far beyond classical combinatorial reasoning. Central among these is Razborov’s flag algebra framework, which has resolved or made progress on numerous long-standing conjectures.
6.1 Graph Limits and Graphons
To motivate flag algebras, we first discuss the theory of graph limits, developed by Lovász and Szegedy (2006).
where \(\operatorname{Hom}(F, G)\) is the set of graph homomorphisms from F to G. The subgraph density is
\[ p(F, G) = \Pr[\text{random } |V(F)|\text{-subset of } V(G) \text{ induces a copy of } F]. \]6.2 Quasirandomness
The theory of graph quasirandomness, initiated by Thomason (1987) and developed by Chung, Graham, and Wilson (1989), reveals that several seemingly different notions of “random-like” behaviour in graphs are in fact equivalent.
(P1) Subgraph counts: For every graph H, \(p(H, G_n) \to p^{e(H)}\).
(P2) 4-cycle count: \(p(C_4, G_n) \to p^4\).
(P3) Codegree: For all but \(o(n)\) vertices u, the number of common neighbours of u and v is \((p^2 + o(1))n\) for all but \(o(n)\) vertices v.
(P4) Eigenvalue gap: \(\lambda_1(G_n) = (p + o(1))n\) and \(|\lambda_2(G_n)| = o(n)\), where \(\lambda_1 \geq \lambda_2 \geq \cdots\) are the eigenvalues of the adjacency matrix.
(P5) Discrepancy: For all \(S, T \subseteq V(G_n)\), \(|e(S,T) - p|S||T|| = o(n^2)\).
6.3 Razborov’s Flag Algebra Framework
Flag algebras, introduced by Alexander Razborov in 2007, provide a systematic algebraic framework for establishing inequalities between subgraph densities.
The power of flag algebras comes from the following observation: if \(\phi\) is a homomorphism from the flag algebra to \(\mathbb{R}\) (representing the densities in some limit graph), then:
- \(\phi\) maps non-negative elements (sums of squares) to non-negative reals.
- Proving \(\phi(f) \geq 0\) for all valid \(\phi\) establishes a universal inequality on subgraph densities.
- These non-negativity conditions can be verified via semidefinite programming (SDP).
For edge density exactly \(1/2\), the minimum triangle density is \(1/2 - 1/(2\sqrt{2}) \approx 0.1464\), achieved by a random-like construction. This resolves a conjecture of Erdős on the “pentagon problem” for triangles.
6.4 The Container Method
The container method, developed independently by Balogh, Morris, and Samotij (2015) and Saxton and Thomason (2015), is one of the most versatile modern tools in combinatorics.
(1) Counting the number of H-free graphs on n vertices (showing it is \(2^{(1+o(1))\operatorname{ex}(n,H)}\) for many H).
(2) Showing that a random H-free graph is "essentially" a subgraph of an extremal graph.
(3) Bounding the number of sum-free sets, Sidon sets, and other combinatorial structures.
(4) Proving transference results for Ramsey and Turán problems in sparse random settings.
7. Advanced Ramsey Theory
We now turn to deeper results in Ramsey theory, including the Hales-Jewett theorem, Euclidean Ramsey theory, and recent breakthroughs.
7.1 The Hales–Jewett Theorem
The Hales-Jewett theorem is in many ways the most fundamental result in Ramsey theory, as many other results (van der Waerden, Szemerédi, etc.) can be derived from it.
We now present Shelah’s proof, which gives primitive recursive bounds.
Base case: For \(k = 1\), the result is trivial. For \(r = 1\), any single word forms a (trivial) monochromatic line.
Inductive step: Assume the theorem for \((k, r-1)\) and for \((k-1, r)\). Shelah's argument constructs a high-dimensional cube \([k]^N\) by iterating the following:
(1) Partition \([k]^N\) into "slices" based on the first group of coordinates. By the inductive hypothesis for fewer colours, within each slice we can find large monochromatic combinatorial subspaces.
(2) The "colour focusing" step: by the Hales-Jewett theorem for a smaller alphabet (induction on k), the patterns of colours across slices must contain a monochromatic structure.
(3) Combining these gives a monochromatic line in the full cube. The dimension N grows as an iterated exponential (primitive recursive), a dramatic improvement over the original proof which gave Ackermann-type bounds.
7.2 The Graham–Rothschild Theorem
The Graham-Rothschild theorem generalizes the Hales-Jewett theorem from lines to higher-dimensional combinatorial subspaces.
7.3 Euclidean Ramsey Theory
Euclidean Ramsey theory studies which geometric configurations must appear monochromatically in any finite colouring of \(\mathbb{R}^d\).
7.4 Folkman’s Theorem and Partition Regularity
7.5 The Ramsey Multiplicity Problem
Beyond the existence of monochromatic subgraphs, one can ask: how many monochromatic copies must there be?
monochromatic triangles. Equivalently, \(M(K_3) = 1/4\) of the expected count under a random colouring. The extremal colouring is (close to) a random colouring or a balanced “quasi-random” colouring.
7.6 Graph Ramsey Numbers for Sparse Graphs
For sparse graphs (e.g., bounded-degree graphs, trees, cycles), the Ramsey numbers are much smaller than for cliques.
That is, bounded-degree graphs have linear Ramsey numbers.
7.7 Recent Advances: The Campos–Griffiths–Morris–Sahasrabudhe Breakthrough
We return to the diagonal Ramsey number, the most famous problem in the field.
for some absolute constant \(\varepsilon > 0\) and all sufficiently large k.
The proof introduces a new approach: the “book algorithm.” We describe the high-level strategy.
Campos et al. modify this algorithm. They track not just the size of the red and blue neighbourhoods, but also a "book" — a collection of vertices that are connected to the current vertex set by many edges of both colours. At each step, the algorithm either:
(a) Steps to a large monochromatic neighbourhood as before, or
(b) Uses the book to find a large set where the bipartite density between it and the remaining vertices is "balanced" (close to 1/2 in both colours).
The key insight is that when the bipartite graph between the current set and the book is unbalanced, one of the two colours has density bounded away from 1/2, and in this case the stepping algorithm performs strictly better than the classical one. The saving is then quantified to give the \((4 - \varepsilon)^k\) bound.
7.8 Canonical Ramsey Theorems
Not every colouring yields a monochromatic structure, but with enough colours, a more nuanced conclusion holds.
(1) Monochromatic (\(I = \emptyset\)): all edges the same colour.
(2) Lexicographic on the first element (\(I = \{1\}\)): edges \(\{a,b\}\) and \(\{a,c\}\) (with \(a < b, c\)) have the same colour.
(3) Lexicographic on the second element (\(I = \{2\}\)).
(4) Rainbow (\(I = \{1,2\}\)): all edges have distinct colours.
This gives a complete structural description of arbitrary edge colourings of large complete graphs.
7.9 Ramsey Theory and Infinite Combinatorics
We briefly touch on infinite Ramsey theory, which connects combinatorics to set theory and logic.
For the inductive step, let \(c: \binom{X}{k} \to [r]\) be a colouring. Pick any \(x_1 \in X\). The colouring c induces an \(r\)-colouring of the \((k-1)\)-element subsets of \(X \setminus \{x_1\}\) via \(S \mapsto c(\{x_1\} \cup S)\). By induction, there exists an infinite set \(X_1 \subseteq X \setminus \{x_1\}\) that is monochromatic for this induced colouring; let its colour be \(c_1\).
Now pick \(x_2 \in X_1\), and repeat: obtain an infinite monochromatic set \(X_2 \subseteq X_1 \setminus \{x_2\}\) with colour \(c_2\) for the colouring induced by \(x_2\). Continuing, we get a sequence \(x_1, x_2, x_3, \ldots\) and colours \(c_1, c_2, \ldots \in [r]\). By pigeonhole, some colour c appears infinitely often. The corresponding subsequence \(\{x_{i_1}, x_{i_2}, \ldots\}\) is monochromatic: any k-set \(\{x_{i_{j_1}}, \ldots, x_{i_{j_k}}\}\) (with \(j_1 < \cdots < j_k\)) has colour \(c_{i_{j_1}} = c\) because \(\{x_{i_{j_2}}, \ldots, x_{i_{j_k}}\} \subseteq X_{i_{j_1}}\).
7.10 Ramsey Theory and Model Theory
We briefly mention a beautiful connection between Ramsey theory and model theory.
Summary of Key Results
We conclude by collecting the main bounds and results discussed in these notes.
Turán-type:
\(\operatorname{ex}(n, K_{r+1}) = t(n,r) = \left(1 - \frac{1}{r}\right)\frac{n^2}{2} + O(n)\)
\(\operatorname{ex}(n, H) = \left(1 - \frac{1}{\chi(H)-1}\right)\frac{n^2}{2} + o(n^2)\) for \(\chi(H) \geq 3\) (Erdős–Stone–Simonovits)
\(\operatorname{ex}(n, K_{s,t}) = O(n^{2 - 1/s})\) (Kővári–Sós–Turán)
Set-theoretic:
Maximum antichain in \(2^{[n]}\): \(\binom{n}{\lfloor n/2 \rfloor}\) (Sperner)
Maximum intersecting family in \(\binom{[n]}{k}\): \(\binom{n-1}{k-1}\) for \(n \geq 2k\) (Erdős–Ko–Rado)
Bollobás set-pairs: \(m \leq \binom{a+b}{a}\)
VC dimension d family: \(|\mathcal{F}| \leq \sum_{i=0}^d \binom{n}{i}\) (Sauer–Shelah)
Ramsey numbers:
\(R(s,t) \leq \binom{s+t-2}{s-1}\) (Erdős–Szekeres)
\(R(k,k) > 2^{k/2}\) (Erdős, probabilistic method)
\(R(k,k) \leq (4 - \varepsilon)^k\) (Campos–Griffiths–Morris–Sahasrabudhe, 2023)
\(R(3,t) = \Theta(t^2/\log t)\) (Kim, Shearer)
The interplay between extremal combinatorics and Ramsey theory — through tools like the regularity lemma, the probabilistic method, algebraic techniques, flag algebras, and the container method — makes this one of the most dynamic and interconnected areas of modern mathematics. The subject continues to produce surprising connections and breakthrough results, as evidenced by the recent resolution of long-standing conjectures on diagonal Ramsey numbers, the sunflower conjecture, and union-closed sets.