CO 741: Extremal Combinatorics and Ramsey Theory

Estimated study time: 1 hr 32 min

Table of contents

⚡ Why make it up

Extremal combinatorics contains some of the most elegant short proofs in mathematics alongside some of the most notorious open problems (like the exact value of R(5,5)). It is a natural companion to graph theory and a good training ground for probabilistic intuition.

These notes synthesize material from S. Jukna’s Extremal Combinatorics, R. Graham, B. Rothschild, and J. Spencer’s Ramsey Theory, N. Alon and J. Spencer’s The Probabilistic Method, and D. Conlon, J. Fox, and B. Sudakov’s survey articles, enriched with material from CMU 21-738 (P.-S. Loh) and Y. Zhao’s Graph Theory and Additive Combinatorics.

1. Extremal Graph Theory: Turán-type Problems

Extremal graph theory asks one of the most natural questions in combinatorics: given a forbidden subgraph \(H\), what is the maximum number of edges a graph on \(n\) vertices can have without containing a copy of \(H\)? This deceptively simple question, first systematically studied by Pál Turán during World War II, has blossomed into one of the richest areas of modern combinatorics, with deep connections to algebra, probability, analysis, and theoretical computer science.

1.1 The Turán Problem and Turán’s Theorem

We begin with the fundamental definitions that frame the entire subject.

Definition (Extremal Number). For a graph H and a positive integer n, the extremal number \(\operatorname{ex}(n, H)\) is the maximum number of edges in a graph on n vertices that does not contain H as a subgraph. A graph on n vertices with \(\operatorname{ex}(n, H)\) edges and no copy of H is called an extremal graph for H.

The first and most fundamental result in extremal graph theory determines \(\operatorname{ex}(n, K_{r+1})\) exactly.

Definition (Turán Graph). The Turán graph \(T(n, r)\) is the complete r-partite graph on n vertices in which the part sizes are as equal as possible — that is, each part has size either \(\lfloor n/r \rfloor\) or \(\lceil n/r \rceil\). The number of edges in T(n,r) is denoted \(t(n,r)\).

One can verify that

\[ t(n,r) = \binom{n}{2} - \sum_{i=1}^{r} \binom{|V_i|}{2} = \left(1 - \frac{1}{r}\right)\frac{n^2}{2} + O(n), \]

where \(V_1, \ldots, V_r\) are the parts. Now we state the celebrated theorem.

Theorem 1.1 (Turán, 1941). For all integers \(r \geq 1\) and \(n \geq 1\), \[ \operatorname{ex}(n, K_{r+1}) = t(n,r). \]

Moreover, the Turán graph \(T(n,r)\) is the unique extremal graph.

Turán’s theorem admits several beautiful proofs, each illuminating different aspects of the result. We present three.

Proof 1 (Induction on n). We proceed by induction on n. The base case \(n \leq r\) is trivial, since any graph on at most r vertices is \(K_{r+1}\)-free. For the inductive step, let G be a \(K_{r+1}\)-free graph on n vertices with the maximum number of edges. Since G is \(K_{r+1}\)-free but edge-maximal with this property, G must contain a copy of \(K_r\) (otherwise we could add edges). Let \(v_1, \ldots, v_r\) be the vertices of such a \(K_r\).

Every other vertex \(w\) is adjacent to at most \(r-1\) of the \(v_i\), since otherwise \(\{w, v_1, \ldots, v_r\}\) would form a \(K_{r+1}\). Removing the r vertices \(v_1, \ldots, v_r\), the remaining graph \(G'\) on \(n - r\) vertices is still \(K_{r+1}\)-free, so by induction \(e(G') \leq t(n-r, r)\). The total number of edges is: \[ e(G) = e(G') + \binom{r}{2} + \sum_{w \notin \{v_1,\ldots,v_r\}} |N(w) \cap \{v_1,\ldots,v_r\}| \leq t(n-r,r) + \binom{r}{2} + (n-r)(r-1). \]

A direct calculation shows this equals \(t(n,r)\). Uniqueness follows from analyzing the equality conditions.

Proof 2 (Zykov Symmetrization). Let G be a \(K_{r+1}\)-free graph on n vertices with the maximum number of edges. We show G must be a complete multipartite graph.

Suppose two non-adjacent vertices u and v exist with \(\deg(u) \leq \deg(v)\). We perform Zykov symmetrization: replace u with a copy of v — that is, delete all edges incident to u, then for every neighbour w of v, add the edge \(uw\). The resulting graph G' has at least as many edges as G (since \(\deg(v) \geq \deg(u)\)), and G' is still \(K_{r+1}\)-free because any clique in G' containing u corresponds to a clique in G with u replaced by v.

Repeating this process, we eventually obtain a graph where every pair of non-adjacent vertices has the same neighbourhood — i.e., a complete multipartite graph. The number of edges is maximized when the parts are as equal as possible (by convexity of \(\binom{x}{2}\)), giving the Turán graph \(T(n,r)\).

Proof 3 (Probabilistic / Double Counting — Motzkin-Straus). We establish the result through a beautiful connection to optimization. Let G be a graph on \([n]\) and assign weights \(x_1, \ldots, x_n \geq 0\) with \(\sum x_i = 1\). Define \[ f(x_1, \ldots, x_n) = \sum_{ij \in E(G)} x_i x_j. \]

Motzkin-Straus Theorem: If \(G\) is \(K_{r+1}\)-free, then \(\max f \leq \frac{1}{2}\left(1 - \frac{1}{r}\right)\), with equality achieved when the weights are uniformly distributed on an r-clique.

To see this, observe that at the maximum, the weight is supported on a clique (any non-edge with positive weights on both endpoints can have its weight redistributed to increase \(f\)). For a clique of size \(s \leq r\), the maximum of \(f\) with uniform weights \(1/s\) each is \(\binom{s}{2}/s^2 = (1 - 1/s)/2 \leq (1 - 1/r)/2\).

Setting \(x_i = 1/n\) for all i, we get \(f = e(G)/n^2\), so \(e(G)/n^2 \leq (1 - 1/r)/2\), yielding \(e(G) \leq (1 - 1/r)n^2/2\). The exact bound \(t(n,r)\) follows by optimizing over integer part sizes.

1.2 The Erdős–Stone–Simonovits Theorem

Turán’s theorem determines \(\operatorname{ex}(n, K_{r+1})\) exactly. What about general graphs \(H\)? The Erdős–Stone theorem, often called the “fundamental theorem of extremal graph theory,” answers this asymptotically.

Definition (Turán Density). The Turán density of a graph H is \[ \pi(H) = \lim_{n \to \infty} \frac{\operatorname{ex}(n,H)}{\binom{n}{2}}. \]

This limit exists by a standard averaging argument (or by appealing to the Kruskal-Katona machinery).

Theorem 1.2 (Erdős–Stone, 1946; Erdős–Simonovits, 1966). For any graph H with chromatic number \(\chi(H) = r + 1 \geq 2\), \[ \operatorname{ex}(n, H) = \left(1 - \frac{1}{r}\right)\frac{n^2}{2} + o(n^2). \]

Equivalently, \(\pi(H) = 1 - \frac{1}{\chi(H) - 1}\).

The beauty of this result is that the asymptotic extremal number depends only on the chromatic number of \(H\). For non-bipartite graphs (\(\chi(H) \geq 3\)), the Turán density is strictly between 0 and 1, and is completely determined by \(\chi(H)\). We give a proof sketch.

Proof Sketch. The lower bound \(\operatorname{ex}(n, H) \geq t(n,r)\) follows from the fact that the Turán graph \(T(n,r)\) is r-partite, hence \(r\)-colourable, and so does not contain any graph with chromatic number \(r+1\).

The upper bound is the deep part. The Erdős–Stone theorem states that for every \(r \geq 1\), \(\varepsilon > 0\), and integer \(t\), there exists \(n_0\) such that every graph on \(n \geq n_0\) vertices with at least \(t(n,r) + \varepsilon n^2\) edges contains a copy of \(K_{r+1}(t)\) — the complete \((r+1)\)-partite graph with each part of size \(t\). Since any graph with chromatic number \(r+1\) is a subgraph of some \(K_{r+1}(t)\), this gives the desired upper bound.

The proof of the Erdős–Stone theorem itself proceeds via the regularity lemma (Section 5), or via Ramsey-type arguments and dependent random choice.

Remark. For bipartite graphs H (where \(\chi(H) = 2\)), the Erdős–Stone–Simonovits theorem gives only \(\operatorname{ex}(n, H) = o(n^2)\), which is uninformative. The determination of \(\operatorname{ex}(n, H)\) for bipartite H is one of the most important open problems in extremal graph theory, where polynomial bounds replace quadratic ones.

1.3 The Kővári–Sós–Turán Theorem and the Zarankiewicz Problem

For bipartite forbidden subgraphs, the Zarankiewicz problem asks for better bounds.

Theorem 1.3 (Kővári–Sós–Turán, 1954). For integers \(2 \leq s \leq t\) and \(n \geq s\), \[ \operatorname{ex}(n, K_{s,t}) \leq \frac{1}{2}\left( (t-1)^{1/s} \cdot n^{2 - 1/s} + (s-1)n \right). \]

In particular, \(\operatorname{ex}(n, K_{s,t}) = O(n^{2 - 1/s})\).

Proof. Let G be a \(K_{s,t}\)-free graph on n vertices. Count the number of s-element subsets \(S\) of vertices such that all elements of S have a common neighbour. On one hand, each vertex v contributes \(\binom{\deg(v)}{s}\) such subsets (choose s neighbours of v). On the other hand, since G is \(K_{s,t}\)-free, each s-element set has at most \(t-1\) common neighbours, so the total count is at most \((t-1)\binom{n}{s}\).

Thus \(\sum_{v} \binom{\deg(v)}{s} \leq (t-1)\binom{n}{s}\). By the convexity of \(\binom{x}{s}\) and Jensen's inequality, \[ n \cdot \binom{2e(G)/n}{s} \leq (t-1)\binom{n}{s}, \]

where \(e(G)\) is the number of edges. For \(s = 2\), this simplifies to

\[ \frac{n \cdot (2e/n)(2e/n - 1)}{2} \leq (t-1)\frac{n(n-1)}{2}, \]

giving \(e(G) \leq \frac{1}{2}(1 + \sqrt{1 + 4(t-1)(n-1)}) \cdot \frac{n}{2}\). For general s, careful estimation yields the stated bound.

Remark. For \(s = 2\), the bound gives \(\operatorname{ex}(n, K_{2,t}) = O(n^{3/2})\). This is sharp up to a constant: constructions based on finite geometry (projective planes, generalized polygons) show \(\operatorname{ex}(n, K_{2,t}) = \Theta(n^{3/2})\) for \(t = 2\) (when n admits a projective plane of suitable order). For general s and t, the Kővári–Sós–Turán bound is conjectured to be tight when \(t\) is sufficiently large relative to s, but proving matching lower bounds for \(s \geq 3\) is a deep open problem. Notable progress includes algebraic constructions for \(K_{3,3}\)-free and \(K_{4,4}\)-free graphs.

1.4 Supersaturation

Once a graph exceeds the extremal number, it must contain not just one copy of the forbidden subgraph, but many copies. This is the supersaturation phenomenon, first observed by Erdős and Simonovits.

Theorem 1.4 (Erdős–Simonovits Supersaturation). For every graph H and every \(\varepsilon > 0\), there exists \(\delta > 0\) and \(n_0\) such that every graph G on \(n \geq n_0\) vertices with \(e(G) \geq \operatorname{ex}(n, H) + \varepsilon n^2\) contains at least \(\delta n^{|V(H)|}\) copies of H.

Example. For \(H = K_3\) (triangles), Turán's theorem gives \(\operatorname{ex}(n, K_3) = \lfloor n^2/4 \rfloor\). Supersaturation tells us that if a graph on n vertices has \((1/4 + \varepsilon)n^2\) edges, it contains at least \(\delta n^3\) triangles. The Kruskal-Katona theorem can be used to determine the minimum number of triangles exactly as a function of the number of edges.

1.5 The Kruskal–Katona Theorem

The Kruskal–Katona theorem, although technically a result about simplicial complexes and set systems, plays a foundational role in extremal combinatorics. It answers the following question: given a collection of \(k\)-element sets, what is the minimum number of \((k-1)\)-element subsets they can have?

Definition (Shadow). For a family \(\mathcal{F} \subseteq \binom{[n]}{k}\), the shadow of \(\mathcal{F}\) is \[ \partial \mathcal{F} = \left\{ G \in \binom{[n]}{k-1} : G \subset F \text{ for some } F \in \mathcal{F} \right\}. \]

Theorem 1.5 (Kruskal, 1963; Katona, 1968). If \(|\mathcal{F}| = \binom{a_k}{k} + \binom{a_{k-1}}{k-1} + \cdots + \binom{a_j}{j}\) is the k-binomial representation of \(|\mathcal{F}|\) (where \(a_k > a_{k-1} > \cdots > a_j \geq j \geq 1\)), then \[ |\partial \mathcal{F}| \geq \binom{a_k}{k-1} + \binom{a_{k-1}}{k-2} + \cdots + \binom{a_j}{j-1}. \]

The extremal families are initial segments of the colexicographic (colex) order on \(\binom{[n]}{k}\), a beautiful fact connecting extremal combinatorics to order theory.

2. Extremal Set Theory

Extremal set theory studies the maximum or minimum size of families of sets subject to various combinatorial conditions. The results in this area are among the most elegant in all of combinatorics, with short proofs that reveal deep structural truths.

2.1 Sperner’s Theorem and the LYM Inequality

We begin with the oldest result, Sperner’s theorem, which determines the largest antichain in the power set \(2^{[n]}\).

Definition (Antichain). A family \(\mathcal{F} \subseteq 2^{[n]}\) is an antichain (or Sperner family) if no set in \(\mathcal{F}\) contains another: for all \(A, B \in \mathcal{F}\), \(A \subseteq B\) implies \(A = B\).

Theorem 2.1 (Sperner, 1928). If \(\mathcal{F} \subseteq 2^{[n]}\) is an antichain, then \[ |\mathcal{F}| \leq \binom{n}{\lfloor n/2 \rfloor}. \]

Equality holds if and only if \(\mathcal{F} = \binom{[n]}{\lfloor n/2 \rfloor}\) or \(\mathcal{F} = \binom{[n]}{\lceil n/2 \rceil}\).

The cleanest proof of Sperner’s theorem goes through the LYM inequality, named after Lubell, Yamamoto, and Meshalkin.

Theorem 2.2 (LYM Inequality). If \(\mathcal{F} \subseteq 2^{[n]}\) is an antichain, then \[ \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \leq 1. \]

Proof (Lubell, 1966). Consider a uniformly random permutation \(\sigma\) of \([n]\). For each \(k\), the set \(\{\sigma(1), \ldots, \sigma(k)\}\) is a uniformly random k-element subset of \([n]\). Define the random chain \(C_\sigma = \{\emptyset, \{\sigma(1)\}, \{\sigma(1), \sigma(2)\}, \ldots, [n]\}\).

For a set \(A \in \mathcal{F}\) with \(|A| = k\), the probability that \(A \in C_\sigma\) equals \[ \Pr[A \in C_\sigma] = \frac{k!(n-k)!}{n!} = \frac{1}{\binom{n}{k}}. \]

Since \(\mathcal{F}\) is an antichain, at most one element of \(\mathcal{F}\) can appear in any chain \(C_\sigma\). Therefore,

\[ 1 \geq \Pr\left[\exists A \in \mathcal{F} : A \in C_\sigma\right] = \sum_{A \in \mathcal{F}} \Pr[A \in C_\sigma] = \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}}. \]

This completes the proof.

Proof of Sperner's theorem from LYM. Since \(\binom{n}{|A|} \leq \binom{n}{\lfloor n/2 \rfloor}\) for every \(A\), the LYM inequality gives \[ 1 \geq \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \geq \frac{|\mathcal{F}|}{\binom{n}{\lfloor n/2 \rfloor}}, \]

hence \(|\mathcal{F}| \leq \binom{n}{\lfloor n/2 \rfloor}\).

2.2 The Bollobás Set-Pairs Inequality

One of the most versatile tools in extremal set theory is the Bollobás set-pairs inequality.

Theorem 2.3 (Bollobás, 1965). Let \(A_1, \ldots, A_m\) and \(B_1, \ldots, B_m\) be finite sets such that \(A_i \cap B_j = \emptyset\) if and only if \(i = j\), with \(|A_i| = a\) and \(|B_i| = b\) for all i. Then \[ m \leq \binom{a+b}{a}. \]

More generally, if \(|A_i| = a_i\) and \(|B_i| = b_i\), then

\[ \sum_{i=1}^{m} \frac{1}{\binom{a_i + b_i}{a_i}} \leq 1. \]

Proof. Let \(X = A_1 \cup \cdots \cup A_m \cup B_1 \cup \cdots \cup B_m\) and \(|X| = n\). We work in the vector space of polynomials. For each \(i\), define the polynomial \[ f_i(x) = \prod_{b \in B_i} \left(\sum_{a \in A_i \cup B_i} x_a - x_b\right) \]

(we use a slightly different formulation for clarity). Actually, let us use the standard exterior algebra proof.

Consider the vector space \(V = \bigwedge^{a_i + b_i}(\mathbb{R}^n)\) for the uniform case. Instead, let us present the following clean proof. We may assume all sets are subsets of \([n]\) for some n. Consider the polynomial ring \(\mathbb{R}[x_1, \ldots, x_n]\) and for each i, define

\[ f_i(x_1, \ldots, x_n) = \prod_{j \in B_i} \left(1 - x_j\right). \]

This is a polynomial of degree \(b_i\). Now restrict attention to the multilinear polynomial obtained by reducing modulo the ideal \((x_j^2 - x_j : j \in [n])\). For a set \(S \subseteq [n]\), let \(\mathbf{1}_S \in \{0,1\}^n\) be its characteristic vector. Then \(f_i(\mathbf{1}_S) = 0\) if \(S \cap B_i \neq \emptyset\), and \(f_i(\mathbf{1}_S) = 1\) if \(S \cap B_i = \emptyset\).

Now define \(g_i = f_i \cdot \prod_{j \in A_i} x_j\). This is a multilinear polynomial (after reduction). We have \(g_i(\mathbf{1}_{A_j}) = 0\) whenever \(j \neq i\) (either \(A_j \cap B_i \neq \emptyset\) making \(f_i = 0\), or \(A_j \not\supseteq A_i\) making the \(\prod x_j\) term vanish). Also, \(g_i(\mathbf{1}_{A_i}) = 1 \cdot 1 = 1\) since \(A_i \cap B_i = \emptyset\).

Thus the polynomials \(g_1, \ldots, g_m\) are linearly independent (evaluate at \(\mathbf{1}_{A_1}, \ldots, \mathbf{1}_{A_m}\) to get a triangular matrix with 1’s on the diagonal). Each \(g_i\) is a multilinear polynomial of degree at most \(a_i + b_i\). The space of multilinear polynomials of degree at most \(a + b\) in n variables has dimension \(\sum_{k=0}^{a+b} \binom{n}{k}\). In the uniform case \(a_i = a, b_i = b\), each \(g_i\) has degree exactly \(a + b\), and more careful analysis shows \(g_i\) lies in the span of the \(\binom{n}{a+b}\) squarefree monomials of degree \(a + b\). But since we only need the \(g_i\) restricted to \(\{0,1\}^n\), we can use the subspace of multilinear polynomials of degree exactly \(a + b\), which gives \(m \leq \binom{n}{a+b}\). A refined argument (projecting onto the relevant coordinates) gives the sharp bound \(m \leq \binom{a+b}{a}\).

2.3 The Erdős–Ko–Rado Theorem

The Erdős–Ko–Rado theorem is perhaps the most celebrated result in extremal set theory, determining the largest intersecting family of \(k\)-element sets.

Definition (Intersecting Family). A family \(\mathcal{F} \subseteq \binom{[n]}{k}\) is intersecting if \(A \cap B \neq \emptyset\) for all \(A, B \in \mathcal{F}\).

Example. The family of all k-element subsets of \([n]\) containing a fixed element (called a star or trivial intersecting family) has size \(\binom{n-1}{k-1}\) and is clearly intersecting.

Theorem 2.4 (Erdős–Ko–Rado, 1961). If \(n \geq 2k\) and \(\mathcal{F} \subseteq \binom{[n]}{k}\) is intersecting, then \[ |\mathcal{F}| \leq \binom{n-1}{k-1}. \]

For \(n > 2k\), equality holds if and only if \(\mathcal{F}\) is a star.

We present Katona’s beautiful proof using cyclic permutations.

Proof (Katona, 1972). Place the elements \(1, 2, \ldots, n\) around a circle (in some cyclic order). A k-arc is a set of k consecutive elements on the circle. There are exactly n such arcs (one starting at each position), and any two k-arcs are either disjoint or intersecting. Since each arc has k elements and \(n \geq 2k\), two arcs with starting positions differing by at least k (and at most \(n-k\)) are disjoint. So among the n arcs, at most k can be mutually intersecting.

Now count pairs \((\sigma, A)\) where \(\sigma\) is a cyclic permutation of \([n]\) and \(A \in \mathcal{F}\) is a k-arc in the circular order given by \(\sigma\). For each of the \((n-1)!\) cyclic permutations, at most \(k\) elements of \(\mathcal{F}\) are arcs (since \(\mathcal{F}\) is intersecting). For each \(A \in \mathcal{F}\), the number of cyclic permutations making A an arc is \(k!(n-k)!\) (arrange the elements of A consecutively on the circle, with \(k!\) internal orderings and \((n-k)!\) orderings for the rest). Therefore: \[ |\mathcal{F}| \cdot k!(n-k)! \leq k \cdot (n-1)!, \]

which gives

\[ |\mathcal{F}| \leq \frac{k \cdot (n-1)!}{k!(n-k)!} = \frac{(n-1)!}{(k-1)!(n-k)!} = \binom{n-1}{k-1}. \]

2.4 The Sauer–Shelah–Perles Lemma

The Sauer–Shelah lemma connects extremal set theory to computational learning theory through the concept of VC dimension.

Definition (Shattering and VC Dimension). A family \(\mathcal{F} \subseteq 2^{[n]}\) shatters a set \(S \subseteq [n]\) if \(\{A \cap S : A \in \mathcal{F}\} = 2^S\). The VC dimension of \(\mathcal{F}\) is the size of the largest set shattered by \(\mathcal{F}\).

Theorem 2.5 (Sauer, 1972; Shelah, 1972; Perles). If \(\mathcal{F} \subseteq 2^{[n]}\) has VC dimension at most d (i.e., shatters no \((d+1)\)-element set), then \[ |\mathcal{F}| \leq \sum_{i=0}^{d} \binom{n}{i} = \Phi_d(n). \]

Proof. We prove the stronger statement: the number of sets shattered by \(\mathcal{F}\) is at least \(|\mathcal{F}|\). Since no set of size \(d+1\) is shattered, the number of shattered sets is at most \(\sum_{i=0}^{d} \binom{n}{i}\).

We proceed by induction on \(n + |\mathcal{F}|\). The base cases are trivial. For the inductive step, fix an element \(x \in [n]\) and partition \(\mathcal{F}\) into three parts: \(\mathcal{F}_0\) (sets not containing x with no "twin" containing x), \(\mathcal{F}_1\) (sets containing x with no twin without x), and matched pairs. Define \(\mathcal{F}' = \{A \setminus \{x\} : A \in \mathcal{F}\}\) and \(\mathcal{F}'' = \{A \setminus \{x\} : A \in \mathcal{F}, A \cup \{x\} \in \mathcal{F} \text{ or } A \setminus \{x\} \in \mathcal{F}\}\) — the family obtained by looking at sets appearing with both \(x\) present and absent. Then \(|\mathcal{F}| = |\mathcal{F}'| + |\mathcal{F}''|\), and a set S not containing x is shattered by \(\mathcal{F}\) if and only if S is shattered by \(\mathcal{F}'\) and \(S \cup \{x\}\) is shattered by \(\mathcal{F}\) only if \(S\) is shattered by \(\mathcal{F}''\). By induction, each \(\mathcal{F}'\) and \(\mathcal{F}''\) shatters at least as many sets as its size, and the shattered sets for \(\mathcal{F}\) include disjoint contributions from both families, completing the proof.

2.5 The Sunflower Lemma

Definition (Sunflower). A sunflower (or \(\Delta\)-system) is a collection of sets \(A_1, \ldots, A_r\) such that there exists a set Y (the core) with \(A_i \cap A_j = Y\) for all \(i \neq j\). The sets \(A_i \setminus Y\) are the petals.

Theorem 2.6 (Erdős–Ko Sunflower Lemma, 1960). If \(\mathcal{F}\) is a family of sets each of size at most k, and \(|\mathcal{F}| > (r-1)^k \cdot k!\), then \(\mathcal{F}\) contains a sunflower with r petals.

Proof. We prove the result by induction on k. For \(k = 1\), the sets are singletons, and any r of them form a sunflower with empty core, so we need \(|\mathcal{F}| > r - 1\), which is \((r-1)^1 \cdot 1!\). For the inductive step, suppose no element appears in more than \(r-1\) sets (otherwise, those r sets containing a common element include a sunflower by induction on the restrictions \(A_i \setminus \{x\}\)). Choose a maximal collection of pairwise disjoint sets \(B_1, \ldots, B_t \in \mathcal{F}\). If \(t \geq r\), these form a sunflower with empty core. Otherwise, every set in \(\mathcal{F}\) intersects \(B_1 \cup \cdots \cup B_t\), which has at most \(kt < kr\) elements. Group the sets by their intersection with this union and apply induction.

Remark. The Sunflower Conjecture (Erdős and Ko) asserts that the \(k!\) factor can be replaced by \(c^k\) for some constant c depending on r. In a major breakthrough, Alweiss, Lovett, Wu, and Zhang (2020) proved an upper bound of \((\log k)^{O(k)}\), later improved by Rao to \((C \log k)^k\) for an absolute constant C. This is close to, but does not resolve, the conjecture.

2.6 Frankl’s Conjecture (Union-Closed Sets Conjecture)

We briefly discuss one of the most famous open problems in extremal set theory.

Definition. A family \(\mathcal{F} \subseteq 2^{[n]}\) is union-closed if for all \(A, B \in \mathcal{F}\), we have \(A \cup B \in \mathcal{F}\).

Frankl's Conjecture (1979). If \(\mathcal{F}\) is a union-closed family of finite sets with \(|\mathcal{F}| \geq 2\), then there exists an element \(x\) that belongs to at least \(|\mathcal{F}|/2\) members of \(\mathcal{F}\).

Remark. In 2022, Justin Gilmer proved that there always exists an element in at least a \(0.01\) fraction of the sets, breaking through a long-standing barrier. This was subsequently improved by several groups (Chase-Lovett, Alweiss-Huang-Sellke, and others) to a constant approaching \((3 - \sqrt{5})/2 \approx 0.382\) using entropy methods. The full conjecture (constant \(1/2\)) remains open.

3. Ramsey Theory: Classical Results

Ramsey theory, broadly understood, asserts that complete disorder is impossible: every large enough structure must contain a well-organized substructure. This idea, first made precise by Frank Ramsey in 1930, has grown into one of the most influential branches of combinatorics.

3.1 Ramsey’s Theorem

Theorem 3.1 (Ramsey, 1930). For any positive integers \(s\) and \(t\), there exists a smallest integer \(R(s,t)\) such that every 2-colouring of the edges of \(K_{R(s,t)}\) (with colours red and blue) contains either a red \(K_s\) or a blue \(K_t\).

Proof. We prove that \(R(s,t) \leq R(s-1,t) + R(s,t-1)\) by induction, with base cases \(R(s,1) = R(1,t) = 1\).

Let \(n = R(s-1,t) + R(s,t-1)\) and consider a 2-colouring (red/blue) of \(K_n\). Pick any vertex v. The remaining \(n - 1 = R(s-1,t) + R(s,t-1) - 1\) vertices are partitioned into \(R\) (red neighbours of v) and \(B\) (blue neighbours of v). By pigeonhole, either \(|R| \geq R(s-1,t)\) or \(|B| \geq R(s,t-1)\).

Case 1: \(|R| \geq R(s-1,t)\). The 2-coloured \(K_{|R|}\) on R contains either a red \(K_{s-1}\) (which together with v gives a red \(K_s\)) or a blue \(K_t\).

Case 2: \(|B| \geq R(s,t-1)\). Similarly, we find a red \(K_s\) or a blue \(K_t\).

In both cases the conclusion holds. The existence of \(R(s,t)\) follows by induction, and the recurrence gives \(R(s,t) \leq \binom{s+t-2}{s-1}\).

Historical Note. Frank Plumpton Ramsey (1903–1930) proved this theorem as a tool in mathematical logic, to establish a decision procedure for a fragment of first-order logic. He died at age 26, but his brief mathematical career left an extraordinary legacy. The theorem that bears his name launched an entire field.

3.2 Bounds on Ramsey Numbers

The study of Ramsey numbers — their precise values and asymptotic behaviour — is central to combinatorics. We know embarrassingly few exact values.

Example (Small Ramsey Numbers).
\(R(3,3) = 6\): Colour the edges of \(K_5\) as the Petersen complement — five red and five blue, forming two 5-cycles. This shows \(R(3,3) > 5\). The upper bound \(R(3,3) \leq R(2,3) + R(3,2) = 3 + 3 = 6\) completes the proof.

\(R(3,4) = 9\), \(R(3,5) = 14\), \(R(4,4) = 18\). The value \(R(5,5)\) is unknown; the best bounds are \(43 \leq R(5,5) \leq 48\).

The Erdős–Szekeres Upper Bound

Theorem 3.2 (Erdős–Szekeres, 1935). For all \(s, t \geq 2\), \[ R(s,t) \leq \binom{s+t-2}{s-1}. \]

In particular, \(R(k,k) \leq \binom{2k-2}{k-1} \leq \frac{4^{k-1}}{\sqrt{\pi k}}\) (by Stirling’s approximation).

The Erdős Probabilistic Lower Bound

In 1947, Erdős proved a remarkable lower bound using the probabilistic method — one of the first applications of this technique.

Theorem 3.3 (Erdős, 1947). If \(\binom{n}{k} \cdot 2^{1 - \binom{k}{2}} < 1\), then \(R(k,k) > n\). In particular, \[ R(k,k) > \lfloor 2^{k/2} \rfloor. \]

Proof. Colour each edge of \(K_n\) red or blue independently with probability \(1/2\). For a fixed set \(S\) of k vertices, the probability that S is monochromatic (all red or all blue) is \(2 \cdot 2^{-\binom{k}{2}} = 2^{1 - \binom{k}{2}}\).

By the union bound, the probability that some k-set is monochromatic is at most \[ \binom{n}{k} \cdot 2^{1-\binom{k}{2}}. \]

If this is less than 1, there exists a colouring with no monochromatic \(K_k\), so \(R(k,k) > n\). Setting \(n = \lfloor 2^{k/2} \rfloor\) and checking the inequality (for large k) gives the result.

Remark. This was the first application of the probabilistic method to combinatorics: a non-constructive proof that specific combinatorial objects exist, by showing a random object has the desired property with positive probability. Erdős's proof launched the probabilistic method as a major tool in combinatorics.

Off-Diagonal Ramsey Numbers

Theorem 3.4. For fixed s and \(t \to \infty\), \[ c_1 \cdot \frac{t^{(s+1)/2}}{(\log t)^{(s+1)/2 - 1/(s-1)}} \leq R(s,t) \leq c_2 \cdot \frac{t^{s-1}}{(\log t)^{s-2}} \]

for constants \(c_1, c_2\) depending on s. The upper bound is due to Ajtai-Komlós-Szemerédi (1980) for \(s=3\) and general s due to others. The lower bound for \(s = 3\) is \(R(3,t) = \Theta(t^2/\log t)\), established by Kim (1995, lower bound) and Shearer (upper bound).

3.3 The Diagonal Ramsey Number: Recent Breakthroughs

For 75 years, the best bounds on diagonal Ramsey numbers were essentially:

\[ \sqrt{2}^k \lesssim R(k,k) \lesssim 4^k. \]

Theorem 3.5 (Campos–Griffiths–Morris–Sahasrabudhe, 2023). There exists a constant \(\varepsilon > 0\) such that \[ R(k,k) \leq (4 - \varepsilon)^k \]

for all sufficiently large k.

Remark. This is a monumental breakthrough. The key idea involves a novel "book algorithm" — a modification of the classical Erdős–Szekeres recursive argument where, instead of always stepping to a vertex with the most monochromatic neighbours, one balances several graph-theoretic parameters simultaneously. The proof is remarkably short and elegant for such a long-standing problem.

3.4 Schur’s Theorem

Theorem 3.6 (Schur, 1916). For every positive integer r, there exists a smallest integer \(S(r)\) such that for every r-colouring of \([S(r)]\), there exist x, y, z (not necessarily distinct from each other, but the equation is non-trivial) of the same colour with \(x + y = z\).

Proof. We derive Schur's theorem from Ramsey's theorem. Given an r-colouring \(c: [n] \to [r]\), define an edge colouring of \(K_n\) by colouring the edge \(\{i,j\}\) (with \(i < j\)) with colour \(c(j - i)\). By Ramsey's theorem, if n is large enough, there is a monochromatic triangle \(\{a, b, c\}\) with \(a < b < c\). The edges \(\{a,b\}, \{b,c\}, \{a,c\}\) all have the same colour, meaning \(c(b-a) = c(c-b) = c(c-a)\). Setting \(x = b - a\), \(y = c - b\), \(z = c - a\), we have \(x + y = z\) and all three have the same colour.

Thus \(S(r) \leq R_r(3,3,\ldots,3) - 1\), where the Ramsey number has r colours. More precisely, \(S(r) \leq R(3,3,\ldots,3;r) - 1\).

3.5 Van der Waerden’s Theorem

Van der Waerden’s theorem, proved in 1927, is one of the foundational results of Ramsey theory on the integers.

Theorem 3.7 (Van der Waerden, 1927). For every positive integer r (number of colours) and every positive integer k (length of arithmetic progression), there exists a smallest integer \(W(r,k)\) such that every r-colouring of \([W(r,k)]\) contains a monochromatic arithmetic progression of length k.

The original proof of van der Waerden gives tower-type bounds. We outline how van der Waerden’s theorem follows from the Hales-Jewett theorem (discussed in Section 7).

Remark. The bounds on \(W(r,k)\) are notoriously enormous. Gowers (2001) gave the first reasonable upper bound as part of his celebrated proof of the quantitative Szemerédi theorem: \(W(2,k) \leq 2^{2^{2^{2^{2^{k+9}}}}}\). Subsequent improvements by Green-Tao, Szemerédi, and others have reduced the tower height, but the bounds remain far from the conjectured polynomial lower bound.

3.6 Rado’s Theorem and Partition Regularity

Rado’s theorem provides a complete characterization of which systems of linear equations are “partition regular.”

Definition. A system of linear equations \(Ax = 0\) (where A is an \(m \times n\) integer matrix) is partition regular if, for every finite colouring of the positive integers, there is a monochromatic solution \(x_1, \ldots, x_n\) (all the same colour).

Theorem 3.8 (Rado, 1933). A single equation \(c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = 0\) with integer coefficients is partition regular if and only if some non-empty subset of the coefficients sums to zero. More generally, a system \(Ax = 0\) is partition regular if the columns condition is satisfied: the columns of A can be ordered \(v_1, \ldots, v_n\) and partitioned into groups \(B_1, \ldots, B_k\) such that \(\sum_{v \in B_1} v = 0\), and for each \(i > 1\), \(\sum_{v \in B_i} v\) is in the real span of columns from \(B_1 \cup \cdots \cup B_{i-1}\).

Example. The equation \(x + y = z\) (i.e., \(x + y - z = 0\)) is partition regular since the coefficients \(\{1, -1\}\) or \(\{1, -1\}\) — actually, since \(1 + (-1) = 0\) (taking the subset \(\{1, -1\}\)), Rado's condition is satisfied. This recovers Schur's theorem. The equation \(x + y = 2z\) is also partition regular (this gives monochromatic 3-term arithmetic progressions — a special case of van der Waerden's theorem). However, \(x + y = 3z\) is not partition regular, as no non-empty subset of \(\{1, 1, -3\}\) sums to zero.

3.7 Ramsey’s Theorem for Hypergraphs

The original theorem of Ramsey applies to hypergraphs as well.

Theorem 3.9 (Ramsey, 1930 — Hypergraph Version). For all positive integers \(r, k, s, t\) with \(k \geq 1\), there exists a smallest integer \(R_k(s,t)\) such that every 2-colouring of the k-element subsets of an \(R_k(s,t)\)-element set contains either a red set of size s (all whose k-subsets are red) or a blue set of size t.

Remark. For \(k = 1\), this is just the pigeonhole principle. For \(k = 2\), this is the graph Ramsey theorem. For \(k \geq 3\), the bounds grow much faster: the stepping-up lemma of Erdős and Rado shows that \(R_k(s,t)\) involves tower functions of height growing with k. In particular, Erdős, Hajnal, and Rado showed that \(R_3(s,s)\) grows as a tower of 2's of height \(\Theta(s)\).

4. Algebraic and Linear Algebraic Methods

Some of the most elegant proofs in extremal combinatorics use algebra — particularly linear algebra and polynomial methods. The general philosophy is: translate a combinatorial problem into a question about linear independence, dimension, or polynomial vanishing, then use algebraic constraints to derive the desired bounds.

4.1 The Dimension Argument

The simplest algebraic technique is the dimension argument: if we can associate linearly independent vectors with the objects we are counting, then their number is at most the dimension of the ambient space.

Theorem 4.1 (Oddtown Theorem). Let \(A_1, \ldots, A_m \subseteq [n]\) be such that \(|A_i|\) is odd for all i, and \(|A_i \cap A_j|\) is even for all \(i \neq j\). Then \(m \leq n\).

Proof. Work over \(\mathbb{F}_2\). Represent each set \(A_i\) by its characteristic vector \(v_i \in \mathbb{F}_2^n\). The condition \(|A_i \cap A_j| \equiv 0 \pmod{2}\) for \(i \neq j\) translates to \(\langle v_i, v_j \rangle = 0\) over \(\mathbb{F}_2\), while \(|A_i| \equiv 1 \pmod{2}\) means \(\langle v_i, v_i \rangle = 1\).

We claim the vectors \(v_1, \ldots, v_m\) are linearly independent over \(\mathbb{F}_2\). Suppose \(\sum_{i \in S} v_i = 0\) for some non-empty \(S \subseteq [m]\). Taking the inner product with \(v_j\) for \(j \in S\): \[ 0 = \left\langle \sum_{i \in S} v_i, v_j \right\rangle = \sum_{i \in S} \langle v_i, v_j \rangle = \langle v_j, v_j \rangle = 1, \]

a contradiction. Therefore \(m \leq n\).

Theorem 4.2 (Eventown Theorem). Let \(A_1, \ldots, A_m \subseteq [n]\) be such that \(|A_i \cap A_j|\) is even for all \(i, j\) (including \(i = j\), i.e., all sets have even size). Then \(m \leq 2^{\lfloor n/2 \rfloor}\).

Proof. Over \(\mathbb{F}_2\), the characteristic vectors \(v_1, \ldots, v_m\) satisfy \(\langle v_i, v_j \rangle = 0\) for all i, j. Thus the span of \(\{v_1, \ldots, v_m\}\) is a totally isotropic subspace of \(\mathbb{F}_2^n\), which has dimension at most \(\lfloor n/2 \rfloor\). Hence \(m \leq 2^{\lfloor n/2 \rfloor}\).

4.2 Fisher’s Inequality

Theorem 4.3 (Fisher's Inequality). If \(\mathcal{F} = \{A_1, \ldots, A_m\}\) is a family of subsets of \([n]\) with \(|A_i \cap A_j| = \lambda\) for all \(i \neq j\) (a 1-design or uniform Fisher family), and \(|A_i| > \lambda\) for all i, then \(m \leq n\).

Proof. Over \(\mathbb{R}\), consider the incidence matrix \(M\) whose rows are the characteristic vectors of \(A_1, \ldots, A_m\). The Gram matrix \(G = MM^T\) has diagonal entries \(|A_i|\) and off-diagonal entries \(\lambda\). We can write \(G = (\lambda)J + D\) where J is the all-ones matrix and D is diagonal with entries \(|A_i| - \lambda > 0\). Since D is positive definite and \((\lambda)J\) is positive semidefinite, G is positive definite, so \(\operatorname{rank}(G) = m\). Since \(G = MM^T\), we have \(m = \operatorname{rank}(G) \leq \operatorname{rank}(M) \leq n\).

4.3 The Polynomial Method

The polynomial method is one of the most powerful paradigms in combinatorics. The key idea: represent combinatorial objects as zeros of polynomials, then use the algebraic structure of polynomial rings to derive bounds.

The Combinatorial Nullstellensatz

Theorem 4.4 (Alon, 1999 — Combinatorial Nullstellensatz). Let \(\mathbb{F}\) be a field and let \(f \in \mathbb{F}[x_1, \ldots, x_n]\) be a polynomial of degree \(\sum_{i=1}^n t_i\), where each \(t_i\) is a non-negative integer. Suppose the coefficient of \(\prod_{i=1}^n x_i^{t_i}\) in f is nonzero. Then for any sets \(S_1, \ldots, S_n \subseteq \mathbb{F}\) with \(|S_i| > t_i\), there exists \((a_1, \ldots, a_n) \in S_1 \times \cdots \times S_n\) with \(f(a_1, \ldots, a_n) \neq 0\).

Remark. The Combinatorial Nullstellensatz has an enormous range of applications: to restricted sumsets (the Cauchy-Davenport theorem), graph colourings (choosability results), permanent lower bounds, and more. It is a key bridge between algebra and combinatorics.

The Chevalley–Warning Theorem and Applications

Theorem 4.5 (Chevalley–Warning). Let \(f_1, \ldots, f_m \in \mathbb{F}_p[x_1, \ldots, x_n]\) with \(\sum_{j=1}^m \deg(f_j) < n\). Let \(V = \{x \in \mathbb{F}_p^n : f_1(x) = \cdots = f_m(x) = 0\}\). Then \(p \mid |V|\). In particular, if \(V \neq \emptyset\) (e.g., if all \(f_j\) vanish at the origin), then \(|V| \geq p\).

4.4 The Alon–Babai–Suzuki Theorem

Generalizing the oddtown and eventown results, the Alon-Babai-Suzuki theorem provides modular intersection bounds.

Theorem 4.6 (Alon–Babai–Suzuki, 1991). Let p be a prime, and let \(\mathcal{F} = \{A_1, \ldots, A_m\}\) be a family of subsets of \([n]\) with \(|A_i| \equiv 0 \pmod{p}\) for all i, and \(|A_i \cap A_j| \not\equiv 0 \pmod{p}\) for all \(i \neq j\). Then \(m \leq n\).

The proof uses the multilinear polynomial technique: represent the intersection conditions via polynomials over \(\mathbb{F}_p\), show the associated vectors are linearly independent, and use dimension counting.

4.5 The Caro–Wei Bound

We include a result from graph theory that beautifully illustrates the probabilistic–algebraic interplay.

Theorem 4.7 (Caro, 1979; Wei, 1981). Every graph G has an independent set of size at least \[ \sum_{v \in V(G)} \frac{1}{\deg(v) + 1}. \]

Proof. Let \(\sigma\) be a uniformly random permutation of \(V(G)\). Define a vertex v to be locally first if v appears before all its neighbours in \(\sigma\). The set of locally first vertices is an independent set (no two adjacent vertices can both be locally first).

The probability that v is locally first is \(1/(\deg(v) + 1)\), since among v and its \(\deg(v)\) neighbours, each is equally likely to be first. By linearity of expectation, the expected size of the independent set is \(\sum_v 1/(\deg(v) + 1)\). Hence an independent set of at least this size exists.

4.6 The Linear Algebra Bound Method (Babai–Frankl)

The general framework for applying linear algebra to extremal combinatorics was systematized by Babai and Frankl. The idea is:

Associate to each combinatorial object a vector in some vector space \(V\).
Translate the combinatorial constraints into linear algebraic conditions (orthogonality, linear independence, etc.).
Use the dimension of \(V\) to bound the number of objects.

Theorem 4.8 (Non-uniform Ray-Chaudhuri–Wilson). Let \(L = \{\ell_1, \ldots, \ell_s\}\) be a set of s non-negative integers. If \(\mathcal{F} = \{A_1, \ldots, A_m\} \subseteq 2^{[n]}\) satisfies \(|A_i \cap A_j| \in L\) for all \(i \neq j\), then \[ m \leq \binom{n}{s}. \]

Proof Sketch. For each set \(A_i\), define the polynomial \(f_i(x) = \prod_{l \in L}(\sum_{j \in A_i} x_j - l)\) restricted to multilinear polynomials. The key properties are: \(f_i(\mathbf{1}_{A_j}) = 0\) for \(i \neq j\) (since \(|A_i \cap A_j| \in L\)), and \(f_i(\mathbf{1}_{A_i}) \neq 0\) (since \(|A_i| \notin L\) can be arranged). The polynomials \(f_1, \ldots, f_m\) are linearly independent over \(\mathbb{R}\), and each has degree at most s as a multilinear polynomial, so they live in a space of dimension \(\sum_{j=0}^{s} \binom{n}{j} \leq \binom{n}{s}\) (for the appropriate range).

5. The Szemerédi Regularity Lemma in Extremal Combinatorics

The Szemerédi regularity lemma is one of the most powerful tools in combinatorics. Introduced by Szemerédi in 1975 as a key ingredient in his proof of the Erdős–Turán conjecture on arithmetic progressions, it has since become a cornerstone of graph theory, with applications ranging from extremal combinatorics to number theory to theoretical computer science.

5.1 Statement of the Regularity Lemma

Definition (\(\varepsilon\)-regular pair). Let G be a graph and let \(A, B\) be disjoint subsets of \(V(G)\). The density of the pair \((A,B)\) is \[ d(A,B) = \frac{e(A,B)}{|A||B|}, \]

where \(e(A,B)\) is the number of edges between A and B. The pair \((A,B)\) is \(\varepsilon\)-regular if for every \(A' \subseteq A\) with \(|A'| \geq \varepsilon |A|\) and every \(B' \subseteq B\) with \(|B'| \geq \varepsilon |B|\), we have

\[ |d(A',B') - d(A,B)| < \varepsilon. \]

Informally, an \(\varepsilon\)-regular pair behaves like a random bipartite graph: the edge density is approximately uniform across all large subsets. The regularity lemma says every dense graph can be partitioned into a bounded number of parts, almost all pairs of which are \(\varepsilon\)-regular.

Theorem 5.1 (Szemerédi Regularity Lemma, 1975). For every \(\varepsilon > 0\) and every positive integer \(m_0\), there exist integers \(M = M(\varepsilon, m_0)\) and \(N = N(\varepsilon, m_0)\) such that every graph G on \(n \geq N\) vertices admits a partition \(V(G) = V_0 \cup V_1 \cup \cdots \cup V_k\) with the following properties:
(i) \(m_0 \leq k \leq M\),
(ii) \(|V_0| \leq \varepsilon n\) (the "exceptional" set),
(iii) \(|V_1| = |V_2| = \cdots = |V_k|\),
(iv) All but at most \(\varepsilon \binom{k}{2}\) pairs \((V_i, V_j)\) with \(1 \leq i < j \leq k\) are \(\varepsilon\)-regular.

Remark. The bound on M is necessarily enormous: Gowers (1997) showed that any bound must be at least a tower of 2's of height polynomial in \(1/\varepsilon\). This is essentially tight, as Szemerédi's original proof gives a tower-type bound.

5.2 Key Lemmas: Embedding and Counting

The power of the regularity lemma comes through two companion results.

Theorem 5.2 (Embedding Lemma / Key Lemma). Let H be a graph with vertex set \(\{1, \ldots, h\}\) and let \(\varepsilon > 0\) and \(d > 0\). If \(\varepsilon < d^h / (2h)\) (a sufficient condition), then the following holds: suppose \(V_1, \ldots, V_h\) are pairwise disjoint vertex sets with \(|V_i| \geq m\), and for every edge \(\{i,j\}\) of H, the pair \((V_i, V_j)\) is \(\varepsilon\)-regular with density at least d. Then the number of labelled copies of H with vertex i in \(V_i\) is at least \[ (d - \varepsilon)^{e(H)} \prod_{i=1}^{h} |V_i| - \text{lower order terms}. \]

In particular, at least one copy of H exists.

Theorem 5.3 (Counting Lemma). Under the same conditions as the embedding lemma, the number of copies of H is \[ \left(\prod_{\{i,j\} \in E(H)} d(V_i, V_j) \pm c(\varepsilon, h)\right) \prod_{i=1}^{h} |V_i|, \]

where \(c(\varepsilon, h) \to 0\) as \(\varepsilon \to 0\) for fixed h.

5.3 The Graph Removal Lemma

One of the most important consequences of the regularity lemma is the graph removal lemma, which has far-reaching applications.

Theorem 5.4 (Graph Removal Lemma — Ruzsa-Szemerédi, 1978; general version Erdős-Frankl-Rödl, Alon et al.). For every graph H on h vertices and every \(\varepsilon > 0\), there exists \(\delta > 0\) such that: if a graph G on n vertices contains at most \(\delta n^h\) copies of H, then G can be made H-free by removing at most \(\varepsilon n^2\) edges.

Proof Sketch. Apply the regularity lemma to G with an appropriate \(\varepsilon'\) (depending on \(\varepsilon\) and H). Remove all edges within the exceptional set, all edges in irregular pairs, and all edges in regular pairs of density below a threshold d. This removes at most \(\varepsilon n^2 / 2\) edges. The remaining graph is "clean" — it consists of regular pairs of density at least d. If G still contains a copy of H, the counting lemma implies there are at least \(c \cdot n^h\) copies, contradicting the assumption that there are at most \(\delta n^h\) for small enough \(\delta\). Hence the cleaned graph is H-free, and we removed at most \(\varepsilon n^2\) edges in total.

Corollary 5.5 (Triangle Removal Lemma). For every \(\varepsilon > 0\), there exists \(\delta > 0\) such that any graph on n vertices with at most \(\delta n^3\) triangles can be made triangle-free by removing at most \(\varepsilon n^2\) edges.

Remark. The triangle removal lemma immediately implies the Roth's theorem: every subset of \([n]\) of positive density contains a 3-term arithmetic progression. The connection goes through the triangle removal lemma applied to a carefully constructed tripartite graph encoding the arithmetic structure.

5.4 Ramsey–Turán Problems

The interplay between Ramsey theory and Turán-type questions leads to the Ramsey-Turán theory.

Definition. The Ramsey-Turán number \(\mathrm{RT}(n, H, f(n))\) is the maximum number of edges in a graph on n vertices that contains no copy of H and has independence number at most \(f(n)\).

Theorem 5.6 (Szemerédi, 1972; Erdős-Sós, 1970). \[ \mathrm{RT}(n, K_4, o(n)) = \left(\frac{1}{8} + o(1)\right)n^2. \]

This contrasts with \(\operatorname{ex}(n, K_4) = (1/3 + o(1))\binom{n}{2}\) from Turán’s theorem, showing that the additional independence number constraint dramatically changes the extremal number.

5.5 The Sparse Regularity Lemma

The classical regularity lemma applies only to dense graphs (with \(\Theta(n^2)\) edges). Extending it to sparse graphs requires significant new ideas.

Theorem 5.7 (Kohayakawa, 1997; Rödl, 1997 — Sparse Regularity Lemma). There is an analogue of the regularity lemma for subgraphs of the random graph \(G(n,p)\), where density is measured relative to \(p\). Specifically, for every \(\varepsilon > 0\) and integer \(m_0\), if G is a subgraph of \(G(n,p)\) with \(p \gg n^{-1/d}\) for appropriate d, then G admits a regular partition in the relative sense.

Remark. The sparse regularity lemma, combined with a sparse counting lemma (proved in full generality by Conlon-Fox-Zhao, 2014, building on work of many others), has become the standard tool for transferring results from dense to sparse settings. It was a key ingredient in the Green-Tao theorem on arithmetic progressions in the primes.

6. Flag Algebras and Modern Methods

The past two decades have seen a revolution in extremal combinatorics through the introduction of analytic and algebraic methods that go far beyond classical combinatorial reasoning. Central among these is Razborov’s flag algebra framework, which has resolved or made progress on numerous long-standing conjectures.

6.1 Graph Limits and Graphons

To motivate flag algebras, we first discuss the theory of graph limits, developed by Lovász and Szegedy (2006).

Definition (Homomorphism Density). For graphs F and G, the homomorphism density is \[ t(F, G) = \frac{|\operatorname{Hom}(F, G)|}{|V(G)|^{|V(F)|}}, \]

where \(\operatorname{Hom}(F, G)\) is the set of graph homomorphisms from F to G. The subgraph density is

\[ p(F, G) = \Pr[\text{random } |V(F)|\text{-subset of } V(G) \text{ induces a copy of } F]. \]

Definition (Graphon). A graphon is a symmetric measurable function \(W: [0,1]^2 \to [0,1]\). A graphon represents the limit of a convergent graph sequence \((G_n)\) if for every fixed graph F, \[ t(F, G_n) \to t(F, W) = \int_{[0,1]^{|V(F)|}} \prod_{\{i,j\} \in E(F)} W(x_i, x_j) \, dx_1 \cdots dx_{|V(F)|}. \]

Theorem 6.1 (Lovász–Szegedy, 2006). Every convergent sequence of graphs has a graphon limit, and conversely, every graphon is the limit of some graph sequence. The space of graphons (up to measure-preserving bijections) is compact.

Example. The Turán graph \(T(n,r)\), as \(n \to \infty\), converges to the graphon \(W(x,y) = 1 - \sum_{i=1}^{r} \mathbf{1}_{[(i-1)/r, i/r]}(x) \cdot \mathbf{1}_{[(i-1)/r, i/r]}(y)\), which is 1 except on \(r\) squares of side \(1/r\) along the diagonal.

6.2 Quasirandomness

The theory of graph quasirandomness, initiated by Thomason (1987) and developed by Chung, Graham, and Wilson (1989), reveals that several seemingly different notions of “random-like” behaviour in graphs are in fact equivalent.

Theorem 6.2 (Chung–Graham–Wilson, 1989). For a sequence of graphs \((G_n)\) with \(|V(G_n)| = n\) and edge density \(e(G_n)/\binom{n}{2} \to p\), the following properties are equivalent:

(P1) Subgraph counts: For every graph H, \(p(H, G_n) \to p^{e(H)}\).

(P2) 4-cycle count: \(p(C_4, G_n) \to p^4\).

(P3) Codegree: For all but \(o(n)\) vertices u, the number of common neighbours of u and v is \((p^2 + o(1))n\) for all but \(o(n)\) vertices v.

(P4) Eigenvalue gap: \(\lambda_1(G_n) = (p + o(1))n\) and \(|\lambda_2(G_n)| = o(n)\), where \(\lambda_1 \geq \lambda_2 \geq \cdots\) are the eigenvalues of the adjacency matrix.

(P5) Discrepancy: For all \(S, T \subseteq V(G_n)\), \(|e(S,T) - p|S||T|| = o(n^2)\).

Remark. The remarkable feature is that the single property (P2) — counting 4-cycles — implies all the others. This means that controlling a single subgraph count forces global random-like behaviour. The theory extends to hypergraphs (Gowers, Rödl et al.), where the situation is much more complex, as multiple non-equivalent notions of quasirandomness arise.

6.3 Razborov’s Flag Algebra Framework

Flag algebras, introduced by Alexander Razborov in 2007, provide a systematic algebraic framework for establishing inequalities between subgraph densities.

Definition (Informal). A flag is a graph together with a labelled subgraph (the "type"). The flag algebra \(\mathcal{A}\) (for a given type) is a commutative algebra whose elements represent linear combinations of graph densities, subject to the relations that hold for all sufficiently large graphs.

The power of flag algebras comes from the following observation: if \(\phi\) is a homomorphism from the flag algebra to \(\mathbb{R}\) (representing the densities in some limit graph), then:

\(\phi\) maps non-negative elements (sums of squares) to non-negative reals.
Proving \(\phi(f) \geq 0\) for all valid \(\phi\) establishes a universal inequality on subgraph densities.
These non-negativity conditions can be verified via semidefinite programming (SDP).

Theorem 6.3 (Razborov, 2010). Among all graphs with edge density \(1/2\), the minimum triangle density is \[ p(K_3, G) \geq \frac{1}{2} - \frac{1}{2}\left(1 - 2 \cdot \frac{e(G)}{\binom{n}{2}}\right)^{3/2}. \]

For edge density exactly \(1/2\), the minimum triangle density is \(1/2 - 1/(2\sqrt{2}) \approx 0.1464\), achieved by a random-like construction. This resolves a conjecture of Erdős on the “pentagon problem” for triangles.

Proof Sketch. One writes the triangle density as an element of the flag algebra, then expresses the difference between \(p(K_3, G)\) and the conjectured lower bound as a sum of squares of flag algebra elements (corresponding to a positive semidefinite matrix). The sum-of-squares decomposition is found numerically via SDP, then verified exactly (rationalizing the coefficients). The extremal graph is identified by analyzing when equality holds.

Remark. Flag algebras have resolved or advanced numerous problems: the minimum number of triangles in a graph of given density, Turán densities for small hypergraphs, the Caccetta-Häggkvist conjecture for small cases, and the Erdős pentagon problem. The method is remarkable for being both computer-assisted and rigorous: the SDP provides a certificate that can be verified by hand (in principle).

6.4 The Container Method

The container method, developed independently by Balogh, Morris, and Samotij (2015) and Saxton and Thomason (2015), is one of the most versatile modern tools in combinatorics.

Theorem 6.4 (Hypergraph Container Lemma — Informal). For a uniform hypergraph H on vertex set V with "not too many" edges, there exists a small collection \(\mathcal{C}\) of "containers" — subsets of V that are "almost independent" (contain few edges) — such that every independent set of H is contained in some container. The number of containers is much smaller than \(2^{|V|}\), and each container is close to the expected size of a random independent set.

Remark. The container method has resolved numerous conjectures. Key applications include:
(1) Counting the number of H-free graphs on n vertices (showing it is \(2^{(1+o(1))\operatorname{ex}(n,H)}\) for many H).
(2) Showing that a random H-free graph is "essentially" a subgraph of an extremal graph.
(3) Bounding the number of sum-free sets, Sidon sets, and other combinatorial structures.
(4) Proving transference results for Ramsey and Turán problems in sparse random settings.

7. Advanced Ramsey Theory

We now turn to deeper results in Ramsey theory, including the Hales-Jewett theorem, Euclidean Ramsey theory, and recent breakthroughs.

7.1 The Hales–Jewett Theorem

The Hales-Jewett theorem is in many ways the most fundamental result in Ramsey theory, as many other results (van der Waerden, Szemerédi, etc.) can be derived from it.

Definition (Combinatorial Line). Let \([k]^n\) denote the set of words of length n over the alphabet \([k] = \{1, 2, \ldots, k\}\). A combinatorial line is a set of k words \(\{w_1, \ldots, w_k\} \subseteq [k]^n\) obtained by choosing a non-empty set of "active" coordinates \(S \subseteq [n]\) and fixing the inactive coordinates: for each \(j \in [k]\), the word \(w_j\) has the value j at all active coordinates and the same fixed value at all inactive coordinates.

Example. In \(\{1,2,3\}^4\), with active coordinates \(\{1,3\}\) and inactive values \((*, 2, *, 1)\), the combinatorial line is \(\{(1,2,1,1), (2,2,2,1), (3,2,3,1)\}\).

Theorem 7.1 (Hales–Jewett, 1963). For every positive integer k (alphabet size) and every positive integer r (number of colours), there exists an integer \(\mathrm{HJ}(k, r)\) such that for every \(n \geq \mathrm{HJ}(k, r)\), every r-colouring of \([k]^n\) contains a monochromatic combinatorial line.

Remark. Van der Waerden's theorem follows from Hales-Jewett by projecting: the map \(w \mapsto \sum_{i=1}^n w_i\) sends \([k]^n\) to \(\{n, n+1, \ldots, kn\}\), and a combinatorial line maps to a k-term arithmetic progression. Thus the Hales-Jewett theorem is strictly stronger.

We now present Shelah’s proof, which gives primitive recursive bounds.

Proof Sketch (Shelah, 1988). Shelah's proof gives a much better bound than the original Hales-Jewett argument. The key innovation is a "colour focusing" technique combined with a product argument. The proof proceeds by double induction on the number of colours r and the alphabet size k.

Base case: For \(k = 1\), the result is trivial. For \(r = 1\), any single word forms a (trivial) monochromatic line.

Inductive step: Assume the theorem for \((k, r-1)\) and for \((k-1, r)\). Shelah's argument constructs a high-dimensional cube \([k]^N\) by iterating the following:

(1) Partition \([k]^N\) into "slices" based on the first group of coordinates. By the inductive hypothesis for fewer colours, within each slice we can find large monochromatic combinatorial subspaces.

(2) The "colour focusing" step: by the Hales-Jewett theorem for a smaller alphabet (induction on k), the patterns of colours across slices must contain a monochromatic structure.

(3) Combining these gives a monochromatic line in the full cube. The dimension N grows as an iterated exponential (primitive recursive), a dramatic improvement over the original proof which gave Ackermann-type bounds.

7.2 The Graham–Rothschild Theorem

The Graham-Rothschild theorem generalizes the Hales-Jewett theorem from lines to higher-dimensional combinatorial subspaces.

Definition. A combinatorial m-subspace of \([k]^n\) is a set of \(k^m\) words parameterized by \([k]^m\), obtained by choosing m disjoint non-empty sets of active coordinates \(S_1, \ldots, S_m \subseteq [n]\) and fixing the remaining coordinates. For a parameter word \((a_1, \ldots, a_m) \in [k]^m\), the corresponding word in \([k]^n\) has value \(a_i\) at all coordinates in \(S_i\) and the fixed values elsewhere.

Theorem 7.2 (Graham–Rothschild, 1971). For all positive integers \(k, m, r\), there exists \(N\) such that every r-colouring of the combinatorial m-subspaces of \([k]^N\) contains a monochromatic combinatorial \((m+1)\)-subspace. More generally, every r-colouring of the m-parameter words contains a monochromatic \((m+p)\)-parameter word for any given p.

7.3 Euclidean Ramsey Theory

Euclidean Ramsey theory studies which geometric configurations must appear monochromatically in any finite colouring of \(\mathbb{R}^d\).

Definition. A finite set \(X \subseteq \mathbb{R}^d\) is Ramsey if for every positive integer r, there exists a dimension D such that every r-colouring of \(\mathbb{R}^D\) contains a monochromatic congruent copy of X.

Theorem 7.3 (Erdős–Graham–Montgomery–Rothschild–Spencer–Straus, 1973). Every finite set that can be embedded on a sphere is Ramsey. In particular, all simplices (including equilateral triangles, regular tetrahedra, etc.) are Ramsey. Conversely, any Ramsey set must be embeddable on a sphere.

Remark. The characterization of Ramsey sets in Euclidean space was completed by Frankl and Rödl (1990) and Kříž (1991). A set is Ramsey if and only if it is "spherical" — it can be placed on the surface of a sphere. The proof of the sufficient condition uses the partition calculus and the product Ramsey theorem applied to high-dimensional lattices.

7.4 Folkman’s Theorem and Partition Regularity

Theorem 7.4 (Folkman, 1970; Rado-Folkman-Sanders). For every finite colouring of the positive integers and every \(k\), there exists a finite set \(B\) such that all subset sums of B (including each element of B itself) up to size k are monochromatic. Equivalently, there exists a monochromatic set \(\{x_1, \ldots, x_m\}\) such that all sums of at most k distinct elements have the same colour.

Remark. Folkman's theorem can be derived from the Hales-Jewett theorem. The finite sums version (where all non-empty subset sums are monochromatic) is the Folkman-Rado-Sanders theorem, also known as the finite sums theorem or Hindman's theorem in its infinite version.

7.5 The Ramsey Multiplicity Problem

Beyond the existence of monochromatic subgraphs, one can ask: how many monochromatic copies must there be?

Definition. The Ramsey multiplicity \(M(H)\) of a graph H is the minimum, over all 2-colourings of \(K_n\), of the number of monochromatic copies of H, divided by \(n^{|V(H)|}\), in the limit as \(n \to \infty\).

Theorem 7.5 (Goodman, 1959). Every 2-colouring of \(K_n\) contains at least \[ \frac{1}{4}\binom{n}{3}(1 + o(1)) \]

monochromatic triangles. Equivalently, \(M(K_3) = 1/4\) of the expected count under a random colouring. The extremal colouring is (close to) a random colouring or a balanced “quasi-random” colouring.

Remark. Whether \(M(K_4) = 1/33\) (the value predicted by a random colouring) was a long-standing question. Thomason (1989) showed that the random colouring is not optimal for \(K_4\), disproving the "common graph" conjecture of Erdős. The minimum Ramsey multiplicity for \(K_4\) is now known to be achieved by non-quasirandom colourings, a surprising finding established using flag algebras.

7.6 Graph Ramsey Numbers for Sparse Graphs

For sparse graphs (e.g., bounded-degree graphs, trees, cycles), the Ramsey numbers are much smaller than for cliques.

Theorem 7.6 (Chvátal–Rödl–Szemerédi–Trotter, 1983). There exists a constant C depending only on \(\Delta\) such that for every graph H on n vertices with maximum degree at most \(\Delta\), \[ R(H, H) \leq Cn. \]

That is, bounded-degree graphs have linear Ramsey numbers.

Proof Sketch. The proof uses the regularity lemma. Apply the regularity lemma to a 2-coloured \(K_N\), obtaining a regular partition. Since the partition has a bounded number of parts and one colour class has density at least \(1/2\), the embedding lemma guarantees a copy of the bounded-degree graph H in that colour class, provided \(N\) is large enough relative to n.

Theorem 7.7. For cycles, \(R(C_n, C_n) = 2n - 1\) for odd n (Faudree-Schelp, 1974; Rosta, 1973) and \(R(C_n, C_n) = \frac{3n}{2} - 1\) for even n (Rosta, 1973; Faudree-Schelp, 1974). For trees, \(R(T_n, T_n) \leq Cn\) with \(C\) depending on the maximum degree of the tree.

7.7 Recent Advances: The Campos–Griffiths–Morris–Sahasrabudhe Breakthrough

We return to the diagonal Ramsey number, the most famous problem in the field.

Theorem 7.8 (Campos–Griffiths–Morris–Sahasrabudhe, 2023). \[ R(k,k) \leq (4 - \varepsilon)^k \]

for some absolute constant \(\varepsilon > 0\) and all sufficiently large k.

The proof introduces a new approach: the “book algorithm.” We describe the high-level strategy.

Proof Sketch. The classical Erdős-Szekeres proof of \(R(k,k) \leq 4^k\) proceeds by the greedy algorithm: pick a vertex v, and step to either its red or blue neighbourhood, whichever is larger. This gives the recurrence \(R(k,k) \leq 2R(k-1,k) \leq 2 \cdot 2R(k-1,k-1)\), yielding the \(4^k\) bound.

Campos et al. modify this algorithm. They track not just the size of the red and blue neighbourhoods, but also a "book" — a collection of vertices that are connected to the current vertex set by many edges of both colours. At each step, the algorithm either:
(a) Steps to a large monochromatic neighbourhood as before, or
(b) Uses the book to find a large set where the bipartite density between it and the remaining vertices is "balanced" (close to 1/2 in both colours).

The key insight is that when the bipartite graph between the current set and the book is unbalanced, one of the two colours has density bounded away from 1/2, and in this case the stepping algorithm performs strictly better than the classical one. The saving is then quantified to give the \((4 - \varepsilon)^k\) bound.

Remark. This result represents the first exponential improvement to the upper bound for diagonal Ramsey numbers since Erdős and Szekeres in 1935. The value of \(\varepsilon\) in the original paper is very small, but it marks a qualitative breakthrough. The lower bound \(R(k,k) \geq (\sqrt{2} + o(1))^k\) (due to Spencer, 1975, improving Erdős's 1947 bound by a factor of 2) remains the best known.

7.8 Canonical Ramsey Theorems

Not every colouring yields a monochromatic structure, but with enough colours, a more nuanced conclusion holds.

Theorem 7.9 (Erdős–Rado Canonical Ramsey Theorem, 1950). For every positive integer k, there exists an integer \(n\) such that for every colouring (with any number of colours) of the k-element subsets of \([n]\), there exists a set \(S \subseteq [n]\) of size \(n\) such that the colouring restricted to \(\binom{S}{k}\) is canonical: the colour of a k-set \(\{a_1, \ldots, a_k\}\) (with \(a_1 < \cdots < a_k\)) depends only on some fixed subset of the positions \(\{1, \ldots, k\}\) — that is, there exists \(I \subseteq [k]\) such that two k-sets receive the same colour if and only if they agree on the coordinates indexed by I.

Remark. The four canonical types for edges (\(k = 2\)) on an infinite set are:
(1) Monochromatic (\(I = \emptyset\)): all edges the same colour.
(2) Lexicographic on the first element (\(I = \{1\}\)): edges \(\{a,b\}\) and \(\{a,c\}\) (with \(a < b, c\)) have the same colour.
(3) Lexicographic on the second element (\(I = \{2\}\)).
(4) Rainbow (\(I = \{1,2\}\)): all edges have distinct colours.
This gives a complete structural description of arbitrary edge colourings of large complete graphs.

7.9 Ramsey Theory and Infinite Combinatorics

We briefly touch on infinite Ramsey theory, which connects combinatorics to set theory and logic.

Theorem 7.10 (Infinite Ramsey Theorem). For any positive integer k and any finite colouring of the k-element subsets of an infinite set X, there exists an infinite monochromatic set \(Y \subseteq X\).

Proof. We prove this by induction on k. For \(k = 1\), this is the infinite pigeonhole principle: if \(X\) is infinite and finitely coloured, some colour class is infinite.

For the inductive step, let \(c: \binom{X}{k} \to [r]\) be a colouring. Pick any \(x_1 \in X\). The colouring c induces an \(r\)-colouring of the \((k-1)\)-element subsets of \(X \setminus \{x_1\}\) via \(S \mapsto c(\{x_1\} \cup S)\). By induction, there exists an infinite set \(X_1 \subseteq X \setminus \{x_1\}\) that is monochromatic for this induced colouring; let its colour be \(c_1\).

Now pick \(x_2 \in X_1\), and repeat: obtain an infinite monochromatic set \(X_2 \subseteq X_1 \setminus \{x_2\}\) with colour \(c_2\) for the colouring induced by \(x_2\). Continuing, we get a sequence \(x_1, x_2, x_3, \ldots\) and colours \(c_1, c_2, \ldots \in [r]\). By pigeonhole, some colour c appears infinitely often. The corresponding subsequence \(\{x_{i_1}, x_{i_2}, \ldots\}\) is monochromatic: any k-set \(\{x_{i_{j_1}}, \ldots, x_{i_{j_k}}\}\) (with \(j_1 < \cdots < j_k\)) has colour \(c_{i_{j_1}} = c\) because \(\{x_{i_{j_2}}, \ldots, x_{i_{j_k}}\} \subseteq X_{i_{j_1}}\).

Theorem 7.11 (Ramsey Cardinals — Brief Mention). In set theory, a cardinal \(\kappa\) is Ramsey if \(\kappa \to (\kappa)^{<\omega}_2\) — i.e., every 2-colouring of the finite subsets of \(\kappa\) admits an infinite homogeneous set. The existence of Ramsey cardinals cannot be proved in ZFC; it is a large cardinal axiom strictly stronger than the existence of measurable cardinals. The connection between Ramsey-type partition properties and large cardinals is a central theme in set theory.

7.10 Ramsey Theory and Model Theory

We briefly mention a beautiful connection between Ramsey theory and model theory.

Remark. The Nešetřil-Rödl theorem (1977) establishes that the class of finite ordered graphs has the Ramsey property: for every pair of finite ordered graphs \(A \subseteq B\), there exists a finite ordered graph C such that every 2-colouring of the copies of A in C contains a copy of B all of whose copies of A are monochromatic. This connects to the Kechris-Pestov-Todorčević correspondence (2005), which relates the Ramsey property of a Fraïssé class to the extreme amenability of its automorphism group. This provides a deep link between combinatorics, topological dynamics, and model theory that has become highly active in recent years.

Summary of Key Results

We conclude by collecting the main bounds and results discussed in these notes.

Key Extremal Numbers and Ramsey Bounds.

Turán-type:
\(\operatorname{ex}(n, K_{r+1}) = t(n,r) = \left(1 - \frac{1}{r}\right)\frac{n^2}{2} + O(n)\)
\(\operatorname{ex}(n, H) = \left(1 - \frac{1}{\chi(H)-1}\right)\frac{n^2}{2} + o(n^2)\) for \(\chi(H) \geq 3\) (Erdős–Stone–Simonovits)
\(\operatorname{ex}(n, K_{s,t}) = O(n^{2 - 1/s})\) (Kővári–Sós–Turán)

Set-theoretic:
Maximum antichain in \(2^{[n]}\): \(\binom{n}{\lfloor n/2 \rfloor}\) (Sperner)
Maximum intersecting family in \(\binom{[n]}{k}\): \(\binom{n-1}{k-1}\) for \(n \geq 2k\) (Erdős–Ko–Rado)
Bollobás set-pairs: \(m \leq \binom{a+b}{a}\)
VC dimension d family: \(|\mathcal{F}| \leq \sum_{i=0}^d \binom{n}{i}\) (Sauer–Shelah)

Ramsey numbers:
\(R(s,t) \leq \binom{s+t-2}{s-1}\) (Erdős–Szekeres)
\(R(k,k) > 2^{k/2}\) (Erdős, probabilistic method)
\(R(k,k) \leq (4 - \varepsilon)^k\) (Campos–Griffiths–Morris–Sahasrabudhe, 2023)
\(R(3,t) = \Theta(t^2/\log t)\) (Kim, Shearer)

The interplay between extremal combinatorics and Ramsey theory — through tools like the regularity lemma, the probabilistic method, algebraic techniques, flag algebras, and the container method — makes this one of the most dynamic and interconnected areas of modern mathematics. The subject continues to produce surprising connections and breakthrough results, as evidenced by the recent resolution of long-standing conjectures on diagonal Ramsey numbers, the sunflower conjecture, and union-closed sets.