CO 740: Additive Combinatorics

Estimated study time: 2 hr 20 min


These notes synthesize material from T. Tao and V.H. Vu’s Additive Combinatorics, Y. Zhao’s Graph Theory and Additive Combinatorics, M.B. Nathanson’s Additive Number Theory, and B. Green’s lecture notes on additive combinatorics, enriched with material from MIT 18.225 (Y. Zhao) and T. Gowers’s expository articles.


Chapter 1: Additive Structure in Abelian Groups

Additive combinatorics is, at its heart, the study of how algebraic and combinatorial structure interact in abelian groups. The most fundamental objects of study are sumsets: given subsets \(A\) and \(B\) of an abelian group \(G\), we form their sumset and ask how its size relates to the sizes of the constituent sets. This seemingly innocent question leads to a rich and deep theory connecting combinatorics, number theory, harmonic analysis, and ergodic theory. In this opening chapter, we lay the foundations by establishing the classical results on sumset growth — the Cauchy-Davenport theorem, Kneser’s theorem, and the Plünnecke-Ruzsa inequality — and we introduce the Ruzsa distance, a quasi-metric that captures the notion of additive closeness between sets.

1.1 Sumsets and Basic Definitions

Let \(G\) be an abelian group (written additively) and let \(A, B \subseteq G\) be non-empty finite subsets. The sumset of \(A\) and \(B\) is \[ A + B = \{a + b : a \in A, \, b \in B\}. \] More generally, for a positive integer \(k\), the \(k\)-fold sumset is \[ kA = \underbrace{A + A + \cdots + A}_{k \text{ times}} = \{a_1 + a_2 + \cdots + a_k : a_i \in A\}. \] The difference set is \[ A - B = \{a - b : a \in A, \, b \in B\}. \]
For any non-empty finite \(A\) and \(B\), we have the trivial bounds \[ \max(|A|, |B|) \le |A + B| \le |A| \cdot |B|. \]

The lower bound is achieved when one of the sets is a singleton, and the upper bound is achieved when the addition map is injective — that is, when there are no additive relations between elements of \(A\) and \(B\). The central question of sumset theory is: what structural conditions force \(|A+B|\) to be small relative to \(|A|\) and \(|B|\)?

Example 1.1. Let \(A = \{0, 1, 2, \ldots, n-1\}\) be an arithmetic progression in \(\mathbb{Z}\). Then \[ A + A = \{0, 1, 2, \ldots, 2n-2\}, \] so \(|A + A| = 2|A| - 1\). More generally, if \(A = \{a, a+d, a+2d, \ldots, a+(n-1)d\}\) is an arithmetic progression with common difference \(d\), then \(A + A = \{2a, 2a+d, \ldots, 2a + 2(n-1)d\}\) and \(|A+A| = 2|A| - 1\).
Example 1.2. Let \(A = \{0, 1, 2, 5, 11\} \subseteq \mathbb{Z}\). Then \[ A + A = \{0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 16, 22\}, \] so \(|A+A| = 14\) while \(|A| = 5\). The doubling constant is \(|A+A|/|A| = 14/5 = 2.8\).
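These computations are easy to reproduce by machine. A minimal Python sketch (the helper `sumset` is our own, not from any library) verifying Examples 1.1 and 1.2:

```python
def sumset(A, B):
    """Return the sumset A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}

# Example 1.1: an arithmetic progression has the smallest possible doubling.
AP = set(range(10))
assert len(sumset(AP, AP)) == 2 * len(AP) - 1  # |A + A| = 2|A| - 1

# Example 1.2: an unstructured set has a larger sumset.
A = {0, 1, 2, 5, 11}
assert len(sumset(A, A)) == 14
print(len(sumset(A, A)) / len(A))  # doubling constant 14/5 = 2.8
```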

These examples illustrate a fundamental principle: sets with additive structure (like arithmetic progressions) have small sumsets, while “random” or “unstructured” sets tend to have larger sumsets. Making this principle precise is one of the great achievements of additive combinatorics.

1.2 The Cauchy-Davenport Theorem

The first non-trivial result on sumsets is the Cauchy-Davenport theorem, which provides a sharp lower bound for the size of sumsets in \(\mathbb{Z}/p\mathbb{Z}\). This result was originally proved by Cauchy in 1813 and independently rediscovered by Davenport in 1935.

Theorem 1.3 (Cauchy-Davenport). Let \(p\) be a prime and let \(A, B\) be non-empty subsets of \(\mathbb{Z}/p\mathbb{Z}\). Then \[ |A + B| \ge \min(p, \, |A| + |B| - 1). \]

Before giving the proof, let us note that this bound is tight: if \(A\) and \(B\) are arithmetic progressions with the same common difference, say \(A = \{0, 1, \ldots, a-1\}\) and \(B = \{0, 1, \ldots, b-1\}\) in \(\mathbb{Z}/p\mathbb{Z}\) with \(a + b - 1 \le p\), then \(A + B = \{0, 1, \ldots, a+b-2\}\), which has exactly \(a + b - 1\) elements. The theorem says that arithmetic progressions are, in a sense, the extremal sets for sumset growth in \(\mathbb{Z}/p\mathbb{Z}\).
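Before proving the theorem, one can gain confidence by checking it exhaustively for a small prime. A brute-force sketch (helper names are ours):

```python
from itertools import combinations

def sumset_mod(A, B, n):
    """Sumset of A and B inside Z/nZ."""
    return {(a + b) % n for a in A for b in B}

# Check |A + B| >= min(p, |A| + |B| - 1) over all non-empty A, B in Z/5Z.
p = 5
nonempty_subsets = [set(c) for r in range(1, p + 1)
                    for c in combinations(range(p), r)]
for A in nonempty_subsets:
    for B in nonempty_subsets:
        assert len(sumset_mod(A, B, p)) >= min(p, len(A) + len(B) - 1)
print("Cauchy-Davenport verified for all pairs of subsets of Z/5Z")
```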

Proof. We use the Dyson transform (often called the \(e\)-transform). If \(A + B = \mathbb{Z}/p\mathbb{Z}\), the result is immediate since then \(|A + B| = p\). So assume \(A + B \neq \mathbb{Z}/p\mathbb{Z}\).

We proceed by strong induction on \(|B|\). The base case \(|B| = 1\) is trivial, since \(|A + B| = |A| = |A| + 1 - 1\).

For the inductive step, suppose \(|B| \ge 2\). Given \(e \in G\), define the \(e\)-transform of the pair \((A, B)\) by

\[ A_e = A \cup (B + e), \qquad B_e = B \cap (A - e). \]

We verify:

  1. \(A_e + B_e \subseteq A + B\). Let \(x \in A_e\) and \(y \in B_e\). If \(x \in A\), then \(y \in B\) (since \(B_e \subseteq B\)), so \(x + y \in A + B\). If \(x \in B + e\), say \(x = b_1 + e\) for some \(b_1 \in B\), then since \(y \in A - e\), we have \(y + e \in A\), so \(x + y = b_1 + (y + e) \in B + A = A + B\). ■
  2. \(|A_e| + |B_e| = |A| + |B|\). By inclusion-exclusion, \(|A_e| = |A \cup (B+e)| = |A| + |B+e| - |A \cap (B+e)| = |A| + |B| - |A \cap (B+e)|\). Also \(|B_e| = |B \cap (A - e)| = |A \cap (B+e)|\) (since \(b \in B \cap (A-e) \Leftrightarrow b \in B\) and \(b + e \in A \Leftrightarrow b + e \in A \cap (B+e)\)). Hence \(|A_e| + |B_e| = |A| + |B|\). ■

Now suppose \(|B| \ge 2\) and \(A + B \neq \mathbb{Z}/p\mathbb{Z}\). We claim \(e\) can be chosen so that \(\emptyset \neq B_e \subsetneq B\). First, for any \(e = a - b\) with \(a \in A\) and \(b \in B\), we have \(b \in B \cap (A - e)\), so \(B_e \neq \emptyset\). Suppose for contradiction that \(B_e = B\) for every such \(e\). Then \(B \subseteq A - e\), i.e., \(B + e \subseteq A\), for every \(e \in A - B\); taking the union over all such \(e\) gives \(A + (B - B) \subseteq A\). Iterating, \(A\) is invariant under translation by every element of the subgroup \(H\) generated by \(B - B\). Since \(|B| \ge 2\), \(H\) is non-trivial, and since \(p\) is prime this forces \(H = \mathbb{Z}/p\mathbb{Z}\), hence \(A = \mathbb{Z}/p\mathbb{Z}\), contradicting \(A + B \neq \mathbb{Z}/p\mathbb{Z}\). So fix \(e\) with \(\emptyset \neq B_e \subsetneq B\). By the induction hypothesis applied to \((A_e, B_e)\) (legitimate since \(|B_e| < |B|\); and since \(A_e + B_e \subseteq A + B \neq \mathbb{Z}/p\mathbb{Z}\), the minimum in the inductive bound is \(|A_e| + |B_e| - 1\)), together with properties 1 and 2:

\[ |A + B| \ge |A_e + B_e| \ge |A_e| + |B_e| - 1 = |A| + |B| - 1. \]

This completes the induction. ■

Remark 1.4. The primality of \(p\) is essential. In \(\mathbb{Z}/n\mathbb{Z}\) for composite \(n\), the Cauchy-Davenport bound can fail: for example, the even numbers in \(\mathbb{Z}/6\mathbb{Z}\) form a set \(A = \{0, 2, 4\}\) with \(|A + A| = |A| = 3 < 2 \cdot 3 - 1 = 5\). This is because \(A\) is a coset of the subgroup \(2\mathbb{Z}/6\mathbb{Z}\). Kneser's theorem generalizes Cauchy-Davenport to handle exactly this phenomenon.

1.3 Kneser’s Theorem

When we move beyond the prime-order setting, sumsets can be unexpectedly small due to the presence of non-trivial subgroups. Kneser’s theorem, proved by Martin Kneser in 1953, provides the correct generalization of Cauchy-Davenport by identifying the precise obstruction.

Let \(G\) be an abelian group and \(S \subseteq G\). The period (or stabilizer) of \(S\) is the subgroup \[ \mathrm{Stab}(S) = \{g \in G : g + S = S\}. \] A set \(S\) is called periodic if \(\mathrm{Stab}(S) \neq \{0\}\), and aperiodic otherwise.
Theorem 1.5 (Kneser, 1953). Let \(G\) be an abelian group and let \(A, B \subseteq G\) be non-empty finite subsets. Let \(H = \mathrm{Stab}(A+B)\). Then \[ |A + B| \ge |A + H| + |B + H| - |H|. \]
Remark 1.6. When \(H = \{0\}\) (i.e., \(A+B\) is aperiodic), Kneser's theorem gives \(|A+B| \ge |A| + |B| - 1\), recovering the Cauchy-Davenport bound. When \(G = \mathbb{Z}/p\mathbb{Z}\), the only subgroups are \(\{0\}\) and \(G\) itself, so either \(A+B = G\) (and the bound is trivially satisfied) or \(H = \{0\}\) and we recover Cauchy-Davenport. The power of Kneser's theorem lies in its applicability to arbitrary abelian groups.
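The stabilizer is easy to compute directly in a small cyclic group. A Python sketch (helper names are ours) checking Kneser's bound on the example from Remark 1.4:

```python
def stabilizer(S, n):
    """Stab(S) = {g in Z/nZ : g + S = S}."""
    return {g for g in range(n) if {(g + s) % n for s in S} == S}

def kneser_gap(A, B, n):
    """|A+B| - (|A+H| + |B+H| - |H|) with H = Stab(A+B); Kneser says this is >= 0."""
    S = {(a + b) % n for a in A for b in B}
    H = stabilizer(S, n)
    AH = {(a + h) % n for a in A for h in H}
    BH = {(b + h) % n for b in B for h in H}
    return len(S) - (len(AH) + len(BH) - len(H))

# A = {0,2,4} in Z/6Z: A+A is periodic with stabilizer {0,2,4},
# which is exactly why the Cauchy-Davenport bound fails there.
assert stabilizer({0, 2, 4}, 6) == {0, 2, 4}
assert kneser_gap({0, 2, 4}, {0, 2, 4}, 6) >= 0
```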

1.4 Small Doubling and the Plünnecke-Ruzsa Inequality

We now turn to one of the most fundamental results in additive combinatorics: the Plünnecke-Ruzsa inequality. The question is: if a set \(A\) has small doubling — meaning \(|A + A| \le K|A|\) for some constant \(K\) — what can we say about the iterated sumsets \(kA - lA\)?

Plünnecke’s original proof (1970) used ideas from graph theory related to Menger’s theorem. Ruzsa subsequently simplified and extended the result. The cleanest modern proof is due to George Petridis (2012), using a short and elegant inductive argument that avoids graph theory entirely; we present it in full.

Let \(A\) be a finite subset of an abelian group. The doubling constant of \(A\) is \[ \sigma(A) = \frac{|A + A|}{|A|}. \] We say \(A\) has small doubling if \(\sigma(A) \le K\) for some constant \(K\) (independent of \(|A|\)).
Theorem 1.7 (Plünnecke-Ruzsa Inequality). Let \(A\) be a finite subset of an abelian group with \(|A + A| \le K|A|\). Then for all non-negative integers \(k\) and \(l\), \[ |kA - lA| \le K^{k+l} |A|. \]

The proof of this result proceeds in two stages. First, we establish the Ruzsa triangle inequality, which is a key tool in its own right. Then we prove the Plünnecke-Ruzsa inequality using Petridis’s method.

Theorem 1.8 (Ruzsa Triangle Inequality). Let \(A, B, C\) be finite non-empty subsets of an abelian group. Then \[ |A| \cdot |B - C| \le |A - B| \cdot |A - C|. \] Similarly, in the sum form, \[ |A| \cdot |B - C| \le |A + B| \cdot |A + C|. \]
Proof. We construct an injection. For each \(x \in B - C\), choose a representation \(x = b_x - c_x\) with \(b_x \in B\) and \(c_x \in C\). Define the map \[ \varphi: A \times (B - C) \to (A - B) \times (A - C) \] by \(\varphi(a, x) = (a - b_x, \, a - c_x)\). This map is injective: if \(\varphi(a, x) = \varphi(a', x')\), then \(a - b_x = a' - b_{x'}\) and \(a - c_x = a' - c_{x'}\). Subtracting, \(b_x - c_x = b_{x'} - c_{x'}\), i.e., \(x = x'\). Then the first equation gives \(a = a'\). Since \(\varphi\) is injective, \[ |A| \cdot |B - C| \le |A - B| \cdot |A - C|. \] The proof of the second inequality is analogous, using the map \((a, x) \mapsto (a + b_x, a + c_x)\). ■
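Both forms can be stress-tested numerically. A random check in the integers (purely illustrative; helper names are ours):

```python
import random

def diff(A, B):
    return {a - b for a in A for b in B}

def add(A, B):
    return {a + b for a in A for b in B}

random.seed(0)
for _ in range(200):
    A, B, C = (set(random.sample(range(30), 5)) for _ in range(3))
    # difference form: |A||B - C| <= |A - B||A - C|
    assert len(A) * len(diff(B, C)) <= len(diff(A, B)) * len(diff(A, C))
    # sum form: |A||B - C| <= |A + B||A + C|
    assert len(A) * len(diff(B, C)) <= len(add(A, B)) * len(add(A, C))
```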

Now we turn to the heart of the matter: the Plünnecke-Ruzsa inequality. The key lemma, due to Petridis, identifies a subset with especially good sumset growth.

Lemma 1.9 (Petridis, 2012). Let \(A\) and \(B\) be finite non-empty subsets of an abelian group with \(|A + B| \le K|A|\). Then there exists a non-empty subset \(X \subseteq A\) such that for every finite set \(C\), \[ |X + B + C| \le K|X + C|. \]
Proof. Choose \(X \subseteq A\) non-empty so as to minimize the ratio \(|X + B|/|X|\), and set \(K' = |X + B|/|X|\). Note \(K' \le K\), since the candidate \(X = A\) already has ratio \(|A + B|/|A| \le K\).

We prove \(|X + B + C| \le K'|X + C|\) for every finite set \(C\) by induction on \(|C|\). For the base case \(C = \{c\}\), we have \(|X + B + \{c\}| = |X + B| = K'|X| = K'|X + \{c\}|\), so the bound holds with equality.

For the inductive step, write \(C = C' \cup \{c\}\) with \(c \notin C'\) and \(C'\) non-empty, and define \[ Y = \{x \in X : x + B + c \subseteq X + B + C'\}. \] Since \[ X + B + C = (X + B + C') \cup (X + B + c) \] and the part of \(X + B + c\) already contained in \(X + B + C'\) includes \(Y + B + c\), we obtain \[ |X + B + C| \le |X + B + C'| + |X + B| - |Y + B|. \] On the other hand, if \(x \in X\) satisfies \(x + c \in X + C'\), then \(x + B + c \subseteq X + B + C'\), so \(x \in Y\); hence \((X + C') \cap (X + c) \subseteq Y + c\), and \[ |X + C| = |X + C'| + |X| - |(X + C') \cap (X + c)| \ge |X + C'| + |X| - |Y|. \] By the induction hypothesis \(|X + B + C'| \le K'|X + C'|\), by definition \(|X + B| = K'|X|\), and by the minimality of the ratio \(|X + B|/|X|\) we have \(|Y + B| \ge K'|Y|\) whenever \(Y \neq \emptyset\). Combining, \[ |X + B + C| \le K'|X + C'| + K'|X| - K'|Y| = K'(|X + C'| + |X| - |Y|) \le K'|X + C|. \]

If \(Y = \emptyset\), the inequality \(|Y + B| \ge K'|Y|\) holds trivially (both sides vanish), so no appeal to the minimality of \(X\) is needed in that case. ■

Proof of the Plünnecke-Ruzsa Inequality. We apply Petridis's lemma with \(B = A\): there is a non-empty \(X \subseteq A\) such that \(|X + A + C| \le K|X + C|\) for every finite set \(C\). Taking \(C = A\) gives \(|X + 2A| \le K|X + A| \le K^2|X|\), and by induction \(|X + kA| \le K^k|X|\) for all \(k \ge 0\). By the Ruzsa triangle inequality (Theorem 1.8, sum form, with \(X\) in the role of \(A\)), \[ |X| \cdot |kA - lA| \le |X + kA| \cdot |X + lA| \le K^k|X| \cdot K^l|X| = K^{k+l}|X|^2, \]

which gives \(|kA - lA| \le K^{k+l}|X| \le K^{k+l}|A|\). ■
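A small numerical illustration of the inequality (the set and the exponents below are arbitrary choices of ours):

```python
def repeated_sumset(A, k, l):
    """Compute kA - lA by iterated sumset/difference-set operations."""
    S = {0}
    for _ in range(k):
        S = {s + a for s in S for a in A}
    for _ in range(l):
        S = {s - a for s in S for a in A}
    return S

A = set(range(0, 20, 2)) | {31}                # an AP with one outlier
K = len(repeated_sumset(A, 2, 0)) / len(A)     # doubling constant |A+A|/|A|
for k, l in [(2, 1), (2, 2), (3, 1)]:
    assert len(repeated_sumset(A, k, l)) <= K ** (k + l) * len(A)
```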

Remark 1.10. The elegance of Petridis's proof lies in its avoidance of graph-theoretic machinery. Plünnecke's original argument used directed layered graphs and a version of Menger's theorem. Ruzsa's simplification retained the graph-theoretic framework. Petridis showed that the entire argument can be reduced to a clever induction, making it accessible to a much broader audience.

1.5 Ruzsa’s Covering Lemma

Ruzsa’s covering lemma is a simple but powerful result that allows us to “cover” one set with translates of another. It is a key tool in the proof of Freiman’s theorem.

Lemma 1.11 (Ruzsa's Covering Lemma). Let \(A\) and \(B\) be finite non-empty subsets of an abelian group with \(|A + B| \le K|B|\). Then there exists a set \(T \subseteq A\) with \(|T| \le K\) such that \[ A \subseteq T + B - B. \]
Proof. Choose \(T \subseteq A\) to be a maximal subset such that the translates \(\{t + B : t \in T\}\) are pairwise disjoint. By maximality, for every \(a \in A\), the set \(a + B\) intersects \(t + B\) for some \(t \in T\). That is, there exist \(b_1, b_2 \in B\) with \(a + b_1 = t + b_2\), giving \(a = t + b_2 - b_1 \in t + B - B \subseteq T + B - B\). Moreover, since the translates \(t + B\) are pairwise disjoint and all contained in \(A + B\), \[ |T| \cdot |B| = \left|\bigsqcup_{t \in T} (t + B)\right| \le |A + B| \le K|B|, \]

giving \(|T| \le K\). ■
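The maximal-disjoint-translates construction in the proof is effectively a greedy algorithm. A sketch in Python (our own helper names):

```python
def ruzsa_cover(A, B):
    """Greedily pick T ⊆ A whose translates t + B are pairwise disjoint."""
    T = []
    for a in sorted(A):
        aB = {a + b for b in B}
        if all(aB.isdisjoint({t + b for b in B}) for t in T):
            T.append(a)
    return T

A, B = set(range(20)), set(range(10))
T = ruzsa_cover(A, B)
K = len({a + b for a in A for b in B}) / len(B)   # here |A+B| <= K|B|
BmB = {b1 - b2 for b1 in B for b2 in B}
assert len(T) <= K                                # |T| <= K
assert A <= {t + d for t in T for d in BmB}       # A ⊆ T + B - B
```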

1.6 The Ruzsa Distance

Ruzsa introduced a notion of “distance” between finite sets that captures how additively similar they are. While not a true metric (it does not satisfy the identity of indiscernibles), it is a quasi-metric and has proved invaluable.

Let \(A\) and \(B\) be finite non-empty subsets of an abelian group. The Ruzsa distance between \(A\) and \(B\) is \[ d(A, B) = \log \frac{|A - B|}{\sqrt{|A| \cdot |B|}}. \]
Proposition 1.12. The Ruzsa distance satisfies the triangle inequality: for finite non-empty sets \(A, B, C\), \[ d(A, C) \le d(A, B) + d(B, C). \] Moreover, \(d(A, B) \ge 0\) for all \(A, B\), and \(d(A, A) = 0\) if and only if \(A\) is a coset of a finite subgroup.
Proof. The triangle inequality follows directly from the Ruzsa triangle inequality (Theorem 1.8). We have \[ |B| \cdot |A - C| \le |A - B| \cdot |B - C|, \] so \[ \frac{|A - C|}{\sqrt{|A| \cdot |C|}} \le \frac{|A - B|}{\sqrt{|A| \cdot |B|}} \cdot \frac{|B - C|}{\sqrt{|B| \cdot |C|}}, \] and taking logarithms gives \(d(A, C) \le d(A, B) + d(B, C)\).

For non-negativity, note that \(|A - B| \ge \max(|A|, |B|) \ge \sqrt{|A| \cdot |B|}\), so \(d(A, B) \ge 0\).

Finally, \(d(A, A) = \log(|A - A|/|A|)\). This equals zero if and only if \(|A - A| = |A|\), which happens if and only if \(A\) is a coset of a finite subgroup. ■
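A quick numerical check of the quasi-metric properties (illustrative only; `ruzsa_dist` is our own helper):

```python
import math
import random

def ruzsa_dist(A, B):
    """d(A,B) = log(|A - B| / sqrt(|A||B|))."""
    diff = {a - b for a in A for b in B}
    return math.log(len(diff) / math.sqrt(len(A) * len(B)))

random.seed(1)
for _ in range(100):
    A, B, C = (set(random.sample(range(40), 6)) for _ in range(3))
    assert ruzsa_dist(A, B) >= 0
    # triangle inequality, with a tiny tolerance for floating-point error
    assert ruzsa_dist(A, C) <= ruzsa_dist(A, B) + ruzsa_dist(B, C) + 1e-12

# In Z the only finite subgroup is {0}, so d(A, A) = 0 forces |A| = 1.
assert ruzsa_dist({7}, {7}) == 0.0
```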

Remark 1.13. The Ruzsa distance provides a quantitative framework for the philosophy that "additive structure is preserved under bounded operations." If \(d(A, B)\) is small, then \(A\) and \(B\) are "additively close" — they have similar additive structure. This perspective permeates modern additive combinatorics.

Chapter 2: Freiman’s Theorem and Inverse Sumset Theory

If Chapter 1 was about “direct” problems — bounding \(|A+B|\) from below — then this chapter is about the corresponding “inverse” problem: if \(|A+A|\) is small, what can we say about the structure of \(A\)? The main result is Freiman’s theorem, which asserts that sets with small doubling are efficiently contained in generalized arithmetic progressions. This is one of the crowning achievements of additive combinatorics, and its recent quantitative improvements — culminating in the resolution of the polynomial Freiman-Ruzsa conjecture — represent some of the most exciting developments in modern mathematics.

2.1 Generalized Arithmetic Progressions

A generalized arithmetic progression (GAP) of rank \(d\) in an abelian group \(G\) is a set of the form \[ P = \{a + x_1 v_1 + x_2 v_2 + \cdots + x_d v_d : 0 \le x_i \le L_i - 1 \text{ for all } i\} \] where \(a, v_1, \ldots, v_d \in G\) and \(L_1, \ldots, L_d\) are positive integers. The element \(a\) is the base point, the \(v_i\) are basis vectors, and the \(L_i\) are dimensions. The volume of \(P\) is \(\mathrm{vol}(P) = \prod_{i=1}^d L_i\).

A GAP is proper if \(|P| = \mathrm{vol}(P)\), i.e., there are no unexpected collisions among the sums.

Example 2.1. In \(\mathbb{Z}\), the set \(P = \{x_1 + 10 x_2 : 0 \le x_1 \le 4, \, 0 \le x_2 \le 3\} = \{0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34\}\) is a proper GAP of rank 2, volume 20, and size 20.
Remark 2.2. A GAP of rank 1 is simply an ordinary arithmetic progression. In \(\mathbb{Z}\), a proper GAP of rank \(d\) is a \(d\)-dimensional "box" in a lattice. The doubling constant of a proper GAP of rank \(d\) is at most \(2^d\), since \(P + P\) is a GAP with each dimension doubled (minus 1).
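A quick computational check of Example 2.1 and of the doubling bound in Remark 2.2 (the `gap` helper is ours):

```python
from itertools import product

def gap(a, vs, Ls):
    """The GAP {a + sum x_i v_i : 0 <= x_i <= L_i - 1} inside Z."""
    return {a + sum(x * v for x, v in zip(xs, vs))
            for xs in product(*(range(L) for L in Ls))}

P = gap(0, [1, 10], [5, 4])        # the rank-2 GAP of Example 2.1
assert len(P) == 5 * 4             # proper: |P| = vol(P)
PP = {p + q for p in P for q in P}
assert len(PP) <= 2 ** 2 * len(P)  # doubling of a proper rank-d GAP is at most 2^d
```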

2.2 Freiman Homomorphisms

To state Freiman’s theorem precisely, we need the notion of a Freiman homomorphism, which captures the relevant notion of “structure-preserving map” for additive combinatorics.

Let \(s \ge 1\) be an integer. A map \(\phi: A \to G'\) between subsets of abelian groups is a Freiman \(s\)-homomorphism if whenever \[ a_1 + a_2 + \cdots + a_s = a_1' + a_2' + \cdots + a_s' \qquad (a_i, a_i' \in A), \] we have \[ \phi(a_1) + \phi(a_2) + \cdots + \phi(a_s) = \phi(a_1') + \phi(a_2') + \cdots + \phi(a_s'). \] If \(\phi\) is bijective and both \(\phi\) and \(\phi^{-1}\) are Freiman \(s\)-homomorphisms, then \(\phi\) is a Freiman \(s\)-isomorphism.
Remark 2.3. A Freiman 2-homomorphism preserves the additive structure that is relevant for sumsets: it preserves the set of representations of elements of \(A + A\). Group homomorphisms are Freiman \(s\)-homomorphisms for all \(s\), but the converse is far from true — Freiman homomorphisms are much more flexible, as they need only be defined on \(A\) rather than the entire group.

2.3 Freiman’s Theorem

Theorem 2.4 (Freiman, 1973). Let \(A\) be a finite set of integers with \(|A+A| \le K|A|\). Then \(A\) is contained in a generalized arithmetic progression \(P\) of rank at most \(d(K)\) and volume at most \(c(K)|A|\), where \(d(K)\) and \(c(K)\) depend only on \(K\). (Green and Ruzsa later extended the theorem to arbitrary abelian groups, with GAPs replaced by coset progressions.)

Freiman’s original proof (in the integers) was extremely involved. Ruzsa provided a dramatically simplified proof in 1994, which remains the basis for most modern treatments. We sketch the main ideas.

Proof sketch (following Ruzsa). The proof proceeds in several stages.

Step 1: Reduction to a bounded torsion-free setting. Using the Ruzsa covering lemma, one can “model” \(A\) — up to Freiman isomorphism — as a subset of \(\mathbb{Z}^d\) for some bounded \(d\). The idea is that if \(|A + A| \le K|A|\), then by the Plünnecke-Ruzsa inequality, \(|2A - 2A| \le K^4|A|\), and by the covering lemma, \(2A - 2A\) can be covered by \(K^4\) translates of \(A - A\). This gives enough “linear algebra” structure to embed into a lattice.

Step 2: Bogolyubov’s lemma. One shows that \(2A - 2A\) contains a large structured subset. Specifically, Bogolyubov’s lemma (which we will prove via Fourier analysis in Chapter 3) asserts that if \(|A + A| \le K|A|\) in \(\mathbb{Z}/N\mathbb{Z}\), then \(2A - 2A\) contains a subspace (or a GAP) of bounded codimension.

Step 3: Geometry of numbers. Using Minkowski’s second theorem or John’s theorem, one refines the GAP to be proper and of controlled rank and volume.

The bounds obtained by Ruzsa’s method give \(d(K) \le CK^2 \log K\) and \(c(K) \le e^{CK^2 \log^2 K}\) for some absolute constant \(C\). ■

2.4 The Polynomial Freiman-Ruzsa Conjecture

The quantitative bounds in Freiman’s theorem were long suspected to be far from optimal. The polynomial Freiman-Ruzsa (PFR) conjecture, formulated independently by several researchers, posited that the bounds should be polynomial in \(K\).

Conjecture 2.5 (Polynomial Freiman-Ruzsa, now a theorem). Let \(A \subseteq \mathbb{F}_2^n\) with \(|A + A| \le K|A|\). Then \(A\) can be covered by at most \(K^{O(1)}\) cosets of a subspace \(H\) with \(|H| \le |A|\).

This conjecture was resolved in a landmark paper by Gowers, Green, Manners, and Tao in 2023. Their proof works with an entropic analogue of the Ruzsa distance for random variables and proceeds by a careful induction that decreases this entropy functional. The result has far-reaching consequences throughout additive combinatorics, including polynomial bounds in many results that previously had exponential-type dependencies on the doubling constant.

Remark 2.6. The resolution of the PFR conjecture is arguably the most significant breakthrough in additive combinatorics since the Green-Tao theorem. It demonstrates that the additive structure detected by small doubling is, in a very quantitative sense, as strong as one could hope: sets with doubling constant \(K\) are "essentially" cosets of subgroups, up to polynomial losses.

2.5 Bogolyubov’s Lemma

Bogolyubov’s lemma, proved by Nikolai Bogolyubov in 1939, is a cornerstone result that connects sumset theory with structural information. It asserts that iterated sumsets of dense sets contain large structured subsets.

Theorem 2.7 (Bogolyubov's Lemma). Let \(A \subseteq \mathbb{Z}/N\mathbb{Z}\) with \(|A| = \alpha N\). Then \(2A - 2A\) contains a Bohr set \[ B(\Gamma, \rho) = \{x \in \mathbb{Z}/N\mathbb{Z} : \|rx/N\| \le \rho \text{ for all } r \in \Gamma\} \] with \(|\Gamma| \le 1/\alpha^2\) and \(\rho\) an absolute constant, where \(\|\cdot\|\) denotes the distance to the nearest integer. (For \(N\) prime, \(\mathbb{Z}/N\mathbb{Z}\) has no non-trivial proper subgroups, so a Bohr set is the correct substitute for the subgroup one might naively hope for.)

In the \(\mathbb{F}_p^n\) setting the statement is cleaner: if \(A \subseteq \mathbb{F}_p^n\) with \(|A| = \alpha p^n\), then \(2A - 2A\) contains a subspace of codimension at most \(2/\alpha^2\) (Theorem 3.8 below).

We will prove this via Fourier analysis in Chapter 3. The key insight is that the iterated sumset \(2A - 2A\) is the support of the convolution \(1_A * 1_A * 1_{-A} * 1_{-A}\), whose Fourier transform is \(|\widehat{1_A}|^4 \ge 0\), and Parseval’s identity forces this to be large enough to contain a structured subset.

2.6 Sanders’ Quasi-Polynomial Bounds

Sanders obtained a dramatic quantitative improvement of Bogolyubov’s lemma, giving quasi-polynomial bounds.

Theorem 2.8 (Sanders, 2012). Let \(A \subseteq \mathbb{F}_p^n\) with \(|A + A| \le K|A|\). Then \(2A - 2A\) contains a subspace of codimension at most \(C(\log K)^4\), where \(C\) is an absolute constant.

Sanders’ proof introduces the technique of “iterated Bogolyubov” and uses Chang’s theorem (discussed in Chapter 3) in a crucial way. This result was a key precursor to the Kelley-Meka breakthrough on Roth’s theorem.


Chapter 3: Fourier Analysis on Finite Abelian Groups

Fourier analysis is one of the most powerful tools in additive combinatorics. The idea of decomposing functions on finite groups into characters dates back to the work of Gauss, Dirichlet, and others in number theory, but its systematic application to combinatorial problems is a more recent development, pioneered by Roth (1953) and developed extensively by Gowers, Green, Tao, and others. In this chapter, we develop the Fourier-analytic toolkit and apply it to prove Bogolyubov’s lemma and Chang’s theorem.

3.1 Characters and the Fourier Transform

Let \(G\) be a finite abelian group. A character of \(G\) is a group homomorphism \(\chi: G \to \mathbb{C}^*\). Since \(G\) is finite, the image of any character lies in the unit circle \(S^1\). The set of all characters forms a group \(\widehat{G}\) under pointwise multiplication, called the dual group (or Pontryagin dual) of \(G\). For finite abelian groups, \(\widehat{G} \cong G\).
Example 3.1. For \(G = \mathbb{Z}/N\mathbb{Z}\), the characters are \(\chi_r(x) = e^{2\pi i r x / N}\) for \(r \in \mathbb{Z}/N\mathbb{Z}\). The dual group \(\widehat{G}\) is naturally identified with \(\mathbb{Z}/N\mathbb{Z}\) itself.

For \(G = \mathbb{F}_p^n\), the characters are \(\chi_t(x) = \omega^{t \cdot x}\) for \(t \in \mathbb{F}_p^n\), where \(\omega = e^{2\pi i/p}\) and \(t \cdot x = \sum_{i=1}^n t_i x_i\) is the standard inner product.

The orthogonality relations for characters are fundamental.

Proposition 3.2 (Orthogonality Relations). For characters \(\chi, \psi\) of a finite abelian group \(G\), \[ \mathbb{E}_{x \in G} \, \chi(x) \overline{\psi(x)} = \begin{cases} 1 & \text{if } \chi = \psi, \\ 0 & \text{if } \chi \neq \psi. \end{cases} \] Here \(\mathbb{E}_{x \in G}\) denotes the average \(\frac{1}{|G|} \sum_{x \in G}\).
Let \(f: G \to \mathbb{C}\) be a function on a finite abelian group. The Fourier transform of \(f\) is the function \(\widehat{f}: \widehat{G} \to \mathbb{C}\) defined by \[ \widehat{f}(\chi) = \mathbb{E}_{x \in G} \, f(x) \overline{\chi(x)} = \frac{1}{|G|} \sum_{x \in G} f(x) \overline{\chi(x)}. \] The Fourier inversion formula recovers \(f\) from \(\widehat{f}\): \[ f(x) = \sum_{\chi \in \widehat{G}} \widehat{f}(\chi) \chi(x). \]

For concreteness, we specialize to the two main settings.

Fourier transform on \(\mathbb{Z}/N\mathbb{Z}\). For \(f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}\), the Fourier transform is \[ \widehat{f}(r) = \mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} \, f(x) \, e^{-2\pi i r x / N} = \frac{1}{N} \sum_{x=0}^{N-1} f(x) \, e^{-2\pi i r x / N}. \] Fourier transform on \(\mathbb{F}_p^n\). For \(f: \mathbb{F}_p^n \to \mathbb{C}\), the Fourier transform is \[ \widehat{f}(t) = \mathbb{E}_{x \in \mathbb{F}_p^n} \, f(x) \, \omega^{-t \cdot x} = \frac{1}{p^n} \sum_{x \in \mathbb{F}_p^n} f(x) \, \omega^{-t \cdot x}. \]

3.2 Parseval’s Identity and Basic Inequalities

Theorem 3.3 (Parseval's Identity). For any function \(f: G \to \mathbb{C}\) on a finite abelian group, \[ \sum_{\chi \in \widehat{G}} |\widehat{f}(\chi)|^2 = \mathbb{E}_{x \in G} |f(x)|^2. \]
Proof. By Fourier inversion, \[ \sum_{\chi} |\widehat{f}(\chi)|^2 = \sum_{\chi} \widehat{f}(\chi) \overline{\widehat{f}(\chi)} = \sum_{\chi} \widehat{f}(\chi) \left( \mathbb{E}_x \overline{f(x)} \chi(x) \right) = \mathbb{E}_x \overline{f(x)} \sum_{\chi} \widehat{f}(\chi) \chi(x) = \mathbb{E}_x \overline{f(x)} f(x) = \mathbb{E}_x |f(x)|^2. \] ■
Corollary 3.4. If \(A \subseteq G\) and \(f = 1_A\) is the indicator function of \(A\), then \[ \sum_{\chi \in \widehat{G}} |\widehat{1_A}(\chi)|^2 = \frac{|A|}{|G|} = \alpha, \] where \(\alpha = |A|/|G|\) is the density of \(A\).
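These identities are easy to verify numerically on \(\mathbb{Z}/N\mathbb{Z}\) with the normalization above. A plain-Python DFT sketch, no external libraries (the example set is an arbitrary choice of ours):

```python
import cmath

def fourier(f, N):
    """hat f(r) = (1/N) * sum_x f(x) exp(-2*pi*i*r*x/N)."""
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * r * x / N)
                for x in range(N)) / N
            for r in range(N)]

N = 12
A = {0, 1, 3, 7}
alpha = len(A) / N
f = [1.0 if x in A else 0.0 for x in range(N)]
fhat = fourier(f, N)

assert abs(fhat[0] - alpha) < 1e-9                         # trivial coefficient = density
assert abs(sum(abs(c) ** 2 for c in fhat) - alpha) < 1e-9  # Parseval (Corollary 3.4)
```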

3.3 Convolutions and Sumsets

The connection between Fourier analysis and sumsets comes through convolutions.

The convolution of functions \(f, g: G \to \mathbb{C}\) is \[ (f * g)(x) = \mathbb{E}_{y \in G} \, f(y) \, g(x - y). \]
Proposition 3.5. The Fourier transform converts convolution to pointwise multiplication: \[ \widehat{f * g}(\chi) = \widehat{f}(\chi) \cdot \widehat{g}(\chi). \]
Proof. \[ \widehat{f * g}(\chi) = \mathbb{E}_x (f * g)(x) \overline{\chi(x)} = \mathbb{E}_x \mathbb{E}_y f(y) g(x - y) \overline{\chi(x)}. \] Substituting \(z = x - y\), \[ = \mathbb{E}_y f(y) \overline{\chi(y)} \cdot \mathbb{E}_z g(z) \overline{\chi(z)} = \widehat{f}(\chi) \cdot \widehat{g}(\chi). \] ■
Remark 3.6. The support of \(1_A * 1_B\) is precisely the sumset \(A + B\) (or more precisely, \((1_A * 1_B)(x) > 0\) if and only if \(x \in A + B\)). Thus Fourier analysis on the convolution gives information about sumsets. Similarly, \(1_A * 1_A * 1_{-A} * 1_{-A}\) is supported on \(2A - 2A\), and its Fourier transform is \(|\widehat{1_A}|^4 \ge 0\). This non-negativity is the key to Bogolyubov's argument.
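A direct check that the support of \(1_A * 1_B\) is exactly \(A + B\) (sets chosen arbitrarily; `convolve` is our own helper):

```python
def convolve(f, g, N):
    """(f * g)(x) = E_y f(y) g(x - y) on Z/NZ."""
    return [sum(f[y] * g[(x - y) % N] for y in range(N)) / N
            for x in range(N)]

N = 15
A, B = {1, 2, 4}, {0, 5, 6}
fA = [1.0 if x in A else 0.0 for x in range(N)]
fB = [1.0 if x in B else 0.0 for x in range(N)]
conv = convolve(fA, fB, N)

support = {x for x in range(N) if conv[x] > 1e-12}
assert support == {(a + b) % N for a in A for b in B}
```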

3.4 Large Fourier Coefficients and Additive Structure

A central theme in additive combinatorics is that sets with additive structure have large Fourier coefficients, and conversely, the existence of large Fourier coefficients implies additive structure.

For \(A \subseteq G\) with \(|A| = \alpha |G|\), the large spectrum at level \(\delta > 0\) is \[ \mathrm{Spec}_\delta(A) = \{\chi \in \widehat{G} : |\widehat{1_A}(\chi)| \ge \delta \alpha\}. \]
Proposition 3.7. Suppose \(|A + A| \le K|A|\) and \(\alpha \le 1/(2K)\). Then \(A\) has a large non-trivial Fourier coefficient: there exists a character \(\chi \neq \chi_0\) with \(|\widehat{1_A}(\chi)| \ge \alpha / \sqrt{2K}\).
Proof. Consider the convolution \(f = 1_A * 1_{-A}\), which is supported on \(A - A\) and satisfies \(\widehat{f}(\chi) = |\widehat{1_A}(\chi)|^2\). By Parseval applied to \(f\), \[ \sum_\chi |\widehat{1_A}(\chi)|^4 = \mathbb{E}_x |f(x)|^2 = \frac{E(A,A)}{|G|^3}, \] where \(E(A,A) = |\{(a_1,a_2,a_3,a_4) \in A^4 : a_1 + a_2 = a_3 + a_4\}|\) is the additive energy. By the Cauchy-Schwarz inequality, \(E(A,A) \ge |A|^4/|A+A| \ge |A|^3/K\), so \[ \sum_\chi |\widehat{1_A}(\chi)|^4 \ge \frac{|A|^3}{K|G|^3} = \frac{\alpha^3}{K}. \] The trivial character contributes \(|\widehat{1_A}(\chi_0)|^4 = \alpha^4\), hence \[ \sum_{\chi \neq \chi_0} |\widehat{1_A}(\chi)|^4 \ge \frac{\alpha^3}{K} - \alpha^4 = \alpha^3\left(\frac{1}{K} - \alpha\right) \ge \frac{\alpha^3}{2K}, \] using the hypothesis \(\alpha \le 1/(2K)\) in the last step. On the other hand, \[ \sum_{\chi \neq \chi_0} |\widehat{1_A}(\chi)|^4 \le \left(\max_{\chi \neq \chi_0} |\widehat{1_A}(\chi)|^2\right) \sum_\chi |\widehat{1_A}(\chi)|^2 = \left(\max_{\chi \neq \chi_0} |\widehat{1_A}(\chi)|^2\right) \cdot \alpha. \] Comparing the two bounds gives \(\max_{\chi \neq \chi_0} |\widehat{1_A}(\chi)|^2 \ge \alpha^2/(2K)\), i.e., \(\max_{\chi \neq \chi_0} |\widehat{1_A}(\chi)| \ge \alpha/\sqrt{2K}\). ■
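The Cauchy-Schwarz step \(E(A,A) \ge |A|^4/|A+A|\) is easy to check numerically (example sets are arbitrary choices of ours):

```python
from collections import Counter

def additive_energy(A):
    """E(A,A) = #{(a1,a2,a3,a4) in A^4 : a1 + a2 = a3 + a4}."""
    reps = Counter(a1 + a2 for a1 in A for a2 in A)
    return sum(c * c for c in reps.values())

A = {0, 1, 2, 5, 11}
AA = {a + b for a in A for b in A}
assert additive_energy(A) >= len(A) ** 4 / len(AA)

# An AP of the same size has strictly higher energy than this unstructured set.
AP = set(range(5))
assert additive_energy(AP) > additive_energy(A)
```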

3.5 Bogolyubov’s Argument via Fourier Analysis

We now prove Bogolyubov’s lemma using Fourier analysis, in the clean \(\mathbb{F}_p^n\) setting.

Theorem 3.8 (Bogolyubov). Let \(A \subseteq \mathbb{F}_p^n\) with \(|A| = \alpha p^n\). Then \(2A - 2A\) contains a subspace of codimension at most \(\lceil 2/\alpha^2 \rceil\).
Proof. Let \(f = 1_A * 1_A * 1_{-A} * 1_{-A}\). Then \(\widehat{f}(t) = |\widehat{1_A}(t)|^4\), and \(f(x) > 0\) if and only if \(x \in 2A - 2A\), so it suffices to exhibit a subspace on which \(f\) is strictly positive.

Set \(\delta = \sqrt{\alpha/2}\) and let \[ \Lambda = \mathrm{Spec}_\delta(A) = \{t : |\widehat{1_A}(t)| \ge \delta \alpha\}, \qquad \Lambda^\perp = \{x \in \mathbb{F}_p^n : t \cdot x = 0 \text{ for all } t \in \Lambda\}. \] We claim \(\Lambda^\perp \subseteq 2A - 2A\). By Fourier inversion, \[ f(x) = \sum_t |\widehat{1_A}(t)|^4 \, \omega^{t \cdot x} = \sum_{t \in \Lambda} |\widehat{1_A}(t)|^4 \, \omega^{t \cdot x} + \sum_{t \notin \Lambda} |\widehat{1_A}(t)|^4 \, \omega^{t \cdot x}. \] If \(x \in \Lambda^\perp\), then \(\omega^{t \cdot x} = 1\) for every \(t \in \Lambda\), so the first sum equals \(\sum_{t \in \Lambda} |\widehat{1_A}(t)|^4 \ge |\widehat{1_A}(0)|^4 = \alpha^4\) (note \(0 \in \Lambda\), since \(|\widehat{1_A}(0)| = \alpha\)). The second sum is bounded in absolute value, using Parseval, by \[ \sum_{t \notin \Lambda} |\widehat{1_A}(t)|^4 \le (\delta\alpha)^2 \sum_t |\widehat{1_A}(t)|^2 = \delta^2 \alpha^2 \cdot \alpha = \frac{\alpha^4}{2}. \] Hence \(f(x) \ge \alpha^4 - \alpha^4/2 = \alpha^4/2 > 0\), so \(x \in 2A - 2A\), proving the claim.

Finally, each \(t \in \Lambda\) contributes at least \((\delta\alpha)^2 = \alpha^3/2\) to \(\sum_t |\widehat{1_A}(t)|^2 = \alpha\), so \[ |\Lambda| \le \frac{\alpha}{\alpha^3/2} = \frac{2}{\alpha^2}. \] The subspace \(\Lambda^\perp\) therefore has codimension at most \(|\Lambda| \le \lceil 2/\alpha^2 \rceil\), and \(2A - 2A \supseteq \Lambda^\perp\). ■
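The argument can be checked by machine in a small example over \(\mathbb{F}_2^n\) (where \(-A = A\), so \(2A - 2A = A + A + A + A\)). Everything below is our own sketch, with the dimension and the random set chosen arbitrarily:

```python
from itertools import product
import random

n = 4
V = list(product([0, 1], repeat=n))

def dot(s, t):
    return sum(a * b for a, b in zip(s, t)) % 2

def fcoeff(A, t):
    """hat 1_A(t) = E_x 1_A(x) (-1)^{t.x} over F_2^n."""
    return sum((-1) ** dot(x, t) for x in A) / len(V)

def plus(S, T):
    return {tuple((a + b) % 2 for a, b in zip(s, t)) for s in S for t in T}

random.seed(3)
A = set(random.sample(V, 6))
alpha = len(A) / len(V)

delta = (alpha / 2) ** 0.5
Lam = [t for t in V if abs(fcoeff(A, t)) >= delta * alpha]
Lam_perp = {x for x in V if all(dot(x, t) == 0 for t in Lam)}

S = plus(plus(A, A), plus(A, A))   # 2A - 2A (signs are irrelevant over F_2)
assert Lam_perp <= S               # the annihilator subspace sits inside 2A - 2A
```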

3.6 Chang’s Theorem

Chang’s theorem provides a much more refined structural result about the large spectrum.

Theorem 3.9 (Chang, 2002). Let \(A \subseteq \mathbb{F}_p^n\) with \(|A| = \alpha p^n\), and let \(\delta > 0\). Then the large spectrum \(\mathrm{Spec}_\delta(A)\) is contained in a subspace of dimension at most \(O(\delta^{-2} \log(1/\alpha))\).
Remark 3.10. Chang's theorem is a significant improvement over the trivial bound \(|\mathrm{Spec}_\delta(A)| \le \delta^{-2} \alpha^{-1}\) (which follows from Parseval). The point is that the large spectrum is not only small in cardinality, but is also "low-dimensional" — it is contained in a subspace of small dimension. This structural information is crucial for applications such as Sanders' bounds on Bogolyubov's lemma and the Kelley-Meka result on Roth's theorem.

The proof of Chang’s theorem relies on a clever entropy argument. The key idea is that if \(|\widehat{1_A}(t)|\) is large for many \(t\) that are linearly independent, then \(A\) would have to be “spread out” in too many directions, contradicting its density.
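The trivial Parseval bound from Remark 3.10 can be checked directly. A toy sketch, using the convention \(\mathrm{Spec}_\delta(A) = \{t : |\widehat{1_A}(t)| \ge \delta \alpha\}\) and arbitrary small parameters:

```python
import cmath, itertools, random

p, n = 3, 3
G = list(itertools.product(range(p), repeat=n))
omega = cmath.exp(2j * cmath.pi / p)

def dot(x, t):
    return sum(a * b for a, b in zip(x, t)) % p

random.seed(2)
A = random.sample(G, 9)
alpha = len(A) / len(G)                       # here alpha = 1/3

def fhat(t):
    return sum(omega ** (-dot(x, t)) for x in A) / len(G)

delta = 0.5
spec = [t for t in G if abs(fhat(t)) >= delta * alpha]

# Parseval: sum_t |fhat(t)|^2 = alpha, and each t in Spec contributes
# at least (delta * alpha)^2, so |Spec| <= 1 / (delta^2 * alpha)
assert len(spec) <= 1 / (delta ** 2 * alpha)
```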


Chapter 4: Roth’s Theorem on Arithmetic Progressions

The problem of finding arithmetic progressions in dense subsets of the integers is one of the oldest and most celebrated in combinatorics and number theory. In 1936, Erdős and Turán conjectured that any subset of the natural numbers with positive upper density contains arithmetic progressions of every length. Roth proved the case of three-term progressions in 1953, using Fourier analysis in a landmark paper that opened up entirely new directions in combinatorics. In this chapter, we give a complete proof of Roth’s theorem and discuss the subsequent dramatic improvements.

4.1 Statement and History

For a positive integer \(N\), let \(r_3(N)\) denote the maximum size of a subset of \(\{1, 2, \ldots, N\}\) that contains no three-term arithmetic progression (3-AP). A 3-AP is a triple \((a, a+d, a+2d)\) with \(d > 0\).
Theorem 4.1 (Roth, 1953). \(r_3(N) = o(N)\). That is, any subset of \(\{1, 2, \ldots, N\}\) with no 3-AP has size \(o(N)\).
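For small \(N\), the quantity \(r_3(N)\) can be computed exhaustively; a minimal sketch (the search strategy is a naive one chosen for clarity):

```python
from itertools import combinations

def has_3ap(S):
    # does S contain a, b, c with a + c = 2b and a != c?
    S = set(S)
    return any((a + c) % 2 == 0 and a != c and (a + c) // 2 in S
               for a in S for c in S)

def r3(N):
    # size of the largest 3-AP-free subset of {1, ..., N}, by brute force
    for size in range(N, 0, -1):
        if any(not has_3ap(S) for S in combinations(range(1, N + 1), size)):
            return size
    return 0

assert [r3(N) for N in range(1, 10)] == [1, 2, 2, 3, 4, 4, 4, 4, 5]
```

For instance, \(\{1, 2, 4, 8, 9\}\) witnesses \(r_3(9) = 5\).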

Roth’s original quantitative bound was \(r_3(N) = O(N / \log \log N)\). This has been improved many times:

  • Heath-Brown (1987) and Szemerédi (1990): \(O(N / (\log N)^c)\) for a small constant \(c > 0\).
  • Bourgain (1999): \(O(N (\log \log N / \log N)^{1/2})\), with a further improvement in 2008.
  • Sanders (2011): \(O(N (\log \log N)^5 / \log N)\).
  • Bloom (2016): \(O(N (\log \log N)^4 / \log N)\).
  • Bloom-Sisask (2020): \(O(N / (\log N)^{1+c})\).
  • Kelley-Meka (2023): \(O(N \exp(-c (\log N)^{1/12}))\).

The Kelley-Meka bound is an exponential improvement over all previous results and is remarkably close to the Behrend lower bound.

4.2 Roth’s Theorem in \(\mathbb{F}_3^n\) (Meshulam’s Theorem)

For clarity of exposition, we first prove Roth’s theorem in the model setting of \(\mathbb{F}_3^n\), where the Fourier analysis is cleaner. This version is due to Meshulam (1995).

Theorem 4.2 (Meshulam, 1995). Let \(A \subseteq \mathbb{F}_3^n\) contain no 3-AP (i.e., no triple \(x, y, z\) with \(x + y + z = 0\) and \(x, y, z\) not all equal). Then \(|A| \le C \cdot 3^n / n\) for an absolute constant \(C > 0\).
Proof. We use the density increment strategy. Let \(\alpha = |A|/3^n\) be the density of \(A\) and write \(f = 1_A\), with the convention \(\widehat{f}(t) = \mathbb{E}_x f(x) \omega^{-t \cdot x}\), where \(\omega = e^{2\pi i/3}\). We show that either \(\alpha\) is small, or we can pass to a subspace of \(\mathbb{F}_3^n\) where the density of \(A\) increases.

Step 1: Counting 3-APs via Fourier. Let \(T\) denote the number of solutions to \(x + y + z = 0\) with \(x, y, z \in A\). By Fourier inversion, \(f(x) = \sum_t \widehat{f}(t) \omega^{t \cdot x}\), so \[ T = \sum_{\substack{x, y, z \in \mathbb{F}_3^n \\ x+y+z=0}} f(x) f(y) f(z) = \sum_{t_1, t_2, t_3} \widehat{f}(t_1) \widehat{f}(t_2) \widehat{f}(t_3) \sum_{x+y+z=0} \omega^{t_1 \cdot x + t_2 \cdot y + t_3 \cdot z}. \]

Substituting \(z = -x - y\), the inner sum equals \(\sum_{x, y} \omega^{(t_1 - t_3) \cdot x + (t_2 - t_3) \cdot y} = 3^{2n} \mathbf{1}[t_1 = t_2 = t_3]\), whence \[ T = 3^{2n} \sum_t \widehat{f}(t)^3. \]

We have \(T = 3^{2n} \sum_t \widehat{f}(t)^3\). If \(A\) has no non-trivial 3-AP, then the only solutions to \(x + y + z = 0\) in \(A\) have \(x = y = z\), so \(T = |A| = \alpha \cdot 3^n\).

Step 2: A large Fourier coefficient. Separating the \(t = 0\) term, where \(\widehat{f}(0) = \alpha\), and dividing by \(3^{2n}\): \[ \frac{\alpha}{3^n} = \alpha^3 + \sum_{t \neq 0} \widehat{f}(t)^3, \qquad \text{so} \qquad \sum_{t \neq 0} \widehat{f}(t)^3 = \frac{\alpha}{3^n} - \alpha^3. \]

By the triangle inequality, \[ \alpha^3 - \frac{\alpha}{3^n} \le \left| \sum_{t \neq 0} \widehat{f}(t)^3 \right| \le \max_{t \neq 0} |\widehat{f}(t)| \cdot \sum_{t \neq 0} |\widehat{f}(t)|^2 \le \max_{t \neq 0} |\widehat{f}(t)| \cdot \alpha, \]

using Parseval: \(\sum_t |\widehat{f}(t)|^2 = \mathbb{E}_x |f(x)|^2 = \alpha\) (since \(f = 1_A\)), so \(\sum_{t \neq 0} |\widehat{f}(t)|^2 \le \alpha\).

\[ \alpha^2 - \frac{1}{3^n} \le \max_{t \neq 0} |\widehat{f}(t)|. \]

For \(\alpha > 3/n\) and \(n\) sufficiently large, \(\alpha^2/2 \le \max_{t \neq 0} |\widehat{f}(t)|\), so there exists \(t_0 \neq 0\) with \(|\widehat{f}(t_0)| \ge \alpha^2/2\).

Step 3: Passing to a subspace. Let \(V = t_0^\perp = \{x \in \mathbb{F}_3^n : t_0 \cdot x = 0\}\), a subspace of codimension 1 (so \(|V| = 3^{n-1}\)). The group \(\mathbb{F}_3^n\) decomposes as \(V \sqcup (V + v) \sqcup (V + 2v)\) for some \(v \notin V\). By the pigeonhole principle, at least one coset \(V + jv\) contains at least \(\alpha |V|\) elements of \(A\). In fact, the large Fourier coefficient gives a density increment: the density of \(A\) on the coset \(V + jv\) (for the right \(j\)) is at least \(\alpha + \alpha^2/6 > \alpha\).

More precisely, choose \(v\) with \(t_0 \cdot v = 1\), so that \(t_0 \cdot x = j\) for \(x \in V + jv\), and let \(\alpha_j = |A \cap (V+jv)|/|V|\) for \(j = 0, 1, 2\). Then \(\widehat{f}(t_0) = \frac{1}{3}(\alpha_0 + \alpha_1 \omega^{-1} + \alpha_2 \omega^{-2})\). Since \(1 + \omega^{-1} + \omega^{-2} = 0\), we may subtract \(\alpha\) from each term: \(\widehat{f}(t_0) = \frac{1}{3} \sum_j (\alpha_j - \alpha) \omega^{-j}\), so \(\sum_j |\alpha_j - \alpha| \ge 3 |\widehat{f}(t_0)| \ge 3\alpha^2/2\). Since \(\sum_j (\alpha_j - \alpha) = 0\), the positive deviations account for at least half of this total, so some \(\alpha_j \ge \alpha + \alpha^2/4\).

Step 4: Iteration. Now \(A \cap (V + jv)\), after translating, is a subset of \(\mathbb{F}_3^{n-1}\) with density \(\alpha' \ge \alpha + c\alpha^2\) (for an absolute constant \(c > 0\)) and still no 3-AP, so we may iterate. A naive additive bookkeeping (the density gains \(c\alpha^2\) per step, so there are at most \(1/(c\alpha^2)\) steps) only yields \(\alpha = O(n^{-1/2})\); the right quantity to track is \(1/\alpha\). If \(\alpha' \ge \alpha + c\alpha^2\), then since \(\alpha' \le \alpha(1 + c)\), \[ \frac{1}{\alpha} - \frac{1}{\alpha'} = \frac{\alpha' - \alpha}{\alpha \alpha'} \ge \frac{c\alpha^2}{\alpha^2 (1 + c)} = \frac{c}{1+c}, \] so \(1/\alpha\) decreases by a constant \(c' = c/(1+c)\) at each step. Since \(1/\alpha \ge 1\) always, the iteration must halt within \(1/(c' \alpha_0)\) steps, where \(\alpha_0\) is the initial density; and as long as plenty of dimensions remain, it can only halt because the hypothesis of Step 2 fails. If \(\alpha_0 \ge C/n\) with \(C\) large enough, neither obstruction occurs before the density would exceed 1, a contradiction. Hence \(\alpha_0 \le C/n\), i.e., \(|A| \le C \cdot 3^n/n\). ■
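The Fourier identity of Step 1 can be sanity-checked numerically. A minimal sketch (the dimension and the random set are arbitrary choices): it counts solutions to \(x + y + z = 0\) in \(A\) both directly and as \(3^{2n} \sum_t \widehat{f}(t)^3\).

```python
import cmath, itertools, random

p, n = 3, 3
G = list(itertools.product(range(p), repeat=n))
omega = cmath.exp(2j * cmath.pi / p)
random.seed(0)
A = set(random.sample(G, 10))

def dot(x, t):
    return sum(a * b for a, b in zip(x, t)) % p

# direct count of solutions to x + y + z = 0 with x, y, z in A
T_direct = sum(tuple((-x[i] - y[i]) % p for i in range(n)) in A
               for x in A for y in A)

# Fourier side:  hat f(t) = E_x 1_A(x) omega^{-t.x}
def fhat(t):
    return sum(omega ** (-dot(x, t)) for x in A) / len(G)

T_fourier = len(G) ** 2 * sum(fhat(t) ** 3 for t in G)  # 3^{2n} sum_t fhat^3
assert abs(T_fourier.real - T_direct) < 1e-6
```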

4.3 Roth’s Theorem in the Integers

Theorem 4.3 (Roth). For every \(\delta > 0\), there exists \(N_0(\delta)\) such that for all \(N \ge N_0(\delta)\), every subset of \(\{1, 2, \ldots, N\}\) of size at least \(\delta N\) contains a three-term arithmetic progression.
Proof. The proof follows the same density increment strategy as in \(\mathbb{F}_3^n\), but with some additional technical complications.

Step 1: Embedding in \(\mathbb{Z}/N'\mathbb{Z}\). We embed \(\{1, \ldots, N\}\) into \(\mathbb{Z}/N'\mathbb{Z}\) for a prime \(N' \ge 3N\), so that any 3-AP in \(\mathbb{Z}/N'\mathbb{Z}\) with elements in \(\{1, \ldots, N\}\) is a genuine 3-AP. Set \(A \subseteq \mathbb{Z}/N'\mathbb{Z}\), \(|A| = \alpha N'\) where \(\alpha \ge \delta/3\).

Step 2: Fourier counting. Write \(f = 1_A\) with \(\widehat{f}(r) = \mathbb{E}_{x \in \mathbb{Z}/N'\mathbb{Z}} f(x) e^{-2\pi i r x / N'}\). The number of (ordered, possibly trivial) 3-APs in \(A\) is \[ \Lambda_3(A) = \sum_{a, d \in \mathbb{Z}/N'\mathbb{Z}} 1_A(a) 1_A(a+d) 1_A(a+2d) = (N')^2 \sum_r \widehat{f}(r)^2 \widehat{f}(-2r). \]

If \(A\) has no non-trivial 3-AP, only the \(d = 0\) terms survive, so \(\Lambda_3(A) = |A| = \alpha N'\). Separating the \(r = 0\) term, which equals \(\alpha^3\), and applying the triangle inequality together with Parseval (\(\sum_r |\widehat{f}(r)|^2 = \alpha\); note \(-2r \neq 0\) for \(r \neq 0\) since \(N'\) is an odd prime): \[ \alpha^3 - \frac{\alpha}{N'} \le \left| \sum_{r \neq 0} \widehat{f}(r)^2 \widehat{f}(-2r) \right| \le \sup_{r \neq 0} |\widehat{f}(r)| \cdot \sum_r |\widehat{f}(r)|^2 = \sup_{r \neq 0} |\widehat{f}(r)| \cdot \alpha. \]

So if \(A\) has no non-trivial 3-AP, then \(\sup_{r \neq 0} |\widehat{f}(r)| \ge \alpha^2 - O(1/N') \ge \alpha^2/2\) for \(N'\) large enough.

Step 3: Density increment on a long arithmetic progression. A large Fourier coefficient \(|\widehat{f}(r)| \ge \alpha^2/2\) at some \(r \neq 0\) means that \(A\) correlates with the character \(x \mapsto e^{2\pi i rx/N'}\). Unlike the \(\mathbb{F}_3^n\) case, where we could pass to a subspace, in \(\mathbb{Z}/N'\mathbb{Z}\) we partition into long arithmetic progressions on which this character is nearly constant. By Dirichlet's pigeonhole principle there is a \(q \le \sqrt{N'}\) with \(\|qr/N'\| \le 1/\sqrt{N'}\); partitioning \(\{1, \ldots, N'\}\) into arithmetic progressions of common difference \(q\) and length on the order of \(\sqrt{N'}\) (with constants depending on \(\alpha\)), the character varies negligibly across each progression. On at least one such progression \(P\), the density of \(A\) is at least \(\alpha + c\alpha^2\).

Step 4: Iteration. We now have a dense subset of a long arithmetic progression with no 3-AP and increased density. By a linear change of variables, a subset of an arithmetic progression of length \(M\) and common difference \(d\) is equivalent to a subset of \(\{1, \ldots, M\}\). So we can iterate the argument.

At each step, the density increases from \(\alpha\) to at least \(\alpha + c\alpha^2\), while the length of the ambient progression drops from \(M\) to roughly \(\sqrt{M}\). After \(O(1/\alpha)\) steps the density would exceed 1, a contradiction, provided the progression lengths stay large enough throughout; this holds once \(N \ge \exp(\exp(C/\delta))\). Unwinding this inequality gives \(r_3(N) = O(N/\log\log N)\), Roth's original bound; in particular \(r_3(N) = o(N)\). ■
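The counting identity of Step 2 is easy to verify numerically; a minimal sketch with an arbitrary small prime modulus and random set:

```python
import cmath, random

Np = 31                        # a small prime playing the role of N'
random.seed(3)
A = set(random.sample(range(Np), 8))

def fhat(r):
    # hat f(r) = E_x 1_A(x) e^{-2 pi i r x / N'}
    return sum(cmath.exp(-2j * cmath.pi * r * x / Np) for x in A) / Np

# direct count of (a, d) with a, a+d, a+2d all in A
direct = sum((a in A) and ((a + d) % Np in A) and ((a + 2 * d) % Np in A)
             for a in range(Np) for d in range(Np))

# Fourier side: (N')^2 * sum_r fhat(r)^2 fhat(-2r)
fourier = Np ** 2 * sum(fhat(r) ** 2 * fhat((-2 * r) % Np) for r in range(Np))
assert abs(fourier.real - direct) < 1e-6
```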

4.4 Behrend’s Construction

Behrend’s construction (1946) provides the best known lower bound for \(r_3(N)\), showing that the trivially dense sets without 3-APs can be surprisingly large.

Theorem 4.4 (Behrend, 1946). For all large \(N\), \[ r_3(N) \ge N \exp\left(-C \sqrt{\log N}\right) \] for some absolute constant \(C > 0\).
Proof. The key idea is to use a sphere in high dimensions. Fix integers \(d, k \ge 2\) and encode points of the cube \(\{0, 1, \ldots, d-1\}^k\) as integers in base \(2d\): \[ \phi(x_1, \ldots, x_k) = x_1 + x_2 (2d) + x_3 (2d)^2 + \cdots + x_k (2d)^{k-1}. \] The map \(\phi\) sends the cube injectively into \(\{0, 1, \ldots, (2d)^k - 1\}\). Crucially, \(\phi\) respects 3-APs: each coordinate of \(x + z\) and of \(2y\) is at most \(2(d-1) < 2d\), so no carries occur in base \(2d\), and hence \(\phi(x) + \phi(z) = 2\phi(y)\) in \(\mathbb{Z}\) if and only if \(x + z = 2y\) in \(\mathbb{Z}^k\).

Now consider the spheres \(S_r = \{x \in \{0, \ldots, d-1\}^k : x_1^2 + x_2^2 + \cdots + x_k^2 = r\}\). Each sphere is 3-AP-free as a subset of \(\mathbb{Z}^k\), by strict convexity of \(\|\cdot\|^2\).

More precisely: if \(x + z = 2y\), then \(\|x\|^2 + \|z\|^2 = 2\|y\|^2 + \frac{1}{2}\|x - z\|^2\), so \(\|x\|^2 + \|z\|^2 \ge 2\|y\|^2\) with equality if and only if \(x = z\). Hence if \(x, y, z \in S_r\), then \(2r = \|x\|^2 + \|z\|^2 \ge 2\|y\|^2 = 2r\), so equality holds and \(x = z\), hence \(y = x = z\). Thus \(S_r\) is 3-AP-free in \(\mathbb{Z}^k\).

By pigeonhole, since \(\sum_r |S_r| = d^k\) and \(r\) ranges over \(\{0, 1, \ldots, k(d-1)^2\}\), there exists \(r\) with \(|S_r| \ge d^k / (k(d-1)^2 + 1) \ge d^k / (kd^2)\).

Setting \(N = (2d)^k\), we optimize: choose \(k \approx \sqrt{\log N}\), so that \(d \approx \frac{1}{2} e^{\sqrt{\log N}}\). Then \(d^k = N/2^k\), and \[ |S_r| \ge \frac{d^k}{k d^2} = \frac{N}{2^k \, k \, d^2} \ge N \exp\left(-C \sqrt{\log N}\right) \] for an absolute constant \(C\). The image \(\phi(S_r)\) is then a 3-AP-free subset of \(\{0, \ldots, N-1\}\) of the same size. ■
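The construction is entirely explicit, so it can be run for small parameters. A sketch (the choices \(d = 5\), \(k = 3\) are arbitrary): it builds the largest sphere in the digit cube, maps it into the integers via \(\phi\), and checks 3-AP-freeness.

```python
import itertools

d, k = 5, 3                      # digits in {0,...,4}, dimension 3
base = 2 * d                     # base 2d avoids carries, since 2*(d-1) < 2d

def phi(x):
    return sum(c * base ** i for i, c in enumerate(x))

cube = list(itertools.product(range(d), repeat=k))
spheres = {}                     # group cube points by squared norm
for x in cube:
    spheres.setdefault(sum(c * c for c in x), []).append(x)

r, pts = max(spheres.items(), key=lambda kv: len(kv[1]))
B = sorted(phi(x) for x in pts)  # Behrend set inside {0, ..., base^k - 1}

# verify: no non-trivial 3-AP a + c = 2b within B
S = set(B)
assert not any(a != c and (a + c) % 2 == 0 and (a + c) // 2 in S
               for a in B for c in B)
```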

4.5 The Kelley-Meka Breakthrough

In 2023, Kelley and Meka achieved a stunning breakthrough, proving an exponential improvement to Roth’s theorem.

Theorem 4.5 (Kelley-Meka, 2023). \(r_3(N) \le N \exp(-c (\log N)^{1/12})\) for an absolute constant \(c > 0\).
Remark 4.6. This result is exponential in a power of \(\log N\), nearly matching the Behrend lower bound of \(N \exp(-C\sqrt{\log N})\). The gap between the exponents \(1/12\) and \(1/2\) remains, and closing this gap is an outstanding open problem.

The Kelley-Meka proof builds on ideas of Bloom-Sisask (who first broke the logarithmic barrier) and introduces a novel “spectral boost” technique. The key innovation is a two-step density increment: first, a standard density increment based on large Fourier coefficients; second, a new increment that exploits the structure of the large spectrum (using ideas related to Chang’s theorem and Sanders’ quasi-polynomial bounds). Bloom and Sisask subsequently simplified the proof to give the bound with exponent \(1/9\), and Kelley and Meka later improved their own result.


Chapter 5: Szemerédi’s Regularity Lemma and Applications

Szemerédi’s regularity lemma is one of the most powerful tools in extremal combinatorics and additive combinatorics. It asserts that every large graph can be partitioned into a bounded number of parts such that the bipartite graph between most pairs of parts behaves “pseudo-randomly.” Despite (or perhaps because of) the enormous size of the partition, the regularity lemma has found applications far beyond its original purpose in Szemerédi’s proof of his theorem on arithmetic progressions.

5.1 Regular Pairs and the Regularity Lemma

Let \(G = (V, E)\) be a graph and let \(X, Y \subseteq V\) be disjoint non-empty sets. The edge density between \(X\) and \(Y\) is \[ d(X, Y) = \frac{e(X, Y)}{|X| \cdot |Y|}, \] where \(e(X, Y)\) is the number of edges between \(X\) and \(Y\). The pair \((X, Y)\) is called \(\varepsilon\)-regular if for all \(A \subseteq X\) and \(B \subseteq Y\) with \(|A| \ge \varepsilon |X|\) and \(|B| \ge \varepsilon |Y|\), \[ |d(A, B) - d(X, Y)| \le \varepsilon. \]

In other words, \((X, Y)\) is \(\varepsilon\)-regular if the edge density between any two reasonably large subsets is close to the overall density. This is a combinatorial notion of pseudo-randomness.

Theorem 5.1 (Szemerédi's Regularity Lemma, 1975). For every \(\varepsilon > 0\) and every positive integer \(m\), there exists \(M = M(\varepsilon, m)\) such that the following holds. Every graph \(G = (V, E)\) with \(|V| \ge m\) has a partition \(V = V_0 \sqcup V_1 \sqcup \cdots \sqcup V_k\) with \(m \le k \le M\) such that:
  1. \(|V_0| \le \varepsilon |V|\) (exceptional set),
  2. \(|V_1| = |V_2| = \cdots = |V_k|\) (equal-sized parts),
  3. all but at most \(\varepsilon \binom{k}{2}\) pairs \((V_i, V_j)\) with \(1 \le i < j \le k\) are \(\varepsilon\)-regular.
Proof sketch. The proof proceeds by an "energy increment" argument. Define the energy (or index) of a partition \(\mathcal{P} = \{V_1, \ldots, V_k\}\) to be \[ q(\mathcal{P}) = \sum_{i < j} \frac{|V_i| \cdot |V_j|}{|V|^2} \, d(V_i, V_j)^2. \] This quantity satisfies \(0 \le q(\mathcal{P}) \le 1\).

The key claim is: if the partition \(\mathcal{P}\) is not \(\varepsilon\)-regular (i.e., too many pairs are irregular), then we can refine \(\mathcal{P}\) to a new partition \(\mathcal{P}'\) with \(q(\mathcal{P}') \ge q(\mathcal{P}) + \varepsilon^5\) (or a similar polynomial in \(\varepsilon\)).

Since the energy is bounded above by 1 and increases by at least \(\varepsilon^5\) at each step, the process terminates after at most \(1/\varepsilon^5\) refinements. At each step, the number of parts may increase (each part is subdivided into a bounded number of subparts), leading to the tower-type bound for \(M(\varepsilon, m)\).

To prove the key claim: if \((V_i, V_j)\) is an irregular pair, then by definition there exist \(A \subseteq V_i\) and \(B \subseteq V_j\) with \(|A| \ge \varepsilon |V_i|\), \(|B| \ge \varepsilon |V_j|\), and \(|d(A, B) - d(V_i, V_j)| > \varepsilon\). Refining the partition by splitting \(V_i\) into \(A\) and \(V_i \setminus A\) (and similarly \(V_j\)) increases the energy. By a convexity argument (the energy is the sum of squared densities, and splitting a pair with unequal sub-densities increases the sum of squares), the energy increment is at least \(\varepsilon^5\). ■
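A weaker but fully checkable form of the energy argument is that \(q(\mathcal{P})\) never decreases under refinement (the proof's \(\varepsilon^5\) increment for irregular partitions is a quantitative strengthening of this). A sketch, with an arbitrary random graph and an arbitrary refinement by halving each part:

```python
import random

random.seed(0)
n = 60
V = list(range(n))
# a random graph G(n, 1/2), stored as a set of ordered pairs (i, j), i < j
adj = {(i, j) for i in V for j in V if i < j and random.random() < 0.5}

def density(X, Y):
    e = sum((min(x, y), max(x, y)) in adj for x in X for y in Y)
    return e / (len(X) * len(Y))

def energy(parts):
    # q(P) = sum_{i<j} |Vi||Vj|/n^2 * d(Vi, Vj)^2
    return sum(len(X) * len(Y) / n ** 2 * density(X, Y) ** 2
               for i, X in enumerate(parts) for Y in parts[i + 1:])

P = [V[:20], V[20:40], V[40:]]
Q = [half for X in P for half in (X[:10], X[10:])]   # refine by halving

assert energy(Q) >= energy(P) - 1e-12   # energy is monotone under refinement
```

Monotonicity follows from the same convexity (defect Cauchy-Schwarz) step used in the proof sketch.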

Remark 5.2. The tower-type bound for \(M(\varepsilon, m)\) is unavoidable: Gowers (1997) showed that any function \(M(\varepsilon)\) that works in the regularity lemma must grow as a tower of exponentials of height \(\Omega(1/\varepsilon^2)\). This is one of the most striking results in extremal combinatorics — it shows that the regularity lemma is inherently "inefficient" in a precise quantitative sense.

5.2 The Counting Lemma

The regularity lemma becomes truly powerful when combined with the counting lemma, which asserts that the number of copies of a fixed graph in a regular partition is approximately what one would expect in a random graph.

Theorem 5.3 (Counting Lemma). Let \(H\) be a graph with vertex set \(\{1, \ldots, h\}\) and edge set \(E(H)\). Let \(V_1, \ldots, V_h\) be vertex sets in a graph \(G\), each of size \(m\), such that \((V_i, V_j)\) is \(\varepsilon\)-regular with density \(d_{ij}\) for each \(\{i, j\} \in E(H)\). If \(\varepsilon \le d_{ij}/(2|E(H)|)\) for all edges, then the number of labeled copies of \(H\) with vertex \(i\) in \(V_i\) is \[ \left(1 \pm |E(H)| \varepsilon / d_{\min}\right) \prod_{\{i,j\} \in E(H)} d_{ij} \cdot m^h, \] where \(d_{\min} = \min_{\{i,j\} \in E(H)} d_{ij}\). In particular, it is approximately \(\prod_{\{i,j\} \in E(H)} d_{ij} \cdot m^h\), matching the expected count in a random graph.

5.3 The Triangle Removal Lemma

One of the most important consequences of the regularity and counting lemmas is the triangle removal lemma, which has far-reaching applications.

Theorem 5.4 (Triangle Removal Lemma, Ruzsa-Szemerédi, 1978). For every \(\varepsilon > 0\), there exists \(\delta > 0\) such that if a graph \(G\) on \(n\) vertices contains at most \(\delta n^3\) triangles, then \(G\) can be made triangle-free by removing at most \(\varepsilon n^2\) edges.
Proof sketch. Apply the regularity lemma with parameter \(\varepsilon' \ll \varepsilon\) to get a partition \(V = V_0 \sqcup V_1 \sqcup \cdots \sqcup V_k\). Remove all edges incident to \(V_0\), all edges inside the parts \(V_i\), all edges between irregular pairs, and all edges between regular pairs of density less than \(\varepsilon'\). Each of these removes at most \(O(\varepsilon') n^2\) edges.

If a triangle remains in the cleaned graph, its three vertices must lie in three distinct parts \(V_i, V_j, V_l\) that form an \(\varepsilon'\)-regular triple with densities at least \(\varepsilon'\). By the counting lemma, the number of such triangles is at least \(c(\varepsilon') n^3\) for some \(c(\varepsilon') > 0\). So if \(\delta < c(\varepsilon')\), no triangle survives, and the total number of removed edges is \(O(\varepsilon') n^2 \le \varepsilon n^2\). ■

5.4 Roth’s Theorem via the Regularity Method

The triangle removal lemma gives an elegant (though quantitatively weak) proof of Roth’s theorem.

Theorem 5.5 (Roth's Theorem via Regularity). For every \(\delta > 0\), there exists \(N_0\) such that any subset of \(\{1, \ldots, N\}\) with \(N \ge N_0\) and size at least \(\delta N\) contains a three-term arithmetic progression.
Proof. Let \(A \subseteq \{1, \ldots, N\}\) with \(|A| \ge \delta N\). We construct a tripartite graph \(G\) on vertex sets \(X = Y = Z = \mathbb{Z}/N'\mathbb{Z}\) (for an appropriate \(N' \ge 3N\)) as follows:
  • \(xy \in E(G)\) (for \(x \in X, y \in Y\)) if \(y - x \in A\),
  • \(yz \in E(G)\) (for \(y \in Y, z \in Z\)) if \(z - y \in A\),
  • \(xz \in E(G)\) (for \(x \in X, z \in Z\)) if \((z - x)/2 \in A\) (assuming \(N'\) is odd so that 2 is invertible).

A triangle \((x, y, z)\) in \(G\) corresponds to \(y - x, z - y, (z-x)/2 \in A\), which means \(y - x, z - y, (y-x + z-y)/2 \in A\). Setting \(a = y - x\), \(c = z - y\), \(b = (a+c)/2\), this is the triple \(a, b, c \in A\) with \(a + c = 2b\), i.e., \((a, b, c)\) is a 3-AP in \(A\).

Each trivial AP \(a = b = c\) (with \(a \in A\)) gives rise to \(N'\) triangles, one for each choice of \(x\), so \(G\) contains at least \(|A| \cdot N' \ge (\delta/3)(N')^2\) triangles; moreover these triangles are edge-disjoint, since each edge determines the remaining vertex of its trivial triangle. If \(A\) contains no non-trivial 3-AP, these are the only triangles, so \(G\) has at most \((N')^2\) triangles in total, which is at most \(\delta'(N')^3\) for any fixed \(\delta' > 0\) once \(N'\) is large. By the triangle removal lemma, \(G\) can then be made triangle-free by removing \(o((N')^2)\) edges. But destroying \((\delta/3)(N')^2\) edge-disjoint triangles requires removing at least \((\delta/3)(N')^2\) edges, a contradiction for \(N'\) large enough. ■
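The triangle/3-AP correspondence can be checked directly on a small instance (the set \(A\) and the modulus below are arbitrary illustrative choices):

```python
# Small instance of the Roth graph: vertices are three copies of Z/NZ;
# each triangle corresponds to a pair (a, c) in A x A with (a+c)/2 in A,
# together with a free choice of the base vertex x.
N = 15                                   # odd, so 2 is invertible mod N
A = {1, 2, 4}                            # a toy example set
inv2 = pow(2, -1, N)

R = range(N)
exy = {(x, y) for x in R for y in R if (y - x) % N in A}
eyz = {(y, z) for y in R for z in R if (z - y) % N in A}
exz = {(x, z) for x in R for z in R if ((z - x) * inv2) % N in A}

triangles = sum((x, y) in exy and (y, z) in eyz and (x, z) in exz
                for x in R for y in R for z in R)

aps = sum(((a + c) * inv2) % N in A for a in A for c in A)
assert triangles == N * aps              # the claimed bijection
```

Here \(A = \{1, 2, 4\}\) has only the three trivial APs, so the graph has exactly \(3N\) triangles.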

5.5 The Graph Removal Lemma

The triangle removal lemma generalizes to arbitrary graphs.

Theorem 5.6 (Graph Removal Lemma). For every graph \(H\) on \(h\) vertices and every \(\varepsilon > 0\), there exists \(\delta > 0\) such that if a graph \(G\) on \(n\) vertices contains at most \(\delta n^h\) copies of \(H\), then \(G\) can be made \(H\)-free by removing at most \(\varepsilon n^2\) edges.
Remark 5.7. The graph removal lemma has deep connections to property testing in theoretical computer science: it implies that the property of being \(H\)-free is testable with a constant number of queries. Specifically, one can distinguish between \(H\)-free graphs and graphs that are \(\varepsilon\)-far from \(H\)-free by sampling a constant number of vertices and checking whether they span a copy of \(H\).

5.6 Hypergraph Regularity

The extension of the regularity lemma to hypergraphs was a major challenge, resolved independently by Gowers and by Rödl, Nagle, Schacht, and Skokan around 2004-2006. The hypergraph regularity method provides a powerful tool for counting arithmetic progressions and other linear patterns.

Remark 5.8. The hypergraph regularity lemma is significantly more complex than its graph counterpart. The notion of "regularity" for hypergraphs requires controlling the distribution of hyperedges relative to a hierarchy of lower-dimensional structures. The resulting bounds are of Ackermann type — far larger than the tower-type bounds for graphs. Despite these enormous bounds, the hypergraph regularity method has been spectacularly successful: it provides one of the existing proofs of Szemerédi's theorem for progressions of arbitrary length and has been used to prove the multidimensional Szemerédi theorem and other results.

Chapter 6: The Polynomial Method

The polynomial method is a family of techniques that use algebraic properties of polynomials to solve combinatorial and number-theoretic problems. In this chapter, we develop several incarnations of the polynomial method, from the classical combinatorial Nullstellensatz to the breakthrough cap set result.

6.1 The Combinatorial Nullstellensatz

Theorem 6.1 (Alon, Combinatorial Nullstellensatz, 1999). Let \(\mathbb{F}\) be a field and let \(f \in \mathbb{F}[x_1, \ldots, x_n]\) be a polynomial of degree \(d = d_1 + d_2 + \cdots + d_n\). Suppose the coefficient of \(x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n}\) in \(f\) is non-zero. Then for any sets \(S_1, \ldots, S_n \subseteq \mathbb{F}\) with \(|S_i| > d_i\), there exists \((s_1, \ldots, s_n) \in S_1 \times \cdots \times S_n\) with \(f(s_1, \ldots, s_n) \neq 0\).
Proof. We proceed by induction on \(n\). For \(n = 1\), a non-zero polynomial of degree \(d_1\) over a field has at most \(d_1\) roots, so if \(|S_1| > d_1\), there must exist \(s_1 \in S_1\) with \(f(s_1) \neq 0\). For \(n > 1\), write \(f\) as a polynomial in \(x_1\): \[ f(x_1, \ldots, x_n) = \sum_{j=0}^{d_1} x_1^j \cdot g_j(x_2, \ldots, x_n) \]

where each \(g_j\) is a polynomial in \(x_2, \ldots, x_n\). The coefficient of \(x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n}\) in \(f\) is the coefficient of \(x_2^{d_2} \cdots x_n^{d_n}\) in \(g_{d_1}\), which is non-zero by assumption. The degree of \(g_{d_1}\) is at most \(d - d_1 = d_2 + \cdots + d_n\), and the coefficient of the leading monomial \(x_2^{d_2} \cdots x_n^{d_n}\) is non-zero.

By the induction hypothesis applied to \(g_{d_1}\) with the sets \(S_2, \ldots, S_n\), there exist \(s_2 \in S_2, \ldots, s_n \in S_n\) with \(g_{d_1}(s_2, \ldots, s_n) \neq 0\). Consider the one-variable polynomial \[ h(x_1) = f(x_1, s_2, \ldots, s_n) = \sum_{j=0}^{d_1} x_1^j \cdot g_j(s_2, \ldots, s_n). \]

The leading coefficient of \(h\) is \(g_{d_1}(s_2, \ldots, s_n) \neq 0\), so \(h\) is a polynomial of degree exactly \(d_1\). Since \(|S_1| > d_1\), there exists \(s_1 \in S_1\) with \(h(s_1) = f(s_1, s_2, \ldots, s_n) \neq 0\). ■
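A classical application of the Combinatorial Nullstellensatz is Alon's proof of the Cauchy-Davenport theorem from Chapter 1 (applied to the polynomial \(\prod_{c \in C}(x + y - c)\) for a hypothetical too-small sumset \(C\)). The conclusion \(|A + B| \ge \min(p, |A| + |B| - 1)\) can at least be confirmed exhaustively for a small prime:

```python
from itertools import combinations

def cauchy_davenport_holds(p):
    # check |A + B| >= min(p, |A| + |B| - 1) for all non-empty A, B in F_p
    subsets = [set(c) for r in range(1, p + 1)
               for c in combinations(range(p), r)]
    for A in subsets:
        for B in subsets:
            sumset = {(a + b) % p for a in A for b in B}
            if len(sumset) < min(p, len(A) + len(B) - 1):
                return False
    return True

assert cauchy_davenport_holds(7)
```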

6.2 The Chevalley-Warning Theorem

Theorem 6.2 (Chevalley-Warning). Let \(p\) be a prime and let \(f_1, \ldots, f_m \in \mathbb{F}_p[x_1, \ldots, x_n]\) be polynomials with \(\sum_{i=1}^m \deg f_i < n\). Then the number of common zeros of \(f_1, \ldots, f_m\) in \(\mathbb{F}_p^n\) is divisible by \(p\). In particular, if there is one common zero (e.g., the origin), there must be another.
Proof. Let \(N = |\{x \in \mathbb{F}_p^n : f_1(x) = \cdots = f_m(x) = 0\}|\). We compute \(N \pmod{p}\) using the identity (valid over \(\mathbb{F}_p\)): \[ \mathbf{1}[y = 0] = 1 - y^{p-1} \quad \text{for all } y \in \mathbb{F}_p. \] Thus \[ N = \sum_{x \in \mathbb{F}_p^n} \prod_{i=1}^m (1 - f_i(x)^{p-1}). \] Expanding the product, \(N\) is a sum of terms of the form \(\pm \sum_{x \in \mathbb{F}_p^n} f_{i_1}(x)^{p-1} \cdots f_{i_k}(x)^{p-1}\). Each such term is a sum over \(\mathbb{F}_p^n\) of a polynomial of degree at most \((p-1) \sum_i \deg f_i < (p-1)n\).

Now we use the key fact: for a polynomial \(g(x_1, \ldots, x_n)\) over \(\mathbb{F}_p\) of degree less than \((p-1)n\), we have \(\sum_{x \in \mathbb{F}_p^n} g(x) = 0\) in \(\mathbb{F}_p\). This follows from the observation that the sum over \(\mathbb{F}_p\) of \(x^k\) is \(-1\) if \((p-1) | k\) and \(k > 0\), and \(0\) otherwise; a monomial of degree less than \((p-1)n\) cannot have all exponents divisible by \(p-1\) unless some exponent is zero, and the corresponding sum factors as a product of one-dimensional sums, at least one of which vanishes.

Hence every term in the expansion, when summed over \(\mathbb{F}_p^n\), vanishes modulo \(p\), including the constant term, whose contribution is \(\sum_{x} 1 = p^n \equiv 0 \pmod{p}\). Therefore \(N \equiv 0 \pmod{p}\). ■
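The divisibility conclusion is easy to test by brute force; a sketch with arbitrarily chosen polynomials satisfying the degree condition:

```python
from itertools import product

p, n = 3, 4
f1 = lambda x: (x[0] + x[1] + x[2] + x[3]) % p       # degree 1
f2 = lambda x: (x[0] * x[1] + x[2] * x[3]) % p       # degree 2; 1 + 2 < n = 4

N = sum(f1(x) == 0 and f2(x) == 0
        for x in product(range(p), repeat=n))
assert N % p == 0 and N > 0   # the origin is a zero, so there are at least p
```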

6.3 The Erdős-Ginzburg-Ziv Theorem

The Erdős-Ginzburg-Ziv theorem is a classic application of the polynomial method (and of the Chevalley-Warning theorem).

Theorem 6.3 (Erdős-Ginzburg-Ziv, 1961). Among any \(2n - 1\) integers, one can find \(n\) whose sum is divisible by \(n\).
Proof (for \(n = p\) prime, via Chevalley-Warning). Let \(a_1, \ldots, a_{2p-1}\) be the given integers, viewed as elements of \(\mathbb{F}_p\). Consider the two polynomials over \(\mathbb{F}_p\) in the \(2p - 1\) variables \(x_1, \ldots, x_{2p-1}\): \[ g_1 = \sum_{i=1}^{2p-1} a_i x_i^{p-1}, \qquad g_2 = \sum_{i=1}^{2p-1} x_i^{p-1}. \]

Both have degree \(p - 1\), so \(\deg g_1 + \deg g_2 = 2(p-1) < 2p - 1\), the number of variables. By the Chevalley-Warning theorem, the number of common zeros of \(g_1, g_2\) in \(\mathbb{F}_p^{2p-1}\) is divisible by \(p\). The origin is one common zero, so there is a common zero \(x \neq 0\).

Let \(I = \{i : x_i \neq 0\}\), a non-empty set. By Fermat's little theorem, \(x_i^{p-1} = 1\) for \(i \in I\) and \(x_i^{p-1} = 0\) otherwise. Hence \[ 0 = g_2(x) = |I| \pmod{p}, \qquad 0 = g_1(x) = \sum_{i \in I} a_i \pmod{p}. \]

Since \(0 < |I| \le 2p - 1\) and \(p\) divides \(|I|\), we must have \(|I| = p\). Thus the \(p\) integers \(a_i\) with \(i \in I\) have sum divisible by \(p\), as required.

The general case follows by induction on the number of prime factors of \(n\): if the theorem holds for \(m\) and for \(k\), it holds for \(n = mk\). ■
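Both the theorem and its sharpness (\(2n - 2\) integers do not suffice) are easy to test for small \(n\); a sketch with arbitrary random inputs:

```python
from itertools import combinations
import random

def egz_holds(seq, n):
    # does some n-element subsequence sum to 0 mod n?
    return any(sum(c) % n == 0 for c in combinations(seq, n))

random.seed(0)
n = 5
for _ in range(200):
    assert egz_holds([random.randrange(100) for _ in range(2 * n - 1)], n)

# sharpness: n-1 zeros and n-1 ones (only 2n-2 integers) admit no
# n-element subsequence summing to 0 mod n
assert not egz_holds([0] * (n - 1) + [1] * (n - 1), n)
```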

6.4 The Cap Set Problem and the Croot-Lev-Pach Method

The cap set problem asks for the maximum size of a subset of \(\mathbb{F}_3^n\) with no three-term arithmetic progression (equivalently, no three distinct points \(x, y, z\) with \(x + y + z = 0\)). This is the \(\mathbb{F}_3^n\) analogue of the Roth/Szemerédi problem.

The function \(r_3(\mathbb{F}_3^n)\) denotes the maximum size of such a “cap set.” Meshulam’s theorem (Theorem 4.2) gives \(r_3(\mathbb{F}_3^n) \le 3^n \cdot O(1/n)\). For decades, the best bound was of this form, and it was a major open problem to determine whether \(r_3(\mathbb{F}_3^n) \le c^n\) for some \(c < 3\).

In 2016, a stunning breakthrough resolved this problem.

Theorem 6.4 (Ellenberg-Gijswijt, 2016, building on Croot-Lev-Pach, 2016). Let \(A \subseteq \mathbb{F}_3^n\) contain no three-term arithmetic progression. Then \[ |A| \le 3 c^n, \qquad \text{where } c = \min_{0 < t < 1} \frac{1 + t + t^2}{t^{2/3}} \approx 2.756. \] In particular, \(r_3(\mathbb{F}_3^n) \le 3 c^n\) with \(c < 3\).

The proof is remarkably short and elegant, using the “polynomial method” in a novel way. The key new idea, due to Croot, Lev, and Pach (who proved an analogous result for \(\mathbb{F}_4^n\)), is the use of the “slice rank” of a tensor.

Proof. We use the now-standard "slice rank" argument.

Define the tensor \(T: A \times A \times A \to \mathbb{F}_3\) by \(T(x, y, z) = \delta(x + y + z = 0)\). If \(A\) has no 3-AP, then \(T\) is supported on the “diagonal” \(\{(x, x, x) : x \in A, \, 3x = 0\}\), and since we are in \(\mathbb{F}_3\), \(3x = 0\) always, so the support is \(\{(x, x, x) : x \in A\}\). Thus the diagonal restriction of \(T\) has \(|A|\) non-zero entries.
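To make the diagonal-support claim concrete, the following sketch (Python; the small 3-AP-free set in \(\mathbb{F}_3^2\) and the helper names are my own choices for illustration) builds the tensor \(T\) on \(A \times A \times A\) and verifies that its only non-zero entries lie on the diagonal:

```python
from itertools import product

# A small subset of F_3^2 with no non-trivial solution of x + y + z = 0,
# i.e. no three-term arithmetic progression (chosen by hand for illustration).
A = [(0, 0), (0, 1), (1, 0), (1, 1)]

def T(x, y, z):
    """T(x, y, z) = 1 if x + y + z = 0 coordinatewise in F_3, else 0."""
    return int(all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)))

support = [(x, y, z) for x, y, z in product(A, repeat=3) if T(x, y, z) == 1]

# Since A has no non-trivial 3-AP, every support point is diagonal, and every
# diagonal entry is non-zero because 3x = 0 holds for all x in F_3^n.
assert all(x == y == z for x, y, z in support)
assert len(support) == len(A)
```

Running the same check on a set containing a 3-AP would produce off-diagonal support, which is exactly what the slice rank argument rules out for large \(A\).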

To bound \(|A|\), we bound the slice rank of \(T\): the minimum number of terms needed to write \[ T(x, y, z) = \sum_i f_i(x) \, g_i(y, z) + \sum_j h_j(y) \, k_j(x, z) + \sum_l m_l(z) \, n_l(x, y). \] Over \(\mathbb{F}_3\) we have \(1 - t^2 = \delta(t = 0)\) for every \(t \in \mathbb{F}_3\), so \[ \delta(x + y + z = 0) = \prod_{i=1}^n \big(1 - (x_i + y_i + z_i)^2\big). \] Expanding the product and collecting terms expresses \(T\) as a linear combination of monomials, \[ \delta(x + y + z = 0) = \sum_{\alpha, \beta, \gamma} c_{\alpha, \beta, \gamma} \, x^\alpha y^\beta z^\gamma, \]

where \(\alpha, \beta, \gamma \in \{0, 1, 2\}^n\) are multi-indices (each variable appears with degree at most 2 in each coordinate) and every monomial with non-zero coefficient has total degree \(|\alpha| + |\beta| + |\gamma| \le 2n\).

The key observation: since \(|\alpha| + |\beta| + |\gamma| \le 2n\) for every monomial \(x^\alpha y^\beta z^\gamma\) in the expansion (where \(|\alpha| = \sum_i \alpha_i\), with each \(\alpha_i \in \{0, 1, 2\}\)), the pigeonhole principle guarantees that at least one of \(|\alpha|, |\beta|, |\gamma|\) is at most \(2n/3\).

Assigning each monomial to one of its low-degree variable blocks yields a slice rank decomposition \[ T(x, y, z) = \sum_{|\alpha| \le 2n/3} x^\alpha \, g_\alpha(y, z) + \sum_{|\beta| \le 2n/3} y^\beta \, h_\beta(x, z) + \sum_{|\gamma| \le 2n/3} z^\gamma \, k_\gamma(x, y), \]

where the sums run over multi-indices satisfying the indicated degree bounds, and each monomial is assigned to exactly one of the three sums.

The number of multi-indices \(\alpha \in \{0, 1, 2\}^n\) with \(|\alpha| \le 2n/3\) is exactly \[ D_n := |\{\alpha \in \{0, 1, 2\}^n : |\alpha| \le 2n/3\}|, \] and the same count applies to \(\beta\) and \(\gamma\).

Thus the slice rank of \(T\) is at most \(3 D_n\).

Since \(T\) restricted to the diagonal has \(|A|\) non-zero entries, it remains to note that slice rank bounds diagonal support: a tensor supported on the diagonal with \(d\) non-zero diagonal entries has slice rank exactly \(d\) (a short linear algebra lemma). Hence \(|A| \le \operatorname{slicerank}(T) \le 3 D_n\).

For any \(0 < t \le 1\), since \(t^{j - 2n/3} \ge 1\) whenever \(j \le 2n/3\), \[ D_n \le t^{-2n/3} \sum_{j \ge 0} c_j t^j = \left( \frac{1 + t + t^2}{t^{2/3}} \right)^{\!n}, \] where \(c_j\) denotes the number of \(\alpha \in \{0, 1, 2\}^n\) with \(|\alpha| = j\), so that \(\sum_j c_j t^j = (1 + t + t^2)^n\). Optimizing over \(t\) (the minimum occurs at \(t = (\sqrt{33} - 1)/8\)) gives \(D_n \le c^n\) with \(c \approx 2.7551\).

Therefore \(|A| \le 3 D_n \le 3 c^n\). ■
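As a sanity check on the counting step, the following sketch (Python; the function and variable names are mine) computes \(D_n\) exactly from the generating function \((1 + t + t^2)^n\) and confirms the bound \(D_n \le c^n\) with \(c = \inf_{0 < t \le 1} (1 + t + t^2)/t^{2/3} \approx 2.7551\):

```python
import math

def weight_counts(n):
    """Coefficients of (1 + t + t^2)^n: counts of alpha in {0,1,2}^n by weight."""
    coeffs = [1]
    for _ in range(n):
        new = [0] * (len(coeffs) + 2)
        for j, c in enumerate(coeffs):
            for d in (0, 1, 2):
                new[j + d] += c
        coeffs = new
    return coeffs

def D(n):
    """Number of alpha in {0,1,2}^n with |alpha| <= 2n/3."""
    return sum(c for j, c in enumerate(weight_counts(n)) if 3 * j <= 2 * n)

# The optimal t solves 4t^2 + t - 2 = 0, i.e. t = (sqrt(33) - 1)/8.
t = (math.sqrt(33) - 1) / 8
c = (1 + t + t * t) / t ** (2 / 3)   # c ~ 2.7551

for n in range(1, 20):
    assert D(n) <= c ** n            # the bound holds exactly for every n
```

The inequality \(D_n \le c^n\) is exact for every \(n\), not merely asymptotic, since \(t^{j - 2n/3} \ge 1\) for all weights \(j \le 2n/3\).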

Remark 6.5. The Croot-Lev-Pach/Ellenberg-Gijswijt result was a sensation in combinatorics. The proof is only about two pages long and settles a problem that had resisted decades of effort. The key conceptual innovation — the use of slice rank to control diagonal support — has since been applied to many other problems, including progress on the sunflower conjecture and new results in communication complexity.

6.5 Slice Rank and Partition Rank

The success of the polynomial method in the cap set problem has led to the development of a systematic theory of “tensor ranks” in additive combinatorics.

Let \(V_1, \ldots, V_k\) be vector spaces over a field \(\mathbb{F}\). A tensor \(T \in V_1 \otimes \cdots \otimes V_k\) has slice rank at most \(r\) if it can be written as a sum of \(r\) terms, each of the form \(v_i \otimes w_i\) where \(v_i \in V_j\) for some \(j\) and \(w_i\) is a tensor in the remaining factors.

The partition rank of \(T\) is the minimum \(r\) such that \(T\) can be written as a sum of \(r\) tensors, each of which factors as a product of two tensors over a non-trivial partition of the indices \(\{1, \ldots, k\}\) into two parts.

Remark 6.6. Slice rank coincides with partition rank for 3-tensors (for \(k = 3\), every non-trivial bipartition of the indices isolates a single factor), but they can differ for higher-order tensors. The partition rank has proved to be the more natural quantity for applications in additive combinatorics, as it interacts well with the algebraic structure of the problems.

6.6 The Sunflower Conjecture

The polynomial method has also made contributions to the sunflower conjecture.

A sunflower (or \(\Delta\)-system) with \(k\) petals is a collection of \(k\) sets \(S_1, \ldots, S_k\) such that there exists a set \(Y\) (the core) with \(S_i \cap S_j = Y\) for all \(i \neq j\).
Theorem 6.7 (Erdős-Rado Sunflower Lemma, 1960). Any family of more than \((k-1)^w \cdot w!\) sets, each of size \(w\), contains a sunflower with \(k\) petals.

The Erdős-Rado sunflower conjecture posits that the \(w!\) factor can be replaced by \(c_k^w\) for some constant \(c_k\) depending only on \(k\). In 2019, Alweiss, Lovett, Wu, and Zhang achieved a major breakthrough, proving a bound of the form \((\log w)^{(1 + o(1)) w}\) for fixed \(k\); the argument was subsequently simplified and sharpened, by Rao and by Bell, Chueluecha, and Warnke, to \(O(k \log w)^w\). While the proof uses a “spread” technique rather than the polynomial method directly, the ideas are deeply connected to the slice rank methods developed for the cap set problem.
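The sunflower definition can be checked by brute force for small families. The sketch below (Python; the function name and example family are mine, not from any of the cited texts) searches for a \(k\)-petal sunflower by testing every \(k\)-subfamily against the definition:

```python
from itertools import combinations

def find_sunflower(family, k):
    """Return a k-petal sunflower (tuple of frozensets) if one exists, else None.

    Sunflower condition: every pairwise intersection equals the common
    core Y, which is the intersection of all k petals (Y may be empty)."""
    family = [frozenset(s) for s in family]
    for petals in combinations(family, k):
        core = frozenset.intersection(*petals)
        if all(a & b == core for a, b in combinations(petals, 2)):
            return petals
    return None

# Three of these sets pairwise intersect in exactly {1}: a 3-petal sunflower.
fam = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}]
sf = find_sunflower(fam, 3)
assert sf is not None and frozenset.intersection(*sf) == {1}
```

This exhaustive search is exponential in the family size, of course; the content of the sunflower lemma is that sufficiently large families of \(w\)-sets cannot avoid such a configuration.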


Chapter 7: Szemerédi’s Theorem and Beyond

We conclude these notes with the deepest results in additive combinatorics: Szemerédi’s theorem on arithmetic progressions of arbitrary length, its various proofs, the Green-Tao theorem on primes, and the connections to ergodic theory and combinatorics. These results represent a remarkable confluence of ideas from across mathematics.

7.1 Szemerédi’s Theorem

Theorem 7.1 (Szemerédi, 1975). For every integer \(k \ge 3\) and every \(\delta > 0\), there exists \(N_0 = N_0(k, \delta)\) such that for all \(N \ge N_0\), every subset of \(\{1, 2, \ldots, N\}\) of size at least \(\delta N\) contains an arithmetic progression of length \(k\).

Equivalently, defining \(r_k(N)\) as the maximum size of a subset of \(\{1, \ldots, N\}\) with no \(k\)-AP, Szemerédi’s theorem asserts that \(r_k(N) = o(N)\) for every fixed \(k\).
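For a feel for the lower-bound side of \(r_3(N)\), the classical construction of a 3-AP-free set takes the integers whose base-3 expansion uses only the digits 0 and 1, giving \(r_3(N) \ge N^{\log 2 / \log 3} \approx N^{0.63}\). A minimal sketch in Python (helper names are my own):

```python
def no_two_in_base3(N):
    """Elements of {0, ..., N-1} whose base-3 expansion avoids the digit 2."""
    out = []
    for m in range(N):
        x = m
        while x and x % 3 != 2:
            x //= 3
        if x == 0:
            out.append(m)
    return out

def has_3ap(S):
    """Check for a non-trivial 3-term AP a, a+d, a+2d inside S."""
    s = set(S)
    return any(a + d in s and a + 2 * d in s
               for a in S for d in range(1, max(S) + 1))

S = no_two_in_base3(81)   # 2^4 = 16 elements below 3^4 = 81
assert len(S) == 16 and not has_3ap(S)
```

The set is 3-AP-free because \(a + c = 2b\) forces a digit-by-digit match with no carries, hence \(a = c\); Behrend's construction improves the exponent further, but this digit construction is the simplest to verify.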

Remark 7.2. The history of this problem is fascinating. In 1936, Erdős and Turán conjectured that any subset of the integers with positive upper density contains arithmetic progressions of every length. The case \(k = 3\) was proved by Roth in 1953 (Chapter 4). The case \(k = 4\) was proved by Szemerédi in 1969, using a highly complex combinatorial argument. The general case was proved by Szemerédi in 1975, in a tour de force of combinatorial reasoning that introduced the regularity lemma as a key tool.

7.2 Four Proofs: An Overview

Szemerédi’s theorem has been proved by at least four fundamentally different methods, each opening up new areas of mathematics:

1. Szemerédi’s combinatorial proof (1975). The original proof uses the regularity lemma (Chapter 5) along with a complex combinatorial argument involving “density increment” on structured subsets. The regularity lemma decomposes the “graph” of the problem into pseudo-random pieces, and the density increment iteratively refines the analysis until a progression is found.

2. Furstenberg’s ergodic proof (1977). Furstenberg’s proof translates the problem into ergodic theory using the “Furstenberg correspondence principle” (see below) and then proves the result as a multiple recurrence theorem. This approach, while non-constructive, opened up the field of ergodic Ramsey theory and led to many generalizations.

3. Gowers’ Fourier-analytic proof (1998, 2001). Gowers introduced higher-order Fourier analysis (the “Gowers norms” \(U^k\)) and used a quantitative density increment argument to prove Szemerédi’s theorem with explicit bounds. For \(k = 4\), he obtained \(r_4(N) \le N (\log \log N)^{-c}\), and for general \(k\), bounds of the form \(r_k(N) \le N (\log \log N)^{-c_k}\) with \(c_k\) depending on \(k\). Gowers received the Fields Medal in 1998, in part for this line of work.

4. Hypergraph regularity proof (2004-2006). Gowers and, independently, Rödl-Nagle-Schacht-Skokan extended the regularity-counting-removal framework to hypergraphs, providing a new proof of Szemerédi’s theorem through a “hypergraph removal lemma.”

7.3 Furstenberg’s Correspondence Principle

Theorem 7.3 (Furstenberg Correspondence Principle, 1977). Let \(A \subseteq \mathbb{Z}\) have positive upper density: \[ \bar{d}(A) = \limsup_{N \to \infty} \frac{|A \cap \{1, \ldots, N\}|}{N} > 0. \] Then there exist a probability space \((X, \mathcal{B}, \mu)\), a measure-preserving transformation \(T: X \to X\), and a measurable set \(E \subseteq X\) with \(\mu(E) = \bar{d}(A)\) such that for every \(k \ge 1\) and every \(n \ge 1\), \[ \bar{d}\big(A \cap (A - n) \cap (A - 2n) \cap \cdots \cap (A - (k-1)n)\big) \ge \mu\big(E \cap T^{-n} E \cap T^{-2n} E \cap \cdots \cap T^{-(k-1)n} E\big). \] In particular, if the right-hand side is positive for some \(n\), then \(A\) contains a \(k\)-term arithmetic progression with common difference \(n\).
Proof sketch. The idea is to construct the dynamical system from the combinatorial data. Let \(\Omega = \{0, 1\}^{\mathbb{Z}}\) with the product topology, and let \(T: \Omega \to \Omega\) be the left shift: \((T\omega)(n) = \omega(n+1)\). The set \(A\) defines a point \(\omega_A \in \Omega\) by \(\omega_A(n) = 1_A(n)\). Let \(X = \overline{\{T^n \omega_A : n \in \mathbb{Z}\}}\) be the orbit closure, and let \(E = \{\omega \in X : \omega(0) = 1\}\).

To construct the measure \(\mu\), take a subsequence of the Cesàro averages \(\frac{1}{N} \sum_{n=1}^{N} \delta_{T^n \omega_A}\) that converges weak-* to a \(T\)-invariant probability measure \(\mu\) on \(X\) (this exists by compactness of the space of probability measures). One can ensure \(\mu(E) = \bar{d}(A)\) by choosing the subsequence appropriately. The correspondence between patterns in \(A\) and multiple recurrence of \(T\) then follows from the construction. ■

7.4 The Multiple Recurrence Theorem

With the correspondence principle in hand, Szemerédi’s theorem reduces to the following purely ergodic-theoretic result.

Theorem 7.4 (Furstenberg Multiple Recurrence Theorem, 1977). Let \((X, \mathcal{B}, \mu)\) be a probability space, \(T: X \to X\) a measure-preserving transformation, and \(E \in \mathcal{B}\) with \(\mu(E) > 0\). Then for every \(k \ge 1\), \[ \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(E \cap T^{-n} E \cap T^{-2n} E \cap \cdots \cap T^{-(k-1)n} E) > 0. \] In particular, there exists \(n \ge 1\) with \(\mu(E \cap T^{-n} E \cap \cdots \cap T^{-(k-1)n} E) > 0\).
Proof sketch. Furstenberg's proof proceeds by structural induction on the complexity of the system. The base case is when \((X, T)\) is "compact" (has discrete spectrum), where the result follows from a generalization of Weyl's equidistribution theorem. The general case reduces to the compact case via the "Furstenberg structure theorem," which decomposes any system into a tower of extensions, each of which is either compact or "weak-mixing." The weak-mixing case is handled by a "van der Waerden" argument at the measure-theoretic level.

The key innovation is the use of the Furstenberg-Zimmer structure theorem, which constructs the maximal compact factor of a system. For a \(\mathbb{Z}\)-action, this is the Kronecker factor (the closed linear span of the eigenfunctions). The multiple recurrence theorem is first proved for the Kronecker factor (using almost periodicity and equidistribution of group rotations), and then extended to the full system by showing that the “remainder” (the weak-mixing part) does not affect the limit. ■

Remark 7.5. The ergodic-theoretic approach has been enormously influential. It has led to proofs of the multidimensional Szemerédi theorem (Furstenberg-Katznelson, 1978), the polynomial Szemerédi theorem (Bergelson-Leibman, 1996), and many other results that remain out of reach of purely combinatorial methods.

7.5 Gowers Norms and Higher-Order Fourier Analysis

Gowers’ quantitative approach to Szemerédi’s theorem introduced a hierarchy of norms that measure the “complexity” of functions, generalizing the Fourier transform to higher orders.

Let \(f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}\). The Gowers \(U^k\) norm of \(f\) is \[ \|f\|_{U^k}^{2^k} = \mathbb{E}_{x, h_1, \ldots, h_k \in \mathbb{Z}/N\mathbb{Z}} \prod_{\omega \in \{0,1\}^k} \mathcal{C}^{|\omega|} f\left(x + \sum_{i=1}^k \omega_i h_i\right), \] where \(\mathcal{C}\) denotes complex conjugation (so \(\mathcal{C}^{|\omega|} f = \bar{f}\) when \(|\omega|\) is odd and \(f\) when \(|\omega|\) is even), and \(|\omega| = \omega_1 + \cdots + \omega_k\).
Example 7.6. For \(k = 2\), the \(U^2\) norm is \[ \|f\|_{U^2}^4 = \mathbb{E}_{x, h_1, h_2} f(x) \overline{f(x + h_1)} \overline{f(x + h_2)} f(x + h_1 + h_2) = \sum_r |\widehat{f}(r)|^4. \] Thus the \(U^2\) norm is directly related to the Fourier transform: \(\|f\|_{U^2}\) is large if and only if \(f\) has a large Fourier coefficient. The \(U^2\) norm controls 3-APs (Roth's theorem), while \(U^3\) controls 4-APs, and in general \(U^{k-1}\) controls \(k\)-APs.
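The identity \(\|f\|_{U^2}^4 = \sum_r |\widehat{f}(r)|^4\) (with the normalization \(\widehat{f}(r) = \mathbb{E}_x f(x) e^{-2\pi i x r / N}\)) is easy to verify numerically. A sketch using NumPy, comparing a brute-force average over \((x, h_1, h_2)\) against the Fourier side:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 17
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Brute-force U^2 norm: E_{x,h1,h2} f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2).
u2_4 = np.mean([
    f[x] * np.conj(f[(x + h1) % N]) * np.conj(f[(x + h2) % N]) * f[(x + h1 + h2) % N]
    for x in range(N) for h1 in range(N) for h2 in range(N)
])

# Fourier side: sum_r |f_hat(r)|^4, with f_hat(r) = E_x f(x) e(-xr/N),
# i.e. numpy's unnormalized FFT divided by N.
f_hat = np.fft.fft(f) / N
fourier_side = np.sum(np.abs(f_hat) ** 4)

assert np.isclose(u2_4.real, fourier_side) and abs(u2_4.imag) < 1e-9
```

The brute-force side is \(O(N^3)\), so this is only a check for tiny \(N\); the Fourier side is what one uses in practice.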
Theorem 7.7 (Gowers' Inverse Theorem for \(U^k\), qualitative version). If \(\|f\|_{U^k}\) is large (relative to \(\|f\|_\infty\)), then \(f\) correlates with a "structured" function of complexity \(k-1\). For \(k = 2\), this means \(f\) correlates with a linear phase \(e^{2\pi i \alpha x}\). For \(k = 3\), \(f\) correlates with a quadratic phase \(e^{2\pi i (\alpha x^2 + \beta x)}\). For general \(k\), \(f\) correlates with a "nilsequence" of degree \(k-1\).

The precise statement of the inverse theorem for general \(k\), proved by Green, Tao, and Ziegler (2012), involves nilsequences and the theory of nilmanifolds, connecting additive combinatorics to the theory of Lie groups and homogeneous dynamics.

7.6 The Green-Tao Theorem

The Green-Tao theorem, proved in 2008, is one of the most celebrated results in 21st-century mathematics.

Theorem 7.8 (Green-Tao, 2008). The prime numbers contain arithmetic progressions of every length. That is, for every \(k \ge 1\), there exist primes \(p, p + d, p + 2d, \ldots, p + (k-1)d\) with \(d > 0\).
Remark 7.9. Before the Green-Tao theorem, it was known (by computation) that the primes contain 23-APs, and the Green-Tao theorem was widely believed but seemed far out of reach. The theorem resolved a long-standing conjecture and has had a transformative impact on analytic number theory and additive combinatorics.
Proof architecture. The proof proceeds in several major stages, each representing a significant technical achievement.

Stage 1: Reformulation. The primes have density zero in the integers, so Szemerédi’s theorem does not apply directly. Green and Tao show that it suffices to find a “pseudo-random” measure \(\nu\) that majorizes the primes (or rather, a modified version of the primes) and satisfies certain correlation conditions.

Stage 2: The transference principle. The key innovation is a “transference principle” that allows one to transfer results from the dense setting (Szemerédi’s theorem for dense subsets of \(\{1, \ldots, N\}\)) to the sparse setting (subsets of a pseudo-random set). Green and Tao show that if \(\nu\) is a pseudo-random measure (satisfying a “linear forms condition” and a “correlation condition”) and \(A\) is a subset of the support of \(\nu\) with positive relative density, then \(A\) contains \(k\)-APs.

The transference principle works by showing that the “balanced function” \(f = 1_A - \delta \cdot \nu\) (where \(\delta\) is the relative density) has small Gowers \(U^{k-1}\) norm relative to the pseudo-random measure. This allows one to replace \(1_A\) by \(\delta \cdot \nu\) in the counting operator for \(k\)-APs, reducing to the pseudo-random case, which is easy.

Stage 3: Constructing the pseudo-random measure. Green and Tao construct \(\nu\) using a “W-trick” (working modulo the product of small primes to remove the obvious bias from the primes) and Goldston-Yıldırım sieve weights. The key input from analytic number theory is a truncated version of the Hardy-Littlewood conjectures, verified for the modified sieve weights.

Stage 4: Verifying the linear forms condition. The most technically demanding part is showing that \(\nu\) satisfies the linear forms condition: for any system of linear forms \(\psi_i(x) = L_i \cdot x + b_i\) of finite complexity, the average of \(\prod_i \nu(\psi_i(x))\) equals \(1 + o(1)\). This requires estimates from multiplicative number theory, specifically the Goldston-Yıldırım correlation estimates for the sieve weights. ■

Remark 7.10. The Green-Tao theorem has been extended in many directions. Tao and Ziegler (2008) showed that any system of linear equations that is solvable in a "pseudo-random" sense is solvable in the primes. The result has also inspired work on arithmetic progressions in other sparse sets (e.g., the Gaussian primes, primes in short intervals).

The Green-Tao proof does not give explicit bounds on the smallest \(k\)-AP in the primes, and finding such bounds (even for \(k = 3\)) remains a challenging open problem.

7.7 The Hales-Jewett Theorem

The Hales-Jewett theorem is a foundational result in Ramsey theory that implies Szemerédi’s theorem (and much more) but is formulated in the purely combinatorial setting of “combinatorial lines” in \([k]^n\).

Let \([k] = \{1, 2, \ldots, k\}\). A combinatorial line in \([k]^n\) is a set of \(k\) points \(\ell_1, \ldots, \ell_k \in [k]^n\) obtained by choosing a non-empty set \(S \subseteq [n]\) of "active coordinates" and a fixed value \(c_i \in [k]\) for each inactive coordinate \(i \notin S\), and setting \[ (\ell_j)_i = \begin{cases} j & \text{if } i \in S, \\ c_i & \text{if } i \notin S, \end{cases} \] for \(j = 1, \ldots, k\). In other words, the active coordinates cycle through \(1, 2, \ldots, k\) simultaneously, while the inactive coordinates are fixed.
Theorem 7.11 (Hales-Jewett, 1963). For every \(k \ge 1\) and every \(r \ge 1\), there exists \(N = HJ(k, r)\) such that for all \(n \ge N\), any \(r\)-coloring of \([k]^n\) contains a monochromatic combinatorial line.
Remark 7.12. The Hales-Jewett theorem implies van der Waerden's theorem (and hence, by the density version, Szemerédi's theorem). The connection is that arithmetic progressions of length \(k\) in \(\{0, 1, \ldots, k-1\}^n\) correspond to combinatorial lines, and the map \(x \mapsto \sum_i x_i k^i\) sends combinatorial lines to arithmetic progressions.
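The correspondence in Remark 7.12 is easy to check directly. The sketch below (Python; using the digit set \(\{0, \ldots, k-1\}\) as in the remark, with function names of my choosing) builds a combinatorial line and confirms that its image under \(x \mapsto \sum_i x_i k^i\) is an arithmetic progression:

```python
def combinatorial_line(n, k, active, fixed):
    """Points l_0, ..., l_{k-1}: active coords all take value j, others fixed."""
    return [tuple(j if i in active else fixed[i] for i in range(n))
            for j in range(k)]

def to_int(x, k):
    """Base-k evaluation x -> sum_i x_i * k^i."""
    return sum(d * k ** i for i, d in enumerate(x))

n, k = 5, 4
active = {1, 3}
fixed = {0: 2, 2: 0, 4: 3}
line = combinatorial_line(n, k, active, fixed)
images = [to_int(p, k) for p in line]

# The images form a k-term AP whose common difference is sum_{i in active} k^i.
diffs = {b - a for a, b in zip(images, images[1:])}
assert diffs == {sum(k ** i for i in active)}
```

The common difference \(\sum_{i \in S} k^i\) depends only on the active coordinate set \(S\), which is exactly why monochromatic lines yield monochromatic progressions.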

7.8 The Density Hales-Jewett Theorem

The density version of the Hales-Jewett theorem received its first combinatorial proof from the remarkable Polymath1 project, the first successful large-scale collaborative mathematics project.

Theorem 7.13 (Density Hales-Jewett, Polymath1, 2012). For every \(k \ge 1\) and every \(\delta > 0\), there exists \(N = DHJ(k, \delta)\) such that for all \(n \ge N\), every subset of \([k]^n\) of density at least \(\delta\) contains a combinatorial line.
Remark 7.14. The density Hales-Jewett theorem is strictly stronger than Szemerédi's theorem. It was first proved by Furstenberg and Katznelson (1991) using ergodic theory (Austin later gave a further proof via an infinitary removal lemma); the Polymath project found the first combinatorial, quantitative proof. The Polymath proof uses a density increment argument in the high-dimensional combinatorial setting, with the key innovation being an ingenious "correlation" argument that exploits the structure of the combinatorial cube \([k]^n\). The project was initiated by Timothy Gowers in 2009 on his blog, and the paper lists "D.H.J. Polymath" as the author.

7.9 Connections to Ergodic Theory and Model Theory

The interplay between additive combinatorics and other areas of mathematics has been one of the most productive themes in modern mathematics.

Ergodic theory. The Furstenberg correspondence principle (Theorem 7.3) established a deep connection between combinatorial results about the integers and recurrence properties of measure-preserving systems. This connection has been vastly generalized:

  • The Bergelson-Leibman theorem (1996) proves a polynomial extension of Szemerédi’s theorem: dense sets contain polynomial progressions \(a, a + p_1(d), a + p_2(d), \ldots\) for any polynomials \(p_i\) with \(p_i(0) = 0\).
  • Host and Kra (2005) developed the theory of “characteristic factors” and “nilpotent systems” that underlie the convergence of multiple ergodic averages.
  • Tao (2008) proved the norm convergence of multiple ergodic averages for commuting transformations; Austin later gave an alternative, more structural proof.

Model theory and ultraproducts. In recent years, model-theoretic methods — particularly the use of ultraproducts and nonstandard analysis — have provided new proofs and perspectives on results in additive combinatorics. The “nonstandard” proof of Szemerédi’s regularity lemma and the use of ultralimits in the Green-Tao transference principle are notable examples.

Remark 7.15. The field of additive combinatorics continues to evolve rapidly. The polynomial Freiman-Ruzsa theorem (Gowers-Green-Manners-Tao, 2023), the Kelley-Meka breakthrough on Roth's theorem (2023), and ongoing work on higher-order Fourier analysis and ergodic combinatorics are reshaping our understanding of additive structure. The interplay between combinatorics, number theory, harmonic analysis, ergodic theory, and algebraic geometry promises many further breakthroughs.

These notes provide an introduction to the major themes of additive combinatorics. The subject is vast and rapidly developing, and the reader is encouraged to consult the references listed at the top of these notes for complete proofs and further developments.
