CO 740: Additive Combinatorics
These notes synthesize material from T. Tao and V.H. Vu’s Additive Combinatorics, Y. Zhao’s Graph Theory and Additive Combinatorics, M.B. Nathanson’s Additive Number Theory, and B. Green’s lecture notes on additive combinatorics, enriched with material from MIT 18.225 (Y. Zhao) and T. Gowers’s expository articles.
Chapter 1: Additive Structure in Abelian Groups
Additive combinatorics is, at its heart, the study of how algebraic and combinatorial structure interact in abelian groups. The most fundamental objects of study are sumsets: given subsets \(A\) and \(B\) of an abelian group \(G\), we form their sumset and ask how its size relates to the sizes of the constituent sets. This seemingly innocent question leads to a rich and deep theory connecting combinatorics, number theory, harmonic analysis, and ergodic theory. In this opening chapter, we lay the foundations by establishing the classical results on sumset growth — the Cauchy-Davenport theorem, Kneser’s theorem, and the Plünnecke-Ruzsa inequality — and we introduce the Ruzsa distance, a quasi-metric that captures the notion of additive closeness between sets.
1.1 Sumsets and Basic Definitions
The lower bound is achieved when one of the sets is a singleton, and the upper bound is achieved when the addition map is injective — that is, when there are no additive relations between elements of \(A\) and \(B\). The central question of sumset theory is: what structural conditions force \(|A+B|\) to be small relative to \(|A|\) and \(|B|\)?
These examples illustrate a fundamental principle: sets with additive structure (like arithmetic progressions) have small sumsets, while “random” or “unstructured” sets tend to have larger sumsets. Making this principle precise is one of the great achievements of additive combinatorics.
1.2 The Cauchy-Davenport Theorem
The first non-trivial result on sumsets is the Cauchy-Davenport theorem, which provides a sharp lower bound for the size of sumsets in \(\mathbb{Z}/p\mathbb{Z}\). This result was originally proved by Cauchy in 1813 and independently rediscovered by Davenport in 1935.
Before giving the proof, let us note that this bound is tight: if \(A\) and \(B\) are arithmetic progressions with the same common difference, say \(A = \{0, 1, \ldots, a-1\}\) and \(B = \{0, 1, \ldots, b-1\}\) in \(\mathbb{Z}/p\mathbb{Z}\) with \(a + b - 1 \le p\), then \(A + B = \{0, 1, \ldots, a+b-2\}\), which has exactly \(a + b - 1\) elements. The theorem says that arithmetic progressions are, in a sense, the extremal sets for sumset growth in \(\mathbb{Z}/p\mathbb{Z}\).
We proceed by strong induction on \(|B|\). The base case \(|B| = 1\) is trivial, since \(|A + B| = |A| = |A| + 1 - 1\).
For the inductive step, suppose \(|B| \ge 2\); we may assume \(A + B \neq \mathbb{Z}/p\mathbb{Z}\), since otherwise \(|A + B| = p\) and the bound holds. Following Davenport, we use the \(e\)-transform: for \(e \in \mathbb{Z}/p\mathbb{Z}\), define
\[ A_e = A \cup (B + e), \qquad B_e = B \cap (A - e). \]We verify:
- \(A_e + B_e \subseteq A + B\). Let \(x \in A_e\) and \(y \in B_e\). If \(x \in A\), then \(y \in B\) (since \(B_e \subseteq B\)), so \(x + y \in A + B\). If \(x \in B + e\), say \(x = b_1 + e\) for some \(b_1 \in B\), then since \(y \in A - e\), we have \(y + e \in A\), so \(x + y = b_1 + (y + e) \in B + A = A + B\). ■
- \(|A_e| + |B_e| = |A| + |B|\). By inclusion-exclusion, \(|A_e| = |A \cup (B+e)| = |A| + |B+e| - |A \cap (B+e)| = |A| + |B| - |A \cap (B+e)|\). Also \(|B_e| = |B \cap (A - e)| = |A \cap (B+e)|\) (since \(b \in B \cap (A-e) \Leftrightarrow b \in B\) and \(b + e \in A \Leftrightarrow b + e \in A \cap (B+e)\)). Hence \(|A_e| + |B_e| = |A| + |B|\). ■
Now, with \(|B| \ge 2\) and \(A + B \neq \mathbb{Z}/p\mathbb{Z}\), we find \(e\) with \(1 \le |B_e| < |B|\). Taking \(e = a - b\) with \(a \in A\), \(b \in B\) gives \(b + e = a \in A\), so \(b \in B_e\) and \(B_e \neq \emptyset\). It remains to choose such an \(e\) with \(B_e \neq B\), i.e., with \(B + e \not\subseteq A\). Suppose, for contradiction, that \(B + (a - b) \subseteq A\) for all \(a \in A\), \(b \in B\). Fix distinct \(b_0, b_1 \in B\) and set \(d = b_1 - b_0 \neq 0\). Then for every \(a \in A\) we have \(a + d = b_1 + (a - b_0) \in B + (a - b_0) \subseteq A\), so \(A + d \subseteq A\), and hence \(A\) is invariant under the subgroup generated by \(d\). Since \(p\) is prime, this subgroup is all of \(\mathbb{Z}/p\mathbb{Z}\), forcing \(A = \mathbb{Z}/p\mathbb{Z}\) and contradicting \(A + B \neq \mathbb{Z}/p\mathbb{Z}\). With such an \(e\) in hand, \(|B_e| < |B|\), and the inductive hypothesis applied to \((A_e, B_e)\) gives
\[ |A + B| \ge |A_e + B_e| \ge |A_e| + |B_e| - 1 = |A| + |B| - 1. \]This completes the induction. ■
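The theorem is easy to test numerically. The following Python sketch (illustrative only; the prime and the random sets are arbitrary choices, and no finite check is a proof) brute-forces the bound and confirms tightness for two progressions with a common difference:

```python
# Sanity check of Cauchy-Davenport: |A + B| >= min(p, |A| + |B| - 1) in Z/pZ.
import random

def sumset(A, B, p):
    """The sumset A + B inside Z/pZ."""
    return {(a + b) % p for a in A for b in B}

p = 13
random.seed(0)
for _ in range(200):
    A = set(random.sample(range(p), random.randint(1, p)))
    B = set(random.sample(range(p), random.randint(1, p)))
    assert len(sumset(A, B, p)) >= min(p, len(A) + len(B) - 1)

# Tightness: two APs with the same common difference give |A+B| = |A|+|B|-1.
assert len(sumset(set(range(4)), set(range(5)), p)) == 4 + 5 - 1
```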
1.3 Kneser’s Theorem
When we move beyond the prime-order setting, sumsets can be unexpectedly small due to the presence of non-trivial subgroups. Kneser’s theorem, proved by Martin Kneser in 1953, provides the correct generalization of Cauchy-Davenport by identifying the precise obstruction.
1.4 Small Doubling and the Plünnecke-Ruzsa Inequality
We now turn to one of the most fundamental results in additive combinatorics: the Plünnecke-Ruzsa inequality. The question is: if a set \(A\) has small doubling — meaning \(|A + A| \le K|A|\) for some constant \(K\) — what can we say about the iterated sumsets \(kA - lA\)?
Plünnecke’s original proof (1970) used ideas from graph theory related to Menger’s theorem. Ruzsa subsequently simplified and extended the result. The cleanest modern proof is due to Giorgis Petridis (2012), using a beautiful graph-theoretic argument that we present in full.
The proof of this result proceeds in two stages. First, we establish the Ruzsa triangle inequality, which is a key tool in its own right. Then we prove the Plünnecke-Ruzsa inequality using Petridis’s method.
Now we turn to the heart of the matter: the Plünnecke-Ruzsa inequality. The key lemma, due to Petridis, identifies a subset with especially good sumset growth.
We prove \(|X + B + C| \le K'|X + C|\) for every finite set \(C\) by induction on \(|C|\). In the base case \(C = \{c\}\), we have \(|X + B + \{c\}| = |X + B| = K'|X| = K'|X + \{c\}|\), so the bound holds with equality.
For the inductive step, write \(C = C' \cup \{c\}\) with \(c \notin C'\), and set \(Y = \{x \in X : x + c \in X + C'\}\). Since \(Y + c \subseteq X + C'\), we have \(Y + B + c \subseteq X + B + C'\), and therefore
\[ X + B + C = (X + B + C') \cup (X + B + c), \]
\[ |X + B + C| \le |X + B + C'| + |X + B| - |Y + B|. \]
By the inductive hypothesis, \(|X + B + C'| \le K'|X + C'|\); by the choice of \(X\), \(|X + B| = K'|X|\); and by the minimality of the ratio \(|X + B|/|X|\) over subsets of \(A\) (applied to \(Y \subseteq X\)), \(|Y + B| \ge K'|Y|\) — this also covers \(Y = \emptyset\), since both sides are then zero. Moreover \((X + C') \cap (X + c) = Y + c\), so
\[ |X + C| = |X + C'| + |X + c| - |(X + C') \cap (X + c)| = |X + C'| + |X| - |Y|. \]
Combining these estimates,
\[ |X + B + C| \le K'|X + C'| + K'|X| - K'|Y| = K'\bigl(|X + C'| + |X| - |Y|\bigr) = K'|X + C|. \] ■
which gives \(|kA - lA| \le K^{k+l}|X| \le K^{k+l}|A|\). ■
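A quick numerical illustration of the inequality, here in the case \(|2A - A| \le K^3 |A|\) (the sets and parameters below are arbitrary choices, not part of the proof):

```python
# Plünnecke-Ruzsa sanity check: with K = |A+A|/|A|, verify |2A - A| <= K^3 |A|
# for random integer sets.
import random

def add(S, T):
    return {s + t for s in S for t in T}

def neg(S):
    return {-s for s in S}

random.seed(1)
for _ in range(50):
    A = set(random.sample(range(100), 12))
    K = len(add(A, A)) / len(A)          # the doubling constant of A
    two_A_minus_A = add(add(A, A), neg(A))
    assert len(two_A_minus_A) <= K**3 * len(A) + 1e-9
```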
1.5 Ruzsa’s Covering Lemma
Ruzsa’s covering lemma is a simple but powerful result that allows us to “cover” one set with translates of another. It is a key tool in the proof of Freiman’s theorem.
giving \(|T| \le K\). ■
1.6 The Ruzsa Distance
Ruzsa introduced a notion of “distance” between finite sets that captures how additively similar they are. While not a true metric (it does not satisfy the identity of indiscernibles), it is a quasi-metric and has proved invaluable.
For non-negativity, note that \(|A - B| \ge \max(|A|, |B|) \ge \sqrt{|A| \cdot |B|}\), so \(d(A, B) \ge 0\).
Finally, \(d(A, A) = \log(|A - A|/|A|)\). This equals zero if and only if \(|A - A| = |A|\), which happens if and only if \(A\) is a coset of a finite subgroup. ■
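The Ruzsa distance is easy to experiment with. The sketch below (with arbitrary random sets) checks non-negativity and the triangle inequality \(d(A, C) \le d(A, B) + d(B, C)\):

```python
# The Ruzsa distance d(A,B) = log(|A - B| / sqrt(|A| |B|)) on finite sets of
# integers, with a numerical check of the quasi-metric properties.
import math
import random

def diffset(A, B):
    return {a - b for a in A for b in B}

def ruzsa_dist(A, B):
    return math.log(len(diffset(A, B)) / math.sqrt(len(A) * len(B)))

random.seed(2)
for _ in range(100):
    A = set(random.sample(range(50), 8))
    B = set(random.sample(range(50), 8))
    C = set(random.sample(range(50), 8))
    assert ruzsa_dist(A, B) >= 0
    assert ruzsa_dist(A, C) <= ruzsa_dist(A, B) + ruzsa_dist(B, C) + 1e-9
```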
Chapter 2: Freiman’s Theorem and Inverse Sumset Theory
If Chapter 1 was about “direct” problems — bounding \(|A+B|\) from below — then this chapter is about the corresponding “inverse” problem: if \(|A+A|\) is small, what can we say about the structure of \(A\)? The main result is Freiman’s theorem, which asserts that sets with small doubling are efficiently contained in generalized arithmetic progressions. This is one of the crowning achievements of additive combinatorics, and its recent quantitative improvements — culminating in the resolution of the polynomial Freiman-Ruzsa conjecture — represent some of the most exciting developments in modern mathematics.
2.1 Generalized Arithmetic Progressions
A GAP is proper if \(|P| = \mathrm{vol}(P)\), i.e., there are no unexpected collisions among the sums.
2.2 Freiman Homomorphisms
To state Freiman’s theorem precisely, we need the notion of a Freiman homomorphism, which captures the relevant notion of “structure-preserving map” for additive combinatorics.
2.3 Freiman’s Theorem
Freiman’s original proof (in the integers) was extremely involved. Ruzsa provided a dramatically simplified proof in 1994, which remains the basis for most modern treatments. We sketch the main ideas.
Step 1: Reduction to a bounded torsion-free setting. Using the Ruzsa covering lemma, one can “model” \(A\) — up to Freiman isomorphism — as a subset of \(\mathbb{Z}^d\) for some bounded \(d\). The idea is that if \(|A + A| \le K|A|\), then by the Plünnecke-Ruzsa inequality, \(|2A - 2A| \le K^4|A|\), and by the covering lemma, \(2A - 2A\) can be covered by \(K^4\) translates of \(A - A\). This gives enough “linear algebra” structure to embed into a lattice.
Step 2: Bogolyubov’s lemma. One shows that \(2A - 2A\) contains a large structured subset. Specifically, Bogolyubov’s lemma (which we will prove via Fourier analysis in Chapter 3) asserts that if \(|A + A| \le K|A|\) in \(\mathbb{Z}/N\mathbb{Z}\), then \(2A - 2A\) contains a subspace (or a GAP) of bounded codimension.
Step 3: Geometry of numbers. Using Minkowski’s second theorem or John’s theorem, one refines the GAP to be proper and of controlled rank and volume.
The bounds obtained by Ruzsa’s method give \(d(K) \le CK^2 \log K\) and \(c(K) \le e^{CK^2 \log^2 K}\) for some absolute constant \(C\). ■
2.4 The Polynomial Freiman-Ruzsa Conjecture
The quantitative bounds in Freiman’s theorem were long suspected to be far from optimal. The polynomial Freiman-Ruzsa (PFR) conjecture, formulated independently by several researchers, posited that the bounds should be polynomial in \(K\).
This conjecture — often attributed to Katalin Marton and also known as Marton’s conjecture — was resolved in the \(\mathbb{F}_2^n\) setting in a landmark 2023 paper of Gowers, Green, Manners, and Tao, whose proof works with an entropic version of the Ruzsa distance. The result has far-reaching consequences throughout additive combinatorics, including polynomial bounds in many results that previously had exponential-type dependencies on the doubling constant.
2.5 Bogolyubov’s Lemma
Bogolyubov’s lemma, proved by Nikolai Bogolyubov in 1939, is a cornerstone result that connects sumset theory with structural information. It asserts that iterated sumsets of dense sets contain large structured subsets.
In the \(\mathbb{F}_p^n\) setting: if \(A \subseteq \mathbb{F}_p^n\) with \(|A| = \alpha \cdot p^n\), then \(2A - 2A\) contains a subspace of codimension at most \(2/\alpha^2\).
We will prove this via Fourier analysis in Chapter 3. The key insight is that the iterated sumset \(2A - 2A\) is the support of the convolution \(1_A * 1_A * 1_{-A} * 1_{-A}\), whose Fourier transform is \(|\widehat{1_A}|^4 \ge 0\), and Parseval’s identity forces this to be large enough to contain a structured subset.
2.6 Sanders’ Quasi-Polynomial Bounds
Sanders obtained a dramatic quantitative improvement of Bogolyubov’s lemma, giving quasi-polynomial bounds.
Sanders’ proof introduces the technique of “iterated Bogolyubov” and uses Chang’s theorem (discussed in Chapter 3) in a crucial way. This result was a key precursor to the Kelley-Meka breakthrough on Roth’s theorem.
Chapter 3: Fourier Analysis on Finite Abelian Groups
Fourier analysis is one of the most powerful tools in additive combinatorics. The idea of decomposing functions on finite groups into characters dates back to the work of Gauss, Dirichlet, and others in number theory, but its systematic application to combinatorial problems is a more recent development, pioneered by Roth (1953) and developed extensively by Gowers, Green, Tao, and others. In this chapter, we develop the Fourier-analytic toolkit and apply it to prove Bogolyubov’s lemma and Chang’s theorem.
3.1 Characters and the Fourier Transform
For \(G = \mathbb{F}_p^n\), the characters are \(\chi_t(x) = \omega^{t \cdot x}\) for \(t \in \mathbb{F}_p^n\), where \(\omega = e^{2\pi i/p}\) and \(t \cdot x = \sum_{i=1}^n t_i x_i\) is the standard inner product.
The orthogonality relations for characters are fundamental.
For concreteness, we specialize to the two main settings.
3.2 Parseval’s Identity and Basic Inequalities
3.3 Convolutions and Sumsets
The connection between Fourier analysis and sumsets comes through convolutions.
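The basic fact — the convolution \(1_A * 1_B\) counts representations, and its support is exactly \(A + B\) — can be checked directly (the sets below are arbitrary):

```python
# The (unnormalized) convolution of indicator functions in Z/NZ:
# (1_A * 1_B)(x) = #{(a, b) in A x B : a + b = x mod N}, supported on A + B.
N = 30
A = {1, 4, 9, 16, 25}
B = {0, 2, 5}

def conv_indicator(A, B, N):
    c = [0] * N
    for a in A:
        for b in B:
            c[(a + b) % N] += 1
    return c

c = conv_indicator(A, B, N)
AplusB = {(a + b) % N for a in A for b in B}
assert {x for x in range(N) if c[x] > 0} == AplusB   # support = sumset
assert sum(c) == len(A) * len(B)                     # total representation count
```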
3.4 Large Fourier Coefficients and Additive Structure
A central theme in additive combinatorics is that sets with additive structure have large Fourier coefficients, and conversely, the existence of large Fourier coefficients implies additive structure.
3.5 Bogolyubov’s Argument via Fourier Analysis
We now prove Bogolyubov’s lemma using Fourier analysis, in the clean \(\mathbb{F}_p^n\) setting.
Fix a threshold \(\delta > 0\), to be chosen below, and let \(\Lambda = \mathrm{Spec}_\delta(A) = \{t : |\widehat{1_A}(t)| \ge \delta\}\) be the large spectrum. We claim that \(2A - 2A \supseteq \Lambda^\perp\). Consider the normalized convolution \(f = 1_A * 1_A * 1_{-A} * 1_{-A}\), whose support is exactly \(2A - 2A\); it suffices to show \(f(x) > 0\) for every \(x \in \Lambda^\perp\). By Fourier inversion,
\[ f(x) = \sum_t |\widehat{1_A}(t)|^4 \omega^{t \cdot x}. \]
For \(x \in \Lambda^\perp\) we have \(\omega^{t \cdot x} = 1\) for every \(t \in \Lambda\), so
\[ f(x) = \sum_{t \in \Lambda} |\widehat{1_A}(t)|^4 + \sum_{t \notin \Lambda} |\widehat{1_A}(t)|^4 \omega^{t \cdot x}. \]
The main term dominates: since \(\widehat{1_A}(0) = \alpha \ge \delta\) for the choice of \(\delta\) below, we have \(0 \in \Lambda\) and
\[ \sum_{t \in \Lambda} |\widehat{1_A}(t)|^4 \ge |\widehat{1_A}(0)|^4 = \alpha^4. \]
The tail is small: by Parseval, \(\sum_t |\widehat{1_A}(t)|^2 = \alpha\), so
\[ \left| \sum_{t \notin \Lambda} |\widehat{1_A}(t)|^4 \omega^{t \cdot x} \right| \le \delta^2 \sum_{t \notin \Lambda} |\widehat{1_A}(t)|^2 \le \delta^2 \alpha. \]
Hence \(f(x) \ge \alpha^4 - \delta^2 \alpha\). Choosing \(\delta^2 = \alpha^3/2\), i.e., \(\delta = \alpha^{3/2}/\sqrt{2}\), we get \(f(x) \ge \alpha^4/2 > 0\), so \(x \in 2A - 2A\).
Finally, since \(|\Lambda| \delta^2 \le \sum_{t \in \Lambda} |\widehat{1_A}(t)|^2 \le \alpha\) by Parseval,
\[ |\Lambda| \le \frac{\alpha}{\delta^2} = \frac{\alpha}{\alpha^3/2} = \frac{2}{\alpha^2}. \]
Hence \(\Lambda^\perp\) has codimension at most \(|\Lambda| \le 2/\alpha^2\), and \(2A - 2A \supseteq \Lambda^\perp\). ■
3.6 Chang’s Theorem
Chang’s theorem provides a much more refined structural result about the large spectrum.
The proof of Chang’s theorem relies on a clever entropy argument. The key idea is that if \(|\widehat{1_A}(t)|\) is large for many \(t\) that are linearly independent, then \(A\) would have to be “spread out” in too many directions, contradicting its density.
Chapter 4: Roth’s Theorem on Arithmetic Progressions
The problem of finding arithmetic progressions in dense subsets of the integers is one of the oldest and most celebrated in combinatorics and number theory. In 1936, Erdős and Turán conjectured that any subset of the natural numbers with positive upper density contains arithmetic progressions of every length. Roth proved the case of three-term progressions in 1953, using Fourier analysis in a landmark paper that opened up entirely new directions in combinatorics. In this chapter, we give a complete proof of Roth’s theorem and discuss the subsequent dramatic improvements.
4.1 Statement and History
Roth’s original quantitative bound was \(r_3(N) = O(N / \log \log N)\). This has been improved many times:
- Heath-Brown (1987) and Szemerédi (1990): \(O(N / (\log N)^c)\) for some small \(c > 0\).
- Bourgain (1999): \(O(N (\log \log N / \log N)^{1/2})\); Bourgain (2008) improved the exponent further.
- Sanders (2011): \(O(N (\log \log N)^5 / \log N)\).
- Bloom (2016): \(O(N (\log \log N)^4 / \log N)\).
- Bloom-Sisask (2020): \(O(N / (\log N)^{1+c})\).
- Kelley-Meka (2023): \(O(N \exp(-c (\log N)^{1/12}))\).
The Kelley-Meka bound is an exponential improvement over all previous results, and is of the same general shape as the Behrend lower bound \(N \exp(-c\sqrt{\log N})\).
4.2 Roth’s Theorem in \(\mathbb{F}_3^n\) (Meshulam’s Theorem)
For clarity of exposition, we first prove Roth’s theorem in the model setting of \(\mathbb{F}_3^n\), where the Fourier analysis is cleaner. This version is due to Meshulam (1995).
where we adopt the convention \(\widehat{f}(t) = \mathbb{E}_x f(x) \omega^{-t \cdot x}\) with \(\omega = e^{2\pi i/3}\). Writing the constraint \(x + y + z = 0\) as \(3^{-n} \sum_t \omega^{t \cdot (x+y+z)}\), expanding, and reindexing \(t \mapsto -t\) gives the identity below. (Note that in \(\mathbb{F}_3^n\), a triple \((x, y, z)\) is a 3-AP, i.e., \(x + z = 2y\), precisely when \(x + y + z = 0\), since \(-2 \equiv 1 \pmod 3\).)
We have \(T = 3^{2n} \sum_t \widehat{f}(t)^3\). If \(A\) has no non-trivial 3-AP, then the only solutions to \(x + y + z = 0\) in \(A\) have \(x = y = z\), so \(T = |A| = \alpha \cdot 3^n\).
Since \(\widehat{f}(0) = \alpha\), separating the zero frequency gives
\[ \alpha \cdot 3^n = 3^{2n} \sum_t \widehat{f}(t)^3 = 3^{2n} \left( \alpha^3 + \sum_{t \neq 0} \widehat{f}(t)^3 \right), \]
and rearranging,
\[ \sum_{t \neq 0} \widehat{f}(t)^3 = \frac{\alpha}{3^n} - \alpha^3. \]\[ \left|\frac{\alpha}{3^n} - \alpha^3\right| \le \max_{t \neq 0} |\widehat{f}(t)| \cdot \sum_{t \neq 0} |\widehat{f}(t)|^2 \le \max_{t \neq 0} |\widehat{f}(t)| \cdot \alpha, \]using Parseval: \(\sum_t |\widehat{f}(t)|^2 = \mathbb{E}_x |f(x)|^2 = \alpha\) (since \(f = 1_A\)), so \(\sum_{t \neq 0} |\widehat{f}(t)|^2 \le \alpha\).
\[ \alpha^2 - \frac{1}{3^n} \le \max_{t \neq 0} |\widehat{f}(t)|. \]For \(\alpha > 3/n\) and \(n\) sufficiently large, \(\alpha^2/2 \le \max_{t \neq 0} |\widehat{f}(t)|\), so there exists \(t_0 \neq 0\) with \(|\widehat{f}(t_0)| \ge \alpha^2/2\).
Step 3: Passing to a subspace. Let \(V = t_0^\perp = \{x \in \mathbb{F}_3^n : t_0 \cdot x = 0\}\), a subspace of codimension 1 (so \(|V| = 3^{n-1}\)). The group \(\mathbb{F}_3^n\) decomposes as \(V \sqcup (V + v) \sqcup (V + 2v)\) for some \(v \notin V\). By the pigeonhole principle, at least one coset \(V + jv\) contains at least \(\alpha |V|\) elements of \(A\). In fact, the large Fourier coefficient gives a density increment: the density of \(A\) on the coset \(V + jv\) (for the right \(j\)) is at least \(\alpha + \alpha^2/6 > \alpha\).
More precisely, choose \(v\) with \(t_0 \cdot v = 1\), so that \(t_0 \cdot x = j\) on \(V + jv\). Writing \(\alpha_j = |A \cap (V+jv)|/|V|\) for \(j = 0, 1, 2\), we get \(\widehat{f}(t_0) = \frac{1}{3}(\alpha_0 + \alpha_1 \omega^{-1} + \alpha_2 \omega^{-2})\). Since \(\alpha_0 + \alpha_1 + \alpha_2 = 3\alpha\) and \(1 + \omega^{-1} + \omega^{-2} = 0\), we may subtract \(\alpha\) from each term: \(\bigl|\sum_j (\alpha_j - \alpha)\omega^{-j}\bigr| = 3|\widehat{f}(t_0)| \ge 3\alpha^2/2\), so some \(|\alpha_j - \alpha| \ge \alpha^2/2\); combined with \(\sum_j (\alpha_j - \alpha) = 0\), this yields some \(\alpha_j \ge \alpha + c\alpha^2\) for an absolute constant \(c > 0\).
Now \(A \cap (V + jv)\) is a subset of a coset of a codimension-1 subspace; after translating, it becomes a 3-AP-free subset of \(\mathbb{F}_3^{n-1}\) with density at least \(\alpha + c\alpha^2\). We iterate, with the following bookkeeping. While the density lies in \([\alpha, 2\alpha]\), each step adds at least \(c\alpha^2\), so the density doubles within \(1/(c\alpha)\) steps; doubling again from \(2\alpha\) takes at most \(1/(2c\alpha)\) further steps, and so on. The total number of steps before the density would exceed \(1\) is therefore at most \(\frac{1}{c\alpha}\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots\right) = \frac{2}{c\alpha}\). Each step costs one dimension, and the density can never exceed \(1\), so a 3-AP must appear provided \(n > 2/(c\alpha)\). Equivalently, a 3-AP-free set \(A \subseteq \mathbb{F}_3^n\) has density \(\alpha \le C/n\), which gives the bound \(|A| \le C \cdot 3^n / n\). ■
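The Fourier identity \(T = 3^{2n} \sum_t \widehat{f}(t)^3\) underlying the counting step can be verified numerically in a small case (the set \(A\) below is an arbitrary choice):

```python
# Check T = 3^{2n} sum_t fhat(t)^3, where T counts solutions to x + y + z = 0
# with x, y, z in A, in the small group F_3^2.
import cmath
import itertools

n = 2
G = list(itertools.product(range(3), repeat=n))   # the group F_3^n
w = cmath.exp(2j * cmath.pi / 3)
A = {(0, 0), (0, 1), (1, 2), (2, 2)}

def fhat(t):
    """Normalized Fourier coefficient E_x 1_A(x) w^{-t.x}."""
    return sum(w ** (-(sum(a * b for a, b in zip(t, x)) % 3)) for x in A) / 3**n

# direct count of solutions to x + y + z = 0 with x, y, z in A
T = sum(1 for x in A for y in A
        if tuple((-a - b) % 3 for a, b in zip(x, y)) in A)

fourier_T = 3 ** (2 * n) * sum(fhat(t) ** 3 for t in G)
assert abs(fourier_T.real - T) < 1e-8 and abs(fourier_T.imag) < 1e-8
```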
4.3 Roth’s Theorem in the Integers
Step 1: Embedding in \(\mathbb{Z}/N'\mathbb{Z}\). We embed \(\{1, \ldots, N\}\) into \(\mathbb{Z}/N'\mathbb{Z}\) for a prime \(N' \ge 3N\), so that any 3-AP in \(\mathbb{Z}/N'\mathbb{Z}\) with elements in \(\{1, \ldots, N\}\) is a genuine 3-AP. Set \(A \subseteq \mathbb{Z}/N'\mathbb{Z}\), \(|A| = \alpha N'\) where \(\alpha \ge \delta/3\).
Step 2: Counting 3-APs via Fourier analysis. With \(f = 1_A\) and the normalized transform \(\widehat{f}(r) = \mathbb{E}_{x \in \mathbb{Z}/N'\mathbb{Z}} f(x) e^{-2\pi i r x / N'}\), the number of (ordered, possibly trivial) 3-APs in \(A\) is
\[ \Lambda_3(A) = \sum_{x, y, z \in \mathbb{Z}/N'\mathbb{Z}} 1_A(x) 1_A(y) 1_A(z)\, \mathbf{1}[x - 2y + z = 0] = (N')^2 \sum_r \widehat{f}(r)^2\, \widehat{f}(-2r), \]
obtained by expanding \(\mathbf{1}[m = 0] = \frac{1}{N'} \sum_r e^{2\pi i r m / N'}\). The frequency \(r = 0\) contributes \(\alpha^3 (N')^2\). If \(A\) has no non-trivial 3-AP, then \(\Lambda_3(A) = |A| = \alpha N'\), so
\[ \left| \alpha^3 - \frac{\alpha}{N'} \right| = \left| \sum_{r \neq 0} \widehat{f}(r)^2\, \widehat{f}(-2r) \right| \le \sup_{r \neq 0} |\widehat{f}(r)| \cdot \sum_r |\widehat{f}(r)|^2 = \sup_{r \neq 0} |\widehat{f}(r)| \cdot \alpha, \]
using Parseval (\(\sum_r |\widehat{f}(r)|^2 = \mathbb{E}_x |f(x)|^2 = \alpha\)) and the fact that \(r \mapsto -2r\) permutes the non-zero frequencies (\(N'\) being odd). So if \(A\) has no non-trivial 3-AP, then \(\sup_{r \neq 0} |\widehat{f}(r)| \ge \alpha^2 - O(1/N') \ge \alpha^2/2\) for \(N'\) large enough.
Step 3: Density increment on a long arithmetic progression. A large Fourier coefficient \(|\widehat{f}(r)| \ge \alpha^2/2\) at some \(r \neq 0\) means that \(A\) correlates with the character \(x \mapsto e^{2\pi i r x/N'}\). Unlike the \(\mathbb{F}_3^n\) case, where we could pass to a subspace, in \(\mathbb{Z}/N'\mathbb{Z}\) we partition into long arithmetic progressions on which this character is nearly constant: by Dirichlet’s approximation theorem applied to \(r/N'\), one can partition \(\mathbb{Z}/N'\mathbb{Z}\) into arithmetic progressions of length roughly \(\sqrt{N'}\) on each of which the phase \(e^{2\pi i r x/N'}\) varies by \(o(1)\). On at least one of these progressions \(P\), the density of \(A\) is at least \(\alpha + c\alpha^2\).
Step 4: Iteration. We now have a dense subset of a long arithmetic progression with no 3-AP and increased density. By a linear change of variables, a subset of an arithmetic progression of length \(M\) and common difference \(d\) is equivalent to a subset of \(\{1, \ldots, M\}\). So we can iterate the argument.
At each step, the density increases by \(c\alpha^2\) while the length of the ambient progression drops to roughly the square root of its previous value. As in the finite-field setting, the density would exceed \(1\) after \(O(1/\alpha)\) steps, which is absurd; for the iteration to run that long, the initial length must satisfy \(N \ge \exp(\exp(C/\delta))\). Turning this around, any 3-AP-free \(A \subseteq \{1, \ldots, N\}\) has density \(\delta = O(1/\log\log N)\); that is, \(r_3(N) = O(N/\log\log N)\), which is Roth’s original bound. ■
4.4 Behrend’s Construction
Behrend’s construction (1946) provides the best known lower bound for \(r_3(N)\), showing that 3-AP-free subsets of \(\{1, \ldots, N\}\) can be surprisingly large.
Now, consider the sphere \(S_r = \{x \in \{0, \ldots, d-1\}^k : x_1^2 + x_2^2 + \cdots + x_k^2 = r\}\). The key observation is that a sphere contains no non-trivial 3-AP, as a consequence of the strict convexity of \(\|\cdot\|^2\).
More precisely: if \(x + z = 2y\), then \(\|x\|^2 + \|z\|^2 = 2\|y\|^2 + \frac{1}{2}\|x - z\|^2\), so \(\|x\|^2 + \|z\|^2 \ge 2\|y\|^2\) with equality if and only if \(x = z\). Hence if \(x, y, z \in S_r\), then \(2r = \|x\|^2 + \|z\|^2 \ge 2\|y\|^2 = 2r\), so equality holds and \(x = z\), hence \(y = x = z\). Thus \(S_r\) is 3-AP-free in \(\mathbb{Z}^k\).
By pigeonhole, since \(\sum_r |S_r| = d^k\) and \(r\) ranges over \(\{0, 1, \ldots, k(d-1)^2\}\), there exists \(r\) with \(|S_r| \ge d^k / (k(d-1)^2 + 1) \ge d^k / (kd^2)\).
It remains to pass from \(\mathbb{Z}^k\) to \(\mathbb{Z}\). Embed \(\{0, \ldots, d-1\}^k\) into the integers via \(x \mapsto \sum_{i=1}^k x_i (2d)^{i-1}\). Since every coordinate is less than \(d\), adding two such integers produces no carries in base \(2d\), so a 3-AP among the images forces a coordinate-wise 3-AP among the lattice points. With \(N \approx (2d)^k\), optimizing by taking \(k \approx \sqrt{\log N}\) (and hence \(d \approx e^{c\sqrt{\log N}}\)) yields \(|S_r| \ge N \exp(-C\sqrt{\log N})\) for some constant \(C\). ■
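A small instance of the construction (the grid parameters \(d, k\) are arbitrary), with a brute-force check that the embedded set is 3-AP-free:

```python
# Behrend's construction in miniature: take the largest sphere in the grid
# {0,...,d-1}^k and embed it in base 2d so that addition has no carries.
import itertools

d, k = 3, 3
spheres = {}
for x in itertools.product(range(d), repeat=k):
    spheres.setdefault(sum(c * c for c in x), []).append(x)
best = max(spheres.values(), key=len)      # the most populous sphere

base = 2 * d                               # base 2d: adding two points never carries
S = sorted(sum(c * base**i for i, c in enumerate(x)) for x in best)

def has_3ap(S):
    Sset = set(S)
    return any((a + c) % 2 == 0 and (a + c) // 2 in Sset
               for a in S for c in S if a < c)

assert not has_3ap(S)
```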
4.5 The Kelley-Meka Breakthrough
In 2023, Kelley and Meka achieved a stunning breakthrough, proving an exponential improvement to Roth’s theorem.
The Kelley-Meka proof builds on ideas of Bloom and Sisask (who first broke the logarithmic barrier) and introduces a novel “spectral boost” technique. The key innovation is a two-step density increment: first, a standard density increment based on large Fourier coefficients; second, a new increment that exploits the structure of the large spectrum (using ideas related to Chang’s theorem and Sanders’ quasi-polynomial bounds). Bloom and Sisask subsequently gave a simplified exposition of the argument, improving the exponent from \(1/12\) to \(1/9\).
Chapter 5: Szemerédi’s Regularity Lemma and Applications
Szemerédi’s regularity lemma is one of the most powerful tools in extremal combinatorics and additive combinatorics. It asserts that every large graph can be partitioned into a bounded number of parts such that the bipartite graph between most pairs of parts behaves “pseudo-randomly.” Despite (or perhaps because of) the enormous size of the partition, the regularity lemma has found applications far beyond its original purpose in Szemerédi’s proof of his theorem on arithmetic progressions.
5.1 Regular Pairs and the Regularity Lemma
In other words, \((X, Y)\) is \(\varepsilon\)-regular if the edge density between any two reasonably large subsets is close to the overall density. This is a combinatorial notion of pseudo-randomness.
- \(|V_0| \le \varepsilon |V|\) (exceptional set),
- \(|V_1| = |V_2| = \cdots = |V_k|\) (equal-sized parts),
- all but at most \(\varepsilon \binom{k}{2}\) pairs \((V_i, V_j)\) with \(1 \le i < j \le k\) are \(\varepsilon\)-regular.
The key claim is: if the partition \(\mathcal{P}\) is not \(\varepsilon\)-regular (i.e., too many pairs are irregular), then we can refine \(\mathcal{P}\) to a new partition \(\mathcal{P}'\) with \(q(\mathcal{P}') \ge q(\mathcal{P}) + \varepsilon^5\) (or a similar polynomial in \(\varepsilon\)).
Since the energy is bounded above by 1 and increases by at least \(\varepsilon^5\) at each step, the process terminates after at most \(1/\varepsilon^5\) refinements. At each step, the number of parts may increase (each part is subdivided into a bounded number of subparts), leading to the tower-type bound for \(M(\varepsilon, m)\).
To prove the key claim: if \((V_i, V_j)\) is an irregular pair, then by definition there exist \(A \subseteq V_i\) and \(B \subseteq V_j\) with \(|A| \ge \varepsilon |V_i|\), \(|B| \ge \varepsilon |V_j|\), and \(|d(A, B) - d(V_i, V_j)| > \varepsilon\). Refine the partition by splitting each \(V_i\) according to all the witness sets \(A\) involving it. By convexity (the energy is a weighted mean of squared densities, and splitting a pair whose sub-densities differ strictly increases the mean square), each irregular pair contributes an energy gain of at least \(\varepsilon^4\) times its weight; summing over the at least \(\varepsilon \binom{k}{2}\) irregular pairs gives a total increment of at least \(\varepsilon^5\). ■
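The convexity fact at the heart of the argument — refining a partition never decreases the energy — can be checked numerically (the graph and the partitions below are arbitrary choices):

```python
# Energy (mean-square density) of a vertex partition is monotone under
# refinement; a small random-graph illustration of the convexity step.
import itertools
import random

random.seed(3)
V = list(range(12))
E = {frozenset(e) for e in itertools.combinations(V, 2) if random.random() < 0.4}

def density(S, T):
    pairs = [(s, t) for s in S for t in T]
    return sum(frozenset(p) in E for p in pairs) / len(pairs)

def energy(parts):
    n = len(V)
    return sum((len(S) * len(T) / n**2) * density(S, T)**2
               for S, T in itertools.combinations(parts, 2))

P = [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
P_refined = [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9, 10, 11]]
assert energy(P_refined) >= energy(P) - 1e-12
```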
5.2 The Counting Lemma
The regularity lemma becomes truly powerful when combined with the counting lemma, which asserts that the number of copies of a fixed graph in a regular partition is approximately what one would expect in a random graph.
5.3 The Triangle Removal Lemma
One of the most important consequences of the regularity and counting lemmas is the triangle removal lemma, which has far-reaching applications.
If a triangle remains in the cleaned graph, its three vertices must lie in three distinct parts \(V_i, V_j, V_l\) that form an \(\varepsilon'\)-regular triple with densities at least \(\varepsilon'\). By the counting lemma, the number of such triangles is at least \(c(\varepsilon') n^3\) for some \(c(\varepsilon') > 0\). So if \(\delta < c(\varepsilon')\), no triangle survives, and the total number of removed edges is \(O(\varepsilon') n^2 \le \varepsilon n^2\). ■
5.4 Roth’s Theorem via the Regularity Method
The triangle removal lemma gives an elegant (though quantitatively weak) proof of Roth’s theorem.
- \(xy \in E(G)\) (for \(x \in X, y \in Y\)) if \(y - x \in A\),
- \(yz \in E(G)\) (for \(y \in Y, z \in Z\)) if \(z - y \in A\),
- \(xz \in E(G)\) (for \(x \in X, z \in Z\)) if \((z - x)/2 \in A\) (assuming \(N'\) is odd so that 2 is invertible).
A triangle \((x, y, z)\) in \(G\) corresponds to \(y - x, z - y, (z-x)/2 \in A\), which means \(y - x, z - y, (y-x + z-y)/2 \in A\). Setting \(a = y - x\), \(c = z - y\), \(b = (a+c)/2\), this is the triple \(a, b, c \in A\) with \(a + c = 2b\), i.e., \((a, b, c)\) is a 3-AP in \(A\).
Each \(a \in A\) gives rise to \(N'\) triangles \((x, x+a, x+2a)\), one for each \(x \in X\), corresponding to the trivial AP \(a = b = c\); this yields at least \(|A| \cdot N' = \delta (N')^2\) triangles. Crucially, these triangles are pairwise edge-disjoint: any single edge determines \(x\) and \(a\), hence the whole triangle. Now suppose \(A\) contains no non-trivial 3-AP. Then every triangle of \(G\) comes from a trivial AP, so \(G\) has only \(O((N')^2) = o((N')^3)\) triangles, and the triangle removal lemma says \(G\) can be made triangle-free by removing \(o((N')^2)\) edges. But destroying \(\delta (N')^2\) pairwise edge-disjoint triangles requires at least \(\delta (N')^2\) edge removals — a contradiction for \(N'\) large enough. ■
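The correspondence between triangles and 3-APs can be verified exactly on a small example (the modulus and the set \(A\) below are arbitrary):

```python
# The tripartite graph from the proof: parts X, Y, Z are copies of Z/N'Z,
# and each triangle corresponds to a pair (x, 3-AP). We check that
# #triangles = N' * #{(a, b, c) in A^3 : a + c = 2b}.
Np = 11                      # odd modulus, so 2 is invertible
A = {1, 3, 4, 8}
inv2 = pow(2, -1, Np)        # inverse of 2 mod Np (Python 3.8+)

def is_triangle(x, y, z):
    return ((y - x) % Np in A and (z - y) % Np in A
            and ((z - x) * inv2) % Np in A)

triangles = sum(is_triangle(x, y, z)
                for x in range(Np) for y in range(Np) for z in range(Np))
aps = sum(1 for a in A for b in A for c in A if (a + c) % Np == (2 * b) % Np)
assert triangles == Np * aps
```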
5.5 The Graph Removal Lemma
The triangle removal lemma generalizes to arbitrary graphs.
5.6 Hypergraph Regularity
The extension of the regularity lemma to hypergraphs was a major challenge, resolved independently by Gowers and by Rödl, Nagle, Schacht, and Skokan around 2004-2006. The hypergraph regularity method provides a powerful tool for counting arithmetic progressions and other linear patterns.
Chapter 6: The Polynomial Method
The polynomial method is a family of techniques that use algebraic properties of polynomials to solve combinatorial and number-theoretic problems. In this chapter, we develop several incarnations of the polynomial method, from the classical combinatorial Nullstellensatz to the breakthrough cap set result.
6.1 The Combinatorial Nullstellensatz
where each \(g_j\) is a polynomial in \(x_2, \ldots, x_n\). The coefficient of \(x_1^{d_1} x_2^{d_2} \cdots x_n^{d_n}\) in \(f\) is the coefficient of \(x_2^{d_2} \cdots x_n^{d_n}\) in \(g_{d_1}\), which is non-zero by assumption. The degree of \(g_{d_1}\) is at most \(d - d_1 = d_2 + \cdots + d_n\), and the coefficient of the leading monomial \(x_2^{d_2} \cdots x_n^{d_n}\) is non-zero.
\[ h(x_1) = f(x_1, s_2, \ldots, s_n) = \sum_{j=0}^{d_1} x_1^j \cdot g_j(s_2, \ldots, s_n). \]The leading coefficient of \(h\) is \(g_{d_1}(s_2, \ldots, s_n) \neq 0\), so \(h\) is a polynomial of degree exactly \(d_1\). Since \(|S_1| > d_1\), there exists \(s_1 \in S_1\) with \(h(s_1) = f(s_1, s_2, \ldots, s_n) \neq 0\). ■
6.2 The Chevalley-Warning Theorem
Now we use the key fact: for a polynomial \(g(x_1, \ldots, x_n)\) over \(\mathbb{F}_p\) of degree less than \((p-1)n\), we have \(\sum_{x \in \mathbb{F}_p^n} g(x) = 0\) in \(\mathbb{F}_p\). This follows from the observation that the sum over \(\mathbb{F}_p\) of \(x^k\) is \(-1\) if \((p-1) | k\) and \(k > 0\), and \(0\) otherwise; a monomial of degree less than \((p-1)n\) cannot have all exponents divisible by \(p-1\) unless some exponent is zero, and the corresponding sum factors as a product of one-dimensional sums, at least one of which vanishes.
Let \(N\) denote the number of common zeros of the system, and consider \(\sum_{x \in \mathbb{F}_p^n} \prod_i \bigl(1 - f_i(x)^{p-1}\bigr)\). By Fermat’s little theorem, the summand equals \(1\) if \(x\) is a common zero and \(0\) otherwise, so the sum is congruent to \(N\) modulo \(p\). On the other hand, the polynomial \(\prod_i (1 - f_i^{p-1})\) has degree \((p-1)\sum_i \deg f_i < (p-1)n\) by hypothesis, so by the key fact the sum vanishes in \(\mathbb{F}_p\). Hence \(N \equiv 0 \pmod{p}\). ■
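A brute-force check of the theorem on a small system (one arbitrary quadratic in three variables over \(\mathbb{F}_3\)):

```python
# Chevalley-Warning sanity check: a single polynomial of degree 2 in 3
# variables over F_3, so deg < #variables; p must divide the zero count.
import itertools

p = 3
def f(x, y, z):
    return (x * y + z * z + x) % p

N = sum(1 for x, y, z in itertools.product(range(p), repeat=3)
        if f(x, y, z) == 0)
assert N % p == 0
```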
6.3 The Erdős-Ginzburg-Ziv Theorem
The Erdős-Ginzburg-Ziv theorem is a classic application of the polynomial method (and of the Chevalley-Warning theorem).
Proof. Let \(a_1, \ldots, a_{2p-1} \in \mathbb{Z}/p\mathbb{Z}\) be the given elements. Working over \(\mathbb{F}_p\) in \(n = 2p - 1\) variables, consider the polynomials
\[ f(x) = \sum_{i=1}^{2p-1} a_i x_i^{p-1}, \qquad g(x) = \sum_{i=1}^{2p-1} x_i^{p-1}. \]Then \(\deg f + \deg g = 2(p-1) < 2p - 1 = n\), and both polynomials vanish at the origin, so by the Chevalley-Warning theorem the number of common zeros is divisible by \(p\); in particular, there is a common zero \(x \neq 0\). Let \(I = \{i : x_i \neq 0\}\). Since \(t^{p-1} = 1\) for every non-zero \(t \in \mathbb{F}_p\), the condition \(g(x) = 0\) says \(|I| \equiv 0 \pmod{p}\), and \(0 < |I| \le 2p - 1\) forces \(|I| = p\). Likewise \(f(x) = \sum_{i \in I} a_i = 0\) in \(\mathbb{F}_p\), so the \(p\) elements \(a_i\) with \(i \in I\) have sum divisible by \(p\). ■
This proves the theorem when \(n\) is prime. The general case follows because the Erdős-Ginzburg-Ziv property is multiplicative — if it holds for \(m\) and for \(n\), it holds for \(mn\) — so one induces on the number of prime factors of \(n\).
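For small primes the full Erdős-Ginzburg-Ziv statement can be checked exhaustively. The sketch below (an illustration with helper names of our own choosing) verifies every sequence for \(p = 3\), and confirms that \(2p - 2\) elements do not suffice:

```python
# Exhaustive check of Erdős-Ginzburg-Ziv for p = 3: among any 2p-1 = 5
# elements of Z/3Z there are p = 3 whose sum is 0 mod 3.
from itertools import combinations, product

def has_zero_sum_p_subset(seq, p):
    # is there a p-element subsequence summing to 0 mod p?
    return any(sum(c) % p == 0 for c in combinations(seq, p))

p = 3
for seq in product(range(p), repeat=2 * p - 1):  # all 3^5 sequences
    assert has_zero_sum_p_subset(seq, p)

# Sharpness: a sequence of 2p-2 elements with no zero-sum p-subset.
assert not has_zero_sum_p_subset((0, 0, 1, 1), p)
```

The sharpness example generalizes: \(p - 1\) copies of \(0\) and \(p - 1\) copies of \(1\) have no \(p\)-element zero-sum subsequence.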
6.4 The Cap Set Problem and the Croot-Lev-Pach Method
The cap set problem asks for the maximum size of a subset of \(\mathbb{F}_3^n\) with no three-term arithmetic progression (equivalently, no three distinct points \(x, y, z\) with \(x + y + z = 0\)). This is the \(\mathbb{F}_3^n\) analogue of the Roth/Szemerédi problem.
The function \(r_3(\mathbb{F}_3^n)\) denotes the maximum size of such a “cap set.” Meshulam’s theorem (Theorem 4.2) gives \(r_3(\mathbb{F}_3^n) \le 3^n \cdot O(1/n)\). For decades, the best bound was of this form, and it was a major open problem to determine whether \(r_3(\mathbb{F}_3^n) \le c^n\) for some \(c < 3\).
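For small \(n\) the exact values of \(r_3(\mathbb{F}_3^n)\) can be found by brute force. The sketch below (our own illustration) recovers the classical value \(r_3(\mathbb{F}_3^2) = 4\):

```python
# Brute-force maximum size of a cap set (3-AP-free set) in F_3^2.
from itertools import combinations, product

points = list(product(range(3), repeat=2))  # the 9 points of F_3^2

def is_cap(S):
    # no three distinct points with x + y + z = 0 coordinatewise mod 3
    return all(any((x[i] + y[i] + z[i]) % 3 for i in range(2))
               for x, y, z in combinations(S, 3))

best = max(k for k in range(1, 10)
           if any(is_cap(S) for S in combinations(points, k)))
assert best == 4
```

Since a triple with two equal coordinates forces all three equal (as \(z = -2x = x\) in \(\mathbb{F}_3\)), checking unordered triples of distinct points suffices.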
In 2016, a stunning breakthrough resolved this problem.
The proof is remarkably short and elegant, using the “polynomial method” in a novel way. The key new idea, due to Croot, Lev, and Pach (who proved an analogous result for \((\mathbb{Z}/4\mathbb{Z})^n\)), is a polynomial-dimension argument, recast by Tao as a statement about the “slice rank” of a tensor.
We present the argument in Tao’s slice-rank formulation.
Define the tensor \(T: A \times A \times A \to \mathbb{F}_3\) by \(T(x, y, z) = \delta(x + y + z = 0)\). If \(A\) has no non-trivial three-term progression, then \(T(x, y, z) \neq 0\) forces \(x = y = z\): in \(\mathbb{F}_3^n\), the condition \(x + y + z = 0\) is equivalent to \(x, y, z\) being in arithmetic progression, so a solution with \(x, y, z\) distinct would be a 3-AP in \(A\), while a solution with exactly two coordinates equal, say \(x + x + z = 0\), gives \(z = -2x = x\). Thus \(T\) is a diagonal tensor, and since \(T(x, x, x) = \delta(3x = 0) = 1\) in characteristic 3, it has exactly \(|A|\) non-zero diagonal entries.
A slice rank decomposition of \(T\) is an expression
\[ T(x, y, z) = \sum_i f_i(x) \cdot g_i(y, z) + \sum_j h_j(y) \cdot k_j(x, z) + \sum_l m_l(z) \cdot n_l(x, y), \]and the slice rank of \(T\) is the least total number of summands in such a decomposition. To bound the slice rank of our tensor, note that \(1 - t^2 = \delta(t = 0)\) for \(t \in \mathbb{F}_3\), so
\[ \delta(x + y + z = 0) = \prod_{i=1}^n \bigl(1 - (x_i + y_i + z_i)^2\bigr). \]Expanding the product expresses \(\delta(x + y + z = 0)\) as a linear combination of monomials
\[ x^\alpha y^\beta z^\gamma, \qquad \alpha, \beta, \gamma \in \{0, 1, 2\}^n, \]since each factor contributes degree at most 2 to the \(i\)-th coordinates of \(x\), \(y\), and \(z\) combined.
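The indicator identity driving this expansion can be checked numerically. The following sketch (illustrative only) verifies it exhaustively for \(n = 2\):

```python
# Check over F_3^n (n = 2): prod_i (1 - (x_i + y_i + z_i)^2) equals
# 1 if x + y + z = 0 coordinatewise mod 3, and 0 otherwise.
from itertools import product

n = 2
for x, y, z in product(product(range(3), repeat=n), repeat=3):
    value = 1
    for i in range(n):
        value = (value * (1 - (x[i] + y[i] + z[i]) ** 2)) % 3
    indicator = int(all((x[i] + y[i] + z[i]) % 3 == 0 for i in range(n)))
    assert value == indicator
```

The one-dimensional fact is just that \(1 - t^2 \equiv 1, 0, 0 \pmod{3}\) for \(t \equiv 0, 1, 2\).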
The key observation: each monomial \(x^\alpha y^\beta z^\gamma\) appearing in the expansion has total degree \(|\alpha| + |\beta| + |\gamma| \le 2n\) (where \(|\alpha| = \sum_i \alpha_i\), with each \(\alpha_i \in \{0, 1, 2\}\)). By the pigeonhole principle, at least one of \(|\alpha|, |\beta|, |\gamma|\) is at most \(2n/3\).
Assigning each monomial to one group in which it qualifies, we may write
\[ T(x, y, z) = \sum_{|\alpha| \le 2n/3} x^\alpha \cdot g_\alpha(y, z) + \sum_{|\beta| \le 2n/3} y^\beta \cdot h_\beta(x, z) + \sum_{|\gamma| \le 2n/3} z^\gamma \cdot k_\gamma(x, y), \]where the sums are over multi-indices with the indicated degree bounds.
Let \(D_n = |\{\alpha \in \{0,1,2\}^n : |\alpha| \le 2n/3\}|\) denote the number of admissible low-degree multi-indices; then each of the three sums contains at most \(D_n\) terms.
Thus the slice rank of \(T\) is at most \(3 D_n\).
On the other hand, \(T\) is a diagonal tensor with \(|A|\) non-zero diagonal entries, and the slice rank of a diagonal tensor equals the number of its non-zero diagonal entries (a lemma of Tao, proved by a short linear algebra argument). Hence \(|A| \le 3 D_n\).
A standard exponential-moment (Chernoff-type) estimate bounds \(D_n\): for any \(0 < t \le 1\), every \(\alpha\) with \(|\alpha| \le 2n/3\) satisfies \(t^{|\alpha| - 2n/3} \ge 1\), so
\[ D_n \le \min_{0 < t \le 1} \frac{(1 + t + t^2)^n}{t^{2n/3}} \le (2.756)^n, \]the minimum being attained at \(t = (\sqrt{33} - 1)/8\), with base \(\approx 2.755\). Therefore \(|A| \le 3 D_n = O(2.756^n)\), which is the Ellenberg-Gijswijt bound. ■
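The quantity \(D_n\) can be computed exactly via the trinomial generating function \((1 + x + x^2)^n\). The sketch below (our own numerical check; the base \(2.756\) is the Ellenberg-Gijswijt constant rounded up) compares exact counts with the exponential bound:

```python
# D(n) = #{alpha in {0,1,2}^n : |alpha| <= 2n/3}, computed from the
# coefficients of (1 + x + x^2)^n.
def D(n):
    coeffs = [1]  # coeffs[d] = #{alpha with |alpha| = d}
    for _ in range(n):
        new = [0] * (len(coeffs) + 2)
        for d, c in enumerate(coeffs):
            for j in range(3):  # each coordinate contributes 0, 1, or 2
                new[d + j] += c
        coeffs = new
    return sum(c for d, c in enumerate(coeffs) if d <= 2 * n / 3)

assert D(6) == 168
for n in range(1, 15):
    assert D(n) <= 2.756 ** n
```

Note that \(D(6) = 168\) already exceeds \((3/2^{2/3})^6 \approx 46\), so any purported bound with base below \(2.755\) fails.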
6.5 Slice Rank and Partition Rank
The success of the polynomial method in the cap set problem has led to the development of a systematic theory of “tensor ranks” in additive combinatorics.
The partition rank of \(T\) is the minimum \(r\) such that \(T\) can be written as a sum of \(r\) tensors, each of which factors as a product of two tensors over a non-trivial partition of the indices \(\{1, \ldots, k\}\) into two parts.
6.6 The Sunflower Conjecture
The polynomial method has also made contributions to the sunflower conjecture.
The Erdős-Rado sunflower conjecture posits that the \(w!\) factor in the sunflower lemma can be removed: \(c_k^w\) sets should suffice, for some constant \(c_k\) depending only on \(k\). In 2019, Alweiss, Lovett, Wu, and Zhang achieved a major breakthrough, proving a bound of \((\log w)^{O(w)}\) for fixed \(k\), subsequently sharpened to \(O(k \log w)^w\) by Rao and others. While the proof uses a “spread” technique rather than the polynomial method directly, the ideas are related in spirit to the pseudo-randomness methods developed around the cap set problem.
Chapter 7: Szemerédi’s Theorem and Beyond
We conclude these notes with the deepest results in additive combinatorics: Szemerédi’s theorem on arithmetic progressions of arbitrary length, its various proofs, the Green-Tao theorem on primes, and the connections to ergodic theory and combinatorics. These results represent a remarkable confluence of ideas from across mathematics.
7.1 Szemerédi’s Theorem
Equivalently, defining \(r_k(N)\) as the maximum size of a subset of \(\{1, \ldots, N\}\) with no \(k\)-AP, Szemerédi’s theorem asserts that \(r_k(N) = o(N)\) for every fixed \(k\).
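For very small \(N\), the function \(r_3(N)\) can be computed by exhaustive search. The sketch below (our own illustration, feasible only for tiny \(N\)) reproduces the first few known values:

```python
# Brute-force r_3(N): the largest subset of {1,...,N} with no 3-term AP.
from itertools import combinations

def r3(N):
    pts = range(1, N + 1)
    for k in range(N, 0, -1):  # try sizes from largest down
        for S in combinations(pts, k):
            # a < b < c form a 3-AP exactly when 2b = a + c
            if all(2 * b != a + c for a, b, c in combinations(S, 3)):
                return k
    return 0

assert [r3(N) for N in range(1, 10)] == [1, 2, 2, 3, 4, 4, 4, 4, 5]
```

Witnesses for the last two values: \(\{1, 2, 4, 5\}\) for \(N = 5\) and \(\{1, 2, 4, 8, 9\}\) for \(N = 9\).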
7.2 Four Proofs: An Overview
Szemerédi’s theorem has been proved by at least four fundamentally different methods, each opening up new areas of mathematics:
1. Szemerédi’s combinatorial proof (1975). The original proof uses the regularity lemma (Chapter 5) along with a complex combinatorial argument involving “density increment” on structured subsets. The regularity lemma decomposes the “graph” of the problem into pseudo-random pieces, and the density increment iteratively refines the analysis until a progression is found.
2. Furstenberg’s ergodic proof (1977). Furstenberg’s proof translates the problem into ergodic theory using the “Furstenberg correspondence principle” (see below) and then proves the result as a multiple recurrence theorem. This approach, while non-constructive, opened up the field of ergodic Ramsey theory and led to many generalizations.
3. Gowers’ Fourier-analytic proof (1998, 2001). Gowers introduced higher-order Fourier analysis (the “Gowers norms” \(U^k\)) and used a quantitative density increment argument to prove Szemerédi’s theorem with explicit bounds. For \(k = 4\), he obtained \(r_4(N) \le N (\log \log N)^{-c}\) for an absolute constant \(c > 0\), and for general \(k\), he obtained \(r_k(N) \le N (\log \log N)^{-c_k}\) with \(c_k = 2^{-2^{k+9}}\). This work contributed to Gowers receiving the Fields Medal in 1998.
4. Hypergraph regularity proof (2004-2006). Gowers and, independently, Rödl-Nagle-Schacht-Skokan extended the regularity-counting-removal framework to hypergraphs, providing a new proof of Szemerédi’s theorem through a “hypergraph removal lemma.”
7.3 Furstenberg’s Correspondence Principle
To construct the measure \(\mu\), take a subsequence of the Cesàro averages \(\frac{1}{N} \sum_{n=1}^{N} \delta_{T^n \omega_A}\) that converges weak-* to a \(T\)-invariant probability measure \(\mu\) on \(X\) (this exists by compactness of the space of probability measures). One can ensure \(\mu(E) = \bar{d}(A)\) by choosing the subsequence appropriately. The correspondence between patterns in \(A\) and multiple recurrence of \(T\) then follows from the construction. ■
7.4 The Multiple Recurrence Theorem
With the correspondence principle in hand, Szemerédi’s theorem reduces to the following purely ergodic-theoretic result.
The key innovation is the use of the Furstenberg-Zimmer structure theorem, which constructs the maximal compact factor of a system. For a \(\mathbb{Z}\)-action, this is the Kronecker factor (the closed linear span of the eigenfunctions). The multiple recurrence theorem is first proved for the Kronecker factor (using equidistribution of polynomial sequences), and then extended to the full system by showing that the “remainder” (the weak-mixing part) does not affect the limit. ■
7.5 Gowers Norms and Higher-Order Fourier Analysis
Gowers’ quantitative approach to Szemerédi’s theorem introduced a hierarchy of norms that measure the “complexity” of functions, generalizing the Fourier transform to higher orders.
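The base case \(k = 2\) of this hierarchy is ordinary Fourier analysis: for \(f: \mathbb{Z}_N \to \mathbb{R}\) one has \(\|f\|_{U^2}^4 = \sum_r |\hat{f}(r)|^4\), with \(\hat{f}(r) = \mathbb{E}_x f(x) e(-xr/N)\). The following sketch (our own numerical check of this standard identity) verifies it for a random function:

```python
# Check ||f||_{U^2}^4 = sum_r |fhat(r)|^4 on Z_N for a random real f.
import cmath
import random

N = 16
random.seed(0)
f = [random.uniform(-1, 1) for _ in range(N)]

# U^2 norm (4th power) straight from the definition:
# E_{x,h1,h2} f(x) f(x+h1) f(x+h2) f(x+h1+h2)  (no conjugates: f is real)
u2_4 = sum(f[x] * f[(x + h1) % N] * f[(x + h2) % N] * f[(x + h1 + h2) % N]
           for x in range(N) for h1 in range(N) for h2 in range(N)) / N**3

def fhat(r):
    # normalized Fourier coefficient: E_x f(x) e(-2*pi*i*x*r/N)
    return sum(f[x] * cmath.exp(-2j * cmath.pi * x * r / N)
               for x in range(N)) / N

fourier_4 = sum(abs(fhat(r)) ** 4 for r in range(N))
assert abs(u2_4 - fourier_4) < 1e-9
```

For \(k \ge 3\) no such exact Fourier formula exists, which is precisely why the inverse theory of the \(U^k\) norms is needed.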
The precise statement of the inverse theorem for general \(k\), proved by Green, Tao, and Ziegler (2012), involves nilsequences and the theory of nilmanifolds, connecting additive combinatorics to the theory of Lie groups and homogeneous dynamics.
7.6 The Green-Tao Theorem
The Green-Tao theorem, announced in 2004 and published in 2008, is one of the most celebrated results in 21st-century mathematics.
Stage 1: Reformulation. The primes have density zero in the integers, so Szemerédi’s theorem does not apply directly. Green and Tao show that it suffices to find a “pseudo-random” measure \(\nu\) that majorizes the primes (or rather, a modified version of the primes) and satisfies certain correlation conditions.
Stage 2: The transference principle. The key innovation is a “transference principle” that allows one to transfer results from the dense setting (Szemerédi’s theorem for dense subsets of \(\{1, \ldots, N\}\)) to the sparse setting (subsets of a pseudo-random set). Green and Tao show that if \(\nu\) is a pseudo-random measure (satisfying a “linear forms condition” and a “correlation condition”) and \(A\) is a subset of the support of \(\nu\) with positive relative density, then \(A\) contains \(k\)-APs.
The transference principle works by showing that the “balanced function” \(f = 1_A - \delta \cdot \nu\) (where \(\delta\) is the relative density) has small Gowers \(U^{k-1}\) norm relative to the pseudo-random measure. This allows one to replace \(1_A\) by \(\delta \cdot \nu\) in the counting operator for \(k\)-APs, reducing to the pseudo-random case, which is easy.
Stage 3: Constructing the pseudo-random measure. Green and Tao construct \(\nu\) using a “W-trick” (working modulo the product of small primes to remove the obvious bias from the primes) and Goldston-Yıldırım sieve weights. The key input from analytic number theory is a truncated version of the Hardy-Littlewood conjectures, verified for the modified sieve weights.
Stage 4: Verifying the linear forms condition. The most technically demanding part is showing that \(\nu\) satisfies the linear forms condition: for any system of linear forms \(\psi_i(x) = L_i \cdot x + b_i\) of finite complexity, the average of \(\prod_i \nu(\psi_i(x))\) equals \(1 + o(1)\). This requires estimates from analytic number theory, specifically correlation estimates for the Goldston-Yıldırım sieve weights. ■
The Green-Tao proof yields only enormous, far-from-explicit bounds on the smallest \(k\)-AP in the primes, and obtaining good quantitative bounds for general \(k\) remains a challenging open problem. (For \(k = 3\), sharper quantitative results are known from Roth-type theorems in the primes.)
7.7 The Hales-Jewett Theorem
The Hales-Jewett theorem is a foundational result in Ramsey theory that implies Szemerédi’s theorem (and much more) but is formulated in the purely combinatorial setting of “combinatorial lines” in \([k]^n\).
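Combinatorial lines are easy to enumerate directly. The sketch below (our own illustration) generates them from wildcard templates and confirms the count \((k+1)^n - k^n\):

```python
# Enumerate combinatorial lines in [k]^n: a line is given by a template
# in ([k] ∪ {*})^n with at least one wildcard; its k points arise by
# substituting * -> 1, ..., k.
from itertools import product

def combinatorial_lines(k, n):
    lines = []
    for template in product(list(range(1, k + 1)) + ['*'], repeat=n):
        if '*' not in template:
            continue  # a constant template is a point, not a line
        line = [tuple(v if c == '*' else c for c in template)
                for v in range(1, k + 1)]
        lines.append(line)
    return lines

k, n = 3, 2
assert len(combinatorial_lines(k, n)) == (k + 1) ** n - k ** n  # 16 - 9 = 7
```

Templates with at least one wildcard are in bijection with combinatorial lines, which gives the count \((k+1)^n - k^n\).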
7.8 The Density Hales-Jewett Theorem
The density version of the Hales-Jewett theorem was proved by the remarkable Polymath1 project, the first successful large-scale collaborative mathematics project.
7.9 Connections to Ergodic Theory and Model Theory
The interplay between additive combinatorics and other areas of mathematics has been one of the most productive themes in modern mathematics.
Ergodic theory. The Furstenberg correspondence principle (Theorem 7.3) established a deep connection between combinatorial results about the integers and recurrence properties of measure-preserving systems. This connection has been vastly generalized:
- The Bergelson-Leibman theorem (1996) proves a polynomial extension of Szemerédi’s theorem: dense sets contain polynomial progressions \(a, a + p_1(d), a + p_2(d), \ldots\) for any polynomials \(p_i\) with \(p_i(0) = 0\).
- Host and Kra (2005) developed the theory of “characteristic factors” and “nilpotent systems” that underlie the convergence of multiple ergodic averages.
- Tao (2008) proved the norm convergence of multiple ergodic averages for commuting transformations, with alternative proofs given later by Austin and others.
Model theory and ultraproducts. In recent years, model-theoretic methods — particularly the use of ultraproducts and nonstandard analysis — have provided new proofs and perspectives on results in additive combinatorics. The “nonstandard” proof of Szemerédi’s regularity lemma and the use of ultralimits in the Green-Tao transference principle are notable examples.
These notes provide an introduction to the major themes of additive combinatorics. The subject is vast and rapidly developing, and the reader is encouraged to consult the references listed at the top of these notes for complete proofs and further developments.