PMATH 457: Topological Dynamics and Ergodic Theory
Estimated study time: 3 hr 55 min
Table of contents
Sources and References
Primary textbooks — N. Hindman & D. Strauss, Algebra in the Stone-Čech Compactification, 2nd ed., de Gruyter, 2012; J. Auslander, Minimal Flows and Their Extensions, North-Holland, 1988; H. Furstenberg, Recurrence in Ergodic Theory and Combinatorial Number Theory, Princeton University Press, 1981.
Supplementary texts — W. Parry, Topics in Ergodic Theory, Cambridge University Press, 1981; P. Walters, An Introduction to Ergodic Theory, Springer, 1982; T. Downarowicz, Entropy in Dynamical Systems, Cambridge University Press, 2011; D. Rudolph, Fundamentals of Measurable Dynamics, Oxford University Press, 1990; E. Glasner, Ergodic Theory via Joinings, AMS, 2003.
Online resources — T. Tao’s blog posts on ergodic theory and the Furstenberg correspondence principle; MIT OpenCourseWare 18.125 (Real Analysis); Cambridge Part III lecture notes on ergodic theory; Stanford lecture notes on topological dynamics; notes by Kra, Host, and Frantzikinakis on ergodic theory and combinatorics.
Historical and Motivational Introduction
The subject of dynamical systems — the mathematical study of how points in a space evolve under repeated application of a rule — has its roots in one of the most celebrated failures in mathematical physics. In 1890, Henri Poincaré proved his recurrence theorem while attempting to solve the three-body problem: any measure-preserving transformation of a bounded region of phase space must, with probability one, return arbitrarily close to any given starting point. Poincaré’s result was the first rigorous theorem in what would become ergodic theory. It shattered the naive hope of finding closed-form solutions to the three-body problem, but in doing so it opened an entirely new vista — the statistical, asymptotic study of dynamical behavior over long time scales.
The second great impetus came from Ludwig Boltzmann’s ergodic hypothesis in statistical mechanics, roughly contemporaneous with Poincaré. Boltzmann postulated that a gas molecule moving through phase space visits, over infinite time, every state on its energy surface with equal frequency. This was the physical intuition that time averages and space averages should coincide. Making Boltzmann’s hypothesis rigorous occupied mathematicians for four decades until George David Birkhoff proved his pointwise ergodic theorem in 1931: for almost every starting point, time averages of integrable observables converge to their space averages. The key names in the early development are Birkhoff (ergodic theorem, 1931), von Neumann (mean ergodic theorem, 1932), and Hopf (ergodic decomposition). Then came the entropy revolution: Kolmogorov (1958) imported Shannon’s information-theoretic entropy into dynamics, creating an isomorphism invariant that could distinguish non-isomorphic Bernoulli shifts. Ornstein (1970) showed entropy is actually a complete invariant for Bernoulli shifts.
The modern subject bifurcates into two intertwined branches. Topological dynamics studies continuous group actions on compact Hausdorff spaces, asking questions about minimality, equicontinuity, proximality, and the algebraic structure of Ellis semigroups. Ergodic theory studies measure-preserving transformations and group actions on probability spaces, asking about mixing, spectral properties, entropy, and orbit equivalence. The two branches are connected at the deepest level: Furstenberg’s 1967–1977 work showed that topological notions (distal extensions, equicontinuous towers) have exact measure-theoretic parallels (Furstenberg–Zimmer structure theory), and that ergodic theory can prove purely combinatorial results such as Szemerédi’s theorem (1975) on arithmetic progressions. Other landmark names are: Furstenberg (structure theorems and combinatorics, 1967–1981), Ornstein (isomorphism theory, 1970), Ratner (rigidity of unipotent flows, 1991, proving the Raghunathan conjectures), and Einsiedler–Katok–Lindenstrauss (quantum unique ergodicity and Littlewood’s conjecture, 2006). These notes develop both branches, from the foundational compactness arguments through the Stone–Čech compactification and ultrafilter methods, to entropy theory and Furstenberg’s correspondence principle.
Chapter 1: Compact Topological Dynamical Systems
Section 1.1: Basic Definitions and Examples
The central objects of topological dynamics are flows — continuous actions of groups on compact Hausdorff spaces. This chapter develops the foundational vocabulary and establishes the first structural theorems.
When \( G = \mathbb{Z} \) or \( G = \mathbb{Z}_{\geq 0} \), the system is generated by a single homeomorphism (resp. continuous self-map) \( T = \pi(1, \cdot) : X \to X \), and we write the system as \( (X, T) \). When \( G = \mathbb{R} \) the system is called a continuous flow.
We will frequently drop \( \pi \) from the notation, writing \( g \cdot x \) or \( gx \) for \( \pi(g, x) \). For a fixed \( g \in G \), the map \( x \mapsto gx \) is a homeomorphism of \( X \) (when \( G \) is a group); for a fixed \( x \in X \), the map \( g \mapsto gx \) is continuous and called the orbit map at \( x \).
Section 1.2: Invariant Sets and Minimality
Minimality captures the idea of a dynamical system with no proper closed invariant pieces — every orbit is dense, so the system cannot be decomposed into simpler subsystems. Geometrically, a minimal system is “irreducible”: there is no smaller closed invariant set to which one could restrict attention. The concept is the topological analog of ergodicity in the measure-theoretic setting: just as an ergodic measure-preserving system has no non-trivial invariant sets of positive measure, a minimal topological system has no proper closed invariant sets at all. The two notions interact subtly: a minimal system need not support an ergodic measure, and an ergodic measure-preserving system need not be supported on a minimal topological orbit.
- positively invariant if \( gY \subseteq Y \) for all \( g \in G \) (when \( G \) acts by semigroup);
- invariant if \( gY = Y \) for all \( g \in G \).
For metrizable \( X \), equicontinuous minimal flows are exactly the isometric flows — those that preserve a compatible metric. They are closely related to compact group rotations, as we shall see in the chapter on Bohr compactification.
Section 1.3: Proximality and Distality
Two points \( x, y \in X \) capture qualitatively different behaviors under the action.
- proximal if every pair \( (x, y) \) is proximal;
- distal if every pair of distinct points is distal.
Section 1.4: Morphisms and Extensions
Section 1.5: Recurrence
Section 1.6: Worked Examples — Minimality and Equicontinuity
Proof that every orbit is dense. Fix \( x \in \mathbb{T} \) and let \( y \in \mathbb{T} \) and \( \varepsilon > 0 \). We must find \( n \in \mathbb{Z} \) with \( |T^n x - y| < \varepsilon \) (distances mod 1). Consider the \( N + 1 \) points \( 0\alpha, 1\alpha, 2\alpha, \ldots, N\alpha \) reduced modulo 1. By the pigeonhole principle, if we divide \( \mathbb{T} \) into \( N \) intervals of length \( 1/N \), at least two of these points — say \( j\alpha$ and \( k\alpha \) with \( j < k \leq N \) — lie in the same interval, so \( |(k - j)\alpha \pmod{1}| < 1/N \). Setting \( q = k - j \), we have \( \|q\alpha\| < 1/N \) (where \( \|\cdot\| \) denotes distance to the nearest integer). Taking \( N > 1/\varepsilon \), we get a \( q \geq 1 \) with \( \|q\alpha\| < \varepsilon \). The multiples \( 0, q\alpha, 2q\alpha, \ldots \pmod{1} \) form a sequence of steps of size less than \( \varepsilon \), which therefore visits every \( \varepsilon \)-neighborhood of every point in \( \mathbb{T} \). In particular, there exists \( m \in \mathbb{Z}_{\geq 0} \) with \( \|mq\alpha - (y - x)\| < \varepsilon \), i.e., \( T^{mq}(x) \) is within \( \varepsilon \) of \( y \). Since \( \varepsilon \) and \( y \) were arbitrary, the orbit of \( x \) is dense. \( \square \)
For rational \( \alpha = p/q \), every orbit has period \( q \) and the system is emphatically not minimal (any orbit closure is a finite set, a proper closed invariant subset).
(i) Irrational rotation is equicontinuous. Each \( T^n \) is an isometry of \( \mathbb{T} \) (with the arc-length metric \( d(x, y) = \|x - y\|\)): \( d(T^n x, T^n y) = \|(x + n\alpha) - (y + n\alpha)\| = \|x - y\| = d(x, y) \). So all maps in the family \( \{T^n\} \) preserve distances; given \( \varepsilon > 0 \), take \( \delta = \varepsilon \) and the equicontinuity condition is satisfied uniformly. Equicontinuous minimal systems are the most structured possible: they are compact group rotations.
(ii) Doubling map is not equicontinuous. Consider two points \( x = 0 \) and \( y = \varepsilon \) (small). After \( n \) steps, \( T^n(0) = 0 \) and \( T^n(\varepsilon) = 2^n \varepsilon \pmod{1} \). For \( n \) large enough that \( 2^n \varepsilon > 1/4 \), the distance \( d(T^n(0), T^n(\varepsilon)) \geq 1/4 \) regardless of how small \( \varepsilon \) is. So no uniform \( \delta \) works for \( \varepsilon = 1/4 \), and the family is not equicontinuous. This reflects the exponential separation of nearby orbits — a hallmark of chaos.
Chapter 2: Ultrafilters, Čech-Stone Compactification, and the Greatest Ambit
Section 2.1: Ultrafilters
Ultrafilters are the key algebraic device for studying Stone-Čech compactifications and, ultimately, Ramsey-type combinatorial results.
- \( \emptyset \notin \mathcal{F} \) and \( S \in \mathcal{F} \);
- If \( A \in \mathcal{F} \) and \( A \subseteq B \), then \( B \in \mathcal{F} \);
- If \( A, B \in \mathcal{F} \), then \( A \cap B \in \mathcal{F} \).
A principal ultrafilter is \( \mathcal{U}_x = \{ A \subseteq S : x \in A \} \) for some fixed \( x \in S \). All other ultrafilters are non-principal (or free); their existence requires the axiom of choice (specifically, the ultrafilter lemma, which is weaker than full AC but not provable in ZF alone).
Section 2.2: The Stone-Čech Compactification
- \( \iota(S) \) is dense in \( \beta S \);
- Every bounded function \( f : S \to [0, 1] \) extends uniquely to a continuous function \( \bar{f} : \beta S \to [0, 1] \).
Ultrafilter construction. As a set, \( \beta S \) is identified with the set of all ultrafilters on \( S \). Principal ultrafilters \( \mathcal{U}_s \) for \( s \in S \) are identified with elements of \( S \), so \( \iota(s) = \mathcal{U}_s \). The topology on \( \beta S \) is the one generated by the basic open sets
\[ \overline{A} = \{ \mathcal{U} \in \beta S : A \in \mathcal{U} \} \]for \( A \subseteq S \). These sets are also clopen, making \( \beta S \) a totally disconnected compact Hausdorff space (a Stone space).
For \( f : S \to [0, 1] \), the extension \( \bar{f} : \beta S \to [0, 1] \) is given by \( \bar{f}(\mathcal{U}) = \lim_{\mathcal{U}} f \), the \( \mathcal{U} \)-limit of \( f \): the unique real number \( r \) such that \( f^{-1}(r - \varepsilon, r + \varepsilon) \in \mathcal{U} \) for all \( \varepsilon > 0 \).
Section 2.3: The Greatest Ambit
When \( G \) is a discrete group (or, more generally, a topological group), the Stone-Čech compactification \( \beta G \) carries a natural structure as a \( G \)-flow.
Construction for discrete \( G \). For a discrete group \( G \), the greatest ambit is simply \( \beta G \) with \( G \) acting by left multiplication: \( g \cdot \mathcal{U} = \{ gA : A \in \mathcal{U} \} \). The distinguished point is the principal ultrafilter \( \mathcal{U}_e \) at the identity, and \( G \cdot \mathcal{U}_e = \{ \mathcal{U}_g : g \in G \} = G \), which is dense in \( \beta G \).
Section 2.4: Universal Minimal Flow
Construction. The universal minimal flow \( M(G) \) is the unique minimal subflow of the greatest ambit \( S(G) \). Its existence follows from the general existence of minimal subflows (Theorem 1.1). Its uniqueness and universality follow from the universal property of \( S(G) \): every minimal flow \( Y \) is an ambit (take any point \( y \in Y \) — its orbit is dense by minimality), so there is a factor map \( S(G) \to Y \), whose image is a closed invariant subset of \( Y \), hence all of \( Y \).
Chapter 3: Semigroup Structure on the Greatest Ambit, Idempotent Ultrafilters, and Applications
Section 3.1: The Ellis Semigroup
For the greatest ambit \( S(G) = \beta G \) (with \( G \) discrete), the Ellis semigroup of \( \beta G \) acting on itself is \( \beta G \) itself, with the semigroup operation described below.
Section 3.2: The Semigroup Operation on \(\beta G\)
For \( G \) a discrete group, \( \beta G \) inherits a semigroup structure extending the group operation on \( G \).
- For fixed \( \mathcal{V} \), the map \( \mathcal{U} \mapsto \mathcal{U} \cdot \mathcal{V} \) is continuous (right-topological semigroup).
- The map \( \mathcal{U} \mapsto \mathcal{V} \cdot \mathcal{U} \) need NOT be continuous in general.
- The restriction to \( G \hookrightarrow \beta G \) (principal ultrafilters) coincides with the group operation on \( G \).
- \( (\beta G, \cdot) \) is a compact right-topological semigroup.
Section 3.3: Idempotent Ultrafilters
Corollary. \( \beta G \) contains idempotent ultrafilters (non-principal ones, since the only idempotent in \( G \) is the identity \( e \) when \( G \) is a group).
Section 3.4: Hindman’s Theorem
The idempotent structure on \( \beta\mathbb{N} \) yields spectacular combinatorial results.
More precisely: since \( p \cdot p = p \) and \( C_i \in p \), we have \( \{ n : n^{-1}C_i \in p \} \in p \), so there exists \( x_1 \in C_i \) with \( x_1^{-1}C_i = \{ n : x_1 + n \in C_i \} \in p \). By induction, picking \( x_{k+1} \in C_i \cap x_1^{-1}C_i \cap \cdots \cap \big(\sum_{i \in F} x_i\big)^{-1} C_i \) (using \( p \)’s membership for each of these finitely many intersections), one builds the required sequence.
Section 3.5: The Central Sets Theorem (Preview)
A set \( C \subseteq \mathbb{N} \) is central if it belongs to some minimal idempotent in \( \beta\mathbb{N} \). Central sets are IP sets, satisfy strong recurrence properties, and are the subject of Chapter 4. The connection between the algebraic structure of \( \beta G \) and combinatorial number theory is one of the most striking features of the theory.
Chapter 4: Minimal Idempotents, Central Sets, and Applications
Section 4.1: Minimal Idempotents and the Ideal Structure of \(\beta G\)
- a left ideal if \( SI \subseteq I \);
- a right ideal if \( IS \subseteq I \);
- a two-sided ideal if it is both.
- \( T \) has a smallest two-sided ideal \( K(T) \);
- \( K(T) \) is a union of minimal left ideals and a union of minimal right ideals;
- Every minimal left ideal and every minimal right ideal is closed;
- The intersection of any minimal left ideal with any minimal right ideal is a group;
- All the groups arising this way are isomorphic.
Applied to \( \beta G \), the kernel \( K(\beta G) \) is the smallest ideal, and an idempotent \( p \in \beta G \) is minimal if it lies in \( K(\beta G) \) — equivalently, if the left ideal \( \beta G \cdot p \) is minimal.
Section 4.2: Characterization of Central Sets
Central sets can be characterized dynamically. A set \( A \subseteq \mathbb{N} \) is central if and only if it is a member of some idempotent ultrafilter in the smallest ideal of \( (\beta\mathbb{N}, +) \). Furstenberg gave an equivalent dynamical characterization:
Section 4.3: Central Sets Theorem
Section 4.4: Density and Combinatorial Consequences
- syndetic if it has bounded gaps: \( \exists K \) finite such that \( A \cap (n + K) \neq \emptyset \) for all \( n \in \mathbb{Z} \);
- thick if it contains arbitrarily long intervals: for each \( N \), \( \exists n \) with \( \{n, n+1, \ldots, n+N\} \subseteq A \);
- piecewise syndetic if it is the intersection of a syndetic set with a thick set, equivalently if it has bounded gaps on arbitrarily long intervals.
Chapter 5: Furstenberg’s Theorem, Bohr Compactification, and the Universal Minimal Distal Flow
Section 5.1: Furstenberg’s Structure Theorem for Distal Flows
One of the deepest results in topological dynamics is Furstenberg’s structure theorem, which characterizes distal flows.
Section 5.2: Bohr Compactification
The Bohr compactification \( bG \) of a discrete group \( G \) is the “most efficient” compactification that remembers all the almost-periodic functions on \( G \). Intuitively, \( bG \) retains exactly the “harmonic analysis” of \( G \): all the information encoded by the collection of all characters (homomorphisms to compact groups), but nothing more. For \( G = \mathbb{Z} \), this means \( b\mathbb{Z} \) captures the behavior of sequences of the form \( n \mapsto e^{2\pi i n \alpha} \) for all \( \alpha \in \mathbb{R} \), which are precisely the almost-periodic functions that Bohr studied in his analysis of quasiperiodic phenomena.
Construction. The Bohr compactification of \( G \) is constructed as follows. Let \( \mathcal{F} \) be the family of all continuous homomorphisms \( \phi : G \to K_\phi \) into compact groups \( K_\phi \). Define \( bG = \overline{\iota(G)} \subseteq \prod_{\phi \in \mathcal{F}} K_\phi \), where \( \iota(g) = (\phi(g))_{\phi \in \mathcal{F}} \). This works by Tychonoff compactness.
For \( G = \mathbb{Z} \), the Bohr compactification is the Pontryagin dual of \( \mathbb{T} \) (the circle group thought of as discrete): \( b\mathbb{Z} = \widehat{\mathbb{T}_d} \), the Bohr compactification of \( \mathbb{Z} \). Concretely, \( b\mathbb{Z} \) is the closure of \( \mathbb{Z} \) in \( \prod_{\alpha \in [0,1)} \mathbb{T} \) via the diagonal embedding \( n \mapsto (e^{2\pi i n \alpha})_{\alpha} \).
Explicitly: the diagonal embedding \( \iota : \mathbb{Z} \to \mathbb{T}^{[0,1)} \) is \( \iota(n) = (e^{2\pi i n \alpha})_{\alpha \in [0,1)} \). The Bohr compactification \( b\mathbb{Z} \) is \( \overline{\iota(\mathbb{Z})} \subseteq \mathbb{T}^{[0,1)} \). A point in \( b\mathbb{Z} \) is a limit of \( \iota(n_j) \) for some net/sequence \( (n_j) \) in \( \mathbb{Z} \) such that \( e^{2\pi i n_j \alpha}$ converges for every \( \alpha \in [0,1) \) simultaneously. The group operation is coordinate-wise multiplication in \( \mathbb{T}^{[0,1)} \).
The action of \( \mathbb{Z} \) on \( b\mathbb{Z} \) is by translation: the generator \( 1 \in \mathbb{Z} \) acts on \( b\mathbb{Z} \) by multiplication by the element \( \iota(1) = (e^{2\pi i \alpha})_{\alpha \in [0,1)} \). The system \( (b\mathbb{Z}, T) \) where \( T(z) = \iota(1) \cdot z \) is the universal minimal equicontinuous \( \mathbb{Z} \)-system: every minimal equicontinuous \( \mathbb{Z} \)-system is a factor of \( (b\mathbb{Z}, T) \).
Concretely: the function \( f(n) = e^{2\pi i n \alpha} \) for fixed irrational \( \alpha \) is almost periodic (it is a single Fourier mode). The function \( f(n) = e^{2\pi i n \sqrt{2}} + e^{2\pi i n \sqrt{3}} \) is almost periodic with two incommensurable frequencies. The function \( f(n) = \sin(n^2) \) is NOT almost periodic, as its frequency spectrum is not discrete. Almost-periodic functions are precisely the “signals” that can be analyzed using the compact group \( b\mathbb{Z} \) as a “frequency domain.”
Section 5.2B: Almost Periodicity, Bohr’s Original Work, and Connections to Dynamics
Harald Bohr (brother of Niels Bohr) introduced almost-periodic functions in the 1920s as a natural generalization of periodic functions. A function \( f : \mathbb{R} \to \mathbb{C} \) is periodic with period \( T \) if \( f(t + T) = f(t) \) for all \( t \); equivalently, the set of translations \( \{f(\cdot + s) : s \in \mathbb{R}\} \) takes only finitely many distinct values. Bohr’s generalization: \( f \) is almost periodic if the set of translates is precompact in the uniform topology. The classical example is \( f(t) = e^{2\pi i \alpha t} + e^{2\pi i \beta t} \) for incommensurable \( \alpha, \beta \in \mathbb{R} \): this function is not periodic (it never exactly repeats) but its translates form a compact family.
Section 5.3: Universal Minimal Distal Flow
The universal minimal distal flow is the inverse limit of all isometric towers (with the natural compatible factor maps). For abelian groups, the universal minimal distal flow is the Bohr compactification itself with the natural action. For general groups, it is more complex.
Section 5.4: Almost Periodicity and Equicontinuous Flows
In particular, for \( G = \mathbb{Z} \), minimal equicontinuous systems are exactly the rotations on compact abelian groups \( K \), i.e., systems of the form \( T(k) = k + a \) for a fixed \( a \in K \) that generates a dense subgroup of \( K \).
Chapter 6: Universal Minimal Proximal and Strongly Proximal Flows
Section 6.1: Proximal Flows Revisited
We recall the definition of proximal flow (Definition 1.3) and develop the theory further.
Section 6.2: Strongly Proximal Flows
Equivalently, every \( G \)-invariant measure on \( X \) is a point mass — which (for minimal \( X \)) means every probability measure on \( X \) can be “compressed” to a point mass by the group action.
Section 6.3: Furstenberg Boundary
Section 6.4: Universal Minimal Proximal Flow
The construction uses the greatest ambit \( S(G) \): one quotients out the “distal component” of the action, leaving the proximal part. The universal minimal proximal flow is the quotient of the universal minimal flow \( M(G) \) by the distal proximal equivalence relation.
Chapter 7: Probability Measures, Affine Flows, and Amenable Groups
Section 7.1: The Space of Probability Measures as a \(G\)-Flow
For a compact Hausdorff space \( X \), the set \( \mathcal{M}(X) \) of Borel probability measures on \( X \) with the weak-* topology is compact and convex. If \( G \) acts continuously on \( X \), then \( G \) acts on \( \mathcal{M}(X) \) by pushforward:
\[ (g_* \mu)(A) = \mu(g^{-1}A),\quad g \in G,\, A \subseteq X \text{ Borel}. \]This makes \( \mathcal{M}(X) \) into a compact convex \( G \)-flow.
Section 7.2: Invariant Probability Measures
Applied with \( K = \mathcal{M}(X) \): for any abelian group \( G \) acting on a compact space \( X \), there is a \( G \)-invariant probability measure on \( X \).
Section 7.3: Amenable Groups
Equivalent conditions (for discrete \( G \)):
- \( G \) has a left-invariant finitely additive probability measure (a “mean”) on bounded functions.
- For every \( \varepsilon > 0 \) and finite set \( F \subseteq G \), there exists a finite set \( K \subseteq G \) with \( |F \cdot K \triangle K| / |K| < \varepsilon \) (Følner condition).
- Every affine compact \( G \)-flow has a fixed point (Ryll-Nardzewski / Day theorem).
Section 7.4: Fixed Point Properties and Topological Dynamics
Chapter 8: Standard Lebesgue Spaces, Probability-Measure-Preserving Actions, and \(\mathrm{Aut}(X, \mu)\)
Section 8.1: Standard Probability Spaces
We now transition to the measure-theoretic side of dynamics.
The key theorem of Lebesgue measure theory is:
This means up to isomorphism, there is essentially only one atomless standard probability space, which justifies writing “a standard probability space \( (X, \mu) \)” without loss of generality.
Section 8.2: Measure-Preserving Transformations
Section 8.3: The Group \(\mathrm{Aut}(X, \mu)\)
\( \mathrm{Aut}(X, \mu) \) carries a natural topology, the weak topology: \( T_n \to T \) if \( \mu(T_n A \triangle T A) \to 0 \) for every measurable \( A \). With this topology, \( \mathrm{Aut}(X, \mu) \) is a Polish group (separable completely metrizable).
Section 8.4: Probability-Measure-Preserving Group Actions
Equivalently, ergodicity says \( X \) cannot be split into two non-trivial invariant pieces. This is the measure-theoretic analog of minimality (though the two notions differ).
Chapter 8B: Furstenberg’s Correspondence Principle and Szemerédi’s Theorem
Section 8B.1: Motivation and Statement
Furstenberg’s 1977 ergodic-theoretic proof of Szemerédi’s theorem is one of the most beautiful ideas in twentieth-century mathematics. Szemerédi’s theorem (1975), proved by combinatorial methods, states that any subset of the integers with positive upper density contains arithmetic progressions of every length. Furstenberg discovered that this combinatorial statement about arithmetic progressions in \( \mathbb{Z} \) is equivalent to a statement about multiple recurrence in measure-preserving dynamics. The translation between the two worlds — the “correspondence principle” — turns Szemerédi’s theorem into a theorem about dynamics, which Furstenberg then proved using the structure theory of measure-preserving systems. This approach opened a vast new research area connecting ergodic theory, combinatorics, and additive number theory.
The upper Banach density of a set \( A \subseteq \mathbb{Z} \) is
\[ d^*(A) = \limsup_{N - M \to \infty} \frac{|A \cap \{M, M+1, \ldots, N-1\}|}{N - M}. \]Szemerédi’s theorem says: if \( d^*(A) > 0 \), then \( A \) contains arithmetic progressions \( \{a, a+d, a+2d, \ldots, a+kd\} \) for every \( k \geq 1 \) and some \( a \in \mathbb{Z} \), \( d \geq 1 \).
Section 8B.2: The Furstenberg Correspondence Principle
Since \( d^*(A) > 0 \), the sequence \( (n_j) \) along which the density is achieved allows us to construct a \( T \)-invariant measure \( \mu \) on \( X_A \) (via a weak-* limit of Cesàro averages of the point masses along translates of \( x^{(A)} \)) with \( \mu(B) \geq d^*(A) > 0 \). The key observation is that for any pattern \( n_1, \ldots, n_k \):
\[ n \in A \cap (A - n_1) \cap \cdots \cap (A - n_k) \iff T^n x^{(A)} \in B \cap T^{-n_1}B \cap \cdots \cap T^{-n_k}B, \]since \( (T^n x^{(A)})_0 = x^{(A)}_n = \mathbf{1}_A(n) \) and \( (T^n x^{(A)})_{n_j} = \mathbf{1}_A(n + n_j) \). The density inequality then follows from how \( \mu \) was constructed. \( \square \)
Section 8B.3: Furstenberg’s Multiple Recurrence Theorem
The case \( k = 1 \) is Poincaré recurrence (proved in Section 9.3). The case \( k = 2 \) was proved by Furstenberg and Sárközy independently (and implies that any set of positive density contains a configuration \( \{a, a+n, a+2n\} \)). The general case is Furstenberg’s proof of Szemerédi’s theorem.
The proof of the general case proceeds by the Furstenberg–Zimmer structure theory: any ergodic pmp system has a canonical tower of extensions
\[ X \to Z_k \to Z_{k-1} \to \cdots \to Z_1 \to Z_0 = \{*\} \]where each extension is either a compact (isometric) extension or a weakly mixing extension. For compact extensions, the multiple recurrence is proved using Fourier analysis on the fiber compact group. For weakly mixing extensions, a “van der Corput”-style argument shows the contributions from such extensions are negligible (they average away). Combining these two cases by transfinite induction proves the theorem.
Section 8B.4: Szemerédi’s Theorem as a Corollary
Chapter 9: \(L^2\) and Birkhoff Ergodic Theorems
Section 9.1: The Koopman Operator
Given a pmp action \( \Gamma \curvearrowright (X, \mu) \), every \( T \in \mathrm{Aut}(X, \mu) \) induces a unitary operator on \( L^2(X, \mu) \):
The map \( T \mapsto U_T \) is a group homomorphism from \( \mathrm{Aut}(X, \mu) \) to the unitary group \( \mathcal{U}(L^2(X, \mu)) \).
Ergodic interpretation. For a pmp \( \mathbb{Z} \)-action with \( U = U_T \), the fixed-point space is \( \text{Fix}(U_T) = \{ f \in L^2 : f \circ T = f \text{ a.e.}\} = L^2(\mathcal{I}) \) where \( \mathcal{I} \) is the \(\sigma\)-algebra of \( T \)-invariant sets. If \( T \) is ergodic, \( \mathcal{I} \) is trivial, and \( Pf = \int f \, d\mu \) (the constant function). The mean ergodic theorem then says:
\[ \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \xrightarrow{L^2} \int_X f \, d\mu. \]Section 9.2: Birkhoff Pointwise Ergodic Theorem
The Birkhoff ergodic theorem is the rigorous mathematical version of Boltzmann’s ergodic hypothesis: for a measure-preserving system, and for typical initial conditions, time averages of an observable equal the space average. “Typical” here means for almost every starting point with respect to the invariant measure. When the system is ergodic — when there are no non-trivial invariant sets — “almost every” becomes “every” in the probabilistic sense: the time average is the same constant (the space average) for all typical trajectories. Birkhoff’s 1931 proof was a landmark: it required a genuinely new technique (the maximal ergodic lemma, a precursor to Calderón–Zygmund maximal inequalities) and settled one of the central questions in statistical mechanics.
If \( T \) is ergodic, then \( \tilde{f} = \int f \, d\mu \) a.e. — the time average equals the space average.
Section 9.2B: Worked Examples for the Birkhoff Ergodic Theorem
The Birkhoff theorem gives this for \( \lambda \)-almost every \( x \). The stronger statement — convergence for every \( x \) — follows from the unique ergodicity of the irrational rotation: Lebesgue measure is the unique \( T \)-invariant Borel probability measure on \( \mathbb{T} \). This can be seen via Fourier analysis: the characters \( e_k(x) = e^{2\pi i k x} \) form an orthonormal basis of \( L^2(\mathbb{T}) \). We compute
\[ \frac{1}{N} \sum_{n=0}^{N-1} e_k(T^n x) = e_k(x) \cdot \frac{1}{N} \sum_{n=0}^{N-1} e^{2\pi i k n \alpha}. \]For \( k \neq 0 \), this is a geometric series: \( \frac{1}{N} \cdot \frac{e^{2\pi i k \alpha N} - 1}{e^{2\pi i k \alpha} - 1} \to 0 \) as \( N \to \infty \), since \( |e^{2\pi i k \alpha} - 1| > 0 \) (as \( k\alpha \notin \mathbb{Z} \) for \( k \neq 0 \)). For \( k = 0 \), the average is identically 1 \( = \int e_0 \, d\lambda \). By linearity and density of trigonometric polynomials in \( C(\mathbb{T}) \), the time averages converge to the space average for every continuous \( f \) and every \( x \). For our indicator \( f = \mathbf{1}_{[0,1/2)} \), approximate by continuous functions to get convergence to \( 1/2 \) for every \( x \).
The punchline: the orbit of any point \( x \) under the irrational rotation equidistributes in \( \mathbb{T} \) — a result known as Weyl’s equidistribution theorem (1916), which predates Birkhoff by 15 years. Birkhoff’s theorem subsumes this.
Section 9.2C: Spectral Characterization of Ergodicity
Section 9.2D: Mixing, Weak Mixing, and Spectral Types
The ergodic theorem tells us about the long-run average behavior of orbits. A stronger property — mixing — describes how the system “loses memory” of its initial state. Intuitively, a strongly mixing system is one where any two measurable sets become “statistically independent” as time progresses: knowing that \( x \in A \) gives no information about whether \( T^n x \in B \) for large \( n \).
- ergodic if every \( T \)-invariant set has measure 0 or 1;
- weakly mixing if \( \frac{1}{N}\sum_{n=0}^{N-1} |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)| \to 0 \) for all measurable \( A, B \);
- mildly mixing if there is no rigidity sequence: no sequence \( n_k \to \infty \) with \( \mu(T^{-n_k}A \triangle A) \to 0 \) for all \( A \);
- strongly mixing if \( \mu(T^{-n}A \cap B) \to \mu(A)\mu(B) \) for all measurable \( A, B \).
- \( T \) is ergodic \( \iff \) 1 is not an eigenvalue of \( U \) on \( L^2_0 \), i.e., \( U \) has no non-trivial fixed vectors in \( L^2_0 \).
- \( T \) is weakly mixing \( \iff \) \( U \) has no eigenvalues on the unit circle restricted to \( L^2_0 \), i.e., \( U|_{L^2_0} \) has *purely continuous spectrum*.
- \( T \) is strongly mixing \( \iff \) \( U^n \to 0 \) weakly on \( L^2_0 \), i.e., \( \langle U^n f, g\rangle \to 0 \) for all \( f, g \in L^2_0 \).
- The eigenvalues of \( U_T \) form a countable subgroup \( \Lambda \subseteq \mathbb{T} \) (the *eigenvalue group* or *spectrum* of \( T \)).
- \( T \) is isomorphic (as a measure-preserving system) to the rotation \( R_a : k \mapsto k + a \) on the compact abelian group \( \hat{\Lambda} \) (Pontryagin dual of \( \Lambda \)) with Haar measure, where \( a \in \hat{\Lambda} \) is chosen so that \( a(\lambda) = \lambda \) for all \( \lambda \in \Lambda \).
- Two ergodic systems with pure point spectrum are isomorphic if and only if they have the same eigenvalue group \( \Lambda \).
This is the complete classification of ergodic systems with pure point spectrum: they are all rotations on compact abelian groups, and the invariant is the eigenvalue group. The theorem is a precursor to the Ornstein isomorphism theory, showing that spectral invariants can give complete classification in special cases.
- The irrational rotation \( T(x) = x + \alpha \) has eigenvalue group \( \Lambda_\alpha = \{ e^{2\pi i n \alpha} : n \in \mathbb{Z} \} \cong \mathbb{Z} \). Two irrational rotations \( T_\alpha \) and \( T_\beta \) are isomorphic \( \iff \Lambda_\alpha = \Lambda_\beta \iff \alpha \) and \( \beta \) are rationally related (i.e., \( \alpha / \beta \in \mathbb{Q} \)).
- A rotation on a finite group \( \mathbb{Z}/N\mathbb{Z} \) (by generator 1) has eigenvalue group \( \Lambda = \{ e^{2\pi i k/N} : 0 \leq k < N \} \), the \( N \)-th roots of unity.
- A rotation on \( \mathbb{T}^2 \) by an irrational vector \( (\alpha_1, \alpha_2) \) with \( 1, \alpha_1, \alpha_2 \) rationally independent has eigenvalue group \( \Lambda = \{ e^{2\pi i(n_1\alpha_1 + n_2\alpha_2)} : n_1, n_2 \in \mathbb{Z} \} \).
Section 9.3: Consequences and Applications
This is Furstenberg’s ergodic-theoretic proof of Szemerédi’s theorem (every subset of integers with positive upper density contains arithmetic progressions of every length).
Chapter 10: Orbit Equivalence for Aperiodic pmp \(\mathbb{Z}\)-Actions
Section 10.1: Orbit Equivalence Relations
Orbit equivalence is a much coarser equivalence than isomorphism of measure-preserving systems (which requires \( \phi \) to intertwine the group actions, not just map orbits to orbits).
Section 10.2: Dye’s Theorem
This is a remarkable theorem: it says that for measure-preserving \( \mathbb{Z} \)-actions, there is only one orbit structure (up to measure-theoretic isomorphism). The group theoretic structure of the action (e.g., ergodicity, mixing, entropy) is entirely invisible to orbit equivalence among \( \mathbb{Z} \)-actions.
Section 10.3: The Rokhlin Lemma
Section 10.4: The Full Group and Cocycles
Two aperiodic transformations are orbit equivalent if and only if their full groups are isomorphic as abstract groups (Dye’s theorem reformulated). The full group is a complete invariant of the orbit equivalence class.
Chapter 11: Entropy for \(\mathbb{Z}\)-Actions and Bernoulli Systems
Section 11.1: Metric Entropy of a Partition
This makes entropy computable: instead of taking the supremum over all partitions, one need only find a single generating partition and compute its entropy.
Section 11.2: Entropy of Bernoulli Shifts
In particular:
- The \( (1/2, 1/2) \)-Bernoulli shift has entropy \( \log 2 \).
- The \( (1/3, 1/3, 1/3) \)-Bernoulli shift has entropy \( \log 3 \).
- These are not isomorphic as measure-preserving systems (Kolmogorov, 1958 — the first use of entropy in ergodic theory).
Section 11.3: Properties of Entropy
- Isomorphism invariance: If \( T \cong S \) (measure-theoretic isomorphism), then \( h(T) = h(S) \).
- Power formula: \( h(T^n) = |n| \cdot h(T) \) for \( n \in \mathbb{Z} \).
- Invertibility: \( h(T^{-1}) = h(T) \).
- Factors decrease entropy: If \( S \) is a factor of \( T \), then \( h(S) \leq h(T) \).
- Products: \( h(T \times S) = h(T) + h(S) \).
Section 11.4: Topological Entropy
Topological entropy measures the exponential growth rate of distinguishable orbit segments — it is the most fundamental topological invariant of a dynamical system, capturing how fast the system “creates information” or “separates trajectories.” Intuitively: given a finite resolution \( \varepsilon > 0 \), two trajectories are “distinguishable over time \( n \)” if they separate by more than \( \varepsilon \) at some time \( 0 \leq k < n \). The number of such distinguishable \( n \)-orbits grows exponentially in \( n \) for chaotic systems, and the topological entropy is the exponent. Systems with zero entropy are “tame” — they do not create new complexity — while systems with positive entropy are “chaotic” in the sense of exponential complexity growth.
There is a parallel notion of entropy for topological dynamical systems.
Alternative definition via separated sets. On a compact metric space \( (X, d) \), a set \( S \subseteq X \) is \( (n, \varepsilon)\)-separated if for every \( x \neq y \in S \), there exists \( 0 \leq k < n \) with \( d(T^k x, T^k y) > \varepsilon \). Let \( \mathrm{sep}(n, \varepsilon, T) \) be the maximum cardinality of an \( (n, \varepsilon) \)-separated set. Then
\[ h_{top}(T) = \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{1}{n} \log \mathrm{sep}(n, \varepsilon, T). \]This definition (due to Bowen and Dinaburg) is equivalent to the open-cover definition and is often easier for explicit computation.
Section 11.5: Entropy and Lyapunov Exponents
For smooth dynamical systems on compact manifolds, topological entropy has a beautiful connection to the Lyapunov exponents — the exponential rates of expansion and contraction of the derivative map. This connection is captured by the Margulis–Ruelle inequality and the Pesin entropy formula.
Section 11.6: Entropy for Algebraic Systems
For algebraic dynamical systems (group automorphisms, algebraic actions of abelian groups), entropy has especially clean formulas.
The Mahler measure of a polynomial \( p(z) = a \prod (z - \alpha_i) \) is \( M(p) = |a| \prod_i \max(1, |\alpha_i|) \). The formula shows that entropy for toral automorphisms is purely algebraic — it depends only on the characteristic polynomial and its roots outside the unit circle.
Chapter 12: Ornstein’s Theorem
Section 12.1: The Isomorphism Problem
Two central questions in ergodic theory are:
- Classification up to isomorphism: When are two measure-preserving systems isomorphic?
- Entropy as a complete invariant: Is entropy sufficient to classify Bernoulli shifts?
The first question motivated much of classical ergodic theory (from Halmos-von Neumann’s spectral theory to the entropy theory of Kolmogorov-Sinai). The second was answered definitively by Ornstein.
Section 12.2: Finitely Determined Processes and Weak Bernoulli
- weakly Bernoulli if for every \( \varepsilon > 0 \) there exists \( n \) such that the join \( \bigvee_{k=0}^{n-1} T^{-k}\xi \) and the "past" \( \bigvee_{k=n}^\infty T^{-k}\xi \) are \( \varepsilon \)-independent (in the sense of \(\bar{d}\)-distance);
- finitely determined if it is determined (up to \(\bar{d}\)-closeness) by finitely many parameters — its entropy and the distribution of a finite block.
where \( d_H \) is a normalized Hamming distance.
Section 12.3: Ornstein’s Isomorphism Theorem
Entropy is an isomorphism invariant (Kolmogorov, 1958): This was established earlier and shows entropy is necessary.
Entropy is sufficient (Ornstein’s theorem proper): The key insight is that Bernoulli shifts are “finitely determined”: given \( \varepsilon > 0 \), two Bernoulli shifts \( T, S \) with the same entropy, there is a partition \( \xi \) of the first system that is \( \varepsilon \)-close (in \( \bar{d} \)) to a partition of the second with the same distribution and entropy. By iteratively improving approximations (the “Ornstein copying lemma”), one constructs an actual measure-preserving isomorphism in the limit.
The core technical lemma, the Ornstein Copying Lemma, states: given a stationary process and \( \varepsilon > 0 \), one can “copy” a long block of the process into another system that has the right entropy, with an error at most \( \varepsilon \) in \( \bar{d} \).
Section 12.4: Bernoulli Systems and Mixing
- strongly mixing if for all measurable \( A, B \): \( \mu(T^{-n}A \cap B) \to \mu(A)\mu(B) \) as \( n \to \infty \);
- weakly mixing if \( \frac{1}{N}\sum_{n=0}^{N-1} |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)| \to 0 \).
Section 12.5: Extensions and Further Results
Ornstein’s theory extends in several directions:
Bernoulli spectrum: A system \( (X, T) \) is Bernoulli if and only if it is weakly Bernoulli. This gives a tractable characterization.
Ornstein–Weiss theorem (1987): Ornstein’s isomorphism theorem extends to Bernoulli actions of amenable groups \( \Gamma \): two Bernoulli \( \Gamma \)-actions are isomorphic if and only if they have the same entropy.
Non-amenable groups: For non-amenable groups, entropy is no longer a complete invariant for Bernoulli actions. Popa (2006) showed there are non-isomorphic Bernoulli actions of free groups with the same entropy using rigidity theory.
Entropy and orbit equivalence: While entropy distinguishes Bernoulli shifts up to isomorphism, orbit equivalence (Chapter 10) is much coarser — all aperiodic pmp \( \mathbb{Z} \)-actions are orbit equivalent regardless of entropy.
This is Sinai’s “factor theorem” and was a precursor to Ornstein’s full theorem. Together with Ornstein’s result, it gives the complete picture: among Bernoulli systems, entropy classifies up to isomorphism; among general ergodic systems, entropy is only a necessary invariant.
Appendix A: Topological Prerequisites
Section A.1: Compact Hausdorff Spaces
We collect key facts about compact Hausdorff spaces used throughout.
- A closed subset of a compact space is compact.
- A continuous map from a compact space to a Hausdorff space is a closed map; if bijective, it is a homeomorphism.
- The product of compact spaces is compact (Tychonoff’s theorem).
- A compact Hausdorff space is normal (T4): disjoint closed sets can be separated by open sets.
- The continuous real-valued functions \( C(X) \) on a compact Hausdorff \( X \) separate points and determine the topology.
Section A.2: Nets and Convergence
In non-metrizable spaces, sequences do not suffice for topology; nets are the proper generalization.
In a compact Hausdorff space, every net has a convergent subnet (compactness). The ultrafilter \( \mathcal{U} \)-limit \( \lim_{\mathcal{U}} f \) (used in defining the Stone-Čech extension) is a net-theoretic limit.
Section A.3: Baire Category Theorem
This is used in ergodic theory (e.g., genericity of ergodic transformations in \( \mathrm{Aut}(X, \mu) \)) and in topological dynamics (e.g., the set of minimal flows is residual in appropriate spaces).
Appendix B: Measure-Theoretic Prerequisites
Section B.1: Lebesgue Integration Summary
Let \( (X, \mathcal{B}, \mu) \) be a measure space.
- \( L^p(X, \mu) \) for \( 1 \leq p \leq \infty \) denotes the Banach space of (equivalence classes of) measurable functions \( f \) with \( \|f\|_p = \left(\int |f|^p \, d\mu\right)^{1/p} < \infty \) (with the appropriate modification for \( p = \infty \)).
- \( L^2(X, \mu) \) is a Hilbert space with inner product \( \langle f, g \rangle = \int f \bar{g} \, d\mu \).
- Dominated Convergence Theorem: if \( f_n \to f \) a.e. and \( |f_n| \leq g \in L^1 \), then \( \int f_n \to \int f \).
- Monotone Convergence Theorem: if \( 0 \leq f_n \nearrow f \) a.e., then \( \int f_n \nearrow \int f \).
Section B.2: Conditional Expectation
Conditional expectation is the \( L^2 \)-orthogonal projection onto \( L^2(X, \mathcal{A}, \mu) \) (when \( f \in L^2 \)). It satisfies the tower property: if \( \mathcal{A} \subseteq \mathcal{A}' \subseteq \mathcal{B} \), then \( \mathbb{E}[\mathbb{E}[f|\mathcal{A}']|\mathcal{A}] = \mathbb{E}[f|\mathcal{A}] \). Conditional expectation with respect to the invariant \(\sigma\)-algebra is central to the proof of the Birkhoff ergodic theorem and to the theory of disintegrations.
Section B.3: The Ergodic Decomposition
This shows every pmp system decomposes into ergodic components, so it suffices (for most purposes) to study ergodic systems.
Appendix C: Worked Problems
This appendix collects 20 problems organized into four parts. Problems in Parts A and B are computational; Parts C and D develop theory. Solutions are sketched where the argument illuminates a key idea.
Part A: Topological Dynamics — Minimality, Equicontinuity, Proximal/Distal
Problem A1. Let \( T : \mathbb{T}^2 \to \mathbb{T}^2 \) be the map \( T(x, y) = (x + \alpha, y + x) \pmod 1 \) (a skew rotation or two-step nilrotation) where \( \alpha \notin \mathbb{Q} \). Prove that the system \( (\mathbb{T}^2, T) \) is minimal. (Hint: Show that for any \( (x_0, y_0) \), the orbit closure contains \( \{x_0\} \times \mathbb{T} \) by examining the \( y \)-coordinates of the orbit, then use minimality of the \( x \)-rotation to expand to all of \( \mathbb{T}^2 \).)
Problem A2. Let \( X = \{0, 1, 2\}^{\mathbb{Z}} \) with the shift \( \sigma \), and let \( Y \subseteq X \) be the subshift defined by the forbidden word \( 12 \) (no occurrence of the pattern \( 1 \) followed immediately by \( 2 \)). Is \( (Y, \sigma) \) minimal? Either prove it or find a proper closed invariant subset.
Problem A3. Prove that an equicontinuous minimal flow on a metrizable compact space is distal. (Hint: If \( d(g_\alpha x, g_\alpha y) \to 0 \) for a net \( (g_\alpha) \), use equicontinuity to show \( d(x, y) \) must also be zero.)
Problem A4. Let \( G \) be an amenable group acting on a compact metrizable space \( X \). Suppose the action is distal. Prove that \( X \) supports a \( G \)-invariant probability measure. (Hint: Use the Fixed Point Theorem for amenable groups applied to the action on \( \mathcal{M}(X) \).)
Problem A5. Let \( T : X \to X \) be a homeomorphism of a compact metric space. Define the prolongation relation: \( (x, y) \in P \) if there exists a net \( x_\alpha \to x \) and integers \( n_\alpha \) such that \( T^{n_\alpha} x_\alpha \to y \). Show that \( P \) is a closed relation containing the proximal relation \( \sim_P \), and that \( P = \sim_P \) when \( T \) is equicontinuous.
Part B: Entropy — Topological Entropy Computations
Problem B1. Compute the topological entropy of the map \( T : \mathbb{T}^2 \to \mathbb{T}^2 \) given by the Arnold cat map matrix \( A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \). (Hint: The topological entropy of a toral automorphism equals \( \sum_i \log|\lambda_i| \) where the sum is over eigenvalues \( |\lambda_i| > 1 \). The eigenvalues of \( A \) are \( \frac{3 \pm \sqrt{5}}{2} \).)
Problem B2. Let \( T : [0,1] \to [0,1] \) be the tent map: \( T(x) = 2x \) for \( x \leq 1/2 \) and \( T(x) = 2(1-x) \) for \( x \geq 1/2 \). Compute \( h_{top}(T) \). (Hint: Use the partition \( \{[0,1/2], [1/2,1]\} \) and count the number of laps in \( T^n \).)
Problem B3. A subshift of finite type (SFT) is determined by a 0–1 transition matrix \( A \) (where \( A_{ij} = 1 \) means the transition \( i \to j \) is allowed). Prove that \( h_{top}(\sigma_A) = \log \lambda_1 \) where \( \lambda_1 \) is the spectral radius of \( A \). (Hint: The number of admissible words of length \( n \) is \( \sum_{i,j} (A^n)_{ij} \), which grows like \( \lambda_1^n \) by the Perron–Frobenius theorem.)
Problem B4. Let \( (X, T) \) and \( (Y, S) \) be compact topological dynamical systems. Prove that \( h_{top}(T \times S) = h_{top}(T) + h_{top}(S) \), where \( T \times S \) acts on \( X \times Y \) by \( (T \times S)(x, y) = (Tx, Sy) \).
Problem B5. Let \( T : X \to X \) be a continuous map and \( \phi : X \to Y \) a factor map (continuous, surjective, equivariant). Prove \( h_{top}(S) \leq h_{top}(T) \) where \( S \) is the factor map. Then give an example where the inequality is strict.
Part C: Ergodic Theory — Ergodicity, Birkhoff, Koopman
Problem C1. Let \( T : \mathbb{T}^2 \to \mathbb{T}^2 \) be the skew rotation \( T(x, y) = (x + \alpha, y + x) \) with \( \alpha \notin \mathbb{Q} \). Prove that \( T \) is ergodic with respect to Lebesgue measure on \( \mathbb{T}^2 \) using Fourier analysis: show that every \( T \)-invariant \( L^2 \) function is constant. (Hint: If \( f = \sum_{m,n} \hat{f}(m,n) e^{2\pi i(mx + ny)} \) is \( T \)-invariant, then \( \hat{f}(m,n) e^{2\pi i(m\alpha)} e^{2\pi i(nx)} = \hat{f}(m,n) e^{2\pi inx} \cdot e^{2\pi imx}... \) — work this out carefully to show \( \hat{f}(m,n) = 0 \) for \( (m,n) \neq (0,0) \).)
Problem C2. Let \( T \) be the doubling map \( T(x) = 2x \pmod 1 \) and \( f(x) = \cos(2\pi x) \). Compute \( \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \) for Lebesgue-almost every \( x \). Identify the limit explicitly.
Problem C3. Let \( (X, \mu, T) \) be a pmp system and \( U_T \) the Koopman operator. Prove that \( T \) is weakly mixing if and only if for every \( f, g \in L^2(X, \mu) \):
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\langle U_T^n f, g \rangle - \langle f, \mathbf{1}\rangle \langle \mathbf{1}, g\rangle| = 0. \](Hint: Weak mixing is equivalent to \( U_T \otimes U_T \) being ergodic on \( L^2(X \times X, \mu \times \mu) \).)
Problem C4. Let \( T \) be an ergodic pmp transformation and \( f, g \in L^2(X, \mu) \). Prove the ergodic theorem for products: \( \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) g(T^{2n} x) \to \int f \, d\mu \cdot \int g \, d\mu \) in \( L^2 \) if \( T \) is weakly mixing. (Hint: Apply the van der Corput trick: a sequence \( (a_n) \) in a Hilbert space with \( \lim_{H \to \infty} \frac{1}{H}\sum_{h=1}^{H} \limsup_{N} |\frac{1}{N}\sum_n \langle a_{n+h}, a_n\rangle| = 0 \) has Cesàro mean 0.)
Problem C5. Construct an explicit example of an ergodic pmp transformation that is not mixing. (Hint: The irrational rotation \( T(x) = x + \alpha \pmod 1 \) is ergodic (proved in Problem C1 for the 2-torus; adapt for the circle). Show it is not mixing by taking \( A = [0, 1/4) \) and computing \( \mu(T^{-n}A \cap A) \) for suitable \( n \) — it oscillates rather than converging to \( \mu(A)^2 \).)
Part D: Furstenberg / Combinatorics — Challenge Problems
Problem D1 (Sárközy’s Theorem). Let \( A \subseteq \mathbb{N} \) with \( d^*(A) > 0 \). Prove that \( A \) contains two elements differing by a perfect square: there exist \( a, b \in A \) and \( n \geq 1 \) with \( b - a = n^2 \). Use the ergodic approach: apply the Furstenberg Correspondence Principle to get a pmp system \( (X, \mu, T) \) and set \( B \) with \( \mu(B) > 0 \), then show \( \limsup_{N} \frac{1}{N}\sum_{n=1}^N \mu(B \cap T^{-n^2}B) \geq \mu(B)^2 > 0 \) using the spectral theory of \( U_T \).
Problem D2 (Polynomial Szemerédi). State (without proof) the Furstenberg–Bergelson–Leibman theorem: if \( p_1, \ldots, p_k \in \mathbb{Z}[n] \) are polynomials with \( p_i(0) = 0 \) for all \( i \), and \( A \subseteq \mathbb{Z} \) has \( d^*(A) > 0 \), then \( A \) contains a configuration \( \{a, a + p_1(n), \ldots, a + p_k(n)\} \) for some \( a \in \mathbb{Z} \), \( n \geq 1 \). Explain why the case \( p_i(n) = in \) recovers Szemerédi’s theorem, and why the case \( p_1(n) = n^2 \) gives Sárközy’s theorem.
Problem D3 (IP Sets and Recurrence). An IP* set is a set \( A \) that intersects every IP set. Prove that the set of return times \( \{ n \geq 1 : \mu(B \cap T^{-n}B) > 0 \} \) is an IP* set when \( (X, \mu, T) \) is a pmp system and \( \mu(B) > 0 \). (Hint: Use idempotent ultrafilters: if \( p = p \cdot p \) is an idempotent ultrafilter in \( \beta\mathbb{N} \) and \( C \in p \), then \( C \) is an IP set. Show that the return-time set contains every IP set by applying Hindman’s theorem.)
Problem D4 (Van der Waerden via Topological Dynamics). Give a proof of van der Waerden’s theorem — for any finite coloring of \( \mathbb{N} \), one color class contains arithmetic progressions of every length — using only topological dynamics (no ergodic theory). The steps are: (a) model the coloring as a point \( x \) in a shift space \( K^{\mathbb{Z}} \); (b) take a minimal subflow \( M \) of the orbit closure; (c) apply Birkhoff recurrence to find a uniformly recurrent point; (d) use the Furstenberg–Weiss topological multiple recurrence theorem (that every minimal system \( (M, T) \) satisfies: for every open \( U \neq \emptyset \), there exist \( x \in U \) and \( n \geq 1 \) with \( x, Tx, \ldots, T^{kn}x \in U \) for any \( k \)).
Problem D5 (Density Ramsey via Correspondence). Let \( A \subseteq \mathbb{Z}^2 \) with \( d^*(A) > 0 \) (upper density in \( \mathbb{Z}^2 \)). Using the Furstenberg Correspondence Principle for \( \mathbb{Z}^2 \)-actions, show that \( A \) contains the vertices of an axis-aligned square: there exist \( (a, b) \in \mathbb{Z}^2 \) and \( n \geq 1 \) with \( (a,b), (a+n, b), (a, b+n), (a+n, b+n) \in A \). State which multiple recurrence theorem (for \( \mathbb{Z}^2 \)-actions) you would need to complete this argument, and note that it is implied by the Furstenberg–Katznelson theorem (1978) on multidimensional Szemerédi.
Appendix D: Historical Timeline
The following 30-entry timeline traces the development of topological dynamics and ergodic theory from its origins to the early twenty-first century.
| Year | Event |
|---|---|
| 1890 | Henri Poincaré proves the Poincaré Recurrence Theorem in his prize-winning paper on the three-body problem: almost every point in a bounded measure-preserving system returns arbitrarily close to its starting position. |
| 1898 | Jacques Hadamard studies geodesic flows on surfaces of negative curvature, anticipating hyperbolic dynamics. |
| 1900 | Ludwig Boltzmann’s ergodic hypothesis in statistical mechanics: a generic trajectory visits all states on a constant-energy surface. |
| 1906 | Henri Lebesgue’s Lebesgue integration theory provides the measure-theoretic foundation needed for ergodic theory. |
| 1913 | Poul Heegaard and Max Dehn contribute to the study of surface mappings, early precursors to topological dynamics. |
| 1916 | Hermann Weyl proves the Weyl equidistribution theorem: the sequence \( n\alpha \pmod 1 \) is equidistributed in \( [0,1) \) for irrational \( \alpha \). |
| 1927 | Birkhoff and Smith’s paper on “recurrence” introduces the notion of recurrent motion in abstract dynamical systems. |
| 1931 | George David Birkhoff proves the Birkhoff Pointwise Ergodic Theorem, the rigorous vindication of Boltzmann’s hypothesis for time averages. |
| 1932 | John von Neumann proves the Mean Ergodic Theorem (Hilbert space formulation), inspired by quantum mechanics and Koopman’s operator-theoretic approach. |
| 1932 | Bernard Koopman introduces the Koopman operator \( U_T f = f \circ T \) on \( L^2 \), launching the spectral approach to ergodic theory. |
| 1942 | Paul Halmos and John von Neumann classify ergodic transformations with pure point spectrum (Halmos–von Neumann theorem): they are rotations on compact abelian groups. |
| 1949 | Paul Halmos proves the Conjugacy Lemma: the set of ergodic transformations is dense (and \( G_\delta \)) in \( \mathrm{Aut}(X, \mu) \). |
| 1955 | Robert Ellis introduces the Ellis semigroup and proves the Ellis–Namiooka–Ruppert theorem on the existence of idempotents in compact right-topological semigroups. |
| 1956 | Furstenberg defines and studies distal flows, initiating the structure theory of topological dynamical systems. |
| 1958 | Andrei Kolmogorov introduces entropy into ergodic theory, proving that the \( (1/2,1/2) \)- and \( (1/3,1/3,1/3) \)-Bernoulli shifts are non-isomorphic. |
| 1959 | Henry Dye proves Dye’s Theorem: any two aperiodic pmp \( \mathbb{Z} \)-actions are orbit equivalent. |
| 1959 | Yakov Sinai defines the Kolmogorov–Sinai entropy and proves the Kolmogorov–Sinai theorem (generating partitions compute entropy). |
| 1962 | Sinai proves his factor theorem: any ergodic pmp system of entropy \( h \) has the Bernoulli shift of entropy \( h \) as a factor. |
| 1963 | Adler, Konheim, and McAndrew introduce topological entropy, the topological analog of metric entropy. |
| 1967 | Harry Furstenberg proves his structure theorem for minimal distal flows: every minimal distal flow is built from equicontinuous extensions in a transfinite tower. |
| 1970 | Donald Ornstein proves his isomorphism theorem: two Bernoulli shifts are isomorphic iff they have the same entropy, making entropy a complete invariant for Bernoulli systems. |
| 1973 | Neil Hindman proves Hindman’s Theorem (the finite sums theorem), using a purely combinatorial argument; the ultrafilter proof by Glazer came shortly after. |
| 1975 | Endre Szemerédi proves (by combinatorics) that any subset of integers with positive upper density contains arithmetic progressions of every length. |
| 1977 | Hillel Furstenberg gives an ergodic-theoretic proof of Szemerédi’s theorem via the Furstenberg Correspondence Principle and the Multiple Recurrence Theorem. |
| 1978 | Furstenberg and Katznelson prove the multidimensional Szemerédi theorem: subsets of \( \mathbb{Z}^d \) with positive density contain all finite configurations (not just arithmetic progressions). |
| 1981 | Ornstein and Weiss prove that Bernoulli actions of amenable groups are classified by entropy. |
| 1991 | Marina Ratner proves the Ratner theorems on the rigidity and equidistribution of unipotent flows on homogeneous spaces, resolving Raghunathan’s conjectures. |
| 2001 | Timothy Gowers introduces Gowers uniformity norms as a quantitative approach to Szemerédi’s theorem, giving polynomial bounds. |
| 2004 | Ben Green and Terence Tao prove that the primes contain arithmetic progressions of every length, combining ergodic ideas with analytic number theory. |
| 2006 | Einsiedler, Katok, and Lindenstrauss prove partial results on quantum unique ergodicity and use measure rigidity to resolve Littlewood’s conjecture on simultaneous Diophantine approximation for almost all pairs. |
Chapter 13: Unique Ergodicity
Section 13.1: Motivation and Definition
The ergodic theorem guarantees that for a measure-preserving transformation \( T \) with invariant measure \( \mu \), time averages converge to space averages for \( \mu \)-almost every starting point. But what if we want convergence for every starting point, not just almost every one? This is possible when there is only one invariant measure — in that case the system is called uniquely ergodic, and the convergence in Birkhoff’s theorem is automatically uniform over all starting points. Unique ergodicity is thus a strong rigidity property: the system has no hidden “complexity” coming from multiple invariant measures competing for different parts of the space. The irrational rotation is the canonical example, and the theory of unique ergodicity connects deeply to Weyl’s equidistribution theorem, Fourier analysis on compact groups, and the theory of almost-periodic functions.
Section 13.2: Worked Examples
Step 1: Lebesgue measure is invariant. \( \lambda(T^{-1}[a,b]) = \lambda([a - \alpha, b - \alpha]) = b - a = \lambda([a,b]) \). So \( \lambda \) is \( T \)-invariant.
\[ \hat{\nu}(k) = \int e^{2\pi i k x} \, d\nu(x) = \int e^{2\pi i k T(x)} \, d\nu(x) = \int e^{2\pi i k(x + \alpha)} \, d\nu(x) = e^{2\pi i k \alpha} \hat{\nu}(k). \]So \( \hat{\nu}(k)(1 - e^{2\pi i k \alpha}) = 0 \). Since \( \alpha \notin \mathbb{Q} \), we have \( e^{2\pi i k \alpha} \neq 1 \) for all \( k \neq 0 \), so \( \hat{\nu}(k) = 0 \) for all \( k \neq 0 \). The Fourier coefficients determine the measure: a measure on \( \mathbb{T} \) with all Fourier coefficients 0 except at \( k = 0 \) (where \( \hat{\nu}(0) = 1 \)) must be Lebesgue measure. Hence \( \nu = \lambda \). \( \square \)
\[ \frac{1}{N}\sum_{n=0}^{N-1} f(x + n\alpha) \xrightarrow{N \to \infty} \int_0^1 f(t) \, dt \]uniformly in \( x \). This is precisely Weyl’s equidistribution theorem.
- Lebesgue measure \( \lambda \) (ergodic with entropy \( \log 2 \)).
- The Dirac mass \( \delta_0 \) at the fixed point 0.
- Any convex combination \( \alpha \lambda + (1-\alpha)\delta_0 \) for \( \alpha \in [0,1] \) — but only the extreme points are ergodic.
- More generally, for any \( p/q \) (periodic point of period \( k \)), the uniform measure on \( \{p/q, T(p/q), \ldots, T^{k-1}(p/q)\} \) is \( T \)-invariant and ergodic.
To verify invariance: \( \lambda \otimes \lambda(T^{-1}(A \times B)) = \lambda \otimes \lambda(\{(x,y) : x + \alpha \in A, y + x \in B\}) \). Since \( T \) is a measure-preserving homeomorphism of \( \mathbb{T}^2 \) (it has Jacobian 1 as an affine map), Lebesgue measure is preserved.
To prove uniqueness, one uses Fourier analysis on \( \mathbb{T}^2 \). If \( \nu \) is \( T \)-invariant, write \( \hat\nu(m, n) = \int e^{2\pi i(mx + ny)} \, d\nu \). Invariance gives \( \hat\nu(m, n) = e^{2\pi im\alpha} \hat\nu(m, n + m) \). For \( m \neq 0 \), this recursion with the irrational rotation of the \( n \)-index forces \( \hat\nu(m, n) = 0 \) for all \( n \). For \( m = 0 \), one shows \( \hat\nu(0, n) = 0 \) for \( n \neq 0 \) by the irrational rotation argument in \( y \). Hence \( \nu = \lambda \otimes \lambda \).
Chapter 14: The Furstenberg–Zimmer Structure Theory
Section 14.1: Overview
The Furstenberg–Zimmer structure theory is the measure-theoretic analog of the Furstenberg structure theorem for distal flows (Chapter 5). It provides a canonical decomposition of any measure-preserving system into a transfinite tower of extensions, alternating between compact (isometric) extensions and weakly mixing extensions. This structure is the key tool in Furstenberg’s proof of the multiple recurrence theorem and, in the refined form due to Host–Kra and Ziegler, in identifying the characteristic factors for polynomial and multilinear ergodic averages. Understanding the structure theory means understanding why systems recur: compact extensions contribute the “almost periodic” component of the dynamics, while weakly mixing extensions contribute the “chaotic” component that averages away.
Section 14.2: The Structure Theorem
- Each extension \( X_{\alpha+1} \to X_\alpha \) is either a compact (isometric) extension or a relatively weakly mixing extension;
- At limit ordinals \( \lambda \), \( X_\lambda = \varprojlim_{\alpha < \lambda} X_\alpha \);
- The tower terminates at \( X_\infty \) (which is an inverse limit that equals \( X \) in the relevant \( L^2 \)-sense).
Section 14.3: Characteristic Factors
The Furstenberg–Zimmer tower leads to the concept of characteristic factors for multiple ergodic averages. For the averages
\[ \frac{1}{N}\sum_{n=0}^{N-1} f_1(T^n x) f_2(T^{2n} x) \cdots f_k(T^{kn} x), \]one wants to understand which part of the system controls the limit. The answer, identified by Host–Kra (2005) and Ziegler (2007), is the \( (k-1) \)-step nilsystem factor.
The simplest nontrivial nilsystem is the 2-step case. Take \( G \) to be the Heisenberg group
\[ G = \left\{ \begin{pmatrix} 1 & a & c \\ 0 & 1 & b \\ 0 & 0 & 1 \end{pmatrix} : a, b, c \in \mathbb{R} \right\} \]with \( \Gamma \) the subgroup of integer entries. The nilmanifold \( G/\Gamma \) is 3-dimensional, and the dynamics on it is a skew rotation generalizing the 2-torus skew product \( T(x,y) = (x + \alpha, y + x) \pmod{1} \).
- If \( T \) is weakly mixing, then \( \mathcal{Z}_1 = \{*, X\} \) (trivial), and the limit is \( \int f \, d\mu \cdot \int g \, d\mu \).
- If \( T \) is an irrational rotation, then \( \mathcal{Z}_1 = T \) itself (the rotation is its own Kronecker factor), and the limit is computed from the eigenfunction structure.
- For a general ergodic \( T \), the limit is computed from the projection of \( f \) and \( g \) onto \( L^2(\mathcal{Z}_1) \).
Chapter 15: Topological Entropy — Deeper Computations via Open Covers
Section 15.1: The Open Cover Definition — Step-by-Step Computations
The open cover definition of topological entropy (Adler–Konheim–McAndrew, 1965) is the original formulation, and it is important to see it work in concrete examples. For the full shift and the irrational rotation, the definitions via separated sets gave clean answers; here we redo these computations using the open cover definition directly.
Step 1: Compute \( N(\mathcal{U}) \). The cover \( \mathcal{U} \) consists of 2 open sets and already covers \( X = \{0,1\}^{\mathbb{Z}} \) (every point has \( x_0 = 0 \) or \( x_0 = 1 \)). So \( N(\mathcal{U}) = 2 \).
Step 2: Compute the join \( \mathcal{U}^n = \mathcal{U} \vee \sigma^{-1}\mathcal{U} \vee \cdots \vee \sigma^{-(n-1)}\mathcal{U} \). The sets in \( \mathcal{U} \) are the cylinders \( [i]_0 \) (determined by coordinate 0). The sets in \( \sigma^{-k}\mathcal{U} \) are \( \sigma^{-k}[i] = [i]_k \) (determined by coordinate \( k \)). The join consists of all intersections \( [i_0]_0 \cap [i_1]_1 \cap \cdots \cap [i_{n-1}]_{n-1} = [i_0 i_1 \cdots i_{n-1}] \) — the length-\( n \) cylinder sets. There are exactly \( 2^n \) such cylinders (one for each binary word of length \( n \)), and each is non-empty and open, and they cover \( X \). So \( N(\mathcal{U}^n) = 2^n \).
\[ h(\sigma, \mathcal{U}) = \lim_{n \to \infty} \frac{1}{n} \log N(\mathcal{U}^n) = \lim_{n \to \infty} \frac{1}{n} \log 2^n = \log 2. \]Step 4: Conclude \( h_{\mathrm{top}}(\sigma) = \log 2 \). Since \( h_{\mathrm{top}}(\sigma) = \sup_{\mathcal{V}} h(\sigma, \mathcal{V}) \geq h(\sigma, \mathcal{U}) = \log 2 \), and one shows the upper bound \( h_{\mathrm{top}}(\sigma) \leq \log 2 \) by noting that any open cover of \( \{0,1\}^{\mathbb{Z}} \) has a refinement by cylinder sets, and the growth rate of cylinder sets is at most \( 2^n \). Hence \( h_{\mathrm{top}}(\sigma) = \log 2 \).
Step 1: \( N(\mathcal{U}) = 2 \). Two open arcs are needed to cover \( \mathbb{T} \) (neither alone covers it).
Step 2: The join \( \mathcal{U}^n = \mathcal{U} \vee T^{-1}\mathcal{U} \vee \cdots \vee T^{-(n-1)}\mathcal{U} \). The set \( T^{-k}U_i = \{x : T^k x \in U_i\} = U_i - k\alpha \pmod{1} \), a rotation of the arc \( U_i \) by \( -k\alpha \). Each such set is an arc of length equal to \( |U_i| \). The join \( \mathcal{U}^n \) consists of all intersections \( \bigcap_{k=0}^{n-1} T^{-k} V_k \) where \( V_k \in \{U_1, U_2\} \).
Step 3: Bounding \( N(\mathcal{U}^n) \). Each set in \( \mathcal{U}^n \) is a connected arc (intersection of arcs on \( \mathbb{T} \)). The \( n \) arcs \( U_1, T^{-1}U_1, \ldots, T^{-(n-1)}U_1 \) have at most \( 2n \) endpoints total, which divide \( \mathbb{T} \) into at most \( 2n \) arcs. The atoms of \( \mathcal{U}^n \) are unions of these arcs, so \( N(\mathcal{U}^n) \leq 2n \).
\[ h(T, \mathcal{U}) = \lim_{n \to \infty} \frac{1}{n} \log N(\mathcal{U}^n) \leq \lim_{n \to \infty} \frac{1}{n} \log 2n = \lim_{n \to \infty} \frac{\log 2n}{n} = 0. \]Step 5: Conclude. Since this holds for any open cover \( \mathcal{U} \), \( h_{\mathrm{top}}(T) = \sup_{\mathcal{U}} h(T, \mathcal{U}) = 0 \).
The key geometric insight: the irrational rotation moves arcs rigidly (it is an isometry), so the join cover \( \mathcal{U}^n \) is determined by \( n \) arcs of fixed sizes, which can only subdivide \( \mathbb{T} \) into \( O(n) \) pieces — polynomial growth, not exponential.
Section 15.2: Entropy of Subshifts of Finite Type
A subshift of finite type (SFT) is defined by a finite alphabet and a set of forbidden words of length 2 (transitions). Its entropy can be computed exactly from the transition matrix.
By the Perron–Frobenius theorem, \( \sum_{i,j}(A^n)_{ij} \sim C \phi^n \) as \( n \to \infty \), so the number of admissible length-\( n \) words grows like \( \phi^n \). Therefore:
\[ h_{\mathrm{top}}(\sigma_A) = \lim_{n \to \infty} \frac{1}{n} \log (\text{number of length-}n\text{ words}) = \log \phi = \log\!\left(\frac{1+\sqrt{5}}{2}\right) \approx 0.481. \]This is strictly between 0 (zero entropy) and \( \log 2 \) (full shift entropy), reflecting the restriction imposed by forbidding \( 11 \).
Chapter 16: The Koopman Operator and Spectral Theory in Depth
Section 16.1: Eigenfunctions of the Koopman Operator for the Irrational Rotation
The spectral theory of the Koopman operator \( U_T \) on \( L^2(X, \mu) \) is one of the most powerful tools for understanding the dynamical properties of a measure-preserving system. For the irrational rotation, this spectral theory is completely explicit. We work it out in full detail here because it forms the backbone of the proofs of unique ergodicity, ergodicity, and the absence of mixing for this system.
So each \( e_k \) is an eigenfunction with eigenvalue \( \lambda_k = e^{2\pi i k \alpha} \).
Step 2: All eigenvalues are distinct. We claim \( \lambda_k \neq \lambda_j \) for \( k \neq j \). Indeed, \( \lambda_k = \lambda_j \iff e^{2\pi i(k-j)\alpha} = 1 \iff (k-j)\alpha \in \mathbb{Z} \). Since \( \alpha \notin \mathbb{Q} \) and \( k - j \in \mathbb{Z} \setminus \{0\} \), this is impossible. So all eigenvalues \( \{e^{2\pi i k\alpha} : k \in \mathbb{Z}\} \) are distinct.
Step 3: The spectrum is pure point. The characters \( \{e_k\}_{k \in \mathbb{Z}} \) form a complete orthonormal basis for \( L^2(\mathbb{T}) \), and each is an eigenfunction of \( U_T \). Therefore \( U_T \) has pure point spectrum — \( L^2(\mathbb{T}) \) is spanned by eigenfunctions.
Step 4: Ergodicity from spectral theory. The eigenvalue 1 corresponds to \( k = 0 \), with eigenfunction \( e_0 \equiv 1 \) (the constant function). Since \( \lambda_k = e^{2\pi ik\alpha} \neq 1 \) for \( k \neq 0 \), the eigenvalue 1 has multiplicity 1. By the spectral characterization (Section 9.2C), \( T \) is ergodic. \( \square \)
Step 5: Non-mixing from spectral theory. Strongly mixing would require \( \langle U_T^n f, g \rangle \to 0 \) for all \( f, g \in L^2_0 = L^2 \ominus \mathbb{C} \). But \( \langle U_T^n e_k, e_k \rangle = e^{2\pi i k n \alpha} \), and for \( k \neq 0 \) this does not converge to 0 (it is a sequence on the unit circle that is equidistributed but does not converge). So \( T \) is not strongly mixing. Similarly, since \( e_k \) (for \( k \neq 0 \)) are eigenfunctions with eigenvalues \( \neq 1 \), the system is not weakly mixing.
Section 16.2: Spectral Types and the Classification Problem
We say \( T \) has:
- pure point spectrum if \( L^2_0 \) is spanned by eigenfunctions of \( U_T \);
- absolutely continuous spectrum if the spectral measures of all \( f \in L^2_0 \) are absolutely continuous with respect to Lebesgue measure on \( \mathbb{T} \);
- Lebesgue spectrum if \( L^2_0 \) decomposes as a countable direct sum of invariant subspaces on each of which \( U_T \) acts as the bilateral shift — the "most random" possible spectrum.
Chapter 17: Joinings and Disjointness
Section 17.1: The Concept of Joining
Joinings, introduced by Furstenberg in 1967, provide a powerful framework for comparing and combining ergodic systems. A joining of two systems is a measure on their product space that is invariant under the product transformation and has the correct marginals. The existence of special joinings (e.g., off-diagonal joinings, graph joinings) reflects deep structural properties of the systems.
The product measure \( \mu \times \nu \) is always a joining (the “independent joining”). A joining is off-diagonal if \( \lambda \) is supported on the graph \( \{(x, \phi(x)) : x \in X\} \) of a measure-theoretic isomorphism \( \phi : X \to Y \); off-diagonal joinings exist if and only if the two systems are isomorphic.
Appendix E: Supplementary Problems
The following problems extend the material from Appendices A–D and the later chapters. They are intended for students who have completed the main problem set and wish to explore more advanced topics.
Part E: Unique Ergodicity and Spectral Theory
Problem E1 (Unique Ergodicity of Skew Products). Let \( \alpha \notin \mathbb{Q} \) and \( f : \mathbb{T} \to \mathbb{T} \) be a measurable function. Consider the skew product \( T_{f} : \mathbb{T}^2 \to \mathbb{T}^2 \) given by \( T_f(x, y) = (x + \alpha, y + f(x)) \). Show that \( T_f \) preserves Lebesgue measure on \( \mathbb{T}^2 \). Then: (a) if \( f(x) = x \), show \( T_f \) is uniquely ergodic; (b) if \( f(x) = 0 \) (constant), show \( T_f \) is not ergodic (the measure decomposes over orbits of the \( y \)-coordinate rotation). (Hint: For (a), use the Fourier argument from Section 13.2.)
Problem E2 (Eigenvalue Group of the 2-Torus Rotation). Let \( T(x, y) = (x + \alpha, y + \beta) \) be a rotation on \( \mathbb{T}^2 \) with \( 1, \alpha, \beta \) rationally independent. (a) Find all eigenfunctions of the Koopman operator \( U_T \) on \( L^2(\mathbb{T}^2) \). (b) Compute the eigenvalue group \( \Lambda = \{e^{2\pi i(m\alpha + n\beta)} : m, n \in \mathbb{Z}\} \). (c) Show that \( T_\alpha \times T_\beta \) (product of two irrational rotations) is ergodic using the spectral characterization. (d) Is \( T \) isomorphic to \( T_\alpha \times T_\beta \)? Justify using the Halmos–von Neumann theorem.
Problem E3 (Koopman Operator for the Doubling Map). Let \( T(x) = 2x \pmod{1} \) on \( [0,1) \) with Lebesgue measure \( \lambda \). (a) Show that \( U_T e_k = e_{2k} \) where \( e_k(x) = e^{2\pi i kx} \). (b) Show that \( U_T \) has no eigenvalues except 0 (in \( L^2_0 \)). (Hint: If \( U_T f = \lambda f \) and \( f \in L^2_0 \), expand \( f = \sum \hat{f}(k) e_k \) and use \( U_T e_k = e_{2k} \) to derive a recursion on Fourier coefficients. Show the recursion forces all coefficients to vanish.) (c) Conclude that the doubling map is weakly mixing. (d) Is it strongly mixing?
Problem E4 (Spectral Theory of the Cat Map). Let \( A : \mathbb{T}^2 \to \mathbb{T}^2 \) be the Arnold cat map. (a) Show that the characters \( e_{(m,n)}(x,y) = e^{2\pi i(mx + ny)} \) for \( (m,n) \in \mathbb{Z}^2 \) are permuted by the Koopman operator: \( U_A e_{(m,n)} = e_{A^T(m,n)} \), where \( A^T \) is the transpose matrix acting on the dual lattice \( \mathbb{Z}^2 \). (b) The eigenvalues of \( U_A \) on the full \( L^2(\mathbb{T}^2) \) are those characters \( e_{(m,n)} \) that are fixed by \( A^T \): \( A^T(m,n) = (m,n) \). Show the only such vector is \( (0,0) \), so the only fixed function is the constant. Conclude \( A \) is ergodic. (c) Show the orbit of every non-trivial character under \( A^T \) is infinite (so \( U_A \) has no eigenvalues on \( L^2_0 \)), concluding \( A \) is weakly mixing.
Problem E5 (Entropy from Spectral Theory — Variational Principle in Action). Let \( T \) be an ergodic pmp transformation with a unique measure of maximal entropy \( \mu^* \). (a) Prove that \( \mu^* \) is ergodic. (Hint: If \( \mu^* = \frac{1}{2}\nu_1 + \frac{1}{2}\nu_2 \) with \( \nu_i \) ergodic invariant measures, show \( h_{\mu^*}(T) = \frac{1}{2}h_{\nu_1}(T) + \frac{1}{2}h_{\nu_2}(T) \leq \max(h_{\nu_i}(T)) \leq h_{\mathrm{top}}(T)\), with equality forcing \( \nu_1 = \nu_2 = \mu^* \).) (b) For the full 2-shift, verify that the uniform Bernoulli measure is ergodic and achieves the topological entropy \( \log 2 \). (c) For a general SFT with irreducible transition matrix \( A \), state (without proof) that the unique measure of maximal entropy is the Markov measure associated to the Perron eigenvectors of \( A \).
Chapter 18: Recurrence Theorems — A Unified Perspective
Section 18.1: Poincaré, Birkhoff, and Beyond
The various recurrence theorems in topological and measure-theoretic dynamics form a hierarchy of increasing strength. Understanding this hierarchy — and the relationships between its members — is a central goal of the course. This chapter synthesizes the results from earlier chapters into a unified picture, with emphasis on how each theorem strengthens the previous one and what new proof technique is required at each step.
The most basic recurrence theorem is Birkhoff recurrence (topological): every compact flow has a recurrent point. This requires no measure theory — only compactness and Zorn’s lemma. One step up is Poincaré recurrence (measure-theoretic): any measure-preserving transformation of a probability space brings almost every point back to any positive-measure set infinitely often. This requires measure theory but no ergodicity. Then comes ergodic recurrence: for ergodic systems, the average return frequency to any positive-measure set equals the measure of that set (by the Birkhoff ergodic theorem). Finally, multiple recurrence (Furstenberg, 1977) says that for any pmp system and any positive-measure set, there exist simultaneous returns to all of \( B, T^{-n}B, \ldots, T^{-kn}B \) for some \( n \geq 1 \). This is the deepest, requiring the Furstenberg–Zimmer structure theory.
- Poincaré Recurrence: For a.e. \( x \in B \), there exist infinitely many \( n \geq 1 \) with \( T^n x \in B \).
- Ergodic Quantification: If \( T \) is ergodic, then \( \frac{1}{N}\sum_{n=1}^N \mathbf{1}_B(T^n x) \to \mu(B) \) for a.e. \( x \).
- Poincaré + Measure: \( \limsup_{N \to \infty} \frac{1}{N}\sum_{n=1}^N \mu(B \cap T^{-n}B) \geq \mu(B)^2 > 0 \).
- Multiple Recurrence (Furstenberg): For every \( k \geq 1 \), \( \liminf_{N \to \infty} \frac{1}{N}\sum_{n=1}^N \mu(B \cap T^{-n}B \cap \cdots \cap T^{-kn}B) > 0 \).
Section 18.2: The Van der Corput Lemma
The van der Corput lemma is a key analytic tool used in the proof of multiple recurrence and in the theory of equidistribution. It replaces “prove convergence of \( a_n \)” with “prove convergence of \( a_{n+h} \overline{a_n} \) on average,” which is often easier.
Section 18.3: The Furstenberg–Sárközy Theorem
The Furstenberg–Sárközy theorem is the case \( k = 1 \) (single recurrence with polynomial returns) of the polynomial multiple recurrence theorem. It shows that any set of positive upper density contains two elements differing by a perfect square.
The key is to analyze the averages \( \frac{1}{N}\sum_{n=1}^N U_T^{n^2} \mathbf{1}_B \) in \( L^2 \). Decompose \( L^2 = L^2_{pp} \oplus L^2_{wm} \) (pure point spectrum plus weakly mixing part). On the pure point spectrum part, each eigenfunction \( f \) with eigenvalue \( e^{2\pi i \lambda} \) contributes \( U_T^{n^2} f = e^{2\pi i \lambda n^2} f \). By Weyl’s polynomial equidistribution theorem, the averages \( \frac{1}{N}\sum_{n=1}^N e^{2\pi i \lambda n^2} \to \int_0^1 e^{2\pi i t^2} \, dt \) … but more precisely, one shows that
\[ \frac{1}{N}\sum_{n=1}^N \mu(B \cap T^{-n^2}B) \to \left\| \mathbb{E}[\mathbf{1}_B | \mathcal{Z}_1] \right\|_{L^2}^2 \geq \mu(B)^2 > 0, \]where \( \mathcal{Z}_1 \) is the Kronecker factor (the maximal equicontinuous factor). This uses the fact that on the weakly mixing part, the polynomial averages still go to 0 (by a variant of van der Corput for polynomial sequences), while on the Kronecker factor, the polynomial equidistribution gives the correct limit. Since \( \mu(B)^2 > 0 \), the limsup is positive, giving the desired \( n \). \( \square \)
Chapter 19: Subshifts, Complexity, and Topological Dynamics
Section 19.1: Complexity Function of Subshifts
For a subshift \( (X, \sigma) \subseteq A^{\mathbb{Z}} \), the complexity function \( p_X(n) \) counts the number of distinct words of length \( n \) that appear in elements of \( X \). The complexity function is a combinatorial invariant that reflects the topological entropy and dynamical properties of the subshift.
The complexity function of a Sturmian sequence is \( p(n) = n + 1 \): there are exactly \( n + 1 \) distinct words of length \( n \). This is the minimum possible complexity for an aperiodic sequence — a theorem of Morse and Hedlund states that if \( p(n) \leq n \) for some \( n \), the sequence is eventually periodic. The entropy is \( \lim_{n} \frac{1}{n}\log(n+1) = 0 \), consistent with our earlier computation that irrational rotations have zero topological entropy.
The Sturmian system is minimal (it is the orbit closure of an irrational rotation coding) and uniquely ergodic (the unique invariant measure is the projection of Lebesgue measure under the coding). Sturmian sequences provide the canonical model of a zero-entropy minimal uniquely ergodic subshift with the simplest possible complexity.
The Thue–Morse subshift \( X_{TM} \subseteq \{0,1\}^{\mathbb{Z}} \) is minimal and uniquely ergodic (with invariant measure \( \nu_{TM} \), the Thue–Morse measure). Its complexity function satisfies \( p(n) = O(n^{1 + \varepsilon}) \) for any \( \varepsilon > 0 \) but grows faster than linearly, so \( h_{\mathrm{top}} = 0 \). The Koopman operator for \( (X_{TM}, \nu_{TM}, \sigma) \) has purely singular continuous spectrum — it is ergodic, not weakly mixing, and the spectrum is not pure point (unlike Sturmian systems). This makes the Thue–Morse system a model for the intermediate spectral type between pure point and absolutely continuous.
Section 19.2: Substitution Dynamics
Substitution dynamical systems — generated by iterating a map \( \tau : A \to A^+ \) (sending each letter to a word) — are one of the richest sources of examples in topological dynamics and ergodic theory. They include Sturmian sequences, the Thue–Morse sequence, the Fibonacci sequence, and many others.
- The orbit closure \( X_\tau = \overline{\{\sigma^n x : n \in \mathbb{Z}\}} \) is a minimal, uniquely ergodic subshift (the substitution subshift).
- The unique invariant measure \( \mu_\tau \) is determined by the Perron eigenvector of the substitution matrix: the frequency of each letter \( a \in A \) in \( x \) is the \( a \)-component of the Perron eigenvector of \( M_\tau \), normalized.
- The topological entropy of \( (X_\tau, \sigma) \) is \( 0 \) (because the complexity function \( p(n) \) is at most polynomial in \( n \) for primitive substitutions of constant length, and at most exponential with exponent below 1 in general).
The Fibonacci subshift is isomorphic (as a measure-preserving system) to the irrational rotation \( T_{1/\phi} \) on \( \mathbb{T} \) (with angle \( 1/\phi = \phi - 1 \)): this is because the Fibonacci sequence codes the irrational rotation of angle \( 1/\phi \), and the two systems have the same spectral properties (pure point spectrum with eigenvalue group generated by \( e^{2\pi i/\phi} \)).
Appendix F: Key Formulas and Computational Reference
The following reference compiles the most important formulas and computational results from the course, organized for quick access.
F.1: Entropy Formulas
| System | Topological Entropy | Metric Entropy (Lebesgue) |
|---|---|---|
| Irrational rotation \( T_\alpha \) on \( \mathbb{T} \) | \( 0 \) | \( 0 \) |
| Doubling map \( T(x) = 2x \pmod{1} \) | \( \log 2 \) | \( \log 2 \) |
| Full \( k \)-shift \( \sigma \) on \( A^{\mathbb{Z}} \), ( | A | = k ) |
| Bernoulli shift \( (p_0, \ldots, p_{k-1}) \) | \( \log k \) (for uniform) | \( -\sum p_i \log p_i \) |
| Arnold cat map \( A = \begin{pmatrix}2&1\\1&1\end{pmatrix} \) | \( \log\frac{3+\sqrt{5}}{2} \) | \( \log\frac{3+\sqrt{5}}{2} \) |
| Golden mean shift | \( \log \phi \) (\( \phi = \frac{1+\sqrt{5}}{2} \)) | \( = h_{\mathrm{top}} \) (Markov) |
| SFT with matrix \( A \) | \( \log \rho(A) \) | \( = h_{\mathrm{top}} \) (Markov) |
| Sturmian subshift | \( 0 \) | \( 0 \) |
| Thue–Morse subshift | \( 0 \) | \( 0 \) |
(\( \rho(A) \) = spectral radius of the transition matrix \( A \).)
F.2: Spectral Properties of Key Systems
| System | Ergodic? | Weakly Mixing? | Strongly Mixing? | Spectral Type |
|---|---|---|---|---|
| Irrational rotation \( T_\alpha \) | Yes | No | No | Pure point |
| Rational rotation \( T_{p/q} \) | No (unless \( X = \{0\} \)) | No | No | Pure point (periodic) |
| Doubling map | Yes | Yes | Yes | Lebesgue (abs. cont.) |
| Bernoulli shift | Yes | Yes | Yes | Lebesgue, mult. \( \infty \) |
| Arnold cat map | Yes | Yes | Yes | Lebesgue, mult. \( \infty \) |
| Sturmian (irrational rotation code) | Yes | No | No | Pure point |
| Thue–Morse system | Yes | No | No | Singular continuous |
| Generic (Baire typical) pmp | Yes | Yes | No | Singular |
F.3: Birkhoff Averages — What They Converge To
For a pmp system \( (X, \mu, T) \) and \( f \in L^1(X, \mu) \):
\[ \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) \xrightarrow{\text{a.e. and }L^1} \mathbb{E}[f \mid \mathcal{I}](x), \]where \( \mathcal{I} = \{A : T^{-1}A = A \text{ a.e.}\} \) is the invariant \( \sigma \)-algebra. Special cases:
- Ergodic \( T \): \( \mathcal{I} \) is trivial, so \( \mathbb{E}[f|\mathcal{I}] = \int f \, d\mu \) (constant). The time average equals the space average a.e.
- Identity \( T = \mathrm{id} \): \( \mathcal{I} = \mathcal{B} \) (the full \( \sigma \)-algebra), so \( \mathbb{E}[f|\mathcal{I}] = f \). The time average is the function itself.
- Irrational rotation (unique ergodicity): Convergence holds for every \( x \), uniformly, and the limit is \( \int f \, d\lambda \) (Oxtoby’s theorem).
- Strongly mixing \( T \): Not only do time averages converge, but \( \mu(T^{-n}A \cap B) \to \mu(A)\mu(B) \) for all measurable \( A, B \).
F.4: Furstenberg Correspondence — Summary Table
| Combinatorial Property of \( A \subseteq \mathbb{Z} \) | Dynamical Equivalent in \( (X, \mu, T, B) \) |
|---|---|
| \( d^*(A) > 0 \) | \( \mu(B) > 0 \) |
| \( A \) contains \( \{a, a+n\} \) for some \( n \geq 1 \) | \( \mu(B \cap T^{-n}B) > 0 \) for some \( n \) (Poincaré) |
| \( A \) contains 3-term AP \( \{a, a+n, a+2n\} \) | \( \mu(B \cap T^{-n}B \cap T^{-2n}B) > 0 \) |
| \( A \) contains \( k \)-term AP | \( \mu(B \cap T^{-n}B \cap \cdots \cap T^{-kn}B) > 0 \) |
| \( A \) contains \( \{a, a+n^2\} \) (Sárközy) | \( \mu(B \cap T^{-n^2}B) > 0 \) (polynomial recurrence) |
| \( A \) contains all finite configurations (Furstenberg–Katznelson) | Multiple recurrence for \( \mathbb{Z}^d \)-systems |
Appendix G: Glossary of Key Terms
Aperiodic (transformation): A pmp transformation \( T \) such that \( \mu(\{x : T^k x = x\}) = 0 \) for all \( k \geq 1 \). Equivalently, almost every orbit is infinite.
Birkhoff ergodic theorem: For a pmp transformation \( T \) and \( f \in L^1(\mu) \), the time averages \( \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) \) converge a.e. to the conditional expectation \( \mathbb{E}[f|\mathcal{I}] \), where \( \mathcal{I} \) is the invariant \( \sigma \)-algebra.
Central set: A subset \( A \subseteq \mathbb{N} \) that belongs to some minimal idempotent ultrafilter in \( \beta\mathbb{N} \). Every central set is an IP set; central sets are piecewise syndetic.
Complexity function: For a subshift \( X \subseteq A^{\mathbb{Z}} \), the function \( p(n) = |\mathcal{L}_n(X)| \) counting distinct words of length \( n \) in \( X \). The entropy is \( \lim \frac{1}{n}\log p(n) \).
Distal (flow/pair): A flow \( (X, G) \) is distal if no two distinct points are proximal: \( \inf_g d(gx, gy) > 0 \) for all \( x \neq y \). The flow is distal iff every element of the Ellis semigroup \( E(X) \) is a bijection.
Ellis semigroup: For a flow \( (X, G) \), the closure \( E(X) \) of \( \{x \mapsto gx : g \in G\} \) in \( X^X \) (product topology), with the semigroup operation of composition. It is a compact right-topological semigroup.
Entropy (topological): \( h_{\mathrm{top}}(T) = \sup_{\mathcal{U}} \lim_{n\to\infty} \frac{1}{n}\log N(\mathcal{U}^n) \), where \( N(\mathcal{U}^n) \) is the minimum cover size of the \( n \)-fold join.
Entropy (metric/KS): \( h_\mu(T) = \sup_\xi \lim_{n\to\infty} \frac{1}{n} H(\bigvee_{k=0}^{n-1} T^{-k}\xi) \), where the sup is over finite partitions and \( H(\xi) = -\sum \mu(A_i)\log\mu(A_i) \) is Shannon entropy.
Equicontinuous (flow): A flow \( (X, G) \) where the family \( \{x \mapsto gx : g \in G\} \) is equicontinuous — for every \( \varepsilon > 0 \) there is \( \delta > 0 \) such that \( d(x, y) < \delta \Rightarrow d(gx, gy) < \varepsilon \) for all \( g \in G \).
Ergodic (transformation): A pmp transformation \( T : (X, \mu) \to (X, \mu) \) such that every \( T \)-invariant measurable set has measure 0 or 1. Equivalently, the only \( T \)-invariant \( L^2 \) functions are constants (spectral characterization).
Furstenberg correspondence principle: A dictionary between density results in additive combinatorics and recurrence results in ergodic theory. Given \( A \subseteq \mathbb{Z} \) with \( d^*(A) > 0 \), there exists a pmp system \( (X, \mu, T) \) and \( B \) with \( \mu(B) \geq d^*(A) \) such that density of combinatorial configurations in \( A \) is lower-bounded by measure of corresponding dynamical configurations in \( B \).
IP set: A set containing the set of all finite sums \( \mathrm{FS}(x_1, x_2, \ldots) = \{\sum_{n \in F} x_n : F \text{ finite}\} \) for some sequence \( (x_n) \). An IP\(^*\) set intersects every IP set.
Joining: For two pmp systems \( (X, \mu, T) \) and \( (Y, \nu, S) \), a \( (T \times S) \)-invariant measure on \( X \times Y \) with the correct marginals. The independent joining is \( \mu \times \nu \).
Koopman operator: \( U_T : L^2(X, \mu) \to L^2(X, \mu) \), \( U_T f = f \circ T \). Unitary when \( T \) is measure-preserving. Encodes all spectral information about \( T \).
Minimal (flow): A \( G \)-flow with no proper closed invariant subsets; equivalently, every orbit is dense.
Nilsystem: A measure-preserving system of the form \( (G/\Gamma, m, T_g) \) where \( G \) is a nilpotent Lie group, \( \Gamma \leq G \) is a cocompact lattice, \( m \) is the Haar measure on \( G/\Gamma \), and \( T_g \) is left rotation by \( g \in G \). The characteristic factor for \( k \)-term multiple ergodic averages is a \( (k-1) \)-step nilsystem.
Proximal (flow/pair): A pair \( (x, y) \) in a flow is proximal if \( \inf_g d(gx, gy) = 0 \). A flow is proximal if every pair is proximal.
Strongly mixing: \( \mu(T^{-n}A \cap B) \to \mu(A)\mu(B) \) for all measurable \( A, B \). Equivalent to \( U_T^n \to 0 \) weakly on \( L^2_0 \).
Syndetic: A set \( A \subseteq \mathbb{Z} \) with bounded gaps: there is a finite set \( K \) with \( A \cap (n + K) \neq \emptyset \) for all \( n \).
Unique ergodicity: A topological system \( (X, T) \) with exactly one \( T \)-invariant Borel probability measure. Equivalent (by Oxtoby) to uniform convergence of Birkhoff averages for all continuous functions.
\[ h_{\mathrm{top}}(T) = \sup\{ h_\mu(T) : \mu \text{ is } T\text{-invariant Borel probability}\}. \]Weakly mixing: \( \frac{1}{N}\sum_{n=0}^{N-1}|\mu(T^{-n}A \cap B) - \mu(A)\mu(B)| \to 0 \) for all measurable \( A, B \). Equivalent to \( U_T \) having no eigenvalues on \( L^2_0 \) (pure continuous spectrum).
Appendix H: Extended Proof Gallery
This appendix collects full detailed proofs of selected results that are central to the course but whose proofs were only sketched in the main text. The goal is to give the reader a complete argument to study, illustrate proof techniques, and provide a model for how these theorems fit together.
H.1: Full Proof of the Mean Ergodic Theorem
We give a self-contained proof of von Neumann’s mean ergodic theorem, including the key spectral decomposition step.
Step 1: The decomposition is orthogonal. We claim \( \mathrm{Fix}(U) \perp \mathrm{ran}(I - U) \). If \( Uf = f \) and \( g \in H \), then \( \langle f, g - Ug \rangle = \langle f, g \rangle - \langle f, Ug \rangle = \langle f, g \rangle - \langle U^*f, g \rangle = \langle f, g \rangle - \langle U^{-1}f, g \rangle = \langle f, g \rangle - \langle f, g \rangle = 0 \), using \( U^* = U^{-1} \) (unitarity) and \( U^{-1}f = f \). So \( \mathrm{Fix}(U) \perp \mathrm{ran}(I-U) \), and by taking closures, \( \mathrm{Fix}(U) \perp \overline{\mathrm{ran}(I-U)} \).
Step 2: The decomposition is complete. We need \( H = \mathrm{Fix}(U) \oplus \overline{\mathrm{ran}(I-U)} \). By the orthogonal decomposition theorem, it suffices to show \( \mathrm{Fix}(U)^\perp = \overline{\mathrm{ran}(I-U)} \). The inclusion \( \overline{\mathrm{ran}(I-U)} \subseteq \mathrm{Fix}(U)^\perp \) follows from Step 1. For the other direction: if \( f \perp \mathrm{ran}(I-U) \), then \( 0 = \langle f, g - Ug \rangle = \langle f, g \rangle - \langle f, Ug \rangle = \langle f, g \rangle - \langle U^{-1}f, g \rangle \) for all \( g \in H \), giving \( U^{-1}f = f \), i.e., \( Uf = f \in \mathrm{Fix}(U) \).
Step 3: Averages converge on each summand.
- If \( f \in \mathrm{Fix}(U) \): \( \frac{1}{N}\sum_{n=0}^{N-1}U^n f = \frac{1}{N}\sum_{n=0}^{N-1} f = f = Pf \). Convergence is trivial.
- If \( f = g - Ug \in \mathrm{ran}(I - U) \): \( \sum_{n=0}^{N-1} U^n f = \sum_{n=0}^{N-1}(U^n g - U^{n+1}g) = g - U^N g \). So \( \left\|\frac{1}{N}\sum_{n=0}^{N-1}U^n f\right\| = \frac{1}{N}\|g - U^N g\| \leq \frac{2\|g\|}{N} \to 0 \). So \( \frac{1}{N}\sum_{n=0}^{N-1}U^n f \to 0 = Pf \) (since \( f \perp \mathrm{Fix}(U) \) means \( Pf = 0 \)).
- For \( f \in \overline{\mathrm{ran}(I-U)} \): Approximate by \( f_k \in \mathrm{ran}(I-U) \) with \( \|f - f_k\| < \varepsilon \). Then \( \left\|\frac{1}{N}\sum U^n f - 0\right\| \leq \left\|\frac{1}{N}\sum U^n(f - f_k)\right\| + \left\|\frac{1}{N}\sum U^n f_k\right\| \leq \|f - f_k\| + \frac{2\|g_k\|}{N} < \varepsilon + \frac{2\|g_k\|}{N} \). For large \( N \), this is \( < 2\varepsilon \). Since \( \varepsilon \) is arbitrary, \( \frac{1}{N}\sum U^n f \to 0 = Pf \).
Step 4: General \( f \). Decompose \( f = f_1 + f_2 \) with \( f_1 \in \mathrm{Fix}(U) \) and \( f_2 \in \overline{\mathrm{ran}(I-U)} \). Then \( \frac{1}{N}\sum U^n f = \frac{1}{N}\sum U^n f_1 + \frac{1}{N}\sum U^n f_2 \to f_1 + 0 = Pf \). \( \square \)
H.2: Full Proof of Poincaré Recurrence with Quantitative Bound
Now compute:
\[ \frac{1}{N}\sum_{n=0}^{N-1}\mu(B \cap T^{-n}B) = \frac{1}{N}\sum_{n=0}^{N-1} \langle U_T^n f, f \rangle_{L^2} = \langle A_N f, f \rangle_{L^2} \to \langle Pf, f \rangle_{L^2} = \|Pf\|_{L^2}^2. \](The limit uses \( A_N f \to Pf \) in \( L^2 \) and \( f \in L^2 \), so the inner product converges.) Since \( P \) is an orthogonal projection, \( \|Pf\|^2 \geq |\langle Pf, \mathbf{1}\rangle|^2 / \|\mathbf{1}\|^2 = |\int f \, d\mu|^2 = \mu(B)^2 > 0 \) (by Cauchy–Schwarz applied to \( \mathbf{1} \in \mathrm{Fix}(U_T) \), which is in the range of \( P \)). So
\[ \lim_{N \to \infty} \frac{1}{N}\sum_{n=0}^{N-1}\mu(B \cap T^{-n}B) = \|Pf\|^2 \geq \mu(B)^2 > 0. \]This means the Cesàro average of \( \mu(B \cap T^{-n}B) \) converges to a positive number, so \( \mu(B \cap T^{-n}B) > 0 \) for infinitely many \( n \). \( \square \)
H.3: Proof that Irrational Rotation is Not Mixing
which has absolute value 1 for all \( n \). For weak mixing, we would need
\[ \frac{1}{N}\sum_{n=0}^{N-1}|\langle U_T^n e_1, e_1 \rangle| = 1 \to 0, \]which is impossible. Therefore \( T \) is not weakly mixing.
Concrete measure formulation. Take \( A = [0, 1/3) \) and \( B = [0, 1/3) \). For weak mixing we need \( \frac{1}{N}\sum_{n=0}^{N-1}|\lambda([0,1/3) \cap T^{-n}[0,1/3)) - 1/9| \to 0 \). However, when \( n\alpha \pmod{1} \) is very small (which happens for infinitely many \( n \) since the orbit of 0 equidistributes), \( T^{-n}[0,1/3) = [-n\alpha, 1/3 - n\alpha) \approx [0, 1/3) \), so \( \lambda(A \cap T^{-n}A) \approx 1/3 \gg (1/3)^2 = 1/9 \). The oscillation prevents Cesàro convergence to 0. \( \square \)
H.4: Derivation of the Doubling Map’s Spectral Properties
Step 1: \( U_T e_k = e_{2k} \). Compute: \( U_T e_k(x) = e_k(T(x)) = e^{2\pi i k(2x)} = e^{2\pi i(2k)x} = e_{2k}(x) \).
Step 2: No \( L^2_0 \) eigenfunctions. Suppose \( U_T f = \lambda f \) for some \( f = \sum_{k \neq 0} \hat{f}(k) e_k \in L^2_0 \) and \( \lambda \neq 0 \). Then \( U_T f = \sum_k \hat{f}(k) e_{2k} \), so matching Fourier coefficients: \( \lambda \hat{f}(k) = \hat{f}(k/2) \) (where \( \hat{f}(k/2) = 0 \) if \( k \) is odd). For even \( k = 2m \): \( \lambda \hat{f}(2m) = \hat{f}(m) \), i.e., \( \hat{f}(m) = \lambda \hat{f}(2m) = \lambda^2 \hat{f}(4m) = \cdots = \lambda^j \hat{f}(2^j m) \). Since \( f \in L^2 \), \( \sum_k |\hat{f}(k)|^2 < \infty \). If \( |\lambda| < 1 \), then \( |\hat{f}(m)| = |\lambda|^j |\hat{f}(2^j m)| \leq |\lambda|^j \|f\|_2 \to 0 \), so \( \hat{f}(m) = 0 \) for all \( m \neq 0 \). If \( |\lambda| > 1 \), a similar argument (going the other direction) forces \( \hat{f} = 0 \). If \( |\lambda| = 1 \), then \( |\hat{f}(m)| = |\hat{f}(2^j m)| \) for all \( j \), which contradicts \( \sum |\hat{f}(k)|^2 < \infty \) unless \( \hat{f}(m) = 0 \). In all cases \( f = 0 \), so there are no \( L^2_0 \) eigenfunctions.
Step 3: \( T \) is weakly mixing. Since \( U_T|_{L^2_0} \) has no eigenvalues, \( T \) is weakly mixing (by the spectral characterization).
\[ \langle U_T^n e_k, e_j \rangle = \langle e_{2^n k}, e_j \rangle = \begin{cases} 1 & \text{if } 2^n k = j \\ 0 & \text{otherwise}\end{cases}. \]For \( k \neq 0 \) and any fixed \( j \), \( 2^n k = j \) can hold for at most one value of \( n \) (since \( 2^n k \) is injective in \( n \)). So \( \langle U_T^n e_k, e_j \rangle = 0 \) for all but at most one \( n \), hence \( \langle U_T^n e_k, e_j \rangle \to 0 \) as \( n \to \infty \). By linearity and density of trigonometric polynomials, \( \langle U_T^n f, g \rangle \to 0 \) for all \( f, g \in L^2_0 \), confirming strong mixing.
| Irrational Rotation | Doubling Map | Bernoulli Shift | |
|---|---|---|---|
| Minimal | Yes | No | No (not even defined topologically as such) |
| Uniquely ergodic | Yes | No (many inv. measures) | Yes (uniform Bernoulli) |
| Ergodic | Yes | Yes | Yes |
| Weakly mixing | No | Yes | Yes |
| Strongly mixing | No | Yes | Yes |
| Entropy | 0 | \( \log 2 \) | \( \log 2 \) |
| Spectral type | Pure point | Lebesgue | Lebesgue (mult. \( \infty \)) |
| Isomorphism class | Rotation on \( \mathbb{T} \) | Lebesgue class | Bernoulli |
The irrational rotation and doubling map have the same entropy (0 vs. \( \log 2 \)) but radically different spectral properties. The doubling map and the Bernoulli shift have the same topological entropy and the same spectral type, yet the Bernoulli shift is a richer system (the doubling map is a factor of the Bernoulli shift, not the other way around). Ornstein’s theorem implies that any ergodic system with entropy \( \log 2 \) and the Lebesgue spectral type is isomorphic to the Bernoulli shift — the doubling map satisfies this and hence is, measure-theoretically, isomorphic to the \( (1/2,1/2) \)-Bernoulli shift.
Appendix I: Quick Reference — Theorems and Their Dependencies
The following diagram outlines the logical dependencies among the main theorems of the course. Understanding these dependencies is essential for seeing the overall structure of the subject.
Topological branch:
- Tychonoff’s theorem (compactness of products) → Existence of minimal subflows (Zorn’s lemma + compactness) → Birkhoff recurrence theorem → Auslander–Ellis theorem (proximal pairs have common proximal points)
- Ellis semigroup (closure in \( X^X \)) → Ellis–Namiooka–Ruppert (idempotents in compact right-topological semigroups) → Idempotents in \( \beta G \) → Hindman’s theorem (idempotent ultrafilter proof) → Central sets theorem
- Furstenberg structure theorem for distal flows (transfinite tower of equicontinuous extensions) → Universal minimal distal flow → Bohr compactification = universal minimal equicontinuous \( \mathbb{Z} \)-system
Measure-theoretic branch:
- Lebesgue integration theory (monotone convergence, dominated convergence, Radon–Nikodym) → Definition of pmp transformations and \( L^2 \) → Koopman operator (unitary on \( L^2 \)) → Mean ergodic theorem (von Neumann, 1932) → Birkhoff ergodic theorem (1931) — requires maximal ergodic lemma
- Spectral theory of unitary operators → Halmos–von Neumann theorem (pure point spectrum \( \iff \) rotation on compact abelian group)
- Shannon information theory → Kolmogorov–Sinai entropy (1958) → Entropy of Bernoulli shifts → Ornstein isomorphism theorem (entropy is complete invariant for Bernoulli shifts)
Connecting branch (Furstenberg–Zimmer):
- Birkhoff ergodic theorem + Koopman operator theory → Poincaré recurrence (quantitative form) → Furstenberg–Zimmer structure theorem (compact and weakly mixing extensions) → Multiple recurrence theorem (Furstenberg, 1977)
- Furstenberg correspondence principle (encoding \( A \subseteq \mathbb{Z} \) as a pmp system) → Szemerédi’s theorem (positive density \( \Rightarrow \) arithmetic progressions)
- Host–Kra theory (nilfactors as characteristic factors) → Convergence of multiple ergodic averages
- Green–Tao (primes contain arithmetic progressions of every length, 2004)