PMATH 457: Topological Dynamics and Ergodic Theory

Estimated study time: 1 hr 20 min

Sources and References

Primary textbooks — N. Hindman & D. Strauss, Algebra in the Stone-Čech Compactification, 2nd ed., de Gruyter, 2012; J. Auslander, Minimal Flows and Their Extensions, North-Holland, 1988; H. Furstenberg, Recurrence in Ergodic Theory and Combinatorial Number Theory, Princeton University Press, 1981.
Supplementary texts — W. Parry, Topics in Ergodic Theory, Cambridge University Press, 1981; P. Walters, An Introduction to Ergodic Theory, Springer, 1982; T. Downarowicz, Entropy in Dynamical Systems, Cambridge University Press, 2011; D. Rudolph, Fundamentals of Measurable Dynamics, Oxford University Press, 1990; E. Glasner, Ergodic Theory via Joinings, AMS, 2003.
Online resources — T. Tao’s blog posts on ergodic theory and the Furstenberg correspondence principle; MIT OpenCourseWare 18.125 (Real Analysis); Cambridge Part III lecture notes on ergodic theory; Stanford lecture notes on topological dynamics; notes by Kra, Host, and Frantzikinakis on ergodic theory and combinatorics.


Chapter 1: Compact Topological Dynamical Systems

Section 1.1: Basic Definitions and Examples

The central objects of topological dynamics are flows — continuous actions of groups on compact Hausdorff spaces. This chapter develops the foundational vocabulary and establishes the first structural theorems.

Definition (Topological Dynamical System / Flow): A topological dynamical system (or flow) is a triple \( (X, G, \pi) \) where \( X \) is a compact Hausdorff space, \( G \) is a topological group, and \( \pi : G \times X \to X \) is a continuous action — that is, \( \pi \) is continuous, \( \pi(e, x) = x \) for all \( x \in X \) (where \( e \) is the identity of \( G \)), and \( \pi(gh, x) = \pi(g, \pi(h, x)) \) for all \( g, h \in G \), \( x \in X \).

When \( G = \mathbb{Z} \) or \( G = \mathbb{Z}_{\geq 0} \), the system is generated by a single homeomorphism (resp. continuous self-map) \( T = \pi(1, \cdot) : X \to X \), and we write the system as \( (X, T) \). When \( G = \mathbb{R} \) the system is called a continuous flow.

We will frequently drop \( \pi \) from the notation, writing \( g \cdot x \) or \( gx \) for \( \pi(g, x) \). For a fixed \( g \in G \), the map \( x \mapsto gx \) is a homeomorphism of \( X \) (when \( G \) is a group); for a fixed \( x \in X \), the map \( g \mapsto gx \) is continuous and called the orbit map at \( x \).

Definition (Orbit): The orbit of a point \( x \in X \) under a \( G \)-flow is the set \( \mathcal{O}(x) = \{ gx : g \in G \} \subseteq X \). The orbit closure of \( x \) is \( \overline{\mathcal{O}(x)} \).
Example (Irrational Rotation). Let \( X = \mathbb{T} = \mathbb{R}/\mathbb{Z} \) (the circle), \( G = \mathbb{Z} \), and \( T : \mathbb{T} \to \mathbb{T} \) defined by \( T(x) = x + \alpha \pmod{1} \) for a fixed \( \alpha \in \mathbb{R} \setminus \mathbb{Q} \). This is an irrational rotation. It is a fundamental example: every orbit is dense in \( \mathbb{T} \), and the system is minimal (see below). By contrast, for \( \alpha \in \mathbb{Q} \) every orbit is finite and every point is periodic.
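The density of irrational-rotation orbits is easy to observe numerically. A minimal Python sketch (the choice \( \alpha = \sqrt{2} - 1 \) and the helper names are illustrative, not from the text): by Weyl's equidistribution theorem, visit frequencies of the orbit approximate interval lengths, a quantitative strengthening of density.

```python
import math

def rotation_orbit(alpha, n_steps, x0=0.0):
    """Orbit of x0 under the rotation T(x) = x + alpha (mod 1)."""
    x, orbit = x0, []
    for _ in range(n_steps):
        orbit.append(x)
        x = (x + alpha) % 1.0
    return orbit

def visit_frequency(orbit, a, b):
    """Fraction of orbit points landing in the interval [a, b)."""
    return sum(1 for x in orbit if a <= x < b) / len(orbit)

alpha = math.sqrt(2) - 1          # an irrational rotation number
orbit = rotation_orbit(alpha, 100_000)

# Weyl equidistribution: visit frequencies approximate interval lengths.
for (a, b) in [(0.0, 0.1), (0.3, 0.55), (0.9, 1.0)]:
    print(f"[{a}, {b}): freq = {visit_frequency(orbit, a, b):.4f}, length = {b - a}")
```

Changing \( \alpha \) to another irrational gives the same agreement, while a rational \( \alpha \) concentrates the orbit on finitely many points.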
Example (Shift Spaces). Let \( A \) be a finite discrete alphabet and \( X = A^{\mathbb{Z}} \) equipped with the product (Tychonoff) topology, which is compact by Tychonoff's theorem. The shift map \( \sigma : A^{\mathbb{Z}} \to A^{\mathbb{Z}} \) is defined by \( (\sigma x)_n = x_{n+1} \). Then \( (A^{\mathbb{Z}}, \sigma) \) is a topological dynamical system called the full shift on \( A \). A subshift is a closed \( \sigma \)-invariant subset \( X \subseteq A^{\mathbb{Z}} \), yielding a subsystem.
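Subshifts can be explored concretely on finite windows. A hedged sketch for the golden-mean subshift (the subshift of \( \{0,1\}^{\mathbb{Z}} \) whose points avoid the word 11; the function names are ours): the language is closed under the shift, and counting admissible words of each length recovers the Fibonacci recursion.

```python
from itertools import product

def shift(word):
    """One step of the shift map, viewed on a finite window of a 0/1 sequence."""
    return word[1:]

def admissible(word):
    """A word is in the language of the golden-mean subshift iff it avoids '11'."""
    return "11" not in word

def count_admissible(n):
    """Number of admissible words of length n."""
    return sum(1 for bits in product("01", repeat=n) if admissible("".join(bits)))

# The language is shift-invariant: subwords of admissible words are admissible.
assert admissible("0101001") and admissible(shift("0101001"))

# Word counts satisfy the Fibonacci recursion c(n) = c(n-1) + c(n-2).
counts = [count_admissible(n) for n in range(1, 10)]
print(counts)   # [2, 3, 5, 8, 13, 21, 34, 55, 89]
```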
Example (Toral Automorphisms). Let \( X = \mathbb{T}^n = \mathbb{R}^n / \mathbb{Z}^n \) and \( A \in GL(n, \mathbb{Z}) \) (an integer matrix with determinant \( \pm 1 \)). Then \( A \) induces a homeomorphism of \( \mathbb{T}^n \). For \( n = 2 \) and the matrix \( A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \) (the Arnold cat map), the resulting system is hyperbolic and mixing — an important prototype in the theory.
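Because the cat map is given by an integer matrix, points with rational coordinates of a fixed denominator \( q \) are permuted among themselves, so every such point is periodic. A small exact-arithmetic sketch (the helper names are ours):

```python
from fractions import Fraction

def cat_map(p):
    """Arnold cat map (x, y) -> (2x + y, x + y) mod 1, computed exactly on rationals."""
    x, y = p
    return ((2 * x + y) % 1, (x + y) % 1)

def period(p, max_iter=10_000):
    """Least n >= 1 with T^n(p) = p; rational points are always periodic."""
    q, n = cat_map(p), 1
    while q != p:
        q = cat_map(q)
        n += 1
        if n > max_iter:
            raise RuntimeError("no period found within max_iter")
    return n

p = (Fraction(1, 5), Fraction(2, 5))
# The period of a point with denominator q divides the order of A modulo q.
print(period(p))
```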

Section 1.2: Invariant Sets and Minimality

Definition (Invariant Set): A subset \( Y \subseteq X \) of a \( G \)-flow is:
  • positively invariant if \( gY \subseteq Y \) for all \( g \in G \) (the natural notion when \( G \) is only a semigroup);
  • invariant if \( gY = Y \) for all \( g \in G \).
A subflow is a closed invariant subset \( Y \subseteq X \) together with the restricted action. A flow is minimal if it has no proper closed invariant subsets — equivalently, every orbit is dense.
Theorem (Existence of Minimal Subflows): Every compact \( G \)-flow contains a minimal subflow.
Proof. The collection of all closed invariant nonempty subsets of \( X \), ordered by inclusion, satisfies the hypotheses of Zorn's lemma: any chain has a nonempty intersection (by compactness, a decreasing chain of nonempty compact sets has nonempty intersection, and the intersection is clearly closed and invariant). A minimal element of this poset is a minimal subflow.
Remark. This existence theorem is purely topological — no algebraic structure on \( G \) beyond acting continuously is needed. It is one of the first places where compactness of \( X \) plays an essential role.
Definition (Equicontinuity): A \( G \)-flow \( (X, G) \) with \( X \) metrizable (with compatible metric \( d \)) is equicontinuous if the family of maps \( \{ x \mapsto gx : g \in G \} \) is equicontinuous, i.e., for every \( \varepsilon > 0 \) there exists \( \delta > 0 \) such that \( d(x, y) < \delta \) implies \( d(gx, gy) < \varepsilon \) for all \( g \in G \).

For metrizable \( X \), equicontinuous minimal flows are exactly the isometric flows — those that preserve a compatible metric. They are closely related to compact group rotations, as we shall see in the chapter on Bohr compactification.

Section 1.3: Proximality and Distality

Two points \( x, y \in X \) capture qualitatively different behaviors under the action.

Definition (Proximal and Distal Points): In a compact \( G \)-flow \( (X, G) \), two points \( x, y \in X \) are proximal (written \( x \sim_P y \)) if there exist a net \( (g_\alpha) \) in \( G \) and a point \( z \in X \) with \( \lim_\alpha g_\alpha x = \lim_\alpha g_\alpha y = z \); for metrizable \( X \) this says \[ \inf_{g \in G} d(gx, gy) = 0. \] They are distal if they are not proximal. The flow itself is called:
  • proximal if every pair \( (x, y) \) is proximal;
  • distal if every pair of distinct points is distal.
Theorem (Auslander–Ellis): In a compact \( G \)-flow, every point \( x \in X \) is proximal to a uniformly recurrent point in its orbit closure: there exists a point \( z \in \overline{\mathcal{O}(x)} \), lying in a minimal subflow, such that \( (x, z) \) is proximal.
Remark. The notion of distality goes back to Hilbert; distal flows were studied systematically by Auslander, Ellis, and Furstenberg. The Furstenberg structure theorem (Chapter 5) characterizes minimal distal flows as those built from equicontinuous extensions in a transfinite tower.

Section 1.4: Morphisms and Extensions

Definition (Homomorphism of Flows): A homomorphism (or factor map) between \( G \)-flows \( (X, G) \) and \( (Y, G) \) is a continuous surjective map \( \phi : X \to Y \) that intertwines the actions: \( \phi(gx) = g\phi(x) \) for all \( g \in G \), \( x \in X \). We call \( (Y, G) \) a factor of \( (X, G) \) and \( (X, G) \) an extension of \( (Y, G) \). An isomorphism is a bijective homomorphism (which is automatically a homeomorphism since \( X \) is compact Hausdorff).
Definition (Equicontinuous Extension): A factor map \( \phi : X \to Y \) is an equicontinuous extension if for every \( \varepsilon > 0 \) there exists \( \delta > 0 \) such that whenever \( \phi(x_1) = \phi(x_2) \) (the points lie in the same fiber) and \( d_X(x_1, x_2) < \delta \), then \( d_X(gx_1, gx_2) < \varepsilon \) for all \( g \in G \).

Section 1.5: Recurrence

Definition (Recurrent Point): A point \( x \in X \) in a \( G \)-flow is recurrent if it returns arbitrarily close to itself under the action. More precisely (for \( G = \mathbb{Z} \)), \( x \) is recurrent if for every neighborhood \( U \ni x \) there exist infinitely many \( n \in \mathbb{Z} \) with \( T^n x \in U \). (Note that a periodic point is recurrent; the formulation \( x \in \overline{\mathcal{O}(x) \setminus \{x\}} \) is equivalent only for non-periodic points.)
Theorem (Birkhoff Recurrence): Every compact \( G \)-flow contains a recurrent point.
Proof. By the existence of minimal subflows, take any minimal subflow \( M \subseteq X \) and any \( x \in M \). Every point of a minimal flow is recurrent (indeed uniformly recurrent: for every neighborhood \( U \ni x \), the return-time set \( \{ g \in G : gx \in U \} \) is syndetic). For \( G = \mathbb{Z} \): if the orbit of \( x \) is finite, then \( x \) is periodic, hence recurrent; otherwise the dense orbit accumulates at \( x \).

Chapter 2: Ultrafilters, Čech-Stone Compactification, and the Greatest Ambit

Section 2.1: Ultrafilters

Ultrafilters are the key algebraic device for studying Stone-Čech compactifications and, ultimately, Ramsey-type combinatorial results.

Definition (Filter and Ultrafilter): A filter on a set \( S \) is a collection \( \mathcal{F} \subseteq \mathcal{P}(S) \) such that:
  1. \( \emptyset \notin \mathcal{F} \) and \( S \in \mathcal{F} \);
  2. If \( A \in \mathcal{F} \) and \( A \subseteq B \), then \( B \in \mathcal{F} \);
  3. If \( A, B \in \mathcal{F} \), then \( A \cap B \in \mathcal{F} \).
An ultrafilter is a maximal filter — equivalently, a filter \( \mathcal{U} \) such that for every \( A \subseteq S \), either \( A \in \mathcal{U} \) or \( S \setminus A \in \mathcal{U} \).

A principal ultrafilter is \( \mathcal{U}_x = \{ A \subseteq S : x \in A \} \) for some fixed \( x \in S \). All other ultrafilters are non-principal (or free); their existence requires the axiom of choice (specifically, the ultrafilter lemma, which is weaker than full AC but not provable in ZF alone).

Theorem (Existence of Non-Principal Ultrafilters): For any infinite set \( S \), there exist non-principal ultrafilters on \( S \).
Proof. The Fréchet filter \( \mathcal{F}_0 = \{ A \subseteq S : S \setminus A \text{ is finite} \} \) is a filter not containing any finite sets. By Zorn's lemma, extend \( \mathcal{F}_0 \) to a maximal filter, which is an ultrafilter. Since it contains no finite sets, it is non-principal.

Section 2.2: The Stone-Čech Compactification

Definition (Stone-Čech Compactification): The Stone-Čech compactification of a discrete space \( S \) is a compact Hausdorff space \( \beta S \) together with an injection \( \iota : S \hookrightarrow \beta S \) such that:
  • \( \iota(S) \) is dense in \( \beta S \);
  • Every function \( f : S \to [0, 1] \) extends uniquely to a continuous function \( \bar{f} : \beta S \to [0, 1] \).
Construction. Concretely, \( \beta S \) may be realized as the set of all ultrafilters on \( S \), with \( \iota(s) = \mathcal{U}_s \) the principal ultrafilter at \( s \), and with the topology generated by the basic open sets
\[ \overline{A} = \{ \mathcal{U} \in \beta S : A \in \mathcal{U} \} \]

for \( A \subseteq S \). These sets are also closed (indeed \( \overline{S \setminus A} = \beta S \setminus \overline{A} \)), hence clopen, making \( \beta S \) a totally disconnected compact Hausdorff space (a Stone space).

For \( f : S \to [0, 1] \), the extension \( \bar{f} : \beta S \to [0, 1] \) is given by \( \bar{f}(\mathcal{U}) = \lim_{\mathcal{U}} f \), the \( \mathcal{U} \)-limit of \( f \): the unique real number \( r \) such that \( f^{-1}(r - \varepsilon, r + \varepsilon) \in \mathcal{U} \) for all \( \varepsilon > 0 \).

Theorem (Universal Property of \(\beta S\)): For any compact Hausdorff space \( K \) and any function \( f : S \to K \), there is a unique continuous extension \( \bar{f} : \beta S \to K \).

Section 2.3: The Greatest Ambit

When \( G \) is a discrete group (or, more generally, a topological group), the Stone-Čech compactification \( \beta G \) carries a natural structure as a \( G \)-flow.

Definition (G-Ambit): A \( G \)-ambit is a compact \( G \)-flow \( (X, G) \) together with a distinguished point \( x_0 \in X \) whose orbit \( Gx_0 \) is dense in \( X \).
Definition (Greatest Ambit): The greatest ambit (or Samuel compactification) of \( G \) is a \( G \)-ambit \( (S(G), G, e_G) \) with the property that for every \( G \)-ambit \( (X, G, x_0) \), there is a unique \( G \)-equivariant continuous map \( \phi : S(G) \to X \) with \( \phi(e_G) = x_0 \). It is unique up to isomorphism.

Construction for discrete \( G \). For a discrete group \( G \), the greatest ambit is simply \( \beta G \) with \( G \) acting by left multiplication: \( g \cdot \mathcal{U} = \{ gA : A \in \mathcal{U} \} \). The distinguished point is the principal ultrafilter \( \mathcal{U}_e \) at the identity, and \( G \cdot \mathcal{U}_e = \{ \mathcal{U}_g : g \in G \} = G \), which is dense in \( \beta G \).

Remark. For a general locally compact group \( G \), the greatest ambit is more subtle: it is the Samuel compactification of \( G \) with respect to its right uniformity. The left action of \( G \) on itself by multiplication extends to a continuous action on \( S(G) \).

Section 2.4: Universal Minimal Flow

Definition (Universal Minimal Flow): A minimal \( G \)-flow \( M(G) \) is called the universal minimal flow of \( G \) if every minimal \( G \)-flow is a factor of \( M(G) \). It exists and is unique up to isomorphism (for any topological group \( G \)).

Construction. The universal minimal flow \( M(G) \) is the unique minimal subflow of the greatest ambit \( S(G) \). Its existence follows from the general existence of minimal subflows (Theorem 1.1). Its uniqueness and universality follow from the universal property of \( S(G) \): every minimal flow \( Y \) is an ambit (take any point \( y \in Y \) — its orbit is dense by minimality), so there is a factor map \( S(G) \to Y \), whose image is a closed invariant subset of \( Y \), hence all of \( Y \).

Example. For \( G = \mathbb{Z} \), the universal minimal flow \( M(\mathbb{Z}) \) may be realized as any minimal left ideal of \( (\beta\mathbb{Z}, +) \); it is an enormous, non-metrizable space. (The full remainder \( \beta\mathbb{Z} \setminus \mathbb{Z} \) is closed and invariant under the shift, but it is not minimal.) By contrast, for a compact group \( G \) the universal minimal flow is \( G \) itself acting by left translation, and for groups with strong fixed-point properties (the extremely amenable groups) it is a single point.

Chapter 3: Semigroup Structure on the Greatest Ambit, Idempotent Ultrafilters, and Applications

Section 3.1: The Ellis Semigroup

Definition (Ellis Semigroup): For a compact \( G \)-flow \( (X, G) \), the Ellis semigroup \( E(X) \) is the closure in \( X^X \) (with the product topology) of the set of maps \( \{ x \mapsto gx : g \in G \} \subseteq X^X \). Composition of maps makes \( E(X) \) a compact right-topological semigroup (right multiplication is continuous).

For the greatest ambit \( S(G) = \beta G \) (with \( G \) discrete), the Ellis semigroup of \( \beta G \) acting on itself is \( \beta G \) itself, with the semigroup operation described below.

Section 3.2: The Semigroup Operation on \(\beta G\)

For \( G \) a discrete group, \( \beta G \) inherits a semigroup structure extending the group operation on \( G \).

Definition (Semigroup Operation on \(\beta G\)): For \( \mathcal{U}, \mathcal{V} \in \beta G \) and \( G \) a discrete semigroup, define the product \( \mathcal{U} \cdot \mathcal{V} \) by \[ A \in \mathcal{U} \cdot \mathcal{V} \iff \{ g \in G : g^{-1}A \in \mathcal{V} \} \in \mathcal{U}, \] where \( g^{-1}A = \{ h \in G : gh \in A \} \). Equivalently, for any \( f : G \to K \) (compact Hausdorff \( K \)), \[ \lim_{\mathcal{U} \cdot \mathcal{V}} f = \lim_{\mathcal{U}} \left( g \mapsto \lim_{\mathcal{V}} (h \mapsto f(gh)) \right). \]
Theorem (Properties of the Operation): The operation \( (\mathcal{U}, \mathcal{V}) \mapsto \mathcal{U} \cdot \mathcal{V} \) on \( \beta G \) satisfies:
  1. For fixed \( \mathcal{V} \), the map \( \mathcal{U} \mapsto \mathcal{U} \cdot \mathcal{V} \) is continuous (right-topological semigroup).
  2. The map \( \mathcal{U} \mapsto \mathcal{V} \cdot \mathcal{U} \) need NOT be continuous in general.
  3. The restriction to \( G \hookrightarrow \beta G \) (principal ultrafilters) coincides with the group operation on \( G \).
  4. \( (\beta G, \cdot) \) is a compact right-topological semigroup.

Section 3.3: Idempotent Ultrafilters

Definition (Idempotent): An element \( p \in \beta G \) is an idempotent if \( p \cdot p = p \).
Theorem (Ellis–Numakura): Every compact right-topological semigroup contains an idempotent.
Proof. Let \( T \) be the semigroup. By Zorn's lemma (a chain of nonempty closed sub-semigroups has nonempty intersection, by compactness), choose a minimal closed sub-semigroup \( S \subseteq T \). Fix any \( s \in S \); then \( Ss \subseteq S \) is a closed sub-semigroup (closed because right multiplication \( t \mapsto ts \) is continuous and \( S \) is compact), hence \( Ss = S \) by minimality. So there exists \( e \in S \) with \( es = s \). Then \( S' = \{ t \in S : ts = s \} \) is nonempty, closed (the preimage of \( \{s\} \) under the continuous map \( t \mapsto ts \)), and a sub-semigroup, hence \( S' = S \) by minimality; in particular \( s \in S' \), so \( ss = s \) and \( s \) itself is idempotent.

Corollary. \( \beta G \) contains idempotent ultrafilters (non-principal ones, since the only idempotent in \( G \) is the identity \( e \) when \( G \) is a group).
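The finite case provides a toy model of the idempotent theorem: a finite semigroup is trivially compact, and some power of any element is idempotent. A sketch in the transformation monoid of self-maps of \( \{0, \ldots, n-1\} \) (the example map and names are ours):

```python
def compose(f, g):
    """Composition f o g of self-maps of {0, ..., n-1}, encoded as tuples."""
    return tuple(f[g[i]] for i in range(len(g)))

def idempotent_power(f):
    """Return (k, f^k) with f^k idempotent; such k exists because the
    sub-semigroup generated by f in the finite monoid X^X is finite
    (finiteness standing in for compactness)."""
    power, k = f, 1
    while compose(power, power) != power:
        power = compose(power, f)
        k += 1
    return k, power

# A non-invertible map on {0,...,4}: it eventually maps everything into a cycle.
f = (1, 2, 0, 0, 1)
k, e = idempotent_power(f)
print(k, e)
assert compose(e, e) == e   # e is idempotent
```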

Section 3.4: Hindman’s Theorem

The idempotent structure on \( \beta\mathbb{N} \) yields spectacular combinatorial results.

Definition (IP Set and Finite Sum Set): Given a sequence \( (x_n)_{n=1}^\infty \) in \( \mathbb{N} \), the set of finite sums (FS-set) is \[ FS((x_n)_{n=1}^\infty) = \left\{ \sum_{n \in F} x_n : F \subseteq \mathbb{N},\, F \text{ finite and nonempty} \right\}. \] A set \( A \subseteq \mathbb{N} \) is an IP set if it contains \( FS((x_n)) \) for some sequence \( (x_n) \).
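FS-sets are easy to compute for finite initial segments of a sequence. A hedged sketch (the helper names are ours): `finite_sums` enumerates all finite sums, and a brute-force search illustrates the finite flavor of Hindman's theorem for a sample coloring.

```python
from itertools import combinations

def finite_sums(xs):
    """FS((x_n)): sums over all nonempty finite subsets of the (finite) sequence."""
    return {sum(c) for r in range(1, len(xs) + 1) for c in combinations(xs, r)}

# Powers of 2 give finite sums with distinct binary expansions:
print(sorted(finite_sums([1, 2, 4])))   # [1, 2, 3, 4, 5, 6, 7]

def find_fs_triple(coloring, N):
    """Search for x1 < x2 < x3 with FS({x1, x2, x3}) monochromatic, all sums <= N."""
    for x1 in range(1, N + 1):
        for x2 in range(x1 + 1, N + 1):
            for x3 in range(x2 + 1, N + 1):
                fs = finite_sums([x1, x2, x3])
                if max(fs) <= N and len({coloring(n) for n in fs}) == 1:
                    return sorted(fs)
    return None

# Color by parity: the even numbers already contain FS-sets.
print(find_fs_triple(lambda n: n % 2, 20))   # first hit is the even triple (2, 4, 6)
```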
Theorem (Hindman, 1974): For any finite coloring \( \mathbb{N} = C_1 \cup C_2 \cup \cdots \cup C_r \), there exist \( i \in \{1, \ldots, r\} \) and a sequence \( (x_n)_{n=1}^\infty \) such that \( FS((x_n)) \subseteq C_i \). That is, one color class contains an IP set.
Proof (via idempotents). Call \( p \in \beta\mathbb{N} \) an IP-ultrafilter if every \( A \in p \) is an IP set. The key lemma is that every idempotent \( p = p + p \in (\beta\mathbb{N}, +) \) is an IP-ultrafilter. Given a coloring \( \mathbb{N} = \bigsqcup C_i \), exactly one \( C_i \) belongs to \( p \), and hence \( C_i \) contains an IP set.

More precisely: write \( -n + A = \{ m : n + m \in A \} \). Since \( p + p = p \) and \( C_i \in p \), we have \( \{ n : -n + C_i \in p \} \in p \), so there exists \( x_1 \in C_i \) with \( -x_1 + C_i \in p \). Inductively, having chosen \( x_1, \ldots, x_k \), pick \[ x_{k+1} \in C_i \cap \bigcap_{\emptyset \neq F \subseteq \{1, \ldots, k\}} \Big( -\sum_{j \in F} x_j + C_i \Big), \] using that each of these finitely many sets lies in \( p \) (hence so does their intersection, which is therefore nonempty). This builds the required sequence.

Section 3.5: The Central Sets Theorem (Preview)

A set \( C \subseteq \mathbb{N} \) is central if it belongs to some minimal idempotent in \( \beta\mathbb{N} \). Central sets are IP sets, satisfy strong recurrence properties, and are the subject of Chapter 4. The connection between the algebraic structure of \( \beta G \) and combinatorial number theory is one of the most striking features of the theory.


Chapter 4: Minimal Idempotents, Central Sets, and Applications

Section 4.1: Minimal Idempotents and the Ideal Structure of \(\beta G\)

Definition (Ideal): In a semigroup \( (S, \cdot) \), a nonempty subset \( I \subseteq S \) is:
  • a left ideal if \( SI \subseteq I \);
  • a right ideal if \( IS \subseteq I \);
  • a two-sided ideal if it is both.
The smallest two-sided ideal of \( S \), if it exists, is called the kernel of \( S \), denoted \( K(S) \).
Theorem (Structure of Compact Right-Topological Semigroups): Let \( T \) be a compact right-topological semigroup. Then:
  1. \( T \) has a smallest two-sided ideal \( K(T) \);
  2. \( K(T) \) is a union of minimal left ideals and a union of minimal right ideals;
  3. Every minimal left ideal is closed (each has the form \( Tp \), a continuous image of the compact \( T \)); minimal right ideals need not be closed in general;
  4. The intersection of any minimal left ideal with any minimal right ideal is a group;
  5. All the groups arising this way are isomorphic.

Applied to \( \beta G \), the kernel \( K(\beta G) \) is the smallest ideal, and an idempotent \( p \in \beta G \) is minimal if it lies in \( K(\beta G) \) — equivalently, if the left ideal \( \beta G \cdot p \) is minimal.

Definition (Central Set): A set \( A \subseteq G \) is central if \( A \in p \) for some minimal idempotent \( p \in K(\beta G) \).

Section 4.2: Characterization of Central Sets

Central sets can be characterized dynamically. A set \( A \subseteq \mathbb{N} \) is central if and only if it is a member of some idempotent ultrafilter in the smallest ideal of \( (\beta\mathbb{N}, +) \). Furstenberg gave an equivalent dynamical characterization:

Theorem (Furstenberg's Characterization of Central Sets): \( A \subseteq \mathbb{N} \) is central if and only if there exist a compact dynamical system \( (X, T) \), points \( x, y \in X \) with \( y \) uniformly recurrent and \( (x, y) \) proximal, and a neighborhood \( U \ni y \) such that \[ A \supseteq \{ n \geq 1 : T^n x \in U \}. \]
Remark. This equivalence links the algebraic notion (membership in a minimal idempotent) to the dynamical notion (return times of recurrent orbits in minimal systems), bridging the two main themes of the course.

Section 4.3: Central Sets Theorem

Theorem (Central Sets Theorem, Furstenberg–Hindman): Let \( A \subseteq \mathbb{N} \) be a central set. Then for any finite family \( (y_n^{(i)})_{n=1}^\infty \) (\( 1 \leq i \leq r \)) of sequences in \( \mathbb{N} \), there exist a sequence \( (a_k)_{k=1}^\infty \) in \( \mathbb{N} \) and a sequence \( (H_k)_{k=1}^\infty \) of finite nonempty subsets of \( \mathbb{N} \) with \( \max H_k < \min H_{k+1} \) such that for each \( 1 \leq i \leq r \), \[ FS\!\left(\left(a_k + \sum_{n \in H_k} y_n^{(i)}\right)_{k=1}^\infty\right) \subseteq A. \]

Section 4.4: Density and Combinatorial Consequences

Definition (Syndetic, Thick, and Piecewise Syndetic Sets): A set \( A \subseteq \mathbb{Z} \) is:
  • syndetic if it has bounded gaps: \( \exists K \) finite such that \( A \cap (n + K) \neq \emptyset \) for all \( n \in \mathbb{Z} \);
  • thick if it contains arbitrarily long intervals: for each \( N \), \( \exists n \) with \( \{n, n+1, \ldots, n+N\} \subseteq A \);
  • piecewise syndetic if it is the intersection of a syndetic set with a thick set, equivalently if it has bounded gaps on arbitrarily long intervals.
Central sets are piecewise syndetic.
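Syndeticity can be checked empirically on examples. A sketch (the parameters and names are ours): the Bohr set \( \{ n : \| n\varphi \| < 0.1 \} \) for the golden ratio \( \varphi \), a standard example of a syndetic set, exhibits bounded gaps.

```python
import math

def bohr_set(alpha, eps, N):
    """{ n in [1, N] : ||n*alpha|| < eps }, where ||t|| is distance to the nearest
    integer. Such Bohr sets are syndetic."""
    def circle_dist(t):
        return min(t % 1.0, 1.0 - (t % 1.0))
    return [n for n in range(1, N + 1) if circle_dist(n * alpha) < eps]

phi = (1 + math.sqrt(5)) / 2
B = bohr_set(phi, 0.1, 1000)
gaps = [b - a for a, b in zip(B, B[1:])]
print(B[:6], "max gap:", max(gaps))
```

By the three-distance theorem, the gaps between consecutive elements take at most three values, so the bounded-gap behavior visible here persists for all \( N \).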
Theorem (van der Waerden): For any finite coloring \( \mathbb{N} = C_1 \cup \cdots \cup C_r \), one color class contains arithmetic progressions of every length.
Proof (via topological dynamics). One approach is the Furstenberg–Weiss topological proof. Encode the coloring as a point \( x \in \{1, \ldots, r\}^{\mathbb{Z}} \) with \( x(n) = i \) when \( n \in C_i \); the orbit closure \( X = \overline{\{\sigma^n x : n \in \mathbb{Z}\}} \) is a compact \( \mathbb{Z} \)-system, and any minimal subflow \( M \subseteq X \) consists of uniformly recurrent points. Van der Waerden's theorem then follows from the topological multiple recurrence theorem of Furstenberg–Weiss: in a compact system, for every \( k \) and \( \varepsilon > 0 \) there exist a point \( z \) and \( n \geq 1 \) with \( z, T^n z, \ldots, T^{(k-1)n} z \) pairwise within \( \varepsilon \). (Its measure-theoretic analogue, Furstenberg's multiple recurrence theorem, yields Szemerédi's theorem.)
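The finite form of van der Waerden's theorem can be verified exhaustively for small parameters; the value of the van der Waerden number \( W(3; 2) = 9 \) is classical. A brute-force Python sketch (function names ours):

```python
from itertools import product

def has_mono_3ap(coloring):
    """Does a coloring of {1,...,N} (a tuple, index shifted by 1) contain a
    monochromatic 3-term arithmetic progression a, a+d, a+2d?"""
    N = len(coloring)
    for a in range(1, N + 1):
        for d in range(1, (N - a) // 2 + 1):
            if coloring[a - 1] == coloring[a + d - 1] == coloring[a + 2 * d - 1]:
                return True
    return False

# Exhaustive check of W(3; 2) = 9:
assert all(has_mono_3ap(c) for c in product([0, 1], repeat=9))      # every 2-coloring of {1..9}
assert any(not has_mono_3ap(c) for c in product([0, 1], repeat=8))  # some 2-coloring of {1..8} avoids
print("W(3; 2) = 9 verified by brute force")
```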

Chapter 5: Furstenberg’s Theorem, Bohr Compactification, and the Universal Minimal Distal Flow

Section 5.1: Furstenberg’s Structure Theorem for Distal Flows

One of the deepest results in topological dynamics is Furstenberg’s structure theorem, which characterizes distal flows.

Theorem (Furstenberg Structure Theorem for Minimal Distal Flows): Every minimal distal compact \( G \)-flow \( (X, G) \) can be obtained from the trivial one-point flow by a (possibly transfinite) tower of equicontinuous extensions. More precisely, there exists an ordinal \( \eta \) and a tower of minimal flows \[ \{*\} = X_0 \leftarrow X_1 \leftarrow \cdots \leftarrow X_\alpha \leftarrow \cdots \leftarrow X_\eta = X \] where each \( X_{\alpha+1} \to X_\alpha \) is an equicontinuous (isometric) extension, and for limit ordinals \( \lambda \), \( X_\lambda = \varprojlim_{\alpha < \lambda} X_\alpha \).
Remark. The theorem says distal flows are "built from rotations." The transfinite tower accounts for iterated equicontinuous extensions, which can have countably or even uncountably many steps. Compare this with the analogous structure theorem in ergodic theory (Furstenberg–Zimmer), where distal measure-preserving systems are also characterized by towers of isometric extensions.

Section 5.2: Bohr Compactification

Definition (Bohr Compactification): The Bohr compactification of a topological group \( G \) is a compact group \( bG \) together with a continuous group homomorphism \( \iota : G \to bG \) with dense image, such that every continuous homomorphism \( \phi : G \to K \) into a compact group \( K \) factors uniquely through \( \iota \): \( \phi = \bar{\phi} \circ \iota \) for a unique continuous homomorphism \( \bar{\phi} : bG \to K \).

Construction. Let \( \mathcal{F} \) be the family of all continuous homomorphisms \( \phi : G \to K_\phi \) into compact groups \( K_\phi \) (taking one representative per isomorphism class, to keep \( \mathcal{F} \) a set). Define \( \iota : G \to \prod_{\phi \in \mathcal{F}} K_\phi \) by \( \iota(g) = (\phi(g))_{\phi \in \mathcal{F}} \), and set \( bG = \overline{\iota(G)} \). The product is a compact group by Tychonoff's theorem, so \( bG \) is a compact group in which \( \iota(G) \) is dense.

For \( G = \mathbb{Z} \), the Bohr compactification is the Pontryagin dual of the circle group made discrete: \( b\mathbb{Z} = \widehat{\mathbb{T}_d} \). Concretely, \( b\mathbb{Z} \) is the closure of the image of \( \mathbb{Z} \) in \( \prod_{\alpha \in [0,1)} \mathbb{T} \) under the diagonal embedding \( n \mapsto (e^{2\pi i n \alpha})_{\alpha} \).

Theorem (Bohr Almost Periodic Functions): A function \( f : G \to \mathbb{C} \) extends to a continuous function on \( bG \) if and only if it is almost periodic — i.e., the set of translates \( \{ g \mapsto f(hg) : h \in G \} \) is precompact in the uniform norm.

Section 5.3: Universal Minimal Distal Flow

Definition (Universal Minimal Distal Flow): A minimal distal \( G \)-flow \( D(G) \) is the universal minimal distal flow if every minimal distal \( G \)-flow is a factor of \( D(G) \).

The universal minimal distal flow can be obtained as the maximal distal factor of the universal minimal flow \( M(G) \), equivalently as an inverse limit of towers of equicontinuous extensions with the natural compatible factor maps. For abelian \( G \), the universal minimal equicontinuous flow is the Bohr compactification \( bG \) with the translation action; the universal minimal distal flow is strictly larger in general. Already for \( G = \mathbb{Z} \), skew products such as \( (x, y) \mapsto (x + \alpha, y + x) \) on \( \mathbb{T}^2 \) are minimal and distal but not equicontinuous.

Section 5.4: Almost Periodicity and Equicontinuous Flows

Theorem (Structure of Minimal Equicontinuous Flows): A compact \( G \)-flow is minimal and equicontinuous if and only if it is (equivariantly homeomorphic to) a compact group rotation: there is a compact group \( K \), a closed subgroup \( H \leq K \), and a homomorphism \( \rho : G \to K \) with dense image such that \( X \cong K/H \) with action \( g \cdot kH = \rho(g)kH \).

In particular, for \( G = \mathbb{Z} \), minimal equicontinuous systems are exactly the rotations on compact abelian groups \( K \), i.e., systems of the form \( T(k) = k + a \) for a fixed \( a \in K \) that generates a dense subgroup of \( K \).


Chapter 6: Universal Minimal Proximal and Strongly Proximal Flows

Section 6.1: Proximal Flows Revisited

We recall the definition of proximal flow (Definition 1.3) and develop the theory further.

Theorem (Characterization of Proximal Minimal Flows): A minimal compact \( G \)-flow \( (X, G) \) is proximal if and only if the diagonal \( \Delta = \{ (x, x) : x \in X \} \) is the unique minimal subflow of \( X \times X \) (with the diagonal action). In particular, if the Ellis semigroup \( E(X) \) contains a constant map, then \( X \) is proximal.
Proof sketch. If \( E(X) \) contains a constant map \( c_z : x \mapsto z \), then for any \( x, y \in X \) there is a net \( (g_\alpha) \) in \( G \) with \( g_\alpha \to c_z \) in \( E(X) \), so \( g_\alpha x \to z \) and \( g_\alpha y \to z \), making \( x, y \) proximal. For the equivalence: if \( \Delta \) is the unique minimal subflow of \( X \times X \), then the orbit closure of any pair \( (x, y) \) contains a minimal subflow, which must be \( \Delta \), so \( \overline{G(x, y)} \) meets the diagonal and \( (x, y) \) is proximal. Conversely, if every pair is proximal, let \( M \subseteq X \times X \) be minimal and \( (x, y) \in M \); then \( \overline{G(x, y)} = M \) contains a diagonal point \( (z, z) \), hence \( M \supseteq \overline{G(z, z)} = \Delta \) (which is minimal since \( X \) is), so \( M = \Delta \).

Section 6.2: Strongly Proximal Flows

Definition (Strongly Proximal Flow): Let \( G \) be a locally compact group. A compact \( G \)-flow \( (X, G) \) is strongly proximal if for every probability measure \( \mu \) on \( X \), the orbit closure \( \overline{G\mu} \subseteq \mathcal{M}(X) \) contains a point mass \( \delta_x \) for some \( x \in X \).

In particular, a strongly proximal flow carries no \( G \)-invariant probability measure unless it has a fixed point; for minimal \( X \), strong proximality means every probability measure on \( X \) can be “compressed” arbitrarily close to a point mass by the group action.

Remark. Strongly proximal flows arise naturally in the study of boundaries of groups (Furstenberg boundaries). Amenable groups (in particular, compact groups) have trivial Furstenberg boundary, while non-elementary hyperbolic groups and non-compact semisimple Lie groups have non-trivial ones.

Section 6.3: Furstenberg Boundary

Definition (Furstenberg Boundary / Poisson Boundary): For a locally compact group \( G \) with a spread-out probability measure \( \mu \), the Poisson boundary is the measure-theoretic boundary \( (B, \nu) \) that encodes the asymptotic behavior of the \( \mu \)-random walk on \( G \). The Furstenberg boundary is the universal strongly proximal minimal \( G \)-flow, which is the topological counterpart.
Example. For \( G = SL_2(\mathbb{R}) \), the Furstenberg boundary is the projective line \( \mathbb{P}^1(\mathbb{R}) \cong S^1 \), with \( G \) acting by Möbius transformations. For \( G = SL_n(\mathbb{R}) \) with \( n \geq 2 \), the Furstenberg boundary is the full flag manifold.

Section 6.4: Universal Minimal Proximal Flow

Theorem (Existence of Universal Minimal Proximal Flow): For any topological group \( G \), there is a universal minimal proximal \( G \)-flow \( \Pi(G) \) — a minimal proximal flow such that every minimal proximal \( G \)-flow is a factor of \( \Pi(G) \).

The construction uses the universal minimal flow \( M(G) \subseteq S(G) \): the universal minimal proximal flow \( \Pi(G) \) is the maximal proximal factor of \( M(G) \), obtained by quotienting out the distal part of the dynamics.


Chapter 7: Probability Measures, Affine Flows, and Amenable Groups

Section 7.1: The Space of Probability Measures as a \(G\)-Flow

Let \( (X, G) \) be a compact flow and let \( \mathcal{M}(X) \) denote the space of Borel probability measures on \( X \), equipped with the weak* topology; by the Riesz representation and Banach–Alaoglu theorems, \( \mathcal{M}(X) \) is a compact convex set. The action of \( G \) on \( X \) induces an action on \( \mathcal{M}(X) \) by pushforward:

\[ (g_* \mu)(A) = \mu(g^{-1}A),\quad g \in G,\, A \subseteq X \text{ Borel}. \]

This makes \( \mathcal{M}(X) \) into a compact convex \( G \)-flow.

Definition (Affine Flow): A compact convex \( G \)-flow (or affine \( G \)-flow) is a compact convex subset \( K \) of a locally convex topological vector space with a continuous affine \( G \)-action: \( g \cdot (\lambda x + (1-\lambda) y) = \lambda (g \cdot x) + (1-\lambda)(g \cdot y) \).

Section 7.2: Invariant Probability Measures

Definition (Invariant Measure): A Borel probability measure \( \mu \in \mathcal{M}(X) \) is \( G \)-invariant if \( g_* \mu = \mu \) for all \( g \in G \), i.e., \( \mu(g^{-1}A) = \mu(A) \) for all \( g \in G \) and all Borel \( A \subseteq X \).
Theorem (Markov–Kakutani): Let \( G \) be a commutative group acting continuously and affinely on a compact convex set \( K \). Then there exists a \( G \)-fixed point in \( K \).
Proof. Fix \( x \in K \). For \( g \in G \), any cluster point of the Cesàro averages \( A_N x = \frac{1}{N}\sum_{k=0}^{N-1} g^k x \) (cluster points exist by compactness) is a fixed point of \( g \), since \( g \cdot A_N x - A_N x = \frac{1}{N}(g^N x - x) \in \frac{1}{N}(K - K) \to 0 \). Thus \( \mathrm{Fix}(g) \) is a nonempty compact convex set, and by commutativity it is invariant under every \( h \in G \); running the same averaging argument inside \( \mathrm{Fix}(g) \) shows \( \mathrm{Fix}(g) \cap \mathrm{Fix}(h) \neq \emptyset \), and inductively all finite intersections of fixed-point sets are nonempty. By compactness (the finite intersection property), \( \bigcap_{g \in G} \mathrm{Fix}(g) \neq \emptyset \), and any point there is \( G \)-fixed.

Applied with \( K = \mathcal{M}(X) \): for any abelian group \( G \) acting on a compact space \( X \), there is a \( G \)-invariant probability measure on \( X \).
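The averaging argument can be watched in a toy model: \( \mathbb{Z} \) acting on measures on a finite set through a permutation. A sketch with exact rational arithmetic (the example map and names are ours): Cesàro averages of a point mass converge to an invariant measure, uniform on the orbit.

```python
from fractions import Fraction

def pushforward(T, mu):
    """(T_* mu)(j) = mu(T^{-1}{j}) for a self-map T of {0,...,n-1}."""
    nu = [Fraction(0)] * len(mu)
    for i, m in enumerate(mu):
        nu[T[i]] += m
    return nu

def cesaro_average(T, mu, N):
    """(1/N) * sum_{k=0}^{N-1} T^k_* mu: the Markov-Kakutani averaging step."""
    total, current = [Fraction(0)] * len(mu), mu
    for _ in range(N):
        total = [t + c for t, c in zip(total, current)]
        current = pushforward(T, current)
    return [t / N for t in total]

# T is the 3-cycle 0 -> 1 -> 2 -> 0 together with the fixed point 3.
T = [1, 2, 0, 3]
delta_0 = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
mu = cesaro_average(T, delta_0, 300)      # N divisible by 3, so the average is exact
print(mu)                                 # uniform on the orbit {0, 1, 2}
assert pushforward(T, mu) == mu           # the average is T-invariant
```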

Section 7.3: Amenable Groups

Definition (Amenable Group): A locally compact group \( G \) is amenable if every compact \( G \)-flow \( (X, G) \) admits a \( G \)-invariant probability measure.

Equivalent conditions (for discrete \( G \)):

  • \( G \) has a left-invariant finitely additive probability measure (a “mean”) on bounded functions.
  • For every \( \varepsilon > 0 \) and finite set \( F \subseteq G \), there exists a finite nonempty set \( K \subseteq G \) with \( |gK \triangle K| / |K| < \varepsilon \) for all \( g \in F \) (Følner condition).
  • Every affine compact \( G \)-flow has a fixed point (Day's fixed point theorem).
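The Følner condition is transparent for \( G = \mathbb{Z} \): the intervals \( K = \{0, \ldots, N-1\} \) work, since translating an interval only moves its endpoints. A small sketch (names ours):

```python
def folner_ratio(F, K):
    """max over g in F of |gK symmetric-difference K| / |K|, in the group (Z, +)."""
    Kset = set(K)
    return max(len({g + k for k in Kset} ^ Kset) / len(Kset) for g in F)

F = {-1, 1}
for N in [10, 100, 1000]:
    K = range(N)
    print(N, folner_ratio(F, K))   # equals 2/N: intervals form a Folner sequence for Z
```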
Example. All abelian groups are amenable (by Markov-Kakutani). Finite groups are amenable (average over all elements). Extensions of amenable groups by amenable groups are amenable. The free group \( F_2 \) on two generators is not amenable (Hausdorff paradox / Banach-Tarski).
Theorem (Tarski's Alternative): A discrete group \( G \) is amenable if and only if it admits no paradoxical decomposition. (Containing a free subgroup \( F_2 \) forces paradoxicality, but the converse fails: Ol'shanskii constructed non-amenable groups with no free subgroups.)

Section 7.4: Fixed Point Properties and Topological Dynamics

Theorem (Day): A discrete group \( G \) is amenable if and only if every compact affine \( G \)-flow has a fixed point.
Theorem (Furstenberg): \( G \) is amenable if and only if the universal minimal flow \( M(G) \) supports a \( G \)-invariant probability measure — equivalently, the affine flow \( \mathcal{M}(M(G)) \) of probability measures on \( M(G) \) has a \( G \)-fixed point.

Chapter 8: Standard Lebesgue Spaces, Probability-Measure-Preserving Actions, and \(\mathrm{Aut}(X, \mu)\)

Section 8.1: Standard Probability Spaces

We now transition to the measure-theoretic side of dynamics.

Definition (Standard Lebesgue Space): A standard probability space (or Lebesgue probability space) is a triple \( (X, \mathcal{B}, \mu) \) where \( X \) is a standard Borel space (a Polish space with its Borel \(\sigma\)-algebra, or isomorphic to one), \( \mathcal{B} \) is the \(\sigma\)-algebra, and \( \mu \) is a probability measure.

The key theorem of Lebesgue measure theory is:

Theorem (Isomorphism of Lebesgue Spaces): Any two atomless standard probability spaces are isomorphic as measure spaces (i.e., there is a measure-preserving bijection between them, defined a.e.). In particular, every atomless standard probability space is isomorphic to \( ([0,1], \mathcal{B}, \lambda) \) where \( \lambda \) is Lebesgue measure.

This means up to isomorphism, there is essentially only one atomless standard probability space, which justifies writing “a standard probability space \( (X, \mu) \)” without loss of generality.

Section 8.2: Measure-Preserving Transformations

Definition (Measure-Preserving Transformation): Let \( (X, \mathcal{B}, \mu) \) be a standard probability space. A measurable map \( T : X \to X \) is measure-preserving (or pmp) if \( \mu(T^{-1}A) = \mu(A) \) for all \( A \in \mathcal{B} \). If \( T \) is bijective and both \( T \) and \( T^{-1} \) are measurable and measure-preserving, \( T \) is a measure-preserving automorphism.
Example (Circle Rotation). The map \( T : \mathbb{T} \to \mathbb{T} \), \( T(x) = x + \alpha \pmod 1 \), preserves Lebesgue measure \( \lambda \) (since \( \lambda(T^{-1}[a,b]) = \lambda([a-\alpha, b-\alpha]) = b - a \)).
Example (Doubling Map). The map \( T : [0,1) \to [0,1) \), \( T(x) = 2x \pmod 1 \), preserves Lebesgue measure and is ergodic but not invertible.
Example (Bernoulli Shift). Let \( (Y, \nu) \) be a finite probability space (the alphabet). The Bernoulli shift on \( X = Y^{\mathbb{Z}} \) with product measure \( \mu = \nu^{\mathbb{Z}} \) has shift map \( T((x_n)) = (x_{n+1}) \). This preserves \( \mu \) and is a fundamental example.
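The measure-preservation identity \( \mu(T^{-1}A) = \mu(A) \) in these examples can be checked numerically. The following sketch (illustrative, using a uniform random sample as a Monte Carlo stand-in for Lebesgue measure) verifies that the pushforward of Lebesgue measure under the doubling map is again Lebesgue: each dyadic bin receives about its fair share of mass.

```python
# Monte Carlo sanity check that T(x) = 2x mod 1 preserves Lebesgue measure:
# the pushforward of a uniform sample is again approximately uniform.
import random

random.seed(0)
N, BINS = 200_000, 10
sample = [random.random() for _ in range(N)]
image = [(2 * x) % 1.0 for x in sample]          # T(x) = 2x mod 1

counts = [0] * BINS
for y in image:
    counts[min(int(y * BINS), BINS - 1)] += 1
freqs = [c / N for c in counts]
# each bin should hold ~ 1/BINS = 0.1 of the mass
max_dev = max(abs(f - 1 / BINS) for f in freqs)
```

Note that this checks \( T_*\lambda = \lambda \), which is equivalent to \( \lambda(T^{-1}A) = \lambda(A) \) for all Borel \( A \): each bin \( A \) has two preimage intervals of half its length.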

Section 8.3: The Group \(\mathrm{Aut}(X, \mu)\)

Definition (\(\mathrm{Aut}(X, \mu)\)): The group \( \mathrm{Aut}(X, \mu) \) is the group of all measure-preserving automorphisms of a standard probability space \( (X, \mu) \), where two automorphisms are identified if they agree \( \mu \)-a.e.

\( \mathrm{Aut}(X, \mu) \) carries a natural topology, the weak topology: \( T_n \to T \) if \( \mu(T_n A \triangle T A) \to 0 \) for every measurable \( A \). With this topology, \( \mathrm{Aut}(X, \mu) \) is a Polish group (separable completely metrizable).

Theorem (Halmos Conjugacy Lemma): The conjugacy class of any aperiodic measure-preserving transformation is dense in \( \mathrm{Aut}(X, \mu) \) in the weak topology. Combined with a Baire category argument, this yields Halmos's theorem that the ergodic transformations form a dense \( G_\delta \) subset of \( \mathrm{Aut}(X, \mu) \).

Section 8.4: Probability-Measure-Preserving Group Actions

Definition (Pmp Action): A probability-measure-preserving (pmp) action of a countable group \( \Gamma \) on a standard probability space \( (X, \mu) \) is a group homomorphism \( \Gamma \to \mathrm{Aut}(X, \mu) \), \( \gamma \mapsto T_\gamma \), where each \( T_\gamma \) is a measure-preserving automorphism. We write \( \Gamma \curvearrowright (X, \mu) \).
Definition (Ergodicity): A pmp action \( \Gamma \curvearrowright (X, \mu) \) is ergodic if every \( \Gamma \)-invariant measurable set \( A \) (i.e., \( T_\gamma^{-1} A = A \) a.e. for all \( \gamma \)) satisfies \( \mu(A) \in \{0, 1\} \).

Equivalently, ergodicity says \( X \) cannot be split into two non-trivial invariant pieces. This is the measure-theoretic analog of minimality (though the two notions differ).


Chapter 9: \(L^2\) and Birkhoff Ergodic Theorems

Section 9.1: The Koopman Operator

Given a pmp action \( \Gamma \curvearrowright (X, \mu) \), every \( T \in \mathrm{Aut}(X, \mu) \) induces a unitary operator on \( L^2(X, \mu) \):

Definition (Koopman Operator): For \( T \in \mathrm{Aut}(X, \mu) \), the Koopman operator \( U_T : L^2(X, \mu) \to L^2(X, \mu) \) is defined by \( U_T f = f \circ T \). It is a unitary operator: \( \langle U_T f, U_T g \rangle = \langle f, g \rangle \) for all \( f, g \in L^2 \).

The map \( T \mapsto U_T \) is a group homomorphism from \( \mathrm{Aut}(X, \mu) \) to the unitary group \( \mathcal{U}(L^2(X, \mu)) \).

Theorem (Mean Ergodic Theorem, von Neumann, 1932): Let \( U \) be a unitary operator on a Hilbert space \( H \). Then for every \( f \in H \), \[ \frac{1}{N} \sum_{n=0}^{N-1} U^n f \xrightarrow{H} P f, \] where \( P \) is the orthogonal projection onto the fixed-point subspace \( \text{Fix}(U) = \{ h \in H : Uh = h \} \).
Proof. The Hilbert space decomposes as \( H = \text{Fix}(U) \oplus \overline{\text{ran}(I - U)} \) (orthogonal complement of the fixed-point subspace is the closure of the range of \( I - U \)). For \( f \in \text{Fix}(U) \), clearly the averages converge to \( f = Pf \). For \( f = g - Ug \) in \( \text{ran}(I - U) \), the telescoping sum gives \( \frac{1}{N}\sum_{n=0}^{N-1} U^n(g-Ug) = \frac{1}{N}(g - U^N g) \to 0 \). By density and boundedness, convergence holds for all \( f \).
Applied to the Koopman operator \( U_T \) of an ergodic pmp transformation \( T \) — whose fixed vectors are exactly the a.e.-constant functions — this gives, for every \( f \in L^2(X, \mu) \), \[ \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \xrightarrow{L^2} \int_X f \, d\mu. \]
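The Hilbert-space statement can be illustrated in the smallest nontrivial case (an illustrative sketch, not from the text): take \( H = \mathbb{R}^2 \) and \( U \) the rotation by an angle that is not a multiple of \( 2\pi \). Then \( \mathrm{Fix}(U) = \{0\} \), so the Cesàro averages of every vector must tend to \( Pf = 0 \), at the \( O(1/N) \) rate the telescoping argument predicts.

```python
# Mean ergodic theorem for a 2x2 rotation (a unitary operator on R^2):
# Fix(U) = {0} when the angle is not a multiple of 2*pi, so the Cesaro
# averages (1/N) sum_n U^n f converge to the projection Pf = 0.
import math

theta = 1.0                          # rotation angle (not a multiple of 2*pi)
c, s = math.cos(theta), math.sin(theta)

def rot(v):
    """Apply the rotation U once."""
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

f = (1.0, 0.0)
N = 10_000
total, cur = (0.0, 0.0), f
for _ in range(N):
    total = (total[0] + cur[0], total[1] + cur[1])
    cur = rot(cur)
avg = (total[0] / N, total[1] / N)
norm = math.hypot(*avg)              # should be O(1/N)
```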

Section 9.2: Birkhoff Pointwise Ergodic Theorem

Theorem (Birkhoff Ergodic Theorem, 1931): Let \( (X, \mathcal{B}, \mu) \) be a probability space, \( T : X \to X \) a measure-preserving transformation, and \( f \in L^1(X, \mu) \). Then \[ \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \xrightarrow{N \to \infty} \tilde{f}(x) \quad \mu\text{-a.e.}, \] where \( \tilde{f} \in L^1 \) is \( T \)-invariant (i.e., \( \tilde{f} \circ T = \tilde{f} \) a.e.) and \( \int \tilde{f} \, d\mu = \int f \, d\mu \).

If \( T \) is ergodic, then \( \tilde{f} = \int f \, d\mu \) a.e. — the time average equals the space average.

Proof sketch. The key estimate is the maximal ergodic lemma: for \( f \in L^1 \), define \( f^*(x) = \sup_{N \geq 1} \frac{1}{N}\sum_{n=0}^{N-1} f(T^n x) \). Then \[ \mu(\{ x : f^*(x) > \lambda \}) \leq \frac{1}{\lambda} \int_{\{f^* > \lambda\}} f \, d\mu. \] This is the Hopf maximal inequality. Given this, the pointwise convergence follows by a density argument: approximate \( f \in L^1 \) by coboundaries and invariant functions, showing the limsup and liminf of the ergodic averages coincide a.e.
Example (Equidistribution). For an irrational rotation \( T(x) = x + \alpha \) on \( \mathbb{T} \), the Birkhoff ergodic theorem applied to \( f = \mathbf{1}_{[a,b]} \) gives: for Lebesgue-a.e. \( x \in \mathbb{T} \), \[ \frac{1}{N} \#\{0 \leq n < N : T^n x \in [a, b]\} \to b - a. \] In fact, by Weyl's equidistribution theorem, this holds for every \( x \in \mathbb{T} \) (not just a.e.).
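The equidistribution statement above is easy to test numerically. The following sketch (illustrative; the choices of \( \alpha \), interval, and starting point are arbitrary) counts visits of an orbit of the rotation to an interval and compares the frequency with the interval's length.

```python
# Birkhoff averages for the irrational rotation T(x) = x + alpha mod 1:
# the fraction of the orbit of x falling in [a, b] approaches b - a.
import math

alpha = math.sqrt(2) - 1          # an irrational rotation number
a, b = 0.25, 0.6                  # target interval, length 0.35
x, N, hits = 0.1, 100_000, 0
for n in range(N):
    if a <= x <= b:
        hits += 1
    x = (x + alpha) % 1.0
freq = hits / N                   # should be close to b - a
```

That the convergence here holds for *every* starting point, not just a.e. point, reflects the unique ergodicity of the irrational rotation (Weyl).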

Section 9.3: Consequences and Applications

Theorem (Poincaré Recurrence): Let \( T : X \to X \) be a measure-preserving transformation of a probability space \( (X, \mu) \). For any measurable set \( A \) with \( \mu(A) > 0 \), \( \mu \)-almost every \( x \in A \) returns to \( A \) infinitely often: for a.e. \( x \in A \), \( T^n x \in A \) for infinitely many \( n \geq 1 \).
Proof. Let \( B = \{ x \in A : T^n x \notin A \text{ for all } n \geq 1 \} \). The sets \( B, T^{-1}B, T^{-2}B, \ldots \) are pairwise disjoint (if \( x \in T^{-k}B \cap T^{-l}B \) with \( k < l \), then \( T^k x \in B \) and \( T^l x \in B \subseteq A \), so \( T^{l-k}(T^k x) \in A \), contradicting \( T^k x \in B \)). Since \( \mu(X) = 1 \) and all these sets have measure \( \mu(B) \), we must have \( \mu(B) = 0 \); thus a.e. point of \( A \) returns to \( A \) at least once. Applying the same argument to the powers \( T^k \) for each \( k \geq 1 \) shows that a.e. \( x \in A \) has a return at some time \( \geq k \); intersecting over all \( k \), a.e. point of \( A \) returns at arbitrarily large times, hence infinitely often.
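Recurrence is visible in simulation. The following illustrative sketch (parameters are arbitrary choices, not from the text) tracks a point of a positive-measure set \( A \) under an irrational rotation and counts its returns; for this uniquely ergodic system the return frequency even matches \( \mu(A) \), as Birkhoff's theorem predicts.

```python
# Poincare recurrence for the irrational rotation T(x) = x + alpha mod 1:
# a point of A = [0, 0.1) returns to A again and again, with frequency
# close to mu(A) = 0.1.
import math

alpha = math.sqrt(2) - 1
A = (0.0, 0.1)                      # mu(A) = 0.1 > 0
x, N, returns = 0.05, 50_000, 0     # start at a point of A
for n in range(1, N + 1):
    x = (x + alpha) % 1.0
    if A[0] <= x < A[1]:
        returns += 1
```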
Theorem (Multiple Recurrence, Furstenberg 1977): Let \( (X, \mu) \) be a probability space, \( T \) a measure-preserving transformation, and \( A \subseteq X \) measurable with \( \mu(A) > 0 \). Then for any \( k \geq 1 \), \[ \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \mu(A \cap T^{-n}A \cap T^{-2n}A \cap \cdots \cap T^{-kn}A) > 0. \] In particular, there exists \( n \geq 1 \) such that \( \mu(A \cap T^{-n}A \cap \cdots \cap T^{-kn}A) > 0 \).

This is Furstenberg’s ergodic-theoretic proof of Szemerédi’s theorem (every subset of integers with positive upper density contains arithmetic progressions of every length).


Chapter 10: Orbit Equivalence for Aperiodic pmp \(\mathbb{Z}\)-Actions

Section 10.1: Orbit Equivalence Relations

Definition (Orbit Equivalence): Two pmp actions \( \Gamma \curvearrowright (X, \mu) \) and \( \Lambda \curvearrowright (Y, \nu) \) are orbit equivalent (OE) if there is a measure space isomorphism \( \phi : X \to Y \) (a measure-preserving bijection, defined a.e.) such that for a.e. \( x \in X \), \[ \phi(\Gamma \cdot x) = \Lambda \cdot \phi(x). \] That is, \( \phi \) maps orbits to orbits (but not necessarily respecting the group structure).

Orbit equivalence is a much coarser equivalence than isomorphism of measure-preserving systems (which requires \( \phi \) to intertwine the group actions, not just map orbits to orbits).

Definition (Aperiodic Action): A pmp action \( \Gamma \curvearrowright (X, \mu) \) is free if \( \mu(\{ x : \gamma \cdot x = x \}) = 0 \) for every non-identity \( \gamma \in \Gamma \). For \( \Gamma = \mathbb{Z} \) this is equivalent to aperiodicity: almost every orbit is infinite.

Section 10.2: Dye’s Theorem

Theorem (Dye, 1959): Any two ergodic pmp \( \mathbb{Z} \)-actions on atomless standard probability spaces (equivalently, any two ergodic invertible measure-preserving transformations of such spaces) are orbit equivalent.

This is a remarkable theorem: it says that for ergodic measure-preserving \( \mathbb{Z} \)-actions, there is only one orbit structure (up to measure-theoretic isomorphism). The dynamical structure of the action (e.g., mixing, entropy, spectral type) is entirely invisible to orbit equivalence among ergodic \( \mathbb{Z} \)-actions.

Proof sketch. The proof shows that the two orbit equivalence relations \( x \sim y \iff \exists n \in \mathbb{Z},\, T^n x = y \) (defined a.e.) are isomorphic. For two ergodic \( \mathbb{Z} \)-actions \( T \) on \( (X, \mu) \) and \( S \) on \( (Y, \nu) \), one constructs an orbit-preserving isomorphism as follows: partition \( X \) into "Rokhlin towers" (sets \( F_j, TF_j, T^2 F_j, \ldots, T^{h_j - 1} F_j \)) that cover most of \( X \), and similarly for \( Y \), then match towers of equal height. Refining the towers and iterating this matching, in the spirit of the Rokhlin lemma, yields the desired orbit equivalence in the limit.

Section 10.3: The Rokhlin Lemma

Theorem (Rokhlin Lemma): Let \( T : (X, \mu) \to (X, \mu) \) be an aperiodic measure-preserving transformation. For any \( n \in \mathbb{N} \) and \( \varepsilon > 0 \), there exists a measurable set \( F \subseteq X \) (a Rokhlin tower base) such that \( F, TF, T^2 F, \ldots, T^{n-1}F \) are pairwise disjoint and \[ \mu\!\left(\bigcup_{k=0}^{n-1} T^k F\right) > 1 - \varepsilon. \]
Proof sketch. By aperiodicity one can choose, for any \( \delta > 0 \), a "sweep-out" set \( B \) with \( 0 < \mu(B) < \delta \) that a.e. orbit visits. Decompose \( X \) into the Kakutani skyscraper over \( B \): the columns \( B_k = \{ x \in B : r_B(x) = k \} \) with levels \( T^j B_k \), \( 0 \leq j < k \), where \( r_B \) is the first-return time to \( B \). Take \( F \) to be the union, over each column, of every \( n \)-th level starting from the bottom (discarding the top fewer-than-\( n \) levels). Then \( F, TF, \ldots, T^{n-1}F \) are pairwise disjoint, and their union misses at most the discarded levels, of total measure less than \( n\mu(B) < n\delta \), which is \( < \varepsilon \) for \( \delta \) small enough.

Section 10.4: The Full Group and Cocycles

Definition (Full Group): For a pmp action \( T \in \mathrm{Aut}(X, \mu) \), the full group \( [T] \) is the set of all \( S \in \mathrm{Aut}(X, \mu) \) such that for a.e. \( x \in X \), \( Sx \in \{T^n x : n \in \mathbb{Z}\} \) (i.e., \( S \) maps each point to another point in its \( T \)-orbit).

For ergodic transformations, orbit equivalence holds if and only if the full groups are isomorphic as abstract groups (Dye's reconstruction theorem). Thus the full group is a complete invariant of the orbit equivalence class.


Chapter 11: Entropy for \(\mathbb{Z}\)-Actions and Bernoulli Systems

Section 11.1: Metric Entropy of a Partition

Definition (Shannon Entropy of a Partition): Let \( (X, \mathcal{B}, \mu) \) be a probability space and \( \xi = \{ A_1, \ldots, A_k \} \) a finite measurable partition. The Shannon entropy of \( \xi \) is \[ H(\xi) = -\sum_{i=1}^k \mu(A_i) \log \mu(A_i), \] with the convention \( 0 \log 0 = 0 \). (Logarithm to base 2 gives bits; natural log gives nats.)
Definition (Entropy of a Transformation with Respect to a Partition): For a pmp transformation \( T \) and finite partition \( \xi \), define the join \( \xi \vee T^{-1}\xi \vee \cdots \vee T^{-(n-1)}\xi \) as the partition into atoms \( A_{i_0} \cap T^{-1}A_{i_1} \cap \cdots \cap T^{-(n-1)}A_{i_{n-1}} \). The entropy of \( T \) with respect to \( \xi \) is \[ h(T, \xi) = \lim_{n \to \infty} \frac{1}{n} H\!\left(\bigvee_{k=0}^{n-1} T^{-k}\xi\right). \] The limit exists by subadditivity of \( n \mapsto H(\bigvee_{k=0}^{n-1} T^{-k}\xi) \).
Definition (Kolmogorov–Sinai Entropy): The metric entropy (or KS entropy) of \( T \) is \[ h(T) = \sup_\xi h(T, \xi), \] where the supremum is over all finite measurable partitions \( \xi \).
Theorem (Kolmogorov–Sinai): If \( \xi \) is a generating partition for \( T \) — meaning the \(\sigma\)-algebra generated by \( \{ T^{-n}\xi : n \in \mathbb{Z} \} \) is (a.e.) the full \(\sigma\)-algebra \( \mathcal{B} \) — then \( h(T) = h(T, \xi) \).

This makes entropy computable: instead of taking the supremum over all partitions, one need only find a single generating partition and compute its entropy.

Section 11.2: Entropy of Bernoulli Shifts

Example (Entropy of Bernoulli Shifts). Let \( (Y, \nu) = (\{0, 1, \ldots, k-1\}, (p_0, \ldots, p_{k-1})) \) be a probability distribution on \( k \) symbols, and let \( T \) be the Bernoulli shift on \( Y^{\mathbb{Z}} \) with product measure \( \nu^{\mathbb{Z}} \). The partition \( \xi = \{ [i] : 0 \leq i \leq k-1 \} \) (where \( [i] = \{ x \in Y^{\mathbb{Z}} : x_0 = i \} \)) is a generating partition, and by independence of the coordinates, \[ H\!\left(\bigvee_{j=0}^{n-1} T^{-j}\xi\right) = n \cdot H(\nu) = -n \sum_{i=0}^{k-1} p_i \log p_i. \] Hence \( h(T) = H(\nu) = -\sum p_i \log p_i \).

In particular:

  • The \( (1/2, 1/2) \)-Bernoulli shift has entropy \( \log 2 \).
  • The \( (1/3, 1/3, 1/3) \)-Bernoulli shift has entropy \( \log 3 \).
  • These are not isomorphic as measure-preserving systems (Kolmogorov, 1958 — the first use of entropy in ergodic theory).
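The computation \( \frac{1}{n} H(\bigvee_{j<n} T^{-j}\xi) = H(\nu) \) can be checked empirically. The following illustrative sketch (the base distribution \( (0.7, 0.3) \) and sample size are arbitrary choices) estimates the block entropy rate of an i.i.d. bit sequence from empirical block frequencies; since the coordinates are independent, the rate sits near \( H(\nu) \) for every block length.

```python
# Empirical check of h(T) = H(nu) for a Bernoulli shift: the block entropy
# (1/n) * H(empirical distribution of n-blocks) of an i.i.d. (0.7, 0.3)
# sequence should be close to H(nu) = -(0.7 log 0.7 + 0.3 log 0.3) in nats.
import math, random
from collections import Counter

random.seed(1)
p = (0.7, 0.3)
H_nu = -sum(q * math.log(q) for q in p)        # entropy of the base, in nats

N = 400_000
seq = [0 if random.random() < p[0] else 1 for _ in range(N)]

def block_entropy_rate(seq, n):
    """(1/n) * Shannon entropy of the empirical distribution of n-blocks."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    H = -sum((c / total) * math.log(c / total) for c in counts.values())
    return H / n

rates = [block_entropy_rate(seq, n) for n in (1, 2, 3)]
```

For a dependent process the sequence \( \frac{1}{n}H_n \) would instead decrease toward the entropy rate, by subadditivity.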

Section 11.3: Properties of Entropy

Theorem (Basic Properties): For pmp transformations \( T, S \) on probability spaces:
  1. Isomorphism invariance: If \( T \cong S \) (measure-theoretic isomorphism), then \( h(T) = h(S) \).
  2. Power formula: \( h(T^n) = |n| \cdot h(T) \) for \( n \in \mathbb{Z} \).
  3. Invertibility: \( h(T^{-1}) = h(T) \).
  4. Factors decrease entropy: If \( S \) is a factor of \( T \), then \( h(S) \leq h(T) \).
  5. Products: \( h(T \times S) = h(T) + h(S) \).

Section 11.4: Topological Entropy

There is a parallel notion of entropy for topological dynamical systems.

Definition (Topological Entropy, Adler–Konheim–McAndrew): For a continuous map \( T : X \to X \) on a compact topological space, and an open cover \( \mathcal{U} \), let \( N(\mathcal{U}) \) be the minimal cardinality of a subcover of \( \mathcal{U} \), and \( \mathcal{U}^n = \mathcal{U} \vee T^{-1}\mathcal{U} \vee \cdots \vee T^{-(n-1)}\mathcal{U} \) the join cover. Define \[ h(T, \mathcal{U}) = \lim_{n \to \infty} \frac{1}{n} \log N(\mathcal{U}^n) \] (the limit exists by subadditivity). The topological entropy is \( h_{top}(T) = \sup_{\mathcal{U}} h(T, \mathcal{U}) \).
Theorem (Variational Principle): For a continuous map \( T : X \to X \) on a compact metrizable space, \[ h_{top}(T) = \sup_\mu h_\mu(T), \] where the supremum is over all \( T \)-invariant Borel probability measures \( \mu \), and \( h_\mu(T) \) is the metric entropy with respect to \( \mu \).

Chapter 12: Ornstein’s Theorem

Section 12.1: The Isomorphism Problem

Two central questions in ergodic theory are:

  1. Classification up to isomorphism: When are two measure-preserving systems isomorphic?
  2. Entropy as a complete invariant: Is entropy sufficient to classify Bernoulli shifts?

The first question motivated much of classical ergodic theory (from Halmos-von Neumann’s spectral theory to the entropy theory of Kolmogorov-Sinai). The second was answered definitively by Ornstein.

Section 12.2: Finitely Determined Processes and Weak Bernoulli

Definition (Weak Bernoulli / Finitely Determined): A stationary process \( (X, T, \xi) \) (where \( \xi \) is a generating partition) is:
  • weakly Bernoulli if for every \( \varepsilon > 0 \) there exists a gap \( n \) such that the "past" \( \bigvee_{k \leq 0} T^{-k}\xi \) and the "future" \( \bigvee_{k \geq n} T^{-k}\xi \) are \( \varepsilon \)-independent;
  • finitely determined if every stationary process whose entropy and finite-block distributions are sufficiently close to those of \( (T, \xi) \) is \( \bar{d} \)-close to it.
The \( \bar{d} \)-metric on stationary processes is \[ \bar{d}(\mathbf{x}, \mathbf{y}) = \inf_{\lambda} \lambda\big(\{ (\mathbf{x}, \mathbf{y}) : x_0 \neq y_0 \}\big), \] the infimum over all stationary couplings \( \lambda \) of the two processes of the probability that the zero coordinates disagree — equivalently, the expected normalized Hamming distance between long blocks.

Section 12.3: Ornstein’s Isomorphism Theorem

Theorem (Ornstein, 1970): Two Bernoulli shifts \( (Y_1^{\mathbb{Z}}, \sigma, \nu_1^{\mathbb{Z}}) \) and \( (Y_2^{\mathbb{Z}}, \sigma, \nu_2^{\mathbb{Z}}) \) are isomorphic as measure-preserving systems if and only if they have the same entropy: \[ (Y_1^{\mathbb{Z}}, \sigma, \nu_1^{\mathbb{Z}}) \cong (Y_2^{\mathbb{Z}}, \sigma, \nu_2^{\mathbb{Z}}) \iff H(\nu_1) = H(\nu_2). \]
Proof sketch. The proof has two parts:

Entropy is an isomorphism invariant (Kolmogorov, 1958): This was established earlier and shows entropy is necessary.

Entropy is sufficient (Ornstein's theorem proper): The key insight is that Bernoulli shifts are "finitely determined": given \( \varepsilon > 0 \) and two Bernoulli shifts \( T, S \) with the same entropy, one can find a partition in the second system whose process is \( \varepsilon \)-close in \( \bar{d} \) to that of the first, with matching distribution and entropy. By iteratively improving these approximations (the "Ornstein copying lemma"), one constructs an actual measure-preserving isomorphism in the limit.

The core technical lemma, the Ornstein Copying Lemma, states: given a stationary process and \( \varepsilon > 0 \), one can “copy” a long block of the process into another system that has the right entropy, with an error at most \( \varepsilon \) in \( \bar{d} \).

Remark. Ornstein's theorem is a profound result: it says that for the class of Bernoulli systems, entropy is a complete invariant. This is in stark contrast to the general classification problem (there are uncountably many non-isomorphic ergodic systems with any given entropy). It also means the \( (1/4, 1/4, 1/4, 1/4) \)-Bernoulli shift and the \( (1/2, 1/8, 1/8, 1/8, 1/8) \)-Bernoulli shift are isomorphic (both have entropy \( \log 4 \)) — a fact with no obvious direct proof.

Section 12.4: Bernoulli Systems and Mixing

Definition (Mixing): A pmp transformation \( T \) is:
  • strongly mixing if for all measurable \( A, B \): \( \mu(T^{-n}A \cap B) \to \mu(A)\mu(B) \) as \( n \to \infty \);
  • weakly mixing if for all measurable \( A, B \): \( \frac{1}{N}\sum_{n=0}^{N-1} |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)| \to 0 \).
Theorem: Every Bernoulli shift is strongly mixing.
Proof. For cylinder sets \( A = [a_{-m}, \ldots, a_0, \ldots, a_k] \) and \( B = [b_{-l}, \ldots, b_l] \), for \( n \) large enough \( T^{-n}A \) and \( B \) depend on disjoint coordinates, so by independence of product measure, \( \mu(T^{-n}A \cap B) = \mu(A)\mu(B) \) exactly for large \( n \). The result extends to all measurable sets by approximation.
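The exact independence in this proof can be observed in simulation. The following illustrative sketch (fair-coin base, arbitrary lags) estimates \( \mu(T^{-n}A \cap B) \) for the cylinder sets \( A = B = \{x : x_0 = 1\} \) in the \( (1/2, 1/2) \)-Bernoulli shift: at lag \( 0 \) the frequency is \( \mu(A) \approx 1/2 \), while for every \( n \geq 1 \) it equals \( \mu(A)\mu(B) = 1/4 \) up to sampling noise.

```python
# Mixing of the Bernoulli((1/2,1/2)) shift via lag correlations: with
# A = B = {x : x_0 = 1}, mu(T^{-n}A cap B) = 1/4 exactly for every n >= 1.
import random

random.seed(2)
N = 200_000
bits = [random.randint(0, 1) for _ in range(N)]

def lag_correlation(bits, n):
    """Empirical frequency of {x_i = 1 and x_{i+n} = 1}."""
    m = len(bits) - n
    return sum(bits[i] & bits[i + n] for i in range(m)) / m

corr0 = lag_correlation(bits, 0)              # = mu(A), about 1/2
corrs = [lag_correlation(bits, n) for n in (1, 5, 50)]
```

For a genuinely dependent mixing system one would instead see these correlations decay toward \( \mu(A)\mu(B) \) as the lag grows, rather than hit it exactly.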

Section 12.5: Extensions and Further Results

Ornstein’s theory extends in several directions:

  1. Characterizations: A system \( (X, T) \) is Bernoulli if and only if it is finitely determined. Weak Bernoulli implies Bernoulli (Friedman–Ornstein), but not conversely. This gives tractable criteria for Bernoullicity.

  2. Ornstein–Weiss theorem (1987): Ornstein’s isomorphism theorem extends to Bernoulli actions of amenable groups \( \Gamma \): two Bernoulli \( \Gamma \)-actions are isomorphic if and only if they have the same entropy.

  3. Non-amenable groups: Here the picture is subtler. Bowen's sofic entropy (2010) extends entropy theory to Bernoulli actions of sofic groups and shows the base entropy remains an isomorphism invariant; moreover, since any group containing an infinite Ornstein subgroup (e.g., a copy of \( \mathbb{Z} \)) is itself Ornstein by a co-induction argument of Stepin, Bernoulli actions of free groups are again classified by entropy. Popa's deformation/rigidity theory (2006) gives strong rigidity results of a different kind, such as orbit-equivalence superrigidity for Bernoulli actions of property (T) groups.

  4. Entropy and orbit equivalence: While entropy distinguishes Bernoulli shifts up to isomorphism, orbit equivalence (Chapter 10) is much coarser — by Dye's theorem, all ergodic pmp \( \mathbb{Z} \)-actions are orbit equivalent regardless of entropy.

Theorem (Sinai, 1962): Every ergodic pmp \( \mathbb{Z} \)-action \( T \) with entropy \( h(T) \geq h > 0 \) has the Bernoulli shift of entropy \( h \) as a factor.

This is Sinai’s “factor theorem” and was a precursor to Ornstein’s full theorem. Together with Ornstein’s result, it gives the complete picture: among Bernoulli systems, entropy classifies up to isomorphism; among general ergodic systems, entropy is only a necessary invariant.


Appendix A: Topological Prerequisites

Section A.1: Compact Hausdorff Spaces

We collect key facts about compact Hausdorff spaces used throughout.

  • A closed subset of a compact space is compact.
  • A continuous map from a compact space to a Hausdorff space is a closed map; if bijective, it is a homeomorphism.
  • The product of compact spaces is compact (Tychonoff’s theorem).
  • A compact Hausdorff space is normal (T4): disjoint closed sets can be separated by open sets.
  • The continuous real-valued functions \( C(X) \) on a compact Hausdorff \( X \) separate points and determine the topology.

Section A.2: Nets and Convergence

In non-metrizable spaces, sequences do not suffice for topology; nets are the proper generalization.

Definition (Net): A net in a set \( X \) is a function \( (x_\alpha)_{\alpha \in \Lambda} \) where \( (\Lambda, \leq) \) is a directed set. A net converges to \( x \in X \) (in a topological space) if for every neighborhood \( U \ni x \) there exists \( \alpha_0 \) such that \( x_\alpha \in U \) for all \( \alpha \geq \alpha_0 \).

In a compact Hausdorff space, every net has a convergent subnet (compactness). The ultrafilter \( \mathcal{U} \)-limit \( \lim_{\mathcal{U}} f \) (used in defining the Stone-Čech extension) is a net-theoretic limit.

Section A.3: Baire Category Theorem

Theorem (Baire Category Theorem): A complete metric space (or a locally compact Hausdorff space) is a Baire space: the intersection of countably many dense open sets is dense. Equivalently, a countable union of nowhere dense sets has empty interior.

This is used in ergodic theory (e.g., genericity of ergodic transformations in \( \mathrm{Aut}(X, \mu) \)) and in topological dynamics (e.g., the set of minimal flows is residual in appropriate spaces).


Appendix B: Measure-Theoretic Prerequisites

Section B.1: Lebesgue Integration Summary

Let \( (X, \mathcal{B}, \mu) \) be a measure space.

  • \( L^p(X, \mu) \) for \( 1 \leq p \leq \infty \) denotes the Banach space of (equivalence classes of) measurable functions \( f \) with \( \|f\|_p = \left(\int |f|^p \, d\mu\right)^{1/p} < \infty \) (with the appropriate modification for \( p = \infty \)).
  • \( L^2(X, \mu) \) is a Hilbert space with inner product \( \langle f, g \rangle = \int f \bar{g} \, d\mu \).
  • Dominated Convergence Theorem: if \( f_n \to f \) a.e. and \( |f_n| \leq g \in L^1 \), then \( \int f_n \to \int f \).
  • Monotone Convergence Theorem: if \( 0 \leq f_n \nearrow f \) a.e., then \( \int f_n \nearrow \int f \).

Section B.2: Conditional Expectation

Definition (Conditional Expectation): Let \( (X, \mathcal{B}, \mu) \) be a probability space and \( \mathcal{A} \subseteq \mathcal{B} \) a sub-\(\sigma\)-algebra. For \( f \in L^1(X, \mu) \), the conditional expectation \( \mathbb{E}[f | \mathcal{A}] \) is the unique \( \mathcal{A}\)-measurable function (up to a.e. equality) such that \[ \int_A \mathbb{E}[f | \mathcal{A}] \, d\mu = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{A}. \]

Conditional expectation is the \( L^2 \)-orthogonal projection onto \( L^2(X, \mathcal{A}, \mu) \) (when \( f \in L^2 \)). It satisfies the tower property: if \( \mathcal{A} \subseteq \mathcal{A}' \subseteq \mathcal{B} \), then \( \mathbb{E}[\mathbb{E}[f|\mathcal{A}']|\mathcal{A}] = \mathbb{E}[f|\mathcal{A}] \). Conditional expectation with respect to the invariant \(\sigma\)-algebra is central to the proof of the Birkhoff ergodic theorem and to the theory of disintegrations.
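On a finite probability space the definition becomes completely explicit. The following sketch (the measure, function, and partition are made-up illustrative data) computes \( \mathbb{E}[f | \mathcal{A}] \) for a sub-\(\sigma\)-algebra generated by a partition: the conditional expectation is constant on each cell, equal to the \( \mu \)-weighted average of \( f \) there, and the defining property holds on every generating cell by construction.

```python
# Conditional expectation on a finite probability space, where the
# sub-sigma-algebra is generated by a partition into cells: E[f|A] is the
# mu-weighted average of f over the cell containing each point.

mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}          # an illustrative measure
f = {0: 5.0, 1: 1.0, 2: 2.0, 3: 10.0}          # an integrable function
cells = [{0, 1}, {2, 3}]                        # partition generating A

def cond_exp(f, mu, cells):
    """E[f | sigma(cells)]: on each cell, the mu-average of f over that cell."""
    g = {}
    for cell in cells:
        mass = sum(mu[x] for x in cell)
        avg = sum(f[x] * mu[x] for x in cell) / mass
        for x in cell:
            g[x] = avg
    return g

g = cond_exp(f, mu, cells)
# verify the defining property  int_A E[f|A] dmu = int_A f dmu  on each cell
checks = [abs(sum(g[x] * mu[x] for x in c) - sum(f[x] * mu[x] for x in c))
          for c in cells]
```

This is also the \( L^2 \)-projection picture in miniature: \( g \) is the closest cell-wise-constant function to \( f \) in the \( \mu \)-weighted norm.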

Section B.3: The Ergodic Decomposition

Theorem (Ergodic Decomposition): Let \( T \) be a pmp transformation of a standard probability space \( (X, \mu) \). Then there is a measurable map \( x \mapsto \mu_x \) assigning to a.e. \( x \) an ergodic \( T \)-invariant probability measure such that \( \mu = \int_X \mu_x \, d\mu(x) \); the map is \( T \)-invariant (\( \mu_{Tx} = \mu_x \) a.e.), and \( \mu_x \) is the conditional measure of \( \mu \) given the \(\sigma\)-algebra of \( T \)-invariant sets.

This shows every pmp system decomposes into ergodic components, so it suffices (for most purposes) to study ergodic systems.
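A concrete non-ergodic example makes the decomposition visible. In this illustrative sketch (not from the text), the map rotates \( [0, 1/2) \) and \( [1/2, 1) \) separately, each by an irrational amount: it preserves Lebesgue measure but is not ergodic, its ergodic components are the two halves, and Birkhoff averages of \( f(x) = x \) converge to the mean of \( f \) over the component of the starting point.

```python
# Ergodic decomposition illustrated: T rotates each half of [0,1) separately,
# so its ergodic components are [0, 1/2) and [1/2, 1). Time averages of
# f(x) = x converge to 0.25 or 0.75 depending on the starting component.
import math

alpha = 0.5 * (math.sqrt(2) - 1)     # irrational fraction of each half-circle

def T(x):
    if x < 0.5:
        return (x + alpha) % 0.5
    return 0.5 + ((x - 0.5 + alpha) % 0.5)

def time_average(x, N):
    """Birkhoff average (1/N) sum f(T^n x) for f(x) = x."""
    total = 0.0
    for _ in range(N):
        total += x
        x = T(x)
    return total / N

avg_left = time_average(0.1, 100_000)    # component [0, 1/2): mean 0.25
avg_right = time_average(0.7, 100_000)   # component [1/2, 1): mean 0.75
```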
