PMATH 451/651: Measure and Integration

Alexandru Nica

These notes follow the Fall 2020 offering of PMath 451/651 (Measure and Integration) at the University of Waterloo, taught by Professor Alexandru Nica. The course develops abstract measure and integration theory from first principles, beginning with algebras of sets and culminating in the Radon-Nikodym theorem and the Fubini-Tonelli theorem on product measures. The “work-out files” Nica distributes are already written in a conversational, proof-driven style, and these notes aim to preserve that spirit while providing a unified reference for the full twenty-lecture arc.

Part I: Algebras of Sets and Additive Set-Functions

Lecture 1: Algebras of Sets

The Basic Definition

The course opens with a precise combinatorial structure that organizes the sets we want to measure. An algebra of sets over a non-empty set \(X\) is a collection \({\mathcal A} \subseteq 2^X\) satisfying three axioms:

(AS1) \(\emptyset \in {\mathcal A}\).
(AS2) \(A \in {\mathcal A} \Rightarrow X \setminus A \in {\mathcal A}\).
(AS3) \(A, B \in {\mathcal A} \Rightarrow A \cup B \in {\mathcal A}\).

From these three conditions one derives the closure of \({\mathcal A}\) under finite unions, finite intersections, and set-differences. A pair \((X, {\mathcal A})\) where \({\mathcal A}\) is an algebra is called an algebra of sets (or algebraic space).

Proposition 1.3. Every algebra of sets is closed under finite intersections and under set differences.

Proof sketch. For intersections, use De Morgan: \(A \cap B = X \setminus ((X \setminus A) \cup (X \setminus B))\). Since AS2 and AS3 are given, the result follows. For set differences, write \(A \setminus B = A \cap (X \setminus B)\).

Semi-Algebras

A less structured cousin is a semi-algebra, a collection \({\mathcal S} \subseteq 2^X\) satisfying:

(Semi-AS1) \(\emptyset \in {\mathcal S}\).
(Semi-AS2) For every \(A \in {\mathcal S}\), the complement \(X \setminus A\) can be written as a finite disjoint union of elements of \({\mathcal S}\).
(Semi-AS3) \({\mathcal S}\) is closed under finite intersections.

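The canonical example is the collection \({\mathcal E}\) of half-open intervals \([a, b)\) with \(-\infty \le a \le b \le \infty\), where \([-\infty, b)\) is read as \((-\infty, b)\); this includes \(\emptyset\) (take \(a = b\)) and \({\mathbb R}\). The unbounded intervals are needed so that the complement \((-\infty, a) \sqcup [b, \infty)\) of \([a, b)\) is again a finite disjoint union of members of \({\mathcal E}\). Semi-algebras are important because they arise naturally in product spaces (measurable rectangles form a semi-algebra) and because there is a clean mechanism for extending additive functions defined on a semi-algebra to the full algebra it generates.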

Proposition 1.7. (Semi-algebra trick.) Let \({\mathcal S}\) be a semi-algebra on \(X\). Then the collection of all finite disjoint unions of sets from \({\mathcal S}\) is already an algebra. Equivalently, \(\mathrm{Alg}({\mathcal S})\), the smallest algebra containing \({\mathcal S}\), consists precisely of such finite disjoint unions.

Generated Algebras

For any collection \({\mathcal C} \subseteq 2^X\), the algebra generated by \({\mathcal C}\), denoted \(\mathrm{Alg}({\mathcal C})\), is the intersection of all algebras on \(X\) containing \({\mathcal C}\). The family being intersected is non-empty because \(2^X\) is itself an algebra containing \({\mathcal C}\). Proposition 1.9 confirms that this intersection is again an algebra, and Definition 1.10 makes the notation official.
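On a finite set the generated algebra can be computed by brute force, which makes the definition concrete. The sketch below is an illustration only: the helper name `generated_algebra` and the closure-by-iteration strategy are assumptions of this example rather than anything taken from the course, and the approach is only meaningful when \(X\) is finite.

```python
from itertools import combinations

def generated_algebra(X, C):
    """Smallest algebra on the finite set X containing every set in C.

    Illustrative brute force: start from C together with the empty set,
    then repeatedly add complements (AS2) and pairwise unions (AS3)
    until nothing new appears.
    """
    X = frozenset(X)
    algebra = {frozenset()} | {frozenset(c) for c in C}
    while True:
        new = {X - A for A in algebra}
        new |= {A | B for A, B in combinations(algebra, 2)}
        if new <= algebra:
            return algebra
        algebra |= new

X = {1, 2, 3, 4}
alg = generated_algebra(X, [{1}, {1, 2}])
# The atoms are {1}, {2}, {3, 4}, so the algebra has 2**3 = 8 members.
print(sorted(sorted(A) for A in alg))
```

Closure under intersections is not imposed in the loop; it comes for free, as in Proposition 1.3.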

Elementary Families

Between the notion of a semi-algebra and that of a full algebra, there is another useful level of structure that appears naturally in product-measure constructions. An elementary family relaxes the closure axioms of an algebra just enough to retain an explicit description of the generated algebra, while accommodating the collections of “measurable rectangles” that arise in higher-dimensional settings.

Definition. A collection \({\mathcal E} \subseteq {\mathcal P}(X)\) is called an elementary family if:

(i) \(X \in {\mathcal E}\).
(ii) For every \(E, F \in {\mathcal E}\), the intersection \(E \cap F\) is a finite disjoint union of elements of \({\mathcal E}\).
(iii) For every \(E \in {\mathcal E}\), the complement \(X \setminus E\) is a finite disjoint union of elements of \({\mathcal E}\).

A straightforward induction shows that any finite intersection of elements of \({\mathcal E}\) can be expressed as a finite disjoint union of elements of \({\mathcal E}\). Indeed, if \(E_1 \cap \cdots \cap E_n\) is such a disjoint union, then intersecting with \(E_{n+1}\) distributes over the union, and each resulting piece decomposes by axiom (ii).

Example. In \({\mathbb R}\), the collection \({\mathcal H} = \{(a, b] : -\infty \le a \le b \le \infty\}\) is an elementary family, with the convention that \((a, \infty]\) stands for \((a, \infty)\) so that every member is a subset of \({\mathbb R}\). The intersection of two half-open intervals is again a half-open interval (or empty, which is \((c, c]\)), and the complement of \((a, b]\) in \({\mathbb R}\) is the disjoint union \((-\infty, a] \sqcup (b, \infty)\), both pieces of which belong to \({\mathcal H}\).

The next lemma gives the payoff: elementary families yield a completely explicit description of the algebra they generate, paralleling the semi-algebra trick of Proposition 1.7.

Lemma. Let \({\mathcal E}\) be an elementary family on \(X\). Then

\[ \mathrm{Alg}({\mathcal E}) = \left\{ \bigsqcup_{i=1}^n E_i : n \in {\mathbb N},\; E_1, \ldots, E_n \in {\mathcal E} \text{ pairwise disjoint} \right\}. \]

That is, the algebra generated by \({\mathcal E}\) consists precisely of finite disjoint unions of members of \({\mathcal E}\).

Proof sketch. Call the right-hand side \({\mathcal F}\). By axiom (i), \(X \in {\mathcal F}\). To show \({\mathcal F}\) is closed under complements: the complement of \(\bigsqcup E_i\) is \(\bigcap (X \setminus E_i)\), and each \(X \setminus E_i\) is a finite disjoint union by axiom (iii); the finite intersection of such unions is again a finite disjoint union by the induction remark above. Closure under finite unions follows from writing \(A \cup B = (A \setminus B) \sqcup (A \cap B) \sqcup (B \setminus A)\) and using the complement and intersection properties. Since \({\mathcal F}\) is an algebra containing \({\mathcal E}\), and any algebra containing \({\mathcal E}\) must contain all finite disjoint unions, we get \({\mathcal F} = \mathrm{Alg}({\mathcal E})\).

Elementary families sit between semi-algebras and algebras in generality. The primary reason they merit attention is that they arise naturally when constructing product \(\sigma\)-algebras: the measurable rectangles \(\{A \times B : A \in {\mathcal M}, B \in {\mathcal N}\}\) form an elementary family on \(X \times Y\). The explicit description of the generated algebra as “finite disjoint unions” streamlines the verification of the pre-measure property in the Carathéodory extension for product spaces, a theme we will return to in Lectures 18–20.

Lecture 2: Additive Set-Functions

Additive Set-Functions

Given an algebra \((X, {\mathcal A})\), a function \(\mu : {\mathcal A} \to [0,\infty]\) is called additive (or finitely additive) if it satisfies:

(Add1) \(\mu(\emptyset) = 0\).
(Add2) For every finite family of pairwise disjoint sets \(A_1, \ldots, A_k \in {\mathcal A}\) with \(A_1 \cup \cdots \cup A_k \in {\mathcal A}\), we have \(\mu(A_1 \cup \cdots \cup A_k) = \mu(A_1) + \cdots + \mu(A_k)\).

Proposition 2.4 records the basic consequences: an additive set-function is monotone (if \(A \subseteq B\) then \(\mu(A) \le \mu(B)\)), sub-additive over finite collections, and satisfies the inclusion-exclusion identity \(\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)\) when all quantities are finite.

Two fundamental examples are the counting measure on a set \(X\) (which assigns to each subset its number of elements, with value \(\infty\) for infinite subsets) and the Dirac measure \(\delta_{x_0}\) concentrated at a point \(x_0 \in X\), which assigns 1 to any set containing \(x_0\) and 0 to all others.

The Semi-Algebra Extension Trick

An important theme is that additive functions defined on small collections can be extended. Definition 2.8 introduces the notion of respecting decompositions: a function \(\mu_o : {\mathcal S} \to [0,\infty]\) defined on a semi-algebra respects decompositions if whenever \(S = S_1 \cup \cdots \cup S_k\) is a finite disjoint union in \({\mathcal S}\) with \(S \in {\mathcal S}\), then \(\mu_o(S) = \sum_{i=1}^k \mu_o(S_i)\).

Lemma 2.9 shows that any function respecting decompositions on a semi-algebra \({\mathcal S}\) admits a unique extension to an additive function on \(\mathrm{Alg}({\mathcal S})\). Proposition 2.10 packages this: any such \(\mu_o\) extends uniquely to a finitely additive function \(\rho_o : \mathrm{Alg}({\mathcal S}) \to [0,\infty]\).

The Lebesgue measure on \({\mathbb R}\) begins to take shape here: one defines \(\lambda([a,b)) = b - a\) on the semi-algebra \({\mathcal E}\) of half-open intervals, verifies this respects decompositions, and then extends to \(\mathrm{Alg}({\mathcal E})\).
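The extension step can be tried out numerically on bounded intervals. The following is a minimal sketch, assuming an element of \(\mathrm{Alg}({\mathcal E})\) is represented as a list of endpoint pairs `(a, b)` standing for \([a, b)\); the helper names `normalize` and `length` are hypothetical, and unbounded intervals are left out for simplicity.

```python
from fractions import Fraction

def normalize(intervals):
    """Rewrite a finite union of half-open intervals [a, b) as a sorted
    list of pairwise disjoint, non-adjacent half-open intervals."""
    merged = []
    for a, b in sorted((a, b) for a, b in intervals if a < b):
        if merged and a <= merged[-1][1]:
            # Overlapping or touching pieces fuse into a single [a0, b1).
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def length(intervals):
    """Extended set-function: sum of b - a over the disjoint pieces."""
    return sum((b - a for a, b in normalize(intervals)), Fraction(0))

A = [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(3))]
B = [(Fraction(1), Fraction(2))]   # disjoint from A
# Finite additivity on disjoint sets: length(A u B) = length(A) + length(B).
print(length(A), length(B), length(A + B))
```

Because `normalize` always returns the same disjoint pieces for a given set, the value of `length` does not depend on how the union was originally written, which is the content of "respecting decompositions" in this special case.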

Part II: Sigma-Algebras, Positive Measures, and the Extension Theorem

Lecture 3: Sigma-Algebras and Positive Measures

Sigma-Algebras

To handle countable operations, which are essential for all limiting arguments in analysis, we need the stronger notion of a sigma-algebra. A collection \({\mathcal M} \subseteq 2^X\) is a sigma-algebra if it is an algebra that is additionally closed under countable unions:

(Sigma-AS3) If \((A_n)_{n=1}^\infty\) is any sequence of sets in \({\mathcal M}\), then \(\bigcup_{n=1}^\infty A_n \in {\mathcal M}\).

[Figure: sigma-algebra hierarchy, drawn as nested ovals: ℱ ⊆ σ(ℱ) ⊆ Borel ℬ ⊆ Lebesgue measurable sets.]

A pair \((X, {\mathcal M})\) is called a measurable space.

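A positive measure on \((X, {\mathcal M})\) is a function \(\mu : {\mathcal M} \to [0,\infty]\) with \(\mu(\emptyset) = 0\) that is countably additive: for every sequence \((A_n)_{n=1}^\infty\) of pairwise disjoint sets in \({\mathcal M}\),

\[ \mu\!\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n). \]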

A triple \((X, {\mathcal M}, \mu)\) is called a measure space.

Proposition 3.5 draws from countable additivity the same basic consequences derived earlier for additive functions: monotonicity, finite sub-additivity, and countable sub-additivity.

Continuity Along Chains

Two central propositions capture the “continuity” of a positive measure.

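Proposition. (Increasing chains.) If \((A_n)_{n=1}^\infty\) is an increasing chain in \({\mathcal M}\), that is, \(A_1 \subseteq A_2 \subseteq \cdots\), then

\[ \mu\!\left(\bigcup_{n=1}^\infty A_n\right) = \lim_{n\to\infty} \mu(A_n). \]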

Proof. Write the union as a telescope: \(\bigcup_n A_n = A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus A_2) \cup \cdots\), a disjoint union. Countable additivity gives the result.

Proposition 3.9. (Decreasing chains.) If \((A_n)_{n=1}^\infty\) is a decreasing chain in \({\mathcal M}\) and \(\mu(A_1) < \infty\), then \(\mu\!\left(\bigcap_{n=1}^\infty A_n\right) = \lim_{n\to\infty} \mu(A_n)\). The hypothesis \(\mu(A_1) < \infty\) is essential: without it the result fails (consider \(A_n = [n, \infty)\) under Lebesgue measure).

Example 3.10. (Borel-Cantelli.) Let \((A_n)\) be sets in \({\mathcal M}\) with \(\sum_{n=1}^\infty \mu(A_n) < \infty\). Then the set of points lying in infinitely many \(A_n\), namely \(\limsup_n A_n = \bigcap_{N=1}^\infty \bigcup_{n=N}^\infty A_n\), has \(\mu\)-measure zero. This follows from the decreasing-chain continuity applied to the tail sets \(C_N = \bigcup_{n=N}^\infty A_n\), whose measures tend to zero by the convergence of the series.

Lecture 4: The Borel Sigma-Algebra and Lebesgue-Stieltjes Measures

The Borel Sigma-Algebra

For a metric space \((X, d)\), the Borel sigma-algebra \({\mathcal B}_X\) is the sigma-algebra generated by the open sets of \(X\). In particular, \({\mathcal B}_{\mathbb R}\) is generated by any of the following: all open sets, all closed sets, all open intervals, all half-open intervals \([a,b)\).

Lebesgue-Stieltjes Measures

Definition 4.10. A Lebesgue-Stieltjes measure on \({\mathbb R}\) is a positive measure \(\mu : {\mathcal B}_{\mathbb R} \to [0,\infty]\) that assigns finite measure to every bounded interval.

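The centred function \(G_\mu : {\mathbb R} \to {\mathbb R}\) of a Lebesgue-Stieltjes measure \(\mu\) is defined by

\[ G_\mu(x) = \begin{cases} \mu((0, x]) & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -\mu((x, 0]) & \text{if } x < 0. \end{cases} \]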

Proposition/Definition 4.12 establishes that \(G_\mu\) is increasing and cadlag (right-continuous with left limits), and that \(G_\mu\) uniquely determines \(\mu\) via the formula \(\mu((a,b]) = G_\mu(b) - G_\mu(a)\). Conversely, every increasing cadlag function \(G\) with \(G(0) = 0\) arises in this way.

Remark 4.14. The connection to cumulative distribution functions in probability: if \(X\) is a real-valued random variable with CDF \(F\), the formula \(\mu((a,b]) = F(b) - F(a)\) defines the distribution measure of \(X\). This is a Lebesgue-Stieltjes measure, and \(G_\mu = F - F(0)\).
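The formula \(\mu((a,b]) = G_\mu(b) - G_\mu(a)\) is easy to experiment with. The sketch below assumes a hypothetical example measure, Lebesgue measure plus a unit point mass at \(1\), whose centred function is \(G(x) = x\) for \(x < 1\) and \(G(x) = x + 1\) for \(x \ge 1\); the function names `G` and `mu` are chosen for this illustration only.

```python
def G(x):
    """Centred function of the example measure (Lebesgue + Dirac mass at 1).

    It is increasing and right-continuous, with a jump of size 1 at x = 1.
    """
    return x + (1.0 if x >= 1 else 0.0)

def mu(a, b):
    """Measure of the half-open interval (a, b], for a <= b."""
    return G(b) - G(a)

print(mu(0.0, 0.5))   # only length contributes
print(mu(0.5, 1.0))   # the atom at 1 belongs to (0.5, 1]
print(mu(1.0, 2.0))   # the atom at 1 is excluded from (1, 2]
```

The jump of \(G\) at \(1\) is picked up by intervals that contain \(1\) as a right endpoint or interior point, which is why right-continuity is the natural convention for intervals of the form \((a, b]\).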

Lecture 5: π-Systems, Dynkin’s Theorem, and the Carathéodory Extension

π-Systems and Dynkin’s Consequence

Definition 5.2. A π-system is a collection \({\mathcal P} \subseteq 2^X\) closed under finite intersections.

Proposition 5.4. (Dynkin’s π-λ consequence.) Let \(\mu, \nu\) be positive measures on \((X, {\mathcal M})\) and let \({\mathcal P} \subseteq {\mathcal M}\) be a π-system. Suppose:

\(\mu(P) = \nu(P)\) for all \(P \in {\mathcal P}\).
There exists an exhausting sequence \((P_n) \subseteq {\mathcal P}\) with \(P_n \nearrow X\) and \(\mu(P_n) < \infty\) for all \(n\).

Then \(\mu = \nu\) on \(\sigma\text{-Alg}({\mathcal P})\); in particular, \(\mu = \nu\) on all of \({\mathcal M}\) when \({\mathcal M} = \sigma\text{-Alg}({\mathcal P})\).

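Corollary 5.5. A Lebesgue-Stieltjes measure \(\mu\) is uniquely determined by its centred function \(G_\mu\): two Lebesgue-Stieltjes measures with the same centred function agree on the π-system of bounded half-open intervals \((a, b]\), which have finite measure and contain the exhausting sequence \((-n, n]\), so by Proposition 5.4 they agree on all Borel sets.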

Pre-Measures and the Carathéodory Extension Theorem

Definition 5.8. Let \({\mathcal U}\) be an algebra on \(X\). An additive function \(\rho_o : {\mathcal U} \to [0,\infty]\) is called a pre-measure if it is countably additive whenever a countable disjoint union of sets in \({\mathcal U}\) falls in \({\mathcal U}\) — condition (Pre-Sigma-Add).

Theorem 5.10. (Carathéodory Extension Theorem.) Let \(\rho_o : {\mathcal U} \to [0,\infty]\) be a pre-measure on an algebra \({\mathcal U}\). Then there exists a positive measure \(\rho : \sigma\text{-Alg}({\mathcal U}) \to [0,\infty]\) extending \(\rho_o\). If \(\rho_o\) is sigma-finite (i.e., \(X = \bigcup_n U_n\) with \(U_n \in {\mathcal U}\) and \(\rho_o(U_n) < \infty\) for all \(n\)), then this extension is unique.

[Figure: Carathéodory extension. Pre-measure μ₀ on algebra 𝒜 → outer measure μ* on 2^X → measure μ on σ(𝒜) (unique if σ-finite).]

Corollary 5.12. Every increasing cadlag function \(G : {\mathbb R} \to {\mathbb R}\) with \(G(0) = 0\) determines a unique Lebesgue-Stieltjes measure satisfying \(\mu((a,b]) = G(b) - G(a)\). In the special case \(G(x) = x\), we get the Lebesgue measure \(\mu_{\mathrm{Leb}}\).

Lecture 6: Outer Measures and the Proof of Carathéodory

The proof of Theorem 5.10 introduces the powerful tool of outer measures.

Definition 6.4. An outer measure on \(X\) is a function \(\mu^* : 2^X \to [0,\infty]\) satisfying:

\(\mu^*(\emptyset) = 0\).
Monotonicity: \(A \subseteq B \Rightarrow \mu^*(A) \le \mu^*(B)\).
Countable sub-additivity: \(\mu^*(\bigcup_n A_n) \le \sum_n \mu^*(A_n)\).

Note that an outer measure is defined on all subsets of \(X\), not just measurable ones.

Given a pre-measure \(\rho_o\) on an algebra \({\mathcal U}\), define for every \(A \subseteq X\)

\[ \mu^*(A) = \inf\!\left\{\sum_{n=1}^\infty \rho_o(U_n) : U_n \in {\mathcal U},\; A \subseteq \bigcup_{n=1}^\infty U_n\right\}. \]

Proposition 6.5 verifies this is an outer measure. The “overshoot-and-trim” idea in the proof shows that we can always find efficient covers. Lemma 6.7 shows that \(\mu^*(U) = \rho_o(U)\) for every \(U \in {\mathcal U}\) (the outer measure agrees with the pre-measure on the algebra). Lemma 6.8 establishes, via the Carathéodory measurability criterion, that the collection of \(\mu^*\)-measurable sets forms a sigma-algebra containing \({\mathcal U}\), and that \(\mu^*\) restricted to this sigma-algebra is a positive measure extending \(\rho_o\).
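A toy finite example may help to see the Carathéodory criterion at work. In its standard form, which is not spelled out above, the criterion says that \(E \subseteq X\) is \(\mu^*\)-measurable when \(\mu^*(T) = \mu^*(T \cap E) + \mu^*(T \setminus E)\) for every test set \(T \subseteq X\). The sketch below assumes a hypothetical four-point space whose algebra is generated by the two atoms \(\{1,2\}\) and \(\{3,4\}\), each of mass 1; all names are invented for the illustration.

```python
from itertools import chain, combinations

X = frozenset({1, 2, 3, 4})
# Hypothetical pre-measure, specified by its values on the atoms of the algebra.
atoms = {frozenset({1, 2}): 1.0, frozenset({3, 4}): 1.0}

def powerset(S):
    S = list(S)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(S, r) for r in range(len(S) + 1))]

def outer(A):
    """mu*(A): every set of the algebra is a union of atoms, so the
    cheapest cover of A consists of exactly the atoms that meet A."""
    return sum(w for atom, w in atoms.items() if atom & A)

def is_measurable(E):
    """Caratheodory criterion: E splits every test set T additively."""
    return all(outer(T) == outer(T & E) + outer(T - E) for T in powerset(X))

measurable = [E for E in powerset(X) if is_measurable(E)]
print(sorted(sorted(E) for E in measurable))
```

In this example the measurable sets turn out to be exactly the four sets of the algebra; a set such as \(\{1\}\) fails the test with \(T = \{1, 2\}\), since \(\mu^*(\{1\}) + \mu^*(\{2\}) = 2 > 1 = \mu^*(T)\).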

Theorem 6.9 (recasting Theorem 5.10 with uniqueness): if additionally \(\rho_o\) is sigma-finite, any two extensions of \(\rho_o\) to \(\sigma\text{-Alg}({\mathcal U})\) must agree. Uniqueness is proved by Dynkin’s π-λ theorem applied to the π-system \({\mathcal U}\).

Cantor Sets, Cantor Functions, and Singular Measures

With the Carathéodory extension in hand, Lebesgue measure \(\lambda\) on \({\mathbb R}\) is fully constructed. Before moving to the theory of measurable functions, it is illuminating to pause and examine a family of subsets and measures that reveal the subtlety of this construction. The generalized Cantor sets show that Lebesgue measure can assign any prescribed value in \([0, 1)\) to a nowhere-dense closed subset of \([0, 1]\), and the associated Cantor functions produce probability measures that are concentrated on these uncountable nowhere-dense sets; in the classical case the resulting measure is singular with respect to Lebesgue measure.

Generalized Cantor Sets

Fix a parameter \(0 < \alpha \le 1\). Begin with the unit interval \(I_{0,1} = [0, 1]\). Let \(J_{0,1}\) be the open middle sub-interval of \(I_{0,1}\) having length \(\alpha / 3\). Removing \(J_{0,1}\) leaves two closed intervals \(I_{1,1}\) and \(I_{1,2}\), each of length strictly less than \(1/2\).

Having constructed \(2^m\) closed intervals \(I_{m,1}, \ldots, I_{m,2^m}\), each of length at most \(1/2^m\), proceed inductively: from each \(I_{m,k}\) remove the open middle sub-interval \(J_{m,k}\) of length \(\alpha / 3^{m+1}\). This produces \(2^{m+1}\) closed intervals at stage \(m+1\).

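For each \(n\), let \(C_{\alpha,n}\) be the union of the stage-\(n\) intervals, and let \(C_\alpha\) be the intersection of all stages:

\[ C_{\alpha,n} = \bigcup_{k=1}^{2^n} I_{n,k}, \]

\[ C_\alpha = \bigcap_{n=1}^\infty C_{\alpha,n}. \]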

When \(\alpha = 1\), the set \(C = C_1\) is the classical Cantor middle-thirds set.

[Figure: Cantor set construction, showing the first 4 iterations of middle-third removal (n=0 to n=3).]

The construction removes precisely prescribed amounts of length at each stage. The key properties of \(C_\alpha\) are as follows.

Property 1: \(C_\alpha\) is nowhere dense. Let \(x \in C_\alpha\) and \(\varepsilon > 0\). Choose \(n\) large enough so that \(1/2^n < 2\varepsilon\). The set \(C_{\alpha,n}\) is a disjoint union of closed intervals \(I_{n,k}\), each of length at most \(1/2^n\), so any interval contained in \(C_{\alpha,n}\) lies inside a single \(I_{n,k}\). The interval \((x - \varepsilon, x + \varepsilon)\) has length \(2\varepsilon > 1/2^n\), so it is not contained in \(C_{\alpha,n}\), and therefore not in \(C_\alpha\). Hence \((x - \varepsilon, x + \varepsilon) \cap ({\mathbb R} \setminus C_\alpha) \ne \emptyset\), which shows that \(C_\alpha\) has empty interior; since \(C_\alpha\) is closed, it is nowhere dense.

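Property 2: \(\lambda(C_\alpha) = 1 - \alpha\). Stage \(n\) removes \(2^n\) disjoint open intervals of length \(\alpha / 3^{n+1}\) each, so the total length removed is

\[ \lambda([0,1] \setminus C_\alpha) = \sum_{n=0}^{\infty} 2^n \cdot \frac{\alpha}{3^{n+1}} = \frac{\alpha}{3} \sum_{n=0}^{\infty} \left(\frac{2}{3}\right)^n = \frac{\alpha}{3} \cdot \frac{1}{1 - 2/3} = \alpha. \]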

Therefore \(\lambda(C_\alpha) = 1 - \alpha\). In particular, the classical Cantor set satisfies \(\lambda(C) = 0\).

This computation is striking: by choosing \(\alpha\) close to 0, one obtains a nowhere-dense closed subset of \([0, 1]\) with Lebesgue measure arbitrarily close to 1, a so-called “fat Cantor set.” The existence of such sets underscores that topological smallness (nowhere dense, hence meagre) and measure-theoretic smallness (null) are independent notions.
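The geometric series for the removed length can be checked with exact rational arithmetic. This is a small sketch; the function name `removed_length` and the sample value \(\alpha = 1/2\) are assumptions of the example.

```python
from fractions import Fraction

def removed_length(alpha, stages):
    """Total length removed from [0, 1] after the given number of stages.

    Stage m removes 2**m open intervals, each of length alpha / 3**(m + 1).
    """
    return sum(2**m * alpha / 3**(m + 1) for m in range(stages))

alpha = Fraction(1, 2)
for stages in (1, 5, 20):
    remaining = 1 - removed_length(alpha, stages)
    # The remaining length decreases towards 1 - alpha as stages grows.
    print(stages, float(remaining))
```

After \(N\) stages the removed length equals \(\alpha(1 - (2/3)^N)\), so the remaining length \(\lambda(C_{\alpha,N})\) decreases to \(1 - \alpha\), in agreement with continuity along decreasing chains.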

The Cantor Function

Write \(I_{n,k} = [a_{n,k}, b_{n,k}]\) for the closed intervals at stage \(n\). Define a sequence of continuous functions \(\varphi_{\alpha,n} : {\mathbb R} \to {\mathbb R}\) as follows:

\(\varphi_{\alpha,n}(x) = 0\) for \(x \le 0\).
On each interval \(I_{n,k}\), \(\varphi_{\alpha,n}\) is linear with slope \(1 / (2^n (b_{n,k} - a_{n,k}))\).
On each removed interval \(J_{m,k}\) (for \(m < n\)), \(\varphi_{\alpha,n}\) is constant with value \((2k - 1) / 2^{m+1}\).
\(\varphi_{\alpha,n}(x) = 1\) for \(x \ge 1\).

\[ \varphi_\alpha := \lim_{n \to \infty} \varphi_{\alpha,n}. \]

The limit \(\varphi_\alpha\) is continuous (as the uniform limit of continuous functions) and non-decreasing. It maps \({\mathbb R}\) onto \([0, 1]\) and belongs to the class \(\mathrm{ND}_r({\mathbb R})\) of right-continuous non-decreasing functions.
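For the classical case \(\alpha = 1\), the limit function can be evaluated directly. The sketch below does not use the piecewise-linear approximants \(\varphi_{\alpha,n}\) defined above; it relies instead on the standard ternary-digit description of the classical Cantor function (read the ternary digits of \(x\), stop at the first digit 1, replace each 2 by 1, and interpret the result in binary). The function name `cantor` and the digit cutoff are assumptions of this example.

```python
def cantor(x, digits=40):
    """Approximate the classical Cantor function (alpha = 1) at x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(digits):
        x *= 3
        d = int(x)        # next ternary digit of x
        x -= d
        if d == 1:        # x lies in a removed middle third: plateau value
            return value + scale
        value += scale * (d // 2)   # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value

print(cantor(0.4), cantor(0.8), cantor(0.25))
```

The points \(0.4\) and \(0.8\) lie in the removed intervals \(J_{0,1} = (1/3, 2/3)\) and \(J_{1,2} = (7/9, 8/9)\), where the plateau values are \(1/2\) and \(3/4\), matching the formula \((2k-1)/2^{m+1}\). The point \(1/4\) belongs to the Cantor set, and its value is \(1/3\).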

The locally finite Borel measure \(\mu_{\varphi_\alpha}\) determined by \(\varphi_\alpha\) via the Lebesgue-Stieltjes correspondence is called the Cantor-Stieltjes measure associated to parameter \(\alpha\). When \(\alpha = 1\), this is the Cantor singular measure \(\mu_\varphi\).

Singular Measures

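Because \(\varphi_\alpha\) is continuous and constant on \((-\infty, 0]\), on \([1, \infty)\), and on each removed interval \(J_{m,k}\), each of these sets has \(\mu_{\varphi_\alpha}\)-measure zero. Countable sub-additivity then gives

\[ \mu_{\varphi_\alpha}({\mathbb R} \setminus C_\alpha) = 0 \quad \text{and} \quad \mu_{\varphi_\alpha}(C_\alpha) = \mu_{\varphi_\alpha}({\mathbb R}) = 1. \]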

The measure \(\mu_{\varphi_\alpha}\) is concentrated on \(C_\alpha\).

When \(\alpha = 1\), we have \(\lambda(C) = 0\) and \(\mu_\varphi(C) = 1\). In particular, \(\mu_\varphi\) and \(\lambda\) are mutually singular: \(\mu_\varphi \perp \lambda\). The Cantor singular measure is a probability measure supported on an uncountable set of Lebesgue measure zero.

The Cantor function is the prototypical example of a continuous, monotone, non-constant function whose derivative is zero Lebesgue-almost everywhere, a phenomenon impossible for absolutely continuous functions (an absolutely continuous function whose derivative vanishes almost everywhere is constant). Indeed, \(\varphi_1\) is constant on each \(J_{m,k}\), and the union of these open intervals covers \([0,1]\) up to a set of measure zero, so \(\varphi_1'(x) = 0\) for \(\lambda\)-a.e. \(x\). Yet \(\varphi_1(0) = 0\) and \(\varphi_1(1) = 1\), so the fundamental theorem of calculus in the form “\(\int_0^1 f' \, d\lambda = f(1) - f(0)\)” fails for \(\varphi_1\). This demonstrates that a measure can be “spread” over an uncountable nowhere-dense set while giving no mass to any of the removed intervals. The pathology motivates the distinction between absolute continuity and mere continuity of measures, a theme central to the Radon-Nikodym theorem developed in Part VI.

Translation Invariance and the Characterization of Lebesgue Measure

The Carathéodory extension produces Lebesgue measure from the length function on intervals. A natural question is: to what extent does this construction single out a unique measure? The following theorem gives a satisfying answer — Lebesgue measure is, up to a multiplicative constant, the only translation-invariant locally finite Borel measure on \({\mathbb R}\). This is a powerful uniqueness result whose statement does not require sigma-finiteness as a hypothesis: translation invariance alone forces the measure to be a scalar multiple of \(\lambda\).

Theorem (Characterization of Lebesgue Measure).

(i) The Lebesgue measure space \(({\mathbb R}, {\mathcal L}, \lambda)\) is translation invariant: for every \(x \in {\mathbb R}\) and \(E \in {\mathcal L}\), the translate \(E + x := \{e + x : e \in E\}\) satisfies \(E + x \in {\mathcal L}\) and \(\lambda(E + x) = \lambda(E)\).

(ii) If \(\mu : {\mathcal B}({\mathbb R}) \to [0, \infty]\) is a translation-invariant locally finite Borel measure, then \(\mu = c\lambda\) for some constant \(c \ge 0\).

Part (i) is immediate from the construction: the pre-measure \(\lambda_0((a, b]) = b - a\) is visibly translation-invariant, and the Carathéodory outer measure inherits this invariance because translating a cover produces a cover of the same total length.

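For part (ii), set \(F(y) = \mu((0, y])\) for \(y \ge 0\), which is finite because \(\mu\) is locally finite. For \(x, y \ge 0\), additivity and translation invariance give

\[ F(x + y) - F(x) = \mu((x, x + y]) = \mu((0, y]) = F(y). \]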

Setting \(x = y\) gives \(F(2y) = 2F(y)\), and by induction \(F(ny) = nF(y)\) for all \(n \in {\mathbb N}\). From this, \(F(y/n) = F(y)/n\), so \(F(qy) = qF(y)\) for all \(q \in {\mathbb Q}_{>0}\). Since \(F\) is right-continuous and monotone, a density argument extends this to \(F(ty) = tF(y)\) for all \(t \ge 0\). Taking \(y = 1\) yields \(F(t) = F(1) \cdot t\) for \(t \ge 0\), and a symmetric argument handles \(t < 0\). Setting \(c = F(1) \ge 0\), we get \(\mu((a, b]) = c(b - a)\) for all \(a < b\), and hence \(\mu = c\lambda\) on all Borel sets by the uniqueness argument of Corollary 5.5.

This characterization has a satisfying conceptual interpretation. Among all the Borel measures on \({\mathbb R}\), Lebesgue measure is picked out by the geometric requirement of translation invariance. In higher dimensions, the analogous statement holds for \(\lambda^d\) on \({\mathbb R}^d\): any translation-invariant locally finite Borel measure is a constant multiple of Lebesgue measure. The result also connects to the non-existence of translation-invariant extensions of \(\lambda\) to all subsets of \({\mathbb R}\) — a theme explored through Vitali’s construction of a non-measurable set.

Part III: Measurable Functions

Lecture 7: Measurable Functions and Bor(X, ℝ)

The Definition and Basic Structure

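Definition. Let \((X, {\mathcal M})\) be a measurable space. A function \(f : X \to {\mathbb R}\) is called measurable if

\[ f^{-1}(B) \in {\mathcal M} \text{ for every } B \in {\mathcal B}_{\mathbb R}. \]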

More generally, one says \(f\) is \({\mathcal M}/{\mathcal N}\)-measurable if \(f^{-1}(N) \in {\mathcal M}\) for all \(N \in {\mathcal N}\).

Notation 7.3. The space of all measurable functions from \((X, {\mathcal M})\) to \({\mathbb R}\) is denoted \(\mathrm{Bor}(X, {\mathbb R})\), or more carefully \(\mathrm{Bor}(X, {\mathcal M}, {\mathbb R})\).

Four Tools for Proving Measurability

The course develops four practical tools for establishing that a function is measurable.

Tool 1 (Proposition 7.4 — composition). If \(f : X \to Y\) is \({\mathcal M}/{\mathcal N}\)-measurable and \(g : Y \to Z\) is \({\mathcal N}/{\mathcal P}\)-measurable, then \(g \circ f\) is \({\mathcal M}/{\mathcal P}\)-measurable.

Tool 2 (Proposition 7.5 — generator criterion). If \({\mathcal N} = \sigma\text{-Alg}({\mathcal C})\), then \(f : X \to Y\) is \({\mathcal M}/{\mathcal N}\)-measurable if and only if \(f^{-1}(C) \in {\mathcal M}\) for every \(C \in {\mathcal C}\). For real-valued functions, it suffices to check preimages of half-infinite intervals \((-\infty, c)\), or \((-\infty, c]\), or \((c, \infty)\).

Tool 3. Pointwise algebraic operations preserve measurability: if \(f, g \in \mathrm{Bor}(X, {\mathbb R})\) then so are \(f + g\), \(fg\), \(|f|\), \(\max(f,g)\), \(\min(f,g)\), and \(\lambda f\) for \(\lambda \in {\mathbb R}\).

Proposition 7.10 makes this precise: \(\mathrm{Bor}(X, {\mathbb R})\) is simultaneously an algebra of functions (under pointwise addition and multiplication) and a lattice (under pointwise max and min).

Lecture 8: Limits of Measurable Functions and Simple Approximation

Stability Under Pointwise Limits

Tool 4 (Proposition 8.3). The pointwise limit of a sequence of measurable functions is measurable: if \(f_n \in \mathrm{Bor}(X, {\mathbb R})\) for all \(n\) and \(f(x) = \lim_{n\to\infty} f_n(x)\) exists for every \(x \in X\), then \(f \in \mathrm{Bor}(X, {\mathbb R})\).

The key observation is that sets like \(\{x : \limsup_n f_n(x) > c\}\) can be expressed in terms of countable unions and intersections of the measurable sets \(\{x : f_n(x) > c\}\).

Proposition 8.5 extends this: \(\sup_n f_n\), \(\inf_n f_n\), \(\limsup_n f_n\), and \(\liminf_n f_n\) all lie in \(\mathrm{Bor}(X, {\mathbb R})\) (or in the extended real-valued version) whenever each \(f_n\) does.

Simple Functions

Definition 8.8. A simple function is a measurable function \(s : X \to {\mathbb R}\) that takes only finitely many values. Any simple function can be written as \(s = \sum_{i=1}^k c_i \chi_{A_i}\) where \(A_1, \ldots, A_k\) are pairwise disjoint measurable sets covering \(X\), and \(c_1, \ldots, c_k \in {\mathbb R}\).

Proposition 8.10. (Binning/approximation.) Every \(f \in \mathrm{Bor}^+(X, {\mathbb R})\) (non-negative measurable function) can be approximated from below by simple functions: there exist simple functions \(s_1 \le s_2 \le \cdots \le f\) with \(s_n(x) \nearrow f(x)\) for every \(x \in X\).

Construction. For each \(n \in {\mathbb N}\), define the binning function by setting \(s_n(x) = k/2^n\) when \(f(x) \in [k/2^n, (k+1)/2^n)\) for \(k = 0, 1, \ldots, n \cdot 2^n - 1\), and \(s_n(x) = n\) when \(f(x) \ge n\). Each \(s_n\) is measurable, and the sequence increases to \(f\) pointwise.
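The binning construction is easy to evaluate pointwise. The sketch below computes \(s_n(x)\) from the value \(f(x)\) alone; the function name `s_n` and the sample value \(f(x) = \pi\) are assumptions of this illustration.

```python
import math

def s_n(fx, n):
    """Value at a point of the n-th binning function, given fx = f(x) >= 0."""
    if fx >= n:
        return float(n)                      # cap at height n
    return math.floor(fx * 2**n) / 2**n      # left endpoint of the dyadic bin

fx = math.pi   # hypothetical value f(x)
print([s_n(fx, n) for n in range(1, 8)])
```

For small \(n\) the cap \(s_n = n\) is active; once \(n > f(x)\) the error satisfies \(0 \le f(x) - s_n(x) < 2^{-n}\). Monotonicity in \(n\) comes from the fact that each dyadic bin at level \(n\) splits into two bins at level \(n+1\).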

Part IV: Integration Theory

Lecture 9: The Integral on L⁺

The Framework

Integration is developed in two stages. First, we define the integral for non-negative measurable functions; then we extend to signed integrable functions.

Let \((X, {\mathcal M}, \mu)\) be a measure space. We write \(\mathrm{Bor}^+(X, {\mathbb R})\) for the collection of non-negative measurable functions (including those that take the value \(+\infty\)).

The L⁺ Functional on Simple Functions

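Definition. Let \(s \in \mathrm{Bor}^+(X, {\mathbb R})\) be a simple function written in canonical form \(s = \sum_{i=1}^k c_i \chi_{A_i}\), with \(c_i \ge 0\) and \(A_1, \ldots, A_k \in {\mathcal M}\) pairwise disjoint. With the convention \(0 \cdot \infty = 0\), define

\[ L^+_s(s) := \sum_{i=1}^k c_i \cdot \mu(A_i) \in [0,\infty]. \]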

This does not depend on which canonical form is chosen for \(s\).
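Independence of the chosen representation can be checked by hand on a finite measure space. The following sketch assumes a hypothetical three-point space with point masses `weights`; the names `mu` and `integral_simple` are invented for the example.

```python
from fractions import Fraction

# Hypothetical finite measure space: X = {a, b, c} with point masses.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def mu(A):
    """Measure of a subset A of X: the sum of its point masses."""
    return sum((weights[x] for x in A), Fraction(0))

def integral_simple(terms):
    """L^+_s of s = sum of c_i * chi_{A_i}, given as (c_i, A_i) pairs
    with pairwise disjoint sets A_i."""
    return sum((c * mu(A) for c, A in terms), Fraction(0))

# Two ways of writing the same simple function s(a) = s(b) = 2, s(c) = 5.
coarse = [(2, {"a", "b"}), (5, {"c"})]
fine = [(2, {"a"}), (2, {"b"}), (5, {"c"})]
print(integral_simple(coarse), integral_simple(fine))
```

Both representations give \(5/2\): refining the partition only splits a term \(c \cdot \mu(A)\) into pieces, and finite additivity of \(\mu\) puts them back together.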

Lemma 9.5 and Proposition 9.6 establish that \(L^+_s\) is:

Monotone: \(s \le t \Rightarrow L^+_s(s) \le L^+_s(t)\).
Positively homogeneous: \(L^+_s(\lambda s) = \lambda L^+_s(s)\) for \(\lambda \ge 0\).
Additive: \(L^+_s(s + t) = L^+_s(s) + L^+_s(t)\).

Extension to All of L⁺

For a general \(f \in \mathrm{Bor}^+(X, {\mathbb R})\), define

\[ L^+(f) := \sup\!\left\{ L^+_s(s) : s \text{ simple}, 0 \le s \le f \right\} \in [0,\infty]. \]

Proposition 9.10 confirms that this extension preserves monotonicity and positive homogeneity, and agrees with \(L^+_s\) on simple functions. Moreover, if \(f = 0\) a.e.-\(\mu\) then \(L^+(f) = 0\), even if \(f\) is not identically zero.

Lecture 10: The Monotone Convergence Theorem

Statement and Proof

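Theorem. (Monotone Convergence Theorem.) Let \((f_n)_{n=1}^\infty\) be an increasing sequence in \(\mathrm{Bor}^+(X, {\mathbb R})\) and let \(f = \lim_n f_n\) pointwise. Then

\[ L^+(f) = \lim_{n\to\infty} L^+(f_n). \]

In integral notation, this reads

\[ \int_X f\, d\mu = \lim_{n\to\infty} \int_X f_n\, d\mu. \]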

The proof proceeds in two steps. The inequality \(L^+(f) \ge \lim_n L^+(f_n)\) is easy: since \(f \ge f_n\) for all \(n\), monotonicity gives \(L^+(f) \ge L^+(f_n)\) for all \(n\). The reverse inequality requires a clever bootstrapping argument.

Lemma 10.10 establishes the key step: if \(f_n \nearrow f\) and \(s\) is any simple function with \(0 \le s \le f\), then for any \(0 < \varepsilon < 1\) the sets \(B_n = \{x : f_n(x) \ge (1-\varepsilon) s(x)\}\) form an increasing chain with union \(X\). Monotonicity gives \(L^+(f_n) \ge (1-\varepsilon) L^+_s(s \chi_{B_n})\), and by continuity of \(\mu\) along increasing chains the right-hand side converges to \((1-\varepsilon) L^+_s(s)\) as \(n \to \infty\). Hence \(\lim_n L^+(f_n) \ge (1-\varepsilon) L^+_s(s)\), and since \(\varepsilon\) is arbitrary, \(\lim_n L^+(f_n) \ge L^+_s(s)\). Taking the supremum over \(s\) gives the result.

The MCT makes \(L^+\) additive on all of \(\mathrm{Bor}^+(X,{\mathbb R})\) — Proposition 10.8 — because we can approximate any two non-negative measurable functions by increasing sequences of simple functions and pass to the limit.
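A very small instance of the MCT can be computed exactly. The sketch assumes counting measure on \(X = \{1, 2, 3, \ldots\}\), the function \(f(k) = 2^{-k}\), and the truncations \(f_n = f \chi_{\{1,\ldots,n\}}\), which increase pointwise to \(f\); these choices and the names `f` and `integral_fn` are made for illustration only.

```python
from fractions import Fraction

def f(k):
    """f(k) = 2^{-k} on X = {1, 2, 3, ...}."""
    return Fraction(1, 2**k)

def integral_fn(n):
    """Integral of f_n = f * chi_{1..n} against counting measure,
    which is just the n-th partial sum of the series."""
    return sum((f(k) for k in range(1, n + 1)), Fraction(0))

for n in (1, 2, 5, 10):
    print(n, integral_fn(n))
```

The integrals equal \(1 - 2^{-n}\) and increase to \(1\), which is the integral of \(f\) against counting measure. In this setting the MCT reduces to the statement that a series of non-negative terms is the limit of its partial sums.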

Lecture 11: The Space L¹(µ) and the Integral

Positive and Negative Parts

For \(f \in \mathrm{Bor}(X, {\mathbb R})\), the positive and negative parts of \(f\) are

\[ f^+(x) = \max(f(x), 0), \qquad f^-(x) = \max(-f(x), 0). \]

Both \(f^+\) and \(f^-\) lie in \(\mathrm{Bor}^+(X, {\mathbb R})\), and \(f = f^+ - f^-\), \(|f| = f^+ + f^-\).

Integrable Functions

Definition 11.3. A function \(f \in \mathrm{Bor}(X, {\mathbb R})\) is called integrable (or in \(L^1(\mu)\)) if both \(L^+(f^+) < \infty\) and \(L^+(f^-) < \infty\), equivalently if \(L^+(|f|) < \infty\).

For \(f \in L^1(\mu)\), the integral of \(f\) is defined by

\[ L(f) = \int_X f\, d\mu := L^+(f^+) - L^+(f^-) \in {\mathbb R}. \]

The functional \(L : L^1(\mu) \to {\mathbb R}\) is linear: \(L(f + g) = L(f) + L(g)\) and \(L(\lambda f) = \lambda L(f)\) for \(\lambda \in {\mathbb R}\). Linearity is proved by carefully handling the positive and negative parts of sums.

Proposition 11.7. \(|L(f)| \le L(|f|) = \int_X |f|\, d\mu\).

The standard notation also includes integrals over subsets: \(\int_A f\, d\mu := L(f \chi_A)\) for \(A \in {\mathcal M}\).

Almost-everywhere equality. If \(f = g\) a.e.-\(\mu\) (i.e., \(\mu(\{x : f(x) \ne g(x)\}) = 0\)), then \(\int f\, d\mu = \int g\, d\mu\).

Complex-Valued Integrable Functions

So far we have defined the integral for real-valued measurable functions. However, much of modern analysis — Fourier analysis, spectral theory, quantum mechanics — requires integration of complex-valued functions. The extension from \({\mathbb R}\) to \({\mathbb C}\) is straightforward once the real theory is in place, and we describe it now.

Given a measure space \((X, {\mathcal M}, \mu)\), we introduce the following notation for spaces of measurable functions:

\({\mathcal M}(X, {\mathcal M})\) denotes the set of all \({\mathcal M}\)-measurable functions \(f : X \to {\mathbb C}\).
\({\mathcal M}^{\mathbb R}(X, {\mathcal M})\) denotes the real-valued measurable functions \(f : X \to {\mathbb R}\).
\({\mathcal M}^+(X, {\mathcal M})\) denotes the non-negative measurable functions \(f : X \to [0, \infty)\).

For \(f \in {\mathcal M}^{\mathbb R}(X, {\mathcal M})\), the positive part \(f^+ = \max\{f, 0\}\) and the negative part \(f^- = \max\{-f, 0\}\) both belong to \({\mathcal M}^+(X, {\mathcal M})\), with the decomposition \(f = f^+ - f^-\) and \(|f| = f^+ + f^-\). For a complex-valued \(f \in {\mathcal M}(X, {\mathcal M})\), the absolute value function \(|\cdot| : {\mathbb C} \to [0, \infty)\) is continuous and hence Borel measurable, so the composition \(|f| \in {\mathcal M}^+(X, {\mathcal M})\).

Definition. The space of integrable complex-valued functions is

\[ {\mathcal L}(\mu) = {\mathcal L}(X, {\mathcal M}, \mu) = \left\{ f \in {\mathcal M}(X, {\mathcal M}) : \int_X |f|\, d\mu < \infty \right\}. \]

For \(f \in {\mathcal L}(\mu)\), the integral is defined by

\[ \int_X f\, d\mu = \int_X (\operatorname{Re} f)^+\, d\mu - \int_X (\operatorname{Re} f)^-\, d\mu + i\left(\int_X (\operatorname{Im} f)^+\, d\mu - \int_X (\operatorname{Im} f)^-\, d\mu\right). \]

Each of the four integrals on the right is finite because \((\operatorname{Re} f)^\pm \le |f|\) and \((\operatorname{Im} f)^\pm \le |f|\), so the integrability of \(|f|\) controls all four pieces.

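Proposition. \({\mathcal L}(\mu)\) is a complex vector space and the integral is linear on it: for \(f, g \in {\mathcal L}(\mu)\) and \(c \in {\mathbb C}\), the functions \(f + g\) and \(cf\) are integrable, and

\[ \int_X (f + g)\, d\mu = \int_X f\, d\mu + \int_X g\, d\mu, \qquad \int_X (cf)\, d\mu = c \int_X f\, d\mu. \]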

Proof sketch. Integrability follows from \(|f + g| \le |f| + |g|\) and \(|cf| = |c||f|\). Linearity of the integral reduces to linearity of the real-valued integral applied to the real and imaginary parts.

The following estimate is the complex analogue of Proposition 11.7:

\[ \left|\int_X f\, d\mu\right| \le \int_X |f|\, d\mu. \]

Proof sketch. Write \(\int f\, d\mu = r e^{i\theta}\) in polar form. Then \(r = e^{-i\theta} \int f\, d\mu = \int e^{-i\theta} f\, d\mu\). Since \(r\) is real and non-negative, \(r = \operatorname{Re} \int e^{-i\theta} f\, d\mu = \int \operatorname{Re}(e^{-i\theta} f)\, d\mu \le \int |e^{-i\theta} f|\, d\mu = \int |f|\, d\mu\).
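
As a quick sanity check of the definition and of this estimate, here is a worked example. The choice of \(X = [0, \pi]\) with Lebesgue measure \(\lambda\) and \(f(x) = e^{ix}\) is an assumption made for illustration, not something taken from the lectures.

\[ \int_{[0,\pi]} e^{ix}\, d\lambda(x) = \int_0^\pi \cos x\, dx + i \int_0^\pi \sin x\, dx = 0 + 2i, \qquad \left| \int_{[0,\pi]} e^{ix}\, d\lambda(x) \right| = 2 \le \pi = \int_{[0,\pi]} |e^{ix}|\, d\lambda(x). \]

The inequality is strict here because the values \(e^{ix}\) point in different directions, so cancellation occurs inside the integral.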

The extension to complex-valued functions is essential for Fourier analysis and the spectral theory of operators, where one naturally works with complex exponentials and complex measures. In particular, the \(L^p\) spaces defined later can equally well consist of complex-valued functions, and the Riesz representation theorem for \(L^2\) generalizes to complex Hilbert spaces with a sesquilinear inner product \(\langle f, g \rangle = \int f \bar{g}\, d\mu\).

Lecture 12: The Lebesgue Dominated Convergence Theorem and Fatou’s Lemma

Reverse MCT and the Dominated Convergence Theorem

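Theorem (Lebesgue Dominated Convergence, LDCT). Let \((f_n)\) be a sequence in \(\mathrm{Bor}(X,{\mathbb R})\) converging pointwise to \(f\), and suppose there is a \(g \in \mathrm{Bor}^+(X,{\mathbb R})\) with \(\int_X g\, d\mu < \infty\) such that \(|f_n| \le g\) for all \(n\). Then \(f\) and all the \(f_n\) are integrable, and

\[ \lim_{n\to\infty} \int_X f_n\, d\mu = \int_X f\, d\mu. \]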

Theorem 12.3 gives a sharper formulation: \(\lim_{n\to\infty} \int |f_n - f|\, d\mu = 0\) under the same hypotheses, which is the statement of convergence in \(L^1\).

The proof uses the reverse MCT (Lemma 12.5): if \((h_n)\) is a decreasing sequence in \(\mathrm{Bor}^+(X,{\mathbb R})\) with \(L^+(h_1) < \infty\) and \(h_n \searrow h\), then \(L^+(h_n) \to L^+(h)\). This follows from the MCT applied to the differences \(h_1 - h_n \nearrow h_1 - h\).

Fatou’s Lemma

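Lemma (Fatou, in terms of \(L^+\)). For every sequence \((f_n)\) in \(\mathrm{Bor}^+(X,{\mathbb R})\),

\[ L^+\!\left(\liminf_{n\to\infty} f_n\right) \le \liminf_{n\to\infty} L^+(f_n). \]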

This is deduced from the MCT applied to the infimum tails \(\inf_{k \ge n} f_k \nearrow \liminf f_n\), combined with monotonicity.

Proposition 12.8. (Fatou’s Lemma.) The same inequality holds: \(\int \liminf f_n\, d\mu \le \liminf \int f_n\, d\mu\). The moral: the integral of a limit is at most the limit of the integrals, but not necessarily equal (equality requires extra domination, as in LDCT).
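
A standard example shows that the inequality can be strict. It is not taken from the lectures, and the choice of \({\mathbb R}\) with Lebesgue measure \(\lambda\) is an assumption made for illustration. Take \(f_n = \chi_{[n, n+1]}\):

\[ \liminf_{n\to\infty} f_n = 0 \ \text{pointwise}, \qquad \int_{\mathbb R} \liminf_{n\to\infty} f_n\, d\lambda = 0 < 1 = \liminf_{n\to\infty} \int_{\mathbb R} f_n\, d\lambda. \]

No integrable function dominates all the \(f_n\) at once (their supremum is \(\chi_{[1,\infty)}\)), which is consistent with the LDCT.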

Differentiation Under the Integral Sign

One of the most useful applications of the dominated convergence theorem is the rigorous justification for interchanging differentiation and integration. In practice, one often encounters a family of integrals depending on a parameter and needs to compute the derivative with respect to that parameter. The following result, known as the Leibniz integral rule, provides precise conditions under which this interchange is valid.

Proposition (Leibniz Integral Rule). Let \((X, {\mathcal M}, \mu)\) be a measure space and let \(f : X \times (a, b) \to {\mathbb C}\) satisfy:

(1) For each \(s \in (a, b)\), the function \(f(\cdot, s) \in {\mathcal L}(\mu)\).
(2) For each \((x, s) \in X \times (a, b)\), the partial derivative \[ \frac{\partial}{\partial s} f(x, s) = \lim_{h \to 0} \frac{f(x, s+h) - f(x, s)}{h} \] exists.
(3) There exists \(g \in {\mathcal L}^+(\mu)\) (i.e., \(g \ge 0\) and \(\int g\, d\mu < \infty\)) such that for each \(s \in (a, b)\), \[ \left|\frac{\partial f}{\partial s}(x, s)\right| \le g(x) \quad \mu\text{-a.e.} \]

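Then the function

\[ F(s) = \int_X f(x, s)\, d\mu(x) \]

is differentiable on \((a, b)\), with

\[ F'(s) = \int_X \frac{\partial f}{\partial s}(x, s)\, d\mu(x). \]

Proof sketch. Fix \(s \in (a, b)\) and a sequence \(h_n \to 0\) with \(h_n \ne 0\) and \(s + h_n \in (a, b)\), and set

\[ \varphi_n(x) = \frac{f(x, s + h_n) - f(x, s)}{h_n}. \]

By the differentiability hypothesis, \(\varphi_n(x) \to \frac{\partial f}{\partial s}(x, s)\) for every \(x\), and the mean value inequality combined with the domination hypothesis gives \(|\varphi_n| \le g\). Since \(\int_X \varphi_n\, d\mu = (F(s + h_n) - F(s))/h_n\) by linearity, the LDCT yields

\[ \lim_{n \to \infty} \int_X \varphi_n\, d\mu = \int_X \frac{\partial f}{\partial s}(x, s)\, d\mu(x). \]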

Since the limit is the same for every sequence \(h_n \to 0\), the derivative \(F'(s)\) exists and equals the right-hand side.

This result gives rigorous justification for the common practice of “differentiating under the integral sign” — a technique Feynman famously championed as a powerful problem-solving tool. The domination hypothesis (3) is the key: without a uniform integrable bound on the partial derivatives, the interchange can fail. In applications, verifying the domination condition is usually the main work; once it is established, the conclusion follows immediately from the LDCT.
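
To see the hypotheses at work, consider the following example, which is chosen here for illustration and is not taken from the lectures: \(X = [0, \infty)\) with Lebesgue measure \(\lambda\), \(f(x, s) = e^{-sx}\), and a parameter interval \((a, b)\) with \(0 < a < b < \infty\). The domination condition is verified by

\[ F(s) = \int_{[0,\infty)} e^{-sx}\, d\lambda(x) = \frac{1}{s}, \qquad \left| \frac{\partial f}{\partial s}(x, s) \right| = x e^{-sx} \le x e^{-ax} =: g(x), \qquad \int_{[0,\infty)} g\, d\lambda = \frac{1}{a^2} < \infty. \]

Differentiating under the integral sign then gives

\[ F'(s) = \int_{[0,\infty)} (-x)\, e^{-sx}\, d\lambda(x) = -\frac{1}{s^2}, \]

in agreement with differentiating \(1/s\) directly. The dominating function depends on the left endpoint \(a > 0\): on \((0, b)\) the supremum over \(s\) of \(x e^{-sx}\) is \(x\), which is not integrable, so one works on intervals bounded away from 0.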

More generally, the same argument shows that if all partial derivatives up to order \(k\) satisfy analogous domination conditions, then \(F\) is \(k\)-times differentiable and each derivative can be computed by differentiating under the integral sign. This higher-order version is indispensable in the theory of characteristic functions in probability, where one differentiates \(\int e^{itx}\, d\mu(x)\) to compute moments.

Part V: Lᵖ Spaces

Lecture 13: Lᵖ Spaces, Hölder, and Minkowski

The Lᵖ Spaces

Definition. For \(1 \le p < \infty\), define

\[ L^p(\mu) := \left\{ f \in \mathrm{Bor}(X, {\mathbb R}) : \int_X |f|^p\, d\mu < \infty \right\}, \]

and equip it with the Lᵖ seminorm \(\|f\|_p := \left(\int |f|^p\, d\mu\right)^{1/p}\). For \(p = \infty\), one uses the essential supremum: \(\|f\|_\infty := \inf\{M : |f| \le M \text{ a.e.-}\mu\}\).

Proposition 13.3. \(L^p(\mu)\) is a linear subspace of \(\mathrm{Bor}(X, {\mathbb R})\). The proof that \(f + g \in L^p\) whenever \(f, g \in L^p\) uses the pointwise inequality \(|f + g|^p \le 2^p(|f|^p + |g|^p)\).

Hölder’s Inequality

Conjugate exponents: \(p\) and \(q\) are conjugate if \(1/p + 1/q = 1\) (with the convention \(1/\infty = 0\), so 1 and ∞ are conjugate).

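Proposition (Hölder’s inequality). Let \(p, q \in (1, \infty)\) be conjugate exponents, \(f \in L^p(\mu)\) and \(g \in L^q(\mu)\). Then \(fg \in L^1(\mu)\) and

\[ \int_X |fg|\, d\mu \le \|f\|_p \cdot \|g\|_q. \]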

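Proof. We may assume \(\|f\|_p > 0\) and \(\|g\|_q > 0\). Young’s inequality \(ab \le a^p/p + b^q/q\) (for \(a, b \ge 0\)), applied pointwise to \(a = |f(x)|/\|f\|_p\) and \(b = |g(x)|/\|g\|_q\) and then integrated over \(X\), gives \(\int_X |fg|\, d\mu \,/\, (\|f\|_p \|g\|_q) \le 1/p + 1/q = 1\), which is the result.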

For \(p = q = 2\), Hölder’s inequality becomes the Cauchy-Schwarz inequality: \(\int |fg|\, d\mu \le \|f\|_2 \|g\|_2\).
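
A small numerical check, with data chosen here purely for illustration: on \([0, 1]\) with Lebesgue measure \(\lambda\), take \(f(x) = x\) and \(g(x) = x^2\).

\[ \int_{[0,1]} |fg|\, d\lambda = \int_0^1 x^3\, dx = \frac14, \qquad \|f\|_2 \|g\|_2 = \left(\int_0^1 x^2\, dx\right)^{1/2} \left(\int_0^1 x^4\, dx\right)^{1/2} = \frac{1}{\sqrt{3}} \cdot \frac{1}{\sqrt{5}} = \frac{1}{\sqrt{15}} \approx 0.258. \]

So \(0.25 \le 0.258\). The two sides are close but not equal, since equality would require \(|f|\) and \(|g|\) to be proportional a.e.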

Minkowski’s Inequality

Proposition (Minkowski’s inequality). For \(1 \le p \le \infty\) and \(f, g \in L^p(\mu)\),

\[ \|f + g\|_p \le \|f\|_p + \|g\|_p. \]

This is the triangle inequality for \(\|\cdot\|_p\). The proof for \(1 < p < \infty\) uses Hölder: write \(|f+g|^p = |f+g| \cdot |f+g|^{p-1} \le (|f|+|g|)|f+g|^{p-1}\), apply Hölder with exponents \(p\) and \(q\), and simplify using \((p-1)q = p\).
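
Spelled out, the computation runs as follows. This is a sketch for \(1 < p < \infty\), assuming \(\|f + g\|_p \ne 0\), with \(q\) the conjugate exponent of \(p\):

\[ \|f+g\|_p^p = \int_X |f+g|^p\, d\mu \le \int_X |f|\,|f+g|^{p-1}\, d\mu + \int_X |g|\,|f+g|^{p-1}\, d\mu \le \left(\|f\|_p + \|g\|_p\right) \left( \int_X |f+g|^{(p-1)q}\, d\mu \right)^{1/q} = \left(\|f\|_p + \|g\|_p\right) \|f+g\|_p^{p/q}. \]

Dividing by \(\|f+g\|_p^{p/q}\), which is finite and non-zero, and using \(p - p/q = 1\) gives \(\|f+g\|_p \le \|f\|_p + \|g\|_p\).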

The quotient space. Since \(\|f\|_p = 0\) for every \(f\) that vanishes a.e.-\(\mu\), not only for \(f = 0\), \(\|\cdot\|_p\) is only a seminorm on \(L^p(\mu)\). The natural normed space is the quotient \({\mathcal L}^p(\mu) = L^p(\mu) / \!\sim\), where \(f \sim g \Leftrightarrow f = g\) a.e.-\(\mu\). By mild abuse of notation, both the quotient and the function space are written \(L^p(\mu)\).

Lecture 14: Completeness of Lᵖ and Modes of Convergence

Banach Space Completeness

A normed space is a Banach space if every Cauchy sequence converges.

The key mechanism (Lemma 14.8): given a Cauchy sequence \((f_n)\) in \(L^p(\mu)\), one can extract a subsequence \((f_{n_k})\) with \(\|f_{n_{k+1}} - f_{n_k}\|_p \le 1/2^k\). The telescoping sums \(F_N = \sum_{k=1}^N |f_{n_{k+1}} - f_{n_k}|\) form an increasing sequence in \(\mathrm{Bor}^+(X,{\mathbb R})\) with bounded \(L^p\) norms (by Minkowski), so by MCT their pointwise limit \(F = \sum_{k=1}^\infty |f_{n_{k+1}} - f_{n_k}|\) lies in \(L^p\). Wherever \(F < \infty\) (which is a.e.), the telescoping series converges absolutely, defining a limit function \(f\).

Proposition 14.9. (Riesz-Fischer.) \(L^p(\mu)\) is a Banach space for all \(1 \le p \le \infty\).

Corollary 14.10. Every Cauchy sequence in \(L^p\) has a subsequence converging pointwise a.e.-\(\mu\).

Corollary 14.11. Every absolutely convergent series in \(L^p\), that is, every series \(\sum_{k=1}^\infty f_k\) with \(\sum_{k=1}^\infty \|f_k\|_p < \infty\), converges pointwise a.e.-\(\mu\) and in \(L^p\) norm.

Modes of Convergence

Three modes of convergence are compared for sequences in \(L^p\).

Definition 14.12. A sequence \((f_n)\) converges almost everywhere (a.e.) to \(f\) if \(\mu(\{x : f_n(x) \not\to f(x)\}) = 0\).

Definition 14.13. \((f_n)\) converges in Lᵖ to \(f\) if \(\|f_n - f\|_p \to 0\).

Definition 14.14. \((f_n)\) converges in probability (or in measure) to \(f\) if for every \(\varepsilon > 0\): \(\mu(\{x : |f_n(x) - f(x)| \ge \varepsilon\}) \to 0\) as \(n \to \infty\).

Proposition 14.15 records the implications: Lᵖ convergence implies convergence in probability (by Markov’s inequality); a.e. convergence with a dominating function in \(L^p\) implies Lᵖ convergence (by LDCT). Convergence in probability does not imply a.e. convergence, and on an infinite measure space a.e. convergence need not imply convergence in probability. However, every sequence that converges in probability has a subsequence that converges a.e.
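
Two short computations may help here; the notation and the example are chosen for illustration and are not taken from the lectures. First, for \(1 \le p < \infty\), Markov’s inequality gives the quantitative link between Lᵖ convergence and convergence in probability:

\[ \mu(\{x : |f_n(x) - f(x)| \ge \varepsilon\}) \le \frac{1}{\varepsilon^p} \int_X |f_n - f|^p\, d\mu = \frac{\|f_n - f\|_p^p}{\varepsilon^p}. \]

Second, the standard “typewriter” sequence on \([0,1]\) with Lebesgue measure \(\lambda\) converges to 0 in Lᵖ and in probability, but at no point of \([0,1]\). For \(n = 2^k + j\) with \(0 \le j < 2^k\), put

\[ f_n = \chi_{[j/2^k,\, (j+1)/2^k]}, \qquad \|f_n\|_p^p = \lambda\!\left([j/2^k,\, (j+1)/2^k]\right) = 2^{-k} \to 0. \]

Every \(x \in [0,1]\) lies in at least one of the intervals at each level \(k\), so \(f_n(x) = 1\) for infinitely many \(n\) and \(f_n(x) \not\to 0\). The subsequence \(f_{2^k} = \chi_{[0,\, 2^{-k}]}\) does converge to 0 a.e.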

Egoroff’s Theorem

The gap between pointwise a.e. convergence and uniform convergence is bridged by Egoroff’s theorem, which says that on a finite measure space, a.e. convergence is “almost” uniform. This is a remarkably strong structural result: it tells us that measurable functions on finite measure spaces behave much more tamely than one might expect.

Definition. A sequence \((f_n)\) converges to \(f\) \(\mu\)-almost uniformly if for every \(\varepsilon > 0\), there exists \(E \in {\mathcal M}\) with \(\mu(E) < \varepsilon\) such that \(f_n \to f\) uniformly on \(X \setminus E\).

Almost uniform convergence is strictly stronger than pointwise a.e. convergence but weaker than uniform convergence. The idea is that one is allowed to discard a set of arbitrarily small measure, on whose complement convergence is genuinely uniform.

Theorem (Egoroff). Suppose \((X, {\mathcal M}, \mu)\) is a finite measure space, \((f_n) \subseteq {\mathcal M}(X, {\mathcal M})\), \(f \in {\mathcal M}(X, {\mathcal M})\), and \(\lim_{n \to \infty} f_n = f\) \(\mu\)-a.e. Then \(f_n \to f\) \(\mu\)-almost uniformly.

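Proof sketch. For \(n, k \ge 1\), let

\[ E_{n,k} = \bigcup_{m=n}^{\infty} \left\{x : |f_m(x) - f(x)| \ge \frac{1}{k}\right\}. \]

For fixed \(k\), the sets \(E_{n,k}\) decrease as \(n\) grows, and \(\bigcap_{n} E_{n,k}\) is contained in the \(\mu\)-null set where \(f_n \not\to f\). Since \(\mu\) is finite, continuity from above gives

\[ \lim_{n \to \infty} \mu(E_{n,k}) = \mu\!\left(\bigcap_{n=1}^{\infty} E_{n,k}\right) = 0. \]

Given \(\varepsilon > 0\), choose for each \(k\) an index \(n_k\) with \(\mu(E_{n_k, k}) < \varepsilon/2^k\), and set \(E = \bigcup_{k=1}^\infty E_{n_k, k}\). Then

\[ \mu(E) \le \sum_{k=1}^{\infty} \mu(E_{n_k, k}) < \sum_{k=1}^{\infty} \frac{\varepsilon}{2^k} = \varepsilon. \]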

For \(x \in X \setminus E\), we have \(x \notin E_{n_k, k}\) for every \(k\), which means \(|f_n(x) - f(x)| < 1/k\) for all \(n \ge n_k\). This is precisely uniform convergence on \(X \setminus E\).

The finiteness hypothesis is essential. Consider the sequence \(f_n = \chi_{[n, n+1]}\) on \({\mathbb R}\) with Lebesgue measure: \(f_n \to 0\) pointwise everywhere, but for any set \(E\) with \(\lambda(E) < 1\), the complement \({\mathbb R} \setminus E\) meets \([n, n+1]\) for every \(n\), so \(\sup_{{\mathbb R} \setminus E} |f_n| = 1\) for all \(n\) and convergence on \({\mathbb R} \setminus E\) is not uniform. The problem is that Lebesgue measure on \({\mathbb R}\) is not finite, so the continuity-from-above argument breaks down.

Egoroff’s theorem has important consequences. For instance, it provides a quick proof that convergence in measure follows from a.e. convergence on finite measure spaces: if \(f_n \to f\) a.e. and \(\mu(X) < \infty\), then for any \(\varepsilon, \delta > 0\), Egoroff gives a set \(E\) with \(\mu(E) < \delta\) and uniform convergence on \(X \setminus E\); for large enough \(n\), \(|f_n - f| < \varepsilon\) on \(X \setminus E\), so \(\mu(\{|f_n - f| \ge \varepsilon\}) \le \mu(E) < \delta\).
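
A concrete instance, chosen here for illustration: on \([0, 1]\) with Lebesgue measure \(\lambda\), let \(f_n(x) = x^n\). Then \(f_n \to 0\) on \([0,1)\), hence a.e., but not uniformly on \([0,1)\), since \(\sup_{0 \le x < 1} x^n = 1\) for every \(n\). Given \(\varepsilon \in (0, 1)\), discard the set \(E = (1 - \varepsilon/2,\, 1]\):

\[ \lambda(E) = \frac{\varepsilon}{2} < \varepsilon, \qquad \sup_{x \in [0,1] \setminus E} |f_n(x) - 0| = \left(1 - \frac{\varepsilon}{2}\right)^n \to 0 \quad (n \to \infty). \]

So the convergence is uniform off a set of measure less than \(\varepsilon\), as Egoroff’s theorem predicts.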

Riemann and Lebesgue Integrals

A natural question is how the Lebesgue integral relates to the Riemann integral from elementary analysis. The answer is reassuring: when the Riemann integral exists, it agrees with the Lebesgue integral. This means that the Lebesgue theory is a genuine extension of the classical theory, not a rival to it.

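Theorem. If \(f : [a, b] \to {\mathbb R}\) is Riemann integrable, then \(f\) is Lebesgue integrable on \([a, b]\) and

\[ \int_{[a,b]} f\, d\lambda = \int_a^b f(x)\, dx. \]

Proof sketch. Choose step functions \(\varphi_n \le f\) coming from lower Riemann sums over partitions whose mesh tends to 0, so that \(\int_{[a,b]} \varphi_n\, d\lambda\) converges to the Riemann integral. The \(\varphi_n\) are uniformly bounded by \(\sup |f|\), and a subsequence \((\varphi_{n_j})\) converges to \(f\) \(\lambda\)-a.e., so the LDCT gives

\[ \int_{[a,b]} f\, d\lambda = \lim_{j \to \infty} \int_{[a,b]} \varphi_{n_j}\, d\lambda = \text{Riemann integral}. \]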

In particular, continuous functions on \([a, b]\) are Riemann integrable, hence Lebesgue integrable. The equality also extends to improper Riemann integrals of non-negative functions: if \(f \ge 0\) is Riemann integrable on each \([a, b - 1/n]\) and the improper integral \(\int_a^b f\) is finite, apply the MCT to the sequence \(f \cdot \chi_{[a, b - 1/n]}\), which increases pointwise to \(f \cdot \chi_{[a, b)}\), and note that \(f \cdot \chi_{[a, b)}\) and \(f \cdot \chi_{[a, b]}\) differ in at most one point, hence have the same Lebesgue integral.

The converse fails: there exist Lebesgue-integrable functions that are not Riemann integrable. The standard example is \(f = \chi_{{\mathbb Q} \cap [0,1]}\), which equals 0 \(\lambda\)-a.e. and so has Lebesgue integral 0, but whose upper and lower Riemann sums are 1 and 0 respectively, so the Riemann integral does not exist. The Lebesgue theory thus strictly extends the Riemann theory, while agreeing with it whenever both are defined.

Lecture 15: L² as a Hilbert Space and Riesz Representation

Inner Product Structure of L²

For \(f, g \in L^2(\mu)\), define the inner product

\[ \langle f, g \rangle := \int_X f(x)\, g(x)\, d\mu(x). \]

This is well-defined by Cauchy-Schwarz and makes \(L^2(\mu)\) into an inner product space. The associated norm is \(\|f\|_2 = \langle f, f \rangle^{1/2}\), and since \(L^2\) is complete (by Riesz-Fischer), it is a Hilbert space.

Proposition 15.5. (Cauchy-Schwarz.) \(|\langle f, g \rangle| \le \|f\|_2 \|g\|_2\), which is Hölder for \(p = q = 2\).

Bounded Linear Functionals

Definition 15.4. A linear map \(\Phi : L^2(\mu) \to {\mathbb R}\) is bounded if there exists \(C > 0\) such that \(|\Phi(f)| \le C \|f\|_2\) for all \(f \in L^2(\mu)\). The smallest such \(C\) is the operator norm \(\|\Phi\|\).

For any \(h \in L^2(\mu)\), the map \(\Phi_h : f \mapsto \langle f, h \rangle\) is a bounded linear functional with \(\|\Phi_h\| = \|h\|_2\). The Riesz representation theorem says all bounded linear functionals arise this way.

Riesz Representation Theorem for L²

Theorem 15.6 / 15.9. (Riesz Representation.) Every bounded linear functional \(\Phi : L^2(\mu) \to {\mathbb R}\) is of the form \(\Phi(f) = \langle f, h \rangle\) for a unique \(h \in L^2(\mu)\).

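Proof sketch. The key tool is orthogonal projection onto a closed subspace. Given a bounded linear functional \(\Phi\), let \(K = \ker(\Phi)\), a closed subspace of \(L^2\). If \(\Phi = 0\), take \(h = 0\). Otherwise choose \(g \notin K\) and decompose \(g = g_K + g_\perp\), where \(g_K \in K\) is the projection of \(g\) onto \(K\) and \(g_\perp \ne 0\) is orthogonal to \(K\). For any \(f \in L^2(\mu)\), the function \(f - (\Phi(f)/\Phi(g_\perp))\, g_\perp\) lies in \(K\), and taking its inner product with \(g_\perp\) shows that \(h = (\Phi(g_\perp)/\|g_\perp\|_2^2)\, g_\perp\) represents \(\Phi\).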

Lemma 15.11 establishes the projection existence: for any closed convex \(C\) in a Hilbert space, there is a unique closest point in \(C\) to any given point. Lemma 15.12 shows that the projection onto a closed subspace is characterized by the orthogonality condition \(\langle f - f_K, k \rangle = 0\) for all \(k \in K\).

The parallelogram law (Lemma 15.10) — \(\|f+g\|^2 + \|f-g\|^2 = 2\|f\|^2 + 2\|g\|^2\) — plays a key role in establishing the uniqueness of closest-point projections.

The Dual of Lᵖ

The Riesz representation theorem for \(L^2\) generalizes beautifully to all \(L^p\) spaces with \(1 < p < \infty\). Where the \(L^2\) result relied on the Hilbert space structure (inner products and orthogonal projections), the general \(L^p\) result requires the Radon-Nikodym theorem as a substitute — the bounded linear functional defines a measure, and Radon-Nikodym extracts a density. This interplay between functional analysis and measure theory is one of the highlights of the subject.

Definition. For a complex normed space \((L, \|\cdot\|)\), the dual space \(L^*\) is the space of bounded linear functionals \(\Phi : L \to {\mathbb C}\), equipped with the operator norm \(\|\Phi\|_* = \sup\{|\Phi(f)| : \|f\| \le 1\}\).

For \(g \in L^q(\mu)\) (where \(1/p + 1/q = 1\)), the map \(\Phi_g(f) = \int fg\, d\mu\) defines a bounded linear functional on \(L^p(\mu)\) with \(\|\Phi_g\|_* = \|g\|_q\). The inequality \(\|\Phi_g\|_* \le \|g\|_q\) follows from Hölder’s inequality. The reverse inequality \(\|\Phi_g\|_* \ge \|g\|_q\) is obtained, for \(g \ne 0\), by testing against the function \(f = |g|^{q-1} \operatorname{sgn}(\bar{g}) / \|g\|_q^{q-1}\), which lies in \(L^p\) with \(\|f\|_p = 1\) and satisfies \(\Phi_g(f) = \|g\|_q\).
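
The two claims about the test function can be checked directly, using \((q-1)p = q\) and \(\operatorname{sgn}(\bar g)\, g = |g|\). This is a sketch for \(g \ne 0\) and \(1 < p < \infty\):

\[ \|f\|_p^p = \int_X \frac{|g|^{(q-1)p}}{\|g\|_q^{(q-1)p}}\, d\mu = \frac{\int_X |g|^q\, d\mu}{\|g\|_q^q} = 1, \qquad \Phi_g(f) = \int_X \frac{|g|^{q-1} \operatorname{sgn}(\bar g)\, g}{\|g\|_q^{q-1}}\, d\mu = \frac{\|g\|_q^q}{\|g\|_q^{q-1}} = \|g\|_q. \]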

Theorem (Lᵖ Duality, Riesz Representation). Let \((X, {\mathcal M}, \mu)\) be a measure space, \(1 < p < \infty\), \(1/p + 1/q = 1\). The map \(g \mapsto \Phi_g\) is an isometric surjection from \(L^q(\mu)\) onto \((L^p(\mu))^*\).

Proof sketch (for \(\mu\) finite). Given \(\Phi \in (L^p)^*\), define \(\nu(E) = \Phi(\chi_E)\). This is a complex measure with \(\nu \ll \mu\) (since \(\mu(E) = 0\) implies \(\chi_E = 0\) in \(L^p\)). By Radon-Nikodym, \(\nu = g \cdot \mu\) for some \(g \in {\mathcal L}(\mu)\). One then shows \(\Phi(f) = \int fg\, d\mu\) for all \(f \in L^p\) by approximation (first for simple functions, then by density). The bound \(\|g\|_q \le \|\Phi\|\) comes from testing \(\Phi\) against truncations of \(|g|^{q-1} \operatorname{sgn}(\bar{g})\).

For the \(\sigma\)-finite case: localize to sets of finite measure, obtain local densities \(g_E\), and use coherence together with the MCT to build the global density.

Remark. For \(p = 1\), the dual of \(L^1(\mu)\) is \(L^\infty(\mu)\) when \(\mu\) is \(\sigma\)-finite, but this requires additional machinery. For \(p = \infty\), \((L^\infty)^*\) is strictly larger than \(L^1\) in general — it contains finitely additive set functions, not just countably additive ones.

The \(L^p\) duality theorem is one of the workhorses of functional analysis. It tells us that the “evaluation-against-a-function” pairing between \(L^p\) and \(L^q\) exhausts all continuous linear functionals — there are no “exotic” bounded linear functionals hiding beyond integration. This fact is crucial for the theory of weak convergence in \(L^p\) spaces and for establishing reflexivity: since the dual of \(L^p\) is \(L^q\) and the dual of \(L^q\) is \(L^p\) (for \(1 < p < \infty\)), these spaces are reflexive Banach spaces, a property with far-reaching consequences in optimization and PDE theory.

Signed Measures and the Hahn Decomposition

The Radon-Nikodym theorem in its full generality requires us to move beyond positive measures. A signed measure on \((X, {\mathcal M})\) is a function \(\nu : {\mathcal M} \to {\mathbb R}\) (finite values only) satisfying \(\nu(\emptyset) = 0\) and countable additivity: if \(E_1, E_2, \ldots \in {\mathcal M}\) are pairwise disjoint, then \(\nu(\bigcup E_i) = \sum \nu(E_i)\), where the series converges absolutely.

Signed measures arise naturally in two ways. First, if \(\mu_1, \mu_2\) are finite positive measures, then \(\nu = \mu_1 - \mu_2\) is a signed measure. Second — and more importantly — if \(f \in {\mathcal L}(\mu)\) (i.e., \(f \in L^1(\mu)\)), then \(\nu(E) = \int_E f\, d\mu\) defines a signed measure. This second construction is the canonical source: signed measures are what you get when you integrate a function that changes sign.

A set \(E \in {\mathcal M}\) is positive for \(\nu\) if \(\nu(F) \ge 0\) for all \(F \subseteq E\) with \(F \in {\mathcal M}\). It is negative if \(\nu(F) \le 0\) for all such \(F\). It is null if \(\nu(F) = 0\) for all such \(F\). Note that a positive set is not merely a set with \(\nu(E) \ge 0\) — every measurable subset must also have non-negative measure.

Lemma. (i) If \(P\) is positive and \(Q \subseteq P\), then \(Q\) is positive. (ii) If \(P_1, P_2, \ldots\) are positive, then \(\bigcup P_i\) is positive.

Proof of (ii). Let \(Q_1 = P_1\), \(Q_{n+1} = P_{n+1} \setminus \bigcup_{i=1}^n P_i\), so the \(Q_i\) are disjoint positive sets with \(\bigcup Q_i = \bigcup P_i\). For \(E \subseteq \bigcup P_i\) with \(E \in {\mathcal M}\), we have \(\nu(E) = \sum \nu(E \cap Q_i) \ge 0\).

The lemma tells us that positivity is hereditary and closed under countable unions — positive sets form a robust class stable under the standard set operations of measure theory.

Theorem (Hahn Decomposition). Let \((X, {\mathcal M}, \nu)\) be a signed measure space. There exist \(P, N \in {\mathcal M}\) such that:

(i) \(P\) is positive for \(\nu\), (ii) \(N\) is negative for \(\nu\), (iii) \(P \cup N = X\), \(P \cap N = \emptyset\).

Furthermore, if \(P', N'\) is another such decomposition, then \(P \triangle P'\) and \(N \triangle N'\) are null for \(\nu\).

Proof sketch (four steps).

Step I: If \(E \in {\mathcal M}\) and \(\varepsilon > 0\), there exists a measurable \(E_\varepsilon \subseteq E\) such that \(\nu(E_\varepsilon) \ge \nu(E)\) and \(\nu(B) \ge -\varepsilon\) for all measurable \(B \subseteq E_\varepsilon\). (Proof: If not, one finds pairwise disjoint measurable sets \(B_1, B_2, \ldots \subseteq E\) with \(\nu(B_n) < -\varepsilon\) for every \(n\), choosing \(B_{n+1}\) inside \(E \setminus (B_1 \cup \cdots \cup B_n)\). This produces \(\nu(\bigcup B_i) = \sum \nu(B_i) = -\infty\), contradicting the finiteness of \(\nu\).)

Step II: If \(E \in {\mathcal M}\) with \(\nu(E) > 0\), there exists a positive \(P \subseteq E\) with \(\nu(P) \ge \nu(E)\). (Use Step I with \(\varepsilon = 1/n\) repeatedly, refining \(E\).)

Step III: Let \(s = \sup\{\nu(E) : E \in {\mathcal M}\}\). Find \(E_1, E_2, \ldots\) with \(\nu(E_n) \to s\). Let \(P_n \subseteq E_n\) be positive with \(\nu(P_n) \ge \nu(E_n)\). Set \(P = \bigcup P_n\), which is positive by the Lemma. Since \(P\) is positive and \(P_n \subseteq P\), we have \(\nu(P) \ge \nu(P_n)\) for every \(n\), hence \(\nu(P) \ge \lim \nu(E_n) = s\). By the definition of \(s\), \(\nu(P) \le s\), so \(\nu(P) = s\).

Step IV: \(N = X \setminus P\) is negative. If not, there exists \(E \subseteq N\) with \(\nu(E) > 0\), hence a positive \(P' \subseteq E\) with \(\nu(P') > 0\). But \(P \cup P'\) is positive with \(\nu(P \cup P') = s + \nu(P') > s\), contradicting the definition of \(s\).

Essential uniqueness: Any two Hahn decompositions differ only on null sets. If \(P, N\) and \(P', N'\) are two decompositions, then \(P \setminus P' = P \cap N'\) and \(P' \setminus P = P' \cap N\) are each contained in both a positive set and a negative set. Hence any measurable \(E \subseteq P \triangle P'\) splits into two pieces of \(\nu\)-measure zero, forcing \(\nu(E) = 0\).

The Hahn decomposition reveals that a signed measure concentrates its positivity and negativity on two complementary pieces of the space. This is the signed-measure analogue of writing a real number as the difference of its positive and negative parts, and it enables the Jordan decomposition \(\nu = \nu^+ - \nu^-\) where \(\nu^+(E) = \nu(E \cap P)\) and \(\nu^-(E) = -\nu(E \cap N)\). The total variation \(|\nu| = \nu^+ + \nu^-\) is then a finite positive measure. The Jordan decomposition is unique (since the Hahn decomposition is essentially unique), and it provides the canonical way to express any signed measure as a difference of two positive measures with disjoint supports.
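
For the canonical example \(\nu(E) = \int_E f\, d\mu\) with real-valued \(f \in {\mathcal L}(\mu)\), all of these objects can be written down explicitly. This is a sketch; the choice of \(P\) below is one of several valid ones, since the points where \(f = 0\) may be placed on either side.

\[ P = \{x : f(x) \ge 0\}, \quad N = \{x : f(x) < 0\}, \quad \nu^+(E) = \int_E f^+\, d\mu, \quad \nu^-(E) = \int_E f^-\, d\mu, \quad |\nu|(E) = \int_E |f|\, d\mu. \]

For instance, with \(\mu\) Lebesgue measure on \([-1, 1]\) and \(f(x) = x\) (an assumed example), one gets \(\nu^+([-1,1]) = \nu^-([-1,1]) = 1/2\), so \(\nu([-1,1]) = 0\) while \(|\nu|([-1,1]) = 1\).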

Part VI: The Radon-Nikodym Theorem

Lecture 16: Inequalities Between Measures and a Preliminary Radon-Nikodym

Linear Combinations and Ordering of Measures

Definition 16.1. Let \((X, {\mathcal M})\) be a measurable space and \(\mu, \nu : {\mathcal M} \to [0,\infty]\) positive measures. The linear combination \(\alpha\mu + \beta\nu\) for \(\alpha, \beta \ge 0\) is again a positive measure, defined pointwise: \((\alpha\mu + \beta\nu)(A) = \alpha \mu(A) + \beta \nu(A)\).

Definition 16.3. We write \(\nu \le \mu\) to mean \(\nu(A) \le \mu(A)\) for all \(A \in {\mathcal M}\). This defines a partial order on positive measures.

Density Integration and the Radon-Nikodym Preliminary Version

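Definition. Let \(\mu\) be a positive measure on \((X, {\mathcal M})\) and \(h \in \mathrm{Bor}^+(X,{\mathbb R})\). We write \(d\nu = h\, d\mu\) for the positive measure \(\nu\) defined by

\[ \nu(A) = \int_A h\, d\mu = \int_X \chi_A(x)\, h(x)\, d\mu(x), \quad \forall A \in {\mathcal M}. \]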

The function \(h\) is called the density of \(\nu\) with respect to \(\mu\).

Lemma 16.7. If \(d\nu = h\, d\mu\) with \(0 \le h(x) \le 1\) for all \(x\), then \(\nu \le \mu\).

Theorem 16.11. (Preliminary Radon-Nikodym.) Let \((X, {\mathcal M}, \mu)\) be a finite measure space and \(\nu : {\mathcal M} \to [0,\infty)\) a finite positive measure with \(\nu \le \mu\). Then there exists a density \(h \in \mathrm{Bor}^+(X,{\mathbb R})\) with \(0 \le h(x) \le 1\) for all \(x\), such that \(d\nu(x) = h(x)\, d\mu(x)\).

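Proof. Since \(\nu \le \mu\), any \(f \in L^2(\mu)\) satisfies \(\|f\|_{L^2(\nu)} \le \|f\|_{L^2(\mu)}\). By Cauchy-Schwarz and the finiteness of \(\nu\), \(\left|\int_X f\, d\nu\right| \le \nu(X)^{1/2} \|f\|_{L^2(\nu)} \le \nu(X)^{1/2} \|f\|_{L^2(\mu)}\), so the map \(\Phi(f) = \int_X f\, d\nu\) is a bounded linear functional on \(L^2(\mu)\). By the Riesz Representation Theorem, there exists \(h \in L^2(\mu)\) with \(\Phi(f) = \int_X f\, h\, d\mu\) for all \(f \in L^2(\mu)\). Choosing \(f = \chi_A\), which lies in \(L^2(\mu)\) because \(\mu\) is finite, gives \(\nu(A) = \int_A h\, d\mu\). One then argues that \(0 \le h(x) \le 1\) a.e.-\(\mu\) by testing against indicator functions of the sets \(\{h < 0\}\) and \(\{h > 1\}\), and redefines \(h\) on a \(\mu\)-null set so that the bounds hold everywhere.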

Lecture 17: Absolute Continuity, Radon-Nikodym, and Lebesgue Decomposition

Absolute Continuity

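Definition. Let \(\mu, \nu\) be positive measures on \((X, {\mathcal M})\). We say that \(\nu\) is absolutely continuous with respect to \(\mu\), written \(\nu \ll \mu\), if

\[ \mu(A) = 0 \Rightarrow \nu(A) = 0, \quad \forall A \in {\mathcal M}. \]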

Example 17.2. If \(d\nu = h\, d\mu\) for some density \(h \in \mathrm{Bor}^+(X,{\mathbb R})\), then \(\nu \ll \mu\). Indeed, if \(\mu(A) = 0\) then \(\int_A h\, d\mu = 0\).

The word “continuity” is justified by Proposition 17.3: if \(\nu(X) < \infty\), then \(\nu \ll \mu\) if and only if for every \(\varepsilon > 0\) there exists \(\delta > 0\) such that \(\mu(A) < \delta \Rightarrow \nu(A) < \varepsilon\). The proof of \((\Rightarrow)\) is a contradiction argument in the style of Borel-Cantelli: if the condition fails for some \(\varepsilon\), choose bad sets \(A_n\) with \(\mu(A_n) < 1/2^n\) and \(\nu(A_n) \ge \varepsilon\), and consider the tail set \(A = \bigcap_{n} \bigcup_{m \ge n} A_m\). Then \(\mu(A) = 0\), while continuity from above along the decreasing chain \(\bigcup_{m \ge n} A_m\), which requires \(\nu\) to be finite, gives \(\nu(A) \ge \varepsilon\).

Remark 17.4 gives a counterexample showing that the hypothesis \(\nu(X) < \infty\) cannot be dropped: on \({\mathbb N}\), take \(\nu\) to be the counting measure and \(\mu\) the weighted measure with \(\mu(\{n\}) = 1/2^n\). Then \(\nu \ll \mu\), but \(\mu(\{n\}) \to 0\) while \(\nu(\{n\}) = 1\), so the \(\varepsilon\)-\(\delta\) condition fails for this infinite \(\nu\).

The Full Radon-Nikodym Theorem

Theorem 17.5. (Radon-Nikodym for finite measures.) Let \(\mu, \nu : {\mathcal M} \to [0,\infty)\) be finite positive measures with \(\nu \ll \mu\). Then there exists a density \(h \in \mathrm{Bor}^+(X,{\mathbb R}) \cap L^1(\mu)\) such that \(d\nu(x) = h(x)\, d\mu(x)\).

The Connecting Function

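Proposition (connecting function). Let \(\mu, \nu\) be finite positive measures on \((X, {\mathcal M})\). Then there exists a connecting function, that is, a \(g \in \mathrm{Bor}^+(X,{\mathbb R})\) with \(0 \le g \le 1\) satisfying the connecting equation (17.1):

\[ \int_X f\, g\, d\mu = \int_X f\,(1-g)\, d\nu, \quad \forall f \text{ bounded, } f \in \mathrm{Bor}^+(X,{\mathbb R}). \]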

Proof. Let \(\rho = \mu + \nu\). Since \(\nu \le \rho\), Theorem 16.11 provides a density \(g\) with \(d\nu = g\, d\rho\). Writing \(d\rho = d\mu + d\nu\) and substituting, the connecting equation (17.1) follows by cancellation.

Lemma 17.10. If \(N = \{x : g(x) = 1\}\), then \(\mu(N) = 0\). (Take \(f = \chi_N\) in the connecting equation.)

Proofs of Radon-Nikodym and Lebesgue Decomposition

Proof of Theorem 17.5. Let \(g\) be a connecting function, \(N = \{g = 1\}\). Since \(\mu(N) = 0\) and \(\nu \ll \mu\), we also have \(\nu(N) = 0\). Modify \(g\) to \(\tilde g = g \cdot (1 - \chi_N)\), ensuring \(0 \le \tilde g < 1\) everywhere. Set \(h = \tilde g / (1 - \tilde g)\). For any \(A \in {\mathcal M}\), approximate \(\chi_A / (1 - \tilde g)\) by the partial sums \(f_n = \chi_A (1 + \tilde g + \cdots + \tilde g^n)\), apply the connecting equation to each \(f_n\), and pass to the limit via the MCT. The result is \(\nu(A) = \int_A h\, d\mu\).
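
In more detail, the limiting argument can be written out as follows. This is a sketch, using that \(\tilde g = g\) holds both \(\mu\)-a.e. and \(\nu\)-a.e., so that \(\tilde g\) also satisfies the connecting equation. Since \(f_n (1 - \tilde g) = \chi_A (1 - \tilde g^{\,n+1})\) and \(f_n\, \tilde g = \chi_A(\tilde g + \cdots + \tilde g^{\,n+1})\),

\[ \int_A \left(\tilde g + \tilde g^{\,2} + \cdots + \tilde g^{\,n+1}\right) d\mu = \int_X f_n\, \tilde g\, d\mu = \int_X f_n\,(1 - \tilde g)\, d\nu = \int_A \left(1 - \tilde g^{\,n+1}\right) d\nu. \]

As \(n \to \infty\), the left integrand increases to \(\tilde g/(1 - \tilde g) = h\) and the right integrand increases to 1 (because \(0 \le \tilde g < 1\)), so the MCT applied on both sides gives \(\int_A h\, d\mu = \nu(A)\).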

Definition 17.6. We say \(\mu\) is concentrated on a set \(P \in {\mathcal M}\) if \(\mu(X \setminus P) = 0\). Two measures are mutually singular, written \(\mu \perp \nu\), if there exist disjoint \(P, Q \in {\mathcal M}\) with \(\mu\) concentrated on \(P\) and \(\nu\) concentrated on \(Q\).

Theorem 17.7. (Lebesgue Decomposition.) Any finite positive measure \(\nu\) on a finite measure space \((X,{\mathcal M},\mu)\) decomposes uniquely as \(\nu = \nu_1 + \nu_2\) where \(\nu_1 \ll \mu\) and \(\nu_2 \perp \mu\).

Proof. From the connecting function \(g\), set \(N = \{g=1\}\), then \(\nu_1(A) = \nu(A \cap (X \setminus N))\) and \(\nu_2(A) = \nu(A \cap N)\). The fact that \(\mu(N) = 0\) ensures \(\nu_1 \ll \mu\) (via Exercise 17.11), and the construction forces \(\nu_2 \perp \mu\) (since \(\mu\) is concentrated on \(X \setminus N\) and \(\nu_2\) on \(N\)).
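
A toy example, assumed here for illustration, makes the roles of \(g\) and \(N\) concrete. On \(X = [0, 1]\) let \(\mu = \lambda\) be Lebesgue measure and \(\nu = \lambda + \delta_0\), where \(\delta_0\) is the point mass at 0. Then \(\rho = \mu + \nu = 2\lambda + \delta_0\), and the density of \(\nu\) with respect to \(\rho\) is

\[ g(x) = \begin{cases} 1, & x = 0, \\ 1/2, & 0 < x \le 1, \end{cases} \qquad N = \{g = 1\} = \{0\}, \qquad \mu(N) = 0. \]

The two pieces of the decomposition are therefore

\[ \nu_1(A) = \nu(A \setminus \{0\}) = \lambda(A), \qquad \nu_2(A) = \nu(A \cap \{0\}) = \delta_0(A), \qquad \frac{g}{1-g} = 1 \ \text{on } (0, 1]. \]

So \(\nu_1 = \lambda \ll \mu\) with density 1, and \(\nu_2 = \delta_0 \perp \mu\).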

The General Radon-Nikodym Theorem

The Radon-Nikodym theorem extends far beyond finite positive measures. The full version handles complex measures and \(\sigma\)-finite reference measures, providing the definitive form of the density theorem that underpins modern analysis and probability theory.

Theorem (Lebesgue-Radon-Nikodym, General Version). Let \((X, {\mathcal M})\) be a measurable space, \(\nu : {\mathcal M} \to {\mathbb C}\) a complex measure, and \(\mu : {\mathcal M} \to [0,\infty]\) a \(\sigma\)-finite positive measure. Then:

(i) There exists a unique complex measure \(\rho : {\mathcal M} \to {\mathbb C}\) such that \(\rho \perp \mu\) and \(\nu - \rho \ll \mu\).

(ii) There exists \(f \in {\mathcal L}(\mu)\) such that \(\nu - \rho = f \cdot \mu\) (i.e., \((\nu - \rho)(E) = \int_E f\, d\mu\) for all \(E \in {\mathcal M}\)).

The decomposition \(\nu = \rho + f \cdot \mu\) is the Lebesgue decomposition, and \(f = d\nu/d\mu\) is the Radon-Nikodym derivative.

Proof strategy. Decompose \(\nu\) into real and imaginary parts, then into positive and negative parts (Jordan decomposition), giving four finite positive measures. Apply the finite-case Radon-Nikodym (Theorem 17.5) and Lebesgue decomposition (Theorem 17.7) to each, then reassemble.

For the \(\sigma\)-finite extension: if \(\mu\) is \(\sigma\)-finite, write \(X = \bigcup X_n\) with \(X_1 \subseteq X_2 \subseteq \cdots\) and \(\mu(X_n) < \infty\). Apply the finite case on each \(X_n\) to get densities \(f_n\). These are coherent (\(f_{n+1}|_{X_n} = f_n\) a.e.) and piece together to give \(f \in {\mathcal L}(\mu)\).

The passage from finite to \(\sigma\)-finite mirrors the coherent-family mechanism used later to build the product measure (Proposition 20.5). In both cases, the key insight is that local solutions on an exhausting chain glue together uniquely into a global object.

Chain Rule for Radon-Nikodym Derivatives

Proposition. Let \(\nu\) be a complex measure, \(\mu\) a finite positive measure, \(\lambda\) a \(\sigma\)-finite positive measure.

(i) If \(\nu \ll \lambda\), then for \(g \in {\mathcal L}(\nu)\), \(g \cdot (d\nu/d\lambda) \in {\mathcal L}(\lambda)\) and \(\int g\, d\nu = \int g \cdot (d\nu/d\lambda)\, d\lambda\).

(ii) If \(\nu \ll \mu\) and \(\mu \ll \lambda\), then \(\nu \ll \lambda\) and \(d\nu/d\lambda = (d\nu/d\mu) \cdot (d\mu/d\lambda)\) \(\lambda\)-a.e.

This chain rule is the measure-theoretic analogue of the chain rule from calculus, \(dy/dx = (dy/du)(du/dx)\). It says that Radon-Nikodym derivatives compose multiplicatively, justifying the Leibniz-style notation \(d\nu/d\mu\). In probability, this is exactly the mechanism behind the likelihood ratio: if \(P\) and \(Q\) are probability measures both absolutely continuous with respect to a reference measure \(\lambda\), and \(P \ll Q\), then \(dP/dQ = (dP/d\lambda)/(dQ/d\lambda)\) wherever \(dQ/d\lambda > 0\).
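
A quick check on \([0, 1]\), with densities chosen here purely for illustration: let \(\lambda\) be Lebesgue measure, \(d\mu = 2x\, d\lambda\) and \(d\nu = 3x^2\, d\lambda\). Then \(\nu \ll \mu \ll \lambda\), and

\[ \frac{d\nu}{d\mu}(x) = \frac{3x^2}{2x} = \frac{3x}{2} \ \ (x > 0), \qquad \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda} = \frac{3x}{2} \cdot 2x = 3x^2 = \frac{d\nu}{d\lambda}. \]

The first identity can be verified with part (i): \(\int_A \frac{3x}{2}\, d\mu = \int_A \frac{3x}{2} \cdot 2x\, d\lambda = \nu(A)\).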

Part VII: Product Measures and the Fubini-Tonelli Theorem

Lecture 18: Direct Product of Two Finite Positive Measures

The Product Measurable Space

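Definition. Let \((X, {\mathcal M})\) and \((Y, {\mathcal N})\) be measurable spaces. A measurable rectangle is a set \(A \times B\) with \(A \in {\mathcal M}\) and \(B \in {\mathcal N}\), and the product \(\sigma\)-algebra on \(X \times Y\) is

\[ {\mathcal M} \times {\mathcal N} := \sigma\text{-Alg}(\{A \times B : A \in {\mathcal M},\, B \in {\mathcal N}\}). \]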

Proposition 18.3. The collection \({\mathcal R}\) of measurable rectangles is a semi-algebra on \(X \times Y\).

Proof. (Semi-AS1) and (Semi-AS3) are clear. For (Semi-AS2): the complement \((X \times Y) \setminus (A \times B) = ((X \setminus A) \times Y) \cup (A \times (Y \setminus B))\), a disjoint union of two rectangles.

Notation 18.4. Let \({\mathcal U} = \mathrm{Alg}({\mathcal R})\), the algebra generated by the semi-algebra of rectangles. By Proposition 1.7, every set in \({\mathcal U}\) is a finite disjoint union of measurable rectangles.

Existence of the Product Measure

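Theorem 18.6. Let \((X, {\mathcal M}, \mu)\) and \((Y, {\mathcal N}, \nu)\) be finite measure spaces. There exists a unique positive measure \(\mu \times \nu\) on \((X \times Y, {\mathcal M} \times {\mathcal N})\) such that

\[ (\mu \times \nu)(A \times B) = \mu(A) \cdot \nu(B), \quad \forall A \in {\mathcal M},\, B \in {\mathcal N}. \]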

Strategy. Apply the Carathéodory extension theorem (Theorem 5.10) to a pre-measure \(\rho_o : {\mathcal U} \to [0,\infty)\) and use Dynkin for uniqueness.

Step 1. The set-function on the semi-algebra. Define \(\rho_{oo} : {\mathcal R} \to [0,\infty)\) by \(\rho_{oo}(A \times B) = \mu(A)\nu(B)\). Lemma 18.9 is the technical heart: if \(A \times B = \bigsqcup_{i=1}^\infty A_i \times B_i\), then the partial sums \(y \mapsto \sum_{i=1}^n \mu(A_i)\chi_{B_i}(y)\) increase pointwise, as \(n \to \infty\), to \(y \mapsto \mu(A)\chi_B(y)\). This is a non-trivial set-theoretic puzzle whose solution relies on a delicate slice argument. Proposition 18.10 then shows \(\rho_{oo}\) respects decompositions (using the MCT to handle the infinite series). By Proposition 2.10, \(\rho_{oo}\) extends to a finitely additive \(\rho_o : {\mathcal U} \to [0,\infty)\).

Step 2. Pre-measure property. Lemma 18.12 and Proposition 18.13 confirm \(\rho_o\) is a pre-measure by bootstrapping from Proposition 18.10 through the structure of \({\mathcal U}\) as finite disjoint unions of rectangles.

Proof of Theorem 18.6. Apply Carathéodory to \(\rho_o\) to get \(\mu \times \nu\). Uniqueness follows from Dynkin since \({\mathcal R}\) is a π-system generating \({\mathcal M} \times {\mathcal N}\).

Lecture 19: Slicing and the Theorem of Tonelli (Finite Case)

Slices of Sets and Functions

Notation 19.2. For \(E \subseteq X \times Y\) and \(f : X \times Y \to {\mathbb R}\):

The vertical slice at \(x \in X\): \(E_{[x]} = \{y \in Y : (x,y) \in E\}\), and \(f_{[x]}(y) = f(x,y)\).
The horizontal slice at \(y \in Y\): \(E^{[y]} = \{x \in X : (x,y) \in E\}\), and \(f^{[y]}(x) = f(x,y)\).

Proposition 19.3. Slicing preserves measurability: if \(E \in {\mathcal M} \times {\mathcal N}\) then \(E_{[x]} \in {\mathcal N}\) for all \(x\), and \(E^{[y]} \in {\mathcal M}\) for all \(y\). Similarly for functions in \(\mathrm{Bor}(X \times Y, {\mathbb R})\).

Proof. The embedding maps \(V_x : y \mapsto (x,y)\) and \(H_y : x \mapsto (x,y)\) are measurable (by the generator criterion, Tool 2, applied to the semi-algebra \({\mathcal R}\)). Slices are preimages under these maps.

Computing µ×ν by Slicing

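For finite positive measures \(\mu\), \(\nu\) and every \(E \in {\mathcal M} \times {\mathcal N}\), the slice functions \(x \mapsto \nu(E_{[x]})\) and \(y \mapsto \mu(E^{[y]})\) are measurable, and

\[ (\mu \times \nu)(E) = \int_X \nu(E_{[x]})\, d\mu(x) = \int_Y \mu(E^{[y]})\, d\nu(y). \]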

The proof proceeds via Dynkin’s π-λ theorem applied to the collection of “good” sets \({\mathcal G}\) for which the statement holds (Lemma 19.8). The collection \({\mathcal G}\) is stable under complements (I), finite disjoint unions (II), and increasing chains (III) — making it a λ-system. Since \({\mathcal G}\) contains all measurable rectangles (Lemma 19.9) and \({\mathcal R}\) is a π-system, Lemma 19.11 (Dynkin’s π-λ theorem in λ-system form) gives \({\mathcal G} \supseteq \sigma\text{-Alg}({\mathcal R}) = {\mathcal M} \times {\mathcal N}\).
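
A small sanity check of the slicing formula, with a set chosen here for illustration: let \(X = Y = [0,1]\), both with Lebesgue measure, and \(E = \{(x,y) : 0 \le y \le x \le 1\}\). Then \(E_{[x]} = [0,x]\) and \(E^{[y]} = [y,1]\), so

\[ \int_0^1 \nu(E_{[x]})\, dx = \int_0^1 x\, dx = \frac12, \qquad \int_0^1 \mu(E^{[y]})\, dy = \int_0^1 (1-y)\, dy = \frac12, \]

and both values agree with the area of the triangle \(E\).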

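As an application, let \((X, {\mathcal M}, \mu)\) be a finite measure space and let \(f \in \mathrm{Bor}^+(X, {\mathbb R})\) satisfy \(0 \le f \le c\). Then

\[ \int_X f\, d\mu = \int_0^c \mu(\{f \ge t\})\, dt. \]

This converts an abstract integral into a one-variable Lebesgue integral of the decreasing function \(t \mapsto \mu(\{f \ge t\})\), and follows by evaluating \((\mu \times \lambda_{\mathrm{Leb}})(\{(x,t) \in X \times [0,c] : t \le f(x)\})\) two ways.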
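
To see the formula in action, here is an example not taken from the lecture: \(X = [0,1]\) with Lebesgue measure, \(f(x) = x^2\), and \(c = 1\). For \(0 \le t \le 1\) the superlevel set is \(\{f \ge t\} = [\sqrt{t}, 1]\), so

\[ \int_0^1 \mu(\{f \ge t\})\, dt = \int_0^1 (1 - \sqrt{t})\, dt = 1 - \frac23 = \frac13 = \int_0^1 x^2\, dx. \]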

Theorem of Tonelli (Finite, Bounded Case)

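Theorem 19.12. (Tonelli, finite bounded case.) Let \(\mu\) and \(\nu\) be finite positive measures and let \(f \in \mathrm{Bor}^+(X \times Y, {\mathbb R})\) be bounded. Then the slice integrals \(x \mapsto \int_Y f(x,y)\, d\nu(y)\) and \(y \mapsto \int_X f(x,y)\, d\mu(x)\) are measurable, and the following identity, labelled (19.7), holds:

\[ \int_{X \times Y} f(x,y)\, d(\mu \times \nu)(x,y) = \int_X\!\!\left(\int_Y f(x,y)\, d\nu(y)\right) d\mu(x) = \int_Y\!\!\left(\int_X f(x,y)\, d\mu(x)\right) d\nu(y). \]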

Proof sketch. The collection \({\mathcal F}\) of functions for which (19.7) holds contains the indicator function of every set \(E \in {\mathcal M} \times {\mathcal N}\) (this is the slicing formula for \((\mu \times \nu)(E)\), which extends Lemma 19.9 from rectangles to all measurable sets) and is stable under:

Pointwise limits of increasing sequences (by MCT).
Linear combinations with non-negative coefficients.

By bootstrapping from indicator functions to simple functions and then to general bounded \(\mathrm{Bor}^+\) functions, one concludes that \({\mathcal F}\) contains every bounded function in \(\mathrm{Bor}^+(X \times Y, {\mathbb R})\).

Lecture 20: Sigma-Finite Measures, Full Tonelli, and Fubini

Sigma-Finite Measure Spaces

Definition 20.1. A measure space \((X, {\mathcal M}, \mu)\) is sigma-finite if there exists an increasing chain \(U_1 \subseteq U_2 \subseteq \cdots\) of sets from \({\mathcal M}\) with \(\bigcup_n U_n = X\) and \(\mu(U_n) < \infty\) for all \(n\). Such a sequence is called an exhausting chain of sets of finite measure.

Every finite measure space is sigma-finite (take \(U_n = X\)). The Lebesgue measure space \(({\mathbb R}, {\mathcal B}_{\mathbb R}, \mu_{\mathrm{Leb}})\) is sigma-finite with \(U_n = [-n, n]\).

From Finite to Sigma-Finite

Notation 20.4. If \(Z \in {\mathcal M}\), the restriction of \((X, {\mathcal M}, \mu)\) to \(Z\) is the measure space \((Z, {\mathcal M}{\downarrow}_Z, \mu{\downarrow}_Z)\) where \({\mathcal M}{\downarrow}_Z = \{A \in {\mathcal M} : A \subseteq Z\}\).

Proposition 20.5. (Coherent family mechanism.) Let \({\mathcal P}\) be a sigma-algebra on \(Z\), let \(Z_1 \subseteq Z_2 \subseteq \cdots\) be an increasing chain of sets from \({\mathcal P}\) exhausting \(Z\), and write \({\mathcal P}_n = {\mathcal P}{\downarrow}_{Z_n}\). Suppose we have finite measures \(\rho_n : {\mathcal P}_n \to [0,\infty)\) satisfying the coherence condition \(\rho_{n+1}|_{\mathcal P_n} = \rho_n\). Then there is a unique measure \(\rho : {\mathcal P} \to [0,\infty]\) extending all \(\rho_n\); it is sigma-finite and is given by \(\rho(E) = \lim_{n\to\infty} \rho_n(E \cap Z_n)\).

The Product Measure for Sigma-Finite Spaces

Theorem 20.7. Let \((X, {\mathcal M}, \mu)\) and \((Y, {\mathcal N}, \nu)\) be sigma-finite measure spaces. There exists a unique sigma-finite positive measure \(\mu \times \nu : {\mathcal M} \times {\mathcal N} \to [0,\infty]\) satisfying \((\mu \times \nu)(A \times B) = \mu(A)\nu(B)\) for all \(A \in {\mathcal M}\), \(B \in {\mathcal N}\) (using \(0 \cdot \infty = 0\)).

Proof outline. Uniqueness follows from Dynkin’s theorem (Proposition 5.4) using the exhausting chain \((U_n \times V_n)\). Existence is constructed by applying Theorem 18.6 to the finite restrictions \(\mu{\downarrow}_{U_n}\) and \(\nu{\downarrow}_{V_n}\), producing finite measures \(\rho_n\) on \(({\mathcal M} \times {\mathcal N}){\downarrow}_{U_n \times V_n}\). Coherence holds by Exercise 20.6 (the restriction of the product sigma-algebra to a sub-rectangle is the product of the restricted sigma-algebras). Proposition 20.5 then assembles these into the sigma-finite product measure.

Theorem of Tonelli — Sigma-Finite Version

Theorem 20.8. (Tonelli, sigma-finite.) Let \(f \in \mathrm{Bor}^+(X \times Y, {\mathbb R})\) (possibly taking value \(\infty\)). Let \(T = \{x \in X : \int_Y f_{[x]}\, d\nu = \infty\}\). Then:

(1) \(T \in {\mathcal M}\).
(2) If \(\mu(T) > 0\), then \(\int_{X \times Y} f\, d(\mu \times \nu) = \infty\).
(3) If \(\mu(T) = 0\), define \(F(x) = \int_Y f_{[x]}\, d\nu\) for \(x \notin T\) and \(F(x) = 0\) for \(x \in T\). Then \(F \in \mathrm{Bor}^+(X,{\mathbb R})\) and: \[ \int_X F(x)\, d\mu(x) = \int_{X \times Y} f(x,y)\, d(\mu \times \nu)(x,y). \]

Proof outline. Approximate \(f\) by an increasing sequence \((f_n)\) of bounded functions supported on \(U_n \times V_n\), apply the finite version of Tonelli (Theorem 19.12) to each \(f_n\), and pass to the limit by MCT.
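
An illustration on a sigma-finite space (the function is chosen for this example): take \(X = Y = (0,\infty)\) with Lebesgue measure and \(f(x,y) = e^{-y}\chi_{\{x < y\}}(x,y)\). Here \(\int_Y f_{[x]}\, d\nu = e^{-x} < \infty\) for every \(x\), so \(T = \emptyset\), and the two orders of integration give

\[ \int_0^\infty\!\!\left(\int_x^\infty e^{-y}\, dy\right) dx = \int_0^\infty e^{-x}\, dx = 1, \qquad \int_0^\infty\!\!\left(\int_0^y e^{-y}\, dx\right) dy = \int_0^\infty y e^{-y}\, dy = 1, \]

as Theorem 20.8 (and its mirror image with the roles of \(x\) and \(y\) exchanged) predicts.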

Theorem of Fubini

For \(f \in \mathrm{Bor}(X \times Y, {\mathbb R})\), the iterated integral of \(f\), first in \(y\) and then in \(x\), is defined by

\[ \iint f(x,y)\, d\nu(y)\, d\mu(x) := \int_X\!\!\left(\int_Y f_{[x]}(y)\, d\nu(y)\right) d\mu(x), \]

provided the inner integral exists for \(\mu\)-a.e. \(x\) and the outer integral then converges. The definition is independent of the choice of the null set \(W\) on which the inner integral may fail to exist.

Remark 20.11. The iterated integral may exist in one order but not the other, and they may differ if both exist — pathologies are possible without an integrability hypothesis on \(f\). Fubini rules this out.
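
A standard example of this phenomenon (a classical counterexample, not one worked in these notes): on \((0,1) \times (0,1)\) with Lebesgue measure, let \(f(x,y) = \dfrac{x^2 - y^2}{(x^2+y^2)^2}\). Since \(\partial_y \big( y/(x^2+y^2) \big) = f(x,y)\),

\[ \int_0^1\!\!\left(\int_0^1 f(x,y)\, dy\right) dx = \int_0^1 \frac{dx}{1+x^2} = \frac{\pi}{4}, \qquad \int_0^1\!\!\left(\int_0^1 f(x,y)\, dx\right) dy = -\frac{\pi}{4}, \]

where the second value follows from the antisymmetry \(f(y,x) = -f(x,y)\). Both iterated integrals exist, yet they differ, so \(f\) cannot be integrable on the square.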

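Theorem (Fubini). Let \((X, {\mathcal M}, \mu)\) and \((Y, {\mathcal N}, \nu)\) be sigma-finite measure spaces and let \(f \in L^1(\mu \times \nu)\). Then both iterated integrals exist and

\[ \iint f(x,y)\, d\nu(y)\, d\mu(x) = \int_{X \times Y} f\, d(\mu \times \nu) = \iint f(x,y)\, d\mu(x)\, d\nu(y). \]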

Proof. Write \(f = f^+ - f^-\). Since \(f \in L^1\), both \(\int f^\pm\, d(\mu \times \nu) < \infty\). Apply Tonelli’s Theorem 20.8 to \(f^+\) and \(f^-\) separately (finite integral forces us into case 3 of Tonelli), then subtract.

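In practice, integrability is checked with Tonelli. For any \(f \in \mathrm{Bor}(X \times Y, {\mathbb R})\), applying Theorem 20.8 to \(|f|\) gives (with both sides possibly infinite)

\[ \int_{X \times Y} |f|\, d(\mu \times \nu) = \iint |f(x,y)|\, d\nu(y)\, d\mu(x). \]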

If this iterated integral is finite, then \(f \in L^1(\mu \times \nu)\) and Fubini applies.

Multidimensional Lebesgue Measure

The product measure machinery, combined with the Fubini-Tonelli theorem, yields the \(d\)-dimensional Lebesgue measure as a concrete and powerful object. Rather than constructing it from scratch, we simply iterate the one-dimensional construction using the product measure framework developed above.

Recall that \({\mathcal B}({\mathbb R}) \otimes \cdots \otimes {\mathcal B}({\mathbb R}) = {\mathcal B}({\mathbb R}^d)\) (the Borel \(\sigma\)-algebra on \({\mathbb R}^d\)). Let \(\lambda_d = \lambda \times \cdots \times \lambda : {\mathcal B}({\mathbb R}^d) \to [0,\infty]\) denote the \(d\)-dimensional Lebesgue measure, and \({\mathcal L}_d\) its completion.

The Fubini-Tonelli theorem immediately gives us the ability to compute \(d\)-dimensional integrals as iterated one-dimensional integrals, in any order we please.

\[ \int_{\mathbb{R}^d} f\, d\lambda_d = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} f(x_1, \ldots, x_d)\, dx_{\sigma(1)} \cdots dx_{\sigma(d)} \]

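for any permutation \(\sigma : [d] \to [d]\), provided \(f\) is non-negative and Borel measurable (Tonelli) or \(\lambda_d\)-integrable (Fubini).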

This freedom to reorder the variables of integration is one of the most practically useful consequences of the product measure construction. It reduces multidimensional integrals to iterated one-dimensional integrals, and the choice of ordering can dramatically simplify a computation.
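
For instance (an example chosen here, not from the lecture), take \(f(x,y) = x e^{xy}\) on \([0,1]^2\). Integrating in \(y\) first is immediate:

\[ \int_0^1\!\!\left(\int_0^1 x e^{xy}\, dy\right) dx = \int_0^1 (e^x - 1)\, dx = e - 2. \]

In the other order the inner integral already requires integration by parts, \(\int_0^1 x e^{xy}\, dx = \frac{e^y}{y} - \frac{e^y - 1}{y^2}\), and only after noticing that this is the derivative of \((e^y - 1)/y\) does one recover the same value \((e - 1) - 1 = e - 2\).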

Proposition (Translation and Linear Change of Variables). Let \(f \in {\mathcal L}({\mathbb R}^d, {\mathcal B}({\mathbb R}^d), \lambda_d)\).

(i) For \(x \in {\mathbb R}^d\), let \(T_x(y) = x + y\). Then \(f \circ T_x \in {\mathcal L}(\lambda_d)\) and \(\int f \circ T_x\, d\lambda_d = \int f\, d\lambda_d\).

(ii) For \(A \in \mathrm{GL}_d({\mathbb R})\) (invertible \(d \times d\) matrix), \(f \circ A \in {\mathcal L}(\lambda_d)\) and \(\int f \circ A\, d\lambda_d = \frac{1}{|\det A|} \int f\, d\lambda_d\).

Part (i) says that Lebesgue measure is translation-invariant — shifting the argument of a function does not change its integral. Part (ii) is the linear change-of-variables formula, giving the precise way that a linear map rescales volume.

Proof of (ii). Factor \(A\) as a product of elementary matrices: row additions \(A_{ij}\), row swaps \(S_{ij}\), and row scalings \(M_{ic}\). Row additions have determinant 1 and preserve \(\lambda_d\) by Fubini together with translation invariance. Row swaps have \(|\det| = 1\) and preserve \(\lambda_d\) by Fubini (reordering integration). Row scalings \(M_{ic}\) have determinant \(c\) and satisfy \(\int f \circ M_{ic}\, d\lambda_d = \frac{1}{|c|} \int f\, d\lambda_d\) by the one-dimensional change of variables. Writing \(A = E_1 \cdots E_n\) with each \(E_k\) elementary and using multiplicativity of the determinant: \(\int f \circ A\, d\lambda_d = \int f \circ E_1 \circ \cdots \circ E_n\, d\lambda_d = \prod_{k=1}^n \frac{1}{|\det E_k|} \int f\, d\lambda_d = \frac{1}{|\det A|} \int f\, d\lambda_d\).

This is the measure-theoretic foundation for the change-of-variables formula in multivariable calculus. The factor \(1/|\det A|\) — the reciprocal of the absolute value of the determinant — captures how the linear map stretches or compresses volume.
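
A quick check of (ii) in dimension two, with a matrix chosen for illustration: let \(A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}\), so \(|\det A| = 6\), and let \(f = \chi_{[0,1]^2}\). Then \(f(Ax) = 1\) exactly when \(0 \le 3x_2 \le 1\) and \(0 \le 2x_1 + x_2 \le 1\), and slicing in \(x_2\) gives

\[ \int f \circ A\, d\lambda_2 = \int_0^{1/3} \lambda\!\left(\left[-\tfrac{x_2}{2},\, \tfrac{1 - x_2}{2}\right]\right) dx_2 = \int_0^{1/3} \frac12\, dx_2 = \frac16 = \frac{1}{|\det A|} \int f\, d\lambda_2. \]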

Part VIII: Radon Measures

The abstract theory of measures gains considerable power when specialized to locally compact metric spaces, where the interplay between topology and measure theory produces the rich class of Radon measures. These are the measures that behave well with respect to the topology: they can be approximated from outside by open sets and from inside by compact sets.

Radon Measures and Regularity

Definition. A metric space \((X, d)\) is locally compact if every point has a compact neighbourhood: for each \(x \in X\), there exists \(\varepsilon_x > 0\) such that the closed ball \(\overline{B}_{\varepsilon_x}(x)\) is compact.

Local compactness holds in finite-dimensional spaces, where closed bounded sets are compact, and fails in infinite-dimensional normed spaces. It is the standard hypothesis under which continuous functions and Borel measures interact well.

Examples. (i) \({\mathbb R}^d\) with the Euclidean metric is locally compact (Heine-Borel). (ii) Any discrete metric space is locally compact: around each point \(x\), a closed ball of small enough radius is just the singleton \(\{x\}\), which is compact. (iii) Infinite-dimensional Banach spaces are not locally compact.

Definition. Let \((X, d)\) be a locally compact metric space. A measure \(\mu : {\mathcal B}(X) \to [0,\infty]\) is called a Radon measure if:

(Outer regularity) For \(E \in {\mathcal B}(X)\), \(\mu(E) = \inf\{\mu(U) : E \subseteq U,\, U \text{ open}\}\).
(Locally finite) For \(K \subseteq X\) compact, \(\mu(K) < \infty\).
(Inner regularity on open sets) If \(U \subseteq X\) is open, then \(\mu(U) = \sup\{\mu(K) : K \subseteq U,\, K \text{ compact}\}\).

These three conditions together ensure that the measure is tightly controlled by the topology. Outer regularity says every Borel set can be approximated from the outside by open sets, while inner regularity on open sets says open sets can be approximated from the inside by compact sets.

Proposition. Let \(\mu\) be a Radon measure. If \(E \in {\mathcal B}(X)\) satisfies \(\mu(E) < \infty\), then \(\mu\) is also inner regular on \(E\): \(\mu(E) = \sup\{\mu(K) : K \subseteq E,\, K \text{ compact}\}\). Consequently, if \(X\) is \(\sigma\)-compact (\(X = \bigcup K_n\) with each \(K_n\) compact), then \(\mu\) is sigma-finite and inner regular on all Borel sets.

Radon measures are the “well-behaved” measures on locally compact spaces: they can be approximated from outside by open sets and from inside by compact sets. This approximation property is what makes them amenable to analysis.

The Riesz Representation Theorem for Positive Functionals

Let \(C_c(X)\) denote the space of continuous functions \(f : X \to {\mathbb C}\) with compact support (i.e., \(\{x : f(x) \ne 0\}\) has compact closure). This is a vector space that encodes the topology of \(X\) through its elements, and it serves as the bridge between functional analysis and measure theory.

Theorem (Riesz-Markov-Kakutani). Let \((X, d)\) be a \(\sigma\)-compact locally compact metric space. If \(I : C_c(X) \to {\mathbb C}\) is a positive linear functional (meaning \(f \ge 0\) implies \(I(f) \ge 0\)), then there is a unique Radon measure \(\mu : {\mathcal B}(X) \to [0,\infty]\) such that \(I(f) = \int_X f\, d\mu\) for all \(f \in C_c(X)\).

Proof outline.

Construction. For \(U\) open, set \(\mu^0(U) = \sup\{I(f) : f \prec U\}\) where \(f \prec U\) means \(0 \le f \le 1\), \(f \in C_c(X)\), \(\operatorname{supp} f \subseteq U\). Then define \(\mu^*(E) = \inf\{\mu^0(U) : E \subseteq U \text{ open}\}\) for all \(E \subseteq X\). Show:

\({\mathcal B}(X) \subseteq {\mathcal M}\) (the \(\mu^*\)-measurable sets), so \(\mu = \mu^*|_{{\mathcal B}(X)}\) is a measure.
\(\mu\) is outer regular (by construction) and locally finite: for compact \(K\), pick \(f \in C_c(X)\) with \(f \ge \chi_K\); then \(\mu(K) \le I(f) < \infty\).
Inner regularity on open sets uses Urysohn-type arguments with compactly supported bump functions.

Verification: \(I(f) = \int f\, d\mu\) is proved by approximating \(0 \le f \le 1\) with \(\operatorname{supp} f\) compact by a partition argument, bounding from above and below.

Uniqueness: If \(\mu'\) is another Radon measure with \(\int f\, d\mu' = I(f)\) for all \(f \in C_c(X)\), then \(\mu\) and \(\mu'\) agree on open sets (by inner regularity + definition of \(\mu^0\)), hence on \({\mathcal B}(X)\) by outer regularity.

This is one of the deepest results connecting analysis and measure theory. It says that every “reasonable” way of assigning numbers to continuous functions (i.e., every positive linear functional) secretly comes from integration against a measure. The theorem is the foundation of the theory of distributions, harmonic analysis on locally compact groups, and probability theory on metric spaces.
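
The simplest instance, included here as an illustration: fix \(x_0 \in X\) and let \(I(f) = f(x_0)\), which is clearly positive and linear. The construction in the proof outline gives, for \(U\) open,

\[ \mu^0(U) = \sup\{f(x_0) : f \prec U\} = \begin{cases} 1, & x_0 \in U, \\ 0, & x_0 \notin U, \end{cases} \]

where the value 1 is attained by a bump function supported in a small compact ball around \(x_0\). Hence \(\mu\) is the Dirac measure \(\delta_{x_0}\), and indeed \(\int_X f\, d\delta_{x_0} = f(x_0)\).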

Corollary. Every locally finite Borel measure on a \(\sigma\)-compact locally compact metric space is a Radon measure.

This corollary tells us that in the \(\sigma\)-compact setting, the regularity properties of Radon measures come for free — any locally finite Borel measure automatically has them.

Corollary. \(C_c(X)/{\sim_\mu}\) is dense in \(L^p(\mu)\) for \(1 \le p < \infty\) when \(\mu\) is a Radon measure on a \(\sigma\)-compact locally compact metric space.

This density result is essential for approximation arguments throughout analysis. It says that any \(L^p\) function can be approximated arbitrarily well by continuous functions with compact support — a fact that will be crucial in the proof of the Lebesgue differentiation theorem below.

Corollary. The \(d\)-dimensional Lebesgue measure \(\lambda_d : {\mathcal B}({\mathbb R}^d) \to [0,\infty]\) is inner and outer regular, hence a Radon measure.

Part IX: Differentiation of Measures

The Lebesgue differentiation theorem — the measure-theoretic generalization of the fundamental theorem of calculus — is one of the crown jewels of real analysis. Its proof rests on the Hardy-Littlewood maximal inequality, a deep result about averaging operators.

The Hardy-Littlewood Maximal Function

For \(x \in {\mathbb R}^d\) and \(r > 0\), let \(B_r(x) = \{y \in {\mathbb R}^d : \|x - y\|_2 < r\}\). Define the averaging operator \(A_r f(x) = \frac{1}{\lambda_d(B_r(x))} \int_{B_r(x)} f(y)\, dy\) for \(f \in L^1_{\mathrm{loc}}(\lambda_d)\) (locally integrable).

The averaging operator computes the mean value of \(f\) over a ball of radius \(r\) centered at \(x\). As \(r \to 0\), we expect this average to converge to \(f(x)\) — at least for “nice” functions. The Hardy-Littlewood maximal function captures the worst-case behavior of these averages.

\[ Hf(x) = \sup_{r > 0} A_r|f|(x). \]

Each \(A_r|f|\) is continuous in \(x\) (and in \(r\)), so the supremum may be taken over rational \(r > 0\); as a countable supremum of measurable functions, \(Hf\) is Borel measurable.

The maximal function \(Hf(x)\) records the largest possible average of \(|f|\) over any ball centered at \(x\). It is at least as large as \(|f(x)|\) for almost every \(x\) (by the Lebesgue differentiation theorem, which we are about to prove), and it controls how “concentrated” the mass of \(f\) can be near any given point.

Lemma (Covering). Let \({\mathcal C}\) be a collection of open Euclidean balls in \({\mathbb R}^d\) with \(U = \bigcup_{B \in {\mathcal C}} B\). For any \(0 < c < \lambda_d(U)\), there exist finitely many pairwise disjoint balls \(B_1, \ldots, B_n \in {\mathcal C}\) with \(3^d \sum_{i=1}^n \lambda_d(B_i) > c\).

Proof sketch. Since \(c < \lambda_d(U)\) and \(U\) is open, inner regularity of \(\lambda_d\) gives a compact \(K \subseteq U\) with \(\lambda_d(K) > c\). Cover \(K\) by finitely many balls from \({\mathcal C}\). Greedily select: take the largest ball \(B_1\), discard all balls meeting \(B_1\), take the largest remaining \(B_2\), etc. Each discarded ball \(B'\) meets some selected \(B_i\) whose radius is at least that of \(B'\), and therefore satisfies \(B' \subseteq 3B_i\) (the ball with the same center but \(3 \times\) the radius). So \(K \subseteq \bigcup 3B_i\), giving \(c < \lambda_d(K) \le \sum \lambda_d(3B_i) = 3^d \sum \lambda_d(B_i)\).

The covering lemma is a purely geometric result — it extracts a “well-separated” subcollection of balls that still captures a definite fraction of the total volume. The constant \(3^d\) arises because each selected ball, when tripled in radius, swallows all the balls it displaced during the greedy selection.

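Theorem (Hardy-Littlewood maximal inequality). For every \(f \in L^1(\lambda_d)\) and every \(\alpha > 0\),

\[ \lambda_d\!\left(\{Hf > \alpha\}\right) \le \frac{3^d}{\alpha} \int_{\mathbb{R}^d} |f|\, d\lambda_d. \]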

Proof. Let \(E_\alpha = \{x : Hf(x) > \alpha\}\). For each \(x \in E_\alpha\), there exists \(r_x > 0\) with \(A_{r_x}|f|(x) > \alpha\), i.e., \(\int_{B_{r_x}(x)} |f| > \alpha \cdot \lambda_d(B_{r_x}(x))\). The collection \({\mathcal C} = \{B_{r_x}(x)\}_{x \in E_\alpha}\) covers \(E_\alpha\). For any \(c < \lambda_d(E_\alpha)\), the covering lemma provides disjoint \(B_1, \ldots, B_n\) with \(c < 3^d \sum \lambda_d(B_i) \le \frac{3^d}{\alpha} \sum \int_{B_i} |f| \le \frac{3^d}{\alpha} \int |f|\). Since \(c\) was arbitrary, the result follows.

The maximal inequality is a “weak type (1,1)” estimate: it controls the size of the set where \(Hf\) is large, even though \(Hf\) itself may not be integrable. The constant \(3^d\) is not sharp, but the qualitative statement is all we need for the differentiation theorem.
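
A one-dimensional computation, worked out here as an illustration, shows both points. For \(f = \chi_{[-1,1]}\) on \({\mathbb R}\), maximizing the averages over intervals \((x-r, x+r)\) gives

\[ Hf(x) = \begin{cases} 1, & |x| < 1, \\ \dfrac{1}{1+|x|}, & |x| \ge 1, \end{cases} \]

with the supremum for \(|x| \ge 1\) attained at \(r = 1 + |x|\). Thus \(Hf \notin L^1\), since \(\int_1^\infty dx/(1+x) = \infty\). On the other hand, for \(0 < \alpha < 1/2\),

\[ \lambda_1(\{Hf > \alpha\}) = 2\left(\frac{1}{\alpha} - 1\right) < \frac{6}{\alpha} = \frac{3^1}{\alpha} \int |f|\, d\lambda_1, \]

consistent with the maximal inequality.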

The Lebesgue Differentiation Theorem

Theorem (Lebesgue differentiation). If \(f \in L^1_{\mathrm{loc}}(\lambda_d)\), then

\[ \lim_{r \to 0^+} A_r f(x) = f(x) \quad \text{for } \lambda_d\text{-a.e. } x \in \mathbb{R}^d. \]

In other words, the average value of \(f\) over a ball centered at \(x\) converges to \(f(x)\) as the ball shrinks — for almost every \(x\).

Proof. It suffices to prove the result for \(f \in L^1(\lambda_d)\) (the general case follows by localizing to balls \(B_N(0)\) and using \(f \cdot \mathbf{1}_{B_N} \in L^1\)).

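Fix \(\varepsilon > 0\). Since \(C_c({\mathbb R}^d)\) is dense in \(L^1(\lambda_d)\), choose a continuous, compactly supported \(h\) with \(\|f - h\|_1 < \varepsilon\). For every \(x\) and \(r > 0\),

\[ |A_r f(x) - f(x)| \le A_r|f - h|(x) + |A_r h(x) - h(x)| + |h(x) - f(x)|. \]

The first term is at most \(H(f-h)(x)\), and for any \(\alpha > 0\) the maximal inequality gives

\[ \lambda_d(\{H(f-h) > \alpha\}) \le \frac{3^d}{\alpha} \|f - h\|_1 < \frac{3^d \varepsilon}{\alpha}. \]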

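The second term tends to 0 as \(r \to 0^+\) for every \(x\), because \(h\) is continuous. So \(\limsup |A_r f(x) - f(x)| \le H(f-h)(x) + |h(x) - f(x)|\). Fix \(\alpha > 0\). The set where this bound exceeds \(2\alpha\) is contained in \(\{H(f-h) > \alpha\} \cup \{|h - f| > \alpha\}\), so it has measure at most \(3^d \varepsilon / \alpha + \|f - h\|_1 / \alpha < (3^d + 1)\varepsilon / \alpha\) (using Markov’s inequality for the second term). Since \(\varepsilon\) is arbitrary, the set where \(\limsup > 2\alpha\) is null for every \(\alpha > 0\); taking the union over \(\alpha = 1/n\) shows that the set where \(\limsup > 0\) has measure zero.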

The Lebesgue differentiation theorem recovers \(f\) from its integral by a limiting averaging process — a perfect analogue of the fundamental theorem of calculus, which recovers \(f\) from its antiderivative \(F\) by differentiation: \(F'(x) = \lim (F(x+h) - F(x))/h = \lim (1/h) \int_x^{x+h} f(t)\, dt = f(x)\) a.e. The multidimensional version replaces one-sided limits of difference quotients with symmetric averages over shrinking balls.
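
A simple example, added here for illustration, shows why “almost every” cannot be improved to “every”. For \(f = \chi_{[0,\infty)}\) on \({\mathbb R}\),

\[ \lim_{r \to 0^+} A_r f(x) = \lim_{r \to 0^+} \frac{\lambda_1((x-r,x+r) \cap [0,\infty))}{2r} = \begin{cases} 1, & x > 0, \\ 1/2, & x = 0, \\ 0, & x < 0, \end{cases} \]

which equals \(f(x)\) except at the single point \(x = 0\), a \(\lambda_1\)-null set.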

The proof above used Markov’s inequality: for a measurable function \(f\) and \(\alpha > 0\),

\[ \lambda_d(f^{-1}((\alpha,\infty])) \le \frac{1}{\alpha} \int_{f^{-1}((\alpha,\infty])} |f|\, d\lambda_d \le \frac{1}{\alpha} \int |f|\, d\lambda_d. \]

This follows immediately from monotonicity of the integral, since \(|f| > \alpha\) on the set \(f^{-1}((\alpha,\infty])\).

Summary of Major Theorems

Theorem	Statement	Where Proved
Carathéodory Extension	Pre-measure on algebra extends to sigma-algebra; unique if sigma-finite	Lectures 5–6
Dynkin’s π-λ Theorem	A λ-system containing a π-system contains the generated sigma-algebra	Lecture 5, used throughout
Monotone Convergence (MCT)	\(f_n \nearrow f \Rightarrow \int f_n \nearrow \int f\)	Lecture 10
Fatou’s Lemma	\(\int \liminf f_n \le \liminf \int f_n\)	Lecture 12
LDCT	Dominated pointwise convergence implies L¹ convergence	Lecture 12
Hölder’s Inequality	\(\int |fg|\, d\mu \le \|f\|_p \|g\|_q\) for conjugate exponents \(p, q\)	
Minkowski’s Inequality	\(\|f+g\|_p \le \|f\|_p + \|g\|_p\)	Lecture 13
Riesz-Fischer	\(L^p(\mu)\) is a Banach space for \(1 \le p \le \infty\)	Lecture 14
Riesz Representation (L²)	Every bounded linear functional on \(L^2\) is given by the inner product with a fixed element of \(L^2\)	Lecture 15
Radon-Nikodym	\(\nu \ll \mu\) (finite) \(\Rightarrow\) \(d\nu = h\, d\mu\) for some density \(h\)	Lectures 16–17
Lebesgue Decomposition	Any finite \(\nu = \nu_1 + \nu_2\) with \(\nu_1 \ll \mu\), \(\nu_2 \perp \mu\)	Lecture 17
Hahn Decomposition	Signed measure space decomposes into positive and negative sets	After Lecture 15
Egoroff’s Theorem	a.e. convergence implies almost uniform convergence (finite measure)	After Lecture 14
Lᵖ Duality	\((L^p)^* \cong L^q\) isometrically for \(1 < p < \infty\)	After Lecture 15
Tonelli	Non-negative function: iterated integrals equal product integral	Lectures 19–20
Fubini	\(L^1\) function: iterated integrals exist, equal, and equal product integral	Lecture 20
Riesz-Markov-Kakutani	Positive linear functional on \(C_c(X)\) = integration against a Radon measure	Part VIII
Hardy-Littlewood Maximal	\(\lambda_d(\{Hf > \alpha\}) \le (3^d/\alpha)\|f\|_1\)	Part IX
Lebesgue Differentiation	\(\lim_{r \to 0} A_r f(x) = f(x)\) for a.e. \(x\)	Part IX