PMATH 351: Real Analysis
Stephen New
Estimated study time: 5 hr 56 min
Table of contents
These notes synthesize material from multiple sources: Stephen New’s Fall 2018 lecture notes (primary), Kenneth Davidson’s PMATH 351 instructor notes, Brian Forrest’s PMATH 351 instructor notes, Laurent Marcoux’s PMATH 351 instructor notes, and student notes by Felix Zhou (Fall 2019, Davidson’s section), Richard Wu (Fall 2018, Forrest’s section), and David Duan. All sources are credited below.
Chapter 1: Cardinality
The first chapter lays the foundational set-theoretic framework upon which the rest of real analysis is built. Before we can discuss metric spaces and topology, we need a precise understanding of the “size” of sets — finite, countable, and uncountable. These distinctions will later be essential for understanding separability, the structure of the real line, and the Baire Category Theorem.
1.1 Functions, Injections, and Surjections
We begin by recalling the basic vocabulary of functions, which will be used throughout the course.
- \(f^{-1}(B_1 \cup B_2) = f^{-1}(B_1) \cup f^{-1}(B_2)\) and \(f^{-1}(B_1 \cap B_2) = f^{-1}(B_1) \cap f^{-1}(B_2)\).
- \(f^{-1}(B^c) = (f^{-1}(B))^c\).
- \(f(A_1 \cup A_2) = f(A_1) \cup f(A_2)\), but in general \(f(A_1 \cap A_2) \subseteq f(A_1) \cap f(A_2)\) with equality when \(f\) is injective.
- \(A \subseteq f^{-1}(f(A))\) with equality when \(f\) is injective; \(f(f^{-1}(B)) \subseteq B\) with equality when \(f\) is surjective.
These properties are fundamental: property (1) and (2) mean that preimages commute perfectly with all set operations. This is precisely why the topological characterization of continuity (preimages of open sets are open) is so natural.
- \(f\) is injective (one-to-one) if for all \(x_1, x_2 \in X\), \(f(x_1) = f(x_2)\) implies \(x_1 = x_2\).
- \(f\) is surjective (onto) if \(f(X) = Y\), i.e., every element of \(Y\) is hit.
- \(f\) is bijective (invertible) if it is both injective and surjective.
When \(f\) is bijective, we define its inverse \(f^{-1}: Y \to X\) by letting \(f^{-1}(y)\) be the unique \(x \in X\) with \(f(x) = y\). The fundamental algebraic fact about composition is that it preserves injectivity and surjectivity.
- If \(f\) and \(g\) are both injective, then \(g \circ f\) is injective.
- If \(f\) and \(g\) are both surjective, then \(g \circ f\) is surjective.
- If \(f\) and \(g\) are both bijective, then \(g \circ f\) is bijective with \((g \circ f)^{-1} = f^{-1} \circ g^{-1}\).
The relationship between injectivity/surjectivity and the existence of one-sided inverses is fundamental to both algebra and analysis:
- \(f\) is injective if and only if \(f\) has a left inverse (a function \(g: Y \to X\) with \(g \circ f = I_X\)).
- \(f\) is surjective if and only if \(f\) has a right inverse (a function \(h: Y \to X\) with \(f \circ h = I_Y\)).
- \(f\) is bijective if and only if \(f\) has both a left inverse \(g\) and a right inverse \(h\), in which case \(g = h = f^{-1}\).
Note that the proof that every surjection has a right inverse requires the Axiom of Choice — we must simultaneously choose, for each \(y \in Y\), a preimage \(x_y \in X\).
1.2 Cardinality and Ordering
With injections and bijections in hand, we can now compare the “sizes” of sets.
- We write \(|A| = |B|\) when there exists a bijection \(f: A \to B\).
- We write \(|A| \le |B|\) when there exists an injection \(f: A \to B\).
- We write \(|A| < |B|\) when \(|A| \le |B|\) and \(|A| \ne |B|\).
The relation \(|A| = |B|\) is an equivalence relation: it is reflexive (via the identity), symmetric (via the inverse), and transitive (via composition). The relation \(|A| \le |B|\) is reflexive and transitive — but showing it is antisymmetric requires a deep theorem.
A crucial characterization of infinite sets connects injectivity and surjectivity:
- \(A\) is infinite.
- \(A\) contains a countable subset.
- \(|\mathbb{N}| \le |A|\).
- There exists a map \(f: A \to A\) which is injective but not surjective.
The implication (4) \(\Rightarrow\) (1) is proved by its contrapositive: every injective map from a finite set to itself is surjective (the pigeonhole principle, proved by induction on \(|A|\)).
1.3 The Cantor–Schroeder–Bernstein Theorem
The antisymmetry of cardinality comparison is far from obvious. It is resolved by the following landmark result.
This theorem is remarkable because it does not require the Axiom of Choice — the proof is entirely constructive. It allows us to establish equalities of cardinality by exhibiting injections in both directions, which is often much easier than constructing an explicit bijection.
1.4 Countable and Uncountable Sets
- If \(A\) and \(B\) are countable, then so is \(A \times B\).
- If \(A\) and \(B\) are countable, then so is \(A \cup B\).
- A countable union of countable sets is countable.
- \(\mathbb{Q}\) is countable.
Part (4) follows because the map \(g: \mathbb{Q} \to \mathbb{Z} \times \mathbb{Z}\) sending each fraction in lowest terms \(\frac{a}{b}\) to the pair \((a,b)\) is injective, and \(\mathbb{Z} \times \mathbb{Z}\) is countable.
Now for the other direction — uncountability. Cantor’s argument is one of the most beautiful in all of mathematics.
In fact, one can show that \(|\mathbb{R}| = |2^{\mathbb{N}}| = |\mathcal{P}(\mathbb{N})|\) using the Cantor–Schroeder–Bernstein theorem. We write \(|\mathbb{R}| = \mathfrak{c}\) (the cardinality of the continuum) and \(|\mathbb{N}| = \aleph_0\). Cantor’s theorem yields an infinite hierarchy of infinite cardinalities: \(\aleph_0 < 2^{\aleph_0} < 2^{2^{\aleph_0}} < \cdots\). There is no “largest” cardinal number.
where \(C\) is the Cantor set. The algebraic numbers form a countable set (there are only countably many integer polynomials, each with finitely many roots), so the transcendental numbers are uncountable — indeed, they have cardinality \(\mathfrak{c}\). This is remarkable: transcendental numbers are “almost all” reals, yet explicitly constructing them (like \(e\) and \(\pi\)) requires significant effort.
1.5 The Axiom of Choice and Zorn’s Lemma
Several key results in this course (existence of right inverses for surjections, comparability of cardinals, bases for vector spaces) require the Axiom of Choice or one of its equivalents.
The Axiom of Choice is intuitively obvious for finite collections and even for countable collections (where one can make choices sequentially), but it is an independent axiom for uncountable collections. It was shown to be consistent with ZF by Gödel (1938) and independent of ZF by Cohen (1963).
- Non-measurable sets: The Vitali construction uses AC to produce a subset of \([0,1]\) that is not Lebesgue measurable. Partition \([0,1)\) by the equivalence relation \(x \sim y\) iff \(x - y \in \mathbb{Q}\). Use AC to choose one representative from each equivalence class, forming a set \(V\). If \(V\) were measurable, then \([0,1) = \bigcup_{q \in \mathbb{Q} \cap [0,1)} (V + q \mod 1)\) would give \(1 = \sum m(V)\), which is impossible since either \(m(V) = 0\) (giving sum 0) or \(m(V) > 0\) (giving sum \(\infty\)).
- Banach--Tarski paradox: AC implies that a solid ball in \(\mathbb{R}^3\) can be decomposed into finitely many pieces and reassembled (via rotations and translations) into two solid balls of the same size. The pieces are necessarily non-measurable.
- Hamel bases: AC guarantees that \(\mathbb{R}\) has a basis as a \(\mathbb{Q}\)-vector space, although no explicit Hamel basis can be constructed. The existence of a Hamel basis implies the existence of discontinuous additive functions \(f: \mathbb{R} \to \mathbb{R}\) satisfying \(f(x+y) = f(x) + f(y)\) --- the graph of such a function is dense in \(\mathbb{R}^2\).
1.5a Well-Ordered Sets, Transfinite Induction, and Ordinal Numbers
The Axiom of Choice is most naturally understood through the lens of well-ordered sets — posets in which every nonempty subset has a minimum element. The ordinary induction principle on \(\mathbb{N}\) generalizes to any well-ordered set, and the resulting framework of transfinite induction is one of the most powerful tools in set theory and logic.
Every well-ordered set is automatically totally ordered: given \(x, y \in X\), the subset \(\{x, y\}\) has a minimum element, which must be \(\le\) the other. The natural numbers \(\mathbb{N}\) under the usual order are the archetypal well-ordered set; the reals \(\mathbb{R}\) are not well-ordered (the open interval \((0,1)\) has no minimum element).
A slightly exotic example illuminates the concept: let \(\omega + 7 = \{1, 2, 3, \ldots\} \cup \{\omega, \omega+1, \omega+2, \ldots, \omega+6\}\), with the ordering \(n \le \omega + k\) for all \(n \in \mathbb{N}\) and \(0 \le k \le 6\), and \(\omega + i \le \omega + j\) when \(i \le j\). This set is well-ordered: it looks like the natural numbers followed by seven extra elements “after infinity.” The element \(\omega\) has no immediate predecessor — no largest natural number precedes it — yet every nonempty subset still has a minimum.
Then \(S = X\).
Proof. If \(S \ne X\), let \(x_0 = \min(X \setminus S)\), which exists since \(X \setminus S\) is nonempty and \(X\) is well-ordered. By definition of \(x_0\), every \(y < x_0\) belongs to \(S\), i.e., \(\{y : y < x_0\} \subseteq S\). By hypothesis, \(x_0 \in S\), a contradiction. Hence \(S = X\). \(\square\)
This generalizes ordinary induction: when \(X = \mathbb{N}\), the condition \(\{y < n\} \subseteq S \Rightarrow n \in S\) is exactly “if \(S(0), S(1), \ldots, S(n-1)\) hold then \(S(n)\) holds,” which is strong induction. When \(X\) contains limit ordinals (elements with no immediate predecessor, like \(\omega\)), transfinite induction handles them naturally via the hypothesis: if \(S\) contains all elements before a limit ordinal \(\lambda\), then \(S\) contains \(\lambda\) too.
Culture: Ordinal Numbers. The notion of well-ordered sets leads naturally to the concept of ordinal numbers, which measure the “order type” of well-ordered sets in the same way that cardinal numbers measure their size.
Two well-ordered sets \((A, \le_A)\) and \((B, \le_B)\) are called order-isomorphic if there exists an order-preserving bijection \(f: A \to B\) (i.e., \(a_1 \le_A a_2 \Rightarrow f(a_1) \le_B f(a_2)\)). An ordinal number is an equivalence class of well-ordered sets under order isomorphism.
The finite ordinals are just the natural numbers: \(0 = \emptyset\), \(1 = \{0\}\), \(2 = \{0,1\}\), and so on, with \(n = \{0, 1, \ldots, n-1\}\) for each \(n \in \mathbb{N}\). The first infinite ordinal is \(\omega = \{0, 1, 2, \ldots\}\), the order type of \(\mathbb{N}\). Beyond \(\omega\) lie \(\omega + 1, \omega + 2, \ldots\), then \(\omega + \omega = \omega \cdot 2\), then \(\omega^2, \omega^3, \ldots, \omega^\omega\), and so on through a transfinite hierarchy.
One striking feature is that ordinal addition is not commutative. If we prepend a single element to \(\mathbb{N}\), we get a set order-isomorphic to \(\mathbb{N}\) itself (just shift all indices), so \(1 + \omega = \omega\). But \(\omega + 1\) consists of \(\mathbb{N}\) followed by one extra element after all natural numbers; this has a maximum element while \(\omega\) does not, so \(\omega + 1 \ne \omega\). Thus \(1 + \omega = \omega \ne \omega + 1\).
This non-commutativity contrasts with cardinal arithmetic, where infinite cardinals satisfy \(\kappa + \lambda = \lambda + \kappa = \max(\kappa, \lambda)\). The distinction reflects the fact that ordinals care about order, while cardinals care only about size.
1.5b Proof of the Equivalences: AC, Zorn’s Lemma, and the Well-Ordering Principle
The three statements AC, ZL, and WO stated in §1.5 are equivalent in Zermelo–Fraenkel set theory. We give the complete proof here, following Marcoux’s treatment. The argument is the most technical in Chapter 1, but it pays off: it shows how all three principles are facets of the same underlying idea — that “arbitrary choices can always be made simultaneously.”
Examples. For \((R, \le)\), the interval \((-\infty, r)\) is an initial segment. For \(\mathbb{N}\), the set \(\{1, 2, \ldots, n\}\) is the initial segment at \(n+1\).
- (AC) For any nonempty collection \(\{X_\lambda\}_{\lambda \in \Lambda}\) of nonempty sets, \(\prod_{\lambda \in \Lambda} X_\lambda \ne \emptyset\).
- (ZL) Let \((Y, \le)\) be a nonempty poset in which every chain has an upper bound. Then \(Y\) has a maximal element.
- (WO) Every nonempty set \(Z\) admits a well-ordering.
Proof.
(AC \(\Rightarrow\) ZL). Suppose every chain in \((X, \le)\) has an upper bound, but assume for contradiction that \(X\) has no maximal element. Then every chain \(C\) has a strict upper bound — an element strictly greater than every upper bound. By AC, we can choose one strict upper bound \(f(C)\) for each chain \(C\); if \(C = \emptyset\), choose any \(x_0 \in X\) and set \(f(\emptyset) = x_0\).
Call a subset \(A \subseteq X\) a \(P\)-set if (I) \((A, \le)\) is well-ordered, and (II) for every \(x \in A\), \(x = f(P(A, x))\) — each element of \(A\) is the chosen strict upper bound of its initial segment within \(A\).
Claim 1: If \(A\) and \(B\) are both \(P\)-sets and \(A \ne B\), then one is an initial segment of the other. (The proof compares minimum elements of \(A \setminus B\) and \(B \setminus A\), using the well-orderings and property (II) to establish the equality \(P(A,x) = P(B,y)\) at the key element, and concludes that one is an initial segment of the other.)
Let \(V = \bigcup\{A \subseteq X : A \text{ is a } P\text{-set}\}\).
Claim 2: \(V\) is itself a \(P\)-set. (Follows from Claim 1: the union of a “compatible” family of well-ordered sets is well-ordered, and the property (II) is inherited by the union.)
Now set \(w = f(V)\). Then \(V \cup \{w\}\) is a \(P\)-set strictly containing \(V\), so \(w \in V\), contradicting \(w = f(V)\) being a strict upper bound for \(V\). This contradiction shows that \(X\) must have had a maximal element.
(ZL \(\Rightarrow\) WO). Let \(X \ne \emptyset\). Let \(\mathcal{A}\) be the collection of pairs \((Y, \le_Y)\) where \(Y \subseteq X\) and \(\le_Y\) is a well-ordering of \(Y\). Partially order \(\mathcal{A}\) by: \((A, \le_A) \le (B, \le_B)\) if \(A\) is an initial segment of \(B\). Any chain in \(\mathcal{A}\) has an upper bound (the union with the induced ordering). By ZL, \(\mathcal{A}\) has a maximal element \((M, \le_M)\). If \(M \ne X\), pick \(x_0 \in X \setminus M\) and extend \(\le_M\) to a well-ordering of \(M \cup \{x_0\}\) by declaring \(x_0\) greater than everything in \(M\); this contradicts maximality of \((M, \le_M)\). Hence \(M = X\), giving a well-ordering of \(X\).
(WO \(\Rightarrow\) AC). Let \(\{X_\lambda\}_{\lambda \in \Lambda}\) be a nonempty collection of nonempty sets. By WO, each \(X_\lambda\) can be well-ordered, so it has a minimum element \(\min X_\lambda\). Define the choice function \(f(\lambda) = \min X_\lambda\). \(\square\)
The implication AC \(\Rightarrow\) ZL deserves a comment: the detailed verification of Claims 1 and 2 is intricate (it occupies several pages in Marcoux’s notes), but the core idea is clean — a maximal “coherent” well-ordered structure built by iterating the choice function must equal all of \(X\), and that yields a contradiction unless a maximal element exists.
1.6 Cardinal Arithmetic
Using the Cantor–Schroeder–Bernstein Theorem (Theorem 1.11) and the closure properties of countable sets (Theorem 1.12), we can develop a full arithmetic of cardinal numbers, extending ordinary arithmetic to infinite quantities.
The key laws include commutativity and associativity of addition and multiplication, distributivity, and the exponential laws:
\[ \kappa^{\lambda + \mu} = \kappa^\lambda \cdot \kappa^\mu, \quad (\kappa^\lambda)^\mu = \kappa^{\lambda \cdot \mu}, \quad (\kappa \cdot \lambda)^\mu = \kappa^\mu \cdot \lambda^\mu. \]For infinite cardinals, the arithmetic simplifies dramatically: if \(\kappa\) is an infinite cardinal, then \(\kappa + \kappa = \kappa \cdot \kappa = \kappa\). Also, \(2^{\aleph_0} = \mathfrak{c}\) and \(\mathfrak{c}^{\aleph_0} = \mathfrak{c}\). The Continuum Hypothesis posits that there is no cardinal strictly between \(\aleph_0\) and \(\mathfrak{c}\); this is independent of ZFC (proved by Gödel and Cohen).
- \(|\mathbb{R}| = |2^{\mathbb{N}}| = |\mathcal{P}(\mathbb{N})| = \mathfrak{c}\).
- \(|\mathbb{R}^n| = \mathfrak{c}\) for every \(n \ge 1\), and \(|\mathbb{R}^{\mathbb{N}}| = \mathfrak{c}\).
- The set of all continuous functions \(f: \mathbb{R} \to \mathbb{R}\) has cardinality \(\mathfrak{c}\), since a continuous function is determined by its values on \(\mathbb{Q}\).
- The set of all functions \(f: \mathbb{R} \to \mathbb{R}\) has cardinality \(2^{\mathfrak{c}} > \mathfrak{c}\).
- Since \(|C(\mathbb{R}, \mathbb{R})| = \mathfrak{c}\) but \(|\mathbb{R}^{\mathbb{R}}| = 2^{\mathfrak{c}} > \mathfrak{c}\), "most" functions \(\mathbb{R} \to \mathbb{R}\) are discontinuous everywhere. The continuous functions form a negligible minority from the cardinality standpoint.
- The set of Borel sets in \(\mathbb{R}\) has cardinality \(\mathfrak{c}\) (a consequence of the transfinite construction of the Borel \(\sigma\)-algebra), while the set of Lebesgue measurable sets has cardinality \(2^{\mathfrak{c}}\). Thus "most" Lebesgue measurable sets are not Borel sets.
- Every separable metric space has cardinality at most \(\mathfrak{c}\). Since the spaces \(C[0,1]\), \(\ell^p\) (for \(p < \infty\)), and \(\mathbb{R}^n\) are all separable and have cardinality exactly \(\mathfrak{c}\), they are "as large as possible" for separable spaces.
Chapter 2: Metric Spaces
Having established the set-theoretic foundations in Chapter 1, we now introduce the central object of study in this course: the metric space. A metric space is a set equipped with a distance function that generalizes the familiar notion of distance in \(\mathbb{R}^n\). This abstraction allows us to develop a unified theory of convergence, continuity, and completeness that applies far beyond Euclidean space.
2.1 Inner Product Spaces and Normed Vector Spaces
The most natural and important source of metric spaces is normed vector spaces, which themselves often arise from inner product spaces. We begin with these algebraic structures before abstracting to the purely topological setting.
- Sesquilinearity: \(\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle\), \(\langle tu, v\rangle = t\langle u, v\rangle\), and \(\langle u, tv\rangle = \bar{t}\langle u, v\rangle\).
- Conjugate symmetry: \(\langle u, v\rangle = \overline{\langle v, u\rangle}\).
- Positive definiteness: \(\langle u, u\rangle \ge 0\) with equality iff \(u = 0\).
- Positive definiteness: \(\|v\| = 0 \iff v = 0\).
- Positive homogeneity: \(\|\lambda v\| = |\lambda| \|v\|\) for all \(\lambda \in \mathbb{F}\), \(v \in V\).
- Triangle inequality: \(\|u + v\| \le \|u\| + \|v\|\) for all \(u, v \in V\).
Every inner product space is a normed vector space via \(\|x\| = \langle x, x\rangle^{1/2}\). The triangle inequality follows from the Cauchy–Schwarz inequality.
with equality if and only if \(\{u, v\}\) is linearly dependent.
giving \(|\langle u, v\rangle|^2 < \|u\|^2 \|v\|^2\).
The Cauchy–Schwarz inequality yields the triangle inequality: \(\|u+v\|^2 = \|u\|^2 + 2\operatorname{Re}\langle u,v\rangle + \|v\|^2 \le (\|u\| + \|v\|)^2\). A useful identity relating norms and inner products is the polarization identity: in a real inner product space, \(\langle u, v\rangle = \frac{1}{4}(\|u+v\|^2 - \|u-v\|^2)\). The parallelogram law \(\|u+v\|^2 + \|u-v\|^2 = 2\|u\|^2 + 2\|v\|^2\) characterizes norms coming from inner products: a norm satisfies the parallelogram law if and only if it is induced by an inner product (the polarization identity then defines the inner product).
The \(\infty\)-norm is \(\|x\|_\infty = \max_{1 \le i \le n} |x_i|\). The \(p = 2\) case gives the Euclidean norm, which is induced by the standard inner product \(\langle x, y\rangle = \sum_{i=1}^n x_i y_i\). Only \(p = 2\) satisfies the parallelogram law; hence the \(p\)-norms for \(p \ne 2\) do not come from any inner product.
- The supremum norm: \(\|f\|_\infty = \sup_{x \in X} |f(x)|\) (attained by the Extreme Value Theorem).
- The \(L^p\) norms (for \([a,b]\)): \(\|f\|_p = \left(\int_a^b |f(x)|^p\, dx\right)^{1/p}\).
- The \(L^2\) inner product: \(\langle f, g\rangle = \int_a^b f(x) g(x)\, dx\), which induces \(\|f\|_2\).
That \(\|\cdot\|_p\) satisfies the triangle inequality for \(1 < p < \infty\) is the content of Minkowski’s inequality, which in turn depends on Hölder’s inequality.
with equality if and only if \(f\) and \(g\) are proportional.
which follows from the concavity of the logarithm: \(\log(ab) = \log a + \log b \le \log(a^p/p + b^q/q)\) by the weighted AM-GM inequality. Alternatively, since \(e^t\) is convex, \(e^{t/p + s/q} \le e^t/p + e^s/q\).
Applying Young’s inequality with \(a = |x_i|/\|x\|_p\) and \(b = |y_i|/\|y\|_q\):
\[ \frac{|x_i y_i|}{\|x\|_p \|y\|_q} \le \frac{|x_i|^p}{p\|x\|_p^p} + \frac{|y_i|^q}{q\|y\|_q^q}. \]Summing over \(i\): \(\sum |x_i y_i| / (\|x\|_p \|y\|_q) \le 1/p + 1/q = 1\), giving Hölder’s inequality.
Summing and applying Hölder’s inequality with exponents \(p\) and \(q = p/(p-1)\):
\[ \|x + y\|_p^p \le \|x\|_p \cdot \left(\sum |x_i + y_i|^{(p-1)q}\right)^{1/q} + \|y\|_p \cdot \left(\sum |x_i + y_i|^{(p-1)q}\right)^{1/q}. \]Since \((p-1)q = p\), the right side is \((\|x\|_p + \|y\|_p) \|x+y\|_p^{p/q}\). Dividing both sides by \(\|x+y\|_p^{p/q}\) (assuming \(x+y \ne 0\)) gives \(\|x+y\|_p^{p - p/q} = \|x+y\|_p \le \|x\|_p + \|y\|_p\).
2.2 Metric Spaces — Definition and Examples
- Positive definiteness: \(d(x,y) = 0 \iff x = y\).
- Symmetry: \(d(x,y) = d(y,x)\).
- Triangle inequality: \(d(x,z) \le d(x,y) + d(y,z)\).
A useful consequence is the reverse triangle inequality: \(|d(x,y) - d(y,z)| \le d(x,z)\).
Every normed vector space becomes a metric space via \(d(x,y) = \|x - y\|\), and every inner product space is a normed vector space via \(\|x\| = \langle x,x\rangle^{1/2}\). Every subset of a metric space inherits the induced metric.
The relationship between these structures is strict: every inner product space is a normed space (via \(\|x\| = \langle x,x \rangle^{1/2}\)), and every normed space is a metric space (via \(d(x,y) = \|x-y\|\)), but neither inclusion reverses. A norm arises from an inner product if and only if it satisfies the parallelogram law (Theorem 2.7); and a metric arises from a norm if and only if it is translation-invariant (\(d(x+z, y+z) = d(x,y)\)) and homogeneous (\(d(\alpha x, \alpha y) = |\alpha| d(x,y)\)).
Equivalently, \(d_H(A,B) = \inf\{r \ge 0 : A \subseteq B_r \text{ and } B \subseteq A_r\}\) where \(A_r = \{x : d(x,A) \le r\}\). The triangle inequality is verified by fixing \(a \in A\) and any \(b \in B\) and chaining: \(d(a,C) \le \|a - b\| + d(b,C) \le \|a - b\| + d_H(B,C)\), then taking the infimum over \(b\) and the supremum over \(a\).
In this metric, the sequence \(p^n \to 0\) as \(n \to \infty\). Ultrametric spaces have the striking property that every triangle is isosceles and every point inside a ball is a center.
- Every triangle is isosceles: if \(d(x,y) \ne d(y,z)\), then \(d(x,z) = \max\{d(x,y), d(y,z)\}\).
- Every point inside a ball is a center: if \(b \in B(a,r)\), then \(B(a,r) = B(b,r)\).
- Any two balls are either disjoint or one contains the other.
- Every open ball is also closed, and every closed ball is also open.
Other equivalent choices include \(D_1 = d + \rho\) and \(D_2 = (d^2 + \rho^2)^{1/2}\). In any of these metrics, a sequence \((x_n, y_n) \to (x_0, y_0)\) if and only if \(x_n \to x_0\) and \(y_n \to y_0\). The equivalence follows from the inequalities
\[ D_\infty \le D_2 \le D_1 \le 2 D_\infty, \]which hold because \(\max\{a,b\} \le (a^2 + b^2)^{1/2} \le a + b \le 2\max\{a,b\}\) for \(a, b \ge 0\).
This metrizes the product topology: a sequence converges iff it converges in each coordinate. When all \(X_n = \{0,1\}\) with the discrete metric, the product \(\{0,1\}^{\mathbb{N}} = 2^{\mathbb{N}}\) is homeomorphic to the Cantor set.
2.3 Open and Closed Balls, Equivalent Metrics
- The open ball is \(B(a, r) = \{x \in X \mid d(x, a) < r\}\).
- The closed ball is \(\overline{B}(a, r) = \{x \in X \mid d(x, a) \le r\}\).
- The punctured ball is \(B^*(a, r) = \{x \in X \mid 0 < d(x, a) < r\}\).
Consequently, they induce the same topology, and in particular have the same open sets, convergent sequences, and continuous functions.
A key example: on \((0,1)\), the usual metric \(d(x,y) = |x-y|\) and \(\rho(x,y) = |1/x - 1/y|\) are topologically equivalent (both induce the standard topology on \((0,1)\)) but not metrically equivalent. Under \(d\), the sequence \(x_n = 1/n\) is Cauchy; under \(\rho\), it is not (\(\rho(1/n, 1/m) = |n - m|\)). Conversely, \((0,1)\) is incomplete under \(d\) but \(((0,1), \rho)\) is isometric to \((1, \infty)\) via \(x \mapsto 1/x\), which is complete. This illustrates that completeness depends on the metric, not just the topology.
Another important pair: the standard metric \(d\) and the French railway metric (go to the origin, then to the target) on \(\mathbb{R}^2\) are topologically equivalent but not metrically equivalent.
Chapter 3: Topology of Metric Spaces
Having defined metric spaces and their metrics in Chapter 2, we now develop the topological language — open sets, closed sets, interior, closure, density, limit points, and boundary — that allows us to discuss continuity and convergence in full generality. The key insight of this chapter is that many properties depend not on the specific metric, but only on the collection of open sets it determines (the topology). This observation leads to the broader framework of topological spaces, though we remain in the metric setting.
3.1 Open and Closed Sets
- \(\emptyset\) and \(X\) are open.
- Any union of open sets is open.
- Any finite intersection of open sets is open.
The corresponding properties for closed sets are dual: \(\emptyset\) and \(X\) are closed, any intersection of closed sets is closed, and any finite union of closed sets is closed. A set that is both open and closed is called clopen.
3.2 Interior, Closure, and Density
- The interior of \(A\) is \(A^\circ = \bigcup\{U \subseteq X \mid U \text{ is open}, U \subseteq A\}\) --- the largest open subset of \(A\).
- The closure of \(A\) is \(\overline{A} = \bigcap\{K \subseteq X \mid K \text{ is closed}, A \subseteq K\}\) --- the smallest closed superset of \(A\).
- A point \(a \in X\) is a limit point (or accumulation point) of \(A\) if \(B^*(a, r) \cap A \ne \emptyset\) for every \(r > 0\). Write \(A'\) for the set of limit points.
- An interior point of \(A\) is a point \(a \in A\) with \(B(a,r) \subseteq A\) for some \(r > 0\). An isolated point of \(A\) is \(a \in A \setminus A'\).
- The boundary is \(\partial A = \overline{A} \setminus A^\circ\). A point \(a \in \partial A\) iff every ball \(B(a,r)\) meets both \(A\) and \(A^c\).
- \(A\) is dense in \(X\) when \(\overline{A} = X\), equivalently when every open ball meets \(A\).
- \(A^\circ = \{a \in A : a \text{ is an interior point}\}\).
- \(\overline{A} = A \cup A'\).
- \(A\) is open iff \(A = A^\circ\); \(A\) is closed iff \(A = \overline{A}\) iff \(A' \subseteq A\).
- \((A^\circ)^\circ = A^\circ\) and \(\overline{\overline{A}} = \overline{A}\) (idempotence).
- \(\overline{A^c} = (A^\circ)^c\) and \((A^c)^\circ = (\overline{A})^c\) (duality between interior and closure via complements).
- \(x \in A^\circ\) if and only if there exists \(r > 0\) with \(B(x,r) \subseteq A\). Equivalently, \(x \in A^\circ\) if and only if every sequence converging to \(x\) is eventually in \(A\).
- \(x \in \partial A\) if and only if every ball \(B(x,r)\) contains points of both \(A\) and \(A^c\). Equivalently, there exist sequences \((a_n)\) in \(A\) and \((b_n)\) in \(A^c\) both converging to \(x\).
- The set \(A\) is dense and open if and only if it is open and its complement has empty interior.
Then \(I_x\) is an open interval contained in \(U\) (if \(y \in I_x\), then the entire interval between \(x\) and \(y\) lies in \(U\), so \(y \in U\)). For \(x, y \in U\), either \(I_x = I_y\) or \(I_x \cap I_y = \emptyset\): if \(z \in I_x \cap I_y\), then the interval from \(x\) to \(y\) (passing through \(z\)) lies in both \(I_x\) and \(I_y\), forcing \(I_x \cup I_y\) to be an interval contained in both \(I_x\) and \(I_y\) by maximality. Thus the distinct intervals \(\{I_x : x \in U\}\) are pairwise disjoint. Since each contains a rational number (density of \(\mathbb{Q}\)), and distinct intervals contain distinct rationals, the collection is countable.
A remarkable consequence is that the Borel \(\sigma\)-algebra on \(\mathbb{R}\) is generated by open intervals: since every open set is a countable union of open intervals, and countable unions of generators remain in the \(\sigma\)-algebra, knowledge of open intervals suffices to determine all Borel sets.
3.3 Sequences in Metric Spaces
- Limits in metric spaces are unique.
- Every convergent sequence is bounded and Cauchy.
- Subsequences of convergent sequences converge to the same limit.
- If a Cauchy sequence has a convergent subsequence, then the full sequence converges.
- \(x_n \to a\) iff \(d(x_n, a) \to 0\) in \(\mathbb{R}\), iff for every open set \(U \ni a\), eventually \(x_n \in U\).
- \(a \in A'\) iff there is a sequence in \(A \setminus \{a\}\) converging to \(a\).
- \(a \in \overline{A}\) iff there is a sequence in \(A\) converging to \(a\).
- \(A\) is closed iff every convergent sequence in \(A\) has its limit in \(A\).
These always exist in \([-\infty, +\infty]\) and satisfy \(\liminf x_n \le \limsup x_n\), with equality iff the limit exists. The \(\limsup\) is the largest cluster point and the \(\liminf\) is the smallest cluster point of the sequence in \(\mathbb{R}\).
3.4 Subspace Topology
- \(A\) is open in \(P\) iff \(A = U \cap P\) for some open set \(U\) in \(X\).
- \(A\) is closed in \(P\) iff \(A = K \cap P\) for some closed set \(K\) in \(X\).
3.5 Urysohn’s Lemma for Metric Spaces
One of the powerful consequences of the distance function is the ability to separate closed sets by continuous functions.
This is well-defined since \(d(x, A) + d(x, B) > 0\) for all \(x\) (if both were 0, then \(x \in \overline{A} \cap \overline{B} = A \cap B = \emptyset\), a contradiction). The function \(f\) is continuous because \(d(\cdot, A)\) and \(d(\cdot, B)\) are Lipschitz continuous (Example 4.10) and the denominator is bounded away from 0. Clearly \(f(x) = 0\) when \(x \in A\) (since \(d(x,A) = 0\)) and \(f(x) = 1\) when \(x \in B\) (since \(d(x,B) = 0\)).
One checks that \(\tilde{f}|_A = f\), \(\tilde{f}\) is Lipschitz with the same constant \(L\), and \(\inf_A f \le \tilde{f} \le \sup_A f\) (with a small modification). For merely continuous functions, the proof is more involved and uses iterative approximation.
are both clopen in \(C\) (since \([0, 1/3]\) and \([2/3, 1]\) are both open and closed in \(C\)). This shows \(C\) has a separation at every scale, confirming total disconnectedness. More generally, if \(a \notin C\), then \(C \cap (-\infty, a)\) and \(C \cap (a, \infty)\) disconnect \(C\) into clopen halves.
Chapter 4: Continuity
Continuity is the single most important concept in analysis — it is the bridge that connects the topology of a metric space to its analytical properties. In this chapter we develop three equivalent characterizations of continuity — the \(\varepsilon\)-\(\delta\) definition, the sequential characterization, and the topological characterization — and introduce the stronger notions of uniform continuity and Lipschitz continuity.
4.1 Continuity and Its Characterizations
- If \(a\) is a limit point of \(X\) and \(b \in Y\), we say \(\lim_{x \to a} f(x) = b\) if for every \(\varepsilon > 0\) there exists \(\delta > 0\) such that \(0 < d_X(x,a) < \delta\) implies \(d_Y(f(x), b) < \varepsilon\).
- \(f\) is continuous at \(a\) if for every \(\varepsilon > 0\) there exists \(\delta > 0\) such that \(d_X(x,a) < \delta\) implies \(d_Y(f(x), f(a)) < \varepsilon\). If \(a\) is an isolated point, \(f\) is automatically continuous at \(a\).
- \(f\) is continuous (on \(X\)) if it is continuous at every point of \(X\).
This follows immediately from the open set characterization and the identity \(f^{-1}(C^c) = (f^{-1}(C))^c\). Together, Theorems 4.4 and 4.5 give us the topological characterization of continuity, which depends only on the topology (the collection of open sets) rather than on the specific metric. This is the characterization that generalizes to arbitrary topological spaces.
- Compactness: if \(X\) is compact and \(f: X \to Y\) is a homeomorphism, then \(Y\) is compact.
- Connectedness (and path-connectedness): preserved by continuous maps, hence by homeomorphisms.
- Separability: if \(D\) is a countable dense subset of \(X\), then \(f(D)\) is countable and dense in \(Y\).
- Second countability: a countable basis is mapped to a countable basis.
- The cardinality of the space: a bijection preserves cardinality.
Properties that are not topological invariants include: completeness (depends on the metric), boundedness (depends on the metric), total boundedness, and the specific value of the diameter.
- The map \(f(x) = \tan(x): (-\pi/2, \pi/2) \to \mathbb{R}\) is a homeomorphism. The spaces \((0,1)\) and \(\mathbb{R}\) are homeomorphic (via \(x \mapsto \tan(\pi x - \pi/2)\)). This shows that being bounded is not a topological invariant.
- The intervals \([0,1]\) and \((0,1)\) are not homeomorphic: removing the two endpoints from \([0,1]\) disconnects it, but removing any two points from \((0,1)\) may not. More precisely, \([0,1]\) is compact while \((0,1)\) is not, and compactness is a topological invariant.
- The circle \(S^1\) and the interval \([0,1]\) are not homeomorphic: removing any point from \(S^1\) leaves a connected space, but removing the midpoint from \([0,1]\) disconnects it.
- The map \(f: \mathbb{R} \to \mathbb{R}\) given by \(f(x) = x^3\) is a homeomorphism (continuous bijection with continuous inverse \(g(y) = y^{1/3}\)), but it is not Lipschitz at 0 as an inverse (since \(g'(0) = \infty\)).
- In any metric space \((X,d)\), the metric \(d: X \times X \to \mathbb{R}\) is continuous (in the product topology).
- In any normed space \((V, \|\cdot\|)\), the norm \(\|\cdot\|: V \to \mathbb{R}\) is Lipschitz continuous with constant 1: \(|\|x\| - \|y\|| \le \|x - y\|\).
- In any inner product space, the inner product \(\langle \cdot, \cdot \rangle: V \times V \to \mathbb{F}\) is continuous.
4.2 Uniform Continuity and Lipschitz Maps
- \(f\) is uniformly continuous if for every \(\varepsilon > 0\) there exists \(\delta > 0\) such that \(d_X(x,y) < \delta \implies d_Y(f(x), f(y)) < \varepsilon\) for all \(x, y \in X\).
- \(f\) is Lipschitz continuous with constant \(L \ge 0\) if \(d_Y(f(x), f(y)) \le L \cdot d_X(x,y)\) for all \(x, y \in X\).
- \(f\) is an isometry if \(d_Y(f(x), f(y)) = d_X(x,y)\) for all \(x, y \in X\).
- \(f\) is biLipschitz if there exist constants \(0 < c \le C < \infty\) such that \(c \cdot d_X(x,y) \le d_Y(f(x), f(y)) \le C \cdot d_X(x,y)\).
The implications are: isometry \(\Rightarrow\) biLipschitz \(\Rightarrow\) Lipschitz \(\Rightarrow\) uniformly continuous \(\Rightarrow\) continuous, and none of the converses hold in general. In summary:
\[ \text{isometry} \;\Longrightarrow\; \text{biLipschitz} \;\Longrightarrow\; \text{Lipschitz} \;\Longrightarrow\; \text{uniformly continuous} \;\Longrightarrow\; \text{continuous}. \]Each class strictly contains the next. An isometry preserves all metric properties (completeness, total boundedness, the Cauchy property). A biLipschitz map preserves these properties as well but may distort distances by a bounded multiplicative factor. A Lipschitz map may collapse distances (not injective) but controls how fast distances can grow. Uniform continuity ensures that the modulus of continuity is independent of the basepoint — crucial for extending functions to completions (Theorem 10.3). Plain continuity is a local condition that can behave badly at different scales.
Hölder spaces arise naturally in PDEs: the solution of Laplace’s equation with \(C^\alpha\) boundary data is \(C^\alpha\) up to the boundary (Schauder estimates). The Weierstrass function has Hölder exponent \(\alpha = -\log a/\log b\), quantifying its “roughness.” In probability, Brownian motion paths are almost surely Hölder-\(\alpha\) for all \(\alpha < 1/2\) but not for \(\alpha = 1/2\).
4.3 Continuity of Linear Maps
- \(F\) is Lipschitz continuous.
- \(F\) is continuous at some point.
- \(F\) is continuous at \(0\).
- \(F(B(0,1))\) is bounded.
It is the smallest Lipschitz constant for \(F\): \(\|F(u)\| \le \|F\| \cdot \|u\|\) for all \(u\).
To conclude that the full sequence converges: if \(f(x_n) \not\to f(x)\), there exists \(\varepsilon > 0\) and a subsequence with \(d(f(x_{n_k}), f(x)) \ge \varepsilon\). Extract a further convergent subsequence from \((f(x_{n_k}))\) (by compactness); by the argument above, its limit must be \(f(x)\), contradicting \(d(f(x_{n_k}), f(x)) \ge \varepsilon\).
4.4 Finite-Dimensional Normed Spaces
4.5 Uniform Convergence

Pointwise convergence is much weaker: it requires only that for each fixed \(x\), the sequence \(f_n(x) \to g(x)\). The uniform limit of continuous functions is continuous, but the pointwise limit need not be.
- Limit interchange: If \(f_n \to g\) uniformly on a set \(S\) and \(\lim_{x \to a} f_n(x) = L_n\) exists for each \(n\), then \(\lim_n L_n\) exists and \(\lim_{x \to a} g(x) = \lim_n L_n\). That is, \(\lim_{x \to a} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \lim_{x \to a} f_n(x)\).
- Integration: If \(f_n \to g\) uniformly on \([a,b]\), then \(\int_a^b f_n \to \int_a^b g\). This follows from \(|\int (f_n - g)| \le (b-a)\|f_n - g\|_\infty\).
- Differentiation: If \(f_n \to g\) pointwise, \(f_n' \to h\) uniformly, and each \(f_n'\) is continuous, then \(g\) is differentiable and \(g' = h\). (Proved by integrating \(f_n'\) and interchanging limits on the integrals.)
- \((B(X), d_\infty)\) is complete.
- \(C_b(X)\) is closed in \(B(X)\), hence complete.
Proof of (2): If \(f_n \in C_b(X)\) and \(f_n \to g\) uniformly, then \(g\) is continuous by the uniform limit theorem (Theorem 4.18).
Chapter 5: Completeness and Separability
Completeness — the property that every Cauchy sequence converges — is arguably the most important structural property a metric space can possess. It is the key hypothesis in the Baire Category Theorem, the Contraction Mapping Principle, and the existence theorem for solutions of ODEs. Separability — having a countable dense subset — is the topological analogue of being “not too large.”
5.1 Cauchy Sequences and Complete Metric Spaces
- Every Cauchy sequence is bounded.
- Every convergent sequence is Cauchy.
- If a Cauchy sequence has a convergent subsequence, then the full sequence converges (to the same limit).
(\(\Leftarrow\)): Let \((x_n)\) be Cauchy. Define \(C_n = \overline{\{x_k : k \ge n\}}\). These are nonempty, closed, nested, and \(\operatorname{diam}(C_n) = \operatorname{diam}(\{x_k : k \ge n\}) \to 0\) since the sequence is Cauchy. The unique point \(a \in \bigcap C_n\) is the limit: for each \(\varepsilon > 0\), choose \(N\) with \(\operatorname{diam}(C_N) < \varepsilon\), then for \(n \ge N\), \(d(x_n, a) \le \operatorname{diam}(C_N) < \varepsilon\).
This is a complete ultrametric space. To verify completeness: if \((\alpha_k)\) is Cauchy, then for each \(n\), there exists \(K\) such that the first \(n\) terms of \(\alpha_k\) agree for all \(k \ge K\). Define \(\beta(n) = \lim_k \alpha_k(n)\) (eventually constant). Then \(\alpha_k \to \beta\) in the metric.
The Baire space is homeomorphic to the irrational numbers \(\mathbb{R} \setminus \mathbb{Q}\) (via continued fraction expansion: each irrational has a unique continued fraction \([a_0; a_1, a_2, \ldots]\) with \(a_i \in \mathbb{N}\) for \(i \ge 1\), and this correspondence is a homeomorphism). Since \(\mathbb{N}^{\mathbb{N}}\) is complete, this gives an indirect proof that the irrationals are completely metrizable — they admit a complete metric inducing the same topology, even though the usual metric inherited from \(\mathbb{R}\) is not complete. The Baire space plays a fundamental role in descriptive set theory, where it serves as the “canonical” Polish space of dimension zero.
5.2 Completeness of Key Spaces
For \(\mathbb{R}^n\): extract a subsequence converging in the first coordinate, then a further subsequence converging in the second coordinate, etc. After \(n\) extractions, the final subsequence converges in all coordinates.
The proof strategy for all \(\ell^p\) spaces is the same three-step pattern, sometimes called the “standard completeness argument”:
- Coordinatewise convergence: show the Cauchy sequence converges in each coordinate.
- Membership: show the coordinatewise limit actually belongs to \(\ell^p\).
- Norm convergence: show convergence holds in the \(\ell^p\) norm, not just coordinatewise.
This strategy is used repeatedly in functional analysis whenever proving completeness of a function space.
Step 1 (Coordinatewise convergence): For each fixed \(k\), \(|a^{(n)}_k - a^{(m)}_k| \le \|a^{(n)} - a^{(m)}\|_p \to 0\), so \(b_k := \lim_n a^{(n)}_k\) exists.
Step 2 (Membership): Fix \(\varepsilon > 0\) and choose \(N\) with \(\|a^{(n)} - a^{(m)}\|_p < \varepsilon\) for \(n, m \ge N\). For any finite \(K\):
\[ \left(\sum_{k=1}^K |a^{(n)}_k - b_k|^p\right)^{1/p} = \lim_{m \to \infty} \left(\sum_{k=1}^K |a^{(n)}_k - a^{(m)}_k|^p\right)^{1/p} \le \varepsilon. \]Letting \(K \to \infty\): \(\|a^{(n)} - b\|_p \le \varepsilon\) for \(n \ge N\). In particular, \(a^{(n)} - b \in \ell^p\), so \(b = a^{(n)} - (a^{(n)} - b) \in \ell^p\) (since \(\ell^p\) is a vector space).
Step 3 (Norm convergence): The estimate \(\|a^{(n)} - b\|_p \le \varepsilon\) for \(n \ge N\) is exactly convergence in the \(\ell^p\) norm.
- \(c = \{(a_n) \in \ell^\infty : \lim a_n \text{ exists}\}\), the space of convergent sequences.
- \(c_0 = \{(a_n) \in \ell^\infty : \lim a_n = 0\}\), the space of sequences converging to zero.
One verifies \(\|f_n - f_m\|_1 \to 0\) as \(n, m \to \infty\), so \((f_n)\) is Cauchy in \(d_1\). The pointwise limit is the sign function \(\operatorname{sgn}(x)\), which is discontinuous, so the sequence has no limit in \(C[-1,1]\). The same construction works for \(d_2\). The completions of \((C[a,b], d_p)\) are the \(L^p\) spaces — a topic developed in PMATH 450 (Lebesgue Integration).
5.3 The Uniform Limit Theorem and Series of Functions
The interplay between completeness and uniform convergence yields powerful results about series of functions.
- Pointwise convergence of \(f_n \to g\) means: for each \(x\) and \(\varepsilon > 0\), there exists \(N(x,\varepsilon)\) with \(|f_n(x) - g(x)| < \varepsilon\) for \(n \ge N\). The \(N\) may depend on \(x\).
- Uniform convergence means: for each \(\varepsilon > 0\), there exists \(N(\varepsilon)\) (independent of \(x\)) with \(\sup_x |f_n(x) - g(x)| < \varepsilon\) for \(n \ge N\).
5.4 Separability
- A metric space is separable if and only if it is second-countable. If \(D\) is a countable dense set, then \(\{B(q, 1/n) : q \in D, n \ge 1\}\) is a countable basis.
- Every subspace of a separable metric space is separable.
- If every infinite subset of \(X\) has a limit point, then \(X\) is separable.
- \(\mathbb{R}^n\) is separable (\(\mathbb{Q}^n\) is a countable dense subset).
- \((C[a,b], d_\infty)\) is separable (polynomials with rational coefficients are countable and dense, by the Weierstrass Approximation Theorem).
- \((\ell^p, d_p)\) is separable for \(1 \le p < \infty\): the sequences with finitely many nonzero rational entries form a countable dense subset.
- \((\ell^\infty, d_\infty)\) is not separable: for each subset \(A \subseteq \mathbb{N}\), the characteristic function \(\chi_A\) satisfies \(\|\chi_A - \chi_B\|_\infty = 1\) when \(A \ne B\), giving an uncountable set of points mutually at distance 1.
- \((B[a,b], d_\infty)\) is not separable, for the same reason with characteristic functions of subsets of \([a,b]\).
Chapter 6: Compactness
Compactness is arguably the single most important topological property in analysis. It is the correct generalization of “closed and bounded” that works in all metric spaces. Compact sets have the remarkable property that every sequence has a convergent subsequence, which makes them indispensable in optimization, approximation theory, and differential equations.
6.1 Open Covers and the Definition of Compactness
The converse fails in general — the closed unit ball in \(C[0,1]\) is closed and bounded but not compact. More precisely:
6.2 Sequential Compactness and Total Boundedness
- Every subset of a totally bounded space is totally bounded.
- A metric space is totally bounded if and only if every sequence has a Cauchy subsequence.
- If \(f: X \to Y\) is uniformly continuous and \(X\) is totally bounded, then \(f(X)\) is totally bounded.
- Finite products of totally bounded spaces are totally bounded.
(\(\Leftarrow\)): If \(X\) is not totally bounded, there exists \(\varepsilon > 0\) and an infinite sequence \((a_n)\) with \(d(a_n, a_m) \ge \varepsilon\) for \(n \ne m\) (constructed greedily). No subsequence of \((a_n)\) is Cauchy, contradicting the hypothesis.
- In \(\mathbb{R}^n\), bounded and totally bounded are equivalent (by Heine--Borel, both are equivalent to having compact closure).
- In \(\ell^2\), the set \(S = \{e_n/n : n \ge 1\}\) is totally bounded (it converges to 0, so every sequence in \(S\) has a Cauchy subsequence). But the set \(\{e_n : n \ge 1\}\) is bounded (\(\|e_n\| = 1\)) but not totally bounded.
- In \(C[0,1]\), the set \(\{f_n(x) = x^n/n : n \ge 1\}\) is totally bounded (since \(\|f_n\|_\infty = 1/n \to 0\)), while \(\{x^n : n \ge 1\}\) is bounded but not totally bounded (the pointwise limit is discontinuous, so Arzela--Ascoli fails).
- \(X\) is compact (every open cover has a finite subcover).
- \(X\) has the finite intersection property on closed sets: if every finite sub-collection of a collection of closed sets has nonempty intersection, so does the whole collection.
- Every sequence in \(X\) has a convergent subsequence (sequential compactness).
- Every infinite subset of \(X\) has a limit point (Bolzano--Weierstrass property).
- \(X\) is complete and totally bounded.
(1)\(\Rightarrow\)(2): By contrapositive via complements. (2)\(\Rightarrow\)(3): Given \((x_n)\), the sets \(A_m = \overline{\{x_n : n > m\}}\) satisfy the FIP; choose \(a \in \bigcap A_m\) and build a subsequence converging to \(a\). (3)\(\Rightarrow\)(4): Choose a sequence of distinct points in the infinite set and extract a convergent subsequence.
(4)\(\Rightarrow\)(5): For completeness: if \((x_n)\) is Cauchy, the set \(\{x_n\}\) either has a repeated term (giving a constant convergent subsequence) or is infinite, hence has a limit point, so a subsequence converges, so the Cauchy sequence converges. For total boundedness: if \(X\) is not totally bounded, there exists \(\varepsilon > 0\) such that no finite \(\varepsilon\)-net covers \(X\). Construct inductively \(a_1, a_2, \ldots\) with \(d(a_n, a_m) \ge \varepsilon\) for \(m \ne n\). Then \(\{a_n\}\) is infinite with no limit point — contradiction.
(5)\(\Rightarrow\)(1): Suppose \(X\) is complete and totally bounded but not compact. Choose an open cover \(\mathcal{S}\) with no finite subcover. Cover \(X\) by finitely many balls of radius 1. At least one ball \(U_1 = B(a_1, 1)\) has no finite subcover from \(\mathcal{S}\). Cover \(X\) by balls of radius \(1/2\); choose \(U_2 = B(a_2, 1/2)\) with \(U_1 \cap U_2\) having no finite subcover. Continue to get nested intersections \(\bigcap_{k=1}^n U_k \ne \emptyset\) with no finite subcover. Pick \(x_n \in \bigcap_{k=1}^n U_k\). Since \(x_k, x_\ell \in U_m = B(a_m, 1/m)\) for \(k, \ell \ge m\), the sequence \((x_n)\) is Cauchy, hence converges to some \(a\). Choose \(U \in \mathcal{S}\) with \(a \in U\) and \(B(a,r) \subseteq U\). For large \(m\), \(U_m \subseteq B(a,r) \subseteq U\), so \(\{U\}\) finitely covers \(U_m\) — contradiction.
The equivalence (1) \(\Leftrightarrow\) (5) decomposes compactness into completeness (limits exist) and total boundedness (the space is “almost finite” at every scale). This decomposition is the key to understanding why compactness fails in infinite dimensions: while \(\ell^2\) is complete, the closed unit ball is bounded but not totally bounded (the standard basis vectors \(e_n\) are mutually at distance \(\sqrt{2}\), so no finite \(\varepsilon\)-net exists for \(\varepsilon < \sqrt{2}/2\)).
Important examples:
- The one-point compactification of \(\mathbb{R}\) is homeomorphic to the circle \(S^1\) (via stereographic projection).
- The one-point compactification of \(\mathbb{R}^n\) is homeomorphic to the sphere \(S^n\).
- The one-point compactification of the discrete space \(\mathbb{N}\) is homeomorphic to the convergent sequence \(\{0, 1, 1/2, 1/3, \ldots\}\) with the usual metric.
6.3 Compactness and Continuity
The theorems in this section are among the most frequently used consequences of compactness.
Fix \(\varepsilon > 0\). For each \(x \in X\), choose \(N(x)\) with \(g_{N(x)}(x) < \varepsilon/2\). By continuity of \(g_{N(x)}\), there exists an open neighborhood \(U_x\) of \(x\) with \(g_{N(x)}(y) < \varepsilon\) for all \(y \in U_x\). The sets \(\{U_x : x \in X\}\) form an open cover of \(X\). By compactness, extract a finite subcover \(U_{x_1}, \ldots, U_{x_k}\). Let \(N = \max\{N(x_1), \ldots, N(x_k)\}\). For any \(y \in X\), choose \(x_i\) with \(y \in U_{x_i}\). Since \(g_n\) is decreasing and \(n \ge N \ge N(x_i)\):
\[ 0 \le g_n(y) \le g_{N(x_i)}(y) < \varepsilon. \]Thus \(\|g_n\|_\infty < \varepsilon\) for all \(n \ge N\).
- Compactness: On \((0,1)\), define \(f_n(x) = x^n\). Then \(f_n \searrow 0\) pointwise but not uniformly (\(\sup_{(0,1)} f_n = 1\) for all \(n\)).
- Continuity of limit: On \([0,1]\), define \(f_n(x) = x^n\). Then \(f_n \searrow f\) where \(f(x) = 0\) for \(x < 1\) and \(f(1) = 1\). The limit is discontinuous, and convergence is not uniform.
- Monotonicity: On \([0,1]\), define \(f_n\) to be the tent function with peak \(1\) at \(1/n\) and support \([0, 2/n]\). Then \(f_n \to 0\) pointwise with continuous limit, but not uniformly (\(\|f_n\|_\infty = 1\)).
6.4 Compactness in C(X): The Arzela–Ascoli Theorem
In finite dimensions, Heine–Borel tells us compactness equals closed and bounded. In the infinite-dimensional space \(C(X)\), an additional condition is needed: equicontinuity.
- \(S\) is pointwise bounded if for each \(x \in X\), there exists \(M(x) > 0\) with \(|f(x)| \le M(x)\) for all \(f \in S\).
- \(S\) is uniformly bounded if there exists \(M > 0\) with \(\|f\|_\infty \le M\) for all \(f \in S\).
- \(S\) is equicontinuous at \(x_0\) if for every \(\varepsilon > 0\) there exists \(\delta > 0\) such that \(d(x, x_0) < \delta \implies |f(x) - f(x_0)| < \varepsilon\) for all \(f \in S\).
- \(S\) is equicontinuous if it is equicontinuous at every point. When \(X\) is compact, equicontinuity at every point is equivalent to uniform equicontinuity: for every \(\varepsilon > 0\) there exists \(\delta > 0\) such that \(d(x,y) < \delta \implies |f(x) - f(y)| < \varepsilon\) for all \(f \in S\) and all \(x, y \in X\). (The proof parallels Theorem 6.14.)
Step 1 (Diagonal argument): Since \(X\) is compact, hence separable, pick a countable dense subset \(\{q_1, q_2, \ldots\}\). The sequence \((f_n(q_1))\) is bounded, so extract a convergent subsequence \((f_{1,k})\). From this, extract a further subsequence \((f_{2,k})\) such that \((f_{2,k}(q_2))\) converges. Continue, and take the diagonal subsequence \(g_k = f_{k,k}\). Then \((g_k(q_j))\) converges for every \(j\).
Step 2 (Equicontinuity upgrades pointwise to uniform): Fix \(\varepsilon > 0\). Choose \(\delta > 0\) from equicontinuity. Cover \(X\) by \(B(q_{j_1}, \delta), \ldots, B(q_{j_p}, \delta)\). Choose \(N\) so that \(|g_k(q_{j_i}) - g_\ell(q_{j_i})| < \varepsilon/3\) for all \(k, \ell \ge N\) and all \(i\). For any \(x \in X\), choose \(q_{j_i}\) with \(d(x, q_{j_i}) < \delta\). Then:
\[ |g_k(x) - g_\ell(x)| \le |g_k(x) - g_k(q_{j_i})| + |g_k(q_{j_i}) - g_\ell(q_{j_i})| + |g_\ell(q_{j_i}) - g_\ell(x)| < \varepsilon. \]So \((g_k)\) is Cauchy in \(C(X)\), hence converges. Since \(S\) is closed, the limit lies in \(S\).
Converse: Compact implies closed and bounded (hence uniformly bounded, hence pointwise bounded). Every compact subset of \(C(X)\) is equicontinuous: if not, there exists \(\varepsilon > 0\) and functions \(f_n \in S\) with points \(x_n, y_n\) satisfying \(d(x_n, y_n) < 1/n\) but \(|f_n(x_n) - f_n(y_n)| \ge \varepsilon\), and no subsequence of \((f_n)\) can converge uniformly.
The integral operator \(T: C[a,b] \to C[a,b]\) defined by
\[ (Tf)(x) = \int_a^b K(x,t) f(t)\, dt, \]where \(K: [a,b]^2 \to \mathbb{R}\) is continuous, is a compact operator. To prove this using Arzela–Ascoli: if \(\|f\|_\infty \le M\), then \(\|Tf\|_\infty \le M(b-a)\|K\|_\infty\) (uniform boundedness), and for any \(\varepsilon > 0\), the uniform continuity of \(K\) on the compact set \([a,b]^2\) gives \(\delta > 0\) with
\[ |Tf(x) - Tf(y)| \le M \int_a^b |K(x,t) - K(y,t)|\, dt < \varepsilon \]whenever \(|x - y| < \delta\), independent of \(f\) (equicontinuity). By Arzela–Ascoli, \(\{Tf : \|f\|_\infty \le M\}\) is relatively compact.
Compact operators are central to spectral theory: the spectral theorem for compact self-adjoint operators on Hilbert spaces generalizes the diagonalization of symmetric matrices. The Fredholm alternative for compact operators is a key tool in the theory of integral equations and PDEs.
6.5 The Cantor Set and Space-Filling Curves

The Cantor set is one of the most remarkable objects in mathematics — simultaneously “large” (uncountable, same cardinality as \(\mathbb{R}\)) and “small” (measure zero, nowhere dense). Understanding the Cantor set is essential preparation for measure theory and fractal geometry.
Properties of the Cantor set:
- Closed (intersection of closed sets) and compact (closed and bounded in \(\mathbb{R}\)).
- Totally disconnected: contains no interval. If \(a < b\) with \(a, b \in C\), then for large enough \(n\), the interval \((a,b)\) contains a removed middle-third interval, so \((a,b) \not\subseteq C\).
- Perfect: every point is a limit point (\(C = C'\)). If \(x = \sum a_n/3^n \in C\), then changing the \(k\)-th digit from \(a_k\) to \(2 - a_k\) and keeping all other digits fixed gives a different point of \(C\) at distance \(\le 2/3^k \to 0\) from \(x\).
- Nowhere dense: \((\overline{C})^\circ = \emptyset\), since \(C\) contains no interval.
- Measure zero: the total length removed is \(1/3 + 2/9 + 4/27 + \cdots = \sum_{n=0}^\infty 2^n / 3^{n+1} = 1\). At stage \(n\), \(C_n\) consists of \(2^n\) intervals of total length \((2/3)^n \to 0\).
- Uncountable: the map \(\phi: C \to \{0,1\}^{\mathbb{N}}\) given by \(\phi(\sum a_n/3^n) = (a_1/2, a_2/2, \ldots)\) is a bijection from \(C\) to the set of binary sequences. Since \(|\{0,1\}^{\mathbb{N}}| = 2^{\aleph_0} = \mathfrak{c}\), the Cantor set has the cardinality of the continuum, despite having measure zero.
- Homeomorphic to \(\{0,1\}^{\mathbb{N}}\): the map \(\phi\) above is a homeomorphism when \(\{0,1\}^{\mathbb{N}}\) carries the product topology (equivalently, the metric \(d(s,t) = \sum 2^{-n} |s_n - t_n|\)). This is because the ternary expansion provides a dictionary between the Cantor set’s “left-right” branching structure and binary sequences.
- Continuous and monotonically nondecreasing.
- Differentiable almost everywhere, with \(\varphi'(x) = 0\) a.e. (on the complement of \(C\), which has full measure).
- Surjective onto \([0,1]\): it maps the measure-zero set \(C\) onto all of \([0,1]\).
- Not absolutely continuous: it increases from 0 to 1 but has derivative zero almost everywhere.
Base case. Since \(X\) is compact, it has a finite \(1/2\)-net \(x_1^{(1)}, \ldots, x_{n_1}^{(1)}\) with \(n_1 \ge 2\). Partition \(C\) into \(n_1\) disjoint nonempty clopen subsets \(C_1^{(1)}, \ldots, C_{n_1}^{(1)}\) with \(\operatorname{diam}(C_i^{(1)}) \le 1/3\). (This is possible since \(C\) is perfect, compact, and totally disconnected.) Define \(f_1: C \to X\) by \(f_1|_{C_i^{(1)}} = x_i^{(1)}\). Being locally constant, \(f_1\) is continuous.
Inductive step. Suppose at stage \(k\) we have a \(2^{-k}\)-net \(\{x_i^{(k)}\}_{i=1}^{n_k}\), a partition of \(C\) into disjoint clopen sets \(\{C_i^{(k)}\}\) with \(\operatorname{diam}(C_i^{(k)}) \le 3^{-k}\), and a locally constant \(f_k: C \to X\) with \(f_k|_{C_i^{(k)}} = x_i^{(k)}\). For each \(i\), refine the ball \(B(x_i^{(k)}, 2^{-k})\) by choosing a \(2^{-(k+1)}\)-net of it, giving new points \(x_j^{(k+1,i)}\). Split each \(C_i^{(k)}\) into the same number of disjoint clopen subsets of diameter \(\le 3^{-(k+1)}\). Define \(f_{k+1}\) to be constant on each new piece, mapping to the corresponding net point.
Convergence. Since \(f_{k+1}(y) \in B(f_k(y), 2^{-k})\) for all \(y \in C\), we have \(\|f_{k+1} - f_k\|_\infty \le 2^{-k}\). Since \(\sum 2^{-k} < \infty\), the sequence \((f_k)\) is Cauchy in the complete space \(C(C, X)\) and converges uniformly to a continuous function \(f: C \to X\).
Surjectivity. Let \(x \in X\). At each stage \(k\), choose \(i_k\) with \(d(x, x_{i_k}^{(k)}) \le 2^{-k}\) and pick any \(y_k \in C_{i_k}^{(k)}\). The sequence of sets \(C_{i_1}^{(1)} \supseteq C_{i_2}^{(2)} \supseteq \cdots\) is nested, nonempty, and has diameters \(\to 0\). By Cantor’s Intersection Theorem, there exists \(y \in \bigcap_k C_{i_k}^{(k)}\). Then \(d(f(y), x) = \lim_k d(f_k(y), x) \le \lim_k 2^{-k} = 0\), so \(f(y) = x\).
Construction. At the first stage, tile \(X\) with finitely many nonempty, disjoint, clopen sets \(X_1, \ldots, X_{n_1}\) of diameter at most \(1/2\), with \(n_1 \ge 2\). Partition \(C\) into the same number of disjoint clopen subsets \(C_1, \ldots, C_{n_1}\) with diameter at most \(1/3\) (this is always possible since the Cantor set is perfect and totally disconnected). Pick representative points \(x_i \in X_i\) and \(c_i \in C_i\), and define locally constant maps \(f_1: X \to C\) by \(f_1|_{X_i} = c_i\) and \(g_1: C \to X\) by \(g_1|_{C_i} = x_i\).
At the \(k\)-th stage, refine each \(X_i^{(k)}\) from the previous stage into finitely many disjoint clopen subsets of diameter at most \(2^{-k}\), and correspondingly refine each \(C_i^{(k)}\) into the same number of disjoint clopen subsets of diameter at most \(3^{-k}\). Define locally constant maps \(f_k: X \to C\) and \(g_k: C \to X\) by assigning each piece to a representative point.
Convergence. By construction, \(\|f_k - f_{k+1}\|_\infty \le 3^{-k}\) and \(\|g_k - g_{k+1}\|_\infty \le 2^{-k}\), so both \(\sum \|f_k - f_{k+1}\|_\infty\) and \(\sum \|g_k - g_{k+1}\|_\infty\) converge. Hence \((f_k)\) and \((g_k)\) are Cauchy in the complete spaces \(C(X, C)\) and \(C(C, X)\), and converge uniformly to continuous maps \(f: X \to C\) and \(g: C \to X\).
Inverse maps. At stage \(k\), \(\|g_k \circ f_k - \mathrm{id}_X\|_\infty \le \max_i \operatorname{diam}(X_i^{(k)}) \le 2^{-k} \to 0\) and similarly \(\|f_k \circ g_k - \mathrm{id}_C\|_\infty \le 3^{-k} \to 0\). Passing to limits, \(g \circ f = \mathrm{id}_X\) and \(f \circ g = \mathrm{id}_C\). Thus \(f\) and \(g\) are continuous bijections and inverses of each other, establishing the homeomorphism \(X \cong C\).
Chapter 7: Connectedness
While compactness captures “finiteness” of a space, connectedness captures the intuition that a space is “in one piece.” This chapter develops the basic theory of connected and path-connected spaces.
Connected spaces play a fundamental role in the intermediate value theorem, the classification of manifolds, and the theory of covering spaces. In metric spaces, the theory is somewhat simpler than in general topological spaces, but the key phenomena — the distinction between connectedness and path-connectedness, the existence of connected components — are already present.
7.1 Connected Sets
Equivalently, \(X\) is connected iff the only subsets that are both open and closed (clopen) are \(\emptyset\) and \(X\) itself.
- Existence of \(n\)-th roots: For any \(a > 0\) and \(n \ge 1\), the function \(f(x) = x^n - a\) is continuous on \([0, \max(1,a)]\) with \(f(0) < 0\) and \(f(\max(1,a)) \ge 0\). By the IVT, \(x^n = a\) has a positive root.
- Every odd-degree polynomial has a real root: If \(p(x) = x^{2k+1} + a_{2k}x^{2k} + \cdots + a_0\), then \(p(x) \to +\infty\) as \(x \to +\infty\) and \(p(x) \to -\infty\) as \(x \to -\infty\). By the IVT, \(p\) has a zero.
- The "ham sandwich theorem" in 1D: If \(f: [a,b] \to \mathbb{R}\) is continuous, there exists \(c \in [a,b]\) such that \(\int_a^c f = \int_c^b f\). Apply the IVT to \(g(c) = \int_a^c f - \int_c^b f\), which satisfies \(g(a) = -\int_a^b f\) and \(g(b) = \int_a^b f\).
7.2 Path-Connectedness
7.3 Connectedness of Products and Applications
7.4 Connected Components and Total Disconnectedness
Chapter 8: The Baire Category Theorem
The Baire Category Theorem is one of the most powerful consequences of completeness. It provides a framework for proving that certain “nice” properties hold for “most” points or functions, even when constructing a single explicit example may be difficult.
The key idea is to classify subsets of a complete metric space as either “topologically small” (first category) or “topologically large” (second category/residual), and to show that complete metric spaces themselves are always second category.
8.1 Nowhere Dense Sets and Categories
First category sets are the “topologically small” sets, analogous to how measure-zero sets are “measure-theoretically small.” However, these notions are genuinely different: the Cantor set has measure zero but is not countable, and there exist sets of first category with positive measure.
8.2 The Baire Category Theorem
- Every first category set in \(X\) has empty interior.
- Every residual set in \(X\) is dense.
- A countable union of closed sets with empty interiors has empty interior.
- A countable intersection of dense open sets is dense.
- The rationals \(\mathbb{Q}\) are first category in \(\mathbb{R}\) (a countable union of singletons) and have measure zero.
- There exist dense open sets of arbitrarily small measure: for each \(\varepsilon > 0\), the set \(G_\varepsilon = \bigcup_{n=1}^\infty (q_n - \varepsilon/2^n, q_n + \varepsilon/2^n)\) (where \(\{q_n\}\) enumerates \(\mathbb{Q}\)) is open, dense, and has Lebesgue measure at most \(2\varepsilon\). Its complement \(\mathbb{R} \setminus G_\varepsilon\) is closed, nowhere dense, and has measure \(\ge 1 - 2\varepsilon\) on \([0,1]\). Taking a countable intersection \(\bigcap_{k=1}^\infty G_{1/k}\) gives a residual set of measure zero.
- Conversely, the complement of such a set is first category but has full measure. So there exist sets that are "topologically large" (residual) but "measure-theoretically small" (measure zero), and vice versa.
- Generic continuous functions are nowhere differentiable (Theorem 8.10).
- Generic continuous functions are nowhere monotone: the set of \(f \in C[0,1]\) that are monotone on some interval is first category.
- Generic orbits of irrational rotations are equidistributed: in the space of rotations of the circle, a residual set produces equidistributed orbits (Weyl's theorem).
8.3 \(G_\delta\) and \(F_\sigma\) Sets
- A \(G_\delta\) set is a countable intersection of open sets.
- An \(F_\sigma\) set is a countable union of closed sets.
Every closed set is \(G_\delta\) (in a metric space, \(A = \bigcap_{n=1}^\infty \{x : d(x,A) < 1/n\}\)) and every open set is \(F_\sigma\). The BCT implies that in a complete metric space, a dense \(G_\delta\) set is residual, and the intersection of countably many dense \(G_\delta\) sets is again dense \(G_\delta\).
8.4 Oscillation and Continuity of Pointwise Limits
Then \(f\) is continuous at \(x\) if and only if \(\omega_f(x) = 0\).
Each \(A_n\) is closed by Theorem 8.7. Suppose for contradiction that some \(A_n\) contains an open interval \(I\). Fix \(\varepsilon = \frac{1}{3n}\). For each \(N \ge 1\), define
\[ E_N = \{x \in I : |f_i(x) - f_j(x)| \le \varepsilon \text{ for all } i, j \ge N\}. \]Each \(E_N\) is closed (an intersection of closed sets). Since \(f_i(x) \to f(x)\) pointwise, for each \(x \in I\) the sequence \((f_i(x))\) is Cauchy, so \(x \in E_N\) for some \(N\). Thus \(I = \bigcup_{N=1}^\infty E_N\). By the Baire Category Theorem applied to the complete metric space \(I\), some \(E_{N_0}\) has nonempty interior, so there is an open sub-interval \(J \subseteq E_{N_0}\).
Fix \(x \in J\). By uniform continuity of \(f_{N_0}\) on \([a,b]\), there exists \(\delta > 0\) such that \(|x - y| < \delta \implies |f_{N_0}(x) - f_{N_0}(y)| < \varepsilon\). We may assume \((x - \delta, x + \delta) \subseteq J\). Then for \(|y - x| < \delta\):
\[ |f(x) - f(y)| \le |f(x) - f_{N_0}(x)| + |f_{N_0}(x) - f_{N_0}(y)| + |f_{N_0}(y) - f(y)| < \varepsilon + \varepsilon + \varepsilon = \frac{1}{n}, \]where the first and third terms follow from \(|f(x) - f_{N_0}(x)| = \lim_{i \to \infty} |f_i(x) - f_{N_0}(x)| \le \varepsilon\) since \(x \in E_{N_0}\). This gives \(\omega_f(x) < 1/n\), so \(x \notin A_n\), contradicting \(J \subseteq I \subseteq A_n\).
This theorem shows that a pointwise limit of continuous functions, while not necessarily continuous everywhere, must be continuous on a “topologically large” set. The discontinuity set is meagre — it is contained in a countable union of nowhere dense sets.
- There exists a function continuous exactly on the irrationals (e.g., the Thomae function).
- There does not exist a function continuous exactly on the rationals, because \(\mathbb{Q}\) is not \(G_\delta\) (Proposition 8.9a).
- There exists a function continuous exactly on a given countable dense set \(D\) if and only if \(D\) is \(G_\delta\), which by the BCT requires \(D^c\) to be first category.
- The characteristic function of the rationals \(\mathbf{1}_{\mathbb{Q}}\) is discontinuous everywhere, and since \(\mathbb{R}\) is not meagre (by the BCT), \(\mathbf{1}_{\mathbb{Q}}\) cannot be the pointwise limit of any sequence of continuous functions.
- The Thomae function (Riemann function) \(f(p/q) = 1/q\), \(f(\text{irrational}) = 0\) is discontinuous exactly on \(\mathbb{Q}\), which is \(F_\sigma\) and meagre. Indeed, it is the pointwise limit of continuous functions (hence Baire class 1) and also Riemann integrable.
8.5 Application: The Uniform Boundedness Principle
The Baire Category Theorem has deep applications in functional analysis, even at the level of this course.
8.5a Application: Smooth Functions with Polynomial Restrictions
8.6 Application: \(\mathbb{Q}\) Is Not \(G_\delta\) in \(\mathbb{R}\)
But by the BCT, a countable intersection of dense open sets in the complete space \(\mathbb{R}\) is dense, hence nonempty — contradiction.
This gives another proof that \(\mathbb{Q}\) is not complete: if it were, the BCT would apply to \(\mathbb{Q}\) itself, but \(\mathbb{Q} = \bigcup \{q_n\}\) is first category, so its interior (all of \(\mathbb{Q}\) if it had nonempty interior) would be empty — which it is. The more subtle point is that \(\mathbb{Q}\) cannot even be \(G_\delta\), which distinguishes it from the irrationals \(\mathbb{R} \setminus \mathbb{Q} = \bigcap_{n} (\mathbb{R} \setminus \{q_n\})\), which is \(G_\delta\).
8.7 Nowhere Differentiable Functions
One of the most striking applications of the Baire Category Theorem is showing that “most” continuous functions are nowhere differentiable.
8.7 The Weierstrass Function
While the BCT gives an existential proof, Weierstrass gave an explicit example:
is continuous everywhere (by the Weierstrass M-test, since \(\sum 2^{-k} < \infty\)) but differentiable nowhere.
where \(0 < a < 1\), \(b\) is a positive odd integer, and \(ab > 1 + 3\pi/2\). Hardy (1916) proved the sharper result that nowhere-differentiability holds whenever \(ab \ge 1\), which is essentially optimal since \(ab < 1\) gives uniform convergence of the differentiated series by the Weierstrass M-test, yielding a differentiable function.
The Weierstrass function has Hölder exponent \(\alpha = -\log a/\log b\): it satisfies \(|f(x) - f(y)| \le C|x-y|^\alpha\) but not \(|f(x) - f(y)| \le C|x-y|^\beta\) for any \(\beta > \alpha\). This connects to fractal geometry: the graph of the Weierstrass function has Hausdorff dimension \(2 - \alpha\), which is strictly greater than 1, reflecting its rough, fractal-like nature.
- Van der Waerden's function (1930): \(g(x) = \sum_{n=0}^\infty 10^{-n} \phi(10^n x)\) where \(\phi(x) = \min_{n \in \mathbb{Z}} |x - n|\) is the sawtooth function (distance to nearest integer). The proof of nowhere-differentiability is elementary: the \(n\)-th partial sum is piecewise linear with slopes \(\pm 1\), and the remaining terms contribute a controlled error.
- Bolzano's function (~1830): Constructed decades before Weierstrass via a geometric process of progressively "zigzagging," though not published until much later. It demonstrates that pathological functions were being discovered even before the rigorous foundations of analysis were established.
- Brownian motion: With probability 1, a sample path of standard Brownian motion is continuous and nowhere differentiable. This connects the analytic phenomenon to probability theory and provides a "random" construction complementing the explicit ones above.
Chapter 9: The Contraction Mapping Principle
The Contraction Mapping Principle (or Banach Fixed-Point Theorem) is a simple yet remarkably powerful consequence of completeness. It guarantees the existence and uniqueness of fixed points, and provides an explicit iterative algorithm for finding them.
9.1 Contraction Maps
Every contraction is (uniformly) continuous, and strictly decreases distances between distinct points. The contraction constant \(c\) determines the rate of convergence: for \(c\) near 0 the convergence is fast, for \(c\) near 1 it is slow.
9.2 The Banach Fixed-Point Theorem
- \(T\) has a unique fixed point \(x^* \in X\).
- For any \(x_0 \in X\), the iterates \(x_n = T^n(x_0)\) converge to \(x^*\).
- The a priori error bound \(d(x_n, x^*) \le \frac{c^n}{1-c} d(x_0, T(x_0))\) holds.
- The a posteriori error bound \(d(x_n, x^*) \le \frac{c}{1-c} d(x_{n-1}, x_n)\) holds.
So \((x_n)\) is Cauchy, hence converges to some \(x^* \in X\). Since \(T\) is continuous, \(T(x^*) = T(\lim x_n) = \lim T(x_n) = \lim x_{n+1} = x^*\). For uniqueness: if \(T(a) = a\) and \(T(b) = b\), then \(d(a,b) = d(T(a), T(b)) \le c \cdot d(a,b)\) with \(c < 1\), forcing \(d(a,b) = 0\). The error bound follows by letting \(m \to \infty\) in the Cauchy estimate.
For example, with \(c = 1/2\) and \(d(x_0, x_1) = 1\): after \(n = 10\) iterations, \(d(x_{10}, x^*) \le 2^{-10}/(1 - 1/2) = 2^{-9} \approx 0.002\). After \(n = 20\): \(d(x_{20}, x^*) \le 2^{-19} \approx 10^{-6}\). The geometric convergence rate makes the theorem computationally practical.
9.3 Newton’s Method
Newton’s method is a prime application of the contraction mapping principle. The key insight is that the Newton operator \(T\) not only has a fixed point at the root, but has \(T'(x^*) = 0\), giving it “superlinear” contraction near \(x^*\).
so \(T'(x^*) = 0\). By continuity, \(|T'(x)| \le 1/2\) on some interval \([x^* - R, x^* + R]\), making \(T\) a contraction. The convergence is quadratic: by Taylor expansion, \(|x_{n+1} - x^*| \le \frac{|g''(\xi_n)|}{2|g'(x_n)|} |x_n - x^*|^2\).
The proof reduces to finding a fixed point. Define \(T_x(y) = y - [D_y F(a,b)]^{-1} F(x, y)\). Then \(F(x, y) = 0\) iff \(T_x(y) = y\). One computes \(D_y T_x(y) = I - [D_y F(a,b)]^{-1} D_y F(x,y)\). At \((a,b)\), this is \(I - I = 0\), so by continuity, \(\|D_y T_x\| < 1/2\) on a neighborhood of \((a,b)\). The Mean Value Inequality then shows \(T_x\) is a contraction in \(y\), and the Banach Fixed-Point Theorem gives a unique fixed point \(y = g(x)\). Smoothness of \(g\) follows from the smooth dependence of the fixed point on the parameter \(x\).
This argument generalizes to Banach spaces and is the standard approach to the Inverse Function Theorem as well: if \(F: U \to \mathbb{R}^n\) is \(C^1\) with \(\det(DF(a)) \ne 0\), local invertibility of \(F\) near \(a\) follows by applying the Implicit Function Theorem to \(G(x, y) = F(y) - x\).
9.4 Iterated Function Systems and Fractals
A beautiful application of the Contraction Mapping Principle is to iterated function systems (IFS).
- Each \(T_i\) is a contraction on \(\mathcal{H}(X)\): \(d_H(T_i(A), T_i(B)) \le c_i \cdot d_H(A,B)\).
- \(\mathcal{T}\) is a contraction with constant \(c = \max_i c_i < 1\).
- \(\mathcal{T}\) has a unique fixed point \(K^* \in \mathcal{H}(X)\) --- the attractor of the IFS --- satisfying \(K^* = T_1(K^*) \cup \cdots \cup T_k(K^*)\).
The attractor can be approximated starting from any compact set \(K_0\): the iterates \(K_n = \mathcal{T}^n(K_0)\) converge to \(K^*\) in the Hausdorff metric. The Banach Fixed-Point Theorem gives the error bound \(d_H(K_n, K^*) \le \frac{c^n}{1-c} d_H(K_0, \mathcal{T}(K_0))\). In practice, starting from any simple compact set (a point, a line segment, or a filled triangle) and repeatedly applying \(\mathcal{T}\) produces increasingly detailed approximations of the fractal attractor.
(the Moran equation). This is sometimes called the similarity dimension. For example:
- The Cantor set has \(k = 2\), \(r_1 = r_2 = 1/3\), so \(2 \cdot (1/3)^s = 1\), giving \(s = \log 2 / \log 3 \approx 0.631\).
- The Sierpinski triangle has \(k = 3\), \(r_i = 1/2\), so \(3 \cdot (1/2)^s = 1\), giving \(s = \log 3 / \log 2 \approx 1.585\).
- The Koch curve has \(k = 4\), \(r_i = 1/3\), giving \(s = \log 4 / \log 3 \approx 1.262\).
Chapter 10: Metric Completion and p-adic Numbers
Every metric space, no matter how incomplete, can be “filled in” to create a complete space. This chapter develops the theory of metric completion, applies it to construct the p-adic numbers, and shows how the real numbers themselves can be constructed as the completion of the rationals.
10.1 Existence of the Metric Completion
The central result of this section asserts that every metric space can be “completed” by adding the limits of Cauchy sequences that are “missing.” The construction is analogous to how \(\mathbb{R}\) is obtained from \(\mathbb{Q}\) by filling in the irrational numbers.
where the supremum is attained at \(t = y\). Since \(C_b(X)\) is complete, the closure \(\overline{F(X)}\) is a complete metric space containing \(F(X) \cong X\) as a dense subset.
- The completion of \((\mathbb{Q}, |\cdot|)\) is \(\mathbb{R}\) (by definition/construction).
- The completion of \((\mathbb{Q}, |\cdot|_p)\) is the p-adic numbers \(\mathbb{Q}_p\) (Section 10.3).
- The completion of \((C[a,b], \|\cdot\|_1)\) is the Lebesgue space \(L^1[a,b]\) (developed in PMATH 450).
- The completion of \((C[a,b], \|\cdot\|_2)\) is the Hilbert space \(L^2[a,b]\), fundamental to quantum mechanics and Fourier analysis.
- The completion of the rational polynomials under the p-adic valuation gives the ring of formal power series, an important tool in algebraic geometry.
10.2 Uniqueness of the Metric Completion
Having shown that completions exist, we now show they are unique up to isometry. This is analogous to the uniqueness of the real numbers as a complete ordered field. The proof uses the extension theorem for uniformly continuous maps, which is itself a consequence of completeness.
Well-definedness: If \((y_n)\) also converges to \(a\), then \(d(x_n, y_n) \to 0\), so \(d(f(x_n), f(y_n)) \to 0\) by uniform continuity, giving \(\lim f(x_n) = \lim f(y_n)\).
Uniform continuity of \(\hat{f}\): Given \(\varepsilon > 0\), choose \(\delta > 0\) with \(d(x,y) < \delta \implies d(f(x), f(y)) < \varepsilon\) for \(x, y \in X\). If \(d(a,b) < \delta\) in \(\hat{X}\), choose \(x_n \to a\) and \(y_n \to b\) in \(X\). Then \(d(x_n, y_n) \to d(a,b) < \delta\), so for large \(n\), \(d(f(x_n), f(y_n)) < \varepsilon\). Taking limits: \(d(\hat{f}(a), \hat{f}(b)) \le \varepsilon\).
Uniqueness: If \(\hat{g}\) is another continuous extension, then \(\hat{f}\) and \(\hat{g}\) agree on the dense set \(X\), and the set \(\{a \in \hat{X} : \hat{f}(a) = \hat{g}(a)\}\) is closed (preimage of the diagonal under the continuous map \(a \mapsto (\hat{f}(a), \hat{g}(a))\)). A closed set containing a dense subset equals the whole space, so \(\hat{f} = \hat{g}\).
10.3 The p-adic Numbers
The p-adic numbers provide the most important example of metric completion beyond the construction of \(\mathbb{R}\). They were introduced by Hensel in 1897 and are fundamental in number theory and algebraic geometry.
- \(x + y = \lim(x_n + y_n)\) and \(xy = \lim(x_n y_n)\) are well-defined.
- \(\|xy\|_p = \|x\|_p \|y\|_p\) (multiplicativity).
- \(\|x + y\|_p \le \max\{\|x\|_p, \|y\|_p\}\) (ultrametric inequality).
The p-adic numbers satisfy the ultrametric inequality \(d_p(x,z) \le \max\{d_p(x,y), d_p(y,z)\}\), which gives them a very different geometry from \(\mathbb{R}\):
- Every triangle is isosceles: if \(d_p(x,y) \ne d_p(y,z)\), then \(d_p(x,z) = \max\{d_p(x,y), d_p(y,z)\}\). (In an ultrametric, the two largest sides of any triangle are equal.)
- Every point in a ball is a center: if \(b \in B(a,r)\), then \(B(a,r) = B(b,r)\). Proof: if \(x \in B(a,r)\), then \(d_p(x,b) \le \max\{d_p(x,a), d_p(a,b)\} < r\), so \(x \in B(b,r)\). By symmetry, \(B(a,r) = B(b,r)\).
- Two balls are either disjoint or one contains the other. If \(B(a,r) \cap B(b,s) \ne \emptyset\) with \(r \le s\), pick \(c\) in the intersection. Then \(B(a,r) = B(c,r) \subseteq B(c,s) = B(b,s)\).
- Balls are clopen: the open ball \(B(a,r)\) is also closed, and the closed ball \(\overline{B}(a,r)\) is also open. This is because the p-adic norm takes only the discrete values \(\{0, p^{-n} : n \in \mathbb{Z}\}\), so there is always a "gap" between the boundary and the exterior.
- \(\mathbb{Z}_p\) is totally disconnected, compact, and perfect.
- The equation \(x^2 = -1\) has a solution! Starting with \(x_0 = 2\) (since \(2^2 = 4 \equiv -1 \pmod{5}\)), Newton's method in \(\mathbb{Z}_5\) produces a Cauchy sequence converging to a square root of \(-1\). Explicitly, \(i_5 = 2 + 1 \cdot 5 + 2 \cdot 5^2 + 1 \cdot 5^3 + \cdots\) satisfies \(i_5^2 = -1\) in \(\mathbb{Z}_5\).
- The equation \(x^2 = 2\) has no solution in \(\mathbb{Z}_5\), since 2 is not a quadratic residue mod 5.
This algebraic perspective reveals \(\mathbb{Z}_p\) as the completion of \(\mathbb{Z}\) in the p-adic metric, and explains why the topology (a product of discrete finite spaces) is compact, totally disconnected, and metrizable.
10.4 Ordered Fields and the Construction of \(\mathbb{R}\)
The real numbers can be characterized abstractly as the unique complete ordered field. To make this precise, we need the language of ordered fields.
- \(F = (-P) \cup \{0\} \cup P\) (disjoint union), where \(-P = \{-x : x \in P\}\).
- \(P + P \subseteq P\) and \(P \cdot P \subseteq P\).
- \(\mathbb{Q}\) embeds into \(F\) as an ordered subfield (since \(0 < 1 < 1 + 1 < \cdots\), so \(\mathbb{Z} \subseteq F\), hence \(\mathbb{Q} \subseteq F\)). In particular, \(F\) has characteristic 0.
- The LUBP implies completeness (every Cauchy sequence converges).
- The LUBP implies the Archimedean property.
- The Archimedean property implies \(\mathbb{Q}\) is dense in \(F\): if \(x < y\), there exists \(r \in \mathbb{Q}\) with \(x < r < y\).
The real numbers can be constructed as the completion of \(\mathbb{Q}\) with respect to the usual absolute value metric. There are several approaches:
Cauchy sequence construction: Define \(\mathbb{R}\) as the set of equivalence classes of Cauchy sequences in \(\mathbb{Q}\), where \((x_n) \sim (y_n)\) if \(\lim |x_n - y_n| = 0\). Addition and multiplication are defined termwise. The ordering is defined by: \([(x_n)] > 0\) if \(x_n > 0\) for all sufficiently large \(n\). One verifies that \(\mathbb{R}\) is a complete ordered field.
Dedekind cuts: A cut is a proper, downward-closed subset of \(\mathbb{Q}\) with no largest element. Define \(\mathbb{R}\) to be the set of all cuts, ordered by inclusion. The least upper bound property is immediate: the supremum of a bounded set of cuts is their union. Addition and multiplication require more care to define.
Infinite decimals: Represent each real as an infinite decimal \(a_0.a_1 a_2 a_3 \ldots\), identify \(0.999\ldots\) with \(1.000\ldots\), and define arithmetic via approximation.
Case 1 (Non-Archimedean): Suppose \(|n| \le 1\) for all \(n \in \mathbb{N}\). Then \(|\cdot|\) satisfies the ultrametric inequality. The set \(\{n \in \mathbb{Z} : |n| < 1\}\) is an ideal in \(\mathbb{Z}\), hence equals \(p\mathbb{Z}\) for some prime \(p\). Write any rational \(x = p^a r/s\) with \(p \nmid rs\), giving \(|x| = |p|^a\). Setting \(\alpha = -\log|p|/\log p\), we get \(|x| = |x|_p^\alpha\).
Case 2 (Archimedean): Suppose \(|n_0| > 1\) for some \(n_0 \in \mathbb{N}\). For any integers \(m, n \ge 2\), expand \(n\) in base \(m\) and use the triangle inequality to show that \(|n| \le n^{\log|m|/\log m}\). By symmetry, \(|m| \le m^{\log|n|/\log n}\), hence \(\log|n|/\log n = \log|m|/\log m\) is constant. This gives \(|n| = n^\alpha\) for some \(\alpha > 0\), so \(|\cdot|\) is equivalent to the usual absolute value.
10.5 Isometric Embeddings and Universal Spaces
A natural question arising from the metric completion is: how large must a “universal” complete metric space be to contain isometric copies of all separable metric spaces? The answer, due to Banach and Mazur, is surprisingly elegant.
This is well-defined since \(|d(x, q_n) - d(a, q_n)| \le d(x, a)\) by the triangle inequality, so \(\varphi(x) \in \ell^\infty\) with \(\|\varphi(x)\|_\infty \le d(x,a)\). To show \(\varphi\) is an isometry:
\[ \|\varphi(x) - \varphi(y)\|_\infty = \sup_n |d(x, q_n) - d(y, q_n)| \le d(x,y) \]by the triangle inequality. For the reverse inequality, given \(\varepsilon > 0\), choose \(q_n\) with \(d(q_n, y) < \varepsilon\). Then \(d(x, q_n) - d(y, q_n) \ge d(x,y) - 2d(y, q_n) > d(x,y) - 2\varepsilon\). Since \(\varepsilon\) is arbitrary, \(\|\varphi(x) - \varphi(y)\|_\infty = d(x,y)\).
Chapter 11: Approximation Theory
Having developed the abstract theory of metric spaces, completeness, compactness, and the Baire Category Theorem, we now turn to one of the most beautiful applications: approximation theory. This chapter addresses the fundamental question: which functions can be uniformly approximated by “nice” functions? The Weierstrass Approximation Theorem — every continuous function on a compact interval can be uniformly approximated by polynomials — is one of the most beautiful results in analysis, and the Stone–Weierstrass Theorem provides a vast generalization.
11.1 Bernstein Polynomials and Weierstrass Approximation
The most elegant proof of the Weierstrass Approximation Theorem uses Bernstein polynomials, which employ probability theory in a beautiful way.
The Bernstein operator has remarkable properties: it is linear, positive (\(f \ge 0 \implies B_n f \ge 0\)), monotone (\(f \le g \implies B_n f \le B_n g\)), and preserves constants and the identity function: \(B_n(1) = 1\) and \(B_n(x) = x\). The key computation is:
Consequently, \(B_n((x-a)^2)(a) = \frac{a - a^2}{n} \le \frac{1}{4n}\) for \(a \in [0,1]\).
Fixing \(a\) and applying the positive linear map \(B_n\):
\[ |B_n f(a) - f(a)| \le \varepsilon B_n(1)(a) + \frac{2\|f\|_\infty}{\delta^2} B_n((x-a)^2)(a) \le \varepsilon + \frac{2\|f\|_\infty}{\delta^2} \cdot \frac{1}{4n}. \]Choosing \(n\) large enough makes this less than \(2\varepsilon\).
Runge’s phenomenon demonstrates this failure dramatically. Consider \(f(x) = 1/(1 + 25x^2)\) on \([-1,1]\) with equally spaced interpolation points \(x_k = -1 + 2k/n\). The interpolating polynomials \(p_n\) diverge near the endpoints: \(\|f - p_n\|_\infty \to \infty\) as \(n \to \infty\). The wild oscillations near \(x = \pm 1\) grow without bound. This is not a defect of the particular function — Faber’s theorem (1914) shows that for any array of interpolation points, there exists a continuous function whose interpolants diverge.
The resolution is to use Chebyshev nodes \(x_k = \cos((2k+1)\pi/(2n+2))\), which are the roots of the Chebyshev polynomial \(T_{n+1}\). These points cluster near the endpoints of the interval, counteracting the oscillation tendency. For Lipschitz continuous functions, Chebyshev interpolation converges, and for analytic functions the convergence is exponentially fast. This connection between optimal interpolation points and the Chebyshev equioscillation theorem (Section 11.3) is one of the beautiful unifying themes of approximation theory.
Define the Landau kernel
\[ Q_n(t) = c_n(1 - t^2)^n, \quad t \in [-1, 1], \]where the constant \(c_n\) is chosen so that \(\int_{-1}^1 Q_n(t)\, dt = 1\). Note that \(Q_n\) is a polynomial of degree \(2n\) in \(t\). To estimate \(c_n\): using the substitution \(t = \sin\theta\) or the bound \((1-t^2)^n \ge (1-1/n)^n \ge e^{-2}\) on \(|t| \le 1/\sqrt{n}\), one shows \(\int_{-1}^1 (1-t^2)^n\, dt \ge \frac{2}{\sqrt{n}(1+1/n)} \), so \(c_n = O(\sqrt{n})\). The key property of \(Q_n\) is that for any fixed \(\delta > 0\),
\[ \int_{\delta \le |t| \le 1} Q_n(t)\, dt \le c_n(1-\delta^2)^n \to 0 \quad \text{as } n \to \infty. \]In other words, the mass of \(Q_n\) concentrates on \((-\delta, \delta)\) as \(n \to \infty\).
For \(f \in C[-1,1]\), extend \(f\) by zero outside \([-1,1]\) and define the polynomial approximant
\[ p_n(x) = \int_{-1}^1 f(x + t)\, Q_n(t)\, dt. \]This is a polynomial in \(x\) of degree \(2n\) (since \(Q_n(t)\) is a polynomial in \(t\) and we are integrating against a fixed function of \(t\)). To show \(p_n \to f\) uniformly on \([-1,1]\), write
\[ p_n(x) - f(x) = \int_{-1}^1 \bigl(f(x+t) - f(x)\bigr) Q_n(t)\, dt. \]Given \(\varepsilon > 0\), choose \(\delta > 0\) by uniform continuity so that \(|t| < \delta \Rightarrow |f(x+t) - f(x)| < \varepsilon\) for all \(x\). Split the integral at \(\pm\delta\): the contribution from \(|t| < \delta\) is bounded by \(\varepsilon \int Q_n = \varepsilon\), while the contribution from \(|t| \ge \delta\) is bounded by \(2\|f\|_\infty \cdot c_n(1-\delta^2)^n \to 0\). Choosing \(n\) large enough makes the total error less than \(2\varepsilon\), uniformly in \(x\).
This proof is more than a curiosity: the construction \(f \mapsto f * Q_n\) is exactly a convolution with an approximate identity. In Fourier analysis, the Fejér kernel \(F_N(\theta) = \frac{1}{N}\left(\frac{\sin(N\theta/2)}{\sin(\theta/2)}\right)^2\) plays the analogous role for trigonometric approximation — it is a non-negative kernel integrating to 1, concentrating at \(\theta = 0\), and the Cesàro means \(\sigma_N f = f * F_N\) converge uniformly to \(f\) for any \(f \in C(\mathbb{T})\). The Weierstrass theorem and Fejér’s theorem are thus parallel statements in two different approximation settings.
11.2 Best Approximation
Since \(\|x - \bar{k}\| \ge \delta\), we get \(\|k_1 - k_2\|^2 \le 0\), so \(k_1 = k_2\). Existence follows from the fact that the infimum is approached by a sequence \((k_n)\), which one shows is Cauchy using the parallelogram law again, hence converges (by completeness and closedness).
11.3 Chebyshev’s Equioscillation Theorem
The Weierstrass theorem guarantees that polynomials can approximate any continuous function arbitrarily well. But for a given degree \(n\), what is the best polynomial approximation, and how can we recognize it?
for all \(1 \le i \le n+2\). That is, the function alternately attains its maximum and minimum values at \(n+2\) consecutive points.
This forces \(r(x_i)\) to have the same sign as \(g(x_i)\), which alternates. Thus \(r\) changes sign at least \(n+1\) times, giving \(r\) at least \(n+1\) roots. But \(\deg(r) \le n\), so \(r \equiv 0\), meaning \(p = q\) — contradicting \(\|f - q\| < \|f - p\|\).
so \(\bar{p}\) also attains the minimum. By the lemma, \(f - \bar{p}\) has equioscillation points \(x_1, \ldots, x_{n+2}\) with \(|(f-\bar{p})(x_i)| = D\). Since \(|(f-p)(x_i)| \le D\) and \(|(f-q)(x_i)| \le D\), the equality \((f-\bar{p})(x_i) = \frac{1}{2}(f-p)(x_i) + \frac{1}{2}(f-q)(x_i)\) with \(|(f-\bar{p})(x_i)| = D\) forces \((f-p)(x_i) = (f-q)(x_i) = (f-\bar{p})(x_i)\) for all \(i\). Hence \((p-q)(x_i) = 0\) for \(i = 1, \ldots, n+2\), and since \(\deg(p-q) \le n\), we get \(p = q\).
11.4 The Stone–Weierstrass Theorem
The Stone–Weierstrass Theorem vastly generalizes the Weierstrass Approximation Theorem. Rather than restricting to polynomials on an interval, it characterizes exactly which subalgebras of \(C(X)\) are dense, for any compact metric space \(X\). The answer is both elegant and powerful: an algebra is dense if and only if it separates points and vanishes nowhere.
- \(A\) separates points if for all \(x \ne y\) in \(X\), there exists \(f \in A\) with \(f(x) \ne f(y)\).
- \(A\) vanishes nowhere if for all \(x \in X\), there exists \(f \in A\) with \(f(x) \ne 0\).
- \(\overline{A}\) is also a subalgebra that separates points and vanishes nowhere.
- If \(f \in \overline{A}\), then \(|f| \in \overline{A}\). This uses the fact that \(|t|\) can be uniformly approximated by polynomials on \([-1,1]\) (e.g., via the Weierstrass theorem for \(\sqrt{t^2}\)), applied to \(f/\|f\|_\infty\).
- \(\overline{A}\) is a vector lattice: \(\max(f,g) = \frac{f+g+|f-g|}{2}\) and \(\min(f,g) = \frac{f+g-|f-g|}{2}\).
- For any \(x \ne y\) and \(r, s \in \mathbb{R}\), there exists \(g \in A\) with \(g(x) = r\), \(g(y) = s\). This uses separation and non-vanishing.
- Given \(f \in C(X)\), \(a \in X\), \(\varepsilon > 0\): for each \(b\), choose \(g_b \in A\) with \(g_b(a) = f(a)\), \(g_b(b) = f(b)\). Since \(g_b - f\) is continuous and zero at \(b\), we have \(g_b < f + \varepsilon\) near \(b\). By compactness, finitely many suffice, and \(h = \min(g_{b_1}, \ldots, g_{b_n})\) satisfies \(h(a) = f(a)\) and \(h < f + \varepsilon\) everywhere.
- Repeat with \(a\) varying and use \(\max\) to get \(g \in \overline{A}\) with \(\|g - f\|_\infty < \varepsilon\).
The self-adjointness condition is necessary: the algebra of holomorphic polynomials in \(z\) separates points on \(\overline{\mathbb{D}}\) and vanishes nowhere, but is not dense because uniform limits of holomorphic functions are holomorphic (Maximum Modulus Principle). More precisely, by Mergelyan’s theorem, the closure of the holomorphic polynomials in \(C(\overline{\mathbb{D}}, \mathbb{C})\) consists exactly of the continuous functions on \(\overline{\mathbb{D}}\) that are holomorphic on \(\mathbb{D}\), which is a proper closed subspace.
- Bernstein polynomials: constructive, but with slow convergence rate \(O(1/\sqrt{n})\).
- Weierstrass Approximation: polynomials are dense in \(C[a,b]\). Proved via Bernstein polynomials.
- Stone--Weierstrass: any separating, non-vanishing subalgebra is dense in \(C(X)\) for compact \(X\). Proved via lattice theory.
Chapter 12: Ordinary Differential Equations
The final chapter applies the machinery developed throughout the course — completeness, contraction mappings, compactness, and the Arzela–Ascoli theorem — to prove fundamental existence and uniqueness results for ordinary differential equations.
12.1 Setup: Reduction to First Order
The ability to treat ODEs as fixed-point problems on function spaces is a paradigmatic example of how abstract mathematics provides concrete computational tools. The strategy is always the same: reformulate the differential equation as an integral equation, interpret the integral equation as a fixed-point problem \(F = T(F)\), and apply a fixed-point theorem (Banach for existence/uniqueness, Schauder/Arzela–Ascoli for existence alone).
A general \(n\)-th order ODE \(y^{(n)} = \Phi(x, y, y', \ldots, y^{(n-1)})\) can be written as a first-order system by introducing the vector \(F = (y, y', \ldots, y^{(n-1)})\). The general initial value problem (IVP) is:
\[ F'(x) = \Phi(x, F(x)), \quad F(a) = \Gamma, \]where \(F: [a,b] \to \mathbb{R}^n\) and \(\Phi: D \subseteq \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n\).
This is equivalent to the integral equation
\[ F(x) = \Gamma + \int_a^x \Phi(t, F(t))\, dt, \]which is more amenable to fixed-point methods because the integral operator is a smoothing operation. The key idea of Picard’s proof is to view this integral equation as a fixed-point problem \(F = T(F)\) where \(T\) is the Picard operator acting on the Banach space of continuous functions.
12.2 Picard’s Existence and Uniqueness Theorem
One verifies: (i) \(T\) maps \(\mathcal{X}\) to \(\mathcal{X}\) (since \(|Tf(x) - b| \le K|x-a| \le K\delta\)); and (ii) \(T\) is a contraction with constant \(L\delta < 1\):
\[ \|Tf - Tg\|_\infty = \max_x \left|\int_a^x [\Phi(t,f(t)) - \Phi(t,g(t))]\, dt\right| \le L\delta \|f - g\|_\infty. \]By the Banach Fixed-Point Theorem, \(T\) has a unique fixed point, which is the unique solution.
The Picard iterates \(f_{n+1} = Tf_n\) starting from \(f_0(x) = b\) converge to the solution, providing both a constructive algorithm and an a priori error bound \(\|f_n - y\|_\infty \le \frac{(L\delta)^n}{1 - L\delta} \|f_1 - f_0\|_\infty\).
Define the Picard operator \(T\) by
\[ (Tf)(x) = 1 + x + \frac{x^2}{2} - \int_0^x f(t)\, dt. \]A fixed point of \(T\) solves the ODE. Moreover, \(T\) is a contraction: \(\|Tf - Tg\|_\infty = \|\int_0^x [f(t) - g(t)]\, dt\|_\infty \le |x| \cdot \|f - g\|_\infty \le \frac{1}{2}\|f - g\|_\infty\), with contraction constant \(1/2\).
Computing the iterates starting from \(f_0(x) = 1\):
\[ f_1(x) = 1 + x + \frac{x^2}{2} - x = 1 + \frac{x^2}{2}, \]\[ f_2(x) = 1 + x + \frac{x^2}{2} - \int_0^x \left(1 + \frac{t^2}{2}\right) dt = 1 + \frac{x^2}{2} - \frac{x^3}{6}, \]\[ f_3(x) = 1 - x + x - \frac{x^2}{2} + \frac{x^2}{2} - \frac{x^3}{6} + \frac{x^4}{24} + \cdots \]The pattern reveals the power series of \(e^{-x} + x\). Indeed, the exact solution is \(y(x) = e^{-x} + x\), which one verifies: \(y'(x) = -e^{-x} + 1 = 1 + x - (e^{-x} + x) = 1 + x - y(x)\) and \(y(0) = 1 + 0 = 1\). The Picard iterates converge to the Taylor polynomials of \(e^{-x} + x\).
- \(y_0(x) = 0\) (the trivial solution),
- \(y(x) = x^3\) (verify: \(y' = 3x^2 = 3(x^3)^{2/3}\)),
- For any \(c > 0\): \(y_c(x) = 0\) for \(x \le c\) and \(y_c(x) = (x-c)^3\) for \(x > c\).
12.3 Global Solutions and the Continuation Theorem
- The IVP has a unique solution on \([a,b]\), or
- There exists \(c \in (a,b]\) such that the IVP has a unique solution on \([a,c)\) with \(\lim_{x \to c^-} \|F(x)\| = +\infty\) (the solution "blows up").
The idea is to repeatedly apply the local existence theorem to extend the solution. The solution can fail to be global only if it escapes to infinity in finite time.
Multiplying both sides by the integrating factor \(e^{-\int_a^t v(s)\,ds}\), we get
\[ \frac{d}{dt}\left[U(t) e^{-\int_a^t v}\right] \le \alpha v(t) e^{-\int_a^t v}. \]Integrating from \(a\) to \(t\) (with \(U(a) = 0\)):
\[ U(t) e^{-\int_a^t v} \le \alpha\left(1 - e^{-\int_a^t v}\right), \]hence \(U(t) \le \alpha\left(e^{\int_a^t v} - 1\right)\). Since \(u(t) \le \alpha + U(t)\), we conclude \(u(t) \le \alpha e^{\int_a^t v(s)\, ds}\).
By Gronwall’s inequality with \(\alpha = u(a)\) and \(v(s) = L\), we get \(u(t) \le u(a) e^{L(t-a)}\).
This shows that solutions of Lipschitz ODEs depend continuously on initial conditions, with at most exponential divergence. The Lipschitz constant \(L\) controls how sensitively solutions respond to perturbations — a small \(L\) means nearby solutions stay close for a long time.
where \(\varepsilon(\lambda) = \sup_{(x,y)} \|\Phi_{\lambda_0}(x,y) - \Phi_\lambda(x,y)\| \to 0\) as \(\lambda \to \lambda_0\). By Gronwall with \(\alpha = \varepsilon(\lambda)(b-a)\) and \(v = L\):
\[ u(t) \le \varepsilon(\lambda)(b-a) e^{L(b-a)} \to 0. \]This justifies using numerical ODEs to study sensitivity to model parameters — an essential technique in applied mathematics and control theory.
12.4 Peano’s Existence Theorem
Step 1 (Euler polygons): For each \(n \ge 1\), define the Euler approximation \(F_n\) inductively on the partition \(t_k = a + kh/n\), \(k = 0, 1, \ldots, n\):
\[ F_n(t_k) = \Gamma, \quad F_n(t_{k+1}) = F_n(t_k) + \frac{h}{n}\Phi(t_k, F_n(t_k)), \]and extend linearly between partition points. Each \(F_n\) is continuous, and \(\|F_n(t) - \Gamma\| \le M(t-a) \le Mh \le R\), so \(F_n(t) \in \overline{B}_R(\Gamma)\) for all \(t\).
Step 2 (Equicontinuity): For any \(s, t \in [a, a+h]\),
\[ \|F_n(t) - F_n(s)\| \le M|t - s|, \]since the slope of \(F_n\) on each subinterval is bounded by \(M\). Thus \(\{F_n\}\) is equicontinuous (indeed, uniformly Lipschitz with constant \(M\)).
Step 3 (Uniform boundedness): \(\|F_n\|_\infty \le \|\Gamma\| + R\) for all \(n\).
Step 4 (Arzela–Ascoli): By the Arzela–Ascoli Theorem (Theorem 6.17), there exists a subsequence \(F_{n_k} \to F\) uniformly on \([a, a+h]\).
Step 5 (Passing to the limit): The Euler polygon \(F_n\) satisfies
\[ F_n(t) = \Gamma + \int_a^t \Phi_n(s)\, ds \]where \(\Phi_n(s) = \Phi(t_k, F_n(t_k))\) for \(s \in [t_k, t_{k+1})\). Since \(F_{n_k} \to F\) uniformly, and \(\Phi\) is uniformly continuous on the compact set \([a,a+h] \times \overline{B}_R(\Gamma)\), we have \(\Phi(s, F_{n_k}(s)) \to \Phi(s, F(s))\) uniformly. The “lagging error” \(\|F_n(t_k) - F_n(s)\| \le M/n \to 0\), so \(\Phi_n(s) - \Phi(s, F_{n_k}(s)) \to 0\) uniformly as well. Taking the limit in the integral equation:
\[ F(t) = \Gamma + \int_a^t \Phi(s, F(s))\, ds. \]Since the integrand is continuous, \(F\) is differentiable with \(F'(t) = \Phi(t, F(t))\).
If \(K\) is a nonempty, convex, compact subset of a normed space and \(T: K \to K\) is continuous, then \(T\) has a fixed point.
To apply this: the integral operator \(T(F)(t) = \Gamma + \int_a^t \Phi(s, F(s))\, ds\) maps the convex set \(K = \{F \in C[a, a+h] : \|F - \Gamma\|_\infty \le R\}\) to itself. By the Arzela–Ascoli theorem, \(T(K)\) has compact closure (since the image is equicontinuous and bounded). This requires passing to the convex hull of the closure. The Schauder theorem then gives a fixed point, which is a solution.
The Schauder theorem is strictly more general than Banach’s contraction theorem (it does not require contractivity, only continuity on a compact convex set), but it sacrifices uniqueness and constructivity. It is one of the cornerstones of nonlinear functional analysis, with applications to partial differential equations, integral equations, and game theory (where it yields Nash’s existence theorem for equilibria).
- Picard requires \(\Phi\) to be Lipschitz in \(y\) and gives both existence and uniqueness. The proof uses the Banach Fixed-Point Theorem on the complete space \(C[a-\delta, a+\delta]\).
- Peano requires only continuity of \(\Phi\) and gives existence (without uniqueness). The proof uses the Arzela--Ascoli Theorem --- compactness of function spaces replaces the contraction principle.
| Property | Picard | Peano |
|---|---|---|
| Hypothesis on \(\Phi\) | Lipschitz in \(y\) | Continuous |
| Conclusion | Existence + uniqueness | Existence only |
| Proof method | Banach Fixed-Point | Arzela–Ascoli |
| Key space property | Completeness | Compactness |
| Constructive? | Yes (Picard iterates) | No (subsequence extraction) |
| Error bounds? | Yes (a priori and a posteriori) | No |
The progression from Picard to Peano mirrors a general pattern in analysis: weakening hypotheses requires stronger topological machinery (compactness rather than mere completeness) and yields weaker conclusions (existence without uniqueness, no constructive algorithm).
where \(z = y'\). Here \(\Phi(x, y, z) = (z, -y)\) is globally Lipschitz (with constant 1), so Picard’s theorem guarantees a unique global solution. The Picard iterates starting from \((y_0, z_0) = (0, 1)\) give:
\[ y_1(x) = x, \quad z_1(x) = 1, \quad y_2(x) = x, \quad z_2(x) = 1 - x^2/2, \quad y_3(x) = x - x^3/6, \ldots \]These converge to \(y(x) = \sin(x)\), \(z(x) = \cos(x)\). The Picard iterates recover the Taylor series of the trigonometric functions.
12.5 Stability of Fixed Points
- attracting (stable) if \(|T'(x^*)| < 1\) --- nearby iterates converge to \(x^*\);
- repelling (unstable) if \(|T'(x^*)| > 1\) --- nearby iterates move away from \(x^*\);
- indeterminate if \(|T'(x^*)| = 1\) --- higher-order analysis is needed.
For an attracting fixed point with \(|T'(x^*)| < 1\), continuity of \(T'\) gives an interval around \(x^*\) on which \(|T'| \le c < 1\), so by the Mean Value Theorem, \(T\) is a contraction on this interval. The Banach Fixed-Point Theorem then guarantees that iterates starting in this interval converge to \(x^*\). This principle underlies the stability theory of differential equations: fixed points of the time-\(t\) flow map determine the long-term behavior of solutions.
For the logistic map with \(r = 2.5\): \(x^* = 0.6\), \(T'(x^*) = -0.5\), so iterates spiral inward. With \(r = 3.5\): \(x^*\) is repelling, but a stable 2-cycle emerges. The cascade of period-doubling bifurcations at \(r = 3, 3.449, 3.544, \ldots\) converges to the Feigenbaum point \(r_\infty \approx 3.5699\), beyond which the dynamics become chaotic. The Feigenbaum constant \(\delta \approx 4.6692\), governing the rate of convergence of the bifurcation cascade, is a universal constant appearing in all one-dimensional maps with quadratic maxima — a remarkable discovery that connects the Contraction Mapping Principle to the theory of dynamical systems.
- For any \(\varepsilon > 0\), there exists a norm \(\|\cdot\|_\varepsilon\) on \(\mathbb{R}^n\) such that \(\|DT(\mathbf{x}^*)\|_\varepsilon \le \rho + \varepsilon\).
- Choosing \(\varepsilon\) so that \(\rho + \varepsilon < 1\), the map \(T\) becomes a contraction near \(\mathbf{x}^*\) in this norm.
For \(\|\mathbf{y}(0)\|\) small enough, the remainder term \(\mathbf{r}\) is negligible compared to the linear decay, and Gronwall’s inequality shows \(\|\mathbf{y}(t)\| \le K e^{-\mu t/2}\|\mathbf{y}(0)\|\) for some \(K > 0\).
12.6 Summary: The Architecture of Real Analysis
The following diagram summarizes the logical dependencies among the main results of the course:
- Cardinality (Ch. 1) provides the set-theoretic foundation, culminating in the distinction between countable and uncountable. The uncountability of \(\mathbb{R}\) is the starting point for all of analysis.
- Metric spaces (Ch. 2--3) abstract the notion of distance, yielding a general framework for topology (open/closed sets, convergence, density).
- Continuity (Ch. 4) connects topology to analysis. The three characterizations (\(\varepsilon\)-\(\delta\), sequential, topological) provide flexible tools for different contexts.
- Completeness (Ch. 5) --- every Cauchy sequence converges --- is the foundation for three pillars:
- The Baire Category Theorem (Ch. 8): complete spaces cannot be written as countable unions of nowhere dense sets. Applications include existence of nowhere differentiable functions and continuity of pointwise limits.
- The Contraction Mapping Principle (Ch. 9): in complete spaces, contractions have unique fixed points. Applications include Picard's ODE theorem, Newton's method, and fractals.
- Metric completion (Ch. 10): every metric space embeds densely in a unique complete space. Applications include the construction of \(\mathbb{R}\) and the p-adic numbers.
- Compactness (Ch. 6) = completeness + total boundedness. It provides the Arzela--Ascoli theorem (compactness in function spaces), the Extreme Value Theorem, and uniform continuity. It is essential for Peano's existence theorem.
- Connectedness (Ch. 7) captures "one-piece" spaces. The Intermediate Value Theorem is its key consequence.
- Approximation theory (Ch. 11) uses completeness (Stone--Weierstrass) and probability (Bernstein) to show dense subsets exist in function spaces.
- ODEs (Ch. 12) synthesize all threads: contraction mapping (Picard), compactness (Peano), completeness (convergence of iterates), and Gronwall's inequality (stability).
The progression mirrors a recurring pattern in mathematics: concrete problems (convergence of Fourier series, existence of solutions to differential equations, pathological functions) drive the development of abstract theories (metric spaces, completeness, compactness) which then provide systematic tools far beyond the original motivations. Weierstrass’s nowhere differentiable function (1872) shocked mathematicians into recognizing that intuition about continuous functions could be profoundly misleading — a realization that demanded rigorous foundations. Cantor’s theory of infinite sets (1870s–80s), originally motivated by the question of uniqueness for trigonometric series, grew into the foundation of all modern mathematics.
The fixed-point theorems illustrate another important theme: the interplay between existence and construction. The Banach contraction principle (1922) provides both existence and an algorithm. The Brouwer fixed-point theorem (1911) and its infinite-dimensional extension by Schauder (1930) provide existence but no algorithm. The Baire category theorem (1899) shows that “generic” objects exist without constructing any specific one. Each level of abstraction trades constructive information for broader applicability.
- PMATH 450 (Lebesgue Integration): The completion of \(C[a,b]\) under the \(L^1\) and \(L^p\) norms gives the Lebesgue spaces. The Riesz--Fischer theorem shows that \(L^p\) spaces are complete (Banach spaces), and for \(p = 2\), they are Hilbert spaces. The distinction between Riemann and Lebesgue integration --- illustrated by the Cantor function (Example 6.18b) --- becomes precise.
- PMATH 453 (Functional Analysis): The Baire Category Theorem leads to the three fundamental theorems: the Open Mapping Theorem, the Closed Graph Theorem, and the Uniform Boundedness Principle (of which Theorem 8.9a is a "baby" version). The Stone--Weierstrass theorem generalizes to C*-algebras. The compactness theory extends to weak topologies via the Banach--Alaoglu theorem.
- PMATH 440 (Analytic Number Theory): The p-adic numbers (Chapter 10) are fundamental. Ostrowski's theorem classifies all completions of \(\mathbb{Q}\). The product formula connects archimedean and non-archimedean absolute values. Adeles and ideles provide a unified framework for local-global principles.
- PMATH 465 (Differential Geometry): The notion of completeness for Riemannian manifolds (every Cauchy sequence converges, equivalently every geodesic extends to all time) is central. The Hopf--Rinow theorem states that for connected Riemannian manifolds, metric completeness, geodesic completeness, and the Heine--Borel property are equivalent.
- Dynamical Systems: The contraction mapping principle and stability theory (Section 12.5) extend to ergodic theory. The Baire category theorem proves that generic dynamical systems exhibit chaotic behavior. The Arzela--Ascoli theorem underlies the compactness arguments in variational methods for ODEs and PDEs.
Sources and Acknowledgements
These notes draw on the following sources, which are gratefully acknowledged:
Primary source:
- Stephen New, PMATH 351 Lecture Notes, University of Waterloo, Fall 2018. The course structure, cardinality chapter (Ch. 1), core metric space development (Ch. 2–3), and continuity theory (Ch. 4) follow New’s notes most closely.
Instructor sources:
- Kenneth Davidson, PMath 351 Notes (University of Waterloo, hosted at
math.uwaterloo.ca/~krdavids). Davidson’s notes provide the normed space treatment (inner products, Minkowski/Hölder inequalities), the p-adic numbers (Ch. 10), Newton’s method (§9.3), the construction of \(\mathbb{R}\) from \(\mathbb{Q}\) (§10.4), and the ODE theory including Peano’s theorem and stability (Ch. 12). - Brian Forrest, PMATH 351 Notes (University of Waterloo, hosted at
math.uwaterloo.ca/~nspronk). Forrest’s notes contribute the treatment of the Axiom of Choice and Zorn’s Lemma (§1.5), the Banach contraction mapping theorem (§9.2), the Baire Category Theorem (Ch. 8), compactness characterizations via FIP and total boundedness (Ch. 6), and Stone-Weierstrass (Ch. 11). - Laurent Marcoux, PMATH 351 Notes (University of Waterloo, hosted at
math.uwaterloo.ca/~nspronk). Marcoux’s notes contribute the Urysohn Lemma and Tietze Extension Theorem (§3.5), topological spaces and normality, the completions chapter (Ch. 10), the Uniform Boundedness Principle (§8.5), and the Stone-Weierstrass theorem via lattice approximation (Ch. 11).
Student notes:
- Felix Zhou, PMATH 351 Notes (Fall 2019, Davidson’s section;
felix-zhou.com). Zhou’s detailed notes provide compactness equivalences with full proofs (FIP, sequential, totally bounded, Ch. 6), the Cantor set characterization via ternary expansions, connectedness examples including the Knaster-Kuratowski fan, Baire applications including the Weierstrass nowhere-differentiable function (§8.7), iterated function systems (§9.4), and the Stone-Weierstrass proof via lattices (Ch. 11). - Richard Wu, PMATH 351 Notes (Fall 2018, Forrest’s section;
richardwu.ca). Wu’s same-semester notes (parallel section to New’s) provide the \(\ell^p\) and normed space treatment and uniform convergence material. - David Duan, PMATH 351 Notes (
david-duan.me). Duan’s notes contribute to the uniform convergence and function space sections (Ch. 5 and Ch. 8).
Further reading:
- Walter Rudin, Principles of Mathematical Analysis (3rd ed.) — the canonical undergraduate reference.
- John Munkres, Topology (2nd ed.) — for topological spaces and general topology beyond the metric setting.
- W. A. Sutherland, Introduction to Metric and Topological Spaces — an accessible companion text.
- Walter Rudin, Real and Complex Analysis — for the Lebesgue theory that follows naturally from this course.