PMATH 441/641: Algebraic Number Theory
David McKinnon
Estimated study time: 6 hr 40 min
These notes synthesize material from multiple sources on algebraic number theory as taught at the University of Waterloo. The following resources were consulted and are gratefully credited:
- David McKinnon, Lecture Notes for PM 441/741, Department of Pure Mathematics, University of Waterloo, Spring 2024. PDF
- PMATH 441/641 Course Notes, Department of Pure Mathematics, University of Waterloo, Winter 2000. PDF
- S. New, PMATH 441 Lecture Notes (Chapters 1–6 and Appendix on Continued Fractions), Department of Mathematics, University of Waterloo. Course page
- Alex Rutar, Algebraic Number Theory (course notes for PMATH 441/641, Winter 2019), University of Waterloo. GitHub
Chapter 1: Factorization in Rings
The study of algebraic number theory begins with a careful analysis of factorization in commutative rings. In the familiar setting of the integers, every positive integer greater than 1 factors uniquely as a product of primes. This fundamental theorem of arithmetic, however, is not a universal phenomenon: many rings of algebraic integers fail to enjoy unique factorization, and understanding when and why this failure occurs is one of the central motivations for algebraic number theory. In this chapter, we develop the general theory of divisibility and factorization in commutative rings, establish the hierarchy of Euclidean domains, principal ideal domains, and unique factorization domains, and study concrete examples including the Gaussian integers and polynomial rings.
Divisibility and Associates
We begin with the most basic notions of divisibility in a commutative ring, generalizing the familiar concepts from the integers.
The relation of divisibility is intimately connected to the containment of principal ideals. The following theorem collects the basic properties of divisibility and associates in commutative rings.
- \(a \mid b\) if and only if \(b \in \langle a \rangle\) if and only if \(\langle b \rangle \subseteq \langle a \rangle\).
- \(a \sim b\) if and only if \(\langle a \rangle = \langle b \rangle\) if and only if \(a\) and \(b\) have the same multiples and divisors.
- \(a \sim 0\) if and only if \(a = 0\) if and only if \(\langle a \rangle = \{0\}\).
- \(a \sim 1\) if and only if \(a\) is a unit if and only if \(\langle a \rangle = R\).
- If \(R\) is an integral domain, then \(a \sim b\) if and only if \(b = au\) for some unit \(u \in R\).
Primes and Irreducibles
The integers enjoy the property that an element is prime if and only if it is irreducible. In a general commutative ring, these are distinct concepts, and understanding their relationship is essential.
- We say that \(a\) is reducible when \(a = bc\) for some non-units \(b, c \in R\), and otherwise we say that \(a\) is irreducible.
- We say that \(a\) is prime when for all \(b, c \in R\), if \(a \mid bc\) then either \(a \mid b\) or \(a \mid c\).
The distinction between primes and irreducibles is subtle but important. The property of being prime is fundamentally about divisibility behavior with respect to products, while irreducibility is about the impossibility of non-trivial factorizations. In the integers, these notions coincide, but in more exotic rings they can diverge.
- \(a = 0\) if and only if \(b = 0\).
- \(a\) is a unit if and only if \(b\) is a unit.
- \(a\) is irreducible if and only if \(b\) is irreducible.
- \(a\) is prime if and only if \(b\) is prime.
The next theorem establishes the fundamental relationship between primes and irreducibles: in an integral domain, every prime is irreducible, but the converse may fail.
Prime and Maximal Ideals
The ideal-theoretic perspective on primality and irreducibility provides powerful tools and elegant characterizations through quotient rings.
The connection between elements and ideals is made precise by the following result.
(1) \(a\) is prime if and only if \(\langle a \rangle\) is a non-zero prime ideal.
(2) If \(R\) is an integral domain, then \(a\) is irreducible if and only if \(\langle a \rangle\) is maximal amongst non-zero principal ideals.
For (2), suppose \(a\) is irreducible and \(\langle a \rangle \subseteq \langle b \rangle\) for some non-zero \(b \in R\). Then \(b \mid a\), so \(a = bc\) for some \(c \in R\). Since \(a\) is irreducible, either \(b\) or \(c\) is a unit. If \(b\) is a unit, then \(\langle b \rangle = R\). If \(c\) is a unit, then \(a \sim b\) and \(\langle a \rangle = \langle b \rangle\). Conversely, if \(\langle a \rangle\) is maximal among non-zero principal ideals and \(a = bc\), then \(\langle a \rangle \subseteq \langle b \rangle\), so either \(\langle a \rangle = \langle b \rangle\) (meaning \(b \sim a\), so \(c\) is a unit) or \(\langle b \rangle = R\) (meaning \(b\) is a unit). ∎
The most important characterizations of prime and maximal ideals are through the structure of quotient rings. These results connect the abstract ideal-theoretic notions to the concrete algebraic properties of being an integral domain or a field.
Conversely, suppose \(R/P\) is an integral domain. Since \(1 + P \neq 0 + P\), we have \(1 \notin P\), so \(P \neq R\). Let \(a, b \in R\) with \(ab \in P\). Then \((a+P)(b+P) = ab + P = 0 + P\). Since \(R/P\) has no zero divisors, \(a + P = 0 + P\) or \(b + P = 0 + P\), so \(a \in P\) or \(b \in P\). ∎
Conversely, suppose \(R/M\) is a field. Since \(1 + M \neq 0 + M\), we have \(M \neq R\). Let \(I\) be an ideal with \(M \subseteq I \subseteq R\) and \(I \neq M\). Choose \(a \in I\) with \(a \notin M\). Then \(a + M \neq 0 + M\) in \(R/M\), so \(a + M\) has an inverse: \((a+M)(b+M) = 1 + M\) for some \(b\). Then \(1 - ab \in M \subseteq I\), and since \(a \in I\) we have \(ab \in I\), so \(1 \in I\), giving \(I = R\). ∎
Since every field is an integral domain, we immediately obtain:
These characterizations have immediate and illuminating applications.
The ED-PID-UFD Hierarchy
The most important classes of integral domains, from the viewpoint of factorization, form a strict chain of inclusions: Euclidean domains are contained in principal ideal domains, which are contained in unique factorization domains. We now develop each class and prove these containments.
- (Existence) \(a = a_1 a_2 \cdots a_\ell\) for some \(\ell \in \mathbb{Z}^+\) and some irreducible elements \(a_i \in R\), and
- (Uniqueness) if \(a = a_1 a_2 \cdots a_\ell = b_1 b_2 \cdots b_m\) where each \(a_i\) and \(b_j\) is irreducible, then \(m = \ell\) and for some permutation \(\sigma \in S_m\) we have \(a_i \sim b_{\sigma(i)}\) for all \(i\).
We now prove the first link in the chain: every Euclidean domain is a principal ideal domain.
To prove the next implication, we first need a chain condition on ideals.
Now we establish the second link in the chain, which is the deepest result in this section.
Existence of an irreducible factor. If \(a\) is irreducible, we are done. Otherwise, \(a = a_1 b_1\) where \(a_1, b_1\) are non-units, and \(\langle a \rangle \subsetneq \langle a_1 \rangle\). If \(a_1\) is irreducible, we are done. Otherwise, \(a_1 = a_2 b_2\) with \(a_2, b_2\) non-units, and \(\langle a \rangle \subsetneq \langle a_1 \rangle \subsetneq \langle a_2 \rangle\). Continuing, this process must terminate since \(R\) is Noetherian (Theorem 1.16), yielding an irreducible factor \(a_n\) of \(a\).
Existence of a complete factorization. If \(a\) is irreducible, we are done. Otherwise, let \(a_1\) be an irreducible factor of \(a\), say \(a = a_1 b_1\). Then \(b_1\) is not a unit (since \(a\) is reducible while \(a_1\) is irreducible). If \(b_1\) is irreducible, we are done. Otherwise, let \(a_2\) be an irreducible factor of \(b_1\), say \(b_1 = a_2 b_2\). Continuing, we obtain \(\langle a \rangle \subsetneq \langle b_1 \rangle \subsetneq \langle b_2 \rangle \subsetneq \cdots\). By the Noetherian property, this chain must stabilize, so eventually some \(b_n\) is irreducible and \(a = a_1 a_2 \cdots a_n b_n\).
Uniqueness. Suppose \(a = a_1 a_2 \cdots a_\ell = b_1 b_2 \cdots b_m\) where each \(a_i\) and \(b_j\) is irreducible. Since \(a_1 \mid b_1 b_2 \cdots b_m\), and since \(a_1\) is irreducible in a PID, the ideal \(\langle a_1 \rangle\) is maximal among nonzero principal ideals (Theorem 1.7), hence is a nonzero maximal ideal (in a PID, maximal among principal ideals means maximal), hence prime (Corollary 1.10), so \(a_1\) is prime. Thus \(a_1 \mid b_k\) for some \(k\). After reindexing, we may assume \(a_1 \mid b_1\). Since \(b_1\) is irreducible and \(a_1\) is not a unit, \(a_1 \sim b_1\), say \(b_1 = a_1 u\) for some unit \(u\). Then \(a_1 a_2 \cdots a_\ell = a_1 u b_2 \cdots b_m\), and by cancellation \(a_2 \cdots a_\ell = u b_2 \cdots b_m\). By induction, \(\ell = m\) and \(a_i \sim b_i\) after a suitable permutation of \(b_2, \ldots, b_m\). ∎
Gaussian Integers
The ring of Gaussian integers \(\mathbb{Z}[i] = \{a + bi : a, b \in \mathbb{Z}\}\) is one of the most important examples in algebraic number theory. It is a Euclidean domain with norm \(N(a+bi) = a^2 + b^2\), and hence a UFD. The classification of its prime elements provides a beautiful connection between the arithmetic of \(\mathbb{Z}[i]\) and the representation of integers as sums of two squares.
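The Euclidean property can be made concrete with a minimal sketch of division with remainder in \(\mathbb{Z}[i]\). It assumes a Gaussian integer \(a + bi\) is stored as the pair `(a, b)`, and the helper names (`norm`, `round_div`, `divmod_gauss`, `gcd_gauss`) are hypothetical, chosen only for this illustration. The quotient is obtained by rounding each coordinate of the exact quotient to a nearest integer, which forces \(N(r) \leq N(w)/2\).

```python
# Gaussian integers are modelled as pairs (a, b) meaning a + bi.

def norm(z):
    a, b = z
    return a * a + b * b

def mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def sub(z, w):
    return (z[0] - w[0], z[1] - w[1])

def round_div(x, n):
    # A nearest integer to x / n for n > 0, using integer arithmetic only.
    return (2 * x + n) // (2 * n)

def divmod_gauss(z, w):
    # Return (q, r) with z = q*w + r and N(r) <= N(w)/2 < N(w), for w != 0.
    n = norm(w)
    num = mul(z, (w[0], -w[1]))  # z * conj(w), so z / w = num / n
    q = (round_div(num[0], n), round_div(num[1], n))
    r = sub(z, mul(q, w))
    return q, r

def gcd_gauss(z, w):
    # Euclidean algorithm; the result is determined only up to a unit.
    while w != (0, 0):
        _, r = divmod_gauss(z, w)
        z, w = w, r
    return z

g = gcd_gauss((11, 3), (1, 8))
print(g, norm(g))
```

For \(11 + 3i\) and \(1 + 8i\), of norms 130 and 65, the sketch returns a common divisor of norm 5.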
- \(1 + i\),
- \(p\), where \(p\) is a prime number in \(\mathbb{Z}^+\) with \(p \equiv 3 \pmod{4}\),
- \(x + iy\), where \(x, y \in \mathbb{Z}\) with \(0 < y \leq x\) and \(x^2 + y^2 = p\) for some prime number \(p \in \mathbb{Z}^+\) with \(p \equiv 1 \pmod{4}\).
Let \(\pi\) be a prime in \(\mathbb{Z}[i]\). Then \(\pi \mid \pi\bar{\pi} = N(\pi) \in \mathbb{Z}^+\), so \(\pi\) divides some rational prime \(p\). Write \(p = \pi \alpha\) in \(\mathbb{Z}[i]\), so \(p^2 = N(p) = N(\pi)N(\alpha)\). Since \(\pi\) is not a unit, \(N(\pi) > 1\), so \(N(\pi) = p\) or \(N(\pi) = p^2\).
Case 1: \(N(\pi) = p\). Then \(p = a^2 + b^2\) where \(\pi = a + bi\). This requires \(p = 2\) (giving \(\pi \sim 1+i\)) or \(p \equiv 1 \pmod{4}\) (by Fermat’s theorem on sums of two squares).
Case 2: \(N(\pi) = p^2\). Then \(N(\alpha) = 1\), so \(\alpha\) is a unit and \(\pi \sim p\). Since \(\pi\) is prime, hence irreducible, its associate \(p\) is irreducible in \(\mathbb{Z}[i]\). If \(p \equiv 1 \pmod{4}\), then \(p = a^2 + b^2\) for some integers \(a, b\), giving \(p = (a+bi)(a-bi)\) with both factors non-units, contradicting irreducibility. If \(p = 2\), then \(2 = -i(1+i)^2\) is not irreducible. Thus \(p \equiv 3 \pmod{4}\).
Conversely, one verifies that each element listed is indeed prime. ∎
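The case analysis in the proof can be illustrated by a short sketch that sorts a rational prime into one of the three cases. The function names `two_squares` and `classify` are hypothetical, the input is assumed to be prime (this is not checked), and the search for \(x^2 + y^2 = p\) is a plain brute-force loop rather than an efficient algorithm.

```python
from math import isqrt

def two_squares(p):
    # Return (x, y) with x >= y > 0 and x*x + y*y == p, or None if none exists.
    for y in range(1, isqrt(p) + 1):
        x2 = p - y * y
        x = isqrt(x2)
        if x >= y and x * x == x2:
            return (x, y)
    return None

def classify(p):
    # p is assumed to be a rational prime.
    if p == 2:
        return "2 = -i(1+i)^2"
    if p % 4 == 3:
        return f"{p} remains prime in Z[i]"
    x, y = two_squares(p)
    return f"{p} = ({x}+{y}i)({x}-{y}i)"

for p in (2, 3, 5, 7, 13):
    print(classify(p))
```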
This classification leads to the classical characterization of which positive integers are representable as sums of two squares.
where \(m \in \mathbb{N}\), \(k_\alpha, \ell_\beta \in \mathbb{Z}^+\), the \(p_\alpha\) are distinct primes with \(p_\alpha \equiv 1 \pmod{4}\), and the \(q_\beta\) are distinct primes with \(q_\beta \equiv 3 \pmod{4}\). Then there exists a solution \((x, y) \in \mathbb{Z}^2\) to \(x^2 + y^2 = n\) if and only if each exponent \(\ell_\beta\) is even, and in this case the number of solutions \((x, y) \in \mathbb{Z}^2\) is equal to \(4 \prod_\alpha (k_\alpha + 1)\).
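As a sanity check on the count \(4 \prod_\alpha (k_\alpha + 1)\), the following sketch compares the formula with a direct search for small \(n\). Both function names are hypothetical, and the factorization is done by naive trial division, which is adequate only for small inputs.

```python
from math import isqrt

def count_brute(n):
    # Count all (x, y) in Z^2 with x^2 + y^2 = n by direct search.
    b = isqrt(n)
    return sum(1 for x in range(-b, b + 1) for y in range(-b, b + 1)
               if x * x + y * y == n)

def count_formula(n):
    # 4 * prod(k + 1) over primes p = 1 mod 4 dividing n, or 0 if some
    # prime q = 3 mod 4 divides n to an odd power. Assumes n >= 1.
    count, p = 4, 2
    while p * p <= n:
        k = 0
        while n % p == 0:
            n //= p
            k += 1
        if p % 4 == 1:
            count *= k + 1
        elif p % 4 == 3 and k % 2 == 1:
            return 0
        p += 1
    if n > 1:  # one leftover prime factor, with exponent 1
        if n % 4 == 3:
            return 0
        if n % 4 == 1:
            count *= 2
    return count

print(all(count_brute(n) == count_formula(n) for n in range(1, 200)))
```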
Units in Quadratic Integer Rings
The structure of units in quadratic integer rings is governed by a remarkable theorem that connects to Pell’s equation.
Polynomial Rings
The theory of polynomials over integral domains provides essential tools for algebraic number theory. We collect here the key results about polynomial division, irreducibility criteria, and the passage between \(\mathbb{Z}[x]\) and \(\mathbb{Q}[x]\).
The Division Algorithm and its Consequences
Uniqueness. Suppose \(f = qg + r = pg + s\) with \(\deg(r), \deg(s) < \deg(g)\). Then \((q-p)g = s - r\). Since the leading coefficient of \(g\) is a unit (hence not a zero divisor), \(\deg((q-p)g) = \deg(q-p) + \deg(g)\). If \(q - p \neq 0\), then \(\deg((q-p)g) \geq \deg(g)\), contradicting \(\deg(s-r) < \deg(g)\). Thus \(q = p\) and \(r = s\). ∎
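The existence half of the division algorithm is ordinary long division. Below is a minimal sketch over \(\mathbb{Z}\), under the simplifying assumption that the divisor is monic (a special case of a unit leading coefficient). Polynomials are stored as coefficient lists with the lowest degree first, and `poly_divmod` is a hypothetical name.

```python
def poly_divmod(f, g):
    # Coefficient lists, lowest degree first; g must be monic.
    # Returns (q, r) with f = q*g + r and deg(r) < deg(g).
    assert g[-1] == 1, "divisor must be monic"
    r = list(f)
    dg = len(g) - 1
    q = [0] * max(len(f) - dg, 1)
    for i in range(len(f) - 1, dg - 1, -1):
        c = r[i]          # current leading coefficient of the remainder
        q[i - dg] = c
        for j, gj in enumerate(g):
            r[i - dg + j] -= c * gj
    return q, r[:dg]

# Divide x^3 + 2x + 5 by x - 1.
q, r = poly_divmod([5, 2, 0, 1], [-1, 1])
print(q, r)
```

Dividing by \(x - 1\) leaves the remainder \(f(1) = 8\), in line with the Remainder Theorem.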
Two classical and frequently used consequences of the division algorithm are the Remainder and Factor theorems.
Gauss’ Lemma and Irreducibility Criteria
To transfer irreducibility questions between \(\mathbb{Z}[x]\) and \(\mathbb{Q}[x]\), we need the notion of content and the fundamental result of Gauss.
(1) For all \(f, g \in \mathbb{Z}[x]\), we have \(c(fg) = c(f)c(g)\). In particular, the product of primitive polynomials is primitive.
(2) Let \(0 \neq f \in \mathbb{Z}[x]\) and let \(g(x) = \frac{1}{c(f)} f(x) \in \mathbb{Z}[x]\). Then \(f\) is irreducible in \(\mathbb{Q}[x]\) if and only if \(g\) is irreducible in \(\mathbb{Z}[x]\).
Write \(h(x) = \sum_{i=0}^n a_i x^i\) and \(k(x) = \sum_{i=0}^m b_i x^i\). Suppose for contradiction that some prime \(p\) divides \(c(hk)\). Since \(c(h) = 1\), choose the smallest index \(r\) with \(p \nmid a_r\). Since \(c(k) = 1\), choose the smallest index \(s\) with \(p \nmid b_s\). The coefficient of \(x^{r+s}\) in \(hk\) is
\[ c_{r+s} = a_0 b_{r+s} + \cdots + a_r b_s + \cdots + a_{r+s} b_0. \]
Since \(p \mid c_{r+s}\) and \(p \mid a_i\) for \(i < r\) and \(p \mid b_j\) for \(j < s\), it follows that \(p \mid a_r b_s\). Since \(p\) is prime, \(p \mid a_r\) or \(p \mid b_s\), contradicting our choices.
Part (2). Let \(g(x) = \frac{1}{c(f)} f(x)\) so \(c(g) = 1\).
Suppose \(g\) is reducible in \(\mathbb{Z}[x]\), say \(g = hk\) with \(h, k\) non-units in \(\mathbb{Z}[x]\). Since \(c(h)c(k) = c(g) = 1\), both \(h\) and \(k\) are primitive, hence nonconstant. Then \(f = c(f)g = c(f) \cdot h \cdot k\), and since \(c(f)h\) and \(k\) are nonconstant in \(\mathbb{Q}[x]\), \(f\) is reducible in \(\mathbb{Q}[x]\).
Conversely, suppose \(f\) is reducible in \(\mathbb{Q}[x]\), say \(f = h \cdot k\) with \(h, k\) nonconstant in \(\mathbb{Q}[x]\). Choose positive integers \(a, b\) with \(ah, bk \in \mathbb{Z}[x]\). Dividing out content, write \(h = \frac{c(ah)}{a} p\) and \(k = \frac{c(bk)}{b} q\) where \(p, q \in \mathbb{Z}[x]\) are primitive and nonconstant. Then \(f = \frac{c(ah)c(bk)}{ab} \cdot pq\). By Part (1), \(pq\) is primitive, so comparing content gives \(c(f) = \frac{c(ah)c(bk)}{ab}\) and hence \(g = pq\), a product of nonconstant polynomials in \(\mathbb{Z}[x]\). ∎
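The multiplicativity of content is easy to test numerically. The sketch below checks \(c(fg) = c(f)c(g)\) on one pair of integer polynomials. The helper names `content` and `poly_mul` are hypothetical, and the content is taken to be the non-negative gcd of the coefficients.

```python
from functools import reduce
from math import gcd

def content(f):
    # gcd of all coefficients; math.gcd always returns a non-negative value.
    return reduce(gcd, f, 0)

def poly_mul(f, g):
    # Product of coefficient lists (lowest degree first).
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

f = [6, 10, 4]      # 6 + 10x + 4x^2, content 2
g = [3, 0, 9, 15]   # 3 + 9x^2 + 15x^3, content 3
print(content(f), content(g), content(poly_mul(f, g)))
```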
This powerful result means that to test irreducibility over \(\mathbb{Q}\), we need only look for factorizations over \(\mathbb{Z}\).
Thus \(r \mid c_0 s^n\). Since \(\gcd(r, s) = 1\), we have \(\gcd(r, s^n) = 1\), so \(r \mid c_0\). Similarly, \(s \mid c_n r^n\) and \(\gcd(s, r^n) = 1\) give \(s \mid c_n\). ∎
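The divisibility conditions \(r \mid c_0\) and \(s \mid c_n\) leave only finitely many candidate roots, which can be tested one by one. A minimal sketch, assuming integer coefficients \(c_0, \ldots, c_n\) with \(c_0 \neq 0\) and \(c_n \neq 0\); the names `divisors` and `rational_roots` are hypothetical.

```python
from fractions import Fraction

def divisors(n):
    # Positive divisors of a nonzero integer.
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(c):
    # c = [c_0, ..., c_n] with integer entries, c_0 != 0 and c_n != 0.
    candidates = {Fraction(sign * r, s)
                  for r in divisors(c[0]) for s in divisors(c[-1])
                  for sign in (1, -1)}
    return sorted(x for x in candidates
                  if sum(ci * x**i for i, ci in enumerate(c)) == 0)

# 2x^3 - 3x^2 - 3x + 2 = (x + 1)(2x - 1)(x - 2)
print(rational_roots([2, -3, -3, 2]))
# x^3 - 2 has no rational roots
print(rational_roots([-2, 0, 0, 1]))
```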
The next two results provide efficient irreducibility tests that avoid the need to search for factorizations directly.
Since \(p \mid c_1 = a_0 b_1 + a_1 b_0\) and \(p \mid a_0\), we get \(p \mid a_1 b_0\), and since \(p \nmid b_0\), we get \(p \mid a_1\). Continuing inductively, \(p \mid c_j\) for \(j < n\) and \(p \mid a_i\) for \(i < j\) together imply \(p \mid a_j b_0\), hence \(p \mid a_j\). In particular, \(p \mid a_k\). But then \(p \mid a_k b_\ell = c_n\), contradicting \(p \nmid c_n\). ∎
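The hypotheses used in this argument (namely \(p \nmid c_n\), \(p \mid c_j\) for \(j < n\), and, as in the usual statement of Eisenstein's criterion, \(p^2 \nmid c_0\)) can be checked mechanically. The sketch below assumes that \(p\) is prime without verifying it, and `eisenstein` is a hypothetical name. A result of `False` says only that the criterion does not apply at \(p\), not that the polynomial is reducible.

```python
def eisenstein(c, p):
    # c = [c_0, ..., c_n] with integer entries; p is assumed prime.
    return (c[-1] % p != 0
            and all(ci % p == 0 for ci in c[:-1])
            and c[0] % (p * p) != 0)

print(eisenstein([5, 10, 0, 0, 1], 5))   # x^4 + 10x + 5 at p = 5
print(eisenstein([4, 0, 1], 2))          # x^2 + 4 at p = 2
```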
Exercises
Chapter 2: Field Extensions and Galois Theory
Galois theory is one of the crown jewels of algebra, providing a profound correspondence between the structure of field extensions and the structure of groups of automorphisms. This chapter develops the theory of field extensions, embeddings, and normal extensions, culminating in the Fundamental Theorem of Galois Theory. Throughout, we work with subfields of \(\mathbb{C}\), which simplifies several technical aspects while retaining all the essential ideas needed for algebraic number theory.
Field Extensions and Degree
The degree of an extension is the single most important numerical invariant. The following result shows that degrees are multiplicative in towers, which is the field-theoretic analogue of the dimension formula for nested vector spaces.
More precisely, if \(U\) is a basis for \(K\) over \(F\) and \(V\) is a basis for \(L\) over \(K\), then \(\{uv : u \in U, v \in V\}\) is a basis for \(L\) over \(F\).
Spanning. Let \(a \in L\). Since \(V\) spans \(L/K\), write \(a = \sum_j s_j v_j\) with \(s_j \in K\). Since \(U\) spans \(K/F\), write each \(s_j = \sum_i r_{ij} u_i\) with \(r_{ij} \in F\). Then \(a = \sum_{i,j} r_{ij} u_i v_j\), so \(W = \{uv : u \in U, v \in V\}\) spans \(L/F\).
Linear independence. Suppose \(\sum_{i,j} r_{ij} u_i v_j = 0\) with \(r_{ij} \in F\). Then \(\sum_j \left(\sum_i r_{ij} u_i\right) v_j = 0\). Since each \(\sum_i r_{ij} u_i \in K\) and \(V\) is linearly independent over \(K\), we get \(\sum_i r_{ij} u_i = 0\) for each \(j\). Since \(U\) is linearly independent over \(F\), we get \(r_{ij} = 0\) for all \(i, j\). ∎
Algebraic Elements and Minimal Polynomials
The key dichotomy for elements in field extensions is whether they satisfy a polynomial equation or not. This distinction governs the entire structure theory.
When \(F\) and \(K\) are fields with \(F \subseteq K\) and \(U \subseteq K\), the subfield of \(K\) generated by \(U\) over \(F\), denoted \(F(U)\), is the smallest subfield of \(K\) containing \(F \cup U\). When \(U = \{u_1, \ldots, u_n\}\), we have
\[ F(u_1, \ldots, u_n) = \left\{\frac{f(u_1, \ldots, u_n)}{g(u_1, \ldots, u_n)} : f, g \in F[x_1, \ldots, x_n],\; g(u_1, \ldots, u_n) \neq 0\right\}. \]
The fundamental structure theorem for simple extensions tells us that the algebraic/transcendental dichotomy completely determines the structure of \(F(a)\).
(1) If \(a\) is transcendental over \(F\), then \(F[a] \cong F[x]\) and \(F(a) \cong F(x)\) (the field of rational functions). In this case \([F(a) : F] = \infty\) and the set \(\{1, a, a^2, \ldots\}\) is linearly independent over \(F\).
(2) If \(a\) is algebraic over \(F\), then there exists a unique monic irreducible polynomial \(f(x) \in F[x]\) with \(f(a) = 0\). The ideal generated by this polynomial satisfies \(\langle f \rangle = \{g \in F[x] : g(a) = 0\}\), and we have
\[
F(a) = F[a] \cong F[x]/\langle f \rangle.
\]
Setting \(n = \deg(f)\), the set \(\{1, a, a^2, \ldots, a^{n-1}\}\) is a basis for \(F(a)\) over \(F\) and \([F(a) : F] = n\).
(1) If \(a\) is transcendental, then \(\ker(\phi) = \{0\}\), so \(\phi\) is injective and \(F[a] \cong F[x]\). Since \(F(a)\) is the field of fractions of the integral domain \(F[a]\), it is isomorphic to the field of fractions \(F(x)\) of \(F[x]\).
(2) If \(a\) is algebraic, then \(\ker(\phi) \neq \{0\}\). Since \(F[x]\) is a PID, \(\ker(\phi) = \langle f \rangle\) for some monic polynomial \(f\). We claim \(f\) is irreducible. If \(f = gh\) with \(g, h\) nonconstant, then \(0 = f(a) = g(a)h(a)\). Since \(K\) is a field (hence an integral domain), \(g(a) = 0\) or \(h(a) = 0\), contradicting the minimality of \(\deg(f)\) among nonzero elements of \(\ker(\phi)\). By the first isomorphism theorem, \(F[a] \cong F[x]/\langle f \rangle\). Since \(f\) is irreducible in the PID \(F[x]\), \(\langle f \rangle\) is maximal, so \(F[x]/\langle f \rangle\) is a field, giving \(F[a] = F(a)\). The images of \(1, x, \ldots, x^{n-1}\) form a basis for \(F[x]/\langle f \rangle\), so \(\{1, a, \ldots, a^{n-1}\}\) is a basis for \(F(a)/F\). ∎
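The isomorphism \(F(a) \cong F[x]/\langle f \rangle\) gives a concrete way to compute: elements are coordinate vectors in the basis \(1, a, \ldots, a^{n-1}\), and products are reduced using \(f(a) = 0\). The following sketch works over \(\mathbb{Q}\) with \(f = x^3 - 2\) as an illustrative choice; `mul_mod` is a hypothetical name and \(f\) is assumed monic.

```python
from fractions import Fraction

def mul_mod(u, v, f):
    # u, v: coordinates in the basis 1, a, ..., a^(n-1); f: monic of degree n.
    # All are coefficient lists with the lowest degree first.
    n = len(f) - 1
    prod = [Fraction(0)] * (2 * n - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] += Fraction(ui) * vj
    # Replace a^k (k >= n) using a^n = -(f_0 + f_1 a + ... + f_{n-1} a^(n-1)).
    for k in range(2 * n - 2, n - 1, -1):
        c = prod[k]
        prod[k] = Fraction(0)
        for j in range(n):
            prod[k - n + j] -= c * f[j]
    return prod[:n]

f = [-2, 0, 0, 1]                                   # x^3 - 2
print(mul_mod([0, 0, 1], [0, 0, 1], f))             # a^2 * a^2 = 2a
print(mul_mod([0, 1, 0], [0, 0, Fraction(1, 2)], f))  # a * (a^2 / 2) = 1
```

The second product exhibits \(a^2/2\) as the inverse of \(a\), consistent with \(F[a]\) being a field.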
The minimal polynomial interacts well with extensions of the base field:
The following results establish fundamental properties of algebraic extensions.
Each \(c_i\) is algebraic over \(F\), so each step \(F(c_0, \ldots, c_{i-1}) \subseteq F(c_0, \ldots, c_i)\) has finite degree. The final step also has finite degree since \(a\) is a root of \(f(x) \in F(c_0, \ldots, c_{n-1})[x]\). By the tower law, \([F(c_0, \ldots, c_{n-1}, a) : F]\) is finite, so \(a\) is algebraic over \(F\). ∎
Conjugates and Embeddings
The notion of conjugates generalizes the familiar idea that a quadratic irrational \(\alpha = a + b\sqrt{d}\) has a conjugate \(\bar{\alpha} = a - b\sqrt{d}\). More generally, the conjugates of an algebraic element are the roots of its minimal polynomial, and they are intimately connected to field embeddings.
Note that \(\operatorname{Aut}_K(L)\) is a group under composition and \(\operatorname{Aut}_K(L) \subseteq \operatorname{Hom}_K(L, M)\).
The following observation will be used repeatedly: for finite extensions, an embedding whose image lands inside the source must be an automorphism.
The Separability of Subfields of C
Before proving the embedding extension theorem, we need the crucial fact that irreducible polynomials over subfields of \(\mathbb{C}\) have no repeated roots. This is the separability property, which holds automatically in characteristic zero.
The Embedding Extension Theorem
The embedding extension theorem is the engine that drives the entire theory of field extensions and Galois theory. It tells us exactly how many ways an embedding can be extended when we adjoin an algebraic element.
Let \(\psi : K(a) \to \mathbb{C}\) be any embedding extending \(\phi\). Since \(\psi(c_i) = \phi(c_i)\), applying \(\psi\) to \(f(a) = 0\) gives \(g(\psi(a)) = 0\). Thus \(\psi(a)\) must be one of \(b_1, \ldots, b_n\).
Conversely, for each root \(b_k\), the formula
\[ \psi_k\left(\sum_{i=0}^{n-1} r_i a^i\right) = \sum_{i=0}^{n-1} \phi(r_i) b_k^i \]
defines an embedding \(\psi_k : K(a) \to \mathbb{C}\) extending \(\phi\) with \(\psi_k(a) = b_k\). This is well-defined since \(\{1, a, \ldots, a^{n-1}\}\) is a basis for \(K(a)/K\), and one verifies it is an injective ring homomorphism. Thus there are exactly \(n\) extensions. ∎
By the Primitive Element Theorem (which we prove below), this generalizes immediately:
The Primitive Element Theorem
The Primitive Element Theorem asserts that every finite extension of subfields of \(\mathbb{C}\) is simple, i.e., generated by a single element. This is a remarkably powerful simplification.
Let \(u, v \in L\). Let \(f(x) \in K[x]\) be the minimal polynomial of \(u\) over \(K\) with roots \(a_1 = u, a_2, \ldots, a_k\) in \(\mathbb{C}\), and let \(g(x) \in K[x]\) be the minimal polynomial of \(v\) over \(K\) with roots \(b_1 = v, b_2, \ldots, b_\ell\) in \(\mathbb{C}\). Choose \(t \in K\) such that \(t \neq -\frac{u - a_i}{v - b_j}\) for every pair of indices \((i, j)\) with \(j \neq 1\) (so that \(b_j \neq v\) and the quotient is defined). Such a choice is possible since only finitely many values are excluded and \(K\) is infinite (as it contains \(\mathbb{Q}\)). Let \(w = u + tv\).
Clearly \(w \in K(u, v)\), so \(K(w) \subseteq K(u, v)\). We claim \(K(u, v) \subseteq K(w)\). Consider the polynomial \(h(x) = f(w - tx) \in K(w)[x]\). Note that \(h(v) = f(w - tv) = f(u) = 0\) and \(g(v) = 0\). If \(x \in \mathbb{C}\) is a common root of \(g\) and \(h\), then \(g(x) = 0\) implies \(x = b_j\) for some \(j\), and \(h(x) = 0\) implies \(f(w - tb_j) = 0\), so \(w - tb_j = a_i\) for some \(i\), that is, \(u + t(v - b_j) = a_i\). If \(b_j \neq v\), then \(j \neq 1\) and \(t = -\frac{u - a_i}{v - b_j}\), contradicting our choice of \(t\). Thus \(v\) is the only common root of \(g\) and \(h\).
Let \(d(x) = \gcd(g(x), h(x))\) in \(K(w)[x]\). Since \(v\) is the only common root and all roots are simple (by separability), \(d(x) = (x - v)\). Since \(d(x) \in K(w)[x]\), it follows that \(v \in K(w)\). Then \(u = w - tv \in K(w)\), so \(K(u, v) \subseteq K(w)\). ∎
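A numerical illustration of the construction, with \(K = \mathbb{Q}\), \(u = \sqrt{2}\), \(v = \sqrt{3}\) and \(t = 1\) (the excluded values here are \(0\) and \(-\sqrt{2}/\sqrt{3}\)). The sketch uses floating-point arithmetic, so the checks hold only up to rounding error, and `recover_generators` is a hypothetical name for the two polynomial expressions in \(w\).

```python
from math import isclose, sqrt

def recover_generators(w):
    # Candidate expressions for sqrt(2) and sqrt(3) as polynomials in w
    # with rational coefficients.
    return (w**3 - 9 * w) / 2, (11 * w - w**3) / 2

u, v, t = sqrt(2), sqrt(3), 1
w = u + t * v
u_again, v_again = recover_generators(w)

# Both generators are recovered from w alone, so Q(u, v) = Q(w).
print(isclose(u_again, u), isclose(v_again, v))
# w is a root of x^4 - 10x^2 + 1.
print(isclose(w**4 - 10 * w**2 + 1, 0, abs_tol=1e-9))
```

The identities behind the two expressions are \(w^3 = 11\sqrt{2} + 9\sqrt{3}\) and \(w = \sqrt{2} + \sqrt{3}\).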
Normal Extensions
Not every finite extension is “well-behaved” from the Galois-theoretic perspective. The concept of normality isolates those extensions for which the Galois group achieves its maximum possible size.
The next theorem gives several equivalent characterizations of normality, each useful in different contexts.
(1) \(|\operatorname{Aut}_K(L)| = [L : K]\).
(2) \(\operatorname{Hom}_K(L, \mathbb{C}) = \operatorname{Aut}_K(L)\).
(3) For every \(a \in L\), the minimal polynomial of \(a\) over \(K\) splits in \(L\).
(4) \(L\) is the splitting field of some polynomial \(f(x) \in K[x]\).
(2) \(\Rightarrow\) (3): Suppose \(\operatorname{Hom}_K(L, \mathbb{C}) = \operatorname{Aut}_K(L)\). Let \(a \in L\) and let \(f(x) \in K[x]\) be the minimal polynomial of \(a\) over \(K\), with roots \(a_1 = a, a_2, \ldots, a_m\) in \(\mathbb{C}\). The identity embedding \(\operatorname{id} : K \hookrightarrow \mathbb{C}\) extends to \(m\) embeddings \(\phi_j : K(a) \to \mathbb{C}\) with \(\phi_j(a) = a_j\). Each \(\phi_j\) extends to at least one embedding \(\psi_j : L \to \mathbb{C}\). Since \(\psi_j\) fixes \(K\), we have \(\psi_j \in \operatorname{Hom}_K(L, \mathbb{C}) = \operatorname{Aut}_K(L)\). Therefore \(a_j = \psi_j(a) \in L\) for all \(j\), so \(f\) splits in \(L\).
(3) \(\Rightarrow\) (4): By the Primitive Element Theorem, choose \(a \in L\) with \(L = K(a)\). Let \(f(x)\) be the minimal polynomial of \(a\) with roots \(a_1, \ldots, a_n\). By hypothesis, each \(a_i \in L\), so \(L = K(a) = K(a_1, \ldots, a_n)\) is the splitting field of \(f\).
(4) \(\Rightarrow\) (1): Let \(L\) be the splitting field of \(f(x) \in K[x]\) with roots \(a_1, \ldots, a_n \in L\). We use an inductive construction. If \(f\) splits completely in \(K\), then \(L = K\) and \(|\operatorname{Aut}_K(L)| = 1 = [L:K]\). Otherwise, let \(g_1(x) \in K[x]\) be a nonlinear irreducible factor of \(f\) with roots \(a_{1,1}, \ldots, a_{1,\ell_1}\). The identity extends to \(\ell_1\) embeddings \(\phi_{j_1} : K(a_{1,1}) \to \mathbb{C}\) with \(\phi_{j_1}(a_{1,1}) = a_{1,j_1}\). Since all roots lie in \(L\), the image lands in \(L\). Continuing inductively through a tower \(K \subset K(a_{1,1}) \subset K(a_{1,1}, a_{2,1}) \subset \cdots = L\), we obtain \(\ell_1 \ell_2 \cdots \ell_m = [L:K]\) embeddings, each of which maps \(L\) into \(L\) (since images of roots of \(f\) are roots of \(f\), hence in \(L\)). By Proposition 2.12, these are all automorphisms, giving \(|\operatorname{Aut}_K(L)| = [L:K]\). ∎
Galois Groups and the Fundamental Theorem
We now arrive at the central construction of Galois theory: the Galois group of a normal extension, and the correspondence between subgroups and intermediate fields.
One verifies directly that \(L^H\) is a subfield of \(L\).
Before stating the Fundamental Theorem, we establish a key preliminary result that shows the fixed field and Galois group operations are well-behaved.
(1) \(L^G = K\).
(2) If \(H \leq G\) with \(L^H = K\), then \(H = G\).
so \([L^G : K] = 1\) and \(L^G = K\).
(2) Suppose \(L^H = K\). Write \(L = K(\alpha)\) by the Primitive Element Theorem and consider the polynomial
\[ f(x) = \prod_{\sigma \in H} (x - \sigma(\alpha)). \]
The coefficients of \(f\) are elementary symmetric polynomials in the \(\sigma(\alpha)\), and for any \(\tau \in H\), the map \(\sigma \mapsto \tau\sigma\) permutes \(H\), so \(\tau\) permutes the roots \(\{\sigma(\alpha)\}\) and fixes each coefficient. Thus the coefficients lie in \(L^H = K\), so \(f(x) \in K[x]\). Since \(\operatorname{id} \in H\), \(f(\alpha) = 0\), and \(\deg(f) = |H|\). The minimal polynomial of \(\alpha\) over \(K\) has degree at most \(|H|\), so
\[ [L : K] = [K(\alpha) : K] \leq |H| \leq |G| = [L : K]. \]
Therefore \(|H| = |G|\), and since \(H \leq G\), we conclude \(H = G\). ∎
We are now ready for the main theorem, which establishes a perfect dictionary between the lattice of intermediate fields and the lattice of subgroups of the Galois group.
(i) \(L^{\operatorname{Gal}(L/F)} = F\).
(ii) If \(H \leq G\), then \(\operatorname{Gal}(L/L^H) = H\).
(iii) \(F/K\) is normal if and only if \(\operatorname{Gal}(L/F) \trianglelefteq \operatorname{Gal}(L/K)\). In this case, \[ \operatorname{Gal}(F/K) \cong \operatorname{Gal}(L/K) / \operatorname{Gal}(L/F). \]
(ii) Let \(H' = \operatorname{Gal}(L/L^H)\). By definition, \(H\) fixes \(L^H\), so \(H \leq H'\). Since \(L/L^H\) is normal (as it is a sub-extension of the normal extension \(L/K\)), and \(H \leq \operatorname{Gal}(L/L^H)\) has \(L^H\) as its fixed field, Theorem 2.25(2) gives \(H = H'\).
(iii) Let \(H = \operatorname{Gal}(L/F)\). For \(\sigma \in G\), the map \(\sigma|_F : F \to \sigma(F)\) is an isomorphism, and one computes \(\sigma H \sigma^{-1} = \operatorname{Gal}(L/\sigma(F))\). Therefore:
\[ H \trianglelefteq G \iff \operatorname{Gal}(L/\sigma(F)) = \operatorname{Gal}(L/F) \text{ for all } \sigma \in G \iff \sigma(F) = F \text{ for all } \sigma \in G. \]
The condition \(\sigma(F) = F\) for all \(\sigma \in G\) means exactly that every \(K\)-fixing automorphism of \(L\) maps \(F\) into itself, which (since each such restriction \(\sigma|_F : F \to F\) is an automorphism) is equivalent to \(F/K\) being normal.
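When \(H \trianglelefteq G\), the restriction map \(\operatorname{Gal}(L/K) \to \operatorname{Gal}(F/K)\) given by \(\sigma \mapsto \sigma|_F\) is a well-defined group homomorphism (since \(\sigma(F) = F\)). Its kernel is \(\{\sigma \in G : \sigma|_F = \operatorname{id}_F\} = \operatorname{Gal}(L/F) = H\). It is surjective: any \(\tau \in \operatorname{Gal}(F/K)\) extends to an embedding \(L \to \mathbb{C}\) fixing \(K\), and every such embedding is an automorphism of \(L\) because \(L/K\) is normal. By the first isomorphism theorem, \(\operatorname{Gal}(F/K) \cong G/H\). ∎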
Algebraic Numbers and Algebraic Integers
In Chapter 2 we studied field extensions and their basic properties. We now turn to the central objects of algebraic number theory: algebraic integers and the rings they form inside number fields. The guiding question is deceptively simple—which elements of a number field deserve to be called “integers”? In the rational numbers, the answer is obvious: the integers are precisely the elements of \(\mathbb{Z}\). But in a general number field, the notion of “having no denominator” is not immediately available, and we must find a more intrinsic characterization.
The key insight is that an ordinary integer \(a \in \mathbb{Z}\) is a root of the monic polynomial \(x - a \in \mathbb{Z}[x]\). More generally, a rational number \(a/b\) in lowest terms is a root of the monic polynomial \(x - a/b\), which has integer coefficients if and only if \(b = 1\). This observation suggests the correct generalization.
Thus the algebraic integers are precisely the elements of \(\mathbb{C}\) that are integral over \(\mathbb{Z}\). Note that we require the polynomial to be monic but not necessarily irreducible. This flexibility is essential: it is often easier to exhibit some monic polynomial satisfied by \(\alpha\) than to compute its minimal polynomial.
An important first observation is that for elements of \(\overline{\mathbb{Q}}\), integrality is detected by the minimal polynomial.
Equivalent Characterizations of Integrality
The definition of integrality in terms of monic polynomials is clean, but it does not immediately reveal that the sum or product of two algebraic integers is again an algebraic integer. To establish this, we need an alternative characterization that involves modules.
A submodule of \(M\) is a subset \(N \subseteq M\) that is itself an \(R\)-module under the same operations. The module \(M\) is finitely generated if there exist \(m_1, \ldots, m_k \in M\) such that every element of \(M\) can be written as an \(R\)-linear combination of the \(m_i\).
The reader should keep in mind two fundamental examples: a module over a field is simply a vector space, and a module over \(\mathbb{Z}\) is the same thing as an abelian group. When \(R \subseteq S\) is an inclusion of rings, the ring \(S\) itself is naturally an \(R\)-module.
(i) \(\alpha\) is integral over \(R\), i.e., \(\alpha\) is a root of a monic polynomial in \(R[x]\).
(ii) The ring \(R[\alpha]\) is finitely generated as an \(R\)-module.
(iii) There exists a subring \(T\) with \(R \subseteq T \subseteq S\) such that \(\alpha \in T\) and \(T\) is finitely generated as an \(R\)-module.
so \(\alpha^n\) lies in the \(R\)-module \(M\) generated by \(\{1, \alpha, \alpha^2, \ldots, \alpha^{n-1}\}\). Multiplying by \(\alpha\) shows that \(\alpha^{n+1} \in M\) as well, and by induction every power of \(\alpha\) lies in \(M\). Therefore \(R[\alpha] = M\) is finitely generated.
(ii) \(\Rightarrow\) (iii): Take \(T = R[\alpha]\).
(iii) \(\Rightarrow\) (i): Let \(T\) be finitely generated as an \(R\)-module with generators \(t_1, \ldots, t_n\). Since \(\alpha \in T\) and \(T\) is a ring, we have \(\alpha t_i \in T\) for each \(i\), so we may write
\[ \alpha t_i = \sum_{j=1}^{n} a_{ij} t_j \]
for some \(a_{ij} \in R\). In matrix form, \((\alpha I_n - A)\mathbf{t} = 0\), where \(A = (a_{ij})\) and \(\mathbf{t} = (t_1, \ldots, t_n)^T\). Multiplying on the left by the adjugate matrix gives \(\det(\alpha I_n - A) \cdot t_i = 0\) for each \(i\). Since \(1 \in T\) is an \(R\)-linear combination of the \(t_i\), we obtain \(\det(\alpha I_n - A) = 0\). Expanding the determinant gives a monic polynomial in \(\alpha\) with coefficients in \(R\). ∎
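The determinant argument is effective: from the matrix \(A\) one can read off a monic polynomial satisfied by \(\alpha\). As an illustration, take \(R = \mathbb{Z}\), \(\alpha = \sqrt{2} + \sqrt{3}\) and the generators \(1, \sqrt{2}, \sqrt{3}, \sqrt{6}\) of \(T = \mathbb{Z}[\sqrt{2}, \sqrt{3}]\). The sketch below computes \(\det(xI - A)\) with the Faddeev-LeVerrier recursion, a technique swapped in here for convenience and not taken from the notes; `charpoly` is a hypothetical name.

```python
from fractions import Fraction

def charpoly(A):
    # Faddeev-LeVerrier recursion; returns [c_0, ..., c_n] with
    # det(x*I - A) = c_0 + c_1 x + ... + c_n x^n and c_n = 1.
    n = len(A)
    I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]

    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + c[n - k + 1] * I[i][j] for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c

# Row i holds the coordinates of alpha * t_i in the basis (1, √2, √3, √6).
A = [[0, 1, 1, 0],   # alpha * 1  = √2 + √3
     [2, 0, 0, 1],   # alpha * √2 = 2 + √6
     [3, 0, 0, 1],   # alpha * √3 = 3 + √6
     [0, 3, 2, 0]]   # alpha * √6 = 3√2 + 2√3
print([int(x) for x in charpoly(A)])
```

The resulting coefficients describe \(x^4 - 10x^2 + 1\), a monic integer polynomial with root \(\sqrt{2} + \sqrt{3}\).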
The power of the module-theoretic characterization is that it allows us to prove closure properties of integral elements with ease.
The Ring of Algebraic Integers
is a subring of \(K\).
Now \(\mathbb{Z}[\alpha + \beta]\), \(\mathbb{Z}[\alpha - \beta]\), and \(\mathbb{Z}[\alpha\beta]\) are all subrings of \(\mathbb{Z}[\alpha, \beta]\). Since \(\mathbb{Z}\) is a Noetherian ring, every submodule of a finitely generated \(\mathbb{Z}\)-module is itself finitely generated. Thus \(\alpha \pm \beta\) and \(\alpha\beta\) are each contained in a finitely generated \(\mathbb{Z}\)-module, hence are integral over \(\mathbb{Z}\) by condition (iii) of Theorem 3.4. ∎
More generally, the same argument establishes the following:
Integral Closure and Transitivity
A crucial property of integrality is its transitivity, which ensures that the integral closure is itself integrally closed.
Each \(c_i\) is integral over \(R\) since \(S\) is integral over \(R\), so by Theorem 3.4, each ring \(R[c_0, \ldots, c_k]\) is finitely generated as a module over \(R[c_0, \ldots, c_{k-1}]\). The element \(\alpha\) is integral over \(R[c_0, \ldots, c_{n-1}]\), so \(R[c_0, \ldots, c_{n-1}, \alpha]\) is finitely generated over \(R[c_0, \ldots, c_{n-1}]\). By iterating the module generation (if \(M\) is finitely generated over \(N\) and \(N\) is finitely generated over \(P\), then \(M\) is finitely generated over \(P\)), we conclude that \(R[c_0, \ldots, c_{n-1}, \alpha]\) is finitely generated as an \(R\)-module. Since \(\alpha\) lies in this ring, it is integral over \(R\) by Theorem 3.4(iii). ∎
Number Fields and Their Rings of Integers
the integral closure of \(\mathbb{Z}\) in \(K\).
A fundamental property is that every element of a number field is “almost” an algebraic integer—it becomes one after clearing a single integer denominator.
This shows that \(a_n \alpha\) is a root of a monic polynomial in \(\mathbb{Z}[x]\), so \(a_n \alpha \in \mathcal{O}_K\). ∎
Quadratic Number Fields
The simplest nontrivial number fields are the quadratic fields, and they provide the ideal testing ground for the general theory.
Every quadratic extension of \(\mathbb{Q}\) inside \(\mathbb{C}\) takes this form for a unique squarefree \(d\). The ring of integers of a quadratic field has a surprisingly clean description that depends on the residue of \(d\) modulo 4.
By Theorem 3.2, the element \(\alpha\) is an algebraic integer if and only if both \(2a \in \mathbb{Z}\) and \(a^2 - db^2 \in \mathbb{Z}\).
Write \(a = m/2\) and \(b = n/2\) with \(m, n \in \mathbb{Z}\) (taking \(2a \in \mathbb{Z}\) already forces \(a\) to be a half-integer, and a similar analysis of \(a^2 - db^2 \in \mathbb{Z}\) forces \(b\) to be a half-integer as well). Then the integrality condition \(a^2 - db^2 \in \mathbb{Z}\) becomes \(m^2 - dn^2 \equiv 0 \pmod{4}\).
Case 1: \(d \not\equiv 1 \pmod{4}\). Since \(d\) is squarefree, we have \(d \equiv 2\) or \(3 \pmod{4}\). A case analysis shows \(m^2 - dn^2 \equiv 0 \pmod{4}\) forces both \(m\) and \(n\) to be even. So \(a, b \in \mathbb{Z}\), and \(\mathcal{O}_K = \mathbb{Z}[\sqrt{d}]\).
Case 2: \(d \equiv 1 \pmod{4}\). Then \(m^2 - dn^2 \equiv m^2 - n^2 \equiv 0 \pmod{4}\), which holds if and only if \(m \equiv n \pmod{2}\). The algebraic integers are therefore elements of the form \(\frac{m + n\sqrt{d}}{2}\) with \(m \equiv n \pmod{2}\). This is precisely \(\mathbb{Z}\left[\frac{1+\sqrt{d}}{2}\right]\), since every such element equals \(\frac{m-n}{2} + n \cdot \frac{1+\sqrt{d}}{2}\) with \(\frac{m-n}{2}, n \in \mathbb{Z}\). ∎
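The trace and norm criterion used in this proof is easy to test by machine. The following Python sketch (function names are ours, chosen for illustration) checks whether \(a + b\sqrt{d}\) is an algebraic integer by testing \(2a \in \mathbb{Z}\) and \(a^2 - db^2 \in \mathbb{Z}\), and compares the result with the parity conditions derived in the two cases.

```python
from fractions import Fraction

def is_algebraic_integer(a, b, d):
    """Test a + b*sqrt(d) using the criterion: 2a and a^2 - d*b^2 are integers."""
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a
    norm = a * a - d * b * b
    return trace.denominator == 1 and norm.denominator == 1

def predicted(m, n, d):
    """Parity condition from the proof for (m + n*sqrt(d)) / 2."""
    if d % 4 == 1:
        return m % 2 == n % 2
    return m % 2 == 0 and n % 2 == 0

half = Fraction(1, 2)
print(is_algebraic_integer(half, half, 5))    # True:  5 = 1 mod 4
print(is_algebraic_integer(half, half, 3))    # False: 3 = 3 mod 4
print(is_algebraic_integer(half, half, -3))   # True: -3 = 1 mod 4

# Compare criterion and prediction on a small grid of half-integers.
agreement = all(
    is_algebraic_integer(Fraction(m, 2), Fraction(n, 2), d) == predicted(m, n, d)
    for d in (-7, -5, -3, -2, -1, 2, 3, 5, 6, 7, 13)
    for m in range(-4, 5)
    for n in range(-4, 5)
)
print(agreement)  # True
```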
\(\mathcal{O}_K\) as a Free \(\mathbb{Z}\)-Module
One of the most important structural results about the ring of integers is that, as an additive group, it looks exactly like \(\mathbb{Z}^n\) where \(n = [K : \mathbb{Q}]\). This rigidity is what makes algebraic number theory possible.
The proof requires the trace pairing, which we develop in Chapter 4, but let us sketch the main ideas here. Choose a \(\mathbb{Q}\)-basis \(\{x_1, \ldots, x_n\}\) for \(K\) with each \(x_i \in \mathcal{O}_K\) (this is possible by Proposition 3.11, after clearing denominators). Define the linear map \(\varphi : K \to \mathbb{Q}^n\) by
\[ \varphi(\alpha) = \bigl(\operatorname{Tr}_{K/\mathbb{Q}}(x_1 \alpha), \ldots, \operatorname{Tr}_{K/\mathbb{Q}}(x_n \alpha)\bigr). \]The nondegeneracy of the trace pairing (Theorem 4.5 below) implies that \(\varphi\) is injective. Since the trace of an algebraic integer is an integer, \(\varphi\) maps \(\mathcal{O}_K\) into \(\mathbb{Z}^n\). As \(\mathcal{O}_K\) contains the \(n\) linearly independent elements \(x_1, \ldots, x_n\), its image under \(\varphi\) is a subgroup of \(\mathbb{Z}^n\) of rank \(n\), hence isomorphic to \(\mathbb{Z}^n\).
We close this chapter by recording that the ring of integers in a number field is a Dedekind domain—the class of rings for which the theory of ideal factorization works perfectly.
Chapter 4: Trace, Norm, and Discriminant
Trace and Norm via Embeddings
Let \(K\) be a number field with \([K : \mathbb{Q}] = n\). There are exactly \(n\) field embeddings \(\sigma_1, \ldots, \sigma_n : K \hookrightarrow \mathbb{C}\) that fix \(\mathbb{Q}\) pointwise. These embeddings are the key to defining two fundamental arithmetic invariants of the elements of \(K\).
There is an equivalent linear-algebraic definition. For each \(\alpha \in K\), the multiplication map \(M_\alpha : K \to K\) defined by \(M_\alpha(x) = \alpha x\) is a \(\mathbb{Q}\)-linear transformation. Relative to any \(\mathbb{Q}\)-basis of \(K\), this map has a matrix representation, and
\[ \operatorname{Tr}_{K/\mathbb{Q}}(\alpha) = \operatorname{tr}(M_\alpha), \qquad N_{K/\mathbb{Q}}(\alpha) = \det(M_\alpha). \]The characteristic polynomial of \(M_\alpha\) is \(f_\alpha(x) = \prod_{i=1}^{n}(x - \sigma_i(\alpha))\), from which both definitions agree by Vieta’s formulas.
Equivalently, relative to the basis \(\{1, \sqrt{d}\}\), the matrix of \(M_\alpha\) is \(\begin{pmatrix} a & bd \\ b & a \end{pmatrix}\), which has trace \(2a\) and determinant \(a^2 - db^2\).
Properties of Trace and Norm
The trace and norm satisfy the properties one would hope for.
(i) \(\operatorname{Tr}_{K/\mathbb{Q}}(\alpha + \beta) = \operatorname{Tr}_{K/\mathbb{Q}}(\alpha) + \operatorname{Tr}_{K/\mathbb{Q}}(\beta)\) (additivity),
(ii) \(\operatorname{Tr}_{K/\mathbb{Q}}(r\alpha) = r \cdot \operatorname{Tr}_{K/\mathbb{Q}}(\alpha)\) (homogeneity),
(iii) \(N_{K/\mathbb{Q}}(\alpha\beta) = N_{K/\mathbb{Q}}(\alpha) \cdot N_{K/\mathbb{Q}}(\beta)\) (multiplicativity),
(iv) \(N_{K/\mathbb{Q}}(r\alpha) = r^n \cdot N_{K/\mathbb{Q}}(\alpha)\).
The trace and norm relate to the minimal polynomial of \(\alpha\) through a multiplicity factor.
To verify this equals the characteristic polynomial of \(M_\alpha\), choose a basis \(\{u_1, \ldots, u_m\}\) for \(K\) over \(\mathbb{Q}(\alpha)\) and use the basis \(\{u_j \alpha^{k} : 1 \le j \le m, \, 0 \le k \le \ell-1\}\) for \(K\) over \(\mathbb{Q}\). Relative to this basis, \(M_\alpha\) decomposes into \(m\) blocks, each being the companion matrix of \(p(x)\), giving \(f_\alpha(x) = p(x)^m\). ∎
An important consequence is the transitivity of trace and norm in towers of extensions.
The argument for the norm is identical, with products replacing sums. ∎
Integrality of Trace and Norm
Units of \(\mathcal{O}_K\)
Conversely, suppose \(N_{K/\mathbb{Q}}(\alpha) = \pm 1\). Then \(\alpha \prod_{i=2}^{n} \sigma_i(\alpha) = \pm 1\), so \(\alpha^{-1} = \pm \prod_{i=2}^{n} \sigma_i(\alpha)\). Each \(\sigma_i(\alpha)\) is an algebraic integer, so their product is an algebraic integer. Since \(\alpha^{-1} \in K\) and is an algebraic integer, \(\alpha^{-1} \in \mathcal{O}_K\). ∎
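For a quadratic example, the recipe "inverse equals \(\pm\) the conjugate" can be written out directly. The sketch below works in \(\mathbb{Z}[\sqrt{d}]\), which is the full ring of integers only when \(d \not\equiv 1 \pmod 4\); the pair representation \((a, b)\) for \(a + b\sqrt{d}\) and the function names are assumptions made for illustration.

```python
def norm(a, b, d):
    """Norm of a + b*sqrt(d), namely (a + b*sqrt(d)) * (a - b*sqrt(d))."""
    return a * a - d * b * b

def multiply(x, y, d):
    """Product of two elements of Z[sqrt(d)] given as coefficient pairs."""
    (a, b), (c, e) = x, y
    return (a * c + d * b * e, a * e + b * c)

def inverse_if_unit(a, b, d):
    """Return the inverse of a + b*sqrt(d) when its norm is +1 or -1, else None."""
    n = norm(a, b, d)
    if n not in (1, -1):
        return None
    # alpha * conjugate(alpha) = n, so alpha^{-1} = conjugate(alpha) / n = n * conjugate(alpha).
    return (n * a, -n * b)

inv = inverse_if_unit(1, 1, 2)           # 1 + sqrt(2) has norm -1
print(inv)                               # (-1, 1), i.e. -1 + sqrt(2)
print(multiply((1, 1), inv, 2))          # (1, 0)
print(inverse_if_unit(1, 2, 2))          # None: the norm is -7
```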
The Discriminant
The discriminant is a fundamental invariant that encodes information about the “complexity” of a number field and its ring of integers. It measures how far apart the conjugates of a basis are spread.
Equivalently, \(\operatorname{disc}(\alpha_1, \ldots, \alpha_n) = \det\bigl(\operatorname{Tr}_{K/\mathbb{Q}}(\alpha_j \alpha_k)\bigr)_{1 \le j,k \le n}\).
The equivalence of the two formulas is a key computation:
∎
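The trace form of the discriminant is convenient for exact computation, because it never requires the conjugates themselves. As a rough sketch, for a power basis \(1, \theta, \ldots, \theta^{n-1}\) the traces \(\operatorname{Tr}(\theta^k)\) can be read off as traces of powers of the companion matrix of the minimal polynomial (this uses the linear-algebraic definition of the trace). The input polynomials below are illustrative choices of ours.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def companion(c):
    """Matrix of multiplication by theta on 1, theta, ..., theta^(n-1), where
    theta is a root of x^n + c[n-1] x^(n-1) + ... + c[0]."""
    n = len(c)
    C = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        C[j + 1][j] = 1          # theta * theta^j = theta^(j+1)
    for i in range(n):
        C[i][n - 1] = -c[i]      # theta * theta^(n-1), reduced via the polynomial
    return C

def power_basis_discriminant(c):
    """det( Tr(theta^(j+k)) ) for 0 <= j, k <= n-1."""
    n = len(c)
    C = companion(c)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    traces = []
    for _ in range(2 * n - 1):
        traces.append(sum(P[i][i] for i in range(n)))
        P = mat_mul(P, C)
    gram = [[traces[j + k] for k in range(n)] for j in range(n)]
    return det(gram)

print(power_basis_discriminant([-2, 0]))        # x^2 - 2           -> 8
print(power_basis_discriminant([-8, -2, -1]))   # x^3 - x^2 - 2x - 8 -> -2012
```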
The discriminant detects linear independence:
The Vandermonde Determinant
Its determinant is \(\det V(a_1, \ldots, a_n) = \prod_{1 \le i < j \le n} (a_j - a_i)\).
Using this, if \(K = \mathbb{Q}(\theta)\) with minimal polynomial \(p(x)\) having roots \(\theta_1, \ldots, \theta_n\), then
\[ \operatorname{disc}(1, \theta, \ldots, \theta^{n-1}) = \det V(\theta_1, \ldots, \theta_n)^2 = \prod_{1 \le i < j \le n} (\theta_i - \theta_j)^2. \]
Change of Basis and Integral Bases
∎
Choose a basis \(\{\omega_1, \ldots, \omega_n\} \subset \mathcal{O}_K\) minimizing \(|\operatorname{disc}(\omega_1, \ldots, \omega_n)|\). We claim this is an integral basis. If not, there exists \(\gamma \in \mathcal{O}_K\) with \(\gamma = a_1\omega_1 + \cdots + a_n\omega_n\) where some \(a_j \notin \mathbb{Z}\). Say \(a_1 \notin \mathbb{Z}\), and write \(a_1 = a + r\) with \(a \in \mathbb{Z}\) and \(0 < r < 1\). Replace \(\omega_1\) by \(\omega_1' = \gamma - a\omega_1 = r\omega_1 + a_2\omega_2 + \cdots + a_n\omega_n\). Then \(\{\omega_1', \omega_2, \ldots, \omega_n\}\) is a basis of algebraic integers, and the change of basis has determinant \(r\), so
\[ |\operatorname{disc}(\omega_1', \omega_2, \ldots, \omega_n)| = r^2 |\operatorname{disc}(\omega_1, \ldots, \omega_n)| < |\operatorname{disc}(\omega_1, \ldots, \omega_n)|, \]contradicting minimality. ∎
Uniqueness of Discriminant and the Discriminant of a Number Field
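The discriminant is an isomorphism invariant: two isomorphic number fields have the same discriminant. It therefore “discriminates” between number fields in one direction: fields with different discriminants cannot be isomorphic, although non-isomorphic fields may share the same discriminant.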
Discriminant and Minimal Polynomial
Taking the product over all \(k\):
\[ N_{K/\mathbb{Q}}(p'(\theta)) = \prod_{k=1}^{n} p'(\theta_k) = \prod_{k=1}^{n} \prod_{i \ne k}(\theta_k - \theta_i). \]Now \(\prod_{k} \prod_{i \ne k}(\theta_k - \theta_i) = \prod_{i < j}(\theta_i - \theta_j)(\theta_j - \theta_i) = (-1)^{n(n-1)/2}\prod_{i < j}(\theta_i - \theta_j)^2\). ∎
Discriminant of Quadratic Fields
Case 2: \(d \equiv 1 \pmod{4}\). An integral basis is \(\bigl\{1, \frac{1+\sqrt{d}}{2}\bigr\}\), and
\[ \operatorname{disc}\!\left(1, \tfrac{1+\sqrt{d}}{2}\right) = \det\begin{pmatrix} 1 & \frac{1+\sqrt{d}}{2} \\[4pt] 1 & \frac{1-\sqrt{d}}{2}\end{pmatrix}^2 = (-\sqrt{d})^2 = d. \]∎
Finding Integral Bases in Practice
A useful technique for computing \(\mathcal{O}_K\) is the following containment result.
Stickelberger’s Theorem
A remarkable constraint on the discriminant of a number field is given by Stickelberger’s theorem.
which equals \(\delta\). Each summand is an algebraic integer. Now consider the partition of \(S_n\) into even and odd permutations: \(\delta = A - B\) where \(A\) is the sum over even permutations and \(B\) is the sum over odd permutations. Then \(A + B = \sum_{\pi} |\operatorname{sgn}(\pi)| \prod \sigma_i(\omega_{\pi(i)})\) and
\[ \delta^2 = (A - B)^2 = (A + B)^2 - 4AB. \]Both \(A + B\) and \(AB\) are symmetric functions of the \(\sigma_i(\omega_k)\), hence lie in \(\mathbb{Q}\). Since they are also algebraic integers, they lie in \(\mathbb{Z}\). Therefore \(\operatorname{disc}(K) = (A+B)^2 - 4AB \equiv (A+B)^2 \pmod{4}\), and a perfect square modulo 4 is either 0 or 1. ∎
Sign of the Discriminant
If \(r_2\) is even, then \(\overline{\delta} = \delta\), so \(\delta \in \mathbb{R}\) and \(\delta^2 > 0\). If \(r_2\) is odd, then \(\overline{\delta} = -\delta\), so \(\delta\) is purely imaginary and \(\delta^2 < 0\). In both cases, the sign of \(\operatorname{disc}(K) = \delta^2\) is \((-1)^{r_2}\). ∎
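Both Stickelberger's congruence and the sign rule can be sanity-checked on quadratic fields, where the discriminant is \(d\) for \(d \equiv 1 \pmod 4\) and \(4d\) otherwise, and where \(r_2 = 1\) exactly when \(d < 0\). The sketch below is only a numerical check over a small range of squarefree \(d\), not a proof; the helper names are ours.

```python
def is_squarefree(d):
    d = abs(d)
    return d != 0 and all(d % (q * q) != 0 for q in range(2, int(d ** 0.5) + 1))

def quadratic_discriminant(d):
    """Discriminant of Q(sqrt(d)) for squarefree d != 1."""
    return d if d % 4 == 1 else 4 * d

results = []
for d in range(-30, 31):
    if d in (0, 1) or not is_squarefree(d):
        continue
    D = quadratic_discriminant(d)
    r2 = 1 if d < 0 else 0                       # pairs of complex embeddings
    stickelberger_ok = D % 4 in (0, 1)           # disc = 0 or 1 mod 4
    sign_ok = (D > 0) == ((-1) ** r2 == 1)       # sign of disc is (-1)^r2
    results.append((d, D, stickelberger_ok, sign_ok))

print(results[:3])
print(all(s and t for _, _, s, t in results))    # True
```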
Compositum and Discriminant
When two number fields are “independent” in a precise sense, the discriminant of their compositum factors nicely.
Chapter 5: Cyclotomic Number Fields
Roots of Unity and Cyclotomic Polynomials
The cyclotomic fields \(\mathbb{Q}(\zeta_n)\), obtained by adjoining a primitive \(n\)-th root of unity to the rationals, occupy a central position in algebraic number theory. They provide the richest source of abelian extensions of \(\mathbb{Q}\), and their arithmetic is remarkably explicit. In this chapter we develop the basic theory: the irreducibility of cyclotomic polynomials, the structure of the Galois group, the ring of integers, and the discriminant.
Its degree is \(\varphi(n)\), Euler’s totient function.
The cyclotomic polynomials are characterized by the fundamental factorization of \(x^n - 1\):
(i) \(x^n - 1 = \prod_{d \mid n} \Phi_d(x)\).
(ii) \(\Phi_n(x) \in \mathbb{Z}[x]\) for all \(n \ge 1\).
(iii) \(\Phi_1(0) = -1\) and \(\Phi_n(0) = 1\) for \(n \ge 2\).
(iv) For \(p\) prime: \(\Phi_p(x) = x^{p-1} + x^{p-2} + \cdots + x + 1\). More generally, \(\Phi_{p^k}(x) = \Phi_p(x^{p^{k-1}})\) and \(\Phi_{p^k}(1) = p\).
(ii) By induction on \(n\). We have \(\Phi_1(x) = x - 1 \in \mathbb{Z}[x]\). For \(n > 1\), write \(x^n - 1 = \Phi_n(x) g(x)\) where \(g(x) = \prod_{d \mid n, d \ne n} \Phi_d(x) \in \mathbb{Z}[x]\) by the induction hypothesis. Since \(g\) is monic, long division of \(x^n - 1\) by \(g(x)\) in \(\mathbb{Z}[x]\) yields a quotient \(\Phi_n(x) \in \mathbb{Z}[x]\).
(iii) By induction. \(\Phi_1(0) = -1\). For \(n \ge 2\), evaluating \(x^n - 1 = \Phi_n(x) \Phi_1(x) h(x)\) at \(x = 0\) gives \(-1 = \Phi_n(0)(-1)(1) = -\Phi_n(0)\), so \(\Phi_n(0) = 1\). (Here \(h(x) = \prod_{d \mid n, d \ne 1, d \ne n} \Phi_d(x)\), which satisfies \(h(0) = 1\) by induction.)
(iv) From part (i), \(x^p - 1 = \Phi_p(x) \Phi_1(x) = \Phi_p(x)(x-1)\), so \(\Phi_p(x) = \frac{x^p - 1}{x - 1} = x^{p-1} + \cdots + x + 1\). Similarly, \(x^{p^k} - 1 = \Phi_{p^k}(x) \cdot (x^{p^{k-1}} - 1)\), giving \(\Phi_{p^k}(x) = \frac{x^{p^k} - 1}{x^{p^{k-1}} - 1} = \Phi_p(x^{p^{k-1}})\). Evaluating at \(x = 1\): \(\Phi_{p^k}(1) = \Phi_p(1) = p\). ∎
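The inductive argument in (ii) is effectively an algorithm: divide \(x^n - 1\) by the product of the \(\Phi_d\) for proper divisors \(d\) of \(n\). A minimal Python sketch follows, with polynomials stored as coefficient lists, lowest degree first (a representation chosen here for convenience).

```python
from functools import lru_cache

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def exact_div_monic(num, den):
    """Quotient of num by the monic polynomial den, staying inside Z[x]."""
    num = list(num)
    quotient = [0] * (len(num) - len(den) + 1)
    for i in range(len(quotient) - 1, -1, -1):
        quotient[i] = num[i + len(den) - 1]      # den is monic, so no division
        for j, y in enumerate(den):
            num[i + j] -= quotient[i] * y
    if any(num):
        raise ValueError("division was not exact")
    return quotient

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n(x), lowest degree first."""
    numerator = [-1] + [0] * (n - 1) + [1]       # x^n - 1
    g = [1]
    for d in range(1, n):
        if n % d == 0:
            g = poly_mul(g, cyclotomic(d))
    return tuple(exact_div_monic(numerator, g))

print(cyclotomic(6))        # (1, -1, 1):        x^2 - x + 1
print(cyclotomic(12))       # (1, 0, -1, 0, 1):  x^4 - x^2 + 1
print(sum(cyclotomic(8)))   # Phi_8(1) = 2
print(sum(cyclotomic(9)))   # Phi_9(1) = 3
```

Summing the coefficients evaluates at \(x = 1\), and the first entry is the constant term, so properties (iii) and (iv) can be read off the output.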
Irreducibility of \(\Phi_n(x)\) over \(\mathbb{Q}\)
The irreducibility of the cyclotomic polynomial is one of the cornerstones of algebraic number theory. It tells us that all primitive \(n\)-th roots of unity are conjugate over \(\mathbb{Q}\), and therefore \([\mathbb{Q}(\zeta_n) : \mathbb{Q}] = \varphi(n)\).
The proof proceeds by showing that if \(f(x)\) is the minimal polynomial of any primitive \(n\)-th root of unity over \(\mathbb{Q}\), then every other primitive \(n\)-th root of unity is also a root of \(f\), forcing \(f = \Phi_n\). The key step is a \(p\)-th power argument.
We will show that if \(\theta\) is any root of \(f\) and \(p\) is any prime with \(\gcd(p, n) = 1\), then \(\theta^p\) is also a root of \(f\). Since every \(k\) with \(\gcd(k, n) = 1\) can be written as a product of primes coprime to \(n\), iterating this step shows that \(\zeta^k\) is a root of \(f\) for every such \(k\), giving \(\Phi_n \mid f\) and hence \(f = \Phi_n\).
Suppose, for contradiction, that \(f(\theta^p) \ne 0\) for some root \(\theta\) of \(f\) and some prime \(p\) with \(p \nmid n\). Then \(\theta^p\) is a root of \(x^n - 1\) but not of \(f\), so \(g(\theta^p) = 0\). Thus \(\theta\) is a root of \(h(x) = g(x^p) \in \mathbb{Z}[x]\). Since \(f\) is the minimal polynomial of \(\theta\), we have \(f \mid h\) in \(\mathbb{Q}[x]\), and since both are in \(\mathbb{Z}[x]\) with \(f\) monic, \(h = fk\) for some \(k \in \mathbb{Z}[x]\).
Reduce modulo \(p\): \(\overline{h}(x) = \overline{g}(x^p) = \overline{g}(x)^p\) by Lemma 5.4. So \(\overline{f}\,\overline{k} = \overline{g}^{\,p}\). Let \(s(x)\) be an irreducible factor of \(\overline{f}\) in \(\mathbb{F}_p[x]\). Then \(s \mid \overline{g}^{\,p}\), hence \(s \mid \overline{g}\). Since \(x^n - 1 = f(x)g(x)\), reducing modulo \(p\) gives \(x^n - 1 = \overline{f}\,\overline{g}\). Since \(s \mid \overline{f}\) and \(s \mid \overline{g}\), we have \(s^2 \mid x^n - 1\) in \(\mathbb{F}_p[x]\).
But the derivative of \(x^n - 1\) is \(nx^{n-1}\), and since \(p \nmid n\), the constant \(n\) is invertible in \(\mathbb{F}_p\). Therefore \(\gcd(x^n - 1, nx^{n-1}) = 1\) in \(\mathbb{F}_p[x]\), which means \(x^n - 1\) has no repeated factors—a contradiction. ∎
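The final step, that \(x^n - 1\) is squarefree modulo \(p\) when \(p \nmid n\), can be checked computationally by running the Euclidean algorithm in \(\mathbb{F}_p[x]\). The sketch below is a plain implementation written for this illustration (coefficient lists, lowest degree first); it relies on `pow(x, -1, p)` for modular inverses, which needs Python 3.8 or later.

```python
def trim(poly):
    """Drop trailing zero coefficients."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def poly_rem(a, b, p):
    """Remainder of a divided by b in F_p[x]; b must be nonzero mod p."""
    a = trim(c % p for c in a)
    b = trim(c % p for c in b)
    inv_lead = pow(b[-1], -1, p)
    while len(a) >= len(b):
        factor = a[-1] * inv_lead % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * c) % p
        a = trim(a)
    return a

def poly_gcd(a, b, p):
    """Monic gcd in F_p[x]."""
    a = trim(c % p for c in a)
    b = trim(c % p for c in b)
    while b:
        a, b = b, poly_rem(a, b, p)
    inv_lead = pow(a[-1], -1, p)
    return [c * inv_lead % p for c in a]

def gcd_with_derivative(n, p):
    f = [-1] + [0] * (n - 1) + [1]      # x^n - 1
    df = [0] * (n - 1) + [n]            # n * x^(n-1)
    return poly_gcd(f, df, p)

print(gcd_with_derivative(6, 5))        # [1]: coprime, so no repeated factor
print(len(gcd_with_derivative(6, 3)))   # 7: the derivative vanishes mod 3
```

When \(p \mid n\) the derivative is identically zero modulo \(p\), the gcd is \(x^n - 1\) itself, and the argument breaks down, which is why the hypothesis \(p \nmid n\) is needed.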
The Galois Group of \(\mathbb{Q}(\zeta_n)/\mathbb{Q}\)
Since \(\Phi_n(x)\) is irreducible over \(\mathbb{Q}\), we have \([\mathbb{Q}(\zeta_n) : \mathbb{Q}] = \varphi(n)\). Moreover, all roots of \(\Phi_n\) lie in \(\mathbb{Q}(\zeta_n)\), so the extension is Galois.
given by sending \(\sigma \in \operatorname{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q})\) to the unique \(k \in (\mathbb{Z}/n\mathbb{Z})^\times\) such that \(\sigma(\zeta_n) = \zeta_n^k\).
Any automorphism \(\sigma \in \operatorname{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q})\) must send \(\zeta_n\) to another root of \(\Phi_n\), which is \(\zeta_n^k\) for some \(k\) with \(\gcd(k, n) = 1\); write \(\sigma_k\) for the automorphism with \(\sigma_k(\zeta_n) = \zeta_n^k\). Define \(\psi : \operatorname{Gal}(\mathbb{Q}(\zeta_n)/\mathbb{Q}) \to (\mathbb{Z}/n\mathbb{Z})^\times\) by \(\psi(\sigma_k) = k\), with \(k\) taken modulo \(n\). This is a homomorphism:
\[ \sigma_k \circ \sigma_\ell(\zeta_n) = \sigma_k(\zeta_n^\ell) = (\zeta_n^\ell)^k = \zeta_n^{k\ell} = \sigma_{k\ell}(\zeta_n), \]so \(\psi(\sigma_k \circ \sigma_\ell) = k\ell = \psi(\sigma_k)\psi(\sigma_\ell)\). Since an automorphism of \(\mathbb{Q}(\zeta_n)\) is determined by its action on \(\zeta_n\), the map \(\psi\) is injective. Both groups have order \(\varphi(n)\), so \(\psi\) is an isomorphism. ∎
Ring of Integers of Cyclotomic Fields
One of the most pleasing facts in algebraic number theory is that the ring of integers of \(\mathbb{Q}(\zeta_n)\) is exactly \(\mathbb{Z}[\zeta_n]\)—the “obvious guess” turns out to be correct. We prove this in two stages: first for prime powers, then for general \(n\).
The Prime Power Case
The proof exploits the special arithmetic of the element \(1 - \zeta_{p^r}\), which is a “uniformizer” in the sense that \(p\) is (up to a unit) a high power of \(1 - \zeta_{p^r}\).
and the fact that each factor \(1 - \zeta^j\) is an associate of \(1 - \zeta\) (since \(\frac{1 - \zeta^j}{1 - \zeta}\) is a unit for \(\gcd(j,p) = 1\)), we obtain \(p = (1 - \zeta)^s \lambda\) for some unit \(\lambda \in \mathcal{O}_K^\times\).
Now the set \(\{1, (1-\zeta), (1-\zeta)^2, \ldots, (1-\zeta)^{s-1}\}\) is a \(\mathbb{Q}\)-basis for \(K\), and \(\operatorname{disc}(1 - \zeta) = \operatorname{disc}(\zeta)\), which is a power of \(p\) by Theorem 5.9 below.
By Theorem 4.18, any \(\alpha \in \mathcal{O}_K\) can be written as
\[ \alpha = \frac{\ell_1 + \ell_2(1-\zeta) + \cdots + \ell_s(1-\zeta)^{s-1}}{d} \]where \(d = \operatorname{disc}(\zeta)\) is a power of \(p\) and \(\ell_1, \ldots, \ell_s \in \mathbb{Z}\).
Suppose \(\mathcal{O}_K \ne \mathbb{Z}[\zeta]\). Then there exists \(\alpha \in \mathcal{O}_K \setminus \mathbb{Z}[\zeta]\) of the above form where not all \(\ell_i\) are divisible by \(p\). Let \(i\) be the smallest index with \(p \nmid \ell_i\). Then
\[ \gamma = \frac{\ell_i(1-\zeta)^{i-1} + \cdots + \ell_s(1-\zeta)^{s-1}}{p} \in \mathcal{O}_K. \]Since \(p = (1-\zeta)^s \lambda\), multiplying by \((1-\zeta)^{s-i}\lambda\) and simplifying shows that \(\theta = \frac{\ell_i}{1-\zeta} \in \mathcal{O}_K\). Taking norms: \(N(1-\zeta) \cdot N(\theta) = N(\ell_i)\), giving \(p \cdot N(\theta) = \ell_i^s\). Since \(N(\theta) \in \mathbb{Z}\), this forces \(p \mid \ell_i^s\), hence \(p \mid \ell_i\), contradicting our choice. ∎
The General Case
For the inductive step, write \(n = p_1^{a_1} \cdots p_k^{a_k}\) and set \(m = p_1^{a_1} \cdots p_{k-1}^{a_{k-1}}\) and \(q = p_k^{a_k}\). By induction, \(\mathcal{O}_{\mathbb{Q}(\zeta_m)} = \mathbb{Z}[\zeta_m]\) and \(\mathcal{O}_{\mathbb{Q}(\zeta_q)} = \mathbb{Z}[\zeta_q]\).
The compositum of \(\mathbb{Q}(\zeta_m)\) and \(\mathbb{Q}(\zeta_q)\) is \(\mathbb{Q}(\zeta_n)\), since by the Euclidean algorithm there exist integers \(g, h\) with \(\zeta_n = \zeta_m^g \zeta_q^h\). Moreover, \([\mathbb{Q}(\zeta_n) : \mathbb{Q}] = \varphi(n) = \varphi(m)\varphi(q)\), so the extensions have maximal compositum degree. Since \(\operatorname{disc}(\mathbb{Q}(\zeta_m))\) is supported only on primes dividing \(m\) and \(\operatorname{disc}(\mathbb{Q}(\zeta_q))\) is a power of \(p_k\), we have \(\gcd(\operatorname{disc}(\mathbb{Q}(\zeta_m)), \operatorname{disc}(\mathbb{Q}(\zeta_q))) = 1\).
By Theorem 4.21,
\[ \mathcal{O}_{\mathbb{Q}(\zeta_n)} = \mathcal{O}_{\mathbb{Q}(\zeta_m)} \cdot \mathcal{O}_{\mathbb{Q}(\zeta_q)} = \mathbb{Z}[\zeta_m] \cdot \mathbb{Z}[\zeta_q] = \mathbb{Z}[\zeta_n]. \]∎
Constructibility of Regular Polygons
The Galois-theoretic structure of cyclotomic fields yields a complete answer to the ancient Greek problem of constructing regular polygons with straightedge and compass.
where \(k \ge 0\) and \(p_1, \ldots, p_\ell\) are distinct Fermat primes (primes of the form \(2^{2^m} + 1\)).
Now \(\varphi(n)\) is a power of 2 if and only if \(n = 2^k p_1 \cdots p_\ell\) where each \(p_i\) is a distinct Fermat prime. Indeed, for an odd prime \(p\), \(\varphi(p^r) = p^{r-1}(p-1)\) is a power of 2 only if \(r = 1\) and \(p - 1\) is a power of 2, i.e., \(p\) is a Fermat prime. ∎
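The criterion "\(\varphi(n)\) is a power of 2" is straightforward to test. A small Python sketch, using trial division for the totient (adequate only for small \(n\); the function names are ours):

```python
def euler_phi(n):
    """Euler's totient via trial-division factorization."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def is_power_of_two(k):
    return k > 0 and k & (k - 1) == 0

def is_constructible(n):
    """True when the regular n-gon passes the test phi(n) = 2^j."""
    return is_power_of_two(euler_phi(n))

constructible = [n for n in range(3, 21) if is_constructible(n)]
print(constructible)  # [3, 4, 5, 6, 8, 10, 12, 15, 16, 17, 20]
```

The gaps at 7, 9, 11, 13, 14, 18, and 19 correspond to a non-Fermat odd prime factor or a repeated odd prime factor.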
Discriminant of Cyclotomic Fields
More generally, if \(n = p_1^{k_1} \cdots p_\ell^{k_\ell}\), then
\[ \operatorname{disc}(\mathbb{Q}(\zeta_n)) = (-1)^{\varphi(n)/2} \prod_{i=1}^{\ell} p_i^{b_i} \]where \(b_i = \varphi(n)\bigl(k_i - \frac{1}{p_i - 1}\bigr)\).
Taking norms over \(\mathbb{Q}(\zeta_n)/\mathbb{Q}\):
\[ n^{\varphi(n)} \cdot N(\zeta_n^{n-1}) = N(\Phi_n'(\zeta_n)) \cdot N(g(\zeta_n)). \]Since \(\zeta_n\) is a unit in \(\mathcal{O}_{\mathbb{Q}(\zeta_n)}\), \(N(\zeta_n^{n-1}) = \pm 1\). By Theorem 4.16, \(N(\Phi_n'(\zeta_n)) = (-1)^{\varphi(n)(\varphi(n)-1)/2} \operatorname{disc}(\zeta_n) = \pm \operatorname{disc}(\mathbb{Q}(\zeta_n))\). Since \(g(\zeta_n) \in \mathcal{O}_{\mathbb{Q}(\zeta_n)}\), \(N(g(\zeta_n)) \in \mathbb{Z}\). Therefore \(\operatorname{disc}(\mathbb{Q}(\zeta_n))\) divides \(n^{\varphi(n)}\).
For \(p\) an odd prime, \(\Phi_p(x) = \frac{x^p - 1}{x - 1}\) and \(g(x) = x - 1\). Differentiating \(x^p - 1 = \Phi_p(x)(x-1)\) and substituting \(x = \zeta_p\):
\[ p\zeta_p^{p-1} = \Phi_p'(\zeta_p)(\zeta_p - 1). \]Taking norms: \(p^{p-1} = N(\Phi_p'(\zeta_p)) \cdot N(\zeta_p - 1)\). Now \(N(\zeta_p - 1) = \prod_{j=1}^{p-1}(\zeta_p^j - 1) = (-1)^{p-1}\Phi_p(1) = p\) and \(N(\zeta_p^{p-1}) = 1\) since \(p-1\) is even. Also, \(N(\Phi_p'(\zeta_p)) = (-1)^{(p-1)(p-2)/2} \operatorname{disc}(\zeta_p)\). Therefore
\[ p^{p-1} = (-1)^{(p-1)(p-2)/2} \operatorname{disc}(\zeta_p) \cdot p, \]giving \(\operatorname{disc}(\mathbb{Q}(\zeta_p)) = (-1)^{(p-1)/2} p^{p-2}\). (Here we use that \((-1)^{(p-1)(p-2)/2} = (-1)^{(p-1)/2}\) since \(p\) is odd.) ∎
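The general formula \(\operatorname{disc}(\mathbb{Q}(\zeta_n)) = (-1)^{\varphi(n)/2} \prod p_i^{b_i}\) with \(b_i = \varphi(n)\bigl(k_i - \frac{1}{p_i - 1}\bigr)\) can be evaluated directly and compared with the prime case just proved. The sketch below assumes \(n \ge 3\) (so that \(\varphi(n)\) is even) and uses trial division; the names are ours.

```python
def factorize(n):
    """Prime factorization as a dict {prime: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def cyclotomic_discriminant(n):
    """Evaluate the displayed formula for disc(Q(zeta_n)), assuming n >= 3."""
    factors = factorize(n)
    phi = n
    for p in factors:
        phi = phi // p * (p - 1)
    disc = (-1) ** (phi // 2)
    for p, k in factors.items():
        # b = phi * (k - 1/(p-1)); (p-1) divides phi, so this is an integer.
        disc *= p ** (phi * k - phi // (p - 1))
    return disc

for n in (3, 4, 5, 7, 8, 12):
    print(n, cyclotomic_discriminant(n))
# 3 -3, 4 -4, 5 125, 7 -16807, 8 256, 12 144
```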
In earlier chapters, we developed techniques for computing discriminants via the trace pairing. But for many number fields, computing the discriminant directly from the definition — as a product of differences of conjugates squared — is impractical. In this chapter, we introduce the resultant, a powerful algebraic tool that reduces discriminant computation to a determinant calculation. We then turn to the theory of composita, which governs what happens when two number fields are “combined,” and we use it to prove that the ring of integers of any cyclotomic field has the expected form. The chapter closes with a striking example, due to Dedekind, of a number field whose ring of integers admits no power basis.
The Resultant
Definition and the Sylvester Matrix
The resultant of two polynomials encodes, in a single determinant, whether the polynomials share a common root. It will serve as the key computational bridge between the minimal polynomial of an algebraic number and its discriminant.
where the first \(m\) rows are shifts of the coefficients of \(f\), and the last \(n\) rows are shifts of the coefficients of \(g\).
The resultant \(R(f,g)\) is homogeneous of degree \(m\) in the coefficients \(a_i\) and of degree \(n\) in the coefficients \(b_j\). This observation will be crucial when we derive the product formula.
Resultant and Common Roots
The fundamental property of the resultant is that it detects shared roots.
(i) \(f\) and \(g\) have a common root in \(\mathbb{C}\).
(ii) There exist nonzero polynomials \(h, k \in \mathbb{C}[x]\) with \(\deg h \le m-1\) and \(\deg k \le n-1\) such that \(h(x)f(x) = k(x)g(x)\).
(iii) \(R(f,g) = 0\).
(ii) \(\Rightarrow\) (i): If \(h(x)f(x) = k(x)g(x)\) with \(\deg h \le m-1\) and \(\deg k \le n-1\), then comparing degrees, the roots of \(k\) cannot account for all roots of \(f\), so some root of \(f\) must also be a root of \(g\).
(ii) \(\Leftrightarrow\) (iii): Write \(h(x) = c_{m-1}x^{m-1} + \cdots + c_0\) and \(k(x) = d_{n-1}x^{n-1} + \cdots + d_0\). Comparing coefficients of \(x^{n+m-1}, x^{n+m-2}, \ldots, x^0\) in the equation \(hf = kg\) gives a system of \(n+m\) linear equations in the \(n+m\) unknowns \(c_0, \ldots, c_{m-1}, -d_0, \ldots, -d_{n-1}\). This system has a nontrivial solution if and only if the coefficient matrix has zero determinant, and this determinant is precisely the transpose of the Sylvester matrix. Since \(\det(A) = \det(A^T)\), condition (ii) holds if and only if \(R(f,g) = 0\). ∎
The Product Formula for Resultants
The most useful computational form of the resultant expresses it directly in terms of the roots of the two polynomials.
Equivalently,
\[ R(f,g) = a_n^m \prod_{i=1}^{n} g(x_i) = (-1)^{mn}\, b_m^n \prod_{j=1}^{m} f(y_j). \]We regard the \(x_i\) and \(y_j\) as indeterminates and work in the polynomial ring \(\mathbb{C}[x_1, \ldots, x_n, y_1, \ldots, y_m]\). By Proposition 6.2, whenever \(x_i = y_j\) we have \(R(f,g) = 0\), so \((x_i - y_j)\) divides \(R(f,g)\). Since these are distinct irreducible elements in a UFD, the full product \(\prod_{i,j}(x_i - y_j)\) divides \(R(f,g)\).
Now \(S\) is homogeneous of degree \(m\) in the \(a_i\) (via the symmetric functions of the \(x_i\)) and of degree \(n\) in the \(b_j\) (via the symmetric functions of the \(y_j\)), matching the degrees of \(R(f,g)\). Since \(S\) divides \(R\) and both have the same degree, we have \(R = cS\) for some constant \(c \in \mathbb{C}\).
Comparing leading terms: the coefficient of \(a_n^m b_m^n\) in \(R(f,g)\) is \(1\) (from the Sylvester determinant), and the same coefficient in \(S\) is \(1\). Therefore \(c = 1\), and \(R(f,g) = S\).
The alternative forms follow because \(g(x) = b_m \prod_{j=1}^m (x - y_j)\), so \(a_n^m \prod_{i=1}^n g(x_i) = S\), and similarly for \(f(y_j)\). ∎
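The product formula can be compared against the Sylvester determinant on polynomials whose roots are known. In the sketch below, coefficient lists are highest degree first, the Sylvester matrix has \(m\) shifted rows of \(f\) followed by \(n\) shifted rows of \(g\), and the test polynomials are arbitrary choices of ours with integer roots.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def sylvester(f, g):
    n, m = len(f) - 1, len(g) - 1
    size = n + m
    rows = [[0] * i + list(f) + [0] * (size - n - 1 - i) for i in range(m)]
    rows += [[0] * i + list(g) + [0] * (size - m - 1 - i) for i in range(n)]
    return rows

def resultant(f, g):
    return det(sylvester(f, g))

def evaluate(poly, x):
    value = 0
    for c in poly:          # Horner's rule
        value = value * x + c
    return value

f = [2, 2, -4]              # 2(x - 1)(x + 2), roots 1 and -2
g = [3, -6]                 # 3(x - 2), root 2
lhs = resultant(f, g)
via_f_roots = 2 ** 1 * evaluate(g, 1) * evaluate(g, -2)   # a_n^m * prod g(x_i)
via_g_roots = (-1) ** 2 * 3 ** 2 * evaluate(f, 2)         # (-1)^(mn) b_m^n * prod f(y_j)
print(lhs, via_f_roots, via_g_roots)                      # 72 72 72

shared = resultant(f, [1, 0, -7, 6])    # x^3 - 7x + 6 = (x-1)(x-2)(x+3) shares the root 1
print(shared)                           # 0
```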
Discriminant as a Resultant
The product formula gives us an elegant connection between the discriminant and the resultant, which is the primary reason we developed this machinery.
Now \(f(x) = \prod_{k=1}^n (x - \alpha_k)\), so
\[ f'(\alpha_i) = \prod_{\substack{j=1 \\ j \ne i}}^{n} (\alpha_i - \alpha_j). \]Therefore
\[ R(f, f') = \prod_{i=1}^{n} \prod_{\substack{j=1 \\ j \ne i}}^{n} (\alpha_i - \alpha_j) = (-1)^{n(n-1)/2} \prod_{1 \le i < j \le n} (\alpha_i - \alpha_j)^2 = (-1)^{n(n-1)/2}\, \operatorname{disc}(\alpha). \]The sign arises because the ordered product over all \(i \ne j\) pairs differs from the product over \(i < j\) by a factor of \((-1)^{n(n-1)/2}\). ∎
This formula is extremely practical: to compute \(\operatorname{disc}(\theta)\) for a root \(\theta\) of an irreducible polynomial \(f\), one simply evaluates a \((2n-1) \times (2n-1)\) determinant rather than computing all conjugates explicitly.
Since \(n = 3\), we get \(\operatorname{disc}(\theta) = (-1)^{3 \cdot 2/2} \cdot 2012 = -2012 = -4 \cdot 503\).
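The same computation can be reproduced by machine. The sketch below takes \(f(x) = x^3 - x^2 - 2x - 8\) as the cubic; this is an assumption on our part, since the polynomial is not restated here, but it does reproduce the value \(R(f, f') = 2012\). Coefficients are listed highest degree first and the determinant routine is a simple Laplace expansion.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def sylvester(f, g):
    n, m = len(f) - 1, len(g) - 1
    size = n + m
    rows = [[0] * i + list(f) + [0] * (size - n - 1 - i) for i in range(m)]
    rows += [[0] * i + list(g) + [0] * (size - m - 1 - i) for i in range(n)]
    return rows

def derivative(f):
    n = len(f) - 1
    return [c * (n - i) for i, c in enumerate(f[:-1])]

def discriminant(f):
    """disc = (-1)^(n(n-1)/2) * R(f, f') for a monic polynomial f."""
    n = len(f) - 1
    r = det(sylvester(f, derivative(f)))
    return (-1) ** (n * (n - 1) // 2) * r

f = [1, -1, -2, -8]                       # assumed cubic: x^3 - x^2 - 2x - 8
print(det(sylvester(f, derivative(f))))   # 2012, a 5 x 5 determinant
print(discriminant(f))                    # -2012
```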
Composita of Number Fields
When studying multiple number fields simultaneously, it is natural to ask what happens when we “combine” them. The compositum provides the answer.
The compositum \(KL\) is generated over \(\mathbb{Q}\) by the union \(K \cup L\); equivalently, it consists of all finite \(\mathbb{Q}\)-linear combinations of products \(\alpha\beta\) with \(\alpha \in K\) and \(\beta \in L\).
(i) \([KL : \mathbb{Q}] \le mn\).
(ii) \([KL : \mathbb{Q}] = mn\) if and only if for every pair of embeddings \(\sigma : K \hookrightarrow \mathbb{C}\) and \(\tau : L \hookrightarrow \mathbb{C}\), there exists a unique embedding \(\varepsilon : KL \hookrightarrow \mathbb{C}\) with \(\varepsilon|_K = \sigma\) and \(\varepsilon|_L = \tau\).
(ii) If \([KL : \mathbb{Q}] = mn\), then \(KL\) has exactly \(mn\) embeddings into \(\mathbb{C}\). Each embedding \(\varepsilon : KL \hookrightarrow \mathbb{C}\) is determined by \(\varepsilon|_K\) and \(\varepsilon|_L\) (since products of elements of \(K\) and \(L\) generate \(KL\)). There are \(m\) choices for \(\varepsilon|_K\) and \(n\) choices for \(\varepsilon|_L\), giving \(mn\) pairs, and since there are exactly \(mn\) embeddings, each pair \((\sigma, \tau)\) corresponds to a unique embedding. The converse is similar. ∎
Ring of Integers of Composita
The central result in the theory of composita connects the ring of integers of \(KL\) to the rings of integers of \(K\) and \(L\), using the discriminant as the controlling quantity.
In particular, if \(\gcd(\operatorname{disc}(K), \operatorname{disc}(L)) = 1\), then \(\mathcal{O}_{KL} = \mathcal{O}_K \mathcal{O}_L\).
where \(a_{ij}, r \in \mathbb{Z}\) with \(\gcd(a_{11}, \ldots, a_{mn}, r) = 1\). If \(\gamma \in \mathcal{O}_{KL}\), we must show that \(r \mid d\).
By symmetry, it suffices to show \(r \mid \operatorname{disc}(K)\). Since \([KL : \mathbb{Q}] = mn\), for each embedding \(\sigma_i : K \hookrightarrow \mathbb{C}\), there exists an extension \(\sigma_i' : KL \hookrightarrow \mathbb{C}\) that fixes \(L\) pointwise (by Lemma 6.6). Set \(x_i = \sum_{j=1}^n \frac{a_{ij}}{r} \beta_j \in L\) for each \(i\), so that
\[ \sigma_i'(\gamma) = \sum_{k=1}^m \sigma_i(\alpha_k) \, x_k. \]By Cramer’s rule, \(x_k = \gamma_k / \delta\), where \(\delta = \det(\sigma_i(\alpha_j))\) satisfies \(\delta^2 = \operatorname{disc}(K)\), and each \(\gamma_k\) is an algebraic integer. Then \(\operatorname{disc}(K) \cdot x_k = \delta \gamma_k\) is an algebraic integer, and since \(x_k \in L\) and \(\operatorname{disc}(K) \in \mathbb{Z}\), it lies in \(L\); hence \(\operatorname{disc}(K) \cdot x_k \in \mathcal{O}_L\). In particular,
\[ \frac{\operatorname{disc}(K) \cdot a_{ij}}{r} \in \mathbb{Z} \]for all \(i, j\). Since \(\gcd(a_{11}, \ldots, a_{mn}, r) = 1\), it follows that \(r \mid \operatorname{disc}(K)\). By the same argument applied to \(L\), we get \(r \mid \operatorname{disc}(L)\), so \(r \mid d\). ∎
Application: The Ring of Integers of \(\mathbb{Q}(\zeta_n)\)
The theory of composita gives a clean inductive proof that cyclotomic rings of integers have the expected form.
For the inductive step, write \(n = p_1^{e_1} \cdots p_k^{e_k}\), and set \(m = p_1^{e_1} \cdots p_{k-1}^{e_{k-1}}\). Let \(K = \mathbb{Q}(\zeta_m)\) and \(L = \mathbb{Q}(\zeta_{p_k^{e_k}})\). By induction, \(\mathcal{O}_K = \mathbb{Z}[\zeta_m]\) and \(\mathcal{O}_L = \mathbb{Z}[\zeta_{p_k^{e_k}}]\).
First, \(KL = \mathbb{Q}(\zeta_n)\): since \(\gcd(m, p_k^{e_k}) = 1\), there exist integers \(x, y\) with \(xm + yp_k^{e_k} = 1\), so \(\zeta_n = \zeta_m^y \cdot \zeta_{p_k^{e_k}}^x \in KL\). Moreover,
\[ \varphi(n) = \varphi(m)\,\varphi(p_k^{e_k}) = [K : \mathbb{Q}] \cdot [L : \mathbb{Q}] \ge [KL : \mathbb{Q}] \ge [\mathbb{Q}(\zeta_n) : \mathbb{Q}] = \varphi(n), \]so \([KL : \mathbb{Q}] = [K : \mathbb{Q}][L : \mathbb{Q}]\). Since \(\operatorname{disc}(K)\) is divisible only by the primes \(p_1, \ldots, p_{k-1}\) and \(\operatorname{disc}(L)\) is, up to sign, a power of \(p_k\), we have \(\gcd(\operatorname{disc}(K), \operatorname{disc}(L)) = 1\). By Theorem 6.7,
\[ \mathbb{Z}[\zeta_n] \subseteq \mathcal{O}_{\mathbb{Q}(\zeta_n)} \subseteq \mathcal{O}_K \mathcal{O}_L = \mathbb{Z}[\zeta_m] \cdot \mathbb{Z}[\zeta_{p_k^{e_k}}] = \mathbb{Z}[\zeta_n], \]so \(\mathcal{O}_{\mathbb{Q}(\zeta_n)} = \mathbb{Z}[\zeta_n]\). ∎
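The Bezout step in this proof is constructive. With \(q = p_k^{e_k}\) and \(n = mq\), we have \(\zeta_m = \zeta_n^{q}\) and \(\zeta_q = \zeta_n^{m}\), so \(\zeta_m^y \zeta_q^x = \zeta_n^{qy + mx}\), and it suffices to find \(x, y\) with \(xm + yq = 1\). A short Python sketch using the extended Euclidean algorithm (the function names are ours):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def bezout_exponents(m, q):
    """Integers (x, y) with x*m + y*q == 1, for coprime m and q."""
    g, x, y = extended_gcd(m, q)
    if g != 1:
        raise ValueError("m and q must be coprime")
    return x, y

m, q = 9, 5                    # n = 45
x, y = bezout_exponents(m, q)
print(x, y)                    # -1 2
# Exponent of zeta_n in zeta_m^y * zeta_q^x, reduced modulo n:
print((q * y + m * x) % (m * q))   # 1
```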
Dedekind’s Example: A Non-Monogenic Number Field
Not every number field has a power basis. Dedekind constructed the first example of this phenomenon.
Now suppose \(\lambda \in \mathcal{O}_K\) with \(\{1, \lambda, \lambda^2\}\) an integral basis. Write \(\lambda = a + b\theta + c\omega\) for \(a, b, c \in \mathbb{Z}\). A computation shows that \(\lambda^2 = A_1 + A_2 \theta + A_3 \omega\) where
\[ A_1 = a^2 - 2c^2 - 8bc, \quad A_2 = -2c^2 + 2ab + 2bc - b^2, \quad A_3 = 2b^2 + 2ac + c^2. \]By the change-of-basis formula,
\[ \operatorname{disc}(\lambda) = -503 \cdot (bA_3 - cA_2)^2 = -503 \cdot (2b^3 - bc^2 + b^2c + 2c^3)^2. \]But \(2b^3 - bc^2 + b^2c + 2c^3 \equiv bc(b - c) \pmod{2}\), which is always even. Therefore \(4 \mid \operatorname{disc}(\lambda)\), so \(\operatorname{disc}(\lambda) \ne -503 = \operatorname{disc}(K)\), and \(\{1, \lambda, \lambda^2\}\) can never be an integral basis. ∎
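The two algebraic facts used at the end of the proof (the expansion of \(bA_3 - cA_2\) and its parity) are easy to spot-check by brute force. This is only a sanity check over a small box of integers, not a proof; the function names are illustrative.

```python
from itertools import product

def lambda_squared_coeffs(a, b, c):
    """Coefficients (A1, A2, A3) of lambda^2 in the basis {1, theta, omega}, as stated above."""
    A1 = a * a - 2 * c * c - 8 * b * c
    A2 = -2 * c * c + 2 * a * b + 2 * b * c - b * b
    A3 = 2 * b * b + 2 * a * c + c * c
    return A1, A2, A3

def index_form(a, b, c):
    """The quantity b*A3 - c*A2 whose square multiplies -503 in disc(lambda)."""
    _, A2, A3 = lambda_squared_coeffs(a, b, c)
    return b * A3 - c * A2

def check_index_form(bound=6):
    """Verify the closed form and the evenness claim for |a|, |b|, |c| <= bound."""
    for a, b, c in product(range(-bound, bound + 1), repeat=3):
        value = index_form(a, b, c)
        if value != 2 * b**3 - b * c**2 + b**2 * c + 2 * c**3:
            return False
        if value % 2 != 0:
            return False
    return True

print(check_index_form())
```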
Chapter 7: Dedekind Domains and Ideal Factorization
The passage from elements to ideals is the central paradigm of algebraic number theory. In the integers \(\mathbb{Z}\), every element factors uniquely into primes. But in a general ring of integers \(\mathcal{O}_K\), unique factorization of elements can fail. The miracle, discovered by Dedekind, is that unique factorization is always recovered at the level of ideals. In this chapter, we develop the abstract framework of Dedekind domains, prove the unique factorization of ideals, and show how to compute prime factorizations in practice via the Kummer–Dedekind theorem.
Failure of Unique Factorization
To motivate the theory, we begin with the classical example showing that unique factorization of elements can fail.
Each of \(2, 3, 1+\sqrt{-5}, 1-\sqrt{-5}\) is irreducible in \(\mathbb{Z}[\sqrt{-5}]\) (as can be verified using the norm \(N(a + b\sqrt{-5}) = a^2 + 5b^2\)), and yet they give two genuinely different factorizations. This means \(\mathbb{Z}[\sqrt{-5}]\) is not a UFD.
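The norm argument can be made concrete with a short search. The elements \(2, 3, 1 \pm \sqrt{-5}\) have norms \(4, 9, 6, 6\), so a proper factor of any of them would need norm \(2\) or \(3\). The sketch below (with illustrative function names) confirms that no element of \(\mathbb{Z}[\sqrt{-5}]\) has such a norm.

```python
from math import isqrt

def norm(a, b):
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b

def elements_of_norm(n):
    """All pairs (a, b) with a^2 + 5 b^2 == n; |a| and |b| cannot exceed isqrt(n)."""
    bound = isqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]

print(elements_of_norm(2), elements_of_norm(3))
print(elements_of_norm(6))
```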
The failure is not incidental — it reflects a deep structural feature. The key insight, due to Kummer and Dedekind, is that while elements may not factor uniquely, ideals always do, provided the ring satisfies certain natural conditions. The abstract encapsulation of these conditions is the notion of a Dedekind domain.
Noetherian Rings
The first ingredient is a finiteness condition on ideals.
There are several equivalent characterizations.
(i) (Ascending Chain Condition) Every ascending chain of ideals \(I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots\) eventually stabilizes: there exists \(N\) such that \(I_k = I_N\) for all \(k \ge N\).
(ii) (Maximal Condition) Every nonempty collection of ideals of \(R\) has a maximal element.
(iii) \(R\) is Noetherian.
(i) \(\Rightarrow\) (ii): If a nonempty collection \(\mathcal{S}\) had no maximal element, we could build a strictly ascending chain \(I_1 \subsetneq I_2 \subsetneq \cdots\) in \(\mathcal{S}\), contradicting (i).
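(ii) \(\Rightarrow\) (iii): Let \(I\) be an ideal. Consider the collection \(\mathcal{S}\) of finitely generated ideals contained in \(I\). By (ii), \(\mathcal{S}\) has a maximal element \(M = (a_1, \ldots, a_r)\). If \(M \ne I\), choose \(b \in I \setminus M\); then \((a_1, \ldots, a_r, b) \in \mathcal{S}\) strictly contains \(M\), contradicting maximality. So \(M = I\) is finitely generated.
(iii) \(\Rightarrow\) (i): Given an ascending chain \(I_1 \subseteq I_2 \subseteq \cdots\), the union \(I = \bigcup_k I_k\) is an ideal, so by (iii) it is generated by finitely many elements \(a_1, \ldots, a_r\). Each \(a_j\) lies in some \(I_{k_j}\); taking \(N = \max_j k_j\) gives \(I \subseteq I_N\), so \(I_k = I_N\) for all \(k \ge N\). ∎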
Dedekind Domains
(1) \(R\) is Noetherian.
(2) Every nonzero prime ideal of \(R\) is maximal.
(3) \(R\) is integrally closed in its field of fractions.
Condition (2) says that the nonzero primes of \(R\) form a single “layer” in the spectrum: there is no room for a chain \((0) \subsetneq P \subsetneq Q\) of primes. Condition (3) says that any element of \(\operatorname{Frac}(R)\) satisfying a monic polynomial over \(R\) already lies in \(R\).
The theorem that makes the whole theory work is the following.
(1) \(\mathcal{O}_K\) is Noetherian. Every nonzero ideal \(I \subseteq \mathcal{O}_K\) has an integral basis \(\{\alpha_1, \ldots, \alpha_n\}\) (Theorem 8.4 of the previous chapter), so \(I = (\alpha_1, \ldots, \alpha_n)\) is finitely generated.
(2) Every nonzero prime ideal is maximal. Let \(P \subseteq \mathcal{O}_K\) be a nonzero prime ideal. It suffices to show \(\mathcal{O}_K / P\) is a field. Since \(P\) is nonzero, it contains a nonzero integer \(a \in P \cap \mathbb{Z}_{>0}\) (take the norm of any nonzero element). If \(\{\omega_1, \ldots, \omega_n\}\) is an integral basis for \(\mathcal{O}_K\), then every element of \(\mathcal{O}_K/P\) is represented by an integer linear combination of \(\omega_1, \ldots, \omega_n\) with coefficients in \(\{0, 1, \ldots, a-1\}\). Hence \(|\mathcal{O}_K/P| \le a^n < \infty\). Since every finite integral domain is a field, \(P\) is maximal.
(3) \(\mathcal{O}_K\) is integrally closed. Suppose \(\gamma \in K\) satisfies \(\gamma^m + c_{m-1}\gamma^{m-1} + \cdots + c_0 = 0\) with \(c_i \in \mathcal{O}_K\). Then \(\gamma\) lies in the ring \(A = \mathbb{Z}[c_0, \ldots, c_{m-1}, \gamma]\). Since each \(c_i\) is an algebraic integer of degree at most \([K : \mathbb{Q}]\), and \(\gamma^m\) can be expressed in terms of lower powers using the relation, the ring \(A\) is finitely generated as an abelian group. Hence \(\gamma\) is an algebraic integer, so \(\gamma \in \mathcal{O}_K\). ∎
Every Ideal Contains a Product of Primes
The first step toward unique factorization of ideals is showing that every nonzero ideal is “bounded below” by a product of prime ideals.
The ideal \(M\) is not prime (since any prime ideal contains itself, a product of one prime). So there exist \(r, s \in R \setminus M\) with \(rs \in M\). Set \(M_1 = M + (r)\) and \(M_2 = M + (s)\). Both \(M_1\) and \(M_2\) strictly contain \(M\), so neither is in \(\mathcal{S}\). Hence each contains a product of prime ideals: say \(P_1 \cdots P_\ell \subseteq M_1\) and \(Q_1 \cdots Q_k \subseteq M_2\). But then
\[ P_1 \cdots P_\ell \, Q_1 \cdots Q_k \subseteq M_1 M_2 \subseteq M, \]contradicting \(M \in \mathcal{S}\). ∎
The Inverse Ideal
To “divide” ideals, we need the notion of an inverse.
Every fractional ideal \(I\) has a “denominator”: there exists \(0 \ne d \in R\) with \(dI \subseteq R\), so \(I = \frac{1}{d}J\) for some integral ideal \(J\).
The crucial result is that prime ideals are invertible.
If \(r = 1\), then \(P_1 \subseteq (a) \subseteq I\), so \(P_1 = (a) = I\) (since \(P_1\) is maximal). Take \(\gamma = 1/a \in K \setminus R\); then \(\gamma I = R\).
If \(r > 1\), by minimality of \(r\), \(P_2 \cdots P_r \not\subseteq (a)\). Choose \(b \in P_2 \cdots P_r \setminus (a)\) and set \(\gamma = b/a\). Then \(\gamma \notin R\) (since \(b \notin (a)\)), but \(\gamma I \subseteq \gamma P_1 = \frac{b}{a} P_1 \subseteq \frac{1}{a} P_1 P_2 \cdots P_r \subseteq \frac{1}{a}(a) = R\). ∎
The Cancellation Law
The existence of inverses for ideals yields the cancellation property, which is the engine of unique factorization.
Let \(B = \frac{1}{\alpha} IJ\), an ideal of \(R\). We want \(B = R\). Suppose not: by Lemma 7.8, there exists \(\gamma \in K \setminus R\) with \(\gamma B \subseteq R\). Since \(\alpha \in I\), we have \(J \subseteq B\), so \(\gamma J \subseteq \gamma B \subseteq R\). Then \(\frac{\gamma}{\alpha} IJ = \gamma B \subseteq R\), meaning \((\gamma J) I \subseteq (\alpha)\), so \(\gamma J \subseteq J\) by definition of \(J\).
But \(J\) has an integral basis (being a nonzero ideal of a ring of integers), hence is a finitely generated abelian group. The inclusion \(\gamma J \subseteq J\) with \(J\) finitely generated implies \(\gamma\) is integral over \(R\). Since \(R\) is integrally closed, \(\gamma \in R\), contradicting \(\gamma \notin R\). ∎
Unique Factorization of Ideals
We now arrive at the main theorem, the fundamental result of the subject.
Since \(M \subsetneq P\) and \(M = PC\), we have \(C \ne R\). If \(C\) were a product of primes, then \(M = PC\) would be too, contradicting \(M \in \mathcal{S}\). So \(C \in \mathcal{S}\). But \(C \supseteq M\) (since \(M = PC \subseteq C\)) and \(C \ne M\) (otherwise \(M = PM\) and cancellation gives \(P = R\), a contradiction). This contradicts the maximality of \(M\) in \(\mathcal{S}\). Hence \(\mathcal{S} = \emptyset\).
Uniqueness: Suppose \(P_1 \cdots P_r = Q_1 \cdots Q_s\) with all \(P_i, Q_j\) prime. Then \(P_1 \supseteq Q_1 \cdots Q_s\). Since \(P_1\) is prime, \(P_1 \supseteq Q_j\) for some \(j\). Since nonzero primes are maximal, \(P_1 = Q_j\). Cancelling (by Corollary 7.10) gives \(P_2 \cdots P_r = Q_1 \cdots \hat{Q}_j \cdots Q_s\). By induction, the factorizations agree up to reordering.
(No nonempty product of prime ideals can equal \(R\), since such a product is contained in each of its factors \(P_i \subsetneq R\), so the base case of the induction is immediate.) ∎
This theorem shows that the monoid of nonzero ideals of a Dedekind domain, under multiplication, is a free commutative monoid generated by the nonzero prime ideals. Equivalently, every nonzero fractional ideal is invertible, and the group of fractional ideals is free abelian on the primes.
PID if and only if UFD
For Dedekind domains, the notions of principal ideal domain and unique factorization domain coincide.
Kummer–Dedekind Theorem
In practice, the most important question is: given a rational prime \(p\), how does the ideal \((p)\) factor in \(\mathcal{O}_K\)? The Kummer–Dedekind theorem gives a complete answer whenever \(\mathcal{O}_K = \mathbb{Z}[\alpha]\), or more generally, whenever \(p\) does not divide the index \([\mathcal{O}_K : \mathbb{Z}[\alpha]]\).
where \(\overline{g}_1, \ldots, \overline{g}_r\) are distinct monic irreducible polynomials in \(\mathbb{F}_p[x]\). Then
\[ (p) = \mathfrak{p}_1^{e_1} \cdots \mathfrak{p}_r^{e_r} \]where \(\mathfrak{p}_i = (p, g_i(\alpha))\), and \(g_i \in \mathbb{Z}[x]\) is any monic lift of \(\overline{g}_i\). Moreover, \(N(\mathfrak{p}_i) = p^{f_i}\) where \(f_i = \deg \overline{g}_i\).
The prime ideals of \(\mathbb{Z}[\alpha]\) containing \(p\) correspond to the prime ideals of this product, which are exactly the ideals \((p, g_i(\alpha))\) corresponding to the irreducible factors \(\overline{g}_i\). Tracing the isomorphisms gives the stated factorization. Since \(p \nmid [\mathcal{O}_K : \mathbb{Z}[\alpha]]\), the factorization in \(\mathbb{Z}[\alpha]\) lifts to the same factorization in \(\mathcal{O}_K\). ∎
\(p = 5\): \(x^2 - 2\) is irreducible mod \(5\) (no roots), so \((5)\) remains prime in \(\mathcal{O}_K\), and \(\mathcal{O}_K/(5) \cong \mathbb{F}_{25}\).
\(p = 7\): \(x^2 - 2 \equiv (x-3)(x+3) \pmod{7}\), so \((7) = (\sqrt{2} - 3, 7)(\sqrt{2} + 3, 7)\) splits into two prime ideals of norm \(7\).
\(p = 2\): \(x^2 - 2 \equiv x^2 \pmod{2}\), so \((2) = (\sqrt{2})^2\) ramifies. The unique prime above \(2\) is \((\sqrt{2})\).
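For a quadratic such as \(x^2 - 2\), the factorization type modulo \(p\) is determined by the number of roots in \(\mathbb{F}_p\). The following minimal sketch reproduces the three cases above; it assumes, as the examples suggest, that \(K = \mathbb{Q}(\sqrt{2})\) with \(\mathcal{O}_K = \mathbb{Z}[\sqrt{2}]\), and the function name `splitting_type` is illustrative.

```python
def splitting_type(p):
    """Classify how (p) factors in Z[sqrt(2)] via the roots of x^2 - 2 mod p."""
    roots = [x for x in range(p) if (x * x - 2) % p == 0]
    if not roots:
        return "inert"      # x^2 - 2 irreducible mod p: (p) stays prime
    if len(roots) == 1:
        return "ramified"   # a double root: (p) is the square of a prime
    return "split"          # two distinct roots: (p) is a product of two primes

for p in (2, 5, 7):
    print(p, splitting_type(p))
```

Root counting suffices only because the polynomial has degree \(2\); in higher degree one would need a genuine factorization over \(\mathbb{F}_p\).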
Ramification and the Discriminant
as a product of prime ideals in \(\mathcal{O}_K\). The integer \(e_i\) is called the ramification index of \(\mathfrak{p}_i\) over \(p\), and \(f_i = \log_p N(\mathfrak{p}_i)\) is the residue degree (or inertia degree). We say \(p\) is ramified in \(K\) if \(e_i \ge 2\) for some \(i\), and unramified otherwise.
The discriminant controls which primes ramify.
so \(p \mid \operatorname{Tr}((\alpha\beta)^p)\). Moreover,
\[ (\operatorname{Tr}(\alpha\beta))^p = \sum_i \sigma_i(\alpha\beta)^p + p\gamma = \operatorname{Tr}((\alpha\beta)^p) + p\gamma \]for some algebraic integer \(\gamma\), so \(p \mid (\operatorname{Tr}(\alpha\beta))^p\) and hence \(p \mid \operatorname{Tr}(\alpha\beta)\).
Write \(\alpha = a_1\omega_1 + \cdots + a_n\omega_n\) with \(a_i \in \mathbb{Z}\) and \(\{\omega_1, \ldots, \omega_n\}\) an integral basis. Since \(\alpha \notin (p)\), some \(a_i\) (say \(a_1\)) is not divisible by \(p\). Since \(p \mid \operatorname{Tr}(\alpha\omega_i)\) for all \(i\), the first row of the matrix \(\bigl(\operatorname{Tr}(\omega_i \omega_j)\bigr)\) — after replacing row 1 by \(\sum a_k \operatorname{Tr}(\omega_k \omega_j)\) — shows that \(p \mid a_1 D\). Since \(p \nmid a_1\), we conclude \(p \mid D\).
(\(\Leftarrow\)): If \(p \mid D\), the trace form on \(\mathcal{O}_K/(p)\) is degenerate: there exists \(x \in \mathcal{O}_K/(p)\), \(x \ne 0\), with \(\operatorname{Tr}(xy) = 0\) for all \(y \in \mathcal{O}_K/(p)\). A nondegenerate trace form characterizes products of separable field extensions, hence \(\mathcal{O}_K/(p)\) is not a product of fields, meaning the factorization of \((p)\) has a repeated factor. ∎
Chapter 8: Ideal Norms and the Class Group
Having established unique factorization of ideals in Dedekind domains, we now introduce the norm of an ideal, which generalizes the norm of an element. The multiplicativity of the ideal norm — whose proof requires delicate arguments involving localization or combinatorial counting — is the key technical result of this chapter. We then define the class group, which measures the failure of unique factorization of elements, and prove that it is always finite using Minkowski’s geometry of numbers.
Norm of an Ideal
the number of elements in the quotient ring \(\mathcal{O}_K/I\). Equivalently, \(N(I) = [\mathcal{O}_K : I]\) as an index of abelian groups.
Since every nonzero ideal \(I\) contains a nonzero integer \(a\) (take the norm of any nonzero \(\alpha \in I\)), and \(|\mathcal{O}_K/(a)| = |a|^n\), the quotient \(\mathcal{O}_K/I\) is always finite.
Computing the Norm via Discriminants
It therefore suffices to show \(N(I) = a_{11} \cdots a_{nn}\).
Every element of \(\mathcal{O}_K\) is congruent modulo \(I\) to a unique element of the form \(r_1\omega_1 + \cdots + r_n\omega_n\) with \(0 \le r_i < a_{ii}\). The existence follows by successively reducing: given \(\gamma = b_1\omega_1 + \cdots + b_n\omega_n\), divide \(b_n\) by \(a_{nn}\), subtract the appropriate multiple of \(\alpha_n\), and repeat. The uniqueness follows because if two such representatives were congruent mod \(I\), their difference would be an element of \(I\) with each coordinate strictly less than \(a_{ii}\) in absolute value, which forces all coordinates to be zero (by the minimality conditions defining the \(a_{ii}\)). Hence \(N(I) = \prod_{i=1}^n a_{ii}\). ∎
Norm of a Principal Ideal
The ideal norm generalizes the element norm.
so
\[ \operatorname{disc}(\alpha\omega_1, \ldots, \alpha\omega_n) = N_{K/\mathbb{Q}}(\alpha)^2 \cdot \operatorname{disc}(\omega_1, \ldots, \omega_n). \]By Theorem 8.2, \(N(I) = |N_{K/\mathbb{Q}}(\alpha)|\). ∎
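As an illustration in the simplest case \(\mathcal{O}_K = \mathbb{Z}[i]\), the index \([\mathbb{Z}[i] : (\alpha)]\) can be computed by counting and compared with \(|N(\alpha)| = a^2 + b^2\). The sketch below rests on one observation: with \(n = \alpha\bar{\alpha}\), the box \(0 \le x, y < n\) is a set of representatives for \(\mathbb{Z}[i]/n\mathbb{Z}[i]\), and \((\alpha) \supseteq n\mathbb{Z}[i]\), so the index equals \(n^2\) divided by the number of box points lying in \((\alpha)\). The function names are illustrative.

```python
def in_ideal(x, y, a, b):
    """Is x + y*i in the ideal (a + b*i) of Z[i]?

    Equivalent to (x + y*i) * conj(a + b*i) being divisible by a^2 + b^2.
    """
    n = a * a + b * b
    real = x * a + y * b
    imag = y * a - x * b
    return real % n == 0 and imag % n == 0

def ideal_index(a, b):
    """Index of (a + b*i) in Z[i], obtained by counting (requires (a, b) != (0, 0))."""
    n = a * a + b * b
    hits = sum(1 for x in range(n) for y in range(n) if in_ideal(x, y, a, b))
    return n * n // hits

print(ideal_index(2, 1), ideal_index(1, 1), ideal_index(3, 0))
```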
Fermat’s Theorem for Ideals
The finite quotient \(\mathcal{O}_K / \mathfrak{p}\) is a field when \(\mathfrak{p}\) is prime, giving a natural analogue of Fermat’s little theorem.
The following simple observation is used repeatedly.
Multiplicativity of the Ideal Norm
The most important property of the ideal norm is its multiplicativity. The proof requires some care, as it cannot be derived from the Chinese Remainder Theorem alone when the ideals share common prime factors.
For any fixed \(m\), all \(\alpha_i\) with \(i \ne m\) lie in \(BP_m\), so if \(\alpha \in BP_m\) then \(\alpha_m \in BP_m\), contradicting the choice of \(\alpha_m\). Hence \(\alpha \notin BP_m\) for any \(m\), which means \(P_m \nmid \frac{(\alpha)}{B}\) for any \(m\), giving \(\gcd\!\left(\frac{(\alpha)}{B}, C\right) = \mathcal{O}_K\). ∎
Claim: The elements \(\beta_i + \alpha\gamma_j\), for \(1 \le i \le N(B)\) and \(1 \le j \le N(C)\), represent the distinct cosets of \(\mathcal{O}_K / BC\).
Distinctness: Suppose \(\beta_i + \alpha\gamma_j \equiv \beta_k + \alpha\gamma_\ell \pmod{BC}\). Since \(BC \subseteq B\), reducing mod \(B\) gives \(\beta_i \equiv \beta_k \pmod{B}\), so \(i = k\). Then \(\alpha(\gamma_\ell - \gamma_j) \equiv 0 \pmod{BC}\), meaning \(BC \mid (\alpha)(\gamma_\ell - \gamma_j)\). Since \(\gcd\!\left(\frac{(\alpha)}{B}, C\right) = \mathcal{O}_K\), we conclude \(C \mid (\gamma_\ell - \gamma_j)\), so \(j = \ell\).
Completeness: Given \(\omega \in \mathcal{O}_K\), choose \(i\) with \(\omega \equiv \beta_i \pmod{B}\). Then \(\omega - \beta_i \in B\), and since \(B = \gcd((\alpha), BC)\) (because \(\gcd\!\left(\frac{(\alpha)}{B}, C\right) = \mathcal{O}_K\) implies \((\alpha) + BC = B\)), write \(\omega - \beta_i = \alpha a + b\) with \(a \in \mathcal{O}_K\) and \(b \in BC\). Choose \(j\) with \(a \equiv \gamma_j \pmod{C}\). Then \(\omega = \beta_i + \alpha\gamma_j + \alpha(a - \gamma_j) + b\), and \(\alpha(a - \gamma_j) + b \in BC\), so \(\omega \equiv \beta_i + \alpha\gamma_j \pmod{BC}\).
Therefore \(N(BC) = N(B) \cdot N(C)\). ∎
The \(efg\) Formula
The multiplicativity of the norm yields a fundamental numerical constraint on how primes split.
with \(N(\mathfrak{p}_i) = p^{f_i}\). Then
\[ \sum_{i=1}^{r} e_i f_i = n. \]∎
The Class Group
We now turn to the central invariant of algebraic number theory: the class group, which measures the obstruction to \(\mathcal{O}_K\) being a PID.
Its order \(h_K = |\operatorname{Cl}(K)|\) is called the class number of \(K\).
Equivalently, two ideals \(I\) and \(J\) represent the same class if and only if \(I = \alpha J\) for some \(\alpha \in K^{\times}\), or equivalently, if there exist nonzero \(\alpha, \beta \in \mathcal{O}_K\) with \((\alpha)I = (\beta)J\).
The class group is abelian, with identity \([\mathcal{O}_K]\), and the class of any nonzero ideal has an inverse (guaranteed by the existence of ideal inverses in a Dedekind domain). The ring \(\mathcal{O}_K\) is a PID if and only if \(h_K = 1\).
Finiteness of the Class Number
The deepest result of this chapter is that the class group is always finite.
where \(n = [K : \mathbb{Q}]\), \(r\) is the number of real embeddings, \(s\) is the number of pairs of complex embeddings (so \(n = r + 2s\)).
Step 1: Every ideal class has a small representative. Let \([I]\) be an ideal class, and choose an integral representative. Consider the inverse \(I^{-1}\). We seek \(\alpha \in I^{-1}\) (nonzero) such that \(J = \alpha I\) is an integral ideal of norm at most \(M_K\). Since \(N(\alpha I) = |N_{K/\mathbb{Q}}(\alpha)| \cdot N(I)\) (by multiplicativity), we need \(|N_{K/\mathbb{Q}}(\alpha)| \le M_K / N(I) = M_K \cdot N(I^{-1})\).
Embed \(I^{-1}\) as a lattice in Minkowski space \(V_K \cong \mathbb{R}^n\), and apply Minkowski’s Lattice Theorem: if \(S \subseteq \mathbb{R}^n\) is a convex, symmetric set with \(\operatorname{vol}(S) > 2^n \cdot |\det(I^{-1})|\), then \(S\) contains a nonzero lattice point.
Take \(S = \{(v_1, \ldots, v_n) \in V_K : \sum |v_i| \le t\}\). This set is convex and symmetric, with volume \(\operatorname{vol}(S) = 2^r \pi^s t^n / n!\). The lattice determinant is \(|\det(I^{-1})| = N(I^{-1})\sqrt{|\operatorname{disc}(K)|}\).
By the AM–GM inequality, any vector in \(S\) satisfies \(\prod |v_i| \le (t/n)^n\). So we need \((t/n)^n \le M_K N(I^{-1})\) (to ensure \(S \subseteq \Lambda\), the locus of vectors of sufficiently small norm) and \(\operatorname{vol}(S) > 2^n |\det(I^{-1})|\) (to apply Minkowski).
Combining these two conditions and optimizing \(t\) shows that for every \(B > M_K N(I^{-1})\), there exists a nonzero \(\alpha \in I^{-1}\) with \(|N(\alpha)| < B\). Since the norm of an ideal is a positive integer, this gives an integral ideal \(J = \alpha I\) with \(N(J) \le M_K\). Because \(J\) differs from \(I\) only by the principal factor \((\alpha)\), it lies in the class \([I]\) and is the required representative.
Step 2: Finitely many ideals of bounded norm. If \(N(I) \le M_K\), then \(N(I) \in I\) (Proposition 8.5), so \((N(I)) \subseteq I\), hence \(I \mid (N(I))\). The ideal \((N(I))\) has only finitely many divisors (by unique factorization), and \(N(I)\) ranges over finitely many integers at most \(M_K\). Thus there are only finitely many ideal classes. ∎
Computing the Class Group: Worked Examples
The Minkowski bound, combined with the Kummer–Dedekind theorem, gives a practical algorithm for computing class groups.
General Strategy:
- Compute the Minkowski bound \(M_K\).
- Factor all rational primes \(p \le M_K\) in \(\mathcal{O}_K\) using the Kummer–Dedekind theorem. The prime ideals that arise generate \(\operatorname{Cl}(K)\).
- Determine relations among these generators by finding principal ideals of small norm (often by evaluating norms of simple elements).
So we need prime ideals of norm at most \(4\), meaning we factor \(p = 2\) and \(p = 3\).
Since \(-23 \equiv 1 \pmod{4}\), the ring of integers is \(\mathcal{O}_K = \mathbb{Z}\!\left[\frac{1+\sqrt{-23}}{2}\right]\).
Factoring (2): The minimal polynomial of \(\frac{1+\sqrt{-23}}{2}\) is \(x^2 - x + 6\), which factors mod 2 as \(x(x-1) = x(x+1)\). So
\[ (2) = \mathfrak{p}\mathfrak{p}', \quad \mathfrak{p} = \left(2, \frac{1+\sqrt{-23}}{2}\right), \quad \mathfrak{p}' = \left(2, \frac{1-\sqrt{-23}}{2}\right). \]Factoring (3): Similarly, \(x^2 - x + 6 \equiv x(x-1) \pmod{3}\), giving
\[ (3) = \mathfrak{q}\mathfrak{q}', \quad \mathfrak{q} = \left(3, \frac{1+\sqrt{-23}}{2}\right), \quad \mathfrak{q}' = \left(3, \frac{1-\sqrt{-23}}{2}\right). \]Relations: Since \(\mathfrak{p}\mathfrak{p}' = (2)\) is principal, \([\mathfrak{p}'] = [\mathfrak{p}]^{-1}\) in \(\operatorname{Cl}(K)\). Similarly \([\mathfrak{q}'] = [\mathfrak{q}]^{-1}\).
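One computes \(\mathfrak{p}\mathfrak{q} = \left(\frac{1+\sqrt{-23}}{2}\right)\): the generator has norm \(6\), lies in both \(\mathfrak{p}\) and \(\mathfrak{q}\), and is divisible by neither \(2\) nor \(3\). So \([\mathfrak{p}][\mathfrak{q}] = 1\), giving \([\mathfrak{q}] = [\mathfrak{p}]^{-1} = [\mathfrak{p}']\).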
Is \(\mathfrak{p}\) principal? If \(\mathfrak{p} = (\alpha)\), then \(N(\alpha) = N(\mathfrak{p}) = 2\). But \(N\!\left(\frac{a+b\sqrt{-23}}{2}\right) = \frac{a^2+23b^2}{4}\), and \(\frac{a^2+23b^2}{4} = 2\) gives \(a^2 + 23b^2 = 8\), which has no integer solutions. So \(\mathfrak{p}\) is not principal.
What is the order of \([\mathfrak{p}]\)? We have \(N\!\left(\frac{3+\sqrt{-23}}{2}\right) = \frac{9+23}{4} = 8 = N(\mathfrak{p}^3)\) and \(N\!\left(\frac{3-\sqrt{-23}}{2}\right) = 8\). These two elements are not associates (the only units of \(\mathcal{O}_K\) are \(\pm 1\)), so they generate two distinct principal ideals of norm \(8\). The ideals of norm \(8\) are \(\mathfrak{p}^3, \mathfrak{p}^2\mathfrak{p}', \mathfrak{p}\mathfrak{p}'^2, \mathfrak{p}'^3\). Since \(\mathfrak{p}^2\mathfrak{p}' = (2)\mathfrak{p} \sim \mathfrak{p}\) is not principal, and \(\mathfrak{p}\mathfrak{p}'^2 = (2)\mathfrak{p}' \sim \mathfrak{p}'\) is not principal, the two principal ideals of norm \(8\) must be \(\mathfrak{p}^3\) and \(\mathfrak{p}'^3\). Hence \([\mathfrak{p}]^3 = 1\) but \([\mathfrak{p}] \ne 1\), and \([\mathfrak{p}]^2 = [\mathfrak{p}]^{-1} = [\mathfrak{p}'] \ne 1\).
Therefore \(\operatorname{Cl}(\mathbb{Q}(\sqrt{-23})) \cong \mathbb{Z}/3\mathbb{Z}\), and \(h_K = 3\).
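The norm equations used in this example can be checked mechanically. The sketch below searches for elements \(\frac{a + b\sqrt{-23}}{2} \in \mathcal{O}_K\) (so \(a \equiv b \pmod 2\)) of a given norm; the function name is illustrative.

```python
from math import isqrt

def has_element_of_norm(n):
    """Is there (a + b*sqrt(-23))/2 in O_K with norm n, i.e. a^2 + 23 b^2 == 4n and a ≡ b mod 2?"""
    target = 4 * n
    for b in range(isqrt(target // 23) + 1):
        rest = target - 23 * b * b
        a = isqrt(rest)
        if a * a == rest and (a - b) % 2 == 0:
            return True
    return False

# No element of norm 2 or 3, so the primes above 2 and 3 are not principal.
print(has_element_of_norm(2), has_element_of_norm(3))
# Elements of norm 6 and 8 exist, matching the relations found above.
print(has_element_of_norm(6), has_element_of_norm(8))
```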
The field has one real embedding and one pair of complex embeddings (\(r = 1, s = 1\)), giving
\[ M_K = \frac{3!}{27} \cdot \frac{4}{\pi} \sqrt{135} < 4. \]We factor primes \(p \le 3\):
\(p = 2\): \(f(x) = x^3 - 3x + 3 \equiv x^3 + x + 1 \pmod{2}\), which has no roots mod 2 and is irreducible (a cubic with no roots over \(\mathbb{F}_2\)). So \((2)\) is prime.
\(p = 3\): \(f(x) \equiv x^3 \pmod{3}\), so \((3) = \mathfrak{p}_3^3\) where \(\mathfrak{p}_3 = (3, \alpha) = (\alpha)\) (since \(3 = -\alpha^3 + 3\alpha = \alpha(-\alpha^2 + 3)\), so \(3 \in (\alpha)\)).
All prime ideals of norm at most 3 are principal. Hence \(\operatorname{Cl}(K)\) is trivial and \(\mathcal{O}_K\) is a PID.
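The numerical parts of this example can be reproduced with a few lines. This is a sketch assuming the Minkowski bound in the form \(M_K = \frac{n!}{n^n}\left(\frac{4}{\pi}\right)^s\sqrt{|\operatorname{disc}(K)|}\), which matches the displayed computation; the function names are illustrative.

```python
from math import factorial, pi, sqrt

def minkowski_bound(n, s, disc):
    """n!/n^n * (4/pi)^s * sqrt(|disc|), with s the number of complex pairs."""
    return factorial(n) / n**n * (4 / pi) ** s * sqrt(abs(disc))

def roots_mod_p(coeffs, p):
    """Roots in F_p of the polynomial with coefficients listed from highest degree."""
    def value(x):
        acc = 0
        for c in coeffs:          # Horner evaluation mod p
            acc = (acc * x + c) % p
        return acc
    return [x for x in range(p) if value(x) == 0]

f = [1, 0, -3, 3]                 # x^3 - 3x + 3
print(minkowski_bound(3, 1, -135))
print(roots_mod_p(f, 2))          # no roots: the cubic is irreducible mod 2
print(roots_mod_p(f, 3))          # only the root 0, consistent with f ≡ x^3 mod 3
```

A cubic with no roots over \(\mathbb{F}_p\) is irreducible, which is why root counting is enough here.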
Gauss’s Conjecture and the Hilbert Class Field
We conclude with two deep results connecting the class group to the broader landscape of number theory.
This confirms a conjecture made by Gauss. Heilbronn and Linfoot proved in 1934 that there is at most one value beyond Gauss’s list; Baker and Stark independently showed the list is complete.
The class group has a beautiful structural interpretation via class field theory.
(1) \([E : K] = h_K\).
(2) \(E/K\) is Galois, and \(\operatorname{Gal}(E/K) \cong \operatorname{Cl}(K)\).
(3) \(E/K\) is unramified at all primes (including the archimedean ones).
(4) Every ideal of \(\mathcal{O}_K\) becomes principal in \(\mathcal{O}_E\).
(5) A prime ideal \(\mathfrak{p}\) of \(\mathcal{O}_K\) splits into \(h_K / f\) prime ideals in \(\mathcal{O}_E\), where \(f\) is the order of \([\mathfrak{p}]\) in \(\operatorname{Cl}(K)\).
The theory of algebraic numbers, as developed thus far, has been largely algebraic in character. We have studied rings of integers, ideals, and their factorization properties using the tools of commutative algebra. In this chapter, we introduce a geometric perspective that proves to be extraordinarily powerful. The central idea is that the ring of integers \(\mathcal{O}_K\) of a number field \(K\) can be viewed as a lattice in a real vector space, and that geometric properties of this lattice encode deep arithmetic information about \(K\).
The key result of this chapter is Minkowski’s Convex Body Theorem, which gives conditions under which a convex, symmetric subset of \(\mathbb{R}^n\) must contain a nonzero lattice point. Combined with the Minkowski embedding, this theorem yields a fundamental bound on norms of elements in ideals, which in turn proves the finiteness of the class number. As striking applications, we prove that every prime \(p \equiv 1 \pmod{4}\) is a sum of two squares, and that every positive integer is a sum of four squares.
Lattices in \(\mathbb{R}^n\)
The set \(\{\alpha_1, \ldots, \alpha_n\}\) is called a basis for \(\Lambda\).
A basis for a lattice \(\Lambda\) is not unique. If \(\{\alpha_1, \ldots, \alpha_n\}\) is a basis and \(P = (v_{ij})\) is an \(n \times n\) integer matrix with \(\det(P) = \pm 1\), then the vectors \(\alpha_i' = \sum_j v_{ij} \alpha_j\) also form a basis for \(\Lambda\). Conversely, any two bases for \(\Lambda\) are related by such a unimodular change-of-basis matrix, because the transition matrix and its inverse must both have integer entries, forcing the determinant to be \(\pm 1\).
The determinant (or covolume) of \(\Lambda\) is
\[ d(\Lambda) = |\det(\alpha_1, \ldots, \alpha_n)|, \]where the matrix is formed by taking the \(\alpha_i\) as column vectors.
Every point in \(\mathbb{R}^n\) can be written uniquely as \(\lambda + \gamma\) where \(\lambda \in \Lambda\) and \(\gamma \in \mathcal{P}\). The Lebesgue measure of \(\mathcal{P}\) equals \(d(\Lambda)\). The determinant is well-defined because if \(\{\alpha_1', \ldots, \alpha_n'\}\) is another basis related by a unimodular matrix \(P\), then \(|\det(\alpha_1', \ldots, \alpha_n')| = |\det P| \cdot |\det(\alpha_1, \ldots, \alpha_n)| = |\det(\alpha_1, \ldots, \alpha_n)|\).
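The invariance of \(d(\Lambda)\) under a unimodular change of basis can be seen on a small example in \(\mathbb{R}^2\). The lattice basis and the matrices below are arbitrary illustrative choices.

```python
def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def change_basis(basis, P):
    """New basis alpha_i' = sum_j P[i][j] * alpha_j for a 2x2 integer matrix P."""
    return [
        tuple(sum(P[i][j] * basis[j][k] for j in range(2)) for k in range(2))
        for i in range(2)
    ]

basis = [(2, 0), (1, 3)]            # covolume |det| = 6
P = [[2, 1], [1, 1]]                # det P = 1, so P is unimodular
new_basis = change_basis(basis, P)
print(new_basis, abs(det2(*basis)), abs(det2(*new_basis)))
```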
Blichfeldt’s Theorem
The first geometric result we need is a counting argument due to Blichfeldt, which is a multi-dimensional pigeonhole principle. It says that a set whose measure exceeds \(m\) times the covolume of a lattice must contain \(m+1\) points whose pairwise differences are all lattice vectors.
- \(\mu(S) > m \cdot d(\Lambda)\), or
- \(\mu(S) = m \cdot d(\Lambda)\) and \(S\) is compact.
be the fundamental parallelepiped. Every point \(x \in \mathbb{R}^n\) has a unique representation \(x = \lambda + \gamma\) with \(\lambda \in \Lambda\) and \(\gamma \in \mathcal{P}\), and \(\mu(\mathcal{P}) = d(\Lambda)\).
For each \(\lambda \in \Lambda\), define
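\[ R(\lambda) = \{\nu \in \mathcal{P} \mid \lambda + \nu \in S\}. \]Each \(R(\lambda)\) is the piece \(S \cap (\lambda + \mathcal{P})\) of \(S\) translated back into \(\mathcal{P}\). The pieces \(S \cap (\lambda + \mathcal{P})\) are pairwise disjoint and cover \(S\) (the translates \(R(\lambda)\) themselves may overlap), so by translation invariance of \(\mu\),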
\[ \sum_{\lambda \in \Lambda} \mu(R(\lambda)) = \mu(S). \]Case 1: Suppose \(\mu(S) > m \cdot d(\Lambda) = m \cdot \mu(\mathcal{P})\). Since the sum of the measures \(\mu(R(\lambda))\) exceeds \(m \cdot \mu(\mathcal{P})\), there must exist a point \(\nu_0 \in \mathcal{P}\) that belongs to at least \(m+1\) of the sets \(R(\lambda)\). (If every point belonged to at most \(m\) sets, the sum of measures could not exceed \(m \cdot \mu(\mathcal{P})\).) Thus there exist distinct \(\lambda_1, \ldots, \lambda_{m+1} \in \Lambda\) such that \(\nu_0 + \lambda_i \in S\) for each \(i\). Setting \(x_i = \lambda_i + \nu_0\), we have \(x_i - x_j = \lambda_i - \lambda_j \in \Lambda\).
Case 2: Suppose \(\mu(S) = m \cdot d(\Lambda)\) and \(S\) is compact. For each \(r \ge 1\), let \(S_r = (1 + \epsilon_r) S\) where \(\epsilon_r \to 0^+\). Then \(\mu(S_r) = (1+\epsilon_r)^n \mu(S) > m \cdot d(\Lambda)\), so by Case 1, there exist distinct points \(x_{1,r}, \ldots, x_{m+1,r} \in S_r\) with differences in \(\Lambda\). Since \(S\) is compact (hence bounded), the sequences \(x_{j,r}\) have convergent subsequences with limits \(x_j^0 \in S\) (since \(S\) is closed). Because \(\Lambda\) is discrete and the differences \(x_{j,r} - x_{i,r}\) are bounded, after passing to a further subsequence each difference is a constant nonzero lattice vector. Hence \(x_j^0 - x_i^0 \in \Lambda \setminus \{0\}\) for \(i \ne j\), so the limit points are distinct and have differences in \(\Lambda\), as required. ∎
Minkowski’s Convex Body Theorem
The bridge from Blichfeldt’s counting argument to number theory is Minkowski’s theorem, which applies specifically to convex, symmetric sets. Recall that a set \(S \subseteq \mathbb{R}^n\) is symmetric about the origin if \(x \in S\) implies \(-x \in S\), and convex if \(x, y \in S\) implies \(\lambda x + (1-\lambda)y \in S\) for all \(0 \le \lambda \le 1\).
- \(\mu(S) > m \cdot 2^n \cdot d(\Lambda)\), or
- \(\mu(S) = m \cdot 2^n \cdot d(\Lambda)\) and \(S\) is compact,
Order the \(x_i\) so that \(x_1 > x_2 > \cdots > x_{m+1}\), where we say \(x_i > x_j\) if the first nonzero coordinate of \(x_i - x_j\) is positive. Define
\[ \lambda_j = \frac{1}{2}x_j - \frac{1}{2}x_{m+1} \in \Lambda \setminus \{0\} \]for \(j = 1, \ldots, m\). By the ordering, the \(m\) pairs \(\pm \lambda_1, \ldots, \pm \lambda_m\) are all distinct.
Since \(S\) is symmetric, \(-x_{m+1} \in S\). Since \(S\) is convex and \(x_j, -x_{m+1} \in S\),
\[ \lambda_j = \frac{1}{2}x_j - \frac{1}{2}x_{m+1} = \frac{1}{2} x_j + \frac{1}{2}(-x_{m+1}) \in S. \]Thus each \(\lambda_j\) is a nonzero lattice point in \(S\), and by symmetry \(-\lambda_j \in S\) as well. ∎
In the most commonly used special case \(m = 1\), Minkowski’s theorem says: if \(S\) is convex, symmetric, and \(\mu(S) > 2^n d(\Lambda)\) (or \(\mu(S) = 2^n d(\Lambda)\) with \(S\) compact), then \(S\) contains a nonzero lattice point.
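To see the \(m = 1\) case in action, consider the two-squares application mentioned in the chapter introduction. For a prime \(p \equiv 1 \pmod 4\), choose \(u\) with \(u^2 \equiv -1 \pmod p\) and let \(\Lambda = \{(a, b) \in \mathbb{Z}^2 : b \equiv ua \pmod p\}\), a lattice with \(d(\Lambda) = p\). The open disc \(a^2 + b^2 < 2p\) has area \(2\pi p > 4p\), so Minkowski's theorem guarantees a nonzero lattice point in it, and any such point satisfies \(a^2 + b^2 \equiv a^2(1 + u^2) \equiv 0 \pmod p\), hence \(a^2 + b^2 = p\). The sketch below finds that point by brute force; the function names and the choice of lattice are illustrative, not taken from the notes.

```python
from math import isqrt

def sqrt_minus_one(p):
    """Smallest u in [1, p) with u^2 ≡ -1 (mod p); exists when p ≡ 1 (mod 4)."""
    for u in range(1, p):
        if (u * u + 1) % p == 0:
            return u
    raise ValueError("no square root of -1 modulo p")

def two_squares(p):
    """Nonzero (a, b) with b ≡ u*a (mod p) and a^2 + b^2 < 2p, found by search."""
    u = sqrt_minus_one(p)
    r = isqrt(2 * p)
    for a in range(0, r + 1):          # a >= 0 suffices since the lattice is symmetric
        for b in range(-r, r + 1):
            if (a, b) != (0, 0) and (b - u * a) % p == 0 and a * a + b * b < 2 * p:
                return a, b
    raise AssertionError("Minkowski's theorem guarantees a point; search failed")

for p in (5, 13, 17, 29):
    a, b = two_squares(p)
    print(p, (a, b), a * a + b * b)
```

Minkowski's theorem only asserts existence; the exhaustive search here is a convenience for small \(p\) and is not an efficient algorithm.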
Real and Complex Embeddings
To apply Minkowski’s theorem to number fields, we need to embed the ring of integers \(\mathcal{O}_K\) as a lattice in a real vector space. The tool for this is the Minkowski embedding, which uses the field embeddings of \(K\) into \(\mathbb{C}\).
Let \(K\) be a number field of degree \(n = [K:\mathbb{Q}]\). There are exactly \(n\) field homomorphisms \(\sigma_1, \ldots, \sigma_n : K \hookrightarrow \mathbb{C}\) that fix \(\mathbb{Q}\). Some of these have image contained in \(\mathbb{R}\) (the real embeddings), and the rest come in conjugate pairs (the complex embeddings).
We label the embeddings so that \(\sigma_1, \ldots, \sigma_{r_1}\) are real, and \(\sigma_{r_1+1}, \overline{\sigma_{r_1+1}}, \ldots, \sigma_{r_1+r_2}, \overline{\sigma_{r_1+r_2}}\) are the complex conjugate pairs.
The Minkowski Embedding
defined by
\[ \sigma(\alpha) = \bigl(\sigma_1(\alpha), \ldots, \sigma_{r_1}(\alpha), \sigma_{r_1+1}(\alpha), \ldots, \sigma_{r_1+r_2}(\alpha)\bigr). \]By identifying \(\mathbb{C}\) with \(\mathbb{R}^2\) via \(z \mapsto (\operatorname{Re}(z), \operatorname{Im}(z))\), we may view \(\sigma\) as a map \(\sigma : K \hookrightarrow \mathbb{R}^n\).
The target space \(\mathbb{R}^{r_1} \times \mathbb{C}^{r_2}\) is sometimes called Minkowski space and denoted \(V_K\). It is a real vector space of dimension \(r_1 + 2r_2 = n\). As a ring, it has coordinate-wise operations, and the norm map \(N : V_K \to \mathbb{R}\) given by
\[ N(x_1, \ldots, x_{r_1}, z_{r_1+1}, \ldots, z_{r_1+r_2}) = \prod_{i=1}^{r_1} x_i \cdot \prod_{j=r_1+1}^{r_1+r_2} |z_j|^2 \]satisfies \(N(\sigma(\alpha)) = N_{K/\mathbb{Q}}(\alpha)\) for all \(\alpha \in K\).
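The identity \(N(\sigma(\alpha)) = N_{K/\mathbb{Q}}(\alpha)\) can be checked numerically in a small case. The sketch below takes \(K = \mathbb{Q}(\sqrt[3]{2})\) as an illustrative choice (so \(r_1 = 1\), \(r_2 = 1\)) and uses the standard closed form \(N(a + b\theta + c\theta^2) = a^3 + 2b^3 + 4c^3 - 6abc\) for comparison; the function names are hypothetical.

```python
import cmath

THETA = 2 ** (1 / 3)                       # real cube root of 2
OMEGA = cmath.exp(2j * cmath.pi / 3)       # primitive cube root of unity

def embed(a, b, c):
    """Minkowski embedding of a + b*theta + c*theta^2: (real coordinate, complex coordinate)."""
    x = a + b * THETA + c * THETA**2
    t = THETA * OMEGA                      # a non-real root of x^3 - 2
    z = a + b * t + c * t**2
    return x, z

def minkowski_norm(x, z):
    """Norm on V_K = R x C: product of the real coordinate and |z|^2."""
    return x * abs(z) ** 2

def field_norm(a, b, c):
    """Closed form for N_{K/Q}(a + b*theta + c*theta^2) when theta^3 = 2."""
    return a**3 + 2 * b**3 + 4 * c**3 - 6 * a * b * c

for coeffs in [(1, 1, 0), (2, -1, 3), (0, 1, 1)]:
    print(coeffs, minkowski_norm(*embed(*coeffs)), field_norm(*coeffs))
```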
\(\sigma(\mathcal{O}_K)\) Is a Lattice
The crucial geometric fact is that the image of the ring of integers under the Minkowski embedding forms a lattice.
Form the \(n \times n\) matrix \(B\) whose \((i,j)\)-entry is \(\sigma_i(\omega_j)\) (using all \(n\) embeddings, including conjugate pairs). The columns of \(B\) are the vectors \(\sigma(\omega_j)\) (before identifying \(\mathbb{C}\) with \(\mathbb{R}^2\)). If \(B\) were singular, its rows would be linearly dependent: there would exist complex numbers \(c_1, \ldots, c_n\) (not all zero) such that \(\sum_i c_i \sigma_i(\omega_j) = 0\) for all \(j\). Since each \(\sigma_i\) is \(\mathbb{Q}\)-linear and the \(\omega_j\) span \(K\) over \(\mathbb{Q}\), this would give \(\sum_i c_i \sigma_i(x) = 0\) for every \(x \in K\), making the embeddings \(\sigma_1, \ldots, \sigma_n\) linearly dependent as functions from \(K\) to \(\mathbb{C}\).
But by the theorem on linear independence of characters (distinct homomorphisms from a group to a field are linearly independent), the embeddings \(\sigma_1, \ldots, \sigma_n\) are linearly independent over \(\mathbb{C}\). This contradiction shows that the columns of \(B\) are linearly independent over \(\mathbb{C}\), hence over \(\mathbb{R}\). ∎
More generally, any nonzero ideal \(A \subseteq \mathcal{O}_K\) gives a sublattice \(\sigma(A)\) of \(\sigma(\mathcal{O}_K)\), and we can compute its covolume precisely.
since \(\operatorname{Re}(z) = \frac{z + \bar{z}}{2}\) and \(\operatorname{Im}(z) = \frac{z - \bar{z}}{2i}\). By the discriminant formula, \(|\det(\sigma_j(\alpha_i))| = \sqrt{|\operatorname{disc}(K)|} \cdot N(A)\). Therefore
\[ d(\Lambda) = |D_0| = \frac{1}{2^{r_2}} \sqrt{|\operatorname{disc}(K)|} \cdot N(A) \]and since \(D_0 \neq 0\), the image \(\sigma(A)\) is indeed a lattice. ∎
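The covolume formula can be sanity-checked numerically in the quadratic case with \(A = \mathcal{O}_K\). The following is a minimal sketch, not part of the original notes; the fields \(\mathbb{Q}(\sqrt{2})\) and \(\mathbb{Q}(\sqrt{-5})\), their integral bases \(\{1, \sqrt{2}\}\) and \(\{1, \sqrt{-5}\}\), and their discriminants \(8\) and \(-20\) are inputs chosen for illustration, and the helper names are hypothetical.

```python
import math

def covolume_2d(v1, v2):
    """Absolute value of the determinant of the 2x2 matrix with rows v1, v2."""
    return abs(v1[0] * v2[1] - v1[1] * v2[0])

def predicted_covolume(disc, r2, ideal_norm=1):
    """The formula 2^{-r2} * sqrt(|disc(K)|) * N(A) from the text."""
    return math.sqrt(abs(disc)) * ideal_norm / 2 ** r2

# K = Q(sqrt(2)): r1 = 2, r2 = 0, disc(K) = 8 (assumed input).
# sigma(1) = (1, 1) and sigma(sqrt(2)) = (sqrt(2), -sqrt(2)).
s = math.sqrt(2)
real_quadratic = covolume_2d((1.0, 1.0), (s, -s))

# K = Q(sqrt(-5)): r1 = 0, r2 = 1, disc(K) = -20 (assumed input).
# One complex embedding, written as (Re, Im):
# sigma(1) = (1, 0) and sigma(sqrt(-5)) = (0, sqrt(5)).
imag_quadratic = covolume_2d((1.0, 0.0), (0.0, math.sqrt(5)))

print(real_quadratic, predicted_covolume(8, 0))
print(imag_quadratic, predicted_covolume(-20, 1))
```

In both cases the directly computed determinant agrees with the formula; the factor \(2^{-r_2}\) is visible only in the imaginary quadratic example.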
The Minkowski Bound
We are now ready to state and prove the fundamental norm bound that underlies the finiteness of the class number. The idea is to choose a convex, symmetric set in Minkowski space whose volume is large enough to guarantee a nonzero lattice point via Minkowski’s theorem.
In particular, every ideal class of \(\mathcal{O}_K\) contains an ideal of norm at most
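\[ C_K = \left(\frac{2}{\pi}\right)^{r_2} \sqrt{|\operatorname{disc}(K)|}. \]
For \(t > 0\), consider the set \(S_t = \{(x_1, \ldots, x_{r_1}, z_{r_1+1}, \ldots, z_{r_1+r_2}) \in V_K \mid |x_i| \le t,\ |z_j| \le t\}\). This set is compact, convex, and symmetric about the origin. Its measure is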
\[ \mu(S_t) = 2^{r_1} \pi^{r_2} t^n. \]We choose \(t\) so that \(\mu(S_t) = 2^n \cdot d(\Lambda)\), where \(\Lambda = \sigma(A)\) has \(d(\Lambda) = 2^{-r_2} \sqrt{|\operatorname{disc}(K)|} \cdot N(A)\). This gives
\[ 2^{r_1} \pi^{r_2} t^n = 2^n \cdot 2^{-r_2} \sqrt{|\operatorname{disc}(K)|} \cdot N(A), \]so
\[ t = \left(\frac{2}{\pi}\right)^{r_2/n} \bigl(\sqrt{|\operatorname{disc}(K)|} \cdot N(A)\bigr)^{1/n}. \]Since \(S_t\) is compact, Minkowski’s theorem (Theorem 9.4 with \(m = 1\)) yields a nonzero lattice point \(\sigma(\alpha) \in S_t\) for some \(\alpha \in A\), \(\alpha \neq 0\). Then
\[ |N_{K/\mathbb{Q}}(\alpha)| = \prod_{i=1}^{r_1} |\sigma_i(\alpha)| \cdot \prod_{j=1}^{r_2} |\sigma_{r_1+j}(\alpha)|^2 \le t^{r_1} \cdot (t^2)^{r_2} = t^n = \left(\frac{2}{\pi}\right)^{r_2} \sqrt{|\operatorname{disc}(K)|} \cdot N(A). \]For the second statement, let \([I]\) be any ideal class, and let \(A\) be an ideal with \(IA \sim (1)\). By the above, there exists \(\alpha \in A\), \(\alpha \neq 0\), with \(|N(\alpha)| \le C_K \cdot N(A)\). Since \(\alpha \in A\), there is an ideal \(B\) with \(AB = (\alpha)\), so \(B \sim A^{-1} \sim I\). Since \(N(A) \cdot N(B) = |N(\alpha)| \le C_K \cdot N(A)\), we get \(N(B) \le C_K\). ∎
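This improved constant \((4/\pi)^{r_2} \cdot n!/n^n\) is never larger than the original \((2/\pi)^{r_2}\) for \(n \ge 2\). It is strictly smaller except for imaginary quadratic fields (\(n = 2\), \(r_2 = 1\)), where both constants equal \(2/\pi\).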
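To compare the two constants concretely, here is a small sketch that evaluates both bounds from the signature and discriminant. The function names and the sample fields are illustrative assumptions; in particular, the discriminant \(-283\) is that of the cubic field generated by a root of \(x^3 + 4x + 1\), computed from \(-4 \cdot 4^3 - 27 \cdot 1^2\), which is squarefree.

```python
import math

def basic_bound(r2, disc):
    """C_K = (2/pi)^{r2} * sqrt(|disc(K)|)."""
    return (2 / math.pi) ** r2 * math.sqrt(abs(disc))

def improved_bound(n, r2, disc):
    """(4/pi)^{r2} * n!/n^n * sqrt(|disc(K)|)."""
    return (4 / math.pi) ** r2 * math.factorial(n) / n ** n * math.sqrt(abs(disc))

# (label, degree n, r2, discriminant) for a few sample fields (assumed inputs).
samples = [
    ("Q(sqrt(2))", 2, 0, 8),
    ("Q(sqrt(-5))", 2, 1, -20),
    ("Q(alpha), alpha^3 + 4*alpha + 1 = 0", 3, 1, -283),
]

for label, n, r2, disc in samples:
    print(label, basic_bound(r2, disc), improved_bound(n, r2, disc))
```

Only ideals of norm up to the bound need to be examined when computing a class group, so the smaller constant shortens the search in the real quadratic and cubic examples.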
Application: Sums of Two Squares
As a beautiful application, we use Minkowski’s theorem to prove the classical result characterizing which primes are sums of two squares.
- \(\left(\frac{-1}{p}\right) = 1\).
- \(p \equiv 1 \pmod{4}\).
- There exist \(x, y \in \mathbb{Z}\) such that \(p = x^2 + y^2\).
(ii) \(\Rightarrow\) (iii) via Minkowski: Since \(p \equiv 1 \pmod{4}\), Euler's criterion gives \(\left(\frac{-1}{p}\right) = 1\), so there exists \(\ell \in \mathbb{Z}\) with \(\ell^2 \equiv -1 \pmod{p}\). Consider the lattice \(\Lambda \subseteq \mathbb{R}^2\) with \(\mathbb{Z}\)-basis \(\{(1, \ell), (0, p)\}\), so that \(d(\Lambda) = p\). Let \(S\) be the closed disc of radius \(r\) centered at the origin, where \(r^2 = 4p/\pi\), so that \(\mu(S) = \pi r^2 = 4p = 2^2 \cdot d(\Lambda)\). Since \(S\) is compact, convex, and symmetric, Minkowski’s theorem gives a nonzero lattice point \((m, m\ell + np) \in S\). Then
\[ 0 < m^2 + (m\ell + np)^2 \le r^2 < 2p. \]Working modulo \(p\), we have \(m^2 + (m\ell + np)^2 \equiv m^2 + m^2\ell^2 = m^2(1 + \ell^2) \equiv 0 \pmod{p}\). Since the quantity is strictly between \(0\) and \(2p\) and divisible by \(p\), it must equal \(p\). Thus \(p = m^2 + (m\ell + np)^2\).
(iii) \(\Rightarrow\) (ii): Every square is congruent to \(0\) or \(1\) modulo 4, so a sum of two squares is congruent to \(0\), \(1\), or \(2\) modulo 4. Since \(p = x^2 + y^2\) is odd, this forces \(p \equiv 1 \pmod{4}\). ∎
One can also give a proof using the Gaussian integers: since \(p \equiv 1 \pmod{4}\), the prime \(p\) splits in \(\mathbb{Z}[i]\) as \((p) = PQ\), where \(P, Q\) are conjugate prime ideals. Since \(\mathbb{Z}[i]\) is a PID, \(P = (a + bi)\), and then \(p = N(P) = a^2 + b^2\).
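The Minkowski argument is constructive enough to run for small primes. The sketch below is an illustration, not the notes' method: it replaces the existence statement by a brute-force search over the lattice with basis \(\{(1, \ell), (0, p)\}\), and the function names are hypothetical.

```python
def sqrt_minus_one(p):
    """Return l with l^2 = -1 (mod p) by brute force; needs p = 1 (mod 4)."""
    for l in range(1, p):
        if (l * l + 1) % p == 0:
            return l
    raise ValueError("-1 is not a square modulo p")

def two_squares(p):
    """Find (x, y) with x^2 + y^2 = p from a short vector (m, m*l + n*p)."""
    l = sqrt_minus_one(p)
    m = 1
    # A nonzero lattice point of squared length < 2p must have m != 0,
    # and by symmetry we may take m > 0.
    while m * m < 2 * p:
        # For fixed m, the best second coordinate is the residue of m*l
        # modulo p of least absolute value.
        y = (m * l) % p
        if y > p // 2:
            y -= p
        if m * m + y * y < 2 * p:
            # Divisible by p, positive, and below 2p: it equals p.
            return m, abs(y)
        m += 1
    raise AssertionError("no short vector found; is p a prime = 1 (mod 4)?")

for p in (5, 13, 17, 29, 97):
    x, y = two_squares(p)
    print(p, "=", x, "^2 +", y, "^2")
```

The search terminates because Minkowski's theorem guarantees a nonzero lattice point of squared length below \(2p\).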
Application: Lagrange’s Four Squares Theorem
The Minkowski-theoretic approach extends to prove that every positive integer is a sum of four squares. The key is to work in a four-dimensional lattice related to quaternions.
where
\[ c_1 = a_1b_1 - a_2b_2 - a_3b_3 - a_4b_4, \quad c_2 = a_1b_2 + a_2b_1 + a_3b_4 - a_4b_3, \]\[ c_3 = a_1b_3 - a_2b_4 + a_3b_1 + a_4b_2, \quad c_4 = a_1b_4 + a_2b_3 - a_3b_2 + a_4b_1. \]By this identity, to show every positive integer is a sum of four squares, it suffices to prove it for primes.
Claim: For any prime \(p\), there exist \(a, b \in \mathbb{Z}\) such that \(a^2 + b^2 \equiv -1 \pmod{p}\).
If \(p \equiv 1 \pmod{4}\), then \(-1\) is a square mod \(p\) and we may take \(b = 0\). If \(p \equiv 3 \pmod{4}\), the set \(\{y^2 + 1 \mid y \in \mathbb{F}_p\}\) has \(\frac{p+1}{2}\) elements (counting \(0^2 + 1 = 1\) and the \(\frac{p-1}{2}\) distinct nonzero squares, each giving a distinct value after adding 1), and none of them is \(0\) because \(-1\) is not a square mod \(p\). Since there are only \(\frac{p-1}{2}\) nonzero squares in \(\mathbb{F}_p\), some value \(y_0^2 + 1\) must be a non-square. As \(-1\) is also a non-square, the product \(-(y_0^2+1)\) is a square, so there exists \(x_0\) with \(x_0^2 \equiv -(y_0^2+1) \pmod{p}\), and we may take \(a = x_0\), \(b = y_0\). (For \(p = 2\) the result is trivial: \(2 = 1^2 + 1^2 + 0^2 + 0^2\).)
Now consider the lattice \(\Lambda \subseteq \mathbb{R}^4\) with basis
\[ v_1 = (1, 0, a, b), \quad v_2 = (0, 1, b, -a), \quad v_3 = (0, 0, p, 0), \quad v_4 = (0, 0, 0, p). \]Then \(d(\Lambda) = p^2\). Let \(S\) be a closed ball of radius \(r\), so that \(\mu(S) = \frac{\pi^2 r^4}{2}\). Choose \(r^2 = \frac{4\sqrt{2}\,p}{\pi}\) so that \(\mu(S) = 2^4 p^2 = 2^4 d(\Lambda)\); note that \(r^2 \approx 1.80\,p < 2p\). Since the ball is compact, Minkowski’s theorem yields a nonzero lattice point \((x, y, z, w) \in S\).
Writing \((x,y,z,w) = \alpha v_1 + \beta v_2 + \gamma v_3 + \delta v_4\), we have \(x = \alpha\), \(y = \beta\), \(z = a\alpha + b\beta + p\gamma\), \(w = b\alpha - a\beta + p\delta\). Working modulo \(p\):
\[ x^2 + y^2 + z^2 + w^2 \equiv \alpha^2 + \beta^2 + (a\alpha + b\beta)^2 + (b\alpha - a\beta)^2 \equiv (1 + a^2 + b^2)(\alpha^2 + \beta^2) \equiv 0 \pmod{p} \]since \(a^2 + b^2 \equiv -1\). Also \(0 < x^2 + y^2 + z^2 + w^2 \le r^2 < 2p\). Since the sum is positive, divisible by \(p\), and less than \(2p\), it must equal \(p\). ∎
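Both ingredients of the proof, the four-square product identity and the short lattice vector, can be checked by machine for small inputs. The following sketch uses brute-force search in place of Minkowski's existence argument; all function names are hypothetical and chosen for this illustration only.

```python
import math

def euler_product(a, b):
    """The components c_1..c_4 of the four-square identity from the text."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (
        a1 * b1 - a2 * b2 - a3 * b3 - a4 * b4,
        a1 * b2 + a2 * b1 + a3 * b4 - a4 * b3,
        a1 * b3 - a2 * b4 + a3 * b1 + a4 * b2,
        a1 * b4 + a2 * b3 - a3 * b2 + a4 * b1,
    )

def norm4(v):
    """Sum of the squares of the four coordinates."""
    return sum(t * t for t in v)

def minus_one_as_two_squares(p):
    """Brute-force (a, b) with a^2 + b^2 = -1 (mod p), as in the Claim."""
    for a in range(p):
        for b in range(p):
            if (a * a + b * b + 1) % p == 0:
                return a, b
    raise ValueError("no solution; is p prime?")

def centered(t, p):
    """Representative of t modulo p of least absolute value."""
    t %= p
    return t - p if t > p // 2 else t

def four_squares_prime(p):
    """Search the lattice spanned by v_1..v_4 for a vector of squared length p."""
    if p == 2:
        return (1, 1, 0, 0)
    a, b = minus_one_as_two_squares(p)
    bound = math.isqrt(2 * p)
    for x in range(0, bound + 1):
        for y in range(-bound, bound + 1):
            if (x, y) == (0, 0):
                continue  # then z, w are multiples of p: too long or zero
            z = centered(a * x + b * y, p)
            w = centered(b * x - a * y, p)
            if x * x + y * y + z * z + w * w == p:
                return (x, y, z, w)
    raise AssertionError("no short vector found; is p prime?")

print(four_squares_prime(23), euler_product((1, 2, 3, 4), (5, 6, 7, 8)))
```

A representation of a composite integer then follows by applying `euler_product` to representations of its prime factors.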
Chapter 10: Dirichlet’s Unit Theorem
The group of units \(\mathcal{O}_K^*\) of the ring of integers of a number field \(K\) is a fundamental invariant that controls many arithmetic properties, from the structure of principal ideals to the solvability of Diophantine equations. In \(\mathbb{Z}\), the unit group is simply \(\{\pm 1\}\), but for general number fields, the unit group can be much richer. Dirichlet’s Unit Theorem gives a complete description of this group: it is finitely generated, consisting of a finite torsion part (the roots of unity in \(K\)) and a free part whose rank is determined by the signature of \(K\).
Units in Imaginary Quadratic Fields
We begin with the simplest case. If \(K = \mathbb{Q}(\sqrt{d})\) with \(d < 0\), then \(K\) has no real embeddings (\(r_1 = 0\)) and one pair of complex embeddings (\(r_2 = 1\)). Dirichlet’s theorem predicts that the unit group has rank \(r_1 + r_2 - 1 = 0\), so the unit group consists entirely of roots of unity.
Units in Real Quadratic Fields
For \(K = \mathbb{Q}(\sqrt{d})\) with \(d > 0\), both embeddings are real (\(r_1 = 2\), \(r_2 = 0\)), and Dirichlet’s theorem predicts rank \(r_1 + r_2 - 1 = 1\). Thus \(\mathcal{O}_K^* \cong \{\pm 1\} \times \mathbb{Z}\), and the unit group is generated by \(-1\) and a single fundamental unit.
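Finding the fundamental unit amounts to solving the Pell equation \(x^2 - dy^2 = \pm 1\) (or \(x^2 - dy^2 = \pm 4\) when \(d \equiv 1 \pmod{4}\), where units may have half-integer coordinates), and we shall see in Appendix A that continued fractions provide an efficient algorithm for this.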
The Norm Criterion for Units
Before proving Dirichlet’s theorem in general, we record the basic characterization of units by their norm.
Conversely, if \(N(\alpha) = \pm 1\), then \(\prod_{i=1}^n \sigma_i(\alpha) = \pm 1\). The element \(\alpha^{-1} = \pm \prod_{i \neq 1} \sigma_i(\alpha)\) is a product of algebraic integers (the conjugates of \(\alpha\)), hence an algebraic integer. Since \(\alpha^{-1} \in K\), we have \(\alpha^{-1} \in \mathcal{O}_K\), so \(\alpha\) is a unit. ∎
The Logarithmic Embedding
The proof of Dirichlet’s Unit Theorem rests on converting the multiplicative structure of the unit group into an additive one using logarithms. The key construction is the logarithmic map.
defined by
\[ L(x_1, \ldots, x_{r_1}, z_{r_1+1}, \ldots, z_{r_1+r_2}) = (\log|x_1|, \ldots, \log|x_{r_1}|, 2\log|z_{r_1+1}|, \ldots, 2\log|z_{r_1+r_2}|). \]The factor of 2 in the complex places is natural: it accounts for the fact that each complex embedding contributes \(|z|^2\) to the norm. With this normalization, if \(\alpha \in K^*\) and \(L(\sigma(\alpha)) = (\ell_1, \ldots, \ell_{r_1+r_2})\), then
\[ \sum_{i=1}^{r_1+r_2} \ell_i = \log|N_{K/\mathbb{Q}}(\alpha)|. \]In particular, if \(\alpha \in \mathcal{O}_K^*\), then \(N(\alpha) = \pm 1\), so
\[ \ell_1 + \cdots + \ell_{r_1+r_2} = 0. \]Thus \(L(\sigma(\mathcal{O}_K^*))\) lies in the hyperplane \(H = \{x \in \mathbb{R}^{r_1+r_2} \mid \sum x_i = 0\}\), which has dimension \(r_1 + r_2 - 1\).
The map \(L\) is a group homomorphism from the multiplicative group \(V_K^*\) to the additive group \(\mathbb{R}^{r_1+r_2}\), because \(L(uv) = L(u) + L(v)\).
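As a numerical illustration of the identity \(\sum \ell_i = \log|N_{K/\mathbb{Q}}(\alpha)|\), the sketch below evaluates \(L\) on a few elements given by their embeddings. The sample elements (\(1 + \sqrt{2}\) and \(3 + \sqrt{2}\) in \(\mathbb{Q}(\sqrt{2})\), and \(2 + i\) in \(\mathbb{Q}(i)\)) and the function name are choices made for this example.

```python
import math

def log_embedding(real_images, complex_images):
    """Apply L: log|x| at real places, 2*log|z| at complex places.

    real_images: values sigma_1(alpha), ..., sigma_{r1}(alpha)
    complex_images: one value per conjugate pair of complex embeddings
    """
    return ([math.log(abs(x)) for x in real_images]
            + [2 * math.log(abs(z)) for z in complex_images])

s = math.sqrt(2)

# 1 + sqrt(2) has norm -1, so it is a unit and its image lies in H.
unit_image = log_embedding([1 + s, 1 - s], [])

# 3 + sqrt(2) has norm 9 - 2 = 7.
norm7_image = log_embedding([3 + s, 3 - s], [])

# 2 + i in Q(i) has norm 5; there is a single complex place.
gaussian_image = log_embedding([], [2 + 1j])

print(sum(unit_image), sum(norm7_image), sum(gaussian_image))
```

Up to rounding error, the three sums are \(0\), \(\log 7\), and \(\log 5\).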
Roots of Unity as the Kernel
Conversely, suppose \(\alpha \in \mathcal{O}_K\) with \(|\sigma_i(\alpha)| \le 1\) for all \(i\). The set of such elements is a bounded subset of the lattice \(\sigma(\mathcal{O}_K)\), hence finite. Now if \(\alpha \in \mathcal{O}_K^*\) lies in the kernel, then every power \(\alpha^k\) also satisfies \(|\sigma_i(\alpha^k)| = 1\) for all \(i\), so the powers \(\alpha^k\) lie in a finite set. Therefore \(\alpha^j = \alpha^k\) for some \(j \neq k\), giving \(\alpha^{j-k} = 1\), so \(\alpha\) is a root of unity. ∎
Dirichlet’s Unit Theorem
The proof proceeds in several steps. We must show that \(L(\sigma(\mathcal{O}_K^*))\) is a full-rank lattice in the hyperplane \(H\).
Step 1: \(L(U)\) is a discrete subgroup of \(H\).
Let \(B = \{(y_1, \ldots, y_{r_1+r_2}) \in \mathbb{R}^{r_1+r_2} \mid |y_i| \le b\}\) be an arbitrary hypercube. If \(L(\sigma(\alpha)) \in B\), then \(|\sigma_i(\alpha)| \le e^b\) for real embeddings and \(|\sigma_j(\alpha)| \le e^{b/2}\) for complex embeddings. Then the characteristic polynomial \(\prod_\sigma (t - \sigma(\alpha)) \in \mathbb{Z}[t]\) of \(\alpha\) (a power of its minimal polynomial) has coefficients bounded in terms of \(b\) and \(n\) alone. There are only finitely many such integer polynomials, hence only finitely many such \(\alpha\). Therefore \(L(U) \cap B\) is finite for every bounded \(B\), so \(L(U)\) is discrete.
Since \(L(U)\) is a discrete subgroup of the \((r_1+r_2-1)\)-dimensional space \(H\), it is a free abelian group of rank \(r \le r_1 + r_2 - 1\).
Step 2: \(G/U\) is compact, where \(G = \{v \in V_K^* \mid |N(v)| = 1\}\).
Consider \(G = \{v \in V_K^* \mid |N(v)| = 1\}\), which is a closed subgroup of \(V_K^*\). For any \(v \in G\), multiplication by \(v\) preserves the Lebesgue measure of regions in \(V_K\) (since \(|N(v)| = 1\) and \(v\) acts as a linear map with determinant \(\pm N(v)\)).
Let \(C \subseteq V_K\) be any compact, symmetric, convex region with \(\mu(C) \ge 2^n \cdot d(\sigma(\mathcal{O}_K))\). For any \(g \in G\), the scaled region \(g^{-1}C\) has the same measure and is still symmetric, compact, and convex. By Minkowski’s theorem, there exists \(0 \neq \alpha \in \mathcal{O}_K\) with \(\sigma(\alpha) \in g^{-1}C\).
Since \(C\) is compact, \(|N|\) is bounded on \(C\), say by \(M\). Because \(|N(g)| = 1\), every such \(\alpha\) satisfies \(|N(\alpha)| \le M\). There are only finitely many ideals of \(\mathcal{O}_K\) of norm at most \(M\), so only finitely many principal ideals \((\alpha)\) arise this way; let \(\alpha_1, \ldots, \alpha_m \in \mathcal{O}_K\) be generators of them. For any \(g \in G\), we have \(\sigma(\alpha) \in g^{-1}C\) for some \(\alpha\) with \((\alpha) = (\alpha_i)\), so \(\alpha = \alpha_i u\) for some unit \(u\), and hence \(g\sigma(u) \in \sigma(\alpha_i^{-1})C\), that is, \(gU \cap \sigma(\alpha_i^{-1})C \neq \emptyset\).
Therefore every coset in \(G/U\) meets the finite union \(\bigcup_{i=1}^m G \cap \sigma(\alpha_i^{-1})C\), which is compact because each piece is a closed subset of a compact set. Since \(G/U\) is the image of this compact set under the continuous quotient map \(G \to G/U\), it is compact.
Step 3: \(L(U)\) has rank \(r_1 + r_2 - 1\).
The map \(L : G \to H \cong \mathbb{R}^{r_1+r_2-1}\) is continuous and surjective. Since \(G/U\) is compact, its image \(L(G)/L(U) = H/L(U)\) is compact. Now \(L(U) \cong \mathbb{Z}^r\) for some \(r \le r_1+r_2-1\), and \(H/L(U) \cong (S^1)^r \times \mathbb{R}^{r_1+r_2-1-r}\). For this to be compact, we need \(r_1+r_2-1-r = 0\), i.e., \(r = r_1+r_2-1\).
Conclusion: The kernel of \(L \circ \sigma\) restricted to \(\mathcal{O}_K^*\) is \(\mu_K\) (Theorem 10.5), and the image \(L(U)\) is a free abelian group of rank \(r_1+r_2-1\). By the first isomorphism theorem,
\[ \mathcal{O}_K^*/\mu_K \cong L(U) \cong \mathbb{Z}^{r_1+r_2-1}. \]Since \(\mathbb{Z}^{r_1+r_2-1}\) is free, the sequence \(1 \to \mu_K \to \mathcal{O}_K^* \to \mathbb{Z}^{r_1+r_2-1} \to 0\) splits, giving \(\mathcal{O}_K^* \cong \mu_K \times \mathbb{Z}^{r_1+r_2-1}\) with \(\mu_K\) finite. ∎
Showing an Ideal Is Not Principal Using the Unit Group
Dirichlet’s theorem has practical consequences for computing class groups. Suppose we want to determine whether an ideal \(I\) of \(\mathcal{O}_K\) is principal. If \(I = (\alpha)\), then \(\alpha\) has norm \(\pm N(I)\). The image \(L(\sigma(\alpha))\) lies in the affine hyperplane \(\sum \ell_i = \log N(I)\). Since any two generators of \(I\) differ by a unit, and the image of the unit group is a lattice in \(H\), we can search a bounded fundamental domain for possible generators.
The element \(\alpha\) has \(N(\alpha) = -1\), so \(\alpha\) is a unit (in fact a fundamental unit). The real root of \(x^3 + 4x + 1\) is approximately \(-0.246\), so the complex roots have absolute value approximately \(2.015\). If \(P\) were principal with generator \(y\), then by translating by appropriate powers of the fundamental unit, we could arrange \(1 \le |y_1| \le 4\) (where \(y_1\) is the real embedding of \(y\)). The constraint \(|y_1| \cdot |y_2|^2 = 2\) then gives \(\frac{1}{\sqrt{2}} \le |y_2| \le \sqrt{2}\). A finite search through elements of \(\mathcal{O}_K\) in this bounded region of Minkowski space reveals that none have norm 2. Therefore \(P\) is not principal.
Chapter 11: Quadratic Reciprocity
Quadratic reciprocity is one of the crown jewels of number theory. The law, first conjectured by Euler and Legendre and proved by Gauss, governs when one prime is a quadratic residue modulo another. In this chapter, we give a proof of quadratic reciprocity that flows naturally from the algebraic number theory we have developed, using cyclotomic fields and the Frobenius automorphism.
The Legendre Symbol
Suppose we wish to determine when a quadratic equation \(x^2 \equiv a \pmod{p}\) has a solution for an odd prime \(p\). The Legendre symbol organizes this information.
The set \(H_p\) of quadratic residues modulo \(p\) (i.e., the set of nonzero squares in \(\mathbb{F}_p^*\)) is a subgroup of index 2 in \(\mathbb{F}_p^*\). The Legendre symbol defines a surjective group homomorphism \(\phi : \mathbb{F}_p^* \to \{\pm 1\}\) with kernel \(H_p\). This immediately gives the multiplicativity of the Legendre symbol and a formula using Euler’s criterion.
- (Multiplicativity) \(\displaystyle\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)\) for all \(a, b\) with \(p \nmid ab\).
- (Euler's criterion) \(\displaystyle\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod{p}\).
- (First supplement) \(\displaystyle\left(\frac{-1}{p}\right) = (-1)^{(p-1)/2} = \begin{cases} 1 & \text{if } p \equiv 1 \pmod{4}, \\ -1 & \text{if } p \equiv 3 \pmod{4}. \end{cases}\)
Quadratic Reciprocity via Cyclotomic Galois Theory
The algebraic number theory approach to quadratic reciprocity uses the Galois theory of cyclotomic fields. The key idea is to connect the Legendre symbol to the splitting behavior of primes, which in turn is governed by the Frobenius automorphism.
Let \(p\) be an odd prime, and consider the cyclotomic field \(\mathbb{Q}(\zeta_p)\). Its Galois group is
\[ \operatorname{Gal}(\mathbb{Q}(\zeta_p)/\mathbb{Q}) \cong (\mathbb{Z}/p\mathbb{Z})^* \]via \(\sigma_a : \zeta_p \mapsto \zeta_p^a\) for \(a \in (\mathbb{Z}/p\mathbb{Z})^*\).
for all \(\alpha \in \mathbb{Z}[\zeta_p]\).
That \(\operatorname{Frob}_q = \sigma_q\) is easily verified: for \(\alpha = \sum a_i \zeta_p^i\) with \(a_i \in \mathbb{Z}\),
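\[ \alpha^q \equiv \sum a_i^q \zeta_p^{qi} \equiv \sum a_i \zeta_p^{qi} = \sigma_q(\alpha) \pmod{\mathfrak{Q}} \]since \(\mathbb{Z}[\zeta_p]/\mathfrak{Q}\) has characteristic \(q\) (so the \(q\)-th power map is additive) and \(a_i^q \equiv a_i \pmod{q}\) by Fermat's little theorem.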
Now let \(H_p\) be the subgroup of squares in \((\mathbb{Z}/p\mathbb{Z})^*\). By the fundamental theorem of Galois theory, \(H_p\) corresponds to a unique quadratic subfield of \(\mathbb{Q}(\zeta_p)\). Define \(p^* = (-1)^{(p-1)/2} p\), and let \(K_p = \mathbb{Q}(\sqrt{p^*})\).
This is a standard result from the theory of Gauss sums, which shows that \(\sqrt{p^*}\) lies in \(\mathbb{Q}(\zeta_p)\).
The connection between the Legendre symbol and splitting is now clear:
\[ \left(\frac{q}{p}\right) = 1 \iff q \in H_p \iff \sigma_q \text{ fixes } K_p \iff q \text{ splits in } K_p. \]Equivalently,
\[ \left(\frac{q}{p}\right) = \left(\frac{p^*}{q}\right) \]where \(p^* = (-1)^{(p-1)/2} p\).
By the theory of splitting in quadratic fields, the prime \(q\) splits in \(\mathbb{Q}(\sqrt{p^*})\) if and only if \(p^*\) is a square modulo \(q\), i.e., \(\left(\frac{p^*}{q}\right) = 1\). Therefore
\[ \left(\frac{q}{p}\right) = \left(\frac{p^*}{q}\right) = \left(\frac{(-1)^{(p-1)/2} p}{q}\right) = \left(\frac{-1}{q}\right)^{(p-1)/2} \left(\frac{p}{q}\right) = (-1)^{\frac{(p-1)(q-1)}{4}} \left(\frac{p}{q}\right). \]Rearranging gives the classical form of quadratic reciprocity. ∎
Now \(\left(\frac{3}{11}\right) = -\left(\frac{11}{3}\right) = -\left(\frac{-1}{3}\right) = -(-1) = 1\) (using \(11 \equiv 2 \equiv -1 \pmod{3}\) and the first supplement). By the second supplement (or direct computation), \(\left(\frac{2}{11}\right)\) can be evaluated. Since \(11 \equiv 3 \pmod{8}\), we have \(\left(\frac{2}{11}\right) = -1\). Therefore \(\left(\frac{113}{17}\right) = (-1)(1) = -1\), so 113 is not a quadratic residue modulo 17.
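Computations like this one are easy to double-check with Euler's criterion. The sketch below is a minimal illustration (the function name `legendre` is a choice made here): it evaluates the symbol by modular exponentiation and confirms the values used above, together with the reciprocity law for a few small primes.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)   # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r

# The values appearing in the worked example.
print(legendre(3, 11), legendre(2, 11), legendre(113, 17))

# Quadratic reciprocity: (p/q)(q/p) = (-1)^{((p-1)/2)((q-1)/2)}.
small_primes = [3, 5, 7, 11, 13, 17, 19, 23]
for p in small_primes:
    for q in small_primes:
        if p != q:
            sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
            assert legendre(p, q) * legendre(q, p) == sign
```

Python's three-argument `pow` performs the exponentiation modulo \(p\), so the check stays fast even for large primes.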
Connection to the Frobenius Map and Splitting of Primes
The proof of quadratic reciprocity reveals a deep connection between the Legendre symbol and the arithmetic of primes in number fields. The Frobenius automorphism \(\operatorname{Frob}_q\) encodes the splitting behavior of the prime \(q\): it acts on the residue field \(\mathbb{Z}[\zeta_p]/\mathfrak{Q}\) as the Frobenius endomorphism \(x \mapsto x^q\).
Chapter 12: Fermat’s Last Theorem
Fermat’s Last Theorem, stating that the equation \(x^n + y^n = z^n\) has no solutions in positive integers for \(n \ge 3\), resisted proof for over 350 years until Andrew Wiles’s celebrated work in 1995. However, long before Wiles, Ernst Kummer made dramatic progress by proving the theorem for a large class of prime exponents, the so-called regular primes. Kummer’s approach, which uses the arithmetic of cyclotomic integers \(\mathbb{Z}[\zeta_p]\), was one of the driving forces behind the development of algebraic number theory.
In this chapter, we present Kummer’s proof for Case I of Fermat’s Last Theorem for regular primes.
Regular Primes
The ring \(\mathbb{Z}[\zeta_p]\) is not always a UFD, and this failure is measured by its class number \(h_p = h_{\mathbb{Q}(\zeta_p)}\). The crucial definition is:
The first few irregular primes are 37, 59, 67, and 101. It is conjectured that there are infinitely many regular primes (and in fact that about 60.65% of all primes are regular), but this remains unproven.
The importance of regularity for Fermat’s Last Theorem comes from the following observation: if \(p\) is regular and \(I\) is an ideal of \(\mathbb{Z}[\zeta_p]\) with \(I^p\) principal, then \(I\) itself is principal. This is because \([I]^p = 1\) in the class group, and since \(p \nmid h_p\), raising to the \(p\)-th power is an automorphism of the class group (by Lagrange’s theorem and the coprimality condition), so \([I] = 1\).
Kummer’s Approach: Factoring in \(\mathbb{Z}[\zeta_p]\)
The idea behind Kummer’s approach is elegant. Suppose \(x^p + y^p = z^p\) for integers \(x, y, z\) with \(p \nmid xyz\). In the ring \(\mathbb{Z}[\zeta_p]\), where \(\zeta = \zeta_p\) is a primitive \(p\)-th root of unity, we can factor
\[ z^p = x^p + y^p = \prod_{j=0}^{p-1} (x + \zeta^j y). \]If the ideals \((x + \zeta^j y)\) are pairwise coprime, then since their product is a \(p\)-th power of an ideal, each factor must itself be a \(p\)-th power. This is the starting point for Kummer’s argument.
Properties of \((1 - \zeta)\) in \(\mathbb{Z}[\zeta_p]\)
Before proceeding with the proof, we need several key properties of the element \(1 - \zeta\) in \(\mathbb{Z}[\zeta_p]\).
- The elements \(1 - \zeta, 1 - \zeta^2, \ldots, 1 - \zeta^{p-1}\) are associates.
- The element \(1 + \zeta\) is a unit.
- There exists a unit \(u \in \mathbb{Z}[\zeta]^*\) such that \(p = u(1-\zeta)^{p-1}\). In particular, \((1-\zeta)\) is the unique prime ideal of \(\mathbb{Z}[\zeta]\) lying above \(p\).
(ii) We have \(1 + \zeta = \frac{1 - \zeta^2}{1 - \zeta}\), and by part (i), \(1 - \zeta^2\) and \(1 - \zeta\) are associates, so their ratio is a unit.
(iii) The cyclotomic polynomial gives
\[ 1 + x + \cdots + x^{p-1} = \prod_{j=1}^{p-1} (x - \zeta^j). \]Setting \(x = 1\):
\[ p = \prod_{j=1}^{p-1} (1 - \zeta^j) = (1 - \zeta)^{p-1} \prod_{j=1}^{p-1} \frac{1 - \zeta^j}{1 - \zeta} = u(1-\zeta)^{p-1} \]where \(u = \prod_{j=1}^{p-1} \frac{1-\zeta^j}{1-\zeta} \in \mathbb{Z}[\zeta]^*\) by part (i). ∎
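Parts (ii) and (iii) can be checked numerically by taking products over all conjugates. The sketch below uses floating-point complex arithmetic, so it only confirms the identities up to rounding error; the function names are hypothetical.

```python
import cmath

def product_one_minus_zeta(p):
    """Numerically evaluate prod_{j=1}^{p-1} (1 - zeta^j) for zeta = e^{2 pi i/p}."""
    zeta = cmath.exp(2j * cmath.pi / p)
    prod = 1 + 0j
    for j in range(1, p):
        prod *= 1 - zeta ** j
    return prod

def norm_one_plus_zeta(p):
    """Numerically evaluate prod_{j=1}^{p-1} (1 + zeta^j), the norm of 1 + zeta."""
    zeta = cmath.exp(2j * cmath.pi / p)
    prod = 1 + 0j
    for j in range(1, p):
        prod *= 1 + zeta ** j
    return prod

for p in (3, 5, 7, 11):
    # The first product should be close to p; the second close to 1,
    # consistent with 1 + zeta being a unit for odd p.
    print(p, product_one_minus_zeta(p), norm_one_plus_zeta(p))
```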
We also need a fact about units and complex conjugation.
for every embedding \(\sigma\). An algebraic integer all of whose conjugates have absolute value 1 must be a root of unity. ∎
Kummer’s Theorem for Case I
Fermat’s Last Theorem is traditionally divided into two cases. Case I assumes that \(p \nmid xyz\), while Case II allows \(p\) to divide exactly one of \(x, y, z\). Kummer proved both cases for regular primes; we present Case I.
Step 1: The ideals \((x + \zeta^j y)\) are pairwise coprime.
Suppose \(\mathfrak{p}\) is a common prime factor of \((x + \zeta^j y)\) and \((x + \zeta^{j'} y)\) for \(j \neq j'\). Then \(\mathfrak{p}\) divides
\[ (x + \zeta^j y) - (x + \zeta^{j'} y) = \zeta^{j'} y (\zeta^{j-j'} - 1). \]By Lemma 12.2(i), \(\zeta^{j-j'} - 1\) is an associate of \(1 - \zeta\), so \(\mathfrak{p}\) divides \(y(1-\zeta)\). Since \((1-\zeta)\) is the unique prime above \(p\), either \(\mathfrak{p} = (1-\zeta)\) or \(\mathfrak{p}\) divides \((y)\).
If \(\mathfrak{p} = (1-\zeta)\), then since \(\mathfrak{p}\) divides \((x + \zeta^j y)\), which in turn divides \((z)^p\), the prime \(\mathfrak{p}\) divides \((z)\). As \(\mathfrak{p} \cap \mathbb{Z} = p\mathbb{Z}\), this gives \(p \mid z\), contradicting \(p \nmid z\). If instead \(\mathfrak{p}\) divides \((y)\), then from \(x + \zeta^j y \in \mathfrak{p}\) we also get \(x \in \mathfrak{p}\), so \(\mathfrak{p}\) contains both \(x\) and \(y\). This is impossible when \(x\) and \(y\) are coprime, which we may assume after dividing out any common factor of \(x, y, z\).
Step 2: Each \((x + \zeta^j y)\) is a \(p\)-th power of an ideal.
Since the ideals \((x + \zeta^j y)\) are pairwise coprime and their product \((z^p) = (z)^p\) is a \(p\)-th power, unique factorization of ideals forces each \((x + \zeta^j y) = I_j^p\) for some ideal \(I_j\).
Step 3: Each \(I_j\) is principal (by regularity).
Since \(p\) is regular, \(p \nmid h_p\). The class \([I_j]\) satisfies \([I_j]^p = 1\) in the class group. Since \(\gcd(p, h_p) = 1\), the only element of order dividing \(p\) in the class group (which has order \(h_p\)) is the identity. So \([I_j] = 1\), meaning \(I_j\) is principal.
Step 4: Derive a contradiction.
Taking \(j = 1\), we have \((x + \zeta y) = (t)^p\) for some \(t \in \mathbb{Z}[\zeta]\), so \(x + \zeta y = ut^p\) for some unit \(u\).
Write \(t = b_0 + b_1\zeta + \cdots + b_{p-2}\zeta^{p-2}\). Working modulo the ideal \((p) = (1-\zeta)^{p-1}\), the Frobenius-type relation gives
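\[ t^p \equiv b_0^p + b_1^p \zeta^p + \cdots + b_{p-2}^p \zeta^{(p-2)p} = b_0^p + b_1^p + \cdots + b_{p-2}^p \pmod{p}, \]since the \(p\)-th power map is additive modulo \(p\) and \(\zeta^p = 1\). The right-hand side is a rational integer, so it is fixed by complex conjugation; applying the same computation to \(\bar{t}\) gives \(\bar{t}^p \equiv b_0^p + \cdots + b_{p-2}^p \pmod{p}\), so \(t^p \equiv \bar{t}^p \pmod{p}\).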
By Lemma 12.3, \(u/\bar{u} = \pm \zeta^j\) for some \(j\). Consider the case \(u/\bar{u} = \zeta^j\). Then
\[ x + y\zeta = ut^p \equiv \zeta^j \bar{u} \bar{t}^p = \zeta^j \overline{ut^p} = \zeta^j(x + y\bar{\zeta}) = \zeta^j(x + y\zeta^{-1}) \pmod{p}. \]This gives
\[ x + y\zeta - y\zeta^{j-1} - x\zeta^j \equiv 0 \pmod{p}. \]Since \(1, \zeta, \zeta^2, \ldots, \zeta^{p-2}\) form a basis for \(\mathbb{Z}[\zeta]\) over \(\mathbb{Z}\), and modulo \(p\) the ring \(\mathbb{Z}[\zeta]/(p) \cong \mathbb{F}_p[x]/(x-1)^{p-1}\), this linear combination being zero modulo \(p\) with the terms \(1, \zeta, \zeta^{j-1}, \zeta^j\) forces \(j \in \{0, 1, 2, p-1\}\) (otherwise we would have a nontrivial relation among fewer than \(p-1\) distinct basis elements). Each possibility leads to \(p \mid x\), \(p \mid y\), or \(p \mid z\), contradicting our assumption \(p \nmid xyz\). ∎
The Hilbert Class Field
Kummer’s work highlighted the role of the class group in controlling arithmetic. A deep generalization is provided by class field theory.
- \([E:K] = h_K\).
- \(E/K\) is Galois, and \(\operatorname{Gal}(E/K) \cong \operatorname{Cl}(\mathcal{O}_K)\).
- Every ideal of \(\mathcal{O}_K\) becomes principal in \(\mathcal{O}_E\).
- Every prime ideal \(\mathfrak{P}\) of \(\mathcal{O}_K\) decomposes into the product of \(h_K/f\) prime ideals in \(\mathcal{O}_E\), where \(f\) is the order of \([\mathfrak{P}]\) in \(\operatorname{Cl}(\mathcal{O}_K)\).
- The extension \(E/K\) is unramified at all primes (both finite and infinite).
The existence of the Hilbert class field was conjectured by Hilbert and proved by Furtwängler. It is the maximal abelian unramified extension of \(K\), and its construction is one of the cornerstones of class field theory. While the full proof lies well beyond the scope of these notes, the statement itself illuminates the deep connection between ideal class groups and Galois theory.
Appendix A: Continued Fractions
Continued fractions provide a systematic way to produce the best rational approximations to real numbers. For quadratic irrationals, the continued fraction expansion is eventually periodic, a fact that connects directly to the computation of fundamental units in real quadratic fields and to the solution of Pell’s equation. This appendix develops the theory from scratch and concludes with explicit computations.
Finite and Infinite Continued Fractions
When \(a_0 \in \mathbb{Z}\) and \(a_k \in \mathbb{Z}^+\) for \(k \ge 1\), the infinite continued fraction is
\[ [a_0, a_1, a_2, \ldots] = \lim_{n \to \infty} [a_0, a_1, \ldots, a_n]. \]with \(0 = r_n < r_{n-1} < \cdots < r_1 < b\). Then
\[ x = \frac{a}{b} = q_1 + \frac{r_1}{b} = q_1 + \cfrac{1}{b/r_1} = q_1 + \cfrac{1}{q_2 + \cfrac{r_2}{r_1}} = \cdots = [q_1, q_2, \ldots, q_n]. \]∎
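The Euclidean-algorithm argument translates directly into a short program. The following is a minimal sketch with hypothetical function names: it reads off the quotients \(q_1, \ldots, q_n\) and then rebuilds the fraction exactly from them.

```python
from fractions import Fraction

def rational_cf(a, b):
    """Continued fraction of a/b (b > 0) from the Euclidean algorithm."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)   # a = q*b + r with 0 <= r < b
        quotients.append(q)
        a, b = b, r
    return quotients

def evaluate_cf(quotients):
    """Evaluate [q_1, ..., q_n] exactly, working from the innermost term out."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value

cf = rational_cf(43, 19)
print(cf, evaluate_cf(cf))   # [2, 3, 1, 4] 43/19
```

Using `Fraction` keeps the round trip exact, so no floating-point error enters the comparison.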
Convergents and Their Recursive Computation
The power of continued fractions lies in the elegant recursive formulas for the successive approximations.
where \(p_n\) and \(q_n\) are defined by the recursion:
\[ p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_k = a_k p_{k-1} + p_{k-2} \quad (k \ge 2), \]\[ q_0 = 1, \quad q_1 = a_1, \quad q_k = a_k q_{k-1} + q_{k-2} \quad (k \ge 2). \]For the inductive step, suppose the formula holds for continued fractions of length \(k\). Then
\[ c_{k+1} = [a_0, a_1, \ldots, a_k, a_{k+1}] = \left[a_0, a_1, \ldots, a_{k-1}, a_k + \frac{1}{a_{k+1}}\right]. \]By the inductive hypothesis (applied with \(a_k' = a_k + 1/a_{k+1}\) in the last position, noting that \(p_i' = p_i\) and \(q_i' = q_i\) for \(i < k\)):
\[ c_{k+1} = \frac{a_k' p_{k-1} + p_{k-2}}{a_k' q_{k-1} + q_{k-2}} = \frac{(a_k + 1/a_{k+1}) p_{k-1} + p_{k-2}}{(a_k + 1/a_{k+1}) q_{k-1} + q_{k-2}}. \]Multiplying numerator and denominator by \(a_{k+1}\):
\[ c_{k+1} = \frac{a_{k+1}(a_k p_{k-1} + p_{k-2}) + p_{k-1}}{a_{k+1}(a_k q_{k-1} + q_{k-2}) + q_{k-1}} = \frac{a_{k+1} p_k + p_{k-1}}{a_{k+1} q_k + q_{k-1}} = \frac{p_{k+1}}{q_{k+1}}. \]∎
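The recursion is straightforward to implement. In the sketch below, the function name is hypothetical, and the auxiliary starting values \(p_{-1} = 1\), \(q_{-1} = 0\) are a convention introduced here so that the same update rule also produces \(p_1 = a_1 a_0 + 1\) and \(q_1 = a_1\); they are not part of the statement above.

```python
def convergents(a):
    """Return [(p_0, q_0), (p_1, q_1), ...] for the partial quotients a."""
    p_prev, p = 1, a[0]   # p_{-1} = 1 (convention assumed here), p_0 = a_0
    q_prev, q = 0, 1      # q_{-1} = 0 (convention assumed here), q_0 = 1
    result = [(p, q)]
    for a_k in a[1:]:
        # p_k = a_k p_{k-1} + p_{k-2}, and likewise for q_k.
        p_prev, p = p, a_k * p + p_prev
        q_prev, q = q, a_k * q + q_prev
        result.append((p, q))
    return result

print(convergents([2, 3, 1, 4]))       # [(2, 1), (7, 3), (9, 4), (43, 19)]
print(convergents([1, 2, 2, 2, 2]))    # [(1, 1), (3, 2), (7, 5), (17, 12), (41, 29)]
```

The second call lists the first convergents of \([1, 2, 2, 2, \ldots]\), the expansion of \(\sqrt{2}\).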
Properties of Convergents
The convergents satisfy beautiful arithmetic properties that make them the best possible rational approximations.
- For all \(k \ge 0\): \(p_{k+1} q_k - q_{k+1} p_k = (-1)^k\).
- For all \(k \ge 0\): \(\gcd(p_k, q_k) = 1\).
- For all \(k \ge 0\): \(\displaystyle c_{k+1} - c_k = \frac{(-1)^k}{q_{k+1} q_k}\).
- The sequence \(\{c_n\}\) converges.
- If \(x = [a_0, a_1, a_2, \ldots]\), then \(c_{2k} < x < c_{2k+1}\) for all \(k \ge 0\).
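By induction, \(p_k q_{k-1} - q_k p_{k-1} = (-1)^{k-1}\), so the recursion gives \(p_{k+1} q_k - q_{k+1} p_k = (a_{k+1} p_k + p_{k-1}) q_k - (a_{k+1} q_k + q_{k-1}) p_k = -(p_k q_{k-1} - q_k p_{k-1}) = (-1)^k\).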
(ii) From (i), \(p_{k+1} q_k - q_{k+1} p_k = (-1)^k\), so \(\gcd(p_k, q_k) \mid (-1)^k\), giving \(\gcd(p_k, q_k) = 1\).
(iii) Using (i):
\[ c_{k+1} - c_k = \frac{p_{k+1}}{q_{k+1}} - \frac{p_k}{q_k} = \frac{p_{k+1} q_k - q_{k+1} p_k}{q_{k+1} q_k} = \frac{(-1)^k}{q_{k+1} q_k}. \](iv) and (v): The \(q_k\) are positive integers that are strictly increasing for \(k \ge 1\), so \(q_k \to \infty\). By (iii), the differences \(c_{k+1} - c_k\) therefore alternate in sign and decrease in absolute value to \(0\). By the alternating series test, the series \(c_0 + \sum_{k=0}^\infty (c_{k+1} - c_k)\) converges, and the even convergents \(c_0 < c_2 < c_4 < \cdots\) increase to the limit while the odd convergents \(c_1 > c_3 > c_5 > \cdots\) decrease to it. ∎
Irrational Numbers and Infinite Continued Fractions
This gives \(s > q_k\) for all \(k\), contradicting \(q_k \to \infty\).
Uniqueness of partial quotients: If \(x = [a_0, a_1, a_2, \ldots]\), then \(a_0 < x < a_0 + 1\) (by Theorem A.5(v) with the fact that \([a_1, a_2, \ldots] > a_1 \ge 1\)), so \(a_0 = \lfloor x \rfloor = \lfloor x_0 \rfloor\). Then \(x_1 = 1/(x - a_0) = [a_1, a_2, \ldots]\), and by induction \(a_n = \lfloor x_n \rfloor\) and \(x_n = [a_n, a_{n+1}, \ldots]\).
Every irrational has a continued fraction expansion: Given \(x \in \mathbb{R} \setminus \mathbb{Q}\), define \(x_0 = x\), \(a_k = \lfloor x_k \rfloor\), and \(x_{k+1} = 1/(x_k - a_k)\). Since \(x_0\) is irrational, \(x_k - a_k\) is never zero, and each \(x_k\) is irrational with \(a_k \ge 1\) for \(k \ge 1\). One verifies that \(x = [a_0, a_1, \ldots, a_n, x_{n+1}]\) for all \(n\), so
\[ |x - c_n| = \frac{1}{q_n(x_{n+1} q_n + q_{n-1})} < \frac{1}{q_n^2} \to 0, \]giving \(x = \lim c_n = [a_0, a_1, a_2, \ldots]\). ∎
Best Approximation Property
Convergents are optimal rational approximations in a strong sense.
- If \(|sx - r| < |q_k x - p_k|\), then \(s \ge q_{k+1}\).
- If \(|x - r/s| < |x - p_k/q_k|\), then \(s > q_k\).
- If \(|x - r/s| < 1/(2s^2)\), then \(r/s = c_k\) for some \(k\).
One finds \(u, v \in \mathbb{Z}\) with \(u \neq 0\) (since \(u = 0\) would give \(s = vq_{k+1} \ge q_{k+1}\)) and \(v \neq 0\) (since \(v = 0\) would give \(r = up_k\), \(s = uq_k\), and hence \(|sx - r| = |u||q_k x - p_k| \ge |q_k x - p_k|\)). Moreover, \(u\) and \(v\) have opposite signs (since \(s = uq_k + vq_{k+1} > 0\) and \(s < q_{k+1}\)).
Since \(x\) lies between \(c_k\) and \(c_{k+1}\), the quantities \(q_k x - p_k\) and \(q_{k+1}x - p_{k+1}\) have opposite signs. Combined with \(u\) and \(v\) having opposite signs, the products \(u(q_k x - p_k)\) and \(v(q_{k+1}x - p_{k+1})\) have the same sign, so
\[ |sx - r| = |u(q_k x - p_k) + v(q_{k+1}x - p_{k+1})| = |u||q_k x - p_k| + |v||q_{k+1}x - p_{k+1}| > |q_k x - p_k|, \]contradicting our assumption.
(ii) If \(|x - r/s| < |x - p_k/q_k|\) and \(s \le q_k\), then \(|sx - r| = s|x - r/s| < s|x - p_k/q_k| \le q_k|x - p_k/q_k| = |q_k x - p_k|\), contradicting (i) (which gives \(s \ge q_{k+1} > q_k\)).
(iii) Given \(|x - r/s| < 1/(2s^2)\), choose \(k\) with \(q_k \le s < q_{k+1}\). If \(r/s \neq p_k/q_k\), then \(|r/s - p_k/q_k| \ge 1/(sq_k)\) (since \(rq_k - sp_k\) is a nonzero integer). The triangle inequality gives
\[ \frac{1}{sq_k} \le \left|\frac{r}{s} - \frac{p_k}{q_k}\right| \le \left|\frac{r}{s} - x\right| + \left|x - \frac{p_k}{q_k}\right| < \frac{1}{2s^2} + \frac{1}{2sq_k}, \]where the last inequality uses (i) applied to \(p_k/q_k\). Subtracting \(1/(2sq_k)\) gives \(1/(2sq_k) < 1/(2s^2)\), so \(s < q_k\), contradicting \(q_k \le s\). ∎
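Part (iii) can be checked by brute force in a small range. The sketch below is a numerical sanity check rather than a proof: it takes \(x = \sqrt{2} = [1, 2, 2, 2, \ldots]\) (an example chosen by us), uses floating-point comparisons, and the variable names are ours.

```python
from fractions import Fraction
from math import sqrt

x = sqrt(2)                       # sqrt(2) = [1, 2, 2, 2, ...]

# Convergents p_k/q_k from p_k = a_k p_{k-1} + p_{k-2} (same for q_k),
# with a_k = 2 for every k >= 1.
convergent_set = set()
p_prev, q_prev, p, q = 1, 0, 1, 1   # (p_{-1}, q_{-1}), (p_0, q_0)
while q <= 200:
    convergent_set.add(Fraction(p, q))
    p, p_prev = 2 * p + p_prev, p
    q, q_prev = 2 * q + q_prev, q

# Collect every r/s with s <= 200 and |x - r/s| < 1/(2 s^2).
close = set()
for s in range(1, 201):
    r = round(s * x)              # only the nearest integer can qualify
    if abs(x - r / s) < 1 / (2 * s * s):
        close.add(Fraction(r, s))

all_are_convergents = close <= convergent_set
```

Every fraction that passes the \(1/(2s^2)\) test in this range, such as \(7/5\) and \(17/12\), is a convergent of \(\sqrt{2}\), as part (iii) predicts.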
Quadratic Irrationals and Periodic Continued Fractions
The connection between continued fractions and algebraic number theory runs through quadratic irrationals.
Conversely, if \(x\) is irrational and satisfies \(ax^2 + bx + c = 0\) with \(a, b, c \in \mathbb{Z}\), \(a \neq 0\), then \(x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\). Setting \(d = b^2 - 4ac\) (which must be positive and non-square since \(x\) is real and irrational), we get the desired form. ∎
The following celebrated theorem of Lagrange characterizes quadratic irrationals by the periodicity of their continued fraction expansions.
- The sequence \(\{a_n\}\) is eventually periodic if and only if \(x\) is a quadratic irrational.
- The sequence \(\{a_n\}\) is purely periodic if and only if \(x\) is a quadratic irrational with \(x > 1\) and \(-1 < \bar{x} < 0\), where \(\bar{x}\) denotes the conjugate of \(x\).
We write \(\overline{a_1, \ldots, a_\ell}\) for the repeating block, so that \([a_0, \overline{a_1, \ldots, a_\ell}]\) means the sequence \(a_1, \ldots, a_\ell\) repeats indefinitely.
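As a small illustration of the purely periodic case (an example of ours, not taken from the notes), let \(x = [\overline{1}] = [1, 1, 1, \ldots]\). Then \(x = 1 + 1/x\), so
\[ x^2 - x - 1 = 0, \qquad x = \frac{1 + \sqrt{5}}{2} \approx 1.618 > 1, \qquad \bar{x} = \frac{1 - \sqrt{5}}{2} \approx -0.618 \in (-1, 0), \]
which agrees with the criterion \(x > 1\) and \(-1 < \bar{x} < 0\) for pure periodicity.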
Computing \(\sqrt{d}\) as a Continued Fraction
For a positive non-square integer \(d\), the continued fraction of \(\sqrt{d}\) has a particularly nice structure.
- \(a_\ell = 2a_0\), so that \(\sqrt{d} = [a_0, \overline{a_1, a_2, \ldots, a_{\ell-1}, 2a_0}]\).
- The sequence \(\{s_n\}\) is purely periodic with \(s_k = 1 \iff \ell \mid k\).
- For all \(k \ge 0\): \(p_k^2 - dq_k^2 = (-1)^{k+1} s_{k+1}\).
- The smallest unit \(u\) in \(\mathbb{Z}[\sqrt{d}]\) with \(u > 1\) equals \(u = p_{\ell-1} + q_{\ell-1}\sqrt{d}\), and \(u^m = p_{m\ell-1} + q_{m\ell-1}\sqrt{d}\) for all \(m \in \mathbb{Z}^+\).
Part (iv) is the key connection to number theory: the continued fraction algorithm directly computes the fundamental unit.
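The expansion of \(\sqrt{d}\) can be computed with integer arithmetic only. The sketch below assumes that \(s_k\) is the denominator in \(x_k = (t_k + \sqrt{d})/s_k\), which matches the tables in the worked examples; the symbol \(t_k\), the recurrences in the comments (standard, but not stated in this section), and the function names `sqrt_cf` and `fundamental_unit` are our own choices.

```python
from math import isqrt

def sqrt_cf(d):
    """Return (a_0, period) with sqrt(d) = [a_0, overline(period)].

    Tracks x_k = (t_k + sqrt(d)) / s_k using integers only.
    Assumes d is a positive non-square integer.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    t, s, a = 0, 1, a0
    period = []
    while True:
        t = a * s - t                # t_{k+1} = a_k s_k - t_k
        s = (d - t * t) // s         # s_{k+1} = (d - t_{k+1}^2) / s_k, exact
        a = (a0 + t) // s            # a_{k+1} = floor(x_{k+1})
        period.append(a)
        if s == 1:                   # s_k = 1 marks the end of a period
            return a0, period

def fundamental_unit(d):
    """Return (p, q) = (p_{l-1}, q_{l-1}), so the unit is p + q*sqrt(d)."""
    a0, period = sqrt_cf(d)
    p_prev, q_prev, p, q = 1, 0, a0, 1
    for a in period[:-1]:            # a_1, ..., a_{l-1}
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return p, q
```

For example, `sqrt_cf(14)` returns `(3, [1, 2, 1, 6])` and `fundamental_unit(14)` returns `(15, 4)`, that is, the unit \(15 + 4\sqrt{14}\). The last entry of each period is \(2a_0\), as in part (i).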
Connection to Pell’s Equation
Case 1: \(0 < r^2 - ds^2 \le \sqrt{d}\). Then \(r > s\sqrt{d}\), and
\[ 0 < \frac{r}{s} - \sqrt{d} = \frac{r^2 - ds^2}{s(r + s\sqrt{d})} < \frac{\sqrt{d}}{s(s\sqrt{d} + s\sqrt{d})} = \frac{1}{2s^2}, \]where the strict inequality holds because \(r > s\sqrt{d}\). By Theorem A.7(iii), \(r/s\) is a convergent of \(\sqrt{d}\).
Case 2: \(-\sqrt{d} < r^2 - ds^2 < 0\). Then \(r < s\sqrt{d}\), so \(r/\sqrt{d} < s\), and
\[ 0 < \frac{s}{r} - \frac{1}{\sqrt{d}} = \frac{s\sqrt{d} - r}{r\sqrt{d}} = \frac{ds^2 - r^2}{r\sqrt{d}(s\sqrt{d} + r)} < \frac{\sqrt{d}}{r\sqrt{d}(r + r)} = \frac{1}{2r^2}. \]By Theorem A.7(iii), \(s/r\) is a convergent of \(1/\sqrt{d}\). The convergents of \(1/\sqrt{d}\) are the reciprocals of the convergents of \(\sqrt{d}\): if \(x = [a_0, a_1, \ldots] > 1\), then \(1/x = [0, a_0, a_1, \ldots]\), so the \(n\)th convergent \(d_n\) of \(1/x\) satisfies \(d_n = 1/c_{n-1}\) for \(n \ge 1\). It follows that \(r/s\) is a convergent of \(\sqrt{d}\). ∎
This corollary, combined with Theorem A.11(iii) and (iv), shows that solutions to Pell’s equation \(x^2 - dy^2 = \pm 1\) correspond precisely to convergents \(p_n/q_n\) where \(s_{n+1} = 1\), and the fundamental solution is \((p_{\ell-1}, q_{\ell-1})\).
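Because every solution of \(x^2 - dy^2 = 1\) arises from a convergent, the smallest positive solution can be found by walking through the convergents of \(\sqrt{d}\) until the norm equals \(1\). The sketch below does this with integer arithmetic; the function name `pell_fundamental` and the auxiliary symbol `t` (with \(x_k = (t_k + \sqrt{d})/s_k\)) are our own assumptions, not notation from the notes.

```python
from math import isqrt

def pell_fundamental(d):
    """Smallest positive (x, y) with x^2 - d*y^2 = 1, found among convergents.

    Assumes d is a positive non-square integer.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    t, s, a = 0, 1, a0                    # x_0 = (0 + sqrt(d)) / 1
    p_prev, q_prev, p, q = 1, 0, a0, 1    # (p_{-1}, q_{-1}), (p_0, q_0)
    while p * p - d * q * q != 1:
        t = a * s - t
        s = (d - t * t) // s              # exact integer division
        a = (a0 + t) // s                 # next partial quotient
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return p, q
```

For \(d = 2\) the period length is odd, so \((p_{\ell-1}, q_{\ell-1}) = (1, 1)\) has norm \(-1\) and the loop continues to \((3, 2)\), which corresponds to \((1 + \sqrt{2})^2 = 3 + 2\sqrt{2}\).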
Worked Example: \(\sqrt{14}\)
| \(k\) | \(x_k\) | \(a_k\) |
|---|---|---|
| 0 | \(\sqrt{14}\) | 3 |
| 1 | \(\frac{\sqrt{14}+3}{5}\) | 1 |
| 2 | \(\frac{\sqrt{14}+2}{2}\) | 2 |
| 3 | \(\frac{\sqrt{14}+2}{5}\) | 1 |
| 4 | \(\frac{\sqrt{14}+3}{1}\) | 6 |
| 5 | \(\frac{\sqrt{14}+3}{5}\) | 1 |
Since \(x_5 = x_1\), the sequence is periodic with period \(\ell = 4\):
\[ \sqrt{14} = [3, \overline{1, 2, 1, 6}]. \]The convergents are \(3, 4, 11/3, 15/4, 101/27, \ldots\). Since \(\ell = 4\), the fundamental solution to \(x^2 - 14y^2 = 1\) comes from \((p_3, q_3) = (15, 4)\), and indeed \(15^2 - 14 \cdot 4^2 = 225 - 224 = 1\). The fundamental unit in \(\mathbb{Z}[\sqrt{14}]\) is \(15 + 4\sqrt{14}\).
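Each entry \(x_{k+1} = 1/(x_k - a_k)\) in the table for \(\sqrt{14}\) is obtained by rationalizing the denominator. Written out for the first two steps:
\[ x_1 = \frac{1}{\sqrt{14} - 3} = \frac{\sqrt{14} + 3}{14 - 9} = \frac{\sqrt{14} + 3}{5}, \qquad x_2 = \frac{1}{x_1 - 1} = \frac{5}{\sqrt{14} - 2} = \frac{5(\sqrt{14} + 2)}{14 - 4} = \frac{\sqrt{14} + 2}{2}. \]
Since \(3 < \sqrt{14} < 4\), we have \(6/5 < x_1 < 7/5\) and \(5/2 < x_2 < 3\), which gives \(a_1 = 1\) and \(a_2 = 2\).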
Worked Example: \(\sqrt{19}\)
| \(k\) | \(x_k\) | \(a_k\) | \(p_k\) | \(q_k\) | \(N_k = p_k^2 - 19q_k^2\) |
|---|---|---|---|---|---|
| 0 | \(\sqrt{19}\) | 4 | 4 | 1 | \(-3\) |
| 1 | \(\frac{\sqrt{19}+4}{3}\) | 2 | 9 | 2 | 5 |
| 2 | \(\frac{\sqrt{19}+2}{5}\) | 1 | 13 | 3 | \(-2\) |
| 3 | \(\frac{\sqrt{19}+3}{2}\) | 3 | 48 | 11 | 5 |
| 4 | \(\frac{\sqrt{19}+3}{5}\) | 1 | 61 | 14 | \(-3\) |
| 5 | \(\frac{\sqrt{19}+2}{3}\) | 2 | 170 | 39 | 1 |
| 6 | \(\frac{\sqrt{19}+4}{1}\) | 8 | | | |
Since \(x_6\) yields \(a_6 = 8 = 2a_0\) and then the pattern repeats, the period is \(\ell = 6\):
\[ \sqrt{19} = [4, \overline{2, 1, 3, 1, 2, 8}]. \]The norms \(N_k = p_k^2 - 19q_k^2\) agree with Theorem A.11(iii) in the form \(N_k = (-1)^{k+1} s_{k+1}\): for instance, \(N_0 = -3 = -s_1\) and \(N_1 = 5 = s_2\). In particular, \(N_5 = 1\) corresponds to \(s_6 = 1\), i.e., \(\ell \mid 6\).
The fundamental unit in \(\mathbb{Z}[\sqrt{19}]\) with \(u > 1\) is
\[ u = p_5 + q_5\sqrt{19} = 170 + 39\sqrt{19}. \]One verifies: \(170^2 - 19 \cdot 39^2 = 28900 - 28899 = 1\). Since \(\ell = 6\) is even, the norm of the fundamental unit is \(+1\), meaning Pell’s equation \(x^2 - 19y^2 = -1\) has no solution.
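The columns of the \(\sqrt{19}\) table can be regenerated with a few lines of integer arithmetic. This is a sketch under the assumption that \(x_k = (t_k + \sqrt{19})/s_k\), where \(t_k\) is our own auxiliary symbol; it records \(s_{k+1}\) beside each norm so the two can be compared.

```python
from math import isqrt

d = 19
a0 = isqrt(d)
t, s, a = 0, 1, a0                   # x_0 = (0 + sqrt(d)) / 1
p_prev, q_prev, p, q = 1, 0, a0, 1   # (p_{-1}, q_{-1}), (p_0, q_0)

rows = []                            # entries (k, a_k, p_k, q_k, N_k, s_{k+1})
for k in range(6):
    t = a * s - t                    # numerator term t_{k+1} of x_{k+1}
    s = (d - t * t) // s             # denominator s_{k+1} of x_{k+1}
    rows.append((k, a, p, q, p * p - d * q * q, s))
    a = (a0 + t) // s                # a_{k+1} = floor(x_{k+1})
    p, p_prev = a * p + p_prev, p
    q, q_prev = a * q + q_prev, q

for row in rows:
    print(row)
```

The norms come out as \(-3, 5, -2, 5, -3, 1\), alternating in sign with \(|N_k| = s_{k+1}\), and the last row gives \((p_5, q_5) = (170, 39)\).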