CS 245E: Logic and Computation (Enriched)

Eric Blais

These notes synthesize and enrich material from the primary sources listed below. The goal is to present mathematical logic as a living, unified subject — from the syntax of formal languages through soundness and completeness theorems to Gödel’s incompleteness results and the limits of computation. All mathematical content is paraphrased into original prose; no text is reproduced verbatim from any source.

Sources and References

These notes draw on the following works, paraphrased and synthesized into original exposition:

  • Primary: E. Blais, CS 245E: Logic and Computation (Enriched), University of Waterloo, Fall 2025 lecture notes at cs.uwaterloo.ca/~eblais/cs245e/. The definitive reference for course content, definitions, and proofs.
  • Supplementary (CS 245 standard): C. Roberts, CS 245 Lecture Notes, University of Waterloo, Spring 2024. Available at student.cs.uwaterloo.ca/~cs245/spring2024/.
  • Supplementary (older CS 245 notes): B. Lushman and P. Ragde, CS 245 Lecture Notes (archived), University of Waterloo. Available at cs.uwaterloo.ca/~plragde/cs245old/. Covers natural deduction, propositional logic, program verification, and the connection to Boolean algebras.
  • Video lectures: S. Ben-David, Logic for CS, University of Waterloo CS 245 lecture series, Fall 2015. Available at youtube.com/@logicforcsshaiben-daviduwa2472.
  • Graduate-level reference: H.D. Ebbinghaus, J. Flum, W. Thomas, Mathematical Logic, 3rd ed., Springer, 2021; D. Marker, An Invitation to Mathematical Logic, Springer, 2024. For model-theoretic depth.
  • Gödel and computability: P. Smith, An Introduction to Gödel’s Theorems, Cambridge University Press, 2013. Open Logic Project, Incompleteness and Computability: An Open Introduction, openlogicproject.org.
  • Online resources: Stanford Encyclopedia of Philosophy entries on Gödel’s incompleteness theorems, classical logic, and model theory; MIT OpenCourseWare 6.840 (Theory of Computation).


Chapter 1: The Language of Propositions

1.1 What Is Logic?

Logic, as a mathematical discipline, is not primarily concerned with deciding what is true or false in the world. That task belongs to the empirical sciences. Logic concerns itself instead with a more austere and more universal question: given that certain statements are true, what other statements must also be true? In Shai Ben-David’s formulation from his CS 245 lectures, logic is about “how to reliably get from assumptions to conclusions.” The physicist may debate whether \(F = ma\) describes reality; the logician asks only whether the implication “if \(x = y \cdot z\) then \(y = x/z\)” is valid, independent of what \(x\), \(y\), and \(z\) stand for.

This distinction between the truth of premises and the validity of reasoning is the engine of the entire subject. It allows logic to serve as the common foundation of mathematics, computer science, and philosophy simultaneously.

Socrates is mortal. Probably the most famous logical argument runs as follows: all men are mortal; Socrates is a man; therefore Socrates is mortal. The division of labor is instructive. Whether all men are actually mortal is a question for biologists. Whether Socrates was actually a man is a question for historians. The logician's contribution is purely structural: if the two premises are granted, the conclusion is inescapable — and the argument is valid regardless of the actual truth of the premises.

Now contrast this with the superficially similar argument: all crows are black; Tweety is black; therefore Tweety is a crow. Common sense immediately rejects this — Tweety might be a piece of coal. Both arguments share the same grammatical shape, yet one is valid and one is not. Even in simple cases like this, common sense is an unreliable guide. As arguments lengthen and interlock — as in a mathematical proof or a program-correctness argument — the gap between what feels valid and what is actually valid can grow very large.

McDonald's hamburger and eternal happiness. Here is a striking illustration of why informal language cannot serve as the foundation for rigorous reasoning. The following "proof" appears to show that a McDonald's hamburger is better than eternal happiness:
  1. Nothing is better than eternal happiness.
  2. A McDonald's hamburger is better than nothing.
  3. Therefore (by transitivity), a McDonald's hamburger is better than eternal happiness.
The argument looks like a valid chain of transitivity. The fallacy is entirely linguistic: "nothing" in premise (1) is a universal quantifier ("there is no thing better than eternal happiness"), while "nothing" in premise (2) is a noun phrase ("the state of having nothing at all"). These are two completely different uses of the same word, invisible to any analysis that does not fix the meaning of every symbol precisely. Formal logic eliminates this class of ambiguity by insisting on a syntax in which every symbol has exactly one role. Programming languages already demand this discipline of source code; logic demands the same of mathematical argument.

The course divides naturally into three parts, each capturing a different dimension of this foundational program. Propositional logic examines the simplest possible setting: statements that are either true or false, combined by connectives like “and,” “or,” and “not.” First-order logic extends this to the full expressive power needed to formalize mathematics, adding quantifiers over domains and predicates about elements. Finally, computability theory and Gödel’s incompleteness theorems reveal the fundamental limits of what any formal system can prove — a result of stunning depth that emerged from the very tools of logic itself.

Lewis Carroll's Kitten Puzzle. As a warm-up illustration of what logic can accomplish, consider this chain of propositions due to Lewis Carroll (Charles Dodgson):
  1. No kitten that loves fish is unteachable.
  2. No kitten without a tail will play with a gorilla.
  3. Kittens with whiskers always love fish.
  4. No teachable kitten has green eyes.
  5. No kitten has a tail unless it has whiskers.

What follows about kittens with green eyes? By chaining these statements carefully: a green-eyed kitten is unteachable (by 4), so it doesn’t love fish (by 1 contrapositive), so it has no whiskers (by 3 contrapositive), so it has no tail (by 5 contrapositive), so it will not play with a gorilla (by 2). The conclusion — no kitten with green eyes will play with a gorilla — follows by pure logical reasoning, regardless of what we happen to know about kittens.
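The chain of contrapositives can be verified exhaustively by machine. Below is a short Python sketch (the attribute names and the `implies` helper are our own encoding, not part of the course notes): it enumerates all 64 combinations of six kitten attributes and confirms that no combination satisfies the five premises while violating the conclusion.

```python
from itertools import product

# Encode each premise as an implication over six boolean attributes
# and brute-force all 2**6 combinations.
implies = lambda a, b: (not a) or b

def valid():
    for loves_fish, teachable, tail, gorilla, whiskers, green in \
            product([True, False], repeat=6):
        premises = [
            implies(loves_fish, teachable),   # 1. fish-lovers are teachable
            implies(not tail, not gorilla),   # 2. no tail => no gorilla play
            implies(whiskers, loves_fish),    # 3. whiskers => loves fish
            implies(teachable, not green),    # 4. teachable => not green-eyed
            implies(tail, whiskers),          # 5. tail => whiskers
        ]
        conclusion = implies(green, not gorilla)
        if all(premises) and not conclusion:
            return False                      # counterexample found
    return True

print(valid())  # True: the conclusion follows from the premises alone
```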

This is propositional logic in action: a chain of implications leading to an inescapable conclusion. The goal of the first part of the course is to make this kind of reasoning mathematically precise.

1.2 Propositions and Connectives

A proposition (or declarative sentence) is a statement that has a definite truth value — it is either true or false. “My car is green” and “Socrates is mortal” are propositions. “What time is it?” and “Help!” are not. Atomic propositions cannot be decomposed further; they are the primitive building blocks. Compound propositions are built from atomic ones using logical connectives.

There are five standard connectives in propositional logic. Each takes one or two propositions as inputs and produces a new proposition.

The five logical connectives of propositional logic are:
  • Negation \(\neg\): "not \(\varphi\)." The negation \(\neg\varphi\) is true when \(\varphi\) is false, and false when \(\varphi\) is true.
  • Conjunction \(\land\): "\(\varphi\) and \(\psi\)." The conjunction \((\varphi \land \psi)\) is true precisely when both \(\varphi\) and \(\psi\) are true.
  • Disjunction \(\lor\): "\(\varphi\) or \(\psi\)." The disjunction \((\varphi \lor \psi)\) is true when at least one of \(\varphi\), \(\psi\) is true (inclusive or).
  • Implication \(\to\): "if \(\varphi\) then \(\psi\)." The conditional \((\varphi \to \psi)\) is false only when \(\varphi\) is true and \(\psi\) is false.
  • Biconditional \(\leftrightarrow\): "\(\varphi\) if and only if \(\psi\)." The biconditional \((\varphi \leftrightarrow \psi)\) is true when \(\varphi\) and \(\psi\) have the same truth value.

The set of connectives \(\{\neg, \land, \lor, \to, \leftrightarrow\}\) was placed on a rigorous algebraic footing by George Boole in his 1854 treatise The Laws of Thought, which introduced what is now called Boolean algebra. Boole’s insight was that logical reasoning about propositions could be treated as arithmetic over a two-element system \(\{0,1\}\). The connectives we use today descend directly from his formalization, though subsequent logicians (Frege, Hilbert, Gentzen) refined the framework considerably.

Historical sketch. The ancient Greeks — Aristotle in particular — developed a rich theory of syllogistic reasoning ("all men are mortal; Socrates is a man; therefore Socrates is mortal"). But this was not yet formal logic in the modern sense: Aristotle's logic was tied to specific patterns of inference, not to a general symbolic calculus.

The modern era of formal logic begins with Leibniz (1646–1716), who dreamed of a calculus ratiocinator — a symbolic calculus in which all correct reasoning could be mechanically verified. He did not fully realize this dream, but his vision shaped the next two centuries of work.

Boole (1815–1864) was the first to successfully algebraize logic. His 1847 booklet The Mathematical Analysis of Logic and the later Laws of Thought showed that propositions could be treated as elements of an algebraic system where multiplication corresponds to conjunction, addition to symmetric difference, and 1 to “true.” Boole’s work was limited — he restricted himself to propositions and did not handle quantifiers — but it established the algebraic tradition in logic.

Frege (1848–1925) made the decisive step in his 1879 Begriffsschrift (Concept Script): the first complete formalization of predicate logic, with full treatment of quantifiers and relations. Frege’s notation was eccentric (two-dimensional diagrams rather than linear formulas) and was not widely adopted, but his conceptual innovations — the distinction between syntax and semantics, the formalization of inference rules — are the direct ancestors of the system studied in this course.

Hilbert (1862–1943) brought rigor and program: his Grundlagen der Geometrie (1899) axiomatized Euclidean geometry with the precision Euclid lacked, and his formalization program (circa 1922) called for a complete and consistent axiomatization of all mathematics. The development of formal proof systems (Hilbert-style axioms by Hilbert and Bernays; natural deduction by Gentzen in 1935) was driven in large part by this program.

Gentzen’s natural deduction — the proof system used in this course — was introduced in his 1935 paper “Untersuchungen über das logische Schließen” (Investigations into Logical Deduction). Gentzen’s key insight was that every proof can be transformed into a normal form (via the cut-elimination theorem) enjoying the subformula property: every formula appearing in the proof is a subformula of the conclusion or of an assumption. This normalization theorem is the cornerstone of proof theory and has direct computational interpretations (the Curry–Howard correspondence identifies proofs with programs).

The behavior of the implication connective \(\to\) is often counterintuitive at first. In everyday language, “if it rains then I will bring an umbrella” suggests a causal connection. In logic, implication is purely about truth values: \((\varphi \to \psi)\) is false only when the premise \(\varphi\) is true but the conclusion \(\psi\) fails. In particular, a false premise makes the implication vacuously true, regardless of \(\psi\). This material conditional does not encode causation — only the truth-functional relationship.

The implication as a promise. Ben-David offers a helpful framing: read \((\varphi \to \psi)\) as a promise — "if \(\varphi\) holds, I guarantee \(\psi\)." The promise is violated only when \(\varphi\) is true and \(\psi\) is false. If \(\varphi\) never happens, the promise was never tested, so it counts as kept. This is why a false antecedent makes the whole conditional true: you cannot be held responsible for a promise under conditions that never arose.

As a concrete example, let \(p\) stand for “today is Monday” and \(q\) for “the moon is made of cheese.” Under this interpretation, the formula \((p \to q)\) — “if today is Monday then the moon is made of cheese” — is true on six out of seven days of the week. It is only false on Mondays, when \(p\) is true and \(q\) is false. The formula has nothing to do with any causal connection between days of the week and lunar composition; it merely records what happens when truth values are combined by the conditional connective. The difference between the logical conditional and the English “implies” is one of the most important things to internalize in this course.

A surprising tautology. For any two propositions \(p\) and \(q\), the formula \((p \to q) \lor (q \to p)\) — "either \(p\) implies \(q\) or \(q\) implies \(p\)" — is a tautology. You can verify this by checking the four rows of the truth table: when \(p\) and \(q\) are both true, both conditionals are true; when \(p\) is true and \(q\) is false, \((q \to p)\) is true (false antecedent); when \(p\) is false and \(q\) is true, \((p \to q)\) is true (false antecedent); when both are false, both conditionals are true. In every row the disjunction holds.

In ordinary English, of course, we can easily find two statements neither of which implies the other — “today is Thursday” and “the Earth orbits the Sun” have no implicational relationship. The reason the propositional formula is a tautology is that propositional logic only cares about truth values, not about the content of statements. If you only know whether \(p\) and \(q\) are true or false, you cannot conclude that neither conditional holds, because at least one of the four truth-value combinations forces a conditional to be vacuously true.
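This four-row check is easy to mechanize. A short Python sketch (`implies` is our stand-in for the material conditional \(\to\)):

```python
from itertools import product

# Check all four rows of the truth table for (p -> q) v (q -> p).
implies = lambda a, b: (not a) or b
rows = [(p, q, implies(p, q) or implies(q, p))
        for p, q in product([True, False], repeat=2)]
for p, q, val in rows:
    print(p, q, val)
print(all(val for _, _, val in rows))  # True: the formula is a tautology
```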

1.3 The Formal Language \(\mathcal{L}^p(\sigma)\)

Informal descriptions of “propositions” and “connectives” are sufficient for puzzles and natural-language arguments. But to prove theorems about logic — to establish soundness, completeness, and incompleteness — we need a mathematically precise definition of what counts as a valid formula. This requires treating formulas as strings over a formal alphabet.

Fix a countably infinite set \(\sigma = \{p_1, p_2, p_3, \ldots\}\) of propositional variables (also called atoms). The alphabet of propositional logic over \(\sigma\) consists of the variables in \(\sigma\), the five connective symbols \(\neg, \land, \lor, \to, \leftrightarrow\), and the parentheses \((\ )\).

The language of propositional logic \(\mathcal{L}^p(\sigma)\) is the smallest set of strings over this alphabet satisfying:
  1. Every single propositional variable \(p_i \in \sigma\) belongs to \(\mathcal{L}^p(\sigma)\).
  2. If \(\varphi \in \mathcal{L}^p(\sigma)\), then \(\neg\varphi \in \mathcal{L}^p(\sigma)\).
  3. If \(\varphi, \psi \in \mathcal{L}^p(\sigma)\), then \((\varphi \land \psi)\), \((\varphi \lor \psi)\), \((\varphi \to \psi)\), and \((\varphi \leftrightarrow \psi)\) all belong to \(\mathcal{L}^p(\sigma)\).
Elements of \(\mathcal{L}^p(\sigma)\) are called formulas. A formula consisting of a single propositional variable is atomic; all others are compound.

The “smallest set” condition means \(\mathcal{L}^p(\sigma)\) is defined by these three rules and nothing else: every formula can be built up from variables by finitely many applications of rules (2) and (3). This is an inductive definition, and it will be the basis for inductive proofs throughout the course.

Inductive definitions: the IAP construction. The definition of \(\mathcal{L}^p(\sigma)\) as the "smallest set" satisfying certain closure properties is an instance of a general technique for defining infinite sets. The pattern has three components:
  1. A domain \(X\) — the universe in which everything lives. For propositional logic, \(X\) is the set of all finite strings over the alphabet.
  2. A core set \(A \subseteq X\) — the elements we know for certain belong. For PL, \(A\) is the set of all single propositional variables.
  3. A finite set of operations \(P\) — functions that take elements of \(X\) and produce new elements. For PL, \(P\) consists of the four formation rules (negation and the three binary connectives).
The set \(I(A,P)\) is then defined as the smallest subset of \(X\) that contains \(A\) and is closed under every operation in \(P\). This set always exists and is unique: it is the intersection of all subsets of \(X\) that contain \(A\) and are closed under \(P\). The intersection itself contains \(A\) and is closed under \(P\), and it is a subset of every set with those two properties, so it is indeed the smallest such set.

Two everyday examples of the same pattern: (1) The set of even natural numbers is \(I(\{0\}, \{x \mapsto x+2\})\) — start with 0, close under “add 2.” (2) The set of all your blood relatives is \(I(\{\text{yourself}\}, \{\text{child-of}, \text{parent-of}, \text{sibling-of}, \ldots\})\) — start with yourself, close under family operations. In both cases, the “smallest set” framing avoids any circularity and gives a precise, checkable definition.
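When the universe is finite, the "smallest closed set" can be computed by iterating the operations to a fixed point. A Python sketch (the `closure` helper and the cut-off `bound` are our own devices to keep the loop finite):

```python
# Fixed-point computation of I(A, P), truncated to {0, ..., bound}
# so that the iteration terminates.
def closure(core, ops, bound):
    """Smallest subset of {0,...,bound} containing `core` and closed
    under every operation in `ops` (results beyond `bound` dropped)."""
    s = set(core)
    while True:
        new = {f(x) for f in ops for x in s if f(x) <= bound} - s
        if not new:          # nothing new: s is closed under every op
            return s
        s |= new

# Even natural numbers up to 10: I({0}, {x -> x + 2})
print(sorted(closure({0}, [lambda x: x + 2], 10)))  # [0, 2, 4, 6, 8, 10]
```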

To show that some specific expression \(x\) belongs to \(I(A,P)\), it suffices to exhibit a construction sequence: a finite list \(\alpha_1, \alpha_2, \ldots, \alpha_n = x\) where each \(\alpha_i\) is either in \(A\) or is the result of applying some operation in \(P\) to earlier elements in the list. Such a sequence is a certificate of membership — and for propositional logic, it is exactly a proof that the expression is a well-formed formula.

It is instructive to trace a specific example. The string \(((p_1 \to p_2) \land \neg p_3)\) is in \(\mathcal{L}^p(\sigma)\) because: \(p_1\) and \(p_2\) are formulas (rule 1), so \((p_1 \to p_2)\) is a formula (rule 3); \(p_3\) is a formula, so \(\neg p_3\) is a formula (rule 2); hence \(((p_1 \to p_2) \land \neg p_3)\) is a formula (rule 3). On the other hand, the string \(p_1 \land p_2\) (without outer parentheses) is not in \(\mathcal{L}^p(\sigma)\) as defined — the rules require parentheses around every binary operation.
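A construction sequence is exactly the kind of certificate a machine can check. The sketch below (ASCII symbols `~`, `&`, `|`, `->`, `<->` stand in for the connectives, and the helper names are ours) verifies the certificate for \(((p_1 \to p_2) \land \neg p_3)\):

```python
# Verify a construction sequence for L^p(sigma): each entry must be an
# atom or the result of applying one formation rule to earlier entries.
ATOMS = {"p1", "p2", "p3"}
BIN = ["&", "|", "->", "<->"]   # ASCII stand-ins for the binary connectives

def formed_from(s, earlier):
    """True if s is built from strings in `earlier` by one rule."""
    if s.startswith("~") and s[1:] in earlier:          # negation rule
        return True
    for a in earlier:                                   # binary rules
        for b in earlier:
            for op in BIN:
                if s == "(" + a + op + b + ")":
                    return True
    return False

def is_construction_sequence(seq):
    return all(s in ATOMS or formed_from(s, seq[:i])
               for i, s in enumerate(seq))

# Certificate that ((p1 -> p2) & ~p3) is a well-formed formula:
seq = ["p1", "p2", "(p1->p2)", "p3", "~p3", "((p1->p2)&~p3)"]
print(is_construction_sequence(seq))      # True
print(is_construction_sequence(["p1p2"])) # False: no rule produces it
```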

Precedence conventions. The strict parenthesization of \(\mathcal{L}^p(\sigma)\) is mathematically clean but typographically cumbersome. In practice, we adopt precedence rules to drop redundant parentheses: \(\neg\) binds most tightly, then \(\land\), then \(\lor\), then \(\to\), then \(\leftrightarrow\). The implication \(\to\) is right-associative: \(\varphi \to \psi \to \theta\) means \(\varphi \to (\psi \to \theta)\). When we write \(p \to q \land r\), we mean the fully parenthesized formula \((p \to (q \land r))\). These conventions are shorthand only; all formal definitions and proofs refer to the fully parenthesized language.

1.4 Parse Trees and the Unique Parsing Theorem

Every formula in \(\mathcal{L}^p(\sigma)\) has a natural tree representation. The parse tree of a formula records the hierarchical structure of how it was built: each internal node is labeled by a connective, each leaf is a propositional variable, and the children of a binary-connective node are the parse trees of its two subformulas.

Parse trees make the structure of a formula visually immediate. For instance, the formula \(((p \to q) \land \neg r)\) has a root labeled \(\land\), a left child labeled \(\to\) (whose own children are \(p\) and \(q\)), and a right child labeled \(\neg\) (whose child is \(r\)).

The critical fact underlying all of propositional logic is that parse trees are unique: every formula has exactly one parse tree, and hence a unique main connective and unique immediate subformulas. This is not obvious from the definition — in principle, the same string might be parseable in multiple ways, as happens with ambiguous context-free grammars. The Unique Parsing Theorem rules this out.

The proof rests on two structural lemmas about formulas as strings.

Lemma 1 (Balanced parentheses). Every formula \(\varphi \in \mathcal{L}^p(\sigma)\) contains the same number of left parentheses and right parentheses.

The proof is a straightforward induction on the construction of \(\varphi\): atomic formulas have zero of each (balanced); negations preserve any balance they already have; binary combinations add exactly one \((\) and one \()\) around an already-balanced pair.

Lemma 2 (Prefix imbalance). For every formula \(\varphi \in \mathcal{L}^p(\sigma)\) and every proper prefix \(\alpha\) of \(\varphi\) (a nonempty initial segment of \(\varphi\) that is not all of \(\varphi\)), the string \(\alpha\) contains strictly more left parentheses than right parentheses.

Lemma 2 is the key. It says that inside a formula, the left-parenthesis count never dips to meet the right-parenthesis count until the very end of the string. Combined with Lemma 1 (balance holds at the end), this gives a unique location for the main binary connective: it is the connective that appears at the point where the running excess of \((\) over \()\) drops to exactly 1. No proper prefix can be a complete formula.

Locating the main connective mechanically. Consider the formula \(\varphi = ((p \to q) \lor (q \to p))\). To find its main connective, scan from left to right and track the running count of left parentheses minus right parentheses, ignoring the very first character (the outermost left parenthesis). Starting from the second character:

After reading \((p \to q)\), the running count (ignoring the outer bracket) returns to 0 for the first time. The next symbol, \(\lor\), is therefore the main connective. Everything to its left — \((p \to q)\) — is the left subformula \(\psi\), and everything to its right — \((q \to p)\) — is the right subformula \(\theta\).

No ambiguity is possible: the prefix imbalance lemma guarantees the running count stays positive inside any proper subformula, so this first-zero point is unique. The abundance of parentheses in \(\mathcal{L}^p(\sigma)\) is a deliberate design choice — it buys unique readability at the cost of verbosity.
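The scanning procedure can be written out directly. A Python sketch (ASCII connective symbols; the function name and string representation are our own):

```python
def main_connective(s):
    """Split a fully parenthesized formula (psi OP theta) at its main
    binary connective, found where the running parenthesis depth
    (ignoring the outermost pair) first returns to zero."""
    assert s[0] == "(" and s[-1] == ")"
    depth = 0
    i = 1                                  # skip the outermost "("
    while i < len(s) - 1:
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
        elif depth == 0:                   # at top level: look for an op
            for op in ("<->", "->", "&", "|"):   # try "<->" before "->"
                if s.startswith(op, i):
                    return s[1:i], op, s[i + len(op):-1]
        i += 1
    raise ValueError("not of the form (psi OP theta)")

print(main_connective("((p->q)|(q->p))"))  # ('(p->q)', '|', '(q->p)')
```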

Unique Parsing Theorem. Every formula \(\varphi \in \mathcal{L}^p(\sigma)\) satisfies exactly one of the following:
  1. \(\varphi = p_i\) for some \(p_i \in \sigma\);
  2. \(\varphi = \neg\psi\) for a unique formula \(\psi \in \mathcal{L}^p(\sigma)\);
  3. \(\varphi = (\psi \odot \theta)\) for a unique binary connective \(\odot \in \{\land,\lor,\to,\leftrightarrow\}\) and unique formulas \(\psi, \theta \in \mathcal{L}^p(\sigma)\).

The uniqueness of case (3) follows from Lemma 2: the main binary connective is uniquely located (the first connective that appears after the running \((\)-excess has dropped to 1), and once the connective is located, \(\psi\) and \(\theta\) are uniquely determined.

The Unique Parsing Theorem is the foundation on which all semantic and proof-theoretic definitions rest. Because every formula has a unique structure, definitions and proofs by induction on formulas are well-founded: there is no ambiguity about which case applies at each step.

Generalized Induction Principle. Let \(I(A,P)\) be inductively defined as above. Suppose a property \(T\) (thought of as a subset of \(X\)) satisfies:
  1. Base case: every element of \(A\) satisfies \(T\).
  2. Inductive step: if \(\alpha_1, \ldots, \alpha_k\) all satisfy \(T\) and \(f \in P\) is a \(k\)-ary operation, then \(f(\alpha_1, \ldots, \alpha_k)\) also satisfies \(T\).
Then every element of \(I(A,P)\) satisfies \(T\).

The proof is a one-liner: the set of elements satisfying \(T\) contains \(A\) and is closed under \(P\), so it is one of the sets whose intersection defines \(I(A,P)\). Since \(I(A,P)\) is contained in every such set, it is contained in \(T\).

Ordinary induction as a special case. The familiar mathematical induction on natural numbers is exactly the Generalized Induction Principle applied to the structure \(I(\{0\}, \{n \mapsto n+1\})\). The base case "prove the property for 0" corresponds to the base case for the core set \(A = \{0\}\). The inductive step "if the property holds for \(n\), it holds for \(n+1\)" corresponds to closure under the single operation "add 1." Since \(I(\{0\}, \{n \mapsto n+1\}) = \mathbb{N}\), ordinary induction is a special case.

For propositional logic, we apply the same principle to \(I(A, P)\) where \(A\) is the set of propositional variables and \(P\) is the set of four formation operations. The “induction basis” is showing the property for all propositional variables, and the “inductive step” has four cases — one for each of the four operations. Every proof in this course that establishes something for all formulas has this structure.

Proving that \(p_1 p_2\) is not a well-formed formula. The string \(p_1 p_2\) (two variables written side by side) is not in \(\mathcal{L}^p(\sigma)\). To prove it, we use the Generalized Induction Principle to show that every well-formed formula either is a single propositional variable, or begins with a left parenthesis \((\), or begins with \(\neg\). We call this property \(T\).

Base case: Every propositional variable satisfies the first clause of \(T\). \(\checkmark\)

Inductive step: Each of the four formation operations wraps its result in an outer pair of parentheses (e.g., \((\varphi \lor \psi)\)) or prepends a \(\neg\) symbol — in either case the output starts with \((\) or \(\neg\), not with two juxtaposed variables. \(\checkmark\)

By the Generalized Induction Principle, every element of \(\mathcal{L}^p(\sigma)\) satisfies \(T\). But \(p_1 p_2\) does not: it is not a single variable, and it begins with neither \((\) nor \(\neg\). Hence \(p_1 p_2 \notin \mathcal{L}^p(\sigma)\).


Chapter 2: Truth and Semantics

2.1 Truth Assignments

A formula in \(\mathcal{L}^p(\sigma)\) is a syntactic object — a string of symbols with a parse tree. To give it meaning, we need to assign truth values to its propositional variables and then compute the truth value of the formula as a whole.

A truth assignment is a function \(v: \sigma \to \{\mathbf{T}, \mathbf{F}\}\) that assigns to each propositional variable a truth value.

Given a truth assignment \(v\), the Unique Parsing Theorem allows us to extend \(v\) recursively to a function \(v^*: \mathcal{L}^p(\sigma) \to \{\mathbf{T}, \mathbf{F}\}\) that evaluates every formula. The extension follows the structure of the formula exactly.

The evaluation function \(v^*: \mathcal{L}^p(\sigma) \to \{\mathbf{T}, \mathbf{F}\}\) induced by a truth assignment \(v\) is defined recursively:
  • If \(\varphi = p_i\) is atomic, then \(v^*(\varphi) = v(p_i)\).
  • If \(\varphi = \neg\psi\), then \(v^*(\varphi) = \mathbf{T}\) iff \(v^*(\psi) = \mathbf{F}\).
  • If \(\varphi = (\psi \land \theta)\), then \(v^*(\varphi) = \mathbf{T}\) iff both \(v^*(\psi) = \mathbf{T}\) and \(v^*(\theta) = \mathbf{T}\).
  • If \(\varphi = (\psi \lor \theta)\), then \(v^*(\varphi) = \mathbf{T}\) iff at least one of \(v^*(\psi)\), \(v^*(\theta)\) is \(\mathbf{T}\).
  • If \(\varphi = (\psi \to \theta)\), then \(v^*(\varphi) = \mathbf{F}\) iff \(v^*(\psi) = \mathbf{T}\) and \(v^*(\theta) = \mathbf{F}\).
  • If \(\varphi = (\psi \leftrightarrow \theta)\), then \(v^*(\varphi) = \mathbf{T}\) iff \(v^*(\psi) = v^*(\theta)\).

The well-definedness of \(v^*\) depends directly on the Unique Parsing Theorem: since each formula has a unique parse, the recursive case that applies is uniquely determined, and the recursion terminates because each subformula is strictly shorter than the original.

The evaluation function assigns truth values compositionally — the truth value of a compound formula depends only on the truth values of its immediate subformulas, not on their internal structure. This compositionality is what makes truth tables feasible: a formula with \(n\) distinct variables has \(2^n\) possible truth assignments, and the truth value under each can be computed row by row.
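The recursive definition of \(v^*\) translates line for line into code. A Python sketch (nested tuples serve as our stand-in for parse trees; the representation is illustrative, not the course's):

```python
from itertools import product

# A formula is a nested tuple: a variable name (atomic), ("~", phi),
# or (op, psi, theta) with op one of "&", "|", "->", "<->".
def evaluate(phi, v):
    """The induced evaluation v*(phi), recursing on the unique parse."""
    if isinstance(phi, str):                  # atomic: look up v
        return v[phi]
    if phi[0] == "~":                         # negation
        return not evaluate(phi[1], v)
    op, psi, theta = phi                      # binary connective
    a, b = evaluate(psi, v), evaluate(theta, v)
    return {"&": a and b, "|": a or b,
            "->": (not a) or b, "<->": a == b}[op]

phi = ("&", ("->", "p", "q"), ("~", "r"))     # ((p -> q) & ~r)
for p, q, r in product([True, False], repeat=3):
    print(p, q, r, evaluate(phi, {"p": p, "q": q, "r": r}))
```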

2.2 Tautologies, Satisfiability, and Contradictions

Once we have a systematic way to evaluate formulas, we can classify them by how their truth values behave across all possible truth assignments.

Let \(\varphi \in \mathcal{L}^p(\sigma)\).
  • A truth assignment \(v\) satisfies \(\varphi\) if \(v^*(\varphi) = \mathbf{T}\). Such a \(v\) is a satisfying assignment for \(\varphi\).
  • \(\varphi\) is satisfiable if it has at least one satisfying assignment.
  • \(\varphi\) is a tautology (written \(\models \varphi\)) if every truth assignment satisfies it — that is, \(v^*(\varphi) = \mathbf{T}\) for all \(v\).
  • \(\varphi\) is a contradiction if no truth assignment satisfies it — it is false under every assignment.
  • \(\varphi\) is a contingency if it is satisfiable but not a tautology — some assignments make it true and others make it false.

Tautologies represent logical truths — formulas that are true regardless of what the variables stand for. The formula \((p \lor \neg p)\) is a tautology (the law of excluded middle). So is \((p \to p)\). Contradictions are their negations: \((p \land \neg p)\) is always false. Most interesting formulas in practice are contingencies.

An important operational criterion: \(\neg\varphi\) is a contradiction if and only if \(\varphi\) is a tautology. This symmetry means that verifying a tautology is equivalent to showing that its negation has no satisfying assignment — a fact that will be exploited repeatedly.
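Both the three-way classification and the tautology/contradiction symmetry can be checked by brute force over all \(2^n\) rows. A Python sketch (the `classify` helper is our own):

```python
from itertools import product

def classify(f, n):
    """Classify an n-variable truth function by evaluating it on all
    2**n truth assignments."""
    values = [f(*row) for row in product([True, False], repeat=n)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingency"

print(classify(lambda p: p or not p, 1))        # tautology (excluded middle)
print(classify(lambda p: p and not p, 1))       # contradiction
print(classify(lambda p, q: (not p) or q, 2))   # contingency (p -> q)
# ~phi is a contradiction iff phi is a tautology:
print(classify(lambda p: not (p or not p), 1))  # contradiction
```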

Truth table for \((p \to q)\). The conditional has truth table:
\[
\begin{array}{cc|c}
p & q & p \to q \\
\hline
\mathbf{T} & \mathbf{T} & \mathbf{T} \\
\mathbf{T} & \mathbf{F} & \mathbf{F} \\
\mathbf{F} & \mathbf{T} & \mathbf{T} \\
\mathbf{F} & \mathbf{F} & \mathbf{T}
\end{array}
\]

The only row where \(p \to q\) is false is when \(p\) is true and \(q\) is false. In particular, when the premise \(p\) is false, the conditional is vacuously true — a feature that corresponds to the mathematical convention that “if \(n\) is a largest prime, then pigs can fly” is true, because there is no largest prime to serve as a false-making premise.

Evaluating a formula on all assignments. Let us compute the truth value of \(\varphi = (p \to q) \lor (q \to p)\) under all four truth assignments to \(p\) and \(q\). The method is to build the evaluation bottom-up following the parse tree: \(\varphi\) is a disjunction of \((p \to q)\) and \((q \to p)\), each of which is an implication of atoms.
\[
\begin{array}{cc|cc|c}
p & q & p \to q & q \to p & \varphi \\
\hline
\mathbf{T} & \mathbf{T} & \mathbf{T} & \mathbf{T} & \mathbf{T} \\
\mathbf{T} & \mathbf{F} & \mathbf{F} & \mathbf{T} & \mathbf{T} \\
\mathbf{F} & \mathbf{T} & \mathbf{T} & \mathbf{F} & \mathbf{T} \\
\mathbf{F} & \mathbf{F} & \mathbf{T} & \mathbf{T} & \mathbf{T}
\end{array}
\]

The formula is a tautology: for any two statements, at least one implies the other in the sense of the material conditional. In the \((T,F)\) row, \(q \to p\) is true because its antecedent \(q\) is false; in the \((F,T)\) row, \(p \to q\) is true because its antecedent \(p\) is false. This result would be false in ordinary English — “today is Thursday” and “the Earth orbits the Sun” have no implicational connection — but in propositional logic, which only tracks truth values, it holds universally.

2.3 Logical Equivalence

Two formulas that always have the same truth value, no matter what the variables are assigned, are interchangeable in any logical context. This idea is formalized as logical equivalence.

Formulas \(\varphi\) and \(\psi\) are logically equivalent, written \(\varphi \equiv \psi\), if \(v^*(\varphi) = v^*(\psi)\) for every truth assignment \(v\). Equivalently, \(\varphi \equiv \psi\) iff \((\varphi \leftrightarrow \psi)\) is a tautology.
The canonical examples are De Morgan’s laws:
\[ \neg(\varphi \land \psi) \equiv (\neg\varphi \lor \neg\psi), \qquad \neg(\varphi \lor \psi) \equiv (\neg\varphi \land \neg\psi). \]

These say that negation distributes over conjunction and disjunction by flipping the connective. Other key equivalences include double negation \(\neg\neg\varphi \equiv \varphi\), the contrapositive \((\varphi \to \psi) \equiv (\neg\psi \to \neg\varphi)\), and the material equivalence \((\varphi \to \psi) \equiv (\neg\varphi \lor \psi)\).
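Each of these equivalences can be confirmed by comparing truth values across all assignments. A Python sketch (the `equivalent` helper and `implies` stand-in are ours):

```python
from itertools import product

def equivalent(f, g, n):
    """phi is equivalent to psi iff they agree on every assignment."""
    return all(f(*row) == g(*row)
               for row in product([True, False], repeat=n))

implies = lambda a, b: (not a) or b
# De Morgan
print(equivalent(lambda p, q: not (p and q),
                 lambda p, q: (not p) or (not q), 2))    # True
# Contrapositive
print(equivalent(lambda p, q: implies(p, q),
                 lambda p, q: implies(not q, not p), 2)) # True
# Material equivalence of the conditional
print(equivalent(lambda p, q: implies(p, q),
                 lambda p, q: (not p) or q, 2))          # True
```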

Logical equivalence is an equivalence relation on formulas, and it respects the structure of compound formulas: if \(\varphi \equiv \varphi'\), then replacing \(\varphi\) by \(\varphi'\) anywhere inside a larger formula produces a logically equivalent result. This substitution property is what allows us to simplify complex formulas systematically.
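Equivalence of any two concrete formulas can be confirmed mechanically by comparing their truth tables. A small sketch, using a tuple encoding of formulas that is ours rather than the course's:

```python
from itertools import product

def ev(f, v):
    """Evaluate a tuple-encoded formula under assignment v (dict var -> bool)."""
    op = f[0]
    if op == 'var': return v[f[1]]
    if op == 'not': return not ev(f[1], v)
    if op == 'and': return ev(f[1], v) and ev(f[2], v)
    if op == 'or':  return ev(f[1], v) or ev(f[2], v)
    if op == 'imp': return (not ev(f[1], v)) or ev(f[2], v)
    raise ValueError(op)

def equivalent(f, g, variables):
    """phi == psi iff they agree under every truth assignment."""
    return all(ev(f, dict(zip(variables, vals))) == ev(g, dict(zip(variables, vals)))
               for vals in product([True, False], repeat=len(variables)))

p, q = ('var', 'p'), ('var', 'q')
# De Morgan: not(p and q)  ==  (not p) or (not q)
assert equivalent(('not', ('and', p, q)), ('or', ('not', p), ('not', q)), ['p', 'q'])
# Contrapositive: (p -> q)  ==  (not q -> not p)
assert equivalent(('imp', p, q), ('imp', ('not', q), ('not', p)), ['p', 'q'])
```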

The Wason Selection Task. Here is a striking example of how human intuition about conditionals diverges from their logical semantics. You are shown four cards, each of which has a letter on one side and a number on the other. The visible faces show: A, D, 1, 5. The rule to check is: "If a card has A on one side, it has 5 on the other."

Which cards must you turn over to verify the rule?

Most people say A and 5. The logically correct answer is A and 1. Turning over A checks whether A is paired with something other than 5 (the only way the rule fails). Turning over 1 checks whether 1 is paired with A (if so, A→5 is violated). Turning over D is irrelevant (D≠A, so the rule is vacuously satisfied for that card). Turning over 5 is also irrelevant: 5 paired with A satisfies A→5, and 5 paired with D also does not violate the rule.

The task reveals that people naturally interpret “if A then 5” as a biconditional (“A if and only if 5”) rather than as a one-directional conditional. The logical behavior of \(\to\) — asymmetric, with vacuous truth when the antecedent is false — runs counter to everyday causal reasoning.

2.4 Logical Consequence

The central notion of propositional semantics is not about individual formulas but about the relationship between a set of premises and a conclusion.

Let \(\Gamma\) be a set of formulas and \(\varphi\) a formula. We say \(\varphi\) is a logical consequence of \(\Gamma\), written \(\Gamma \models \varphi\), if every truth assignment that satisfies every formula in \(\Gamma\) also satisfies \(\varphi\).

A logical argument with premises \(\varphi_1, \ldots, \varphi_n\) and conclusion \(\psi\) is valid if \(\{\varphi_1,\ldots,\varphi_n\} \models \psi\).

When \(\Gamma = \emptyset\), every truth assignment vacuously satisfies all formulas in \(\Gamma\), so the definition requires \(\varphi\) to be true under every assignment. Hence \(\emptyset \models \varphi\) iff \(\varphi\) is a tautology, consistent with writing \(\models \varphi\) for tautologies.

Several basic properties of \(\models\) follow immediately from the definition. Monotonicity holds: if \(\Gamma \models \varphi\) and \(\Gamma \subseteq \Gamma'\), then \(\Gamma' \models \varphi\) (adding premises cannot destroy a valid consequence). Transitivity also holds: if \(\Gamma \models \varphi_i\) for each \(\varphi_i\) in some set \(\Delta\), and \(\Delta \models \psi\), then \(\Gamma \models \psi\).

For finite premise sets, logical consequence reduces to a tautology check:

Proposition. For formulas \(\varphi_1, \ldots, \varphi_n, \psi\), we have \(\{\varphi_1,\ldots,\varphi_n\} \models \psi\) if and only if \(\models ((\cdots((\varphi_1 \land \varphi_2) \land \varphi_3) \cdots \land \varphi_n) \to \psi)\).
Let \(\theta = ((\cdots(\varphi_1 \land \varphi_2) \land \cdots) \land \varphi_n)\). A truth assignment satisfies \(\theta\) exactly when it satisfies all of \(\varphi_1,\ldots,\varphi_n\). So \(\{\varphi_1,\ldots,\varphi_n\} \models \psi\) means: every assignment satisfying \(\theta\) also satisfies \(\psi\), which is precisely the condition that \((\theta \to \psi)\) is a tautology.
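Both sides of the proposition can be checked by brute force for small formulas. A sketch (the tuple encoding and the names `ev` and `entails` are ours), illustrated on modus ponens:

```python
from itertools import product

def ev(f, v):
    """Evaluate a tuple-encoded formula under assignment v (dict var -> bool)."""
    op = f[0]
    if op == 'var': return v[f[1]]
    if op == 'not': return not ev(f[1], v)
    if op == 'and': return ev(f[1], v) and ev(f[2], v)
    if op == 'or':  return ev(f[1], v) or ev(f[2], v)
    if op == 'imp': return (not ev(f[1], v)) or ev(f[2], v)

def entails(gamma, psi, variables):
    """Gamma |= psi: psi is true under every assignment satisfying all of Gamma."""
    assignments = (dict(zip(variables, vals))
                   for vals in product([True, False], repeat=len(variables)))
    return all(ev(psi, v) for v in assignments if all(ev(g, v) for g in gamma))

p, q = ('var', 'p'), ('var', 'q')
gamma = [('imp', p, q), p]            # premises: p -> q and p
# Modus ponens is a valid consequence...
assert entails(gamma, q, ['p', 'q'])
# ...and matches the reduction to a tautology check with empty premises:
theta = ('and', ('imp', p, q), p)     # conjunction of the premises
assert entails([], ('imp', theta, q), ['p', 'q'])
```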

This reduction to tautology-checking is theoretically clean but computationally expensive: a formula with \(n\) variables requires checking \(2^n\) rows. For \(n = 100\), this is astronomically large. The fact that there is no known efficient algorithm for checking propositional tautologies is essentially the content of the \(\mathbf{P} \neq \mathbf{NP}\) conjecture. This is one reason why the formal proof system in the next chapter — which operates purely syntactically — is valuable: it can certify that a formula is a tautology by producing a short derivation, without exhaustively checking all truth assignments.

2.5 Normal Forms

Every propositional formula can be transformed into a standardized form that makes its structure easier to analyze. Two particularly important normal forms are conjunctive normal form (CNF) and disjunctive normal form (DNF).

A literal is either a propositional variable \(p_i\) or its negation \(\neg p_i\). A clause is a disjunction of literals. A formula is in conjunctive normal form (CNF) if it is a conjunction of clauses: \((\ell_{11} \lor \cdots \lor \ell_{1k_1}) \land \cdots \land (\ell_{n1} \lor \cdots \lor \ell_{nk_n})\). A formula is in disjunctive normal form (DNF) if it is a disjunction of conjunctions of literals.

Every formula is logically equivalent to some formula in CNF and to some formula in DNF.

The proof is constructive: first rewrite \(\to\) and \(\leftrightarrow\) in terms of \(\neg\), \(\land\), and \(\lor\) (using, e.g., \((\varphi \to \psi) \equiv (\neg\varphi \lor \psi)\)); then apply De Morgan’s laws and double negation to push negations inward until they sit only on variables; finally distribute \(\lor\) over \(\land\) (for CNF) or \(\land\) over \(\lor\) (for DNF). The algorithm terminates because each rewriting step strictly decreases a measure of how far the formula is from the target normal form.
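The two-phase conversion can be sketched directly on tuple-encoded formulas; the encoding and the helper names `nnf` and `cnf` are ours:

```python
def nnf(f):
    """Eliminate -> and push negations down to literals (negation normal form)."""
    op = f[0]
    if op == 'var': return f
    if op == 'imp': return nnf(('or', ('not', f[1]), f[2]))
    if op in ('and', 'or'): return (op, nnf(f[1]), nnf(f[2]))
    g = f[1]  # op == 'not'
    if g[0] == 'var': return f
    if g[0] == 'not': return nnf(g[1])                                   # double negation
    if g[0] == 'and': return nnf(('or', ('not', g[1]), ('not', g[2])))   # De Morgan
    if g[0] == 'or':  return nnf(('and', ('not', g[1]), ('not', g[2])))  # De Morgan
    if g[0] == 'imp': return nnf(('not', ('or', ('not', g[1]), g[2])))

def cnf(f):
    """Distribute 'or' over 'and'; the input must already be in NNF."""
    if f[0] == 'and': return ('and', cnf(f[1]), cnf(f[2]))
    if f[0] == 'or':
        a, b = cnf(f[1]), cnf(f[2])
        if a[0] == 'and': return ('and', cnf(('or', a[1], b)), cnf(('or', a[2], b)))
        if b[0] == 'and': return ('and', cnf(('or', a, b[1])), cnf(('or', a, b[2])))
        return ('or', a, b)
    return f  # a literal

p, q = ('var', 'p'), ('var', 'q')
# not(p -> q) converts to the CNF p and (not q).
assert cnf(nnf(('not', ('imp', p, q)))) == ('and', p, ('not', q))
```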

Normal forms are important practically because:

  • SAT checking: To test satisfiability of a CNF formula, it suffices to find an assignment satisfying every clause simultaneously. SAT solvers (used in automated verification) work natively with CNF.
  • Resolution: The resolution rule operates on clauses: from \((\varphi \lor r)\) and \((\neg r \lor \psi)\), derive \((\varphi \lor \psi)\). This eliminates the variable \(r\) from the clause set. Resolution is complete for unsatisfiability: a CNF formula is unsatisfiable if and only if the empty clause \(\bot\) can be derived by a sequence of resolution steps. Resolution underpins automated theorem proving and logic programming (Prolog).
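The resolution procedure can be sketched as a saturation loop over clauses. Clauses are frozensets of string literals, with `~` marking negation; this encoding and the function names are ours, and real SAT solvers use far more refined strategies:

```python
def neg(lit):
    """Complement a literal: 'p' <-> '~p'."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolution_unsat(clauses):
    """Saturate under resolution; returns True iff the empty clause is derivable,
    i.e., iff the clause set is unsatisfiable (refutation completeness)."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for lit in a:
                    if neg(lit) in b:
                        resolvent = frozenset((a - {lit}) | (b - {neg(lit)}))
                        if not resolvent:
                            return True   # derived the empty clause
                        new.add(resolvent)
        if new <= clauses:
            return False                  # saturated without the empty clause
        clauses |= new

# {p or q, ~p, ~q} is unsatisfiable; {p or q, ~p} is satisfiable (take q = T).
assert resolution_unsat({frozenset({'p', 'q'}), frozenset({'~p'}), frozenset({'~q'})})
assert not resolution_unsat({frozenset({'p', 'q'}), frozenset({'~p'})})
```

The loop terminates because only finitely many clauses can be built from a finite set of literals.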
CNF conversion. Convert \(\neg(p \to q)\) to CNF. Since \((p \to q) \equiv (\neg p \lor q)\), we have \(\neg(p \to q) \equiv \neg(\neg p \lor q) \equiv (p \land \neg q)\). This is already in CNF: the conjunction of the two unit clauses \(p\) and \(\neg q\). So the formula is satisfiable only when \(p\) is true and \(q\) is false.
Some basic consequences. The following are valid:
  • \(\{(\varphi \land \psi)\} \models \varphi\) and \(\{(\varphi \land \psi)\} \models \psi\): a conjunction implies each conjunct.
  • \(\{\varphi, \psi\} \models (\varphi \land \psi)\): if both are true, so is their conjunction.
  • \(\{(\varphi \to \psi), \varphi\} \models \psi\): modus ponens, the most fundamental inference rule.
  • \(\{\neg(\varphi \lor \psi)\} \models (\neg\varphi \land \neg\psi)\): De Morgan's law as a consequence relation.
A more subtle example: if \(\Gamma\) contains a contradiction \((\varphi \land \neg\varphi)\), then \(\Gamma \models \psi\) for any formula \(\psi\) — since no truth assignment satisfies a contradiction, the condition "every satisfying assignment for \(\Gamma\) also satisfies \(\psi\)" is vacuously true. This is the principle of ex falso quodlibet: from a contradiction, anything follows.
Russell proves the Pope is infallible. Bertrand Russell reportedly illustrated ex falso quodlibet to a skeptical audience as follows. An audience member challenged: "If 2 + 2 = 5, prove that you are the Pope." Russell accepted. His argument: (1) Assume \(2 + 2 = 5\). (2) We also know \(2 + 2 = 4\). (3) Therefore \(4 = 5\). (4) Subtracting 3 from both sides, \(1 = 2\). (5) The Pope and I are two people. (6) Since \(1 = 2\), the Pope and I are one person. (7) Therefore I am the Pope.

The derivation is formally valid given the false premise. This illustrates in a memorable way why inconsistent sets of assumptions are dangerous: if your premises contain a contradiction, you can formally “prove” anything — the logical machinery offers no protection.

Checking a tautology by chasing arrows. Consider \(\varphi = (A \to (B \to C)) \to ((A \to B) \to (A \to C))\). Rather than drawing the full truth table (which has \(2^3 = 8\) rows), we can check this is a tautology more efficiently. Since the main connective is \(\to\), the only way \(\varphi\) could be false is if the antecedent \((A \to (B \to C))\) is true and the consequent \(((A \to B) \to (A \to C))\) is false. For the consequent to be false, \((A \to B)\) must be true and \((A \to C)\) must be false. For \((A \to C)\) to be false, \(A\) must be true and \(C\) must be false. Now we have \(A = \mathbf{T}\), \(C = \mathbf{F}\), \((A \to B) = \mathbf{T}\) (with \(A = \mathbf{T}\)), so \(B\) must be true. But then \((B \to C)\) with \(B = \mathbf{T}\) and \(C = \mathbf{F}\) gives \((B \to C) = \mathbf{F}\), so the antecedent \((A \to (B \to C))\) with \(A = \mathbf{T}\) and \((B \to C) = \mathbf{F}\) is false — contradicting our assumption that the antecedent is true. So no assignment makes \(\varphi\) false, and it is a tautology. (This formula is in fact the \(\to\)-only version of the distributivity law, and it is one of the axioms of the Hilbert-style proof system.)

Chapter 3: Formal Deduction

3.1 From Semantics to Syntax

The semantic notion of logical consequence (\(\Gamma \models \varphi\)) is mathematically precise, but verifying it by checking all truth assignments is computationally intractable. Moreover, semantics cannot by itself give us a certificate that \(\Gamma \models \varphi\) holds — only an exhaustive check, not a concise proof.

The syntactic approach solves both problems. We define a formal proof system — a set of mechanical rules for deriving formulas from other formulas — and use it to produce explicit derivations. A derivation is a finite sequence of steps, each one justified by a rule, leading from premises to conclusion. Such a derivation is a certificate of validity that can be checked mechanically without understanding what the formulas mean.

The formal provability relation \(\Gamma \vdash \varphi\) asserts that there exists a formal proof deriving \(\varphi\) from premises in \(\Gamma\). The central theorems of propositional logic — soundness and completeness — will then establish that \(\Gamma \vdash \varphi\) and \(\Gamma \models \varphi\) are precisely equivalent: the syntactic and semantic notions of consequence coincide.

The proof system used throughout this course is natural deduction, introduced by Gerhard Gentzen in 1935. Natural deduction is designed to mirror the informal reasoning patterns that mathematicians actually use: introduction rules explain how to establish a connective, and elimination rules explain how to use a formula once it has been established.

Why introduction and elimination? The design principle is elegant: every connective gets exactly two kinds of rules. Introduction rules (\(\land\)I, \(\lor\)I, \(\to\)I, \(\neg\)I, \(\leftrightarrow\)I) answer the question: "how do I prove a formula with this connective as its main operator?" Elimination rules (\(\land\)E, \(\lor\)E, \(\to\)E, \(\neg\)E, \(\leftrightarrow\)E) answer the dual question: "given a formula with this main connective, what can I extract from it?"

This intro/elim symmetry is not merely aesthetics. In practice, it gives a clear proof strategy: when the goal is \((\varphi \land \psi)\), look for the introduction rule for \(\land\) — you need to prove \(\varphi\) and \(\psi\) separately, then apply \(\land\)I. When you have \((\varphi \land \psi)\) as a premise, look for the elimination rule for \(\land\) — you can extract \(\varphi\) or \(\psi\) at will. Every proof step answers one of these two questions.

The symmetry also has deep theoretical significance: it underlies the normalization theorem (proved for natural deduction by Dag Prawitz in 1965, building on Gentzen's ideas), which says that any proof can be simplified so that no formula is introduced only to be immediately eliminated: the logical equivalent of never making a detour.

3.2 The Natural Deduction Rules

A formal proof (or derivation) in natural deduction is a finite, numbered sequence of lines, where each line contains a formula and a justification. A subproof is a consecutive block of lines beginning with an assumption (marked AS) that is temporarily introduced and later discharged. Lines within a subproof can only be referenced within or before that subproof; once the subproof ends, only the subproof as a whole can be cited.

The key definition of formal provability:

Formal provability. We say \(\Gamma \vdash \varphi\) (the set \(\Gamma\) formally implies \(\varphi\)) if there exists a formal proof — a finite sequence of lines, each justified by one of the rules below — such that all premise introductions cite formulas in \(\Gamma\), and the final line is \(\varphi\) outside any open subproof.

The rules are organized by connective. We list each rule with its name and notation.

Structural rules:

  • Premise (PR): Any formula \(\varphi \in \Gamma\) may be introduced at any point.
  • Reiteration (R \(m\)): Any formula appearing on line \(m\) may be repeated at a later line, provided line \(m\) is accessible (not inside a closed subproof).

Conjunction rules:

  • Conjunction Introduction (\(\land\)I \(k, l\)): From \(\varphi\) on line \(k\) and \(\psi\) on line \(l\), derive \((\varphi \land \psi)\).
  • Conjunction Elimination (\(\land\)E \(m\)): From \((\varphi \land \psi)\) on line \(m\), derive either \(\varphi\) or \(\psi\).

Disjunction rules:

  • Disjunction Introduction (\(\lor\)I \(m\)): From \(\varphi\) on line \(m\), derive \((\varphi \lor \psi)\) or \((\psi \lor \varphi)\) for any formula \(\psi\).
  • Disjunction Elimination (\(\lor\)E \(i, j\text{-}k, l\text{-}m\)): From \((\varphi \lor \psi)\) on line \(i\), a subproof from assumption \(\varphi\) concluding \(\theta\) (lines \(j\)–\(k\)), and a subproof from assumption \(\psi\) concluding \(\theta\) (lines \(l\)–\(m\)), derive \(\theta\).

Negation and contradiction rules:

  • Negation Elimination (\(\neg\)E \(k, l\)): From \(\varphi\) on line \(k\) and \(\neg\varphi\) on line \(l\), derive \(\bot\).
  • Negation Introduction (\(\neg\)I \(k\text{-}l\)): From a subproof that assumes \(\varphi\) and derives \(\bot\), derive \(\neg\varphi\).
  • Contradiction Elimination (\(\bot\)E \(m\)): From \(\bot\) on line \(m\), derive any formula \(\varphi\).
  • Reductio ad Absurdum (RAA \(k\text{-}l\)): From a subproof that assumes \(\neg\varphi\) and derives \(\bot\), derive \(\varphi\).

Conditional rules:

  • Conditional Elimination (\(\to\)E \(k, l\)) [modus ponens]: From \((\varphi \to \psi)\) on line \(k\) and \(\varphi\) on line \(l\), derive \(\psi\).
  • Conditional Introduction (\(\to\)I \(k\text{-}l\)): From a subproof that assumes \(\varphi\) and derives \(\psi\), derive \((\varphi \to \psi)\).

Biconditional rules:

  • Biconditional Introduction (\(\leftrightarrow\)I \(k, l\)): From \((\varphi \to \psi)\) on line \(k\) and \((\psi \to \varphi)\) on line \(l\), derive \((\varphi \leftrightarrow \psi)\).
  • Biconditional Elimination (\(\leftrightarrow\)E \(m\)): From \((\varphi \leftrightarrow \psi)\) on line \(m\), derive either \((\varphi \to \psi)\) or \((\psi \to \varphi)\).

Altogether there are twelve connective rules, which together with the structural rules PR and R makes fourteen in all. The introduction/elimination symmetry is characteristic of Gentzen’s natural deduction design: every connective has exactly the rules needed to introduce it when building toward a goal and to use it once established. This symmetry is not merely aesthetic — it is the key to the normalization theorem, which says that any proof can be simplified into a detour-free normal form.
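Because every line must cite a rule and earlier lines, a derivation is mechanically checkable. Here is a toy checker for a subproof-free fragment (only PR, \(\land\)I, \(\land\)E, and \(\to\)E); the tuple encoding is ours, and a full checker would additionally track open assumptions for the subproof rules:

```python
def check_proof(premises, proof):
    """Each proof line is (formula, justification); justifications cite
    1-indexed earlier lines. Returns True iff every line is rule-correct."""
    lines = []
    for formula, just in proof:
        rule, cites = just[0], just[1:]
        if rule == 'PR':                 # premise introduction
            ok = formula in premises
        elif rule == 'andI':             # from phi, psi derive (phi and psi)
            ok = formula == ('and', lines[cites[0]-1], lines[cites[1]-1])
        elif rule == 'andE':             # from (phi and psi) derive phi or psi
            m = lines[cites[0]-1]
            ok = m[0] == 'and' and formula in (m[1], m[2])
        elif rule == 'impE':             # modus ponens
            c = lines[cites[0]-1]
            ok = c[0] == 'imp' and c[1] == lines[cites[1]-1] and c[2] == formula
        else:
            ok = False
        if not ok:
            return False
        lines.append(formula)
    return True

p, q = ('var', 'p'), ('var', 'q')
# Commutativity of conjunction: {(p and q)} |- (q and p).
proof = [
    (('and', p, q), ('PR',)),
    (q, ('andE', 1)),
    (p, ('andE', 1)),
    (('and', q, p), ('andI', 2, 3)),
]
assert check_proof([('and', p, q)], proof)
# A line that cites no valid premise is rejected.
assert not check_proof([('and', p, q)], [(p, ('PR',))])
```

Checking needs no semantics at all: the checker never evaluates truth values, which is exactly the sense in which a derivation is a syntactic certificate.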

3.3 Worked Examples

The best way to internalize the rules is through worked examples. We trace several proofs of increasing complexity, showing each line with its justification.

Distributivity: \(\{(p \land (q \lor r))\} \vdash ((p \land q) \lor (p \land r))\).
Line | Formula | Justification
1 | \((p \land (q \lor r))\) | PR
2 | \(p\) | \(\land\)E 1
3 | \((q \lor r)\) | \(\land\)E 1
4 | \(\quad q\) | AS
5 | \(\quad (p \land q)\) | \(\land\)I 2, 4
6 | \(\quad ((p \land q) \lor (p \land r))\) | \(\lor\)I 5
7 | \(\quad r\) | AS
8 | \(\quad (p \land r)\) | \(\land\)I 2, 7
9 | \(\quad ((p \land q) \lor (p \land r))\) | \(\lor\)I 8
10 | \(((p \land q) \lor (p \land r))\) | \(\lor\)E 3, 4–6, 7–9

The proof handles each disjunct of \((q \lor r)\) separately in a subproof, in each case combining the disjunct with \(p\) via \(\land\)I, then introducing the desired disjunction via \(\lor\)I. The \(\lor\)E rule merges the two branches.

Commutativity of conjunction: \(\{(p \land q)\} \vdash (q \land p)\). From \((p \land q)\) extract \(p\) and \(q\) separately via \(\land\)E, then recombine in reversed order via \(\land\)I.
Transitivity of implication: \(\{(p \to q), (q \to r)\} \vdash (p \to r)\). Open a subproof assuming \(p\); apply \(\to\)E twice to get \(r\); close the subproof with \(\to\)I to conclude \((p \to r)\).
Law of Excluded Middle: \(\vdash (p \lor \neg p)\). This requires a clever double application of RAA. Assume \(\neg(p \lor \neg p)\) as the outer assumption. Inside, assume \(p\); then \(\lor\)I gives \((p \lor \neg p)\), contradicting the outer assumption via \(\neg\)E to get \(\bot\). Close the subproof with \(\neg\)I to get \(\neg p\); then \(\lor\)I gives \((p \lor \neg p)\) again, another contradiction with the outer assumption. Close with RAA to get \((p \lor \neg p)\).
De Morgan's Law: \(\{\neg(p \lor q)\} \vdash (\neg p \land \neg q)\). Assume \(p\); \(\lor\)I gives \((p \lor q)\), contradicting \(\neg(p \lor q)\); so \(\neg p\) by \(\neg\)I. Similarly derive \(\neg q\). Then \(\land\)I gives \((\neg p \land \neg q)\).

These examples illustrate a general strategy: to prove an implication \((\varphi \to \psi)\), open a subproof assuming \(\varphi\) and derive \(\psi\). To prove a negation \(\neg\varphi\), assume \(\varphi\) and derive \(\bot\). To prove a disjunction, derive one of the disjuncts and use \(\lor\)I. To prove a conjunction from a disjunction, handle each disjunct in a separate subproof branch via \(\lor\)E.

Derived rules. Several commonly useful inference patterns appear repeatedly enough to name as derived rules. Modus tollens: from \((\varphi \to \psi)\) and \(\neg\psi\), derive \(\neg\varphi\) (contrapositive applied forward). Hypothetical syllogism: from \((\varphi \to \psi)\) and \((\psi \to \theta)\), derive \((\varphi \to \theta)\). Disjunctive syllogism: from \((\varphi \lor \psi)\) and \(\neg\varphi\), derive \(\psi\). These are all derivable in natural deduction and can be used as shorthand in proofs.
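Since each derived rule is provable, soundness (Chapter 4) guarantees it is also semantically valid, and the semantic side can be confirmed by brute force. A sketch, reusing our tuple encoding of formulas:

```python
from itertools import product

def ev(f, v):
    """Evaluate a tuple-encoded formula under assignment v (dict var -> bool)."""
    op = f[0]
    if op == 'var': return v[f[1]]
    if op == 'not': return not ev(f[1], v)
    if op == 'and': return ev(f[1], v) and ev(f[2], v)
    if op == 'or':  return ev(f[1], v) or ev(f[2], v)
    if op == 'imp': return (not ev(f[1], v)) or ev(f[2], v)

def entails(gamma, psi, variables):
    """Gamma |= psi, checked over all 2^n assignments."""
    assignments = (dict(zip(variables, vals))
                   for vals in product([True, False], repeat=len(variables)))
    return all(ev(psi, v) for v in assignments if all(ev(g, v) for g in gamma))

p, q, r = ('var', 'p'), ('var', 'q'), ('var', 'r')
# Modus tollens: {p -> q, not q} |= not p
assert entails([('imp', p, q), ('not', q)], ('not', p), ['p', 'q'])
# Hypothetical syllogism: {p -> q, q -> r} |= p -> r
assert entails([('imp', p, q), ('imp', q, r)], ('imp', p, r), ['p', 'q', 'r'])
# Disjunctive syllogism: {p or q, not p} |= q
assert entails([('or', p, q), ('not', p)], q, ['p', 'q'])
```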

3.4 More Worked Proofs

The following proofs develop additional facility with natural deduction. Each illustrates a different aspect of proof strategy.

Double negation (forward): \(\{p\} \vdash \neg\neg p\). To prove \(\neg\neg p\) from \(p\), we need to derive \(\bot\) from the assumption \(\neg p\). Open a subproof: assume \(\neg p\). Apply \(\neg\)E to \(p\) (from PR) and \(\neg p\) (the assumption) to derive \(\bot\). Close the subproof with \(\neg\)I to obtain \(\neg\neg p\).
Double negation (reverse): \(\{\neg\neg p\} \vdash p\). Here RAA is essential. Open a subproof assuming \(\neg p\). Apply \(\neg\)E to \(\neg\neg p\) (from PR) and \(\neg p\) (the assumption) to get \(\bot\). Close with RAA to conclude \(p\). This proof cannot be done without RAA: intuitionistic logic, which keeps \(\bot\)E but drops RAA, cannot derive \(p\) from \(\neg\neg p\), so this step is genuinely classical reasoning.
Conditional exportation: \(\{((p \land q) \to r)\} \vdash (p \to (q \to r))\). The claim says we can "curry" a two-premise implication.
Line | Formula | Justification
1 | \(((p \land q) \to r)\) | PR
2 | \(\quad p\) | AS
3 | \(\quad\quad q\) | AS
4 | \(\quad\quad (p \land q)\) | \(\land\)I 2, 3
5 | \(\quad\quad r\) | \(\to\)E 1, 4
6 | \(\quad (q \to r)\) | \(\to\)I 3–5
7 | \((p \to (q \to r))\) | \(\to\)I 2–6

The nested subproofs correspond to partial application: first we assume \(p\), then assume \(q\), derive \(r\), and close outward.

Contrapositive: \(\{(p \to q)\} \vdash (\neg q \to \neg p)\). This is the contrapositive of \((p \to q)\), and it shows that proving the contrapositive is always available as a strategy.
Line | Formula | Justification
1 | \((p \to q)\) | PR
2 | \(\quad \neg q\) | AS
3 | \(\quad\quad p\) | AS
4 | \(\quad\quad q\) | \(\to\)E 1, 3
5 | \(\quad\quad \bot\) | \(\neg\)E 4, 2
6 | \(\quad \neg p\) | \(\neg\)I 3–5
7 | \((\neg q \to \neg p)\) | \(\to\)I 2–6

The strategy here is to assume the antecedent \(\neg q\), then show \(\neg p\) by assuming \(p\) and deriving a contradiction. The contradiction comes from applying \((p \to q)\) to get \(q\), which contradicts \(\neg q\). This is a template for proving any contrapositive.

Disjunctive syllogism: \(\{(p \lor q), \neg p\} \vdash q\). From the disjunction, we handle each case: if \(p\), then \(\neg p\) and \(p\) together give \(\bot\), and \(\bot\)E yields \(q\); if \(q\), we already have \(q\). Apply \(\lor\)E to conclude \(q\) in either case.
Commutativity of biconditional: \(\{(p \leftrightarrow q)\} \vdash (q \leftrightarrow p)\). From \((p \leftrightarrow q)\), extract both directions via \(\leftrightarrow\)E to get \((p \to q)\) and \((q \to p)\). Then apply \(\leftrightarrow\)I in the reversed order to derive \((q \leftrightarrow p)\).

3.5 The Deduction Theorem

One of the most useful meta-theorems about the natural deduction system is the Deduction Theorem, which relates implication-provability to provability under an assumption.

Deduction Theorem. For any set of formulas \(\Gamma\), any formulas \(\varphi\) and \(\psi\): \[ \Gamma \cup \{\varphi\} \vdash \psi \quad \Longleftrightarrow \quad \Gamma \vdash (\varphi \to \psi). \]
The \((\Rightarrow)\) direction is immediate from the \(\to\)I rule: if \(\Gamma \cup \{\varphi\} \vdash \psi\), then the derivation of \(\psi\) from \(\Gamma \cup \{\varphi\}\) constitutes a subproof from assumption \(\varphi\) concluding \(\psi\), so \(\to\)I gives \(\Gamma \vdash (\varphi \to \psi)\).

The \((\Leftarrow)\) direction follows from \(\to\)E: if \(\Gamma \vdash (\varphi \to \psi)\) and we add \(\varphi\) as a premise, we can apply \(\to\)E to get \(\psi\), so \(\Gamma \cup \{\varphi\} \vdash \psi\).

The Deduction Theorem is natural deduction’s analogue of the fact in mathematics that “assume \(P\), prove \(Q\)” gives “prove \(P \Rightarrow Q\).” It confirms that the \(\to\)I rule exactly captures this pattern. The theorem makes it easy to convert proofs-under-assumptions into theorems: to show \(\vdash (p \to (q \to p))\), we need only show \(\{p, q\} \vdash p\) — which is immediate by PR.

A second meta-theorem worth noting is proof by contradiction in its strongest form: \(\Gamma \vdash \varphi\) if and only if \(\Gamma \cup \{\neg\varphi\}\) is inconsistent (i.e., \(\Gamma \cup \{\neg\varphi\} \vdash \bot\)). The \((\Rightarrow)\) direction follows from the fact that from \(\varphi\) and \(\neg\varphi\) one derives \(\bot\) by \(\neg\)E; the \((\Leftarrow)\) direction follows from RAA. This equivalence makes it legitimate to prove \(\varphi\) by assuming \(\neg\varphi\) and deriving a contradiction — the standard mathematical technique of proof by contradiction.


Chapter 4: Soundness and Completeness of Propositional Logic

4.1 The Bridge Between Syntax and Semantics

We now have two notions of consequence: the semantic \(\Gamma \models \varphi\) and the syntactic \(\Gamma \vdash \varphi\). These were defined independently — one in terms of truth assignments, the other in terms of formal proof rules — and it is not at all obvious that they agree. The Soundness and Completeness Theorems assert that they do, completely and exactly:

\[ \Gamma \vdash \varphi \quad \Longleftrightarrow \quad \Gamma \models \varphi. \]

Soundness (\(\vdash\) implies \(\models\)) asserts that formal proofs are reliable: every formula provable from \(\Gamma\) is genuinely a logical consequence of \(\Gamma\). Without soundness, a proof system could derive nonsense. Completeness (\(\models\) implies \(\vdash\)) asserts that formal proofs are powerful enough to capture everything: every genuine logical consequence has a formal proof. Without completeness, some true things would be unprovable — a significant limitation.

Together, soundness and completeness mean the formal proof system and the semantic definition are interchangeable. This is one of the fundamental results of mathematical logic. The analogous result for first-order logic, proved by Kurt Gödel in his 1929 doctoral dissertation (published in 1930), is widely regarded as one of the great achievements of twentieth-century mathematics.

4.2 Soundness

To prove the Soundness Theorem, we first establish that each individual proof rule preserves logical consequence.

Preliminary properties of \(\models\).
  1. (Monotonicity) If \(\Gamma \models \varphi\) and \(\Gamma \subseteq \Gamma'\), then \(\Gamma' \models \varphi\).
  2. (Transitivity) If \(\Gamma \models \varphi_i\) for each \(\varphi_i \in \Delta\) and \(\Delta \models \psi\), then \(\Gamma \models \psi\).
  3. (Rule soundness) Each of the 14 natural deduction rules preserves \(\models\): if all formulas used by the rule follow logically from appropriate premises, then so does the derived formula.

Property (3) is established by checking each rule individually. The non-trivial cases are the rules that involve subproofs, where the discharged assumptions must be carefully tracked.

For the structural and conjunction rules: \(\land\)I is sound because if every assignment satisfying \(\Gamma\) makes both \(\varphi\) and \(\psi\) true, it also makes \((\varphi \land \psi)\) true; \(\land\)E follows since a conjunction being true implies each conjunct is true; \(\lor\)I follows since a true disjunct makes the disjunction true.

For conditional rules: \(\to\)E (modus ponens) is sound because truth of \((\varphi \to \psi)\) and of \(\varphi\) forces truth of \(\psi\), by the definition of the conditional. For \(\to\)I: the claim is that if \(\Gamma \cup \{\varphi\} \models \psi\), then \(\Gamma \models (\varphi \to \psi)\). To see this, consider any assignment satisfying \(\Gamma\). If it makes \(\varphi\) false, it makes \((\varphi \to \psi)\) vacuously true; if it makes \(\varphi\) true, then it satisfies \(\Gamma \cup \{\varphi\}\), so \(\psi\) is also true and \((\varphi \to \psi)\) is true.

For negation rules: \(\neg\)E follows from the definition of contradiction. For \(\neg\)I: if \(\Gamma \cup \{\varphi\} \models \bot\), then no assignment satisfies both \(\Gamma\) and \(\varphi\), so every assignment satisfying \(\Gamma\) must falsify \(\varphi\), which means \(\Gamma \models \neg\varphi\). For RAA: if \(\Gamma \cup \{\neg\varphi\} \models \bot\), then no assignment satisfies \(\Gamma\) with \(\neg\varphi\) true, so every assignment satisfying \(\Gamma\) must satisfy \(\varphi\), giving \(\Gamma \models \varphi\). For \(\bot\)E: if \(\Gamma \models \bot\), then no assignment satisfies \(\Gamma\), so \(\Gamma \models \psi\) holds vacuously for every formula \(\psi\).

For \(\lor\)E: suppose \(\Gamma \models (\varphi \lor \psi)\), \(\Gamma \cup \{\varphi\} \models \theta\), and \(\Gamma \cup \{\psi\} \models \theta\). Any assignment satisfying \(\Gamma\) makes \((\varphi \lor \psi)\) true, so it makes either \(\varphi\) true or \(\psi\) true (or both). In the first case, \(\Gamma \cup \{\varphi\}\) is satisfied, so \(\theta\) is true; in the second case, \(\Gamma \cup \{\psi\}\) is satisfied, so \(\theta\) is true. Hence \(\Gamma \models \theta\).

Soundness Theorem. For every set of formulas \(\Gamma\) and every formula \(\varphi\), if \(\Gamma \vdash \varphi\) then \(\Gamma \models \varphi\).
By induction on the number of lines in the formal proof. The induction hypothesis is: at each line \(k\) containing formula \(\varphi_k\), if \(A_k\) denotes the set of undischarged assumptions at that point (those of open subproofs), then \(\Gamma \cup A_k \models \varphi_k\).

Base case (line 1). The first line is either a premise (in \(\Gamma\), so trivially \(\Gamma \models \varphi_1\)) or the start of a subproof with assumption \(\alpha\) (so \(\Gamma \cup \{\alpha\} \models \alpha\)).

Inductive step. Assume the hypothesis holds for all lines before \(k\). Line \(k\) was derived by one of the rules. For each rule, Property (3) above guarantees that the conclusion follows logically from the formulas already established by the inductive hypothesis. The case analysis covers all 14 rules.

For the final line of the proof, there are no open subproofs, so \(A_k = \emptyset\) and the hypothesis gives \(\Gamma \models \varphi\).

Soundness has an immediate corollary: if \(\Gamma \vdash \bot\), then \(\Gamma \models \bot\), meaning no truth assignment satisfies all of \(\Gamma\) — so \(\Gamma\) is semantically unsatisfiable.

4.3 Consistent and Maximally Consistent Sets

The completeness proof requires a different approach: we must show that whenever \(\Gamma \models \varphi\), a formal proof exists. Rather than constructing proofs directly, we use an indirect strategy via the Lindenbaum extension: extend \(\Gamma\) to a large, well-behaved set from which a satisfying assignment can be read off directly.

A set of formulas \(\Gamma\) is:
  • consistent if \(\Gamma \not\vdash \bot\) (cannot derive a contradiction);
  • inconsistent if \(\Gamma \vdash \bot\);
  • maximally consistent if \(\Gamma\) is consistent and no proper superset of \(\Gamma\) is consistent (adding any new formula creates inconsistency).

By Soundness, if \(\Gamma\) is satisfiable then it is consistent: a satisfying assignment prevents deriving \(\bot\). The converse — consistency implies satisfiability — is a key lemma.

Maximally consistent sets have a very useful structure:

Proposition 1. Let \(\Gamma\) be maximally consistent and \(\varphi\) any formula. The following are equivalent:
  1. \(\varphi \in \Gamma\);
  2. \(\neg\varphi \notin \Gamma\);
  3. \(\Gamma \vdash \varphi\);
  4. \(\Gamma \cup \{\varphi\} \not\vdash \bot\).

This says a maximally consistent set is syntactically complete: for every formula \(\varphi\), exactly one of \(\varphi, \neg\varphi\) belongs to \(\Gamma\). No formula sits in a gray zone of being neither provable nor disprovable.

Lemma 2. Every maximally consistent set is satisfiable.
Given a maximally consistent \(\Gamma\), define a truth assignment \(v\) by \(v(p_i) = \mathbf{T}\) iff \(p_i \in \Gamma\). We claim that for every formula \(\varphi\), \(v^*(\varphi) = \mathbf{T}\) iff \(\varphi \in \Gamma\). The proof is by structural induction on \(\varphi\).

Base case: if \(\varphi = p_i\), then \(v^*(\varphi) = v(p_i) = \mathbf{T}\) iff \(p_i \in \Gamma\) by definition.

Negation: \(v^*(\neg\psi) = \mathbf{T}\) iff \(v^*(\psi) = \mathbf{F}\) iff \(\psi \notin \Gamma\) (by induction) iff \(\neg\psi \in \Gamma\) (by Proposition 1).

Conjunction: \(v^*((\psi \land \theta)) = \mathbf{T}\) iff both \(v^*(\psi) = \mathbf{T}\) and \(v^*(\theta) = \mathbf{T}\). By induction, this is iff \(\psi \in \Gamma\) and \(\theta \in \Gamma\), which is iff \((\psi \land \theta) \in \Gamma\) (using Proposition 1 and the fact that \(\land\)I and \(\land\)E are derivable rules).

The remaining connectives (\(\lor\), \(\to\), \(\leftrightarrow\)) are handled similarly, each using the appropriate introduction and elimination rules via Proposition 1.

In particular, \(v^*(\varphi) = \mathbf{T}\) for all \(\varphi \in \Gamma\), so \(v\) is a satisfying assignment for \(\Gamma\).

4.4 The Lindenbaum Extension

The final ingredient for completeness is a method to extend any consistent set to a maximally consistent one. This is the Lindenbaum Lemma, named after Adolf Lindenbaum, who sketched the idea in the 1920s; the construction was later developed in full by Tarski.

Lemma 4 (Lindenbaum Extension). Every consistent set \(\Gamma\) can be extended to a maximally consistent set \(\Gamma^* \supseteq \Gamma\).
Fix an enumeration \(\varphi_1, \varphi_2, \varphi_3, \ldots\) of all formulas in \(\mathcal{L}^p(\sigma)\) (possible since \(\sigma\) is countable). Define a chain: \[ \Gamma_0 = \Gamma, \qquad \Gamma_k = \begin{cases} \Gamma_{k-1} \cup \{\varphi_k\} & \text{if this is consistent,} \\ \Gamma_{k-1} & \text{otherwise.}\end{cases} \] Set \(\Gamma^* = \bigcup_{k=0}^\infty \Gamma_k\). Each \(\Gamma_k\) is consistent (by construction: we only add \(\varphi_k\) when it preserves consistency). Since any formal proof uses only finitely many lines, \(\Gamma^* \vdash \bot\) would require \(\Gamma_k \vdash \bot\) for some finite \(k\) — but each \(\Gamma_k\) is consistent, so \(\Gamma^*\) is consistent. Maximality holds because for each \(\varphi_k\), either \(\varphi_k \in \Gamma^*\) (added at step \(k\)) or \(\Gamma_{k-1} \cup \{\varphi_k\}\) was inconsistent, meaning \(\Gamma^* \cup \{\varphi_k\}\) is also inconsistent.
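The chain construction can be simulated on a finite fragment. In this sketch we use satisfiability over \(p, q\) as a stand-in for consistency (legitimate by soundness and completeness), and the small hand-picked "enumeration" plays the role of \(\varphi_1, \varphi_2, \ldots\); all names and choices here are ours:

```python
from itertools import product

def ev(f, v):
    """Evaluate a tuple-encoded formula under assignment v (dict var -> bool)."""
    op = f[0]
    if op == 'var': return v[f[1]]
    if op == 'not': return not ev(f[1], v)
    if op == 'and': return ev(f[1], v) and ev(f[2], v)
    if op == 'or':  return ev(f[1], v) or ev(f[2], v)

def consistent(gamma):
    """Satisfiability over p, q -- a semantic proxy for Gamma not proving bottom."""
    return any(all(ev(f, dict(p=a, q=b)) for f in gamma)
               for a, b in product([True, False], repeat=2))

p, q = ('var', 'p'), ('var', 'q')
# A finite stand-in for the enumeration phi_1, phi_2, ... of all formulas.
enumeration = [p, q, ('not', p), ('not', q), ('and', p, q),
               ('or', p, q), ('not', ('and', p, q))]

gamma = [('or', p, q)]           # Gamma_0: a consistent starting set
for phi in enumeration:          # Gamma_k: keep phi_k only if consistency survives
    if consistent(gamma + [phi]):
        gamma.append(phi)

# Maximality on the fragment: each formula is in gamma, or cannot be added.
for phi in enumeration:
    assert phi in gamma or not consistent(gamma + [phi])
```

Running the loop with a different enumeration order can produce a different maximal set, which matches the fact that a consistent set generally has many maximally consistent extensions.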

4.5 The Completeness Theorem

We now have all the pieces.

Completeness Theorem for Propositional Logic. For every set of formulas \(\Gamma\) and every formula \(\varphi\), if \(\Gamma \models \varphi\) then \(\Gamma \vdash \varphi\).
Suppose \(\Gamma \models \varphi\). Then every truth assignment satisfying \(\Gamma\) also satisfies \(\varphi\), which means no truth assignment satisfies \(\Gamma \cup \{\neg\varphi\}\). So \(\Gamma \cup \{\neg\varphi\}\) is semantically unsatisfiable.

By the contrapositive of Lemmas 2 and 4 combined: if \(\Gamma \cup \{\neg\varphi\}\) is unsatisfiable, it cannot be extended to a maximally consistent (hence satisfiable) set, so \(\Gamma \cup \{\neg\varphi\}\) must already be inconsistent. That is, \(\Gamma \cup \{\neg\varphi\} \vdash \bot\). Applying the RAA rule, we conclude \(\Gamma \vdash \varphi\).

Historical note. The completeness of propositional logic was proved independently by Emil Post (1921) and Paul Bernays (in his 1918 Habilitation thesis, published in 1926). Post's proof used normal forms and truth tables; Bernays worked with Hilbert's axiomatic system. Gentzen's natural deduction framework arrived in 1935; paired with the Lindenbaum technique, it yields the transparent completeness argument presented here.

The analogous theorem for first-order logic — the Completeness Theorem that will occupy Chapter 8 — was proved by Kurt Gödel in his doctoral dissertation submitted to the University of Vienna in 1929. This is the same Gödel who would, just two years later, prove the incompleteness of arithmetic. The two results are not contradictory: propositional logic and first-order logic are complete (every valid consequence has a proof), while the incompleteness theorems concern specific sufficiently powerful theories within first-order logic (like Peano arithmetic), not the proof system itself.

Combined with the Soundness Theorem, completeness yields the fundamental equivalence between provability and semantic consequence: \[ \Gamma \vdash \varphi \quad \Longleftrightarrow \quad \Gamma \models \varphi. \]

This means we can use semantic reasoning (truth tables) to discover logical consequences and proof-theoretic reasoning (natural deduction) to certify them, interchangeably. The rest of the course extends this picture: first to first-order logic (where the same equivalence holds but is harder to prove), and then to computability theory (where the limits of the equivalence become central).

4.6 Consequences of the Main Theorems

The soundness and completeness theorems unlock several important corollaries.

Compactness for Propositional Logic. A set of formulas \(\Gamma\) is satisfiable if and only if every finite subset of \(\Gamma\) is satisfiable.
By Completeness, \(\Gamma\) is satisfiable iff \(\Gamma\) is consistent. Any proof of \(\bot\) from \(\Gamma\) uses only finitely many premises, so \(\Gamma\) is inconsistent iff some finite \(\Gamma_0 \subseteq \Gamma\) is inconsistent, iff some finite \(\Gamma_0\) is unsatisfiable (by Soundness). Contrapositive: \(\Gamma\) is satisfiable iff every finite subset is satisfiable.
Finite characterization of logical consequence. If \(\Gamma \models \varphi\), then there is a finite \(\Gamma_0 \subseteq \Gamma\) with \(\Gamma_0 \models \varphi\).

This follows immediately from completeness (which gives a finite proof using finitely many premises) plus soundness. It is the propositional analogue of the FOL compactness theorem.

Refutation completeness. If \(\Gamma\) is unsatisfiable, then \(\Gamma \vdash \bot\).

This corollary, combined with the Deduction Theorem, gives the standard proof-by-contradiction strategy: to prove \(\varphi\) from \(\Gamma\), it suffices to show that \(\Gamma \cup \{\neg\varphi\}\) is semantically unsatisfiable (e.g., by finding a tautology of the form \((\gamma_1 \land \cdots \land \gamma_n \land \neg\varphi) \to \bot\)).

The practical significance of these corollaries for automated reasoning is substantial. Refutation-based theorem provers (such as resolution provers) exploit refutation completeness: to prove a theorem, negate it and derive a contradiction. Soundness ensures that any contradiction found is genuine; completeness ensures that if a contradiction exists, it will eventually be found.

A deeper corollary connects formal provability to decidability. The set of propositional tautologies is decidable: given any formula \(\varphi\) with \(n\) variables, check its truth table in \(O(2^n)\) time. While exponential, this is still computable, and propositional validity lies in the class co-NP (in fact it is co-NP-complete). Whether it admits a polynomial-time algorithm is therefore equivalent to asking whether \(\mathbf{P} = \mathbf{NP}\), perhaps the deepest open problem in theoretical computer science.
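The \(O(2^n)\) truth-table decision procedure fits in a few lines. A minimal sketch, with formulas represented as functions from assignments to booleans:

```python
from itertools import product

def is_tautology(formula, variables):
    """Decide propositional validity by checking all 2^n assignments.
    `formula` maps an assignment dict {var: bool} to a bool."""
    return all(formula(dict(zip(variables, bits)))
               for bits in product([False, True], repeat=len(variables)))

# (p and (p -> q)) -> q : modus ponens as a tautology.
mp = lambda a: (not (a["p"] and (not a["p"] or a["q"]))) or a["q"]
print(is_tautology(mp, ["p", "q"]))                          # True

# p or q is satisfiable but not valid.
print(is_tautology(lambda a: a["p"] or a["q"], ["p", "q"]))  # False
```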


Chapter 5: First-Order Languages

5.1 The Limits of Propositional Logic

The classic example that motivates first-order logic is the syllogism:

Every man is mortal. Socrates is a man. Therefore, Socrates is mortal.

Propositional logic cannot verify this. If we label “every man is mortal” as \(P\), “Socrates is a man” as \(Q\), and “Socrates is mortal” as \(S\), all propositional logic can see is \(P \land Q \to S\) — but it has no reason to accept this as valid. The internal structure of the statements (the shared predicate “mortal,” the shared object “Socrates,” the universal quantification “every man”) is invisible at the propositional level. The three statements are just three unrelated atoms.

First-order logic — also called predicate calculus — breaks statements into two components: objects and properties. “Every man is mortal” becomes \(\forall x\,(\text{Man}(x) \to \text{Mortal}(x))\); “Socrates is a man” becomes \(\text{Man}(\text{Socrates})\). The fact that “Socrates” appears as the object in both statements, and “Mortal” as the predicate, makes the conclusion \(\text{Mortal}(\text{Socrates})\) derivable by universal instantiation — a step invisible to propositional logic.

Properties are not just unary predicates on individual objects. “John likes Mary” involves a binary predicate relating two objects. “Between \(x\) and \(z\) there is a \(y\)” involves a ternary relation. This is why the logic is called predicate calculus: predicates of any arity are the building blocks.

Propositional logic is powerful enough to verify arguments of the form “given these truth-valued premises, this conclusion follows.” But it cannot capture the internal structure of statements. Consider the argument:

\(3 + 2 = 5\). \(5 > 1\). Therefore: \(3 + 2 > 1\).

This argument is clearly valid: substituting the equality \(3+2=5\) into \(5 > 1\) forces the conclusion. But propositional logic cannot verify it. If we label the three statements as atoms \(p\), \(q\), \(r\), propositional logic sees no connection between them. To capture validity, we need to look inside the statements: to name constants (\(3, 2, 5, 1\)), recognize the function symbol \(+\), and relate the predicates \(=\) and \(>\).

More fundamentally, propositional logic cannot handle quantification over large or infinite domains. If we want to state "every student who took CS 245 and did not pass has failed," propositional logic requires a separate atomic proposition for every student, hence arbitrarily many premises. But with predicates \(P(x)\) ("\(x\) took CS 245"), \(Q(x)\) ("\(x\) passed"), and \(R(x)\) ("\(x\) failed") together with quantifiers, we can say \(\forall x\, (P(x) \land \neg Q(x) \to R(x))\), a single sentence covering all students simultaneously.

First-order logic (FOL) is the extension of propositional logic that adds:

  • Terms built from variables, constants, and function symbols — representing objects;
  • Atomic formulas built from predicate symbols applied to terms — representing relations;
  • Quantifiers \(\forall\) and \(\exists\) — ranging over elements of a domain.

First-order logic is expressive enough to axiomatize essentially all of classical mathematics (arithmetic, group theory, set theory), yet it admits a completeness theorem (Chapter 8). This combination of expressiveness and tractability makes it the lingua franca of mathematical logic.

5.2 Signatures

Different areas of mathematics use different primitive symbols. Number theory talks about \(0\), \(S\), \(+\), \(\times\); group theory talks about \(\cdot\), \(e\), \({}^{-1}\); graph theory talks about the edge relation \(E\). A signature specifies which symbols are available and how many arguments each takes.

A signature \(\tau\) consists of:
  • Constant symbols: names for fixed elements (arity 0);
  • Function symbols: each with a positive integer arity \(k\), representing \(k\)-ary operations on the domain;
  • Predicate symbols (also called relation symbols): each with a positive integer arity \(k\), representing \(k\)-ary relations.
In addition, every first-order language includes logical symbols shared across all signatures: connectives \(\neg, \land, \lor, \to, \leftrightarrow\), quantifiers \(\forall, \exists\), equality \(=\), variables \(x_1, x_2, x_3, \ldots\), and punctuation.
Example signatures.
  • Number theory: \(\tau_{\mathrm{arith}} = \{0, S, {+}, {\times}\}\) where \(0\) is a constant, \(S\) (successor) is a unary function, and \(+\), \(\times\) are binary functions.
  • Group theory: \(\tau_{\mathrm{grp}} = \{e, {\cdot}\}\) where \(e\) is a constant and \(\cdot\) is a binary function. One can also include an inverse function \({}^{-1}\) as a unary function.
  • Graph theory: \(\tau_{\mathrm{gr}} = \{E\}\) where \(E\) is a binary predicate (the edge relation).
  • Set theory: \(\tau_{\in} = \{\in\}\) where \(\in\) is a binary predicate.

The distinction between function symbols (which produce elements of the domain) and predicate symbols (which produce truth values) is fundamental. Constants are just zero-arity functions; they name specific elements rather than computing new ones.

Ben-David's number theory signature, informally. In lectures, Ben-David uses a simplified notation to build intuition before introducing the formal definition. He writes the number theory signature with constants \(a, b\) (standing for 0 and 1), binary function symbols \(f, g\) (standing for \(+\) and \(\times\)), and a binary relation symbol \(r\) (standing for \(\leq\)). This separation of syntax from semantics is deliberate: the symbols \(a, b, f, g, r\) carry no meaning until a structure is specified. The formula \[ \forall y\,\forall z\,\bigl(g(y,z) = x \to (y = x \lor z = x)\bigr) \] with free variable \(x\) then expresses "\(x\) is a prime number" — because multiplication of two numbers yields \(x\) only if one of the factors is \(x\) itself (the other being 1, which plays a special role and must be handled separately). Similarly, the formula \(\exists z\, f(z,z) = x\) with free variable \(x\) expresses "\(x\) is even": there is a number \(z\) such that \(z + z = x\). These examples illustrate that even without writing numbers explicitly, the language of function and relation symbols can capture rich arithmetic properties. Combining the two ingredients, consider the sentence \[ \forall x\,\exists y\,\bigl(r(x,y) \land \varphi_{\text{prime}}(y)\bigr) \]

where \(\varphi_{\text{prime}}(y)\) is the formula above (with \(y\) replacing \(x\)). The sentence says: for every number \(x\), there is a larger (or equal) prime \(y\). This is equivalent to the infinitude of primes.
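These definability claims can be sanity-checked by brute force over an initial segment of the positive integers (a finite stand-in for the standard model; the bound 30 is our arbitrary choice, and 0 is excluded because it, like 1, would also satisfy the "prime" formula vacuously):

```python
# Brute-force evaluation of the two defining formulas over a finite
# segment of the positive integers.
N = range(1, 30)

def is_even(x):
    """exists z . f(z, z) = x, i.e. z + z = x."""
    return any(z + z == x for z in N)

def is_prime_like(x):
    """forall y, z : g(y, z) = x -> (y = x or z = x).  Note that 1
    satisfies this too; it plays the special role the notes mention."""
    return all(y == x or z == x for y in N for z in N if y * z == x)

print([x for x in range(1, 10) if is_even(x)])        # [2, 4, 6, 8]
print([x for x in range(1, 10) if is_prime_like(x)])  # [1, 2, 3, 5, 7]
```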

5.3 Terms

Given a signature \(\tau\), the terms of \(\tau\) are the expressions representing objects. They are built recursively from variables and constants by applying function symbols.

The set of \(\tau\)-terms is the smallest set satisfying:
  1. Every variable \(x_i\) is a term.
  2. Every constant symbol of \(\tau\) is a term.
  3. If \(f\) is an \(n\)-ary function symbol of \(\tau\) and \(t_1,\ldots,t_n\) are terms, then \(f(t_1,\ldots,t_n)\) is a term.

In the number theory signature, \(S(S(S(0)))\) is a term representing the number \(3\); \((x_1 + (x_2 \times 0))\) is a term representing \(x_1 + x_2 \cdot 0\). Terms are the “noun phrases” of the language — they name elements, not make claims about them.
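The recursive definition of terms translates directly into an algebraic datatype. A sketch (the names `Var`, `Const`, `App` are ours, not course notation):

```python
from dataclasses import dataclass

# An illustrative datatype for tau-terms: variables, constants, and
# function symbols applied to argument tuples.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class App:
    fn: str
    args: tuple

def show(t):
    """Render a term tree back into conventional notation."""
    if isinstance(t, (Var, Const)):
        return t.name
    if t.fn in ("+", "*") and len(t.args) == 2:
        return f"({show(t.args[0])} {t.fn} {show(t.args[1])})"
    return f"{t.fn}({', '.join(show(a) for a in t.args)})"

three = App("S", (App("S", (App("S", (Const("0"),)),)),))
term = App("+", (Var("x1"), App("*", (Var("x2"), Const("0")))))
print(show(three))  # S(S(S(0)))
print(show(term))   # (x1 + (x2 * 0))
```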

5.4 Formulas

With terms in hand, we can form atomic formulas by applying predicate symbols to terms, and build compound formulas using connectives and quantifiers.

The set of \(\tau\)-formulas is the smallest set satisfying:
  1. If \(P\) is an \(n\)-ary predicate symbol and \(t_1,\ldots,t_n\) are terms, then \(P(t_1,\ldots,t_n)\) is an atomic formula.
  2. If \(t_1, t_2\) are terms, then \((t_1 = t_2)\) is an atomic formula (using equality).
  3. If \(\varphi\) and \(\psi\) are formulas, so are \(\neg\varphi\), \((\varphi \land \psi)\), \((\varphi \lor \psi)\), \((\varphi \to \psi)\), \((\varphi \leftrightarrow \psi)\).
  4. If \(\varphi\) is a formula and \(x\) is a variable, then \(\forall x\,\varphi\) and \(\exists x\,\varphi\) are formulas.

The quantifier cases (rule 4) are what distinguish FOL from propositional logic. The quantifier \(\forall x\) says “for all elements of the domain, the following holds,” and \(\exists x\) says “there exists some element for which the following holds.”

5.4.1 Terms as Polynomials

An illuminating way to think about terms: in the number theory language with constants \(a\) (for 0) and \(b\) (for 1) and binary function symbols \(f\) (for \(+\)) and \(g\) (for \(\times\)), the set of all terms is precisely (when interpreted in \(\mathbb{N}\)) the set of all polynomials over a set of variables with natural-number coefficients. The term \(g(f(b,b), x)\) represents \((1+1) \cdot x = 2x\). Terms are the “noun phrases” of the language: they denote values, not truth values. Only when a relation symbol is applied to terms do we get an atomic formula that can be true or false.

Why terms cannot be evaluated without a structure. The term \(f(b,b)\) syntactically represents \(b + b\) — but what element does this denote? In the standard model \(\mathbb{N}\) it denotes 2. In a "fruit structure" with domain \(\{\text{apple}, \text{banana}, \text{cherry}\}\) it denotes whatever the function \(f^{\mathcal{M}}\) maps \((b^{\mathcal{M}}, b^{\mathcal{M}})\) to — perhaps "banana". The term itself is just a syntactic tree; the structure gives it a value. This clean separation between syntax (terms, formulas) and semantics (structures, truth values) is the hallmark of formal logic.

5.5 Free and Bound Variables

Quantifiers create a notion of variable scope analogous to variable binding in programming languages.

An occurrence of variable \(x\) in a formula \(\varphi\) is bound if it appears inside a subformula of the form \(\forall x\,\psi\) or \(\exists x\,\psi\); otherwise it is free. A formula with no free variables is a sentence (or closed formula).

The formula \(\forall x_1\, E(x_1, x_2)\) has \(x_1\) bound and \(x_2\) free — it is not a sentence and cannot be evaluated in a structure without first specifying what \(x_2\) refers to. The sentence \(\forall x_1\, \forall x_2\, E(x_1, x_2)\) makes the definite claim “every pair of vertices is connected by an edge.”

A variable can appear both free and bound in the same formula. In \((x_1 = x_1) \land \exists x_1\, \neg(x_1 = 0)\), the first two occurrences of \(x_1\) are free and the third is bound. This situation is avoided in practice by renaming bound variables to keep them distinct from free ones.
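The free/bound distinction is easy to compute by recursion on the formula. A sketch using a hypothetical nested-tuple encoding, where atoms simply record the variables they contain:

```python
def free_vars(phi):
    """Free variables of a formula encoded as nested tuples:
    ('atom', {vars}), ('not', f), ('and'|'or'|'implies', f, g),
    ('forall'|'exists', x, f)."""
    tag = phi[0]
    if tag == "atom":
        return set(phi[1])
    if tag == "not":
        return free_vars(phi[1])
    if tag in ("and", "or", "implies"):
        return free_vars(phi[1]) | free_vars(phi[2])
    if tag in ("forall", "exists"):
        # a quantifier binds its variable throughout the body
        return free_vars(phi[2]) - {phi[1]}
    raise ValueError(tag)

# forall x1 . E(x1, x2): x1 bound, x2 free, so not a sentence.
phi = ("forall", "x1", ("atom", {"x1", "x2"}))
print(free_vars(phi))  # {'x2'}

# (x1 = x1) & exists x1 . ~(x1 = 0): x1 occurs both free and bound.
mixed = ("and", ("atom", {"x1"}), ("exists", "x1", ("not", ("atom", {"x1"}))))
print(free_vars(mixed))  # {'x1'}
```

A formula is a sentence precisely when `free_vars` returns the empty set.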

Formalizing mathematical statements. The following illustrate how standard mathematical claims become FOL sentences over appropriate signatures.

Over \(\tau_{\mathrm{arith}} = \{0, S, +, \times\}\):

  • There are infinitely many primes: \(\forall x\, \exists y\, (x < y \land \mathrm{Prime}(y))\) where \(\mathrm{Prime}(y) \equiv S(0) < y \land \forall z_1\, \forall z_2\, ((z_1 \times z_2 = y) \to (z_1 = S(0) \lor z_2 = S(0)))\) (the conjunct \(S(0) < y\) excludes 0 and 1, which would otherwise satisfy the factorization clause) and \(x < y \equiv \exists z\, (x + S(z) = y)\).
  • Every positive integer is a sum of four squares (Lagrange's theorem): \(\forall x\, \exists a\, \exists b\, \exists c\, \exists d\, (x = a \times a + b \times b + c \times c + d \times d)\).

Over \(\tau_G = \{e, \cdot\}\) (groups):

  • The group is abelian: \(\forall x\, \forall y\, (x \cdot y) = (y \cdot x)\).
  • Every element has order dividing 2: \(\forall x\, (x \cdot x) = e\).

Over \(\tau_{\in} = \{\in\}\) (set theory):

  • Extensionality: \(\forall x\, \forall y\, ((\forall z\, (z \in x \leftrightarrow z \in y)) \to x = y)\).
  • Empty set: \(\exists x\, \forall y\, \neg(y \in x)\).

The Unique Parsing Theorem extends to FOL: every formula has a unique parse tree, established by an argument analogous to the propositional case. The proof is more involved because of terms and quantifiers, but the essential idea — that the balanced-parenthesis argument localizes the main connective — carries over.

5.6 Substitution

A central operation in FOL is substituting a term for a variable. This underlies both the semantics of quantifiers and the \(\forall\)E and \(\exists\)I proof rules.

For a term \(t\) and variable \(x\), the substitution \(\varphi[x/t]\) denotes the formula obtained by replacing every free occurrence of \(x\) in \(\varphi\) with \(t\). The substitution is free for \(x\) in \(\varphi\) if no free variable of \(t\) becomes bound by a quantifier in \(\varphi\) after substitution.

The freeness condition is critical. Consider \(\forall y\, (x = y)\). Substituting \(y\) for \(x\) gives \(\forall y\, (y = y)\), but the \(y\) from the substituted term is now inside the scope of \(\forall y\) — the intended free variable \(y\) has been captured. The formula \(\forall y\, (y = y)\) asserts something quite different from what we intended. To avoid capture, one renames bound variables before substituting: replace \(\forall y\, (x = y)\) with \(\forall z\, (x = z)\), then substitute to get \(\forall z\, (y = z)\).

Safe substitution. In \(\forall y\, P(x, y)\), substituting term \(f(z)\) for \(x\) gives \(\forall y\, P(f(z), y)\). The substituted term \(f(z)\) contains the free variable \(z\), and \(z\) does not appear as a bound variable in the formula, so the substitution is safe.

Unsafe substitution (capture). In \(\exists y\, (x < y)\), substituting \(y+1\) for \(x\) would give \(\exists y\, (y+1 < y)\), which is false in \(\mathbb{N}\), whereas the original formula is true in \(\mathbb{N}\) no matter what value \(x\) takes, including \(y+1\). To avoid capture: rename the bound variable first to get \(\exists z\, (x < z)\), then substitute to get \(\exists z\, (y+1 < z)\), which correctly expresses that there exists something larger than \(y+1\).

In the natural deduction proof system, the rules \(\forall\)E and \(\exists\)I require that substitutions are free (no capture). The eigenvariable conditions on \(\forall\)I and \(\exists\)E ensure that the fresh variables introduced do not appear anywhere in the current context, automatically preventing capture.

Intuition for "free for substitution." The phrase "term \(t\) is free for variable \(x\) in \(\varphi\)" means: when you physically replace every free occurrence of \(x\) in \(\varphi\) with \(t\), no variable that was free in \(t\) gets accidentally captured by a quantifier in \(\varphi\). Think of it as placing a sticky note saying "\(x = t\)" everywhere \(x\) appears. If the sticky note contains a variable \(y\), and the location where you place it is inside a \(\forall y\) or \(\exists y\), then the \(y\) in the note changes its meaning — it was supposed to refer to an outer \(y\) but now refers to the bound \(y\). This accidental capture changes the formula's meaning. The solution is always to rename bound variables in \(\varphi\) before substituting: rename \(\forall y\) to \(\forall z\) (where \(z\) is fresh), and then substitute safely.

A concrete test: \(t\) is free for \(x\) in \(\varphi\) if and only if every free variable of \(t\) is free in every subformula of \(\varphi\) in which \(x\) occurs free. Equivalently, no free variable of \(t\) is in the scope of a quantifier binding that variable at any occurrence of \(x\) in \(\varphi\).
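This concrete test can be implemented by tracking, during a traversal, which variables are quantifier-bound at each free occurrence of \(x\). A sketch using a hypothetical nested-tuple encoding of formulas, with atoms recording their variables:

```python
def free_for(t_vars, x, phi, bound=frozenset()):
    """Is a term with free variables `t_vars` free for `x` in `phi`?
    Formulas are nested tuples: ('atom', {vars}), ('not', f),
    ('and', f, g), ('forall'|'exists', v, f)."""
    tag = phi[0]
    if tag == "atom":
        # a free occurrence of x here is capture-safe iff no variable of
        # the substituted term is bound by an enclosing quantifier
        return (x not in phi[1]) or not (t_vars & bound)
    if tag == "not":
        return free_for(t_vars, x, phi[1], bound)
    if tag == "and":
        return (free_for(t_vars, x, phi[1], bound)
                and free_for(t_vars, x, phi[2], bound))
    if tag in ("forall", "exists"):
        v, body = phi[1], phi[2]
        if v == x:  # x is bound below: no free occurrences of x inside
            return True
        return free_for(t_vars, x, body, bound | {v})
    raise ValueError(tag)

# exists y (x < y), encoded with an atom recording its variables:
phi = ("exists", "y", ("atom", {"x", "y"}))
print(free_for({"y"}, "x", phi))  # False: the y of y+1 would be captured
print(free_for({"z"}, "x", phi))  # True: a term over z is safe here
```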


Chapter 6: Structures and Models

6.1 Interpreting a Signature

A formula in first-order logic is a syntactic object. To evaluate it — to ask whether it is true — we need a structure that gives meaning to each symbol in the signature. A structure specifies a domain (the universe of objects) and an interpretation for each constant, function, and predicate.

A structure (also called an interpretation) \(\mathcal{M}\) for signature \(\tau\) consists of:
  • A nonempty set \(D\), the domain of \(\mathcal{M}\);
  • For each constant \(c \in \tau\): an element \(c^{\mathcal{M}} \in D\);
  • For each \(n\)-ary function symbol \(f \in \tau\): a function \(f^{\mathcal{M}}: D^n \to D\);
  • For each \(n\)-ary predicate symbol \(P \in \tau\): a relation \(P^{\mathcal{M}} \subseteq D^n\).

The domain can be any nonempty set — finite or infinite, concrete or abstract. Distinct constant symbols need not be interpreted as distinct elements. Multiple function symbols can receive identical interpretations. The freedom to choose the domain and interpretation arbitrarily is precisely what makes model-theoretic arguments so powerful: to show a sentence is not a logical consequence of some axioms, it suffices to find one structure where the axioms hold but the sentence fails.

Structures for number theory. The signature \(\tau_{\mathrm{arith}} = \{0, S, +, \times\}\) has several natural interpretations:
  • The standard model \(\mathbb{N}\): domain \(\{0,1,2,\ldots\}\), \(0^{\mathbb{N}} = 0\), \(S^{\mathbb{N}}(n) = n+1\), standard addition and multiplication.
  • The integers \(\mathbb{Z}\): domain all integers, with \(0^{\mathbb{Z}} = 0\), \(S^{\mathbb{Z}}(n) = n+1\), standard addition and multiplication. This is a different structure over the same signature.
  • A fruit structure \(\mathcal{M}_f\): domain \(\{\text{apple}, \text{banana}, \text{cherry}\}\), with \(0^{\mathcal{M}_f} = \text{apple}\), and \(+\) and \(\times\) defined by arbitrary lookup tables. Absurd as an arithmetic model, but perfectly valid as a first-order structure. It demonstrates that the symbols in a signature carry no intrinsic meaning — only the structure's interpretation gives them meaning.

6.2 Evaluating Sentences

Given a structure \(\mathcal{M}\), the valuation \(\mathrm{val}_{\mathcal{M}}\) evaluates closed terms to elements of \(D\) and sentences to truth values.

The term evaluation is:
  • \(\mathrm{val}_{\mathcal{M}}(c) = c^{\mathcal{M}}\) for constants;
  • \(\mathrm{val}_{\mathcal{M}}(x_i)\) = the element of \(D\) assigned to the variable \(x_i\) (the structure itself does not interpret variables; an assignment of values to the free variables must be supplied);
  • \(\mathrm{val}_{\mathcal{M}}(f(t_1,\ldots,t_n)) = f^{\mathcal{M}}(\mathrm{val}_{\mathcal{M}}(t_1),\ldots,\mathrm{val}_{\mathcal{M}}(t_n))\).

The sentence evaluation \(\mathrm{val}_{\mathcal{M}}(\varphi) \in \{\mathbf{T}, \mathbf{F}\}\) for a sentence \(\varphi\) is defined recursively:

  • \(\mathrm{val}_{\mathcal{M}}(P(t_1,\ldots,t_n)) = \mathbf{T}\) iff \((\mathrm{val}_{\mathcal{M}}(t_1),\ldots,\mathrm{val}_{\mathcal{M}}(t_n)) \in P^{\mathcal{M}}\);
  • \(\mathrm{val}_{\mathcal{M}}(t_1 = t_2) = \mathbf{T}\) iff \(\mathrm{val}_{\mathcal{M}}(t_1) = \mathrm{val}_{\mathcal{M}}(t_2)\) as elements of \(D\);
  • Connectives (\(\neg, \land, \lor, \to, \leftrightarrow\)) work exactly as in propositional logic;
  • \(\mathrm{val}_{\mathcal{M}}(\exists x\,\varphi) = \mathbf{T}\) iff there exists \(d \in D\) such that \(\mathrm{val}_{\mathcal{M}}(\varphi[x/d]) = \mathbf{T}\);
  • \(\mathrm{val}_{\mathcal{M}}(\forall x\,\varphi) = \mathbf{T}\) iff for every \(d \in D\), \(\mathrm{val}_{\mathcal{M}}(\varphi[x/d]) = \mathbf{T}\).

Here \(\varphi[x/d]\) denotes the formula obtained by substituting the domain element \(d\) for every free occurrence of \(x\). The quantifier cases say: \(\exists x\,\varphi\) is true if at least one “witness” in the domain makes \(\varphi\) true when substituted, and \(\forall x\,\varphi\) is true if every element satisfies \(\varphi\).
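Over a finite domain, the recursive truth definition is directly executable. A sketch with an illustrative encoding of terms and formulas (the encoding and the mod-5 example structure are ours, not from the notes):

```python
# Terms: ('var', x), ('const', c), ('app', f, args).
# Formulas: ('eq', t1, t2), ('pred', P, args), ('not', f), ('and', f, g),
# ('forall', x, f), ('exists', x, f).

def val_term(t, M, env):
    tag = t[0]
    if tag == "var":
        return env[t[1]]
    if tag == "const":
        return M["consts"][t[1]]
    return M["funcs"][t[1]](*[val_term(a, M, env) for a in t[2]])

def val(phi, M, env=None):
    env = env or {}
    tag = phi[0]
    if tag == "eq":
        return val_term(phi[1], M, env) == val_term(phi[2], M, env)
    if tag == "pred":
        return tuple(val_term(a, M, env) for a in phi[2]) in M["preds"][phi[1]]
    if tag == "not":
        return not val(phi[1], M, env)
    if tag == "and":
        return val(phi[1], M, env) and val(phi[2], M, env)
    if tag == "forall":   # try every domain element as the value of x
        return all(val(phi[2], M, {**env, phi[1]: d}) for d in M["domain"])
    if tag == "exists":   # search for a witness in the domain
        return any(val(phi[2], M, {**env, phi[1]: d}) for d in M["domain"])
    raise ValueError(tag)

# The integers mod 5 as a structure for {0, S, +}.
M = {"domain": range(5),
     "consts": {"0": 0},
     "funcs": {"S": lambda n: (n + 1) % 5, "+": lambda a, b: (a + b) % 5},
     "preds": {}}

# forall x exists y . x + y = 0: every element has an additive inverse.
phi = ("forall", "x", ("exists", "y",
       ("eq", ("app", "+", (("var", "x"), ("var", "y"))), ("const", "0"))))
print(val(phi, M))  # True
```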

Only sentences can be evaluated in a structure. A formula with free variables, like \(E(x_1, x_2)\), cannot be assigned a truth value in a structure without specifying what \(x_1\) and \(x_2\) denote. This is why the definition above is stated for sentences (closed formulas). The treatment of formulas with free variables in formal proofs requires additional care — addressed in Chapter 8.

6.3 Satisfaction, Validity, and Models

Let \(\mathcal{M}\) be a structure for \(\tau\) and \(\varphi\) a sentence of \(\tau\).
  • \(\mathcal{M}\) satisfies \(\varphi\), written \(\mathcal{M} \models \varphi\), if \(\mathrm{val}_{\mathcal{M}}(\varphi) = \mathbf{T}\).
  • \(\varphi\) is satisfiable if some structure satisfies it.
  • \(\varphi\) is (logically) valid, written \(\models \varphi\), if every structure satisfies it.
  • For a set of sentences \(\Gamma\): a structure satisfying every sentence in \(\Gamma\) is a model of \(\Gamma\). The theory of \(\Gamma\) consists of all sentences that hold in every model of \(\Gamma\).
  • \(\Gamma \models \varphi\) (logical consequence) means every model of \(\Gamma\) satisfies \(\varphi\).

The sentence \(\exists x_1\, \exists x_2\, \neg(x_1 = x_2)\) (“there exist two distinct elements”) is satisfiable (holds in \(\mathbb{N}\)) but not valid (fails in any one-element structure).

The commutativity of addition \(\forall x_1\, \forall x_2\, (x_1 + x_2) = (x_2 + x_1)\) holds in \(\mathbb{N}\), \(\mathbb{Z}\), \(\mathbb{R}\), but fails in the fruit structure \(\mathcal{M}_f\) if apples and bananas are not interchangeable. It is therefore satisfiable but not logically valid. By contrast, \(\forall x_1\, \neg P(x_1) \to \neg\exists x_1\, P(x_1)\) is valid — it holds in every structure regardless of what \(P\) means.

Three structures for one binary relation symbol. Take the signature \(\{R\}\) with a single binary relation symbol. Three natural structures illuminate how truth depends entirely on the structure:
  • \(\mathcal{M}_1\): universe \(\mathbb{N}\), \(R^{\mathcal{M}_1} = \{(n,m) : n < m\}\) (strict ordering).
  • \(\mathcal{M}_2\): universe \(\mathbb{N}\), \(R^{\mathcal{M}_2} = \{(n,m) : n \mid m\}\) (divisibility).
  • \(\mathcal{M}_3\): universe \(\mathcal{P}(\mathbb{N})\), \(R^{\mathcal{M}_3} = \{(A,B) : A \subseteq B\}\) (subset inclusion).
The sentence \(\exists x\, R(x,x)\) is false in \(\mathcal{M}_1\), true in \(\mathcal{M}_2\) (every number divides itself), and true in \(\mathcal{M}_3\) (every set is a subset of itself). The sentence \(\exists x\, \forall y\, R(x,y)\) ("there is an element relating to everything") is false in \(\mathcal{M}_1\), true in \(\mathcal{M}_2\) (1 divides every natural number), and true in \(\mathcal{M}_3\) (\(\emptyset\) is a subset of every set). To show a sentence is not logically valid, exhibit a single counterexample structure. To show it is logically valid, argue it holds in every possible structure.
Graphs as structures; the rock-star and egg-checking logical truths. Any directed graph \(G = (V,E)\) is a structure for \(\{R\}\) with universe \(V\) and \(R^{\mathcal{M}} = E\). Consider the pair of sentences: \[ \varphi_1 = \exists y\,\forall x\, R(x,y), \qquad \varphi_2 = \forall x\,\exists y\, R(x,y). \] In the rock-star reading, \(\varphi_1\) says "there is a rock star whom everyone loves" and \(\varphi_2\) says "everyone loves someone." The sentence \(\varphi_1 \to \varphi_2\) is logically valid: if there is a universal beloved, then everyone loves at least that person. The converse \(\varphi_2 \to \varphi_1\) is not valid. Counterexample: the directed 3-cycle \(\{A,B,C\}\) with edges \(A \to B\), \(B \to C\), \(C \to A\). Every vertex loves someone (\(\varphi_2\) holds), but no single vertex is loved by all three (\(\varphi_1\) fails). Ben-David uses the graph picture as a standard tool: when \(R\) is binary, draw a directed graph and read off the truth values visually.
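Both claims can be checked mechanically: the 3-cycle refutes \(\varphi_2 \to \varphi_1\), and exhausting all directed graphs on three vertices confirms \(\varphi_1 \to \varphi_2\) on that finite case (it is valid in general by the argument above). A sketch:

```python
from itertools import product

def phi1(V, E):
    """exists y forall x . R(x, y): some vertex is loved by everyone."""
    return any(all((x, y) in E for x in V) for y in V)

def phi2(V, E):
    """forall x exists y . R(x, y): everyone loves someone."""
    return all(any((x, y) in E for y in V) for x in V)

# The directed 3-cycle: phi2 holds, phi1 fails, so phi2 -> phi1 is invalid.
V = {"A", "B", "C"}
E = {("A", "B"), ("B", "C"), ("C", "A")}
print(phi2(V, E), phi1(V, E))  # True False

# phi1 -> phi2 survives all 2^9 directed graphs on three vertices.
verts = (0, 1, 2)
pairs = list(product(verts, repeat=2))
for k in range(2 ** len(pairs)):
    edges = {p for i, p in enumerate(pairs) if k >> i & 1}
    assert phi2(verts, edges) or not phi1(verts, edges)
print("phi1 -> phi2 holds in every 3-vertex digraph")
```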

A second surprising example: the sentence \(\exists x\,(P(x) \to \forall y\,P(y))\) is a logical truth. Proof by cases: if \(P\) holds of every element, pick any \(x\) — the consequent is true, so the implication holds. If some element \(x_0\) fails \(P\), pick that \(x_0\) — the antecedent \(P(x_0)\) is false, so the implication is vacuously true. In both cases the existential is satisfied. The egg-checking reading: there is always one egg such that if that egg is intact, all eggs are intact. If all eggs are whole, any egg certifies the claim. If some egg is broken, pick the broken egg — its conditional is vacuously true.

6.4 Complete Theories and Categoricity

A recurring theme in model theory is identifying which theories are “well-behaved” in the sense of having a unique model (up to isomorphism) or deciding every sentence.

A theory \(T\) (a set of sentences closed under logical consequence) is:
  • Complete if for every sentence \(\varphi\), either \(\varphi \in T\) or \(\neg\varphi \in T\);
  • \(\kappa\)-categorical if all models of \(T\) with domain of cardinality \(\kappa\) are isomorphic to each other.

Categoricity is a strong form of model-theoretic control. If a theory is \(\kappa\)-categorical for some infinite \(\kappa\) and has no finite models, then it is complete (Łoś–Vaught theorem): any two models of the same infinite cardinality are isomorphic, hence satisfy the same sentences, so the theory is complete.

Dense linear orders without endpoints. The theory \(\mathrm{DLO}\) (axioms: irreflexivity, transitivity, totality, density \(\forall x\, \forall y\, (x < y \to \exists z\, (x < z \land z < y))\), no minimum \(\forall x\, \exists y\, y < x\), no maximum \(\forall x\, \exists y\, x < y\)) is \(\aleph_0\)-categorical: all countable dense linear orders without endpoints are isomorphic (by a back-and-forth argument). By Łoś–Vaught, \(\mathrm{DLO}\) is complete. The rationals \((\mathbb{Q}, <)\) are the unique countable model.
Definability of classes of structures by size. With the empty signature (no constants, functions, or relations — only the logical symbols \(\forall, \exists, =\)), one can already define interesting classes of structures. The key observation: every formula in any signature is finite, so every definition uses only finitely many sentences.
  • The class of all structures with at least \(n\) elements is defined by the single sentence: \[ \varphi_{\geq n} = \exists x_1\,\cdots\,\exists x_n\,\bigwedge_{1 \leq i < j \leq n} \neg(x_i = x_j). \] This asserts the existence of \(n\) pairwise distinct elements.
  • The class of all structures with at most \(n\) elements is defined by: \[ \varphi_{\leq n} = \forall x_1\,\cdots\,\forall x_{n+1}\,\bigvee_{1 \leq i < j \leq n+1}(x_i = x_j). \] This says: whenever you pick \(n+1\) elements, at least two are equal.
  • The class of all structures with exactly \(n\) elements is defined by \(\{\varphi_{\geq n}, \varphi_{\leq n}\}\).
  • The class of all infinite structures is defined by the infinite set of sentences \(\{\varphi_{\geq 1}, \varphi_{\geq 2}, \varphi_{\geq 3}, \ldots\}\). Each individual sentence \(\varphi_{\geq n}\) is finite; only the set is infinite. A structure satisfies all of them if and only if its universe is infinite.
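Since the quantifier blocks in \(\varphi_{\geq n}\) and \(\varphi_{\leq n}\) range over tuples of domain elements, both sentences can be evaluated on a finite domain by brute force. A sketch:

```python
from itertools import product

def at_least(n, domain):
    """phi_{>=n}: there exist n pairwise distinct elements."""
    return any(len(set(xs)) == n for xs in product(domain, repeat=n))

def at_most(n, domain):
    """phi_{<=n}: among any n+1 elements, at least two coincide."""
    return all(len(set(xs)) <= n for xs in product(domain, repeat=n + 1))

D = {"a", "b", "c"}
print(at_least(3, D), at_least(4, D))  # True False
print(at_most(3, D), at_most(2, D))    # True False
```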

Note that the class of all finite structures is not definable by any set of FOL sentences. This follows from the Compactness Theorem: if a set \(\Sigma\) of sentences held exactly in the finite structures, then \(\Sigma \cup \{\varphi_{\geq 1}, \varphi_{\geq 2}, \ldots\}\) would be finitely satisfiable (take a large enough finite model of \(\Sigma\)) and hence satisfiable, yielding an infinite model of \(\Sigma\), a contradiction.

Definability of orderings and equivalence relations. With the single binary relation symbol \(R\), many natural classes of structures are FOL-definable.
  • Total (linear) orders: require (1) totality \(\forall x\,\forall y\,(R(x,y) \lor R(y,x))\), (2) transitivity \(\forall x\,\forall y\,\forall z\,(R(x,y) \land R(y,z) \to R(x,z))\), and (3) antisymmetry \(\forall x\,\forall y\,(R(x,y) \land R(y,x) \to x = y)\). These three sentences, taken together, define the class of all total orders.
  • Orders with a minimum element (e.g., \(\mathbb{N}\) but not \(\mathbb{Z}\)): add \(\exists x\,\forall y\, R(x,y)\).
  • Orders with no maximum element: add \(\forall x\,\exists y\,(R(x,y) \land \neg(x = y))\).
  • Dense orders (e.g., \(\mathbb{Q}\) but not \(\mathbb{Z}\)): add the density axiom \(\forall x\,\forall y\,(R(x,y) \land \neg(x=y) \to \exists z\,(R(x,z) \land R(z,y) \land \neg(x=z) \land \neg(z=y)))\).
  • Equivalence relations: require reflexivity \(\forall x\, R(x,x)\), symmetry \(\forall x\,\forall y\,(R(x,y) \to R(y,x))\), and transitivity. These three sentences define the class of all equivalence relations.
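On a finite domain, each of these axioms is directly checkable. A sketch testing \(\leq\) and congruence mod 2 on \(\{0,1,2\}\):

```python
def is_total_order(D, R):
    """Totality, transitivity, antisymmetry over a finite domain D."""
    total = all((x, y) in R or (y, x) in R for x in D for y in D)
    trans = all((x, z) in R for x in D for y in D for z in D
                if (x, y) in R and (y, z) in R)
    antisym = all(x == y for x in D for y in D
                  if (x, y) in R and (y, x) in R)
    return total and trans and antisym

def is_equivalence(D, R):
    """Reflexivity, symmetry, transitivity over a finite domain D."""
    refl = all((x, x) in R for x in D)
    sym = all((y, x) in R for x in D for y in D if (x, y) in R)
    trans = all((x, z) in R for x in D for y in D for z in D
                if (x, y) in R and (y, z) in R)
    return refl and sym and trans

D = {0, 1, 2}
leq = {(x, y) for x in D for y in D if x <= y}
mod2 = {(x, y) for x in D for y in D if x % 2 == y % 2}
print(is_total_order(D, leq), is_total_order(D, mod2))  # True False
print(is_equivalence(D, mod2), is_equivalence(D, leq))  # True False
```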
One cannot distinguish \((\mathbb{Q}, <)\) from \((\mathbb{R}, <)\) using FOL sentences over \(\{R\}\). Both are dense linear orders without endpoints. The property that distinguishes them — completeness in the sense of Dedekind cuts (every nonempty bounded-above set has a least upper bound) — requires quantification over sets, which is second-order and beyond the reach of first-order logic.
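On a finite domain, each of these axiom sets can be checked by brute force, since the quantifiers range over finitely many elements. The following sketch (helper names are our own, not from the course) tests the total-order axioms with \(R\) read as \(\leq\), and the equivalence-relation axioms on congruence mod 3:

```python
from itertools import product

def is_total_order(D, R):
    """Check totality, transitivity, and antisymmetry of R (read as <=) on D."""
    total = all(R(x, y) or R(y, x) for x, y in product(D, repeat=2))
    trans = all(not (R(x, y) and R(y, z)) or R(x, z)
                for x, y, z in product(D, repeat=3))
    antisym = all(not (R(x, y) and R(y, x)) or x == y
                  for x, y in product(D, repeat=2))
    return total and trans and antisym

def is_equivalence(D, R):
    """Check reflexivity, symmetry, and transitivity of R on D."""
    refl = all(R(x, x) for x in D)
    sym = all(not R(x, y) or R(y, x) for x, y in product(D, repeat=2))
    trans = all(not (R(x, y) and R(y, z)) or R(x, z)
                for x, y, z in product(D, repeat=3))
    return refl and sym and trans

D = range(6)
print(is_total_order(D, lambda x, y: x <= y))          # True
print(is_total_order(D, lambda x, y: x < y))           # False: totality fails at x = y
print(is_equivalence(D, lambda x, y: x % 3 == y % 3))  # True
```

The strict order \(<\) fails totality precisely because the axioms here describe a reflexive order; Section 7.3.1 uses the irreflexive convention instead.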

6.5 Isomorphism and Elementary Equivalence

Two structures may look quite different yet satisfy precisely the same sentences.

Two structures \(\mathcal{M}\) and \(\mathcal{N}\) for the same signature \(\tau\) are isomorphic, written \(\mathcal{M} \cong \mathcal{N}\), if there is a bijection \(f: D^{\mathcal{M}} \to D^{\mathcal{N}}\) that preserves all function and predicate interpretations: \(f(c^{\mathcal{M}}) = c^{\mathcal{N}}\) for all constants, \(f(g^{\mathcal{M}}(a_1,\ldots,a_k)) = g^{\mathcal{N}}(f(a_1),\ldots,f(a_k))\) for all function symbols, and \((a_1,\ldots,a_k) \in P^{\mathcal{M}}\) iff \((f(a_1),\ldots,f(a_k)) \in P^{\mathcal{N}}\) for all predicate symbols.

Two structures are elementarily equivalent, written \(\mathcal{M} \equiv \mathcal{N}\), if they satisfy exactly the same sentences: \(\mathcal{M} \models \varphi\) iff \(\mathcal{N} \models \varphi\) for every sentence \(\varphi\).

Isomorphic structures are elementarily equivalent — their truth values agree on every sentence because the bijection witnesses that they are “the same structure up to renaming of elements.” The converse fails: elementarily equivalent structures need not be isomorphic. For example, the standard model \(\mathbb{N}\) and any non-standard model of Peano arithmetic (as constructed in Chapter 9 via Compactness) are elementarily equivalent (they satisfy the same FOL sentences) but not isomorphic (non-standard models contain infinite elements that \(\mathbb{N}\) does not).

This gap — between structural identity (isomorphism) and logical indistinguishability (elementary equivalence) — is one of the central themes of model theory. First-order logic cannot “see” the difference between \(\mathbb{N}\) and its non-standard extensions, because such differences are not expressible in first-order sentences.

The rational and real orders. The ordered structure \((\mathbb{Q}, <)\) and the structure \((\mathbb{R}, <)\) are both dense linear orders without endpoints. It can be shown that any two countable dense linear orders without endpoints are isomorphic (Cantor's theorem on dense linear orders). Since \(\mathbb{Q}\) is countable, \((\mathbb{Q}, <)\) is the unique countable model (up to isomorphism) of the theory of dense linear orders without endpoints, and this theory is complete.
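Cantor's proof is algorithmic: map elements one at a time, always choosing an image that respects the order relations to everything already mapped; density and the absence of endpoints guarantee such an image exists. The sketch below (helper names invented for illustration) runs only the "forth" half on a few rationals, mapping into the dyadic rationals; alternating "back" steps, not shown, yield surjectivity in the limit.

```python
from fractions import Fraction

def extend(pairs, a, pick):
    """Choose an image for a that sits in the same order position as a
    does relative to the elements already mapped."""
    lower = [b for x, b in pairs if x < a]
    upper = [b for x, b in pairs if x > a]
    lo = max(lower) if lower else None
    hi = min(upper) if upper else None
    return pick(lo, hi)

def dyadic_pick(lo, hi):
    # Density and no endpoints, witnessed concretely for the dyadics:
    if lo is None and hi is None: return Fraction(0)
    if lo is None: return hi - 1          # no endpoint below
    if hi is None: return lo + 1          # no endpoint above
    return (lo + hi) / 2                  # midpoint of dyadics is dyadic

A = [Fraction(n, d) for n, d in [(0, 1), (1, 2), (-3, 1), (2, 3), (7, 5), (1, 3)]]
pairs = []
for a in A:
    pairs.append((a, extend(pairs, a, dyadic_pick)))

f = dict(pairs)
# The partial map built so far is order-preserving:
assert all((x < y) == (f[x] < f[y]) for x in A for y in A)
```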

Chapter 7: Theories and Formal Proofs

7.1 Mathematical Theories as Axiom Systems

One of the primary uses of first-order logic is to axiomatize mathematical theories: to write down a finite (or recursive) set of sentences that completely characterize a class of mathematical structures. The models of these axioms are exactly the objects of interest.

This approach has a profound methodological advantage. Once we have formalized a theory as a set of axioms, we can use the FOL proof system to derive theorems mechanically — without relying on informal mathematical intuition — and use model-theoretic methods to show that certain statements are independent of the axioms (true in some models but false in others).

Two of the most important theories in mathematics — group theory and Peano arithmetic — serve as running examples throughout this chapter.

7.2 Group Theory

Group theory concerns algebraic structures with a binary operation, an identity element, and inverses. The first-order axioms for groups use signature \(\tau_G = \{e, \cdot\}\) where \(e\) is a constant and \(\cdot\) is a binary function symbol.

The group theory axioms \(\Gamma_G\) are the following three sentences:
  • Associativity: \(\forall x\, \forall y\, \forall z\, (x \cdot (y \cdot z)) = ((x \cdot y) \cdot z)\)
  • Identity: \(\forall x\, ((x \cdot e) = x \land (e \cdot x) = x)\)
  • Inverse: \(\forall x\, \exists y\, ((x \cdot y) = e \land (y \cdot x) = e)\)
Every model of \(\Gamma_G\) is a group. The theory of \(\Gamma_G\) consists of all FOL sentences true in every group.

Notice that commutativity (\(\forall x\, \forall y\, (x \cdot y) = (y \cdot x)\)) is not an axiom — it holds in abelian groups but not in general groups (the symmetric group \(S_3\) is a counterexample). To prove something is in the theory of \(\Gamma_G\), we must derive it from the three axioms without assuming commutativity.
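Both claims can be checked exhaustively for the six-element group \(S_3\), represented as permutations of \(\{0,1,2\}\) with composition as the operation. This is only a sanity check on one finite model, not a derivation from \(\Gamma_G\):

```python
from itertools import product, permutations

# Elements of S_3 as tuples: p[i] is the image of i under the permutation.
S3 = list(permutations(range(3)))
e = (0, 1, 2)
op = lambda p, q: tuple(p[q[i]] for i in range(3))  # composition: p after q

assoc = all(op(p, op(q, r)) == op(op(p, q), r) for p, q, r in product(S3, repeat=3))
ident = all(op(p, e) == p and op(e, p) == p for p in S3)
inverse = all(any(op(p, q) == e and op(q, p) == e for q in S3) for p in S3)
commut = all(op(p, q) == op(q, p) for p, q in product(S3, repeat=2))

print(assoc, ident, inverse)  # True True True: S_3 models all three axioms
print(commut)                 # False: commutativity is independent of Gamma_G
```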

Group theory illustrates the value of first-order axiomatization: by writing down exactly three axioms, we capture all groups simultaneously — from the trivial one-element group to infinite matrix groups. Any sentence derivable from \(\Gamma_G\) is a theorem of group theory, valid in every group. Any sentence neither derivable nor refutable from \(\Gamma_G\) is independent of group theory — true in some groups and false in others.

Idempotent elements are the identity. \(\Gamma_G \models \forall x\, ((x \cdot x) = x \to x = e)\).
Assume \(a \cdot a = a\) for some element \(a\). By the inverse axiom, there exists \(b\) such that \(a \cdot b = e\) and \(b \cdot a = e\). Multiply both sides of \(a \cdot a = a\) on the right by \(b\): \[ (a \cdot a) \cdot b = a \cdot b. \] By associativity, \(a \cdot (a \cdot b) = a \cdot b\), so \(a \cdot e = e\), giving \(a = e\) by the identity axiom.
Left cancellation. \(\Gamma_G \models \forall x\, \forall y\, \forall z\, ((x \cdot y) = (x \cdot z) \to y = z)\).
Suppose \(x \cdot y = x \cdot z\). Let \(w\) be a left inverse of \(x\) (existing by the inverse axiom: \(w \cdot x = e\)). Multiply on the left by \(w\): \[ w \cdot (x \cdot y) = w \cdot (x \cdot z). \] Associativity gives \((w \cdot x) \cdot y = (w \cdot x) \cdot z\), so \(e \cdot y = e \cdot z\), hence \(y = z\).

7.3 Peano Arithmetic

The natural numbers — the most fundamental mathematical object — can be axiomatized in first-order logic. The Peano axioms use signature \(\tau_{\mathrm{PA}} = \{0, S, +, \times\}\).

The Peano axioms \(\Gamma_{\mathrm{PA}}\) consist of six base axioms and an infinite induction schema:
  1. \(\forall x\, \neg(S(x) = 0)\) — zero is not a successor;
  2. \(\forall x\, \forall y\, (S(x) = S(y) \to x = y)\) — the successor function is injective;
  3. \(\forall x\, (x + 0) = x\) — zero is the additive identity (right);
  4. \(\forall x\, \forall y\, (x + S(y)) = S(x + y)\) — recursive definition of addition;
  5. \(\forall x\, (x \times 0) = 0\) — multiplication by zero;
  6. \(\forall x\, \forall y\, (x \times S(y)) = ((x \times y) + x)\) — recursive definition of multiplication.
The induction schema adds, for every formula \(\varphi(y)\) with free variable \(y\) (possibly with other free variables \(x_1,\ldots,x_n\)): \[ \forall x_1\,\cdots\,\forall x_n\,\bigl((\varphi(0) \land \forall y\,(\varphi(y) \to \varphi(S(y)))) \to \forall y\,\varphi(y)\bigr). \]

The induction schema is not a single axiom but an infinite family — one for each formula \(\varphi\). This is necessary: no finite list of axioms can capture the full strength of mathematical induction for all properties expressible in FOL.

Axioms (3)–(6) are recursive definitions: addition is defined by its action on 0 and successors, and similarly for multiplication. From these, a rich arithmetic can be derived. For example, from the axioms alone one can prove \(\forall x\, (0 + x) = x\) — though this requires induction since axiom (3) only gives the right-identity, and the left-identity is not immediate.
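The recursion equations in axioms (3)–(6) can be transcribed directly into code, with numerals encoded as nested applications of \(S\) to \(0\). This is an illustrative sketch (the encoding is our own); note that `add` and `mul` recurse on their second argument, exactly as the axioms do:

```python
# Numerals as nested S-applications: 3 is encoded ('S', ('S', ('S', '0'))).
ZERO = '0'
def S(n): return ('S', n)

def add(x, y):
    # Axioms (3) and (4): x + 0 = x,  x + S(y) = S(x + y)
    if y == ZERO:
        return x
    return S(add(x, y[1]))

def mul(x, y):
    # Axioms (5) and (6): x * 0 = 0,  x * S(y) = (x * y) + x
    if y == ZERO:
        return ZERO
    return add(mul(x, y[1]), x)

def numeral(k):
    """The k-th numeral S^k(0)."""
    n = ZERO
    for _ in range(k):
        n = S(n)
    return n

# 0 + x = x holds for every numeral we try -- the fact proved by induction below:
assert all(add(ZERO, numeral(k)) == numeral(k) for k in range(10))
assert mul(numeral(3), numeral(4)) == numeral(12)
```

Checking finitely many instances is of course no substitute for the induction schema; it is the schema that turns these observations into the theorem \(\forall x\,(0 + x) = x\).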

Deriving \(\Gamma_{\mathrm{PA}} \models \forall x\, (0 + x) = x\). This is the first of eight natural exercises in Peano arithmetic. Apply the induction schema with \(\varphi(y) = (0 + y = y)\):
  • Base case \(\varphi(0)\): \(0 + 0 = 0\) follows from axiom (3) with \(x = 0\).
  • Inductive step: Assume \(0 + n = n\). Then \(0 + S(n) = S(0 + n)\) by axiom (4), which equals \(S(n)\) by the inductive hypothesis.
By the induction schema, \(\forall x\, (0 + x) = x\).

The remaining seven exercises in the sequence (commutativity of addition, left-zero for multiplication, commutativity of multiplication, distributivity, and cancellation) each use induction and earlier results as lemmas. They form a complete derivation of the basic ring axioms for \(\mathbb{N}\) from the Peano axioms alone.

7.3.1 Other Important Theories

The group theory and Peano arithmetic examples are particular instances of a general method. Here we briefly indicate some other important theories expressible in FOL.

Linear orders. A linear order (or total order) is a structure \((D, <)\) where \(<\) is a binary predicate satisfying irreflexivity (\(\forall x\, \neg(x < x)\)), transitivity (\(\forall x\, \forall y\, \forall z\, (x < y \land y < z \to x < z)\)), and totality (\(\forall x\, \forall y\, (x < y \lor x = y \lor y < x)\)). Examples include \((\mathbb{N}, <)\), \((\mathbb{Z}, <)\), \((\mathbb{Q}, <)\), \((\mathbb{R}, <)\). Adding the axiom “between any two elements there is another” gives dense linear orders; adding “no minimum or maximum” gives dense linear orders without endpoints (satisfied by \(\mathbb{Q}\) and \(\mathbb{R}\)).

Equivalence relations. A binary predicate \(\sim\) is an equivalence relation if it satisfies reflexivity, symmetry, and transitivity. Any two countable equivalence relations with the same number of classes of each finite size and the same number of infinite classes are isomorphic — a result provable model-theoretically.

Fields and rings. Field theory uses signature \(\{0, 1, +, -, \cdot, {}^{-1}\}\) with axioms for addition, multiplication, and inverses. The theory of algebraically closed fields of characteristic 0 (axiomatized by the field axioms plus axioms asserting that every non-constant polynomial has a root) is complete — a theorem of Tarski, proved via quantifier elimination.
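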

These examples illustrate the reach of first-order axiomatization. The model theory of these theories — which structures they have, when two structures are elementarily equivalent, which theories are complete or decidable — is a rich and active area at the interface of algebra, geometry, and logic.

7.4 The Natural Deduction Rules for FOL

The propositional natural deduction rules extend to FOL with six new rules — two each for universal and existential quantifiers, and two for identity — plus the carry-over of all fourteen propositional rules.

The FOL natural deduction rules for quantifiers and identity are:
  • Universal Elimination (\(\forall\)E \(m\)): From \(\forall x\,\varphi\) on line \(m\), derive \(\varphi[x/t]\) for any term \(t\) that does not introduce a bound variable capture when substituted.
  • Universal Introduction (\(\forall\)I \(m\)): From \(\varphi\) containing a free variable \(v\) on line \(m\), derive \(\forall x\,\varphi[v/x]\), provided \(v\) does not appear in any premise or undischarged assumption. The variable \(v\) is called the eigenvariable of this inference.
  • Existential Introduction (\(\exists\)I \(m\)): From \(\varphi\) on line \(m\) containing term \(t\), derive \(\exists x\,\varphi[t/x]\) for a fresh variable \(x\) not in \(\varphi\).
  • Existential Elimination (\(\exists\)E \(i, j\text{-}k\)): From \(\exists x\,\varphi\) on line \(i\) and a subproof with assumption \(\varphi[x/v]\) (fresh eigenvariable \(v\)) deriving \(\psi\) (lines \(j\)–\(k\)), derive \(\psi\), provided \(v\) does not appear in \(\varphi\) or \(\psi\).
  • Identity Introduction (\(=\)I): At any line, derive \(t = t\) for any term \(t\).
  • Identity Elimination (\(=\)E \(k, l\)): From \(t = u\) on line \(k\) and formula \(\varphi\) on line \(l\), derive \(\varphi[t/u]\) (replacing some occurrences of \(t\) with \(u\) or vice versa).
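The side condition on \(\forall\)E — that substitution must not capture variables — is worth seeing concretely. The sketch below (formula encoding and names are our own; "terms" are simplified to bare names) implements capture-avoiding substitution \(\varphi[x/t]\) over a tiny formula AST:

```python
_fresh = ('v%d' % i for i in range(10**6))  # supply of fresh variable names

def subst(f, x, t):
    """Compute f[x/t], renaming bound variables so that the substituted
    name t is never captured by a quantifier inside f."""
    tag = f[0]
    if tag == 'P':                        # atomic: ('P', name, name, ...)
        return (tag,) + tuple(t if v == x else v for v in f[1:])
    if tag == 'not':
        return (tag, subst(f[1], x, t))
    if tag in ('and', 'or'):
        return (tag, subst(f[1], x, t), subst(f[2], x, t))
    if tag in ('forall', 'exists'):
        y, body = f[1], f[2]
        if y == x:                        # x is bound here: nothing to substitute
            return f
        if y == t:                        # naive substitution would capture t:
            z = next(_fresh)              # rename the bound variable first
            body, y = subst(body, y, z), z
        return (tag, y, subst(body, x, t))

# forall y R(x, y) with x := y must NOT collapse to forall y R(y, y):
f = ('forall', 'y', ('P', 'x', 'y'))
print(subst(f, 'x', 'y'))   # bound y is renamed, e.g. ('forall', 'v0', ('P', 'y', 'v0'))
```

Without the renaming step, substituting \(y\) for \(x\) in \(\forall y\, R(x,y)\) would yield \(\forall y\, R(y,y)\) — a different (and here unsound) statement.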

The eigenvariable conditions on \(\forall\)I and \(\exists\)E are crucial. For \(\forall\)I: if the variable \(v\) appeared in a premise, then deriving \(\forall x\,\varphi[v/x]\) would universally generalize over a specific assumed constant — invalid. The freshness condition ensures we are truly reasoning about “an arbitrary \(v\),” just as in informal mathematical proofs (“let \(n\) be an arbitrary natural number”). For \(\exists\)E: we discharge the assumption that some specific witness satisfies \(\varphi\), deriving a conclusion \(\psi\) that does not mention the witness — just as in “since there exists an \(n\) satisfying \(P(n)\), call it \(a\); … thus \(\psi\).”

Comparing \(\forall\)I and \(\exists\)E. Both rules use an eigenvariable, and both enforce a freshness condition. The dual roles are worth spelling out. In \(\forall\)I, we have derived \(\varphi\) for an arbitrary variable \(v\) (which must not appear in any open assumption — it is genuinely arbitrary, not a specific element), and we generalize to \(\forall x\,\varphi[v/x]\). In \(\exists\)E, we know \(\exists x\,\varphi\) — some witness exists — and we introduce a fresh name \(v\) for "that witness," derive a conclusion \(\psi\) that does not mention \(v\) (so \(\psi\) does not depend on the specific witness), and discharge the assumption. Both are analogues of standard mathematical language: \(\forall\)I is "let \(v\) be arbitrary" and \(\exists\)E is "let \(v\) be a witness for the existential."
De Morgan for quantifiers: \(\{\forall x\, \neg\varphi\} \vdash \neg\exists x\, \varphi\).
Line | Formula | Justification
1 | \(\forall x\, \neg\varphi\) | PR
2 | \(\quad \exists x\, \varphi\) | AS
3 | \(\quad\quad \varphi[x/a]\) | AS (eigenvariable \(a\))
4 | \(\quad\quad \neg\varphi[x/a]\) | \(\forall\)E 1
5 | \(\quad\quad \bot\) | \(\neg\)E 3, 4
6 | \(\quad \bot\) | \(\exists\)E 2, 3–5
7 | \(\neg\exists x\, \varphi\) | \(\neg\)I 2–6

The four De Morgan laws for quantifiers — \(\forall x\, \neg\varphi \equiv \neg\exists x\, \varphi\), \(\neg\forall x\, \varphi \equiv \exists x\, \neg\varphi\), and their converses — are all derivable and form the quantifier-level analogues of the propositional De Morgan laws.

7.5 More Worked FOL Proofs

The following proofs illustrate the interaction of identity, universal and existential quantifiers, and standard propositional rules.

Symmetry of identity: \(\vdash \forall x\, \forall y\, (x = y \to y = x)\).
Line | Formula | Justification
1 | \(\quad a = b\) | AS (eigenvariables \(a\), \(b\))
2 | \(\quad a = a\) | \(=\)I
3 | \(\quad b = a\) | \(=\)E 1, 2 (replacing \(a\) with \(b\) in line 2)
4 | \((a = b \to b = a)\) | \(\to\)I 1–3
5 | \(\forall y\, (a = y \to y = a)\) | \(\forall\)I 4 (variable \(b\))
6 | \(\forall x\, \forall y\, (x = y \to y = x)\) | \(\forall\)I 5 (variable \(a\))

In step 3, we apply \(=\)E to the equation \(a = b\) (line 1) and the reflexivity instance \(a = a\) (line 2), replacing the first occurrence of \(a\) with \(b\) to obtain \(b = a\).

Transitivity of identity: \(\vdash \forall x\, \forall y\, \forall z\, ((x = y \land y = z) \to x = z)\). Assume \(a = b\) and \(b = c\). From \(a = b\) and \(b = c\), apply \(=\)E to \(a = b\): replace \(b\) in \(b = c\) to get \(a = c\). Close with \(\to\)I and three applications of \(\forall\)I.
Existence from a specific instance: \(\{P(t)\} \vdash \exists x\, P(x)\) for any closed term \(t\). Apply \(\exists\)I to line 1 (premise \(P(t)\)), replacing the term \(t\) with the fresh bound variable \(x\), to obtain \(\exists x\, P(x)\). This one-line derivation (beyond the premise) is the simplest use of \(\exists\)I and formalizes the obvious fact that a specific example witnesses an existential claim.
Universal specification and instantiation: From \(\forall x\, \forall y\, E(x,y)\) derive \(E(a,b)\) for any terms \(a\), \(b\). Apply \(\forall\)E twice: first to get \(\forall y\, E(a,y)\), then again to get \(E(a,b)\).

These examples, though simple, illustrate the compositional nature of FOL proofs: quantifier rules handle the quantifier layer, propositional rules handle the boolean layer, and identity rules handle equality. The eigenvariable discipline is what keeps universal and existential quantifiers from interfering with each other.

7.6 From Group Theory Proofs to Arithmetic

The group theory propositions proved in Section 7.2 illustrate an important distinction between syntactic provability and semantic truth. When we proved that \(\Gamma_G \models \forall x\, ((x \cdot x) = x \to x = e)\), we found a semantic argument (an algebraic manipulation valid in every group). But by the Completeness Theorem (Chapter 8), semantic truth and formal provability coincide: the same result holds as \(\Gamma_G \vdash \forall x\, ((x \cdot x) = x \to x = e)\). We can, in principle, write out the explicit natural deduction derivation; we spare the reader that 40-line exercise.

Peano arithmetic is considerably richer. A central technique is arithmetic induction, which maps directly to the FOL induction schema. The schema says: to prove \(\forall y\, \varphi(y)\), it suffices to derive \(\varphi(0)\) (the base case) and \(\forall y\, (\varphi(y) \to \varphi(S(y)))\) (the inductive step). In formal proofs, this is implemented by applying the specific instance of the induction schema for the formula \(\varphi\), followed by \(\forall\)E to instantiate the hypotheses, and \(\to\)E to chain the base case and inductive step.

Addition is commutative: \(\Gamma_{\mathrm{PA}} \vdash \forall x\, \forall y\, (x + y) = (y + x)\). The proof requires two nested inductions: first show \(\forall x\, (x + 0) = (0 + x)\) (using the base case of the outer induction and the induction schema), then show the full commutativity by induction on \(y\), using the inner result as a lemma in each step. This is a standard exercise in formalized arithmetic and illustrates that even simple ring identities require non-trivial inductive arguments within the formal system.

The shift from group theory to Peano arithmetic is a shift in expressive power. The theory of all groups is not complete — commutativity, as we saw, is independent — but important algebraic theories do admit complete, computable treatments; the first-order theory of abelian groups, for instance, is decidable (a theorem of Szmielew). Peano arithmetic admits no computable, consistent, complete axiomatization — as the incompleteness theorems will establish. The extra power of the induction schema, which generates infinitely many axioms, is both what makes Peano arithmetic so expressive and what makes it impossible to axiomatize completely.


Chapter 8: The Completeness Theorem for First-Order Logic

8.1 From PL to FOL Completeness

The FOL Soundness Theorem is proved by the same structural induction argument as for propositional logic, extended with cases for the six new quantifier and identity rules. The key technical complication is that formulas with free variables cannot be directly evaluated in a structure — only sentences can. The fix is to work with variable assignments: a variable assignment \(\alpha\) maps each variable to a domain element, and we extend the evaluation \(\mathrm{val}_{\mathcal{M}, \alpha}(\varphi)\) to cover formulas with free variables by using \(\alpha\) to interpret them.

Soundness Theorem for FOL. For every set of sentences \(\Gamma\) and every sentence \(\varphi\) in a first-order language: if \(\Gamma \vdash \varphi\) then \(\Gamma \models \varphi\).
By induction on formal proofs, extended to cover the six FOL-specific rules. The propositional cases carry over directly. For the new rules:

\(\forall\)E: If \(\mathcal{M} \models \forall x\,\varphi\), then for every domain element \(d\), \(\mathcal{M} \models \varphi[x/d]\). In particular, for any closed term \(t\), setting \(d = \mathrm{val}_{\mathcal{M}}(t)\) gives \(\mathcal{M} \models \varphi[x/t]\).

\(\forall\)I with eigenvariable \(v\): Suppose the step concludes \(\forall x\,\varphi[v/x]\) from \(\varphi\) (where \(v\) does not appear in any undischarged assumption). The induction hypothesis gives \(\Gamma \cup A_k \models \varphi\) where \(v\) is free in \(\varphi\). Since \(v\) does not appear in \(\Gamma\) or any undischarged assumption, for any structure \(\mathcal{M}\) satisfying \(\Gamma \cup A_k\) and any domain element \(d\): the structure \(\mathcal{M}'\) that agrees with \(\mathcal{M}\) except that it interprets \(v\) as \(d\) still satisfies \(\Gamma \cup A_k\) (since \(v\) does not appear there). By the induction hypothesis, \(\mathcal{M}' \models \varphi\), i.e., \(\mathcal{M} \models \varphi[v/d]\). Since \(d\) was arbitrary, \(\mathcal{M} \models \forall x\,\varphi[v/x]\).

\(\exists\)I: If \(\mathcal{M} \models \varphi[x/t]\) for a closed term \(t\), then the element \(\mathrm{val}_{\mathcal{M}}(t)\) witnesses \(\mathcal{M} \models \exists x\,\varphi\).

\(\exists\)E with eigenvariable \(v\): Analogous to the \(\forall\)I case: the eigenvariable freshness ensures that the witness \(v\) does not escape into the conclusion \(\psi\).

\(=\)I: \(t = t\) is trivially valid since every element of the domain equals itself.

\(=\)E: Leibniz’s law — substituting equals for equals preserves truth — is exactly what the semantics of equality dictates.

The Completeness Theorem for FOL is considerably deeper than its propositional counterpart and was the culminating result of Gödel’s 1929 doctoral dissertation.

Completeness Theorem for FOL (Gödel 1930). For every set of sentences \(\Gamma\) and every sentence \(\varphi\): if \(\Gamma \models \varphi\) then \(\Gamma \vdash \varphi\).

Together with Soundness, this gives \(\Gamma \vdash \varphi \Leftrightarrow \Gamma \models \varphi\) for FOL. The proof strategy is the same as for propositional logic — extend \(\Gamma\) to a maximally consistent set and read off a satisfying structure — but with two major complications that require new ideas.

8.2 Maximal Consistency and Saturation

In the propositional case, maximally consistent sets were satisfiable because we could define a truth assignment directly from the set: \(v(p_i) = \mathbf{T}\) iff \(p_i \in \Gamma^*\). In FOL, the analogous approach builds the domain out of the closed terms of the signature. But this can fail: if the signature has no constant symbols, there are no closed terms at all, and even when constants exist, there may be no closed term available to witness an existential sentence like \(\exists x\, P(x)\).

Why saturation is needed: the propositional analogy breaks down. In propositional logic, a maximally consistent set \(\Gamma^*\) over variables \(p_1, p_2, \ldots\) is automatically "complete": for every propositional variable \(p_i\), either \(p_i \in \Gamma^*\) or \(\neg p_i \in \Gamma^*\). This lets us define the truth assignment \(v(p_i) = \mathbf{T}\) iff \(p_i \in \Gamma^*\), and verify that \(v\) satisfies all of \(\Gamma^*\) by structural induction.

In FOL, maximal consistency still guarantees: for every sentence \(\varphi\), either \(\varphi \in \Gamma^*\) or \(\neg\varphi \in \Gamma^*\). But the structural induction hits a wall at existential sentences. If \(\exists x\, P(x) \in \Gamma^*\), we need some specific element \(d\) in the domain with \(P(d)\) true. In the propositional proof, no domain had to be built — a truth assignment sufficed. In FOL, the domain must contain actual objects, and the only objects named by the language are its closed terms. So we need the existential statement to be witnessed by an actual term in the signature. Without saturation, this may fail.

Concretely: suppose \(\Gamma^*\) is maximally consistent but the signature has no constants at all. Then the term domain is empty (there are no closed terms), and the term model cannot be constructed. Or suppose the signature has only the constant \(a\), and \(\Gamma^*\) contains \(\exists x\,(P(x) \land \neg P(a))\) — then the witness cannot be \(a\), but there is no other constant to name it. Saturation resolves this by guaranteeing that for every existential formula, a named constant exists in the (extended) signature to serve as a witness.

The solution introduces a new property that maximally consistent FOL sets must additionally satisfy.

A set \(\Gamma\) of sentences over a signature \(\tau\) is saturated (also called a Henkin set) if for every formula \(\varphi(x)\) with one free variable \(x\), there exists a constant \(c \in \tau\) such that the sentence \((\exists x\,\varphi \to \varphi[x/c])\) belongs to \(\Gamma\).

Saturation provides Henkin witnesses: whenever \(\Gamma\) contains the existential sentence \(\exists x\,\varphi\), it also contains \(\varphi[x/c]\) for some specific constant \(c\). This is the FOL analogue of having, for every provable existential claim, an actual named witness for it.

The key result is:

Proposition. Every maximally consistent and saturated set \(\Gamma^*\) (over a signature that includes sufficiently many constants) is satisfiable. The satisfying structure is the term model \(\mathcal{M}^*\): domain = set of closed terms, with \(c^{\mathcal{M}^*} = c\) and \(f^{\mathcal{M}^*}(t_1,\ldots,t_n) = f(t_1,\ldots,t_n)\).
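The term domain in this proposition can be generated mechanically from the signature. A sketch (our own encoding, representing an applied term \(f(t_1,\ldots,t_n)\) as a tuple):

```python
from itertools import product

def closed_terms(constants, functions, depth):
    """All closed terms of the signature up to the given nesting depth.
    constants: set of names; functions: dict mapping symbol -> arity."""
    terms = set(constants)
    for _ in range(depth):
        new = set(terms)
        for f, arity in functions.items():
            for args in product(terms, repeat=arity):
                new.add((f,) + args)
        terms = new
    return terms

# The signature of Peano arithmetic restricted to {0, S}:
ts = closed_terms({'0'}, {'S': 1}, depth=3)
print(len(ts))  # 4 closed terms: 0, S0, SS0, SSS0
```

For a signature with no constants, `closed_terms(set(), ...)` returns the empty set at every depth — exactly the failure mode that forces the Henkin constants of the next section.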

The proof is by induction on formula structure, using the maximality of \(\Gamma^*\) at each connective step and the saturation property at each existential step. Let us trace through the key cases to see how the properties interact.

Proof that \(\mathcal{M}^* \models \varphi\) iff \(\varphi \in \Gamma^*\), by structural induction on sentences \(\varphi\) (for the case without identity).

Atomic case. For \(P(t_1,\ldots,t_n)\): the term model makes this true iff the tuple \((t_1,\ldots,t_n) \in P^{\mathcal{M}^*}\), which by definition of the term model holds iff \(P(t_1,\ldots,t_n) \in \Gamma^*\).

Negation. \(\neg\varphi \in \Gamma^*\) iff \(\varphi \notin \Gamma^*\) (by Proposition 1, the maximally consistent analogue of Lemma 2 from Chapter 4), iff \(\mathcal{M}^* \not\models \varphi\) (by induction), iff \(\mathcal{M}^* \models \neg\varphi\).

Conjunction. \((\varphi \land \psi) \in \Gamma^*\) iff both \(\varphi \in \Gamma^*\) and \(\psi \in \Gamma^*\) (derivable from the \(\land\) rules and maximality), iff \(\mathcal{M}^* \models \varphi\) and \(\mathcal{M}^* \models \psi\) (by induction), iff \(\mathcal{M}^* \models (\varphi \land \psi)\).

Existential. \(\exists x\,\varphi \in \Gamma^*\) implies, by saturation, \(\varphi[x/c] \in \Gamma^*\) for some Henkin constant \(c \in \tau'\). By induction, \(\mathcal{M}^* \models \varphi[x/c]\), so the element \(c \in D^{\mathcal{M}^*}\) witnesses \(\mathcal{M}^* \models \exists x\,\varphi\). Conversely, if \(\mathcal{M}^* \models \exists x\,\varphi\), there is some closed term \(t\) with \(\mathcal{M}^* \models \varphi[x/t]\), so by induction \(\varphi[x/t] \in \Gamma^*\), and \(\exists\)I gives \(\exists x\,\varphi \in \Gamma^*\) (since \(\Gamma^*\) is closed under provable consequences by maximality).

Universal. \(\forall x\,\varphi \in \Gamma^*\) iff for every closed term \(t\), \(\varphi[x/t] \in \Gamma^*\) (by maximality and \(\forall\)E/\(\forall\)I), iff for every domain element \(t \in D^{\mathcal{M}^*}\), \(\mathcal{M}^* \models \varphi[x/t]\) (by induction), iff \(\mathcal{M}^* \models \forall x\,\varphi\).

One subtlety arises with equality: if \(\Gamma^*\) contains \(t_1 = t_2\) for two distinct terms, we must identify them in the domain. The factored term model \(\mathcal{M}^*_{\Gamma}\) quotients the term domain by the equivalence relation \(t \sim t'\) iff \((t = t') \in \Gamma^*\). One must check that \(\sim\) is indeed an equivalence relation: reflexivity holds because \(\Gamma^*\) contains \(t = t\) for every term (derivable by \(=\)I); symmetry and transitivity hold from the identity rules \(=\)E. After quotienting, equality in the model corresponds exactly to equality sentences in \(\Gamma^*\), and the induction goes through with an analogous argument using the equivalence classes \([t]\) as domain elements.

The FOL satisfiability proposition then states: every maximally consistent and saturated \(\Gamma^*\) has the factored term model \(\mathcal{M}^*_\Gamma\) as a model, and \(\mathcal{M}^*_\Gamma \models \varphi\) iff \(\varphi \in \Gamma^*\) for every sentence \(\varphi\).

A concrete term model. To make the construction tangible, consider the consistent set \(\Gamma = \{\forall x\,\exists y\, R(x,y),\ \neg R(c_0, c_0)\}\) over the signature \(\{R, c_0\}\) (a binary relation and one constant). After the Henkin extension adds constants \(c_1, c_2, \ldots\) and the Lindenbaum extension makes the set maximally consistent and saturated, we obtain \(\Gamma^*\). Among the sentences in \(\Gamma^*\):
  • \(\forall x\, \exists y\, R(x,y)\) is present (from \(\Gamma\)).
  • Since \(\exists y\, R(c_0, y) \in \Gamma^*\) (by instantiating the universal), saturation gives some Henkin constant \(c_1\) such that \(R(c_0, c_1) \in \Gamma^*\).
  • By the universal again, \(\exists y\, R(c_1, y) \in \Gamma^*\), and saturation gives \(c_2\) with \(R(c_1, c_2) \in \Gamma^*\).
  • And so on: the term model has domain \(\{c_0, c_1, c_2, \ldots\}\) and the relation \(R^{\mathcal{M}^*}\) contains at least all pairs \((c_k, c_{k+1})\).
The resulting structure is an infinite directed graph where each node has at least one outgoing edge — a model of \(\forall x\,\exists y\,R(x,y)\). The Henkin constants \(c_1, c_2, \ldots\) are the "witnesses" that the completeness proof manufactures out of thin air to satisfy the existential claims. Without them, we would have only \(c_0\) in the domain, and \(\forall x\,\exists y\,R(x,y)\) might require \(c_0\) to relate to some element not in the domain.
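The witness chain above can be mimicked mechanically. The helper below (a hypothetical illustration, not course code) records, for each instantiation of the universal sentence, the fresh Henkin constant that names the witness:

```python
def henkin_chain(start, steps):
    """Build a finite prefix of the witness chain for forall x exists y R(x, y):
    each time exists y R(c_k, y) is instantiated, a fresh Henkin constant
    c_{k+1} is introduced to name the witness."""
    domain = [start]
    facts = set()                       # atomic sentences R(c_k, c_{k+1}) in Gamma*
    for k in range(steps):
        witness = 'c%d' % (k + 1)       # fresh Henkin constant
        domain.append(witness)
        facts.add(('R', domain[k], witness))
    return domain, facts

domain, facts = henkin_chain('c0', 4)
print(domain)                           # ['c0', 'c1', 'c2', 'c3', 'c4']
# Every element except the last already has an outgoing R-edge in this prefix:
assert all(any(f[1] == d for f in facts) for d in domain[:-1])
```

The full construction never terminates at a "last" node: in the limit, every element has a witness, which is why the term model is infinite.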

8.3 The Henkin Construction

The Lindenbaum argument must be enhanced to ensure saturation as well as maximal consistency.

Henkin Extension Lemma. Every consistent set \(\Gamma\) over a signature \(\tau\) can be extended to a maximally consistent and saturated set \(\Gamma^*\) over a larger signature \(\tau' \supseteq \tau\) (obtained by adding countably many new constant symbols, called Henkin constants).
The construction requires careful ordering. Fix a countably infinite supply of fresh constant symbols \(\{c_1, c_2, \ldots\}\) — the Henkin constants — not in \(\tau\). Let \(\tau' = \tau \cup \{c_1, c_2, \ldots\}\). Enumerate all sentences of \(\tau'\) as \(\varphi_1, \varphi_2, \ldots\) (possible since \(\tau'\) is countable).

The construction proceeds in two interleaved phases, using a growing sequence of sets \(\Gamma_0 \subseteq \Gamma_1 \subseteq \cdots\). Start with \(\Gamma_0 = \Gamma\). At stage \(k\):

Phase A (Henkin witness). If \(\varphi_k = \exists x\,\psi\), pick the next unused Henkin constant \(c_k\) and add \((\exists x\,\psi \to \psi[x/c_k])\) to \(\Gamma_k\). This is the Henkin axiom for \(\varphi_k\); it ensures saturation. The set remains consistent: adding a conditional \((\alpha \to \beta)\) can only cause inconsistency if \(\alpha\) is already derivable. But if \(\exists x\,\psi\) is not yet in \(\Gamma_k\), the conditional is vacuously consistent. If \(\exists x\,\psi\) is in \(\Gamma_k\), then picking \(c_k\) as a witness must be consistent (by a brief argument: if \(\Gamma_k \cup \{(\exists x\,\psi \to \psi[x/c_k])\} \vdash \bot\), then since \(c_k\) is fresh it can be renamed to any other variable, giving \(\Gamma_k \vdash \neg\exists x\,\psi\), contradicting the consistency of \(\Gamma_k\)).

Phase B (Maximal extension). If \(\Gamma_k \cup \{\varphi_k\}\) is consistent, add \(\varphi_k\). Otherwise, keep \(\Gamma_k\) unchanged.

Set \(\Gamma^* = \bigcup_{k=0}^\infty \Gamma_k\). Maximal consistency follows by the same argument as the Lindenbaum lemma (proofs are finite, so any derivation of \(\bot\) uses only finitely many premises from some finite stage). Saturation holds because for every formula \(\exists x\,\psi\), the corresponding Henkin axiom was added in Phase A.

The Henkin construction is the culmination of a long proof chain: Lindenbaum’s lemma (propositional completeness) → Lindenbaum + Henkin (FOL completeness). What changes between the propositional and first-order cases is precisely the need to manage witnesses: in propositional logic, a maximally consistent set is automatically “saturated” because there are no existential formulas; in FOL, we must manually ensure that every existential claim has a concrete witness.
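In the propositional setting, consistency of a finite set is decidable by truth tables, so Phase B can actually be executed. The sketch below (names and encoding are our own; formulas over three atoms with \(\neg\), \(\land\), \(\lor\)) runs the Lindenbaum phase; Phase A has no counterpart here, precisely because there are no existential formulas to witness:

```python
from itertools import product

ATOMS = ['p', 'q', 'r']

def val(f, v):
    """Evaluate a formula under assignment v (dict atom -> bool)."""
    if isinstance(f, str): return v[f]
    tag = f[0]
    if tag == 'not': return not val(f[1], v)
    if tag == 'and': return val(f[1], v) and val(f[2], v)
    if tag == 'or':  return val(f[1], v) or val(f[2], v)

def consistent(gamma):
    """Brute-force satisfiability over the three atoms: the Phase B oracle."""
    return any(all(val(f, dict(zip(ATOMS, bits))) for f in gamma)
               for bits in product([True, False], repeat=len(ATOMS)))

def lindenbaum(gamma, sentences):
    """Phase B alone: add each enumerated sentence iff consistency is preserved."""
    gamma = list(gamma)
    for phi in sentences:
        if consistent(gamma + [phi]):
            gamma.append(phi)
    return gamma

enum = ['p', ('not', 'p'), 'q', ('or', 'p', 'r'), ('not', 'r')]
gstar = lindenbaum([('or', 'p', 'q')], enum)
print(gstar)  # 'p' is added; ('not', 'p') is then rejected as inconsistent
```

In FOL, `consistent` is not computable (provability is only semi-decidable), which is why the Lindenbaum and Henkin constructions are existence arguments rather than algorithms.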

Why fresh constants preserve consistency. The key step in Phase A is: if \(\Gamma_k\) is consistent and we add the Henkin axiom \(\exists x\,\psi \to \psi[x/c_k]\) for a fresh constant \(c_k\), the result is still consistent. Suppose for contradiction that \(\Gamma_k \cup \{\exists x\,\psi \to \psi[x/c_k]\} \vdash \bot\). Since the inconsistency requires the antecedent to be derivable, we get \(\Gamma_k \vdash \exists x\,\psi\) and then \(\Gamma_k \cup \{\psi[x/c_k]\} \vdash \bot\). But \(c_k\) does not appear anywhere in \(\Gamma_k\). We can replace every occurrence of \(c_k\) in the proof with a fresh variable \(v\), obtaining \(\Gamma_k \cup \{\psi[x/v]\} \vdash \bot\). Since \(\Gamma_k \vdash \exists x\,\psi\) and any witness \(v\) for \(\psi\) leads to contradiction, the \(\exists\)E rule gives \(\Gamma_k \vdash \bot\), contradicting the consistency of \(\Gamma_k\). This "fresh constant may be renamed to a fresh variable" step is the heart of every Henkin argument.

8.4 Completing the Proof

Proof of FOL Completeness Theorem. Suppose \(\Gamma \models \varphi\). Then \(\Gamma \cup \{\neg\varphi\}\) is unsatisfiable. By the contrapositive of the Henkin Extension Lemma and the Satisfiability Proposition: \(\Gamma \cup \{\neg\varphi\}\) is inconsistent, i.e., \(\Gamma \cup \{\neg\varphi\} \vdash \bot\). By RAA, \(\Gamma \vdash \varphi\).
Gödel's dissertation. Kurt Gödel defended his doctoral thesis at the University of Vienna in November 1929, at age 23. The core result — the completeness of first-order logic — established that the semantic notion of logical consequence (\(\models\)) and the syntactic notion of provability (\(\vdash\)) coincide for first-order logic. This was far from obvious: \(\models\) is defined by quantifying over all possible structures (a highly semantic, infinitary condition), while \(\vdash\) refers to finite derivations by mechanical rules. That they agree is a deep fact about the relationship between syntax and semantics.

The proof technique Gödel used — constructing a model from a syntactically consistent set of formulas — introduced what is now called the Henkin construction (though Henkin’s 1949 paper gave the cleanest version). This technique recurs throughout model theory: to show a consistent theory has a model, build one from the theory’s own terms.

It is worth emphasizing what completeness does not say: it does not mean every true sentence has a proof. “True” presupposes a fixed structure (e.g., the standard model of arithmetic), and Gödel’s First Incompleteness Theorem will show that there are true arithmetic sentences with no proof in Peano arithmetic. The completeness theorem says that every logically valid sentence — true in all structures — has a proof.


Chapter 9: Compactness and Model Theory

9.1 The Compactness Theorem

The Completeness Theorem has a beautiful consequence that would be difficult to prove by direct model-theoretic means.

Compactness Theorem. A set of sentences \(\Gamma\) is satisfiable if and only if every finite subset of \(\Gamma\) is satisfiable.
The non-trivial direction is: if every finite subset of \(\Gamma\) is satisfiable, then \(\Gamma\) is satisfiable. Suppose for contradiction that \(\Gamma\) is unsatisfiable. By the Completeness Theorem, \(\Gamma \vdash \bot\). But any formal proof uses only finitely many lines, hence only finitely many premises from \(\Gamma\). So there is a finite subset \(\Gamma_0 \subseteq \Gamma\) with \(\Gamma_0 \vdash \bot\). By Soundness, \(\Gamma_0\) is unsatisfiable — contradicting our assumption.

Compactness is named by analogy with topological compactness: just as a topological space is compact if every open cover has a finite subcover, a set of sentences is satisfiable iff it is “finitely satisfiable.” The theorem reveals something deep about first-order logic: satisfiability is a finitary property. No matter how many sentences you add, if any finite collection can be satisfied, the whole collection can be.

An equivalent formulation is: if \(\Gamma \models \varphi\), then there is a finite \(\Gamma_0 \subseteq \Gamma\) with \(\Gamma_0 \models \varphi\). This follows because \(\Gamma \models \varphi\) iff \(\Gamma \cup \{\neg\varphi\}\) is unsatisfiable iff some finite subset \(\Gamma_0 \cup \{\neg\varphi\}\) is unsatisfiable (by Compactness) iff \(\Gamma_0 \models \varphi\).

The Compactness Theorem provides one half of the explanation for why FOL cannot express certain properties: if a property requires checking infinitely many conditions simultaneously in an irreducible way, FOL (which only sees finite proofs and finite conjunctions of axioms) cannot capture it exactly.

9.2 Non-Standard Models

Compactness is a powerful tool for constructing models with surprising properties — models that satisfy all the axioms of a theory but contain elements no finite axiom system can “forbid.”

Non-standard models of Peano arithmetic. Consider the theory \(\Gamma_{\mathrm{PA}}\) of Peano arithmetic, together with a new constant symbol \(c\) and the infinite set of sentences: \[ \Delta = \{\neg(c = 0),\ \neg(c = S(0)),\ \neg(c = S(S(0))),\ \ldots\} \] Each finite subset of \(\Gamma_{\mathrm{PA}} \cup \Delta\) is satisfiable: take the standard model \(\mathbb{N}\) and interpret \(c\) as a sufficiently large standard natural number. By Compactness, \(\Gamma_{\mathrm{PA}} \cup \Delta\) itself is satisfiable. The resulting model satisfies all Peano axioms but contains an element \(c\) that is greater than every standard natural number — an infinite number beyond all finite ones.
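The finite-satisfiability check in this argument is purely mechanical. A minimal Python sketch, where a finite subset of \(\Delta\) is represented (my own convention) by the set of indices \(n\) for which the sentence \(\neg(c = S^n(0))\) is included:

```python
def witness_for_finite_subset(indices):
    """Given a finite subset of Delta, represented by the indices n
    for which the sentence ¬(c = S^n(0)) is included, return an
    interpretation of c in the standard model N satisfying all of them."""
    # c must differ from each named numeral S^n(0); any number
    # larger than every mentioned index works.
    return max(indices, default=0) + 1

def satisfies(c_value, indices):
    """Check that interpreting c as c_value satisfies each sentence
    ¬(c = S^n(0)) in the finite subset."""
    return all(c_value != n for n in indices)

subset = {0, 1, 2, 10}          # a finite fragment of Delta
c = witness_for_finite_subset(subset)
assert satisfies(c, subset)     # the standard model suffices here
```

No single standard number works for all of \(\Delta\) at once, which is exactly why the model Compactness produces must be non-standard.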

This shows that Peano arithmetic cannot be categorical (have only one model up to isomorphism). \(\mathbb{N}\) is just one of infinitely many models of \(\Gamma_{\mathrm{PA}}\).

Why introducing a new constant is essential. It is instructive to ask what goes wrong if we try to prove the existence of a non-standard model without adding the constant \(c\). Suppose instead we write, for each \(n\), the sentence \(\varphi'_n\): "there exist \(y, x_1, \ldots, x_n\), all distinct, with \(x_i < y\) for each \(i\)." The set \(\Gamma_{\mathrm{PA}} \cup \{\varphi'_n : n \geq 1\}\) is satisfied by the standard natural numbers: for any particular \(\varphi'_n\), pick \(y = n{+}1\) and use \(x_i = i{-}1\). The standard model satisfies every sentence in the set because for each \(\varphi'_n\) we are free to choose a different witness \(y\). No non-standard element is produced.

Introducing the constant \(c\) forces a single fixed name to serve all sentences at once. The element named by \(c\) must have at least \(n\) predecessors for every \(n\), which no standard natural number can satisfy. For each finite sub-collection \(\{\varphi_1, \ldots, \varphi_k\}\) we can interpret \(c\) as the standard number \(k{+}1\), satisfying all of them. Compactness then guarantees a model for the whole infinite collection — and in that model, the interpretation of \(c\) is necessarily non-standard.

A related subtlety: even if we enrich the language by naming every natural number — adding a constant symbol \(a_n\) for each \(n \in \mathbb{N}\) — non-standard models still arise. FOL cannot express “every element equals some \(a_n\),” because that would require the infinite disjunction \((x = a_0) \lor (x = a_1) \lor \cdots\), which is not a well-formed formula. Non-standard elements always creep in because every individual formula is finite.

The geometry of non-standard models of arithmetic is well understood: they look like a copy of \(\mathbb{N}\) followed by a dense linear order of “blocks,” each block internally ordered like \(\mathbb{Z}\) (stretching without bound in both directions within the block). The blocks can be indexed by any dense linear order without endpoints — such as \(\mathbb{Q}\) or \(\mathbb{R}\). When such a model satisfies the full first-order theory of \(\mathbb{N}\), no first-order sentence can distinguish it from the standard model, since by construction every first-order truth of \(\mathbb{N}\) holds in it.

Non-standard models of arithmetic have been used productively in several areas. Abraham Robinson’s non-standard analysis uses analogous models of the theory of the real numbers to rigorously justify infinitesimals — the “infinitely small” quantities that Newton and Leibniz used intuitively but could never make rigorous, and that the 19th-century rigorization of calculus replaced with limits. The compactness theorem shows that the existence of infinitesimals is logically consistent with the standard axioms.

9.3 More Compactness Applications

The compactness theorem has many applications beyond non-standard models. Here are two that illustrate its power in different directions.

Graph colorability. A graph is \(k\)-colorable if its vertices can be colored with \(k\) colors such that no two adjacent vertices share a color. Fix unary predicates \(C_1, \ldots, C_k\), one per color, and consider the theory \(\Gamma\) containing:
  • For each vertex \(v\): the formula \(\bigvee_{i=1}^k C_i(v)\) (every vertex gets some color);
  • For each vertex \(v\) and each pair \(i \neq j\): the formula \(\neg(C_i(v) \land C_j(v))\) (no vertex gets two colors);
  • For each edge \((u,v)\): the formula \(\bigwedge_{i=1}^k \neg(C_i(u) \land C_i(v))\) (adjacent vertices differ).
A finite subset of \(\Gamma\) mentions only finitely many vertices and edges, so it is satisfiable exactly when the corresponding finite subgraph is \(k\)-colorable. By Compactness, a (possibly infinite) graph is \(k\)-colorable iff every finite subgraph is \(k\)-colorable. This is the de Bruijn–Erdős theorem.
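The finite side of the theorem is decidable by brute force. A small sketch (the function names are my own):

```python
from itertools import product

def is_k_colorable(vertices, edges, k):
    """Brute-force check: does some assignment of k colors to the
    (finite) vertex set make every edge bichromatic?"""
    vs = list(vertices)
    for coloring in product(range(k), repeat=len(vs)):
        color = dict(zip(vs, coloring))
        if all(color[u] != color[v] for (u, v) in edges):
            return True
    return False

# A 5-cycle is an odd cycle: not 2-colorable, but 3-colorable.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
assert not is_k_colorable(range(5), cycle5, 2)
assert is_k_colorable(range(5), cycle5, 3)
```

By the de Bruijn–Erdős theorem, answering this question for every finite subgraph settles the \(k\)-colorability of an infinite graph — even though no algorithm could enumerate the infinite graph's colorings directly.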

Ruling out infinite models. Can a set of FOL sentences have as its models exactly the finite linear orders? No. If \(\Gamma\) is any theory all of whose models are finite, then there must be a finite upper bound \(N\) on the size of its models: if \(\Gamma\) had arbitrarily large finite models, we could add the sentences “there are at least \(k\) elements” for every \(k\), observe that every finite subset of the enlarged theory is satisfiable, and apply Compactness to obtain an infinite model. In particular, no set of sentences can describe “all finite linear orders” without also admitting an infinite model.

Graph connectivity is not first-order definable. Consider the graph-theoretic language with a single binary relation \(E\) for edges. The family of all connected graphs is not definable by any set of first-order sentences (even an infinite set).

To see why, suppose for contradiction that some set of sentences \(\Gamma_{\mathrm{conn}}\) defined exactly the connected graphs. Extend the language by two new constant symbols \(a\) and \(b\), and let \(\Sigma\) be the set containing, for each \(n\), the sentence \(\psi_n\): “there is no path of length \(n\) from \(a\) to \(b\).” Every finite subset of \(\Sigma\) is satisfiable by a connected graph: just take a long enough path graph and let \(a, b\) be its endpoints. Together with \(\Gamma_{\mathrm{conn}}\), every finite sub-collection is satisfiable by a connected graph in which \(a\) and \(b\) are far apart.

By Compactness, \(\Gamma_{\mathrm{conn}} \cup \Sigma\) is satisfiable. But any satisfying structure must be connected (it satisfies \(\Gamma_{\mathrm{conn}}\)) and yet have no path of any finite length from \(a\) to \(b\) (it satisfies all \(\psi_n\)) — meaning \(a\) and \(b\) are in different connected components. Contradiction. So \(\Gamma_{\mathrm{conn}}\) cannot exist.

This argument is a template for many inexpressibility proofs: introduce two named points, write an infinite set of sentences asserting they are “far apart,” and use compactness to get a disconnected structure while simultaneously satisfying the property that was supposed to forbid it. The key is that each finite fragment only sees a finite distance requirement, which a connected graph can satisfy by placing the two points far but finitely apart.
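The finite-fragment step of this template is easy to make concrete: a path graph is connected, yet its endpoints can be made arbitrarily (but finitely) far apart. A sketch, with my own graph representation:

```python
from collections import deque

def bfs_dist(n_vertices, edges, s, t):
    """Shortest path length between s and t, or None if no path exists."""
    adj = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist.get(t)

# A path graph on n+2 vertices is connected, yet its endpoints are
# n+1 > n apart -- so it satisfies psi_1, ..., psi_n simultaneously.
n = 5
path = [(i, i + 1) for i in range(n + 1)]
assert bfs_dist(n + 2, path, 0, n + 1) == n + 1
```

Each finite subset of \(\Sigma\) only bounds the distance between \(a\) and \(b\) from below by some finite \(n\), and this construction meets any such bound with a connected graph.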

Compactness as the price of having a good proof system. Compactness may look like an unfortunate limitation — it stops FOL from expressing finiteness, connectivity, and well-foundedness. But it is in fact unavoidable: it is a theorem that any proof system with three natural properties (soundness, completeness, and finite proofs) must satisfy the compactness theorem.

The argument is short. Suppose \(\Delta\) is such a proof system, and suppose every finite subset of \(\Gamma\) is satisfiable. We want to show \(\Gamma\) is satisfiable. Define \(\Gamma\) to be consistent if it does not prove both \(\alpha\) and \(\neg\alpha\) for any \(\alpha\). By soundness and completeness, consistent \(\Leftrightarrow\) satisfiable. Now if \(\Gamma\) were inconsistent, there would be a proof of \(\alpha\) and \(\neg\alpha\) from \(\Gamma\). But every proof is finite — it uses only finitely many hypotheses. So there is already a finite \(\Gamma_0 \subseteq \Gamma\) that is inconsistent, hence unsatisfiable. This contradicts our assumption that every finite subset is satisfiable. Therefore \(\Gamma\) is consistent, and by soundness/completeness it is satisfiable.

In short: compactness is not a defect of FOL. It is exactly what you get when proofs are finite and the proof system is sound and complete. Every “limitation” imposed by compactness is really the shadow of the fact that our proofs have finitely many steps.

9.4 The Löwenheim–Skolem Theorems

Compactness also implies constraints on what first-order logic can express about the size of structures.

Downward Löwenheim–Skolem Theorem. If \(\Gamma\) is a satisfiable set of sentences over a countable signature, then \(\Gamma\) has a countable model (one whose domain is finite or countably infinite).
The Downward theorem follows directly from the Henkin construction in Chapter 8. Given a satisfiable \(\Gamma\), extend it to a maximally consistent, saturated \(\Gamma^*\) over a signature \(\tau'\) obtained by adding countably many Henkin constants. The factored term model \(\mathcal{M}^*_\Gamma\) has as its domain the set of equivalence classes of closed terms of \(\tau'\). Since \(\tau'\) is countable (countably many original symbols plus countably many Henkin constants), there are only countably many closed terms, and hence countably many equivalence classes. The resulting model is therefore countable.
Upward Löwenheim–Skolem Theorem. If \(\Gamma\) has an infinite model, then \(\Gamma\) has models of arbitrarily large infinite cardinality.

Together, these theorems reveal a fundamental limitation of FOL: cardinality is not first-order expressible. No set of sentences can have exactly the natural numbers as its models and no others — for any such theory has models of all infinite cardinalities. Similarly, there are no FOL sentences that say “the domain has exactly uncountably many elements” in a way that rules out countable models.

The proof of the Upward Löwenheim–Skolem theorem is a clean application of compactness. Given an infinite model \(\mathcal{M}\) of \(\Gamma\) and a cardinal \(\kappa\), introduce a new set \(C = \{c_\alpha : \alpha < \kappa\}\) of \(\kappa\) many new constant symbols and add the sentences \(\{c_\alpha \neq c_\beta : \alpha \neq \beta\}\). Every finite subset of \(\Gamma \cup \{c_\alpha \neq c_\beta : \alpha \neq \beta\}\) is satisfiable (interpret finitely many distinct constants as distinct elements of \(\mathcal{M}\), which is infinite). By Compactness, the whole collection is satisfiable, giving a model of \(\Gamma\) with at least \(\kappa\) elements.

Skolem's paradox. The Downward Löwenheim–Skolem theorem has a famously paradoxical appearance. Zermelo–Fraenkel set theory (ZFC) proves the existence of uncountable sets. Yet by the theorem, if ZFC is consistent, it has a countable model — a model that, from the outside, appears to be countable, but from inside satisfies the axiom that the real numbers are uncountable. How can this be? The resolution is that "uncountable" is relative to the model: the model contains no bijection between \(\mathbb{N}\) and the "real numbers" of the model, because the required bijection exists outside the model but is not one of the model's sets. First-order logic cannot see outside the model. This illustrates how thoroughly FOL is a local, internal language.

9.5 What First-Order Logic Cannot Express

The Compactness and Löwenheim–Skolem theorems, taken together, sharply delineate what FOL can and cannot say.

What FOL can express:

  • Finite lower bounds: “There are at least \(k\) elements” is expressible as the sentence \(\exists x_1 \cdots \exists x_k \bigwedge_{i \neq j} \neg(x_i = x_j)\).
  • Finite upper bounds: “There are at most \(k\) elements” is expressible as \(\forall x_1 \cdots \forall x_{k+1} \bigvee_{i \neq j} (x_i = x_j)\).
  • That the domain is infinite: the schema \(\{\varphi_k : k \geq 1\}\) where \(\varphi_k\) says “there are at least \(k\) elements” is an infinite set of FOL sentences whose only models are infinite structures.
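The counting sentences above are easy to generate mechanically. A sketch that emits them as plain strings (the concrete syntax is my own; any standard FOL notation would do):

```python
from itertools import combinations

def at_least(k):
    """FOL sentence: 'there are at least k elements'."""
    xs = [f"x{i}" for i in range(1, k + 1)]
    quants = " ".join(f"∃{x}" for x in xs)
    diffs = " ∧ ".join(f"¬({a} = {b})" for a, b in combinations(xs, 2))
    return f"{quants} ({diffs})"

def at_most(k):
    """FOL sentence: 'there are at most k elements' -- among any
    k+1 elements, two must coincide."""
    xs = [f"x{i}" for i in range(1, k + 2)]
    quants = " ".join(f"∀{x}" for x in xs)
    eqs = " ∨ ".join(f"({a} = {b})" for a, b in combinations(xs, 2))
    return f"{quants} ({eqs})"

print(at_least(2))  # ∃x1 ∃x2 (¬(x1 = x2))
```

Note that each sentence has length growing with \(k\); there is no single finite sentence for "the domain is infinite," only the infinite schema \(\{\varphi_k\}\).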

What FOL cannot express:

  • That the domain is exactly countably infinite: by Upward Löwenheim–Skolem, any theory with a countably infinite model also has models of every larger cardinality.
  • That the domain is finite: any theory with arbitrarily large finite models also has an infinite model (by Compactness: if the theory is consistent with infinitely many “there are at least \(k\) elements” sentences, it has an infinite model).
  • Graph connectivity: the property “the graph \(G\) is connected” (for arbitrary graphs) cannot be expressed by any single FOL sentence, nor by any set of sentences whose models are exactly the connected graphs (Section 9.3).
  • Well-foundedness: the property that a linear order has no infinite descending chains cannot be expressed in FOL, since any theory of well-orders also has non-well-founded models by Compactness (add infinitely many \(c_n > c_{n+1}\) sentences).

These inexpressibility results are not failures of particular axiom systems — they are fundamental limitations of first-order logic as a language. Second-order logic can express many of these properties, but at the cost of losing the completeness theorem.


Chapter 10: Axiom Systems and Their Limits

10.1 The Axiomatic Method Revisited

The approach developed in the preceding chapters gives us a powerful toolkit: a formal language (FOL), a proof system (natural deduction), a model theory (structures and satisfaction), and the guarantee of soundness and completeness connecting them. The natural question now is: how far can this toolkit take us?

Any area of mathematics can be approached by selecting a signature and writing down axioms. From those axioms, the proof system generates a theory — the set of all sentences formally provable from the axioms. By completeness, the theory equals the set of all sentences true in every model of the axioms. The dream of Hilbert’s formalization program (circa 1900–1930) was that this process could produce complete, consistent, decidable axiom systems for all of mathematics.

The question of adequacy breaks naturally into four desiderata:

An axiom system \(\Gamma\) for a signature \(\tau\) is said to be:
  • Computable if there is an algorithm that decides, given any sentence \(\varphi\), whether \(\varphi \in \Gamma\);
  • Consistent if \(\Gamma \not\vdash \bot\) (no contradiction is derivable);
  • Sound if every provable sentence is true in the intended model (the "standard" interpretation);
  • Complete if for every sentence \(\varphi\), either \(\Gamma \vdash \varphi\) or \(\Gamma \vdash \neg\varphi\) (every sentence is decided).

Computability is important because the axiom system must be checkable — otherwise a “proof” using a claimed axiom cannot be verified mechanically. Consistency is the minimum for any useful system. Soundness prevents the system from proving falsehoods. Completeness — the ideal where the axioms decide every sentence — is the most stringent and, as it turns out, the most elusive.

10.2 Success Stories

Before confronting the limits, it is worth noting that the four-property wishlist can be achieved for restricted theories.

Presburger arithmetic is the theory of the natural numbers under addition only — the signature \(\{0, S, +\}\) without multiplication. Its axioms are computable, consistent, sound, and complete. Mojżesz Presburger proved this in 1929 using a technique called quantifier elimination: every formula in Presburger arithmetic can be algorithmically transformed into an equivalent quantifier-free formula, reducing all questions to finite computation.

Tarski’s geometry (Euclidean plane geometry, 1951) is another example. Tarski gave a complete, decidable, consistent axiomatization of Euclidean geometry in first-order logic. This is possible because Euclidean geometry is equivalent (over the reals) to the theory of real-closed fields, which also admits quantifier elimination.

These results confirm that the four-property dream is achievable in certain domains. The question is whether it can be extended to arithmetic — to the theory of the natural numbers under both addition and multiplication, the setting needed to formalize all of elementary mathematics.

The key property behind both successes is quantifier elimination: for every formula \(\varphi(x_1,\ldots,x_n)\) in the theory, there exists a quantifier-free formula \(\psi(x_1,\ldots,x_n)\) that is logically equivalent over all models of the theory. For Presburger arithmetic, this means every statement about natural number addition can be decided by comparing linear integer combinations. For Tarski geometry, it reduces geometric statements to polynomial inequalities over the reals (decided by the theory of real-closed fields).

Decidability is intimately connected to quantifier elimination: a theory with effective quantifier elimination is decidable, because the quantifier-free formulas can be evaluated algorithmically (they involve no quantifiers to range over the domain). The completeness of these theories is then a corollary: once every sentence reduces to a quantifier-free sentence, and quantifier-free sentences are evaluated the same way across all models of the theory, the theory is complete.
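Presburger's full procedure is intricate, but the flavor of quantifier elimination shows up already in a much simpler theory, dense linear orders without endpoints: \(\exists x\,(\bigwedge_i l_i < x \land \bigwedge_j x < u_j)\) is equivalent to the quantifier-free \(\bigwedge_{i,j} l_i < u_j\). A sketch of this one elimination step (the representation of terms as strings is my own):

```python
def eliminate_exists(lowers, uppers):
    """Eliminate ∃x from  ∃x (l < x for each l  ∧  x < u for each u)
    over a dense linear order without endpoints.  Density guarantees
    a witness x exists iff every lower bound is below every upper
    bound, so the result is the list of pairs (l, u) meaning l < u."""
    return [(l, u) for l in lowers for u in uppers]

def evaluate(pairs, values):
    """Evaluate the quantifier-free result under an assignment of
    rational values to the term names."""
    return all(values[l] < values[u] for (l, u) in pairs)

# ∃x (a < x ∧ b < x ∧ x < c)  ≡  a < c ∧ b < c
qf = eliminate_exists(["a", "b"], ["c"])
assert evaluate(qf, {"a": 0, "b": 1, "c": 2})       # witness exists (e.g. 1.5)
assert not evaluate(qf, {"a": 0, "b": 3, "c": 2})   # no witness
```

Repeating such a step innermost-quantifier-first removes all quantifiers, and the resulting formula can be evaluated directly — which is the decidability argument in miniature.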

What addition alone misses. The boundary between decidability and undecidability in arithmetic is surprisingly sharp. Presburger arithmetic (addition only) is decidable; Robinson arithmetic \(Q\) (a fragment of PA without the induction schema) and full Peano arithmetic (with multiplication) are not. The Matiyasevich–Robinson–Davis–Putnam (MRDP) theorem (1970) shows that every recursively enumerable set of integers is Diophantine — expressible as the solution set of a polynomial equation over the integers. This implies, in particular, that the solvability of arbitrary Diophantine equations (Hilbert's 10th problem) is undecidable.

10.3 Hilbert’s Program and Its Demise

In the late 19th and early 20th centuries, mathematicians discovered that naive set theory contained contradictions (Russell’s paradox, Burali-Forti’s paradox). This prompted a crisis: if the foundations of mathematics were inconsistent, every theorem ever proved might be worthless. Hilbert’s response was the formalization program: find a finite, computable, consistent set of axioms from which all of mathematics could be derived, and then prove the consistency of these axioms by finitary means. If successful, mathematics would rest on an unassailable foundation.

Hilbert was so confident in this program that in September 1930, in Königsberg, he declared: “Wir müssen wissen, wir werden wissen” — “We must know, we will know.” The day before, at a conference in the same city, a young Austrian mathematician named Kurt Gödel had announced that the program was impossible. No consistent, computable axiom system that is strong enough to express elementary arithmetic can be complete.

This is the content of the First Incompleteness Theorem, the subject of Chapter 13. But to prove it, we first need to understand Turing machines — the formal model of computation introduced in Chapters 11 and 12. The connection between incompleteness and computability is not incidental: the proof of incompleteness in this course runs through the undecidability of the halting problem.

The historical timing is remarkable. On September 7, 1930, Gödel presented the First Incompleteness Theorem at the Epistemology of Exact Sciences Conference in Königsberg. Hilbert was in the audience for his own retirement address the next day, apparently unaware that his program had just been dealt a fatal blow. John von Neumann, present at the conference, immediately recognized the significance and wrote to Gödel asking for the preprint. Within a few weeks, von Neumann had independently discovered what would become the Second Incompleteness Theorem — though Gödel had already found it and included it in his paper.

The resulting paper, “On Formally Undecidable Propositions of Principia Mathematica and Related Systems I” (the “I” marked a planned sequel that was never written), appeared in Monatshefte für Mathematik und Physik in 1931. At roughly 25 pages, it is one of the most consequential papers in the history of mathematics, placing fundamental and permanent limits on the reach of the axiomatic method.

10.4 The Boundary of Computability and Logic

The relationship between the four properties — computability, consistency, soundness, completeness — is more subtle than it might appear.

Computability vs. completeness for first-order logic as a whole. The FOL proof system is complete for logical validity: every logically valid sentence has a formal proof. But the question “is \(\varphi\) logically valid?” is not decidable — there is no algorithm that, given a sentence, decides validity. What is possible: an algorithm can enumerate all valid sentences (by systematically running all possible proofs) but cannot reliably halt when a sentence is not valid. This makes logical validity recursively enumerable (\(\Sigma_1\)) but not recursive.

The incompleteness hierarchy. Gödel’s result is not an isolated limit but the first of a hierarchy. Starting from \(\Gamma_{\mathrm{PA}}\), we can add \(\mathrm{Con}(\Gamma_{\mathrm{PA}})\) as a new axiom to get a stronger system \(\Gamma_1\). Gödel’s theorem applies again to \(\Gamma_1\), giving a sentence independent of it. Adding that as an axiom gives \(\Gamma_2\), and so on. Any consistent extension of \(\Gamma_{\mathrm{PA}}\) that is computable and TM-expressive is still incomplete. The undecidable sentences form a “forest” of independent statements, none of which can be decided by any finite computable extension.


Chapter 11: Turing Machines

11.1 What Is an Algorithm?

The notion of “computation” or “algorithm” is central to mathematics and computer science, but giving it a mathematically precise definition is non-trivial. Intuitively, an algorithm is a finite, step-by-step procedure that takes an input and either produces an output or runs forever. But what exactly counts as a “step”? Can we add arbitrary real numbers in one step? Can we access infinite memory?

In the 1930s, independently and from different starting points, several mathematicians proposed formal definitions of computability:

  • Alonzo Church (1936): the lambda calculus, a formal system of function abstraction and application;
  • Alan Turing (1936): Turing machines, abstract computing machines with an infinite tape;
  • Emil Post (1936): tag systems and rewriting rules;
  • Stephen Kleene (1936): general recursive functions.

These apparently different definitions turned out to be equivalent — all computing the same class of functions. This convergence strongly suggests that they capture an objective notion, formalized as the Church–Turing thesis.

11.2 The Formal Definition

A Turing machine is a triple \(M = (Q, \Sigma, \delta)\) where:
  • \(Q\) is a finite set of states, including a distinguished Halt state;
  • \(\Sigma\) is the tape alphabet, including a special blank symbol \(\square\);
  • \(\delta: (Q \setminus \{\text{Halt}\}) \times \Sigma \to Q \times \Sigma \times \{L, R, S\}\) is the transition function, mapping a (state, tape-symbol) pair to a new state, a symbol to write, and a head direction (Left, Right, or Stay).

The machine operates on an infinite tape divided into cells. Initially, the input string occupies the leftmost cells and the rest are blank. The machine head starts on the first input cell. At each step, the machine reads the symbol under the head, consults the transition function, writes a new symbol, moves the head, and transitions to a new state. Computation halts when the Halt state is reached; the output is then read from the tape.

Why this definition captures "algorithm." Turing's key insight was that any mechanical computation can be decomposed into steps of this form. The human computer (the original meaning of "computer") reads a symbol from a page, possibly writes or erases a symbol, moves attention left or right, and changes mental state. Turing's machine abstracts this process. The infinite tape is not a realistic feature but a theoretical convenience: it ensures the machine never "runs out" of working space.
A function \(f: \Sigma^* \to \Sigma^*\) is computable if there exists a Turing machine \(M\) such that for every input string \(s \in \Sigma^*\), \(M\) halts and the tape content after halting is \(f(s)\).
Erase: The function that maps any input string to the empty string. Machine: start in state \(q_0\); while the head sees a non-blank symbol, overwrite with \(\square\) and move right; when a blank is seen, halt. This requires only two states (\(q_0\) and Halt) and is straightforwardly computable.

IsEven: Determines whether the length of a binary input is even, returning 1 or 0. Machine: alternate between two states as the head scans right (one state for “seen even many symbols so far,” one for odd), then output appropriately. Also computable.

11.3 The Church–Turing Thesis

The class of computable functions defined by Turing machines agrees with the class defined by lambda calculus, by recursive functions, and by every other reasonable formalization of computation. This is a mathematical theorem about the equivalence of these models. But the Church–Turing thesis makes a stronger, empirical claim:

Church–Turing Thesis. Every function that can be computed by any physically realizable, finite, mechanical procedure is computable by some Turing machine.

The thesis is not a theorem — it is a claim about the physical world and about human intuition regarding computation. It cannot be proved, but it can be (and could in principle be) refuted if someone exhibited a physically realizable computation that no Turing machine can perform. No such refutation has ever been found. The thesis is universally accepted in theoretical computer science and guides the study of computability: to show a function is computable, it suffices to describe an informal algorithm; to show it is not, we must prove no Turing machine can compute it.

11.4 Worked Examples: Step-by-Step Traces

Tracing Turing machine execution step by step builds intuition for what the formalism captures. A configuration of a Turing machine records the current state, the current tape content, and the head position. We write a configuration as \((q, w_L \cdot w_R)\) where \(q\) is the current state, \(w_L\) is the tape content to the left of and under the head, and \(w_R\) is the tape content to the right of the head.

Erase machine. As a first attempt, define machine \(M_{\mathrm{erase}} = (\{q_0, \mathrm{Halt}\}, \{0,1,\square\}, \delta)\) where the transition function \(\delta\) is:
  • \(\delta(q_0, 0) = (\mathrm{Halt}, \square, S)\) — write blank and halt
  • \(\delta(q_0, 1) = (\mathrm{Halt}, \square, S)\) — write blank and halt
  • \(\delta(q_0, \square) = (\mathrm{Halt}, \square, S)\) — already blank, halt

This attempt erases only the single cell under the head. For a full erase, the machine must keep scanning right:

  • \(\delta(q_0, 0) = (q_0, \square, R)\) — overwrite 0 with blank, move right
  • \(\delta(q_0, 1) = (q_0, \square, R)\) — overwrite 1 with blank, move right
  • \(\delta(q_0, \square) = (\mathrm{Halt}, \square, S)\) — end of input, halt

Trace on input \(101\):

| Step | State | Tape (head position underlined) |
|------|-------|---------------------------------|
| 0 | \(q_0\) | \(\underline{1}01\) |
| 1 | \(q_0\) | \(\square\underline{0}1\) |
| 2 | \(q_0\) | \(\square\square\underline{1}\) |
| 3 | \(q_0\) | \(\square\square\square\,\underline{\square}\) |
| 4 | Halt | \(\square\square\square\square\) |

After halting, the tape is all blanks. Output: the empty string.

IsEven machine. Define machine \(M_{\mathrm{even}} = (\{q_e, q_o, \mathrm{Halt}\}, \{0,1,\square\}, \delta)\) where:
  • State \(q_e\) = "have seen an even number of symbols so far"
  • State \(q_o\) = "have seen an odd number of symbols so far"
Transitions (for any symbol \(a \in \{0,1\}\)):
  • \(\delta(q_e, a) = (q_o, a, R)\)
  • \(\delta(q_o, a) = (q_e, a, R)\)
  • \(\delta(q_e, \square) = (\mathrm{Halt}, 1, S)\) — even length: output 1
  • \(\delta(q_o, \square) = (\mathrm{Halt}, 0, S)\) — odd length: output 0

Trace on input \(10\) (length 2, should output 1):

| Step | State | Tape (head underlined) |
|------|-------|------------------------|
| 0 | \(q_e\) | \(\underline{1}0\) |
| 1 | \(q_o\) | \(1\underline{0}\) |
| 2 | \(q_e\) | \(10\,\underline{\square}\) |
| 3 | Halt | \(101\) (write 1 in place of \(\square\)) |

Trace on input \(101\) (length 3, should output 0):

| Step | State | Tape (head underlined) |
|------|-------|------------------------|
| 0 | \(q_e\) | \(\underline{1}01\) |
| 1 | \(q_o\) | \(1\underline{0}1\) |
| 2 | \(q_e\) | \(10\underline{1}\) |
| 3 | \(q_o\) | \(101\,\underline{\square}\) |
| 4 | Halt | \(1010\) (write 0 at head position) |

These examples confirm that even very simple computations — erasing input, checking parity — require deliberate state management. More sophisticated computations (sorting, parsing, arithmetic) are handled by machines with more states, but the same basic step-by-step mechanism.
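The two parity traces above can be replayed in code. This sketch of \(M_{\mathrm{even}}\) uses a dict-encoded transition table; reading the whole final tape as output (input followed by the answer bit) is an illustrative convention.

```python
BLANK = "_"

# Transition table for the IsEven machine defined above.
even = {}
for a in "01":
    even[("q_e", a)] = ("q_o", a, "R")     # flip parity, keep the symbol
    even[("q_o", a)] = ("q_e", a, "R")
even[("q_e", BLANK)] = ("Halt", "1", "S")  # even length: write 1
even[("q_o", BLANK)] = ("Halt", "0", "S")  # odd length: write 0

def run(delta, tape, state="q_e"):
    """Step the machine until Halt; only R and S moves occur here."""
    tape, head = list(tape), 0
    while state != "Halt":
        if head == len(tape):
            tape.append(BLANK)             # extend the tape on demand
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else 0
    return "".join(tape)

assert run(even, "10") == "101"            # even length: answer bit 1
assert run(even, "101") == "1010"          # odd length: answer bit 0
```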

11.5 Encoding and Multi-Tape Equivalence

Real computations often benefit from multiple tapes (one for input, one for working memory, one for output). A multi-tape Turing machine has \(k \geq 1\) tapes, each with its own head, and transitions based on the tuple of currently read symbols. Multi-tape machines are more convenient to program but no more powerful:

Every function computable by a multi-tape Turing machine is computable by a single-tape Turing machine.

The proof simulates \(k\) tapes on one tape by interleaving the tape contents and using marked symbols to track head positions. The simulation is at most quadratically slower in terms of steps. This and similar equivalences (with random-access machines, with queue automata, with cellular automata) reinforce the Church–Turing thesis: the definition of computability is robust across all reasonable formalizations.
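The interleaving layout at the heart of this simulation can be made concrete: cell \(i\) of tape \(j\) is stored at position \(k \cdot i + j\) of the single tape. A sketch (head-position markers, which the real simulation also tracks, are omitted for brevity):

```python
# Interleave k tapes into one tape: cell i of tape j lives at index k*i + j.
k = 3
tapes = [list("abc"), list("123"), list("xyz")]
single = [tapes[j][i] for i in range(3) for j in range(k)]

def read_cell(single, i, j):
    """One indexed access on the single tape reads tape j, cell i."""
    return single[k * i + j]

assert single == ["a", "1", "x", "b", "2", "y", "c", "3", "z"]
assert read_cell(single, 1, 2) == "y"      # tape 2, cell 1
```

One simulated step requires the single-tape machine to sweep across the interleaved region, which is where the quadratic slowdown comes from.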


Chapter 12: Undecidability

12.1 Encoding and the Universal Machine

Before asking which functions are not computable, we need a richer picture of what Turing machines can encode. The key observation is that mathematical objects — integers, rationals, graphs, sets, even Turing machines themselves — can be encoded as finite strings over a binary alphabet.

Every Turing machine \(M\) can be described by a finite string \(\langle M \rangle\) that encodes the transition function \(\delta\), the states \(Q\), and the alphabet \(\Sigma\). This encoding is computable and injective (different machines get different strings). The existence of such encodings makes it possible to reason about Turing machines as inputs to other Turing machines.

Universal Turing Machine. There exists a Turing machine \(U\) such that for any machine \(M\) and input \(s\), \(U(\langle M, s \rangle) = M(s)\) (i.e., \(U\) simulates \(M\) on input \(s\)).

The universal machine \(U\) reads the description \(\langle M \rangle\) and uses it to simulate \(M\)’s transitions step by step. This is the theoretical forerunner of the stored-program computer: the observation that a machine can execute arbitrary programs if programs are represented as data. Von Neumann’s architecture — software stored in the same memory as data — is a direct implementation of this idea.

The existence of the universal machine also means that the set of pairs \((\langle M \rangle, s)\) such that \(M\) halts on \(s\) is recursively enumerable: it can be enumerated by running \(U\) on each pair in a dovetailing order (interleaving the first step of each computation, then the second step of each, and so on). Recursively enumerable sets sit at the boundary of computability: they can be “listed” but not necessarily “decided.”
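Dovetailing can be illustrated with ordinary code. In this sketch, Python generators stand in for Turing machines, and each machine is re-run from scratch with a growing step budget, a simple (if redundant) way to interleave computations so that every halting machine is eventually listed while loopers are merely skipped past.

```python
def halts_in(machine, steps):
    """Run a fresh copy of the machine for at most `steps` steps."""
    gen = machine()
    for _ in range(steps):
        try:
            next(gen)
        except StopIteration:
            return True                  # halted within the budget
    return False                         # still running: inconclusive

def looper():                            # a machine that never halts
    while True:
        yield

def make_halter(n):                      # a machine that halts after n steps
    def m():
        for _ in range(n):
            yield
    return m

machines = [looper, make_halter(5), looper, make_halter(1)]
found = set()
for s in range(10):                      # a finite prefix of the dovetail
    for i in range(min(s, len(machines))):
        if halts_in(machines[i], s):
            found.add(i)
assert found == {1, 3}                   # exactly the halting machines
```

The set `found` grows toward the halting machines but never certifies that a looper loops, which is precisely the gap between semi-decidable and decidable.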

A set \(S \subseteq \Sigma^*\) is:
  • decidable (also recursive) if its characteristic function is computable — there is a TM that halts and outputs 1 on inputs in \(S\) and 0 on inputs not in \(S\);
  • semi-decidable (also recursively enumerable) if there is a TM that halts and outputs 1 on inputs in \(S\), and either halts with output 0 or loops on inputs not in \(S\).

Every decidable set is semi-decidable, but not vice versa. The set of halting computations \(\{(\langle M \rangle, s) : M \text{ halts on } s\}\) is semi-decidable (run \(U\) and halt if it halts) but not decidable (the Halting Problem). The complement — \(\{(\langle M \rangle, s) : M \text{ does not halt on } s\}\) — is neither decidable nor semi-decidable.

12.2 Diagonalization and SelfReject

We now prove that some functions cannot be computed by any Turing machine.

SelfReject is uncomputable. The function \(\mathrm{SelfReject}: \Sigma^* \to \{0,1\}\) defined by \[ \mathrm{SelfReject}(s) = \begin{cases} 1 & \text{if } s = \langle M \rangle \text{ and } M(\langle M \rangle) = 0, \\ 0 & \text{otherwise,}\end{cases} \] is not computable.
Suppose for contradiction that some Turing machine \(D\) computes \(\mathrm{SelfReject}\). Consider running \(D\) on its own description \(\langle D \rangle\):
  • If \(D(\langle D \rangle) = 1\), then by definition of \(\mathrm{SelfReject}\), we need \(D(\langle D \rangle) = 0\) — a contradiction.
  • If \(D(\langle D \rangle) = 0\), then by definition of \(\mathrm{SelfReject}\), we need \(D(\langle D \rangle) = 1\) — another contradiction.
In either case, we get a contradiction. So no such \(D\) exists.

This is the diagonal argument, the same technique Cantor used in 1891 to prove that the real numbers are uncountable. Cantor’s diagonal argument constructs a real number that differs from every number in a supposed list. Here, \(D(\langle D \rangle)\) is the “diagonal element” — a Turing machine applied to its own description — and the function is defined to differ from whatever the diagonal produces. Gödel used the same idea in 1931 to construct a sentence that asserts its own unprovability.

Why the diagonal works: the self-reference trap. The diagonal argument exploits a gap between the description of an object and the object itself. Suppose we enumerate all Turing machines as \(M_1, M_2, M_3, \ldots\) and ask: is there a machine \(D\) such that \(D(\langle M_i \rangle)\) differs from \(M_i(\langle M_i \rangle)\) for every \(i\)? If yes, then \(D\) itself appears in the list as some \(M_j\). Running \(D\) on its own description \(\langle M_j \rangle = \langle D \rangle\) gives a value that must differ from \(M_j(\langle M_j \rangle) = D(\langle D \rangle)\), a contradiction: no machine can differ from its own output on the diagonal.

The intuition is: if you build a machine that “disagrees with row \(i\) on column \(i\)” for every \(i\), then when you look at row \(j\) (the row for your machine), you need to disagree with yourself — but you cannot disagree with your own output.
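The finite core of the diagonal move can be executed directly. In this sketch, the rows of a small 0/1 table stand in for machine behaviors:

```python
# Cantor's diagonal move on a finite table: build a row that differs
# from row i at column i, so it cannot equal any listed row.
table = [[0, 1, 1],
         [1, 1, 0],
         [0, 0, 1]]
diagonal_row = [1 - table[i][i] for i in range(len(table))]

for i, row in enumerate(table):
    assert diagonal_row[i] != row[i]   # disagrees with row i at column i
assert diagonal_row not in table       # hence missing from the table
```

The contradiction in the infinite case arises when the constructed row is required to be one of the table's own rows.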

The same idea runs through all three great impossibility results: Cantor’s theorem (no list of reals is complete — the diagonal real is missing), Gödel’s incompleteness (no consistent formal system proves all truths — the Gödel sentence is missing), and the halting problem (no program decides halting — the diagonal program’s behavior is missing from any purported decider).

12.2.1 Kolmogorov Complexity and Uncomputability

Before proving the Halting Problem undecidable, it is illuminating to see an independent route to an undecidability result that traces directly to the Berry paradox.

Fix a programming language (say, C++). The Kolmogorov complexity \(K(x)\) of a binary string \(x\) is the bit-length of the shortest program that, run on empty input, halts and outputs \(x\). Informally, \(K(x)\) is the length of the shortest description of \(x\).
Most strings are incompressible. For every \(n\), among all \(2^n\) binary strings of length \(n\), fewer than \(2^{n-2}\) have \(K(x) < n-2\): there are only \(2^{n-2} - 1\) programs of length less than \(n-2\), and each one outputs at most one string. So at least \(3/4\) of the strings of length \(n\) have Kolmogorov complexity within 2 bits of the trivial upper bound \(n\). In general, for any constant \(c\), fewer than a \(2^{-c}\) fraction of strings can be compressed by more than \(c\) bits. Most strings are essentially incompressible: they carry no pattern that allows a shorter description.
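The counting behind this bound is elementary arithmetic; a quick numeric check, treating every binary string of length less than \(n-2\) as a candidate program:

```python
# Programs of length 0 .. n-3 number 2^(n-2) - 1, so at most that many
# strings of length n can be compressed by more than 2 bits.
n = 12
num_short_programs = sum(2**i for i in range(n - 2))   # lengths 0 .. n-3
assert num_short_programs == 2**(n - 2) - 1
assert num_short_programs / 2**n < 1 / 4               # under a quarter
```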

Yet here is the paradox: even though most strings are complex, you cannot produce one. If you had a computable procedure that, given \(n\), outputs a string \(x\) with \(K(x) > n - 3\), then the description of that procedure is a constant-size program plus the input \(n\) (which takes \(\log n\) bits). For large enough \(n\), the program description is shorter than \(n - 3\) — contradicting \(K(x) > n - 3\). You can observe that complex strings form the majority, but you cannot point to one. This mirrors the real-number analogue: almost all real numbers are transcendental (not solutions of any polynomial with integer coefficients), but we can name only a handful — \(\pi\), \(e\), perhaps \(\pi^e\).
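As an illustration only, a general-purpose compressor can serve as a crude stand-in for Kolmogorov complexity (it gives an upper-bound proxy, nothing more): a typical random string does not compress at all, and even grows slightly from format overhead, matching the incompressibility count.

```python
import os
import zlib

data = os.urandom(1000)                   # 1000 random bytes: no pattern
compressed = zlib.compress(data, level=9) # best-effort compression
assert len(compressed) >= len(data)       # random data does not shrink
```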

Kolmogorov complexity is uncomputable. There is no program that, given a string \(x\) and a number \(k\), halts and correctly outputs whether \(K(x) = k\).
This is a precise version of the Berry paradox. Suppose for contradiction that program \(P\) decides Kolmogorov complexity. Fix a large number \(K_0\) satisfying \(K_0 > 10 \log K_0 + 10|P|\) (such \(K_0\) exists since the right side grows as \(O(\log K_0)\)). Now construct program \(Q\):

Program \(Q\): Enumerate all binary strings in lexicographic order. For each string \(y\), run \(P\) to check whether \(K(y) = K_0\). Output the first such \(y\).

Call the first string found \(y^*\). By assumption, \(K(y^*) = K_0\). But the program \(Q\) outputs \(y^*\), so \(K(y^*) \leq |Q|\). The length of \(Q\) is at most \(|P| + \log K_0 + 20\) (we need to hard-code \(P\) and the value \(K_0\), plus a constant amount of code for the loop). This is much less than \(K_0\) by our choice of \(K_0\). So \(K(y^*) \leq |Q| < K_0 = K(y^*)\), a contradiction.

The argument is the same as the Berry paradox: “the first string not describable in fewer than \(K_0\) bits” has just been described in fewer than \(K_0\) bits.

Note that \(Q\) does find such a \(y^*\): by the counting argument, most strings of length \(K_0\) have Kolmogorov complexity at least \(K_0\), and every string of length \(K_0\) has complexity at most \(K_0\) plus a fixed constant, so for a suitable choice of \(K_0\) the exact value \(K_0\) is attained and the loop terminates.

12.3 The Halting Problem

The most famous uncomputable problem is the Halting Problem: given a machine \(M\) and input \(s\), does \(M\) halt on \(s\)?

The Halting Problem is the function \(\mathrm{Halt}: \Sigma^* \times \Sigma^* \to \{0,1\}\) where \(\mathrm{Halt}(\langle M \rangle, s) = 1\) if and only if \(M\) halts on input \(s\).
The Halting Problem is uncomputable.
By reduction from SelfReject. Suppose for contradiction that some Turing machine \(H\) computes \(\mathrm{Halt}\). We construct a machine \(C\) that computes \(\mathrm{SelfReject}\):

Given input \(s\):

  1. If \(s\) is not the encoding of a Turing machine, output 0.
  2. Otherwise, let \(s = \langle M \rangle\). Use \(H\) to check whether \(M\) halts on \(\langle M \rangle\).
  3. If \(M\) does not halt on \(\langle M \rangle\): \(M(\langle M \rangle)\) is undefined, so output 0.
  4. If \(M\) does halt on \(\langle M \rangle\): simulate \(M\) on \(\langle M \rangle\) until halting, then output 1 iff \(M(\langle M \rangle) = 0\).
Machine \(C\) computes \(\mathrm{SelfReject}\), contradicting the previous theorem. So \(H\) cannot exist.
Practical significance. The undecidability of the Halting Problem is not a curiosity — it has direct consequences for software engineering. No general-purpose program analysis tool can decide for all programs and all inputs whether the program will terminate. Tools that check termination (such as those used in formal verification) must either work on restricted classes of programs or accept that they may fail to give an answer. The fundamental limitation is not a deficiency of current technology but a mathematical fact about the nature of computation.
An alternative proof: halting via Kolmogorov complexity. Section 12.2.1 showed that Kolmogorov complexity is uncomputable. We can use this as the base to derive the undecidability of the Halting Problem directly — without going through SelfReject.

Suppose for contradiction that some machine \(H\) solves the Halting Problem. We show how to use \(H\) to decide Kolmogorov complexity, contradicting Section 12.2.1.

On input \((x, k)\), first decide whether \(K(x) \leq k\): enumerate all binary strings of length at most \(k\) in lexicographic order. For each string \(J\), use \(H\) to check whether \(J\) (viewed as a program) halts on the empty input. If \(H\) says yes, simulate \(J\) until it halts, and check whether its output is \(x\). If some program of length at most \(k\) outputs \(x\), then \(K(x) \leq k\); otherwise \(K(x) > k\). Running this test for both \(k\) and \(k-1\) decides the original question exactly: \(K(x) = k\) if and only if \(K(x) \leq k\) and \(K(x) > k-1\).

Without \(H\), we are stuck: we cannot know whether a given program \(J\) will eventually produce output \(x\) or loop forever. With \(H\) as a subroutine, we can short-circuit this uncertainty — we ask \(H\) first, and only simulate the program if \(H\) confirms it halts. Since Kolmogorov complexity cannot be decided (Section 12.2.1), \(H\) cannot exist.

The proof illustrates the key technique of undecidability by reduction: we know that problem \(A\) (Kolmogorov complexity) is undecidable, and we show that if problem \(B\) (halting) were decidable, we could decide \(A\) — contradiction. The logical flow is always: assume \(B\) is decidable, use it as a black-box subroutine, solve \(A\), derive a contradiction.

12.4 Reductions and Rice’s Theorem

The halting problem serves as a source from which many other undecidability results can be derived by reduction. A computable reduction from problem \(A\) to problem \(B\) is a computable function \(f\) such that \(A(x) = 1\) iff \(B(f(x)) = 1\) for all inputs \(x\). If such a reduction exists and \(B\) is computable, then \(A\) is computable (compute \(f(x)\), then apply the decider for \(B\)); by the contrapositive, if \(A\) is uncomputable, then so is \(B\).
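The shape of the argument can be sketched with toy problems; both are decidable here, purely to show the composition (the problems and the function \(f\) are illustrative choices):

```python
def B(y):
    """Toy decidable problem B: is y odd?"""
    return y % 2 == 1

def f(x):
    """Computable reduction: A(x) = B(f(x)), reducing 'x is even' to B."""
    return x + 1

def A(x):
    """Decider for 'x is even', built purely by composing f with B."""
    return B(f(x))

assert A(4) and not A(7)
```

In an undecidability proof the logic runs in reverse: if no decider for \(A\) can exist, the composition shows no decider for \(B\) can exist either.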

The most sweeping consequence of the halting problem’s undecidability is Rice’s Theorem, which rules out decidability for essentially any semantic property of Turing machine behavior.

A semantic property of Turing machines is a set \(\mathcal{P}\) of Turing machines that is closed under functional equivalence: if \(M \in \mathcal{P}\) and \(M'\) computes the same function as \(M\), then \(M' \in \mathcal{P}\). A semantic property is trivial if it contains all Turing machines or no Turing machines.
Rice's Theorem. Every non-trivial semantic property of Turing machines is uncomputable.
Let \(\mathcal{P}\) be a non-trivial semantic property. Assume without loss of generality that the machine \(M_\emptyset\) that halts on no input is not in \(\mathcal{P}\) (if it is, consider the complement \(\overline{\mathcal{P}}\), which is also non-trivial). Since \(\mathcal{P}\) is non-trivial, there exists some machine \(M_{\mathcal{P}} \in \mathcal{P}\).

We reduce the Halting Problem to \(\mathcal{P}\). Given an instance \((\langle M \rangle, x)\) of the Halting Problem, construct the machine \(N_{M,x}\) that on any input \(y\): first simulates \(M\) on \(x\); if \(M\) halts, then simulates \(M_{\mathcal{P}}\) on \(y\) and returns the result. If \(M\) does not halt on \(x\), then \(N_{M,x}\) also never halts on any input — it computes the same function as \(M_\emptyset \notin \mathcal{P}\). If \(M\) does halt on \(x\), then \(N_{M,x}\) computes the same function as \(M_{\mathcal{P}} \in \mathcal{P}\).

Therefore: \(M\) halts on \(x\) iff \(N_{M,x} \in \mathcal{P}\). If \(\mathcal{P}\) were decidable, the Halting Problem would be decidable — a contradiction.
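The construction of \(N_{M,x}\) can be sketched with Python callables standing in for Turing machines (the specific functions are illustrative; a genuinely looping \(M\) would make \(N\) loop on every input, which is exactly the point of the reduction):

```python
def make_N(M, x, M_P):
    """Build N_{M,x}: run M on x first, then behave like M_P."""
    def N(y):
        M(x)              # phase 1: if M loops on x, N loops on every y
        return M_P(y)     # phase 2: otherwise N computes M_P's function
    return N

halting_M = lambda x: None      # stand-in for a machine that halts on x
M_P = lambda y: y + 1           # stand-in for some machine in the property
N = make_N(halting_M, 42, M_P)
assert N(7) == M_P(7)           # M halts on x, so N agrees with M_P
```

Deciding whether `N` is functionally equivalent to `M_P` would thus decide whether `halting_M` halts on 42.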

Rice’s Theorem rules out decidability for all of the following questions about a Turing machine \(M\):

  • Does \(M\) halt on at least one input?
  • Does \(M\) compute a total function (halt on all inputs)?
  • Does \(M\) compute the identity function?
  • Does \(M\) always output a prime?
  • Is the set of inputs on which \(M\) halts equal to \(\{0, 1\}^*\)?

All of these are semantic properties (they depend on what function \(M\) computes, not on the internal structure of \(M\)’s transition function). None can be decided by any Turing machine.

What Rice's Theorem does not say. Rice's Theorem applies to semantic properties — properties of the function computed by \(M\) — not syntactic properties of the machine description. Syntactic properties can be decidable. For example: "does \(M\) have exactly 5 states?" is computable (just read the machine description). "Does \(M\)'s transition function ever write a \(0\)?" is also syntactic and decidable. The theorem is precisely targeted: the moment a question depends on the behavior of the machine across its inputs (not just its structure), undecidability is almost inevitable.

Chapter 13: Gödel’s Incompleteness Theorems

13.1 The Bridge Between Logic and Computation

We now connect the computability theory of Chapter 12 to the question of axiomatic completeness raised in Chapter 10. The key concept is that a sufficiently powerful axiom system can “express” statements about Turing machine computations.

A set of axioms \(\Gamma\) over a signature with function and predicate symbols is TM-expressive if there exists a computable encoding function that, for every Turing machine \(M\) and input \(s\), produces a sentence \(\varphi_{M,s}\) such that: \[ M \text{ halts on } s \quad \Longleftrightarrow \quad \Gamma \models \varphi_{M,s}. \]

Intuitively, TM-expressiveness says the theory is rich enough to talk about what Turing machines do. Both Peano arithmetic \(\Gamma_{\mathrm{PA}}\) and Zermelo–Fraenkel set theory ZFC are TM-expressive: the behavior of any Turing machine can be encoded as an arithmetic statement (via Gödel numbering or step-by-step simulation in set theory).

The key lemma establishing TM-expressiveness of Peano arithmetic is the following:

Key Lemma (TM-expressiveness of \(\Gamma_{\mathrm{PA}}\)). For every Turing machine \(M\) and input \(s\), there is a sentence \(\varphi_{M,s}\) of \(\tau_{\mathrm{PA}}\) that is computable from \(\langle M \rangle\) and \(s\), such that: \[ M \text{ halts on } s \quad \Longleftrightarrow \quad \Gamma_{\mathrm{PA}} \models \varphi_{M,s}. \] Moreover, if \(M\) halts on \(s\) in \(k\) steps, then \(\Gamma_{\mathrm{PA}} \vdash \varphi_{M,s}\) with a proof whose length is polynomial in \(k\).
The idea is that a Turing machine computation is a finite sequence of configurations, each obtainable from the previous by applying \(\delta\). A halting computation of length \(k\) is encoded as a natural number (via a suitable Gödel numbering) that encodes the entire configuration sequence. The sentence \(\varphi_{M,s}\) asserts: "there exists a natural number \(n\) that encodes a valid halting computation of \(M\) on \(s\)."

Formally, we encode each configuration \((q, \text{tape})\) as a natural number and define arithmetic predicates (expressible in \(\tau_{\mathrm{PA}}\)):

  • \(\mathrm{ValidStart}(n, M, s)\): the first configuration encoded in \(n\) is the initial configuration of \(M\) on \(s\);
  • \(\mathrm{ValidStep}(n, M)\): each consecutive pair of configurations in \(n\) satisfies the transition function \(\delta\);
  • \(\mathrm{Halts}(n)\): the last configuration in \(n\) has the Halt state.
Then \(\varphi_{M,s} = \exists n\, (\mathrm{ValidStart}(n,M,s) \land \mathrm{ValidStep}(n,M) \land \mathrm{Halts}(n))\).

Each of these predicates is expressible in \(\Gamma_{\mathrm{PA}}\) because arithmetic encodes sequences via prime factorization, and validity conditions are primitive recursive. If \(M\) halts in \(k\) steps, the witnessing \(n\) is computable and the proof that it satisfies the predicates has length \(O(k)\). If \(M\) does not halt, no such \(n\) exists, so \(\varphi_{M,s}\) is false in \(\mathbb{N}\); by soundness of \(\Gamma_{\mathrm{PA}}\), it is not provable.
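The sequence-encoding trick can be sketched directly: pack a finite sequence into a single natural number as a product of consecutive prime powers, with exponent \(a_i + 1\) so that zero entries remain recoverable. This illustrates the idea, not the course's exact numbering scheme.

```python
def primes():
    """Generate 2, 3, 5, 7, ... by trial division (fine for a sketch)."""
    n, found = 2, []
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def encode(seq):
    """Encode [a_0, a_1, ...] as p_0^(a_0+1) * p_1^(a_1+1) * ..."""
    gen, code = primes(), 1
    for a in seq:
        code *= next(gen) ** (a + 1)
    return code

def decode(code):
    """Recover the sequence by reading off consecutive prime exponents."""
    gen, seq = primes(), []
    p = next(gen)
    while code % p == 0:
        e = 0
        while code % p == 0:
            code //= p
            e += 1
        seq.append(e - 1)
        p = next(gen)
    return seq

assert encode([1]) == 4                       # 2^(1+1)
assert decode(encode([3, 0, 2])) == [3, 0, 2] # round trip
```

Predicates like \(\mathrm{ValidStep}\) then speak about such a code number using only divisibility and exponent extraction, all expressible in arithmetic.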

13.2 The First Incompleteness Theorem

First Incompleteness Theorem (initial version). Every set of axioms \(\Gamma\) that is TM-expressive, computable, and sound must be incomplete: there exists a sentence \(\varphi\) such that neither \(\Gamma \vdash \varphi\) nor \(\Gamma \vdash \neg\varphi\).
Suppose \(\Gamma\) is complete. We show \(\Gamma\) can solve the Halting Problem, contradicting Chapter 12.

Given any machine \(M\) and input \(s\), compute the sentence \(\varphi_{M,s}\) (by TM-expressiveness, this is computable). Since \(\Gamma\) is computable and complete, there is an algorithm that, given \(\varphi_{M,s}\), decides whether \(\Gamma \vdash \varphi_{M,s}\) or \(\Gamma \vdash \neg\varphi_{M,s}\) — by enumerating all proofs and waiting.

If \(\Gamma \vdash \varphi_{M,s}\): by soundness, \(\Gamma \models \varphi_{M,s}\), so by TM-expressiveness, \(M\) halts on \(s\).

If \(\Gamma \vdash \neg\varphi_{M,s}\): by soundness, \(\Gamma \models \neg\varphi_{M,s}\), so \(M\) does not halt on \(s\).

This decides the Halting Problem — impossible by Theorem 12.3. Contradiction.

Walking through the proof. The argument is cleaner than it first appears. The key insight is that a complete, sound, computable proof system acts as an oracle for the Halting Problem.

Here is the algorithm explicitly: given machine \(M\) and input \(s\), first compute the sentence \(\varphi_{M,s}\) (this is a mechanical translation). Then enumerate all finite strings of characters in increasing length order, checking each one to see whether it constitutes a valid proof (using the checkability of the proof system) of either \(\varphi_{M,s}\) or \(\neg\varphi_{M,s}\). Since the system is complete, one of these two sentences has a proof — so eventually the search finds it. If it finds a proof of \(\varphi_{M,s}\), output “halts” (justified by soundness: the sentence is true). If it finds a proof of \(\neg\varphi_{M,s}\), output “does not halt.”

Three properties are each used once: checkability lets us verify candidate proofs; completeness guarantees the search terminates; soundness guarantees the answer is correct. Remove any one of these, and the argument collapses.

This also clarifies why the attempt to take “\(\Gamma =\) all sentences true in \(\mathbb{N}\)” fails. That set is sound and complete for arithmetic, but it is not computable: there is no algorithm to decide membership (this is itself a consequence of the halting problem). Checkability is not a luxury — it is essential.

The soundness assumption can be weakened to consistency:

First Incompleteness Theorem (stronger version, Gödel 1931). Every set of axioms \(\Gamma\) that is TM-expressive, computable, and consistent must be incomplete.
The proof constructs a self-referential Turing machine. By a careful diagonalization (mirroring the SelfReject argument), one builds a sentence \(\varphi^*\) such that:
  • If \(\Gamma \vdash \varphi^*\): the TM-expressive encoding converts this proof into a derivation of \(\neg\varphi^*\) as well, so \(\Gamma\) derives \(\bot\), contradicting consistency.
  • If \(\Gamma \vdash \neg\varphi^*\): by TM-expressiveness the encoding of \(\neg\varphi^*\) being provable can be converted into a halting statement, again producing a contradiction with consistency.
So neither \(\varphi^*\) nor \(\neg\varphi^*\) is provable.
Gödel's original approach: arithmetization. Gödel's original 1931 proof did not use Turing machines (they were not defined until 1936). Instead, Gödel introduced Gödel numbering: a systematic encoding of every symbol, formula, and proof as a natural number, such that syntactic properties ("this sequence of formulas is a valid proof") become arithmetic properties ("this natural number has this arithmetic structure"). Using this encoding, Gödel constructed within Peano arithmetic a sentence \(G\) that asserts, in arithmetic language: "the natural number encoding of a proof of \(G\) does not exist" — i.e., "\(G\) is not provable." This is a precise arithmetic formalization of the liar paradox ("this sentence is false"), made consistent by saying "unprovable" rather than "false."

If \(G\) is provable, then the proof exists, contradicting what \(G\) asserts. If \(\neg G\) is provable, then \(G\) is provable, a contradiction. So neither is provable in a consistent system. But \(G\) is true in the standard model \(\mathbb{N}\): there really is no proof of \(G\) in \(\Gamma_{\mathrm{PA}}\) (given consistency), so the arithmetic assertion that no proof-number for \(G\) exists is correct.

The Turing-machine proof in this course captures the same idea through the lens of computability: the halting problem is the modern formulation of the same undecidability phenomenon that Gödel first observed in formal arithmetic.

13.3 The Second Incompleteness Theorem

The First Incompleteness Theorem shows that no consistent, computable, TM-expressive system can decide every sentence. The Second Incompleteness Theorem shows an even stronger limitation: such a system cannot even prove that it is consistent.

Second Incompleteness Theorem (Gödel 1931). Let \(\Gamma\) be a TM-expressive, computable, consistent set of axioms. Then \(\Gamma \not\vdash \mathrm{Con}(\Gamma)\), where \(\mathrm{Con}(\Gamma)\) is the sentence expressing "there exists no proof of \(\bot\) from \(\Gamma\)."

The intuition: if \(\Gamma\) could prove \(\mathrm{Con}(\Gamma)\), then from the proof of the Gödel sentence \(G\) — which shows that \(\Gamma + \neg G\) is inconsistent (from consistency of \(\Gamma\) and the fact that \(\neg G\) implies a contradiction) — \(\Gamma\) would be able to prove \(G\). But by the First Incompleteness Theorem, \(G\) is unprovable. So proving \(\mathrm{Con}(\Gamma)\) would give us \(G\), a contradiction.

The Second Incompleteness Theorem devastated Hilbert’s program. Hilbert wanted to use a “safe,” finitary sub-system of mathematics (call it \(S\)) to prove the consistency of all of mathematics. But the Second Incompleteness Theorem says: any consistent system \(S\) powerful enough to talk about itself cannot prove its own consistency. To prove the consistency of \(\Gamma_{\mathrm{PA}}\), you need a stronger system — and to trust that stronger system, you need an even stronger one, and so on indefinitely. There is no secure foundation from which all of mathematics can be certified as consistent.

Gerhard Gentzen did show in 1936 that Peano arithmetic is consistent — but his proof used transfinite induction up to the ordinal \(\varepsilon_0\), a form of reasoning that goes beyond \(\Gamma_{\mathrm{PA}}\) itself. This is not a violation of the Second Incompleteness Theorem: Gentzen used a different, stronger system. The theorem says \(\Gamma_{\mathrm{PA}}\) cannot prove its own consistency; it says nothing about what a stronger system can prove.

13.3.1 Rosser’s Strengthening

Gödel’s original 1931 proof of the stronger First Incompleteness Theorem used a technical condition called \(\omega\)-consistency, which is stronger than mere consistency. J. Barkley Rosser showed in 1936 that simple consistency suffices, through a more refined self-referential construction.

Rosser's Theorem (1936). Let \(\Gamma\) be TM-expressive, computable, and consistent. Then there exists a sentence \(\rho\) (the Rosser sentence) such that neither \(\Gamma \vdash \rho\) nor \(\Gamma \vdash \neg\rho\).

Rosser’s sentence asserts: “For every proof of \(\rho\) from \(\Gamma\), there is a shorter proof of \(\neg\rho\) from \(\Gamma\).” This asymmetric self-reference is more subtle than Gödel’s “I am not provable” and circumvents the need for \(\omega\)-consistency. In the Turing-machine formulation, the Rosser improvement corresponds to a more careful construction of the diagonal machine \(D\) that compares the lengths of possible proofs in its self-reference.

13.4 Implications and Perspectives

What incompleteness does and does not say. Incompleteness is sometimes misread as saying "there are mathematical truths we can never know." This overstates the case. The Gödel sentence \(G\) for \(\Gamma_{\mathrm{PA}}\) is unprovable in \(\Gamma_{\mathrm{PA}}\), but it is provable in a stronger system (such as \(\Gamma_{\mathrm{PA}} + \mathrm{Con}(\Gamma_{\mathrm{PA}})\)). And we can know that \(G\) is true by reasoning outside any fixed formal system.

What incompleteness does say: no single, mechanically checkable proof system can capture all of mathematical truth. Mathematics, as a human activity, always transcends any particular formalization. There will always be true statements that require stronger axioms to prove.

The incompleteness theorems do not say that mathematics is inconsistent, that human minds are non-mechanical (Penrose’s controversial claim), or that all interesting questions in arithmetic are undecidable. The vast majority of everyday mathematics is provable in \(\Gamma_{\mathrm{PA}}\) or in weaker systems, and the undecidable sentences are deliberately constructed and do not typically arise in ordinary mathematics. There are, however, natural combinatorial statements, such as the termination of Goodstein sequences, that are provably unprovable in \(\Gamma_{\mathrm{PA}}\) yet true in the standard model.

The incompleteness theorems mark the end of a particular dream, that of a complete, mechanical foundation for all of mathematics, but they also open a new perspective. By revealing the limits of formal systems, they illuminate what it means for something to be true as opposed to merely provable, and they establish that mathematical understanding is not reducible to mechanical verification. They reveal not a human failing but the richness of mathematical truth.
