CS 442: Principles of Programming Languages

Gregor Richards

Estimated study time: 2 hr 39 min

Table of contents

Sources and References

These notes draw primarily on the Winter 2025 course notes for CS 442 at the University of Waterloo, written by Gregor Richards, Yangtian Zi, and Anthony Cox. Additional material is incorporated from David R. Cheriton’s CS 442/642 lectures and Jaiden Ratti’s lecture notes.

Chapter 1: Languages and Paradigms

Introduction

The study of programming languages is not about surveying languages or building production compilers. It is about understanding programming languages as formal objects that can be analyzed and reasoned about with mathematical precision. Two themes run through this subject: formal specification (describing a language rigorously enough to prove things about programs) and paradigm diversity (understanding fundamentally different ways of thinking about computation).

A calculus is any formal symbolic system for carrying out reasoning within a particular domain. The formal backbone of this subject is the $\lambda$-calculus, a minimalist system invented by Alonzo Church in the 1930s containing nothing but variables, function definitions, and function applications, yet powerful enough to express any computation. We develop it both as a mathematical formalism and as an implementable (if impractical) programming language. Extending it with types, mutable state, and concurrency yields precise insight into how each mechanism affects program behavior.

Alongside the $\lambda$-calculus, we survey several programming paradigms through exemplar languages: Haskell for functional programming, Prolog for logic programming, Pascal for imperative programming, Smalltalk for object-oriented programming, Erlang for concurrent programming, and C for systems programming.

History of Programming Languages

The first programs were sequences of binary digits entered by hand. Every computer still executes machine code; programming languages are successive layers of abstraction trading direct control for expressiveness. Among the earliest high-level languages was Fortran (IBM, 1950s), which lacked structured programming — control flow consisted of numbered instructions with explicit GOTO jumps.

Algol (late 1950s to early 1960s) introduced structured control flow: if blocks, while loops, and nested procedures. Algol never gained mass adoption, but its design DNA runs through C, Java, JavaScript, Python, and their relatives. A crucial distinction between Algol and Fortran was intentionality: Fortran grew organically, whereas Algol was deliberately designed to embody a philosophy of programming, marking the birth of distinct paradigms.

The Church-Turing thesis (1930s) demonstrated that Church’s function-based $\lambda$-calculus and Turing’s state-machine model are equivalent in power. Languages descended from Church’s vision became the functional family; those from Turing’s became the imperative family. Fortran, Algol, and COBOL belong to the imperative tradition where programs are sequences of state-mutating commands. Lisp (late 1950s) took a fundamentally different approach as the first functional language, where functions are first-class values and data is encapsulated rather than freely mutated. Lisp also introduced homoiconicity — code and data share the same representation — enabling powerful macro systems. The next major shift came in the 1980s with object-oriented programming (rooted in Simula 1967, crystallized by Smalltalk) and concurrent languages.

Programming Paradigms and Design Principles

A programming paradigm is a way of thinking about program structure and execution. An exemplar language embodies a paradigm’s essential ideas purely, without mixing traditions. C++ exemplifies many things simultaneously, making it poor for isolating principles; Smalltalk, where even control flow is message-passing, is an excellent OO exemplar.

Three cross-cutting language design principles recur across paradigms. The Abstraction Principle states that naming and reusing patterns of computation is fundamental. The Parameterization Principle states that abstractions should accept parameters to generalize their behavior. The Qualification Principle states that any expression should be refinable by attaching local declarations that qualify its meaning without altering surrounding context. Together, abstraction creates reusable units, parameterization makes them flexible, and qualification makes them locally precise.

The two implementation languages, OCaml and Smalltalk, represent the functional and object-oriented paradigms. Writing interpreters for one paradigm in the “opposite” paradigm’s language forces the implementor to confront how concepts like closures must be represented when the host language thinks in objects, and vice versa.

OCaml

OCaml is a statically typed, eagerly evaluated functional language descended from the ML family. It is valued here for its algebraic data types, pattern matching, and powerful type system — not its object system (despite the “O” in OCaml). Standard ML would serve equally well in principle, but OCaml’s ecosystem is better maintained. The toolchain is set up via opam (OCaml’s package manager) by running opam init, loading the environment with eval $(opam env), and installing the core, base, and utop packages. The standard library together with Base and Core provides everything needed for interpreter-building exercises.

OCaml’s compilation model mirrors C’s separate compilation and linking. Tests can be embedded in the source file or placed in a separate test module compiled together with the main file using ocamlfind. The -package flag specifies compile-time dependencies and -linkpkg handles linking. Records (simple product types) are used freely but should not be confused with classes — in many OO languages like C++ a struct is just a class with public defaults, but historically records and classes arose from completely separate traditions, and purely OO languages like Smalltalk have no record type at all.

Smalltalk

Smalltalk is an object-oriented language taken to its logical extreme. It has no built-in conditionals, loops, or other control structures. Instead, booleans are objects with an ifTrue: method that evaluates a block only when the receiver is true, and blocks have a whileTrue: method for looping. Everything is achieved by sending messages to objects.

In traditional Smalltalk environments (Pharo, Squeak), classes are created interactively through the IDE; only method bodies have textual syntax. GNU Smalltalk provides a file-based workflow with a REPL while retaining all semantic idiosyncrasies.

Messages, Precedence, and Syntax

The fundamental mechanism is sending a message to an object: 'Hello, world!' displayNl sends displayNl to a string. Everything in Smalltalk is an object — integers, booleans, classes, even Class itself. Because the only operation is sending messages, syntax is minimal: receiver message.

Strings use single quotes; double quotes are comments. Smalltalk treats every operation as a message send. Unary messages (like displayNl) bind most tightly, then binary messages (like +), then keyword messages (like ifTrue:). All binary messages have equal precedence and associate left to right, so 30 + 6 * 2 evaluates as (30 + 6) * 2 = 72 — Smalltalk has no knowledge of arithmetic precedence. Parentheses are required for explicit grouping. Multiple statements are separated by periods.

Booleans, Conditions, and Loops

The boolean values true and false are objects of classes True and False. Conditional execution uses method calls: true ifTrue: ['yes' displayNl] evaluates the block argument, while false’s ifTrue: does nothing. No conditional branching primitive exists; polymorphic dispatch does all the work. A block ([...]) is a deferred piece of code, like a closure. The ifTrue:ifFalse: keyword message implements two-way branching with its interleaved syntax.

Loops are also method-based. Numeric ranges use 1 to: 10 do: [:x| x displayNl]. Block arguments are declared with a colon prefix at the start of the block, terminated by |. For general-purpose looping, whileTrue: takes a block as both receiver (condition) and argument (body): [x < 64] whileTrue: [x := x * 2]. The condition must be wrapped in a block because it is re-evaluated on each iteration.

Variables, Containers, and Dictionaries

Local variables are declared between vertical bars (| x y |) and assigned with :=. Smalltalk provides several built-in collection types.

OrderedCollection is the most commonly used dynamically sized list. Objects are instantiated by sending new to a class: OrderedCollection new. Elements are appended with add:, prepended with addFirst:, and bulk-inserted with addAll:. The , binary message concatenates two collections into a new list. The with: factory method (and variants like with:with:) constructs a collection with initial elements in one step. Iteration uses do: with a block argument. Elements are accessed by 1-based index with at: and mutated with at:put:. The methods removeFirst and removeLast pop elements and return them, making OrderedCollection usable as a stack or queue.

Array has a fixed size specified at creation via new:. Uninitialized slots contain nil, Smalltalk’s equivalent of null — but unlike null in most languages, nil is a full object with methods; the idiomatic nil check is x isNil. Array literals use braces with period-separated elements: {2. 4. 6}. Arrays lack add: entirely since their size is fixed, but at: indexing is O(1). The methods asOrderedCollection and asArray convert between the two types.

A String is essentially a read-only array of Character objects. All array operations work on strings except mutation. Converting to an Array makes it mutable. Character literals use a dollar-sign prefix ($H, $e), and non-printable characters use named constructors like Character lf.

A Dictionary is a key-value map where keys can be of any type. Creation and insertion follow the standard pattern: dict at: 'Hello' put: 'world'. Key presence is tested with includesKey:, entries removed with removeKey:. Iterating with do: yields only values; to access both keys and values, use keysAndValuesDo: with a two-argument block. All Smalltalk collections are 1-indexed.

Classes, Subclasses, and the RPN Calculator

Classes are created by sending a message to an existing class. Object subclass: #RPNCalculator creates a new subclass; the #RPNCalculator syntax creates a symbol (an interned string used as a name). GNU Smalltalk provides a block-based shorthand for declaring a class with instance variables and methods in one declaration: Object subclass: RPNCalculator [ | stack | ... ]. Methods are defined inside the class block. Instance variable declarations between vertical bars apply to the class as a whole.

A crucial design principle: instance variables are always private — no syntax exists to access another object’s fields. Since new is a class-side method (sent to the class, not an instance), it cannot directly initialize the fields of the object it creates. The standard pattern splits construction into two steps: the class-side new (declared with RPNCalculator class >> new [...]) allocates the object via super new, then calls an instance-side initializer (e.g., init) that can access its own fields. The >> operator specifies which class receives the method. Using super new calls Object’s original allocator, avoiding an infinite loop from the overridden new.

The RPN calculator example demonstrates these ideas. In reverse Polish notation, operands are pushed onto a stack and operators pop their arguments. A push: method appends to the stack and returns the argument (the caret ^ is Smalltalk’s return operator). A binary: method abstracts the common pop-two-compute-push pattern using a two-argument block parameter, and specific methods like add delegate to it: ^ self binary: [:x :y| x + y]. The self keyword (Smalltalk’s equivalent of this) enables this delegation. Blocks are evaluated by sending value (zero arguments), value: (one), or value:value: (two). Subclasses FractionalRPNCalculator and FloatRPNCalculator override only push: to convert incoming values via asFraction or asFloat; all other methods are inherited unchanged.

Blocks, Non-local Returns, and Display

Blocks differ from ordinary closures in one critical way: a ^ return inside a block causes the enclosing method to return, not just the block. This non-local return is useful for early-exit and switch-like dispatch patterns — a sequence of (cmd = 'push') ifTrue: [self push. ^self] checks can serve as a command dispatcher where each matching branch returns from the method immediately.

The display and displayNl methods delegate to displayString for the string representation; overriding displayString customizes how an object prints with these methods. The REPL uses printString instead, so interactive display customization requires overriding that separately. GNU Smalltalk includes a MiniDebugger that activates on errors when loaded alongside your code, providing gdb-style navigation (up/down the call stack, stepping, inspection). The self halt message triggers the debugger at a specific point. Smalltalk is well suited for implementing interpreters: abstract syntax trees are constructed as Smalltalk objects and test files are loaded after the implementation file.

Chapter 2: Untyped $\lambda$-Calculus

To study programming languages with mathematical rigor, we need a formal model simple enough to reason about yet expressive enough to represent a broad class of languages. The $\lambda$-calculus, devised by Alonzo Church in the 1930s, is precisely such a model: a minimal formal system consisting of just three constructs — variables, function definitions, and function applications — yet capable of encoding booleans, numbers, lists, recursion, and in principle any computation whatsoever. The deeper lesson is that a startlingly small core suffices for universal computation. Anything a programming language can compute, the $\lambda$-calculus can compute too — giving language designers a principled baseline for understanding what features contribute genuine expressiveness versus mere convenience.

Syntax and Definitions

The entire syntax fits four BNF productions:

⟨Expr⟩ ::= ⟨Var⟩ | ⟨Abs⟩ | ⟨App⟩ | (⟨Expr⟩)
⟨Var⟩  ::= a | b | c | ...
⟨Abs⟩  ::= λ ⟨Var⟩ . ⟨Expr⟩
⟨App⟩  ::= ⟨Expr⟩ ⟨Expr⟩

A $\lambda$-expression is a variable, an abstraction $\lambda x.\, M$ (an anonymous function with parameter $x$ and body $M$), an application $MN$ (applying the rator $M$ to the rand $N$), or a parenthesized expression. Two disambiguation conventions: abstraction bodies extend maximally to the right ($\lambda x.\, xy$ means $\lambda x.\, (xy)$), and application associates left ($abc$ means $(ab)c$). Every abstraction takes exactly one parameter; multi-argument functions are recovered through currying — nesting single-parameter abstractions.

Free and Bound Variables

In $\lambda x.\, x$, the body’s $x$ is bound to the parameter. A variable not captured by any enclosing abstraction is free. These notions have recursive definitions:

Bound Variables. $ BV\left[x\right] = \emptyset $, $ BV\left[\lambda x.\, L\right] = BV\left[L\right] \cup \{x\} $, $ BV\left[MN\right] = BV\left[M\right] \cup BV\left[N\right] $.

Free Variables. $ FV\left[x\right] = \{x\} $, $ FV\left[\lambda x.\, L\right] = FV\left[L\right] \setminus \{x\} $, $ FV\left[MN\right] = FV\left[M\right] \cup FV\left[N\right] $.

An expression is closed (a combinator) if $ FV\left[E\right] = \emptyset $. A variable name may be both free and bound in the same expression (e.g., in $x\lambda x.\, x$, the first $x$ is free and the second is bound), but each individual occurrence is exclusively one or the other. Bound variables derive their meaning from the enclosing abstraction; free variables are meaningful only if defined by some external context. A combinator is self-contained and can be evaluated without external information. This distinction parallels scoping in conventional languages: in int f(int x) { return x + y; }, x is bound to the parameter and y is free.

Substitution and $\beta$-Reduction

Computation proceeds by reduction — repeatedly transforming an expression until a normal form is reached. The central rule is $\beta$-reduction: an expression $(\lambda x.\, M)N$ (a $\beta$-redex) reduces to $M[N/x]$, meaning $M$ with every free occurrence of $x$ replaced by $N$.

$\beta$-reduction. $ (\lambda x.\, M)N \to_\beta M\left[N/x\right] $. If $ C\left[(\lambda x.\, M)N\right] $ contains the redex as a subterm, then $ C\left[(\lambda x.\, M)N\right] \to_\beta C\left[M\left[N/x\right]\right] $.

Related notation: $\to^n_\beta$ for exactly $n$ steps, $\to^*_\beta$ for zero or more, $\to^+_\beta$ for one or more, $\leftarrow_\beta$ for $\beta$-expansion, and $=_\beta$ for $\beta$-equivalence.

Substitution itself needs a careful structural definition. The provisional definition covers each syntactic form: $x\left[T/x\right] = T$; $y\left[T/x\right] = y$ when $y \neq x$; $(MN)\left[T/x\right] = M\left[T/x\right]\,N\left[T/x\right]$; $(\lambda x.\, M)\left[T/x\right] = \lambda x.\, M$ (shadowed); and $(\lambda y.\, M)\left[T/x\right] = \lambda y.\, M\left[T/x\right]$. Currying is illustrated by reduction: $(\lambda x.\, \lambda y.\, x)ab \to_\beta (\lambda y.\, a)b \to_\beta a$ — the nested abstractions simulate a two-argument function returning its first argument.

However, this provisional definition has a flaw. Applying $(\lambda x.\, \lambda y.\, x)$ to $y$ and $z$ under the provisional rules yields $z$ instead of the expected $y$, because the free variable $y$ in the argument is captured by the inner $\lambda y$ during substitution. This produces unwanted dynamic binding; what we want is static binding where binding occurrences never change during computation. The corrected definition adds a sixth case: when $y \neq x$ and $y \in FV\left[T\right]$, the abstraction’s variable is renamed to a fresh $z$ before substituting: $(\lambda y.\, M)\left[T/x\right] = \lambda z.\, M\left[z/y\right]\left[T/x\right]$. This fresh-name generation is formalized through $\alpha$-conversion.

$\alpha$-conversion. $ \lambda x.\, E =_\alpha \lambda y.\, E\left[y/x\right] $ provided $ y \notin FV\left[E\right] $. Renaming a bound variable throughout its scope does not change the function's meaning.

Normal Forms and Reduction Strategies

A $\lambda$-expression with no $\beta$-redex is in $\beta$-normal form. Not all expressions reach normal form — $(\lambda x.\, xx)(\lambda x.\, xx) \to_\beta (\lambda x.\, xx)(\lambda x.\, xx)$, looping forever. The expression $\lambda x.\, fx$ is extensionally equivalent to $f$ alone — both accept the same inputs and produce the same outputs. This principle of function extensionality gives rise to $\eta$-reduction: $\lambda x.\, Mx \to_\eta M$ when $x \notin FV\left[M\right]$. A reduction allowing both $\beta$- and $\eta$-steps is called $\beta\eta$-reduction.

An important alternative stopping condition is weak normal form (WNF): an expression is in WNF if every $\beta$-redex lies within the body of some abstraction. Real programming languages do not evaluate a function’s body until it is called, so WNF mirrors this practical behavior. Every expression in $\beta$-normal form is also in WNF, since WNF is strictly weaker. For example, $\lambda x.\, (\lambda y.\, y)(\lambda z.\, (\lambda w.\, w)z)$ is in WNF but not in $\beta$-normal form — both redices lie inside the outer abstraction.

The Church-Rosser Theorem

When multiple redices exist, the $\beta$-reduction definition does not prescribe which to reduce first. A fundamental theorem guarantees this freedom is safe:

Church-Rosser Theorem (1936). If $ E_1 \to^*_\beta E_2 $ and $ E_1 \to^*_\beta E_3 $, then there exists $ E_4 $ such that $ E_2 \to^*_\beta E_4 $ and $ E_3 \to^*_\beta E_4 $ (up to $\alpha$-equivalence).

An immediate corollary is that normal forms, when they exist, are unique. Crucially, the theorem does not guarantee that a normal form exists — only that if one does, it is the same regardless of reduction path.

Applicative Order vs. Normal Order

Two canonical strategies illustrate the fundamental trade-off. Applicative Order Reduction (AOR) always chooses the leftmost, innermost redex (a redex containing no other redices). Arguments are fully evaluated before being passed to a function — also called call-by-value or eager evaluation. Under AOR, $(\lambda x.\, fx)((\lambda y.\, gy)z)$ first reduces the inner argument to $gz$, then applies the outer function, yielding $f(gz)$.

Normal Order Reduction (NOR) always chooses the leftmost, outermost redex (a redex not contained in any other redex). Arguments are substituted unevaluated — also called call-by-name or lazy evaluation. The same expression under NOR first substitutes the unevaluated argument into the body, yielding $f((\lambda y.\, gy)z)$, then reduces the remaining redex to $f(gz)$. Both strategies arrive at the same normal form, but may take different paths.

NOR has an important advantage: the Standardization Theorem (1958) guarantees that if an expression has a normal form, NOR will reach it. AOR can fail: $(\lambda x.\, y)((\lambda x.\, xx)(\lambda x.\, xx))$ has a normal form ($y$), but AOR insists on evaluating the divergent argument first and loops forever. NOR succeeds by substituting the divergent argument into the body and discovering it is never used, immediately yielding $y$.

The Church-Rosser guarantee applies only in a purely functional setting. In languages with side effects (mutable state, I/O), evaluation order can produce observable differences — C and C++ leave function-argument evaluation order unspecified, so different compilers may produce different behavior for code with side-effecting arguments.

Programming with $\lambda$-Calculus

Church and Turing independently proved their respective models are computationally equivalent, so the $\lambda$-calculus can encode any computation. We write $\llbracket \cdot \rrbracket$ for “the $\lambda$-calculus representation of.”

Booleans and Conditionals

A boolean acts as a selector — a two-argument function returning one of its arguments:

\[ \llbracket \mathit{true} \rrbracket := \lambda x.\, \lambda y.\, x \qquad \llbracket \mathit{false} \rrbracket := \lambda x.\, \lambda y.\, y \]

The conditional simplifies to $\llbracket B \rrbracket\, \llbracket T \rrbracket\, \llbracket F \rrbracket$, since the boolean itself selects the appropriate branch. Boolean operators follow naturally: $\llbracket \mathit{not} \rrbracket = \lambda p.\, p\,\llbracket \mathit{false} \rrbracket\,\llbracket \mathit{true} \rrbracket$, and and/or exploit the selector behavior similarly. This behavioral encoding mirrors Smalltalk’s design, where True and False implement ifTrue: through polymorphic dispatch.

Pairs and Lists

Programs generally require storage facilities. The simplest expandable storage is the list, built from the fundamental pair — a combination of two data values. A pair stores its head and tail in a function that accepts a selector producing one of them: $\llbracket \langle h, t \rangle \rrbracket := \lambda s.\, sht$. Selectors that return the first or second of two arguments are exactly what $\llbracket \mathit{true} \rrbracket$ and $\llbracket \mathit{false} \rrbracket$ do. So the head extractor is $\llbracket \mathit{head} \rrbracket := \lambda l.\, l\,\llbracket \mathit{true} \rrbracket$, the tail extractor is $\llbracket \mathit{tail} \rrbracket := \lambda l.\, l\,\llbracket \mathit{false} \rrbracket$, and the cons constructor is $\llbracket \mathit{cons} \rrbracket := \lambda h.\, \lambda t.\, \lambda s.\, sht$.

For the empty list, the null test $\llbracket \mathit{null?} \rrbracket := \lambda l.\, l(\lambda h.\, \lambda t.\, \llbracket \mathit{false} \rrbracket)$ passes a selector that always returns false to a non-empty list (a cons pair). The empty list should make this return true, so it ignores its selector: $\llbracket \mathit{nil} \rrbracket := \lambda s.\, \llbracket \mathit{true} \rrbracket$. A list like (list a b c) is encoded as nested cons cells: $\llbracket \mathit{cons}\; a\; (\mathit{cons}\; b\; (\mathit{cons}\; c\; \mathit{nil})) \rrbracket$.

Church Numerals

While numbers could be represented as lists (empty for 0, one-element for 1, etc.), a cleverer solution introduced by Church defines Church numerals: $\llbracket n \rrbracket$ is a function that takes a function $f$ and a value $x$, producing the result of $f$ applied $n$ times to $x$:

\[ \llbracket 0 \rrbracket = \lambda f.\, \lambda x.\, x \qquad \llbracket 1 \rrbracket = \lambda f.\, \lambda x.\, fx \qquad \llbracket 2 \rrbracket = \lambda f.\, \lambda x.\, f(fx) \qquad \llbracket 3 \rrbracket = \lambda f.\, \lambda x.\, f(f(fx)) \]

Arithmetic operations exploit this structure. Addition applies $f$ $n$ times to $x$, then $m$ more times to the result: $\llbracket + \rrbracket = \lambda m.\, \lambda n.\, \lambda f.\, \lambda x.\, mf(nfx)$. A special case is the successor: $\llbracket \mathit{succ} \rrbracket = \lambda n.\, \lambda f.\, \lambda x.\, nf(fx)$. Multiplication is $n$-fold repetition of $f$ composed $m$ times: $\llbracket * \rrbracket = \lambda m.\, \lambda n.\, \lambda f.\, m(nf)$. Exponentiation is $n$-fold application of $m$ itself: $\llbracket \hat{\phantom{x}} \rrbracket = \lambda m.\, \lambda n.\, nm$.

Subtraction is harder because there is no way to “undo” a function application. The predecessor function uses pairs $\langle a, b \rangle$ to track a number alongside its predecessor: starting from $\langle 0, 0 \rangle$ and applying the transformation $\langle a, b \rangle \mapsto \langle a+1, a \rangle$ exactly $n$ times yields $\langle n, n-1 \rangle$. Taking the tail gives the predecessor. Subtraction then applies the predecessor function $n$ times starting from $m$: $\llbracket - \rrbracket = \lambda m.\, \lambda n.\, n\,\llbracket \mathit{pred} \rrbracket\, m$.

Recursion and the Y Combinator

$\lambda$-calculus functions are anonymous, so self-reference is not directly available. To define a recursive function like list length, we factor out the self-reference: if the function satisfies $\llbracket \mathit{len} \rrbracket = F\,\llbracket \mathit{len} \rrbracket$, we need a fixed point of $F$. The Y combinator (Curry’s Paradoxical Combinator) provides this:

\[ Y := \lambda f.\, (\lambda x.\, f(xx))(\lambda x.\, f(xx)) \]

Its critical property is $Yg =_\beta g(Yg)$ for any $g$. The Y combinator requires NOR; under AOR the self-application diverges. For Applicative Order Evaluation (AOE) — which does not reduce under abstraction bodies — three modifications are needed: (1) use AOE instead of AOR, (2) use the $Y'$ combinator with $\eta$-expanded repetition $Y' = \lambda f.\, (\lambda x.\, f(\lambda y.\, xxy))(\lambda x.\, f(\lambda y.\, xxy))$, and (3) wrap both conditional branches in abstractions to achieve short-circuit evaluation.

De Bruijn Notation

Variable names can obscure meaning — in $\lambda y.\, a(\lambda y.\, b(\lambda y.\, cy)y)y$, each $y$ refers to a different binder. De Bruijn notation (1972) replaces names with integers indicating how many enclosing $\lambda$s must be escaped to locate the binding. For example, $\lambda x.\, \lambda y.\, x$ becomes $\lambda.\, \lambda.\, 2$. A single binding can appear as different integers depending on nesting depth: $\lambda x.\, (\lambda y.\, x)x$ becomes $\lambda.\, ((\lambda.\, 2)1)$, where the same $x$ appears as both 2 and 1. Free variables can be left unconverted.

All $\alpha$-equivalent expressions have identical de Bruijn representations, eliminating the need for $\alpha$-conversion entirely. The definition of $\beta$-reduction becomes $(\lambda.\, N)M \to_\beta N\left[M/1\right]$, and substitution is defined arithmetically on indices: when the index matches, the substituted term is inserted with a renaming adjustment; when it exceeds the target, it is decremented (to account for the removed binding); when less, it is left unchanged. Entering an abstraction increments the target index. This gives a natural “stack frame” interpretation where the binding environment is a simple stack and variables are stack indices. The trade-off is readability: de Bruijn terms are significantly harder for humans to parse, and converting between named and indexed representations introduces overhead.

Implementation Strategies

Practical implementations represent $\lambda$-expressions as abstract syntax trees (ASTs) tagged as variable, abstraction, or application nodes. A binding environment tracks variable-to-value mappings. With named variables, subtrees may shadow names, requiring fresh-name generation. With de Bruijn notation, the binding environment becomes a simple stack indexed by integers, making evaluation simpler at the cost of an extra conversion step and harder debugging.

Chapter 3: Formal Semantics

Formal semantics assigns precise mathematical meaning to every element of a programming language, enabling proofs of correctness, termination, and equivalence. While informal descriptions convey intuition, they lack the precision needed for rigorous reasoning. A semantic model usually specifies a set of observables — valid outputs such as values, types, or error/success indicators.

The primary framework here is operational semantics, specifically small-step operational semantics, which models an abstract machine taking one reduction step at a time.

Semantics and Category Theory

Reduction steps are formally described by an arrow $\to$ mapping one program state to the next. This arrow lives within the framework of category theory: the language forms a category and $\to$ is a morphism over it. Category theory abstracts beyond set theory and group theory, describing categories whose objects and morphisms (arrows) capture equivalences, reductions, or other transformations. A structure-preserving map from one category to another is a functor. Category theory supports self-reference: an entire category can be an object within the category of categories.

A programming language’s semantics is a morphism mapping program states to program states, sometimes annotated (e.g., $\to_\beta$ for beta-reduction). Morphisms in one category may be defined using another — for instance, one might specify $(x + y \to z)$ whenever $(x + y = z)$, grounding the programming language’s semantics in the language of arithmetic. Keeping the object language (the language being described) distinct from the metalanguage (the language used for the description) is essential for clarity. A compiler, for example, is essentially a functor between two language categories, and proving compiler correctness amounts to proving that functor correct. Programming language semantics are not ad hoc inventions; they are described within the rigorous framework of categories.

Semantics and Reality

A formal semantic model does not necessarily reflect what a real implementation does. Some tools execute formal specifications directly, but the resulting interpreters are far too slow for practical use; their value lies in providing reference implementations. Formal semantics deliberately abstracts away memory limits, garbage collection, and other runtime details. Mathematical proofs apply to abstract calculi, not physical machines; the designer’s responsibility is to argue convincingly that the abstraction faithfully captures intended behavior.

The Post System

The Post system, named after Emil Post, is a deductive formal system widely used to specify programming language semantics. It consists of signs (alphabet), variables, and productions. A production takes the form:

\[ \dfrac{t_1 \quad t_2 \quad \cdots \quad t_n}{t} \]

where the $t_i$ are premises and $t$ is the conclusion, read as “if the premises hold, then the conclusion holds.” A production with no premises is an axiom. A proof is built by recursively proving each premise until only axioms remain, forming an inverted tree with the final conclusion at the root and axioms at the leaves.

For example, logical and and or can be encoded: $\dfrac{A}{A \lor B}$, $\dfrac{B}{A \lor B}$, $\dfrac{A \quad B}{A \land B}$. From a proof of $A$ alone, one can derive $(A \lor B) \land (A \lor C)$.

Operational Semantics for $\lambda$-Calculus

Beta-reduction is restated as formal rules. A top-level redex reduces directly (an axiom): $\dfrac{}{(\lambda x.\, M)N \to_\beta M\left[N/x\right]}$. Reduction under abstractions: $\dfrac{M \to_\beta P}{\lambda x.\, M \to_\beta \lambda x.\, P}$. Reduction in applications (either side): $\dfrac{M \to_\beta P}{MN \to_\beta PN}$ and $\dfrac{N \to_\beta P}{MN \to_\beta MP}$. Because an expression may match multiple rules simultaneously, the semantics is non-deterministic, characterizing every valid reduction path.

Defining Evaluation Order

Most programming languages enforce a deterministic strategy. The formal semantics achieves this through side-conditions on productions.

Normal Order Reduction (NOR). The top-level redex axiom and under-abstraction rule remain as before. For applications where the operator is not an abstraction, reduce the operator first. Only when the operator cannot reduce further does the operand reduce. These conditions are expressed as universal quantifier side-conditions: $\forall x.\, \forall M_3.\, M_1 \neq \lambda x.\, M_3$ prevents use when the operator is an abstraction, and $\forall M_1'.\, M_1 \not\to M_1'$ ensures the operator is fully reduced before the operand is touched.

Applicative Order Evaluation (AOE). The substitution axiom adds the condition that the argument must be fully reduced first: $\forall M_2'.\, M_2 \not\to M_2'$. The operator reduces before the operand in applications. Crucially, the under-abstraction rule is absent — AOE does not reduce under abstractions, mirroring how most programming languages defer function-body evaluation until the function is called.

Terminal Values

The universal quantifiers used as side-conditions on productions complicate proofs that a particular rule applies. A cleaner approach gives a syntactic characterization of terminal (or final) expressions — those admitting no further reduction.

Several candidates exist for the set of terminal values: $\beta$-normal form, weak normal form, or head normal form. Choosing anything other than $\beta$-normal form forfeits the Church-Rosser uniqueness guarantee. Yet even with $\beta$-normal form, subtle questions remain. What is the semantics of $(\lambda x.\, xx)(\lambda x.\, xx)$? The only response is “no semantics,” since it has no normal form. What about $(\lambda x.\, \lambda y.\, y)((\lambda x.\, xx)(\lambda x.\, xx))$? It has a terminal value $\lambda y.\, y$, but not all legal derivation paths reach it. Should the expression be given a final value since there exists a reduction to it, or should we say there is no meaning? The answer depends on the goals of the semantic design.

A Proof with Multi-step Reduction

To chain individual steps, the multi-step reduction operator $\to^*$ is defined:

\[ \dfrac{}{M \to^* M} \qquad \dfrac{M_1 \to^* M_2 \quad M_2 \to M_3}{M_1 \to^* M_3} \]

This is the reflexive and transitive closure of $\to$. Using it, one can prove that $(\lambda x.\, \lambda y.\, y)((\lambda x.\, xx)(\lambda x.\, xx))x \to^* x$ under NOR by building a proof tree from the axioms upward.

An alternative to small-step semantics is big-step operational semantics, which describes the terminal value an expression evaluates to directly (written $E \Downarrow V$) rather than as the closure of smaller steps. Denotational semantics assigns mathematical objects to program phrases via double-bracket notation $\llbracket \cdot \rrbracket$; formally, it is a functor between categories, whereas operational semantics is a morphism within a single category. Denotational semantics offers compositionality but operational semantics has become dominant in practice because it more closely mirrors implementation and extends more easily to concurrency and nondeterminism.

Adding Primitives

Although the $\lambda$-calculus can encode common data types, such encodings are impractical. Real languages treat booleans, numbers, and collections as primitives with dedicated semantic rules.

Booleans and Conditionals

Boolean primitives (true, false, not, and, or) and if-then-else are added to the BNF syntax. Unlike Church-encoded booleans, these are genuine syntactic primitives with no underlying $\lambda$-calculus representation. The semantic rules for not include two axioms and a congruence rule:

\[ \dfrac{}{\text{not true} \to \text{false}} \qquad \dfrac{}{\text{not false} \to \text{true}} \qquad \dfrac{B \to B'}{\text{not } B \to \text{not } B'} \]

The and and or connectives evaluate their first argument before the second and exhibit short-circuiting behavior: and false B reduces directly to false without inspecting B, and and true B evaluates B. Rules for if reduce the condition until it becomes true or false, then select the appropriate branch:

\[ \dfrac{}{\text{if true then } E_1 \text{ else } E_2 \to E_1} \qquad \dfrac{}{\text{if false then } E_1 \text{ else } E_2 \to E_2} \qquad \dfrac{B \to B'}{\text{if } B \text{ then } E_1 \text{ else } E_2 \to \text{if } B' \text{ then } E_1 \text{ else } E_2} \]

Numbers

Natural numbers (unbounded) with binary operations $+, -, *, /$ are added. Let $M, N$ range over numeric expressions and $a, b$ over natural numbers. Semantic rules enforce left-to-right evaluation of operands. The addition rules are:

\[ \dfrac{a + b = c}{(+\; a\; b) \to c} \qquad \dfrac{M \to M'}{(+\; M\; N) \to (+\; M'\; N)} \qquad \dfrac{M \to M'}{(+\; a\; M) \to (+\; a\; M')} \]

The first rule appeals to the metalanguage of arithmetic via the predicate $a + b = c$. Subtraction adds the condition that the result must be a natural number: $\dfrac{a - b = c \quad c \in \mathbb{N}}{(-\; a\; b) \to c}$. When that condition fails (e.g., $2 - 3$), no rule applies and the expression is stuck. An alternative design avoids stuckness by mapping out-of-range results to a default: $\dfrac{a - b = c \quad c \notin \mathbb{N}}{(-\; a\; b) \to 0}$. This may seem counterintuitive, but the language’s subtraction operator is whatever the semantic rules say it is, even if it diverges from ordinary arithmetic.

Lists

Lists use a cons-cell representation: a list containing 1, 2, 3 is $\text{(cons 1 (cons 2 (cons 3 empty)))}$. Semantic rules define cons (which reduces its arguments left-to-right before constructing), first (extracts the head), and rest (extracts the tail). The core rules for extraction are:

\[ \dfrac{L_1 = \left[E\right] + L_2}{(\text{first}\; L_1) \to E} \qquad \dfrac{L_1 = \left[E\right] + L_2}{(\text{rest}\; L_1) \to L_2} \]

The premise $L_1 = \left[E\right] + L_2$ implies that $L_1$ is not empty; applying first or rest to empty is stuck. Congruence rules allow reduction inside cons, first, and rest arguments.

Sets

Sets are abstract collections of distinct objects with insert, remove, and contains?. The formal semantics treats sets as abstract mathematical objects, ignoring representation details. Semantic rules appeal to metalanguage set theory: insertion uses $\cup$, removal uses $\setminus$, and membership testing uses $\in$ and $\notin$. For example:

\[ \dfrac{\forall E_1.\, E \not\to E_1 \quad S' = S \cup \{E\}}{(\text{insert}\; E\; S) \to S'} \qquad \dfrac{\forall E_1.\, E \not\to E_1 \quad E \in S}{(\text{contains?}\; E\; S) \to \text{true}} \]

All operations require their expression arguments to be fully reduced (the $\forall E_1.\, E \not\to E_1$ premise) before the set operation fires. Inserting into empty yields a singleton; removing from empty yields empty.

Denotational Semantics

Denotational semantics assigns mathematical objects (denotations) directly to program phrases, describing what a program computes rather than how. The meaning of an arithmetic expression is a function from environments to numbers; the meaning of a while loop is the least fixed point of a continuous function on state transformations, requiring domain theory. Pioneered by Dana Scott and Christopher Strachey in the late 1960s, its principal advantage is compositionality: the meaning of a compound expression is determined entirely by the meanings of its parts. In practice, small-step operational semantics dominates for formalizing real languages, but denotational semantics remains important in the theory of recursive types, domain equations, and higher-order function semantics.

Looking Ahead

The next chapter introduces types, which classify values according to the operations they support. Type systems make it possible to prove that well-typed programs never get stuck.

Chapter 4: Types

Stuck States and the Need for Types

A well-formed expression can reach a point where no reduction rule applies yet the expression is not a value. For instance, $ (\text{true})\ 5 $ applies a boolean to a number – no semantic rule covers this case. Such a configuration is called a stuck state. Stuck states represent run-time errors arising from meaningless operations: applying non-functions, adding a number to an abstraction, and so forth.

The purpose of a type system is to prevent stuck states statically – before evaluation begins. The system assigns a type to every expression and rejects programs in which types are inconsistent. The central guarantee is type safety, which consists of two theorems:

Progress. If expression E is well-typed and not a value, then there exists some E' such that $ E \to E' $.

Preservation (Subject Reduction). If $ E : \tau $ and $ E \to E' $, then $ E' : \tau $.

Together these guarantee that well-typed programs never get stuck. Progress ensures that evaluation can always take a step; preservation ensures that each step maintains the type invariant.

Static vs. Dynamic Typing

In statically typed languages, type checking occurs before execution – typically at compile time. Every expression is assigned a type, and the program is rejected if any expression is ill-typed. In dynamically typed languages, type tags are attached to values at run time, and type errors surface only when a problematic operation is actually attempted. Static typing catches errors earlier and enables certain compiler optimizations, but is more restrictive; dynamic typing is more flexible but defers errors to run time. Dynamically typed code is neither polymorphic nor monomorphic – those concepts belong to the domain of static type systems.

The Simply-Typed Lambda-Calculus

Types are introduced by extending the lambda-calculus with a type annotation on each abstraction parameter. The syntax becomes:

⟨Abs⟩  ::= λ ⟨Var⟩ : ⟨Type⟩ . ⟨Expr⟩
⟨Type⟩ ::= ⟨PrimType⟩ | ⟨Type⟩ → ⟨Type⟩

A type environment $ \Gamma $ is a set of pairs $ \langle x, \tau \rangle $ mapping variables to types. A type judgment has the form $ \Gamma \vdash E : \tau $, read “under environment $ \Gamma $, expression $ E $ has type $ \tau $.”

The three core type rules are:

\[ \textbf{T\_Variable} \quad \dfrac{\langle x, \tau \rangle \in \Gamma}{\Gamma \vdash x : \tau} \]\[ \textbf{T\_Abstraction} \quad \dfrac{\{\langle x, \tau_1 \rangle\} + \Gamma \vdash E : \tau_2}{\Gamma \vdash (\lambda x : \tau_1.\ E) : \tau_1 \to \tau_2} \]\[ \textbf{T\_Application} \quad \dfrac{\Gamma \vdash E_1 : \tau_1 \to \tau_2 \qquad \Gamma \vdash E_2 : \tau_1}{\Gamma \vdash (E_1\ E_2) : \tau_2} \]

These rules define the simply-typed lambda-calculus. Type derivation is entirely mechanical: each rule is determined by syntactic form, and subexpressions are checked recursively. An important consequence is the Strong Normalization Theorem: every well-typed term terminates. The Y combinator cannot be typed because self-application $ x\ x $ requires $ x $ to have type $ \tau = \tau \to \tau' $, which is circular. The simply-typed lambda-calculus is therefore not Turing-complete. Real functional languages restore Turing-completeness through special recursive binding constructs like OCaml’s let rec.

Semantics with Terminal Values

With a concrete notion of types, the semantics can be reformulated more cleanly using terminal values (metavariable $ V $): expressions that need no further evaluation. Initially, only abstractions are values. The AOE rules become:

\[ \textbf{ReduceLeft} \quad \dfrac{M_1 \to M_1'}{M_1\ M_2 \to M_1'\ M_2} \qquad \textbf{Application} \quad (\lambda x : \tau.\ M)V \to M\left[V/x\right] \qquad \textbf{ReduceRight} \quad \dfrac{M \to M'}{V\ M \to V\ M'} \]

All universal quantifiers from earlier formulations are eliminated. ReduceRight fires only when the operator is already a value $ V $, and Application requires its operand to be a value. This enforces deterministic left-to-right, call-by-value evaluation.

Polymorphism

Consider typing Church numerals. A Church numeral has type $ (\tau_1 \to \tau_1) \to \tau_1 \to \tau_1 $ for some concrete type $ t_1 $ substituted for $ \tau_1 $. Multiplication $ \ulcorner{*}\urcorner = \lambda m.\ \lambda n.\ \lambda f.\ \lambda x.\ m(nf)x $ type-checks because $ nf $ returns type $ t_1 \to t_1 $, which is exactly what $ m $ expects as its first argument. But exponentiation $ \ulcorner{\hat{\phantom{x}}}\urcorner = \lambda m.\ \lambda n.\ \lambda f.\ \lambda x.\ nmfx $ fails: $ n $ expects an argument of type $ t_1 \to t_1 $, yet $ m $ has type $ (t_1 \to t_1) \to t_1 \to t_1 $, and these do not match.

The problem is that fixing a single concrete type is too rigid. If Church numerals could use abstract types, with $ \tau_1 $ consistent within each numeral but different across numerals, exponentiation becomes typable. Let $ m $ have type $ (\tau_2 \to \tau_2) \to \tau_2 \to \tau_2 $ and $ n $ have type $ (\tau_3 \to \tau_3) \to \tau_3 \to \tau_3 $. In $ nm $, setting $ \tau_3 = \tau_2 \to \tau_2 $ makes the application valid (since $ m $’s type matches $ \tau_3 \to \tau_3 $). Then $ nmf $ is well-typed when $ \tau_1 = \tau_2 $, and the entire expression $ nmfx $ is typable. Even for Church numerals to type correctly, abstract types must be part of the language, with the ability to relate them to each other. This capability is precisely polymorphism – abstraction of code to operate over multiple types.

Polymorphism is classified by Cardelli and Wegner into two major families. Ad hoc polymorphism provides multiple implementations per type, in two sub-forms: overloading (same name, different implementations disambiguated at compile time by argument types) and coercion (converting values between types, either explicitly via casts or implicitly by the compiler). Universal polymorphism provides a single implementation generalized over types, also in two sub-forms: parametric polymorphism (functions parameterized by type variables, as in OCaml’s 'a -> 'a) and inclusion polymorphism (subtyping, where $ \tau_1 $ is a subtype of $ \tau_2 $ if values of $ \tau_1 $ are valid wherever $ \tau_2 $ is expected – the basis for OOP polymorphism). Universal polymorphism is considered by many to be the only “true” form, because new types can be created after the polymorphic code was written, without modification.

System F: The Polymorphic Lambda-Calculus

The informal notion of abstract types is made precise by System F (the second-order or polymorphic lambda-calculus), introduced independently by Jean-Yves Girard (1972) and John Reynolds (1974). System F extends the simply-typed lambda-calculus with two new constructs: type abstraction $ \Lambda\alpha.\ E $ and type application $ E\{\tau\} $.

⟨Expr⟩ ::= ... | Λ⟨TVar⟩.⟨Expr⟩ | ⟨Expr⟩{⟨Type⟩}
⟨Type⟩ ::= ... | ⟨TVar⟩ | ∀⟨TVar⟩.⟨Type⟩

The typing rules are:

\[ \textbf{T\_TypeAbs} \quad \dfrac{\Gamma \vdash E : \tau \qquad \alpha \text{ is not free in } \Gamma}{\Gamma \vdash \Lambda\alpha.\ E : \forall\alpha.\ \tau} \qquad \textbf{T\_TypeApp} \quad \dfrac{\Gamma \vdash E : \forall\alpha.\ \tau_1}{\Gamma \vdash E\{\tau_2\} : \tau_1\left[\tau_2/\alpha\right]} \]

T_TypeAbs requires that $ \alpha $ not already appear in $ \Gamma $, preventing ill-formed captures. T_TypeApp substitutes the concrete type $ \tau_2 $ for $ \alpha $ throughout $ \tau_1 $.

The polymorphic identity is $ \mathit{id} = \Lambda\alpha.\ \lambda x:\alpha.\ x : \forall\alpha.\ \alpha \to \alpha $. Applying $ \mathit{id}\{\mathit{Int}\}\ 42 $ instantiates it to $ \mathit{Int} \to \mathit{Int} $ and applies it to 42.

Encodings in System F

Church encodings receive precise types. Booleans: $ \llbracket\mathit{Bool}\rrbracket = \forall\alpha.\ \alpha \to \alpha \to \alpha $. Natural numbers: $ \llbracket\mathit{Nat}\rrbracket = \forall\alpha.\ (\alpha \to \alpha) \to \alpha \to \alpha $. Pairs: $ \llbracket\mathit{Pair}\ \tau_1\ \tau_2\rrbracket = \forall\alpha.\ (\tau_1 \to \tau_2 \to \alpha) \to \alpha $. Lists: $ \llbracket\mathit{List}\ \tau\rrbracket = \forall\alpha.\ (\tau \to \alpha \to \alpha) \to \alpha \to \alpha $ – essentially a right fold.

Properties of System F

Like the simply-typed lambda-calculus, System F is strongly normalizing: all well-typed terms terminate, so it remains non-Turing-complete. It is strictly more expressive, however – it can type Church numeral exponentiation. A crucial property is erasability: all $ \Lambda $-abstractions and type applications can be removed without altering run-time behavior. Formally, $ \text{erase}(\Lambda\alpha.\ E) = \text{erase}(E) $ and $ \text{erase}(E\{\tau\}) = \text{erase}(E) $. Types exist solely for static checking; type-level and term-level evaluation never interact.

Types in Practice

A type judgment is a mathematical formulation of a type checker implemented in a real compiler or interpreter. In practice, type checkers perform a depth-first traversal of the code, carrying a stack-like type environment, determining types from the inside out. Types also affect compilation: on many systems, integers and floating-point numbers use different register banks; garbage-collected languages must communicate where reference-typed values are stored; and accessing struct fields or object methods requires the compiler to know the layout. Parametric polymorphism complicates type checking, and we investigate the algorithm for it (Algorithm W) in Chapter 5.

Adding More Types

Let Bindings and Recursion

The simply-typed lambda-calculus is extended with let and let rec bindings to restore Turing-completeness. A store $ \sigma $ (a partial map from variables to values) is added, yielding pairs $ \langle \sigma, E \rangle $. The LetBody rule evaluates the body $ M $ in a store $ \sigma' = \sigma\left[x \mapsto V\left[z/x\right]\right] $ where $ z $ is fresh; the renaming prevents x from referencing itself within its own definition (unlike let rec). The Variable rule retrieves names from the store via explicit lookup. LetResolution converts a fully-evaluated let into its body value via substitution.

For let rec, the LetRecBody rule stores $ V $ with its own name intact ($ \sigma' = \sigma\left[x \mapsto V\right] $), enabling recursion: future lookups of x within $ V $ expand to $ V $ again. LetRecResolve removes the let rec when the body no longer references x. If the body is an abstraction that still references x, LetRecInvert moves the let rec inside the abstraction, preserving access to the recursive binding while making the abstraction available for further applications.

The type rules are:

\[ \textbf{T\_Let} \quad \dfrac{\Gamma \vdash V : \tau_1 \qquad \{\langle x, \tau_1\rangle\} + \Gamma \vdash E : \tau_2}{\Gamma \vdash \text{let}\ x = V\ \text{in}\ E : \tau_2} \]\[ \textbf{T\_LetRec} \quad \dfrac{\Gamma_0 = \{\langle x, \tau_1\rangle\} + \Gamma \qquad \Gamma_0 \vdash V : \tau_1 \qquad \Gamma_0 \vdash E : \tau_2}{\Gamma \vdash (\text{let rec} : \tau_1\ x = V\ \text{in}\ E) : \tau_2} \]

In T_LetRec, the environment already contains $ x : \tau_1 $ when checking $ V $ – this is what permits self-reference. The explicit type annotation $ \tau_1 $ is required because inferring it without a declaration is complex (addressed in Chapter 5). The let-rec-calculus is Turing-complete.

Booleans, Numbers, Lists, and Sets

Extending the system with primitives is mechanical. Booleans add terminal values true/false of type boolean; not, and, and or require boolean operands and produce boolean results. The T_If rule requires the predicate to be boolean and both branches to share a type $ \tau $:

\[ \textbf{T\_If} \quad \dfrac{\Gamma \vdash E_1 : \text{boolean} \qquad \Gamma \vdash E_2 : \tau \qquad \Gamma \vdash E_3 : \tau}{\Gamma \vdash \text{if}\ E_1\ \text{then}\ E_2\ \text{else}\ E_3 : \tau} \]

The specific type $ \tau $ is unconstrained – it suffices to check that $ E_2 $ and $ E_3 $ have matching types. Numbers add natural-number literals of type nat; T_NumOp requires nat operands and produces nat results. If an integer type were added, additional rules would be needed for type transitions (e.g., the difference of two nats is an int). Lists introduce a parameterized type list τ, with T_Cons requiring the element and the list to share the same parameter type. The type of empty is $ \text{list}\ \tau $ for any $ \tau $ – a non-deterministic judgment that constitutes an ad hoc form of polymorphism; a proper solution requires parametric polymorphism. Sets follow the same pattern with set τ, using insert and remove operations analogous to cons.

Chapter 5: Functional Programming

Introduction

Functional languages – Racket, OCaml, ML, Haskell – are based on constructing and evaluating expressions, where functions serve as the primary abstraction. Computations are described by mathematical formulas rather than explicit sequences of steps. Ordering is implicit, and the focus is on what to compute rather than how.

The distinguishing characteristic is the absence or reduced importance of assignment. In a purely functional language, executing a function produces no side effects – the machine state remains unchanged. This property is called referential transparency.

Referential Transparency. A language is referentially transparent if every function, given the same input, always produces the same output. Consequently, if z is bound to f(3), then z and f(3) are interchangeable in all contexts.

Languages like Scheme/Racket, ML, and OCaml are impure – they allow side effects through assignment and I/O but have a pure subset and generally discourage impure features. In an impure language, z bound to f(3) and a fresh call f(3) may produce different results if f mutates state. For example, in OCaml, if f increments a mutable counter, successive calls to f 3 yield 4, 5, 6, etc., while z remains bound to 4. Haskell is pure: it prohibits assignment and I/O entirely, channeling all effects through monads. This strict enforcement is why Haskell serves as our exemplar.

Variables in functional languages exhibit immutability: once bound, a variable’s value cannot change within its scope. Mathematically, “variable” means only that the same name may be bound to different values in different evaluations, not that the binding can change. Impure languages require explicit constructs (such as OCaml’s ref and mutable records) to gain mutability; pure languages have no such mechanism at all. Functions are always first-class values – they can be passed as arguments, returned from functions, and stored in data structures.

Exemplar: Haskell

Haskell, designed in the 1980s by a committee seeking a state-of-the-art pure functional language and named after logician Haskell Curry, is a lazy, purely functional language (current standard: Haskell 2010). Influenced by ML, it is a cousin of OCaml but differs in key ways – most notably laziness and enforced purity. We use Haskell to illustrate functional programming concepts, not as a comprehensive language tutorial.

Expressions and Functions

All behavior is described by expressions. Operators like + and * are functions usable in infix or prefix form (e.g., (+) 100 20). As in the lambda-calculus, every function takes exactly one argument; multi-argument functions are curried. Partial application is natural: add10 = (+) 10 creates a function adding 10 to its argument. Haskell’s lambda syntax replaces $ \lambda $ with \ and . with ->:

churchTwo = \f -> \x -> f (f x)
churchMul = \m -> \n -> \f -> m (n f)

Because Haskell is typed, it cannot represent the Y combinator – the self-application $ x\ x $ causes the type checker to attempt constructing the infinite type t0 ~ t0 -> t. Recursion is instead achieved through named bindings, analogous to let rec.

Conditionals, Guards, and Syntactic Sugar

Haskell’s if is an expression (not a statement) requiring both branches to have the same type, and the predicate must be Bool. Guards offer an alternative syntax for case-based definitions:

absButTerrible x | x >= 0 = x
                 | x <  0 = -x

Guards are syntactic sugar: they desugar mechanically to nested if-then-else expressions. More generally, formal semantics are defined over a reduced core language, and surface syntax is translated to that core via informal desugaring rules. For guards, each case $ | E_1 = E_2 $ becomes $ \text{if } E_1 \text{ then } E_2 \text{ else } \ldots $, with the empty case mapping to error "Non-exhaustive pattern".

Environments and Name Binding

An environment is a lookup table mapping identifier names to values (or, for type checking, names to types – a type environment). Environments nest to form a tree with the global scope at the root. A variable is accessible in the environment where it is declared and in all descendants. This principle is lexical scoping. Haskell’s let...in introduces a child environment:

fun1 n = let x = "A" in n ++ "x is " ++ x

The store $ \sigma $ in operational semantics and the type environment $ \Gamma $ are duals encoding this nesting. Every expression is paired with its environment and evaluated or typed in that context; subexpressions introducing new variables operate in extended contexts.

For top-level declarations, a resolve function converts the program’s declaration list into a store $ \sigma $. The starting expression is $ \sigma(\text{main})\ z $, where main is the entry point and $ z $ is the program argument. Execution has two phases: build $ \sigma $ from declarations, then evaluate main in $ \sigma $.

\[ \textbf{EmptyProgram} \quad \text{resolve}() = \text{empty} \]\[ \textbf{Declaration} \quad \dfrac{\sigma = \text{resolve}(L) \qquad \sigma' = \sigma\left[x \mapsto V\right]}{\text{resolve}(x = V\ ;\ L) = \sigma'} \]

Declarations are processed right-to-left; since type checking prevents duplicate names, the order is immaterial. In Haskell, names are bound to unevaluated expressions due to lazy evaluation (discussed later).

Algebraic Data Types

Haskell defines types and their values through algebraic data types (ADTs):

data Bool = True | False

True and False are data constructors – symbols that exist solely to be distinguishable. Their run-time values are one-tuples containing just the name: the value of True is $ \{True\} $. Data constructors can take parameters, creating infinite types:

data BoolList = Empty | Pair Bool BoolList

Pair is a two-argument curried function. Semantically, all ADT values are $ n $-tuples: the first element is a symbol identifying the constructor (the tag), and the remaining elements are the arguments. The resolve rules store nullary constructors as symbols $ \{N\} $ and parameterized constructors as curried lambdas producing tuples $ \{N, x_1, \ldots, x_m\} $. Type declarations can themselves be parameterized:

data List a = Empty | Pair a (List a)

This parameterization affects only types, not semantics – Pair remains the same function regardless of a.

Parametric Polymorphism and System F

Virtually all typed functional languages use parametric polymorphism, formalized in System F. System F extends the simply-typed lambda-calculus with type abstractions $ \Lambda\alpha.\ E $, type applications $ E\{\tau\} $, and universal types $ \forall\alpha.\ \tau $. The type-level language follows lambda-calculus reduction but forbids recursion, so type evaluation always terminates.

Church numerals gain explicit type parameters: $ \lceil 2 \rceil = \Lambda\alpha.\ \lambda f : \alpha \to \alpha.\ \lambda x : \alpha.\ f(f\ x) $. We recover monomorphic Church numerals by type application: $ \lceil 2 \rceil\{t_1\} \to \lambda f : t_1 \to t_1.\ \lambda x : t_1.\ f(f\ x) $. Exponentiation becomes typable because different numerals can be instantiated at different types. For $ \lceil \hat{\cdot} \rceil $, $ n $ must accept $ m $ as its argument. Since $ m $ has type $ (\alpha \to \alpha) \to \alpha \to \alpha $, and $ n $ expects $ \tau_3 \to \tau_3 $, we set $ \tau_3 = \alpha \to \alpha $, making the application well-typed. The full System F expression wraps everything in a $ \Lambda $-abstraction over the base type.

The power of System F is that expressions themselves can be made polymorphic. The type language is erasable: all $ \Lambda $-abstractions and type applications can be removed without changing run-time behavior. Formally, erasure strips type annotations from abstractions and drops type-level constructs entirely:

\[ \text{erase}(\lambda x : \tau.\ E) = \lambda x.\ \text{erase}(E) \qquad \text{erase}(\Lambda\alpha.\ E) = \text{erase}(E) \qquad \text{erase}(E\{\tau\}) = \text{erase}(E) \]

The Erasability Theorem states: if $ E \to E' $, then either $ \text{erase}(E) = \text{erase}(E') $ or $ \text{erase}(E) \to_\beta \text{erase}(E') $. Despite erasability, System F guarantees type soundness (progress and preservation). It is strictly more expressive than the simply-typed lambda-calculus yet still strongly normalizing and non-Turing-complete.

Typed functional languages, including Haskell, base their type systems on System F. Parameterized types like List a are possible because List is a type constructor (a $ \Lambda $-abstraction at the type level), while Pair is a data constructor (term-level). Writing all type annotations explicitly is extremely tedious, motivating type inference.

Type Inference

Since types are erasable, one can ask: can types be inferred from untyped terms? Full type inference for System F is undecidable, but it becomes tractable under the restriction that universal quantifiers appear only at the outermost level of a type expression (so $ \forall\alpha.\ \alpha \to \alpha $ is allowed, but $ t_1 \to \forall\alpha.\ \alpha $ is not). This restriction covers virtually all useful functions.

Erasing types from System F’s rules yields the polymorphic lambda-calculus – syntactically identical to the untyped lambda-calculus, but with an implicit static type system. The key erased rules are:

\[ \textbf{T\_Gen} \quad \dfrac{\Gamma \vdash E : \tau \qquad \forall x.\ \langle x, \alpha \rangle \notin \Gamma}{\Gamma \vdash E : \forall\alpha.\ \tau} \qquad \textbf{T\_Spec} \quad \dfrac{\Gamma \vdash E : \forall\alpha.\ \tau_1}{\Gamma \vdash E : \tau_1\left[\tau_2/\alpha\right]} \]\[ \textbf{T\_Ascr} \quad \dfrac{\{\langle x, \tau_1 \rangle\} + \Gamma \vdash E : \tau_2}{\Gamma \vdash (\lambda x.\ E) : \tau_1 \to \tau_2} \]

T_Gen (generalization) lifts a type to a universal: if $ E $ has type $ \tau $, it can be viewed as having $ \forall\alpha.\ \tau $, justified because $ E $ could be rewritten as $ \Lambda\alpha.\ E $. T_Spec (specialization) instantiates a universal type by substituting any $ \tau_2 $ for $ \alpha $, justified because $ E $ could be rewritten as $ E\{\tau_2\} $. T_Ascr (type ascription) introduces a type for a lambda parameter, justified because $ \lambda x.\ E $ could be rewritten as $ \lambda x : \tau_1.\ E $. The erased versions of T_Variable and T_Application are unchanged since those rules contain no explicit types.

These rules lack syntax-directedness: for any expression, up to three rules may apply (the structural rule plus generalization and specialization), and ascription, specialization, and generalization can each be instantiated in infinitely many ways. A syntax-directed system has exactly one rule per syntactic form and defines the semantics completely from immediate subexpressions. Despite this nondeterminism, the Hindley-Milner algorithm (described by Milner and Damas) correctly infers types.

Unification

The core of type inference is unification – discovering substitutions that make two type expressions equal. A unifier of $ \tau_1 $ and $ \tau_2 $ is a substitution $ S $ such that $ \tau_1 S = \tau_2 S $. The Most General Unifier (MGU) is the least constraining such substitution: every other unifier factors through it.

Most General Unifier. A substitution S is an MGU of $ \tau_1 $ and $ \tau_2 $ if S unifies them and for every other unifier S' there exists S'' such that $ S' = S'' \circ S $.

Robinson’s Unification Algorithm computes the MGU:

\[ \begin{aligned} &U(t, t) = \left[\right] \qquad U(t_1, t_2) = \text{error} \quad (t_1 \neq t_2) \qquad U(t, \tau_1 \to \tau_2) = \text{error} \\ &U(\tau_1 \to \tau_2,\ \tau_3 \to \tau_4) = S_1 S_2 \quad \text{where } S_1 = U(\tau_1, \tau_3),\ S_2 = U(\tau_2 S_1, \tau_4 S_1) \\ &U(\alpha, \tau) = \begin{cases} \text{error} & \text{if } \alpha \text{ occurs in } \tau \text{ (occurs-check)} \\ \left[\tau/\alpha\right] & \text{otherwise} \end{cases} \end{aligned} \]

A primitive type unifies only with itself. Two arrow types unify by first unifying their parameter types ($ S_1 $), then applying $ S_1 $ to both return types before unifying them ($ S_2 $). A type variable $ \alpha $ unifies with any type $ \tau $ via substitution $ \left[\tau/\alpha\right] $, unless the occurs-check detects that $ \alpha $ appears in $ \tau $ (which would create a circular type). The algorithm is commutative.

Algorithm W

Algorithm W (Milner and Damas) is the type inference algorithm underlying Haskell and OCaml. It takes a type environment $ \Gamma $ and an expression, returning a substitution and the expression’s type:

\[ \textbf{W\_Var} \quad W(\Gamma, x) = \langle \left[\right], \Gamma(x) \rangle \]\[ \textbf{W\_Abs} \quad W(\Gamma, \lambda x.\ E) = \langle S, (\alpha S) \to \tau \rangle \quad \text{where } \alpha \text{ fresh},\ \langle S, \tau \rangle = W(\{\langle x, \alpha \rangle\} + \Gamma, E) \]\[ \textbf{W\_App} \quad W(\Gamma, E_1\ E_2) = \langle S_1 S_2 S_3, \alpha S_3 \rangle \]

where $ \langle S_1, \tau_1 \rangle = W(\Gamma, E_1) $, $ \langle S_2, \tau_2 \rangle = W(\Gamma S_1, E_2) $, $ \alpha $ is fresh, and $ S_3 = U(\tau_1 S_2, \tau_2 \to \alpha) $.

W is syntax-directed: exactly one case per syntactic form. For variables, it simply looks up the type. For abstractions, it invents a fresh type variable $ \alpha $ for the parameter, types the body, and returns $ (\alpha S) \to \tau $; if $ \alpha $ was specialized during body typing, $ S $ records that. For applications, it types both subexpressions, propagating substitutions from the operator to the operand (via $ \Gamma S_1 $), then unifies the operator’s type with $ \tau_2 \to \alpha $. After W terminates, remaining type variables are universally quantified via T_Gen.

Real languages also allow explicit type annotations, both for documentation and because inferred polymorphic types sometimes do not express the programmer’s actual intent – they can be too polymorphic. Type inference is distinct from dynamic typing: types are reasoned about from the code alone, and every expression receives a precise type. It does not weaken type checking but allows sophisticated types to be used without writing (or even working out) the types by hand. The downside is that error messages can be incomprehensible, even to those familiar with type inference. Understanding the concepts of unification and Algorithm W makes reading such errors considerably easier.

Pattern Matching

With ADTs and polymorphism in hand, we reach one of the hallmarks of typed functional languages: pattern matching. Given our List a type, pattern matching decomposes values:

sumList l = case l of
    Empty    -> 0
    Pair x y -> x + sumList y

Each case branch associates a constructor pattern with an expression. Variables in the pattern are implicitly let-bound to the matched components. Patterns can be nested (e.g., Pair x (Pair y _) matches the first two elements) and _ is a wildcard matching without binding. Patterns are strictly ordered: a more general pattern must follow more specific ones. Values in patterns are also allowed, giving an easy way to write base cases like sumNums 0 = 0.

Haskell supports a shorthand where pattern matching appears in function declarations: sumNums 0 = 0; sumNums x = x + sumNums (x-1), which desugars directly to a case expression.

Semantics of Patterns

Patterns are formalized through a match construct: $ \text{match } E_1 : P \text{ then } E_2 \text{ else } E_3 $. Patterns are restricted to one level deep (nested patterns are rewritten into cascaded matches). The MatchLeft and MatchRight rules evaluate the scrutinee and pattern respectively. Once both sides are values, MatchBind converts each unbound variable $ x $ in $ P $ into a _ and wraps the then-branch: $ \text{let}\ x = N\ \text{in}\ E_1 $, where $ N $ is the corresponding component of the matched value. MatchThen fires when all variables are bound and the constructor symbols match. MatchElse fires when the constructor symbols differ, evaluating $ E_3 $ instead. If all patterns are exhausted, a run-time error results. Type checking ensures that patterns and values have compatible tuple sizes.

The ML Module System

While Haskell is our functional programming exemplar, the ML family (Standard ML, OCaml) introduced an influential module system providing abstraction at the scale of program components. Structures group related types, values, and functions into named units. Signatures specify interfaces – the types and operations a structure must provide – without revealing implementation. When a structure is constrained by a signature, its types become abstract: clients cannot inspect representations, enforcing encapsulation. Functors are parameterized structures – functions from structures to structures – providing parametric polymorphism at the module level. Sharing declarations let functors express that types from different module arguments are identical. Together, these mechanisms demonstrate the abstraction, parameterization, and qualification principles operating at the scale of entire program components.

Errors and Exceptions

Haskell uses error to signal erroneous conditions. Semantically, error is both a terminal value and an expression. If an error appears in any subexpression, it propagates upward:

\[ \textbf{ErrorLeft} \quad \text{error}\ E \to \text{error} \qquad \textbf{ErrorRight} \quad V\ \text{error} \to \text{error} \]

The entire program becomes an error if any subexpression errors. Haskell has no language-level exceptions; exception handling is a library feature built on monads.

Lazy Evaluation

Recall three reduction strategies: AOR/AOE (reduce leftmost, innermost redex – arguments fully evaluated before passing) and NOR (reduce leftmost, outermost redex – arguments passed unevaluated). NOR evaluates arguments “as needed” – this is lazy evaluation. AOR/AOE are eager evaluation.

The Standardization Theorem tells us NOR always reaches a normal form if one exists, while AOR can diverge. Yet most languages use eager evaluation because it is generally more efficient: when an argument is used multiple times, lazy evaluation duplicates the unevaluated expression, requiring multiple reductions. Consider $ (\lambda x.\ x\ x\ x\ x\ x)((\lambda y.\ y)z) $: NOR takes six steps (duplicating and re-reducing the argument), but AOR takes only two. However, NOR wins in two cases: (1) avoiding divergence that AOR falls into, and (2) skipping reductions whose results are discarded, as in $ (\lambda x.\ z)((\lambda a.\ \lambda b.\ \lambda c.\ \lambda d.\ \lambda e.\ e)\ y\ y\ y\ y\ y) $ where AOR takes six steps but NOR takes one.

Haskell uses NOE (Normal Order Evaluation) – NOR that never reduces inside abstractions. This enables natural definitions of infinite data structures:

allTheNumbersFrom x = Pair x (allTheNumbersFrom (x+1))

Extracting third (allTheNumbersFrom 1) terminates in finite steps. Under NOE, third unpacks the outermost Pair, discards the first element, and passes the tail to second. second does likewise, passing the next tail to first. first extracts the head, which is 1+1+1. Only then is the addition evaluated to 3. The infinite recursive calls to allTheNumbersFrom at each level are never forced because they are bound to wildcard _ patterns or otherwise unused. This works precisely because Haskell is pure: whether an unused subexpression is evaluated is unobservable, so lazy evaluation produces correct results.

The GHC compiler performs heroic optimizations to make lazy evaluation efficient. It determines which expressions are definitely evaluated and arranges them into an evaluation order, memoizing any expression that cannot be proved to be evaluated exactly once (so that a variable used multiple times does not trigger full re-evaluation). Lazy evaluation also eliminates the need for special tail-recursion optimization: since Haskell returns unevaluated expressions rather than computed results, tail calls naturally require no extra stack frames.

Laziness can be added ad hoc to eager languages by wrapping recursive structures in thunks:

type 'a lazyList = Empty | Pair of 'a * (unit -> 'a lazyList)

The user must explicitly call the thunk with x() to force evaluation. This pattern appears as generators in languages like Python.

Impure State and Assignment

Laziness requires purity for predictable results, but practical programs need state. In impure languages like OCaml, mutable state is modeled by adding a heap $ \Sigma $ to the evaluation context, yielding triples $ \langle \Sigma, \sigma, E \rangle $. Unlike the transient store $ \sigma $, the heap persists across reduction steps: a step can change $ \Sigma $ to $ \Sigma' $. Labels (unique values analogous to memory addresses) serve as indices into the heap.

The language is extended with ref (allocate), ! (dereference), and := (assign):

\[ \textbf{Ref} \quad \dfrac{\ell \text{ fresh} \quad \Sigma' = \Sigma\left[\ell \mapsto V\right]}{\langle \Sigma, \sigma, \text{ref}\ V \rangle \to \langle \Sigma', \sigma, \ell \rangle} \qquad \textbf{Deref} \quad \dfrac{\Sigma(\ell) = V}{\langle \Sigma, \sigma, !\ell \rangle \to \langle \Sigma, \sigma, V \rangle} \]\[ \textbf{Assg} \quad \dfrac{\Sigma' = \Sigma\left[\ell \mapsto V\right]}{\langle \Sigma, \sigma, \ell := V \rangle \to \langle \Sigma', \sigma, V \rangle} \]

Because $ \Sigma $ changes during evaluation, evaluation order fundamentally affects program meaning. Even discarded subexpressions can alter the heap. Although nothing makes state technically incompatible with lazy evaluation, the resulting behavior would be so unpredictable that this combination is never used in practice. The same idea – adding extra elements to the evaluation tuple – generalizes to any impure behavior, including I/O and file systems.

Monads

Haskell’s creators observed that while an expression $ E $ in an impure language may not be pure, the step relation $ \to $ over pairs $ \langle \Sigma, E \rangle $ is pure: the side effects are no longer “side” when described explicitly in the pair. Monads bring this insight into the language: they bundle side-effecting behavior into composable units forming explicitly ordered chains.

The IO Monad

The IO monad has type constructor IO a, conceptually representing World -> (a, World) – a function taking the entire world state and producing a value plus a new world. getLine :: IO String and putStrLn :: String -> IO () describe but do not perform I/O. They are combined with bind (>>=):

(>>=) :: IO a -> (a -> IO b) -> IO b
echo = getLine >>= putStrLn

The notation f >>= g specifies: perform action f, pass its result to function g to produce another action, then perform that action. Critically, echo itself is a non-function value of type IO () – it specifies but does not perform the I/O. Nothing done so far is impure.

How does the I/O actually happen? Recall that a Haskell program must define main, and running the program means evaluating main. But main is not a function – it is an IO monad. The behavior of the entire program is to run the I/O described by the value of main. Some entity outside the language actually causes the I/O to occur. In the REPL (ghci), if a user expression evaluates to an IO, the action is performed immediately.

The then combinator >> chains actions when the first result is unneeded: defined as f >> g = f >>= (\_ -> g), it allows echoecho = echo >> echo. The return primitive wraps a pure value into the monad without performing any I/O: return (l1, l3) :: IO (String, String). It allows pure expressions to participate in monadic chains.

Generalizing Monads

Monad. A monad is a triple $ \langle M, {>>=_M}, \text{return}_M \rangle $ where M is a type constructor, $ \text{return}_M : \alpha \to M\{\alpha\} $, and $ {>>=_M} : M\{\alpha\} \to (\alpha \to M\{\beta\}) \to M\{\beta\} $, satisfying three monad laws: left identity ($ \text{return}\ a >>= k = k\ a $), right identity ($ m >>= \text{return} = m $), and associativity ($ m >>= (\lambda a.\ k\ a >>= \lambda b.\ h\ b) = (m >>= \lambda a.\ k\ a) >>= \lambda b.\ h\ b $).

Haskell provides do notation as syntactic sugar for monadic composition:

myIOAction = do
    l1 <- getLine
    getLine
    l3 <- getLine
    return (l1, l3)

The Maybe Monad

The Maybe type (data Maybe a = Nothing | Just a) models computations that may fail. Its monadic bind propagates failure automatically:

(>>=) Nothing  _ = Nothing
(>>=) (Just x) g = g x
return = Just

A safe division returns Nothing on division by zero. Composing safe operations with do notation gives clean error propagation without explicit checks:

safef a b c = do
    r1 <- safeDiv a b
    r2 <- safeDiv b c
    return (r1 + r2)

If b is zero, safeDiv a b returns Nothing. The >>= for Maybe then skips the remaining computations (safeDiv b c and return (r1 + r2)), propagating Nothing as the final result. All error-handling mechanics are confined to the Maybe monad definition itself.

The State Monad

The IO monad supports variables and state, but we do not need the entire World just to manage state. Monads can model mutable state purely with a “state transformer” – a function that takes an old state and returns a result plus a new state:

data State s a = ST (s -> (a, s))

The return combinator maps a value to a function that returns that value while leaving state unchanged: return x = ST (\s -> (x, s)). The bind combinator threads state sequentially: first f is applied to initial state $ s_0 $, producing $ (a, s_1) $; then g a yields a new state computation h, which is applied to $ s_1 $ to produce $ (b, s_2) $. The state monad makes the changes to $ \Sigma $ explicit in Haskell code. Helper functions get (returns the current state as the value) and put v (replaces the state with v) provide a convenient interface. The IO monad is opaque – there is no way to extract the underlying action from an IO value – but State is fully decomposable and purely functional. Both share identical algebraic structure through the monad laws.

Monads are not strictly necessary outside pure languages, but their ability to make implicit behavior explicit is valuable. OCaml and Scala now support monads even though their environments are impure. Only Haskell requires monads for all impure behavior.

Continuations and CPS

Computers operate in explicit steps, so implementing functional languages requires bridging the gap. A continuation represents the remainder of a computation at any point. In $ (3 + 5) \times 4 $, the continuation of the subexpression $ 3 + 5 $ is $ \lambda x.\ x \times 4 $; the continuation of the whole expression is the identity function.

Continuation Passing Style (CPS) is a compiler intermediate representation where every function receives an extra parameter (its continuation) and passes results to it instead of returning. A simple let f x = 3 becomes let f x k = k 3. Functions in CPS never return; they pass computed values to their continuation. When the computation is finished, the continuation is the identity function.

Consider a more complex example – a recursive filter function in OCaml. In direct style, it pattern-matches on the list and either conses the head onto the recursive result or skips it. In CPS, each recursive call to filter receives a distinct continuation (k3 or k4), and the predicate p also receives a continuation (k2). The CPS version names all intermediate values (making dataflow explicit) and all control-flow points (making control flow explicit). This facilitates optimizations.

In particular, tail-call elimination becomes trivial. A tail-recursive call in CPS has the form let k4 l2 = k1 l2 in filter xs p k4. The continuation k4 is $ \eta $-reducible to k1, so the entire construction simplifies to filter xs p k1 – the caller’s continuation is reused directly with no additional context. This optimization is fundamental to functional language implementations.

In Haskell, CPS is less critical because NOE naturally avoids building up stack frames for tail calls – what is returned is the unevaluated expression itself rather than the result of a recursive call.

First-Class Continuations

Some languages expose continuations as first-class values via callcc (call-with-current-continuation). The primitive captures the current continuation and passes it as an argument to a given function. In Scheme:

(+ 3 (callcc (lambda (k) (k 4))))   ;; => 7

Here, callcc captures the continuation $ \lambda v.\ 3 + v $ and passes it as k. Calling (k 4) aborts the current computation and returns 4 directly to the captured continuation, yielding $ 3 + 4 = 7 $. First-class continuations enable non-local exits, coroutines, cooperative multithreading (by saving and restoring continuations as a queue), exception handling, and backtracking search. SML/NJ provides typed continuations with type 'a cont, callcc : ('a cont -> 'a) -> 'a, and throw : 'a cont -> 'a -> 'b. First-class continuations are powerful but notoriously hard to reason about; most modern languages opt for more structured alternatives – async/await, algebraic effects, or delimited continuations – offering similar capabilities with better composability.

Appendix: Full Semantics Reference

The complete syntax, semantics, and name resolution for “untyped lambda-Haskell” are unified here. The language includes variables, abstractions, applications, booleans, conditionals, numbers, arithmetic, let bindings, pattern matching (via match...then...else), and errors. Terminal values include variables, abstractions, true, false, numbers, and error.

The semantic rules use Normal Order Evaluation with a store $ \sigma $. Application passes the unevaluated operand directly (reflecting lazy evaluation): $ \langle \sigma, (\lambda x.\ M_1)\ M_2 \rangle \to \langle \sigma, M_1\left[M_2/x\right] \rangle $. ReduceLeft evaluates the operator first. Arithmetic rules (Add, Sub, etc.) evaluate operands left-to-right and compute primitive results; subtraction requires the result to be a natural number. IfTrue and IfFalse select the appropriate branch; IfExpr evaluates the predicate. The Variable rule looks up names in the store. LetBody evaluates the body in an extended store (with a fresh variable replacing the bound name to prevent accidental self-reference). LetResolution substitutes the bound value into the result when the body reaches a terminal value. Match rules (MatchLeft, MatchRight, MatchBind, MatchThen, MatchElse) implement pattern matching as described in the pattern matching section. ErrorLeft and ErrorRight ensure that errors in any subexpression cause the entire application to become error.

Global name resolution is defined by resolve: the empty program yields the empty store; each declaration extends the store with a new binding. Data declarations for nullary constructors map the name to the symbol $ \{N\} $; parameterized constructors map the name to a curried lambda producing the tuple $ \{N, x_1, \ldots, x_m\} $.

Chapter 6: Logic Programming

Programming languages are often classified as declarative or imperative. A declarative language lets the programmer describe what is to be computed without specifying how; an imperative language requires the programmer to spell out the steps of the computation. These labels are best understood as relative: a given language is more declarative or more imperative than another, and one can imagine a spectrum from raw machine code (maximally imperative) to a hypothetical language in which the programmer merely states constraints a solution must satisfy and the system finds a solution automatically (maximally declarative).

Predicate logic is a strong candidate for such a maximally declarative language. With an appropriate domain and set of predicates, predicate logic can describe any computable problem. Because it is a mathematical formalism predating the computer, programming purely in predicate logic has long been an aspiration of the declarative community. This chapter examines the logic programming paradigm and its principal language, Prolog (Programmation en Logique), invented around 1970 at the University of Marseilles. Prolog is, for practical purposes, the paradigm itself: nearly every logic programming language in use derives from it. Logic programming also belongs to a broader family of query languages, a connection explored at the end of this chapter.

Predicate Logic and Clausal Form

Formulas in predicate logic are composed of constants ($ a, b, c, \ldots $), function symbols ($ f, g, \ldots $), predicate symbols (capital letters), variables ($ x, y, \ldots $), logical connectives ($ \neg, \to, \lor, \land $), and quantifiers ($ \forall, \exists $). Each function symbol and predicate symbol has an associated arity $ n \geq 0 $, specifying how many arguments it accepts. A particular predicate language is determined by the choice of these symbols. For example, introducing the unary function $ s $ and the binary predicate $ g $ yields the formula $ \forall x.\, g(s(x), x) $. Under the interpretation where $ s $ is the successor function and $ g $ means “greater than” over the naturals, this reads “for every natural number $ x $, the successor of $ x $ is greater than $ x $.”

The connectives have their standard meanings: $ \neg X $ is true when $ X $ is false; $ X \to Y $ means “$ X $ implies $ Y $” (vacuously true when $ X $ is always false); $ X \lor Y $ is true when either operand is true; and $ X \land Y $ is true when both are. A sequence joined by $ \lor $ is a disjunction, and one joined by $ \land $ is a conjunction.

The logic programming paradigm proceeds as follows: we assert the truth of a set of formulas (our database), then either query the database about the truth of a particular formula, or present a formula with variables and ask the system to find satisfying instantiations. To make this feasible, formulas are placed into a uniform representation called clausal form through a systematic transformation:

Eliminate implications: Replace every $ A \to B $ with $ \neg A \lor B $.
Move negations inward using De Morgan’s laws and the equivalences $ \neg \forall x.\, P \equiv \exists x.\, \neg P $ and $ \neg \exists x.\, P \equiv \forall x.\, \neg P $, until negation applies only to atomic predicates.
Standardize variables so each quantifier binds a distinct name, renaming as needed (alpha-conversion) to avoid variable capture when pulling quantifiers outward.
Move quantifiers outward to prenex form, preserving their relative left-to-right order.
Skolemize: Replace each existentially quantified variable with a Skolem function of all universally quantified variables whose quantifiers appear to its left. If no universal quantifier appears to the left, use a fresh Skolem constant. For example, $ \exists y.\, \forall x.\, P(x,y) $ becomes $ \forall x.\, P(x, s_1) $ (Skolem constant), while $ \forall x.\, \exists y.\, P(x,y) $ becomes $ \forall x.\, P(x, s_1(x)) $ (Skolem function), because the choice of $ y $ depends on $ x $.
Drop universal quantifiers, since all remaining variables are implicitly universally quantified.
Convert to Conjunctive Normal Form (CNF) by distributing $ \lor $ over $ \land $ using $ A \lor (B \land C) = (A \lor B) \land (A \lor C) $.

The result is a conjunction of clauses, each a disjunction of possibly negated predicates. Reordering so unnegated predicates come first, collecting the negations, and reintroducing implication, each clause becomes:

\[ (P_1 \lor \cdots \lor P_j) \mathbin{:-} (P_{j+1} \land \cdots \land P_n) \]

The disjunction on the left is the head, the conjunction on the right is the body, and the meaning is: if the body is true, then the head is true. This representation admits resolution, a powerful inference method. Resolution has the property that if a set of clauses is logically inconsistent, it can derive the empty clause (:-), representing a contradiction. Hence, to determine whether clauses $ \{C_1, \ldots, C_n\} $ imply clause $ D $, we show by resolution that $ \{C_1, \ldots, C_n, \neg D\} $ is inconsistent.

Horn Clause. A Horn clause is a clause whose head contains at most one predicate. A headed Horn clause has exactly one predicate in its head (meaning "if all these body facts hold, then this one head fact holds"). A headless Horn clause has no head and serves as a query: it either is true or it is not. A Horn clause with no body is simply a declaration of a fact. Horn clauses are sufficient to express any computable function. A logic program is a set of Horn clauses with exactly one headless clause representing the query. By combining resolution with unification for matching, we can decide whether the query is satisfiable and find the required variable substitutions. This is the essence of logic programming.

Introduction to Prolog

Prolog programs have three major components: facts, rules, and queries. Facts and rules are entered into a database (loaded from a Prolog file) before computation; queries are entered interactively and serve as the means by which programs are “run.”

A fact like cogito(descartes). establishes a predicate cogito/1 (the functor cogito with arity 1) asserting that descartes satisfies it. Facts end with a period. Querying ?- cogito(descartes). returns Yes; querying ?- cogito(aristotle). returns No, since this cannot be deduced from the database.

A rule deduces new facts from existing ones. The rule sum(X) :- cogito(X). states that for all X, sum(X) holds whenever cogito(X) does – cogito ergo sum. Variables in Prolog must begin with a capital letter; all other identifiers begin with lowercase. When a query contains variables, Prolog reports satisfying assignments; pressing ; after an answer requests the next solution, while pressing Enter abandons the search. Queries may contain conjunctions: ?- cogito(X), philosopher(X). is satisfied only by values of X satisfying both predicates simultaneously. Multiple clauses for the same predicate do not conflict, even if they imply overlapping results.

Data Structures and Lists

Prolog exhibits uniformity: code and data share the same syntax. Data structures called structures consist of a functor followed by zero or more components, e.g., book(greatExpectations, dickens). Since Prolog is untyped, structures need not be used uniformly – book could appear with different arities and nestings. Structures are distinguished from predicates by context: structures appear nested inside predicates, while predicates appear at the top level. Queries can match against any component of a structure using variables, but the functor position itself cannot be a variable.

Prolog also supports lists, enclosed in square brackets with elements separated by commas. The empty list is []. The cons operator | decomposes lists: [X|Y] denotes the list with head X and tail Y. Any number of elements may precede |, so [1,2,3] is equivalent to [1,2|[3]], [1|[2,3]], or [1,2,3|[]]. This flexible syntax enables patterns like [10, 20 | X] that assign variables to only part of a list.

Unification and Search

Execution of Prolog programs is based on two principles: unification (Robinson’s algorithm) and depth-first search. When answering a query, Prolog searches the database top-to-bottom, attempting to unify each fact or rule head with the current query. If the query unifies with a fact, Prolog returns the most general unifier (MGU), or simply “Yes” if there are no variables. If it unifies with the head of a rule, Prolog applies the MGU to the rule’s body and recursively attempts to satisfy that body, which becomes the new query. Predicates in a compound query are processed left-to-right; when a rule body replaces a matched predicate, it is prepended to the remaining predicates, producing depth-first search with backtracking. Variable renaming may be necessary during this process to avoid collisions between unrelated variables sharing a name.

The unification algorithm handles objects (atoms), variables, and compound terms. Two objects unify only if they are identical. A variable $ X $ unifies with any term $ T $ via the substitution $ \left[T/X\right] $. Two compound terms $ R_1(T_{1,1}, \ldots, T_{1,n}) $ and $ R_2(T_{2,1}, \ldots, T_{2,m}) $ unify only if their functors match ($ R_1 = R_2 $), their arities match ($ m = n $), and their arguments unify pairwise with accumulated substitutions applied at each step. Notably, Prolog omits the occurs-check for efficiency. This means querying equal(X, f(X)) against the rule equal(X,X) produces the nonsensical infinite structure X = f(f(f(...))). The built-in unify_with_occurs_check/2 restores the check when needed, but since Prolog is Turing-complete and some programs will not halt anyway, the omission is less problematic here than in type-checking.

Database ordering matters critically for Prolog’s search. Consider a database with a recursive rule p :- u, p. as the first entry versus the last. If placed first, Prolog immediately enters an infinite loop before ever discovering solutions that would follow from later rules. If placed last, solutions are found first, and the recursion only produces additional (possibly infinite) answers afterward. Thus, the order of entries in the database can determine whether a query terminates.

Values, Operators, and Programming in Prolog

Prolog supports numbers with arithmetic comparison operators (>, <, >=, =<, =:=, =\=) and the is operator for arithmetic evaluation. The is operator only works when one side is a variable and the other is a fully resolved arithmetic expression; Prolog cannot “solve” for unknowns in arithmetic. The = operator forces structural unification, which is distinct from numeric equality: X = Y succeeds when X and Y are structurally identical (or can be made so), while X is Y succeeds only when Y evaluates to a number equal to X.

To mimic functions with predicates, we add extra arguments for results. Rather than saying $ f(a,b) = c $, we define a predicate f(a, b, c) and write facts and rules relating the third argument to the first two. The classic example is list append:

append([], Y, Y).
append([X|T], Y, [X|Z]) :- append(T, Y, Z).

These two clauses state declarative facts about list concatenation; Prolog’s search algorithm derives the actual computation. We never write algorithms directly – the “algorithm” is the implied combination of the declarative rules and the search strategy.

List reversal can be written directly (reverse([X|Y], R) :- reverse(Y, Z), append(Z, [X], R)) or with an accumulator (reverse2([X|T], Y, R) :- reverse2(T, [X|Y], R)). The accumulator version is linear rather than quadratic, but it does not correspond to any obvious logical fact about lists. Crafting declarations that are formally correct as logical statements while also guiding the search engine toward efficient computation is a central skill in logic programming.

Control Flow: The Cut

The cut, written !, commits Prolog to choices made so far, pruning the search tree. When Prolog reaches a cut in a rule, it commits to the current resolution of all predicates before the cut and also commits to the choice of that rule for the predicate on the left-hand side. No further alternatives will be explored for either. There are three main uses:

Confirming the correct match. Consider sum_to(1, 1). and sum_to(N, R) :- M is N-1, sum_to(M, P), R is P+N. Without a cut, the constant 1 matches both the pattern 1 in the first rule and N in the second, causing infinite recursion when the user requests additional solutions. Writing sum_to(1,1) :- !. tells Prolog that the first rule is the correct and only choice when the argument is 1. The same technique makes lookup/3 return only the first match in an association list.

Exceptions to the rule. Combined with fail (a predicate that always fails), the cut encodes exceptions. The rules goodPet(porcupine) :- !, fail. followed by goodPet(X) :- smallMammal(X). ensure that porcupines are rejected: the cut prevents backtracking to the general rule after the specific one matches, and fail causes that specific match to fail. Other small mammals pass through to the second rule normally. The cut must come before fail, since nothing after a fail is ever reached.

Preventing needless computation. For member/2, member(X, [X|_]) :- !. eliminates redundant searching when an element appears multiple times in a list.

Negation

Prolog provides a negation meta-predicate not/1 (or \+, read “cannot be proven”): not(P) succeeds if and only if the attempt to satisfy P fails, discarding any variable bindings made during that attempt. Using call/1 (which treats its argument as a goal), !, and fail, not can be defined in Prolog itself:

not(X) :- call(X), !, fail.
not(X).

If the argument succeeds, the cut commits and fail forces failure; if the argument cannot be satisfied, the first rule fails outright and the second (which always succeeds) fires.

However, not is not true logical negation. Double negation not(not(P)) does not preserve variable bindings from $ P $. For example, with animal(dog) and vegetable(carrot) in the database, the query not(not(animal(X))), vegetable(X) returns X = carrot. This happens because evaluating animal(X) binds X = dog, the inner not fails, the outer not succeeds, but bindings are erased – so X is again uninstantiated and free to unify with carrot. This behavior is why some implementations prefer the name \+ to emphasize the difference from logical negation.

Sorting Examples

Sorting illustrates the generators and tests pattern. A naive sort declares sort(X, Y) :- permute(X, Y), sorted(Y), !. The generator permute/2 produces all permutations of a list; the test sorted/1 checks whether a permutation is in order. The predicate sorted/1 is straightforward: empty and singleton lists are sorted, and sorted([X,Y|Z]) :- X =< Y, sorted([Y|Z]). A generator permute can be implemented using a selection predicate that picks elements by index, or more concisely by using append as a generator to split a list at every position. Either way, the naive sort takes $ O(n!) $ time.

A merge sort is dramatically more efficient. It requires three components: dividing a list in half (via mutually recursive divide/3 and divide2/3 that interleave elements into odd- and even-indexed sublists), merging two sorted lists (by comparing heads and recursing on tails), and the recursive merge sort itself:

merge_sort([], []) :- !.
merge_sort([X], [X]) :- !.
merge_sort(List, Sorted) :-
    divide(List, L1, L2),
    merge_sort(L1, S1), merge_sort(L2, S2),
    merge(S1, S2, Sorted), !.

The cut in each clause frees Prolog from searching for alternative sortings. Although merge_sort always produces the same result as the naive sort, Prolog’s search algorithm finds it far more quickly. This is why understanding the search algorithm is essential for effective Prolog programming.

Database Manipulation and Difference Lists

Prolog allows programmatic modification of its database through asserta/1 and assertz/1 (adding clauses to the beginning or end, respectively) and retract/1 (removing the first matching clause). These effects are not undone by backtracking, making them powerful but dangerous; excessive use leads to highly unreadable programs.

Difference lists are a technique for efficient accumulator-based programming. A difference list represents a list with an uninstantiated tail – a “hole” – e.g., [1,2,3|X]. By instantiating the hole, elements can be appended at the tail. If the hole is instantiated to another difference list, the result is still a difference list. This yields constant-time append:

appendDL(List1, List2, List2, Hole2, List1, Hole2).

Since each difference list carries a reference to both the list and its hole, appending reduces to a single unification. This solves the reversal problem inherent in head-consing accumulators: since elements are added at the tail rather than the head, the output maintains the original order. A common pattern wraps a difference list in a structure dl(List, Hole) and finalizes by unifying the hole with []. For example, an increment-all-elements predicate using difference list accumulators produces the correct order [2,3,4] from input [1,2,3], whereas the naive accumulator version would produce [4,3,2].

Datalog and Relational Databases

The terminology of logic programming echoes that of relational databases: clause sets are “databases,” and predicates and structures are relations. When this overlap was discovered, researchers sought subsets of Prolog useful for data querying but not Turing-complete, ensuring all queries terminate. These subsets are collectively called Datalog. Datalog is syntactically a subset of Prolog, and all queries that halt in Prolog and are valid in Datalog yield the same results.

Datalog’s restrictions typically include: removing !, fail, and call; running all queries to completion with no guaranteed order; disallowing non-atomic structures (no nested structures); rejecting fully recursive queries; requiring every variable in the head of a rule to appear in a non-negated body predicate; and stratifying negation, numeric operations, and lists so that each resolution step moves closer to base facts. These guarantees make Datalog non-Turing-complete but still highly useful. Crucially, without a fixed resolution order, a query optimizer is free to reorder predicates for efficiency, and such optimization is often required to guarantee termination.

Relational databases were developed independently and rest on the relational algebra, expressed through SQL (Structured Query Language). In database terminology, relations defined only as facts are tables (extensional database predicates), those defined with rules are views (intensional database predicates), individual facts are rows, and relation arguments are fields. A join creates a new relation by matching fields across tables. For example, given employee/2 and supervisor/2 tables, a supervisor_by_name view joins them on employee numbers. The natural join matches fields with the same names. Fields may be designated as keys (unique identifiers) to prevent duplicate rows.

Queries with variables are selections. Although the original relational algebra disallowed recursion, most modern database systems support recursive views. For instance, a transitive closure of the supervisory hierarchy can be expressed as a recursive view nhigher(M, N) :- nhigher(P, N), supervisor(M, P). with base case nhigher(M, N) :- supervisor(M, N).

Query Optimization

Query optimization chooses an evaluation order based on structural analysis and cost models. Join optimization prefers resolving predicates that reduce the number of unrestricted variables before those that increase them. Consider the partially-evaluated query nhigher(P, X), supervisor(1, P). Resolving supervisor(1, P) first introduces no new predicates and resolves $ P $; resolving nhigher(P, X) first would introduce new variables and predicates. This choice can mean the difference between termination and non-termination: in Prolog, where order is fixed, putting the recursive rule before the base case causes infinite expansion; a query optimizer in Datalog reorders to avoid this. Most practical database systems implement Turing-complete extensions to SQL but preserve the ability to optimize queries, making correctness of optimization a challenging ongoing problem.

Chapter 7: Imperative Programming

The English word “imperative” is synonymous with “command” and shares its etymological root with “emperor.” In programming languages, an imperative language is one whose fundamental behavior is described by commands executed in sequence. Whereas functional languages take expressions as the basic unit of behavior with largely implicit evaluation order, imperative languages take statements as the basic unit, and execution order matches the textual order of statements. Formally, an imperative is a single step, typically a single statement, though some statements contain nested sub-statements.

For commands to communicate information from one to the next, all imperative languages provide mutable variables. In functional languages, variables (let bindings and function arguments) are immutable, with mutability encapsulated in references or monads. In an imperative language, statements may modify variables, and the meaning of any statement depends on the current values of the variables it references at the time of execution.

Nearly all modern imperative languages support procedures – parameterized lists of steps. Strictly, a procedure differs from a function because functions are defined by expressions and are referentially transparent, but in practice the terms are used interchangeably. Procedural languages are imperative languages with procedures (virtually all imperative languages qualify). Structured languages are those where control flow constructs (conditions and loops) are reflected in syntactic structure rather than via goto jumps. Since nearly all modern imperative languages are both structured and procedural, these terms are largely interchangeable. A language does not need procedures to be Turing-complete, but it must have some mechanism for repetition, so a non-procedural imperative language must at least have loops.

Exemplar: Pascal

Pascal, designed by Niklaus Wirth around 1970, is an excellent exemplar because a Pascal program is fundamentally a tree of procedures. Unlike many imperative languages, Pascal allows procedures to nest (much as functions nest in functional languages), with lexical scoping throughout. Pascal is statically typed but weakly typed.

Pascal distinguishes three kinds of subroutines: programs (the outermost routine), procedures (subroutines with no return value, used for side effects), and functions (subroutines that return a value). We use “subroutine” to refer to all three collectively.

A Pascal subroutine consists of a header, a declaration list (variables, nested subroutines), and a body enclosed in begin/end. Declarations appear between the header and begin, separated from executable code – a historical convention that C did not officially abandon until C99. Statements are separated by ;, not terminated by it (a trailing ; is allowed because Pascal permits empty statements). The assignment operator is :=. To return a value from a function, one assigns to the function’s name; this does not terminate execution but merely sets the return value, consistent with structured programming’s principle that blocks always run to completion.

Declaration lists form environments with lexical scoping: subroutines can access and modify variables from their own or any enclosing scope. Each invocation of a subroutine gets its own set of local variables, but variables from enclosing scopes are shared across invocations. A program’s name is not part of its own scope, so programs cannot be directly recursive; other subroutines, however, can call themselves. Pascal provides if-then-else, while-do loops, for-do loops, and repeat-until loops. The latter two are syntactic sugar for while.

The Simple Imperative Language

The $ \lambda $-calculus served well as a foundation for functional languages because it is built from functions, but it does not fit imperative languages well: it lacks explicit sequencing of commands. The Simple Imperative Language (SIL) is a minimal language designed to capture the core ideas of imperative programming:

⟨stmtlist⟩ ::= ⟨stmt⟩ | ⟨stmt⟩ ; ⟨stmtlist⟩
⟨stmt⟩ ::= skip | begin ⟨stmtlist⟩ end
          | while ⟨boolexp⟩ do ⟨stmt⟩
          | if ⟨boolexp⟩ then ⟨stmt⟩ else ⟨stmt⟩
          | ⟨var⟩ := ⟨intexp⟩

Integer expressions include literals, variables, addition, multiplication, and negation. Boolean expressions include true, false, not, and, or, and comparisons (>, <, =). The skip statement is a no-op whose purpose becomes clear in the semantics. Subtraction is syntactic sugar for adding a negation. There is no goto; despite being called simply “imperative,” SIL is a structured language.

Operational Semantics

Formulating operational semantics for imperative languages requires care. Unlike functional languages, where the result is a computed value, imperative programs are executed for their side effects on a mutable store $ \sigma $. The semantics operate over pairs $ \langle S, \sigma \rangle $, defining reductions $ \langle S, \sigma \rangle \to \langle S', \sigma' \rangle $, where $ S $ is a program and $ \sigma $ is a store mapping variable names to values. Execution begins with an empty store; a program terminates when it reduces to $ \langle \mathtt{skip}, \sigma \rangle $, and the final $ \sigma $ is the result. In functional language semantics, $ \sigma $ never changes across the $ \to $ morphism; here, $ \sigma $ itself is allowed to mutate, though only the := statement modifies it.

The expression sublanguages (integer and boolean) are purely functional: $ \sigma $ never changes during their evaluation. The store appears on both sides of $ \to $ for uniformity. Integer expression rules reduce binary operations by evaluating the left operand first (IntOpLeft), then the right (IntOpRight), then performing the arithmetic (Add, Mul). The Var rule looks up a variable in $ \sigma $; if the variable is undefined, the semantics get stuck – the only way integer expression evaluation can fail.

Boolean expression rules handle comparisons (reducing both integer operands, then comparing), not (reducing the subexpression, then flipping), and short-circuit evaluation for and and or. For instance, false and B reduces directly to false without evaluating B (the AndSC rule), and true or B reduces to true (the OrSC rule). The operands of comparison operators must be integer expressions; since variables cannot hold booleans, this creates a strict stratification between integer and boolean expression sublanguages.

For statements, the key reduction rules are:

BlockOnly/BlockRest: begin L end unwraps to L (two cases needed because statement lists have two forms).
Skip: skip; L drops the leading skip, proceeding to the rest of the statement list.
StmtList: The first statement in a list may take a reduction step while the rest waits.
AssignStep/Assign: First reduce the right-hand side to a value, then update the store: $ \langle x := N, \sigma \rangle \to \langle \mathtt{skip}, \sigma\left[x \mapsto N\right] \rangle $.
IfCond/IfTrue/IfFalse: Reduce the condition to a boolean, then replace the if with the appropriate branch.
While: $ \langle \mathtt{while}\ B\ \mathtt{do}\ Q, \sigma \rangle \to \langle \mathtt{if}\ B\ \mathtt{then}\ \mathtt{begin}\ Q;\ \mathtt{while}\ B\ \mathtt{do}\ Q\ \mathtt{end}\ \mathtt{else}\ \mathtt{skip}, \sigma \rangle $.

Loops

The While rule is unusual: it “reduces” a while statement into a longer if statement that still contains the original while. The body executes only when the condition is true, and repetition arises from the loop being re-embedded in the then branch. When the condition is false, the if reduces to skip and the loop ends. A real implementation does not literally mutate the statement list, but the semantics accurately describe the logical steps. Pascal’s while loops have exactly this form; for-do and repeat-until are syntactic sugar.

The Initialization Problem

SIL is nearly type-safe due to expression stratification: variables can only store integers, and it is syntactically impossible to produce a type error. The one failure mode is accessing an uninitialized variable (e.g., x := y when y was never assigned), which causes the Var rule to get stuck. This is the initialization problem, a surprisingly pervasive issue in imperative language design. Four approaches address it:

Garbage values: An uninitialized variable holds whatever happened to be in its memory location previously. Used by Pascal, C, and C++. Formally modeled by a non-deterministic rule that allows any integer, which semanticists generally avoid.
Default values: Every type has a designated default (e.g., 0 for integers). Used by Java and JavaScript. Modeled by a rule that yields 0 when a variable is not in $ \sigma $.
Forced initialization: The syntax requires a value at declaration time, as with let bindings in functional languages. Rare in imperative languages because it complicates recursive data structures like circular linked lists.
Static analysis: Analyzing code without executing it to detect possibly uninitialized variables. Many compilers for C and C++ warn about this, though few language specifications mandate specific analyses.

The difficulty lies in defining “before” with respect to initialization: with conditionals and loops, it is not always clear whether one statement will definitely execute before another.

Types

A type judgment for SIL addresses the initialization problem through a lightweight static analysis. Statements do not produce values, so they are judged well-typed without a type ($ \Gamma \vdash Q $), while expressions have types ($ \Gamma \vdash E : \tau $). The type language has just int and bool. All programs are assumed to end with skip so that every statement always has a “rest” to propagate type information into.

The key typing rules for statements are T_Assign and T_Reassign. T_Assign handles a variable’s first assignment: it checks that the variable is not yet in $ \Gamma $, infers the type of the right-hand expression, and extends $ \Gamma $ with the variable’s type for the remaining statements. T_Reassign handles subsequent assignments: it requires the new value’s type to match the existing type in $ \Gamma $. Variables first assigned inside an if or while have their type visible only within that construct, effectively introducing lightweight lexical scoping into a language that nominally has only a single global scope. The T_If and T_While rules check conditions against bool and verify sub-statements independently, appending ; skip to convert single statements into statement lists for uniform treatment.

Procedures

SIL has no procedures and only a global scope. Adding procedures yields SIL-P (Simple Imperative Language with Procedures). Procedures are stored in $ \sigma $ alongside variables. The syntax extends SIL with procedure declarations (keyword procedure, a name, parameter list, local variable declarations, and a begin-end body) and procedure calls (x(args)).

The ProcDecl rule stores the entire procedure definition in $ \sigma $, treating it like an assignment. The ProcCall rule performs invocation through a technique called freshening: for each parameter and locally declared variable, a fresh variable name is created and substituted throughout the body. The call is then replaced by a begin-end block containing assignment statements for each argument (binding fresh parameter names to argument values) followed by the freshened body. This isolates each call’s variables without requiring nested stores or scoping mechanisms. For example, evaluating fac(2) with the recursive factorial procedure first freshens n to n_1 and inlines the body. When the body recursively calls fac(1), n is freshened to n_2. Both n_1 and n_2 coexist in the flat store, while the global variable x (used for the result) remains shared across all calls.

A consequence of this approach is that $ \sigma $ grows monotonically; entries are never removed. Real implementations reclaim storage via stack frame popping and garbage collection, but the formal semantics simply accept unbounded growth.

Procedures and Types

Procedures complicate typing in two ways. First, since procedures are not first-class values in SIL-P (the semantics get stuck if an expression reduces to anything other than an integer), the target of any call is always known statically. Second, statement ordering interacts with initialization: a procedure might assign to a global variable that is defined (or type-assigned) after the procedure itself. Pascal’s solution is explicit type declarations with declared variable types. A procedure type is a constructed type defined by the types of its arguments: procedure(⟨typelist⟩). Unlike functions in functional languages, procedures may have zero arguments and no return type. Pascal avoids the complications of a void return type by cleanly separating procedures (no return value) from functions (with a return value).

Arrays and References

Arrays are as fundamental to imperative languages as lists are to functional ones. An array is a mapping from integer indices to values with constant-time access, generally of fixed size. To allow procedures to modify arrays visible to their callers, arrays are made a reference type: only labels (pointers into a heap $ \Sigma $) appear in $ \sigma $, and the actual array data lives in $ \Sigma $. This reintroduces the heap into reductions, which now operate over triples $ \langle \Sigma, \sigma, x \rangle $. All previously defined reductions leave $ \Sigma $ unchanged.

The extended language SIL-PA (Simple Imperative Language with Procedures and Arrays) adds three constructs:

Allocation: array[M] reduces its argument to a natural number $ N $, allocates a fresh label $ \ell $, stores an $ N $-element zero-initialized array at $ \Sigma(\ell) $, and reduces to $ \ell $.
Indexing: M_1[M_2] reduces both expressions to values, then extracts the $ M_2 $-th element from the array at label $ M_1 $ in $ \Sigma $.
Array assignment: x[M_1] := M_2 looks up $ x $ in $ \sigma $ to get a label $ \ell $, retrieves the array from $ \Sigma(\ell) $, constructs a new heap with the $ M_1 $-th element replaced by $ M_2 $, and reduces to skip.

The ArrAssign rule is the most complex reduction rule encountered so far, requiring that $ x $ maps to a label, the label maps to an array, and the index falls within bounds. If two variables share the same label (aliasing), modifications through one are visible through the other, since both share the same heap entry. Out-of-bounds access causes the semantics to get stuck, reflecting Pascal’s lack of bounds checking. Integrating array bounds into the type system remains an open challenge that has spawned the field of dependent type systems, where types can depend on runtime values such as array sizes.

Records

Records (called structs in C) provide an alternative to arrays for storing compound data. A record type defines a mapping from a fixed set of names to values; each name-value pair is a field. All values of a given record type share the same field names but may have different field values. In Pascal, a record is declared with record ... end and may be given a type alias (e.g., type point = record x: integer; y: integer end). Fields are accessed with dot notation (b.x), where the field name must appear literally in the code – arbitrary field access via expressions is not supported.

Unlike arrays, which hold elements of a single type, records can contain fields of different types, making them especially important in typed languages. Declaring a variable with a record type in Pascal allocates space on the enclosing subroutine’s stack; the record becomes invalid when that subroutine exits. Records add no expressive power beyond arrays in an untyped setting (fields can be mapped to indices), but their named-field structure became a fundamental building block for object-oriented programming, which is the subject of the next chapter.

Chapter 8: Object-Oriented Programming

Object-Oriented Programming (OOP) is a paradigm that gained considerable popularity over recent decades. Its advocates emphasize information hiding and code reuse, while its critics contend that the hierarchical type systems characteristic of OOP do not always model real-world relationships accurately. Objects first appeared in 1967 in Simula, a descendant of Algol, where they arose as a byproduct of simulation-oriented design. Later languages, most notably Smalltalk, refined these features into modern OOP. Widespread adoption did not occur until the 1990s. Today, OOP is arguably the most popular paradigm, yet very few languages are purely object-oriented. Smalltalk is one such language, but it is difficult to name a second.

In practice, OOP functions as a “meta-paradigm” combinable with others. Ada 95 and C++ are procedural languages with object support; Java enforces stricter OO discipline but its methods remain structured programming; CLOS and OCaml graft object-orientation onto functional foundations. Smalltalk has been described as “so object-oriented that it’s barely a procedural imperative language” because even numbers are objects and control flow constructs (conditionals, loops) are encapsulated into objects rather than built-in syntax.

What OOP Is(n’t)

There is no broad consensus on the defining features of an OO language. At minimum, OOP has objects: the encapsulation of data and behavior into a single abstraction, viewable as records with associated code or code with associated records. Other commonly associated features include:

Data Abstraction: separating interface from implementation behind abstraction barriers;
Inclusion Polymorphism: types arranged in hierarchies by a subtyping relationship allowing some types to be used in place of others;
Inheritance: objects sharing implementation details with other objects;
Dynamic Dispatch: the actual code for a method invocation may not be determinable statically, a form of dynamic binding.

Exemplar: Smalltalk

Smalltalk was developed at Xerox PARC by Alan Kay, Dan Ingalls, Adele Goldberg, and others. Xerox PARC pioneered the graphical user interface, WYSIWYG editing, fully-fledged OOP (Smalltalk), and prototype-based object orientation (later monetized as JavaScript). Notably, the GUI pioneered at PARC was Smalltalk: programming in Smalltalk means interacting with a graphical environment in which classes are created visually, with no standalone source files.

Smalltalk dispenses with main functions. Everything is an object, and many objects (like true and false) must already exist for basic operations, making a traditional entry point impractical. Software is distributed as images, serialized snapshots of the entire object environment. For our purposes, we adopt GNU Smalltalk’s file-based syntax and a Java-like convention of a main class with a main method. Since Smalltalk is untyped, type annotations use Strongtalk syntax, e.g. hypotenuseWithSide: x <Number> andSide: y <Number> ^<Number>.

Classes

Most OO languages use classes to describe objects. A class describes a set of objects with identical behavior and form but different internal state; classes form the types of an OO type system. In a purely OO language, the global scope contains only classes, and all imperative code must be boxed into classes. A class contains fields (like record fields) and methods (like procedures, but associated with a class). The object a method is called on is the receiver (Smalltalk) or target, accessible within the method as self or this.

A class is declared as a subclass of some existing class, its superclass. The subclass inherits all fields and methods from the superclass and may add new ones. A subclass may also override (re-define) inherited methods. In an overridden method, the superclass’s original implementation can be invoked via super. Consider Rectangle with setWidth: and setHeight: methods, and Square overriding them so that setting either dimension sets both. Crucially, if self is a Square, calling self setWidth: within Rectangle’s code dispatches to Square’s override.

Since a subclass has at least all the same fields and methods as its superclass, a Square is a Rectangle: anything you can do with a Rectangle, you can do with a Square. But not all Rectangles are Squares.

Semantics

The semantics of an OO language extend those of an imperative language. We define a simplified semantics loosely based on Featherweight Java, with several restrictions to keep the language tractable. Field access is syntactically distinct from variable access, using an arrow (e.g., self->width). Every method has exactly one return statement, at the end. All statements take one of a few simple forms: assignment of a method call result, assignment of a block, assignment of a variable, field read (x := self->y), field write (self->y := x), or return (^x). Since even mathematical operators are methods in Smalltalk, any complex expression can be broken into individual steps.

The store $ \sigma $ holds global class declarations and freshened local variables; the heap $ \Sigma $ stores objects referenced by labels. An object is defined by its class name and the values of all its fields. A $ \text{resolve} $ function populates $ \sigma $ with all defined classes from the global scope.

The key semantic rules are:

New: Creating an object via x1 := x2 new gathers all fields (including inherited ones) via $ \text{fields}(\sigma, x_2) $, creates an object with all fields initialized to nil, allocates a fresh label $ \ell $ in $ \Sigma $, and maps $ x_1 \mapsto \ell $ in $ \sigma $.

Block evaluation: Evaluating x := y value looks up the block in $ \sigma $, inlines its statements, and assigns the value of its last statement (restricted to a single variable name) to x.

Variable assignment and field access: Simple variable assignment copies the value in $ \sigma $. Reading x1 := x2->y looks up the object in $ \sigma $, follows the label to $ \Sigma $, and extracts the named field’s value. Writing x1->y := x2 similarly locates the object and updates the named field in $ \Sigma $, leaving all other fields unchanged.

Method lookup and calls: Calling a method requires finding it. The function $ \text{method}(\sigma, x, y) $ searches class $ x $ for method $ y $; if found, it extracts the method body; if not, it searches the superclass, continuing up the hierarchy until it either finds the method or gets stuck. Method invocation freshens all variable names to avoid conflicts (as with procedure calls in imperative languages), assigns the freshened self to the target object’s label, copies arguments into freshened parameter names, inlines the method body, and writes the return value back to the caller’s target variable. Surprisingly, the semantics need no built-in conditions or loops, because these are constructed from True and False classes exactly as Smalltalk does.

Inclusion Polymorphism and Subtyping

Inclusion polymorphism arranges types in a subtype hierarchy. Type $ a $ is a subtype of $ b $ if a value of type $ a $ can be used wherever type $ b $ is expected. The relation $ <: $ is reflexive, transitive, and typically a partial order. The key typing rule is subsumption:

\[ \dfrac{\Gamma \vdash e : \tau_1 \qquad \tau_1 <: \tau_2}{\Gamma \vdash e : \tau_2} \quad \text{(T\_Subsumption)} \]

For records, structural subtyping follows two principles. Width subtyping: adding fields creates a subtype (a record with fields $\{x: A, y: B, z: C\}$ is a subtype of $\{x: A, y: B\}$). Depth subtyping (covariance): subtyping propagates into fields. For function types, subtyping is contravariant in the parameter and covariant in the return type:

\[ \dfrac{\tau_3 <: \tau_1 \qquad \tau_2 <: \tau_4}{\tau_1 \to \tau_2 \;<:\; \tau_3 \to \tau_4} \]

Contravariance in the parameter means a function accepting a more general input can safely replace one expecting a more specific input.

Structural Subtyping

In structural subtyping, two classes are related by $ <: $ based solely on their method signatures. If every method in $ \tau_1 $ has a corresponding method with the same signature in $ \tau_2 $, then $ \tau_2 <: \tau_1 $. Structural subtyping is sometimes called duck typing, though that term more often describes informal types in dynamically-typed languages. It differs from parametric polymorphism by the presence of a hierarchy: a method with parameter type $ \tau $ may receive a subtype but can still rely on $ \tau $’s definition.

Nominal Subtyping

In nominal subtyping, the explicit subclass declaration defines the subtyping relationship. Since inheritance automatically gives subclasses all superclass methods, the inheritance relationship suffices for subtyping. Languages differ on first classes: C++ allows classes without a superclass, while Smalltalk and Java have a single root class Object defined by the implementation.

Nominal subtyping is justified by implementation concerns. Classes compile into virtual tables (vtables), arrays of pointers to method code. A subclass’s vtable extends its superclass’s vtable, maintaining compatibility. This design makes structural subtyping essentially impossible because vtable compatibility depends on declaration order, not just types. Modern JIT compilers can recover some structural subtyping through type profiling, but that is beyond our scope.

Methods, Subtyping, and Encapsulation

Smalltalk methods come in three forms: nullary (area), operators (*), and n-ary with colon-separated names (setWidth:setHeight:). Method types follow the form $ (\tau_1, \ldots, \tau_n) \to \tau_r $. Methods are not values in class-based languages; they exist only as part of object types.

Method subtyping requires covariant return types (a subclass method may return a more specific type) and contravariant parameter types (a subclass method may accept a more general type):

\[ \dfrac{\forall v \in (1,N).\ \tau_{2,v} <: \tau_{1,v} \qquad \tau_{1,r} <: \tau_{2,r}}{(\tau_{1,1}, \ldots, \tau_{1,n}) \to \tau_{1,r}\ <:\ (\tau_{2,1}, \ldots, \tau_{2,n}) \to \tau_{2,r}} \]

The hidden parameter self is covariant: the very fact that a method was found in Square means self is a Square.

Regarding encapsulation, if fields are fully accessible, they must be invariant in type when overridden. Covariant field overrides fail because a superclass method could write an incompatible value; contravariant overrides fail because a superclass method could call methods not available on the more general type. Since there is no useful way to override fields, Smalltalk makes fields completely private, even inaccessible from subclasses. Conversely, all Smalltalk methods are public. Most languages choose a compromise with private/public declarations. Private fields and methods are disregarded in the subtyping relation since they do not affect public interfaces.

Overloading

Overloading allows different entities in the same context to share the same name, a limited form of polymorphism. In method overloading, the compiler examines the number and type of arguments (and sometimes the return type) to select the correct method, a process called overload resolution. For example, a Money class could have two + methods: one taking Money and one taking Number. The expression m + 42 selects the Number version because Integer is not a Money but is a subtype of Number. Resolution can get vastly more complicated, and in some languages like C++, the “closest match” algorithm is not always intuitive.

Overloading introduces complications: we lose erasability because types affect which method is called. Erasing types from Strongtalk yields Smalltalk, which is why Strongtalk does not allow overloading. Implementations resolve this with name mangling, a type-directed compilation step that renames methods to system-generated names encoding the surrounding class, argument types, and possibly return type (e.g., Money>>+<Number> or something like _ZN5MoneyplE6Number).

Type Hierarchy: Top, Bottom, and Null

In Smalltalk, every class is a subclass of Object, so $ \forall C.\ C <: \text{Object} $. This makes Object the top $ \top $ of the type hierarchy. C++ has no root class and uses templates (a form of parametric polymorphism) instead; Java uses coercion, an ad hoc form of polymorphism that explicitly converts values at runtime (e.g., Object o = 42; boxes the integer into an Integer object).

The bottom type $ \bot $ would be a subtype of every type. Three answers exist: (1) Haskell treats $ \bot $ as the type of errors, exceptions, and infinite loops; (2) some languages have no bottom type, though this creates initialization difficulties; (3) most popular and most controversial is null. The value nil has no methods or fields, yet is declared axiomatically as a subtype of every type: $ \forall \tau.\ \text{nil} <: \tau $. This breaks type safety entirely, since calling a method on nil causes the semantics to get stuck. Tony Hoare called the null reference his “billion-dollar mistake.” The practical justification is the initialization problem: nil provides a universal default value for uninitialized fields. An amusing consequence is that every proof of type safety for an OO language with nil must include the caveat that an expression may also attempt to dereference null.

Every object in the heap carries its class name as run-time type information, enabling casting (checking and converting types at runtime) and flow typing (narrowing types based on control flow analysis, so that a type checker can give v the type Integer inside a branch that has confirmed v is<Integer>, avoiding explicit casts).

Generics

Generics bring parametric polymorphism to classes. A generic class like List[A] is a family of classes instantiated by substituting a specific type (e.g., List[Integer]). Generic types are invariant over their parameters: List[Integer] is not a subtype of List[Number] because setEl: could be used wrongly, and not a supertype because el could return the wrong type. Some languages allow explicit covariant or contravariant parameter declarations, restricting how those parameters appear in method signatures. Generics are erasable; Java erases them for backwards compatibility.

Multiple Inheritance

Multiple inheritance allows a class to inherit from multiple superclasses, creating semantic problems when inherited methods share names. Given a Wallet inheriting from both Money and Rectangle, if both define +, the meaning of v1 + v2 is ambiguous. Most languages ban multiple inheritance entirely. Alternatives include making conflicting methods unusable without explicit casting, explicit priority ordering (Strongtalk prioritizes later superclasses), or (in Java) separating interfaces (method signatures without bodies) from classes, with each class having one vtable for its class and extra vtables for each interface, looked up via hash maps.

Repeated inheritance (the diamond problem) occurs when a class A inherits from B and C, each of which inherits from D. If D has a method g that is overridden in C but not B, then A inherits two conflicting versions of g. Multiple inheritance is also problematic for implementation: virtual tables assume subclass vtables extend superclass vtables, but two unrelated superclasses will want different methods in the first vtable slot. Solutions include disallowing multiple inheritance, multiple vtables, or sparse vtables (where the compiler arranges vtables to avoid conflicts when all classes are visible at compile time).

Blocks

Smalltalk achieves Turing-completeness through blocks rather than built-in control flow. Blocks are objects of the Block class containing statements in the context of their surrounding method. They may define local variables. Classic Smalltalk allocates only one instance of a block’s variables per call to the surrounding method, while modern implementations create separate instances per block invocation.

Return statements in blocks return from the surrounding method, not just the block. This means Smalltalk must be able to break out of multiple stack layers. If a block containing a return statement is stripped from its surrounding method and later evaluated, the return has no valid target, typically causing an error (“return from a dead method context”).

Prototype-Based Languages

Object-based languages lack explicit class constructions. Proponents argue that classes constrain software evolution by requiring foresight about system structure. In Self (the exemplar prototype-based language), objects are defined directly with field and method declarations. Cloning produces shallow copies for creating objects with identical behavior. Prototyping allows an object to delegate behavior to another object via prototype fields (marked with *); method lookup proceeds through the prototype chain, with self still referring to the original object. The constructor pattern uses a prototype object containing methods that creates new objects with appropriate fields and a reference back to itself as prototype. JavaScript popularized this pattern without supporting cloning.

Miscellany

Smalltalk requires implementation-level bootstrapping for Object, blocks, Class, true, false, and nil. Smalltalk images serialize the entire object environment ($ \sigma $ and $ \Sigma $). Just-in-Time Compilation (JIT) was invented for Smalltalk, refined in Self, and popularized in Java and JavaScript.

Chapter 9: Concurrent Programming

Concurrency and Parallelism

Two key terms must be distinguished. Parallelism is two or more computations actually happening simultaneously. Concurrency is the experience of multiple tasks appearing to happen at the same time, whether or not they actually are. A single-core 1990s computer achieved concurrency without parallelism through context switching; a compiler may optimize a loop into a parallel CPU operation, yielding parallelism without concurrency. Modern multi-core systems exploit both.

The central idea of a concurrent system is that multiple tasks communicate with one another, and computation may advance in any of them at any rate. This requires non-deterministic semantics: the ability of any task to proceed is the essence of concurrency. Communication falls into two categories: shared-memory concurrency (tasks share a common heap $ \Sigma $, requiring synchronization such as locks) and message-passing concurrency (tasks send and receive messages over channels, with ordering arising naturally from directionality). The two models are equally expressive; each can simulate the other.

The $ \pi $-Calculus

The $ \pi $-calculus is a minimalist formal model of message-passing concurrency developed by Robin Milner, Joachim Parrow, and David Walker. Its only primitives are concurrency, communication, replication, and names, yet these suffice for Turing-complete computation despite the absence of functions. A program consists of concurrent processes separated by the pipe operator $ \mid $. The only values are channels (names), and the only thing sent over a channel is another channel.

The syntax provides: parallel composition ($ P \mid Q $), receiving on channel $ x $ into variable $ y $ then continuing ($ x(y).P $), sending $ y $ on channel $ x $ then continuing ($ \bar{x}\langle y \rangle.P $), restriction to create a fresh local name ($ (\nu x)P $), replication ($ !P $), and termination ($ 0 $). The pipe has lowest precedence.

Communication occurs when one process sends on a channel and another receives on the same channel. Receipt operates by substitution, as in the $ \lambda $-calculus. For example, in $ \bar{x}\langle h \rangle.0 \mid x(y).\bar{z}\langle y \rangle.0 $, the first process sends $ h $ on channel $ x $ and terminates; the second receives into $ y $ from $ x $, substitutes $ h $ for $ y $, and then sends $ h $ on $ z $. When multiple receivers wait on the same channel, the calculus is non-deterministic: either may proceed, and both reductions are equally valid.

Restriction ($ \nu $ is the Greek letter nu) scopes a name: the $ x $ inside $ (\nu x) $ is distinct from any $ x $ outside. This is analogous to how $ \lambda $-abstractions prevent variable capture during substitution. A top-level restriction can always be eliminated by renaming the restricted name to a fresh variable. Restriction interacts subtly with channel passing: if a process under restriction sends its restricted name over an unrestricted channel, the receiving process obtains the restricted name, potentially enabling further communication.

Since names are both values and channels, processes can send channels over channels, enabling dynamic communication patterns. For example, in $ \bar{x}\langle z \rangle.0 \mid x(y).\bar{y}\langle h \rangle.0 \mid z(a).0 $, the first process sends the name $ z $ over channel $ x $. The second receives it (bound to $ y $), then sends on $ y $, which after substitution becomes a send on $ z $, enabling communication with the third process that was previously unreachable.

Process creation occurs through two mechanisms. First, nesting: a parallel composition inside a receive becomes active only after communication, effectively spawning new processes. Second, replication: $ !P $ produces an unbounded supply of copies of $ P $. For instance, $ !x(a).0 \mid \bar{x}\langle b \rangle.0 \mid \bar{x}\langle c \rangle.0 $ has three senders but one receiver; the $ ! $ replicates the receiver as many times as needed. An important observation is that all reduction sequences are strictly sequential; concurrency arises from non-deterministic choice among valid steps, not from true parallelism in the semantics.

Structural Congruence

Since processes may execute in any order, $ P \mid Q $ is not meaningfully different from $ Q \mid P $. Structural congruence ($ \equiv $) formalizes these equivalences. It is reflexive, symmetric, and transitive, with the following key rules:

C_Alpha: $ \alpha $-equivalent programs are congruent.
C_Order: $ P \mid Q \equiv Q \mid P $.
C_Paren: parallel composition is associative.
C_Termination: $ 0 \mid P \equiv P $.
C_Restriction: $ (\nu x)P \equiv P\left[y/x\right] $ for fresh $ y $.
C_Replication: $ !P \equiv P \mid !P $.
C_Nest: congruence propagates into subprograms.

As a consequence, the only true computational steps are message sends and receives; everything else is structural rearrangement.

Formal Semantics

The formal semantics require just two rules:

\[ \textbf{Congruence} \quad \dfrac{P \equiv Q \quad Q \to Q' \quad Q' \equiv R}{P \to R} \]\[ \textbf{Message} \quad \bar{x}\langle y \rangle.P \mid x(z).Q \mid R \;\to\; P \mid Q\left[y/z\right] \mid R \]

Message is the sole reduction rule: a sender and receiver on the same channel both advance, with the received variable substituted by the sent value. Non-determinism arises through Congruence: different structurally congruent rearrangements enable different applications of Message.

Uses of $ \pi $-Calculus

Many concurrent programming problems reduce to establishing happens-before relationships. The $ \pi $-calculus provides a framework for proving such ordering properties. Proving a relationship holds can require enumerating all interesting reductions; for programs that reduce indefinitely, inductive techniques are needed. In practice, the $ \pi $-calculus is extended with additional features (e.g., combined with the $ \lambda $-calculus) to express internal computation within each process.

Exemplar: Erlang

Erlang is the paradigmatic message-passing concurrent language, created in the late 1980s at Ericsson by Joe Armstrong, Robert Virding, and Mike Williams for telecom systems. Three design goals drove its creation: scaling from single systems to distributed systems with minimal rewriting, process isolation so faults in one process would not affect others, and live replacement of individual processes for zero-downtime upgrades.

Processes play the role that objects play in Smalltalk: nearly all compound data resides within processes, and interaction occurs exclusively through message passing. Unlike Smalltalk, Erlang provides primitive non-process data types (integers, floats, tuples, lists, maps, atoms). Erlang syntax resembles Prolog: variables begin with capital letters, atoms and function names with lowercase.

Code is organized into modules, each a source file with a module declaration and explicit exports identified by name and arity (e.g., merge/2). Processes are created with spawn(Module, Function, Args), which returns a PID (process identifier). Messages are sent with Target ! Message and received with receive expressions that pattern-match on incoming messages. A message is delivered only when the target process executes a matching receive, analogous to the $ \pi $-calculus requirement that a receiver must be waiting.

The classic ping-pong example illustrates these features: ping sends {ping, self()} tuples (including its own PID as a reply channel, just as in the $ \pi $-calculus), receives pong atoms in response, and recurses with a decremented count. The pong process waits for ping messages, replies, and recurses. Because pong sends its reply before printing, various interleavings of output are possible. A cleaner version adds an explicit terminate message so that pong stops recursing when done.

Processes as References

Despite lacking mutable variables, Erlang can implement mutable references using processes. A reference is a dedicated process storing a value. ref(V) spawns the process with initial value V. get(Ref) sends a {get, self()} message and waits for a {refval, V} reply. put(Ref, V) sends {put, self(), V} and waits for acknowledgment. The core refproc function recurses with the same value on get and with the new value on put. If two processes hold the same reference PID, both can read and mutate the stored value, effectively implementing shared-memory concurrency atop message passing.

Processes as Objects

The reference pattern generalizes naturally to objects. An RPN calculator process stores a stack as its recursive parameter, handling push and op messages. Specific operations (add, sub, mul, divide) send function arguments via op. Despite using only immutable variables, the sequential nature of message passing enables stateful object emulation. Object fields can themselves be reference processes, composing naturally since Erlang is designed for thousands of concurrent processes.

Implementation

Erlang uses green threads: the runtime manages lightweight thread switching internally, maintaining each process’s stack as a data structure. When a process executes receive without a matching message, the runtime suspends it and resumes another, with far less overhead than OS-level context switching. The runtime uses as many OS threads as CPU cores, each multiplexing many green threads. Efficient scheduling requires fast pattern matching on stored message patterns.

For message delivery, languages with mutable values must copy messages to prevent shared-memory access. Erlang sidesteps this through pervasive immutability: since values are never mutated, processes can safely share pointers to the same data.

Erlang adopts a “let it crash” philosophy: processes that fail are restarted by supervisors rather than defensively handling every error. Erlang also supports hot code reloading, allowing modules to be recompiled and processes migrated without system downtime.

Chapter 10: Systems Programming

The Trouble with Systems Programming

When formal semantics are applied to real-world software, gaps emerge. What does it mean to get stuck? How much data can appear in $ \sigma $ and $ \Sigma $? What are labels? What happens with integer overflow? In practice, formal semantics prove specific properties (type safety, termination) by eliminating irrelevant features, stopping well short of systems programming.

Systems programming informally means relatively direct interaction with hardware. The boundary of “low-level interaction” is vague, and most languages can gain low-level access through libraries. In practice, the term is restricted to languages with a history of building core system components: operating system kernels, memory managers, and language runtimes. Only a handful qualify: Fortran, Pascal, C, C++, the various assembly languages, and perhaps Rust. The feature all share is pointers, first-class memory addresses essential for writing kernels and memory managers. This is where systems languages conflict with formal modeling: the formal models presented in earlier chapters have never actually modeled memory.

Consider return (&x)[-1]; in C: perfectly valid code whose behavior depends on memory layout. Real implementations also routinely cast data pointers to function pointers. One tempting response is “undefined behavior,” but the C standard’s definition allows documented behavior as one valid interpretation. In a real system, multiple specification layers (ABI, compiler, processor) define behavior that the C standard leaves undefined. With an exhaustive collection of all specifications, there is no truly undefined behavior.

This leaves two problems: all specifications are informal (no basis for formal proofs), and any implementation layer could contain bugs. The second can be addressed through the first: if every layer is proved correct by formal modeling, formal reasoning extends to real programs.

Exemplar: CompCert C

CompCert is a C compiler whose compilation process is formally proved correct. If $ S $ denotes a source program, $ C $ its compiled form, and $ X \Downarrow B $ means “executing $ X $ produces observable behavior $ B $,” then $ S \Downarrow B \Rightarrow C \Downarrow B $ holds for all safe source programs. Establishing this requires formal definitions for the source language, target language, and the compilation mapping between them.

Compilation proceeds through intermediate phases $ C_1, C_2, \ldots, C_n $, each individually proved correct. A compiler is formally defined as a function $ \text{Comp} $ where $ \text{Comp}(S) = \text{OK}(C) $ on success or $ \text{Comp}(S) = \text{Error} $ on failure.

CompCert’s semantics are implemented in Coq, a theorem-proving language far more powerful than Prolog. Coq supports predicates defined as functional programs, and since the semantics are queryable, CompCert’s formal definition naturally yields a (very slow) C interpreter. The advantage over Prolog is that far more can be proved; the disadvantages are that Coq’s proof engine is a specific implementation subject to change and has unpredictable time bounds.

CompCert provides formal semantics for C, several intermediate languages, an idealized “Mach” architecture, and concrete architectures (PowerPC, ARM, x86, AMD-64, RISC-V). Proving each compilation step correct involves showing $ S \Downarrow B \Rightarrow C \Downarrow B $ for every language structure, inductively over subexpressions.

Memory Models

CompCert includes a formal model of C’s pointer-based memory, sufficient for non-concurrent programs. Concurrency introduces far greater complexity. If two threads concurrently increment a shared variable, the final value depends on CPU-specific details: whether writes are atomic, byte ordering, and cache coherence across cores. On a multi-core system, the result can be less than what either thread alone would produce, because cache coherence is only guaranteed for a single thread.

CPUs provide locking mechanisms to guarantee memory ordering across threads, but lock-free algorithms also exist and require deep understanding of which memory orderings are possible. Memory models describe hardware behavior, falling into strongly consistent (a single global order of memory accesses) and weakly consistent (consistent order only within each thread). Proving algorithm correctness under a given memory model is comparable in complexity to the CompCert verification effort. Many language specifications sidestep memory models entirely; C defines behavior only for programs using locks, while Java assumes strong consistency.

Software Verification

CompCert verifies the compiler, not the programs it compiles. Systems code routinely performs operations like masking an address and casting to a different pointer type, which no standard type system can verify. Type safety is a very weak form of correctness; far broader properties may be desirable, such as proving that an elevator is never inoperable while occupied or that a radiation therapy machine never exceeds its dose specification.

Software model checking verifies specific properties of software. Type checking and CompCert’s proofs are limited forms of model checking. Software in safety-critical domains ideally combines model checking (source-level correctness) with verified compilation (faithful translation to machine code).

A natural question is why not simply use “better” languages to avoid these problems. While there is some truth to this view, research does not support the claim that better languages solve all software problems. Programmers make mistakes regardless of type systems, mutation policies, or concurrency support. The lesson from CompCert may be to use the right language and the right verification tools for the task at hand. Understanding paradigms as formal systems provides conceptual tools applicable to all programming; the power of programming languages is, ultimately, the power of abstraction.

Sources and References

Chapter 1: Languages and Paradigms

Introduction

History of Programming Languages

Programming Paradigms and Design Principles

OCaml

Smalltalk

Messages, Precedence, and Syntax

Booleans, Conditions, and Loops

Variables, Containers, and Dictionaries

Classes, Subclasses, and the RPN Calculator

Blocks, Non-local Returns, and Display

Chapter 2: Untyped \(\lambda\)-Calculus

Syntax and Definitions

Free and Bound Variables

Substitution and \(\beta\)-Reduction

Normal Forms and Reduction Strategies

The Church-Rosser Theorem

Applicative Order vs. Normal Order

Programming with \(\lambda\)-Calculus

Booleans and Conditionals

Pairs and Lists

Church Numerals

Recursion and the Y Combinator

De Bruijn Notation

Implementation Strategies

Chapter 3: Formal Semantics

Semantics and Category Theory

Semantics and Reality

The Post System

Operational Semantics for \(\lambda\)-Calculus

Defining Evaluation Order

Terminal Values

A Proof with Multi-step Reduction

Adding Primitives

Booleans and Conditionals

Numbers

Lists

Sets

Denotational Semantics

Looking Ahead

Chapter 4: Types

Stuck States and the Need for Types

Static vs. Dynamic Typing

The Simply-Typed Lambda-Calculus

Semantics with Terminal Values

Polymorphism

System F: The Polymorphic Lambda-Calculus

Encodings in System F

Properties of System F

Types in Practice

Adding More Types

Let Bindings and Recursion

Booleans, Numbers, Lists, and Sets

Chapter 5: Functional Programming

Introduction

Exemplar: Haskell

Expressions and Functions

Conditionals, Guards, and Syntactic Sugar

Environments and Name Binding

Algebraic Data Types

Parametric Polymorphism and System F

Type Inference

Unification

Algorithm W

Pattern Matching

Semantics of Patterns

The ML Module System

Errors and Exceptions

Lazy Evaluation

Impure State and Assignment

Monads

The IO Monad

Generalizing Monads

The Maybe Monad

The State Monad

Continuations and CPS

First-Class Continuations

Appendix: Full Semantics Reference

Chapter 6: Logic Programming