CS 444: Compiler Construction
Ondřej Lhoták
Estimated study time: 1 hr 23 min
Compiler Architecture Overview
A compiler translates source programs written in a high-level source language (e.g., Java) into a target language (e.g., x86 machine code). Understanding the full pipeline — from raw character streams to executable instructions — is the central goal of this course.
Compilers are traditionally divided into two halves. The front end analyzes the source program to verify correctness and extract structure. It comprises lexical analysis (scanning), syntax analysis (parsing), and context-sensitive semantic analysis. The back end synthesizes the target program from the structured representation; it handles optimization and code generation.
Source Program
      │
      ▼
┌───────────────────┐
│     Scanning      │  characters → tokens
└─────────┬─────────┘
          ▼
┌───────────────────┐
│     Parsing       │  tokens → parse tree / AST
└─────────┬─────────┘
          ▼
┌───────────────────┐
│ Semantic Analysis │  type checking, name resolution
│     (A2–A4)       │
└─────────┬─────────┘
          ▼
┌───────────────────┐
│  Code Generation  │  AST → assembly
│       (A5)        │
└─────────┬─────────┘
          ▼
Machine Code
In CS 444, the source language is Joos 1W — a carefully defined subset of Java 1.3 with most of the interesting features retained (classes, interfaces, inheritance, arrays, exceptions) but certain complexities removed (no inner classes, no generics, limited overloading). The target is IA-32 (i386) assembly, executable on Linux.
Formal Language Foundations
Before diving into scanning, we need a framework for precisely describing the sets of strings a language accepts. Three concepts form the foundation:
- Alphabet (Σ): A finite, non-empty set of symbols (e.g., ASCII characters, tokens).
- Word: A finite sequence of symbols drawn from an alphabet. The empty word is written ε.
- Language: A (precisely defined) set of words over an alphabet. Note that a language is just a set — it can be finite, infinite, or even empty.
The key insight is that we can describe languages with different levels of expressive power: regular languages (for scanning), context-free languages (for parsing), and less restricted formalisms (for semantics).
Lexical Analysis (Scanning)
Lexical analysis, or scanning, transforms a raw stream of characters into a sequence of tokens. A token is a classified lexeme — for example, the characters while become a keyword token WHILE, and 123 becomes an integer literal token INT_LIT(123).
Scanning uses regular languages, which are expressive enough to describe all valid token patterns but simple enough that recognition is linear-time and implementable with finite automata.
Regular Expressions
A regular expression over alphabet Σ is defined inductively, and each expression e denotes a language L(e):
| Expression | Language L(e) |
|---|---|
| \( a \in \Sigma \) | \( \{a\} \) |
| \( \varepsilon \) | \( \{\varepsilon\} \) |
| \( \emptyset \) | \( \{\} \) |
| \( e_1 \, e_2 \) | \( \{ xy \mid x \in L(e_1),\ y \in L(e_2) \} \) |
| \( e_1 \mid e_2 \) | \( L(e_1) \cup L(e_2) \) |
| \( e^* \) | \( \{\varepsilon\} \cup L(e) \cup L(e)L(e) \cup \cdots \) |
Common shorthands: \( e^+ = e \, e^* \), \( e? = e \mid \varepsilon \), \( [a\text{-}z] = a \mid b \mid \cdots \mid z \).
Example: Integer literals (non-empty sequences of digits): [0-9]+
Example: Java identifiers: [a-zA-Z_][a-zA-Z0-9_]*
Example: Block comments: /\*([^*]|\*+[^*/])*\*+/ (the \*+ forms are needed so that comments whose body ends with extra stars, such as /* x **/, are accepted)
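These patterns can be checked directly with `java.util.regex`; a minimal sketch (the class and field names are invented, and the block-comment pattern here uses the variant `/\*([^*]|\*+[^*/])*\*+/`, which also accepts comments whose body ends in a star):

```java
import java.util.regex.Pattern;

// Sketch: the three example token patterns expressed with java.util.regex.
public class TokenPatterns {
    static final Pattern INT_LIT = Pattern.compile("[0-9]+");
    static final Pattern ID      = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");
    // Block comment: /* ... */ with no early termination of the body.
    static final Pattern COMMENT = Pattern.compile("/\\*([^*]|\\*+[^*/])*\\*+/");

    public static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();   // full-string match
    }
}
```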
Finite Automata
Regular languages are recognized by finite automata. There are two variants; they have equal expressive power.
NFA (Non-deterministic Finite Automaton)
An NFA is a 5-tuple \( \langle \Sigma, Q, q_0, A, \delta \rangle \) where:
- \( \Sigma \): input alphabet
- \( Q \): finite set of states
- \( q_0 \in Q \): start state
- \( A \subseteq Q \): accepting states
- \( \delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q \): transition function (may return multiple states, and allows ε-transitions)
An NFA accepts a string if at least one sequence of transitions leads to an accepting state.
DFA (Deterministic Finite Automaton)
A DFA is the same 5-tuple but with \( \delta : Q \times \Sigma \to Q \) — the transition function is total and returns exactly one state for every (state, character) pair.
Recognition algorithm (DFA):
input: string w, DFA M = (Σ, Q, q₀, A, δ)
output: true iff w ∈ L(M)
q ← q₀
for each character c in w:
q ← δ(q, c)
return q ∈ A
DFAs enable O(n) scanning with no backtracking. NFAs are easier to build from regexes; we convert them to DFAs for execution.
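The recognition loop above can be made concrete. A sketch hard-coding a DFA for the integer-literal language [0-9]+; the state numbering and the explicit dead state (which makes δ total) are my own choices:

```java
// DFA recognition for [0-9]+ : one table lookup per input character, O(n).
public class IntDfa {
    // States: 0 = start, 1 = seen at least one digit, 2 = dead (reject).
    static final int START = 0, DIGITS = 1, DEAD = 2;

    static int delta(int q, char c) {
        boolean digit = c >= '0' && c <= '9';
        if (q == DEAD || !digit) return DEAD;   // total function via dead state
        return DIGITS;                          // START or DIGITS on a digit
    }

    public static boolean accepts(String w) {
        int q = START;
        for (char c : w.toCharArray()) q = delta(q, c);  // one step per char
        return q == DIGITS;                     // DIGITS is the only accepting state
    }
}
```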
*NFA for the regex (a|b)abb, built with ε-transitions (Thompson construction).*
NFA-to-DFA Conversion (Subset Construction)
The key idea: simulate all possible NFA paths simultaneously by tracking sets of NFA states.
ε-closure: The set of states reachable from a given set via zero or more ε-transitions alone.
function ε_closure(S):
S' ← S
worklist ← copy of S
while worklist is not empty:
remove q from worklist
for each q' in δ(q, ε):
if q' ∉ S':
add q' to S' and worklist
return S'
Subset construction:
- DFA start state: \( q'_0 = \varepsilon\text{-closure}(\{q_0\})\)
- DFA transition: \( \delta'(q', a) = \varepsilon\text{-closure}\!\left(\bigcup_{q \in q'} \delta(q, a)\right) \)
- DFA accepting states: \( A' = \{q' \in Q' \mid q' \cap A \neq \emptyset\} \)
- Enumerate all reachable DFA states by applying δ’ until no new states appear.
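The ε-closure and one DFA transition step can be sketched as follows; the adjacency-map representation of the NFA is an assumption, not from the course:

```java
import java.util.*;

// Sketch of ε-closure and the subset-construction transition δ'(q', c).
public class SubsetConstruction {
    // epsilon.get(q)    = states reachable from q by one ε-transition
    // delta.get(q).get(c) = states reachable from q on character c
    public final Map<Integer, Set<Integer>> epsilon = new HashMap<>();
    public final Map<Integer, Map<Character, Set<Integer>>> delta = new HashMap<>();

    public Set<Integer> epsilonClosure(Set<Integer> s) {
        Set<Integer> closure = new HashSet<>(s);
        Deque<Integer> worklist = new ArrayDeque<>(s);
        while (!worklist.isEmpty()) {
            int q = worklist.pop();
            for (int q2 : epsilon.getOrDefault(q, Set.of()))
                if (closure.add(q2)) worklist.push(q2);  // add() is false if already present
        }
        return closure;
    }

    // δ'(q', c) = ε-closure( union over q in q' of δ(q, c) )
    public Set<Integer> dfaStep(Set<Integer> dfaState, char c) {
        Set<Integer> moved = new HashSet<>();
        for (int q : dfaState)
            moved.addAll(delta.getOrDefault(q, Map.of()).getOrDefault(c, Set.of()));
        return epsilonClosure(moved);
    }
}
```

Enumerating all reachable DFA states is then a second worklist loop that calls `dfaStep` for every alphabet character until no new state sets appear.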
Key property: In the worst case, an NFA with n states may yield a DFA with \( 2^n \) states (exponential blowup). In practice, this is rarely a problem for realistic programming language tokens.
Scanning with Maximal Munch
Real scanning must handle multiple token kinds (keywords, identifiers, literals, operators) and must decide where each token ends. The standard strategy is maximal munch: always consume the longest possible prefix that forms a valid token.
Abstract maximal munch:
while input is not empty:
find the maximal prefix of remaining input that is a valid token
if no non-empty valid prefix exists: ERROR
emit token; consume prefix
Concrete implementation with DFA backtracking:
while input is not empty:
run DFA from start state, advancing through characters
after each step, if in an accepting state, record (state, position) as a "candidate"
when DFA gets stuck (no transition for current character):
if no candidate was recorded: ERROR
backtrack input to candidate position
emit token identified by candidate state
reset DFA to start state
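The same maximal-munch behaviour can be sketched without building the combined DFA: re-run each token pattern at the current position and keep the longest match, breaking length ties by rule priority. The token kinds, the `Rule` record, and the tiny rule set are invented for illustration:

```java
import java.util.*;
import java.util.regex.*;

// Maximal munch via per-rule matchers (a sketch, not the DFA implementation).
public class MaximalMunch {
    record Rule(String kind, Pattern p) {}
    static final List<Rule> RULES = List.of(
        new Rule("IF",      Pattern.compile("if")),   // keywords first (priority)
        new Rule("ID",      Pattern.compile("[a-z]+")),
        new Rule("INT_LIT", Pattern.compile("[0-9]+")),
        new Rule("DECR",    Pattern.compile("--")),
        new Rule("MINUS",   Pattern.compile("-")));

    public static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestKind = null; int bestEnd = pos;
            for (Rule r : RULES) {
                Matcher m = r.p().matcher(input).region(pos, input.length());
                // lookingAt() anchors the match at the start of the region;
                // strictly longer matches win, so earlier rules win ties.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestKind = r.kind(); bestEnd = m.end();
                }
            }
            if (bestKind == null) throw new IllegalArgumentException("scan error at " + pos);
            tokens.add(bestKind);
            pos = bestEnd;
        }
        return tokens;
    }
}
```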
Java scanning note (JLS 3.2): Because of maximal munch, the expression a--b is scanned as a, --, b — not as a, -, -, b. The -- decrement operator is greedily consumed before the second - can be seen. This is why a--b is a compile error (you can’t apply -- to the value a and then subtract b).
Building a Scanning DFA from Multiple Token Rules
A scanner recognizes many token kinds simultaneously. The construction combines them all:
input: regular expressions R₁, R₂, ..., Rₙ (in priority order, highest first)
output: DFA whose accepting states are labeled with token kinds
1. Build NFA N for R₁ | R₂ | ... | Rₙ
(each Rᵢ has its own accepting state labeled with token kind i)
2. Convert N to DFA D using subset construction
3. For each DFA accepting state (which is a set of NFA states):
label it with the highest-priority token kind among the NFA accepting
states it contains
Priority example: the lexeme `if` matches both the IF keyword pattern and the ID identifier pattern. Since keyword patterns have higher priority (they come first in the specification), the DFA accepting state for `if` is labeled IF, not ID. This is how scanners uniformly handle reserved keywords.
Regex: if → NFA states {acc_IF}
Regex: [a-z]+ → NFA states {acc_ID}
DFA state for "if" contains both NFA states.
Priority: IF > ID → label as IF.
Parsing
Scanning gives us a token sequence; parsing checks whether that sequence is a valid program and builds a structural representation of it. Parsing uses context-free grammars (CFGs), which are strictly more expressive than regular languages.
Context-Free Grammars
A CFG is a 4-tuple \( G = (N, T, R, S) \) where:
- \( T \): terminals (tokens, written lowercase: a, b, if, +)
- \( N \): non-terminals (syntactic categories, written uppercase: \( S \), \( E \), \( D \))
- \( R \subseteq N \times V^* \): production rules, written \( A \to \alpha \)
- \( S \in N \): start symbol
- \( V = N \cup T \): all grammar symbols
Derivation: \( \beta A \gamma \Rightarrow \beta \alpha \gamma \) if \( A \to \alpha \in R \) (replace non-terminal with body). \( \alpha \Rightarrow^* \beta \): zero or more derivation steps.
Language of a grammar: \( L(G) = \{x \in T^* \mid S \Rightarrow^* x\} \)
Sentential form: Any string \( \alpha \in V^* \) such that \( S \Rightarrow^* \alpha \).
Parse tree: A tree with root labelled \( S \), internal nodes labelled with non-terminals, leaves with terminals or ε, such that each internal node’s children spell out a production rule.
A grammar is ambiguous if some sentence has two distinct parse trees. Ambiguity must be eliminated before using the grammar for parsing (or the parser must handle it explicitly).
A Simple Expression Grammar
E → E + T
E → T
T → T * F
T → F
F → ( E )
F → id
This grammar encodes operator precedence and left-associativity through its structure: multiplication binds tighter than addition because * appears lower in the hierarchy (closer to the leaf id), not through any special mechanism.
Left Recursion and Left Factoring
Left-recursive grammars (e.g., E → E + T) are problematic for top-down parsers because they cause infinite recursion. Left recursion can be eliminated:
Original: \( A \to A\alpha \mid \beta \) Equivalent: \( A \to \beta A' \), \( A' \to \alpha A' \mid \varepsilon \)
Left factoring handles ambiguous choice for top-down parsers when two alternatives share a common prefix:
Original: \( A \to \alpha \beta \mid \alpha \gamma \) Factored: \( A \to \alpha A' \), \( A' \to \beta \mid \gamma \)
Top-Down Parsing
Top-down parsing builds the parse tree from root to leaves, predicting which production to apply at each step.
Algorithm (naive):
α ← S
while α ≠ input:
let A be the leftmost non-terminal in α
choose some production A → β ∈ R
replace A with β in α
This produces a left derivation — leftmost non-terminal is always expanded first. The challenge is choosing the right production without trying all possibilities.
LL(1) Parsing
An LL(1) parser is a top-down parser that makes decisions using exactly one symbol of lookahead. The name stands for:
- L: scan input Left-to-right
- L: construct a Leftmost derivation
- 1: 1 token of lookahead
First, Nullable, and Follow Sets
Three sets are needed to decide which production to apply:
Nullable(β): True iff \( \beta \Rightarrow^* \varepsilon \). A sequence is nullable iff every symbol in it is nullable (terminals are never nullable, non-terminals may be).
\[ \text{first}(\beta) = \{a \in T \mid \exists \gamma : \beta \Rightarrow^* a\gamma\} \]
\[ \text{follow}(A) = \{a \in T \mid \exists \gamma, \delta : S\$ \Rightarrow^* \gamma A a \delta\} \]
Computing Follow: For each production \( B \to \alpha A \gamma \):
- Add \( \text{first}(\gamma) \) to \( \text{follow}(A) \)
- If \( \text{nullable}(\gamma) \), add \( \text{follow}(B) \) to \( \text{follow}(A) \)
These rules are applied iteratively until no more changes occur (fixpoint).
Example: For the grammar \( S \to aB,\ B \to b \mid \varepsilon \):
- Nullable(B) = true
- First(S) = {a}, First(B) = {b}
- Follow(B) = Follow(S) = {$}
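The fixpoint computation of nullable and first can be sketched for this same grammar; the list-of-lists grammar encoding and the lowercase-terminal convention are my own:

```java
import java.util.*;

// Fixpoint computation of nullable and first for S → aB, B → b | ε.
// Uppercase strings are non-terminals, lowercase are terminals; ε = empty rhs.
public class FirstFollow {
    static final List<List<String>> PRODS = List.of(
        List.of("S", "a", "B"),
        List.of("B", "b"),
        List.of("B"));                            // B → ε

    static boolean isTerminal(String sym) { return sym.equals(sym.toLowerCase()); }

    public static Set<String> nullable() {
        Set<String> nullable = new HashSet<>();
        boolean changed = true;
        while (changed) {                         // iterate until fixpoint
            changed = false;
            for (List<String> p : PRODS) {
                List<String> rhs = p.subList(1, p.size());
                if (rhs.stream().allMatch(nullable::contains) && nullable.add(p.get(0)))
                    changed = true;
            }
        }
        return nullable;
    }

    public static Map<String, Set<String>> first(Set<String> nullable) {
        Map<String, Set<String>> first = new HashMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (List<String> p : PRODS) {
                Set<String> f = first.computeIfAbsent(p.get(0), k -> new HashSet<>());
                for (String sym : p.subList(1, p.size())) {
                    if (isTerminal(sym)) { changed |= f.add(sym); break; }
                    changed |= f.addAll(first.getOrDefault(sym, Set.of()));
                    if (!nullable.contains(sym)) break;  // stop at non-nullable symbol
                }
            }
        }
        return first;
    }
}
```

Follow would be computed the same way, with one more fixpoint loop over the two rules listed above.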
Predict Sets
The predict set tells the parser which production to use for non-terminal A when seeing lookahead terminal a:
\[ \text{predict}(A, a) = \{A \to \beta \mid a \in \text{first}(\beta)\} \cup \{A \to \beta \mid \text{nullable}(\beta) \text{ and } a \in \text{follow}(A)\} \]
A grammar is LL(1) iff \( |\text{predict}(A, a)| \leq 1 \) for all \( A \in N, a \in T \). This means no two productions for the same non-terminal share a predict set entry — the lookahead always uniquely determines which production to use.
LL(1) Parse Algorithm
The stack-based LL(1) algorithm runs in O(n) time:
Augment grammar: add fresh start S' → S$, append $ to input
Push S$ onto stack
for each terminal a in input$:
while top of stack is a non-terminal A:
pop A
look up predict(A, a):
if empty: ERROR (unexpected token)
if unique production A → β: push β (right-to-left)
pop terminal b from stack
if b ≠ a: ERROR (mismatch)
accept
Example: Grammar \( S \to aB,\ B \to b \mid \varepsilon \), input a:
Stack Input Action
S$ a$ predict(S, a) = {S → aB}, push aB
aB$ a$ pop a, match a
B$ $ predict(B, $) = {B → ε}, push ε (pop B)
$ $ accept
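The trace above corresponds to this table-driven driver; the hard-coded predict map and the single-character symbol encoding (uppercase = non-terminal) are simplifications for illustration:

```java
import java.util.*;

// Stack-based LL(1) driver for S → aB, B → b | ε, predict table written by hand.
public class LL1Parser {
    // key "A,a" = (non-terminal, lookahead); value = production body ("" = ε)
    static final Map<String, String> PREDICT = Map.of(
        "S,a", "aB",
        "B,b", "b",
        "B,$", "");

    public static boolean parse(String input) {
        String w = input + "$";                    // append end marker
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$'); stack.push('S');          // stack holds S$
        int pos = 0;
        while (!stack.isEmpty()) {
            char top = stack.pop();
            char a = w.charAt(pos);
            if (Character.isUpperCase(top)) {      // non-terminal: expand
                String body = PREDICT.get(top + "," + a);
                if (body == null) return false;    // empty predict set: error
                for (int i = body.length() - 1; i >= 0; i--)
                    stack.push(body.charAt(i));    // push body right-to-left
            } else {                               // terminal: match
                if (top != a) return false;        // mismatch: error
                pos++;
            }
        }
        return pos == w.length();                  // accept: all input consumed
    }
}
```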
LL(1) Parse Table
The predict sets define a parse table with rows for non-terminals and columns for terminals. Entry [A, a] contains the production to use:
|   | a | b | $ |
|---|---|---|---|
| S | S → aB | | |
| B | | B → b | B → ε |
A conflict (two productions in the same cell) means the grammar is not LL(1).
LR Parsing (Bottom-Up)
Bottom-up parsing works in reverse: start with the input and repeatedly apply productions in reverse (reductions) until reaching the start symbol. This corresponds to constructing the parse tree from leaves to root.
Example derivation (reversed):
a + a$ ← E + a$ ← E + E$ ← E$ ← S
  (E→a)    (E→a)   (E→E+E)  (S→E$)
The parser maintains a stack representing the “processed” prefix of input. At each step it either shifts (push next input token) or reduces (pop the body of a production and push its head).
Viable Prefixes
A string α is a viable prefix if it can appear on the parsing stack without yet having detected an error. Formally: \( \alpha \) is a viable prefix iff \( \exists\beta : S \Rightarrow^* \alpha\beta \) where \( \alpha \) is a prefix of some right-sentential form.
Key invariant: The content of the LR parser stack is always a viable prefix.
LR(0) Parser
The LR(0) parser uses only the stack content (no lookahead) to decide whether to reduce.
LR(0) items: Dotted productions tracking how far along we are in recognizing a rule: \( A \to \alpha \cdot \beta \) means we have recognized \( \alpha \) and expect \( \beta \) next.
When the dot is at the right end \( (A \to \alpha \cdot) \), we can reduce by \( A \to \alpha \).
LR(0) NFA construction:
- States: All items \( A \to \alpha \cdot \beta \)
- Start: \( S' \to \cdot S\$ \) (augmented grammar)
- Transitions:
- On symbol \( X \): \( (A \to \alpha \cdot X\beta) \xrightarrow{X} (A \to \alpha X \cdot \beta) \)
- ε-transitions: \( (A \to \alpha \cdot B\beta) \xrightarrow{\varepsilon} (B \to \cdot \gamma) \) for all rules \( B \to \gamma \)
- All states are “accepting” (stack always a valid VP)
Convert the NFA to a DFA via subset construction (as before). Each DFA state is a set of LR(0) items (an “item set” or “configuration set”).
LR(0) parse decision: In each DFA state:
- If the state contains \( A \to \alpha \cdot \) and nothing else: reduce by \( A \to \alpha \)
- If the state contains only shift items \( A \to \alpha \cdot X\beta \): shift X
- Reduce-reduce conflict: Two reduce items in the same state → grammar is not LR(0)
- Shift-reduce conflict: A shift and reduce item in the same state → grammar is not LR(0)
LR(0) algorithm:
for each token a in input$:
while Reduce(stack) = {A → γ}:
pop |γ| symbols from stack
push A
if Reject(stack + a): ERROR
push a onto stack
accept when $ is consumed
Visualizing LR(0) DFA
For the grammar \( S' \to S\$,\ S \to a \mid E = E,\ E \to a \):
State 0: {S'→·S$, S→·a, S→·E=E, E→·a}
──a──▶ State 1: {S→a·, E→a·} (reduce-reduce conflict!)
──E──▶ State 2: {S→E·=E}
──S──▶ State 3: {S'→S·$}
State 1 has both \( S \to a \cdot \) and \( E \to a \cdot \) — a reduce-reduce conflict. This grammar is not LR(0) but is LR(1) (the lookahead resolves the conflict).
SLR(1) Parser
SLR(1) adds a simple lookahead to LR(0): reduce by \( A \to \alpha \) only when the current lookahead is in \( \text{follow}(A) \).
Reduce(stack, a) = {A → γ | stack ends with viable prefix βγ,
βA is a viable prefix,
a ∈ follow(A)}
SLR(1) conflicts:
- Shift-reduce: State has \( A \to \alpha \cdot \) and \( B \to \beta \cdot X\gamma \), and \( X \in \text{follow}(A) \)
- Reduce-reduce: State has \( A \to \alpha \cdot \) and \( B \to \beta \cdot \), and \( \text{follow}(A) \cap \text{follow}(B) \neq \emptyset \)
SLR(1) uses the same DFA as LR(0) but resolves more conflicts by consulting follow sets.
LR(1) Parser
LR(1) items carry an explicit lookahead: \( (A \to \alpha \cdot \beta, a) \) means “after shifting α, if we eventually reduce by this rule, the next input must be a.”
LR(1) NFA:
- States: All items \( (A \to \alpha \cdot \beta,\ a) \)
- Start state: \( (S' \to \cdot S\$,\ \$) \)
- Shift transition: \( (A \to \alpha \cdot X\beta,\ a) \xrightarrow{X} (A \to \alpha X \cdot \beta,\ a) \)
- ε-closure: \( (A \to \alpha \cdot B\beta,\ a) \xrightarrow{\varepsilon} (B \to \cdot \gamma,\ b) \) for all \( b \in \text{first}(\beta a) \)
LR(1) reduce: Reduce \( A \to \alpha \) when item \( (A \to \alpha \cdot,\ a) \) is in the current state and current lookahead is \( a \).
LR(1) > SLR(1) > LR(0):
| Parser | Reduces when… |
|---|---|
| LR(0) | βA is a viable prefix |
| SLR(1) | βA is a VP and a ∈ follow(A) |
| LR(1) | βAa is a viable prefix |
LR(1) is strictly more powerful but its DFA can have exponentially more states than SLR(1)/LR(0).
LALR(1) Parser
LALR(1) (Lookahead LR) is the practical compromise: use the LR(0) DFA (fewer states) but attach lookahead sets derived from the LR(1) DFA.
Core of a state: Strip lookaheads from LR(1) items → \( \text{core}(q) = \{A \to \alpha \cdot \beta \mid (A \to \alpha \cdot \beta, a) \in q\} \).
Multiple LR(1) states can share the same core but differ in lookaheads. LALR(1) merges such states and takes the union of their lookaheads.
Algorithm:
for each LR(0) state q:
for each item A → α·β in q:
lookaheads(A→α·β, q) ← {a | ∃ LR(1) state q' : core(q')=core(q)
and (A→α·β, a) ∈ q'}
LALR(1) conflicts are rarer than SLR(1) because the lookaheads are more precise (per-state, not global). Most real programming language grammars (including Joos) are LALR(1), which is why yacc/bison use LALR(1).
Hierarchy (from weakest to strongest):
LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ LR(1) ⊂ Context-Free
A grammar accepted by a weaker parser is also accepted by all stronger ones. Most practical grammars are LALR(1).
Generating the LR(1) Parse Table
The parse table has:
- Rows: DFA states
- Columns: terminals (for ACTION) and non-terminals (for GOTO)
- Entries:
- shift n: shift token and go to state n
- reduce A→α: reduce by this production
- goto n: after reduce, go to state n
- accept: success
Conflict example (grammar not SLR(1)):
S' → S$
S → a
S → E = E
E → a
For input a$, the state after seeing a contains both S→a· and E→a·. With SLR(1), follow(S) = {$} and follow(E) = {$, =} overlap at $, causing a reduce-reduce conflict. With LR(1), the items are (S→a·, $) and (E→a·, =) — the = lookahead comes from the context S → E = E — so on lookahead $ only the reduction S → a applies, and the conflict disappears. This grammar is LR(1) (and LALR(1)) but not SLR(1).
Abstract Syntax Trees and Weeding
From Parse Tree to AST
The parse tree (also called concrete syntax tree) produced by the parser contains a great deal of syntactic noise: intermediate non-terminals introduced for precedence and associativity, single-child chains, and extra punctuation nodes. The Abstract Syntax Tree (AST) is a cleaned-up, semantically meaningful tree structure.
Parse tree vs AST for 1 + 2 * 3 — concrete tree preserves grammar structure; AST captures semantics directly.
Parse tree for 1 + 2 * 3:
     E
   / | \
  E  +  T
  |   / | \
  T  T  *  F
  |  |     |
  F  F     3
  |  |
  1  2
Corresponding AST:
  +
 / \
1   *
   / \
  2   3
The AST discards the non-terminal hierarchy used to encode precedence and replaces it with direct tree structure.
Constructing the AST
AST construction is typically done during or just after parsing:
- Each grammar non-terminal corresponds to a class hierarchy in the AST
- Production rules correspond to constructors
- Parsing actions create AST nodes bottom-up
A typical Joos AST has nodes like BinaryExpr, UnaryExpr, MethodInvocation, ClassDecl, FieldDecl, IfStmt, WhileStmt, ReturnStmt, etc.
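The node kinds listed above might be sketched as a small class hierarchy; the class names BinaryExpr and friends come from the text, but every other detail here is invented:

```java
// Minimal AST sketch: a node class per syntactic construct, built bottom-up.
public abstract class Expr {
    public static final class IntLit extends Expr {
        public final int value;
        public IntLit(int value) { this.value = value; }
    }
    public static final class BinaryExpr extends Expr {
        public final String op;                // "+", "*", ...
        public final Expr left, right;
        public BinaryExpr(String op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    // The AST for 1 + 2 * 3, built directly, with no E/T/F chain:
    public static Expr oneTwoThree() {
        return new BinaryExpr("+", new IntLit(1),
               new BinaryExpr("*", new IntLit(2), new IntLit(3)));
    }
}
```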
Weeding
Weeding is a post-parsing pass that enforces language constraints that could be checked by the grammar but would make the grammar too complex or unreadable. Weeding walks the AST and explicitly checks rules like:
- A class cannot be both `abstract` and `final`
- A `final` method cannot also be `abstract`
- No duplicate `implements` interfaces in a class declaration
- An interface method cannot have a body
- A constructor name must match the enclosing class name
- `void` can only be used as a return type, not in expressions
- Every `abstract` method must be in an `abstract` class
Why not encode these in the grammar? Some rules involve counting (no duplicate modifiers) or cross-referencing distant parts of the tree (constructor name vs. class name). These are far cleaner as AST traversals than as grammar constraints.
Semantic Analysis
Parsing verifies syntax but not meaning. Semantic analysis (also called context-sensitive analysis) checks that the program makes semantic sense: names refer to declared entities, types are consistent, and the inheritance hierarchy is well-formed.
Environments and Scope
A scope is a region of source code within which a name declaration is visible. An environment (also called a symbol table) maps names to their declarations.
Declaring a name (entering a new declaration):
if name already exists in current environment: ERROR (duplicate declaration)
else: insert name → declaration into environment
Looking up a name (resolving a reference):
search current (innermost) environment
if not found: search enclosing environment
...
if not found at outermost scope: ERROR (undeclared name)
In practice, environments are organized as a stack or chain following the nesting of scopes.
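The declare/lookup pseudocode above, written as a chained environment; the class and method names are invented:

```java
import java.util.*;

// A chained environment (symbol table): each scope has its own map plus a
// pointer to the enclosing scope; lookup walks outward, declare checks only
// the current scope.
public class Environment {
    private final Map<String, String> bindings = new HashMap<>(); // name → declaration
    private final Environment parent;                             // enclosing scope, or null

    public Environment(Environment parent) { this.parent = parent; }

    public void declare(String name, String decl) {
        if (bindings.containsKey(name))           // duplicate in the SAME scope only
            throw new IllegalStateException("duplicate declaration: " + name);
        bindings.put(name, decl);
    }

    public String lookup(String name) {
        for (Environment env = this; env != null; env = env.parent)
            if (env.bindings.containsKey(name)) return env.bindings.get(name);
        throw new NoSuchElementException("undeclared name: " + name);
    }
}
```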
Namespaces in Java
Java has six distinct namespaces (JLS 6.5.1) — names in different namespaces cannot conflict with each other:
- Package namespace: `com.example`
- Type namespace: class and interface names
- Method namespace: method names
- Expression namespace: local variables, parameters, fields
- Package-or-type namespace: for qualified names that could be either
- Ambiguous namespace: for multi-part names like `a.b.c` before disambiguation
Joos simplifies this but retains the essential separation of types, methods, and variables.
Java Name Resolution: Seven Passes
Joos name resolution is performed in a specific order across the entire program (not statement-by-statement). Each pass builds on the previous one.
Pass 1: Build the Global Environment
Collect all class and interface declarations across all files, indexed by their fully-qualified names (com.example.Foo). This creates a global map: fully-qualified name → class/interface AST node.
Pass 2: Resolve Type Names
Every type reference (in declarations, cast expressions, instanceof, etc.) must be resolved to a class/interface in the global environment.
For a simple type name Foo appearing in class C in package pkg:
- Check if `Foo` is declared in the same compilation unit (explicit import or same package)
- Check single-type-import declarations (`import a.b.Foo;`)
- Check types in the same package (`pkg.Foo`)
- Check type-import-on-demand (`import a.b.*;`)
- Ambiguity if found in more than one place at the same priority

For a qualified type name `a.b.Foo`:
- Traverse: `a` is a package, `a.b` is a package, `Foo` is a class in that package
Pass 3: Check the Class Hierarchy
Verify that the inheritance graph is well-formed:
- No class extends itself (directly or transitively) — no cycles
- Every superclass exists and is accessible
- No class extends an interface (use `implements`)
- No interface extends a class
Pass 4: Disambiguate Ambiguous Names
A multi-part name like a.b.c.d could mean many things. Resolution algorithm:
Given name a₁.a₂.a₃.a₄:
1. If a₁ is a local variable in scope:
a₂ is an instance field of type(a₁)
a₃ is an instance field of its type, etc.
2. Else if a₁ is a field in contain(current class):
a₂, a₃, ... are instance fields
3. Else try prefixes of increasing length:
if a₁ is a type → a₂ is a static field, a₃... are instance fields
if a₁.a₂ is a type → a₃ is a static field, etc.
Pass 5: Resolve Expressions (Variables and Fields)
Resolve all variable and field references within method bodies and initializers. This attaches each name occurrence to its declaration node.
Pass 6: Type Checking
Check that all expressions are type-correct (see Type Checking section below).
Pass 7: Resolve Method Calls and Instance Fields
After type checking (which assigns types to all sub-expressions), resolve method calls:
- For `e.m(args)`, look up method `m` in `contain(type(e))`
- In Joos, there is no overloading (a simplification), so the method name uniquely identifies it within a class hierarchy
Class Hierarchy and Inheritance
Formalizing the Hierarchy
The class hierarchy defines which methods and fields each class inherits and which it overrides.
Notation:
- \( \text{super}(T) \): direct superclasses/superinterfaces of type T
- \( S < T \): S is a strict subtype of T (directly or transitively)
- \( S \leq T \): S is a subtype of T (including \( S = T \))
Method and field sets:
- \( \text{declare}(T) \): methods/fields directly declared in T
- \( \text{inherit}(T) \): methods/fields inherited by T (not redeclared)
- \( \text{contain}(T) = \text{declare}(T) \cup \text{inherit}(T) \): all accessible members
Signatures: \( \text{sig}(m) = (\text{name}(m),\ \text{paramTypes}(m)) \). Note that return type is not part of the signature.
Overriding: \( (m, m') \in \text{replace} \) means m overrides m’. This happens when m ∈ declare(T), m’ ∈ contain(S), S ∈ super(T), and sig(m) = sig(m’).
Inheritance Rules (Inference Rules)
These rules define when a method is inherited. A representative rule (the full rule set also distinguishes abstract from concrete methods): a member of a supertype is inherited unless the subtype declares a member with the same signature:
\[ \frac{S \in \text{super}(T) \quad m \in \text{contain}(S) \quad \neg\exists\, m' \in \text{declare}(T) : \text{sig}(m') = \text{sig}(m)}{m \in \text{inherit}(T)} \]
Hierarchy Checks
The compiler must enforce 10 well-formedness constraints on the hierarchy (JLS §8,9):
| # | Constraint |
|---|---|
| 1 | No cycles: \( \neg\exists T: T < T \) |
| 2 | No duplicate methods: distinct \( m, m' \in \text{declare}(T) \Rightarrow \text{sig}(m) \neq \text{sig}(m') \) |
| 3 | Return type consistency: \( m, m' \in \text{contain}(T),\ \text{sig}(m) = \text{sig}(m') \Rightarrow \text{type}(m) = \text{type}(m') \) |
| 4 | Concrete class completeness: if \( m \in \text{contain}(T) \) and abstract \( \in \text{mods}(m) \), then abstract \( \in \text{mods}(T) \) |
| 5 | Static consistency: \( (m, m') \in \text{replace} \Rightarrow (\text{static} \in \text{mods}(m) \Leftrightarrow \text{static} \in \text{mods}(m')) \) |
| 6 | Return type match: \( (m, m') \in \text{replace} \Rightarrow \text{type}(m) = \text{type}(m') \) |
| 7 | Visibility non-narrowing: \( (m, m') \in \text{replace},\ \text{public} \in \text{mods}(m') \Rightarrow \text{public} \in \text{mods}(m) \) |
| 8 | (No checked exception widening) |
| 9 | No overriding final: \( (m, m') \in \text{replace} \Rightarrow \text{final} \notin \text{mods}(m') \) |
| 10 | Field uniqueness: distinct \( f, f' \in \text{declare}(T) \Rightarrow \text{name}(f) \neq \text{name}(f') \) |
Interface Special Cases
- Interface methods are implicitly `public abstract`
- Interface fields are implicitly `public static final`
- An interface with no superinterfaces implicitly inherits abstract versions of all `public` methods from `java.lang.Object`
- A class implementing an interface must provide (directly or via inheritance) concrete implementations of all abstract interface methods, or be declared `abstract`
Type Checking
Type System Concepts
A type at the static (compile-time) level is the set of values an expression can evaluate to. At runtime, every value has a dynamic type describing how its bits are to be interpreted.
Declared type: The static assertion that a variable or expression holds only values of that type.
Type correctness: Informally, a program is type-correct if no operation is ever applied to a value of the wrong type at runtime. This property is undecidable in general (Rice’s theorem applies), so compilers use a conservative type system: a static approximation that may reject some valid programs but never accepts invalid ones.
Soundness: A type system is sound if static type correctness implies runtime type correctness. Java’s type system is almost sound — it has exactly one soundness gap (array covariance, discussed below).
Type Judgments
Type checking is specified with inference rules written as:
\[ \frac{\text{premises}}{\text{conclusion}} \]
The conclusion \( C, L, \sigma \vdash E : \tau \) means: “In class \( C \), local variable environment \( L \), with method return type \( \sigma \), expression \( E \) has type \( \tau \).”
Literal and Variable Rules
\[ \frac{}{C, L, \sigma \vdash 42 : \texttt{int}} \qquad \frac{}{C, L, \sigma \vdash \texttt{true} : \texttt{boolean}} \qquad \frac{}{C, L, \sigma \vdash \texttt{'a'} : \texttt{char}} \]
\[ \frac{}{C, L, \sigma \vdash \texttt{"abc"} : \texttt{java.lang.String}} \qquad \frac{}{C, L, \sigma \vdash \texttt{null} : \texttt{null}} \]
\[ \frac{L(n) = \tau}{C, L, \sigma \vdash n : \tau} \qquad \frac{}{C, L, \sigma \vdash \texttt{this} : C} \]
Operator Rules
\[ \frac{C, L, \sigma \vdash E : \texttt{int}}{C, L, \sigma \vdash {-E} : \texttt{int}} \qquad \frac{C, L, \sigma \vdash E : \texttt{boolean}}{C, L, \sigma \vdash {!E} : \texttt{boolean}} \]
\[ \frac{E_1 : \tau_1 \quad E_2 : \tau_2 \quad \text{num}(\tau_1) \quad \text{num}(\tau_2)}{E_1 + E_2 : \texttt{int}} \]
\[ \frac{E_1 : \texttt{String} \quad E_2 : \tau \quad \tau \neq \texttt{void}}{E_1 + E_2 : \texttt{String}} \qquad \frac{E_1 : \tau \quad E_2 : \texttt{String} \quad \tau \neq \texttt{void}}{E_1 + E_2 : \texttt{String}} \]
Where \( \text{num}(\tau) \Leftrightarrow \tau \in \{\texttt{byte}, \texttt{short}, \texttt{char}, \texttt{int}\} \).
\[ \frac{E_1 : \tau_1 \quad E_2 : \tau_2 \quad (\tau_1 := \tau_2 \vee \tau_2 := \tau_1)}{E_1 == E_2 : \texttt{boolean}} \]
\[ \frac{E_1 : \tau_1 \quad E_2 : \tau_2 \quad \text{num}(\tau_1) \quad \text{num}(\tau_2)}{E_1 < E_2 : \texttt{boolean}} \]
Assignability
Assignability \( \tau_1 := \tau_2 \) (read: “τ₁ can hold a value of type τ₂”) is defined by these rules:
\[ \tau := \tau \quad \text{(reflexivity)} \]
\[ \frac{D \leq C}{C := D} \quad \text{(subtype)} \qquad \frac{}{C := \texttt{null}} \quad \text{(null is assignable to any reference type)} \]
\[ \texttt{int} := \texttt{short} \quad \texttt{int} := \texttt{char} \quad \texttt{short} := \texttt{byte} \quad \text{(numeric widening)} \]
\[ \frac{\sigma := \tau \quad \tau := \rho}{\sigma := \rho} \quad \text{(transitivity)} \]
Cast Rules
\[ \frac{E : \tau_1 \quad (\tau_2 := \tau_1 \vee \tau_1 := \tau_2)}{(\tau_2)E : \tau_2} \quad \text{(reference cast)} \]
\[ \frac{E : \tau_1 \quad \text{num}(\tau_1) \quad \text{num}(\tau_2)}{(\tau_2)E : \tau_2} \quad \text{(numeric cast)} \]
Reference casts require a runtime check (may throw ClassCastException). Numeric casts truncate bits as needed.
Field and Method Rules
\[ \frac{\texttt{static}\ \tau\ f \in \text{contain}(D)}{D.f : \tau} \]
\[ \frac{E : D \quad \tau\ f \in \text{contain}(D) \quad \texttt{static} \notin \text{mods}(f)}{E.f : \tau} \]
\[ \frac{E : D \quad \forall i: E_i : \tau_i \quad \tau\ m(\tau_1,\ldots,\tau_k) \in \text{contain}(D)}{E.m(E_1,\ldots,E_k) : \tau} \]
Statement Rules
\[ \frac{C, L, \sigma \vdash E : \tau \quad \sigma := \tau}{C, L, \sigma \vdash \texttt{return}\ E} \qquad \frac{}{C, L, \texttt{void} \vdash \texttt{return}} \]
\[ \frac{C, L, \sigma \vdash E_1 : \tau_1 \quad C, L, \sigma \vdash E_2 : \tau_2 \quad \tau_1 := \tau_2}{C, L, \sigma \vdash E_1 = E_2 : \tau_1} \]
\[ \frac{C, L, \sigma \vdash E : \texttt{boolean} \quad C, L, \sigma \vdash S_1 \quad C, L, \sigma \vdash S_2}{C, L, \sigma \vdash \texttt{if}\ (E)\ S_1\ \texttt{else}\ S_2} \]
Array Rules
\[ \frac{E_1 : \tau_1[] \quad E_2 : \tau_2 \quad \text{num}(\tau_2)}{E_1[E_2] : \tau_1} \quad \text{(array access)} \]
\[ \frac{E : \tau[]}{E.\texttt{length} : \texttt{int}} \quad \text{(array length)} \]
\[ \frac{E : \tau_2 \quad \text{num}(\tau_2)}{\texttt{new}\ \tau_1[E] : \tau_1[]} \quad \text{(array creation)} \]
\[ \frac{D \leq C}{C[] := D[]} \quad \text{(array covariance)} \]
This rule seems natural but introduces unsoundness:
B[] bs = new B[1];
Object[] os = bs; // OK: Object[] := B[] (covariance)
os[0] = new C(); // Allowed statically (C is an Object)
// But throws ArrayStoreException at runtime!
B b = bs[0]; // b would have wrong type
The JVM detects this at runtime with ArrayStoreException. The compiler must emit a runtime type check on every array write.
Type Checking Implementation
function typecheck(node E):
find the type rule matching E's AST node kind
recursively typecheck all subexpressions to get their types
verify all premises hold
return the type τ from the conclusion
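A minimal sketch of this recursive check in Python, over a hypothetical tuple-shaped AST with just three rules (the node shapes and rule set are illustrative, not the Joos rules):

```python
# Hypothetical AST: ("lit", value), ("plus", e1, e2), ("lt", e1, e2)
def typecheck(node):
    kind = node[0]
    if kind == "lit":
        # check bool before int: in Python, bool is a subclass of int
        return "boolean" if isinstance(node[1], bool) else "int"
    if kind == "plus":                      # premise: int + int
        if typecheck(node[1]) == "int" and typecheck(node[2]) == "int":
            return "int"                    # conclusion: result is int
        raise TypeError("operands of + must be int")
    if kind == "lt":                        # premise: int < int
        if typecheck(node[1]) == "int" and typecheck(node[2]) == "int":
            return "boolean"                # conclusion: result is boolean
        raise TypeError("operands of < must be int")
    raise TypeError("no type rule matches " + kind)
```

A failed premise (e.g., adding a boolean to an int) surfaces as a type error at the node whose rule could not be applied.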
Static Analysis
Static analysis proves properties about runtime behavior without executing the program. Unlike type checking (which is always applied), some analyses are optional but improve safety or performance.
Rice’s Theorem and Conservative Analysis
Rice’s theorem: For any non-trivial property of program behavior, determining whether an arbitrary program has that property is undecidable.
This means we cannot precisely determine most runtime properties at compile time. Instead, compilers use conservative approximations — analyses that may give imprecise answers but are never wrong in the direction that matters:
- A soundly conservative reachability analysis may say “maybe reachable” when code is actually unreachable. It never says “unreachable” for code that executes.
- A soundly conservative type analysis may reject some valid programs. It never accepts type-unsafe programs.
Java Reachability Analysis (JLS §14.20)
Java requires that every statement be potentially reachable — dead code is a compile error. The analysis tracks two values per statement:
- \( \text{in}[s] \in \{\texttt{maybe}, \texttt{no}\} \): can s begin executing?
- \( \text{out}[s] \in \{\texttt{maybe}, \texttt{no}\} \): can s finish executing (fall through)?
Merging: \( \texttt{maybe} \vee \texttt{no} = \texttt{maybe} \), \( \texttt{no} \vee \texttt{no} = \texttt{no} \)
Rules by statement kind:
| Statement L | in | out[L] |
|---|---|---|
| `return E` | — | no |
| `if (E) S` | given | out[S] ∨ in[L] |
| `if (E) S1 else S2` | given | out[S1] ∨ out[S2] |
| `while (E) S` (general) | given | in[L] ∨ out[S] |
| `while (true) S` | given | no |
| `while (false) S` | in[S] = no | in[L] |
| Block `{S1; S2}` | in[S1] = in[L], in[S2] = out[S1] | out[S2] |
| Any other statement | given | in[L] |
Error conditions:
- If \( \text{in}[s] = \texttt{no} \): unreachable code — ERROR
- If \( \text{out}[\text{method body}] = \texttt{maybe} \) and return type ≠ void: ERROR (missing return)
Constant expressions: while(1 == 1) is treated as while(true) because 1 == 1 is a constant expression evaluating to true. This is evaluated at compile time per JLS §15.28.
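The reachability rules above can be sketched as a recursive pass in Python (statement shapes are illustrative; Java has many more statement kinds):

```python
MAYBE, NO = "maybe", "no"

def join(a, b):                      # maybe ∨ no = maybe
    return MAYBE if MAYBE in (a, b) else NO

def reach(s, inn, errors):
    """Returns out[s]; records s in errors when in[s] = no (dead code)."""
    if inn == NO:
        errors.append(s)
    kind = s[0]
    if kind == "return":
        return NO                    # return never completes normally
    if kind == "seq":                # in[S2] = out[S1]
        return reach(s[2], reach(s[1], inn, errors), errors)
    if kind == "ifelse":             # out = out[S1] ∨ out[S2]
        return join(reach(s[1], inn, errors), reach(s[2], inn, errors))
    if kind == "while_true":         # while (true) never falls through
        reach(s[1], inn, errors)
        return NO
    return inn                       # any other statement falls through
```

Running this on a sequence whose first statement is a `return` flags the second statement as unreachable, exactly the JLS error case.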
Definite Assignment Analysis (JLS §16)
Every local variable must be definitely assigned before it is read. This prevents reading uninitialized variables.
Definitions:
- \( \text{in}[s] \subseteq \text{Vars} \): set of variables definitely initialized before s
- \( \text{out}[s] \subseteq \text{Vars} \): set of variables definitely initialized after s
For conditional expressions, we track separate “true-exit” and “false-exit” sets:
- \( \text{out}_\top[E] \): variables initialized when E evaluates to true
- \( \text{out}_\bot[E] \): variables initialized when E evaluates to false
Rules for key statements:
Local variable declaration with initializer, L: τ x = E (likewise assignment L: x = E):

\[ \text{in}[E] = \text{in}[L], \quad \text{out}[L] = \text{out}[E] \cup \{x\} \]

If-else, L: if (E) S1 else S2:

\[ \text{in}[E] = \text{in}[L], \quad \text{in}[S_1] = \text{out}_\top[E], \quad \text{in}[S_2] = \text{out}_\bot[E] \]

\[ \text{out}[L] = \text{out}[S_1] \cap \text{out}[S_2] \]

Short-circuit and, L: E1 && E2 (E2 is evaluated only when E1 is true):

\[ \text{in}[E_1] = \text{in}[L], \quad \text{in}[E_2] = \text{out}_\top[E_1] \]

\[ \text{out}_\top[L] = \text{out}_\top[E_2], \quad \text{out}_\bot[L] = \text{out}_\bot[E_1] \cap \text{out}_\bot[E_2] \]

Short-circuit or, L: E1 || E2 (E2 is evaluated only when E1 is false; L is true either because E1 was true or because E2 was, so a variable is definitely assigned when L is true only if it is assigned in both cases):

\[ \text{in}[E_1] = \text{in}[L], \quad \text{in}[E_2] = \text{out}_\bot[E_1] \]

\[ \text{out}_\top[L] = \text{out}_\top[E_1] \cap \text{out}_\top[E_2], \quad \text{out}_\bot[L] = \text{out}_\bot[E_2] \]

Joos simplification: Local variables must be initialized at declaration (int x = expr), so definite assignment analysis is simpler — no uninitialized declaration case.
Live Variable Analysis
A variable is live at a program point if its current value might be read before it is next overwritten. Dead assignments (writing a variable whose value will never be read) can be eliminated.
Data flow equations (backward analysis):
- \( \text{in}[S] = (\text{out}[S] \setminus \text{KILL}[S]) \cup \text{GEN}[S] \)
- \( \text{out}[S] = \bigcup_{\text{succ}\ S'} \text{in}[S'] \) (union over the control-flow successors of S)
Where:
- \( \text{GEN}[S] \): variables read in S (they must be live before S)
- \( \text{KILL}[S] \): variables written in S (earlier values become dead)
For assignment x = e: KILL = {x}, GEN = vars_read(e)
A variable is live entering a loop if either the code after the loop reads it or the loop body itself reads it before overwriting it.
Fixed-Point Computation with Lattices
The equations for live variable analysis (and other dataflow analyses) are cyclic (a loop’s in and out sets depend on each other). We solve them iteratively using fixed-point computation.
Partial order (⊑): A binary relation that is reflexive, transitive, and antisymmetric.
Lattice: A partial order where every pair of elements has a least upper bound (\( \sqcup \), join) and greatest lower bound (\( \sqcap \), meet).
Complete lattice: Every set of elements has a least upper bound and greatest lower bound.
Examples:
- Power set \( 2^{\text{Vars}} \) ordered by \( \subseteq \): complete lattice, \( \bot = \emptyset \), \( \top = \text{Vars} \), \( \sqcup = \cup \), \( \sqcap = \cap \)
- \( \{\texttt{no}, \texttt{maybe}\} \) ordered by no ⊑ maybe: complete lattice with \( \bot = \texttt{no} \)
Monotone function: \( f: L \to L \) is monotone if \( x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y) \).
For a monotone \( f \) on a complete lattice:
- A fixed point exists.
- If the lattice has no infinite ascending chains, the least fixed point equals \( \bigsqcup_{n \geq 0} f^n(\bot) \).
Iterative algorithm for live variable analysis:
initialize in[S] = out[S] = ∅ for all S
repeat:
update all in[S], out[S] according to dataflow equations
until no changes occur
Termination is guaranteed because:
- The lattice \( 2^{\text{Vars}} \) is finite
- Each update is monotone: the in/out sets can only grow or stay the same
- A finite lattice has no infinite ascending chains
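The iteration above, instantiated for live variables on a small CFG containing a loop (node numbering and GEN/KILL sets are invented for illustration):

```python
# CFG: node -> (GEN, KILL, successors) for a tiny loop:
#   1: x = read()    2: y = x + 1 (loops on itself)    3: print(y)
cfg = {
    1: (set(),  {"x"}, [2]),
    2: ({"x"},  {"y"}, [3, 2]),   # back edge 2 -> 2
    3: ({"y"},  set(), []),
}

def live_variables(cfg):
    live_in = {n: set() for n in cfg}     # initialize everything to ⊥ = ∅
    live_out = {n: set() for n in cfg}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for n, (gen, kill, succs) in cfg.items():
            out = set().union(*(live_in[s] for s in succs)) if succs else set()
            inn = (out - kill) | gen      # in = (out \ KILL) ∪ GEN
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out

live_in, live_out = live_variables(cfg)
```

The cyclic dependency through the back edge is what forces the repeat-until-stable loop: one pass is not enough.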
The General Dataflow Analysis Framework
The three analyses covered above (reachability, definite assignment, live variables) are all instances of a single general dataflow framework. A dataflow analysis is specified by four components:
| Component | Description |
|---|---|
| Direction | Forward (info flows from entry to exit) or Backward (exit to entry) |
| Lattice \( L \) | The domain of analysis facts; must be a complete lattice |
| Initial value \( \bot \) | Starting assumption at all program points |
| Transfer function | For each statement \( S \): \( f_S : L \to L \) (must be monotone) |
The merge operator \( \sqcup \) (by dataflow convention often called the meet) combines information from multiple control-flow paths. Different analyses use different merge operators:
| Analysis | Direction | Lattice | Meet | \( \bot \) |
|---|---|---|---|---|
| Reachability | Forward | \( \{\texttt{no}, \texttt{maybe}\} \) | \( \vee \) | no |
| Definite Assignment | Forward | \( 2^{\text{Vars}} \) ordered by \( \supseteq \) | \( \cap \) | Vars |
| Live Variables | Backward | \( 2^{\text{Vars}} \) ordered by \( \subseteq \) | \( \cup \) | \( \emptyset \) |
| Reaching Definitions | Forward | sets of definitions | \( \cup \) | \( \emptyset \) |
| Available Expressions | Forward | sets of expressions | \( \cap \) | all exprs |
The universal algorithm is always the same: initialize to \( \bot \), apply transfer functions repeatedly until no changes (Kleene fixed-point iteration).
Reaching Definitions
A definition of variable \( x \) is any assignment x = ... in the program. A definition \( d \) reaches program point \( p \) if there exists an execution path from \( d \) to \( p \) along which \( x \) is not redefined.
Purpose: Reaching definitions is a forward analysis used to:
- Determine whether constant propagation is safe (if only one definition of `x` reaches a use, and that definition assigns a constant, then `x` is constant at that use)
- Build def-use chains for optimization
Dataflow equation (forward): \( \text{out}[S] = (\text{in}[S] \setminus \text{KILL}[S]) \cup \text{GEN}[S] \), where:
- \( \text{GEN}[S] = \{d\} \) if \( S \) is a definition \( d \): \( x = e \)
- \( \text{KILL}[S] = \) all other definitions of \( x \) in the program
Meet operator: \( \text{in}[S] = \bigcup_{\text{pred } p} \text{out}[p] \) (union — a definition reaches a point if it reaches along any path)
Example:
d1: x = 1 // defines x
d2: y = x + 1 // uses x; d1 reaches here
d3: x = y * 2 // kills d1, defines new d3
d4: z = x // only d3 reaches here
At d4, only definition d3 for x is in the reaching-definitions set — d1 was killed by d3. This tells the optimizer that x at d4 has the value computed at d3.
Lattice: \( 2^{\text{Defs}} \) with \( \subseteq \) order, \( \bot = \emptyset \), \( \sqcup = \cup \). Each element is a set of (possibly) reaching definitions.
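The d1–d4 example can be replayed with a small forward pass in Python (straight-line code, so there are no merge points):

```python
# Each statement is (label, variable-it-defines), as in the example
stmts = [("d1", "x"), ("d2", "y"), ("d3", "x"), ("d4", "z")]

def reaching_definitions(stmts):
    reach = set()                                 # in[entry] = ∅
    at = {}
    for label, var in stmts:
        at[label] = set(reach)                    # in[S]
        kill = {d for d in reach if d[1] == var}  # other defs of var
        reach = (reach - kill) | {(label, var)}   # out = (in \ KILL) ∪ GEN
    return at

at = reaching_definitions(stmts)
```

As in the text, d3 kills d1, so the only definition of `x` reaching d4 is d3.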
Available Expressions
An expression \( e \) is available at program point \( p \) if every path from the program entry to \( p \) evaluates \( e \), and none of \( e \)’s operands is redefined after that last evaluation.
Purpose: Available expressions is a forward analysis used for Common Subexpression Elimination (CSE): if x + y is available at a use, compute it once, save it in a temporary, and reuse.
Dataflow equation (forward): \( \text{out}[S] = (\text{in}[S] \setminus \text{KILL}[S]) \cup \text{GEN}[S] \), where:
- \( \text{GEN}[S] \) = expressions computed by \( S \) whose operands are not modified by \( S \)
- \( \text{KILL}[S] \) = all expressions in the program that contain any variable written by \( S \)
Meet operator: \( \text{in}[S] = \bigcap_{\text{pred } p} \text{out}[p] \) (intersection — an expression is available only if it reaches along every path)
Initial value: \( \text{in}[\text{entry}] = \emptyset \). All other nodes are initialized to the set of all expressions (the \( \bot \) of this lattice, which is ordered by \( \supseteq \)) so that intersection can only shrink the set.
Lattice: \( 2^{\text{Exprs}} \) ordered by \( \supseteq \) (larger sets are lower in the lattice; \( \bot \) = set of all expressions, \( \top = \emptyset \)), \( \sqcup = \cap \). This is the “pessimistic must” direction: an expression is available only if all paths guarantee it.
Example:
// Entry: {}
a = b + c; // GEN: {b+c}, out = {b+c}
if (...) {
d = b + c; // b+c already in, no kill, out = {b+c}
} else {
b = 1; // KILL: {b+c} (b is written), out = {}
}
// Meet (intersection): {} ∩ {b+c} = {} — b+c NOT available
e = b + c; // must recompute
If the else branch did not modify b or c, b+c would be available after the merge, and the compiler could avoid recomputing it.
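A sketch of the same computation in Python, with the transfer function and the intersection meet made explicit (the one-expression universe matches the example above):

```python
ALL = {"b+c"}   # universe of expressions in the illustrative program

def transfer(avail, gen, written_vars):
    # KILL: expressions mentioning any written variable; GEN: exprs computed
    kill = {e for e in ALL if any(v in e.split("+") for v in written_vars)}
    return (avail - kill) | gen          # out = (in \ KILL) ∪ GEN

after_a    = transfer(set(), {"b+c"}, {"a"})    # a = b + c
after_then = transfer(after_a, {"b+c"}, {"d"})  # then: d = b + c
after_else = transfer(after_a, set(), {"b"})    # else: b = 1 (kills b+c)
at_merge   = after_then & after_else            # meet = intersection
```

The intersection at the merge point is empty, so `b+c` must be recomputed, exactly as in the worked example.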
Code Generation: IA-32 Assembly
Code generation translates the type-checked, annotated AST into actual machine instructions. The course targets IA-32 (i386) — the 32-bit x86 architecture.
IA-32 Architecture Overview
The i386 is the 32-bit version of the x86 architecture, introduced with the Intel 80386 in 1986. It features:
- 8 general-purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, esp, ebp
- 6 segment registers (largely ignored in user-space code): cs, ds, es, fs, gs, ss
- EFLAGS register: condition codes (ZF, SF, OF, CF, etc.) set by comparisons
- EIP: instruction pointer (program counter)
Sub-register access: ax (low 16 bits of eax), al (low 8 bits), ah (high 8 bits of ax).
Register Conventions
| Register | Role | Save |
|---|---|---|
| eax | Return value, accumulator | caller-save |
| ecx | Loop counter | caller-save |
| edx | High bits of multiply/divide | caller-save |
| ebx | General purpose | callee-save |
| esi | Source index | callee-save |
| edi | Destination index | callee-save |
| esp | Stack pointer | callee-save |
| ebp | Frame pointer (base pointer) | callee-save |
Callee-save registers must be preserved by a function — if a callee uses them, it must save/restore them. Caller-save registers may be clobbered by a called function — if the caller needs them, it must save them before calling.
IA-32 Instruction Set (Selected)
Memory Addressing
The i386 supports flexible memory addressing: [base + index*scale + displacement]
mov eax, ebx ; register to register: eax = ebx
mov eax, [ebx] ; load: eax = *ebx
mov [eax], ebx ; store: *eax = ebx
mov eax, 42 ; immediate: eax = 42
mov eax, [ebp - 8] ; local variable at frame offset -8
mov eax, [eax + ecx*4] ; array element (4-byte ints, index in ecx)
Arithmetic
add eax, ebx ; eax += ebx
sub eax, ebx ; eax -= ebx
imul eax, ebx ; eax *= ebx (signed 32-bit result)
cdq ; sign-extend eax into edx:eax (for division)
idiv ecx ; eax = edx:eax / ecx; edx = remainder (signed)
neg eax ; eax = -eax
Comparisons and Branches
cmp eax, ebx ; sets flags based on eax - ebx (no result stored)
je label ; jump if equal (ZF=1)
jne label ; jump if not equal
jg label ; jump if greater (signed)
jge label ; jump if ≥ (signed)
jl label ; jump if less (signed)
jle label ; jump if ≤ (signed)
ja label ; jump if above (unsigned)
jb label ; jump if below (unsigned)
jmp label ; unconditional jump
Stack and Calls
push eax ; esp -= 4; [esp] = eax
pop eax ; eax = [esp]; esp += 4
call label ; push eip; jmp label (stores return address)
ret ; pop eip (return to caller)
Data Declarations (NASM syntax)
section .data
myint: dd 1234 ; declare 32-bit integer
mystr: db 'h','e','l','l','o',0 ; null-terminated string
myarr: dd 1, 2, 3, 4 ; array of 4 ints
align 4 ; pad to 4-byte boundary
section .text
global main ; export symbol
extern printf ; import external symbol
System Calls
mov eax, 4 ; sys_write
mov ebx, 1 ; fd = stdout
mov ecx, msg ; buffer address
mov edx, len ; length
int 0x80 ; invoke kernel
Code Generation Strategy
Three main tasks:
- Plan data layout: Decide where each variable, field, and object lives in memory.
- Generate code bottom-up from AST: Each node type has a code template.
- Write runtime support: Malloc, exception handlers, vtable structures.
Data Layout Decisions
| Data | Storage | Rationale |
|---|---|---|
| Integer/boolean literals | Immediates | Encoded in instruction |
| Local variables | Stack | Automatic lifetime, fast access via ebp offsets |
| Objects | Heap | Dynamic lifetime, reference semantics |
| Static fields | Fixed memory (labeled) | Per-class, persistent |
| String/array objects | Heap | Variable size |
| Vtables / class structures | Fixed memory | One per class, permanent |
Stack Frame Layout
Every function call establishes a stack frame — a fixed region on the stack for that invocation’s local variables and bookkeeping.
Standard IA-32 frame layout (cdecl convention):
High address
┌────────────────────┐
│ arg n │ ← [ebp + 4*(n+1) + 4]
│ ... │
│ arg 2 │ ← [ebp + 16]
│ arg 1 │ ← [ebp + 12]
│ arg 0 │ ← [ebp + 8]
│ return address │ ← [ebp + 4] (pushed by call)
│ saved ebp │ ← [ebp + 0] (pushed by prologue)
│ local var 0 │ ← [ebp - 4]
│ local var 1 │ ← [ebp - 8]
│ ... │
│ local var n │ ← [ebp - 4*(n+1)]
└────────────────────┘
Low address (esp points here)
Function prologue:
push ebp ; save caller's frame pointer
mov ebp, esp ; establish new frame pointer
sub esp, N ; allocate N bytes for local variables
Function epilogue:
mov esp, ebp ; deallocate locals
pop ebp ; restore caller's frame pointer
ret ; return to caller
Calling Convention (cdecl)
Caller’s responsibility:
- Save caller-save registers if needed (eax, ecx, edx)
- Push arguments right-to-left (last arg first)
- `call foo` (pushes return address)
- After return, pop arguments: `add esp, 4*n`
- Result is in eax
Callee’s responsibility:
- Execute prologue (push ebp, mov ebp esp, sub esp N)
- Save callee-save registers that will be used (ebx, esi, edi)
- Execute function body
- Place return value in eax
- Restore callee-save registers
- Execute epilogue (mov esp ebp, pop ebp, ret)
Example: int f(int a, int b)
; --- Caller side ---
push 2 ; push b
push 1 ; push a
call f
add esp, 8 ; pop 2 args × 4 bytes
; --- Callee: f(a, b) ---
f:
push ebp
mov ebp, esp
sub esp, 4 ; one local variable
; a is at [ebp + 8], b is at [ebp + 12]
mov eax, [ebp + 8] ; load a into eax (return value)
add eax, [ebp+12] ; eax = a + b
mov esp, ebp
pop ebp
ret
Code Generation for Expressions
The code generator for an expression node produces code that leaves the result in eax. A common pattern is push/pop to save intermediate values:
Binary subtraction E1 - E2:
; generate code for E1
push eax ; save E1 result
; generate code for E2
pop ebx ; restore E1 into ebx
sub ebx, eax ; ebx = E1 - E2
mov eax, ebx ; result in eax
Note: Division requires extra care — idiv uses edx:eax as dividend:
; generate code for E1 → eax
push eax
; generate code for E2 → eax (divisor)
mov ecx, eax
pop eax ; restore dividend
cdq ; sign extend eax into edx
idiv ecx ; eax = eax / ecx; edx = remainder
L-value vs R-value
Most expressions produce an r-value (a value). Assignments need the l-value (an address) of the left-hand side.
| Expression | L-value (address) | R-value (value) |
|---|---|---|
| Local var `v` | `ebp + offset(v)` | `[ebp + offset(v)]` |
| Static field `C.f` | address of label | `[label]` |
| Instance field `e.f` | `e + offset(f)` | `[e + offset(f)]` |
| Array element `a[i]` | `a + header + i*4` | `[a + header + i*4]` |
Assignment E1 = E2:
; compute l-value of E1 → eax
push eax ; save address
; compute r-value of E2 → eax
pop ebx ; restore address
mov [ebx], eax ; store value at address
; eax still holds the value (assignment is an expression in Java)
If-Else and Loops
If-else:
; condition code → eax
cmp eax, 0
je else_42
; then-branch code
jmp end_42
else_42:
; else-branch code
end_42:
While loop:
loop_42:
; condition code → eax
cmp eax, 0
je end_42
; body code
jmp loop_42
end_42:
Short-circuit evaluation (&&): If the left side is false, skip evaluating the right side.
; E1 code → eax
cmp eax, 0
je false_42 ; skip E2 if E1 is false
; E2 code → eax
false_42:
Object Layout and Virtual Dispatch
Memory Layout of Objects
Each object in memory has a fixed layout that allows treating subclass instances as superclass instances:
Object of class B (inherits from A):
┌───────────────────┐
│ vtable pointer │ ← [obj + 0] always first
├───────────────────┤
│ field a (from A) │ ← [obj + 4]
├───────────────────┤
│ field b (from B) │ ← [obj + 8]
└───────────────────┘
Type-compatible layout: If B extends A, then the layout of B is an extension of A’s layout — A’s fields appear at the same offsets in B. This means a pointer to a B object can be safely treated as a pointer to an A object.
A layout: B layout (B extends A):
vtable ptr vtable ptr ← same offset
field_a field_a ← same offset
field_b ← appended
Virtual Method Tables (Vtables)
The problem: Given a reference of static type Vehicle, calling v.numTires() must dispatch to the correct implementation depending on the dynamic type (Car, Truck, Bicycle…).
Solution: Each object’s first word points to a vtable (virtual method table) — a class-specific array of function pointers.
Object: Vtable for Car:
┌─────────────┐ ┌──────────────────────────┐
│ vtable ptr │──────▶│ METHOD$Car$numTires │ ← offset 0
├─────────────┤ ├──────────────────────────┤
│ field ... │ │ METHOD$Vehicle$foo │ ← offset 4 (inherited)
└─────────────┘ ├──────────────────────────┤
│ METHOD$Car$bar │ ← offset 8 (new)
└──────────────────────────┘
Critical invariant: Inherited methods appear at the same offset in subclass vtables as in superclass vtables. This allows:
; v.numTires() where v has static type Vehicle
mov eax, [v] ; load vtable pointer
mov eax, [eax + 0] ; load numTires entry (offset 0 for Vehicle)
call eax ; dispatch — calls Car's or Vehicle's version
; depending on dynamic type of v
Vtable structure:
VTABLE$Vehicle:
dd METHOD$Vehicle$numTires ; offset 0
dd METHOD$Vehicle$foo ; offset 4
VTABLE$Car:
dd METHOD$Car$numTires ; offset 0 (overridden)
dd METHOD$Vehicle$foo ; offset 4 (inherited — same pointer!)
dd METHOD$Car$bar ; offset 8 (new method)
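The dispatch mechanism can be mimicked in Python with lists of function pointers (class and method names follow the Vehicle/Car example; slot numbers are the invariant being demonstrated):

```python
# Simulated vtables: lists of function pointers; inherited methods keep
# their slot index in subclass vtables
def vehicle_numTires(self): return 0
def vehicle_foo(self):      return "foo"
def car_numTires(self):     return 4
def car_bar(self):          return "bar"

VTABLE_VEHICLE = [vehicle_numTires, vehicle_foo]       # slots 0, 1
VTABLE_CAR     = [car_numTires, vehicle_foo, car_bar]  # slot 0 overridden

def new_object(vtable):
    return {"vtable": vtable}          # first word: vtable pointer

def dispatch(obj, slot):
    return obj["vtable"][slot](obj)    # load vtable, index, call

car = new_object(VTABLE_CAR)
```

The call site only knows the slot (computed from the static type); the object's vtable decides which implementation actually runs.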
Dispatching Interface Methods
Interface dispatch is harder because a class can implement multiple interfaces, and each interface expects its methods at offset 0 of its own “view” of the vtable:
interface A { void ma(); } // ma at offset 0 of A-vtable
interface B { void mb(); } // mb at offset 0 of B-vtable
class C implements A, B { } // C must satisfy both — conflict!
Solution 1: Selector-Indexed Table (O(1) dispatch)
Assign a global selector number to each unique method signature. Create a 2D table: rows index classes, columns index selectors. Each class has a pointer to its column. Dispatching:
mov eax, [obj] ; vtable pointer (actually selector table column)
mov eax, [eax + 4*sel] ; load function pointer for selector sel
call eax
Space: O(S × C) for S distinct selectors and C classes. In practice, most cells are empty, so the table must be compressed (e.g., with sparse-table techniques such as row displacement).
Solution 2: Hash Table Dispatch
Build a per-class hash table mapping selector → method pointer. Dispatch computes a hash at the call site.
- Space: O(Σ methods per class)
- Time: O(1) expected but with hash overhead
- Can use inline caching (remember last dispatch result at each call site) to make repeated calls fast
Subtype Testing at Runtime
Several operations require a runtime subtype check:
- `instanceof T`: does this object belong to T or a subtype?
- Casts `(T) obj`: throw `ClassCastException` if not a subtype
- Array stores `a[i] = obj`: throw `ArrayStoreException` if the dynamic array element type is incompatible
Interval Method (Single Inheritance)
For classes only (single inheritance): assign each class a DFS-traversal interval \( [\text{start}_C, \text{end}_C] \) such that \( B \leq A \Leftrightarrow \text{interval}(B) \subseteq \text{interval}(A) \).
DFS of hierarchy: Intervals:
Object [1, 8] Object: [1, 8]
/ \ Vehicle: [2, 5]
Vehicle[2,5] Number[6,8] Car: [3, 3]
/ \ Truck: [4, 4]
Car Truck Number: [6, 8]
[3,3] [4,4] ...
Test: x instanceof Vehicle ⟺ 2 ≤ start(class(x)) ≤ end(class(x)) ≤ 5
In assembly:
mov ecx, [obj] ; vtable pointer
mov ecx, [ecx + CLASS_TAG] ; class tag (index)
cmp ecx, VEHICLE_START
jl fail
cmp ecx, VEHICLE_END
jg fail
; success: obj is-a Vehicle
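A Python sketch of the interval numbering and containment test (the DFS below yields intervals equivalent to, though not numerically identical with, the figure):

```python
# Class hierarchy from the example
hierarchy = {"Object": ["Vehicle", "Number"],
             "Vehicle": ["Car", "Truck"],
             "Number": [], "Car": [], "Truck": []}

def number_classes(root, children):
    """Assign each class a [start, end] interval by DFS preorder."""
    interval, counter = {}, [0]
    def dfs(c):
        counter[0] += 1
        start = counter[0]
        for child in children[c]:
            dfs(child)
        interval[c] = (start, counter[0])   # end = max number in subtree
    dfs(root)
    return interval

interval = number_classes("Object", hierarchy)

def is_subtype(c, ancestor):     # B ≤ A  ⇔  interval(B) ⊆ interval(A)
    (s, e), (sa, ea) = interval[c], interval[ancestor]
    return sa <= s and e <= ea
```

Each runtime test is two comparisons, matching the two `cmp` instructions in the assembly above.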
Interface Subtype Testing
For interfaces, single-inheritance interval doesn’t work (a class can implement many interfaces). Use a subtype matrix: a 2D boolean table with entry [C, T] = 1 iff C ≤ T.
Space: O(T²) where T is the number of types. Practical compression: bitset encoding or hash tables.
Arrays
Array object layout:
┌───────────────────┐
│ vtable pointer │ ← [arr + 0]
├───────────────────┤
│ length │ ← [arr + 4]
├───────────────────┤
│ element[0] │ ← [arr + 8]
├───────────────────┤
│ element[1] │ ← [arr + 12]
├───────────────────┤
│ ... │
└───────────────────┘
Arrays are heap-allocated objects with a vtable (for instanceof, length, and method dispatch inherited from java.lang.Object).
Array access a[i]:
; a → eax, i → ecx
; null check
test eax, eax
je null_exception
; bounds check
cmp ecx, [eax + 4] ; compare i with length
jae bounds_exception ; unsigned comparison (catches negative too)
; compute address
lea eax, [eax + 8 + ecx*4] ; eax = &a[i]
Array store needs an extra type check (to catch ArrayStoreException):
; check dynamic type of array element type
mov ebx, [arr] ; vtable of array
mov ebx, [ebx + ELEMENT_TYPE] ; element type
; check if value is assignment-compatible with element type
; (use subtype table)
Instance Creation
new A(args):
; 1. Allocate memory
mov eax, SIZE_OF_A
call __malloc ; returns pointer in eax
; 2. Initialize vtable pointer
mov dword [eax], VTABLE$A ; NASM needs an explicit operand size here
; 3. Zero out fields (Java semantics require default initialization)
; 4. Push 'this' (the new object pointer)
push eax
; 5. Push arguments
; push arg_n
; ...
; push arg_1
; 6. Call constructor
call CONSTRUCTOR$A
; 7. Clean up args
add esp, 4*(1 + n_args)
; 8. Return value: object pointer already in eax
Constructor execution order (JLS §12.5):
- Superclass constructor is called first (implicit `super()` if not explicit)
- Instance field initializers execute in declaration order
- Constructor body executes
Memory Management
Memory Organization
A process’s virtual address space is divided into regions:
0xFFFFFFFF ──────────────────
│ kernel │
0xC0000000 ──────────────────
│ │
│ stack │ (grows down ↓)
│ │
│ ... │
│ │
│ heap │ (grows up ↑)
│ │
0x08048000 ──────────────────
│ text (code) │
│ data (statics) │
0x00000000 ──────────────────
The heap is where new allocates objects. The OS provides raw memory pages via mmap; the runtime’s allocator (malloc) subdivides these pages for individual allocations.
Manual Memory Management
In C, the programmer explicitly manages heap memory:
- `malloc(n)`: Allocate n bytes, return pointer
- `free(p)`: Return allocation at p to the pool
The allocator maintains free lists — linked lists of free memory blocks. When an allocation is requested:
- Find a free block of sufficient size
- Split it if much larger than needed
- Return the pointer
When free is called:
- The block is added back to the free list
- Adjacent free blocks may be coalesced to prevent fragmentation
Fragmentation: Over time, the heap may contain many small free blocks that cannot satisfy large allocations even though total free space is sufficient.
Automatic Memory Management (Garbage Collection)
Garbage collection automatically reclaims memory that is no longer reachable from the program. “Reachable” approximates “might be used again.”
Roots: The set of directly reachable memory locations — stack variables, static fields, CPU registers. Everything transitively reachable from roots is live; everything else is garbage.
The compiler must provide the GC with type information about object layouts (which fields are references) so the GC can traverse the object graph.
Mark-and-Sweep
The classic two-phase algorithm:
Phase 1 — Mark: DFS from all roots, mark every reachable object.
mark(object o):
if o is already marked: return
mark o
for each reference field f in o:
mark(o.f)
Phase 2 — Sweep: Scan the entire heap; free every unmarked object.
for each object o in heap:
if o is not marked: free(o)
else: unmark(o) // prepare for next GC
Cost: O(H) where H = heap size (sweep visits entire heap regardless of live data).
Disadvantage: Fragmentation (objects remain in place; free blocks are scattered).
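The two phases in Python, over a toy heap where objects are just ids with reference lists (the unreachable cycle shows why tracing, unlike reference counting, collects cyclic garbage):

```python
# Toy heap: object id -> list of referenced object ids
heap = {1: [2], 2: [3], 3: [], 4: [5], 5: [4]}   # 4 and 5: garbage cycle
roots = [1]

def mark(heap, roots):
    marked = set()
    def visit(o):            # DFS from the roots
        if o in marked:
            return
        marked.add(o)
        for ref in heap[o]:
            visit(ref)
    for r in roots:
        visit(r)
    return marked

def sweep(heap, marked):
    for o in list(heap):     # scan the whole heap
        if o not in marked:
            del heap[o]      # free every unmarked object

sweep(heap, mark(heap, roots))
```

Note that sweep touches every heap slot, reachable or not, which is where the O(H) cost comes from.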
Semispace Copying Collection
Split the heap into two equal halves: fromspace and tospace.
- Allocate in fromspace using bump pointer (just increment a pointer — no free list!)
- When fromspace is full: copy all reachable objects to tospace, updating all pointers
- Swap fromspace and tospace roles
copy(object o):
if o already has forwarding pointer:
return forwarding_pointer(o) // already copied
allocate space in tospace for copy
copy bits of o into tospace
install forwarding pointer in old location
for each reference field f:
copy.f ← copy(o.f)
return copy
Advantages:
- O(L) cost where L = live data (proportional to what’s copied, not heap size)
- Bump-pointer allocation is the fastest possible allocation
- Automatically compacts heap (no fragmentation)
- Single traversal (no separate sweep)
Disadvantages:
- Uses only half of heap at any time
- Must update every pointer in the live set
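A compact Python version of the copying collector, using dict keys as addresses and the tospace size as the bump pointer; forwarding pointers handle shared objects and cycles (the object layout is illustrative):

```python
def collect(fromspace, roots):
    """Copy everything reachable from roots into a fresh tospace.
    fromspace: address -> list of referenced addresses."""
    tospace, forward = {}, {}
    def copy(addr):
        if addr in forward:              # already copied: follow forwarding
            return forward[addr]
        new = len(tospace)               # bump-pointer allocation
        forward[addr] = new              # install forwarding pointer first
        tospace[new] = []                # reserve the slot (handles cycles)
        tospace[new] = [copy(r) for r in fromspace[addr]]
        return new
    return tospace, [copy(r) for r in roots]

# 10 and 20 form a reachable cycle; 30 is garbage
tospace, new_roots = collect({10: [20], 20: [10], 30: []}, [10])
```

The cost is proportional to the two live objects copied; the garbage object 30 is never even visited.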
Mark-and-Compact
Avoids the halved heap of semispace copying (objects stay within one region) while still eliminating fragmentation by sliding live objects together:
- Mark reachable objects (like mark-and-sweep)
- Compute new addresses for all live objects (compacted order)
- Update all references to use new addresses
- Move objects to their new addresses
Cost: O(H) with a high constant factor (3+ passes over the heap).
Generational Garbage Collection
Generational hypothesis: Most objects die young — a freshly allocated object has a much higher probability of becoming garbage quickly than one that has survived several GC cycles.
Structure:
- Young generation (nursery): Small, frequently collected
- Old generation (tenured space): Large, rarely collected
Young GC (minor collection):
- Copy-collect the nursery: survivors are promoted to old generation
- Very fast: most young objects are dead, so almost nothing is copied
Old GC (major/full collection):
- Mark-and-sweep or mark-and-compact on old generation
- Rare, but more expensive
Write barrier: When an old object’s reference field is written with a young object’s pointer, record this in a remembered set so the young GC can find all roots (not just stack/statics).
Comparison
| GC Algorithm | Allocation | GC Cost | Fragmentation | Pros | Cons |
|---|---|---|---|---|---|
| Mark-Sweep | Free list | O(H) | Yes | Simple | Fragmentation |
| Semispace Copying | Bump pointer | O(L) | No | Fast alloc, no frag | ½ heap wasted |
| Mark-Compact | Free list | O(H) | No | Full heap, no frag | High constant |
| Generational | Bump pointer | O(L) mostly | Varies | Fast in practice | Complex |
Optimization
Peephole Optimization
Peephole optimization applies local rewrite rules to short sequences of instructions, replacing inefficient patterns with better equivalents. It is applied as a final pass over generated assembly.
Example patterns:
; Pattern: redundant load after store
mov [ebp-4], eax
mov eax, [ebp-4]
; Replace with:
mov [ebp-4], eax ; (eax already has the value)
; Pattern: complex addressing folding
mov eax, ebp
add eax, -8
mov eax, [eax]
; Replace with:
mov eax, [ebp - 8]
; Pattern: jump to jump
jmp label1
label1: jmp label2
; Replace with:
jmp label2
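A peephole pass is easy to prototype. Here instructions are ("op", dst, src) tuples and a single rule (a store followed by a reload of the same value) is implemented; the pattern set is illustrative:

```python
# Instructions as ("op", dst, src) tuples; memory operands as strings
def peephole(code):
    out = []
    for ins in code:
        if (out and ins[0] == "mov" and out[-1][0] == "mov"
                and out[-1][1] == ins[2]       # previous dst is this src
                and out[-1][2] == ins[1]):     # previous src is this dst
            continue   # mov X, R followed by mov R, X: the reload is dead
        out.append(ins)
    return out

code = [("mov", "[ebp-4]", "eax"),   # store eax into the local
        ("mov", "eax", "[ebp-4]"),   # redundant reload
        ("add", "eax", "1")]
```

Real peephole optimizers keep a table of such rules and re-scan until no rule fires, since one rewrite can expose another.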
Instruction Selection by Tree Tiling
A more principled approach to code generation: tree tiling maps AST subtrees to single machine instructions that implement multiple operations at once.
Each instruction is a “tile” — a tree pattern with a cost (latency or code size). The goal is to cover the AST with non-overlapping tiles of minimum total cost.
Example tiles for load:
Tile 1: [eax] → leaf cost: 1
Implements: mov eax, [eax]
Tile 2: [eax + imm] → leaf cost: 1
Implements: mov eax, [eax + const]
Tile 3: [eax + ebx*4] → leaf cost: 1
Implements: mov eax, [eax + ebx*4]
Optimal tiling algorithm (dynamic programming, bottom-up):
for each node n of AST in bottom-up order:
for each tile t that could root at n:
cost(t, n) = tile_base_cost(t)
+ Σ min_cost(child_i) for each subtree hole in t
select tile with minimum cost
record tile and total cost for node n
This is an O(n) algorithm (each node visited once, constant tiles per node).
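A toy version of the dynamic program in Python, with a generic one-op tile of cost 1 and a single lea-style tile that covers an add plus a scaled index in one instruction (tile set and costs are invented for illustration):

```python
# Expression tree: a leaf string, or (op, left, right)
def best_cost(node):
    """Minimum total tile cost to cover the subtree rooted at node."""
    if isinstance(node, str):
        return 1                                   # leaf tile: one load
    op, l, r = node
    cost = 1 + best_cost(l) + best_cost(r)         # generic one-op tile
    # lea-style tile: base + index*4 in one instruction, covering
    # both the + node and the * node at once
    if op == "+" and isinstance(r, tuple) and r[0] == "*" and r[2] == "4":
        cost = min(cost, 1 + best_cost(l) + best_cost(r[1]))
    return cost

tree = ("+", "a", ("*", "i", "4"))    # a + i*4, e.g. an array address
```

Covering the tree with separate `+` and `*` tiles would cost 5; the wide tile brings it down to 3, and the bottom-up minimum picks that automatically.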
Register Allocation
After instruction selection, code may reference many “virtual registers” (temporaries). Real machines have only 8 registers. Register allocation assigns virtual registers to physical registers, spilling to memory when needed.
Key insight: Two virtual registers can share a physical register if they are never simultaneously live.
Live range: The span of instructions between a variable’s definition and last use.
Interference graph — nodes are variables, edges connect simultaneously-live variables; graph coloring assigns physical registers.
Register allocation as graph coloring:
- Build interference graph: nodes = virtual registers; edges between simultaneously-live registers
- Color the graph with k colors (k = number of physical registers)
- Each color represents a physical register
- If the graph is not k-colorable, spill some registers to memory (add load/store instructions) and retry
Spilling heuristic: Prefer to spill registers that are used least frequently or have the longest live range.
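A greedy coloring sketch in Python (production allocators use Chaitin/Briggs-style simplification with actual spill code; this version just reports failure when k colors do not suffice):

```python
def color(interference, k):
    """Greedy k-coloring of the interference graph; returns a map
    var -> color, or None when a spill would be required."""
    order = sorted(interference, key=lambda v: len(interference[v]))
    assignment = {}
    for v in order:                       # color low-degree nodes first
        taken = {assignment[n] for n in interference[v] if n in assignment}
        free = [c for c in range(k) if c not in taken]
        if not free:
            return None                   # not colorable: spill and retry
        assignment[v] = free[0]
    return assignment

# a is live at the same time as b and as c; b and c never overlap
graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
```

Because b and c do not interfere, they can share one physical register, so two colors suffice for three virtual registers.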
Minimum registers needed (Sethi-Ullman algorithm for expressions):
\[ r[\text{leaf}] = 1 \qquad r[\text{node}] = \begin{cases} r_1 + 1 & \text{if } r_1 = r_2 \\ \max(r_1, r_2) & \text{otherwise} \end{cases} \]

where \( r_1, r_2 \) are the register requirements of the node's children. Evaluate the child needing more registers first to minimize peak register usage.
Example:
+
/ \
* b (b needs 1 register)
/ \
a c
If * needs 2 registers and b needs 1, evaluate * first (frees a register after), then b. If evaluated in the wrong order, we’d need 3 registers instead of 2.
Instruction Scheduling
Even with optimal instruction selection and register allocation, the order of instructions matters for modern pipelined CPUs. Instructions have variable latencies, and the CPU may stall while waiting for a result to be ready.
Key idea: Reorder instructions (without changing program semantics) so that independent instructions fill the gaps between dependent ones.
Constraints: The dependence graph captures which instruction orderings are semantically equivalent:
- True dependence (RAW): Instruction A writes a register that instruction B reads — B must come after A.
- Anti-dependence (WAR): A reads a register that B writes — B must come after A.
- Output dependence (WAW): Both A and B write the same register — B must come after A.
List scheduling (a greedy heuristic):
1. Build dependence graph
2. Compute priorities (e.g., critical path length through the graph)
3. Maintain a "ready list" of instructions with all predecessors done
4. At each step: pick highest-priority ready instruction and emit it
Example — moving a load earlier to hide memory latency:
; Before scheduling (stalls if mov takes 4 cycles):
mov eax, [ebx] ; load from memory (slow)
mov ecx, eax ; stall: waits for load
add ecx, 1
; After scheduling (fill the latency gap):
mov eax, [ebx] ; start load
mov edx, 5 ; independent: no stall
add edx, 2 ; independent: no stall
mov ecx, eax ; now load is done, no stall
add ecx, 1
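The four list-scheduling steps reduce to a short greedy loop. A minimal sketch, assuming instructions are named by strings, the dependence graph is acyclic, and latencies and issue width are ignored:

```java
import java.util.*;

// Greedy list scheduling over a dependence DAG.
class ListScheduler {
    // deps maps each instruction to the instructions it depends on;
    // priority is higher for instructions on longer critical paths.
    static List<String> schedule(Map<String, List<String>> deps,
                                 Map<String, Integer> priority) {
        Map<String, Integer> remaining = new HashMap<>();   // unmet deps
        deps.forEach((i, ds) -> remaining.put(i, ds.size()));
        List<String> order = new ArrayList<>();
        while (order.size() < deps.size()) {
            // Ready list: unemitted instructions with all predecessors done;
            // pick the highest-priority one
            String best = null;
            for (String i : deps.keySet())
                if (!order.contains(i) && remaining.get(i) == 0
                        && (best == null || priority.get(i) > priority.get(best)))
                    best = i;
            order.add(best);
            // Emitting best unblocks its dependents
            for (String j : deps.keySet())
                if (deps.get(j).contains(best))
                    remaining.put(j, remaining.get(j) - 1);
        }
        return order;
    }
}
```

With the load given a high priority (it heads the critical path), it is emitted first and the independent instructions fill its latency gap, as in the assembly example above.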
Constant Folding and Constant Propagation
Constant folding evaluates constant expressions at compile time rather than at runtime. This requires no static analysis — just pattern-matching on the AST.
3 + 4 → 7
2 * 10 → 20
true && false → false
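A bottom-up AST rewrite suffices. A minimal sketch over a hypothetical two-node AST handling only integer addition (the real AST has many more node kinds):

```java
// Constant folding by pattern-matching on the AST: no dataflow analysis needed.
abstract class Expr { }

class Lit extends Expr {
    final int value;
    Lit(int v) { value = v; }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr l, Expr r) { left = l; right = r; }
}

class Folder {
    // Fold children first; if both operands are now literals, replace the
    // addition with its compile-time result.
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Add a = (Add) e;
            Expr l = fold(a.left), r = fold(a.right);
            if (l instanceof Lit && r instanceof Lit)
                return new Lit(((Lit) l).value + ((Lit) r).value);
            return new Add(l, r);
        }
        return e;
    }
}
```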
Constant propagation uses reaching definitions to determine that a variable has a known constant value at a use site, then replaces the use with the constant (enabling further folding).
int x = 5;
int y = x + 3; // reaching def of x is "x=5"; propagate → y = 5+3 → y = 8
int z = y * 2; // y=8; propagate → z = 8*2 → z = 16
The analysis instantiates the dataflow framework with the constant-propagation lattice (⊥ = “undefined”, a specific constant, ⊤ = “non-constant”). Transfer functions evaluate each assignment over this lattice; a use is replaced only when every reaching definition gives the variable the same known constant.
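For straight-line code the transfer functions amount to evaluating each right-hand side over the lattice. A minimal sketch, assuming three-address statements of the form x = a + b (format, class name, and the NON_CONSTANT sentinel are illustrative assumptions):

```java
import java.util.*;

// Constant propagation over straight-line three-address adds.
// A variable's lattice value is either an Integer (known constant)
// or the NON_CONSTANT sentinel (top of the lattice).
class ConstProp {
    static final Object NON_CONSTANT = new Object();

    // Each statement is {target, operand1, operand2}; operands are
    // variable names or integer literals. Result: final lattice value
    // of every assigned variable.
    static Map<String, Object> run(List<String[]> stmts) {
        Map<String, Object> env = new HashMap<>();
        for (String[] s : stmts) {
            Object a = eval(s[1], env), b = eval(s[2], env);
            env.put(s[0], (a instanceof Integer && b instanceof Integer)
                    ? Integer.valueOf((Integer) a + (Integer) b)
                    : NON_CONSTANT);
        }
        return env;
    }

    static Object eval(String operand, Map<String, Object> env) {
        if (operand.matches("-?\\d+")) return Integer.parseInt(operand);
        // Unknown variables are treated conservatively as non-constant
        return env.getOrDefault(operand, NON_CONSTANT);
    }
}
```

With branches, the per-statement update becomes the transfer function of a fixed-point dataflow analysis, joining values at merge points.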
Strength Reduction
Strength reduction replaces expensive operations with cheaper equivalent ones. Classic examples:
| Original | Replacement | Rationale |
|---|---|---|
| x * 2 | x + x or x << 1 | Shift/add faster than multiply |
| x * 4 | x << 2 | Power-of-2 multiply → shift |
| x / 4 | x >> 2 | Power-of-2 divide → shift (unsigned) |
| x % 8 | x & 7 | Power-of-2 modulo → bitwise AND |
In loop contexts, strength reduction is especially powerful. If i*4 is computed each iteration, it can be replaced with an accumulator that increments by 4 each step, eliminating the multiply entirely.
Prof’s example from lecture: Multiplying by a constant like 2 can be replaced by add eax, eax — addition is faster than imul on many pipelines, and the operation is semantically identical for integers.
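In source-level terms, the loop transformation looks like this (an illustrative before/after sketch; the compiler performs it on its intermediate representation, not on Java source):

```java
// Loop strength reduction: replace i * 4 with an accumulator.
class StrengthReduction {
    // Before: a multiply on every iteration
    static int[] offsetsBefore(int n) {
        int[] off = new int[n];
        for (int i = 0; i < n; i++)
            off[i] = i * 4;
        return off;
    }

    // After: t tracks the invariant t == i * 4, incremented by 4 each step,
    // eliminating the multiply entirely
    static int[] offsetsAfter(int n) {
        int[] off = new int[n];
        int t = 0;
        for (int i = 0; i < n; i++) {
            off[i] = t;
            t += 4;
        }
        return off;
    }
}
```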
Common Subexpression Elimination (CSE)
Common Subexpression Elimination uses available expressions analysis: if an expression is available at a use site (it has been computed on every path and its operands haven’t been reassigned), replace it with a saved temporary.
// Before CSE:
a[i] = a[i] + 1;
// Computes a[i]'s address twice
// After CSE:
temp = a[i]; // compute a[i] once
a[i] = temp + 1; // reuse
More generally:
x = b + c;
// ... (b and c not modified) ...
y = b + c; // b+c is available here
// after CSE:
y = x;
The available-expressions analysis determines exactly when this substitution is safe.
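Within a single basic block, the bookkeeping can be sketched with a hash table from expressions to the temporaries holding them (a hypothetical three-address form handling only addition; the full analysis handles control flow via available expressions):

```java
import java.util.*;

// Local CSE over a basic block of three-address adds: reuse an earlier
// result when the same expression is available and neither operand has
// been reassigned since.
class LocalCSE {
    // Each statement is {target, operand1, operand2}; an emitted copy
    // is encoded as {target, source, null}.
    static List<String[]> run(List<String[]> stmts) {
        Map<String, String> available = new HashMap<>();   // "b+c" -> holder
        List<String[]> out = new ArrayList<>();
        for (String[] s : stmts) {
            String key = s[1] + "+" + s[2];
            String temp = available.get(key);
            if (temp != null) {
                out.add(new String[]{s[0], temp, null});   // copy from holder
            } else {
                out.add(s);
            }
            // Assigning s[0] kills expressions that read it or are held in it
            final String target = s[0];
            available.entrySet().removeIf(e -> {
                String[] ops = e.getKey().split("\\+");
                return ops[0].equals(target) || ops[1].equals(target)
                        || e.getValue().equals(target);
            });
            // Record the new expression unless the target is its own operand
            if (temp == null && !s[1].equals(target) && !s[2].equals(target))
                available.put(key, target);
        }
        return out;
    }
}
```

The kill step is the local counterpart of the dataflow analysis: an expression stops being available as soon as one of its operands is redefined.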
Putting It All Together: The Joos Compiler Pipeline
Compiler pipeline — stages from source to binary.
Here is the complete data flow from source code to executable:
Source (.java)
│
▼ Scanning (DFA + maximal munch)
Token stream
│
▼ Parsing (LALR(1))
Concrete syntax tree
│
▼ AST construction
Abstract syntax tree
│
▼ Weeding
Verified AST (syntactic constraints)
│
▼ Pass 1: Build global environment
Global class table
│
▼ Pass 2: Resolve type names
Types resolved
│
▼ Pass 3: Check class hierarchy
Hierarchy verified
│
▼ Pass 4: Disambiguate names
Names classified
│
▼ Pass 5: Resolve variables/fields
Declarations linked to uses
│
▼ Pass 6: Type checking
Types assigned to all expressions
│
▼ Pass 7: Resolve methods/instance fields
Method invocations resolved
│
▼ Reachability analysis (JLS §14.20)
Unreachable code detected
│
▼ Definite assignment (JLS §16)
Uninitialized reads detected
│
▼ Code generation (IA-32 assembly)
│ • Data layout
│ • Method calls / vtables
│ • Object allocation
│ • Array bounds checks
│ • Subtype testing
▼
NASM assembly (.s)
│
▼ NASM assembler
Object file (.o)
│
▼ ld linker (with Joos runtime)
Executable (ELF)
Each assignment in the course (A1–A5) implements one or two of these stages, culminating in a complete compiler for Joos 1W.