CS 444: Compiler Construction

Ondřej Lhoták

Estimated study time: 1 hr 23 min


Compiler Architecture Overview

A compiler translates source programs written in a high-level source language (e.g., Java) into a target language (e.g., x86 machine code). Understanding the full pipeline — from raw character streams to executable instructions — is the central goal of this course.

Compilers are traditionally divided into two halves. The front end analyzes the source program to verify correctness and extract structure. It comprises lexical analysis (scanning), syntax analysis (parsing), and context-sensitive semantic analysis. The back end synthesizes the target program from the structured representation; it handles optimization and code generation.

Source Program
      │
      ▼
┌───────────────────┐
│  Scanning         │  characters → tokens
└─────────┬─────────┘
          ▼
┌───────────────────┐
│  Parsing          │  tokens → parse tree / AST
└─────────┬─────────┘
          ▼
┌───────────────────┐
│  Semantic Analysis│  type checking, name resolution
│  (A2–A4)          │
└─────────┬─────────┘
          ▼
┌───────────────────┐
│  Code Generation  │  AST → assembly
│  (A5)             │
└─────────┬─────────┘
          ▼
  Machine Code

In CS 444, the source language is Joos 1W — a carefully defined subset of Java 1.3 with most of the interesting features retained (classes, interfaces, inheritance, arrays, exceptions) but certain complexities removed (no inner classes, no generics, limited overloading). The target is IA-32 (i386) assembly, executable on Linux.

Formal Language Foundations

Before diving into scanning, we need a framework for precisely describing the sets of strings a language accepts. Three concepts form the foundation:

  • Alphabet (Σ): A finite, non-empty set of symbols (e.g., ASCII characters, tokens).
  • Word: A finite sequence of symbols drawn from an alphabet. The empty word is written ε.
  • Language: A (precisely defined) set of words over an alphabet. Note that a language is just a set — it can be finite, infinite, or even empty.

The key insight is that we can describe languages with different levels of expressive power: regular languages (for scanning), context-free languages (for parsing), and less restricted formalisms (for semantics).


Lexical Analysis (Scanning)

Lexical analysis, or scanning, transforms a raw stream of characters into a sequence of tokens. A token is a classified lexeme — for example, the characters while become a keyword token WHILE, and 123 becomes an integer literal token INT_LIT(123).

Scanning uses regular languages, which are expressive enough to describe all valid token patterns but simple enough that recognition is linear-time and implementable with finite automata.

Regular Expressions

A regular expression over alphabet Σ is defined inductively, and each expression e denotes a language L(e):

Expression           | Language L(e)
\( a \in \Sigma \)   | \( \{a\} \)
\( \varepsilon \)    | \( \{\varepsilon\} \)
\( \emptyset \)      | \( \{\} \)
\( e_1 \, e_2 \)     | \( \{ xy \mid x \in L(e_1),\ y \in L(e_2) \} \)
\( e_1 \mid e_2 \)   | \( L(e_1) \cup L(e_2) \)
\( e^* \)            | \( \{\varepsilon\} \cup L(e) \cup L(e)L(e) \cup \cdots \)

Common shorthands: \( e^+ = e \, e^* \), \( e? = e \mid \varepsilon \), \( [a\text{-}z] = a \mid b \mid \cdots \mid z \).

Example: Integer literals (non-empty sequences of digits): [0-9]+
Example: Java identifiers: [a-zA-Z_][a-zA-Z0-9_]*
Example: Block comments: /\*([^*]|\*+[^*/])*\*+/ — the \*+ forms are needed so that comments whose body ends with extra stars, such as /* x **/, are still accepted.
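These patterns can be tried directly with java.util.regex — a quick sanity check, not part of a real scanner. The class and field names below are illustrative; the block-comment pattern uses \*+[^*/] in its body so comments ending in **/ are accepted:

```java
import java.util.regex.Pattern;

// The token patterns above, written as Java regex string literals.
public class TokenRegexDemo {
    static final Pattern INT_LIT = Pattern.compile("[0-9]+");
    static final Pattern ID      = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");
    static final Pattern COMMENT = Pattern.compile("/\\*([^*]|\\*+[^*/])*\\*+/");

    public static void main(String[] args) {
        System.out.println(INT_LIT.matcher("123").matches());       // true
        System.out.println(ID.matcher("x_1").matches());            // true
        System.out.println(ID.matcher("1x").matches());             // false (digit first)
        System.out.println(COMMENT.matcher("/* hi **/").matches()); // true
    }
}
```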

Finite Automata

Regular languages are recognized by finite automata. There are two variants; they have equal expressive power.

NFA (Non-deterministic Finite Automaton)

An NFA is a 5-tuple \( \langle \Sigma, Q, q_0, A, \delta \rangle \) where:

  • \( \Sigma \): input alphabet
  • \( Q \): finite set of states
  • \( q_0 \in Q \): start state
  • \( A \subseteq Q \): accepting states
  • \( \delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q \): transition function (may return multiple states, and allows ε-transitions)

An NFA accepts a string if at least one sequence of transitions leads to an accepting state.

DFA (Deterministic Finite Automaton)

A DFA is the same 5-tuple but with \( \delta : Q \times \Sigma \to Q \) — the transition function is total and returns exactly one state for every (state, character) pair.

Recognition algorithm (DFA):

input: string w, DFA M = (Σ, Q, q₀, δ, A)
output: true iff w ∈ L(M)

q ← q₀
for each character c in w:
    q ← δ(q, c)
return q ∈ A

DFAs enable O(n) scanning with no backtracking. NFAs are easier to build from regexes; we convert them to DFAs for execution.
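As a concrete sketch, here is the recognition loop above run on a hand-built DFA for (a|b)*abb; the automaton and its state numbering are an illustrative example, not from the notes:

```java
// A hand-built DFA for the regular expression (a|b)*abb,
// run with the linear-time recognition loop from the notes.
public class DfaDemo {
    // DELTA[state][symbol]: symbol 0 = 'a', 1 = 'b'
    static final int[][] DELTA = {
        {1, 0},  // state 0: start
        {1, 2},  // state 1: seen a
        {1, 3},  // state 2: seen ab
        {1, 0},  // state 3: seen abb (accepting)
    };

    static boolean accepts(String w) {
        int q = 0;                                  // q ← q0
        for (char c : w.toCharArray()) {
            if (c != 'a' && c != 'b') return false; // character not in Σ
            q = DELTA[q][c - 'a'];                  // q ← δ(q, c)
        }
        return q == 3;                              // return q ∈ A
    }

    public static void main(String[] args) {
        System.out.println(accepts("abb"));   // true
        System.out.println(accepts("aabb"));  // true
        System.out.println(accepts("abab"));  // false
    }
}
```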

[Figure: NFA for the regex (a|b)abb with ε-transitions (Thompson construction) — states q₀ through q₅, with ε-, a-, and b-transitions marked.]

NFA-to-DFA Conversion (Subset Construction)

The key idea: simulate all possible NFA paths simultaneously by tracking sets of NFA states.

ε-closure: The set of states reachable from a given set via zero or more ε-transitions alone.

function ε_closure(S):
    S' ← S
    worklist ← copy of S
    while worklist is not empty:
        remove q from worklist
        for each q' in δ(q, ε):
            if q' ∉ S':
                add q' to S' and worklist
    return S'

Subset construction:

  • DFA start state: \( q'_0 = \varepsilon\text{-closure}(\{q_0\})\)
  • DFA transition: \( \delta'(q', a) = \varepsilon\text{-closure}\!\left(\bigcup_{q \in q'} \delta(q, a)\right) \)
  • DFA accepting states: \( A' = \{q' \in Q' \mid q' \cap A \neq \emptyset\} \)
  • Enumerate all reachable DFA states by applying δ’ until no new states appear.

Key property: In the worst case, an NFA with n states may yield a DFA with \( 2^n \) states (exponential blowup). In practice, this is rarely a problem for realistic programming language tokens.
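A minimal sketch of the ε-closure worklist algorithm and subset simulation, on a hand-built ε-NFA for a*b. The NFA itself is a made-up example, and this version runs the subset construction "on the fly" per input character rather than enumerating all DFA states up front:

```java
import java.util.*;

// ε-closure and subset construction on a tiny hand-built ε-NFA
// recognizing a*b:  0 -ε→ 1,  1 -a→ 1,  1 -ε→ 2,  2 -b→ 3 (accepting).
public class SubsetConstruction {
    // transition map: state → (symbol → successor states); 'ε' marks ε-edges
    static final Map<Integer, Map<Character, Set<Integer>>> DELTA = new HashMap<>();
    static void edge(int q, char c, int r) {
        DELTA.computeIfAbsent(q, k -> new HashMap<>())
             .computeIfAbsent(c, k -> new HashSet<>()).add(r);
    }
    static { edge(0, 'ε', 1); edge(1, 'a', 1); edge(1, 'ε', 2); edge(2, 'b', 3); }

    static Set<Integer> move(Set<Integer> s, char c) {
        Set<Integer> out = new HashSet<>();
        for (int q : s)
            out.addAll(DELTA.getOrDefault(q, Map.of()).getOrDefault(c, Set.of()));
        return out;
    }

    // worklist algorithm from the notes
    static Set<Integer> epsilonClosure(Set<Integer> s) {
        Set<Integer> closure = new HashSet<>(s);
        Deque<Integer> worklist = new ArrayDeque<>(s);
        while (!worklist.isEmpty()) {
            int q = worklist.pop();
            for (int r : DELTA.getOrDefault(q, Map.of()).getOrDefault('ε', Set.of()))
                if (closure.add(r)) worklist.push(r);   // new state: add to S' and worklist
        }
        return closure;
    }

    // δ'(S, c) = ε-closure(move(S, c)); accept iff final set meets A
    static boolean accepts(String w) {
        Set<Integer> state = epsilonClosure(Set.of(0));
        for (char c : w.toCharArray())
            state = epsilonClosure(move(state, c));
        return state.contains(3);
    }

    public static void main(String[] args) {
        System.out.println(epsilonClosure(Set.of(0))); // {0, 1, 2}
        System.out.println(accepts("aab"));  // true
        System.out.println(accepts("aa"));   // false
    }
}
```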

Scanning with Maximal Munch

Real scanning must handle multiple token kinds (keywords, identifiers, literals, operators) and must decide where each token ends. The standard strategy is maximal munch: always consume the longest possible prefix that forms a valid token.

Abstract maximal munch:

while input is not empty:
    find the maximal prefix of remaining input that is a valid token
    if no non-empty valid prefix exists: ERROR
    emit token; consume prefix

Concrete implementation with DFA backtracking:

while input is not empty:
    run DFA from start state, advancing through characters
    after each step, if in an accepting state, record (state, position) as a "candidate"
    when DFA gets stuck (no transition for current character):
        if no candidate was recorded: ERROR
        backtrack input to candidate position
        emit token identified by candidate state
        reset DFA to start state

Java scanning note (JLS 3.2): Because of maximal munch, the expression a--b is scanned as a, --, b — not as a, -, -, b. The -- decrement operator is greedily consumed before the second - can be seen. This is why a--b is a compile error: the token sequence a -- b contains no binary operator between a-- and b, so it cannot be parsed as an expression.
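Maximal munch can also be sketched with anchored per-rule regex matching instead of an explicit DFA with backtracking: the longest match wins, ties go to the earlier (higher-priority) rule. The three rules below (ID, DECR, MINUS) are a toy subset chosen to reproduce the a--b behaviour, and token lexemes are dropped for brevity:

```java
import java.util.*;
import java.util.regex.*;

// A tiny maximal-munch scanner: at each position, take the longest
// matching prefix; on a length tie, the earlier rule wins.
public class MaxMunch {
    static final String[] KINDS = { "ID", "DECR", "MINUS" };
    static final Pattern[] PATTERNS = {
        Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"),
        Pattern.compile("--"),
        Pattern.compile("-"),
    };

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestKind = null;
            int bestLen = 0;
            for (int i = 0; i < KINDS.length; i++) {
                // anchor the pattern at the current position
                Matcher m = PATTERNS[i].matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) { // strictly longer wins
                    bestKind = KINDS[i];
                    bestLen = m.end() - pos;
                }
            }
            if (bestKind == null)
                throw new RuntimeException("scan error at position " + pos);
            tokens.add(bestKind);   // emit token, consume prefix
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "--" is munched before a second "-" can be considered
        System.out.println(scan("a--b")); // [ID, DECR, ID]
        System.out.println(scan("a-b"));  // [ID, MINUS, ID]
    }
}
```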

Building a Scanning DFA from Multiple Token Rules

A scanner recognizes many token kinds simultaneously. The construction combines them all:

input: regular expressions R₁, R₂, ..., Rₙ (in priority order, highest first)
output: DFA whose accepting states are labeled with token kinds

1. Build NFA N for R₁ | R₂ | ... | Rₙ
   (each Rᵢ has its own accepting state labeled with token kind i)
2. Convert N to DFA D using subset construction
3. For each DFA accepting state (which is a set of NFA states):
   label it with the highest-priority token kind among the NFA accepting
   states it contains

Priority example: if matches both the IF keyword pattern and the ID identifier pattern. Since keyword patterns have higher priority (they come first in the specification), the DFA accepting state for if is labeled IF, not ID. This is how scanners uniformly handle reserved keywords.

    Regex: if      → NFA states {acc_IF}
    Regex: [a-z]+  → NFA states {acc_ID}

    DFA state for "if" contains both NFA states.
    Priority: IF > ID → label as IF.

Parsing

Scanning gives us a token sequence; parsing checks whether that sequence is a valid program and builds a structural representation of it. Parsing uses context-free grammars (CFGs), which are strictly more expressive than regular languages.

Context-Free Grammars

A CFG is a 4-tuple \( G = (N, T, R, S) \) where:

  • \( T \): terminals (tokens, written lowercase: a, b, if, +)
  • \( N \): non-terminals (syntactic categories, uppercase: \( S \), \( E \), \( D \))
  • \( R \subseteq N \times V^* \): production rules, written \( A \to \alpha \)
  • \( S \in N \): start symbol
  • \( V = N \cup T \): all grammar symbols

Derivation: \( \beta A \gamma \Rightarrow \beta \alpha \gamma \) if \( A \to \alpha \in R \) (replace non-terminal with body). \( \alpha \Rightarrow^* \beta \): zero or more derivation steps.

Language of a grammar: \( L(G) = \{x \in T^* \mid S \Rightarrow^* x\} \)

Sentential form: Any string \( \alpha \in V^* \) such that \( S \Rightarrow^* \alpha \).

Parse tree: A tree with root labelled \( S \), internal nodes labelled with non-terminals, leaves with terminals or ε, such that each internal node’s children spell out a production rule.

A grammar is ambiguous if some sentence has two distinct parse trees. Ambiguity must be eliminated before using the grammar for parsing (or the parser must handle it explicitly).

A Simple Expression Grammar

E → E + T
E → T
T → T * F
T → F
F → ( E )
F → id

This grammar encodes operator precedence and left-associativity through its structure: multiplication binds tighter than addition because * appears lower in the hierarchy (closer to the leaf id), not through any special mechanism.

Left Recursion and Left Factoring

Left-recursive grammars (e.g., E → E + T) are problematic for top-down parsers because they cause infinite recursion. Left recursion can be eliminated:

Original: \( A \to A\alpha \mid \beta \)
Equivalent: \( A \to \beta A' \), \( A' \to \alpha A' \mid \varepsilon \)

Left factoring handles ambiguous choice for top-down parsers when two alternatives share a common prefix:

Original: \( A \to \alpha \beta \mid \alpha \gamma \)
Factored: \( A \to \alpha A' \), \( A' \to \beta \mid \gamma \)

Top-Down Parsing

Top-down parsing builds the parse tree from root to leaves, predicting which production to apply at each step.

Algorithm (naive):

α ← S
while α ≠ input:
    let A be the leftmost non-terminal in α
    choose some production A → β ∈ R
    replace A with β in α

This produces a leftmost derivation — the leftmost non-terminal is always expanded first. The challenge is choosing the right production without trying all possibilities.

LL(1) Parsing

An LL(1) parser is a top-down parser that makes decisions using exactly one symbol of lookahead. The name stands for:

  • L: scan input Left-to-right
  • L: construct a Leftmost derivation
  • 1: 1 token of lookahead

First, Nullable, and Follow Sets

Three sets are needed to decide which production to apply:

Nullable(β): True iff \( \beta \Rightarrow^* \varepsilon \). A sequence is nullable iff every symbol in it is nullable (terminals are never nullable, non-terminals may be).

\[ \text{first}(\beta) = \{a \in T \mid \exists \gamma : \beta \Rightarrow^* a\gamma\} \]
\[ \text{follow}(A) = \{a \in T \mid \exists \gamma, \delta : S\$ \Rightarrow^* \gamma A a \delta\} \]

Computing Follow: For each production \( B \to \alpha A \gamma \):

  1. Add \( \text{first}(\gamma) \) to \( \text{follow}(A) \)
  2. If \( \text{nullable}(\gamma) \), add \( \text{follow}(B) \) to \( \text{follow}(A) \)

These rules are applied iteratively until no more changes occur (fixpoint).

Example: For the grammar \( S \to aB,\ B \to b \mid \varepsilon \):

  • Nullable(B) = true
  • First(S) = {a}, First(B) = {b}
  • Follow(B) = Follow(S) = {$}
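The fixpoint computation for this example can be sketched as follows. The encoding is a convention of this sketch, not part of the formal definitions: uppercase letters are non-terminals, lowercase letters are terminals, "" is ε, and '$' is the end-of-input marker:

```java
import java.util.*;

// Fixpoint computation of nullable/first/follow for the example grammar
//   S → a B,   B → b | ε
public class FirstFollow {
    static final String[][] PRODS = { {"S", "aB"}, {"B", "b"}, {"B", ""} };
    static final String START = "S";

    static Set<Character> nullable = new HashSet<>();
    static Map<Character, Set<Character>> first = new HashMap<>();
    static Map<Character, Set<Character>> follow = new HashMap<>();

    static Set<Character> setOf(Map<Character, Set<Character>> m, char x) {
        return m.computeIfAbsent(x, k -> new HashSet<>());
    }
    // a sequence is nullable iff every symbol in it is nullable
    static boolean nullableSeq(String beta) {
        for (char x : beta.toCharArray())
            if (!nullable.contains(x)) return false;
        return true;
    }
    // first of a sequence, skipping leading nullable non-terminals
    static Set<Character> firstSeq(String beta) {
        Set<Character> out = new HashSet<>();
        for (char x : beta.toCharArray()) {
            if (Character.isLowerCase(x)) { out.add(x); break; }
            out.addAll(setOf(first, x));
            if (!nullable.contains(x)) break;
        }
        return out;
    }

    static void compute() {
        setOf(follow, START.charAt(0)).add('$');   // $ follows the start symbol
        boolean changed = true;
        while (changed) {                          // iterate to a fixpoint
            changed = false;
            for (String[] p : PRODS) {
                char A = p[0].charAt(0); String rhs = p[1];
                if (nullableSeq(rhs) && nullable.add(A)) changed = true;
                if (setOf(first, A).addAll(firstSeq(rhs))) changed = true;
                for (int i = 0; i < rhs.length(); i++) {  // follow rules for B → α A γ
                    char X = rhs.charAt(i);
                    if (Character.isLowerCase(X)) continue;
                    String gamma = rhs.substring(i + 1);
                    if (setOf(follow, X).addAll(firstSeq(gamma))) changed = true;
                    if (nullableSeq(gamma)
                            && setOf(follow, X).addAll(setOf(follow, A))) changed = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        compute();
        System.out.println(nullable);         // [B]
        System.out.println(first.get('S'));   // [a]
        System.out.println(first.get('B'));   // [b]
        System.out.println(follow.get('B'));  // [$]
    }
}
```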

Predict Sets

The predict set tells the parser which production to use for non-terminal A when seeing lookahead terminal a:

\[ \text{predict}(A, a) = \{A \to \beta \mid a \in \text{first}(\beta)\} \cup \{A \to \beta \mid \text{nullable}(\beta) \text{ and } a \in \text{follow}(A)\} \]

A grammar is LL(1) iff \( |\text{predict}(A, a)| \leq 1 \) for all \( A \in N, a \in T \). This means no two productions for the same non-terminal share a predict set entry — the lookahead always uniquely determines which production to use.

LL(1) Parse Algorithm

The stack-based LL(1) algorithm runs in O(n) time:

Augment grammar: add fresh start S' → S$, append $ to input

Push S$ onto stack
for each terminal a in input$:
    while top of stack is a non-terminal A:
        pop A
        look up predict(A, a):
            if empty: ERROR (unexpected token)
            if unique production A → β: push β (right-to-left)
    pop terminal b from stack
    if b ≠ a: ERROR (mismatch)
accept

Example: Grammar \( S \to aB,\ B \to b \mid \varepsilon \), input a:

Stack     Input    Action
S$        a$       predict(S, a) = {S → aB}, push aB
aB$       a$       pop a, match a
B$        $        predict(B, $) = {B → ε}, push ε (pop B)
$         $        accept

LL(1) Parse Table

The predict sets define a parse table with rows for non-terminals and columns for terminals. Entry [A, a] contains the production to use:

     a    b    $
S    aB
B         b    ε

(Entry [B, a] is empty: a is neither in first(b) nor in follow(B) = {$}, so seeing a while B is on top of the stack is an error.)

A conflict (two productions in the same cell) means the grammar is not LL(1).
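Putting the table and the stack algorithm together for this grammar — the predict table is hard-coded here rather than computed, and symbols are single characters (uppercase = non-terminal) as an encoding convenience:

```java
import java.util.*;

// Table-driven LL(1) parser for  S → a B,  B → b | ε
// using the parse table: [S,a] = aB, [B,b] = b, [B,$] = ε.
public class LL1Parser {
    // predict table: "nonterminal,lookahead" → right-hand side ("" is ε)
    static final Map<String, String> TABLE = Map.of(
        "S,a", "aB",
        "B,b", "b",
        "B,$", ""
    );

    static boolean parse(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$'); stack.push('S');          // push S$
        String w = input + "$";
        int i = 0;
        while (i < w.length()) {
            char a = w.charAt(i);
            while (!stack.isEmpty() && Character.isUpperCase(stack.peek())) {
                char A = stack.pop();
                String beta = TABLE.get(A + "," + a);
                if (beta == null) return false;    // ERROR: no prediction
                for (int j = beta.length() - 1; j >= 0; j--)
                    stack.push(beta.charAt(j));    // push β right-to-left
            }
            if (stack.isEmpty() || stack.pop() != a) return false; // mismatch
            i++;
        }
        return stack.isEmpty();                    // accept
    }

    public static void main(String[] args) {
        System.out.println(parse("a"));   // true  (S ⇒ aB ⇒ a)
        System.out.println(parse("ab"));  // true  (S ⇒ aB ⇒ ab)
        System.out.println(parse("b"));   // false
        System.out.println(parse("aa"));  // false
    }
}
```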

LR Parsing (Bottom-Up)

Bottom-up parsing works in reverse: start with the input and repeatedly apply productions in reverse (reductions) until reaching the start symbol. This corresponds to constructing the parse tree from leaves to root.

Example derivation (reversed):

a + a$  ←  E + a$  ←  E + E$  ←  E$  ←  S
            ↑              ↑
          E→a            E→a

The parser maintains a stack representing the “processed” prefix of input. At each step it either shifts (push next input token) or reduces (pop the body of a production and push its head).

Viable Prefixes

A string α is a viable prefix if it can appear on the parsing stack without an error yet having been detected. Formally: \( \alpha \) is a viable prefix iff \( \exists \beta : S \Rightarrow^* \alpha\beta \) — that is, α is a prefix of some right-sentential form (one that does not extend past the right end of its handle).

Key invariant: The content of the LR parser stack is always a viable prefix.

LR(0) Parser

The LR(0) parser uses only the stack content (no lookahead) to decide whether to reduce.

LR(0) items: Dotted productions tracking how far along we are in recognizing a rule: \( A \to \alpha \cdot \beta \) means we have recognized \( \alpha \) and expect \( \beta \) next.

When the dot is at the right end \( (A \to \alpha \cdot) \), we can reduce by \( A \to \alpha \).

LR(0) NFA construction:

  • States: All items \( A \to \alpha \cdot \beta \)
  • Start: \( S' \to \cdot S\$ \) (augmented grammar)
  • Transitions:
    • On symbol \( X \): \( (A \to \alpha \cdot X\beta) \xrightarrow{X} (A \to \alpha X \cdot \beta) \)
    • ε-transitions: \( (A \to \alpha \cdot B\beta) \xrightarrow{\varepsilon} (B \to \cdot \gamma) \) for all rules \( B \to \gamma \)
  • All states are “accepting” (stack always a valid VP)

Convert the NFA to a DFA via subset construction (as before). Each DFA state is a set of LR(0) items (an “item set” or “configuration set”).

LR(0) parse decision: In each DFA state:

  • If the state contains \( A \to \alpha \cdot \) and nothing else: reduce by \( A \to \alpha \)
  • If the state contains only shift items \( A \to \alpha \cdot X\beta \): shift X
  • Reduce-reduce conflict: Two reduce items in the same state → grammar is not LR(0)
  • Shift-reduce conflict: A shift and reduce item in the same state → grammar is not LR(0)

LR(0) algorithm:

for each token a in input$:
    while Reduce(stack) = {A → γ}:
        pop |γ| symbols from stack
        push A
    if Reject(stack + a): ERROR
    push a onto stack
accept when $ is consumed

Visualizing LR(0) DFA

For the grammar \( S' \to S\$,\ S \to a \mid E = E,\ E \to a \):

State 0: {S'→·S$, S→·a, S→·E=E, E→·a}
    ──a──▶ State 1: {S→a·, E→a·}      (reduce-reduce conflict!)
    ──E──▶ State 2: {S→E·=E}
    ──S──▶ State 3: {S'→S·$}

State 1 has both \( S \to a \cdot \) and \( E \to a \cdot \) — a reduce-reduce conflict. This grammar is not LR(0) but is LR(1) (the lookahead resolves the conflict).
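The closure and goto computations behind this DFA fragment can be written directly from the item definitions; running them on the grammar above reproduces the two reduce items in the state reached on a. Items are encoded as plain strings with '.' as the dot — an illustrative encoding, not a real parser generator:

```java
import java.util.*;

// LR(0) item sets for  S' → S$,  S → a | E=E,  E → a.
public class LR0Items {
    static final String[][] PRODS = {
        {"S'", "S$"}, {"S", "a"}, {"S", "E=E"}, {"E", "a"}
    };

    // closure: for each item A → α·Bβ, add B → ·γ for every rule B → γ
    static Set<String> closure(Set<String> items) {
        Set<String> out = new HashSet<>(items);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String item : new ArrayList<>(out)) {
                int dot = item.indexOf('.');
                if (dot + 1 >= item.length()) continue;       // dot at right end
                String next = item.substring(dot + 1, dot + 2);
                for (String[] p : PRODS)
                    if (p[0].equals(next) && out.add(p[0] + "→." + p[1]))
                        changed = true;
            }
        }
        return out;
    }

    // goto: move the dot over symbol x, then take the closure
    static Set<String> gotoOn(Set<String> state, String x) {
        Set<String> moved = new HashSet<>();
        for (String item : state) {
            int dot = item.indexOf('.');
            if (dot + 1 < item.length()
                    && item.substring(dot + 1, dot + 2).equals(x))
                moved.add(item.substring(0, dot) + x + "." + item.substring(dot + 2));
        }
        return closure(moved);
    }

    public static void main(String[] args) {
        Set<String> state0 = closure(new HashSet<>(Set.of("S'→.S$")));
        Set<String> state1 = gotoOn(state0, "a");
        System.out.println(state0); // S'→.S$, S→.a, S→.E=E, E→.a
        System.out.println(state1); // S→a. and E→a. — two reduce items!
    }
}
```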

SLR(1) Parser

SLR(1) adds a simple lookahead to LR(0): reduce by \( A \to \alpha \) only when the current lookahead is in \( \text{follow}(A) \).

Reduce(stack, a) = {A → γ | stack ends with viable prefix βγ,
                            βA is a viable prefix,
                            a ∈ follow(A)}

SLR(1) conflicts:

  • Shift-reduce: State has \( A \to \alpha \cdot \) and \( B \to \beta \cdot X\gamma \), and \( X \in \text{follow}(A) \)
  • Reduce-reduce: State has \( A \to \alpha \cdot \) and \( B \to \beta \cdot \), and \( \text{follow}(A) \cap \text{follow}(B) \neq \emptyset \)

SLR(1) uses the same DFA as LR(0) but resolves more conflicts by consulting follow sets.

LR(1) Parser

LR(1) items carry an explicit lookahead: \( (A \to \alpha \cdot \beta, a) \) means “after shifting α, if we eventually reduce by this rule, the next input must be a.”

LR(1) NFA:

  • States: All items \( (A \to \alpha \cdot \beta,\ a) \)
  • Start state: \( (S' \to \cdot S\$,\ \$) \)
  • Shift transition: \( (A \to \alpha \cdot X\beta,\ a) \xrightarrow{X} (A \to \alpha X \cdot \beta,\ a) \)
  • ε-closure: \( (A \to \alpha \cdot B\beta,\ a) \xrightarrow{\varepsilon} (B \to \cdot \gamma,\ b) \) for all \( b \in \text{first}(\beta a) \)

LR(1) reduce: Reduce \( A \to \alpha \) when item \( (A \to \alpha \cdot,\ a) \) is in the current state and current lookahead is \( a \).

LR(1) > SLR(1) > LR(0):

Parser  | Reduces when…
LR(0)   | βA is a viable prefix
SLR(1)  | βA is a viable prefix and a ∈ follow(A)
LR(1)   | βAa is a viable prefix

LR(1) is strictly more powerful but its DFA can have exponentially more states than SLR(1)/LR(0).

LALR(1) Parser

LALR(1) (Lookahead LR) is the practical compromise: use the LR(0) DFA (fewer states) but attach lookahead sets derived from the LR(1) DFA.

Core of a state: Strip lookaheads from LR(1) items → \( \text{core}(q) = \{A \to \alpha \cdot \beta \mid (A \to \alpha \cdot \beta, a) \in q\} \).

Multiple LR(1) states can share the same core but differ in lookaheads. LALR(1) merges such states and takes the union of their lookaheads.

Algorithm:

for each LR(0) state q:
    for each item A → α·β in q:
        lookaheads(A→α·β, q) ← {a | ∃ LR(1) state q' : core(q')=core(q)
                                      and (A→α·β, a) ∈ q'}

LALR(1) conflicts are rarer than SLR(1) because the lookaheads are more precise (per-state, not global). Most real programming language grammars (including Joos) are LALR(1), which is why yacc/bison use LALR(1).

Hierarchy (from weakest to strongest):

LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ LR(1) ⊂ Context-Free

A grammar accepted by a weaker parser is also accepted by all stronger ones. Most practical grammars are LALR(1).

Generating the LR(1) Parse Table

The parse table has:

  • Rows: DFA states
  • Columns: terminals (for ACTION) and non-terminals (for GOTO)
  • Entries:
    • shift n: shift token and go to state n
    • reduce A→α: reduce by this production
    • goto n: after reduce, go to state n
    • accept: success

Conflict example (grammar not SLR(1)):

S' → S$
S  → a
S  → E = E
E  → a

For input a$, the state after seeing a contains both S→a· and E→a·. With SLR(1), follow(S) = {$} and follow(E) = {=, $} overlap at $, causing a reduce-reduce conflict. With LR(1), the items are (S→a·, $) and (E→a·, =) — the lookaheads are distinct, so the conflict disappears: on lookahead $ the parser reduces S→a, and on = it reduces E→a. The grammar is therefore LR(1) (and LALR(1)) but not SLR(1).


Abstract Syntax Trees and Weeding

From Parse Tree to AST

The parse tree (also called concrete syntax tree) produced by the parser contains a great deal of syntactic noise: intermediate non-terminals introduced for precedence and associativity, single-child chains, and extra punctuation nodes. The Abstract Syntax Tree (AST) is a cleaned-up, semantically meaningful tree structure.

Parse tree vs AST for 1 + 2 * 3 — concrete tree preserves grammar structure; AST captures semantics directly.


Parse tree for 1 + 2 * 3:

        E
      / | \
     E  +  T
     |    /|\
     T   T * F
     |   |   |
     F   F   3
     |   |
     1   2

Corresponding AST:

      +
     / \
    1   *
       / \
      2   3

The AST discards the non-terminal hierarchy used to encode precedence and replaces it with direct tree structure.

Constructing the AST

AST construction is typically done during or just after parsing:

  • Each grammar non-terminal corresponds to a class hierarchy in the AST
  • Production rules correspond to constructors
  • Parsing actions create AST nodes bottom-up

A typical Joos AST has nodes like BinaryExpr, UnaryExpr, MethodInvocation, ClassDecl, FieldDecl, IfStmt, WhileStmt, ReturnStmt, etc.

Weeding

Weeding is a post-parsing pass that enforces language constraints that could be checked by the grammar but would make the grammar too complex or unreadable. Weeding walks the AST and explicitly checks rules like:

  • A class cannot be both abstract and final
  • A final method cannot also be abstract
  • No duplicate implements interfaces in a class declaration
  • An interface method cannot have a body
  • A constructor name must match the enclosing class name
  • void can only be used as a return type, not in expressions
  • Every abstract class method must be in an abstract class

Why not encode these in the grammar? Some rules involve counting (no duplicate modifiers) or cross-referencing distant parts of the tree (constructor name vs. class name). These are far cleaner as AST traversals than as grammar constraints.


Semantic Analysis

Parsing verifies syntax but not meaning. Semantic analysis (also called context-sensitive analysis) checks that the program makes semantic sense: names refer to declared entities, types are consistent, and the inheritance hierarchy is well-formed.

Environments and Scope

A scope is a region of source code within which a name declaration is visible. An environment (also called a symbol table) maps names to their declarations.

Declaring a name (entering a new declaration):

if name already exists in current environment: ERROR (duplicate declaration)
else: insert name → declaration into environment

Looking up a name (resolving a reference):

search current (innermost) environment
if not found: search enclosing environment
...
if not found at outermost scope: ERROR (undeclared name)

In practice, environments are organized as a stack or chain following the nesting of scopes.
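A minimal sketch of such a chained environment — the String "declaration" payload is a stand-in for a real AST node, and the class name is illustrative:

```java
import java.util.*;

// A chain of environments following scope nesting: declare checks only
// the current scope; lookup walks outward through enclosing scopes.
public class Environment {
    private final Map<String, String> bindings = new HashMap<>(); // name → declaration
    private final Environment parent;                             // enclosing scope (null at outermost)

    Environment(Environment parent) { this.parent = parent; }

    void declare(String name, String decl) {
        if (bindings.containsKey(name))
            throw new RuntimeException("duplicate declaration: " + name);
        bindings.put(name, decl);
    }

    String lookup(String name) {
        for (Environment e = this; e != null; e = e.parent)
            if (e.bindings.containsKey(name)) return e.bindings.get(name);
        throw new RuntimeException("undeclared name: " + name);
    }

    public static void main(String[] args) {
        Environment global = new Environment(null);
        global.declare("x", "int x (global)");
        Environment inner = new Environment(global);  // entering a nested block
        inner.declare("x", "int x (local)");          // shadows the outer x
        System.out.println(inner.lookup("x"));   // int x (local)
        System.out.println(global.lookup("x"));  // int x (global)
    }
}
```

(In Java itself a local variable may not shadow another local, but a local may shadow a field — the chain structure is the same either way.)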

Namespaces in Java

Java has six distinct namespaces (JLS 6.5.1) — names in different namespaces cannot conflict with each other:

  1. Package namespace: com.example
  2. Type namespace: class and interface names
  3. Method namespace: method names
  4. Expression namespace: local variables, parameters, fields
  5. Package-or-type namespace: for qualified names that could be either
  6. Ambiguous namespace: for multi-part names like a.b.c before disambiguation

Joos simplifies this but retains the essential separation of types, methods, and variables.

Java Name Resolution: Seven Passes

Joos name resolution is performed in a specific order across the entire program (not statement-by-statement). Each pass builds on the previous one.

Pass 1: Build the Global Environment

Collect all class and interface declarations across all files, indexed by their fully-qualified names (com.example.Foo). This creates a global map: fully-qualified name → class/interface AST node.

Pass 2: Resolve Type Names

Every type reference (in declarations, cast expressions, instanceof, etc.) must be resolved to a class/interface in the global environment.

For a simple type name Foo appearing in class C in package pkg:

  1. Check if Foo is declared in the same compilation unit (explicit import or same package)
  2. Check single-type-import declarations (import a.b.Foo;)
  3. Check types in the same package (pkg.Foo)
  4. Check type-import-on-demand (import a.b.*;)
  5. Ambiguity if found in more than one place at the same priority

For a qualified type name a.b.Foo:

  • Traverse: a is a package, a.b is a package, Foo is a class in that package

Pass 3: Check the Class Hierarchy

Verify that the inheritance graph is well-formed:

  • No class extends itself (directly or transitively) — no cycles
  • Every superclass exists and is accessible
  • No class extends an interface (use implements)
  • No interface extends a class

Pass 4: Disambiguate Ambiguous Names

A multi-part name like a.b.c.d could mean many things. Resolution algorithm:

Given name a₁.a₂.a₃.a₄:
1. If a₁ is a local variable in scope:
       a₂ is an instance field of type(a₁)
       a₃ is an instance field of its type, etc.
2. Else if a₁ is a field in contain(current class):
       a₂, a₃, ... are instance fields
3. Else try prefixes of increasing length:
       if a₁ is a type → a₂ is a static field, a₃... are instance fields
       if a₁.a₂ is a type → a₃ is a static field, etc.

Pass 5: Resolve Expressions (Variables and Fields)

Resolve all variable and field references within method bodies and initializers. This attaches each name occurrence to its declaration node.

Pass 6: Type Checking

Check that all expressions are type-correct (see Type Checking section below).

Pass 7: Resolve Method Calls and Instance Fields

After type checking (which assigns types to all sub-expressions), resolve method calls:

  • For e.m(args), look up method m in contain(type(e))
  • In Joos, there is no overloading (simplification), so the method name uniquely identifies it within a class hierarchy

Class Hierarchy and Inheritance

Formalizing the Hierarchy

The class hierarchy defines which methods and fields each class inherits and which it overrides.

Notation:

  • \( \text{super}(T) \): direct superclasses/superinterfaces of type T
  • \( S < T \): S is a strict subtype of T (directly or transitively)
  • \( S \leq T \): S is a subtype of T (including \( S = T \))

Method and field sets:

  • \( \text{declare}(T) \): methods/fields directly declared in T
  • \( \text{inherit}(T) \): methods/fields inherited by T (not redeclared)
  • \( \text{contain}(T) = \text{declare}(T) \cup \text{inherit}(T) \): all accessible members

Signatures: \( \text{sig}(m) = (\text{name}(m),\ \text{paramTypes}(m)) \). Note that return type is not part of the signature.

Overriding: \( (m, m') \in \text{replace} \) means m overrides m’. This happens when m ∈ declare(T), m’ ∈ contain(S), S ∈ super(T), and sig(m) = sig(m’).

Inheritance Rules (Inference Rules)

These rules define when a method is inherited:

Rule 1 (Concrete override): If S ∈ super(T), m ∈ declare(T), m' ∈ contain(S), and sig(m) = sig(m'), then (m, m') ∈ replace.
Rule 2 (Concrete inherit): If S ∈ super(T), m ∈ contain(S), no declaration in declare(T) has sig = sig(m), and m is not abstract → m ∈ inherit(T).
Rule 3 (Abstract inherit): If S ∈ super(T), m ∈ contain(S), no declaration in declare(T) has sig = sig(m), m is abstract, and ALL methods in contain(T) with the same signature are also abstract → m ∈ inherit(T).

Hierarchy Checks

The compiler must enforce 10 well-formedness constraints on the hierarchy (JLS §8,9):

  1. No cycles: \( \neg\exists T: T < T \)
  2. No duplicate methods: distinct \( m, m' \in \text{declare}(T) \Rightarrow \text{sig}(m) \neq \text{sig}(m') \)
  3. Return type consistency: \( m, m' \in \text{contain}(T),\ \text{sig}(m) = \text{sig}(m') \Rightarrow \text{type}(m) = \text{type}(m') \)
  4. Concrete class completeness: if \( m \in \text{contain}(T) \) and abstract \( \in \text{mods}(m) \), then abstract \( \in \text{mods}(T) \)
  5. Static consistency: \( (m, m') \in \text{replace} \Rightarrow (\text{static} \in \text{mods}(m) \Leftrightarrow \text{static} \in \text{mods}(m')) \)
  6. Return type match: \( (m, m') \in \text{replace} \Rightarrow \text{type}(m) = \text{type}(m') \)
  7. Visibility non-narrowing: \( (m, m') \in \text{replace},\ \text{public} \in \text{mods}(m') \Rightarrow \text{public} \in \text{mods}(m) \)
  8. (No checked exception widening)
  9. No overriding final: \( (m, m') \in \text{replace} \Rightarrow \text{final} \notin \text{mods}(m') \)
  10. Field uniqueness: distinct \( f, f' \in \text{declare}(T) \Rightarrow \text{name}(f) \neq \text{name}(f') \)

Interface Special Cases

  • Interface methods are implicitly public abstract
  • Interface fields are implicitly public static final
  • An interface with no direct superinterfaces implicitly declares abstract versions of all public methods of java.lang.Object
  • A class implementing an interface must provide (directly or via inheritance) concrete implementations of all abstract interface methods, or be declared abstract

Type Checking

Type System Concepts

A type at the static (compile-time) level is the set of values an expression can evaluate to. At runtime, every value has a dynamic type describing how its bits are to be interpreted.

Declared type: The static assertion that a variable or expression holds only values of that type.

Type correctness: Informally, a program is type-correct if no operation is ever applied to a value of the wrong type at runtime. This property is undecidable in general (Rice’s theorem applies), so compilers use a conservative type system: a static approximation that may reject some valid programs but never accepts invalid ones.

Soundness: A type system is sound if static type correctness implies runtime type correctness. Java’s type system is almost sound — it has exactly one soundness gap (array covariance, discussed below).

Type Judgments

Type checking is specified with inference rules written as:

\[ \frac{\text{premises}}{\text{conclusion}} \]

The conclusion \( C, L, \sigma \vdash E : \tau \) means: “In class \( C \), local variable environment \( L \), with method return type \( \sigma \), expression \( E \) has type \( \tau \).”

Literal and Variable Rules

\[ \frac{}{C, L, \sigma \vdash 42 : \texttt{int}} \qquad \frac{}{C, L, \sigma \vdash \texttt{true} : \texttt{boolean}} \qquad \frac{}{C, L, \sigma \vdash \texttt{'a'} : \texttt{char}} \]
\[ \frac{}{C, L, \sigma \vdash \texttt{"abc"} : \texttt{java.lang.String}} \qquad \frac{}{C, L, \sigma \vdash \texttt{null} : \texttt{null}} \]
\[ \frac{L(n) = \tau}{C, L, \sigma \vdash n : \tau} \qquad \frac{}{C, L, \sigma \vdash \texttt{this} : C} \]

Operator Rules

\[ \frac{C, L, \sigma \vdash E : \texttt{int}}{C, L, \sigma \vdash {-E} : \texttt{int}} \qquad \frac{C, L, \sigma \vdash E : \texttt{boolean}}{C, L, \sigma \vdash {!E} : \texttt{boolean}} \]
\[ \frac{E_1 : \tau_1 \quad E_2 : \tau_2 \quad \text{num}(\tau_1) \quad \text{num}(\tau_2)}{E_1 + E_2 : \texttt{int}} \]
\[ \frac{E_1 : \texttt{String} \quad E_2 : \tau \quad \tau \neq \texttt{void}}{E_1 + E_2 : \texttt{String}} \qquad \frac{E_1 : \tau \quad E_2 : \texttt{String} \quad \tau \neq \texttt{void}}{E_1 + E_2 : \texttt{String}} \]

Where \( \text{num}(\tau) \Leftrightarrow \tau \in \{\texttt{byte}, \texttt{short}, \texttt{char}, \texttt{int}\} \).

\[ \frac{E_1 : \tau_1 \quad E_2 : \tau_2 \quad (\tau_1 := \tau_2 \vee \tau_2 := \tau_1)}{E_1 == E_2 : \texttt{boolean}} \]
\[ \frac{E_1 : \tau_1 \quad E_2 : \tau_2 \quad \text{num}(\tau_1) \quad \text{num}(\tau_2)}{E_1 < E_2 : \texttt{boolean}} \]

Assignability

Assignability \( \tau_1 := \tau_2 \) (read: “τ₁ can hold a value of type τ₂”) is defined by these rules:

\[ \tau := \tau \quad \text{(reflexivity)} \]
\[ \frac{D \leq C}{C := D} \quad \text{(subtype)} \qquad \frac{}{C := \texttt{null}} \quad \text{(null is assignable to any reference type)} \]
\[ \texttt{int} := \texttt{short} \quad \texttt{int} := \texttt{char} \quad \texttt{short} := \texttt{byte} \quad \text{(numeric widening)} \]
\[ \frac{\sigma := \tau \quad \tau := \rho}{\sigma := \rho} \quad \text{(transitivity)} \]
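These rules translate almost line-by-line into code. The sketch below checks τ₁ := τ₂ for a toy type universe — the hierarchy Object ← B ← D is a made-up example, and widening is closed under transitivity by listing byte directly under int:

```java
import java.util.*;

// A direct encoding of the assignability rules: reflexivity, subtyping,
// null-to-reference, and numeric widening (byte → short → int, char → int).
public class Assignability {
    // direct superclass of each reference type in the toy hierarchy
    static final Map<String, String> SUPER = Map.of("B", "Object", "D", "B");
    static final Set<String> PRIMITIVES = Set.of("int", "short", "byte", "char");

    static boolean isReference(String t) {
        return !PRIMITIVES.contains(t) && !t.equals("null");
    }

    static boolean subtype(String s, String t) {        // S ≤ T
        for (String u = s; u != null; u = SUPER.get(u))
            if (u.equals(t)) return true;
        return false;
    }

    static boolean assignable(String t1, String t2) {   // τ1 := τ2
        if (t1.equals(t2)) return true;                          // reflexivity
        if (isReference(t1) && t2.equals("null")) return true;   // null rule
        if (isReference(t1) && isReference(t2)) return subtype(t2, t1);
        // numeric widening, with byte → short → int folded in transitively
        if (t1.equals("int"))   return Set.of("short", "char", "byte").contains(t2);
        if (t1.equals("short")) return t2.equals("byte");
        return false;
    }

    public static void main(String[] args) {
        System.out.println(assignable("Object", "D"));  // true  (D ≤ Object)
        System.out.println(assignable("B", "null"));    // true
        System.out.println(assignable("int", "byte"));  // true  (transitivity)
        System.out.println(assignable("byte", "int"));  // false (no narrowing)
        System.out.println(assignable("short", "char"));// false (char does not widen to short)
    }
}
```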

Cast Rules

\[ \frac{E : \tau_1 \quad (\tau_2 := \tau_1 \vee \tau_1 := \tau_2)}{(\tau_2)E : \tau_2} \quad \text{(reference cast)} \]\[ \frac{E : \tau_1 \quad \text{num}(\tau_1) \quad \text{num}(\tau_2)}{(\tau_2)E : \tau_2} \quad \text{(numeric cast)} \]

Reference casts require a runtime check (may throw ClassCastException). Numeric casts truncate bits as needed.

Field and Method Rules

\[ \frac{\texttt{static}\ \tau\ f \in \text{contain}(D)}{D.f : \tau} \]\[ \frac{E : D \quad \tau\ f \in \text{contain}(D) \quad \texttt{static} \notin \text{mods}(f)}{E.f : \tau} \]\[ \frac{E : D \quad \forall i: E_i : \tau_i \quad \tau\ m(\tau_1,\ldots,\tau_k) \in \text{contain}(D)}{E.m(E_1,\ldots,E_k) : \tau} \]

Statement Rules

\[ \frac{C, L, \sigma \vdash E : \tau \quad \sigma := \tau}{C, L, \sigma \vdash \texttt{return}\ E} \qquad \frac{}{C, L, \texttt{void} \vdash \texttt{return}} \]\[ \frac{C, L, \sigma \vdash E_1 : \tau_1 \quad C, L, \sigma \vdash E_2 : \tau_2 \quad \tau_1 := \tau_2}{C, L, \sigma \vdash E_1 = E_2 : \tau_1} \]\[ \frac{C, L, \sigma \vdash E : \texttt{boolean} \quad C, L, \sigma \vdash S_1 \quad C, L, \sigma \vdash S_2}{C, L, \sigma \vdash \texttt{if}\ (E)\ S_1\ \texttt{else}\ S_2} \]

Array Rules

\[ \frac{E_1 : \tau_1[] \quad E_2 : \tau_2 \quad \text{num}(\tau_2)}{E_1[E_2] : \tau_1} \quad \text{(array access)} \]\[ \frac{E : \tau[]}{E.\texttt{length} : \texttt{int}} \quad \text{(array length)} \]\[ \frac{E : \tau_2 \quad \text{num}(\tau_2)}{\texttt{new}\ \tau_1[E] : \tau_1[]} \quad \text{(array creation)} \]\[ \frac{D \leq C}{C[] := D[]} \quad \text{(array covariance)} \]

This rule seems natural but introduces unsoundness:

B[] bs = new B[1];
Object[] os = bs;     // OK: Object[] := B[]  (covariance)
os[0] = new C();      // Allowed statically (C is an Object)
                      // But throws ArrayStoreException at runtime!
B b = bs[0];          // b would have wrong type

The JVM detects this at runtime with ArrayStoreException. The compiler must emit a runtime type check on every store into an array of references.

Type Checking Implementation

function typecheck(node E):
    find the type rule matching E's AST node kind
    recursively typecheck all subexpressions to get their types
    verify all premises hold
    return the type τ from the conclusion
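The dispatch above can be sketched concretely. The following is a minimal illustration in Python, assuming a toy tuple-based AST (e.g. ("plus", e1, e2)) invented for the example rather than the course's actual data structures:

```python
def typecheck(env, e):
    """Return the type of expression e under local-variable environment env."""
    kind = e[0]
    if kind == "int":                       # axiom: integer literal
        return "int"
    if kind == "bool":                      # axiom: boolean literal
        return "boolean"
    if kind == "var":                       # premise: L(n) = tau
        return env[e[1]]
    if kind == "neg":                       # premise: E : int
        assert typecheck(env, e[1]) == "int", "operand of - must be int"
        return "int"
    if kind == "plus":                      # premises: num(tau1), num(tau2)
        num = {"byte", "short", "char", "int"}
        t1, t2 = typecheck(env, e[1]), typecheck(env, e[2])
        assert t1 in num and t2 in num, "operands of + must be numeric"
        return "int"
    raise TypeError("no rule matches " + kind)
```

Failure of every rule's premises (or no matching rule at all) is a type error, mirroring the "verify all premises hold" step.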

Static Analysis

Static analysis proves properties about runtime behavior without executing the program. Unlike type checking (which is always applied), some analyses are optional but improve safety or performance.

Rice’s Theorem and Conservative Analysis

Rice’s theorem: For any non-trivial property of program behavior, determining whether an arbitrary program has that property is undecidable.

This means we cannot precisely determine most runtime properties at compile time. Instead, compilers use conservative approximations — analyses that may give imprecise answers but are never wrong in the direction that matters:

  • A soundly conservative reachability analysis may say “maybe reachable” when code is actually unreachable. It never says “unreachable” for code that executes.
  • A soundly conservative type analysis may reject some valid programs. It never accepts type-unsafe programs.

Java Reachability Analysis (JLS §14.20)

Java requires that every statement be potentially reachable — dead code is a compile error. The analysis tracks two values per statement:

  • \( \text{in}[s] \in \{\texttt{maybe}, \texttt{no}\} \): can s begin executing?
  • \( \text{out}[s] \in \{\texttt{maybe}, \texttt{no}\} \): can s finish executing (fall through)?

Merging: \( \texttt{maybe} \vee \texttt{no} = \texttt{maybe} \), \( \texttt{no} \vee \texttt{no} = \texttt{no} \)

Rules by statement kind:

Statement              | in                                 | out[L]
-----------------------+------------------------------------+------------------
return E               | given                              | no
if (E) S               | given                              | out[S] ∨ in[L]
if (E) S1 else S2      | given                              | out[S1] ∨ out[S2]
while (E) S (general)  | given                              | in[L] ∨ out[S]
while (true) S         | given                              | no
while (false) S        | in[S] = no                         | in[L]
Block {S1; S2}         | in[S1] = in[L], in[S2] = out[S1]   | out[S2]
Any other statement    | given                              | in[L]

Error conditions:

  • If \( \text{in}[s] = \texttt{no} \): unreachable code — ERROR
  • If \( \text{out}[\text{method body}] = \texttt{maybe} \) and return type ≠ void: ERROR (missing return)

Constant expressions: while(1 == 1) is treated as while(true) because 1 == 1 is a constant expression evaluating to true. This is evaluated at compile time per JLS §15.28.
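The rules in the table can be run directly. Here is a hedged Python sketch covering a few statement kinds; the tuple encoding of statements (e.g. ("while", const_cond, body)) is an assumption made for the example:

```python
MAYBE, NO = "maybe", "no"

def out_of(stmt, inn):
    """Given in[S], return out[S]; raise on statically dead code."""
    if inn == NO:
        raise SyntaxError("unreachable statement")
    kind = stmt[0]
    if kind == "return":
        return NO                        # return never falls through
    if kind == "while":                  # stmt = ("while", const_cond, body)
        cond, body = stmt[1], stmt[2]
        if cond is False:
            out_of(body, NO)             # body unreachable -> raises
        out_of(body, inn)                # check the body
        return NO if cond is True else inn
    return inn                           # any other statement falls through

def check_block(stmts, inn=MAYBE):
    for s in stmts:
        inn = out_of(s, inn)
    return inn                           # out of the whole block
```

A method body whose out is maybe, with a non-void return type, would then be rejected as "missing return".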

Definite Assignment Analysis (JLS §16)

Every local variable must be definitely assigned before it is read. This prevents reading uninitialized variables.

Definitions:

  • \( \text{in}[s] \subseteq \text{Vars} \): set of variables definitely initialized before s
  • \( \text{out}[s] \subseteq \text{Vars} \): set of variables definitely initialized after s

For conditional expressions, we track separate “true-exit” and “false-exit” sets:

  • \( \text{out}_\top[E] \): variables initialized when E evaluates to true
  • \( \text{out}_\bot[E] \): variables initialized when E evaluates to false

Rules for key statements:

Local variable declaration τ x = E and assignment x = E:

\[ \text{in}[E] = \text{in}[L], \quad \text{out}[L] = \text{out}[E] \cup \{x\} \]

if (E) S₁ else S₂:

\[ \text{in}[E] = \text{in}[L], \quad \text{in}[S_1] = \text{out}_\top[E], \quad \text{in}[S_2] = \text{out}_\bot[E], \quad \text{out}[L] = \text{out}[S_1] \cap \text{out}[S_2] \]

E₁ && E₂:

\[ \text{in}[E_1] = \text{in}[L], \quad \text{in}[E_2] = \text{out}_\top[E_1], \quad \text{out}_\top[L] = \text{out}_\top[E_2], \quad \text{out}_\bot[L] = \text{out}_\bot[E_1] \cap \text{out}_\bot[E_2] \]

E₁ || E₂:

\[ \text{in}[E_1] = \text{in}[L], \quad \text{in}[E_2] = \text{out}_\bot[E_1], \quad \text{out}_\top[L] = \text{out}_\top[E_1] \cap \text{out}_\top[E_2], \quad \text{out}_\bot[L] = \text{out}_\bot[E_2] \]

All merges use intersection: a variable is definitely assigned only if it is assigned along every path.

Joos simplification: Local variables must be initialized at declaration (int x = expr), so definite assignment analysis is simpler — no uninitialized declaration case.

Live Variable Analysis

A variable is live at a program point if its current value might be read before it is next overwritten. Dead assignments (writing a variable whose value will never be read) can be eliminated.

Data flow equations (backward analysis):

  • \( \text{in}[S] = (\text{out}[S] \setminus \text{KILL}[S]) \cup \text{GEN}[S] \)

Where:

  • \( \text{GEN}[S] \): variables read in S (they must be live before S)
  • \( \text{KILL}[S] \): variables written in S (earlier values become dead)

For assignment x = e: KILL = {x}, GEN = vars_read(e)

For a while loop with body \( S_\text{body} \) followed by statement \( S_\text{next} \):

\[ \text{out}[S_\text{body}] = \text{in}[S_\text{next}] \cup \text{in}[S_\text{body}] \]

A variable is live after the loop body if either the statement after the loop reads it OR the next iteration of the body reads it.
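For straight-line code the backward pass is a single sweep. A minimal Python sketch, assuming instructions are encoded as (written_var, read_vars) pairs:

```python
def liveness(code, live_out=frozenset()):
    """Backward sweep: in[S] = (out[S] - KILL[S]) ∪ GEN[S]."""
    live = set(live_out)
    ins = [None] * len(code)
    for i in range(len(code) - 1, -1, -1):
        target, reads = code[i]
        live.discard(target)    # KILL: target's old value is dead here
        live |= set(reads)      # GEN: operands must be live before S
        ins[i] = frozenset(live)
    return ins

code = [("a", {"b", "c"}),     # a = b + c
        ("d", {"a"}),          # d = a
        ("a", {"c"})]          # a = c
# liveness(code, live_out={"a", "d"})[0] == {"b", "c"}
```

With loops, this single sweep becomes the transfer function inside the fixed-point iteration described next.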

Fixed-Point Computation with Lattices

The equations for live variable analysis (and other dataflow analyses) are cyclic (a loop’s in and out sets depend on each other). We solve them iteratively using fixed-point computation.

Partial order (⊑): A binary relation that is reflexive, transitive, and antisymmetric.

Lattice: A partial order where every pair of elements has a least upper bound (\( \sqcup \), join) and a greatest lower bound (\( \sqcap \), meet).

Complete lattice: Every set of elements has a least upper bound and greatest lower bound.

Examples:

  • Power set \( 2^{\text{Vars}} \) ordered by \( \subseteq \): complete lattice, \( \bot = \emptyset \), \( \top = \text{Vars} \), \( \sqcup = \cup \), \( \sqcap = \cap \)
  • \( \{\texttt{no}, \texttt{maybe}\} \) ordered by no ⊑ maybe: complete lattice with \( \bot = \texttt{no} \)

Monotone function: \( f: L \to L \) is monotone if \( x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y) \).

Knaster-Tarski Fixed Point Theorem: If \( L \) is a complete lattice and \( F: L \to L \) is monotone, then:
  1. A fixed point exists.
  2. The least fixed point equals \( \bigsqcup_{n \geq 0} F^n(\bot) \).

Iterative algorithm for live variable analysis:

initialize in[S] = out[S] = ∅ for all S
repeat:
    update all in[S], out[S] according to dataflow equations
until no changes occur

Termination is guaranteed because:

  1. The lattice \( 2^{\text{Vars}} \) is finite
  2. The transfer functions are monotone, so each iteration either moves every set up the lattice or leaves it unchanged
  3. A finite lattice has no infinite ascending chains
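The iteration scheme can be written generically. A small sketch of Kleene iteration from ⊥ for any monotone F on a finite lattice; the particular F on a powerset below is invented for illustration:

```python
def lfp(F, bottom):
    """Least fixed point of monotone F, by Kleene iteration from bottom."""
    x = bottom
    while True:
        y = F(x)
        if y == x:          # no change: fixed point reached
            return x
        x = y

# Toy monotone F on subsets of {0, 1, 2, 3}: always adds 0, and n+1 for each n < 3.
F = lambda S: S | {0} | {n + 1 for n in S if n < 3}
```

By Knaster-Tarski, lfp(F, frozenset()) is exactly \( \bigsqcup_{n \geq 0} F^n(\bot) \); here the chain ∅, {0}, {0,1}, … stabilizes at {0, 1, 2, 3}.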

The General Dataflow Analysis Framework

The three analyses covered above (reachability, definite assignment, live variables) are all instances of a single general dataflow framework. A dataflow analysis is specified by four components:

Component         | Description
------------------+--------------------------------------------------------------------
Direction         | Forward (info flows from entry to exit) or Backward (exit to entry)
Lattice L         | The domain of analysis facts; must be a complete lattice
Initial value ⊥   | Starting assumption at all program points
Transfer function | For each statement S: f_S : L → L (must be monotone)

Forward analyses:

\[ \text{out}[S] = f_S(\text{in}[S]), \qquad \text{in}[S] = \bigsqcup_{\text{pred } p \text{ of } S} \text{out}[p] \]

Backward analyses:

\[ \text{in}[S] = f_S(\text{out}[S]), \qquad \text{out}[S] = \bigsqcup_{\text{succ } s \text{ of } S} \text{in}[s] \]

The meet operator \( \sqcup \) merges information from multiple control-flow paths. Different analyses use different meet operators:

Analysis              | Direction | Lattice               | Meet | ⊥
----------------------+-----------+-----------------------+------+----------
Reachability          | Forward   | {no, maybe}           | ∨    | no
Definite Assignment   | Forward   | 2^Vars ordered by ⊇   | ∩    | Vars
Live Variables        | Backward  | 2^Vars ordered by ⊆   | ∪    | ∅
Reaching Definitions  | Forward   | sets of definitions   | ∪    | ∅
Available Expressions | Forward   | sets of expressions   | ∩    | all exprs

The universal algorithm is always the same: initialize to \( \bot \), apply transfer functions repeatedly until no changes (Kleene fixed-point iteration).

Three-address code with SSA φ-nodes — each variable is defined exactly once; a φ-node selects its value based on the control-flow path taken:

B0 (entry):    x₁ = 0
             ↙        ↘
B1 (then):          B2 (else):
  x₂ = x₁ + 1         x₃ = x₁ * 2
             ↘        ↙
B3 (merge):    x₄ = φ(x₂, x₃)    ; φ picks x₂ or x₃ depending on the path
               y  = x₄ + 5

Reaching Definitions

A definition of variable \( x \) is any assignment x = ... in the program. A definition \( d \) reaches program point \( p \) if there exists an execution path from \( d \) to \( p \) along which \( x \) is not redefined.

Purpose: Reaching definitions is a forward analysis used to:

  • Determine whether constant propagation is safe (if only one definition of x reaches a use, and that definition assigns a constant, then x is constant at that use)
  • Build def-use chains for optimization

Transfer function:

\[ \text{out}[S] = (\text{in}[S] \setminus \text{KILL}[S]) \cup \text{GEN}[S] \]

Where:

  • \( \text{GEN}[S] = \{d\} \) if \( S \) is a definition \( d \): \( x = e \)
  • \( \text{KILL}[S] = \) all other definitions of \( x \) in the program

Meet operator: \( \text{in}[S] = \bigcup_{\text{pred } p} \text{out}[p] \) (union — a definition reaches a point if it reaches along any path)

Example:

d1: x = 1       // defines x
d2: y = x + 1   // uses x; d1 reaches here
d3: x = y * 2   // kills d1, defines new d3
d4: z = x       // only d3 reaches here

At d4, only definition d3 for x is in the reaching-definitions set — d1 was killed by d3. This tells the optimizer that x at d4 has the value computed at d3.

Lattice: \( 2^{\text{Defs}} \) with \( \subseteq \) order, \( \bot = \emptyset \), \( \sqcup = \cup \). Each element is a set of (possibly) reaching definitions.
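The d1–d4 example can be traced mechanically. A sketch of the forward pass over straight-line code, assuming each statement is encoded as a (definition id, variable) pair:

```python
def reaching(code):
    """in[S] for each statement: the set of (def id, var) pairs reaching it."""
    reach, ins = set(), []
    for def_id, var in code:
        ins.append(frozenset(reach))
        reach = {d for d in reach if d[1] != var}   # KILL: other defs of var
        reach.add((def_id, var))                    # GEN: this definition
    return ins

code = [("d1", "x"), ("d2", "y"), ("d3", "x"), ("d4", "z")]
```

At d4 (index 3), d1 has been killed by d3, matching the walkthrough above.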

Available Expressions

An expression \( e \) is available at program point \( p \) if every path from the program entry to \( p \) evaluates \( e \), and none of \( e \)’s operands is redefined after that last evaluation.

Purpose: Available expressions is a forward analysis used for Common Subexpression Elimination (CSE): if x + y is available at a use, compute it once, save it in a temporary, and reuse.

\[ \text{out}[S] = (\text{in}[S] \setminus \text{KILL}[S]) \cup \text{GEN}[S] \]

Where:

  • \( \text{GEN}[S] \) = expressions computed by \( S \) whose operands are not modified by \( S \)
  • \( \text{KILL}[S] \) = all expressions in the program that contain any variable written by \( S \)

Meet operator: \( \text{in}[S] = \bigcap_{\text{pred } p} \text{out}[p] \) (intersection — an expression is available only if it reaches along every path)

Initial value: \( \text{in}[\text{entry}] = \emptyset \). For all other nodes, initialize to the set of all expressions (the \( \top \) of the lattice) so that intersection can only shrink the set.

Lattice: \( 2^{\text{Exprs}} \) ordered by \( \supseteq \) (larger sets are lower in the lattice; \( \bot \) = set of all expressions, \( \top = \emptyset \)), \( \sqcup = \cap \). This is the “pessimistic must” direction: an expression is available only if all paths guarantee it.

Example:

// Entry: {}
a = b + c;      // GEN: {b+c}, out = {b+c}
if (...) {
    d = b + c;  // b+c already in, no kill, out = {b+c}
} else {
    b = 1;      // KILL: {b+c} (b is written), out = {}
}
// Meet (intersection): {} ∩ {b+c} = {} — b+c NOT available
e = b + c;      // must recompute

If the else branch did not modify b or c, b+c would be available after the merge, and the compiler could avoid recomputing it.
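The merge in this example can be replayed in a few lines. This sketch encodes each expression as the frozenset of its operand names (an illustrative simplification that ignores the operator):

```python
def transfer(avail, stmt):
    """stmt = (written_var, exprs_computed); out = (in ∪ GEN) minus KILL."""
    var, exprs = stmt
    avail = set(avail) | exprs                   # GEN: expressions computed here
    return {e for e in avail if var not in e}    # KILL: exprs reading var

entry    = set()
after_a  = transfer(entry,   ("a", {frozenset({"b", "c"})}))  # a = b + c
then_out = transfer(after_a, ("d", {frozenset({"b", "c"})}))  # d = b + c
else_out = transfer(after_a, ("b", set()))                    # b = 1
merged   = then_out & else_out                                # meet = intersection
```

The intersection at the merge comes out empty, so b+c must be recomputed, exactly as the comment in the example states.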


Code Generation: IA-32 Assembly

Code generation translates the type-checked, annotated AST into actual machine instructions. The course targets IA-32 (i386) — the 32-bit x86 architecture.

IA-32 Architecture Overview

The i386 is the 32-bit version of the x86 architecture, introduced with the Intel 80386 in 1986. It features:

  • 8 general-purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, esp, ebp
  • 6 segment registers (largely ignored in user-space code): cs, ds, es, fs, gs, ss
  • EFLAGS register: condition codes (ZF, SF, OF, CF, etc.) set by comparisons
  • EIP: instruction pointer (program counter)

Sub-register access: ax (low 16 bits of eax), al (low 8 bits), ah (high 8 bits of ax).

Register Conventions

Register | Role                          | Save
---------+-------------------------------+------------
eax      | Return value, accumulator     | caller-save
ecx      | Loop counter                  | caller-save
edx      | High bits of multiply/divide  | caller-save
ebx      | General purpose               | callee-save
esi      | Source index                  | callee-save
edi      | Destination index             | callee-save
esp      | Stack pointer                 | callee-save
ebp      | Frame pointer (base pointer)  | callee-save

Callee-save registers must be preserved by a function — if a callee uses them, it must save/restore them. Caller-save registers may be clobbered by a called function — if the caller needs them, it must save them before calling.

IA-32 Instruction Set (Selected)

Memory Addressing

The i386 supports flexible memory addressing: [base + index*scale + displacement]

mov eax, ebx            ; register to register: eax = ebx
mov eax, [ebx]          ; load: eax = *ebx
mov [eax], ebx          ; store: *eax = ebx
mov eax, 42             ; immediate: eax = 42
mov eax, [ebp - 8]      ; local variable at frame offset -8
mov eax, [eax + ecx*4]  ; array element (4-byte ints, index in ecx)

Arithmetic

add eax, ebx            ; eax += ebx
sub eax, ebx            ; eax -= ebx
imul eax, ebx           ; eax *= ebx (signed 32-bit result)
cdq                     ; sign-extend eax into edx:eax (for division)
idiv ecx                ; eax = edx:eax / ecx; edx = remainder (signed)
neg eax                 ; eax = -eax

Comparisons and Branches

cmp eax, ebx            ; sets flags based on eax - ebx (no result stored)
je  label               ; jump if equal (ZF=1)
jne label               ; jump if not equal
jg  label               ; jump if greater (signed)
jge label               ; jump if ≥ (signed)
jl  label               ; jump if less (signed)
jle label               ; jump if ≤ (signed)
ja  label               ; jump if above (unsigned)
jb  label               ; jump if below (unsigned)
jmp label               ; unconditional jump

Stack and Calls

push eax                ; esp -= 4; [esp] = eax
pop  eax                ; eax = [esp]; esp += 4
call label              ; push eip; jmp label (stores return address)
ret                     ; pop eip (return to caller)

Data Declarations (NASM syntax)

section .data
myint: dd 1234          ; declare 32-bit integer
mystr: db 'h','e','l','l','o',0  ; null-terminated string
myarr: dd 1, 2, 3, 4   ; array of 4 ints
align 4                 ; pad to 4-byte boundary

section .text
global main             ; export symbol
extern printf           ; import external symbol

System Calls

mov eax, 4              ; sys_write
mov ebx, 1              ; fd = stdout
mov ecx, msg            ; buffer address
mov edx, len            ; length
int 0x80                ; invoke kernel

Code Generation Strategy

Three main tasks:

  1. Plan data layout: Decide where each variable, field, and object lives in memory.
  2. Generate code bottom-up from AST: Each node type has a code template.
  3. Write runtime support: Malloc, exception handlers, vtable structures.

Data Layout Decisions

Data                       | Storage                | Rationale
---------------------------+------------------------+-----------------------------------------------
Integer/boolean literals   | Immediates             | Encoded in instruction
Local variables            | Stack                  | Automatic lifetime, fast access via ebp offsets
Objects                    | Heap                   | Dynamic lifetime, reference semantics
Static fields              | Fixed memory (labeled) | Per-class, persistent
String/array objects       | Heap                   | Variable size
Vtables / class structures | Fixed memory           | One per class, permanent

Stack Frame Layout

Every function call establishes a stack frame — a fixed region on the stack for that invocation’s local variables and bookkeeping.

Standard IA-32 frame layout (cdecl convention):

              High address
 ┌────────────────────┐
 │   arg n            │ ← [ebp + 4*(n+1) + 4]
 │   ...              │
 │   arg 2            │ ← [ebp + 16]
 │   arg 1            │ ← [ebp + 12]
 │   arg 0            │ ← [ebp + 8]
 │   return address   │ ← [ebp + 4]  (pushed by call)
 │   saved ebp        │ ← [ebp + 0]  (pushed by prologue)
 │   local var 0      │ ← [ebp - 4]
 │   local var 1      │ ← [ebp - 8]
 │   ...              │
 │   local var n      │ ← [ebp - 4*(n+1)]
 └────────────────────┘
              Low address (esp points here)

Function prologue:

push ebp                ; save caller's frame pointer
mov  ebp, esp           ; establish new frame pointer
sub  esp, N             ; allocate N bytes for local variables

Function epilogue:

mov  esp, ebp           ; deallocate locals
pop  ebp                ; restore caller's frame pointer
ret                     ; return to caller

Calling Convention (cdecl)

Caller’s responsibility:

  1. Save caller-save registers if needed (eax, ecx, edx)
  2. Push arguments right-to-left (last arg first)
  3. call foo (pushes return address)
  4. After return, pop arguments: add esp, 4*n
  5. Result is in eax

Callee’s responsibility:

  1. Execute prologue (push ebp, mov ebp esp, sub esp N)
  2. Save callee-save registers that will be used (ebx, esi, edi)
  3. Execute function body
  4. Place return value in eax
  5. Restore callee-save registers
  6. Execute epilogue (mov esp ebp, pop ebp, ret)

Example: int f(int a, int b)

; --- Caller side ---
push 2              ; push b
push 1              ; push a
call f
add  esp, 8         ; pop 2 args × 4 bytes

; --- Callee: f(a, b) ---
f:
push ebp
mov  ebp, esp
sub  esp, 4         ; one local variable
; a is at [ebp + 8], b is at [ebp + 12]
mov  eax, [ebp + 8] ; load a into eax (return value)
add  eax, [ebp+12]  ; eax = a + b
mov  esp, ebp
pop  ebp
ret

Code Generation for Expressions

The code generator for an expression node produces code that leaves the result in eax. A common pattern is push/pop to save intermediate values:

Binary subtraction E1 - E2:

; generate code for E1
push eax            ; save E1 result
; generate code for E2
pop  ebx            ; restore E1 into ebx
sub  ebx, eax       ; ebx = E1 - E2
mov  eax, ebx       ; result in eax

Note: Division requires extra care — idiv uses edx:eax as dividend:

; generate code for E1 → eax
push eax
; generate code for E2 → eax (divisor)
mov  ecx, eax
pop  eax            ; restore dividend
cdq                 ; sign extend eax into edx
idiv ecx            ; eax = eax / ecx; edx = remainder

L-value vs R-value

Most expressions produce an r-value (a value). Assignments need the l-value (an address) of the left-hand side.

Expression         | L-value (address)  | R-value (value)
-------------------+--------------------+---------------------
Local var v        | ebp + offset(v)    | [ebp + offset(v)]
Static field C.f   | address of label   | [label]
Instance field e.f | e + offset(f)      | [e + offset(f)]
Array element a[i] | a + header + i*4   | [a + header + i*4]

Assignment E1 = E2:

; compute l-value of E1 → eax
push eax            ; save address
; compute r-value of E2 → eax
pop  ebx            ; restore address
mov  [ebx], eax     ; store value at address
; eax still holds the value (assignment is an expression in Java)

If-Else and Loops

If-else:

; condition code → eax
cmp  eax, 0
je   else_42
; then-branch code
jmp  end_42
else_42:
; else-branch code
end_42:

While loop:

loop_42:
; condition code → eax
cmp  eax, 0
je   end_42
; body code
jmp  loop_42
end_42:

Short-circuit evaluation (&&): If the left side is false, skip evaluating the right side.

; E1 code → eax
cmp  eax, 0
je   false_42       ; skip E2 if E1 is false
; E2 code → eax
false_42:

Object Layout and Virtual Dispatch

Memory Layout of Objects

Each object in memory has a fixed layout that allows treating subclass instances as superclass instances:

Object of class B (inherits from A):
┌───────────────────┐
│  vtable pointer   │  ← [obj + 0] always first
├───────────────────┤
│  field a (from A) │  ← [obj + 4]
├───────────────────┤
│  field b (from B) │  ← [obj + 8]
└───────────────────┘

Type-compatible layout: If B extends A, then the layout of B is an extension of A’s layout — A’s fields appear at the same offsets in B. This means a pointer to a B object can be safely treated as a pointer to an A object.

A layout:              B layout (B extends A):
  vtable ptr             vtable ptr        ← same offset
  field_a                field_a           ← same offset
                         field_b           ← appended

Virtual Method Tables (Vtables)

The problem: Given a reference of static type Vehicle, calling v.numTires() must dispatch to the correct implementation depending on the dynamic type (Car, Truck, Bicycle…).

Solution: Each object’s first word points to a vtable (virtual method table) — a class-specific array of function pointers.

Object:               Vtable for Car:
┌─────────────┐       ┌──────────────────────────┐
│ vtable ptr  │──────▶│ METHOD$Car$numTires       │  ← offset 0
├─────────────┤       ├──────────────────────────┤
│ field ...   │       │ METHOD$Vehicle$foo        │  ← offset 4 (inherited)
└─────────────┘       ├──────────────────────────┤
                      │ METHOD$Car$bar            │  ← offset 8 (new)
                      └──────────────────────────┘

Critical invariant: Inherited methods appear at the same offset in subclass vtables as in superclass vtables. This allows:

; v.numTires() where v has static type Vehicle
mov  eax, [v]           ; load vtable pointer
mov  eax, [eax + 0]     ; load numTires entry (offset 0 for Vehicle)
call eax                ; dispatch — calls Car's or Vehicle's version
                        ; depending on dynamic type of v

Vtable structure:

VTABLE$Vehicle:
    dd METHOD$Vehicle$numTires    ; offset 0
    dd METHOD$Vehicle$foo         ; offset 4

VTABLE$Car:
    dd METHOD$Car$numTires        ; offset 0 (overridden)
    dd METHOD$Vehicle$foo         ; offset 4 (inherited — same pointer!)
    dd METHOD$Car$bar             ; offset 8 (new method)
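The vtable mechanism can be simulated directly. In this Python sketch, a vtable is a list of function pointers and inherited methods keep their slot index; all names and return values are invented for illustration:

```python
def vehicle_numtires(self): return 0       # slot 0 (invented generic default)
def vehicle_foo(self):      return "foo"   # slot 1
def car_numtires(self):     return 4       # overrides slot 0
def car_bar(self):          return "bar"   # new method, slot 2

VTABLE_VEHICLE = [vehicle_numtires, vehicle_foo]
VTABLE_CAR     = [car_numtires, vehicle_foo, car_bar]  # same slots as Vehicle!

class Obj:
    def __init__(self, vtable):
        self.vtable = vtable               # the first "word" of every object

def dispatch(obj, slot):
    """mov eax, [obj]; mov eax, [eax + 4*slot]; call eax"""
    return obj.vtable[slot](obj)
```

Because inherited methods keep their slot, dispatch(v, 0) compiled against static type Vehicle works unchanged when v's dynamic type is Car.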

Dispatching Interface Methods

Interface dispatch is harder because a class can implement multiple interfaces, and each interface expects its methods at offset 0 of its own “view” of the vtable:

interface A { void ma(); }   // ma at offset 0 of A-vtable
interface B { void mb(); }   // mb at offset 0 of B-vtable
class C implements A, B { }  // C must satisfy both — conflict!

Solution 1: Selector-Indexed Table (O(1) dispatch)

Assign a global selector number to each unique method signature. Create a 2D table: rows index classes, columns index selectors. Each class has a pointer to its row of the table. Dispatching:

mov eax, [obj]          ; vtable pointer (actually selector table column)
mov eax, [eax + 4*sel]  ; load function pointer for selector sel
call eax

Space: O(S × C) for S distinct selectors and C classes. In practice, most cells are empty, so compression is needed. This is the itab approach (similar to C++ virtual function tables for multiple inheritance).

Solution 2: Hash Table Dispatch

Build a per-class hash table mapping selector → method pointer. Dispatch computes a hash at the call site.

  • Space: O(Σ methods per class)
  • Time: O(1) expected but with hash overhead
  • Can use inline caching (remember last dispatch result at each call site) to make repeated calls fast

Subtype Testing at Runtime

Several operations require a runtime subtype check:

  1. instanceof T: Does this object belong to T or a subtype?
  2. Casts (T) obj: Throw ClassCastException if not a subtype
  3. Array stores a[i] = obj: Throw ArrayStoreException if dynamic array type is incompatible

Interval Method (Single Inheritance)

For classes only (single inheritance): assign each class a DFS-traversal interval \( [\text{start}_C, \text{end}_C] \) such that \( B \leq A \Leftrightarrow \text{interval}(B) \subseteq \text{interval}(A) \).

DFS of hierarchy:              Intervals:
     Object [1, 8]              Object: [1, 8]
    /       \                   Vehicle: [2, 5]
Vehicle[2,5] Number[6,8]        Car: [3, 3]
  / \                           Truck: [4, 4]
Car Truck                       Number: [6, 8]
[3,3] [4,4]                     ...

Test: x instanceof Vehicle ⇔ 2 ≤ start(class(x)) ∧ end(class(x)) ≤ 5

In assembly:

mov  ecx, [obj]                ; vtable pointer
mov  ecx, [ecx + CLASS_TAG]    ; class tag (index)
cmp  ecx, VEHICLE_START
jl   fail
cmp  ecx, VEHICLE_END
jg   fail
; success: obj is-a Vehicle
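The interval assignment itself is a single DFS over the class tree. A sketch in Python (the exact interval numbers depend on the numbering scheme, so they differ slightly from the figure above):

```python
def assign_intervals(cls, children, counter=None, out=None):
    """Number the hierarchy by DFS: interval(C) spans C's whole subtree."""
    if counter is None:
        counter, out = [0], {}
    counter[0] += 1
    start = counter[0]
    for child in children.get(cls, []):
        assign_intervals(child, children, counter, out)
    out[cls] = (start, counter[0])
    return out

children = {"Object": ["Vehicle", "Number"], "Vehicle": ["Car", "Truck"]}
iv = assign_intervals("Object", children)

def is_subtype(c, d):
    """c <= d  iff  interval(c) ⊆ interval(d): two comparisons, O(1)."""
    return iv[d][0] <= iv[c][0] and iv[c][1] <= iv[d][1]
```

The two comparisons in is_subtype correspond exactly to the cmp/jl and cmp/jg pair in the assembly above.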

Interface Subtype Testing

For interfaces, single-inheritance interval doesn’t work (a class can implement many interfaces). Use a subtype matrix: a 2D boolean table with entry [C, T] = 1 iff C ≤ T.

Space: O(T²) where T is the number of types. Practical compression: bitset encoding or hash tables.

Arrays

Array object layout:

┌───────────────────┐
│ vtable pointer    │  ← [arr + 0]
├───────────────────┤
│ length            │  ← [arr + 4]
├───────────────────┤
│ element[0]        │  ← [arr + 8]
├───────────────────┤
│ element[1]        │  ← [arr + 12]
├───────────────────┤
│ ...               │
└───────────────────┘

Arrays are heap-allocated objects with a vtable (for instanceof, length, and method dispatch inherited from java.lang.Object).

Array access a[i]:

; a → eax, i → ecx
; null check
test eax, eax
je   null_exception
; bounds check
cmp  ecx, [eax + 4]     ; compare i with length
jae  bounds_exception   ; unsigned comparison (catches negative too)
; compute address
lea  eax, [eax + 8 + ecx*4]   ; eax = &a[i]

Array store needs an extra type check (to catch ArrayStoreException):

; check dynamic type of array element type
mov  ebx, [arr]                 ; vtable of array
mov  ebx, [ebx + ELEMENT_TYPE]  ; element type
; check if value is assignment-compatible with element type
; (use subtype table)

Instance Creation

new A(args):

; 1. Allocate memory
mov  eax, SIZE_OF_A
call __malloc           ; returns pointer in eax
; 2. Initialize vtable pointer
mov  dword [eax], VTABLE$A   ; (NASM needs a size hint for mem, imm)
; 3. Zero out fields (Java semantics require default initialization)
; 4. Push 'this' (the new object pointer)
push eax
; 5. Push arguments
; push arg_n
; ...
; push arg_1
; 6. Call constructor
call CONSTRUCTOR$A
; 7. Clean up args
add  esp, 4*(1 + n_args)
; 8. Return value: object pointer already in eax

Constructor execution order (JLS §12.5):

  1. Superclass constructor is called first (implicit super() if not explicit)
  2. Instance field initializers execute in declaration order
  3. Constructor body executes

Memory Management

Memory Organization

A process’s virtual address space is divided into regions:

0xFFFFFFFF ──────────────────
           │  kernel          │
0xC0000000 ──────────────────
           │                  │
           │  stack           │ (grows down ↓)
           │                  │
           │  ...             │
           │                  │
           │  heap            │ (grows up ↑)
           │                  │
0x08048000 ──────────────────
           │  text (code)     │
           │  data (statics)  │
0x00000000 ──────────────────

The heap is where new allocates objects. The OS provides raw memory pages via mmap; the runtime’s allocator (malloc) subdivides these pages for individual allocations.

Manual Memory Management

In C, the programmer explicitly manages heap memory:

  • malloc(n): Allocate n bytes, return pointer
  • free(p): Return allocation at p to the pool

The allocator maintains free lists — linked lists of free memory blocks. When an allocation is requested:

  1. Find a free block of sufficient size
  2. Split it if much larger than needed
  3. Return the pointer

When free is called:

  1. The block is added back to the free list
  2. Adjacent free blocks may be coalesced to prevent fragmentation

Fragmentation: Over time, the heap may contain many small free blocks that cannot satisfy large allocations even though total free space is sufficient.
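The free-list bookkeeping can be sketched as a first-fit allocator over (offset, size) blocks — a toy model of the steps above, not any real malloc:

```python
class Allocator:
    def __init__(self, heap_size):
        self.free = [(0, heap_size)]            # sorted list of (offset, size)

    def malloc(self, n):
        for i, (off, size) in enumerate(self.free):
            if size >= n:                       # first fit
                if size > n:
                    self.free[i] = (off + n, size - n)   # split the block
                else:
                    del self.free[i]
                return off
        raise MemoryError("fragmented or full")

    def free_block(self, off, n):
        self.free.append((off, n))
        self.free.sort()
        merged = [self.free[0]]                 # coalesce adjacent blocks
        for o, s in self.free[1:]:
            po, ps = merged[-1]
            if po + ps == o:
                merged[-1] = (po, ps + s)
            else:
                merged.append((o, s))
        self.free = merged
```

Without the coalescing step, freeing two adjacent 40-byte blocks would leave the heap unable to satisfy an 80-byte request — the fragmentation problem in miniature.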

Automatic Memory Management (Garbage Collection)

Garbage collection automatically reclaims memory that is no longer reachable from the program. “Reachable” approximates “might be used again.”

Roots: The set of directly reachable memory locations — stack variables, static fields, CPU registers. Everything transitively reachable from roots is live; everything else is garbage.

The compiler must provide the GC with type information about object layouts (which fields are references) so the GC can traverse the object graph.

Mark-and-Sweep

The classic two-phase algorithm:

Phase 1 — Mark: DFS from all roots, mark every reachable object.

mark(object o):
    if o is already marked: return
    mark o
    for each reference field f in o:
        mark(o.f)

Phase 2 — Sweep: Scan the entire heap; free every unmarked object.

for each object o in heap:
    if o is not marked: free(o)
    else: unmark(o)   // prepare for next GC

Cost: O(H) where H = heap size (sweep visits entire heap regardless of live data).

Disadvantage: Fragmentation (objects remain in place; free blocks are scattered).
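The two phases can be simulated on a toy object graph, where each object's reference fields are a "refs" list (an encoding assumed for illustration). Note that marking handles cycles naturally:

```python
def gc(heap, roots):
    """heap: {addr: {"refs": [addr, ...]}}. Returns the heap after sweeping."""
    marked = set()

    def mark(addr):                        # Phase 1: DFS from the roots
        if addr in marked:
            return                         # already marked: stops cycles
        marked.add(addr)
        for r in heap[addr]["refs"]:
            mark(r)

    for r in roots:
        mark(r)
    # Phase 2: sweep the whole heap, dropping every unmarked object
    return {a: o for a, o in heap.items() if a in marked}
```

An unreachable cycle is collected even though every object in it is referenced — unlike reference counting, reachability from roots is what matters.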

Semispace Copying Collection

Split the heap into two equal halves: fromspace and tospace.

  1. Allocate in fromspace using bump pointer (just increment a pointer — no free list!)
  2. When fromspace is full: copy all reachable objects to tospace, updating all pointers
  3. Swap fromspace and tospace roles

copy(object o):
    if o already has forwarding pointer:
        return forwarding_pointer(o)   // already copied
    allocate space in tospace for copy
    copy bits of o into tospace
    install forwarding pointer in old location
    for each reference field f:
        copy.f ← copy(o.f)
    return copy
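The same toy object encoding (dicts with a "refs" list, an assumption for illustration) shows why the forwarding pointer must be installed before the fields are copied — that is what makes cycles terminate:

```python
def collect(fromspace, roots):
    """Copy everything reachable from roots into a fresh tospace."""
    tospace, forward = [], {}              # list index = bump-pointer address

    def copy(addr):
        if addr in forward:                # already copied: follow forwarding ptr
            return forward[addr]
        new_addr = len(tospace)            # bump-pointer allocation in tospace
        tospace.append({"refs": []})
        forward[addr] = new_addr           # install BEFORE copying fields
        for r in fromspace[addr]["refs"]:
            tospace[new_addr]["refs"].append(copy(r))
        return new_addr

    return tospace, [copy(r) for r in roots]
```

Objects never reached from the roots are simply left behind in fromspace; nothing is ever "freed" explicitly.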

Advantages:

  • O(L) cost where L = live data (proportional to what’s copied, not heap size)
  • Bump-pointer allocation is the fastest possible allocation
  • Automatically compacts heap (no fragmentation)
  • Single traversal (no separate sweep)

Disadvantages:

  • Uses only half of heap at any time
  • Must update every pointer in the live set

Mark-and-Compact

Retains objects in place (no pointer duplication) but rearranges them to eliminate fragmentation:

  1. Mark reachable objects (like mark-and-sweep)
  2. Compute new addresses for all live objects (compacted order)
  3. Update all references to use new addresses
  4. Move objects to their new addresses

Cost: O(H) with a high constant factor (3+ passes over the heap).

Generational Garbage Collection

Generational hypothesis: Most objects die young — a freshly allocated object has a much higher probability of becoming garbage quickly than one that has survived several GC cycles.

Structure:

  • Young generation (nursery): Small, frequently collected
  • Old generation (tenured space): Large, rarely collected

Young GC (minor collection):

  • Copy-collect the nursery: survivors are promoted to old generation
  • Very fast: most young objects are dead, so almost nothing is copied

Old GC (major/full collection):

  • Mark-and-sweep or mark-and-compact on old generation
  • Rare, but more expensive

Write barrier: When an old object’s reference field is written with a young object’s pointer, record this in a remembered set so the young GC can find all roots (not just stack/statics).
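A write barrier can be sketched as a wrapper that every reference store goes through; the names (`gen`, `write_ref`, `remembered_set`) are illustrative:

```python
# Sketch of a write barrier maintaining a remembered set.
# Every store of a young pointer into an old object records the old object,
# so a minor collection can treat remembered objects as extra roots.

remembered_set = set()

def write_ref(obj, field, target):
    """All reference-field stores go through this barrier."""
    if obj["gen"] == "old" and target["gen"] == "young":
        remembered_set.add(id(obj))   # old -> young edge: remember the source
    obj[field] = target

old = {"gen": "old"}
young = {"gen": "young"}
write_ref(old, "f", young)
print(id(old) in remembered_set)   # True: minor GC must also scan `old`
```

Without the barrier, a minor collection that only scanned the stack and statics would miss the `old.f → young` edge and incorrectly free the young object.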

Comparison

GC Algorithm        Allocation     GC Cost       Fragmentation   Pros                  Cons
Mark-Sweep          Free list      O(H)          Yes             Simple                Fragmentation
Semispace Copying   Bump pointer   O(L)          No              Fast alloc, no frag.  Half the heap wasted
Mark-Compact        Free list      O(H)          No              Full heap, no frag.   High constant factor
Generational        Bump pointer   O(L) mostly   Varies          Fast in practice      Complex

Optimization

Peephole Optimization

Peephole optimization applies local rewrite rules to short sequences of instructions, replacing inefficient patterns with better equivalents. It is applied as a final pass over generated assembly.

Example patterns:

; Pattern: redundant load after store
mov [ebp-4], eax
mov eax, [ebp-4]
; Replace with:
mov [ebp-4], eax    ; (eax already has the value)

; Pattern: complex addressing folding
mov  eax, ebp
add  eax, -8
mov  eax, [eax]
; Replace with:
mov  eax, [ebp - 8]

; Pattern: jump to jump
jmp  label1
label1: jmp label2
; Replace with:
jmp  label2
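The "redundant load after store" pattern can be sketched as a single pass over an instruction list. Instructions are modeled here as `(op, dst, src)` triples; this is a sketch, not a full assembler:

```python
# Minimal peephole pass: drop a reload that immediately follows a store
# to the same memory operand from the same register.

def peephole(code):
    """One pass of local rewrites over (op, dst, src) instruction triples."""
    out = []
    for ins in code:
        if (out and ins[0] == "mov" and out[-1][0] == "mov"
                and out[-1][1] == ins[2]       # store target == reload source
                and out[-1][2] == ins[1]       # stored register == reload target
                and ins[2].startswith("[")):   # the shared operand is memory
            continue  # redundant load after store: the register already has the value
        out.append(ins)
    return out

code = [("mov", "[ebp-4]", "eax"),
        ("mov", "eax", "[ebp-4]"),   # redundant: eax already holds the value
        ("add", "eax", "1")]
print(peephole(code))
```

Real peephole optimizers keep a table of such patterns and re-run the pass until no rule fires, since one rewrite can expose another.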

Instruction Selection by Tree Tiling

A more principled approach to code generation: tree tiling maps AST subtrees to single machine instructions that implement multiple operations at once.

Each instruction is a “tile” — a tree pattern with a cost (latency or code size). The goal is to cover the AST with non-overlapping tiles of minimum total cost.

Example tiles for load:

Tile 1: [eax] → leaf          cost: 1
  Implements: mov eax, [eax]

Tile 2: [eax + imm] → leaf    cost: 1
  Implements: mov eax, [eax + const]

Tile 3: [eax + ebx*4] → leaf  cost: 1
  Implements: mov eax, [eax + ebx*4]

Optimal tiling algorithm (dynamic programming, bottom-up):

for each node n of AST in bottom-up order:
    for each tile t that could root at n:
        cost(t, n) = tile_base_cost(t)
                    + Σ min_cost(child_i)  for each subtree hole in t
    select tile with minimum cost
    record tile and total cost for node n

This is an O(n) algorithm (each node visited once, constant tiles per node).
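The dynamic program can be sketched on a toy expression tree. Trees are `(op, left, right)` tuples or the string `"leaf"`; the two tiles and their costs are illustrative stand-ins for the tile set above:

```python
# Bottom-up DP for optimal tiling of a tiny expression tree.

def min_cost(node, memo=None):
    """Return the minimum total tile cost to cover the subtree at node."""
    if memo is None:
        memo = {}
    if node == "leaf":
        return 1                      # tile: load a leaf, cost 1
    if id(node) in memo:
        return memo[id(node)]
    op, left, right = node
    # Tile A: cover only this node; children are tiled separately.
    best = 1 + min_cost(left, memo) + min_cost(right, memo)
    # Tile B (like [eax + imm]): "+" with a leaf child folds into one instruction.
    if op == "+" and right == "leaf":
        best = min(best, 1 + min_cost(left, memo))
    memo[id(node)] = best
    return best

tree = ("+", ("+", "leaf", "leaf"), "leaf")   # (a + b) + c
print(min_cost(tree))   # 3: both "+" nodes use the folded tile
```

Each node is costed once and the tile set is fixed, which is where the O(n) bound comes from.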

Register Allocation

After instruction selection, code may reference many “virtual registers” (temporaries). IA-32 provides only eight general-purpose registers. Register allocation assigns virtual registers to physical registers, spilling to memory when needed.

Key insight: Two virtual registers can share a physical register if they are never simultaneously live.

Live range: The span of instructions between a variable’s definition and last use.

Interference graph — nodes are variables, edges connect simultaneously-live variables; graph coloring assigns physical registers.

[Figure: interference graph over variables a–e; coloring: a→eax, b→ebx, c→ebx (ebx reused for b and c), d→ecx; e spills]

Register allocation as graph coloring:

  1. Build interference graph: nodes = virtual registers; edges between simultaneously-live registers
  2. Color the graph with k colors (k = number of physical registers)
  3. Each color represents a physical register
  4. If the graph is not k-colorable, spill some registers to memory (add load/store instructions) and retry

Spilling heuristic: Prefer to spill registers that are used least frequently or have the longest live range.
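Steps 1–4 can be sketched with a greedy coloring pass; real allocators use Chaitin/Briggs-style simplify-and-select, and the degree-based ordering here is only an illustrative heuristic:

```python
# Greedy graph coloring over an interference graph.

def color(graph, k):
    """Assign each node a color in 0..k-1, or mark it 'spill'."""
    assignment = {}
    # Heuristic: color low-degree nodes first, hardest nodes last.
    for node in sorted(graph, key=lambda n: len(graph[n])):
        taken = {assignment.get(nb) for nb in graph[node]}
        free = [c for c in range(k) if c not in taken]
        assignment[node] = free[0] if free else "spill"
    return assignment

# a--b, a--c, b--c form a triangle (pairwise live); d only conflicts with a.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(color(graph, k=3))   # the triangle needs 3 colors; d reuses one
print(color(graph, k=2))   # not 2-colorable: one node spills
```

A spill would insert load/store instructions around each use of the spilled node, shortening its live ranges, after which the allocator retries.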

Minimum registers needed (Sethi-Ullman algorithm for expressions):

\[ r[\text{leaf}] = 1 \qquad r[\text{node}] = \begin{cases} r_1 + 1 & \text{if } r_1 = r_2 \\ \max(r_1, r_2) & \text{otherwise} \end{cases} \]

where \(r_1, r_2\) are the register needs of the node’s two children.

Evaluate the child needing more registers first to minimize peak register usage.

Example:

     +
    / \
   *   b    (b needs 1 register)
  / \
 a   c

If * needs 2 registers and b needs 1, evaluate * first (frees a register after), then b. If evaluated in the wrong order, we’d need 3 registers instead of 2.
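The recurrence is a direct bottom-up traversal; here is a minimal Python version over `(op, left, right)` tuples, checked against the tree above:

```python
# Sethi-Ullman register need: leaves need 1 register; equal children cost
# one extra; otherwise the larger child dominates.

def regs(node):
    if isinstance(node, str):          # leaf: a variable or constant
        return 1
    _, left, right = node
    r1, r2 = regs(left), regs(right)
    return r1 + 1 if r1 == r2 else max(r1, r2)

# (a * c) + b from the example above: * needs 2, b needs 1, so the whole
# tree needs max(2, 1) = 2 registers (evaluating * first).
tree = ("+", ("*", "a", "c"), "b")
print(regs(tree))   # 2
```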

Instruction Scheduling

Even with optimal instruction selection and register allocation, the order of instructions matters for modern pipelined CPUs. Instructions have variable latencies, and the CPU may stall while waiting for a result to be ready.

Key idea: Reorder instructions (without changing program semantics) so that independent instructions fill the gaps between dependent ones.

Constraints: The dependence graph captures which instruction orderings are semantically equivalent:

  • True dependence (RAW): Instruction A writes a register that instruction B reads — B must come after A.
  • Anti-dependence (WAR): A reads a register that B writes — B must come after A.
  • Output dependence (WAW): Both A and B write the same register — B must come after A.

List scheduling (a greedy heuristic):

1. Build dependence graph
2. Compute priorities (e.g., critical path length through the graph)
3. Maintain a "ready list" of instructions with all predecessors done
4. At each step: pick highest-priority ready instruction and emit it

Example — moving a load earlier to hide memory latency:

; Before scheduling (stalls if the load takes 4 cycles):
mov eax, [ebx]       ; load from memory (slow)
mov ecx, eax         ; stall: waits for load
add ecx, 1
mov edx, 5           ; independent work, stuck behind the stall
add edx, 2

; After scheduling (independent work fills the latency gap):
mov eax, [ebx]       ; start load
mov edx, 5           ; independent: no stall
add edx, 2           ; independent: no stall
mov ecx, eax         ; load has completed: no stall
add ecx, 1
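The list-scheduling loop itself can be sketched in a few lines; the dependence graph and priorities below are illustrative, not derived from real latencies:

```python
# Greedy list scheduling over a dependence DAG.
# deps[i] = set of instructions that must be emitted before i.
# priority[i] = e.g. critical-path length from i to the end.

def list_schedule(deps, priority):
    remaining = set(deps)
    order = []
    while remaining:
        # ready = all predecessors already emitted
        ready = [i for i in remaining if deps[i] <= set(order)]
        best = max(ready, key=lambda i: priority[i])
        order.append(best)
        remaining.remove(best)
    return order

# load -> use is a true (RAW) dependence; setup1/setup2 are independent filler.
deps = {"load": set(), "use": {"load"}, "setup1": set(), "setup2": set()}
priority = {"load": 3, "use": 1, "setup1": 2, "setup2": 2}
print(list_schedule(deps, priority))   # load first; filler hides its latency
```

Giving the load the highest priority (it heads the longest dependence chain) is what pushes it early enough for the independent instructions to cover its latency.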

Constant Folding and Constant Propagation

Constant folding evaluates constant expressions at compile time rather than at runtime. This requires no static analysis — just pattern-matching on the AST.

3 + 4    →    7
2 * 10   →    20
true && false  →  false

Constant propagation uses reaching definitions to determine that a variable has a known constant value at a use site, then replaces the use with the constant (enabling further folding).

int x = 5;
int y = x + 3;   // reaching def of x is "x=5"; propagate → y = 5+3 → y = 8
int z = y * 2;   // y=8; propagate → z = 8*2 → z = 16

The analysis uses the reaching-definitions lattice instantiated with constant values (the constant propagation lattice: ⊥ = “undefined”, a specific constant, ⊤ = “non-constant”). Transfer functions check if a variable’s only reaching definition is a constant.
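Folding and propagation feed each other, as a small sketch shows. Expressions are `(op, left, right)` tuples, and `env` plays the role of the constant-propagation facts for a straight-line block (the representation is illustrative):

```python
# Fold constant subtrees; propagate variables whose value is a known constant.

def fold(node, env):
    """Reduce node to an int if possible; env maps variables to known constants."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):                       # variable use
        return env.get(node, node)                  # propagate if constant known
    op, l, r = node
    l, r = fold(l, env), fold(r, env)
    if isinstance(l, int) and isinstance(r, int):   # both constant: fold now
        return {"+": l + r, "*": l * r}[op]
    return (op, l, r)                               # stays symbolic otherwise

# x = 5; y = x + 3; z = y * 2  -- the example above.
env = {}
env["x"] = fold(5, env)
env["y"] = fold(("+", "x", 3), env)
env["z"] = fold(("*", "y", 2), env)
print(env)   # {'x': 5, 'y': 8, 'z': 16}
```

Propagating `x = 5` is what turns `x + 3` into a foldable constant expression, which in turn makes `y * 2` foldable: each pass enables the next.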

Strength Reduction

Strength reduction replaces expensive operations with cheaper equivalent ones. Classic examples:

Original   Replacement        Rationale
x * 2      x + x or x << 1    Shift/add is faster than multiply
x * 4      x << 2             Power-of-2 multiply → shift
x / 4      x >> 2             Power-of-2 divide → shift (unsigned)
x % 8      x & 7              Power-of-2 modulo → bitwise AND

In loop contexts, strength reduction is especially powerful. If i*4 is computed each iteration, it can be replaced with an accumulator that increments by 4 each step, eliminating the multiply entirely.

Prof’s example from lecture: Multiplying by a constant like 2 can be replaced by add eax, eax — addition is faster than imul on many pipelines, and the operation is semantically identical for integers.
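The loop case can be demonstrated directly: both functions below produce the same offsets, but the second replaces the per-iteration multiply with a single add to an accumulator (the function names are illustrative):

```python
# Loop strength reduction: i*4 each iteration vs. an accumulator stepping by 4.

def offsets_multiply(n):
    return [i * 4 for i in range(n)]          # multiply every iteration

def offsets_reduced(n):
    out, acc = [], 0
    for _ in range(n):
        out.append(acc)                       # reuse the running value
        acc += 4                              # one add replaces the multiply
    return out

print(offsets_multiply(5) == offsets_reduced(5))   # True: identical results
```

Here `i` is an induction variable, and `acc` is the derived induction variable that a compiler would introduce in its place.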

Common Subexpression Elimination (CSE)

Common Subexpression Elimination uses available expressions analysis: if an expression is available at a use site (it has been computed on every path and its operands haven’t been reassigned), replace it with a saved temporary.

// Before CSE:
a[i] = a[i] + 1;
// loads a[i] (and computes its address) twice

// After CSE:
temp = a[i];     // load a[i] once
a[i] = temp + 1; // reuse the loaded value

More generally:

x = b + c;
// ... (b and c not modified) ...
y = b + c;   // b+c is available here

// After CSE:
y = x;

The available-expressions analysis determines exactly when this substitution is safe.
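For a straight-line block the substitution can be sketched in a few lines; statements are `(dst, expr)` pairs with `expr` a tuple or a variable name, and the representation is illustrative (a real pass works over a CFG with available-expressions facts at each point):

```python
# CSE over a straight-line block: reuse a temporary while the expression's
# operands are unmodified.

def cse(block):
    """block: list of (dst, expr) with expr = (op, a, b) or a variable name."""
    available = {}                  # expr -> variable currently holding its value
    out = []
    for dst, expr in block:
        if isinstance(expr, tuple) and expr in available:
            expr = available[expr]              # replace with the saved temporary
        out.append((dst, expr))
        # dst was reassigned: kill expressions that use dst or are held in dst
        available = {e: v for e, v in available.items()
                     if dst not in e and v != dst}
        if isinstance(expr, tuple):
            available[expr] = dst               # expr is now available in dst
    return out

block = [("x", ("+", "b", "c")),
         ("y", ("+", "b", "c"))]   # b+c is available here
print(cse(block))                  # second statement becomes y = x
```

The kill step is the safety condition from the text: an expression stops being available the moment one of its operands, or the variable holding it, is reassigned.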


Putting It All Together: The Joos Compiler Pipeline

Compiler pipeline — stages from source to binary.

Source (.java) → Lexer → tokens → Parser → parse tree → AST → Semantic Analysis → IR / Optimizer → CodeGen → asm → Binary (ELF)

Here is the complete data flow from source code to executable:

Source (.java)
      │
      ▼ Scanning (DFA + maximal munch)
Token stream
      │
      ▼ Parsing (LALR(1))
Concrete syntax tree
      │
      ▼ AST construction
Abstract syntax tree
      │
      ▼ Weeding
Verified AST (syntactic constraints)
      │
      ▼ Pass 1: Build global environment
Global class table
      │
      ▼ Pass 2: Resolve type names
Types resolved
      │
      ▼ Pass 3: Check class hierarchy
Hierarchy verified
      │
      ▼ Pass 4: Disambiguate names
Names classified
      │
      ▼ Pass 5: Resolve variables/fields
Declarations linked to uses
      │
      ▼ Pass 6: Type checking
Types assigned to all expressions
      │
      ▼ Pass 7: Resolve methods/instance fields
Method invocations resolved
      │
      ▼ Reachability analysis (JLS §14.20)
Unreachable code detected
      │
      ▼ Definite assignment (JLS §16)
Uninitialized reads detected
      │
      ▼ Code generation (IA-32 assembly)
      │   • Data layout
      │   • Method calls / vtables
      │   • Object allocation
      │   • Array bounds checks
      │   • Subtype testing
      ▼
NASM assembly (.s)
      │
      ▼ NASM assembler
Object file (.o)
      │
      ▼ ld linker (with Joos runtime)
Executable (ELF)

Each assignment in the course (A1–A5) implements one or two of these stages, culminating in a complete compiler for Joos 1W.

Back to top