CS 241: Foundations of Sequential Programs

Carmen Bruni

Chapter 1: Introduction and Binary Encoding

What Is a Program?

When you write code in a high-level language like C++ or Java, a remarkable chain of transformations takes place before your instructions ever execute on real hardware. The source code you write is first converted into tokens by a lexer, those tokens are structured into a parse tree by a parser, the tree is checked for semantic consistency by a type checker, and finally machine instructions are emitted by a code generator. CS 241 traces this entire pipeline from beginning to end, building each component from scratch. The course also studies the theoretical machinery that underlies compilation: formal languages, automata, and grammars.

The starting point is the physical substrate — the bits that computers actually manipulate.

Bits, Nibbles, Bytes, and Words

A bit is the fundamental unit of information, capable of holding one of two values, conventionally written 0 and 1. Physically, bits are realized as tiny electrical signals, magnetic orientations, or optical states. A single bit by itself is almost useless, so bits are grouped into larger units. A nibble is 4 bits, a byte is 8 bits, and a word is a machine-dependent grouping — for the MIPS architecture studied in this course, a word is 32 bits or 4 bytes.

An important notion is the most significant bit (MSB) and the least significant bit (LSB). In a binary number, the MSB is the leftmost bit (highest positional value) and the LSB is the rightmost bit (lowest positional value). The sign bit of a signed integer is its MSB.

Unsigned Binary Integers

An unsigned binary number is the standard positional representation base 2. A sequence of bits \(b_{n-1} b_{n-2} \cdots b_1 b_0\) represents the non-negative integer

\[ \sum_{i=0}^{n-1} b_i \cdot 2^i. \]

With \(n\) bits you can represent \(2^n\) distinct values, ranging from 0 to \(2^n - 1\).

Two’s Complement

Negative integers are represented using two’s complement, the dominant scheme on modern hardware. In two’s complement with \(n\) bits, a bit pattern \(b_{n-1} b_{n-2} \cdots b_0\) represents the signed integer

\[ -b_{n-1} \cdot 2^{n-1} + \sum_{i=0}^{n-2} b_i \cdot 2^i. \]

The range is \(-2^{n-1}\) to \(2^{n-1}-1\). With 32 bits, this is \(-2{,}147{,}483{,}648\) to \(2{,}147{,}483{,}647\).

To negate a two’s complement number, flip all the bits (bitwise NOT) and add 1. The key advantage of two’s complement is that addition, subtraction, and multiplication work identically for signed and unsigned numbers using the same hardware circuits; overflow simply wraps around modulo \(2^n\).

Hexadecimal

Writing long binary strings is tedious and error-prone, so engineers use hexadecimal (base 16) as a compact notation. Each hexadecimal digit represents exactly one nibble (4 bits), so every 32-bit MIPS word maps to exactly eight hex digits. The sixteen hex digits are 0–9 and A–F (or a–f), where A = 10, B = 11, C = 12, D = 13, E = 14, F = 15. Hexadecimal literals in assembly are prefixed with 0x.

For example, the 32-bit pattern 00000000000000000000000000001010 in binary is 0x0000000A in hex, which equals 10 in decimal.


Chapter 2: Machine Language and MIPS

Character Encoding and ASCII

Before the machine language itself, it is worth understanding ASCII (American Standard Code for Information Interchange). ASCII is a 7-bit encoding that maps characters — letters, digits, punctuation, and control characters — to integers in the range 0–127. In practice, bytes are 8 bits, so the high bit is either ignored or used for extended encodings. The capital letter A is ASCII 65, lowercase a is 97, and the digit 0 is 48. This encoding is the foundation for all string handling in the assembler.

Von Neumann Architecture

Modern computers are built on the von Neumann architecture, which stores both programs and data in the same memory. The central processing unit (CPU) contains a set of registers — small, extremely fast storage locations — and an arithmetic logic unit (ALU) for computations. The CPU fetches instructions one by one from memory, decodes them, executes them, and stores results, in a cycle known as the Fetch-Execute cycle.

For MIPS (the instruction set studied in this course), there are 32 general-purpose registers, each 32 bits wide, labelled $0 through $31. Register $0 always contains the value zero and cannot be written. Registers $1 through $31 can be used freely, though conventions attach specific roles to some of them.

MIPS Instruction Set

MIPS instructions are each 32 bits (one word) wide. Every instruction falls into one of several formats (R-type, I-type, J-type), encoding the operation, source registers, destination register, and immediate values within those 32 bits. The instructions used in CS 241 are:

Arithmetic and logic:

  • add $d, $s, $t — compute $s + $t, store in $d.
  • sub $d, $s, $t — compute $s − $t, store in $d.
  • mult $s, $t — signed 64-bit product, stored in the special hi:lo register pair.
  • div $s, $t — signed division; quotient in lo, remainder in hi.
  • mfhi $d — move from hi register into $d.
  • mflo $d — move from lo register into $d.

Load immediate and address:

  • lis $d — Load Immediate and Skip: load the next word in memory (the following instruction word) into $d and skip past it.

Memory access:

  • lw $t, i($s) — load word from address $s + i into $t.
  • sw $t, i($s) — store word from $t to address $s + i.

Comparison and branching:

  • slt $d, $s, $t — set $d = 1 if $s < $t (signed), else $d = 0.
  • sltu $d, $s, $t — same but unsigned comparison.
  • beq $s, $t, i — branch if $s = $t, adding \(4i\) to the program counter (relative, in words).
  • bne $s, $t, i — branch if $s ≠ $t.

Jump and link:

  • jr $s — jump to the address stored in $s (set PC = $s).
  • jalr $s — jump and link: save PC + 4 in $31, then jump to $s.

The Fetch-Execute Cycle

The program counter (PC) is a special register holding the address of the next instruction to fetch. The Fetch-Execute cycle is:

  1. Fetch the 32-bit word at memory address PC.
  2. Increment PC by 4.
  3. Decode and execute the instruction.
  4. Repeat.

Because PC is incremented before execution, a beq with offset \(i\) will actually jump to \(\text{PC} + 4i\) relative to the instruction after the branch.

RAM and Memory Operations

MIPS memory is byte-addressed, but lw and sw require word-aligned addresses — addresses that are multiples of 4. The signed 16-bit immediate offset in lw $t, i($s) extends the range of base + offset addressing, allowing access to data within ±32 KB of the base register.

The MIPS convention for the test environment is that mips.twoints loads two integer arguments into $1 and $2, and the program should place its return value in $3. Execution ends with jr $31, which returns to the loader.

Branching, Comparison, and Loops

The slt instruction is the workhorse for conditional logic: it sets a register to 0 or 1 based on a signed comparison, and subsequent beq/bne instructions branch on whether that result equals zero. A typical loop uses a counter register decremented each iteration, with a branch back to the loop top when the counter is non-zero.

Labels in assembly are symbolic names for memory addresses, resolved by the assembler into numeric PC-relative offsets or absolute addresses.


Chapter 3: Procedures and the Stack

Why Procedures Need a Stack

Procedures (subroutines) are the building block of modular code. A single procedure call is straightforward — store the return address, jump to the callee, compute, return — but nested or recursive calls demand that multiple return addresses and local variable sets coexist simultaneously. The natural data structure for this is the call stack, a region of memory that grows and shrinks in last-in, first-out order.

In MIPS, register $30 is the stack pointer (SP). By convention, the stack grows downward: pushing a value stores it one word below the current top and then decrements $30 by 4; popping increments $30 by 4 and then loads the value from just below the new top. Register $31 holds the return address (RA), the address to jump back to after a call, set automatically by jalr.

The Push and Pop Idiom

Pushing register $x onto the stack:

sw  $x, -4($30)
sub $30, $30, $4   ; $4 always holds 4

Popping into register $x:

add $30, $30, $4
lw  $x, -4($30)

Every procedure that calls another must save $31 before executing jalr, because jalr overwrites $31 with the new return address.

The Standard Procedure Template

A complete MIPS procedure follows this pattern:

; --- Caller ---
sw   $31, -4($30)      ; save return address
sub  $30, $30,  $4
; ... set up arguments in registers ...
lis  $5
.word PROCEDURE
jalr $5                ; call; $31 = return address
add  $30, $30,  $4
lw   $31, -4($30)      ; restore return address

; --- Callee ---
PROCEDURE:
; save any registers the callee will overwrite
; ... body ...
; restore saved registers
jr   $31               ; return

Parameters are typically passed in registers. The convention used in this course is that $1 and $2 hold the two arguments to wain, and $3 holds the return value.

I/O Conventions

The MIPS simulator supports memory-mapped I/O. Writing a word to address 0xffff000c outputs a character to standard output (the low byte is treated as ASCII). Reading from address 0xffff0004 reads one character of input.


Chapter 4: The Assembler

What an Assembler Does

An assembler translates human-readable assembly language (mnemonics like add $3, $1, $2) into machine code (32-bit binary words). This translation is more mechanical than compilation — there is a nearly one-to-one correspondence between instructions — but several non-trivial tasks arise: resolving symbolic labels to addresses, computing PC-relative branch offsets, and handling assembler directives like .word.

Two-Pass Assembly

Labels cause a fundamental difficulty: a branch at the beginning of a file may target a label defined later. The standard solution is the two-pass assembler:

  • Pass 1: Scan the source file line by line, maintaining a location counter that tracks the current address (starting at 0, incrementing by 4 per instruction). Whenever a label definition is encountered, record its name and current address in the symbol table.
  • Pass 2: Scan again, this time translating each instruction into machine code. For any label reference, look up the address in the symbol table and substitute it (or compute a PC-relative offset for branches).

If a label is used but never defined, the assembler reports an error.

Assembling Instructions

Each MIPS instruction has a specific 32-bit encoding. For example, add $d, $s, $t is an R-type instruction:

\[ \underbrace{000000}_{6\text{ bits}} \underbrace{sssss}_{5} \underbrace{ttttt}_{5} \underbrace{ddddd}_{5} \underbrace{00000}_{5} \underbrace{100000}_{6} \]

The assembler extracts register numbers from operands, shifts them into the correct bit positions using bitwise operations, and ORs the fields together to produce the 32-bit word.

In C++, the output bytes are written one at a time using putchar after shifting and masking the 32-bit word. For a word w, the four bytes emitted (most significant first) are (w >> 24) & 0xFF, (w >> 16) & 0xFF, (w >> 8) & 0xFF, and w & 0xFF.

The .word Directive

The assembler directive .word X places the 32-bit value of X directly into the output at the current address, without encoding an instruction. If X is a numeric constant, its value is used directly. If X is a label, the symbol table value (an absolute address) is substituted. This distinction matters enormously for the loader and linker, as we will see later.


Chapter 5: Formal Languages and Deterministic Finite Automata

The Need for Formal Theory

The assembler’s first job — given a stream of characters, identify tokens — seems simple but hides substantial complexity. How does the assembler know that $30 is a register reference, 30 is an integer, and .word is a directive? The answer is a precise mathematical framework: formal language theory.

Basic Definitions

An alphabet \(\Sigma\) is a finite non-empty set of symbols. A string (or word) over \(\Sigma\) is a finite sequence of symbols from \(\Sigma\). The empty string \(\epsilon\) has length zero. The length \(|w|\) of a string \(w\) counts its symbols. The set of all strings over \(\Sigma\), including \(\epsilon\), is denoted \(\Sigma^*\).

A language over \(\Sigma\) is any subset \(L \subseteq \Sigma^*\). Languages range from finite sets (like the set of MIPS mnemonics) to infinite ones (like the set of all valid C++ programs). The empty language \(\emptyset\) and the language containing only the empty string \(\{\epsilon\}\) are distinct.

Deterministic Finite Automata

A deterministic finite automaton (DFA) is the simplest machine that recognizes a language. Formally, a DFA is a 5-tuple \((Q, \Sigma, q_0, A, \delta)\) where:

  • \(Q\) is a finite set of states.
  • \(\Sigma\) is the input alphabet.
  • \(q_0 \in Q\) is the start state.
  • \(A \subseteq Q\) is the set of accepting states.
  • \(\delta : Q \times \Sigma \to Q\) is the transition function.

The DFA processes input one symbol at a time, starting in state \(q_0\). On reading symbol \(a\) in state \(q\), it moves to state \(\delta(q, a)\). After consuming the entire input, the DFA accepts if and only if the current state is in \(A\).

Formally, the extended transition function \(\delta^*\), which processes a whole string rather than a single symbol, is defined by

\[ \delta^*(q, \epsilon) = q, \qquad \delta^*(q, xa) = \delta(\delta^*(q, x), a). \]

The language recognized by a DFA \(M\) is \(L(M) = \{w \in \Sigma^* : \delta^*(q_0, w) \in A\}\).

Regular Languages

A language is regular if it is recognized by some DFA. Regular languages are closed under three fundamental operations:

  • Union: if \(L_1\) and \(L_2\) are regular, so is \(L_1 \cup L_2\).
  • Concatenation: the language \(L_1 L_2 = \{xy : x \in L_1, y \in L_2\}\) is regular.
  • Kleene star: \(L^* = \{\epsilon\} \cup L \cup LL \cup LLL \cup \cdots\) is regular.

Regular expressions provide a compact notation for regular languages. A regular expression is built from atomic expressions (\(\epsilon\), individual symbols) using union (|), concatenation (juxtaposition), and Kleene star (\(*\)). Every regular expression denotes a regular language, and every regular language has a regular expression.

Implementing a DFA

In practice, a DFA is implemented with a 2D array delta[state][character] giving the next state, and a boolean array accepting[state]. The simulation is a simple loop. For an error state (no valid transition), a dedicated dead state is used — once entered, no input can lead to acceptance.


Chapter 6: Non-Deterministic Finite Automata

The Concept of Non-Determinism

A non-deterministic finite automaton (NFA) generalizes the DFA by allowing multiple possible next states from a given state on the same input symbol — or even transitions on the empty string \(\epsilon\). The NFA accepts a string if at least one path through the computation reaches an accepting state.

Formally, an NFA differs from a DFA only in its transition function: \(\delta : Q \times \Sigma \to 2^Q\), mapping to the power set of \(Q\) (a set of states rather than a single state).

The extended transition function becomes

\[ \delta^*(q, \epsilon) = \{q\}, \qquad \delta^*(q, xa) = \bigcup_{q' \in \delta^*(q, x)} \delta(q', a). \]

An NFA accepts \(w\) if \(\delta^*(q_0, w) \cap A \neq \emptyset\).

Subset Construction: NFA to DFA

Every NFA can be converted to an equivalent DFA by the subset construction (also called the powerset construction). The states of the DFA are subsets of the NFA’s state set \(Q\). The DFA start state is \(\{q_0\}\). The DFA transition from subset \(S\) on symbol \(a\) is \(\bigcup_{q \in S} \delta(q, a)\). A DFA state \(S\) is accepting if \(S \cap A \neq \emptyset\).

This construction can in the worst case produce \(2^{|Q|}\) DFA states, though in practice far fewer states are reachable.

Simulating an NFA Directly

Rather than converting, one can simulate the NFA by maintaining the current set of active states — all states the NFA could be in. On each input symbol, the active set is updated by taking the union of transitions from every active state. Acceptance checks whether any active state is accepting at the end of input.

Epsilon-NFAs

An ε-NFA further allows epsilon transitions: moves from one state to another without consuming any input symbol. The epsilon closure \(E(S)\) of a set \(S\) of states is the set of all states reachable from some state in \(S\) using only ε-transitions (including the states in \(S\) themselves).

The transition function of an ε-NFA is applied as follows: from current set \(S\), on input \(a\), the new set is \(E\!\left(\bigcup_{q \in S} \delta(q, a)\right)\). The start set is \(E(\{q_0\})\).

Kleene’s Theorem

Kleene’s theorem establishes the equivalence of all three formalisms:

A language is regular if and only if it is recognized by a DFA, if and only if it is recognized by an NFA, if and only if it is recognized by an ε-NFA, if and only if it is denoted by a regular expression.

This means that despite the apparent extra power of non-determinism and ε-transitions, all these machines recognize exactly the same class of languages — the regular languages.

The proof constructs an ε-NFA for each regular expression case by case:

  • For \(\emptyset\): a single non-accepting state with no transitions.
  • For \(\epsilon\): a start state that is also accepting.
  • For a single symbol \(a\): a start state transitioning to an accepting state on \(a\).
  • For \(E_1 | E_2\): add a new start state with ε-transitions to the starts of each sub-ε-NFA.
  • For \(E_1 E_2\): connect accepting states of \(E_1\) via ε-transitions to the start of \(E_2\).
  • For \(E^*\): add a new accepting start state with an ε-loop back through the sub-ε-NFA.

Chapter 7: Scanning and Context-Free Grammars

Scanning and Maximal Munch

Scanning (also called lexical analysis or tokenization) is the process of reading a stream of characters and grouping them into tokens — the meaningful units that the parser will process. In assembly language, tokens include labels, mnemonics like add, register references like $30, integer literals, and directives like .word.

The standard approach is the maximal munch algorithm (also called longest match): always consume the longest possible sequence of characters that forms a valid token. The algorithm runs the DFA until no further transitions are possible; if the current state is accepting, the consumed characters form a token. If the current state is not accepting, backtrack to the most recent accepting state.

A simplified maximal munch algorithm avoids backtracking by ensuring the DFA is structured so that whenever the DFA gets stuck in a non-accepting state, the previous prefix was the longest match. In practice, most scanner generators use the full version with backtracking.

Context-Free Grammars

Regular languages, recognized by finite automata, cannot capture all syntactic structure. They cannot, for instance, match balanced parentheses, because checking balance requires unbounded memory. The right tool for programming language syntax is context-free grammars (CFGs).

A CFG is a 4-tuple \((N, \Sigma, P, S)\) where:

  • \(N\) is a finite set of non-terminal symbols (the grammar variables).
  • \(\Sigma\) is the terminal alphabet (the actual tokens; disjoint from \(N\)).
  • \(P\) is a finite set of productions (rules) of the form \(A \to \alpha\) where \(A \in N\) and \(\alpha \in (N \cup \Sigma)^*\).
  • \(S \in N\) is the start symbol.

A production \(A \to \alpha\) reads: “non-terminal \(A\) can be rewritten as the string \(\alpha\)”. A derivation is a sequence of rewriting steps \(S \Rightarrow \alpha_1 \Rightarrow \cdots \Rightarrow w\) ending in a string of terminals. We write \(S \Rightarrow^* w\) for zero or more steps. The language generated by a CFG is \(L(G) = \{w \in \Sigma^* : S \Rightarrow^* w\}\).

Example: Balanced Parentheses

The language of balanced parentheses (sequences where every opening paren is eventually closed) is not regular, but it is context-free:

\[ S \to \epsilon \mid (S) \mid SS \]

Starting from \(S\), repeated application of rules generates all balanced strings, including \(\epsilon\), (), (()), ()(), and so on.

Context-Sensitive Grammars

CFGs are a special case of context-sensitive grammars (CSGs). A CSG allows productions of the form \(\alpha A \beta \to \alpha \gamma \beta\), meaning \(A\) can only be rewritten in the context of specific surrounding symbols. CSGs generate a strictly larger class of languages than CFGs, but they are harder to parse efficiently. For the purpose of programming language syntax, CFGs suffice.


Chapter 8: Ambiguity and Top-Down Parsing

Parse Trees and Derivations

Given a CFG, a parse tree (or derivation tree) for a string \(w\) is a tree where:

  • The root is labelled with the start symbol \(S\).
  • Each internal node is labelled with a non-terminal \(A\), and its children represent the right-hand side of some production for \(A\).
  • The leaves, read left to right, yield \(w\).

A parse tree compactly captures the grammatical structure without committing to a particular order of derivation steps. Two important derivation strategies are:

  • Leftmost derivation: always expand the leftmost non-terminal.
  • Rightmost derivation: always expand the rightmost non-terminal.

Both strategies produce the same parse tree for an unambiguous grammar.

Ambiguity

A CFG is ambiguous if some string in \(L(G)\) has more than one parse tree (equivalently, more than one leftmost derivation). Ambiguity is a serious problem for compilers: if a program’s meaning depends on which parse tree is chosen, the compiler must pick one and the programmer cannot predict the result.

Consider the expression grammar

\[ E \to E + E \mid E \times E \mid a \mid b \mid c \]

The string \(a + b \times c\) has two parse trees — one where addition is applied first and one where multiplication is. Correct interpretation requires multiplication to bind tighter than addition.

Inherent ambiguity refers to languages where every grammar generating that language is ambiguous. Not all context-free languages can be made unambiguous.

Forcing Associativity and Precedence

The standard remedy for arithmetic grammars is to encode precedence and associativity directly in the grammar structure. For BEDMAS (operator precedence convention), a grammar that correctly handles addition, subtraction, multiplication, division, and parentheses is:

\[ \begin{aligned} S &\to TZ' \\ Z' &\to +TZ' \mid -TZ' \mid \epsilon \\ T &\to FT' \\ T' &\to \times FT' \mid /FT' \mid \epsilon \\ F &\to a \mid b \mid c \mid (S) \end{aligned} \]

Here \(F\) (factor) has the highest precedence, \(T\) (term) next, and \(S\) (sum) lowest. The primed non-terminals eliminate left recursion, but the resulting parse trees lean to the right; left associativity must then be recovered during evaluation, by accumulating each operator's result as the primed non-terminal is expanded.

Top-Down Parsing

Top-down parsing builds a parse tree from the root downward, predicting which production to apply at each step based on the current input token. The stack holds the portion of the derivation not yet matched.

The invariant is: consumed input + reverse of stack = \(\alpha_i\), the current sentential form in the leftmost derivation.

Algorithm outline:

  1. Push the start symbol \(S\) onto the stack.
  2. If the top of the stack is a terminal \(a\) and the next input token is also \(a\), match (pop and advance).
  3. If the top is a non-terminal \(A\), predict which production to use (based on the next input token), and replace \(A\) on the stack with the right-hand side of that production.
  4. Accept when both the stack and the input are empty.

The Predict Table and LL(1)

The key operation is prediction: which production \(A \to \beta\) to use when non-terminal \(A\) is on top and token \(a\) is next. This is answered by the Predict function:

\[ \text{Predict}(A, a) = \{A \to \beta : a \in \text{First}(\beta)\} \cup \{A \to \beta : \beta \text{ is nullable and } a \in \text{Follow}(A)\} \]

Three auxiliary sets are required:

  • Nullable(\(A\)): true if \(A \Rightarrow^* \epsilon\).
  • First(\(\beta\)): the set of terminals that can begin strings derived from \(\beta\).
  • Follow(\(A\)): the set of terminals that can immediately follow \(A\) in some sentential form.

A grammar is LL(1) if for every non-terminal \(A\) and terminal \(a\), \(|\text{Predict}(A, a)| \leq 1\). An LL(1) grammar can be parsed deterministically by a top-down parser that reads the input left to right (the first L), produces a leftmost derivation (the second L), and needs only 1 token of lookahead.

A grammar fails to be LL(1) whenever two distinct productions for the same non-terminal are predicted by the same lookahead token. Left recursion is a common culprit: if \(A \to A\alpha \mid \beta\), then \(\text{First}(A\alpha)\) contains \(\text{First}(\beta)\), so both productions share predict tokens.

Computing Nullable, First, and Follow

The algorithms iterate to a fixed point.

Nullable (simplified, assuming no ε-production cycles):

  • Initialize: \(\text{Nullable}(A) = \text{false}\) for all \(A\).
  • Repeat until no change:
    • If \(A \to \epsilon\) exists, set \(\text{Nullable}(A) = \text{true}\).
    • If \(A \to B_1 \cdots B_n\) and each \(\text{Nullable}(B_i) = \text{true}\), set \(\text{Nullable}(A) = \text{true}\).

First:

  • Initialize: \(\text{First}(A) = \emptyset\) for all \(A\).
  • Repeat until no change:
    • For \(A \to a \alpha\) (terminal first): add \(a\) to \(\text{First}(A)\).
    • For \(A \to B_1 \cdots B_n\): add \(\text{First}(B_i)\) for each \(i\) up to and including the first non-nullable \(B_i\).

Follow:

  • Initialize: \(\text{Follow}(S) = \{\dashv\}\) (the end-of-input marker), \(\text{Follow}(A) = \emptyset\) otherwise.
  • Repeat until no change:
    • For \(A \to \alpha B \beta\): add \(\text{First}(\beta)\) to \(\text{Follow}(B)\).
    • For \(A \to \alpha B \beta\) where \(\beta\) is nullable: add \(\text{Follow}(A)\) to \(\text{Follow}(B)\).

Left Recursion and How to Remove It

Left-recursive grammars — where \(A \Rightarrow^+ A\alpha\) — are incompatible with LL(1) parsing. The parser would loop forever trying to predict \(A\) when \(A\) is on top.

The standard fix rewrites the left-recursive pair \(A \to A\alpha \mid \beta\) into an equivalent right-recursive form:

\[ A \to \beta A', \qquad A' \to \alpha A' \mid \epsilon. \]

However, note that this changes associativity: the original left-recursive grammar gave left-associative operators, while the right-recursive form gives right-associative evaluation. This is why bottom-up parsing is preferred for languages with left-associative operators.

Left Factoring

A related problem is a common prefix: with productions such as \(S \to T + S \mid T\), one token of lookahead cannot distinguish the alternatives. Left factoring pulls the shared prefix out:

\[ S \to TZ', \qquad Z' \to +S \mid \epsilon. \]

The resulting grammar has \(\text{Predict}(Z', +) = \{Z' \to +S\}\) and \(\text{Predict}(Z', \dashv) = \{Z' \to \epsilon\}\), so every predict set contains at most one production.

The central takeaway is that LL(1) grammars are incompatible with left-associative constructs, and this is why compilers for languages with left-associative operators typically use bottom-up parsing instead.


Chapter 9: Bottom-Up Parsing

The Shift-Reduce Idea

Bottom-up parsing works in the opposite direction from top-down: it starts with the input string and attempts to reduce it to the start symbol, reversing the derivation steps. At each step, the parser examines what has been accumulated on a stack and decides whether to shift (push the next input token onto the stack) or reduce (recognize that the top of the stack matches the right-hand side of some production, and replace those symbols with the corresponding left-hand side non-terminal).

The invariant maintained is: \(\text{Stack} + \text{Unread Input} = \alpha_i\), the current step in a rightmost derivation read right to left.

The parser accepts when the stack contains only the augmented start symbol \(S'\) and the input is exhausted.

Items and the LR(0) Automaton

The theoretical foundation for bottom-up parsing comes from Knuth’s 1965 paper On the Translation of Languages from Left to Right. Knuth proved that the set of viable prefixes — all possible stack contents that arise during valid parses — form a regular language. This means a DFA can be built to track where the parser is in the derivation.

An item is a production with a dot \(\bullet\) somewhere in its right-hand side, indicating how much of that production has been matched so far. For example, \(S \to S \bullet + T\) means \(S\) has been seen and the parser expects \(+\) next, followed by \(T\).

LR(0) automaton construction:

  1. Begin with the item \(S' \to \bullet \vdash S \dashv\) as the start state.
  2. From each state, for each item \(A \to \alpha \bullet X \beta\), add a transition on \(X\) to a new state containing \(A \to \alpha X \bullet \beta\).
  3. Closure: if an item \(A \to \alpha \bullet B \beta\) appears in a state (dot before a non-terminal \(B\)), add all items \(B \to \bullet \gamma\) for every production \(B \to \gamma\) to that state. Repeat until no new items can be added.

Each state of the resulting automaton is a set of items. Equivalently, one can view individual items as the states of an NFA (with ε-moves corresponding to the closure step) and obtain the LR(0) parsing DFA via the subset construction.

Using the LR(0) Automaton

The parser maintains a stack of both grammar symbols and DFA states (to avoid recomputing the DFA state after a reduction).

  • Shift: take the next input token, push it (and the new DFA state) onto the stack.
  • Reduce: if the current DFA state is a reduce state — one containing a single item with the dot at the rightmost position — pop as many elements from the stack as there are symbols in the right-hand side of that production (and backtrack that many DFA states), then push the left-hand side non-terminal and follow the DFA transition for that non-terminal.
  • Accept: when the stack contains only the start state and \(S'\), and the input is empty.

Worked Example

\[ \begin{aligned} S' &\to\ \vdash S \dashv & (0)\\ S &\to S + T & (1)\\ S &\to T & (2)\\ T &\to d & (3) \end{aligned} \]

Processing the input \(\vdash d + d + d \dashv\):

| Stack | Read | Remaining | Action |
|---|---|---|---|
| 0 |  | \(\vdash\) d + d + d \(\dashv\) | Shift \(\vdash\), go to state 1 |
| 0 \(\vdash\) 1 | \(\vdash\) | d + d + d \(\dashv\) | Shift d, go to state 5 |
| 0 \(\vdash\) 1 d 5 | \(\vdash\) d | + d + d \(\dashv\) | Reduce \(T\to d\); pop 1, push \(T\), go to 4 |
| 0 \(\vdash\) 1 \(T\) 4 | \(\vdash\) d | + d + d \(\dashv\) | Reduce \(S\to T\); pop 1, push \(S\), go to 2 |
| 0 \(\vdash\) 1 \(S\) 2 | \(\vdash\) d | + d + d \(\dashv\) | Shift +, go to state 6 |
| ⋮ | ⋮ | ⋮ | (continue shifting and reducing) |
| 0 \(\vdash\) 1 \(S\) 2 \(\dashv\) 3 | \(\vdash\) d + d + d \(\dashv\) |  | Reduce \(S'\to\ \vdash S\dashv\); push \(S'\) |
| 0 \(S'\) | \(\vdash\) d + d + d \(\dashv\) |  | Accept |

Shift-Reduce and Reduce-Reduce Conflicts

The LR(0) approach can fail in two ways:

  • Shift-reduce conflict: a state contains both a shift item \(A \to \alpha \bullet a\beta\) and a reduce item \(B \to \gamma \bullet\). The parser cannot decide whether to shift \(a\) or reduce \(B \to \gamma\).
  • Reduce-reduce conflict: a state contains two reduce items \(A \to \alpha \bullet\) and \(B \to \gamma \bullet\). The parser cannot decide which to reduce.

A grammar is LR(0) if its LR(0) automaton has neither type of conflict.

SLR(1) Parsing

Many grammars that are not LR(0) can still be parsed deterministically by adding one token of lookahead to resolve conflicts. The Simple LR(1) (SLR(1)) strategy attaches \(\text{Follow}(A)\) to each reduce item \(A \to \alpha \bullet\): only reduce by this production when the next input token is in \(\text{Follow}(A)\).

In the example above, changing rule (1) to \(S \to T + S\) (right-recursive) creates a shift-reduce conflict in the state containing both \(S \to T \bullet + S\) and \(S \to T \bullet\). SLR(1) resolves this: since \(\text{Follow}(S) = \{\dashv\}\), only reduce \(S \to T\) when the lookahead is \(\dashv\). When the lookahead is +, shift.

LR(1) and LALR(1)

LR(1) parsing uses a more precise lookahead. Instead of attaching the full \(\text{Follow}(A)\) set to reduce items, LR(1) items are pairs \([A \to \alpha \bullet \beta, \, t]\), where \(t\) is the specific terminal that permits this reduction. These precise lookahead sets are computed as part of the automaton construction using \(\text{First}(\beta t)\) for each item \([A \to \alpha \bullet B\beta, \, t]\) in a state.

LR(1) is strictly more powerful than SLR(1), but the automaton can have exponentially more states. LALR(1) (Lookahead LR) is a practical compromise: it merges LR(1) states that have the same item sets but different lookaheads, producing an automaton no larger than the SLR(1) automaton but with more precise lookaheads. Tools like Yacc and Bison use LALR(1).

Knuth proved that any LR(k) grammar for \(k > 1\) can be converted to an equivalent LR(1) grammar. This means LR(1) grammars are the most general class parseable by efficient deterministic bottom-up methods.

The Grammar Hierarchy

The inclusions among grammar classes are strict: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ LR(1); every LL(1) grammar is also LR(1), and all of these classes are contained in the context-free grammars. Left-recursive grammars, which are never LL(1), are often LR(0) or SLR(1), making bottom-up parsing far more expressive for natural programming language constructs.


Chapter 10: Type Checking

From Parse Trees to Meaning

Once a parse tree has been produced, the context-sensitive analysis phase checks properties that grammars cannot express: are all variables declared? Are they used with the correct types? Is a function called with the right number and types of arguments? These checks require information that persists across the tree — collected in a symbol table.

Symbol Tables

A symbol table maps identifiers to their attributes. For WLP4 (the small C++-like language studied in this course), each symbol table entry records at minimum the identifier’s name and its type (int or int*). For procedures, a nested symbol table structure is used: a global table maps procedure names to their signatures, and within each procedure, a local table maps its parameters and local variables to types.

Declaration errors that the symbol table catches include:

  • Declaring the same variable twice in the same scope.
  • Using a variable that has not been declared.
  • Calling a procedure with the wrong number of arguments.
  • Defining the same procedure name twice.
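
These checks can be made concrete with a minimal two-level table in Python. This is a sketch for illustration — the class and method names are assumptions, not the course's required representation:

```python
# Two-level symbol table sketch: a global table of procedure signatures,
# plus one local table (name -> type) per procedure.

class SymbolTableError(Exception):
    pass

class SymbolTable:
    def __init__(self):
        self.procedures = {}   # name -> (param type list, local table)

    def declare_procedure(self, name, param_types):
        if name in self.procedures:                    # duplicate definition
            raise SymbolTableError(f"duplicate procedure: {name}")
        self.procedures[name] = (param_types, {})

    def declare_variable(self, proc, name, typ):
        _, locals_ = self.procedures[proc]
        if name in locals_:                            # declared twice in scope
            raise SymbolTableError(f"duplicate declaration: {name}")
        locals_[name] = typ

    def lookup(self, proc, name):
        _, locals_ = self.procedures[proc]
        if name not in locals_:                        # use before declaration
            raise SymbolTableError(f"use of undeclared variable: {name}")
        return locals_[name]

    def check_call(self, name, arg_types):
        if name not in self.procedures:
            raise SymbolTableError(f"call to undeclared procedure: {name}")
        params, _ = self.procedures[name]
        if arg_types != params:                        # wrong number/types of args
            raise SymbolTableError(f"bad arguments to {name}")
```

Each of the four declaration errors in the list above corresponds to one of the raised exceptions.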

Type System for WLP4

WLP4 has two types: int (a 32-bit signed integer) and int* (a pointer to an integer). The type checker uses inference rules written in a formal notation borrowed from logic. A rule

\[ \frac{\text{premises}}{\text{conclusion}} \]

means: if all premises hold, then the conclusion holds.

Atomic Types

  • A numeric literal has type int: \(\dfrac{}{\texttt{NUM} : \text{int}}\)
  • NULL has type int*: \(\dfrac{}{\texttt{NULL} : \text{int*}}\)
  • An identifier declared with type \(\tau\) has type \(\tau\): \(\dfrac{\langle \text{id.name}, \tau\rangle \in \text{declarations}}{\text{id} : \tau}\)

Expressions

Parentheses preserve type. The address-of operator applied to an int yields int*. Dereferencing int* yields int. Allocating a new array of size int yields int*:

\[ \frac{E : \tau}{(E) : \tau} \quad \frac{E : \text{int}}{\&E : \text{int*}} \quad \frac{E : \text{int*}}{*E : \text{int}} \quad \frac{E : \text{int}}{\texttt{new int}[E] : \text{int*}} \]

Arithmetic

Multiplication, division, and modulo require both operands to be int and produce int. Addition and subtraction are overloaded:

\[ \frac{E_1 : \text{int} \quad E_2 : \text{int}}{E_1 + E_2 : \text{int}} \quad \frac{E_1 : \text{int*} \quad E_2 : \text{int}}{E_1 + E_2 : \text{int*}} \quad \frac{E_1 : \text{int} \quad E_2 : \text{int*}}{E_1 + E_2 : \text{int*}} \]\[ \frac{E_1 : \text{int} \quad E_2 : \text{int}}{E_1 - E_2 : \text{int}} \quad \frac{E_1 : \text{int*} \quad E_2 : \text{int}}{E_1 - E_2 : \text{int*}} \quad \frac{E_1 : \text{int*} \quad E_2 : \text{int*}}{E_1 - E_2 : \text{int}} \]

Pointer subtraction (two int* operands) gives the integer distance between them.
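
The expression and arithmetic rules above transcribe almost directly into a checker. Here is a minimal Python sketch over a hypothetical tuple-based AST (the node shapes are illustrative assumptions, not the course's representation):

```python
# Nodes: ("num",), ("null",), ("id", name), ("&", e), ("*", e),
#        ("+", e1, e2), ("-", e1, e2).  decls maps names to types.

class TypeError241(Exception):
    pass

def typeof(e, decls):
    op = e[0]
    if op == "num":
        return "int"
    if op == "null":
        return "int*"
    if op == "id":
        return decls[e[1]]
    if op == "&":                       # &E : int* when E : int
        if typeof(e[1], decls) == "int":
            return "int*"
    if op == "*":                       # *E : int when E : int*
        if typeof(e[1], decls) == "int*":
            return "int"
    if op in ("+", "-"):
        t1, t2 = typeof(e[1], decls), typeof(e[2], decls)
        if t1 == t2 == "int":
            return "int"
        if t1 == "int*" and t2 == "int":
            return "int*"
        if op == "+" and t1 == "int" and t2 == "int*":
            return "int*"
        if op == "-" and t1 == t2 == "int*":
            return "int"                # pointer distance
    raise TypeError241(f"ill-typed expression: {e}")
```

Each `if` branch mirrors one inference rule; any expression matching no rule is ill-typed.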

Procedure Calls

A procedure call is well-typed if the procedure is declared with matching argument types:

\[ \frac{\langle f, \tau_1, \ldots, \tau_n\rangle \in \text{declarations} \quad E_1 : \tau_1 \quad \cdots \quad E_n : \tau_n}{f(E_1,\ldots,E_n) : \text{int}} \]

Well-Typedness of Statements

An assignment is well-typed when both sides have the same type. print requires int. delete[] requires int*. Comparisons require both sides to have the same type (either both int or both int*). An if or while statement is well-typed when its test and body are each well-typed:

\[ \frac{\text{well-typed}(T) \quad \text{well-typed}(S_1) \quad \text{well-typed}(S_2)}{\text{well-typed}(\texttt{if } (T) \{S_1\} \texttt{ else } \{S_2\})} \]

lvalues

The left-hand side of an assignment must be an lvalue — an expression that denotes a storage location rather than a computed value. In WLP4, valid lvalues are variable names, dereferenced pointers, and parenthesized combinations of these. The assignment 5 = x is rejected because the literal 5 does not denote a storage location.


Chapter 11: Code Generation

From Parse Trees to MIPS

Code generation is the final translation step: given an augmented parse tree annotated with types and a symbol table, emit MIPS assembly instructions. By this stage, the program is known to be syntactically and semantically correct.

The guiding principle is syntax-directed translation: for each grammar rule, define how to generate its code from the code generated for its sub-parts. This recursive structure mirrors the parse tree.

Variable Storage on the Stack

Storing all variables in registers fails for recursive code — registers are a finite global resource. The solution is to keep variables on the call stack.

The stack pointer $30 points to the top of the stack, growing downward. But $30 changes as intermediate results are pushed and popped during expression evaluation, so storing variable offsets relative to $30 would require constantly updating the symbol table.

The fix is the frame pointer $29: set once at procedure entry to point to the bottom of the current stack frame. Since $29 is fixed for the lifetime of the procedure, variable offsets from $29 remain constant regardless of how much $30 moves during expression evaluation.

Register Conventions

The register conventions used throughout code generation are:

  • $0 — always 0.
  • $1, $2 — arguments 1 and 2 to wain.
  • $3 — output / scratch (result of sub-expressions).
  • $4 — always 4 (the word size).
  • $5 — additional scratch register.
  • $10 — address of the print runtime function.
  • $11 — always 1 (used for NULL and boolean complement).
  • $29 — frame pointer.
  • $30 — stack pointer.
  • $31 — return address.

Prologue and Epilogue

Every wain function begins with a prologue and ends with an epilogue.

Prologue:

.import print
lis  $4
.word 4
lis  $10
.word print
lis  $11
.word 1
sub  $29, $30, $4   ; frame pointer = bottom of frame
sw   $1,  -4($30)   ; push first parameter
sub  $30, $30, $4
sw   $2,  -4($30)   ; push second parameter
sub  $30, $30, $4
; (push initial values of local variables)

Epilogue:

add  $30, $29, $4   ; restore stack pointer (pop entire frame)
jr   $31

Symbol Table with Offsets

The symbol table stores each variable’s offset from $29. For wain(int a, int b) with local variable c:

Symbol   Type   Offset from $29
a        int    0
b        int    −4
c        int    −8

Loading variable a into $3: lw $3, 0($29).

Code Shorthands

For clarity in describing code generation, three shorthands are used:

  • code(a) = lw $3, N($29) where \(N\) is a’s offset in the symbol table.
  • push($3) = sw $3, -4($30) followed by sub $30, $30, $4.
  • pop($5) = add $30, $30, $4 followed by lw $5, -4($30).

Arithmetic Expressions

Arithmetic is evaluated recursively, using the stack for intermediate results. For \(E_1 + E_2\) where both have type int:

code(E1 + E2) =
    code(E1)
    push($3)
    code(E2)
    pop($5)
    add $3, $5, $3

For subtraction, use sub $3, $5, $3 (so the result is $5 − $3 = left − right). The convention of pushing the first operand’s result while the second is computed ensures that only two scratch registers are ever needed at once.
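
Putting the shorthands together, the recursive scheme can be sketched in Python as a function returning a list of MIPS instructions (the tuple-based AST node shapes are illustrative assumptions):

```python
# Syntax-directed code generation for int arithmetic.
# offsets maps variable names to their offset from $29.

PUSH = ["sw $3, -4($30)", "sub $30, $30, $4"]     # push($3)
POP5 = ["add $30, $30, $4", "lw $5, -4($30)"]     # pop($5)

def code(e, offsets):
    op = e[0]
    if op == "num":                                # literal into $3
        return ["lis $3", f".word {e[1]}"]
    if op == "id":                                 # code(a) shorthand
        return [f"lw $3, {offsets[e[1]]}($29)"]
    if op in ("+", "-"):
        instr = "add" if op == "+" else "sub"
        return (code(e[1], offsets) + PUSH +       # left result on stack
                code(e[2], offsets) + POP5 +       # right in $3, left in $5
                [f"{instr} $3, $5, $3"])
    raise ValueError(f"unknown node: {e}")
```

For example, `code(("+", ("id", "a"), ("num", 7)), {"a": 0})` yields the load, push, literal, pop, add sequence described above.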

Printing and the Runtime Environment

The WLP4 runtime provides a print procedure (imported from print.merl) that outputs the integer in $1. Calling it requires saving $1 and $31.

code(println(expr);) =
    push($1)
    code(expr)
    add  $1, $3, $0
    push($31)
    lis  $5
    .word print
    jalr $5
    pop($31)
    pop($1)

Boolean Tests and Branching

WLP4 has no boolean type; tests are expressions in if and while conditions. The slt instruction (set-less-than) provides the necessary machinery. Comparisons produce 0 (false) or 1 (true) in $3.

  • exprA < exprB: code(exprA), push($3), code(exprB), pop($5), slt $3, $5, $3.
  • exprA > exprB: same but slt $3, $3, $5.
  • exprA != exprB: compute slt $6, $3, $5 and slt $7, $5, $3, then add $3, $6, $7. At least one is 1 iff they are unequal.
  • exprA == exprB: compute as !=, then sub $3, $11, $3 to flip 0 and 1 (since $11 = 1).
  • exprA <= exprB: equivalent to !(exprA > exprB).
  • exprA >= exprB: equivalent to !(exprA < exprB).
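
These cases can be summarized in a small sketch that assumes, per the convention above, the left operand is in $5 and the right in $3 after the push/pop, with $6 and $7 as extra scratch registers:

```python
# Emit the instruction sequence for a comparison; the 0/1 result lands in $3.
# $11 is assumed to hold the constant 1 (per the register conventions).

def code_compare(op):
    if op == "<":
        return ["slt $3, $5, $3"]
    if op == ">":
        return ["slt $3, $3, $5"]
    if op == "!=":
        return ["slt $6, $3, $5",       # right < left?
                "slt $7, $5, $3",       # left < right?
                "add $3, $6, $7"]       # 1 iff unequal
    if op == "==":                      # negate !=
        return code_compare("!=") + ["sub $3, $11, $3"]
    if op == "<=":                      # !(left > right)
        return code_compare(">") + ["sub $3, $11, $3"]
    if op == ">=":                      # !(left < right)
        return code_compare("<") + ["sub $3, $11, $3"]
    raise ValueError(op)
```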

if Statements

code(if (test) {stmts1} else {stmts2}) =
    code(test)
    beq  $3, $0, else_N
    code(stmts1)
    beq  $0, $0, endif_N
else_N:
    code(stmts2)
endif_N:

A counter N must be maintained so that nested if statements get unique labels.
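
A Python sketch of this template with such a counter (kept global here for brevity; a real generator would encapsulate it in the compiler's state):

```python
# Generate the if/else skeleton around already-generated instruction lists.
# test_code, then_code, else_code stand in for recursive code(...) calls.

counter = 0

def code_if(test_code, then_code, else_code):
    global counter
    n = counter
    counter += 1                         # unique labels for nested ifs
    return (test_code +
            [f"beq $3, $0, else_{n}"] +  # false: skip to else branch
            then_code +
            [f"beq $0, $0, endif_{n}",   # unconditional jump past else
             f"else_{n}:"] +
            else_code +
            [f"endif_{n}:"])
```

Each call consumes a fresh value of N, so an if nested inside another gets distinct else/endif labels.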

while Statements

code(while (test) {stmts}) =
loop_N:
    code(test)
    beq  $3, $0, endWhile_N
    code(stmts)
    beq  $0, $0, loop_N
endWhile_N:

Pointers and NULL

NULL is represented as 0x1 (not 0x0) because 0x0 is a valid MIPS memory address. The value 0x1 is not word-aligned, so any attempt to dereference NULL via lw or sw will immediately crash the MIPS simulator with an alignment error. Since $11 holds 1, code(NULL) = add $3, $0, $11.

Pointer Arithmetic

Pointer arithmetic obeys the type rules: adding an int offset to an int* must scale the offset by 4 (since each integer occupies 4 bytes). The code generator checks types and emits the appropriate scaling.

  • int* + int: multiply the int by 4 (mult $3, $4; mflo $3), then add it to the pointer.
  • int + int*: the same, but the int operand is in $5 after the pop, so scale $5 instead (mult $5, $4; mflo $5).
  • int* - int: scale the int by 4 and subtract.
  • int* - int*: subtract the byte addresses (sub $3, $5, $3), then divide the byte difference by 4 to obtain an element count (div $3, $4; mflo $3).

Heap Allocation and Deallocation

WLP4’s new int[E] and delete[] E are handled by the alloc.merl runtime library, which provides init, new, and delete procedures. The new procedure receives the requested size in $1 and returns the allocated address in $3. The delete procedure receives the address in $1. Both must be called with $31 saved.

Procedures: Caller and Callee Conventions

When one WLP4 procedure calls another, coordination is needed to avoid clobbering live values.

Callee’s responsibilities (save-restore): save the frame pointer $29 and return address $31 at the start, allocate stack space for local variables, and restore $29 and $31 before returning.

Caller’s responsibilities: push arguments onto the stack in order, jump to the callee via jalr, and clean up the stack after the call.

The function label for a WLP4 procedure named f is prefixed with F (e.g., Ff) to avoid conflicts with assembler-reserved words.

Compiler Optimizations

Once a correct code generator is working, several optimizations can shrink the emitted code and make it run faster:

  • Constant folding: evaluate constant expressions at compile time. For example, 3 + 4 becomes 7 rather than instructions to add.
  • Constant propagation: if a variable is assigned a constant and never modified, replace uses of the variable with the constant.
  • Dead code elimination: remove code that can never be reached or whose results are never used.
  • Common subexpression elimination: avoid recomputing the same expression twice; store the result in a register.
  • Register allocation: assign frequently used variables to registers (avoiding repeated lw/sw). Techniques like graph colouring allocate registers globally.
  • Strength reduction: replace expensive operations with cheaper ones (e.g., multiplication by 2 with a left shift).
  • Function inlining: replace a function call with the function’s body when the callee is small.
  • Tail recursion: if a recursive call is the last thing a procedure does, it can be converted to a loop (jump to the top of the function with updated arguments), avoiding stack growth.
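
The first of these, constant folding, can be sketched as a bottom-up rewrite on a small tuple-based expression tree (node shapes are illustrative assumptions):

```python
# Fold constant subtrees: any operator whose operands both fold to
# literals is evaluated at compile time.

def fold(e):
    if e[0] in ("+", "-", "*"):
        left, right = fold(e[1]), fold(e[2])
        if left[0] == "num" and right[0] == "num":
            ops = {"+": lambda a, b: a + b,
                   "-": lambda a, b: a - b,
                   "*": lambda a, b: a * b}
            return ("num", ops[e[0]](left[1], right[1]))
        return (e[0], left, right)       # rebuild with folded children
    return e                             # leaf: literal or variable

# fold(("+", ("num", 3), ("num", 4))) → ("num", 7)
```

Subtrees involving variables are left alone; constant propagation would first replace such variables with their known values, enabling further folding.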

Chapter 12: Heap Management

The Heap

Programs need memory that lasts longer than a single function call and whose size may not be known at compile time. This memory is allocated from the heap, a region of memory separate from the stack. Unlike the stack, which grows and shrinks in perfect LIFO order, the heap requires explicit allocation and deallocation (in C++ via new/delete) or automatic management (in Java via garbage collection).

The challenge of heap management is twofold: efficiently fulfilling allocation requests of varying sizes, and reclaiming freed memory to avoid fragmentation — a state where enough total free memory exists but no single contiguous block is large enough to satisfy a request.

Explicit Memory Management: Free Lists

The simplest heap allocator maintains a free list: a linked list of free blocks. Each free block contains a header recording its size and a pointer to the next free block.

Allocation walks the free list to find a block large enough for the request. The allocator may use:

  • First fit: use the first block that is large enough.
  • Best fit: search the entire list and use the smallest block that fits (reduces wasted space but is slower).

Deallocation adds the freed block back to the free list. Naive deallocation causes external fragmentation: many small free blocks scattered through memory. Block coalescing (or consolidation) merges adjacent free blocks into a single larger block at deallocation time.

A word of bookkeeping overhead per allocation is typical — the allocator prepends the block size so that delete knows how much to free.
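
A minimal sketch of first-fit allocation with coalescing, tracking the free list as Python (start, size) pairs rather than as in-heap headers (a real allocator stores this metadata inside the blocks themselves):

```python
# First-fit free list with coalescing of adjacent blocks on release.

class FreeList:
    def __init__(self, size):
        self.free = [(0, size)]                   # one big free block

    def alloc(self, size):
        for i, (start, blk) in enumerate(self.free):
            if blk >= size:                       # first fit
                if blk == size:
                    self.free.pop(i)
                else:                             # carve off the front
                    self.free[i] = (start + size, blk - size)
                return start
        return None                               # out of memory

    def release(self, start, size):
        self.free.append((start, size))
        self.free.sort()
        merged = [self.free[0]]
        for s, sz in self.free[1:]:               # coalesce neighbours
            prev_s, prev_sz = merged[-1]
            if prev_s + prev_sz == s:
                merged[-1] = (prev_s, prev_sz + sz)
            else:
                merged.append((s, sz))
        self.free = merged
```

Without the coalescing loop in `release`, repeated alloc/free cycles would leave the list fragmented into many small blocks.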

The Binary Buddy System

The binary buddy system is a sophisticated allocator that makes coalescing efficient. The heap is divided into power-of-two sized blocks. When a block of size \(2^k\) is split to satisfy an allocation, it produces two buddies of size \(2^{k-1}\). Deallocation checks whether the freed block’s buddy is also free; if so, the two buddies merge into a block of size \(2^k\). This merging propagates upward until either the buddy is allocated or the full heap is reconstituted.

Example: Starting with 512 bytes, allocating 19 bytes (which rounds up to 32, including overhead) proceeds by halving: 512 → 256 | 256 → 128 | 128 → 64 | 64 → 32 | 32. The 32-byte block is used. Allocating 63 bytes (exact 64-byte block) uses the 64-byte buddy of the first allocation. Freeing both blocks allows the pair of 32s to merge into 64, then the pair of 64s to merge into 128, and so on until the full 512 bytes is restored.

The buddy system guarantees fast coalescing at the cost of internal fragmentation (a 33-byte request wastes 31 bytes in a 64-byte block).
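
The buddy discipline can be sketched compactly: keep a free list per power-of-two size, and find a block's buddy by XORing its offset with its size. This is an illustrative model (offsets from the heap start, caller passes the rounded size to release), not a production allocator:

```python
class Buddy:
    def __init__(self, size):            # size must be a power of two
        self.size = size
        self.free = {size: [0]}          # block size -> free offsets

    def alloc(self, request):
        size = 1
        while size < request:            # round up to a power of two
            size *= 2
        k = size
        while k <= self.size and not self.free.get(k):
            k *= 2                       # find a larger block to split
        if k > self.size:
            return None
        addr = self.free[k].pop()
        while k > size:                  # split down, freeing upper halves
            k //= 2
            self.free.setdefault(k, []).append(addr + k)
        return addr

    def release(self, addr, size):
        k = size
        while k < self.size and (addr ^ k) in self.free.get(k, []):
            self.free[k].remove(addr ^ k)   # merge with free buddy
            addr &= ~k                      # merged block starts lower
            k *= 2
        self.free.setdefault(k, []).append(addr)
```

Replaying the 512-byte example: alloc(19) splits 512 down to a 32-byte block at offset 0, alloc(63) takes the free 64-byte buddy, and releasing both propagates merges back up to a single 512-byte block.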

Automatic Memory Management

Languages like Java relieve programmers of explicit deallocation. The garbage collector automatically identifies and reclaims memory that is no longer reachable from the program. Three classical techniques are:

Reference Counting

Each heap block carries a reference count — the number of pointers currently pointing to it. When a pointer is reassigned, the old block’s count is decremented (and the block is freed if it reaches zero), and the new block’s count is incremented.

Reference counting is simple and reclaims memory promptly, but it cannot handle cyclic structures: two blocks pointing to each other both have reference count ≥ 1, even if no other reference to the cycle exists.
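
The cycle problem can be demonstrated in a few lines. This sketch models heap objects and pointer assignment explicitly (the names and structure are illustrative):

```python
# Manual reference counting: assignment decrements the old target's
# count and increments the new one's.

class Obj:
    def __init__(self):
        self.refcount = 0
        self.fields = {}

def set_field(owner, name, target):
    old = owner.fields.get(name)
    if old is not None:
        old.refcount -= 1            # drop the old reference
    owner.fields[name] = target
    if target is not None:
        target.refcount += 1

a = Obj(); a.refcount = 1            # one root reference to a
b = Obj()
set_field(a, "next", b)
set_field(b, "next", a)              # cycle: a <-> b
a.refcount -= 1                      # the root reference goes away
# Both counts are still 1, so neither block is ever freed: a leak.
```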

Mark and Sweep

Mark-and-sweep operates in two phases. The mark phase starts from roots (stack-allocated pointers and global variables) and traverses all reachable heap blocks, marking each one. The sweep phase then scans the entire heap and reclaims any block that is not marked. This is essentially a graph traversal (BFS or DFS) followed by a linear scan.

Mark-and-sweep correctly handles cycles (unreachable cycles are simply not marked) but introduces pause times when the collector runs.
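
A sketch of the two phases over a toy heap of dict-based objects (the `children` field is an illustrative stand-in for pointer fields). Note that the unreachable cycle is reclaimed correctly:

```python
# Mark: DFS from the roots, collecting reachable objects.
def mark(roots):
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        stack.extend(obj.get("children", []))
    return marked

# Sweep: linear scan, keeping only marked objects.
def sweep(heap, marked):
    return [obj for obj in heap if id(obj) in marked]

x = {"children": []}
y = {"children": []}
c1 = {"children": []}
c2 = {"children": []}
c1["children"].append(c2)
c2["children"].append(c1)            # cycle, unreachable from the roots
x["children"].append(y)
heap = [x, y, c1, c2]
live = sweep(heap, mark([x]))        # c1 and c2 are reclaimed
```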

Copying Collector

The copying collector divides the heap into two halves, \(H_1\) and \(H_2\). Allocation proceeds in \(H_1\). When \(H_1\) is full, all live objects are copied from \(H_1\) into \(H_2\) compactly (without gaps). The roles of \(H_1\) and \(H_2\) are then swapped, and allocation resumes in the new active half.

Copying collection eliminates external fragmentation (live objects are always contiguous) and makes allocation trivially fast (just bump a pointer). The downside is that only half the heap is usable at any time. Variants divide the heap into more regions (e.g., generational collectors separate short-lived from long-lived objects) to recover efficiency.


Chapter 13: Loaders and Linkers

Why a Loader Is Needed

A simple approach to running a program is to always load it at address 0x0. The simulated OS might look like:

repeat:
    P ← next program to run
    copy P into memory at 0x0
    jalr $0
    beq $0, $0, repeat

This fails immediately because the operating system itself is a program that must reside in memory. If the OS is at 0x0 and every user program is also placed at 0x0, they collide.

The Loader’s Job

The loader is given a program \(P\) and must:

  1. Find a free region of memory of appropriate size.
  2. Copy the program’s instructions and data to that region, starting at address \(\alpha\).
  3. Return \(\alpha\) to the OS so the OS can jump to it.

The revised OS loop becomes:

repeat:
    P ← next program to run
    $3 ← loader(P)
    jalr $3
    beq $0, $0, repeat

The loader’s core algorithm reads \(k\) words of machine code and copies them to \(\text{MEM}[\alpha + 4i]\) for \(i = 0, \ldots, k-1\), sets the stack pointer $30 to \(\alpha + 4n\) (where \(n = k + \text{stack space}\)), and returns \(\alpha\).

The Relocation Problem

Loading at an arbitrary address \(\alpha\) introduces a subtle problem: any .word label directive in the source code was assembled assuming the program starts at 0x0. When the program is loaded at \(\alpha\), these label references are wrong by \(\alpha\) bytes and must be corrected by adding \(\alpha\).

However, not all words in the program need relocation:

  • .word constant (a literal number) — do not relocate.
  • Branch offsets in beq/bne — do not relocate (they are PC-relative).
  • .word label (an absolute address) — must add \(\alpha\).

Given a raw binary word like 0x00000018, the loader cannot tell whether it is a constant or a label address. This is the fundamental motivation for object code formats.

MERL: MIPS Executable Relocatable Linkable

Instead of outputting pure machine code, the assembler outputs a MERL file. A MERL file has three sections:

  1. Header (three words): a cookie 0x10000002 (a beq instruction that skips the header), followed by the total file length in bytes, followed by the code length plus 12 (the header length).
  2. Code section: the MIPS machine code, with label-address .word entries temporarily set to their 0x0-relative values.
  3. Relocation table: a list of entries, each consisting of a format code followed by data. Format code 0x1 means “relocate the word at this address” — the following word is the location within the code section of a .word label that needs \(\alpha\) added.

The Loader Relocation Algorithm

read_line()                  // skip cookie
endMod  ← read_line()        // total file length
codeLen ← read_line() - 12  // code length (no header)
alpha   ← findFreeRAM(codeLen)
for i from 0 to codeLen-1 step 4:
    MEM[alpha + i] ← read_line()
i ← codeLen + 12            // start of relocation table
while i < endMod:
    format ← read_line()
    if format == 1:
        rel ← read_line()
        MEM[alpha + rel - 12] += alpha - 12
    else: ERROR
    i += 8

The expression alpha + rel - 12 accounts for the fact that the relocation table gives addresses relative to the start of the MERL file (including the 12-byte header), but the code was loaded without the header.
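
The pseudocode above can be made runnable. This sketch models the MERL file as a list of 32-bit words, memory as a dict, and takes the load address alpha as a parameter (standing in for findFreeRAM); the example file at the bottom is hand-built for illustration:

```python
# Load a MERL module at address alpha, applying format-1 relocations.
def load(merl, MEM, alpha):
    end_mod = merl[1]                      # total file length in bytes
    code_len = merl[2] - 12                # code length without the header
    for i in range(0, code_len, 4):        # copy code into memory
        MEM[alpha + i] = merl[(12 + i) // 4]
    i = merl[2]                            # start of relocation table
    while i < end_mod:
        fmt = merl[i // 4]
        if fmt != 1:
            raise ValueError("unknown relocation format")
        rel = merl[i // 4 + 1]             # file offset of word to fix
        MEM[alpha + rel - 12] += alpha - 12
        i += 8                             # each entry is two words
    return alpha

# Example: 3-word header, 3 code words (lis $3; .word A; the word at
# label A, file offset 0x14), one relocation entry for offset 0x10.
merl = [0x10000002, 32, 24, 0x00001814, 0x14, 0x00000000, 1, 0x10]
MEM = {}
load(merl, MEM, 100)
```

After loading at alpha = 100, the `.word A` slot (loaded at 104) holds 108, the in-memory address of label A: offset 0x14 in the file, minus the 12-byte header, plus alpha.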

Caveat: Code that hardcodes addresses numerically (e.g., lis $2; .word 12; jr $2) cannot be relocated because the loader has no way to know to update those values. Programmers must always use labels for any address that needs relocatability.

Linking: Multiple Files

Real programs span multiple source files. A function defined in b.asm may be called from a.asm using .word label where label is defined only in b.asm. This situation requires a linker.

The naive approach — concatenate source files and assemble together — works but is wasteful: every change to one file forces all files to be reassembled. The better approach is to assemble each file separately into a MERL file, then use the linker to combine the MERL files into a single executable MERL file.

External Symbol References and Definitions

To support cross-file label references, MERL is extended with two new entry types in the symbol table:

External Symbol Reference (ESR), format code 0x11: indicates that a .word label in this file refers to a label defined in another file. When the assembler encounters .word banana where banana is not defined locally, it:

  1. Requires a .import banana directive (which triggers an error if absent, catching typos).
  2. Places 0x0 as a placeholder in the code.
  3. Adds an ESR entry to the symbol table recording the location of the placeholder and the name of the symbol.

External Symbol Definition (ESD), format code 0x05: indicates that a label in this file is exported (made available for other files to link against). The .export label directive triggers creation of an ESD entry recording the label’s name and its address.

An ESR entry contains:

  • Format code 0x11
  • Location where the symbol is used
  • Length of the symbol name
  • ASCII characters of the symbol name (one per word)

An ESD entry contains:

  • Format code 0x05
  • Address the symbol represents
  • Length of the symbol name
  • ASCII characters of the symbol name

The Linker Algorithm

Linking two MERL files \(m_1\) and \(m_2\) proceeds in three phases.

Phase 1 — Relocate \(m_2\): Let \(\alpha = m_1.\text{codeLen} - 12\). Relocate \(m_2\) by \(\alpha\) (add \(\alpha\) to every address in \(m_2\)’s code and symbol table). Check that the export label sets of \(m_1\) and \(m_2\) are disjoint (duplicate label definitions are an error).

Phase 2 — Resolve imports: For each ⟨addr_1, label⟩ in \(m_1\)’s imports: if label appears in \(m_2\)’s exports at address \(addr_2\), write \(addr_2\) into \(m_1.\text{code}[addr_1]\) and move \(addr_1\) from \(m_1\)’s imports to \(m_1\)’s relocates. Symmetrically, for each ⟨addr_2, label⟩ in \(m_2\)’s imports resolved by \(m_1\)’s exports.

Phase 3 — Output: Emit the MERL cookie 0x10000002, then the combined file length, then the combined code length + 12, then \(m_1\)’s code, then \(m_2\)’s code, then the union of all remaining (unresolved) imports, all exports, and all relocates.

After linking, any unresolved imports remain in the output MERL file’s symbol table, allowing further linking with additional files (e.g., print.merl).
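
Phase 2 (resolving one module's imports against another's exports) can be sketched with modules as plain dicts; the field names here are illustrative, not the MERL encoding:

```python
# Resolve m1's imports using m2's exports. Resolved entries move from
# the import list to the relocation list, since the patched word is now
# an absolute address that must be adjusted at load time.

def resolve(m1, m2):
    still_needed = []
    for addr, label in m1["imports"]:
        if label in m2["exports"]:
            m1["code"][addr] = m2["exports"][label]   # fill placeholder
            m1["relocates"].append(addr)              # now needs relocation
        else:
            still_needed.append((addr, label))        # leave for later links
    m1["imports"] = still_needed

m1 = {"code": {16: 0}, "imports": [(16, "banana")],
      "exports": {}, "relocates": []}
m2 = {"exports": {"banana": 40}}
resolve(m1, m2)
```

After the call, the placeholder at offset 16 holds banana's address, the import is gone, and offset 16 is queued for relocation in the output file.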

The Complete Toolchain

The full pipeline from WLP4 source to executable is:

./wlp4gen < source.wlp4i > source.asm   # Code generation (your compiler)
cs241.linkasm < source.asm > source.merl # Assemble to MERL
linker source.merl print.merl > exec.mips # Link with runtime
mips.twoints exec.mips                   # Run

This chain encapsulates the entire journey covered in CS 241: from the binary representation of data, through the formal theory of languages, automata, and grammars, through parsing and type checking, to code generation, heap management, loading, and linking. The result is a complete understanding of how a high-level program is transformed, step by step, into instructions that execute on real hardware.
