CO 331: Coding Theory

Alfred Menezes

Estimated study time: 54 minutes


Based on lecture notes by Sibelius Peng — PDF source

Sources and References

  • Primary textbook: Huffman, W.C. and Pless, V. Fundamentals of Error-Correcting Codes. Cambridge University Press, 2003.
  • Supplementary texts: MacWilliams, F.J. and Sloane, N.J.A. The Theory of Error-Correcting Codes. North-Holland, 1977; Lin, S. and Costello, D.J. Error Control Coding: Fundamentals and Applications, 2nd ed. Pearson, 2004.
  • Online resources: MIT 6.895 Essential Coding Theory (Guruswami–Sudan notes); Stanford EE387 Information Theory and Coding; CMU 15-853 Algorithms in the Real World (coding modules); Essential Coding Theory survey by Guruswami, Rudra, Sudan (freely available).
  • Lecture notes: Peng, S. CO 331 Lecture Notes (Winter 2021). https://pdf.sibeliusp.com/1211/co331.pdf

Chapter 1: Codes, Channels, and the Problem of Reliable Transmission

The central problem of coding theory is disarmingly simple to state: Alice wants to send a message to Bob, but the channel between them is noisy. Some bits may flip in transit. How should Alice encode her message so that Bob can recover it exactly, even after the channel has introduced errors?

The answer is redundancy. By adding carefully chosen extra bits to each message, we create a codeword that carries more information than strictly necessary — and that extra structure lets Bob detect and correct errors. The art of coding theory lies in designing codes that are simultaneously efficient (not too much redundancy), powerful (can correct many errors), and fast (encoding and decoding run in polynomial time).

The Communication Model

The canonical model has four components: a source produces a message \(m\) from a message space \(\mathcal{M}\); an encoder maps \(m\) to a codeword \(c\) in a code \(\mathcal{C} \subseteq \Sigma^n\) over an alphabet \(\Sigma\) of size \(q\); a noisy channel perturbs \(c\) to a received word \(r\); and a decoder recovers a message \(\hat{m}\) from \(r\). Successful communication means \(\hat{m} = m\) with high probability.

The most studied case is the binary symmetric channel (BSC), where \(\Sigma = \{0,1\}\) and each bit is flipped independently with probability \(p < 1/2\). Over a BSC, the expected number of flips in a block of length \(n\) is \(pn\), so a good code must handle error patterns of weight on that order. Generalising, the \(q\)-ary symmetric channel flips each symbol to a uniformly random different symbol with probability \(p\).

The Hamming distance between two words \(c, c' \in \Sigma^n\) counts the coordinates where they differ,

\[ d_H(c, c') = |\{i : c_i \neq c'_i\}|, \]

and the minimum distance \(d\) of a code \(\mathcal{C}\) is the smallest Hamming distance between distinct codewords. A code of length \(n\) with \(M = |\mathcal{C}|\) codewords and minimum distance \(d\) over a \(q\)-ary alphabet is denoted \((n, M, d)_q\).

The minimum distance is the key parameter governing a code’s error-correction capability. Geometrically, the Hamming distance makes \(\Sigma^n\) into a metric space, and each codeword sits at the centre of a Hamming ball. If two codewords are far apart, their Hamming balls don’t overlap, and errors that fall within one ball can be uniquely attributed to the nearest codeword.
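As a concrete illustration, here is a minimal Python sketch (helper names are my own) computing Hamming distances and, by brute force over all pairs, the minimum distance of a small code:

```python
from itertools import combinations

def hamming_distance(c1, c2):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(c1, c2))

def minimum_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(u, v) for u, v in combinations(code, 2))

# The binary repetition code of length 3, {000, 111}, has d = 3.
rep3 = [(0, 0, 0), (1, 1, 1)]
assert hamming_distance((0, 1, 1), (0, 0, 1)) == 1
assert minimum_distance(rep3) == 3
```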

Error Detection and Correction

Theorem (Error-correcting and detecting capabilities). An \((n, M, d)_q\) code \(\mathcal{C}\):

  1. Detects all error patterns of weight up to \(d - 1\).
  2. Corrects all error patterns of weight up to \(\lfloor (d-1)/2 \rfloor\) using nearest-neighbour decoding (MLD under the BSC).

The proof is a clean geometric argument: detecting \(d - 1\) errors works because any codeword perturbed by at most \(d - 1\) positions cannot coincide with another codeword. Correcting \(t = \lfloor (d-1)/2 \rfloor\) errors works because Hamming balls of radius \(t\) around distinct codewords are disjoint — a received word within distance \(t\) of some codeword is closer to that codeword than to any other.

Nearest-neighbour decoding finds the codeword \(c^* = \arg\min_{c \in \mathcal{C}} d_H(r, c)\). For the BSC with \(p < 1/2\) this coincides with maximum-likelihood decoding, but naive search over an \([n, k]_2\) code takes time \(O(2^k \cdot n)\), which is intractable for large codes. Much of algorithmic coding theory is devoted to efficient decoding.

The Singleton Bound and Optimal Codes

Not all triples \((n, M, d)\) are achievable — the parameters are constrained by fundamental bounds. The cleanest is the Singleton bound: every \((n, M, d)_q\) code satisfies

\[ M \leq q^{n - d + 1}. \]

A code meeting this bound with equality is called a Maximum Distance Separable (MDS) code. Reed–Solomon codes, which appear in Chapter 7, are the canonical MDS family.

A second constraint is the Hamming (sphere-packing) bound: with \(t = \lfloor (d-1)/2 \rfloor\), the disjoint Hamming balls of radius \(t\) around the \(M\) codewords cannot exceed the size of the whole space, so

\[ M \cdot \sum_{i=0}^{t} \binom{n}{i}(q-1)^i \leq q^n. \]

Codes meeting this bound exactly — where the Hamming balls tile the entire space — are called perfect codes and are exceptionally rare.
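The sphere-packing condition is easy to check numerically (a sketch; the helper name is my own):

```python
from math import comb

def sphere_packing_ok(n, M, d, q=2):
    """Return (bound satisfied?, perfect?) for the Hamming bound
    M * V_q(n, t) <= q^n, with t = floor((d-1)/2)."""
    t = (d - 1) // 2
    ball = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return (M * ball <= q ** n, M * ball == q ** n)

# The [7,4,3] Hamming code (M = 16) is perfect:
assert sphere_packing_ok(7, 16, 3) == (True, True)
# So is the [23,12,7] binary Golay code:
assert sphere_packing_ok(23, 2**12, 7) == (True, True)
```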

Chapter 2: Finite Fields and Their Structure

Linear codes gain their power from linear algebra over finite fields. The field \(\mathbb{F}_q\) provides the alphabet and the algebraic structure that makes systematic generator matrices, parity-check matrices, and syndrome decoding possible. Before studying linear codes, we need to understand finite fields thoroughly.

Existence and Uniqueness

Theorem (Classification of finite fields). A finite field of order \(q\) exists if and only if \(q = p^m\) for some prime \(p\) and positive integer \(m\). Moreover, any two finite fields of the same order are isomorphic. The unique field of order \(q = p^m\) is denoted \(\mathbb{F}_q\) or \(\mathrm{GF}(q)\).

The proof of existence constructs \(\mathbb{F}_{p^m}\) as the splitting field of \(x^{p^m} - x\) over \(\mathbb{F}_p\). Uniqueness follows because any two fields of order \(q\) have the same characteristic \(p\), and both consist exactly of the roots of \(x^q - x = 0\), which has at most \(q\) roots in any field.

Non-existence. For \(q = 6 = 2 \cdot 3\), no field of order 6 exists. The argument: a field \(F\) of order 6 would have prime characteristic \(p \in \{2, 3\}\), making \(F\) a vector space over its prime subfield \(\mathbb{F}_p\); but then \(|F| = p^m\) for some \(m\), and 6 is not a power of 2 or 3. Equivalently, by the classification theorem, \(q\) must be a prime power, and 6 is not.

Constructing \(\mathbb{F}_{p^m}\)

To construct \(\mathbb{F}_{p^m}\) explicitly, choose an irreducible polynomial \(f(x)\) of degree \(m\) over \(\mathbb{F}_p\); then

\[ \mathbb{F}_{p^m} \cong \mathbb{F}_p[x] / (f(x)). \]

Elements are polynomials of degree at most \(m - 1\) over \(\mathbb{F}_p\); addition is polynomial addition mod \(p\); multiplication is polynomial multiplication mod \(f(x)\) and \(p\). The element \(\alpha = x + (f(x))\) is a root of \(f\) and often a primitive element: a generator of the cyclic group \(\mathbb{F}_{p^m}^* = \mathbb{F}_{p^m} \setminus \{0\}\).

Theorem (Primitive elements). The multiplicative group \(\mathbb{F}_q^*\) is cyclic of order \(q - 1\). An element \(\alpha \in \mathbb{F}_q^*\) with \(\text{ord}(\alpha) = q - 1\) is called a primitive element or primitive root of \(\mathbb{F}_q\).

This cyclicity is crucial for BCH code construction, where the design distance is enforced by requiring consecutive powers of a primitive element to be roots of every codeword’s polynomial representation.
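A minimal sketch of this construction for \(\mathbb{F}_8 = \mathbb{F}_2[x]/(x^3 + x + 1)\), packing polynomials into Python-int bitmasks (the function name is my own), which also verifies that \(\alpha = x\) is primitive:

```python
def gf2_poly_mulmod(a, b, f, m):
    """Multiply a·b in F_2[x]/(f), polynomials packed as int bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):          # degree reached m: reduce modulo f
            a ^= f
    return result

# F_8 = F_2[x]/(x^3 + x + 1): f = 0b1011, and alpha = x = 0b010.
f, m, alpha = 0b1011, 3, 0b010
powers, e = [], 1
for _ in range(7):
    powers.append(e)
    e = gf2_poly_mulmod(e, alpha, f, m)
assert sorted(powers) == list(range(1, 8))   # powers of alpha hit all of F_8*
assert e == 1                                 # alpha^7 = 1
```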

Key Properties

Several structural facts are used repeatedly in coding theory. The Frobenius endomorphism \(\phi : a \mapsto a^p\) is an automorphism of \(\mathbb{F}_{p^m}\), and the automorphism group is cyclic of order \(m\), generated by \(\phi\). The elements of \(\mathbb{F}_p\) are exactly the fixed points of \(\phi\).

The subfield structure is clean: \(\mathbb{F}_{p^r} \subseteq \mathbb{F}_{p^m}\) if and only if \(r \mid m\). This means the subfields of \(\mathbb{F}_{2^m}\) form a lattice isomorphic to the divisor lattice of \(m\). For BCH codes, this allows us to construct \(\mathbb{F}_{2^m}\) as an extension of \(\mathbb{F}_2\) and then use elements that generate the full group.

Chapter 3: Linear Codes

Linear codes are the workhorses of practical coding theory. By requiring the codebook to form a vector subspace of \(\mathbb{F}_q^n\), we gain a compact algebraic representation and efficient encoding/decoding algorithms.

Definitions and Basic Properties

Definition (Linear code). An \([n, k]_q\) linear code \(\mathcal{C}\) is a \(k\)-dimensional subspace of \(\mathbb{F}_q^n\). The code has \(q^k\) codewords, minimum distance \(d\), and is called an \([n, k, d]_q\) code. The rate is \(R = k/n\).

The generator matrix \(G \in \mathbb{F}_q^{k \times n}\) has rows that form a basis for \(\mathcal{C}\): encoding sends message \(m \in \mathbb{F}_q^k\) to codeword \(c = mG\). In systematic form, \(G = [I_k \mid P]\), so the first \(k\) coordinates of \(c\) are the message itself. The remaining \(n - k = r\) coordinates are check digits or parity bits.

For a linear code, minimum distance reduces to minimum weight:

\[ d(\mathcal{C}) = \min_{0 \neq c \in \mathcal{C}} \text{wt}(c), \]

where \(\text{wt}(c) = d_H(c, 0) = |\{i : c_i \neq 0\}|\) is the Hamming weight.

This simplification — computing \(d\) by scanning codewords for the lightest nonzero one instead of comparing all pairs — is one of the main benefits of linearity. It also allows weight enumerators to encode all the information about a code’s distance structure in a single polynomial.

The Dual Code and Parity-Check Matrix

The dual code is \(\mathcal{C}^\perp = \{v \in \mathbb{F}_q^n : v \cdot c = 0 \text{ for all } c \in \mathcal{C}\}\), an \([n, n-k]_q\) linear code. A parity-check matrix \(H \in \mathbb{F}_q^{(n-k) \times n}\) is a generator matrix of \(\mathcal{C}^\perp\); equivalently,

\[ \mathcal{C} = \ker(H) = \{c \in \mathbb{F}_q^n : Hc^T = 0\}. \]

In systematic form, if \(G = [I_k \mid P]\) then \(H = [-P^T \mid I_{n-k}]\).

The parity-check matrix gives a clean characterisation of minimum distance:

Theorem (Distance via \(H\)). The minimum distance of a linear code with parity-check matrix \(H\) equals the minimum number of columns of \(H\) that are linearly dependent.

This bridges the algebraic and combinatorial views: a codeword of weight \(w\) corresponds to \(w\) columns of \(H\) that sum to zero.

Syndrome Decoding

Write the received word as \(r = c + e\) for a codeword \(c\) and error pattern \(e\). The syndrome of \(r\) is

\[ s = Hr^T = Hc^T + He^T = He^T. \]

Since \(Hc^T = 0\) for every codeword, the syndrome depends only on the error, not the message. Syndrome decoding works by precomputing a syndrome table: for each possible error pattern \(e\), store \((He^T, e)\). Decoding \(r\) means looking up syndrome \(Hr^T\) to find the most likely error pattern, then computing \(\hat{c} = r - \hat{e}\).

For a binary \([n, k, d]\) code with \(t = \lfloor (d-1)/2 \rfloor\), the syndrome table has \(2^{n-k}\) entries (one per syndrome), each storing a coset leader — a minimum-weight error pattern with that syndrome; every error pattern of weight at most \(t\) is the unique coset leader of its coset. Syndrome decoding then runs in \(O(1)\) lookup time after precomputation of size \(O(2^{n-k})\).

Hamming Codes: A Family of Perfect Codes

Definition (Hamming code). For \(m \geq 2\), the binary Hamming code \(\text{Ham}(m, 2)\) has parity-check matrix \(H\) whose columns are all \(2^m - 1\) distinct nonzero vectors in \(\mathbb{F}_2^m\). This gives an \([n, k, d] = [2^m - 1,\, 2^m - 1 - m,\, 3]\) code.

Hamming codes are single-error-correcting. They are perfect: the Hamming balls of radius 1 around each codeword tile \(\mathbb{F}_2^n\) exactly (each of the \(2^n\) words lies within distance 1 of exactly one codeword). Decoding is beautiful: the syndrome \(Hr^T\) is exactly the binary representation of the position of the flipped bit, so correction takes \(O(n)\) time.

The extended Hamming code appends an overall parity bit, giving a \([2^m,\, 2^m - 1 - m,\, 4]\) code that corrects single errors while simultaneously detecting double errors.

Chapter 4: The Golay Codes

The Golay codes are two exceptional perfect codes discovered by Marcel Golay in 1949. They are among the most beautiful objects in combinatorics and have applications ranging from NASA’s Voyager spacecraft to the study of the Mathieu groups in finite group theory.

The Binary Golay Code \(C_{23}\)

The binary Golay code \(C_{23}\) is a \([23, 12, 7]_2\) code. Its parameters sit at the theoretical edge of the Hamming bound: with \(t = 3\) error correction in length 23, the sphere-packing bound gives \(2^{12} \cdot \sum_{i=0}^3 \binom{23}{i} = 2^{23}\), so the Hamming balls exactly fill \(\mathbb{F}_2^{23}\). This makes \(C_{23}\) a perfect 3-error-correcting code, and up to equivalence it is the only nontrivial perfect binary code other than the Hamming codes and the odd-length repetition codes.

The code can be constructed as a cyclic code: its generator polynomial is a degree-11 factor of \(x^{23} - 1\) over \(\mathbb{F}_2\). Concretely, since 23 is prime and 2 has multiplicative order 11 modulo 23, \(x^{23} - 1\) factors as \((x-1) \cdot g_1(x) \cdot g_2(x)\) over \(\mathbb{F}_2\), where \(g_1\) and \(g_2\) are reciprocal irreducible polynomials of degree 11; taking \(g_1\) as the generator polynomial gives \(C_{23}\).

The Extended Binary Golay Code \(C_{24}\)

Appending an overall parity-check bit to \(C_{23}\) produces the extended binary Golay code \(C_{24}\), a \([24, 12, 8]_2\) code. This code is self-dual: \(C_{24} = C_{24}^\perp\). Its generator matrix can be written as \(G = [I_{12} \mid A]\) where \(A\) is the binary “Miracle Octad Generator” matrix related to the Steiner system \(S(5, 8, 24)\) — every set of 5 positions is contained in a unique weight-8 codeword (octad).

Theorem (Reliability of \(C_{24}\)). The extended Golay code \(C_{24}\) corrects all patterns of up to 3 errors. Its minimum distance is 8, so it detects up to 7 errors. Self-duality forces every codeword weight to be a multiple of 4; the weights that occur are exactly 0, 8, 12, 16, and 24.

A decoding algorithm for \(C_{24}\) works as follows, writing \(G = [I_{12} \mid A]\) for the \(12 \times 12\) block \(A\) above, with rows \(a_i\). Given received word \(r\), compute syndrome \(s = Hr^T\). If \(\text{wt}(s) \leq 3\), the error is \(e = (s, 0)\). Otherwise, check each row \(a_i\) of \(A\): if \(\text{wt}(s \oplus a_i) \leq 2\), the error is \(e = (s \oplus a_i,\, e_i)\), where \(e_i\) has a 1 only in position \(i\). If neither case applies, compute \(sA\) and apply the same two tests with the halves swapped. This \(O(n)\) algorithm is elegant precisely because \(C_{24}\)'s algebraic structure forces every correctable error pattern into one of these few cases.

The connection to the Mathieu group \(M_{24}\) — one of the 26 sporadic simple groups — arises because the automorphism group of \(C_{24}\) is exactly \(M_{24}\), acting on the 24 coordinate positions.

Chapter 5: Cyclic Codes and Their Algebraic Structure

Cyclic codes form the most algebraically rich family in classical coding theory. By requiring the code to be invariant under cyclic shifts, we can represent codewords as polynomials and leverage the theory of ideals in polynomial rings. BCH codes and Reed–Solomon codes are the principal families within cyclic codes.

The Ring \(R = \mathbb{F}_q[x]/(x^n - 1)\)

A cyclic shift of \((c_0, c_1, \ldots, c_{n-1})\) is \((c_{n-1}, c_0, \ldots, c_{n-2})\). Encoding vectors as polynomials \(c(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}\), a cyclic shift corresponds to multiplication by \(x\) modulo \(x^n - 1\). So a code closed under cyclic shifts corresponds to an ideal in the ring \(R = \mathbb{F}_q[x]/(x^n - 1)\).

Theorem (Cyclic codes as ideals). Assume \(\gcd(n, q) = 1\). Every ideal of \(R = \mathbb{F}_q[x]/(x^n - 1)\) is principal, generated by the unique monic divisor \(g(x)\) of \(x^n - 1\) in \(\mathbb{F}_q[x]\). The corresponding cyclic code \(\mathcal{C}\) has:

  • Generator polynomial \(g(x)\) with \(\deg g = n - k\),
  • Dimension \(k\),
  • Codewords \(\{m(x) g(x) \bmod (x^n - 1) : m(x) \in \mathbb{F}_q[x],\, \deg m < k\}\).

The parity-check polynomial is \(h(x) = (x^n - 1)/g(x)\). The generator matrix has rows \(g(x), xg(x), \ldots, x^{k-1}g(x)\); the parity-check matrix has rows \(h^*(x) = x^k h(x^{-1})\) (the reciprocal polynomial of \(h\)) shifted cyclically. This gives a systematic structure perfect for hardware implementation via linear feedback shift registers (LFSRs).

Burst-error correcting. A cyclic code with generator polynomial \(g(x)\) can detect burst errors of length up to \(n - k = \deg g\). The Fire codes are designed specifically for burst-error correction: their generator polynomial is \(g(x) = (x^{2t-1} + 1)p(x)\) for an irreducible \(p(x)\) of degree \(m \geq t\) that does not divide \(x^{2t-1} + 1\), correcting all burst errors of length \(\leq t\). Interleaving \(l\) codewords column-by-column converts a \(t\)-burst-error-correcting code into an \(lt\)-burst-error-correcting code, which is how CDs combine Reed–Solomon codes with interleaving.

Chapter 6: BCH Codes

BCH (Bose–Chaudhuri–Hocquenghem) codes are the most important family of cyclic codes. They were discovered independently by Hocquenghem (1959) and Bose–Chaudhuri (1960), and they provide a systematic way to construct codes with prescribed minimum distance.

Minimal Polynomials and Cyclotomic Cosets

To work with BCH codes over \(\mathbb{F}_q\) using roots in an extension field, we need minimal polynomials. The minimal polynomial of \(\alpha \in \mathbb{F}_{q^m}\) over \(\mathbb{F}_q\) is the lowest-degree monic polynomial \(\text{min}(x, \alpha) \in \mathbb{F}_q[x]\) with \(\alpha\) as a root.

Key facts: \(\text{min}(x, \alpha)\) is irreducible over \(\mathbb{F}_q\); its roots are \(\alpha, \alpha^q, \alpha^{q^2}, \ldots\) (the Frobenius orbit of \(\alpha\), also called the \(q\)-cyclotomic coset). To factor \(x^n - 1\) over \(\mathbb{F}_q\), partition \(\{0, 1, \ldots, n-1\}\) into \(q\)-cyclotomic cosets and multiply together the minimal polynomials for one element from each coset.

BCH Code Construction

Let \(\mathbb{F}_{q^m}\) contain a primitive \(n\)th root of unity \(\omega\) (so \(n \mid q^m - 1\)). Fix integers \(b\) and \(\delta \geq 2\) (the designed distance).

The BCH code of length \(n\) over \(\mathbb{F}_q\) with these parameters is the cyclic code whose generator polynomial is

\[ g(x) = \text{lcm}(\text{min}(x, \omega^b),\, \text{min}(x, \omega^{b+1}),\, \ldots,\, \text{min}(x, \omega^{b+\delta-2})). \]

This code has roots \(\omega^b, \omega^{b+1}, \ldots, \omega^{b+\delta-2}\) in \(\mathbb{F}_{q^m}\).

The case \(n = q^m - 1\) (primitive BCH code) with \(b = 1\) (narrow-sense) is most common. For binary codes, a primitive narrow-sense BCH code of length \(n = 2^m - 1\) and designed distance \(\delta = 2t + 1\) has \(\deg g \leq mt\), since the required roots fall into at most \(t\) cyclotomic cosets, each of size at most \(m\).

Theorem (BCH bound). The actual minimum distance of a BCH code with designed distance \(\delta\) is at least \(\delta\).

The BCH bound is proved via a Vandermonde-determinant argument: if a codeword \(c(x)\) had fewer than \(\delta\) nonzero positions, then \(c(\omega^b) = \cdots = c(\omega^{b+\delta-2}) = 0\), and the \((\delta - 1) \times (\delta - 1)\) Vandermonde determinant of \(\omega^b, \ldots, \omega^{b+\delta-2}\) evaluated at those positions would vanish — but the Vandermonde determinant of distinct elements in a field is nonzero, contradiction.

BCH Decoding

BCH decoding finds the error positions and magnitudes from the syndrome. Given received word \(r(x) = c(x) + e(x)\), the syndrome values are \(S_j = r(\omega^j) = e(\omega^j)\) for \(j = b, \ldots, b + \delta - 2\).

Suppose at most \(t = \lfloor (\delta - 1)/2 \rfloor\) errors occurred at positions \(i_1, \ldots, i_\nu\) with magnitudes \(Y_1, \ldots, Y_\nu\) (binary: all magnitudes are 1). The error locator polynomial is \(\Lambda(z) = \prod_{j=1}^{\nu}(1 - X_j z)\) where \(X_j = \omega^{i_j}\) are the error locator numbers. The Berlekamp–Massey algorithm finds \(\Lambda\) from the syndromes in \(O(t^2)\) operations; Chien search then finds its roots in \(O(n)\) operations.

For non-binary BCH codes (including Reed–Solomon), Forney’s algorithm recovers the error magnitudes once the positions are known.

Chapter 7: Reed–Solomon Codes

Reed–Solomon (RS) codes, introduced by Irving Reed and Gustave Solomon in 1960, are the most practically important error-correcting codes. They are used in CDs, DVDs, Blu-ray discs, QR codes, DSL modems, deep-space communication, and RAID storage. Their key property is that they are MDS: they achieve the Singleton bound \(d = n - k + 1\), meaning they are maximally efficient given their length and dimension.

Construction

Fix \(n\) distinct evaluation points \(\alpha_1, \ldots, \alpha_n \in \mathbb{F}_q\) (so \(n \leq q\)). The Reed–Solomon code \(\text{RS}_q(n, k)\) encodes each polynomial \(f \in \mathbb{F}_q[x]\) of degree at most \(k - 1\) by evaluation:

\[ \text{Enc}(f) = (f(\alpha_1), f(\alpha_2), \ldots, f(\alpha_n)). \]

Since two distinct polynomials of degree \(\leq k - 1\) can agree in at most \(k - 1\) positions, any two distinct codewords differ in at least \(n - (k - 1) = n - k + 1\) positions. So RS codes have minimum distance \(d = n - k + 1\), meeting the Singleton bound exactly.

Theorem (RS codes are MDS). \(\text{RS}_q(n, k)\) is an \([n, k, n-k+1]_q\) MDS code. When \(n = q - 1\) and \(\alpha_i = \omega^{i-1}\) for a primitive element \(\omega\), this is equivalently a BCH code with designed distance \(n - k + 1\).

The cyclic structure arises because evaluating at powers of a primitive root is equivalent to taking a discrete Fourier transform over \(\mathbb{F}_q\).
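A toy sketch of the evaluation encoding over the prime field \(\mathbb{F}_{11}\) (so no extension-field arithmetic is needed; helper names are my own), verifying the MDS property by brute force:

```python
from itertools import product

# Toy Reed–Solomon code over F_11: n = 5, k = 2, so d = n - k + 1 = 4.
p, n, k = 11, 5, 2
alphas = [1, 2, 3, 4, 5]                 # distinct evaluation points

def rs_encode(msg):
    """Evaluate f(x) = msg[0] + msg[1]·x + ... at each point, mod p."""
    return [sum(mi * pow(a, i, p) for i, mi in enumerate(msg)) % p
            for a in alphas]

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

# Brute-force check of the MDS property over all 11^2 = 121 codewords.
codewords = [rs_encode(list(m)) for m in product(range(p), repeat=k)]
d = min(dist(u, v) for i, u in enumerate(codewords) for v in codewords[i + 1:])
assert d == n - k + 1 == 4
```

Two distinct lines agree in at most one point, which is exactly why the brute-force minimum comes out to \(n - k + 1\).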

Burst-error correction. An RS code over \(\mathbb{F}_{2^m}\) with minimum distance \(d\) can correct \(\lfloor (d-1)/2 \rfloor\) symbol errors, each of which is \(m\) bits. This makes RS codes excellent for burst-error environments: a burst of \(m\lfloor (d-1)/2 \rfloor\) consecutive bit errors affects at most \(\lfloor (d-1)/2 \rfloor\) symbols. The CD standard uses two interleaved RS codes over \(\mathbb{F}_{2^8}\) in a product code called CIRC (Cross-Interleaved Reed–Solomon Coding), which handles scratches of up to 4000 consecutive bits.

List Decoding

Standard RS decoding corrects up to \(t = \lfloor (n-k)/2 \rfloor\) errors, but the Singleton bound suggests the true information-theoretic limit is \(n - k\) errors. List decoding relaxes unique decoding: instead of one answer, output a list of all codewords within distance \(\tau n\) from the received word.

Sudan (1997) gave the first polynomial-time list decoding algorithm for RS codes beyond \(t\), correcting roughly up to \(n - \sqrt{2nk}\) errors. Guruswami and Sudan (1999) improved this to \(n - \sqrt{nk}\) by finding a bivariate polynomial \(Q(x, y)\) of low \((1, k-1)\)-weighted degree with \(Q(\alpha_i, r_i) = 0\) (with multiplicities) for all received positions, then factoring. This handles up to a \(1 - \sqrt{R}\) fraction of errors for rate-\(R\) codes.

Chapter 8: Code-Based Cryptography

The hard decoding problems for general linear codes underpin a family of public-key cryptosystems. Code-based cryptography is appealing today because it is believed to resist quantum attacks: unlike RSA and elliptic curve systems, there is no known quantum algorithm that significantly speeds up generic decoding.

The McEliece Cryptosystem

Robert McEliece proposed the first code-based public-key system in 1978. The key insight is that a Goppa code has an efficient decoder, but a randomly scrambled version looks like a random linear code — for which decoding is NP-hard (in the worst case) and believed to be hard on average. (Variants built on Reed–Solomon codes instead were later broken, which is why Goppa codes remain the standard choice.)

Setup. Alice chooses an \([n, k, d]\) binary Goppa code with an efficient \(t\)-error-correcting decoder and generator matrix \(G\). She generates a random invertible \(k \times k\) matrix \(S\) and a random \(n \times n\) permutation matrix \(P\). The public key is \(\hat{G} = SGP\); the private key is \((S, G, P)\).

Encryption. To encrypt message \(m \in \mathbb{F}_2^k\), Bob computes \(c = m\hat{G} + e\) where \(e\) is a random vector of weight \(\leq t\).

Decryption. Alice computes \(cP^{-1} = mSG + eP^{-1}\), then decodes using the Goppa decoder to recover \(mS\), then computes \(m = (mS)S^{-1}\).

Security rests on two hardness assumptions: (1) it is hard to distinguish \(\hat{G}\) from a random matrix (this has resisted attack for 45 years for Goppa codes), and (2) it is hard to decode a random linear code. The main drawbacks are large public keys (\(\approx 500\) kilobytes for 128-bit security) and ciphertext expansion.

The Niederreiter variant uses a parity-check matrix instead of a generator matrix, reducing key sizes. Both approaches have survived decades of cryptanalysis: Classic McEliece (based on binary Goppa codes) advanced through the NIST post-quantum standardization process as a fourth-round candidate and remains under consideration for standardization.

Chapter 9: Advanced Topics and Modern Directions

Coding Theory 2: LDPC Codes and Polar Codes

The codes studied in CO 331 — Hamming, Golay, BCH, RS — are all algebraic codes: their structure is defined by algebraic constraints. Two further families achieve the Shannon capacity and are used in modern communication systems.

Low-Density Parity-Check (LDPC) codes, introduced by Gallager (1962) and rediscovered by MacKay–Neal (1996), have sparse parity-check matrices. The sparsity enables an iterative decoding algorithm — belief propagation on a Tanner graph — that approaches capacity. LDPC codes are used in Wi-Fi (IEEE 802.11n), DVB-S2, and 5G NR.

Polar codes, introduced by Arıkan (2009), are the first family provably achieving the symmetric capacity of any binary discrete memoryless channel with an explicit construction and efficient \(O(n \log n)\) encoding/decoding. They exploit channel polarisation: repeated application of a two-channel combiner transforms \(n\) independent copies of a channel into \(n\) “polarised” synthetic channels — some nearly perfect, some nearly useless. Encoding sends data only over the good channels; the bad channels carry frozen bits. Polar codes are used in 5G NR for control channels.

Chapter 10: Beyond This Course

Natural Next Topics

Coding theory is a vast subject, and CO 331 provides the algebraic foundation for a rich set of extensions. The course stops at Reed–Solomon and a brief introduction to code-based cryptography, but the field extends in at least four major directions: algebraic geometry codes (replacing the projective line with richer curves), capacity-achieving modern codes (LDPC and polar), algorithmic list decoding, and quantum error correction.

Algebraic geometry (AG) codes evaluate functions from the Riemann–Roch space of a divisor of degree \(\ell\) on a curve of genus \(g\) at \(n\) rational points, giving parameters

\[ [n,\, k \geq \ell - g + 1,\, d \geq n - \ell], \]

so the minimum distance bound degrades from the Singleton bound by \(g\). For \(g = 0\) (the projective line), this recovers RS codes with \(d = n - k + 1\). For higher genus, one loses \(g\) in the distance but gains access to codes of length \(n > q - 1\) (impossible for RS over \(\mathbb{F}_q\)) because curves of higher genus have more rational points.

The reason AG codes can beat the Gilbert–Varshamov (GV) bound is the existence of curves with many rational points relative to their genus. For a family of curves over \(\mathbb{F}_q\) with genus \(g \to \infty\), the Drinfeld–Vlăduț bound says \(\limsup N/g \leq \sqrt{q} - 1\), where \(N\) is the number of rational points; when \(q\) is a square, this limit is achieved by modular and Shimura curves — explicit families with the maximum possible asymptotic number of rational points. Tsfasman, Vlăduț, and Zink showed that such families yield asymptotic codes with rate-distance tradeoff \(R \geq 1 - \delta - 1/(\sqrt{q} - 1)\), which for square \(q \geq 49\) exceeds the \(q\)-ary GV bound on an interval of \(\delta\) — resolving a long-standing open problem in asymptotic coding theory.

The dimension and distance bounds above rest on the Riemann–Roch theorem: for a divisor \(G\) on a curve of genus \(g\),

\[ \ell(G) - \ell(K - G) = \deg G - g + 1, \]

where \(K\) is the canonical divisor and \(\ell(G) = \dim L(G)\) is the dimension of the Riemann–Roch space. This algebraic machinery is covered in courses on algebraic geometry or algebraic function fields.

List decoding beyond the unique-decoding radius. Unique decoding corrects up to \(t = \lfloor (d-1)/2 \rfloor\) errors — beyond that, there may be multiple codewords within the error ball. List decoding outputs all codewords within distance \(\tau n\) from the received word, even when \(\tau > t/n\). Sudan (1997) showed RS codes can be list decoded up to roughly \(n - \sqrt{2kn}\) errors in polynomial time: find a bivariate polynomial \(Q(x, y)\) of controlled bi-degree with \(Q(\alpha_i, r_i) = 0\) at all received positions, then factor over \(\mathbb{F}_q[x]\) to extract all plausible codewords. The Guruswami–Sudan improvement (1999), using interpolation with multiplicities, achieves \(n(1 - \sqrt{k/n}) = n(1 - \sqrt{R})\) errors.

Folded Reed–Solomon codes (Guruswami–Rudra 2006) push this further. Bundle consecutive positions of an RS code into “super-symbols” and apply a variant of the Sudan interpolation over the bundled positions. The resulting codes achieve list decoding up to a \(1 - R - \varepsilon\) fraction of errors — within \(\varepsilon\) of the information-theoretic limit. The key algebraic insight is the hidden relation between \(f(x)\) and \(f(\gamma x)\) for a primitive \(\gamma\): folding correlates the evaluations inside each super-symbol, so a multivariate interpolation step pins down far more information about the message polynomial than in the unfolded case.

LDPC codes and belief propagation. Low-Density Parity-Check (LDPC) codes (Gallager 1962, rediscovered by MacKay–Neal 1996) have parity-check matrices with very few nonzero entries per row and column. This sparsity enables the belief propagation (BP) decoder — an iterative message-passing algorithm on the Tanner graph (a bipartite graph connecting variable nodes to check nodes). Under BP, each variable node iteratively updates its belief about its value by aggregating messages from its check nodes, and each check node aggregates from its variable nodes. BP is linear-time per iteration and empirically achieves near-capacity performance on long codes.

The rigorous analysis of BP decoding uses density evolution (Richardson–Urbanke 2001): track the distribution of messages across iterations in the infinite-length limit. The threshold phenomenon is sharp: for a given channel noise level, either BP converges to the correct codeword (with high probability) or it fails to converge — and the threshold can be computed by the density evolution equations. Degree distribution design via differential evolution optimization yields codes within 0.01 dB of the Shannon limit.

Polar codes and the mathematics of channel polarisation. Arıkan’s polar codes (2009) are the first family with a provable, explicit, capacity-achieving construction for any binary discrete memoryless channel. The construction applies the butterfly matrix \(\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\) recursively \(\log_2 n\) times to \(n\) independent channel copies, transforming them into \(n\) “synthetic” channels. By the polarisation phenomenon — proved rigorously using martingale methods — the fraction of channels with capacity approaching 1 is exactly the channel capacity \(C\), and the fraction approaching 0 is \(1 - C\). Encoding sends information bits only on the near-perfect synthetic channels; the near-zero channels carry fixed “frozen” bits. The successive cancellation decoder runs in \(O(n \log n)\) time.

Network coding. Classical coding theory studies point-to-point channels. Network coding (Ahlswede–Cai–Li–Yeung 2000) studies multi-source, multi-sink networks where intermediate nodes can perform arbitrary (or linear) combinations of their inputs. The key result is that linear network coding over a large enough field achieves the multicast capacity of a network. This generalises max-flow: the max information flow from source \(s\) to each of \(k\) sinks simultaneously equals the min-cut between \(s\) and the most disconnected sink. Random linear network coding, where each node sends a uniformly random linear combination of its inputs, achieves capacity with high probability once the field is sufficiently large (polynomial in the number of sinks and edges). Subspace codes for errors in network coding (Kötter–Kschischang 2008) model corruption as a change in the subspace spanned by transmitted vectors, leading to a new class of codes in the Grassmannian.

Locally decodable and locally correctable codes. A locally decodable code (LDC) recovers any message bit by reading only a few codeword positions, even after a constant fraction of errors. The efficiency tradeoff — how short can a 3-query LDC be? — connects to complexity theory: if very short LDCs existed, they would imply circuit lower bounds. The trivial construction gives exponential length; matching vector codes (Efremenko 2009) give 3-query LDCs of sub-exponential length, roughly \(\exp(\exp(O(\sqrt{\log n \log \log n})))\), a dramatic improvement. The relationship between LDC length lower bounds and computational complexity lower bounds (Katz–Trevisan 2000) makes this a central challenge in theoretical computer science.

Follow-up Courses and Reading

Understanding the extensions above requires a progression through several mathematical and algorithmic subjects. Below is a detailed map of the relevant courses, texts, and resources at multiple levels.

At UWaterloo, the immediate next course is CO 487 (Applied Cryptography, also taught by Alfred Menezes), which applies the finite field theory from CO 331 to cryptographic protocols, including the McEliece public-key system and post-quantum security. PMATH 347 (Groups and Rings) and PMATH 348 (Fields and Galois Theory) provide deeper algebraic foundations: Galois theory for splitting fields, automorphism groups of finite fields, and the cyclotomic extensions used in BCH codes. CS 360 (Introduction to the Theory of Computing) and CS 462 (Formal Languages and Parsing) supply theory-of-computation background; the PCP theorem, whose proofs lean on coding-theoretic constructions, links coding theory to hardness of approximation.

For graduate work, the dedicated coding theory courses are: MIT’s 6.895 Essential Coding Theory (Guruswami–Sudan–Madry, lecture notes freely available at cse.buffalo.edu/~atri/courses/coding-theory/) covers AG codes, list decoding, expander codes, and information-theoretic hardness in depth. CMU’s 21-801 by Venkatesan Guruswami is similar in scope. Stanford’s EE376A Information Theory covers the Shannon limit, channel capacity theorems, and the information-theoretic underpinnings of why codes can or cannot achieve certain rate-distance tradeoffs.

Modern coding theory for LDPC and polar codes: Richardson and Urbanke’s course at EPFL (and their textbook Modern Coding Theory, Cambridge 2008) is the definitive resource for density evolution, LDPC design, and capacity-achieving codes. Arıkan’s original polar code paper (IEEE Transactions on Information Theory, 2009) is short and self-contained; the follow-up literature on successive cancellation list decoding (Tal–Vardy 2015), constructions of good polar codes (Mori–Tanaka), polar subcodes, and finite-length analysis is extensive.

Key reference texts in coding theory:

  • Huffman and Pless Fundamentals of Error-Correcting Codes (Cambridge, 2003) — the modern comprehensive algebraic coding theory text.
  • MacWilliams and Sloane The Theory of Error-Correcting Codes (North-Holland, 1977) — the encyclopedic classic, still indispensable for weight enumerators and design theory connections.
  • Guruswami, Rudra, Sudan Essential Coding Theory — a freely available survey covering list decoding, capacity bounds, and expander codes at graduate level.
  • Richardson and Urbanke Modern Coding Theory (Cambridge, 2008) — LDPC and turbo codes from first principles.
  • Stichtenoth Algebraic Function Fields and Codes, 2nd ed. (Springer, 2009) — the standard graduate text for AG codes, covering Riemann–Roch in the function field setting.
  • Bernstein and Lange Post-Quantum Cryptography (Springer, 2009) — covers McEliece and Niederreiter cryptosystems alongside lattice-based and hash-based systems.

For quantum error correction, the key texts are Nielsen and Chuang Quantum Computation and Quantum Information (Cambridge, 2000) for the stabiliser formalism; Gottesman’s thesis (1997) for stabiliser codes; and the recent survey by Campbell, Terhal, Vuillot on fault-tolerant quantum computing. Kitaev’s toric code paper (Annals of Physics, 2003) is the foundational paper for topological quantum error correction.

Open Problems and Active Research

Coding theory has a rich collection of unsolved problems, ranging from fundamental information-theoretic questions to concrete algorithmic challenges. The following problems are considered among the most important open questions in the field.

The asymptotic Gilbert–Varshamov (GV) bound for binary codes. The GV bound says that binary codes with rate \(R\) and relative minimum distance \(\delta\) exist as long as \(R \leq 1 - H_2(\delta)\) (where \(H_2\) is the binary entropy function). This is proved by a probabilistic argument — random linear codes achieve it. The question is: can explicit algebraic constructions beat the GV bound for binary codes? Tsfasman–Vlădut–Zink showed this is possible over alphabets of square size \(q \geq 49\), but for \(q = 2\), no algebraic family is known to exceed the GV bound. Whether the GV bound is asymptotically tight for binary codes, and the more general question of algebraic codes beating GV, is one of the central open problems of the field. Recent approaches via multiplicity codes (Kopparty 2012) and affine-invariant codes (Kaufman–Sudan 2008) have not resolved it.
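Both the asymptotic bound \(R \leq 1 - H_2(\delta)\) and its finite-length Varshamov-style counterpart (an \([n, k, \geq d]\) binary linear code exists whenever \(\sum_{i=0}^{d-2}\binom{n-1}{i} < 2^{n-k}\)) are easy to compute. A sketch (function names are illustrative); note that at \(n = 7\), \(d = 3\) the bound already predicts the dimension of the Hamming code:

```python
from math import comb, log2

def H2(p):
    """Binary entropy function H_2(p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gv_rate(delta):
    """Asymptotic GV bound: rate 1 - H2(delta) is achievable
    for relative distance delta (meaningful for 0 <= delta < 1/2)."""
    return 1 - H2(delta)

def gv_dimension(n, d):
    """Largest k with sum_{i<=d-2} C(n-1, i) < 2^(n-k): an [n, k, >=d]
    binary linear code exists by the greedy/probabilistic argument."""
    s = sum(comb(n - 1, i) for i in range(d - 1))
    k = 0
    while s < 2 ** (n - (k + 1)):
        k += 1
    return k
```

For example, `gv_dimension(7, 3)` returns 4, matching the \([7,4,3]\) Hamming code, and `gv_rate(0.11)` is approximately \(1/2\), the familiar rate-\(\tfrac12\) point on the binary GV curve.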

Average-case hardness of decoding. Maximum-likelihood decoding of an arbitrary linear code is NP-hard (Berlekamp–McEliece–van Tilborg 1978), but that is a worst-case statement. Average-case hardness — whether a random linear code is hard to decode for a randomly chosen error vector — is an open question. The security of the McEliece cryptosystem rests on this average-case assumption. If decoding random linear codes were easy on average, McEliece would be broken. Establishing average-case hardness rigorously (via worst-case to average-case reductions) would be a major breakthrough for post-quantum cryptography.
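The problem whose hardness is at stake can be made concrete: given a parity-check matrix \(H\) and a syndrome \(s\), find a minimum-weight error pattern \(e\) with \(He^T = s\). A brute-force sketch (illustrative, and exponential-time by design — this cost is exactly what the hardness assumption protects):

```python
from itertools import combinations

def syndrome(H, v):
    """Syndrome of word v (bitmask) w.r.t. parity-check rows H (bitmasks)."""
    return tuple(bin(h & v).count("1") % 2 for h in H)

def decode_error(H, n, s):
    """Generic syndrome decoding by increasing weight.
    Cost ~ sum_w C(n, w): exponential in general."""
    for w in range(n + 1):
        for pos in combinations(range(n), w):
            e = sum(1 << p for p in pos)
            if syndrome(H, e) == s:
                return e

# Parity checks of a [7,4] Hamming code (every nonzero column appears once).
H = [0b1010101, 0b0110011, 0b0001111]
sent = 0b1111111                  # the all-ones word is a codeword here
received = sent ^ (1 << 4)        # the channel flips one bit
e = decode_error(H, 7, syndrome(H, received))
corrected = received ^ e          # recovers the sent codeword
```

For the Hamming code the search stops at weight 1, but for a random linear code with no exploitable structure nothing much better than this weight-by-weight search (refined by information-set decoding) is known.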

Capacity-achieving codes with linear complexity. Polar codes and LDPC codes achieve capacity for binary symmetric channels, but their standard decoders (successive cancellation for polar codes, iterated belief propagation for LDPC) have complexity around \(O(n \log n)\). For hardware implementations (5G, satellite communication), \(O(n)\) complexity would be highly desirable. The question: do there exist capacity-achieving binary codes with \(O(n)\) encoding and decoding complexity? This is related to whether capacity can be achieved by linear-time computable functions, a question at the boundary of information theory and theoretical computer science.

List decoding capacity for binary codes. Guruswami–Sudan decodes RS codes up to a \(1 - \sqrt{R}\) fraction of errors. Folded RS codes achieve \(1 - R - \varepsilon\) over large alphabets. For binary codes, does list decoding up to the binary list-decoding capacity (the information-theoretic maximum, given by the binary entropy function) admit a polynomial-time algorithm? Random binary codes achieve list-decoding capacity — a classical probabilistic argument, extended to random linear codes by Zyablov–Pinsker (1981) — but no explicit construction achieves this with polynomial-time list decoding over binary alphabets. The existence of such codes is known; the question is explicit construction with efficient decoding.

Optimal quasi-cyclic codes. Quasi-cyclic codes are codes where a cyclic shift of a codeword by \(p\) positions (rather than 1) gives another codeword. They are attractive for hardware implementation. The question of characterising optimal quasi-cyclic codes — those meeting the best known rate-distance tradeoffs — requires a theory that blends cyclic code algebra (Chapter 5) with computer search and finite geometry. Many best-known codes are quasi-cyclic, but a systematic theory of their optimality is lacking.

McEliece key size reduction. Classic McEliece with binary Goppa codes requires public keys of roughly 500–1000 kilobytes for 128-bit security, orders of magnitude larger than RSA-2048. Reducing key sizes while maintaining security has been an active goal. Quasi-cyclic MDPC (moderate density parity-check) codes (Misoczki et al. 2013) give compact keys but are potentially vulnerable to decoding-failure attacks. The fundamental question: is there a code family with compact representations that resists all structural attacks? All broken proposals (McEliece with RS codes, AG codes, Reed–Muller codes) had too much exploitable structure. Whether compactness — which itself demands structure — can coexist with resistance to structural attacks and efficient decoding is an open conceptual problem.

Quantum LDPC codes and fault-tolerant quantum computing. Quantum error correction requires stabiliser codes where both the \(X\)-type and \(Z\)-type check matrices are sparse (quantum LDPC). The existence of such codes with both constant rate \(R = k/n > 0\) and linear minimum distance \(d = \Omega(n)\) was an open problem for over two decades; earlier constructions reached distances around \(\tilde{\Omega}(\sqrt{n})\). Panteleev–Kalachev (2022) solved it with asymptotically good quantum LDPC codes achieving \(k = \Omega(n)\) and \(d = \Omega(n)\) simultaneously, and the quantum Tanner codes of Leverrier–Zémor (2022) gave a simplified construction with the same parameters; closely related expander techniques also produced the first \(c^3\) locally testable classical codes (Dinur–Evra–Livne–Lubotzky–Mozes 2022) — a long-sought breakthrough. Whether these codes admit efficient decoders with high threshold (making them practical for fault-tolerant quantum computing) remains an active research question.

Combinatorial designs, perfect difference sets, and cyclic codes. CO 331 establishes that BCH and Reed–Solomon codes are built from cyclic structure over finite fields. A deeper connection runs to combinatorial design theory: a \(t\)-\((v, k, \lambda)\) design is a collection of \(k\)-element subsets (blocks) of a \(v\)-element set such that every \(t\)-element subset is covered by exactly \(\lambda\) blocks. The incidence matrix of a design (rows = points, columns = blocks, entries = 0/1) is intimately related to linear codes. Hadamard designs (from Hadamard matrices) yield the first-order Reed–Muller codes. Cyclic difference sets — subsets \(D = \{d_1, \ldots, d_k\} \subseteq \mathbb{Z}/v\mathbb{Z}\) such that every nonzero element of \(\mathbb{Z}/v\mathbb{Z}\) appears exactly \(\lambda\) times as a difference \(d_i - d_j\) — generate cyclic codes with highly regular weight distributions. Singer difference sets, arising from a cyclic group acting regularly on the points of the projective geometry \(\mathrm{PG}(m-1, q)\), exist whenever \(q\) is a prime power and give codes related to finite projective spaces. Constructing difference sets with the parameters of projective geometries of non-prime-power order (if possible at all) is a classical open problem.
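The difference-set condition is a finite check. A small sketch verifying that the quadratic residues modulo 11 form an \((11, 5, 2)\) cyclic difference set (a Paley difference set; the helper name is illustrative):

```python
def difference_counts(D, v):
    """Count how often each nonzero element of Z/v occurs as d_i - d_j."""
    counts = {x: 0 for x in range(1, v)}
    for a in D:
        for b in D:
            if a != b:
                counts[(a - b) % v] += 1
    return counts

# Quadratic residues mod 11: each nonzero difference occurs exactly twice,
# so D is an (11, 5, 2) difference set.
D = sorted({(x * x) % 11 for x in range(1, 11)})   # [1, 3, 4, 5, 9]
counts = difference_counts(D, 11)
```

The parameter check \(k(k-1) = \lambda(v-1)\) reads \(5 \cdot 4 = 2 \cdot 10\) here; the characteristic vector of \(D\), cyclically shifted, generates a cyclic code whose regular weight distribution reflects this symmetry.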

Perfect codes and their classification. A perfect code (or \(e\)-perfect code) is a code achieving equality in the Hamming bound: \(\sum_{i=0}^e \binom{n}{i} (q-1)^i = q^{n-k}\). Perfect codes achieve the densest possible sphere packing in Hamming space. Over prime-power alphabets, the known perfect codes are: the trivial codes, the binary repetition codes of odd length, the [7,4,3] Hamming code and its generalisations \([2^r - 1, 2^r - r - 1, 3]\) (with \(q\)-ary analogues), the [23, 12, 7] binary Golay code, and the [11, 6, 5] ternary Golay code. The perfect code theorem (van Lint 1971, Tietäväinen 1973) states that these are the only perfect codes over prime power alphabets — a striking rigidity result proved using a combination of algebraic and combinatorial methods (Lloyd’s theorem: the Lloyd polynomial of a perfect code must have distinct integer roots). The connection to design theory is direct: the weight-3 codewords of a binary perfect 1-error-correcting code of length \(n\) form a Steiner triple system \(S(2, 3, n)\). For projective planes — the Steiner systems \(S(2, q+1, q^2+q+1)\) — existence for non-prime-power orders is the classical open question: the Bruck–Ryser theorem rules out orders \(\equiv 1, 2 \pmod 4\) that are not sums of two squares (such as 6 and 14), order 10 was eliminated by exhaustive computer search (Lam et al. 1989), and order 12 remains unsettled.
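The sphere-packing condition defining perfection is a finite computation; a sketch checking it for the codes listed above (the function name is illustrative):

```python
from math import comb

def hamming_bound_gap(n, k, e, q=2):
    """Sphere-packing bound: q^k * sum_{i<=e} C(n,i)(q-1)^i <= q^n.
    Returns the slack q^n minus the packed volume; zero means perfect."""
    packed = q ** k * sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))
    return q ** n - packed

# The nontrivial perfect codes meet the bound with equality:
assert hamming_bound_gap(7, 4, 1) == 0         # [7,4,3] binary Hamming
assert hamming_bound_gap(23, 12, 3) == 0       # [23,12,7] binary Golay
assert hamming_bound_gap(11, 6, 2, q=3) == 0   # [11,6,5] ternary Golay
```

For the binary Golay code the arithmetic is the famous identity \(\binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 2048 = 2^{11}\); a non-perfect code such as a \([7,3]\) code at \(e = 1\) leaves a positive gap.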

Cryptographic applications of coding theory beyond McEliece. The CO 331 chapter on code-based cryptography focused on McEliece (1978) and Niederreiter (1986). The field has expanded significantly. Syndrome decoding over \(\mathbb{F}_q\) — the hardness of finding a word of weight \(w\) with a prescribed syndrome under a random matrix over \(\mathbb{F}_q\) — is the basis for the Stern identification scheme and its variants. Regev’s LWE-based encryption scheme is closely related: both rely on the hardness of finding short vectors in a lattice or low-weight codewords in a random linear code. Code-based signatures (CFS scheme, Wave, LESS) use structured decoding problems to sign documents; NIST’s post-quantum standardisation process has evaluated several code-based signature candidates alongside Kyber/Dilithium. The frontier question is whether codes with compact key representations (quasi-cyclic, quasi-dyadic) can be proved secure against all known structural attacks, including the recent advances in information set decoding (Becker–Joux–May–Meurer 2012) that reduce the cost of generic decoding.

Weight enumerators, MacWilliams identity, and combinatorics. The weight enumerator of a linear code \(C\) is \(W_C(x, y) = \sum_{c \in C} x^{n - w(c)} y^{w(c)}\), where \(w(c)\) is the Hamming weight of codeword \(c\). The MacWilliams identity (1963) relates the weight enumerator of \(C\) to that of its dual \(C^\perp\): \(W_{C^\perp}(x, y) = \frac{1}{|C|} W_C(x + y, x - y)\). This identity is proved using character theory of finite abelian groups (the Fourier transform over \(\mathbb{F}_q^n\)) and is one of the most elegant results in algebraic coding theory. The weight enumerator determines the error-correcting performance of the code completely: the minimum distance is the smallest \(d\) with \(A_d > 0\), and the probability of decoder error under a BSC can be expressed as a polynomial in the weight enumerator coefficients. Divisibility of weight enumerators — whether the weights of all codewords are divisible by specific primes — is connected to the code’s algebraic structure. For self-dual codes (where \(C = C^\perp\)), the MacWilliams identity makes \(W_C\) invariant under a finite matrix group; Gleason’s theorem shows the possible weight enumerators over \(\mathbb{F}_2\) and \(\mathbb{F}_3\) form a finitely generated polynomial ring, and the related Gleason–Pierce theorem classifies the divisibility types of self-dual codes. These connections between coding theory, combinatorics, and number theory (via the modular forms that appear as weight enumerators of extremal self-dual codes) make coding theory a rich meeting ground for multiple mathematical disciplines.
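The MacWilliams identity can be checked numerically: in coefficient form it reads \(B_j = \frac{1}{|C|} \sum_i A_i K_j(i)\), where \(K_j(i) = \sum_s (-1)^s \binom{i}{s}\binom{n-i}{j-s}\) are the (binary) Krawtchouk polynomials. A sketch verifying it for the \([7,4]\) Hamming code and its dual, the simplex code (helper names are illustrative):

```python
from math import comb

def weights(code, n):
    """Weight distribution A_0..A_n of a set of codewords (bitmasks)."""
    A = [0] * (n + 1)
    for c in code:
        A[bin(c).count("1")] += 1
    return A

def krawtchouk(n, j, i):
    """Binary Krawtchouk polynomial K_j(i)."""
    return sum((-1) ** s * comb(i, s) * comb(n - i, j - s) for s in range(j + 1))

def macwilliams(A, n):
    """Weight distribution of the dual via the MacWilliams transform."""
    size = sum(A)
    return [sum(A[i] * krawtchouk(n, j, i) for i in range(n + 1)) // size
            for j in range(n + 1)]

n = 7
H = [0b1010101, 0b0110011, 0b0001111]          # Hamming [7,4] parity checks
code = [v for v in range(2 ** n)
        if all(bin(h & v).count("1") % 2 == 0 for h in H)]
dual = {0}
for h in H:                                    # dual code = row space of H
    dual |= {d ^ h for d in dual}
A, B = weights(code, n), weights(dual, n)
```

Here `A` comes out as \(1 + 7z^3 + 7z^4 + z^7\) and `B` as \(1 + 7z^4\) (every nonzero simplex codeword has weight 4), and applying the transform to either distribution recovers the other, as the identity promises.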
