CO 330: Combinatorial Enumeration
Stephen Melczer
Estimated study time: 1 hr 27 min
Table of contents
Combinatorial enumeration asks a deceptively simple question: given a family of discrete objects, how many of them are there of each size? The answer — a sequence of non-negative integers — turns out to encode far more structure than its bare numerical appearance suggests. The central insight of this course is that sequences arising from combinatorics are almost never arbitrary; they satisfy recurrences, admit generating functions of special forms, and obey asymptotic laws that can be read off from the analytic properties of those generating functions. A sequence is thus not just a list of numbers but a mathematical object with rich internal structure waiting to be revealed.
The material divides into four broad movements. We begin with the foundations: what it means to count, how bijections transform counting problems into equivalent ones, and how classical combinatorial objects — permutations, subsets, lattice paths — are enumerated. We then develop the principal algebraic toolbox: formal power series, generating functions, and the symbolic method, which turns the construction of combinatorial objects into algebraic manipulations on power series. The third part explores extensions and refinements: polynomial analogues that refine classical counts, integer partitions and their beautiful product formulas, and exponential generating functions for labelled structures. Finally, we turn to asymptotics: the question of how the coefficients of a generating function behave for large indices, using tools from real and complex analysis to extract precise asymptotic formulas — and even algorithms for random generation.
Throughout the course, Sage (free, open-source computer algebra software) is used to explore examples computationally. Many identities that would take hours to verify by hand can be confirmed or discovered in seconds using a computer algebra system, and many asymptotic computations reduce to symbolic differentiation and root-finding.
Part I: Foundations of Enumeration
Chapter 1: The Language of Counting
Combinatorial Objects and Their Sequences
The objects of study in combinatorial enumeration are combinatorial classes — collections of discrete structures, each of which has a well-defined non-negative integer size (also called weight). A class \(\mathcal{A}\) is a (typically infinite) set of objects together with a size function \(|\cdot| : \mathcal{A} \to \mathbb{N}\), with the crucial finiteness condition that for each \(n \geq 0\) the set \(\mathcal{A}_n := \{\alpha \in \mathcal{A} : |\alpha| = n\}\) is finite. The counting sequence of \(\mathcal{A}\) is \((a_n)_{n \geq 0}\) where \(a_n = |\mathcal{A}_n|\) is the number of objects of size \(n\).
Two classes are considered equivalent (written \(\mathcal{A} \cong \mathcal{B}\)) if their counting sequences agree: \(a_n = b_n\) for all \(n \geq 0\). This is a broader equivalence than a bijection between the classes themselves — it says only that the same number of objects of each size exist, not that those objects are in any structural correspondence. When we exhibit a bijection between \(\mathcal{A}_n\) and \(\mathcal{B}_n\) for all \(n\), we prove the stronger claim of a size-preserving bijection, and we gain the deeper insight that the two classes are “the same thing in disguise.”
Examples are everywhere. The number of binary strings of length \(n\) is \(2^n\). The number of \(k\)-element subsets of an \(n\)-element set is the binomial coefficient \(\binom{n}{k}\). The number of rooted binary trees with \(n\) internal nodes is the Catalan number \(C_n = \frac{1}{n+1}\binom{2n}{n}\). The number of integer partitions of \(n\) — ways to write \(n\) as an unordered sum of positive integers — is the function \(p(n)\), which grows roughly like \(e^{\pi\sqrt{2n/3}}/(4n\sqrt{3})\).
Five Properties Sequences Can Have
One of Melczer’s first observations is that the sequences arising in combinatorics are never arbitrary: they tend to satisfy at least one of five remarkable properties, and often several simultaneously.
The first property is an explicit formula — a closed-form expression for \(a_n\) in terms of standard arithmetic operations. The most celebrated example is the formula for labeled trees: by a theorem of Borchardt (1860) and Cayley (1889), the number of labeled rooted trees on \(n+1\) vertices is \((n+1)^{n-1}\). The formula \(\binom{n}{k} = n!/(k!(n-k)!)\) for subsets and the formula \(C_n = \frac{1}{n+1}\binom{2n}{n}\) for Catalan numbers are further examples. A “nice” explicit formula is one involving factorials, binomial coefficients, and finite sums — not circular definitions or infinite series.
The second property is an encoding via a recurrence or generating function. When an explicit formula is unavailable, the sequence may satisfy a linear recurrence with polynomial (or constant) coefficients, or may be extractable as the \(n\)-th coefficient of a power series satisfying some algebraic or differential equation. The Virahanka–Fibonacci sequence \(v_{n+2} = v_{n+1} + v_n\) with \(v_0 = v_1 = 1\) is the canonical example of a C-finite sequence — one satisfying a linear recurrence with constant coefficients. The derangement numbers \(d_n\) satisfy \(d_{n+2} = (n+1)d_{n+1} + (n+1)d_n\), which is a P-recursive (or D-finite) recurrence — linear with polynomial coefficients. These distinctions organize a hierarchy of complexity:
- Rational if \(A(z) = P(z)/Q(z)\) for polynomials \(P,Q\). This is equivalent to \((a_n)\) being C-finite.
- Algebraic if \(P(z, A(z)) = 0\) for some polynomial \(P\) in two variables. Example: the Catalan generating function \(C(z) = \frac{1-\sqrt{1-4z}}{2z}\) satisfies \(zC(z)^2 - C(z) + 1 = 0\).
- D-finite if \(A(z)\) satisfies a linear ODE with polynomial coefficients. Every algebraic series is D-finite.
- D-algebraic if \(A(z)\) satisfies a polynomial ODE. Example: \(\tan(z)\) satisfies \(y'(z) = y(z)^2 + 1\).
The third property is a fast algorithm for computing terms. Knowing a recurrence of order \(r\) means the \(N\)-th term can be computed in \(O(r^3 \log N)\) operations by matrix exponentiation — the key idea is that computing the \(N\)-th Fibonacci number by repeated matrix squaring takes only \(O(\log N)\) matrix multiplications. In practice, this means computing the millionth Fibonacci number (which has over 200,000 decimal digits) takes about 10 milliseconds.
The fourth property is an asymptotic formula — a description of \(a_n\) for large \(n\) that is accurate enough for practical purposes. Classical asymptotics include Stirling’s approximation \(n! \sim (n/e)^n \sqrt{2\pi n}\) (1730), the Hardy–Ramanujan formula for partitions, and the Quicksort analysis \(c_n \sim 2n \log n\). The notation \(a_n \sim b_n\) means \(a_n/b_n \to 1\) as \(n \to \infty\). Asymptotics are especially important for algorithm analysis, where exact formulas are often unavailable but the growth rate determines practical feasibility.
The fifth property is a limit theorem — a statement that a combinatorial parameter, properly normalized, converges in distribution to a standard random variable (most often a Gaussian). The birthday paradox, the normal distribution of inversions in a random permutation, and the central limit theorem for the number of cycles in a random permutation are all examples. Flajolet catalogued 26 distinct limit laws in combinatorics. These distributional results often require the machinery of analytic combinatorics in several variables.
Coefficient Extraction and Notation
A notation that will appear constantly throughout the course is coefficient extraction: \([z^n] f(z)\) denotes the coefficient of \(z^n\) in the formal power series \(f(z)\). So if \(f(z) = \sum_{k \geq 0} a_k z^k\) then \([z^n] f(z) = a_n\). This notation is powerful because it allows algebraic manipulations on generating functions to be stated as identities for coefficients.
The key observation — which Melczer calls “perhaps the most crucial for enumerative combinatorics” — is that the counting sequence of a class is often best understood not term by term, but as the sequence of coefficients of a single generating function with its own algebraic identity. Computing \(a_{100}\) directly from the definition might require summing over millions of objects; reading it off from a rational generating function requires only partial fractions and a formula for geometric series.
Chapter 2: Bijective Methods
Sets and Bijections as the Atomic Units
Counting problems reduce, at their most elementary level, to comparing the sizes of finite sets. The fundamental principle is: two finite sets have the same cardinality if and only if there exists a bijection between them — a function \(f: S \to T\) that is both injective (no two elements of \(S\) map to the same element of \(T\)) and surjective (every element of \(T\) is hit). This tautology becomes powerful when we can exhibit a concrete bijection between a set whose size we want to know and a set whose size we already know.
A function \(f: S \to T\) is a surjection if for every \(t \in T\) there exists \(s \in S\) with \(f(s) = t\). It is an injection if \(f(s) = f(s')\) implies \(s = s'\). A bijection is both. If \(f: S \to T\) is a bijection, the inverse function \(f^{-1}: T \to S\) is well-defined by \(f^{-1}(t) = s\) if and only if \(f(s) = t\); importantly, \(f^{-1}\) is itself a bijection. The standard test for bijectivity is: \(f: S \to T\) is a bijection if and only if there exist functions \(g, h: T \to S\) such that \(g \circ f = \mathrm{id}_S\) and \(f \circ h = \mathrm{id}_T\), in which case \(g = h = f^{-1}\).
The following fiber-counting principle (sometimes called the “bijection principle”) is fundamental.
This proposition is the workhorse behind many classical counting formulas, because it reduces counting \(S\) to counting \(T\) (which may be easier) and counting the fibers.
Inclusion–Exclusion
The simplest union formula is \(|A \cup B| = |A| + |B| - |A \cap B|\): the sum double-counts the intersection, so we subtract it once. The general version is the Principle of Inclusion–Exclusion.
The classic application is counting derangements — permutations \(\sigma \in S_n\) with no fixed point, \(\sigma(i) \neq i\) for all \(i\). Let \(A_i\) be the set of permutations fixing \(i\). Then \(|A_S| = (n - |S|)!\) for \(\emptyset \neq S \subseteq [n]\), and inclusion–exclusion gives
\[ d_n = n! \sum_{k=0}^n \frac{(-1)^k}{k!} \xrightarrow{n \to \infty} \frac{n!}{e}. \]This says that as \(n \to \infty\), roughly \(1/e \approx 36.8\%\) of all permutations are derangements. A beautiful corollary: the sequence \(d_n/n!\) converges, and it does so extremely fast — already at \(n = 7\) the approximation is accurate to seven decimal places.
Another application is Euler’s totient function. For \(n = p_1^{c_1} \cdots p_m^{c_m}\), the number of integers \(1 \leq b \leq n\) with \(\gcd(b,n) = 1\) is
\[ \varphi(n) = n \prod_{i=1}^m \left(1 - \frac{1}{p_i}\right). \]Permutations and the Lehmer Code
The number of permutations of \([n] = \{1, \ldots, n\}\) is \(n!\). The standard proof proceeds by constructing an explicit bijection — the Lehmer code — between \(S_n\) and the product set \(Q_n = [n] \times [n-1] \times \cdots \times [1]\), whose cardinality is \(n!\) by the product rule.
Given \(\sigma = a_1 a_2 \cdots a_n \in S_n\), define \(r_i = |\{j : j > i, a_j < a_i\}|\) — the number of elements to the right of position \(i\) that are smaller than \(a_i\). This counts inversions contributed at position \(i\). The map \(I_n(\sigma) = (1 + r_1, \ldots, 1 + r_n)\) lands in \(Q_n\). The inverse map \(J_n: Q_n \to S_n\) reconstructs \(\sigma\): at step \(i\), place the \(h_i\)-th smallest element from the remaining values as position \(i\) of \(\sigma\). One can verify that \(J_n \circ I_n = \mathrm{id}\) and \(I_n \circ J_n = \mathrm{id}\), establishing the bijection.
This is not just a bookkeeping device. The quantity \(\mathrm{inv}(\sigma) = \sum_i r_i\) is the inversion number of \(\sigma\), the total number of pairs \((i,j)\) with \(i < j\) and \(\sigma(i) > \sigma(j)\). The Lehmer code proves more than that \(|S_n| = n!\) — it identifies inversions as the natural statistic encoding the “disorder” of a permutation, and this identification is the key to the q-analogue story in Part III.
Subsets and Binomial Coefficients
The bijective proof exhibits a bijection \(\Psi_{n,k}: S_n \to B(n,k) \times S_k \times S_{n-k}\), where \(B(n,k)\) is the set of \(k\)-element subsets. Given \(\sigma = a_1 \cdots a_n\), let \(A = \{a_1, \ldots, a_k\}\) (the first \(k\) values), let \(\beta = P_k(a_1 \cdots a_k)\) (the permutation of \(S_k\) recording the relative order of the first \(k\) values), and let \(\gamma = P_{n-k}(a_{k+1} \cdots a_n)\). The three-component output \((A, \beta, \gamma)\) uniquely determines \(\sigma\), and every triple arises from exactly one \(\sigma\). Hence
\[ n! = |S_n| = |B(n,k)| \cdot k! \cdot (n-k)! \]which gives \(|B(n,k)| = \binom{n}{k}\). This approach, while more involved than the usual proof by the multiplication principle, reveals the factorial structure underlying the binomial coefficients and will be mirrored exactly in the q-analogue theory.
Lattice Paths
A lattice path from \((0,0)\) to \((a,b)\) is a sequence of unit steps either East \((1,0)\) or North \((0,1)\). There are exactly \(\binom{a+b}{b}\) such paths: each path corresponds bijectively to the subset of positions (out of \(a+b\) total steps) at which the path steps North — a \(b\)-element subset of \([a+b]\).
This geometric interpretation immediately suggests bijective proofs of binomial identities. For example, the Vandermonde identity \(\binom{m+n}{k} = \sum_{j=0}^k \binom{m}{j}\binom{n}{k-j}\) follows from partitioning paths from \((0,0)\) to \((m+n,k)\) by where they cross the vertical line \(x = m\). The identity \(\sum_{j=0}^b \binom{a+j}{j} = \binom{a+1+b}{b}\) follows from partitioning paths in \(L(a+1,b)\) according to which edge \((a, j) \to (a+1, j)\) they use.
The lattice path model is extremely flexible. Many generalizations — paths that stay above the \(x\)-axis (Dyck paths), paths with multiple step types (Motzkin paths, Schröder paths), paths in higher-dimensional grids — each have their own generating functions and asymptotic behaviors, and each appears naturally in enumerating trees, brackets, polygon dissections, and non-crossing partitions. The Catalan numbers, in particular, count Dyck paths of semilength \(n\) (paths from \((0,0)\) to \((2n,0)\) using steps \((1,1)\) and \((1,-1)\) that never go below the \(x\)-axis), and this is just one of over 200 known Catalan-counted structures.
Stirling Numbers of the Second Kind
A set partition of \([n]\) is a collection \(\pi = \{B_1, \ldots, B_k\}\) of nonempty disjoint subsets — called blocks — whose union is \([n]\). The number of set partitions of \([n]\) into exactly \(k\) blocks is the Stirling number of the second kind \(S(n,k)\), sometimes written \(\left\{n \atop k\right\}\).
A simple recursive argument gives the recurrence \(S(n,k) = S(n-1,k-1) + k\cdot S(n-1,k)\): the element \(n\) either forms its own singleton block (contributing \(S(n-1,k-1)\) partitions) or joins one of the \(k\) blocks of a partition of \([n-1]\) into \(k\) blocks (contributing \(k \cdot S(n-1,k)\) partitions).
The exponential formula (which we prove in Chapter 9 using EGFs) gives the beautiful closed form
\[ S(n,k) = \frac{1}{k!} \sum_{j=0}^k (-1)^{k-j} \binom{k}{j} j^n. \]This is an inclusion–exclusion formula over surjections from \([n]\) to \([k]\), normalized by \(k!\) because we want unordered blocks.
Part II: The Toolbox — Generating Functions
Chapter 3: Formal Power Series and Their Algebra
Rings, Fields, and the Need for Formalism
The generating function approach encodes a counting sequence \((a_n)_{n \geq 0}\) into a single formal expression \(A(z) = \sum_{n \geq 0} a_n z^n\) and then manipulates this expression algebraically. The word “formal” here is crucial: we are not asking whether the series converges for any particular value of \(z\). The series is simply an algebraic object — a sequence of coefficients — and addition, subtraction, multiplication, and (under appropriate conditions) division of formal series are all defined term-by-term.
To set up this machinery cleanly, recall that a ring is a set \(R\) equipped with addition and multiplication satisfying the usual axioms (commutativity, associativity, distributivity, existence of \(0\) and \(1\)). Familiar rings include \(\mathbb{Z}, \mathbb{Q}, \mathbb{R}, \mathbb{C}\), polynomials \(R[z]\), and matrices \(M_n(R)\). A field is a commutative ring where every nonzero element has a multiplicative inverse (\(\mathbb{Q}, \mathbb{R}, \mathbb{C}\) are fields; \(\mathbb{Z}\) is not).
Formal Power Series: \(R[[z]]\)
Fix an integral domain \(R\) (a commutative ring with no zero-divisors). The ring of formal power series \(R[[z]]\) consists of all expressions \(F = \sum_{n \geq 0} f_n z^n\) where \(f_n \in R\), with:
- Addition: \(\left(\sum f_n z^n\right) + \left(\sum g_n z^n\right) = \sum (f_n + g_n) z^n\)
- Multiplication: \(\left(\sum f_n z^n\right)\left(\sum g_n z^n\right) = \sum_n \left(\sum_{k=0}^n f_k g_{n-k}\right) z^n\) (Cauchy product)
The valuation \(\mathrm{val}(F)\) of a nonzero series \(F\) is the smallest \(n\) with \(f_n \neq 0\). A key algebraic fact is:
The “if” direction is constructive: if \(f_0\) is invertible, we can solve recursively for the coefficients of \(G = F^{-1}\). From \(\sum_{k} f_k g_{n-k} = [n=0]\), we get \(g_0 = f_0^{-1}\) and \(g_n = -f_0^{-1} \sum_{k=1}^n f_k g_{n-k}\) for \(n \geq 1\).
This lemma is why the geometric series formula \(\frac{1}{1-z} = 1 + z + z^2 + \cdots\) is valid as a formal power series identity: the constant term of \(1-z\) is \(1\), which is invertible in any ring. No convergence is needed.
The Metric Topology and Formal Convergence
\(R[[z]]\) can be made into a metric space. Define \(d(F,G) = 2^{-\mathrm{val}(F-G)}\) for \(F \neq G\) and \(d(F,F) = 0\). Under this metric, a sequence of series converges if and only if each coefficient stabilizes: \(F_m \to F\) if for each \(n\) there exists \(M_n\) such that \([z^n]F_m = [z^n]F\) for all \(m \geq M_n\). This notion of convergence — sometimes called formal convergence or \(z\)-adic convergence — is completely independent of whether the series converge as analytic functions. In particular, the formula
\[ \frac{1}{1-z} = 1 + z + z^2 + \cdots \]holds as a formal identity for any element \(z\) in \(R[[z]]\) with positive valuation, even though \(1/(1-z)\) diverges at \(z = 1\) as an analytic function.
Formal Calculus
Differentiation and integration of formal power series are defined term-by-term. For \(F = \sum_{n \geq 0} f_n z^n\):
\[ F'(z) = \frac{d}{dz} F(z) = \sum_{n \geq 1} n f_n z^{n-1}, \qquad \int_0^z F(t)\,dt = \sum_{n \geq 0} \frac{f_n}{n+1} z^{n+1}. \]The standard rules hold formally: \((FG)' = F'G + FG'\) (product rule), \((F \circ G)' = (F' \circ G) \cdot G'\) for \(\mathrm{val}(G) \geq 1\) (chain rule). Formal composition \(F \circ G\) is defined when \(\mathrm{val}(G) \geq 1\), ensuring the coefficient of \(z^n\) in \(F \circ G\) involves only finitely many coefficients of \(G\).
A fundamental meta-theorem states that any algebraic identity for convergent power series (such as \(\exp(z+w) = \exp(z)\exp(w)\)) remains valid as a formal power series identity. This is because such identities are polynomial identities on finitely many coefficients, which are preserved when we strip out convergence considerations.
Coefficient Extraction Toolkit
Several standard coefficient extraction formulas recur throughout the course:
Geometric series: \([z^n] \frac{1}{1-az} = a^n\), or more generally \([z^n] \frac{1}{(1-z)^r} = \binom{n+r-1}{r-1}\).
Negative binomial theorem: For any \(\alpha \in \mathbb{Q}\) and \(|z| < 1\) (or formally for any integral domain),
\[ (1+z)^\alpha = \sum_{n \geq 0} \binom{\alpha}{n} z^n, \quad \text{where } \binom{\alpha}{n} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}. \]Setting \(\alpha = -r\) for \(r \in \mathbb{N}\) gives \([z^n](1-z)^{-r} = \binom{n+r-1}{n}\).
Logarithm and exponential: Formally, \(\exp(z) = \sum z^n/n!\) and \(\log(1+z) = \sum_{n \geq 1} (-1)^{n-1} z^n / n\). These are inverses: \(\exp(\log(1+z)) = 1+z\).
Chapter 4: The Symbolic Method — Building Classes from Pieces
Specifications and Admissible Constructions
The symbolic method is the systematic technique for translating a recursive or combinatorial description of a class into an algebraic equation for its generating function. The key idea is that certain operations on combinatorial classes correspond exactly to operations on generating functions, so that a structural description of \(\mathcal{A}\) automatically yields an equation for \(A(z)\).
An admissible construction is an operation \(\Phi\) on combinatorial classes such that the counting sequence of \(\Phi(\mathcal{A}_1, \ldots, \mathcal{A}_r)\) depends only on the counting sequences of the inputs — not on the specific internal structure of the objects. Sum, product, and sequence are admissible. More exotic constructions involving labels or symmetries can also be made admissible, and these lead to exponential generating functions (Chapter 9).
The atomic class \(\mathcal{Z}\) contains a single object of size 1, so \(Z(z) = z\). The neutral class \(\mathcal{E}\) contains a single object of size 0, so \(E(z) = 1\). These are the building blocks from which more complex classes are assembled.
Sum and Product
The disjoint union (sum) of classes \(\mathcal{C} = \mathcal{A} + \mathcal{B}\) takes objects from \(\mathcal{A}\) and \(\mathcal{B}\), treating them as distinct even if identical. The generating function is simply \(C(z) = A(z) + B(z)\).
The Cartesian product \(\mathcal{C} = \mathcal{A} \times \mathcal{B}\) consists of ordered pairs \((\alpha, \beta)\) with \(|(\alpha,\beta)| = |\alpha| + |\beta|\). Objects of size \(n\) are counted by \(\sum_{k=0}^n a_k b_{n-k} = [z^n] A(z)B(z)\), so \(C(z) = A(z)B(z)\).
A product of \(k\) copies of \(\mathcal{A}\) is written \(\mathcal{A}^k\), with generating function \(A(z)^k\).
Sequences
For a class \(\mathcal{A}\) with no objects of size 0 (i.e., \(A(0) = 0\)), the sequence class is
\[ \mathrm{SEQ}(\mathcal{A}) = \mathcal{E} + \mathcal{A} + \mathcal{A}^2 + \mathcal{A}^3 + \cdots \]consisting of all finite (possibly empty) ordered tuples of objects from \(\mathcal{A}\). Since \(A(0) = 0\), the size of a tuple of length \(k\) is the sum of its components’ sizes, and the generating function converges formally:
\[ \mathrm{SEQ}(\mathcal{A})(z) = \frac{1}{1-A(z)}. \]This requires \(\mathrm{val}(A) \geq 1\) for the formal invertibility of \(1 - A(z)\). Variants include \(\mathrm{SEQ}_{\geq r}\) (sequences of length at least \(r\)), \(\mathrm{SEQ}_{\leq r}\) (at most \(r\)), and parity-restricted sequences:
\[ \mathrm{SEQ}_{\mathrm{even}}(\mathcal{A}): \quad \frac{1}{1-A(z)^2}, \qquad \mathrm{SEQ}_{\mathrm{odd}}(\mathcal{A}): \quad \frac{A(z)}{1-A(z)^2}. \]Rational Specifications
A specification that uses only sum, product, and sequence (with atoms \(\mathcal{Z}\) and \(\mathcal{E}\)) and contains no cycles in its dependency graph is called a rational specification. Every class arising from a rational specification has a rational generating function.
The archetypal example is compositions: an integer composition of \(n\) is an ordered tuple of positive integers summing to \(n\). Taking \(\mathcal{P} = \mathrm{SEQ}_{\geq 1}(\mathcal{Z})\) (positive integers, a single atom or more), the class of compositions is \(\mathcal{C} = \mathrm{SEQ}(\mathcal{P})\), giving
\[ P(z) = \frac{z}{1-z}, \quad C(z) = \frac{1}{1-P(z)} = \frac{1-z}{1-2z}. \]The coefficient \([z^n]C(z) = 2^{n-1}\) for \(n \geq 1\) confirms there are \(2^{n-1}\) compositions of any \(n \geq 1\).
Regular languages (binary strings defined by regular expressions) are another family arising from rational specifications. A block decomposition of a binary string separates it into maximal runs of the same bit. The language of binary strings without two consecutive 1-bits is specified as \(\mathcal{L} = (\mathcal{E} + \mathcal{Z}_1)(\mathcal{Z}_0^* \mathcal{Z}_1)^* \mathcal{Z}_0^*\), giving \(L(z) = (1+z)/(1-z-z^2)\) — the Fibonacci generating function!
Stirling numbers of the second kind also admit a rational specification. A surjection \([n] \to [k]\) corresponds to a string over alphabet \([k]\) where each letter appears at least once and each first occurrence is in order. This gives the generating function
\[ S_k(z) = \frac{z^k}{(1-z)(1-2z)\cdots(1-kz)}, \]from which the closed form \(S(n,k) = k! [z^n] S_k(z)\) follows via partial fractions.
Recursive Specifications and Algebraic Functions
When a specification contains cycles — where the definition of a class refers back to itself — the generating function satisfies a polynomial equation and is typically algebraic (but not rational). The canonical example is binary trees.
A planar binary tree is either a leaf (empty tree) or an internal node with a left and a right subtree, each of which is a binary tree. This gives the specification \(\mathcal{B} = \mathcal{E} + \mathcal{Z} \times \mathcal{B} \times \mathcal{B}\), translating to
\[ B(z) = 1 + z B(z)^2. \]Solving the quadratic: \(B(z) = \frac{1 - \sqrt{1-4z}}{2z}\) (taking the root with \(B(0) = 1\)). Extracting coefficients gives the Catalan numbers:
\[ [z^n] B(z) = C_n = \frac{1}{n+1}\binom{2n}{n}. \]The first several: \(1, 1, 2, 5, 14, 42, 132, 429, \ldots\) The Catalan numbers count an extraordinary variety of objects — Dyck paths, triangulations of polygons, non-crossing partitions, monotone lattice paths, full binary trees, and hundreds more. No other sequence in combinatorics recurs in as many distinct guises.
For rooted planar trees (each node can have any number of children), the specification is \(\mathcal{T} = \mathcal{Z} \times \mathrm{SEQ}(\mathcal{T})\), giving \(T(z) = z/(1-T(z))\), hence \(T(z) = \frac{1-\sqrt{1-4z}}{2}\) and \([z^n]T(z) = \frac{1}{n}\binom{2n-2}{n-1}\) for \(n \geq 1\).
Set, Multiset, and Powerset Constructions
Beyond sequences, two important unordered constructions arise. The multiset construction \(\mathrm{MSET}(\mathcal{A})\) consists of all finite unordered collections (with repetition) of objects from \(\mathcal{A}\):
\[ \mathrm{MSET}(\mathcal{A})(z) = \exp\left(\sum_{k \geq 1} \frac{A(z^k)}{k}\right). \]The powerset construction \(\mathrm{PSET}(\mathcal{A})\) consists of finite subsets (no repetition):
\[ \mathrm{PSET}(\mathcal{A})(z) = \exp\left(\sum_{k \geq 1} \frac{(-1)^{k-1}}{k} A(z^k)\right). \]These formulas are consequences of the Polya–Burnside cycle index theory. They are especially useful for enumerating unlabeled graphs, unlabeled trees, and other objects where two structures are considered the same if one can be obtained from the other by a permutation of labels.
The pointed class \(\mathcal{A}' = \text{POINT}(\mathcal{A})\) marks a distinguished atom in each object of \(\mathcal{A}\):
\[ A'(z) = z \cdot \frac{d}{dz} A(z). \]This is because marking an atom in a size-\(n\) object gives \(n \cdot a_n\) pointed objects of size \(n\), and \(n[z^n]A(z) = [z^n] z A'(z)\). The pointing operation is a formal derivative, and it enables enumeration of rooted structures from unrooted ones.
Chapter 5: The Lagrange Implicit Function Theorem
The Problem of Implicit Equations
The symbolic method often produces equations of the form \(T(z) = z \cdot \Phi(T(z))\), where \(\Phi\) is a given power series with \(\Phi(0) \neq 0\). Binary trees satisfy \(B(z) = 1 + z B(z)^2\) (so \(\Phi(u) = u^2\) and the substitution is slightly different), and complete \(d\)-ary trees satisfy \(T_d(z) = z + z T_d(z)^d\). When \(d \geq 5\), the equation \(z u^d - u + z = 0\) has no solution in radicals — yet we can still extract coefficients explicitly. This is what the Lagrange Implicit Function Theorem (LIFT) accomplishes.
Moreover, for any \(F(u) \in D[[u]]\),
\[ [z^n] F(R(z)) = \frac{1}{n} [u^{n-1}] F'(u) \cdot \Phi(u)^n \quad \text{for all } n \geq 1. \]This formula — the Lagrange inversion formula — is remarkable: it computes the coefficient of \(z^n\) in any function of \(R(z)\) without ever explicitly solving for \(R(z)\). The right-hand side is a finite computation: it requires only the coefficient of \(u^{n-1}\) in a product of formal power series.
The proof uses Laurent series (formal series with finitely many negative powers of \(z\)) and the residue operation \(\mathrm{res}_z F = [z^{-1}]F\). Key steps are: (1) the residue of any formal derivative vanishes; (2) a change-of-variables lemma relating residues under substitution. The proof is elegant but non-trivial; a combinatorial proof due to Raney is in Wagner’s notes.
Applications of LIFT
Catalan numbers: For binary trees \(B = 1 + zB^2\), we have \(z = (B-1)/B^2\), so \(R = B-1\) and \(\Phi(u) = (1+u)^{-2}\). Actually it is cleaner to note \(B(z) = 1 + R(z)\) where \(R\) satisfies \(R = z(1+R)^2\). By LIFT with \(\Phi(u) = (1+u)^2\):
\[ [z^n]R(z) = \frac{1}{n}[u^{n-1}](1+u)^{2n} = \frac{1}{n}\binom{2n}{n-1} = \frac{1}{n+1}\binom{2n}{n} = C_n. \]Trees with restricted degrees: For \(d\)-ary trees (exactly \(d\) children per internal node, except leaves), \(T = z + zT^d\), so \(\Phi(u) = 1 + u^{d-1}\) (after adjustment). LIFT gives \([z^n]T(z) = \frac{1}{n}\binom{dn-n+n}{n-1} / \cdots\) — in general a neat closed form regardless of \(d\).
Lambert \(W\) function: The equation \(W e^W = z\) satisfies \(z = W / e^W\), so \(R = W\) and \(\Phi(u) = e^u\). LIFT gives \([z^n]W(z) = \frac{1}{n}[u^{n-1}]e^{nu} = \frac{n^{n-1}}{n!}\), confirming Cayley’s formula: the number of rooted labeled trees on \(n+1\) vertices is \((n+1)^{n-1}\).
The Kernel Method
A related technique for functional equations is the kernel method, which handles equations of the form \(A(z) = K(z) A(z) + B(z)\) or more generally bivariate equations \(A(z, u) = Q(z, u) A(z, u) + P(z, u)\). One solves for the “kernel” — a substitution \(u = u_0(z)\) that annihilates the coefficient of \(A\) — and then uses this to determine the full generating function. The kernel method is particularly powerful for lattice path problems, where steps can be north, south, east, or west with various weights and boundary conditions.
Part III: Extensions and Refinements
Chapter 6: Parameters and Multivariate Generating Functions
Tracking Additional Information
So far, generating functions have encoded only the count \(a_n = |\mathcal{A}_n|\). But often we want richer information: the distribution of the number of cycles in a random permutation, the average length of the longest increasing subsequence in a random tree, or the variance of the number of fixed points in a derangement. All of these involve a parameter — a function \(p: \mathcal{A} \to \mathbb{Z}\) defined on the objects — and the question is: what does the distribution of \(p\) look like on objects of size \(n\)?
A bivariate generating function \(A(u, z) = \sum_{\alpha \in \mathcal{A}} u^{p(\alpha)} z^{|\alpha|}\) encodes both size and parameter simultaneously. Setting \(u = 1\) recovers the ordinary generating function \(A(1, z) = A(z)\). The coefficient \([z^n]A(u, z)\) is a polynomial in \(u\) whose coefficient of \(u^k\) is the number of size-\(n\) objects with parameter value \(k\).
The admissible constructions extend cleanly to bivariate generating functions. For disjoint union, sum; for product, the parameter usually adds componentwise; for sequences, parameters accumulate. Formally: if \(p: \mathcal{A} \times \mathcal{B} \to \mathbb{Z}\) is defined as \(p(\alpha, \beta) = p_\mathcal{A}(\alpha) + p_\mathcal{B}(\beta)\), then the bivariate GF of the product class is \(A(u,z) \cdot B(u,z)\).
Expected Values and Variance
Once we have \(A(u, z)\), computing moments is a matter of differentiation with respect to \(u\).
The variance is \(\mathrm{Var}_n[p] = \mathbb{E}_n[p^2] - (\mathbb{E}_n[p])^2\), where the second moment involves \(\partial_u^2 A\) as well.
In practice, one computes \(A_u(1,z) = \frac{\partial}{\partial u} A(u,z)\big|_{u=1}\) as a univariate generating function and then extracts its \(n\)-th coefficient.
Examples
Zeros in binary strings: Let \(A(u, z) = \mathrm{SEQ}(\mathcal{Z}_1 + u\mathcal{Z}_0)\) (sequences of ones and \(u\)-weighted zeros). Then \(A(u, z) = 1/(1-z-uz)\). Setting \(u=1\) gives \(1/(1-2z)\) with \([z^n] = 2^n\), the count of binary strings of length \(n\). We have \(A_u(1,z) = z/(1-2z)^2\), so \(\mathbb{E}_n[\text{zeros}] = n/2\). By symmetry this is not surprising, but the bivariate approach would also give \(\mathrm{Var}_n[\text{zeros}] = n/4\) — confirming the Binomial\((n, 1/2)\) distribution.
Leaves in binary trees: A leaf in a binary tree is an external node. The bivariate GF satisfies \(B(u,z) = u + z B(u,z)^2\) (substituting \(1 \mapsto u\) marks leaves). Setting \(u=1\): \(B_u(1,z)\) satisfies a linear equation derived by differentiating implicitly. One finds \(\mathbb{E}_n[\text{leaves}] = (n+1)/2\) for binary trees with \(n\) internal nodes — exactly half the total nodes are leaves.
Harmonic numbers in permutations: Via EGFs (Chapter 9), the expected number of cycles in a uniformly random permutation of length \(n\) is exactly the harmonic number \(H_n = \sum_{k=1}^n 1/k \sim \ln n\). This is a beautiful instance where a parameter with a simple combinatorial definition (count the cycles) has an exact mean expressible in terms of a classical special function.
Chebyshev’s Inequality and Concentration
Once we know \(\mathbb{E}_n[p]\) and \(\mathrm{Var}_n[p]\), Chebyshev’s inequality gives tail bounds: \(\Pr[|p - \mathbb{E}_n[p]| \geq t] \leq \mathrm{Var}_n[p]/t^2\). When the variance grows much more slowly than \((\mathbb{E}_n[p])^2\) — for instance, if \(\mathrm{Var}_n[p] = O(\mathbb{E}_n[p])\) — the parameter concentrates around its mean. This is the combinatorial analogue of the law of large numbers, and it guarantees that the typical object has a very predictable value of \(p\).
Chapter 7: q-Analogues and Polynomial Refinements
The Idea of a q-Analogue
A q-analogue of a classical identity or combinatorial count is a polynomial (or formal power series) in a new variable \(q\) such that setting \(q = 1\) recovers the classical result. The variable \(q\) typically tracks some additional parameter — most often the number of inversions in a permutation or the area under a lattice path — so the q-analogue is not just a formal generalization but a counting refinement.
The q-integer for \(k \in \mathbb{N}\) is
\[ [k]_q = 1 + q + q^2 + \cdots + q^{k-1} = \frac{1-q^k}{1-q} \quad (q \neq 1). \]At \(q=1\), \([k]_1 = k\). The q-factorial is \([n]!_q = [n]_q [n-1]_q \cdots [1]_q [0]_q\), with \([0]!_q = 1\). The q-binomial coefficient (also called the Gaussian binomial coefficient) is
\[ \binom{n}{k}_q = \frac{[n]!_q}{[k]!_q \cdot [n-k]!_q}. \]At \(q=1\), these recover the ordinary binomial coefficient \(\binom{n}{k}\).
Inversions and q-Factorials
An inversion of a permutation \(\sigma = a_1 \cdots a_n\) is a pair \((i,j)\) with \(i < j\) and \(a_i > a_j\). The total number of inversions is \(\mathrm{inv}(\sigma) = \sum_i r_i\) (using the Lehmer code notation from Chapter 2). The q-factorial theorem states:
The proof uses the Lehmer code bijection. Since \(\mathrm{inv}(\sigma) = \sum_{i=1}^n r_i\) and \((r_1, \ldots, r_n) \in [0] \times [1] \times \cdots \times [n-1]\) ranges over all choices independently, the generating function factors:
\[ \sum_\sigma q^{\mathrm{inv}(\sigma)} = \prod_{i=0}^{n-1} \sum_{r=0}^i q^r = \prod_{i=1}^n [i]_q = [n]!_q. \]q-Binomial Coefficients and Subspaces
The q-binomial coefficient \(\binom{n}{k}_q\) has a beautiful combinatorial interpretation over finite fields. Let \(\mathbb{F}_q\) be the field with \(q\) elements (where \(q\) is a prime power). The number of \(k\)-dimensional subspaces of the \(n\)-dimensional vector space \(\mathbb{F}_q^n\) is exactly \(\binom{n}{k}_q\). As \(q \to 1\), a \(k\)-dimensional subspace “becomes” a \(k\)-element subset (of a basis), recovering \(\binom{n}{k}\).
The q-binomial coefficient also counts lattice paths with an area statistic:
Setting \(q = 1\) recovers \(|L(a,b)| = \binom{a+b}{b}\). The proof proceeds by observing that the area statistic satisfies the same Pascal-type recurrence as the q-binomial coefficients: the top or left step of a path splits the area naturally.
The q-Binomial Theorem
The left side is the generating function of subsets of \([n]\), weighted by \(q^{\sum_{i \in S} i}\). The right side groups by subset size \(k\) and uses the q-binomial coefficient as the weighted count. Setting \(q=1\) gives \((1+z)^n = \sum_{k=0}^n \binom{n}{k} z^k\), the ordinary binomial theorem.
The q-Binomial theorem is one of a family of q-series identities — most famously Euler’s identities for q-products and the Jacobi triple product — that arise throughout the theory of integer partitions, modular forms, and number theory.
q-Analogue Perspective
The role of the variable \(q\) is to turn a combinatorial count into a polynomial that “remembers” how objects are distributed by a secondary statistic. This perspective is extremely powerful: it often happens that the q-analogue is just as natural as the classical count, and the limiting case \(q \to 1\) is actually the harder formula to prove directly. The q-theory also connects combinatorics to representation theory (Hecke algebras, symmetric functions) and physics (quantum groups, statistical mechanics partition functions), making it a crossroads of modern mathematics.
Chapter 8: Integer Partitions
Definitions and Conventions
An integer partition of \(n\) is a way of writing \(n\) as an unordered sum of positive integers, the parts. By convention, parts are written in weakly decreasing order: \(\lambda = (\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_\ell > 0)\) with \(|\lambda| = \sum \lambda_i = n\). We write \(\ell(\lambda)\) for the number of parts and \(p(n)\) for the total number of partitions of \(n\). By convention, \(p(0) = 1\) (the empty partition).
The values begin \(p(0) = 1, p(1) = 1, p(2) = 2, p(3) = 3, p(4) = 5, p(5) = 7, p(6) = 11, \ldots\) The sequence grows rapidly: \(p(100) = 190{,}569{,}292\) and \(p(1000) \approx 2.4 \times 10^{31}\).
The Partition Generating Function
The proof is elegant: \(\frac{1}{1-z^k} = 1 + z^k + z^{2k} + \cdots\) is the generating function for the number of times part \(k\) appears, ranging over \(0, 1, 2, \ldots\). Forming the product over all part sizes \(k = 1, 2, 3, \ldots\) independently selects how many copies of each part to include — precisely the data of a partition. The product converges formally since each factor contributes finitely to each coefficient.
More generally, one can track both size and number of parts via a bivariate generating function:
\[ \sum_{\lambda} u^{\ell(\lambda)} z^{|\lambda|} = \prod_{k \geq 1} \frac{1}{1-uz^k}. \]For partitions with parts restricted to a set \(S \subseteq \mathbb{Z}_{>0}\), the generating function is \(\prod_{k \in S} \frac{1}{1-z^k}\). For partitions with distinct parts (each part at most once), the generating function is \(\prod_{k \geq 1} (1 + z^k)\).
Euler’s Identity
One of the most striking facts about partitions is a pair of equalities that are far from obvious at first sight:
The generating function proof: the product for odd-part partitions is \(\prod_{k \geq 0} \frac{1}{1-z^{2k+1}}\), while the product for distinct-part partitions is \(\prod_{k \geq 1} (1+z^k)\). Using \(1+z^k = \frac{1-z^{2k}}{1-z^k}\), the distinct-part product telescopes:
\[ \prod_{k \geq 1} (1+z^k) = \prod_{k \geq 1} \frac{1-z^{2k}}{1-z^k} = \frac{\prod_{k \geq 1}(1-z^{2k})}{\prod_{k \geq 1}(1-z^k)} = \frac{1}{\prod_{k \geq 0}(1-z^{2k+1})} = \prod_{k \geq 0} \frac{1}{1-z^{2k+1}}. \]The intermediate cancellation between even-\(k\) factors in numerator and denominator is the key step. Bijective proofs of this identity also exist and are quite illuminating.
Ferrers Diagrams and Conjugation
A partition \(\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_\ell)\) is visualized as a Ferrers diagram — an array of dots (or boxes, giving a Young diagram) with \(\lambda_i\) dots in row \(i\). The conjugate partition \(\lambda'\) has \(\lambda'_j\) equal to the number of rows of \(\lambda\) with at least \(j\) parts. Geometrically, \(\lambda'\) is the Ferrers diagram of \(\lambda\) reflected over the main diagonal.
Conjugation gives a bijection between partitions with at most \(k\) parts and partitions with maximum part at most \(k\), establishing: for all \(n\) and \(k\), the number of partitions of \(n\) into at most \(k\) parts equals the number of partitions of \(n\) with maximum part at most \(k\).
Durfee Squares and the Euler Identity
The Durfee square of \(\lambda\) is the largest square that fits in the upper-left corner of the Ferrers diagram. Its side length \(d(\lambda)\) is the largest \(d\) with \(\lambda_d \geq d\). Decomposing a partition by its Durfee square (a \(d \times d\) square, a partition to the right with at most \(d\) columns, and a partition below with parts at most \(d\)) gives the identity:
\[ \prod_{j \geq 1} \frac{1}{1-z^j} = \sum_{d \geq 0} \frac{z^{d^2}}{\prod_{i=1}^d (1-z^i)^2}, \]known as the Euler-Durfee identity. The \(d^2\) in the exponent accounts for the \(d \times d\) square, and the two factors \(\prod(1-z^i)^{-1}\) account for the two “L-shaped” strips.
Partition Asymptotics: The Hardy–Ramanujan Formula
The asymptotic growth of \(p(n)\) is one of the triumphs of early 20th-century mathematics. Hardy and Ramanujan (1918) proved:
\[ p(n) \sim \frac{1}{4n\sqrt{3}} \exp\!\left(\pi\sqrt{\frac{2n}{3}}\right) \quad \text{as } n \to \infty. \]The proof uses the saddle-point method in complex analysis applied to the partition generating function. The sub-exponential factor \(1/(4n\sqrt{3})\) is known as well; Rademacher later gave an exact convergent series for \(p(n)\), not just asymptotics.
Chapter 9: Labelled Structures and Exponential Generating Functions
Why Labels?
The constructions of Part II work for unlabelled classes, where objects are considered up to structural identity. But many important combinatorial classes — permutations, set partitions, mappings, graphs with labeled vertices — consist of objects whose internal components carry labels from a set \([n] = \{1, \ldots, n\}\). Merging labeled objects requires a relabeling protocol to avoid conflicts, and this protocol introduces an extra factor that changes the generating function from ordinary to exponential.
The exponential generating function (EGF) of a sequence \((c_n)_{n \geq 0}\) is
\[ C(z) = \sum_{n \geq 0} c_n \frac{z^n}{n!}. \]The \(n!\) in the denominator is what makes EGFs natural for labeled structures: if \(\mathcal{C}\) is a labeled class with \(c_n\) objects of size \(n\), then labeling a size-\(n\) object requires choosing a bijection from its \(n\) atoms to \([n]\), and this is exactly the \(n!\) factor that appears.
The Labeled Product
For labeled classes \(\mathcal{A}\) and \(\mathcal{B}\), the labeled product \(\mathcal{A} \star \mathcal{B}\) consists of ordered pairs \((\alpha, \beta)\) where the combined labels are \([n]\), distributed between \(\alpha\) (which keeps a subset \(S \subseteq [n]\) of size \(|\alpha|\), relabeled to \([|\alpha|]\)) and \(\beta\) (keeping \([n] \setminus S\), relabeled). For each split of \(n\) labels into sets of sizes \(k\) and \(n-k\), there are \(\binom{n}{k}\) ways to choose which labels go to \(\alpha\). Hence:
\[ c_n = \sum_{k=0}^n \binom{n}{k} a_k b_{n-k} \implies C(z) = A(z) B(z) \]as EGFs! The binomial \(\binom{n}{k}\) is absorbed into the EGF denominator, so labeled products become ordinary products of EGFs.
Set and Cycle Constructions
For labeled classes, the natural unordered construction is the set:
\[ \mathrm{SET}(\mathcal{A}): \quad \text{unordered labeled sets of objects from } \mathcal{A} \implies C(z) = \exp(A(z)). \]The cycle construction arranges objects in a cyclic order (two arrangements are the same if one is a rotation of the other):
\[ \mathrm{CYC}(\mathcal{A}): \quad C(z) = \log\!\left(\frac{1}{1-A(z)}\right) = \sum_{k \geq 1} \frac{A(z)^k}{k}. \]These two constructions are inverses in the sense that \(\mathrm{SET}(\mathrm{CYC}(\mathcal{Z})) = \) permutations: a permutation of \([n]\) decomposes uniquely into a set of cycles. This gives the EGF of permutations as \(\exp(\log(1/(1-z))) = 1/(1-z)\), consistent with \(n! \cdot [z^n/(n!)] = n!\).
Key Examples
Derangements: A derangement is a permutation with no fixed points. Fixed points are 1-cycles, so \(\mathcal{D} = \mathrm{SET}(\mathrm{CYC}_{\geq 2}(\mathcal{Z}))\). The EGF is
\[ D(z) = \exp\!\left(\sum_{k \geq 2} \frac{z^k}{k}\right) = \exp(-z + \log(1/(1-z))) = \frac{e^{-z}}{1-z}. \]Extracting coefficients: \(d_n = n! [z^n] D(z) = n! \sum_{k=0}^n (-1)^k/k!\), recovering the formula from Chapter 2.
Set partitions (Bell numbers): A set partition of \([n]\) is a set of non-empty subsets. So \(\mathcal{S} = \mathrm{SET}(\mathrm{SET}_{\geq 1}(\mathcal{Z}))\), giving
\[ S(z) = \exp(e^z - 1). \]The Bell numbers \(B_n = n! [z^n] S(z)\) count set partitions: \(B_0 = 1, B_1 = 1, B_2 = 2, B_3 = 5, B_4 = 15, B_5 = 52, \ldots\)
Labeled rooted trees (Cayley’s formula): A labeled rooted tree on \([n]\) has the EGF \(T(z)\) satisfying \(T = z e^T\) (a tree is a root plus a set of labeled subtrees). By LIFT with \(\Phi(u) = e^u\):
\[ [z^n] T(z) = \frac{1}{n} [u^{n-1}] e^{nu} = \frac{n^{n-1}}{n!}. \]So the number of labeled rooted trees on \([n]\) is \(n! \cdot n^{n-1}/n! = n^{n-1}\). Counting unrooted labeled trees on \([n]\) gives \(n^{n-2}\) — this is Cayley’s formula, which LIFT proves effortlessly.
Mappings: A mapping \(f: [n] \to [n]\) is a functional directed graph on \([n]\) where each node has out-degree 1. Every weakly connected component of a functional graph consists of a single rho-shaped path: a cycle of trees hanging off it. Thus \(\mathcal{M} = \mathrm{SET}(\mathrm{CYC}(\mathcal{T}))\) where \(\mathcal{T}\) is the class of labeled rooted trees, giving \(M(z) = \exp(\log(1/(1-T(z)))) = 1/(1-T(z))\) and \(m_n = n^n\) (since there are \(n^n\) functions \([n] \to [n]\)).
Average Number of Cycles in Permutations
This is a beautiful application of bivariate EGFs. Let \(\pi(u,z) = 1/(1-z)^u\) be the bivariate EGF of permutations where \(u\) marks the number of cycles (arising from \(\mathrm{SET}(\mathrm{CYC}(\mathcal{Z}))\) with \(u\) marking each cycle). Differentiating in \(u\) at \(u = 1\):
\[ \partial_u \pi(u,z)\big|_{u=1} = \frac{\log(1/(1-z))}{1-z}. \]The \(n\)-th coefficient of this (times \(n!\)) is \(\sum_{k=1}^n 1/k = H_n\). So the average number of cycles in a random permutation of \([n]\) is the \(n\)-th harmonic number \(H_n \sim \ln n\).
Part IV: Asymptotics
Chapter 10: C-Finite Sequences and Rational Asymptotics
C-Finite Sequences
A sequence \((f_n)_{n \geq 0}\) is C-finite (or constant-recursive) if it satisfies a linear recurrence with constant coefficients:
\[ f_n + c_1 f_{n-1} + \cdots + c_r f_{n-r} = 0 \quad \text{for all } n \geq r, \]for some constants \(c_1, \ldots, c_r \in \mathbb{C}\) with \(c_r \neq 0\). The Fibonacci sequence, the number of compositions, the Catalan numbers modulo any fixed prime, and counting sequences of any rational language are C-finite. The connection to generating functions is:
The characteristic polynomial of the recurrence is \(\chi(\lambda) = \lambda^r + c_1 \lambda^{r-1} + \cdots + c_r\). If the roots of \(\chi\) (equivalently, the reciprocals of the poles of \(F\)) are \(\lambda_1, \ldots, \lambda_s\) with multiplicities \(m_1, \ldots, m_s\), then by partial fraction decomposition:
\[ f_n = \sum_{i=1}^s p_i(n) \lambda_i^{-n}, \]where each \(p_i\) is a polynomial of degree at most \(m_i - 1\). This is the C-finite coefficient theorem: every C-finite sequence is an exact combination of geometric progressions multiplied by polynomial factors.
Dominant Poles and Asymptotic Behavior
The dominant poles of \(F(z)\) are those with minimum modulus. If there is a unique dominant pole at \(z = w\) (a simple pole), then
\[ f_n \sim \frac{-G(w)}{H'(w)} \cdot w^{-(n+1)} = C \cdot \lambda_{\min}^n \cdot n^0, \]for the root \(\lambda_{\min} = 1/w\) of maximum modulus. If there are multiple dominant poles or higher-order poles, the asymptotics are modified accordingly.
For generating functions with non-negative coefficients, the Vivanti–Pringsheim theorem guarantees that the dominant singularity lies on the positive real axis. This is extremely useful: it means that for most combinatorial generating functions, finding the asymptotics reduces to analyzing a positive real root of the denominator.
Berstel’s theorem characterizes the possible exponential bases for \(\mathbb{N}\)-rational functions (those arising from rational specifications with natural number coefficients): the roots of \(H\) must satisfy \(\lambda^r = |\lambda|^r\), which means they lie on concentric circles of equal radius. This forces periodic behavior in the subleading terms.
Efficient Computation
A C-finite recurrence of order \(r\) allows the \(N\)-th term to be computed in \(O(r^3 \log N)\) operations by matrix exponentiation: expressing \((f_n, f_{n-1}, \ldots, f_{n-r+1})\) as a matrix power times the initial conditions. The matrix is \(r \times r\), and computing its \(N\)-th power takes \(O(\log N)\) matrix multiplications. For the Fibonacci sequence (order 2), this means computing \(F_{10^6}\) — a number with over 200,000 digits — in a fraction of a second.
Open Problems
Two deep open problems about C-finite sequences illustrate the frontier of the subject:
The Skolem problem asks: given a C-finite sequence, does it contain a zero? For sequences satisfying recurrences of order at most 4, this is decidable, but the general case (order \(\geq 5\)) remains open — a rare example of a natural computational problem whose decidability is unknown.
Ultimate positivity asks: given a C-finite sequence, does there exist \(N\) such that \(f_n > 0\) for all \(n > N\)? This is undecidable in general, and even for specific low-order cases the question is subtle.
Chapter 11: Analytic Combinatorics and Singularity Analysis
From Generating Functions to Asymptotics
The rational case of the previous chapter exploits the algebraic structure of the generating function. For algebraic or D-finite generating functions — those arising from recursive specifications, like the Catalan numbers — we need analytic tools from complex analysis. The bridge is the Cauchy coefficient formula:
\[ [z^n] F(z) = \frac{1}{2\pi i} \oint F(z) z^{-(n+1)}\,dz, \]where the integral is taken over a small circle around the origin. To extract the \(n\)-th coefficient, one deforms this contour and uses the residues of the integrand at its poles — or, for algebraic singularities, asymptotic contributions from branch points.
Radius of Convergence and Exponential Growth
For a power series \(F(z) = \sum a_n z^n\) with non-negative coefficients, the radius of convergence \(R\) is the modulus of the nearest singularity. The root test gives \(\limsup |a_n|^{1/n} = 1/R = \rho\), so \(a_n = \rho^n \cdot \theta(n)\) where \(\theta(n)\) is the subexponential growth factor determined by the type of singularity.
For combinatorial generating functions with non-negative coefficients, the Vivanti–Pringsheim theorem guarantees the dominant singularity lies on the positive real axis. Finding it reduces to finding the smallest positive real root of the denominator (for rational GFs) or of the minimal polynomial (for algebraic GFs).
Types of Singularities and Transfer Theorems
The key insight of analytic combinatorics (due largely to Flajolet and Sedgewick) is that the type of singularity determines the form of the asymptotics:
| Singularity type | Example | Asymptotic form |
|---|---|---|
| Simple pole at \(z = w\) | Rational GF | \(C \cdot (1/w)^n\) |
| Algebraic branch point: \((1-z/w)^\alpha\) | \(\alpha \notin \mathbb{Z}\) | \(C n^{-\alpha-1} (1/w)^n\) |
| Logarithmic: \(\log(1-z/w)\) | EGF cycles | \(C n^{-1} (1/w)^n\) |
For algebraic generating functions with a dominant singularity of the form \(F(z) \sim c_0 + c_1 \sqrt{1 - z/\rho}\) near \(z = \rho\), the coefficients satisfy \([z^n]F(z) \sim -c_1 (n\rho)^{-1/2} \cdot 4^n\) (roughly). The Catalan numbers illustrate this perfectly: \(C_n = \frac{1}{n+1}\binom{2n}{n} \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}\).
The Meromorphic Asymptotics Theorem
When a generating function is meromorphic (analytic except for isolated poles) in a region larger than the circle of convergence, we can use the residue theorem to extract asymptotics more precisely.
Applications include surjections (\(s_n \sim n!/(2(\log 2)^{n+1})\)), restricted permutations, and “alignment” structures.
The Transfer Lemma
For algebraic generating functions with a square-root singularity \(F(z) = c_0 - c_1\sqrt{1-z/\rho} + O((1-z/\rho))\), the transfer lemma (Flajolet–Odlyzko) gives:
\[ [z^n] F(z) \sim \frac{c_1}{2\sqrt{\pi}} n^{-3/2} \rho^{-n}. \]The key \(n^{-3/2}\) factor is the hallmark of square-root singularities and appears ubiquitously: Catalan numbers, the number of planar maps, paths avoiding a boundary. A cube-root singularity produces \(n^{-5/3}\), and so on. This systematic translation from singularity type to coefficient asymptotics is the heart of the “singularity analysis” program.
Two Fundamental Principles
The analytic combinatorics approach rests on two principles that together reduce asymptotic extraction to singularity analysis:
Location principle: The position of the dominant singularity determines the exponential growth rate \(\rho^n\).
Type principle: The nature of the dominant singularity (pole, algebraic branch point, logarithm, essential singularity) determines the subexponential factor \(\theta(n)\).
Together, these principles mean that to find \([z^n]F(z)\) asymptotically, one needs to: (a) find the dominant singularity, and (b) analyze the local behavior of \(F\) near that singularity. Both steps are algorithmic — especially with a computer algebra system.
Chapter 12: Random Generation
Uniform Generation
A uniform random generation algorithm for a combinatorial class \(\mathcal{C}\) takes as input a size \(n\) and outputs a uniformly random element of \(\mathcal{C}_n\). Exact uniformity — not just approximate — is required.
For simple classes, direct methods work: a uniformly random permutation of \([n]\) can be generated using a Fisher–Yates shuffle in \(O(n)\) time. A uniformly random binary string of length \(n\) is generated by \(n\) independent fair coin flips. For classes defined by generating functions, the technique of recursive random sampling exploits the decomposition tree of a class specification.
Recursive sampling for binary trees: a tree of size \(n\) consists of a root, a left subtree of size \(k\), and a right subtree of size \(n-1-k\), for some \(0 \leq k \leq n-1\). Since there are \(C_k \cdot C_{n-1-k}\) such trees and \(C_{n-1}\) trees total, the left subtree size \(k\) is chosen with probability \(C_k C_{n-1-k}/C_{n-1}\). Given \(k\), recurse independently for both subtrees. The algorithm runs in \(O(n)\) random choices after precomputing the Catalan numbers. This technique generalizes to any class specified by a finite system of combinatorial equations.
Boltzmann Sampling
Recursive sampling requires precomputing all values \(c_k\) for \(k \leq n\), which is expensive for large \(n\). Boltzmann sampling offers an elegant alternative that avoids exact size control at the cost of exact size control.
For an admissible parameter value \(v \in (0, R)\) where \(R\) is the radius of convergence of \(C(z)\), the Boltzmann distribution at \(v\) assigns each object \(\alpha \in \mathcal{C}\) probability
\[ \mathbb{P}_v(\alpha) = \frac{v^{|\alpha|}}{C(v)}. \]The expected size of a random sample is \(v C'(v)/C(v)\). By choosing \(v\) to make this expected size equal to the target \(n\), and then sampling and rejecting until an object of size in \([(1-\varepsilon)n, (1+\varepsilon)n]\) is obtained, one gets approximate-size samples in \(O(n)\) expected operations.
Boltzmann sampling is especially valuable for large structures. The Boltzmann sampler for a class with specification \(\mathcal{C} = \mathcal{A} + \mathcal{B}\) flips a weighted coin (with probabilities proportional to \(A(v)\) and \(B(v)\)) and then recursively samples from the chosen class. For a product \(\mathcal{C} = \mathcal{A} \times \mathcal{B}\), sample from \(\mathcal{A}\) and \(\mathcal{B}\) independently. For a sequence \(\mathcal{C} = \mathrm{SEQ}(\mathcal{A})\), the length \(k\) is geometric with parameter \(A(v)\).
Applications of Random Generation
Random generation is not just a computational curiosity — it is a research tool. By generating large random objects and measuring their statistics, one can:
- Discover conjectures: a random planar map of a million faces reveals limit shape behavior invisible in small examples.
- Verify asymptotic formulas: the empirical distribution of a parameter on random size-\(n\) objects should match the predicted limiting distribution.
- Test algorithms: a random binary tree of depth \(\sim 2 \log n\) tests the expected-case behavior of tree search algorithms.
Normal distributions appear ubiquitously in the limit. The number of leaves, the height, the width, the number of internal nodes of given degree — all of these, properly normalized, converge to Gaussian distributions for most standard tree classes. This “Gaussian universality” is explained by singularity analysis: when the generating function has a square-root singularity and the parameter generating function has a similar local structure, the central limit theorem follows from a transfer theorem for characteristic functions.
Course notes based on the online text An Invitation to Enumeration by Stephen Melczer, supplemented by the lecture notes of David G. Wagner (C&O 330, University of Waterloo). The online text is the primary resource for the course; Wagner’s notes provide additional depth on bijective proofs and classical results. For further reading, see H. Wilf’s generatingfunctionology and Melczer’s An Invitation to Analytic Combinatorics. Course page: melczer.ca/330/.