CO 370: Deterministic Operations Research Models
Bertrand Guenin
Based on lecture notes by Sibelius Peng.
Sources and References
Primary textbook:
- Hillier, F.S. and Lieberman, G.J. Introduction to Operations Research, 10th ed. McGraw-Hill, 2015.

Supplementary texts:
- Bertsimas, D. and Tsitsiklis, J.N. Introduction to Linear Optimization. Athena Scientific, 1997.
- Williams, H.P. Model Building in Mathematical Programming, 5th ed. Wiley, 2013.
- Ben-Tal, A. and Nemirovski, A. Lectures on Modern Convex Optimization. MPS-SIAM, 2001.

Online resources:
- MIT 15.053 Optimization Methods in Management Science (OCW).
- Stanford MS&E 310 Linear Programming.
- Cornell ORIE 3300 Optimization (available at courses.cs.cornell.edu).
- MIT 6.255/15.093 Optimization Methods (Ben-Tal et al. lecture notes).

Lecture notes:
- Peng, S. CO 370 Lecture Notes (Winter 2022). https://pdf.sibeliusp.com/1221/co370.pdf
Chapter 1: Mathematical Programming and Model Formulation
Operations research translates real-world decisions into mathematical programs and then solves them. A mathematical program has three ingredients: decision variables that represent the choices to be made, an objective function to optimise, and constraints that limit the feasible choices. The power of OR lies in the breadth of problems that fit this framework — from scheduling surgeries to routing mail to designing portfolios — and in the rich theory of solution algorithms.
Linear Programming Formulations
A linear program (LP) optimises a linear objective over linear inequality constraints: \[ \max\; c^T x \quad \text{subject to}\quad Ax \leq b,\; x \geq 0. \]Here \(x \in \mathbb{R}^n\) is the decision vector, \(c \in \mathbb{R}^n\) is the cost/profit vector, \(A \in \mathbb{R}^{m \times n}\) is the constraint matrix, and \(b \in \mathbb{R}^m\) is the right-hand side.
Production problems are the canonical LP formulation. A factory produces \(n\) products using \(m\) resources. Product \(j\) uses \(a_{ij}\) units of resource \(i\), earns profit \(c_j\) per unit, and at most \(b_i\) units of resource \(i\) are available. The LP maximises total profit subject to resource limits. LP is the right model when all relationships are linear — proportional resource consumption, fixed unit profits, divisible production quantities.
Minimax problems (minimise the maximum of several objectives) are also LP-representable. To minimise \(\max_{i} f_i(x)\), introduce a variable \(t\) and add constraints \(f_i(x) \leq t\) for all \(i\), then minimise \(t\). This epigraph reformulation is a fundamental technique that converts a seemingly non-linear objective into a linear one.
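The epigraph trick can be sanity-checked by brute force on a tiny made-up instance — here minimising \(\max(2x+1,\,-x+4)\) over \([0,3]\), in both the direct form and the epigraph form (the functions and grids are illustrative only):

```python
# Hypothetical 1-D example: minimise max(2x+1, -x+4) over x in [0, 3].
# The epigraph form introduces t with constraints f_i(x) <= t, minimising t.
# At the optimum the two pieces cross: 2x+1 = -x+4  =>  x = 1, value 3.

def minimax_direct(xs):
    return min(max(2*x + 1, -x + 4) for x in xs)

def minimax_epigraph(xs, ts):
    # feasible (x, t) pairs satisfy both epigraph constraints; minimise t
    best = float("inf")
    for x in xs:
        for t in ts:
            if 2*x + 1 <= t and -x + 4 <= t:
                best = min(best, t)
    return best

xs = [i / 100 for i in range(301)]       # grid on [0, 3]
ts = [i / 100 for i in range(1001)]      # grid on [0, 10]

assert abs(minimax_direct(xs) - 3.0) < 1e-9
assert abs(minimax_epigraph(xs, ts) - 3.0) < 1e-9
```

Both forms agree, as the epigraph reformulation guarantees: minimising \(t\) subject to \(f_i(x) \leq t\) forces \(t\) down onto \(\max_i f_i(x)\).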
Network Flow Formulations
Network flows capture a huge class of practical problems: shipping goods through a logistics network, routing packets through a communication network, scheduling jobs through a factory.
Max-flow problem: send as much flow as possible from a source \(s\) to a sink \(t\), respecting edge capacities and flow conservation: \[ \max\;\sum_{e = (s,v)} x_e - \sum_{e = (v,s)} x_e \quad \text{s.t.}\quad 0 \leq x_e \leq u_e,\quad \sum_{e = (v,w)} x_e = \sum_{e = (u,v)} x_e \;\forall v \neq s,t. \]Min-cost flow problem: generalises max-flow by assigning costs to flows. Each edge has a capacity \(u_e\), a lower bound \(\ell_e\), and a cost \(c_e\) per unit of flow. The objective minimises total cost \(\sum_e c_e x_e\) subject to capacity and conservation constraints. Min-cost flow subsumes shortest path (unit demands), max-flow (zero costs), transportation (bipartite graphs), and assignment problems.
The LP relaxation of min-cost flow is integral whenever the right-hand sides are integers — a consequence of the total unimodularity of the constraint matrix (the node-arc incidence matrix). This means the simplex method automatically finds an integer optimal solution, without needing integer programming.
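A minimal max-flow sketch (Edmonds–Karp: BFS augmenting paths) illustrates the integrality claim concretely: with integer capacities every augmentation is by an integer amount, so the optimal flow is integral. The graph and capacities below are made up:

```python
from collections import deque

# Minimal Edmonds-Karp max-flow (BFS augmenting paths) on a toy graph.
# With integer capacities, every bottleneck is an integer, so the
# resulting optimal flow is integral -- no integer programming needed.

def max_flow(cap, s, t):
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total, flow
        # bottleneck along the path found
        v, bottleneck = t, float("inf")
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        # augment along the path
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# nodes: 0 = s, 3 = t; capacities are illustrative
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]
value, flow = max_flow(cap, 0, 3)
assert value == 4
assert all(f == int(f) for row in flow for f in row)  # integral flow
```

The max-flow value 4 matches the capacity of the cut into the sink, and every edge flow is an integer.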
Integer Programming Formulations
An integer program (IP) restricts the decision variables to integer values: \[ \max\; c^T x \quad \text{s.t.}\quad Ax \leq b,\; x \in \mathbb{Z}^n,\; x \geq 0. \]A mixed-integer program (MIP) allows some variables to be continuous. Binary variables (\(x_j \in \{0,1\}\)) model logical decisions.
IP modelling tricks that extend the expressiveness of IP formulations:
- Absolute value \(|z|\): introduce \(z^+, z^- \geq 0\) with \(z = z^+ - z^-\); then \(|z| = z^+ + z^-\). Works when minimising \(|z|\) (the LP will not simultaneously drive both positive).
- Piecewise linear functions: a function that is linear on intervals \([a_0, a_1], [a_1, a_2], \ldots\) can be expressed using auxiliary continuous variables \(\lambda_i \geq 0\) (convex combination variables) if the function is convex, or with binary variables for the general case.
- Logical implications and disjunctions (\(P \Rightarrow Q\), or enforce at least one of several constraints): model with binary variables and big-M constraints \(g_j(x) \leq M(1 - y)\) where \(y \in \{0,1\}\) indicates which constraint is active.
- Union of polyhedra: model feasible region as \(x \in P_1 \cup P_2 \cup \cdots \cup P_k\) using binary variables \(y_j\) and the constraint \(x \in P_j\) only when \(y_j = 1\).
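The first trick in the list can be checked by brute force: minimising \(z^+ + z^-\) subject to \(z^+ - z^- = z\), \(z^\pm \geq 0\) recovers \(|z|\) exactly, because an optimal solution never makes both parts positive at once (grid and test values are illustrative):

```python
# Brute-force check of the absolute-value split: writing z = z+ - z- with
# z+, z- >= 0 and minimising z+ + z- recovers |z|.

def abs_via_split(z, grid_max=10, step=0.5):
    pts = [i * step for i in range(int(grid_max / step) + 1)]
    best = None
    for zp in pts:
        for zm in pts:
            if abs((zp - zm) - z) < 1e-9:        # enforce z+ - z- = z
                val = zp + zm
                if best is None or val < best:
                    best = val
    return best

for z in [-3.5, -1.0, 0.0, 2.5, 7.0]:
    assert abs(abs_via_split(z) - abs(z)) < 1e-9
```

The minimum is always attained with one of \(z^+, z^-\) equal to zero, which is exactly why the trick is safe under minimisation but not in general.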
Perfect formulations: an IP has a perfect formulation if its LP relaxation is already integral (has an integer optimal solution for all integer right-hand sides). This means the LP can be solved in polynomial time and the IP solution is found automatically. The LP relaxation’s feasible polytope must be an integer hull — the convex hull of integer feasible points. Network flow problems have perfect formulations; general IPs do not.
Cone Programming
A cone program optimises a linear objective over the intersection of an affine subspace and a convex cone: \[ \min\; c^T x \quad \text{s.t.}\quad Ax = b,\; x \in \mathcal{K}, \]where \(\mathcal{K}\) is a closed convex cone. LP corresponds to \(\mathcal{K} = \mathbb{R}_+^n\) (the nonnegative orthant). Other important cones:
- Second-order cone (SOC): \(\mathcal{K}_\text{SOC} = \{(x, t) : \|x\| \leq t\}\). Second-order cone programs (SOCPs) capture norm minimisation, robust LPs, and portfolio risk constraints.
- Semidefinite cone: \(\mathcal{K}_\text{SDP} = \{X \in \mathbb{R}^{n\times n} : X \succeq 0\}\). Semidefinite programs (SDPs) arise in relaxations of combinatorial problems (Goemans–Williamson MAX-CUT), control theory, and covariance estimation.
A general SOCP minimises a linear objective subject to second-order cone constraints \(\|A_i x + b_i\| \leq c_i^T x + d_i\). SOCPs can be solved in polynomial time by interior-point methods and handle the ellipsoidal uncertainty sets that arise in robust optimisation.
Robust Optimisation
Robust optimisation guards against data uncertainty by optimising against the worst case over an uncertainty set \(\mathcal{U}\): \[ \min_{x} \max_{(A, b, c) \in \mathcal{U}}\; c^T x \quad \text{s.t.}\quad Ax \leq b \;\forall (A, b, c) \in \mathcal{U}. \]The tractability of the robust problem depends on the shape of \(\mathcal{U}\):
Ball uncertainty \(\mathcal{U} = \{a : \|a - \bar{a}\| \leq \rho\}\): the robust constraint \(a^T x \leq b\) for all \(a\) in the ball becomes \(\bar{a}^T x + \rho \|x\| \leq b\) — a second-order cone constraint. Thus, robust LPs with ball uncertainty are equivalent to SOCPs.
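A quick numerical check of the ball-uncertainty reformulation (with made-up \(\bar{a}\), \(\rho\), and \(x\)): the worst case \(\max\{a^T x : \|a - \bar{a}\| \leq \rho\}\) equals \(\bar{a}^T x + \rho\|x\|\), attained at \(a = \bar{a} + \rho\, x/\|x\|\):

```python
import math, random

# Numerical check of the ball-uncertainty reformulation: for fixed x,
#   max { a^T x : ||a - abar|| <= rho }  =  abar^T x + rho * ||x||.
# All numbers below are made up for illustration.

random.seed(0)
abar = [1.0, -2.0, 0.5]
rho = 0.7
x = [2.0, 1.0, -1.0]
norm_x = math.sqrt(sum(xi * xi for xi in x))
closed_form = sum(a * xi for a, xi in zip(abar, x)) + rho * norm_x

# sampled worst case over random directions on the boundary sphere
best = -float("inf")
for _ in range(20000):
    d = [random.gauss(0, 1) for _ in range(3)]
    nd = math.sqrt(sum(di * di for di in d))
    a = [ai + rho * di / nd for ai, di in zip(abar, d)]
    best = max(best, sum(ai * xi for ai, xi in zip(a, x)))

assert best <= closed_form + 1e-9   # sampled values never exceed the bound
assert closed_form - best < 5e-2    # and sampling nearly attains it
```

No sampled \(a\) beats the closed-form bound, and the bound is (nearly) attained, confirming that the robust constraint is exactly the second-order cone constraint \(\bar{a}^T x + \rho\|x\| \leq b\).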
Box uncertainty \(\mathcal{U} = \{\bar{a} + \Delta a : |\Delta a_j| \leq \rho_j\}\): the robust constraint becomes \(\bar{a}^T x + \sum_j \rho_j |x_j| \leq b\) — still a linear program (via absolute value linearisation). Box-robust LPs are solvable in polynomial time but conservative (every component is at its worst simultaneously).
Chapter 2: Duality and the Economic Interpretation of LP
Duality is the deepest structural result in linear programming. Every LP (the primal) has an associated dual LP whose optimal value equals the primal optimal (strong duality). The dual variables have natural economic interpretations and are central to sensitivity analysis.
Duality Theory
Every feasible primal solution \(x\) and dual solution \(y\) satisfy \(c^T x \leq b^T y\) (weak duality).
Strong duality has a striking consequence: the primal optimal value is exactly the infimum of all upper bounds on it provided by dual feasible solutions. To certify optimality of \(x^*\), exhibit a dual feasible \(y^*\) with \(c^T x^* = b^T y^*\).
The complementary slackness conditions link optimal primal and dual solutions: \[ x_j^* (A^T y^* - c)_j = 0 \;\forall j, \qquad y_i^* (b - Ax^*)_i = 0 \;\forall i. \]In words: if a primal variable \(x_j^* > 0\), the \(j\)th dual constraint must be tight; if a dual variable \(y_i^* > 0\), the \(i\)th primal constraint must be tight. Complementary slackness is both necessary and sufficient for optimality (given primal and dual feasibility).
Economic Interpretation of Dual Variables
In the production problem context, dual variable \(y_i\) is the shadow price of resource \(i\): the rate at which the optimal objective value increases per unit increase in \(b_i\). If an additional unit of resource \(i\) is available, the improvement in total profit equals \(y_i^*\). This is the marginal value of the resource, which motivates calling dual variables “prices.”
Strong duality says: the total profit from the optimal product mix equals the total value of the resources consumed (weighted by shadow prices). Equivalently, the primal solution assigns all resource value to products, while the dual assigns all product value to resources — the two valuations agree at optimality.
Economic interpretation of complementary slackness: if resource \(i\) is not fully used (\(b_i - (Ax^*)_i > 0\), slack), then \(y_i^* = 0\) — a scarce resource has a positive shadow price, a non-scarce resource has zero price. If a product is not produced (\(x_j^* = 0\)), the revenue it would generate is at most the cost of the resources it consumes (\((A^T y^*)_j \geq c_j\)): unprofitable products are not produced.
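These relationships can be verified on a small production LP with a known solution — the classic Wyndor Glass instance from Hillier and Lieberman, with optimum \(x^* = (2, 6)\) and duals \(y^* = (0, 3/2, 1)\):

```python
# Worked check of strong duality and complementary slackness on the
# classic Wyndor Glass LP (Hillier & Lieberman):
#   max 3x1 + 5x2  s.t.  x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18, x >= 0.

A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
c = [3, 5]
x = [2, 6]          # known primal optimum
y = [0, 1.5, 1]     # known dual optimum (shadow prices)

primal = sum(ci * xi for ci, xi in zip(c, x))
dual = sum(bi * yi for bi, yi in zip(b, y))
assert primal == dual == 36                      # strong duality

Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]
ATy = [sum(A[i][j] * y[i] for i in range(3)) for j in range(2)]
assert all(Ax[i] <= b[i] for i in range(3))      # primal feasibility
assert all(ATy[j] >= c[j] for j in range(2))     # dual feasibility

# complementary slackness: y_i > 0 => primal constraint i tight,
#                          x_j > 0 => dual constraint j tight
assert all(y[i] == 0 or Ax[i] == b[i] for i in range(3))
assert all(x[j] == 0 or ATy[j] == c[j] for j in range(2))
```

Note that resource 1 has slack (\(x_1 = 2 < 4\)) and, accordingly, zero shadow price, exactly as the economic interpretation predicts.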
Sensitivity Analysis
A key practical question is: how does the optimal solution change if the data changes slightly? Sensitivity analysis answers this using the dual and the structure of the simplex optimal basis.
At an LP optimum, the simplex method produces a basis \(\mathcal{B}\): a set of \(m\) basic variables forming a nonsingular submatrix \(B\) of \(A\). The optimal solution is \(x_\mathcal{B}^* = B^{-1} b\) and the objective is \(c_\mathcal{B}^T B^{-1} b\). The dual solution is \(y^* = (B^{-T}) c_\mathcal{B}\).
Right-hand-side ranging: changing \(b_i \to b_i + \Delta\) keeps the same basis optimal as long as \(B^{-1}(b + \Delta e_i) \geq 0\) (primal feasibility preserved). This gives an interval \([\Delta^-, \Delta^+]\) within which the shadow price \(y_i^*\) remains valid.
Objective coefficient ranging: changing \(c_j \to c_j + \Delta\) keeps the same basis optimal as long as the reduced costs of non-basic variables remain non-negative (dual feasibility preserved). Again, a finite interval of valid changes.
These analyses let a decision-maker understand the robustness of their recommendation without re-solving the LP for each scenario.
Chapter 3: The Simplex Method and Its Variants
The simplex method, invented by George Dantzig in 1947, is the most practically successful optimisation algorithm ever devised. It moves along edges of the feasible polyhedron from vertex to vertex, improving the objective at each step.
Basic Feasible Solutions and Pivoting
The feasible region of an LP \(\{x : Ax = b, x \geq 0\}\) (after adding slack variables) is a polyhedron. Its vertices are basic feasible solutions: solutions in which at most \(m\) variables — the “basic” variables — are nonzero, and the basic variables’ columns form a nonsingular basis matrix \(B\).
A simplex pivot moves from one BFS to an adjacent one by swapping one variable into the basis (the entering variable, chosen to improve the objective via negative reduced cost) and one out (the leaving variable, chosen to maintain feasibility via the minimum ratio test). Each pivot takes \(O(mn)\) time. The simplex terminates when no entering variable improves the objective (optimality) or when the problem is detected to be unbounded.
Degeneracy occurs when a basic variable has value zero. Pivoting at a degenerate vertex doesn’t move the solution, and in pathological cases the algorithm may cycle. In practice, cycling is rare and is prevented by Bland’s rule (always choose the smallest-index eligible variable) or by perturbation.
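A minimal dense-tableau simplex with Bland's rule fits in a few dozen lines. This is an educational sketch only (no Phase I, it assumes \(b \geq 0\) so the slack basis is feasible, and uses fixed tolerances rather than proper numerical safeguards):

```python
# Educational sketch of the tableau simplex for  max c^T x, Ax <= b, x >= 0
# with b >= 0 (slack basis feasible). Bland's smallest-index rule is used
# for the entering variable and as the ratio-test tie-break, so cycling
# cannot occur.

def simplex(A, b, c):
    m, n = len(A), len(c)
    # tableau rows [A | I | b]; objective row holds reduced costs [-c | 0 | 0]
    T = [A[i][:] + [1.0 if k == i else 0.0 for k in range(m)] + [b[i]]
         for i in range(m)]
    z = [-cj for cj in c] + [0.0] * (m + 1)
    basis = list(range(n, n + m))                 # start from the slack basis
    while True:
        # Bland: entering = smallest index with negative reduced cost
        enter = next((j for j in range(n + m) if z[j] < -1e-9), None)
        if enter is None:                         # optimal
            x = [0.0] * n
            for i, bv in enumerate(basis):
                if bv < n:
                    x[bv] = T[i][-1]
            return z[-1], x
        # minimum ratio test; Bland tie-break on smallest basic index
        leave = best = None
        for i in range(m):
            if T[i][enter] > 1e-9:
                ratio = T[i][-1] / T[i][enter]
                if (leave is None or ratio < best - 1e-12
                        or (abs(ratio - best) <= 1e-12
                            and basis[i] < basis[leave])):
                    leave, best = i, ratio
        if leave is None:
            raise ValueError("unbounded LP")
        # pivot on (leave, enter): normalise pivot row, eliminate elsewhere
        piv = T[leave][enter]
        T[leave] = [t / piv for t in T[leave]]
        for i in range(m):
            if i != leave:
                f = T[i][enter]
                T[i] = [a - f * p for a, p in zip(T[i], T[leave])]
        f = z[enter]
        z = [a - f * p for a, p in zip(z, T[leave])]
        basis[leave] = enter

# a classic textbook instance: optimum x* = (2, 6), value 36
val, x = simplex([[1.0, 0.0], [0.0, 2.0], [3.0, 2.0]],
                 [4.0, 12.0, 18.0], [3.0, 5.0])
assert abs(val - 36.0) < 1e-7
assert abs(x[0] - 2.0) < 1e-7 and abs(x[1] - 6.0) < 1e-7
```

Each iteration is one pivot: the entering column is normalised in the pivot row and eliminated from all other rows and from the objective row, exactly the mechanical procedure described in the tableau section.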
Tableaux and the Dual Simplex
A simplex tableau organises the LP into a form where each row corresponds to a basic variable. The tableau makes pivots mechanical: divide the pivot row by the pivot element, then eliminate the pivot column from all other rows. Reading off the optimal dual solution from the tableau is immediate: the dual variable \(y_i\) is the reduced cost of the slack variable for constraint \(i\).
The dual simplex method maintains dual feasibility (all reduced costs non-negative) while restoring primal feasibility. It is useful when starting from a dual feasible but primal infeasible basis — for example, after adding a cutting plane that cuts off the current LP solution.
Parametric LP: vary one parameter (say \(b \to b + t \lambda\) for \(t \in [0, \infty)\)) and trace the optimal solution as a function of \(t\). The optimal objective is piecewise linear and convex in \(t\); the parametric simplex finds all breakpoints in sequence.
Chapter 4: Cutting Planes and Column Generation
For integer programs, the LP relaxation typically has a fractional optimal solution that is not integer-feasible. Cutting planes add new linear inequalities (valid for the integer hull but violated by the current LP solution) to tighten the relaxation. Column generation handles LPs with too many variables to enumerate, by generating columns (variables) on demand.
Cutting Plane Methods
The Gomory fractional cut is derived from an optimal tableau row \(x_j + \sum_{k \in N} \bar{a}_{jk} x_k = \bar{x}_j\) in which the basic value \(\bar{x}_j\) is fractional. The cut \[ \sum_{k \in N} \left(\bar{a}_{jk} - \lfloor \bar{a}_{jk} \rfloor\right) x_k \;\geq\; \bar{x}_j - \lfloor \bar{x}_j \rfloor \]is valid for all integer solutions but cuts off the current fractional LP solution (where every nonbasic \(x_k = 0\)). Repeated Gomory cuts (plus dual simplex re-optimisation) converge to the integer optimal in finite steps, but convergence can be slow.
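The cut computation is plain fractional-part arithmetic. The tableau row below is made up; the code checks that the cut is violated at the LP vertex but satisfied by every nonnegative integer solution of the row equation (over a small search box):

```python
from fractions import Fraction as F

# Gomory fractional cut from one (made-up) optimal tableau row
#   x1 + 3/2 x3 - 1/4 x4 = 13/4,
# where x1 is basic and x3, x4 are nonbasic (both 0 at the LP vertex).
# The cut is  sum_k frac(abar_k) x_k >= frac(xbar).

row = {"x3": F(3, 2), "x4": F(-1, 4)}
rhs = F(13, 4)
frac = lambda q: q - (q.numerator // q.denominator)   # frac(-1/4) = 3/4

cut = {k: frac(a) for k, a in row.items()}            # {x3: 1/2, x4: 3/4}
cut_rhs = frac(rhs)                                   # 1/4

# the current LP vertex (x3 = x4 = 0) has cut LHS 0, violating the cut
assert 0 < cut_rhs

# every nonnegative integer solution of the row equation satisfies the cut
for x3 in range(9):
    for x4 in range(9):
        x1 = rhs - row["x3"] * x3 - row["x4"] * x4
        if x1.denominator == 1 and x1 >= 0:           # x1 integer & feasible
            assert cut["x3"] * x3 + cut["x4"] * x4 >= cut_rhs
```

Exact rational arithmetic (`fractions.Fraction`) keeps the floor and fractional-part operations free of rounding issues.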
Modern branch-and-cut solvers (CPLEX, Gurobi) combine cutting planes with branch-and-bound, using problem-specific cuts (cover cuts for knapsack, clique cuts for stable set, etc.) and heuristic cuts to dramatically tighten LP relaxations before branching.
Column Generation: The Cutting Stock Problem
Column generation solves LPs with an astronomically large number of variables by iteratively generating only those variables (columns) with negative reduced cost.
The cutting stock problem illustrates this. A factory cuts a raw roll of width \(W\) into pieces of requested widths \(w_1, \ldots, w_n\). A cutting pattern specifies how many pieces of each width to cut from one roll: a vector \(a \in \mathbb{Z}_+^n\) with \(\sum_i a_i w_i \leq W\). The IP minimises the number of rolls used. The LP relaxation has one variable \(x_p \geq 0\) per cutting pattern — exponentially many.
Column generation solves the LP via the restricted master problem (a few patterns) plus a pricing subproblem: find the pattern with most negative reduced cost. The pricing subproblem is a knapsack problem: given the optimal dual values \(\bar{y}_i\) of the demand constraints, find \(a \in \mathbb{Z}_+^n\) with \(\sum_i a_i w_i \leq W\) that maximises \(\sum_i \bar{y}_i a_i\); the pattern’s reduced cost is \(1 - \sum_i \bar{y}_i a_i\). If no pattern has negative reduced cost, the current LP solution is optimal, and column generation terminates. After LP optimality, branch-and-price combines column generation with branching.
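The pricing subproblem is an unbounded knapsack, solvable by dynamic programming in \(O(nW)\) time. The roll width, piece widths, and dual values below are illustrative only:

```python
# Pricing subproblem for cutting stock as an unbounded knapsack, solved by
# dynamic programming: find an integer pattern a >= 0 with sum a_i w_i <= W
# maximising sum dual_i * a_i. The column improves the LP iff this max > 1
# (equivalently, the pattern's reduced cost 1 - max is negative).

def best_pattern(W, widths, duals):
    # dp[r] = best total dual value achievable with remaining capacity r
    dp = [0.0] * (W + 1)
    choice = [None] * (W + 1)
    for r in range(1, W + 1):
        dp[r], choice[r] = dp[r - 1], choice[r - 1]
        for i, w in enumerate(widths):
            if w <= r and dp[r - w] + duals[i] > dp[r]:
                dp[r] = dp[r - w] + duals[i]
                choice[r] = (i, r - w)
    # walk the choice chain to recover the pattern
    pattern, r = [0] * len(widths), W
    while choice[r] is not None:
        i, r = choice[r]
        pattern[i] += 1
    return dp[W], pattern

# illustrative data: roll width 10, piece widths 3, 5, 7
value, pattern = best_pattern(W=10, widths=[3, 5, 7], duals=[0.4, 0.55, 0.75])
assert value > 1.0                    # negative reduced cost: add this column
assert sum(p * w for p, w in zip(pattern, [3, 5, 7])) <= 10
```

Here the best pattern cuts three pieces of width 3 (dual value 1.2 > 1), so the corresponding column would enter the restricted master problem.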
Chapter 5: Beyond This Course
Natural Next Topics
CO 370 establishes the modelling language and theoretical tools of deterministic operations research — LP, network flows, integer programs, cone programs, robust formulations, duality, sensitivity analysis, cutting planes, and column generation. Each of these threads opens into a rich specialised field.
Stochastic programming. CO 370 assumes all data is known exactly. In practice, demands, costs, and resource availabilities are uncertain. Stochastic programming incorporates uncertainty by modelling the data as a random variable \(\xi\) with known distribution and optimising in expectation.
Two-stage stochastic LP with recourse: \[ \min_{x \geq 0}\; c^T x + \mathbb{E}_\xi[Q(x, \xi)], \quad Q(x, \xi) = \min_{y \geq 0}\{q(\xi)^T y : W y = h(\xi) - T(\xi) x\}. \]For finitely many scenarios \(\xi^{(1)}, \ldots, \xi^{(S)}\) with probabilities \(p_\nu\), the L-shaped method (Benders decomposition) replaces the expected recourse cost by an epigraph variable \(\theta\) in a master LP and iteratively adds optimality cuts \[ \theta \geq \sum_{\nu=1}^{S} p_\nu \,(h(\xi^{(\nu)}) - T(\xi^{(\nu)})\, x)^T \lambda^{(\nu)} \](where \(\lambda^{(\nu)}\) is the optimal dual solution of the second-stage LP at scenario \(\xi^{(\nu)}\)) until no violated cut exists. This decomposes the problem into a master LP and \(S\) subproblems (one per scenario), enabling parallel solution.
For continuous distributions, sample average approximation (SAA) replaces the expectation with a sample average over \(S\) scenarios drawn i.i.d. from the distribution. The SAA solution converges to the true stochastic optimum as \(S \to \infty\), with convergence rates governed by the central limit theorem and empirical process theory.
Multistage stochastic programs make decisions at \(T > 2\) stages, with more data revealed at each stage. The scenario tree grows exponentially in \(T\) and \(S\), creating the curse of dimensionality. Stochastic dual dynamic programming (SDDP; Pereira–Pinto 1991) handles very large multistage problems by building piecewise-linear approximations of the value function via cutting planes, and is used for hydro-thermal dispatch scheduling in Brazil and elsewhere.
Distributionally robust optimisation (DRO) optimises against the worst-case distribution in an ambiguity set \(\mathcal{P}\): \[ \min_{x}\; \max_{P \in \mathcal{P}}\; \mathbb{E}_P[\ell(x, \xi)]. \]The choice of ambiguity set \(\mathcal{P}\) determines tractability: moment-based sets (\(\mathcal{P} = \{P : \mathbb{E}_P[\xi] = \mu, \text{Cov}_P[\xi] = \Sigma\}\)) lead to SDPs; Wasserstein-ball sets lead to regularised stochastic programs. DRO is appealing when the true distribution is unknown but a reference distribution (e.g., empirical) and a distance bound are available.
Combinatorial optimisation and polyhedral theory. CO 370’s integer programming material touches the surface of a rich polyhedral theory. The convex hull of feasible integer solutions is a polytope \(P_I\), and solving an IP is equivalent to linear programming over \(P_I\). Characterising \(P_I\) via its facet-defining inequalities is the central goal of polyhedral combinatorics.
For specific combinatorial structures, \(P_I\) can be fully described: for bipartite graphs the matching polytope is described by the degree and nonnegativity constraints alone (total unimodularity); for general graphs, Edmonds (1965) added the odd-set (blossom) inequalities and proved completeness. The travelling salesman polytope requires exponentially many facet-defining inequalities and is the central challenge in TSP research.
Total dual integrality (TDI) is a sufficient condition for LP relaxations to give integer solutions for all integer right-hand sides. A system \(Ax \leq b\) is TDI if the LP’s dual has an integer optimal solution for every integer objective. Network flow matrices, bipartite matching, and balanced matrices are TDI; recognising TDI is a major topic in combinatorial optimisation.
Semidefinite programming and the Lasserre hierarchy. Beyond second-order cones, semidefinite programs (SDPs) optimise over the cone of positive semidefinite matrices \(\mathcal{S}^n_+ = \{X \in \mathbb{R}^{n \times n} : X \succeq 0\}\). Interior-point methods solve SDPs in polynomial time. The key application in combinatorics is the Goemans–Williamson MAX-CUT SDP: relax binary variables \(x_i \in \{-1, +1\}\) to unit vectors \(v_i \in \mathbb{R}^n\), and maximise \(\frac{1}{4}\sum_{(i,j) \in E} w_{ij}(1 - v_i \cdot v_j)\) subject to \(\|v_i\|^2 = 1\). The SDP value is an upper bound on MAX-CUT; rounding via a random hyperplane gives a 0.878-approximation (optimal assuming the Unique Games Conjecture).
The Lasserre hierarchy (Lasserre 2001, also called the SOS or sum-of-squares hierarchy) provides a sequence of increasingly tight SDP relaxations of any polynomial optimisation problem. At level \(d\), the relaxation has \(O(n^d)\) variables, runs in time \(n^{O(d)}\), and the optimal value converges to the true IP optimum as \(d\) grows. For fixed \(d\), this is polynomial. Strong lower bounds are known: some CSPs (e.g., random 3-SAT and 3-XOR) require \(\Omega(n)\) levels of the Lasserre hierarchy to certify optimality. Whether constant-level SOS/Lasserre can solve specific NP-hard problems is a major open question.
Mechanism design and algorithmic game theory. Prices are dual variables; this is not just an analogy. In a competitive equilibrium (Arrow–Debreu economy), each commodity has a price such that supply meets demand and every agent optimises their own utility subject to their budget. The First Welfare Theorem shows this equilibrium is Pareto-efficient — equivalent to solving a social welfare LP. The dual variables of the market-clearing constraints are exactly the equilibrium prices.
Mechanism design asks: if agents have private information (their preferences), what rules should the social planner use to elicit truthful reporting and achieve a good outcome? The Vickrey–Clarke–Groves (VCG) mechanism is the canonical result: charge each agent the social cost they impose on others, making truthful reporting a dominant strategy. VCG mechanisms are computationally demanding (they require solving the social welfare maximisation problem exactly) and become intractable for combinatorial auctions (exponentially many bundles of items). Algorithmic mechanism design (Nisan–Ronen 1999) asks how well approximately optimal mechanisms can be computed, connecting OR modelling to algorithmic game theory.
Follow-up Courses and Reading
CO 370 sits at the intersection of optimisation theory and OR practice. The following courses and resources deepen each direction.
At UWaterloo: CO 453 (Network Design) applies LP and polyhedral theory to real network problems; CO 454 (Scheduling) covers single-machine and parallel-machine scheduling models, connecting OR to algorithms and complexity. CO 466 (Continuous Optimization, at graduate level) covers the smooth optimisation theory underlying interior-point methods: self-concordant barriers, Newton’s method convergence analysis, and barrier functions for general cones. CO 750 (Combinatorial Optimisation) is the graduate-level polyhedral theory course covering TU, matchings, and Schrijver’s polyhedral theory. CO 759 (Stochastic Programming) develops the multistage theory, SDDP, and DRO in depth. STAT 842 (Statistical Learning) connects OR to statistical estimation and regression.
MIT OCW resources: MIT 15.053 Optimization Methods in Management Science and MIT 15.083 Integer Programming both have complete lecture notes and assignments online. MIT 6.255/15.093 Optimization Methods (Ben-Tal, Bertsimas) is a graduate-level convex optimisation course. CMU 15-850 Advanced Algorithms covers the algorithmic side.
Key texts: Bertsimas and Tsitsiklis Introduction to Linear Optimization (Athena Scientific, 1997) is the definitive graduate LP text, covering degeneracy, revised simplex, interior-point methods, and duality theory with complete proofs. Ben-Tal and Nemirovski Lectures on Modern Convex Optimization (MPS-SIAM, 2001) covers self-concordance and interior-point methods for conic programs. Birge and Louveaux Introduction to Stochastic Programming, 2nd ed. (Springer, 2011) is the standard stochastic programming reference. Schrijver Theory of Linear and Integer Programming (Wiley, 1986) covers polyhedral theory comprehensively. Wolsey Integer Programming, 2nd ed. (Wiley-Interscience, 2020) covers branch-and-bound and cutting planes. Williams Model Building in Mathematical Programming, 5th ed. (Wiley, 2013) is the applied modelling reference. Ben-Tal, El Ghaoui, and Nemirovski Robust Optimization (Princeton, 2009) is the reference for robust formulations.
Software: CPLEX, Gurobi, and GLPK are LP/MIP solvers; CVX (MATLAB), CVXPY (Python), and JuMP (Julia) are modelling languages. For stochastic programming: SMPS format, StructJuMP, and the PySP module of Pyomo.
Open Problems and Active Research
P vs. NP and the complexity of integer programming. The simplex method runs in exponential time in the worst case (Klee–Minty 1972), but the ellipsoid method (Khachiyan 1979) and interior-point methods (Karmarkar 1984) run in polynomial time for LP. For IP, no polynomial algorithm is known, and NP-hardness rules one out unless P = NP. The gap between theory (IP is NP-hard) and practice (Gurobi solves million-variable MIPs in minutes) is striking and poorly understood. The theoretical question of when branch-and-bound terminates in polynomial time on specific problem structures is open.
Machine learning for combinatorial optimisation. Modern MIP solvers (Gurobi, CPLEX) incorporate hundreds of heuristics for variable selection, cut selection, and primal heuristics — all tuned by experts over decades. Machine learning can potentially learn better heuristics from data. Learn-to-branch (Khalil et al. 2016), neural network policies for variable selection (Gasse et al. 2019), and GNN-based cut selection (Paulus et al. 2022) have shown improvements over default solver strategies. Whether learned policies can generalise across problem instances — rather than just overfitting to a training distribution — is an open research and engineering challenge.
The geometry of linear programming. Smale’s 9th problem (2000) asked: is there a polynomial-time simplex method (a pivot rule that terminates in polynomially many steps on every LP)? The Hirsch conjecture — that the diameter of a \(d\)-dimensional polytope with \(m\) facets is at most \(m - d\) — was disproved by Santos (2012), who constructed a polytope whose diameter exceeds this bound. The question of whether the diameter of LP polytopes is polynomial in \(m\) and \(d\) (the polynomial Hirsch conjecture) remains open and would bear on the existence of polynomial simplex methods.
Conic programming and moment relaxations. The Lasserre hierarchy provides SDP relaxations of any polynomial IP that converge to the exact optimum, but requires \(O(n^d)\) variables at level \(d\). Determining the minimum hierarchy level needed to certify optimality for structured problems — and whether \(O(1)\) levels suffice for some NP-hard classes — is a central open problem at the intersection of algebra, combinatorics, and computation. The question of whether there exist polynomial-size SDPs that capture the integer hull of a 0-1 polytope is related to the power of the “extended formulations” studied by Fiorini, Rothvoss, and Yannakakis.
Robust optimisation and adversarial machine learning. The connection between robust LP (optimise against the worst case within an uncertainty set) and adversarial robustness in machine learning (train a classifier to be robust to worst-case input perturbations) is deep. The Madry et al. (2018) PGD adversarial training objective has the exact form of a robust optimisation problem. Whether the tools of robust OR (ellipsoidal uncertainty, Soyster-type interval uncertainty, cardinality-constrained uncertainty) can be applied to improve the robustness of ML models more efficiently than gradient-based adversarial training is an active research direction combining operations research and machine learning.
Data-driven optimisation and end-to-end learning. The classical OR pipeline is: estimate model parameters from data, then optimise. The estimation and optimisation steps are decoupled — the estimation loss (squared error on parameters) doesn’t account for what parameters matter for the downstream optimisation decision. Decision-focused learning (also: predict-then-optimise, SPO+ loss) trains the prediction model to minimise the regret of the optimisation decision, not the prediction error. Elmachtoub–Grigas (2021) proposed the SPO+ loss for this purpose. Convergence analysis, scalability to large optimisation problems, and integration with complex LP/MIP solvers are active research directions.
Online learning and regret minimisation. CO 370 focuses on offline optimisation — all data is known before decisions are made. In many real applications (portfolio management, online advertising, network routing), decisions must be made in real time as data arrives, and the objective is to minimise regret relative to the best fixed strategy in hindsight. The online learning framework captures this: at each round \(t\), the learner chooses action \(x_t\) from a convex set, then suffers loss \(f_t(x_t)\) where \(f_t\) is revealed after the decision. The regret after \(T\) rounds is \(\sum_{t=1}^T f_t(x_t) - \min_{x} \sum_{t=1}^T f_t(x)\). The online gradient descent (OGD) algorithm achieves \(O(\sqrt{T})\) regret for convex Lipschitz losses, matching the information-theoretic lower bound. For strongly convex losses, OGD achieves \(O(\log T)\) regret. The connection to CO 370’s duality theory is through the primal-dual view of OGD: each gradient step is equivalent to computing a new dual variable in a Lagrangian relaxation. Applications include: ad auctions where click-through rates are uncertain, network routing protocols that adapt to changing traffic, and supply chain management with demand uncertainty.
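A short simulation of OGD on made-up quadratic losses over \([0, 1]\) shows the average regret vanishing (the loss sequence, step sizes, and thresholds below are illustrative choices, not part of the theory):

```python
import random

# Online gradient descent on [0, 1] with made-up quadratic losses
# f_t(x) = (x - z_t)^2 and step sizes proportional to 1/sqrt(t); OGD
# guarantees O(sqrt(T)) regret against the best fixed action in hindsight.

random.seed(1)

def ogd_regret(zs):
    x, total, T = 0.5, 0.0, len(zs)
    for t, z in enumerate(zs, start=1):
        total += (x - z) ** 2                      # suffer loss, then update
        grad = 2 * (x - z)
        x = min(1.0, max(0.0, x - grad / (2 * t ** 0.5)))  # project to [0,1]
    best = min(1.0, max(0.0, sum(zs) / T))         # best fixed point: the mean
    return total - sum((best - z) ** 2 for z in zs)

zs = [random.random() for _ in range(4000)]
regret = ogd_regret(zs)
assert -1.0 < regret < 80.0     # far below the worst-case O(sqrt(T)) constant
assert regret / 4000 < 0.02     # vanishing average regret
```

The projection step keeps iterates in the feasible interval, mirroring the projected-gradient form of OGD used in the \(O(\sqrt{T})\) analysis.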
Convex optimisation: smoothness, strong convexity, and accelerated methods. CO 370 introduces LP and convex cone programs. Smooth unconstrained convex optimisation (where the objective is differentiable with Lipschitz gradient) underpins essentially all of machine learning. The convergence rates are determined by two parameters: the Lipschitz constant of the gradient \(L\) (smoothness: \(\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\|\)) and the strong convexity constant \(\mu\) (\(f(y) \geq f(x) + \nabla f(x)^T(y-x) + \frac{\mu}{2}\|y-x\|^2\)). Gradient descent with step size \(1/L\) converges at rate \(O(1/T)\) for smooth convex functions and \(O((1 - \mu/L)^T)\) (linear rate) for strongly convex functions. The condition number \(\kappa = L/\mu\) controls convergence: for large \(\kappa\) (ill-conditioned problems), gradient descent is slow. Nesterov’s accelerated gradient method achieves \(O(1/T^2)\) for smooth convex functions — the information-theoretic optimum for first-order methods — and \(O((1 - \sqrt{\mu/L})^T)\) for strongly convex functions, improving the condition number dependence from \(\kappa\) to \(\sqrt{\kappa}\). Interior-point methods for LP and conic programs achieve polynomial-time convergence using self-concordant barrier functions (the logarithmic barrier \(-\sum_i \log(b_i - a_i^T x)\) for the LP feasible region) and Newton’s method; the barrier parameter controls the tradeoff between proximity to the central path and feasibility.
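The linear rate and its condition-number dependence are visible on a two-dimensional quadratic: with step \(1/L\), each coordinate contracts by exactly \(1 - \lambda_i/L\) per step (the values of \(L\) and \(\mu\) below are toy choices):

```python
# Gradient descent on a toy strongly convex quadratic
#   f(x) = (L*x1^2 + mu*x2^2) / 2,  with L = 100, mu = 1 (kappa = 100),
# using step size 1/L. The slow coordinate contracts by exactly (1 - mu/L)
# per step, illustrating the linear rate and its kappa dependence.

L, mu = 100.0, 1.0
step = 1.0 / L
x = [1.0, 1.0]
for _ in range(200):
    grad = [L * x[0], mu * x[1]]
    x = [xi - step * gi for xi, gi in zip(x, grad)]

# fast coordinate: multiplied by (1 - L/L) = 0 after a single step
assert abs(x[0]) < 1e-12
# slow coordinate: (1 - mu/L)^200 = 0.99^200, roughly 0.134
expected = (1 - mu / L) ** 200
assert abs(x[1] - expected) < 1e-12
```

After 200 steps the ill-conditioned direction has decayed only to about 13% of its initial value, which is precisely the \(\kappa\) bottleneck that Nesterov's acceleration improves to a \(\sqrt{\kappa}\) dependence.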
Extended formulations and communication complexity. A central question in polyhedral combinatorics is: how large must a linear program be to exactly represent a given combinatorial polytope? An extended formulation of a polytope \(P \subseteq \mathbb{R}^n\) is a polytope \(Q \subseteq \mathbb{R}^{n+k}\) that projects onto \(P\); its size is its number of facets, and the extension complexity of \(P\) is the minimum size over all such \(Q\). Yannakakis (1991) proved that every symmetric extended formulation of the TSP and matching polytopes requires exponential size — ruling out polynomial-size LP relaxations that respect symmetry. Fiorini–Massar–Pokutta–Tiwary–de Wolf (2012) proved that even asymmetric extended formulations of the TSP polytope require exponential size, resolving a 20-year question. The proof uses communication complexity: via the factorisation theorem, the size of an extended formulation equals the non-negative rank of the slack matrix \(M_{ij} = b_i - a_i^T v_j\) (where \(a_i^T x \leq b_i\) are the facets and \(v_j\) are the vertices). Non-negative rank lower bounds follow from communication complexity lower bounds, connecting OR to theoretical computer science in a beautiful and unexpected way.
Network flows and fractional relaxations: gaps and integrality. CO 370 establishes that network flow LPs are totally unimodular (TU) and hence always have integer optimal solutions. This is a rare property, and understanding when LP relaxations are integral — or how fractional they are — is a central theme in combinatorial optimisation. The integrality gap of a relaxation is \(\sup_{I} \text{OPT}_{LP}(I) / \text{OPT}_{IP}(I)\): the worst-case ratio between the LP and IP optima. For vertex cover, the LP integrality gap is exactly 2 (Nemhauser–Trotter 1975), matching the best approximation ratio. For TSP, the Held–Karp LP relaxation is conjectured to have an integrality gap of \(4/3\) — the “four-thirds conjecture”; proving that the gap is at most \(4/3\) would point the way to improving on the \(3/2\) guarantee of Christofides’ algorithm. For set cover, the LP integrality gap is \(\Theta(\log n)\) (Lund–Yannakakis 1994; Feige 1998) — matching the greedy algorithm’s approximation ratio. Half-integrality is a notable structural property: for some problems the LP optimum can always be achieved with a solution where all variables are \(0, 1/2,\) or \(1\). Half-integral solutions appear in matching (Balinski 1965), vertex cover, and independent set on perfect graphs. Understanding which problems have half-integral LP relaxations is connected to perfection in graph theory and total dual integrality in OR.
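The vertex cover gap can be checked by brute force on complete graphs \(K_n\), where the all-\(\frac{1}{2}\) LP solution has value \(n/2\) while any integer cover needs \(n - 1\) vertices:

```python
from itertools import combinations

# Integrality gap of the vertex cover LP on the complete graph K_n:
# the all-(1/2) solution satisfies every edge constraint x_u + x_v >= 1
# with value n/2, while any integer cover must include all but one vertex.
# The ratio 2(n-1)/n approaches 2 as n grows.

def min_vertex_cover(n):
    edges = list(combinations(range(n), 2))
    for k in range(n + 1):                     # smallest cover by brute force
        for cover in combinations(range(n), k):
            s = set(cover)
            if all(u in s or v in s for u, v in edges):
                return k
    return n

for n in [4, 5, 6]:
    lp_value = n / 2                            # x_v = 1/2 for every vertex
    ip_value = min_vertex_cover(n)
    assert ip_value == n - 1
    assert ip_value / lp_value == 2 * (n - 1) / n   # gap tends to 2
```

This family of instances is what makes the factor-2 bound tight: no LP-rounding argument for vertex cover can beat 2 using this relaxation alone.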
Multi-objective optimisation and Pareto frontiers. CO 370 optimises a single linear objective. Real decisions often involve multiple conflicting objectives: minimise cost and minimise delivery time; maximise expected return and minimise variance. Multi-objective LP has a set of Pareto-optimal solutions (the Pareto frontier) rather than a single optimal solution: a feasible \(x^*\) is Pareto-optimal if no other feasible solution is at least as good in every objective and strictly better in at least one. The Pareto frontier can be explored by scalarisation (optimising \(\lambda_1 f_1 + \lambda_2 f_2\) for varying weights \(\lambda\)), by the \(\varepsilon\)-constraint method (optimising one objective while bounding the others), or by direct computation of the frontier using parametric LP. For multi-objective integer programming, the Pareto frontier can be exponentially large; approximating it with a small representative set (a Pareto approximation set) and bounding the approximation quality is an active research direction. In practice, multi-objective OR appears in healthcare resource allocation, portfolio optimisation, and environmental planning (balancing economic and ecological objectives).
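Over a finite candidate set both ideas are easy to demonstrate. The sketch below (plain Python; the plan data and the helper name `pareto_front` are hypothetical, chosen for illustration) filters dominated (cost, delivery-time) pairs and then applies weighted-sum scalarisation, which always returns a point on the frontier.

```python
def pareto_front(points):
    """Keep the Pareto-optimal points of a finite set of objective
    vectors (minimisation): p is dominated if some other q is <= p in
    every coordinate (and q != p, hence strictly better somewhere)."""
    return [p for p in points
            if not any(q != p and all(qi <= pi for qi, pi in zip(q, p))
                       for q in points)]

# Hypothetical (cost, delivery-time) outcomes of five shipping plans.
plans = [(4, 10), (6, 6), (9, 3), (7, 8), (5, 12)]
front = pareto_front(plans)   # (7, 8) and (5, 12) are dominated

# Weighted-sum scalarisation: a weight vector lambda >= 0 turns the two
# objectives into one; its minimiser is always a Pareto-optimal plan.
lam = (0.6, 0.4)
pick = min(plans, key=lambda p: lam[0] * p[0] + lam[1] * p[1])
```

Sweeping `lam` over many weight vectors traces out (the supported part of) the frontier; the \(\varepsilon\)-constraint method can additionally recover unsupported Pareto points in the integer case.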
Large-scale optimisation and decomposition methods. Many practical OR problems are too large for a single LP or MIP solver. Column generation (covered in CO 370 for cutting stock) and Benders decomposition (for stochastic programming) are two decomposition paradigms that exploit problem structure to break large models into manageable subproblems. Dantzig–Wolfe decomposition reformulates block-angular LP structures (where the constraint matrix decomposes into blocks linked by a small number of coupling constraints) as a master problem over convex combinations of subproblem solutions; the resulting LP typically has far fewer constraints than variables and is solved by column generation. Lagrangian relaxation dualises complicating constraints into penalty terms in the objective, decomposing the problem into independent subproblems that can be solved in parallel; the Lagrange multipliers are updated by subgradient methods (or by cutting planes). These decomposition methods are the backbone of large-scale models in logistics (airline crew scheduling, where union work rules couple millions of variables), energy (unit commitment and hydro-thermal dispatch with thousands of generators and time steps), and telecommunications (capacity planning under traffic uncertainty). Convergence theory for decomposition methods, covering rates, stopping criteria, and the impact of approximation in the subproblems, is an active research area connecting OR, convex analysis, and distributed computing. The emergence of cloud-native OR solvers (Gurobi Compute Server, CPLEX on Azure) enables decomposition across hundreds of nodes, making problems that were previously intractable (fleet scheduling for ride-sharing services with millions of trips, or real-time energy dispatch for national grids) routinely solvable.
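The mechanics of Lagrangian relaxation fit in a few lines. The sketch below (plain Python; the toy data and the names `best` and `lagrangian_bound` are ours) takes a maximisation problem with two subproblems coupled only by a shared weight budget, dualises the budget constraint with multiplier \(\lambda \geq 0\), and minimises the resulting dual bound by a projected subgradient step on \(\lambda\). Each dual evaluation decomposes into two independent subproblems, which is the whole point.

```python
def best(options, lam):
    """Solve one decoupled subproblem: among (profit, weight) options,
    pick the one maximising profit - lam * weight."""
    return max(((p - lam * w, w) for p, w in options), key=lambda t: t[0])

def lagrangian_bound(sub_a, sub_b, budget, steps=50, t=0.2):
    """Dualise the coupling constraint w_a + w_b <= budget and run a
    projected subgradient method on lam >= 0, keeping the smallest
    dual value seen; every dual value upper-bounds the primal optimum."""
    lam, bound = 0.0, float("inf")
    for _ in range(steps):
        va, wa = best(sub_a, lam)        # subproblems solved separately
        vb, wb = best(sub_b, lam)
        bound = min(bound, va + vb + lam * budget)  # dual value at lam
        g = (wa + wb) - budget           # subgradient: budget violation
        lam = max(0.0, lam + t * g)      # raise lam if over budget
    return bound

# Two toy subproblems with (profit, weight) options, coupled only by a
# shared budget of 7 weight units; the best joint feasible profit is 11.
sub_a = [(0, 0), (6, 4), (9, 7)]
sub_b = [(0, 0), (5, 3), (8, 6)]
bound = lagrangian_bound(sub_a, sub_b, budget=7)
```

On this instance the dual bound closes to the primal optimum of 11; in general Lagrangian relaxation of an integer program may leave a duality gap, and the bound is then used inside branch-and-bound.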
As OR models grow to encompass richer data (real-time sensor streams, live market data, passenger behaviour models), the boundary between optimisation, machine learning, and data engineering continues to blur — a development that will define the future of the discipline CO 370 introduces.