CO 372: Portfolio Optimization Models

Stephen Vavasis

Estimated study time: 1 hr 39 min


Sources and References

Supplementary texts — H.M. Markowitz, Portfolio Selection: Efficient Diversification of Investments (2nd ed., Blackwell, 1991); J. Nocedal & S.J. Wright, Numerical Optimization (2nd ed., Springer, 2006); G.H. Golub & C.F. Van Loan, Matrix Computations (4th ed., Johns Hopkins, 2013); R. Koenker, Quantile Regression (Cambridge, 2005)

Online resources — MIT OCW 15.450 Analytics of Finance; Stanford EE364A Convex Optimization (Boyd); MATLAB Optimization Toolbox documentation


Chapter 1: Linear Algebra Foundations

Portfolio optimization is at its core a problem in mathematical optimization, and the computational machinery underlying every efficient algorithm we will study is drawn from numerical linear algebra. Before turning to finance, this chapter consolidates the key matrix-theoretic facts that appear repeatedly throughout the course: QR factorization, eigenvalue decompositions of symmetric matrices, positive (semi)definiteness, and Cholesky factorization. A student who is comfortable with these tools will find that the financial ideas in subsequent chapters translate cleanly into tractable computations.

1.1 Vectors, Matrices, and Inner Products

We work over \(\mathbb{R}\). Vectors are column vectors; \(e\) denotes the all-ones vector whose dimension is inferred from context. The standard inner product on \(\mathbb{R}^n\) is \(\langle u, v \rangle = u^T v\), and the induced norm is \(\|u\| = \sqrt{u^T u}\). A set of vectors \(\{q_1, \ldots, q_k\}\) is orthonormal if \(q_i^T q_j = \delta_{ij}\) for all \(i, j\).

1.2 QR Factorization

The QR factorization is the workhorse of numerically stable linear algebra. Given a matrix \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\), the (thin) QR factorization writes

\[ A = QR, \]

where \(Q \in \mathbb{R}^{m \times n}\) has orthonormal columns (\(Q^T Q = I_n\)) and \(R \in \mathbb{R}^{n \times n}\) is upper triangular with positive diagonal entries (when \(A\) has full column rank). The factorization exists and is unique under those conditions, and it can be computed by the Gram–Schmidt process or, more stably, by Householder reflections or Givens rotations.

The principal application in this course is least-squares. Given \(A \in \mathbb{R}^{m \times n}\) and \(b \in \mathbb{R}^m\), the least-squares problem

\[ \min_{x \in \mathbb{R}^n} \|Ax - b\|^2 \]

has solution \(x^* = (A^T A)^{-1} A^T b\) in theory, but forming \(A^T A\) squares the condition number of \(A\) and is numerically inadvisable. Via QR we write \(A = QR\) and observe that

\[ \|Ax - b\|^2 = \|QRx - b\|^2 = \|Rx - Q^T b\|^2 + \|b - QQ^T b\|^2, \]

since \(Q\) has orthonormal columns and therefore preserves norms on its range, while the residual component \(b - QQ^T b\) is orthogonal to that range (a Pythagorean split). The minimum is attained by solving the upper-triangular system \(Rx = Q^T b\) by back-substitution — an \(O(n^2)\) operation once the factorization is in hand. This avoids squaring the condition number and is the recommended approach whenever \(A\) is ill-conditioned.

Example (Gram–Schmidt QR on a \(3 \times 2\) matrix). Let \[ A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}. \] We apply classical Gram–Schmidt to the columns \(a_1 = (1,1,0)^T\) and \(a_2 = (1,0,1)^T\). \[ q_1 = \frac{a_1}{\|a_1\|} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix}. \]\[ a_2' = a_2 - (q_1^T a_2)\,q_1 = \begin{pmatrix}1\\0\\1\end{pmatrix} - \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1\\0\\1\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1/2\\-1/2\\1\end{pmatrix}. \]

The norm is \(\|a_2'\| = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}\), so

\[ q_2 = \frac{a_2'}{\|a_2'\|} = \sqrt{\frac{2}{3}}\begin{pmatrix}1/2\\-1/2\\1\end{pmatrix} = \frac{1}{\sqrt{6}}\begin{pmatrix}1\\-1\\2\end{pmatrix}. \]

Verification. \(q_1^T q_2 = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{6}}(1\cdot1 + 1\cdot(-1) + 0\cdot2) = 0\). Good.

The \(R\) matrix collects the inner products: \(R_{11} = \|a_1\| = \sqrt{2}\), \(R_{12} = q_1^T a_2 = 1/\sqrt{2}\), \(R_{22} = \|a_2'\| = \sqrt{3/2}\). So

\[ Q = \begin{pmatrix}1/\sqrt{2}&1/\sqrt{6}\\1/\sqrt{2}&-1/\sqrt{6}\\0&2/\sqrt{6}\end{pmatrix}, \qquad R = \begin{pmatrix}\sqrt{2}&1/\sqrt{2}\\0&\sqrt{3/2}\end{pmatrix}, \]

and one can verify \(A = QR\) and \(Q^TQ = I_2\).
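The worked example can be checked mechanically. Below is a minimal Python sketch of classical Gram–Schmidt QR (the function name `gram_schmidt_qr` and the list-of-lists matrix layout are illustrative choices, not part of the course code):

```python
import math

def gram_schmidt_qr(A):
    """Thin QR of an m x n matrix (list of rows) by classical Gram-Schmidt."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # columns of A
    Q = []                                # orthonormal columns, built one by one
    R = [[0.0] * n for _ in range(n)]     # upper-triangular factor
    for j, a in enumerate(cols):
        v = a[:]
        for i, q in enumerate(Q):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))   # R_ij = q_i^T a_j
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]  # subtract projection
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))        # norm of the residual
        Q.append([vk / R[j][j] for vk in v])
    return [[Q[j][i] for j in range(n)] for i in range(m)], R  # Q back to m x n

A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
Q, R = gram_schmidt_qr(A)
# R[0][0] = sqrt(2), R[0][1] = 1/sqrt(2), R[1][1] = sqrt(3/2), matching the example
```

Running this reproduces the \(Q\) and \(R\) computed by hand above, and one can check \(Q^TQ = I_2\) and \(A = QR\) to machine precision.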

Remark (Householder vs. classical Gram–Schmidt). The classical Gram–Schmidt algorithm above is elegant but numerically unreliable in finite-precision arithmetic. The core issue is catastrophic cancellation: in the projection step \(a_j' = a_j - \sum_{i<j} (q_i^T a_j)\,q_i\), the computed vectors \(q_i\) are only approximately orthogonal, and these errors compound, so the loss of orthogonality in the computed \(Q\) can grow like \(O(\varepsilon_{\text{mach}}\,\kappa(A)^2)\). Modified Gram–Schmidt re-orthogonalises each column against the already-computed \(q_i\) sequentially (rather than projecting all at once), reducing the loss of orthogonality to \(O(\varepsilon_{\text{mach}}\,\kappa(A))\), but even this can fail for severely ill-conditioned \(A\).

Householder reflections avoid this problem entirely. Each Householder step applies an orthogonal transformation \(H_k = I - 2v_kv_k^T\) (where \(v_k\) is chosen to zero out all elements below the diagonal in column \(k\)) to introduce zeros directly. Because each \(H_k\) is orthogonal and the product \(Q = H_1 H_2 \cdots H_n\) is computed implicitly, the orthogonality error in the final \(Q\) is of order \(\varepsilon_{\text{mach}}\) regardless of \(\kappa(A)\). This makes Householder QR the algorithm of choice in MATLAB’s built-in \texttt{qr} function and in any production-grade numerical library. The lesson for portfolio computation: always use Householder (or Givens) QR, never classical Gram–Schmidt, when solving the KKT null-space system.

1.3 Symmetric Matrices and the Eigenvalue Decomposition

A matrix \(A \in \mathbb{R}^{n \times n}\) is symmetric if \(A = A^T\). The spectral theorem for real symmetric matrices states that every such \(A\) admits an eigenvalue decomposition

\[ A = Q \Lambda Q^T, \]

where \(Q \in \mathbb{R}^{n \times n}\) is orthogonal (\(Q^T Q = QQ^T = I\)) and \(\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)\) collects the real eigenvalues. The columns of \(Q\) are the corresponding orthonormal eigenvectors. Covariance matrices in portfolio theory are always symmetric; their eigenvalue structure encodes the directions of maximum and minimum variance in return space.

1.4 Positive Semidefiniteness and Convexity

Definition (Positive Semidefinite Matrix). A symmetric matrix \(A \in \mathbb{R}^{n \times n}\) is positive semidefinite (PSD), written \(A \succeq 0\), if \(x^T A x \geq 0\) for all \(x \in \mathbb{R}^n\). It is positive definite (PD), written \(A \succ 0\), if \(x^T A x > 0\) for all \(x \neq 0\).

In terms of the spectral decomposition, \(A \succeq 0\) if and only if all eigenvalues \(\lambda_i \geq 0\), and \(A \succ 0\) iff all \(\lambda_i > 0\). The function \(f(x) = x^T A x\) is convex if and only if \(A \succeq 0\) (the Hessian of \(f\) is \(2A\)), and strictly convex if \(A \succ 0\). A sample covariance matrix \(\hat{\Sigma} = \frac{1}{T-1} \sum_{t=1}^T (r_t - \bar{r})(r_t - \bar{r})^T\) is always PSD, and it is PD precisely when the centred return vectors \(r_t - \bar{r}\) span \(\mathbb{R}^n\) — which requires the number of observations \(T\) to exceed the number of assets \(n\).
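To see the PSD property concretely, the following Python sketch builds \(\hat\Sigma\) from a small made-up return history (four observations of two assets — the data are purely illustrative) and checks that the quadratic form is non-negative in every direction tried:

```python
# Four hypothetical return observations (rows = periods) for two assets.
rets = [[0.05, 0.01], [0.02, -0.03], [-0.01, 0.04], [0.06, 0.02]]
T, n = len(rets), len(rets[0])

rbar = [sum(rets[t][i] for t in range(T)) / T for i in range(n)]  # mean per asset
# Sample covariance with the 1/(T-1) normalisation from the text.
Sigma_hat = [[sum((rets[t][i] - rbar[i]) * (rets[t][j] - rbar[j]) for t in range(T)) / (T - 1)
              for j in range(n)] for i in range(n)]

def quad_form(S, x):
    """x^T S x."""
    return sum(x[i] * S[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

# PSD: the quadratic form is non-negative for every direction we try.
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [3.0, 2.0]):
    assert quad_form(Sigma_hat, x) >= -1e-15
```

Here \(T = 4 > n = 2\) and the centred returns span \(\mathbb{R}^2\), so \(\hat\Sigma\) is in fact positive definite.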

The significance of positive definiteness for portfolio optimization cannot be overstated. When the covariance matrix \(\Sigma \succ 0\), the portfolio variance \(w^T \Sigma w\) is a strictly convex function of the weights \(w\), which guarantees that every minimum-variance portfolio problem has a unique solution.

1.5 Cholesky Factorization

Theorem (Cholesky): If \(A \succ 0\), there exists a unique lower-triangular matrix \(L\) with positive diagonal entries such that \(A = LL^T\).

The Cholesky factorization is roughly twice as fast as general LU decomposition and numerically very stable for positive definite matrices. It arises in computing the efficient frontier: to generate points on the frontier we repeatedly solve linear systems involving \(\Sigma\), and Cholesky-based solvers are the standard approach. It also appears in simulation — to generate correlated return vectors with covariance \(\Sigma\), one draws \(z \sim \mathcal{N}(0, I)\) and returns \(r = L z\), which satisfies \(\mathrm{Cov}(r) = L L^T = \Sigma\).

Example (Cholesky factorization of a \(2\times 2\) covariance matrix). Let \[ \Sigma = \begin{pmatrix}4&2\\2&3\end{pmatrix}. \] We seek a lower-triangular \(L = \begin{pmatrix}\ell_{11}&0\\\ell_{21}&\ell_{22}\end{pmatrix}\) with positive diagonal entries such that \(LL^T = \Sigma\).

Expanding \(LL^T\):

\[ LL^T = \begin{pmatrix}\ell_{11}^2 & \ell_{11}\ell_{21} \\ \ell_{11}\ell_{21} & \ell_{21}^2+\ell_{22}^2\end{pmatrix}. \]

Matching entries: \begin{align*} \ell_{11}^2 &= 4 \implies \ell_{11} = 2,\\ \ell_{11}\ell_{21} &= 2 \implies \ell_{21} = 1,\\ \ell_{21}^2 + \ell_{22}^2 &= 3 \implies \ell_{22}^2 = 2 \implies \ell_{22} = \sqrt{2}. \end{align*}

Therefore

\[ L = \begin{pmatrix}2&0\\1&\sqrt{2}\end{pmatrix}. \]\[ LL^T = \begin{pmatrix}2&0\\1&\sqrt{2}\end{pmatrix}\begin{pmatrix}2&1\\0&\sqrt{2}\end{pmatrix} = \begin{pmatrix}4&2\\2&1+2\end{pmatrix} = \begin{pmatrix}4&2\\2&3\end{pmatrix} = \Sigma. \checkmark \]

To simulate a correlated return vector with this covariance, one draws \(z = (z_1, z_2)^T \sim \mathcal{N}(0,I)\) and forms \(r = Lz = (2z_1,\; z_1 + \sqrt{2}z_2)^T\). Then \(\mathrm{Cov}(r) = L\mathrm{Cov}(z)L^T = LL^T = \Sigma\), as required.
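The entry-matching recurrence used above generalises to any dimension. A minimal Python sketch (the function name `cholesky` is ours), applied to the \(2\times 2\) example:

```python
import math

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)   # diagonal: sqrt of what remains
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]  # below diagonal: forward solve
    return L

Sigma = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(Sigma)
# L = [[2, 0], [1, sqrt(2)]], matching the worked example
```

A failed `math.sqrt` (negative argument) in the diagonal step is exactly how this algorithm detects that a matrix is not positive definite.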

Example (A matrix that is NOT positive semidefinite). Consider \[ \Sigma = \begin{pmatrix}1&2\\2&1\end{pmatrix}. \] The eigenvalues are the roots of \(\det(\Sigma - \lambda I) = (1-\lambda)^2 - 4 = 0\), giving \(\lambda = 1 \pm 2\), so \(\lambda_1 = 3\) and \(\lambda_2 = -1\). Because \(\lambda_2 = -1 < 0\), the matrix \(\Sigma\) is not positive semidefinite.

One can also witness this directly: take \(x = (1,-1)^T\). Then

\[ x^T \Sigma x = \begin{pmatrix}1&-1\end{pmatrix}\begin{pmatrix}1&2\\2&1\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}1&-1\end{pmatrix}\begin{pmatrix}-1\\1\end{pmatrix} = -2 < 0. \]

What this means for portfolio optimization. A covariance matrix that is not PSD cannot be the covariance matrix of any real random vector (since variances are non-negative). If an estimated \(\Sigma\) has negative eigenvalues — a common occurrence when the number of assets \(n\) exceeds the number of observations \(T\), or due to numerical errors — then the portfolio variance \(w^T\Sigma w\) can be negative for some weight vectors \(w\), which is physically meaningless. More concretely, the minimum-variance problem \(\min_{e^Tw=1} w^T\Sigma w\) is no longer bounded below: the objective can be driven to \(-\infty\) along the eigendirection \(x = (1,-1)^T\), which satisfies \(e^Tx = 0\), so \(w + tx\) remains feasible for every \(t\). Hence no minimum-variance portfolio exists. The remedy is either to use a sample size \(T > n\), to shrink the sample covariance toward a positive definite target (Ledoit–Wolf), or to project \(\hat\Sigma\) onto the positive semidefinite cone by zeroing out negative eigenvalues.
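For the \(2\times2\) case the eigenvalues have a closed form, so the indefiniteness above is easy to confirm programmatically (a Python sketch; `eig2_sym` is our helper name):

```python
import math

def eig2_sym(a, b, c):
    """Eigenvalues (largest first) of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b * b)  # half-distance between eigenvalues
    return mean + rad, mean - rad

lmax, lmin = eig2_sym(1.0, 2.0, 1.0)
# lmax = 3.0, lmin = -1.0: one negative eigenvalue, so the matrix is indefinite

# Witness direction: x = (1, -1) gives x^T Sigma x = -2 < 0.
x = [1.0, -1.0]
Sigma = [[1.0, 2.0], [2.0, 1.0]]
qf = sum(x[i] * Sigma[i][j] * x[j] for i in range(2) for j in range(2))
```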


Chapter 2: The Portfolio Optimization Problem

With the linear algebra tools in place, we now turn to the financial problem that motivates the entire course. The fundamental question is deceptively simple: given a collection of risky financial assets, how should an investor allocate their wealth to achieve the best possible trade-off between expected return and risk? Harry Markowitz’s 1952 paper provided the first rigorous mathematical answer, founding modern quantitative finance and earning him the 1990 Nobel Prize in Economics.

2.1 Return, Expected Return, and Covariance

Consider \(n\) risky assets. Over a single period, the (random) return of asset \(i\) is \(r_i\), representing the fractional gain or loss. The return vector is \(r = (r_1, \ldots, r_n)^T \in \mathbb{R}^n\). We assume:

  • Expected returns: \(\mu = \mathbb{E}[r] \in \mathbb{R}^n\), with \(\mu_i\) the expected return of asset \(i\).
  • Covariance matrix: \(\Sigma = \mathbb{E}\left[(r - \mu)(r - \mu)^T\right] \in \mathbb{R}^{n \times n}\), which is symmetric and PSD by construction. The diagonal entry \(\Sigma_{ii} = \sigma_i^2\) is the variance of asset \(i\); the off-diagonal \(\Sigma_{ij} = \sigma_{ij}\) is the covariance between assets \(i\) and \(j\).

A portfolio is a vector of weights \(w = (w_1, \ldots, w_n)^T \in \mathbb{R}^n\) with \(e^T w = 1\), where \(e\) is the all-ones vector. The weight \(w_i\) represents the fraction of wealth invested in asset \(i\). Negative weights correspond to short positions. The portfolio’s return is the scalar \(r_p = w^T r\), so:

\[ \mathbb{E}[r_p] = w^T \mu, \qquad \mathrm{Var}(r_p) = w^T \Sigma w. \]

This is the mean-variance framework. The investor is characterised by two quantities: the expected return \(\mu_p = w^T \mu\) they wish to achieve, and the variance \(\sigma_p^2 = w^T \Sigma w\) they wish to minimise as a proxy for risk.

2.2 Three Equivalent Formulations

The mean-variance portfolio problem admits three standard formulations that are intimately related and each useful for different purposes.

Formulation I — Minimum Variance for a Target Return:

\[ \min_{w \in \mathbb{R}^n} \; w^T \Sigma w \quad \text{subject to} \quad \mu^T w = \bar{R}, \quad e^T w = 1. \]

Here \(\bar{R}\) is the investor’s required expected return. This is a convex quadratic programme with two linear equality constraints. For each value of \(\bar{R}\), the solution traces out the efficient frontier.

Formulation II — Maximum Return for a Target Risk:

\[ \max_{w \in \mathbb{R}^n} \; \mu^T w \quad \text{subject to} \quad w^T \Sigma w \leq \sigma^2, \quad e^T w = 1. \]

Here \(\sigma^2\) is the investor’s risk tolerance. This formulation swaps the roles of objective and constraint relative to Formulation I; under mild regularity conditions the two trace the same curve.

Formulation III — Risk-Adjusted Return Maximisation:

\[ \max_{w \in \mathbb{R}^n} \; \mu^T w - \frac{\lambda}{2} w^T \Sigma w \quad \text{subject to} \quad e^T w = 1. \]

The parameter \(\lambda > 0\) is the risk-aversion coefficient: large \(\lambda\) penalises variance heavily, producing conservative portfolios; small \(\lambda\) produces aggressive ones. This formulation is especially convenient analytically because it has no inequality constraints, and as \(\lambda\) varies from \(0\) to \(\infty\) the solutions again sweep the entire efficient frontier. Writing \(f(w) = \mu^T w - \frac{\lambda}{2} w^T \Sigma w\), we note that \(-f\) is a convex quadratic, so this is equivalent to an equality-constrained convex QP.

The three formulations are equivalent in the sense that any efficient portfolio achievable by one is achievable by the others for appropriate parameter values. In practice, Formulation III is most amenable to closed-form analysis, while Formulations I and II are natural for scenarios where the investor specifies concrete targets.

2.3 KKT Conditions and Solution via QR

To illustrate how the linear algebra of Chapter 1 enters, consider Formulation I in detail. Form the Lagrangian:

\[ \mathcal{L}(w, \gamma, \delta) = w^T \Sigma w - \gamma(\mu^T w - \bar{R}) - \delta(e^T w - 1). \]

The KKT stationarity condition \(\nabla_w \mathcal{L} = 0\) gives:

\[ 2\Sigma w = \gamma \mu + \delta e. \]

Combining with the two equality constraints, the optimal \((w^*, \gamma^*, \delta^*)\) satisfies the bordered linear system:

\[ \begin{pmatrix} 2\Sigma & -\mu & -e \\ -\mu^T & 0 & 0 \\ -e^T & 0 & 0 \end{pmatrix} \begin{pmatrix} w \\ \gamma \\ \delta \end{pmatrix} = \begin{pmatrix} 0 \\ -\bar{R} \\ -1 \end{pmatrix}. \]

When \(\Sigma \succ 0\), this \((n+2) \times (n+2)\) system is nonsingular and can be solved directly. In MATLAB, the backslash operator uses LU with partial pivoting; for large \(n\) one can exploit the Cholesky factor of \(\Sigma\) to reduce to a \(2 \times 2\) Schur-complement system. In either case, QR factorization of the constraint matrix \(A = [\mu \; e]^T\) appears naturally in the null-space method described in Chapter 5.
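To make the procedure concrete, here is a Python sketch that assembles and solves the bordered system by dense Gaussian elimination with partial pivoting, using the three-asset data revisited in Section 2.4 and an illustrative target \(\bar{R} = 0.10\) (the `solve` helper is ours, standing in for MATLAB's backslash):

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]          # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back-substitution
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

mu = [0.10, 0.12, 0.07]
Sigma = [[0.04, 0.02, 0.01], [0.02, 0.09, 0.03], [0.01, 0.03, 0.06]]
Rbar = 0.10  # illustrative target expected return

# Bordered KKT matrix [[2*Sigma, -mu, -e], [-mu^T, 0, 0], [-e^T, 0, 0]].
M = [[2 * Sigma[i][j] for j in range(3)] + [-mu[i], -1.0] for i in range(3)]
M.append([-m for m in mu] + [0.0, 0.0])
M.append([-1.0, -1.0, -1.0, 0.0, 0.0])
sol = solve(M, [0.0, 0.0, 0.0, -Rbar, -1.0])
w = sol[:3]  # optimal weights; sol[3], sol[4] are the multipliers gamma, delta
```

The returned weights satisfy \(\mu^Tw = \bar{R}\) and \(e^Tw = 1\) to machine precision, as the last two rows of the system enforce.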

2.4 Worked Numerical Example: Three-Asset Universe

The following example fixes a concrete three-asset universe and works through the scalars \(A, B, C, D\), the GMVP weights, and the minimum variance frontier parabola.

Example (Three-asset GMVP computation). Suppose \(n=3\) with \[ \mu = \begin{pmatrix}0.10\\0.12\\0.07\end{pmatrix}, \qquad \Sigma = \begin{pmatrix}0.04&0.02&0.01\\0.02&0.09&0.03\\0.01&0.03&0.06\end{pmatrix}. \] \[ \Sigma^{-1} \approx \begin{pmatrix}28.30&-5.66&-1.89\\-5.66&14.47&-6.29\\-1.89&-6.29&20.13\end{pmatrix}. \]

(The reader may verify \(\Sigma\Sigma^{-1} = I\) numerically.)

\[ A = \mu^T\Sigma^{-1}e = \sum_{i,j}(\Sigma^{-1})_{ij}\mu_i. \]

Computing row by row: \(\Sigma^{-1}\mu \approx (28.30\times0.10 - 5.66\times0.12 - 1.89\times0.07,\; \ldots)\). Carrying through:

\[ \Sigma^{-1}\mu \approx \begin{pmatrix}2.019\\0.730\\0.465\end{pmatrix}, \qquad \Sigma^{-1}e \approx \begin{pmatrix}20.75\\2.52\\11.95\end{pmatrix}. \]

Therefore \begin{align*} A &= \mu^T\Sigma^{-1}e = e^T\Sigma^{-1}\mu = 2.019+0.730+0.465 = 3.214,\\ B &= \mu^T\Sigma^{-1}\mu = 0.10\times2.019+0.12\times0.730+0.07\times0.465 = 0.322,\\ C &= e^T\Sigma^{-1}e = 20.75+2.52+11.95 = 35.22,\\ D &= BC - A^2 = 0.322\times35.22 - 3.214^2 = 11.34 - 10.33 = 1.01. \end{align*}

Note: \(D > 0\) is guaranteed whenever \(\Sigma \succ 0\) and \(\mu\) is not a scalar multiple of \(e\) — this follows from the Cauchy–Schwarz inequality in the inner product \(\langle u, v\rangle = u^T\Sigma^{-1}v\) — and that is the case here, since \(\mu\) has distinct entries. A negative computed value of \(D\) always signals an arithmetic error.

\[ w_{\mathrm{gmv}} = \frac{\Sigma^{-1}e}{e^T\Sigma^{-1}e} = \frac{\Sigma^{-1}e}{C}. \]

Normalising the vector \(\Sigma^{-1}e \approx (20.75, 2.52, 11.95)^T\) by \(C \approx 35.22\):

\[ w_{\mathrm{gmv}} \approx \begin{pmatrix}0.589\\0.071\\0.339\end{pmatrix}. \]

These weights sum to 1 (as required by the budget constraint) and are all positive, so no short-selling is needed for this particular universe to attain the minimum variance.

GMVP return and variance. \begin{align*} \mu_{\mathrm{gmv}} &= w_{\mathrm{gmv}}^T\mu = \frac{A}{C} \approx \frac{3.214}{35.22} \approx 0.0913,\\ \sigma_{\mathrm{gmv}}^2 &= w_{\mathrm{gmv}}^T\Sigma w_{\mathrm{gmv}} = \frac{1}{C} \approx \frac{1}{35.22} \approx 0.0284. \end{align*} The GMVP has an expected return of about 9.1% and an annualised standard deviation of \(\sqrt{0.0284}\approx 16.9\%\).
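The full computation can be reproduced in a few lines. The Python sketch below (the `inv3` adjugate-based helper is ours, not course code) recomputes \(\Sigma^{-1}\), the four scalars, and the GMVP weights directly from \(\mu\) and \(\Sigma\):

```python
def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e*i - f*h) - b * (d*i - f*g) + c * (d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

mu = [0.10, 0.12, 0.07]
Sigma = [[0.04, 0.02, 0.01], [0.02, 0.09, 0.03], [0.01, 0.03, 0.06]]
Si = inv3(Sigma)

Simu = [sum(Si[i][j] * mu[j] for j in range(3)) for i in range(3)]  # Sigma^{-1} mu
Sie = [sum(Si[i][j] for j in range(3)) for i in range(3)]           # Sigma^{-1} e
A = sum(Simu)                              # mu^T Sigma^{-1} e
B = sum(m * x for m, x in zip(mu, Simu))   # mu^T Sigma^{-1} mu
C = sum(Sie)                               # e^T Sigma^{-1} e
D = B * C - A * A                          # positive, since mu is not a multiple of e
w_gmv = [x / C for x in Sie]               # global minimum-variance weights
```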

Remark (Parabola in variance space, hyperbola in standard-deviation space). The minimum variance frontier is described by \[ \sigma_p^2 = \frac{1}{D}(C\bar{R}^2 - 2A\bar{R} + B), \] a quadratic function of \(\bar{R}\). In the \((\sigma_p^2, \mu_p)\) plane this is a parabola opening to the right, with vertex at the GMVP \((\sigma_{\mathrm{gmv}}^2, \mu_{\mathrm{gmv}}) = (1/C, A/C)\). The vertex follows directly from completing the square in \(\bar{R}\).

When we switch axes to \((\sigma_p, \mu_p)\) — plotting standard deviation on the horizontal axis, as is conventional in finance — we take the square root, turning the parabola into a hyperbola. Formally, if we set \(x = \sigma_p\) and \(y = \mu_p\), then \(x^2 = \sigma_p^2\) satisfies the quadratic above, which rearranges to

\[ \frac{x^2}{1/C} - \frac{(y - A/C)^2}{D/C^2} = 1, \]

the standard form of a hyperbola with horizontal transverse axis. Since \(\sigma_p \geq 0\), only the right branch is meaningful, and the GMVP is its vertex — the leftmost point of the curve. Portfolios on the upper half (where \(\mu_p \geq \mu_{\mathrm{gmv}}\)) constitute the efficient frontier proper.

2.5 Two-Asset Frontier and Short-Selling Constraints

Example (Two-asset efficient frontier). Consider two risky assets with \[ \mu_1 = 0.10,\quad \mu_2 = 0.15,\quad \sigma_1 = 0.20,\quad \sigma_2 = 0.30,\quad \rho_{12} = 0.3. \] A portfolio with weight \(w\) in asset 1 (and \(1-w\) in asset 2) has \begin{align*} \mu_p(w) &= 0.10w + 0.15(1-w) = 0.15 - 0.05w,\\ \sigma_p^2(w) &= w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho_{12}\sigma_1\sigma_2\\ &= 0.04w^2 + 0.09(1-w)^2 + 2w(1-w)\times0.3\times0.20\times0.30\\ &= 0.04w^2 + 0.09(1-w)^2 + 0.036w(1-w). \end{align*}

Differentiating \(\sigma_p^2\) with respect to \(w\) and setting to zero:

\[ \frac{d\sigma_p^2}{dw} = 0.08w - 0.18(1-w) + 0.036(1-2w) = 0.08w - 0.18 + 0.18w + 0.036 - 0.072w = 0. \]

Collecting terms: \((0.08 + 0.18 - 0.072)w = 0.18 - 0.036\), giving \(w^* = 0.144/0.188 \approx 0.766\). The GMVP weight is about 76.6% in asset 1.

At \(w^* \approx 0.766\):

\[ \mu_{\mathrm{gmv}} \approx 0.15 - 0.05\times0.766 = 0.112, \qquad \sigma_{\mathrm{gmv}}^2 \approx 0.04(0.766)^2 + 0.09(0.234)^2 + 0.036(0.766)(0.234) \approx 0.0349. \]

As \(w\) ranges from \(-\infty\) to \(+\infty\) (allowing short selling), the pair \((\sigma_p(w), \mu_p(w))\) traces the full hyperbola. The efficient portion is \(w \leq w^*\) (lower \(w\) gives higher return), i.e., increasing the weight in asset 2 from the GMVP position improves expected return at the cost of higher variance.
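Setting the derivative to zero gives the closed form \(w^* = (\sigma_2^2 - \rho\sigma_1\sigma_2)/(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)\) for the two-asset GMVP, which the calculation above reproduces. A quick Python check with the data of this example:

```python
mu1, mu2 = 0.10, 0.15
s1, s2, rho = 0.20, 0.30, 0.3
cov = rho * s1 * s2                      # 0.018

def variance(w):
    """Variance of the portfolio holding w in asset 1 and 1 - w in asset 2."""
    return (w * s1) ** 2 + ((1 - w) * s2) ** 2 + 2 * w * (1 - w) * cov

w_star = (s2 ** 2 - cov) / (s1 ** 2 + s2 ** 2 - 2 * cov)   # GMVP weight in asset 1
mu_gmv = mu1 * w_star + mu2 * (1 - w_star)
var_gmv = variance(w_star)
# w_star = 0.144/0.188 ≈ 0.766, mu_gmv ≈ 0.112, var_gmv ≈ 0.0349
```

Nudging \(w\) in either direction from \(w^*\) strictly increases the variance, confirming that the stationary point is the minimum.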

2.6 Short-Selling Constraints and the Failure of the Two-Fund Theorem

When investors are prohibited from taking short positions — a constraint \(w \geq 0\) — the structure of the efficient frontier changes fundamentally. The feasible set is no longer all of \(\mathbb{R}^n\) intersected with the budget hyperplane but instead the simplex \(\{w : e^Tw=1, w\geq 0\}\). This is a compact convex polytope.

Within this constrained setting:

  1. The efficient frontier is still a continuous curve in \((\sigma_p, \mu_p)\) space, but it is no longer a smooth hyperbola. Instead it is piecewise smooth, with each smooth segment corresponding to a different subset of assets with strictly positive weights (a different “basis”).

  2. The two-fund theorem fails. Any convex combination of two short-selling-constrained efficient portfolios is feasible, but it need not be efficient — a combination of two efficient portfolios might lie in the interior of the frontier (dominated by some other portfolio). This is because the unconstrained analysis relied on the linearity of the KKT system in the target return \(\bar{R}\), which breaks down once inequality constraints are active and the active set changes.

  3. The critical line algorithm (Chapter 6) handles this case by tracking which assets have \(w_i = 0\) (the active set) as the target return \(\bar{R}\) varies. The result is still an \(O(n^2)\) algorithm, and the frontier consists of at most \(n-1\) linear segments.

Practitioners routinely impose non-negativity constraints to prevent the extreme short positions that arise from raw mean-variance optimisation. These short positions are often artifacts of estimation error in \(\mu\), and the constrained frontier is more robust to perturbations in the input parameters.


Chapter 3: The Efficient Frontier

The concept of the efficient frontier is central to modern portfolio theory. It encapsulates the complete trade-off between risk and return achievable by a rational investor who holds only efficient portfolios — those that minimise variance for each level of expected return. Markowitz’s insight was that an investor need never consider any portfolio not on this frontier, regardless of their specific risk preferences; their personal risk-aversion coefficient merely picks a point on the frontier.

3.1 The Minimum Variance Frontier

Solving Formulation I for every value of \(\bar{R} \in \mathbb{R}\) produces a family of portfolios parameterised by \(\bar{R}\). The set of all such portfolios is the minimum variance frontier. In \((\sigma^2, \mu_p)\) space this frontier is a parabola: one can show algebraically that

\[ \sigma_p^2 = \frac{1}{D}\left(C\bar{R}^2 - 2A\bar{R} + B\right), \]

where \(A = \mu^T \Sigma^{-1} e\), \(B = \mu^T \Sigma^{-1} \mu\), \(C = e^T \Sigma^{-1} e\), and \(D = BC - A^2 > 0\) (assuming \(\mu\) and \(e\) are linearly independent). In \((\sigma_p, \mu_p)\) space this parabola becomes a hyperbola, with the leftmost point being the global minimum-variance portfolio (GMVP). The GMVP has weights

\[ w_{\mathrm{gmv}} = \frac{\Sigma^{-1} e}{e^T \Sigma^{-1} e} \]

and is the unique portfolio that minimises variance with no constraint on expected return.

The efficient frontier is the upper half of the hyperbola — the set of minimum-variance portfolios with expected return at least as large as that of the GMVP. An investor with any risk-aversion level \(\lambda > 0\) will hold a portfolio on this upper half. The lower half, while mathematically valid, is inefficient: for every portfolio on the lower half there exists another portfolio with the same variance but strictly higher expected return.

3.2 The Two-Fund Theorem

Theorem (Two-Fund Separation): If \(\Sigma \succ 0\), every minimum-variance portfolio can be written as an affine combination of any two distinct minimum-variance portfolios. Equivalently, every such portfolio lies on the (extended) line joining any two "basis" portfolios.

This theorem has a powerful implication: it suffices to identify two specific efficient portfolios (the “two funds”), and then every investor, regardless of risk aversion, can optimally invest in a mix of just these two funds. In practice, one natural choice is the GMVP and the portfolio that maximises expected return subject only to the budget constraint. The two-fund theorem is the precursor to the mutual fund theorem in the CAPM.

Proof sketch: the solution to Formulation I is linear in \(\bar{R}\) (since the KKT system is linear with \(\bar{R}\) appearing only in the right-hand side), so

\[ w^*(\bar{R}) = w^*(\bar{R}_1) + \frac{\bar{R} - \bar{R}_1}{\bar{R}_2 - \bar{R}_1}\left(w^*(\bar{R}_2) - w^*(\bar{R}_1)\right), \]

which expresses \(w^*(\bar{R})\) as an affine combination (coefficients summing to one) of \(w^*(\bar{R}_1)\) and \(w^*(\bar{R}_2)\) for any two reference return levels \(\bar{R}_1 \neq \bar{R}_2\).

3.3 Risk-Free Asset and the Capital Market Line

Introducing a risk-free asset with certain return \(r_f\) fundamentally changes the geometry of the efficient frontier. The risk-free asset has zero variance and zero covariance with all risky assets. An investor now allocates fraction \(w_0\) to the risk-free asset and weights \(w \in \mathbb{R}^n\) (with \(e^T w = 1 - w_0\)) to risky assets.

When a risk-free asset is available, the efficient frontier in \((\sigma_p, \mu_p)\) space degenerates to a straight line called the Capital Market Line (CML):

\[ \mu_p = r_f + \frac{\mu_T - r_f}{\sigma_T} \cdot \sigma_p, \]

where \((\sigma_T, \mu_T)\) are the risk and expected return of the tangency portfolio — the unique portfolio of risky assets that lies on both the CML and the hyperbolic frontier. The slope of the CML is the Sharpe ratio of the tangency portfolio:

\[ S_T = \frac{\mu_T - r_f}{\sigma_T}. \]
Definition (Sharpe Ratio): For a portfolio \(p\) with expected return \(\mu_p\), standard deviation \(\sigma_p\), and risk-free rate \(r_f\), the Sharpe ratio is \(S_p = (\mu_p - r_f) / \sigma_p\). It measures excess return per unit of total risk and is the most widely used single-number performance metric in practice.

The tangency portfolio is found by maximising the Sharpe ratio over all risky portfolios. Since \(S_p\) is a ratio of a linear to a nonlinear function of \(w\), this is not immediately a convex programme. However, one can reduce it: let \(v = \Sigma^{-1}(\mu - r_f e)\). Then the tangency portfolio weights are \(w_T = v / (e^T v)\). This follows from differentiating \(S_p(w) = (\mu - r_f e)^T w / \sqrt{w^T \Sigma w}\), which is invariant to positive scaling of \(w\), and then normalising so that the weights sum to one.

The economic implication is profound: in a world with a risk-free asset, every investor optimally holds the same risky portfolio (the tangency portfolio), with the fraction invested in risky assets determined by their risk aversion. This is the one-fund theorem or CAPM separation result.

3.4 Tangency Portfolio: Numerical Example

Example (Tangency portfolio for the three-asset universe). Using the three-asset data from Section 2.4, with \(\mu = (0.10, 0.12, 0.07)^T\), \(\Sigma\) as given, and risk-free rate \(r_f = 0.04\), we compute the tangency portfolio. \[ \mu - r_f e = \begin{pmatrix}0.10-0.04\\0.12-0.04\\0.07-0.04\end{pmatrix} = \begin{pmatrix}0.06\\0.08\\0.03\end{pmatrix}. \]\[ v \approx \begin{pmatrix}28.30&-5.66&-1.89\\-5.66&14.47&-6.29\\-1.89&-6.29&20.13\end{pmatrix}\begin{pmatrix}0.06\\0.08\\0.03\end{pmatrix}. \]

Computing each component: \begin{align*} v_1 &\approx 28.30\times0.06 + (-5.66)\times0.08 + (-1.89)\times0.03 = 1.698 - 0.453 - 0.057 = 1.188,\\ v_2 &\approx (-5.66)\times0.06 + 14.47\times0.08 + (-6.29)\times0.03 = -0.340 + 1.158 - 0.189 = 0.629,\\ v_3 &\approx (-1.89)\times0.06 + (-6.29)\times0.08 + 20.13\times0.03 = -0.113 - 0.503 + 0.604 = -0.012. \end{align*} So \(v \approx (1.188, 0.629, -0.012)^T\).

\[ e^Tv = 1.188 + 0.629 + (-0.012) = 1.805, \]\[ w_T = \frac{v}{e^Tv} \approx \begin{pmatrix}0.658\\0.348\\-0.007\end{pmatrix}. \]\[ \mu_T = w_T^T\mu \approx 0.658\times0.10 + 0.348\times0.12 + (-0.007)\times0.07 \approx 0.107, \]

and its variance is \(\sigma_T^2 = w_T^T\Sigma w_T\). The tangency portfolio has the highest Sharpe ratio \(S_T = (\mu_T - r_f)/\sigma_T\) among all risky portfolios; every investor with access to borrowing or lending at rate \(r_f\) holds this portfolio in their risky allocation, scaled by their risk aversion.
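The same computation can be verified programmatically (a Python sketch; `inv3` is our adjugate-based helper, with \(r_f = 0.04\) as given in the example):

```python
def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e*i - f*h) - b * (d*i - f*g) + c * (d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

rf = 0.04
mu = [0.10, 0.12, 0.07]
Sigma = [[0.04, 0.02, 0.01], [0.02, 0.09, 0.03], [0.01, 0.03, 0.06]]
Si = inv3(Sigma)

excess = [m - rf for m in mu]                                 # mu - rf * e
v = [sum(Si[i][j] * excess[j] for j in range(3)) for i in range(3)]
wT = [x / sum(v) for x in v]                                  # tangency weights
muT = sum(w * m for w, m in zip(wT, mu))
sigmaT2 = sum(wT[i] * Sigma[i][j] * wT[j] for i in range(3) for j in range(3))
sharpe = (muT - rf) / sigmaT2 ** 0.5
```

Note the small short position in asset 3 (\(w_{T,3} < 0\)): the tangency portfolio need not satisfy a no-short-selling constraint.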

3.5 The Capital Asset Pricing Model

The one-fund theorem asserts that in equilibrium, every investor holds the same risky portfolio — the tangency portfolio. If investors have homogeneous beliefs (they agree on \(\mu\) and \(\Sigma\)), then in equilibrium the tangency portfolio must equal the market portfolio \(w_M\): the portfolio in which each asset is held in proportion to its market capitalisation. This is the central insight of the Capital Asset Pricing Model (CAPM), developed independently by Sharpe (1964), Lintner (1965), and Mossin (1966).

Definition (CAPM Beta). For any asset \(i\), its beta with respect to the market portfolio is \[ \beta_i = \frac{\mathrm{Cov}(r_i, r_M)}{\mathrm{Var}(r_M)} = \frac{(\Sigma w_M)_i}{w_M^T\Sigma w_M}. \] Beta measures the systematic (non-diversifiable) risk of asset \(i\): the fraction of the asset's risk that is correlated with the market's overall movements. A beta of 1 means the asset moves one-for-one with the market; a beta greater than 1 indicates an amplified response to market fluctuations.
Definition (CAPM Expected Return Equation). Under the CAPM, the equilibrium expected return of asset \(i\) satisfies \[ \mathbb{E}[r_i] = r_f + \beta_i\bigl(\mathbb{E}[r_M] - r_f\bigr), \] where \(\mathbb{E}[r_M] - r_f\) is the equity risk premium — the expected excess return of the market portfolio over the risk-free rate.

This equation has a clean geometric interpretation: all assets lie on the Security Market Line (SML), the line in \((\beta_i, \mathbb{E}[r_i])\) space passing through \((0, r_f)\) with slope \(\mathbb{E}[r_M] - r_f\). Assets plotting above the SML (positive alpha) are underpriced relative to their systematic risk; assets below the SML are overpriced. Active managers seek assets with positive alpha.

Remark (Derivation of CAPM beta from the tangency condition). We sketch why the CAPM equation holds. Suppose the market portfolio is \(w_M = v/(e^Tv)\) where \(v = \Sigma^{-1}(\mu - r_f e)\). The excess return vector must therefore satisfy \[ \mu - r_f e = \Sigma v = (e^Tv)\,\Sigma w_M = \lambda^* \Sigma w_M, \] where \(\lambda^* = e^Tv > 0\). Entry \(i\) of this equation is \[ \mu_i - r_f = \lambda^* (\Sigma w_M)_i. \]

Multiplying both sides of the vector equation by \(w_M^T\) gives

\[ \mu_M - r_f = \lambda^* w_M^T\Sigma w_M = \lambda^*\sigma_M^2, \]

so \(\lambda^* = (\mu_M - r_f)/\sigma_M^2\). Substituting back:

\[ \mu_i - r_f = \frac{\mu_M - r_f}{\sigma_M^2}(\Sigma w_M)_i = (\mu_M - r_f)\frac{\mathrm{Cov}(r_i, r_M)}{\sigma_M^2} = (\mu_M - r_f)\beta_i, \]

which is exactly the CAPM equation. The derivation shows that CAPM is a direct consequence of the mean-variance tangency condition — it is not an independent financial theory but rather an equilibrium implication of Markowitz optimisation combined with homogeneous beliefs.
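The derivation can be sanity-checked numerically. The Python sketch below takes the two-asset data of Section 2.5 with an assumed \(r_f = 0.04\), treats the tangency portfolio as the market, and confirms that the SML equation holds asset by asset (as the algebra above says it must):

```python
rf = 0.04
mu = [0.10, 0.15]
Sigma = [[0.04, 0.018], [0.018, 0.09]]   # sigma1=0.2, sigma2=0.3, rho=0.3

det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] ** 2
Si = [[Sigma[1][1] / det, -Sigma[0][1] / det],
      [-Sigma[0][1] / det, Sigma[0][0] / det]]   # 2x2 inverse

excess = [m - rf for m in mu]
v = [sum(Si[i][j] * excess[j] for j in range(2)) for i in range(2)]
wM = [x / sum(v) for x in v]                      # market = tangency portfolio

sigmaM2 = sum(wM[i] * Sigma[i][j] * wM[j] for i in range(2) for j in range(2))
muM = sum(w * m for w, m in zip(wM, mu))
beta = [sum(Sigma[i][j] * wM[j] for j in range(2)) / sigmaM2 for i in range(2)]
# CAPM / SML: mu_i - rf = beta_i * (muM - rf) for each asset i
```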


Chapter 4: Quadratic Programming

The portfolio problems studied so far involve only equality constraints. Real-world portfolio management, however, routinely imposes inequality constraints: no-short-selling restrictions (\(w \geq 0\)), sector exposure limits, concentration limits, turnover bounds, and so on. Handling these requires the machinery of quadratic programming (QP).

4.1 General QP Formulation

A quadratic programme is:

\[ \min_{x \in \mathbb{R}^n} \; \frac{1}{2} x^T Q x + c^T x \quad \text{subject to} \quad Ax = b, \quad Cx \leq d. \]

We assume \(Q \succ 0\) (which holds when \(Q = \Sigma\) and the covariance matrix is positive definite). Under this assumption, the QP is strictly convex and has a unique global minimum. The feasible set — defined by linear equality and inequality constraints — is a convex polyhedron.

The KKT conditions for this QP are:

\[ Q x + c - A^T \lambda + C^T \mu = 0, \]\[ Ax = b, \quad Cx \leq d, \]\[ \mu \geq 0, \quad \mu^T(Cx - d) = 0 \quad \text{(complementary slackness).} \]

Because \(Q \succ 0\), the KKT conditions are both necessary and sufficient for global optimality: any point satisfying them is the unique minimiser.

Example (Two-variable equality-constrained QP via KKT). Solve \[ \min_{x_1,x_2} \; x_1^2 + x_2^2 - x_1 - x_2 \quad \text{subject to} \quad x_1 + x_2 = 1. \] Here \(Q = 2I\), \(c = (-1,-1)^T\), \(A = (1,1)\), \(b = 1\). The Lagrangian is \[ \mathcal{L}(x,\lambda) = x_1^2 + x_2^2 - x_1 - x_2 - \lambda(x_1+x_2-1). \]

KKT stationarity conditions \(\partial\mathcal{L}/\partial x_i = 0\): \begin{align*} 2x_1 - 1 - \lambda &= 0,\\ 2x_2 - 1 - \lambda &= 0. \end{align*} Subtracting: \(2(x_1 - x_2) = 0 \implies x_1 = x_2\). Substituting into the constraint \(x_1 + x_2 = 1\) gives \(x_1 = x_2 = 1/2\). The multiplier is \(\lambda = 2(1/2) - 1 = 0\).

The optimal solution is \(x^* = (1/2, 1/2)^T\) with objective value \(1/4 + 1/4 - 1/2 - 1/2 = -1/2\).

Equivalently, the stationarity conditions and the (negated) constraint assemble into the bordered linear system \[ \begin{pmatrix}2&0&-1\\0&2&-1\\-1&-1&0\end{pmatrix}\begin{pmatrix}x_1\\x_2\\\lambda\end{pmatrix} = \begin{pmatrix}1\\1\\-1\end{pmatrix}. \]

Solving by elimination: from rows 1 and 2, \(2x_1 - \lambda = 1\) and \(2x_2 - \lambda = 1\), so \(x_1 = x_2 = (1+\lambda)/2\). Row 3 gives \(-x_1 - x_2 = -1\), i.e., \(x_1+x_2=1\), so \((1+\lambda)/2 + (1+\lambda)/2 = 1\), giving \(1+\lambda=1\), \(\lambda=0\), \(x_1=x_2=1/2\). Consistent.

Financial interpretation. With two assets having identical variances, zero correlation, and identical expected returns (as encoded here by \(Q = 2I\) and equal entries of \(c\)), the minimum-variance budget-constrained portfolio is the equally weighted portfolio — a result that holds more generally whenever the covariance matrix and expected returns are invariant under permutation of the assets.

4.2 Equality-Constrained QP

When there are only equality constraints (\(Cx \leq d\) absent), the KKT conditions reduce to the bordered linear system

\[ \begin{pmatrix} Q & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}. \]

This is a symmetric (but indefinite) system of order \(n + m\), where \(m\) is the number of equality constraints. It can be solved by block elimination (exploiting positive definiteness of \(Q\)) or by a symmetric indefinite \(LDL^T\) factorization. In the portfolio context with \(Q = 2\Sigma\) and \(m = 2\) (the return and budget constraints), this is an efficiently solvable \((n+2) \times (n+2)\) system.
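A minimal NumPy sketch of assembling and solving the bordered system, reusing the two-variable example of Section 4.1 (\(Q = 2I\), \(c = (-1,-1)^T\), \(A = (1,1)\), \(b = 1\)):

```python
import numpy as np

# Equality-constrained QP data: min (1/2)x'Qx + c'x  s.t.  Ax = b
Q = 2 * np.eye(2)
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])          # one equality constraint (m = 1)
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
# Bordered KKT matrix [[Q, -A'], [A, 0]] and right-hand side [-c; b]
K = np.block([[Q, -A.T],
              [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])

sol = np.linalg.solve(K, rhs)
x, lam = sol[:n], sol[n:]
# Recovers x = (1/2, 1/2) and lambda = 0, as in the worked example
```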

4.3 Active Set Method

For QPs with inequality constraints, the active set method is the classical algorithm. The key observation is that at the optimum \(x^*\), some inequality constraints are satisfied with equality (the active constraints at \(x^*\)), while the rest are strict inequalities. If we knew in advance which constraints were active, we could solve an equality-constrained QP restricted to the active constraints — but of course we do not know this in advance.

Definition (Working Set): The working set \(\mathcal{W}\) at a given iterate is the current estimate of the set of active inequality constraints. At each step the algorithm solves the QP with constraints \(Ax = b\) and \(C_{\mathcal{W}} x = d_{\mathcal{W}}\) (treating working-set constraints as equalities).

The active set algorithm proceeds as follows:

  1. Start with a feasible point \(x_0\) and initial working set \(\mathcal{W}_0\).
  2. Solve the equality-constrained QP on \(\mathcal{W}_k\) to get step direction \(p\).
  3. If \(p = 0\), check the KKT multipliers for the working-set constraints. If all multipliers \(\mu_i \geq 0\), declare optimality. If some \(\mu_i < 0\), remove that constraint from \(\mathcal{W}_k\) and continue.
  4. If \(p \neq 0\), take a step along \(p\), blocking at the first violated constraint. Add the blocking constraint to \(\mathcal{W}_{k+1}\).

Under non-degeneracy assumptions, the algorithm terminates in a finite number of steps. Degeneracy (when multiple constraints are active at a vertex of the feasible set) can cause cycling, but anti-degeneracy rules (perturbation or Bland’s rule analogues) prevent this in practice.
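Steps 1–4 can be made concrete for the simplest case — a strictly convex QP with only non-negativity bounds and no equality constraints. The following Python/NumPy sketch is a toy illustration under those assumptions (no anti-cycling safeguard, subsystems re-solved from scratch), not a production implementation:

```python
import numpy as np

def active_set_bound_qp(Q, c, x0, tol=1e-10, max_iter=100):
    """Primal active-set method for: min (1/2) x'Qx + c'x  s.t.  x >= 0.

    x0 must be feasible (x0 >= 0). Toy sketch of steps 1-4 in the text.
    """
    n = len(c)
    x = np.array(x0, dtype=float)
    W = {i for i in range(n) if x[i] == 0.0}        # working set (bounds at 0)
    for _ in range(max_iter):
        free = [i for i in range(n) if i not in W]
        x_eq = np.zeros(n)                          # step 2: solve on W_k
        if free:
            x_eq[free] = np.linalg.solve(Q[np.ix_(free, free)], -c[free])
        p = x_eq - x
        if np.linalg.norm(p) <= tol:                # step 3: p = 0
            mu = Q @ x + c                          # bound multipliers mu_i = g_i
            if all(mu[i] >= -tol for i in W):
                return x                            # KKT point: optimal
            W.remove(min(i for i in W if mu[i] < -tol))
            continue
        alpha, block = 1.0, None                    # step 4: blocking step
        for i in free:
            if p[i] < -tol and -x[i] / p[i] < alpha:
                alpha, block = -x[i] / p[i], i
        x = x + alpha * p
        if block is not None:
            x[block] = 0.0
            W.add(block)
    return x

# Example: min (1/2)||x - a||^2 s.t. x >= 0 has solution max(a, 0)
a = np.array([1.0, -1.0, 2.0])
x_opt = active_set_bound_qp(np.eye(3), -a, np.zeros(3))
```

The example converges in a handful of pivots, each adding or dropping one bound — exactly the per-pivot behaviour the text describes.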

Remark (Active set vs. interior-point methods for portfolio QPs). For general large-scale QPs, interior-point methods (Chapter 9) are often preferred because they enjoy polynomial iteration complexity and do not suffer from the combinatorial difficulty of active-set enumeration. However, for portfolio QPs with moderate \(n\) (say, \(n \leq 5000\)) and a rich set of linear constraints, the active set method has important practical advantages:
  1. Warm-starting. If the covariance matrix or expected returns change slightly (e.g., on a daily rebalancing cycle), the active set from the previous day’s solution is usually near-optimal for today’s problem. The active set method can be initialised with this warm start and typically converges in just a few pivots, whereas interior-point methods must restart from a central point and cannot exploit a warm start effectively.

  2. Parametric continuation. The critical line algorithm (Section 6.1) traces the entire efficient frontier by parametrically varying the target return \(\bar{R}\), updating the active set at each breakpoint. This produces the complete frontier in a single pass — an inherently active-set paradigm. Interior-point methods, solved independently at each \(\bar{R}\), would require many separate runs.

  3. Structure exploitation. In portfolio QPs, the non-negativity constraints \(w_i \geq 0\) are simple bounds that the active set method handles very efficiently (adding or removing a single constraint per pivot), whereas interior-point methods treat all constraints uniformly and their per-iteration cost scales with the total number of constraints.

For very large universes (\(n > 10{,}000\)) or when the problem has complex non-linear constraints, interior-point methods become preferable. But for the moderate-scale portfolio QPs typical of institutional practice, the active set method remains the algorithm of choice.

4.4 Transaction Costs in the QP Framework

When the portfolio manager moves from current holdings \(w_0\) to new holdings \(w\), transaction costs alter the structure of the QP and invalidate the clean two-fund theorem. A particularly clean formulation uses a turnover constraint rather than a cost term in the objective.

The turnover-constrained mean-variance QP is:

\[ \min_w \; w^T\Sigma w \quad \text{subject to} \quad \mu^Tw \geq \bar{R},\quad e^Tw=1,\quad \tfrac{1}{2}\|w-w_0\|_1 \leq \tau. \]

The \(\ell_1\) turnover constraint is linearised by introducing split variables \(u_i^+, u_i^- \geq 0\) with \(w_i - (w_0)_i = u_i^+ - u_i^-\):

\[ \min_{w,u^+,u^-} \; w^T\Sigma w \quad \text{s.t.} \quad \mu^Tw \geq \bar{R},\; e^Tw=1,\; \sum_i(u_i^++u_i^-)\leq 2\tau,\; w_i=(w_0)_i+u_i^+-u_i^-,\; u_i^\pm\geq 0. \]

This is a convex QP in \((w, u^+, u^-)\) with \(3n\) variables and \(O(n)\) constraints, directly solvable by any QP solver.

Effect on the KKT system. The turnover constraint introduces an additional multiplier \(\nu \geq 0\) for the inequality \(\sum_i(u_i^++u_i^-)\leq 2\tau\), and the complementary slackness condition \(\nu(\sum_i(u_i^++u_i^-) - 2\tau) = 0\) determines whether the turnover constraint binds. When \(\nu > 0\) (the constraint is active), the optimal \(w\) is “pulled” toward \(w_0\): the manager cannot move as far from the current portfolio as the unconstrained problem would dictate. The efficient frontier in \((\sigma_p, \mu_p)\) space is then parameterised jointly by \((\bar{R}, \tau)\), and the reachable region is a two-dimensional object rather than the one-dimensional curve of classical mean-variance theory.


Chapter 5: Numerical Linear Algebra Methods

The KKT systems arising in portfolio QPs must be solved reliably and efficiently. For portfolios of hundreds or thousands of assets, naive approaches fail due to poor conditioning or excessive arithmetic. This chapter presents the null-space method and the range-space method — two structural approaches that exploit the problem geometry.

5.1 Null-Space Method for Equality-Constrained QP

Consider the equality-constrained QP: minimise \(\frac{1}{2} x^T Q x + c^T x\) subject to \(Ax = b\), where \(A \in \mathbb{R}^{m \times n}\) with \(m < n\) and full row rank. The feasible set is an affine subspace of dimension \(n - m\).

Idea: Decompose \(\mathbb{R}^n = \mathcal{R}(A^T) \oplus \mathcal{N}(A)\). Any feasible point can be written as \(x = x_p + Z u\), where \(x_p\) is a particular solution (\(A x_p = b\)) and the columns of \(Z \in \mathbb{R}^{n \times (n-m)}\) form a basis for the null space of \(A\). Substituting into the objective:

\[ f(x_p + Zu) = \frac{1}{2} u^T (Z^T Q Z) u + (Q x_p + c)^T Z u + \text{const}. \]

This is an unconstrained QP in \(u \in \mathbb{R}^{n-m}\). The reduced Hessian \(Z^T Q Z\) is positive definite (under our assumptions), so the solution is \(u^* = -(Z^T Q Z)^{-1} Z^T (Q x_p + c)\) and \(x^* = x_p + Z u^*\).

A natural choice for \(Z\) is obtained from the QR factorization of \(A^T\). If \(A^T = Q_1 R\) (writing the full QR of \(A^T\) as \([Q_1 \; Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix}\)), then the columns of \(Q_2\) form an orthonormal basis for \(\mathcal{N}(A)\). This “orthonormal null space” choice minimises round-off error in the reduced Hessian computation. In the portfolio context, \(A = [e^T; \mu^T]\) (a \(2 \times n\) matrix), and the null space has dimension \(n - 2\), so \(Z \in \mathbb{R}^{n \times (n-2)}\).

Example (Null-space method with \(n=3\), \(m=2\)). Consider three assets with \(\mu = (0.10, 0.12, 0.07)^T\) and the constraint matrix encoding the budget and return constraints: \[ A = \begin{pmatrix}1&1&1\\0.10&0.12&0.07\end{pmatrix} \in \mathbb{R}^{2\times3}. \] We need the null space of \(A\), which has dimension \(n - m = 1\). So \(Z \in \mathbb{R}^{3\times1}\) is a single vector. \[ A^T = \begin{pmatrix}1&0.10\\1&0.12\\1&0.07\end{pmatrix} = [Q_1 \; Q_2]\begin{pmatrix}R\\0\end{pmatrix}, \]

where \(Q_1\in\mathbb{R}^{3\times2}\), \(Q_2\in\mathbb{R}^{3\times1}\), \(R\in\mathbb{R}^{2\times2}\).

\[ q_1 = \frac{1}{\sqrt{3}}(1,1,1)^T. \]\[ q_1^T a_2 = \frac{0.10+0.12+0.07}{\sqrt{3}} = \frac{0.29}{\sqrt{3}}, \quad a_2' = a_2 - \frac{0.29}{\sqrt{3}}q_1 = \begin{pmatrix}0.10\\0.12\\0.07\end{pmatrix} - \frac{0.29}{3}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}-0.00\overline{3}\\0.02\overline{3}\\-0.02\overline{6}\end{pmatrix}. \]

Normalising gives \(q_2 = a_2'/\|a_2'\|\).

\[ Z = q_3 \approx \begin{pmatrix}0.811\\-0.487\\-0.324\end{pmatrix} \]

(up to sign). One checks: \(e^T Z \approx 0.811 - 0.487 - 0.324 = 0\) and \(\mu^T Z \approx 0.10(0.811) + 0.12(-0.487) + 0.07(-0.324) \approx 0\), confirming that \(Z\) spans \(\mathcal{N}(A)\).

Reduced problem. Any feasible weight vector is written \(w = w_p + Zu\) where \(w_p\) satisfies \(Aw_p = (1, \bar{R})^T\) and \(u \in \mathbb{R}\) is a scalar. The reduced Hessian is \(Z^T\Sigma Z \in \mathbb{R}^{1\times1}\) — a positive scalar — and the unconstrained minimiser in \(u\) is \(u^* = -(Z^T\Sigma Z)^{-1}Z^T(\Sigma w_p + c)\), giving the optimal weights directly.
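The same computation can be scripted end to end. The NumPy sketch below uses the three-asset data above with \(c = 0\) (pure variance minimisation) and a made-up covariance matrix \(\Sigma\), which the example leaves unspecified:

```python
import numpy as np

mu = np.array([0.10, 0.12, 0.07])
A = np.vstack([np.ones(3), mu])           # budget and return constraints
b = np.array([1.0, 0.10])                 # e'w = 1, mu'w = 10%
Sigma = np.array([[0.040, 0.006, 0.003],  # illustrative covariance (assumed)
                  [0.006, 0.090, 0.004],
                  [0.003, 0.004, 0.025]])

# Null-space basis from the full QR of A^T: trailing columns of Q span N(A)
Qfull, _ = np.linalg.qr(A.T, mode='complete')
Z = Qfull[:, 2:]                          # single column here (n - m = 1)
assert np.allclose(A @ Z, 0)

# Particular solution A w_p = b (minimum-norm choice via the pseudoinverse)
w_p = np.linalg.pinv(A) @ b

# Reduced problem: minimise (w_p + Zu)' Sigma (w_p + Zu) over the scalar u
H = Z.T @ Sigma @ Z                       # reduced Hessian, here 1x1
u = np.linalg.solve(H, -Z.T @ Sigma @ w_p)
w = w_p + Z @ u                           # optimal weights

# w is feasible (A w = b) and stationary on the null space (Z' Sigma w = 0)
```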

5.2 Range-Space Method and the Schur Complement

An alternative is the range-space method, which solves the KKT system directly but exploits the structure of \(Q\). If \(Q\) is easily invertible (e.g., diagonal or block-diagonal, which can arise in factor-model approximations of \(\Sigma\)), one eliminates \(x\) from the stationarity equation to get:

\[ A Q^{-1} A^T \lambda = A Q^{-1} c + b. \]

The \(m \times m\) matrix \(M = A Q^{-1} A^T\) is the Schur complement of \(Q\) in the KKT system, and it is positive definite. Solving for \(\lambda\) and back-substituting for \(x\) requires only \(m \times m\) linear systems rather than \((n+m) \times (n+m)\). For portfolio problems where \(m = 2\), this reduces to solving a \(2 \times 2\) system — trivial arithmetic.
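A NumPy sketch of the range-space solve, with illustrative data and the same sign conventions as the stationarity equation \(Qx + c - A^T\lambda = 0\):

```python
import numpy as np

# Illustrative equality-constrained QP: min (1/2)x'Qx + c'x  s.t.  Ax = b
Q = np.diag([2.0, 4.0, 8.0])       # easily invertible (diagonal) Q
c = np.array([-1.0, 0.0, 1.0])
A = np.array([[1.0, 1.0, 1.0],
              [0.1, 0.2, 0.3]])
b = np.array([1.0, 0.2])

Qinv = np.diag(1.0 / np.diag(Q))   # trivial inverse of a diagonal Q
M = A @ Qinv @ A.T                 # m x m Schur complement (here 2 x 2)
lam = np.linalg.solve(M, A @ Qinv @ c + b)
x = Qinv @ (A.T @ lam - c)         # back-substitution from stationarity

# Verify the KKT conditions: Qx + c - A'lam = 0 and Ax = b
```

Only \(m \times m\) systems are factored, exactly as the text describes; for \(m = 2\) the Schur-complement solve is trivial.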

5.3 Conditioning, Normal Equations, and Sparsity

The normal equations approach to least-squares (\(A^T A x = A^T b\)) is algebraically correct but numerically inferior: the condition number satisfies \(\kappa(A^T A) = \kappa(A)^2\). In contrast, solving via QR gives \(\kappa(R) = \kappa(A)\). For large, ill-conditioned portfolio problems (e.g., assets with nearly collinear returns), the QR-based approach can mean the difference between a reliable solution and numerical garbage.

For institutional portfolios with \(n\) in the thousands, sparsity becomes critical. If the covariance matrix \(\Sigma\) is given in a factor-model form \(\Sigma = B F B^T + D\) (where \(B \in \mathbb{R}^{n \times k}\) with \(k \ll n\) and \(D\) is diagonal), then the Woodbury matrix identity allows \(\Sigma^{-1}\) to be computed in \(O(nk^2)\) rather than \(O(n^3)\) time:

\[ \Sigma^{-1} = D^{-1} - D^{-1} B (F^{-1} + B^T D^{-1} B)^{-1} B^T D^{-1}. \]

This is crucial for practical implementations in MATLAB when \(n\) is large.

Example (Woodbury formula for \(n=4\), \(k=1\) factor model). Suppose \(n=4\) assets are driven by a single common factor (\(k=1\)), with factor loading vector \(b\in\mathbb{R}^4\) and idiosyncratic variance matrix \(D\). Specifically, let \[ b = \begin{pmatrix}0.8\\0.6\\0.5\\0.4\end{pmatrix}, \quad F = (1) \in \mathbb{R}^{1\times1}, \quad D = \mathrm{diag}(0.04, 0.03, 0.02, 0.05). \] Then \(B = b \in \mathbb{R}^{4\times1}\) and \[ \Sigma = bb^T + D = \begin{pmatrix}0.64&0.48&0.40&0.32\\0.48&0.36&0.30&0.24\\0.40&0.30&0.25&0.20\\0.32&0.24&0.20&0.16\end{pmatrix} + \begin{pmatrix}0.04&&&\\&0.03&&\\&&0.02&\\&&&0.05\end{pmatrix}. \]

Rather than inverting the full \(4\times4\) matrix \(\Sigma\), we apply the Woodbury identity with \(B = b\), \(F = 1\):

Step 1: Compute \(D^{-1} = \mathrm{diag}(25, 33.\overline{3}, 50, 20)\).

Step 2: Compute \(B^TD^{-1}B = \sum_i b_i^2/D_{ii}\): \[ B^TD^{-1}B = 16 + 12 + 12.5 + 3.2 = 43.7. \]

Step 3: The scalar Schur complement is \(F^{-1} + B^TD^{-1}B = 1 + 43.7 = 44.7\).

Step 4: Assemble the inverse as a rank-1 correction of \(D^{-1}\): \[ \Sigma^{-1} = D^{-1} - D^{-1}b\,(44.7)^{-1}\,b^TD^{-1}. \]

The rank-1 correction is stored in factored form — the vector \(D^{-1}b\) and the scalar \(44.7\) — so forming it costs \(O(n)\) multiplications, and applying \(\Sigma^{-1}\) to a vector costs \(O(n)\), versus \(O(n^3)\) for a naive inversion. For \(n=10{,}000\) and \(k=5\) factors, the Woodbury formula reduces the cost from \(O(n^3) = O(10^{12})\) to \(O(nk^2)\) in factored form, or \(O(n^2k) = O(5\times10^8)\) if the dense inverse is explicitly assembled — a speed-up of at least three orders of magnitude.

Numerically, the \((1,1)\) entry of \(\Sigma^{-1}\) via Woodbury is:

\[ (\Sigma^{-1})_{11} = D_{11}^{-1} - (D^{-1}b)_1^2/(44.7) = 25 - (25\times0.8)^2/44.7 = 25 - 400/44.7 \approx 25 - 8.95 = 16.05. \]
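The example can be verified end to end in NumPy:

```python
import numpy as np

# Single-factor model data from the example above
b = np.array([0.8, 0.6, 0.5, 0.4])       # factor loadings (B = b, F = 1)
D = np.diag([0.04, 0.03, 0.02, 0.05])    # idiosyncratic variances
Sigma = np.outer(b, b) + D

# Woodbury: Sigma^{-1} = D^{-1} - D^{-1} b (1 + b' D^{-1} b)^{-1} b' D^{-1}
Dinv = np.diag(1.0 / np.diag(D))
Dinv_b = Dinv @ b                        # O(n) work
s = 1.0 + b @ Dinv_b                     # scalar Schur complement, = 44.7
Sigma_inv = Dinv - np.outer(Dinv_b, Dinv_b) / s

# Agrees with the direct inverse; (Sigma_inv)[0, 0] ~ 16.05 as computed above
```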

Chapter 6: Active Set and Parametric Methods

The active set method of Chapter 4 can be adapted to efficiently trace the entire efficient frontier. Rather than solving a fresh QP for each value of the target return \(\bar{R}\), parametric programming exploits the fact that the optimal solution changes smoothly — in fact, piecewise linearly — as \(\bar{R}\) varies. This is the basis of Markowitz’s critical line algorithm, one of the most elegant contributions to computational finance.

6.1 Parametric Quadratic Programming

Consider Formulation I as \(\bar{R}\) varies continuously from \(-\infty\) to \(+\infty\). At any optimal solution, the active set of inequality constraints (when present, e.g., \(w \geq 0\)) is constant over intervals of \(\bar{R}\) called parameter intervals or segments. At the boundary between two segments, a breakpoint occurs: one asset enters or leaves the active set (i.e., one asset transitions between \(w_i = 0\) and \(w_i > 0\)).

Within each segment, the optimal weights \(w^*(\bar{R})\) are linear functions of \(\bar{R}\) (since the KKT system restricted to a fixed active set is linear in \(\bar{R}\)). Thus the entire efficient frontier is traced by solving a sequence of equality-constrained QPs, updating the active set at each breakpoint.

Theorem (Critical Line Algorithm): For a portfolio of \(n\) assets with \(w \geq 0\) and \(e^T w = 1\), the entire efficient frontier can be traced by the parametric active set method (Markowitz's critical line algorithm) at a cost of \(O(n^2)\) arithmetic operations per breakpoint when factorizations are updated rather than recomputed. The optimal weights \(w^*(\bar{R})\) are piecewise linear in \(\bar{R}\), with finitely many pieces (typically \(O(n)\)), each corresponding to a different active set; the frontier itself is a piecewise-hyperbolic curve in the \((\sigma_p, \mu_p)\) plane.

6.2 Connection to the Simplex Method

The critical line algorithm has a structural parallel with the simplex method in linear programming. In LP, the simplex method moves along the edges of the feasible polytope from vertex to vertex, with each step changing one variable from basic to nonbasic (or vice versa). In parametric QP, the algorithm moves along edges of the optimal face as the parameter \(\bar{R}\) changes, with each step adding or removing one asset from the active set. The piecewise-linear structure of the efficient frontier in weight space mirrors the piecewise-linear structure of the LP optimal solution path under parametric right-hand-side variation.

6.3 Practical Computation of the Efficient Frontier

In MATLAB, the efficient frontier can be computed using the quadprog function for each discretised value of \(\bar{R}\), or more elegantly via a dedicated implementation of the critical line algorithm. The key steps are:

  1. Initialise at the maximum-return portfolio (which has only one non-zero weight, the asset with the highest \(\mu_i\)).
  2. Compute the direction of change \(dw/d\bar{R}\) from the current active set KKT system.
  3. Find the next breakpoint by checking which inactive asset first becomes attractive (enters) or which active asset first reaches its bound (exits).
  4. Update the active set and repeat.

The output is a sequence of \((w^*, \sigma_p^*, \mu_p^*)\) triples at each breakpoint, from which the frontier hyperbola can be drawn by interpolation.

6.4 Sensitivity Analysis and Shadow Prices

A powerful by-product of the parametric active set method is a complete sensitivity analysis of the portfolio to changes in inputs. The Lagrange multipliers from the KKT system at each breakpoint are the shadow prices of the constraints:

  • The multiplier \(\gamma^*\) for the return constraint \(\mu^Tw = \bar{R}\) measures the marginal cost (in variance) of increasing the target return by one unit. Formally, \(\partial \sigma_p^{*2}/\partial \bar{R} = \gamma^*\). A large \(\gamma^*\) means the investor is paying dearly in variance for a small increment in expected return — a signal that the target return is near the top of the efficient frontier.

  • The multiplier for the budget constraint \(e^Tw = 1\) similarly measures the marginal value of relaxing the fully-invested constraint (e.g., allowing some cash holding).

  • The multipliers \(\mu_i^*\) for the non-negativity constraints \(w_i \geq 0\) are zero whenever asset \(i\) is held with positive weight (by complementary slackness) and non-negative when the constraint binds (\(w_i^* = 0\)). A large \(\mu_i^*\) indicates that asset \(i\), held short, would substantially improve the portfolio if short selling were permitted; it is the “shadow cost” of the short-selling ban.

Example (Shadow price interpretation). Suppose at target return \(\bar{R} = 11\%\), the efficient portfolio has \(\sigma_p^{*2} = 0.030\) and \(\gamma^* = 0.40\). Then increasing the target to \(\bar{R} = 12\%\) costs approximately \(\Delta\sigma_p^{*2} \approx 0.40 \times 0.01 = 0.004\) in additional variance — raising the standard deviation from \(\sqrt{0.030}\approx 17.3\%\) to approximately \(\sqrt{0.034}\approx 18.4\%\). This confirms that near the top of the frontier, each additional percentage point of return requires a meaningful sacrifice in risk.

6.5 Warm-Starting and Reoptimisation

A key practical advantage of the active set method is that it can be warm-started: if the optimal active set for yesterday’s portfolio problem is known, it can be used as the initial working set for today’s problem (which differs only in updated \(\mu\) or \(\Sigma\)). Under mild perturbations, the optimal active set is unchanged, and the algorithm converges in zero or one pivot. This makes parametric active set methods dramatically faster than interior-point methods in rolling-window rebalancing applications, where the optimisation problem changes smoothly from day to day.

In institutional practice, a fund with 500 assets might rebalance daily. Running a full interior-point optimisation from scratch each day takes several seconds; warm-starting the active set method from the previous day’s solution takes milliseconds — a \(100\times\) speedup that matters enormously in production trading systems with strict latency requirements.


Chapter 7: Risk Measures — VaR and CVaR

Mean-variance analysis measures risk by the variance of portfolio returns. While elegant and computationally tractable, variance treats upside and downside deviations symmetrically — a property that fails to capture the investor’s primary concern, which is the risk of catastrophic loss. This chapter introduces two downside risk measures — Value at Risk and Conditional Value at Risk — that better reflect tail risk and are ubiquitous in regulatory and institutional finance.

7.1 Value at Risk

Definition (Value at Risk): For a portfolio with random loss \(L = -r_p\) and confidence level \(\alpha \in (0,1)\) (typically \(\alpha = 0.95\) or \(\alpha = 0.99\)), the Value at Risk at level \(\alpha\) is \[\mathrm{VaR}_\alpha = \inf\left\{ l \in \mathbb{R} : P(L > l) \leq 1 - \alpha \right\} = F_L^{-1}(\alpha),\] the \(\alpha\)-quantile of the loss distribution.

Intuitively, \(\mathrm{VaR}_{0.95} = 2\%\) means that with 95% probability, the portfolio does not lose more than 2% of its value over the period. VaR is the industry standard risk metric mandated by the Basel Accords for bank capital requirements.

Despite its popularity, VaR has a critical theoretical flaw: it is not subadditive. That is, \(\mathrm{VaR}(L_1 + L_2)\) can exceed \(\mathrm{VaR}(L_1) + \mathrm{VaR}(L_2)\), meaning that VaR can penalise diversification rather than reward it. This violates the axiomatic definition of a coherent risk measure (Artzner et al., 1999), which requires translation invariance, positive homogeneity, monotonicity, and subadditivity.

7.2 Conditional Value at Risk

Definition (Conditional Value at Risk): The CVaR at confidence level \(\alpha\), also called Expected Shortfall (ES), is the expected loss given that the loss exceeds \(\mathrm{VaR}_\alpha\): \[\mathrm{CVaR}_\alpha = \mathbb{E}\left[L \mid L \geq \mathrm{VaR}_\alpha\right].\]

CVaR captures the severity of losses in the tail, not just the threshold. It is always at least as large as \(\mathrm{VaR}_\alpha\), and it is a coherent risk measure — subadditivity means that CVaR rewards diversification, making it theoretically superior for portfolio optimisation.

7.3 Rockafellar–Uryasev Formula and LP Reformulation

The key breakthrough for portfolio CVaR minimisation is due to Rockafellar and Uryasev (2000). They showed that CVaR admits the following variational representation:

Theorem (Rockafellar–Uryasev): For any portfolio \(w\) and loss function \(L(w, \xi)\) where \(\xi\) is the random factor, \[\mathrm{CVaR}_\alpha(w) = \min_{\gamma \in \mathbb{R}} \left\{ \gamma + \frac{1}{1-\alpha} \mathbb{E}\left[\max(L(w,\xi) - \gamma, 0)\right] \right\}.\]

This is remarkable because it converts a quantile-based risk measure into a smooth (in fact, convex) optimisation problem. The minimiser in \(\gamma\) is precisely \(\mathrm{VaR}_\alpha(w)\), so the formula simultaneously computes VaR and CVaR.

Now suppose we have \(T\) historical (or simulated) scenarios \(\xi^1, \ldots, \xi^T\), giving portfolio losses \(L_t = L(w, \xi^t)\). The sample CVaR is

\[ \widehat{\mathrm{CVaR}}_\alpha(w) = \min_{\gamma} \left\{ \gamma + \frac{1}{(1-\alpha)T} \sum_{t=1}^T \max(L_t - \gamma, 0) \right\}. \]

Introducing auxiliary variables \(z_t \geq 0\) with \(z_t \geq L_t - \gamma\), this becomes:

\[ \min_{w, \gamma, z} \quad \gamma + \frac{1}{(1-\alpha)T} \sum_{t=1}^T z_t \]\[ \text{subject to} \quad z_t \geq -(r^t)^T w - \gamma, \quad z_t \geq 0, \quad e^T w = 1, \]

where \(r^t\) is the return vector under scenario \(t\) (so \(L_t = -(r^t)^T w\)). This is a linear programme with \(T + n + 1\) variables and \(2T + 1\) constraints — directly solvable by any LP solver. For \(T = 1000\) scenarios and \(n = 100\) assets, this is a modest LP easily handled in MATLAB.

The CVaR objective can also be combined with a return target or other constraints, giving a complete LP-based portfolio optimisation framework that is arguably more robust than mean-variance when the return distribution has heavy tails or is asymmetric.
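The scenario formula is easy to exercise without an LP solver: for a fixed portfolio, the objective is piecewise linear in \(\gamma\) with breakpoints at the observed losses, so scanning those breakpoints finds the minimum. A pure-Python sketch on made-up losses:

```python
def cvar_ru(losses, alpha):
    """Sample CVaR via the Rockafellar-Uryasev formula.

    The objective gamma + (1/((1-alpha)T)) * sum_t max(L_t - gamma, 0)
    is piecewise linear in gamma with breakpoints at the observed losses,
    so evaluating it at each loss value suffices to find the minimum.
    """
    T = len(losses)

    def obj(g):
        return g + sum(max(l - g, 0.0) for l in losses) / ((1 - alpha) * T)

    gamma_star = min(losses, key=obj)   # a minimiser; equals the sample VaR
    return gamma_star, obj(gamma_star)

losses = list(range(1, 101))            # made-up scenario losses 1, 2, ..., 100
var95, cvar95 = cvar_ru(losses, 0.95)
worst = sorted(losses)[-5:]             # the worst (1 - alpha) * T = 5 scenarios
# cvar95 matches the average of the worst five losses, here 98
```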

7.4 Analytical VaR and CVaR for a Normal Portfolio

When portfolio returns are normally distributed, VaR and CVaR admit closed-form expressions. Let the portfolio return \(r_p \sim \mathcal{N}(\mu_p, \sigma_p^2)\), so the loss \(L = -r_p \sim \mathcal{N}(-\mu_p, \sigma_p^2)\).

Definition (Gaussian VaR and CVaR). For a portfolio with normally distributed return \(r_p\sim\mathcal{N}(\mu_p,\sigma_p^2)\) and loss \(L = -r_p\), at confidence level \(\alpha\): \[ \mathrm{VaR}_\alpha = -\mu_p + z_\alpha\sigma_p, \] \[ \mathrm{CVaR}_\alpha = -\mu_p + \sigma_p\,\frac{\phi(z_\alpha)}{1-\alpha}, \] where \(z_\alpha = \Phi^{-1}(\alpha)\) is the \(\alpha\)-quantile of the standard normal, \(\phi\) is the standard normal density, and \(\Phi\) is its CDF.
Example (Analytical VaR and CVaR at 95% confidence). Suppose a portfolio has \(\mu_p = 0.10\) and \(\sigma_p = 0.20\) (annualised). Compute \(\mathrm{VaR}_{0.95}\) and \(\mathrm{CVaR}_{0.95}\).

At \(\alpha = 0.95\), the standard normal quantile is \(z_{0.95} = 1.645\) and the density value is \(\phi(1.645) \approx 0.1031\).

\[ \mathrm{VaR}_{0.95} = -\mu_p + z_{0.95}\sigma_p = -0.10 + 1.645\times0.20 = -0.10 + 0.329 = 0.229. \]

So there is a 5% chance of losing more than 22.9% of portfolio value.

\[ \mathrm{CVaR}_{0.95} = -\mu_p + \sigma_p\frac{\phi(z_{0.95})}{1-\alpha} = -0.10 + 0.20\times\frac{0.1031}{0.05} = -0.10 + 0.20\times2.062 = -0.10 + 0.4124 = 0.3124. \]

Conditional on being in the worst 5% of outcomes, the expected loss is 31.2% — substantially larger than the VaR threshold of 22.9%. This gap \(\mathrm{CVaR}_{0.95} - \mathrm{VaR}_{0.95} = 0.0834\) reflects the weight of the tail beyond the 95th percentile.

Remark on units. These are annualised fractional losses. To convert to a daily VaR under the i.i.d. assumption, scale to daily moments first — divide \(\mu_p\) by \(252\) and \(\sigma_p\) by \(\sqrt{252}\) — before applying the formula.
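The worked example can be reproduced with the Python standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

mu_p, sigma_p, alpha = 0.10, 0.20, 0.95
N = NormalDist()                     # standard normal

z = N.inv_cdf(alpha)                 # z_0.95 ~ 1.645
var = -mu_p + z * sigma_p            # ~ 0.229
cvar = -mu_p + sigma_p * N.pdf(z) / (1 - alpha)   # ~ 0.312

# CVaR always exceeds VaR; the gap reflects the tail beyond the quantile
```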

Theorem (CVaR is a coherent risk measure; VaR is not). For any portfolio loss distribution and any portfolio weights \(w\):
  1. CVaR is convex in \(w\). For any \(\lambda \in [0,1]\) and portfolios \(w_1, w_2\): \[\mathrm{CVaR}_\alpha(\lambda w_1 + (1-\lambda)w_2) \leq \lambda\,\mathrm{CVaR}_\alpha(w_1) + (1-\lambda)\,\mathrm{CVaR}_\alpha(w_2).\] This is the subadditivity property: diversification never increases CVaR.
  2. CVaR is monotone, positively homogeneous, and translation-invariant, completing its status as a coherent risk measure in the sense of Artzner–Delbaen–Eber–Heath (1999).
  3. VaR is not subadditive in general. There exist portfolios \(w_1, w_2\) and a loss distribution such that \(\mathrm{VaR}_\alpha(w_1+w_2) > \mathrm{VaR}_\alpha(w_1) + \mathrm{VaR}_\alpha(w_2)\).
Proof. The convexity of CVaR follows from the Rockafellar–Uryasev variational formula. The function \(h(w,\gamma) = \gamma + \frac{1}{1-\alpha}\mathbb{E}[\max(-w^Tr - \gamma, 0)]\) is jointly convex in \((w,\gamma)\): the integrand is a maximum of affine functions of \((w,\gamma)\), and expectation preserves convexity. Partial minimisation of a jointly convex function over one of its variables yields a convex function of the remaining variable, so \(\mathrm{CVaR}_\alpha(w) = \min_\gamma h(w,\gamma)\) is convex in \(w\). (Note that a pointwise infimum of an arbitrary family of convex functions need not be convex; joint convexity is what makes the argument work.) For the failure of VaR subadditivity, a standard counterexample uses two bonds, each with a 4% default probability and independent defaults: at the 95% confidence level, \(\mathrm{VaR}_{0.95}\) of each individual bond is 0 (since the 95th percentile of each loss is 0), but the combined portfolio defaults with probability \(1 - (0.96)^2 \approx 7.84\% > 5\%\), so its VaR is positive — exceeding the sum of the individual VaRs, which is zero.
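The bond counterexample can be checked directly. A pure-Python sketch (losses normalised so that a default loses 1):

```python
from itertools import product

def var_discrete(dist, alpha):
    """VaR_alpha = inf{ l : P(L > l) <= 1 - alpha } for a finite loss distribution.

    dist: dict mapping loss value -> probability.
    """
    for l in sorted(dist):
        tail = sum(p for v, p in dist.items() if v > l)
        if tail <= 1 - alpha:
            return l
    return max(dist)

p_default = 0.04                      # each bond defaults independently, losing 1
one_bond = {0: 1 - p_default, 1: p_default}
two_bonds = {}                        # loss distribution of the two-bond portfolio
for (l1, p1), (l2, p2) in product(one_bond.items(), one_bond.items()):
    two_bonds[l1 + l2] = two_bonds.get(l1 + l2, 0.0) + p1 * p2

# Each bond alone has VaR_0.95 = 0, yet the combined portfolio has VaR_0.95 = 1
```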

7.5 Factor Models and Structured Covariance

Factor models reduce the dimensionality of portfolio optimisation by decomposing asset covariances into a small number of common factors plus idiosyncratic noise. This section connects the factor model structure of Section 5.3 to the efficient frontier computation.

The Single-Factor (CAPM) Model

The simplest factor model is the CAPM model with one systematic factor (the market):

\[ r_i = \alpha_i + \beta_i r_M + \varepsilon_i, \quad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \text{ for } i \neq j. \]

This implies

\[ \Sigma = \beta\beta^T\sigma_M^2 + D, \]

where \(\beta = (\beta_1, \ldots, \beta_n)^T\) is the vector of market betas and \(D = \mathrm{diag}(\sigma_{\varepsilon_1}^2, \ldots, \sigma_{\varepsilon_n}^2)\) collects the idiosyncratic variances. This is exactly the Woodbury form \(\Sigma = BFB^T + D\) with \(B = \beta\) and \(F = \sigma_M^2\).

Consequences for efficient frontier computation. Under this factor structure:

  1. \(\Sigma^{-1}\) can be applied to any vector in \(O(n)\) time (since \(k=1\)) via the Woodbury formula, reducing the cost of each solve from the standard \(O(n^3)\) to essentially \(O(n)\).

  2. The solves appearing in the frontier formulas, such as the GMVP direction \(\Sigma^{-1}e\), become \[ \Sigma^{-1}e = D^{-1}e - \frac{(D^{-1}\beta)({\beta^TD^{-1}e})}{\sigma_M^{-2} + \beta^TD^{-1}\beta}, \]

    a rank-1 update of the simple diagonal solution. In component form: the GMVP weight of asset \(i\) is approximately \(1/\sigma_{\varepsilon_i}^2\) (normalised), adjusted for the beta loading. Assets with small idiosyncratic variance receive larger GMVP weights.

  3. The entire efficient frontier can be computed without ever forming or storing the full \(n\times n\) matrix \(\Sigma\) — only the \(n\)-dimensional vectors \(\beta, D, e, \mu\) and the scalar \(\sigma_M^2\) are needed. This enables portfolio optimisation at scales of tens of thousands of assets in real time.

Remark (Multi-factor models). In practice, \(k = 3\) (Fama–French: market, size, value) to \(k = 30\) (commercial risk model providers like Barra or Axioma) factors are used. Each additional factor improves explanatory power but increases the dimension of the Schur-complement system in the Woodbury formula from \(1\times1\) to \(k\times k\). For \(k \leq 100\) and \(n\leq 10{,}000\), this remains far cheaper than direct inversion of \(\Sigma\). The practical trade-off is model risk: more factors reduce approximation error but require more estimation, introducing sampling error. Cross-validation on a holdout period is the standard method for selecting \(k\).

Chapter 8: Transaction Costs and Rebalancing

Portfolio theory as developed in Chapters 2–7 treats portfolio selection as a static, one-shot problem. In practice, portfolios must be periodically rebalanced as asset prices move, new capital flows in or out, or the investor’s risk preferences change. Rebalancing incurs transaction costs — bid-ask spreads, market impact, commissions, and taxes — that must be accounted for in the optimisation. Ignoring transaction costs can lead to excessive turnover and poor out-of-sample performance.

8.1 Single-Period Rebalancing with L1 Transaction Costs

Suppose the investor currently holds weights \(w_0 \in \mathbb{R}^n\) and wishes to move to new weights \(w\) (with \(e^T w = 1\)). The vector of trades is \(\Delta w = w - w_0\). Under proportional transaction costs (a fraction \(\kappa\) of the value traded), the total cost is \(\kappa \|w - w_0\|_1 = \kappa \sum_i |w_i - (w_0)_i|\).

The rebalancing problem is:

\[ \min_w \; w^T \Sigma w - \mu^T w + \kappa \|w - w_0\|_1 \quad \text{subject to} \quad e^T w = 1. \]

The \(\ell_1\) norm is convex but non-differentiable. The standard linearisation splits each trade into its positive and negative parts via the substitution \(w_i - (w_0)_i = u_i^+ - u_i^-\), with auxiliary variables \(u_i^+, u_i^- \geq 0\):

\[ \|w - w_0\|_1 = \sum_i (u_i^+ + u_i^-), \]

subject to \(w_i = (w_0)_i + u_i^+ - u_i^-\). This converts the problem into a standard QP (with objective \(w^T \Sigma w - \mu^T w + \kappa \sum_i (u_i^+ + u_i^-)\), linear constraints, and non-negativity on \(u_i^\pm\)). MATLAB’s quadprog handles this directly.
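The \(u^\pm\) reformulation can be sketched numerically. The course uses MATLAB's quadprog; the following SciPy illustration (with an invented \(\Sigma\), \(\mu\), \(w_0\), and \(\kappa\)) solves the same QP over the stacked variable \((w, u^+, u^-)\):

```python
import numpy as np
from scipy.optimize import minimize

# synthetic instance (illustrative data only)
Sigma = np.array([[0.04, 0.02, 0.01],
                  [0.02, 0.09, 0.03],
                  [0.01, 0.03, 0.06]])
mu = np.array([0.08, 0.12, 0.07])
w0 = np.array([1/3, 1/3, 1/3])
kappa = 0.01
n = 3

def objective(z):
    # z = (w, u+, u-); objective w'Sigma w - mu'w + kappa * sum(u+ + u-)
    w, up, um = z[:n], z[n:2*n], z[2*n:]
    return w @ Sigma @ w - mu @ w + kappa * np.sum(up + um)

constraints = [
    {"type": "eq", "fun": lambda z: z[:n].sum() - 1.0},                # e'w = 1
    {"type": "eq", "fun": lambda z: z[:n] - w0 - z[n:2*n] + z[2*n:]},  # w - w0 = u+ - u-
]
bounds = [(None, None)] * n + [(0, None)] * (2 * n)                    # u+/- >= 0

z0 = np.concatenate([w0, np.zeros(2 * n)])                             # feasible start
res = minimize(objective, z0, bounds=bounds, constraints=constraints,
               method="SLSQP")
w = res.x[:n]
assert res.success and abs(w.sum() - 1.0) < 1e-6
```

At the optimum, \(u_i^+ + u_i^- = |w_i - (w_0)_i|\) automatically, because any slack in the split is penalised by \(\kappa > 0\).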

8.2 Turnover Constraints

Rather than penalising transaction costs in the objective, one can impose a hard turnover constraint:

\[ \frac{1}{2}\|w - w_0\|_1 \leq \tau, \]

where \(\tau > 0\) is the maximum fraction of the portfolio that may be traded. Using the same \(u^\pm\) substitution, this becomes a linear constraint \(\sum_i (u_i^+ + u_i^-) \leq 2\tau\) added to the QP. In institutional investment management, turnover constraints are ubiquitous: trading desks impose them to limit market impact and operational risk.

8.3 Multi-Period Rebalancing and Dynamic Programming

If the investor rebalances at times \(t = 0, 1, \ldots, T-1\) with known return distributions, the multi-period problem is a stochastic dynamic programme. Under the mean-variance criterion, the Bellman equation at time \(t\) with current wealth \(W_t\) and portfolio weights \(w_t\) is:

\[ V_t(W_t, w_t) = \max_{w_{t+1}} \left\{ \mu^T w_{t+1} \cdot W_t - \frac{\lambda}{2} w_{t+1}^T \Sigma w_{t+1} \cdot W_t^2 - \kappa \|w_{t+1} - w_t\|_1 \cdot W_t + \mathbb{E}\left[V_{t+1}(\cdot)\right] \right\}. \]

For general distributions, this must be solved numerically (by discretising the state space or by Monte Carlo methods). Under special assumptions (e.g., i.i.d. returns, quadratic utility), closed-form solutions exist and reveal that the optimal policy is to rebalance toward a “target portfolio” that lies between the static efficient portfolio and the current holdings, with the degree of rebalancing decreasing as transaction costs increase.

8.4 Tax-Loss Harvesting

A secondary consideration in taxable accounts is tax-loss harvesting: realising capital losses to offset taxable gains elsewhere. An asset with an embedded loss (current price below purchase price) has a “tax benefit” from being sold. This can be incorporated into the rebalancing objective as a negative transaction cost for loss-generating trades. The resulting optimisation is a QP with asymmetric cost terms (\(u_i^+\) and \(u_i^-\) have different coefficients), which is handled naturally within the \(u^\pm\) reformulation.


Chapter 9: Nonlinear Programming

The portfolio problems considered so far are quadratic or linear programmes — a special, computationally tractable class. Several important portfolio objectives and constraints lead to nonlinear programmes (NLPs) that require more general algorithms. This chapter surveys the theoretical foundations and two major algorithmic families: sequential quadratic programming and interior-point methods.

9.1 General NLP and KKT Conditions

The general NLP is:

\[ \min_{x \in \mathbb{R}^n} \; f(x) \quad \text{subject to} \quad g_i(x) \leq 0, \; i = 1,\ldots,m, \quad h_j(x) = 0, \; j = 1,\ldots,p. \]

We assume \(f\), \(g_i\), and \(h_j\) are smooth (\(C^2\)). The KKT conditions are necessary for local optimality under a constraint qualification (e.g., LICQ — linear independence of the active constraint gradients):

\[ \nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) + \sum_j \lambda_j \nabla h_j(x^*) = 0, \]\[ \mu_i \geq 0, \quad \mu_i g_i(x^*) = 0 \quad \forall i, \quad h_j(x^*) = 0 \quad \forall j. \]

The second-order sufficient conditions require that the reduced Hessian of the Lagrangian is positive definite on the tangent space of the active constraints. When these hold, \(x^*\) is a strict local minimum.
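For equality-constrained QPs the KKT conditions are a linear system, which makes them easy to verify numerically. A small NumPy check on invented data (the \(2\times2\) \(Q\), \(c\), \(A\), \(b\) below are illustrative only):

```python
import numpy as np

# min (1/2) x'Qx + c'x  subject to  Ax = b, with Q positive definite
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# KKT system: [Q A'; A 0] [x; lambda] = [-c; b]
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x, lam = sol[:2], sol[2:]

# stationarity of the Lagrangian and primal feasibility hold at the solution
assert np.allclose(Q @ x + c + A.T @ lam, 0)
assert np.allclose(A @ x, b)
```

With no inequality constraints there is no complementarity condition to check; stationarity plus feasibility characterise the unique minimiser when \(Q \succ 0\).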

9.2 Sequential Quadratic Programming

Sequential Quadratic Programming (SQP) is the state-of-the-art algorithm for smooth NLPs. The idea is to approximate the NLP locally by a QP at each iterate and solve the QP to get a search direction.

Given current iterate \(x_k\), the QP subproblem is:

\[ \min_{d \in \mathbb{R}^n} \; \frac{1}{2} d^T W_k d + \nabla f(x_k)^T d \quad \text{subject to} \quad g_i(x_k) + \nabla g_i(x_k)^T d \leq 0, \quad h_j(x_k) + \nabla h_j(x_k)^T d = 0, \]

where \(W_k\) is an approximation to the Hessian of the Lagrangian \(\nabla^2_{xx} \mathcal{L}(x_k, \lambda_k, \mu_k)\). The QP solution \(d_k\) gives the search direction; a line search along \(d_k\) ensures sufficient decrease in a merit function. The multipliers are updated from the QP solution. Under regularity conditions, SQP converges superlinearly (or even quadratically with exact Hessians).

SQP applies directly to portfolio problems with nonlinear objectives such as Sharpe ratio maximisation. The Sharpe ratio \(S(w) = (\mu^T w - r_f) / \sqrt{w^T \Sigma w}\) is a smooth, nonlinear, non-convex function of \(w\) on the feasible set, and its Hessian is dense. The risk parity problem (equal risk contribution) likewise leads to a non-convex NLP; SQP handles both.

9.3 Interior-Point Methods

Interior-point methods (also called barrier methods) convert inequality constraints into the objective via a logarithmic barrier:

\[ \min_x \; f(x) - \mu \sum_i \ln(-g_i(x)) \quad \text{subject to} \quad h(x) = 0, \]

where \(\mu > 0\) is the barrier parameter. As \(\mu \to 0\), the minimiser of the barrier problem converges to the solution of the original NLP. For each fixed \(\mu\), Newton’s method (augmented to handle equality constraints via the KKT system) is applied to the barrier subproblem, requiring only a few Newton steps before decreasing \(\mu\).

Interior-point methods have several advantages: they are efficient for large, dense NLPs; they do not require an active-set strategy; and their per-iteration cost is dominated by solving one large linear system (typically factored by sparse Cholesky). The MATLAB fmincon function implements both SQP and interior-point methods and is the standard tool for nonlinear portfolio optimisation in this course.
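The barrier mechanism can be illustrated on a one-dimensional toy problem. The sketch below (entirely invented: minimise \((x+1)^2\) subject to \(x \geq 1\), whose solution \(x^* = 1\) has the constraint active) applies Newton's method with a feasibility backtrack to each barrier subproblem; the barrier minimisers \(x(\mu) = \sqrt{1 + \mu/2}\) approach \(x^*\) as \(\mu \to 0\):

```python
import numpy as np

def barrier_min(mu, x=1.5):
    # minimise (x+1)^2 - mu*ln(x-1) over x > 1 by damped Newton steps
    for _ in range(50):
        grad = 2 * (x + 1) - mu / (x - 1)
        hess = 2 + mu / (x - 1) ** 2
        step = grad / hess
        t = 1.0
        while x - t * step <= 1:      # backtrack to stay strictly feasible
            t *= 0.5
        x -= t * step
    return x

mus = [1.0, 0.1, 0.01, 0.001]
xs = [barrier_min(mu) for mu in mus]

# the barrier minimisers decrease monotonically toward the solution x* = 1
assert all(a > b for a, b in zip(xs, xs[1:]))
assert abs(xs[-1] - 1.0) < 0.01
```

A production interior-point code would warm-start each subproblem from the previous minimiser and take only a few Newton steps per value of \(\mu\), exactly as described above.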

9.4 Portfolio Applications

The tracking error objective \(\min_w (w - w_b)^T \Sigma (w - w_b)\) (where \(w_b\) is a benchmark portfolio) is a QP, but tracking error subject to a Sharpe ratio constraint is nonlinear. The omega ratio (probability-weighted ratio of gains to losses relative to a threshold) is another nonlinear performance metric that can be optimised using SQP. Sharpe ratio maximisation with additional risk constraints (e.g., maximum drawdown, skewness constraints) leads to NLPs that are best handled by interior-point methods for large universes.


Chapter 10: Advanced Topics

The final chapter surveys several advanced topics in portfolio management, connecting the optimisation framework developed in earlier chapters to broader questions in financial theory, statistics, and machine learning. These topics reflect the current research frontier and are increasingly important in institutional practice.

10.1 No-Arbitrage and Risk-Neutral Pricing

A fundamental concept in modern finance is no-arbitrage: the assumption that there exists no portfolio that yields a non-negative return in all states of the world and a strictly positive return in at least one state, with zero initial cost. The first and second fundamental theorems of asset pricing establish that (under mild regularity conditions) absence of arbitrage is equivalent to the existence of a risk-neutral probability measure \(\mathbb{Q}\) under which all discounted asset prices are martingales. While the full machinery of risk-neutral pricing belongs to a course in mathematical finance, the no-arbitrage condition places structural constraints on the expected return vector \(\mu\) and covariance matrix \(\Sigma\) that inform portfolio construction.

10.2 Universal Portfolio

Cover (1991) introduced the universal portfolio algorithm, a remarkable on-line investment strategy that requires no statistical model of returns. The algorithm maintains a distribution over all constant-rebalanced portfolios (CRPs) and at each period invests according to the wealth-weighted average of all CRPs. It achieves the asymptotic wealth growth rate of the best CRP in hindsight:

\[ \frac{1}{T} \ln \frac{W_T^{\mathrm{universal}}}{W_T^{\mathrm{best\ CRP}}} \to 0 \quad \text{as } T \to \infty. \]

This is a minimax regret guarantee that holds for any sequence of price relatives, without probabilistic assumptions. The universal portfolio is a beautiful application of information-theoretic ideas to portfolio management and connects to the Kelly criterion for log-optimal investment.
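A discretised sketch of Cover's algorithm for two assets: with a uniform prior over constant-rebalanced portfolios, the universal wealth is the average of the CRP wealths, which we approximate on a grid. The price-relative sequence below is synthetic and purely illustrative:

```python
import numpy as np

# T x 2 synthetic price relatives (gross returns per period)
x = np.array([[1.05, 0.98], [0.90, 1.10], [1.20, 0.95],
              [0.97, 1.03], [1.08, 0.99], [0.85, 1.15]])
T = len(x)
bs = np.linspace(0.0, 1.0, 101)      # grid of CRP weights b on asset 1

# terminal wealth of each CRP (rebalanced to weights (b, 1-b) every period)
wealth = np.array([np.prod(b * x[:, 0] + (1 - b) * x[:, 1]) for b in bs])

# uniform-prior universal wealth = wealth-weighted mixture = average CRP wealth
W_universal = wealth.mean()
W_best = wealth.max()                # best CRP in hindsight

# the per-period log-regret is small even for this short horizon
regret_rate = np.log(W_best / W_universal) / T
assert W_universal >= W_best / len(bs)
assert regret_rate < 1.0
```

Cover's theorem guarantees that the regret rate decays like \(O((\ln T)/T)\) for any price sequence; the grid approximation here converges to the true continuous mixture as the grid is refined.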

10.3 Factor Models and Structured Covariance

In practice, estimating the full \(n \times n\) covariance matrix \(\Sigma\) from historical data requires \(O(n^2)\) parameters, but with \(T < n\) observations the sample covariance matrix is singular. Factor models address this by positing that returns are driven by a small number \(k \ll n\) of common factors:

\[ r = \mu + B f + \epsilon, \]

where \(f \in \mathbb{R}^k\) are the factors (with mean zero and covariance \(F\)), \(B \in \mathbb{R}^{n \times k}\) is the factor loading matrix, and \(\epsilon \in \mathbb{R}^n\) is idiosyncratic noise with \(\mathrm{Cov}(\epsilon) = D\) diagonal. The implied covariance matrix is \(\Sigma = B F B^T + D\), which has only \(nk + n + k(k+1)/2\) free parameters — far fewer than \(n(n+1)/2\) for the full matrix.

The CAPM (\(k=1\)) and Fama–French 3-factor model (\(k=3\)) are the canonical examples. The factor model structure also implies conditional uncorrelatedness: given the factor realisations \(f\), the asset returns \(r_i\) and \(r_j\) are uncorrelated (and fully independent when the noise is Gaussian, since \(D\) is diagonal). This is a strong but empirically useful constraint that improves covariance estimation dramatically.
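A short NumPy illustration of the structured covariance \(\Sigma = BFB^T + D\) and its parameter savings, on synthetic loadings (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3
B = rng.normal(size=(n, k))                 # synthetic factor loadings
F = np.diag([0.04, 0.01, 0.01])             # factor covariance (diagonal here)
D = np.diag(rng.uniform(0.01, 0.05, n))     # idiosyncratic variances

Sigma = B @ F @ B.T + D                     # implied n x n covariance

# parameter count: nk loadings + n idiosyncratic + k(k+1)/2 factor covariances
factor_params = n * k + n + k * (k + 1) // 2
full_params = n * (n + 1) // 2
assert factor_params < full_params          # 2006 vs 125250 for n=500, k=3

# Sigma is positive definite: B F B' is PSD, so the smallest eigenvalue of
# Sigma is at least the smallest idiosyncratic variance (Weyl's inequality)
assert np.linalg.eigvalsh(Sigma).min() >= 0.01 - 1e-8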

10.4 Risk Parity

Risk parity (also called equal risk contribution) portfolios allocate capital so that each asset contributes equally to total portfolio variance. The risk contribution of asset \(i\) is defined as

\[ RC_i(w) = w_i \cdot \frac{\partial \sigma_p}{\partial w_i} = w_i \cdot \frac{(\Sigma w)_i}{\sqrt{w^T \Sigma w}}. \]

Risk parity requires \(RC_i(w) = \sigma_p / n\) for all \(i\), which after simplification gives the nonlinear system:

\[ w_i (\Sigma w)_i = w_j (\Sigma w)_j \quad \forall i, j. \]

This system is non-convex and cannot be reduced to a QP in general. SQP or alternating projections are used to solve it. In the special case where all pairwise correlations are equal (and all variances are equal), the risk parity portfolio reduces to the equally-weighted portfolio \(w = e/n\). Risk parity portfolios became popular after the 2008 financial crisis because they tend to be less sensitive to equity drawdowns than mean-variance efficient portfolios.
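One convenient computational route (not the only one, and an alternative to the SQP approach mentioned above) is the well-known convex surrogate \(\min_y \tfrac12 y^T\Sigma y - \tfrac1n\sum_i \ln y_i\): its first-order condition is \(y_i(\Sigma y)_i = 1/n\) for all \(i\), i.e. exactly risk parity after normalisation. A SciPy sketch on the same style of invented \(3\times3\) covariance:

```python
import numpy as np
from scipy.optimize import minimize

Sigma = np.array([[0.04, 0.02, 0.01],
                  [0.02, 0.09, 0.03],
                  [0.01, 0.03, 0.06]])
n = 3

# convex surrogate: stationarity y_i (Sigma y)_i = 1/n is risk parity
def f(y):
    return 0.5 * y @ Sigma @ y - np.log(y).sum() / n

def grad(y):
    return Sigma @ y - 1.0 / (n * y)

res = minimize(f, np.ones(n), jac=grad, method="L-BFGS-B",
               bounds=[(1e-8, None)] * n,
               options={"gtol": 1e-10, "ftol": 1e-15})
w = res.x / res.x.sum()              # normalise to portfolio weights

rc = w * (Sigma @ w)                 # risk contributions (up to the 1/sigma_p factor)
assert np.allclose(rc, rc.mean(), rtol=1e-3)
```

The surrogate is strictly convex on the positive orthant, so the long-only risk parity portfolio is unique; the non-convexity mentioned in the text arises when short positions or additional constraints are allowed.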

10.5 Machine Learning in Portfolio Optimization

Machine learning intersects portfolio optimization at several levels:

Regularised covariance estimation: The Ledoit–Wolf shrinkage estimator improves on the sample covariance matrix by shrinking it toward a structured target (e.g., a diagonal matrix or the identity):

\[ \hat{\Sigma}_{\mathrm{LW}} = (1 - \rho) \hat{\Sigma}_{\mathrm{sample}} + \rho \hat{\Sigma}_{\mathrm{target}}. \]

The shrinkage coefficient \(\rho\) is chosen optimally (in a mean-squared-error sense) by an analytical formula derived from random matrix theory. Ledoit–Wolf estimation significantly improves the out-of-sample performance of mean-variance portfolios, especially when \(n\) is large relative to \(T\).

Deep learning for return prediction: Recurrent neural networks (LSTMs, Transformers) trained on financial time series can forecast expected returns \(\hat{\mu}\) more accurately than linear models in some regimes. However, the estimation uncertainty in \(\hat{\mu}\) propagates into the portfolio weights, and robust or Bayesian portfolio formulations are needed to avoid over-fitting.

Reinforcement learning for dynamic allocation: The portfolio rebalancing problem of Chapter 8 can be framed as a Markov decision process (MDP) and solved by reinforcement learning (RL) algorithms. The state is the current portfolio and market features; the action is the rebalancing decision; the reward is the risk-adjusted return net of transaction costs. Deep RL (e.g., Proximal Policy Optimization) has shown promise in multi-period portfolio problems where the dynamics are complex and the horizon is long.

These machine learning approaches are not replacements for the optimization models developed in this course — rather, they are tools for estimating the inputs (\(\mu\), \(\Sigma\)) and solving the resulting optimization problems at scale. A rigorous grounding in quadratic programming, KKT theory, and numerical linear algebra remains essential for understanding and deploying these methods reliably.

10.6 Robust Portfolio Optimisation

A recurring criticism of mean-variance optimisation is that the optimal weights are extremely sensitive to small errors in the expected return vector \(\mu\). Indeed, the solution \(w^* = \Sigma^{-1}(\gamma\mu + \delta e)/2\) is a linear function of \(\mu\), but with coefficient matrix \(\Sigma^{-1}\) whose condition number can be very large — small perturbations to \(\mu\) are amplified by \(\kappa(\Sigma)\) in the weights. This is the fundamental instability of Markowitz optimisation in practice.

Robust optimisation addresses this by explicitly modelling uncertainty in the inputs. Suppose \(\mu\) is known only to lie in an uncertainty set \(\mathcal{U}\), for instance an ellipsoidal set:

\[ \mathcal{U} = \{\mu : (\mu - \hat\mu)^T\Theta^{-1}(\mu - \hat\mu) \leq \kappa^2\}, \]

where \(\hat\mu\) is the estimated mean and \(\Theta\) is the estimation covariance (e.g., \(\Theta = \hat\Sigma/T\)). The robust portfolio maximises the worst-case risk-adjusted return:

\[ \max_w \min_{\mu\in\mathcal{U}} \bigl[\mu^Tw - \tfrac\lambda2 w^T\Sigma w\bigr] = \max_w \bigl[\hat\mu^Tw - \kappa\sqrt{w^T\Theta w} - \tfrac\lambda2 w^T\Sigma w\bigr]. \]

The inner minimisation over \(\mu\) has a closed form (by Cauchy-Schwarz on the ellipsoid), yielding a second-order cone programme (SOCP). This is convex and can be solved efficiently; the robust solution typically has a more diversified structure than the nominal Markowitz solution, because concentrating in a single asset creates large exposure to estimation error in that asset’s expected return.
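The closed form for the inner minimisation can be verified numerically. The sketch below (synthetic \(w\), \(\hat\mu\), \(\Theta\), \(\kappa\)) checks that the analytic minimiser \(\mu^* = \hat\mu - \kappa\,\Theta w/\sqrt{w^T\Theta w}\) lies on the ellipsoid boundary, attains the closed-form value, and that no sampled boundary point does better:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
w = rng.dirichlet(np.ones(n))                   # an arbitrary portfolio
mu_hat = rng.normal(0.08, 0.02, n)              # estimated means (synthetic)
A = rng.normal(size=(n, n))
Theta = A @ A.T / n + 0.01 * np.eye(n)          # estimation covariance (synthetic)
kappa = 0.5

# closed form: min over the ellipsoid of mu'w
s = np.sqrt(w @ Theta @ w)
worst_closed = mu_hat @ w - kappa * s

# the analytic minimiser, on the ellipsoid boundary
mu_star = mu_hat - kappa * Theta @ w / s
r = mu_star - mu_hat
assert np.isclose(r @ np.linalg.solve(Theta, r), kappa**2)   # boundary
assert np.isclose(mu_star @ w, worst_closed)                 # attains the value

# random boundary points never produce a smaller value
L = np.linalg.cholesky(Theta)
for _ in range(200):
    z = rng.normal(size=n)
    z *= kappa / np.linalg.norm(z)
    mu = mu_hat + L @ z                          # on the ellipsoid boundary
    assert mu @ w >= worst_closed - 1e-12
```

The parametrisation \(\mu = \hat\mu + Lz\) with \(\|z\| \leq \kappa\) and \(\Theta = LL^T\) maps the unit ball onto the ellipsoid, which is exactly the Cauchy–Schwarz argument in coordinates.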

The parameter \(\kappa\) controls the degree of robustness: \(\kappa = 0\) recovers the nominal Markowitz solution, while large \(\kappa\) approaches the minimum-variance portfolio (which is robust to \(\mu\) estimation error since it does not depend on \(\mu\) at all). In practice, \(\kappa\) is calibrated so that \(\mathcal{U}\) contains the true \(\mu\) with 95% probability, using the chi-squared distribution with \(n\) degrees of freedom.

Remark (Connection to Bayesian portfolio theory). The robust optimisation framework above has a direct Bayesian interpretation. If the prior on \(\mu\) is \(\mathcal{N}(\hat\mu, \Theta)\) and the likelihood is uninformative, then the posterior mean is \(\hat\mu\) and the posterior covariance is \(\Theta\). The robust objective with an ellipsoidal uncertainty set is equivalent (under conjugate Gaussian priors) to the Bayes-optimal portfolio that integrates over the posterior distribution of \(\mu\). This connection explains why robust and Bayesian portfolios empirically out-perform the Markowitz solution out-of-sample: both frameworks account for estimation uncertainty in \(\mu\), and the Markowitz solution ignores it entirely. The Black–Litterman model (1990) exploits this Bayesian structure to combine the investor's views with a prior given by the CAPM equilibrium, producing more stable and better-diversified optimal portfolios than either pure Markowitz or pure view-based optimisation.

Chapter 11: The Black–Litterman Model

The Black–Litterman (BL) model, introduced by Fischer Black and Robert Litterman at Goldman Sachs in 1990, resolves a persistent practical failure of mean-variance optimisation: the sensitivity of optimal weights to small errors in estimated expected returns. The core insight is Bayesian: rather than estimating \(\mu\) from historical data alone, the BL model starts from a prior on \(\mu\) derived from the CAPM equilibrium and updates it with the investor’s “views” — subjective forecasts about individual assets or relative performance.

11.1 Equilibrium Returns as a Prior

In the CAPM, if the market portfolio \(w_m\) is on the efficient frontier, then the equilibrium expected excess returns satisfy the pricing equation

\[ \Pi = \delta \Sigma w_m, \]

where \(\delta\) is the aggregate risk-aversion coefficient (typically calibrated to historical data, often \(\delta \approx 2.5\) to 4.0) and \(\Sigma\) is the covariance matrix of excess returns. The vector \(\Pi\) provides the prior mean in the BL model. Under the view that the market is in equilibrium, \(\Pi\) is the best available unconditional estimate of \(\mu\) — it encodes the cross-sectional risk structure of the market without requiring time-series estimation of individual means, which is notoriously noisy.

Definition (Black–Litterman prior): The BL prior on expected returns is \[ \mu \sim \mathcal{N}(\Pi, \tau\Sigma), \] where \(\Pi = \delta\Sigma w_m\) is the implied equilibrium return vector, \(\tau > 0\) is a scalar reflecting uncertainty in the prior (typically \(\tau \in [0.01, 0.10]\)), and \(\Sigma\) is the asset covariance matrix. The factor \(\tau\Sigma\) expresses that our uncertainty about \(\mu\) has the same correlation structure as the asset returns themselves.

11.2 Investor Views as a Likelihood

An investor’s views are \(k\) statements about the expected returns of portfolios of assets. Each view is expressed as a linear combination of expected returns:

\[ P\mu = q + \varepsilon_v, \qquad \varepsilon_v \sim \mathcal{N}(0, \Omega), \]

where \(P \in \mathbb{R}^{k \times n}\) is the pick matrix (each row encodes a view portfolio), \(q \in \mathbb{R}^k\) is the vector of view forecasts, and \(\Omega = \mathrm{diag}(\omega_1, \ldots, \omega_k)\) is the diagonal view uncertainty matrix (smaller \(\omega_i\) means higher confidence in view \(i\)).

Two types of views arise frequently:

  • Absolute view: “Asset \(i\) will return 5% per annum.” The corresponding row of \(P\) has a single 1 in column \(i\), and the corresponding entry of \(q\) is 0.05.
  • Relative view: “Asset \(i\) will outperform asset \(j\) by 3%.” The corresponding row of \(P\) has +1 in column \(i\) and -1 in column \(j\), and the corresponding entry of \(q\) is 0.03.

11.3 Bayesian Posterior Update

With a Gaussian prior \(\mu \sim \mathcal{N}(\Pi, \tau\Sigma)\) and Gaussian likelihood \(q \mid \mu \sim \mathcal{N}(P\mu, \Omega)\), the posterior is also Gaussian:

Theorem (Black–Litterman posterior): The posterior distribution of \(\mu\) given the views is \[ \mu | \text{views} \sim \mathcal{N}(\hat\mu_{\text{BL}}, M^{-1}), \] where \[ M = (\tau\Sigma)^{-1} + P^T\Omega^{-1}P, \qquad \hat\mu_{\text{BL}} = M^{-1}\bigl[(\tau\Sigma)^{-1}\Pi + P^T\Omega^{-1}q\bigr]. \]

Equivalently, using the matrix inversion lemma,

\[ \hat\mu_{\text{BL}} = \Pi + \tau\Sigma P^T(P\tau\Sigma P^T + \Omega)^{-1}(q - P\Pi). \]

The intuition is transparent: \(\hat\mu_{\text{BL}}\) is a weighted average of the prior mean \(\Pi\) and the views \(q\), with weights determined by relative precision. When all view uncertainties \(\omega_i \to \infty\), \(\hat\mu_{\text{BL}} \to \Pi\) (pure CAPM prior). When \(\tau \to 0\) (very sharp prior), the views have minimal influence. The BL posterior mean is then used in place of the historical \(\hat\mu\) to compute the Markowitz optimal portfolio.

11.4 Numerical Example

Example (BL model with 3 assets and 2 views): Consider a universe of 3 assets with \(\delta = 3\), market weights \(w_m = (0.4, 0.4, 0.2)^T\), covariance matrix \[ \Sigma = \begin{pmatrix}0.04&0.02&0.01\\0.02&0.09&0.03\\0.01&0.03&0.06\end{pmatrix}, \] and \(\tau = 0.05\). Step 1: Equilibrium prior. \[ \Pi = \delta\Sigma w_m = 3 \begin{pmatrix}0.04&0.02&0.01\\0.02&0.09&0.03\\0.01&0.03&0.06\end{pmatrix}\begin{pmatrix}0.4\\0.4\\0.2\end{pmatrix} = 3\begin{pmatrix}0.026\\0.050\\0.028\end{pmatrix} = \begin{pmatrix}0.078\\0.150\\0.084\end{pmatrix}. \]

Step 2: Investor views. View 1 (absolute): “Asset 1 returns 10%”, so row 1 of \(P\) is \((1,0,0)\), \(q_1 = 0.10\). View 2 (relative): “Asset 2 outperforms Asset 3 by 5%”, so row 2 of \(P\) is \((0,1,-1)\), \(q_2 = 0.05\). Confidence: \(\Omega = \mathrm{diag}(0.001, 0.002)\).

Step 3: Posterior mean. The correction term \(q - P\Pi = (0.10 - 0.078,\; 0.05 - (0.150-0.084))^T = (0.022,\; -0.016)^T\). The posterior mean blends the equilibrium prior toward view 1 (raising asset 1’s expected return toward 10%) and pulls the asset 2–asset 3 spread down from the equilibrium-implied 0.066 toward the viewed 0.05.
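Recomputing the example numerically is a useful check on the arithmetic. The NumPy sketch below uses exactly the data stated above and evaluates the matrix-inversion-lemma form of the posterior mean:

```python
import numpy as np

delta, tau = 3.0, 0.05
Sigma = np.array([[0.04, 0.02, 0.01],
                  [0.02, 0.09, 0.03],
                  [0.01, 0.03, 0.06]])
w_m = np.array([0.4, 0.4, 0.2])

Pi = delta * Sigma @ w_m                        # equilibrium prior mean
assert np.allclose(Pi, [0.078, 0.150, 0.084])

P = np.array([[1.0, 0.0,  0.0],                 # absolute view on asset 1
              [0.0, 1.0, -1.0]])                # relative view: asset 2 vs 3
q = np.array([0.10, 0.05])
Omega = np.diag([0.001, 0.002])

correction = q - P @ Pi
assert np.allclose(correction, [0.022, -0.016])

# posterior mean via the matrix-inversion-lemma form
tS = tau * Sigma
mu_BL = Pi + tS @ P.T @ np.linalg.solve(P @ tS @ P.T + Omega, correction)

# view 1 pulls asset 1's expected return up from 7.8% toward 10%
assert mu_BL[0] > Pi[0]
```

The same posterior mean can be obtained from the precision-weighted form in the theorem; the lemma form is preferred in practice because it only requires solving a \(k \times k\) system.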

Step 4: Markowitz optimisation on \(\hat\mu_{\text{BL}}\). Use \(\hat\mu_{\text{BL}}\) in place of \(\hat\mu\) in the standard portfolio optimisation. The resulting weights are significantly more balanced than naive Markowitz on \(\hat\mu\) and less sensitive to small changes in the views.

Remark (Why BL outperforms in practice): The core failure of Markowitz in practice is that estimated expected returns \(\hat\mu\) are extremely noisy — the standard error of a sample mean is \(\sigma/\sqrt{T}\), and with monthly data and \(T = 60\) months this is roughly 40% of the asset's standard deviation. BL mitigates this by anchoring \(\mu\) to the equilibrium prior, which is more stable than time-series estimates. Empirical studies (He and Litterman 1999; Idzorek 2005) show that BL portfolios have lower turnover, better out-of-sample Sharpe ratios, and fewer extreme weights than Markowitz portfolios calibrated directly from historical data. The model is now standard practice at major asset managers.

Chapter 12: Performance Attribution

Portfolio performance attribution is the analytical framework for decomposing a portfolio’s return relative to a benchmark into components attributable to active decisions: asset allocation, security selection, and interaction effects. The Brinson–Hood–Beebower (BHB) model (1986) is the canonical framework.

12.1 The BHB Attribution Model

Divide the investment universe into \(K\) sectors (e.g., equities, fixed income, real estate). Let:

  • \(w_{b,k}\): benchmark weight in sector \(k\)
  • \(w_{p,k}\): portfolio weight in sector \(k\)
  • \(R_{b,k}\): benchmark return for sector \(k\)
  • \(R_{p,k}\): portfolio return for sector \(k\)
  • \(R_b = \sum_k w_{b,k} R_{b,k}\): total benchmark return

The total active return is \(R_p - R_b = \sum_k (w_{p,k}R_{p,k} - w_{b,k}R_{b,k})\). BHB decomposes this as:

Definition (BHB Attribution Decomposition): The active return decomposes as: \[ R_p - R_b = \underbrace{\sum_k (w_{p,k} - w_{b,k}) R_{b,k}}_{\text{Allocation effect}} + \underbrace{\sum_k w_{b,k}(R_{p,k} - R_{b,k})}_{\text{Selection effect}} + \underbrace{\sum_k (w_{p,k} - w_{b,k})(R_{p,k} - R_{b,k})}_{\text{Interaction effect}}. \]
  • Allocation effect: profit from overweighting sectors that outperformed (tilting \(w_p\) away from \(w_b\) toward sectors with \(R_{b,k} > R_b\)).
  • Selection effect: profit from picking securities within a sector better than the benchmark.
  • Interaction effect: combined effect of both allocation and selection (often small in practice and sometimes merged with selection).
Example (Two-sector attribution): Consider a portfolio with equities (\(k=1\)) and bonds (\(k=2\)). Benchmark weights: \(w_{b,1} = 0.6\), \(w_{b,2} = 0.4\). Portfolio weights: \(w_{p,1} = 0.7\), \(w_{p,2} = 0.3\). Returns: \(R_{b,1} = 8\%\), \(R_{b,2} = 3\%\), \(R_{p,1} = 9\%\), \(R_{p,2} = 2.5\%\).

Benchmark return: \(R_b = 0.6\times8\% + 0.4\times3\% = 6\%\). Portfolio return: \(R_p = 0.7\times9\% + 0.3\times2.5\% = 7.05\%\). Active return: \(1.05\%\).

Allocation effect: \((0.7-0.6)\times8\% + (0.3-0.4)\times3\% = 0.8\% - 0.3\% = 0.5\%\). (Overweighting the outperforming equity sector was beneficial.)

Selection effect: \(0.6\times(9\%-8\%) + 0.4\times(2.5\%-3\%) = 0.6\% - 0.2\% = 0.4\%\). (Better equity selection, worse bond selection; net positive.)

Interaction effect: \((0.1)\times(1\%) + (-0.1)\times(-0.5\%) = 0.1\% + 0.05\% = 0.15\%\).

Check: \(0.5\% + 0.4\% + 0.15\% = 1.05\%\). ✓ (The three effects sum exactly to the active return; the sign of the interaction term depends on whether the overweighted sector was also the one with positive selection.)
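The decomposition identity is easy to verify in code for the two-sector example above (and for any inputs, since the three effects telescope algebraically):

```python
import numpy as np

# the two-sector example, in decimal returns
w_b = np.array([0.6, 0.4])            # benchmark weights
w_p = np.array([0.7, 0.3])            # portfolio weights
R_b = np.array([0.08, 0.03])          # benchmark sector returns
R_p = np.array([0.09, 0.025])         # portfolio sector returns

allocation  = ((w_p - w_b) * R_b).sum()           # 0.005
selection   = (w_b * (R_p - R_b)).sum()           # 0.004
interaction = ((w_p - w_b) * (R_p - R_b)).sum()   # 0.0015

active = w_p @ R_p - w_b @ R_b                    # 7.05% - 6% = 1.05%
assert np.isclose(allocation + selection + interaction, active)
assert np.isclose(active, 0.0105)
```

Expanding \(\sum_k (w_{p,k}R_{p,k} - w_{b,k}R_{b,k})\) term by term shows the identity holds for arbitrary weights and returns, not just this example.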

12.2 Risk-Adjusted Performance Metrics

Definition (Sharpe ratio): The Sharpe ratio of a portfolio is \[ S = \frac{\mu_p - r_f}{\sigma_p}, \] where \(r_f\) is the risk-free rate, \(\mu_p\) the portfolio's expected return, and \(\sigma_p\) its standard deviation. It measures excess return per unit of total risk. The Sharpe ratio is scale-invariant: leveraging a portfolio does not change its Sharpe ratio.
Definition (Information ratio): The information ratio (IR) of a portfolio relative to a benchmark is \[ \text{IR} = \frac{\alpha}{\sigma_\alpha} = \frac{R_p - R_b}{\text{Tracking Error}}, \] where \(\alpha = \mathbb{E}[R_p - R_b]\) is the expected active return and \(\sigma_\alpha = \sqrt{\mathrm{Var}(R_p - R_b)}\) is the tracking error (standard deviation of active returns). IR measures active return per unit of active risk. Grinold's fundamental law of active management states \(\text{IR} \approx \text{IC} \times \sqrt{N}\), where IC is the information coefficient (correlation of forecasts with outcomes) and \(N\) is the number of independent bets per period.
Remark (Fundamental Law of Active Management): Grinold's law \(\text{IR} \approx \text{IC}\sqrt{N}\) makes the arithmetic of alpha generation precise. A manager with IC = 0.05 (5% forecast accuracy — very good in practice) and \(N = 100\) independent bets per year achieves IR = \(0.05\sqrt{100} = 0.5\). This is considered excellent: an IR above 0.5 is in the top quartile of active managers. The implication is stark: the path to high IR is more through breadth (\(\sqrt{N}\)) than through depth (IC). A wide diversified strategy with modest IC easily outperforms a concentrated strategy with high IC, all else equal. This is the mathematical rationale for quantitative, systematic investing with many small bets.

Chapter 13: Empirical Factor Models

The Markowitz framework treats the covariance matrix \(\Sigma\) as a given input, but in practice \(\Sigma\) must be estimated from data. When the number of assets \(n\) approaches or exceeds the number of time-series observations \(T\), the sample covariance matrix is ill-conditioned or singular. Empirical factor models address this by explaining cross-sectional covariation through a small number of observable risk factors, dramatically reducing the number of free parameters.

13.1 Fama–French Three-Factor Model

Fama and French (1992, 1993) showed empirically that the cross-section of expected stock returns is better explained by three factors than by the single market factor of the CAPM:

Definition (Fama–French three-factor model): The excess return of stock \(i\) in period \(t\) is \[ r_{i,t} - r_{f,t} = \alpha_i + \beta_i^{MKT}(r_{m,t} - r_{f,t}) + \beta_i^{SMB} \cdot \text{SMB}_t + \beta_i^{HML} \cdot \text{HML}_t + \varepsilon_{i,t}, \] where:
  • MKT: market excess return \(r_m - r_f\) (systematic market risk)
  • SMB (Small Minus Big): return of small-cap stocks minus large-cap stocks (size factor)
  • HML (High Minus Low): return of value stocks (high book-to-market) minus growth stocks (value factor)
  • \(\alpha_i\): Jensen's alpha — the risk-adjusted excess return of stock \(i\) unexplained by the three factors

The three-factor model explains approximately 90–95% of diversified portfolio return variation, versus 70–80% for the single-factor CAPM. The implied covariance matrix is

\[ \Sigma = B F B^T + D, \]

where \(B \in \mathbb{R}^{n \times 3}\) is the factor loading matrix, \(F \in \mathbb{R}^{3 \times 3}\) is the factor covariance matrix, and \(D = \mathrm{diag}(\sigma_{\varepsilon_1}^2, \ldots, \sigma_{\varepsilon_n}^2)\) collects idiosyncratic variances. This structured covariance has only \(3n + n + 3(3+1)/2 = 4n + 6\) free parameters (loadings, idiosyncratic variances, and factor covariances) versus \(n(n+1)/2\) for the full matrix.

13.2 Carhart Four-Factor Model

Carhart (1997) extended Fama–French by adding a momentum factor:

Definition (Momentum factor, Carhart): The UMD (Up Minus Down) factor is the return of stocks that performed best over the past 12 months (excluding the most recent month, to avoid short-term reversal) minus the return of stocks that performed worst. Adding UMD to the Fama–French model gives the four-factor model: \[ r_{i,t} - r_{f,t} = \alpha_i + \beta_i^{MKT}\cdot\text{MKT}_t + \beta_i^{SMB}\cdot\text{SMB}_t + \beta_i^{HML}\cdot\text{HML}_t + \beta_i^{UMD}\cdot\text{UMD}_t + \varepsilon_{i,t}. \]

Momentum (stocks that went up tend to keep going up over 3–12 months) is one of the most robust and extensively documented anomalies in asset pricing. Its persistence has been attributed to behavioural factors (investor underreaction to information), though momentum strategies are exposed to severe crashes during market reversals (e.g., momentum crashed -70% in 2009).

13.3 Estimating Factor Loadings by OLS

Given \(T\) time-series observations, the loadings \((\alpha_i, \beta_i^{MKT}, \beta_i^{SMB}, \beta_i^{HML})\) are estimated by ordinary least squares for each stock \(i\):

Example (OLS factor regression for one stock): Suppose we have \(T = 60\) monthly observations for stock AAPL. The data matrix is \(X \in \mathbb{R}^{60 \times 4}\) with columns \((1, \text{MKT}_t, \text{SMB}_t, \text{HML}_t)\). The response vector is \(y_i \in \mathbb{R}^{60}\) of AAPL excess returns. OLS gives \(\hat\beta_i = (X^TX)^{-1}X^T y_i\). Using Cholesky on \(X^TX\) (which is \(4\times4\)) makes this essentially free computationally. The residual \(\hat\varepsilon_i = y_i - X\hat\beta_i\) gives the idiosyncratic variance estimate \(\hat\sigma_{\varepsilon_i}^2 = \|\hat\varepsilon_i\|^2 / (T-4)\).

For AAPL over 2019–2023: typical estimates are \(\hat\beta^{MKT} \approx 1.20\) (high systematic risk), \(\hat\beta^{SMB} \approx -0.25\) (large-cap tilt — negative SMB loading), \(\hat\beta^{HML} \approx -0.40\) (growth stock — negative HML loading), \(\hat\alpha \approx 0.004\) per month (outperformance). These loadings connect AAPL to the factor model: its high market beta makes it sensitive to broad market swings, while its negative HML loading reflects the growth premium.
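The regression recipe in the example can be sketched end-to-end on synthetic data. The factor series and "true" loadings below are invented (not real Fama–French data); the point is the Cholesky-based normal-equations solve and the residual variance estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 60
# synthetic monthly factor returns (invented scales, not real FF data)
MKT = rng.normal(0.006, 0.045, T)
SMB = rng.normal(0.002, 0.030, T)
HML = rng.normal(0.002, 0.030, T)
X = np.column_stack([np.ones(T), MKT, SMB, HML])      # 60 x 4 design matrix

# generate one stock's excess returns with known loadings, then recover them
beta_true = np.array([0.004, 1.20, -0.25, -0.40])
y = X @ beta_true + rng.normal(0.0, 0.02, T)          # idiosyncratic sigma = 2%

# OLS via Cholesky on the 4x4 normal equations, as described in the text
G = X.T @ X
L = np.linalg.cholesky(G)
beta_hat = np.linalg.solve(L.T, np.linalg.solve(L, X.T @ y))

resid = y - X @ beta_hat
sigma2_eps = resid @ resid / (T - 4)                  # idiosyncratic variance

assert np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0])
assert abs(beta_hat[1] - 1.20) < 0.3                  # market beta recovered
```

With \(T = 60\) and 2% monthly idiosyncratic noise, the standard error on the market beta is roughly 0.06 here, so the recovery assertion has ample margin. (For ill-conditioned designs, QR on \(X\) itself is numerically preferable to Cholesky on \(X^TX\), per Chapter 1.)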


Chapter 14: Covariance Matrix Estimation and Shrinkage

Estimating the covariance matrix \(\Sigma\) from historical return data is one of the most delicate steps in portfolio optimisation. For a universe of \(n\) assets, the sample covariance matrix \(\hat\Sigma\) requires estimating \(n(n+1)/2\) parameters from \(T\) observations. When \(T < n\), the matrix is singular; even when \(T > n\), the sample eigenvalues are severely biased unless \(T \gg n\) (Marčenko–Pastur theory).

14.1 The Random Matrix Theory Perspective

The Marčenko–Pastur law (1967) characterises the limiting eigenvalue distribution of the sample covariance matrix when \(\Sigma = I\) and \(n, T \to \infty\) with \(n/T \to c \in (0,1)\):

Theorem (Marčenko–Pastur law): If asset returns are i.i.d. with \(\mathbb{E}[r_i] = 0\) and \(\mathrm{Var}(r_i) = \sigma^2\), and the ratio \(c = n/T \in (0,1)\), then the empirical spectral distribution of \(\hat\Sigma\) converges to the Marčenko–Pastur distribution with density \[ f(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi\sigma^2 c\lambda}, \quad \lambda \in [\lambda_-, \lambda_+], \] where \(\lambda_\pm = \sigma^2(1\pm\sqrt{c})^2\). The bulk of eigenvalues lies in \([\lambda_-, \lambda_+]\); eigenvalues outside this range correspond to genuine signals rather than noise.

For \(c = 0.5\) (twice as many observations as assets, common with monthly data), the eigenvalue support is \([(1-1/\sqrt{2})^2, (1+1/\sqrt{2})^2]\sigma^2 \approx [0.086\sigma^2, 2.91\sigma^2]\). A sample eigenvalue of 2.5 (in \(\sigma^2\) units), two and a half times the true value of 1, is still within the noise bulk: it carries no information. This explains why portfolios optimised on \(\hat\Sigma\) directly perform poorly out-of-sample: the optimiser mistakes noise eigenvalues for genuine structure.
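The bulk-versus-signal distinction is easy to verify numerically. The NumPy sketch below draws synthetic i.i.d. data with true covariance \(I\) at \(c = n/T = 0.5\) and checks that essentially all sample eigenvalues fall inside the Marčenko–Pastur support \([(1-\sqrt{c})^2, (1+\sqrt{c})^2]\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 200, 400                  # c = n/T = 0.5
c = n / T
R = rng.normal(size=(T, n))      # i.i.d. returns, true covariance I (sigma^2 = 1)
S = R.T @ R / T                  # sample covariance
eig = np.linalg.eigvalsh(S)

lam_minus = (1 - np.sqrt(c))**2  # Marcenko-Pastur lower edge, about 0.086
lam_plus = (1 + np.sqrt(c))**2   # upper edge, about 2.914
frac_in_bulk = np.mean((eig > lam_minus - 0.1) & (eig < lam_plus + 0.1))
```

Even though every true eigenvalue equals 1, the sample spectrum spreads across the whole interval \([0.086, 2.91]\).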

14.2 Ledoit–Wolf Linear Shrinkage

The Ledoit–Wolf (LW) estimator (2004) shrinks the sample covariance toward a structured target \(\hat\Sigma_{\text{target}}\):

Definition (Ledoit–Wolf shrinkage estimator): The LW estimator is \[ \hat\Sigma_{\text{LW}} = (1-\rho)\hat\Sigma + \rho\hat\Sigma_{\text{target}}, \] where \(\rho \in [0,1]\) is the shrinkage intensity. Two common targets:
  • Identity target: \(\hat\Sigma_{\text{target}} = \bar\sigma^2 I\) (assumes all assets equally volatile and uncorrelated). Best when there is no prior structural information.
  • Constant correlation target: all pairwise correlations equal the average sample correlation \(\bar\rho\). More realistic for equity universes.
The optimal \(\rho^*\) minimises the expected Frobenius loss \(\mathbb{E}\|\hat\Sigma_{\text{LW}} - \Sigma\|_F^2\) and is given analytically (no cross-validation needed). The formula involves moments of the spectral distribution of \(\hat\Sigma\) and is computable in \(O(n^2)\) time.
Example (Effect of shrinkage on portfolio weights): Consider \(n = 100\) assets and \(T = 60\) months. Without shrinkage (\(\rho = 0\)), the sample covariance produces unstable optimal weights — a single outlier asset receives a very large positive or negative weight. With LW shrinkage (\(\rho^* \approx 0.3\) is typical), the eigenvalues of \(\hat\Sigma_{\text{LW}}\) are compressed toward a common value, the condition number \(\kappa(\hat\Sigma_{\text{LW}})\) decreases by a factor of roughly 5–10, and the resulting optimal portfolio weights are much more stable and diversified. Out-of-sample, the LW portfolio typically achieves 10–30% lower portfolio volatility than the Markowitz portfolio on \(\hat\Sigma\), with similar or higher expected return.
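The eigenvalue compression is easy to demonstrate. The sketch below applies shrinkage toward the identity target with a fixed illustrative intensity \(\rho = 0.3\) (the analytic \(\rho^*\) formula is omitted here), on synthetic data with \(T = 60 < n = 100\), and shows the singular sample covariance becoming well conditioned.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 100, 60                            # fewer observations than assets
true_cov = np.diag(rng.uniform(0.5, 2.0, n))
R = rng.multivariate_normal(np.zeros(n), true_cov, size=T)
S = np.cov(R, rowvar=False)               # rank <= T-1 < n: singular

rho = 0.3                                 # fixed intensity for illustration
target = (np.trace(S) / n) * np.eye(n)    # identity target, average sample variance
S_lw = (1 - rho) * S + rho * target

eig_S = np.linalg.eigvalsh(S)             # smallest eigenvalues are (numerically) zero
eig_lw = np.linalg.eigvalsh(S_lw)         # all eigenvalues pulled toward trace(S)/n
```

Every shrunk eigenvalue is at least \(\rho \cdot \mathrm{tr}(S)/n > 0\), so \(\hat\Sigma_{\text{LW}}\) is always invertible even when \(T < n\).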

14.3 Nonlinear Shrinkage

Ledoit and Wolf (2022) propose a nonlinear shrinkage estimator that individually rescales each sample eigenvalue \(\hat\lambda_i\) to its optimal shrinkage target, rather than applying a single global coefficient \(\rho\). The Oracle estimator (which requires knowing the true \(\Sigma\)) replaces each \(\hat\lambda_i\) with the optimal value that minimises out-of-sample loss. The Oracle cannot be computed directly, but its limit — as \(n, T \to \infty\) with \(n/T \to c\) — is characterised via the Stieltjes transform of the Marčenko–Pastur distribution, and is consistently estimated by the analytical nonlinear shrinkage formula. Empirically, nonlinear shrinkage reduces out-of-sample portfolio variance by an additional 5–15% beyond linear shrinkage.


Chapter 15: Backtesting and Out-of-Sample Evaluation

A portfolio strategy that looks excellent in hindsight (in-sample) but fails out-of-sample has not been validated. Backtesting — simulating the strategy on historical data not used to calibrate it — is the primary tool for evaluating strategies before deployment. However, backtesting is riddled with pitfalls that can produce misleadingly optimistic results.

15.1 Walk-Forward Testing

The walk-forward (or rolling window) methodology is the gold standard for backtesting portfolio strategies:

Definition (Walk-forward backtest): Fix a calibration window of length \(W\) (e.g., 60 months) and an evaluation window of length \(H\) (e.g., 1 month). At each time \(t\):
  1. Estimate \(\hat\mu\) and \(\hat\Sigma\) using data from \([t-W, t)\).
  2. Compute the optimal portfolio weights \(w_t^*\) on the estimated parameters.
  3. Record the realised return of \(w_t^*\) during \([t, t+H)\).
Move forward by \(H\) periods and repeat. The sequence of realised returns \(\{R_t\}\) constitutes the backtest track record. Performance metrics (Sharpe ratio, maximum drawdown, information ratio) are computed on \(\{R_t\}\).

The key property is that at each step, the portfolio is calibrated only on past data — no future information enters the calibration. This avoids look-ahead bias, the most common form of backtest inflation.
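The walk-forward loop can be sketched compactly. The snippet below uses synthetic returns and, for brevity, an unconstrained minimum-variance rule in place of a full mean-variance optimiser; the structure (calibrate on \([t-W, t)\), realise on \([t, t+H)\)) is the point.

```python
import numpy as np

rng = np.random.default_rng(3)
n, total_T, W, H = 5, 120, 60, 1          # 5 assets, 120 months of synthetic data
returns = rng.normal(0.005, 0.04, size=(total_T, n))

track = []
for t in range(W, total_T, H):
    window = returns[t - W:t]             # calibration uses data strictly before t
    Sigma_hat = np.cov(window, rowvar=False)
    ones = np.ones(n)
    w = np.linalg.solve(Sigma_hat, ones)  # unconstrained minimum-variance weights
    w /= ones @ w
    track.append(returns[t:t + H] @ w)    # realised return over [t, t+H)
track = np.concatenate(track)

sharpe = track.mean() / track.std() * np.sqrt(12)   # annualised Sharpe of the backtest
```

Note that nothing from \([t, t+H)\) ever enters the calibration step, which is exactly the discipline that rules out look-ahead bias.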

15.2 Common Backtesting Biases

Remark (Backtesting biases and mitigations):

Survivorship bias: If the asset universe includes only stocks currently listed (ignoring delisted stocks), the backtest excludes the worst performers — those that went bankrupt or were acquired. Mitigation: use a point-in-time database that includes historical constituents of an index, not just current ones. CRSP (Center for Research in Security Prices) and Compustat are the gold-standard sources for survivorship-bias-free US equity data.

Look-ahead bias: Using data that would not have been available at the time of the portfolio decision (e.g., revised earnings, post-close prices for intraday decisions). Mitigation: always use only data available strictly before \(t\) when computing \(w_t^*\). For accounting data, use the originally announced figures, not revisions.

Overfitting/data snooping bias: Testing many strategies and reporting only the best is equivalent to mining for spurious patterns. A strategy with Sharpe ratio 1.0 that was selected from 100 independent candidate strategies has a true expected Sharpe of roughly 0 — the observed 1.0 is the maximum of 100 noise realisations. The Bonferroni correction and, more formally, the False Discovery Rate (FDR) framework (Benjamini–Hochberg) provide statistical corrections. Harvey, Liu, and Zhu (2016) argue that the required \(t\)-statistic for a new factor in finance should be at least 3.0 (not the conventional 1.96) to account for the multiple-testing problem.

Transaction cost neglect: Mean-variance optimal portfolios often have high turnover — sometimes 100%+ per month. If transaction costs are 0.1% per trade and turnover is 100% per month (each position sold and replaced, so 200% of the portfolio value is traded), costs consume 0.2% per month, or about 2.4% per annum — often exceeding the alpha being generated.


Chapter 16: The Kelly Criterion and Log-Optimal Portfolios

The Kelly criterion, introduced by John Kelly Jr. in 1956 (originally in the context of information theory and gambling), provides an alternative optimality criterion to mean-variance: it maximises the long-run growth rate of wealth, rather than the expected return for a given level of variance.

16.1 Kelly for a Single Bet

Consider a gambler who bets a fraction \(f\) of their wealth on a game that pays \(b:1\) with probability \(p\) and loses the stake with probability \(1-p\). After \(T\) rounds, the wealth is:

\[ W_T = W_0 \cdot (1+bf)^{N_+}(1-f)^{N_-}, \]

where \(N_+, N_-\) are the number of wins and losses with \(N_+ + N_- = T\). The long-run growth rate is:

\[ G(f) = \lim_{T\to\infty} \frac{\ln W_T}{T} = p\ln(1+bf) + (1-p)\ln(1-f) \quad \text{a.s. (by LLN).} \]

Maximising \(G(f)\) over \(f \in [0,1]\):

Theorem (Kelly criterion for a single bet): The fraction \(f^*\) that maximises the long-run growth rate is \[ f^* = \frac{p(b+1) - 1}{b} = \frac{pb - (1-p)}{b} = p - \frac{1-p}{b}. \] This is the expected gain per unit bet divided by the net payoff ratio. Betting more than \(f^*\) (overbetting) reduces long-run growth; in the limit \(f = 1\) (betting everything) leads to ruin in finite time almost surely if \(1-p > 0\).
Example (Kelly fraction for a favourable coin flip): A biased coin shows heads with probability \(p = 0.6\). The payout is even-money: win 1 dollar or lose 1 dollar (\(b = 1\)).

Kelly fraction: \(f^* = p - (1-p)/b = 0.6 - 0.4/1 = 0.20\). Bet 20% of wealth on each flip. The long-run growth rate is \(G(f^*) = 0.6\ln(1.2) + 0.4\ln(0.8) = 0.6(0.182) + 0.4(-0.223) = 0.109 - 0.089 = 0.020\) per bet, or about 2% growth per round. Overbetting to \(f = 0.5\) gives \(G = 0.6\ln(1.5) + 0.4\ln(0.5) = 0.243 - 0.277 = -0.034\) — wealth shrinks in the long run despite the positive edge! This illustrates the fundamental asymmetry between arithmetic and geometric means.
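The growth curve \(G(f)\) is a one-line function, and evaluating it confirms both that \(f^* = 0.20\) is the maximiser and that \(f = 0.5\) destroys growth:

```python
import numpy as np

def growth_rate(f, p=0.6, b=1.0):
    """Long-run log-growth per bet when betting fraction f of wealth."""
    return p * np.log(1 + b * f) + (1 - p) * np.log(1 - f)

f_star = 0.6 - 0.4 / 1.0        # Kelly fraction p - (1-p)/b = 0.20
g_star = growth_rate(f_star)    # about 0.020 per bet
g_over = growth_rate(0.5)       # negative: overbetting shrinks wealth
```

Evaluating `growth_rate` on a grid of \(f\) values traces out the concave growth curve, with its peak at \(f^*\) and the zero crossing beyond which the gambler is guaranteed long-run ruin.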

16.2 Kelly for Continuous-Time Portfolio

For a portfolio of \(n\) risky assets with log-normal returns and a risk-free asset, the continuous-time Kelly portfolio maximises the drift of log-wealth:

Theorem (Log-optimal portfolio): If asset returns follow a multi-dimensional geometric Brownian motion with drift vector \(\mu\) and covariance matrix \(\Sigma\), the portfolio that maximises \(\mathbb{E}[\ln W_T]\) (for any \(T\)) invests in risky assets with weights \[ w_{\text{Kelly}} = \Sigma^{-1}(\mu - r_f \mathbf{e}). \] This portfolio is proportional to the mean-variance tangency portfolio, with the risk-free rate as benchmark.

The Kelly portfolio is identical in direction to the tangency portfolio on the efficient frontier — it is the portfolio with the maximum Sharpe ratio. However, it is typically much more leveraged than a risk-averse investor would accept. In practice, fractional Kelly (investing a fraction \(f\) of the Kelly amount, \(w = f\,w_{\text{Kelly}}\), e.g. \(f = 0.5\)) is used to reduce variance at the cost of slightly lower long-run growth. Half-Kelly has half the volatility of full Kelly but achieves about 75% of the long-run growth.


Chapter 17: Stress Testing and Scenario Analysis

Mean-variance optimisation treats risk as variance — a symmetric, backward-looking measure calibrated from historical returns. Stress testing complements this by asking: what is the portfolio’s behaviour under extreme scenarios that may not appear in historical data? This is particularly important because large losses are correlated across assets during crises (the correlation structure of returns during the 2008 financial crisis was dramatically different from the pre-crisis period).

17.1 Historical Scenario Analysis

The simplest stress test applies a set of historical stress events directly to the portfolio:

Definition (Historical scenario P&L): Given a portfolio with weights \(w\) and a stress event characterised by an \(n\)-vector of asset return shocks \(\Delta r\) (drawn from a specific historical episode), the portfolio's stressed profit-and-loss is \[ \Delta P = w^T \Delta r. \]

Standard historical stress scenarios used in practice include: Black Monday (October 1987, equity index fell 22% in one day), the LTCM crisis (August-September 1998, credit and liquidity spreads widened sharply), the dot-com crash (2000-2002), the global financial crisis (September-December 2008), COVID-19 (February-March 2020).

Example (Stress test for a balanced portfolio): Consider a portfolio with 60% equities, 30% fixed income (10-year Treasury), and 10% gold. Applying the September–December 2008 stress scenario: equities fell 40%, Treasuries rose 8% (flight to quality), gold fell 10%. Stressed P&L: \(0.60\times(-0.40) + 0.30\times(0.08) + 0.10\times(-0.10) = -0.24 + 0.024 - 0.01 = -22.6\%\). The Treasury allocation partially offsets equity losses — the classic diversification benefit of a 60/40 portfolio. For comparison, a 90% equity / 10% cash portfolio would have lost \(0.90\times(-0.40) = -36\%\) in the same scenario.
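The scenario P&L is a single inner product, so the whole example reduces to a few lines (shock values taken from the example above):

```python
import numpy as np

w = np.array([0.60, 0.30, 0.10])          # equities, 10-year Treasuries, gold
shock = np.array([-0.40, 0.08, -0.10])    # scenario return shocks
pnl = w @ shock                           # stressed P&L: -22.6%

w_aggr = np.array([0.90, 0.0, 0.0])       # 90% equity / 10% cash (cash shock = 0)
pnl_aggr = w_aggr @ shock                 # -36%
```

Extending this to a library of scenarios is just a matrix-vector product: stack the shock vectors as rows of a matrix and multiply by \(w\).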

17.2 Factor-Based Stress Testing

For large portfolios with hundreds of assets, full-revaluation stress tests are computationally expensive. Factor-based methods project the portfolio onto a small number of risk factors and apply shocks at the factor level:

\[ \Delta P \approx w^T B \Delta f + w^T \Delta\varepsilon, \]

where \(\Delta f \in \mathbb{R}^k\) is the vector of factor shocks and the idiosyncratic term \(w^T \Delta\varepsilon\) is typically treated as zero for diversified portfolios (by the law of large numbers, idiosyncratic shocks average out across many assets). This reduces an \(n\)-dimensional stress test to a \(k\)-dimensional one, enabling real-time scenario generation.
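The dimension reduction is visible in code: the portfolio's factor exposure \(B^T w\) is computed once, after which each scenario costs only a length-\(k\) inner product. The loadings below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 500, 3                              # 500 assets, 3 factors
B = rng.normal(0.0, 1.0, size=(n, k))      # synthetic loading matrix
w = np.full(n, 1.0 / n)                    # equal-weight, well-diversified portfolio

delta_f = np.array([-0.30, 0.05, -0.02])   # shocks applied at the factor level
exposure = B.T @ w                         # k-vector of portfolio factor exposures
dP = exposure @ delta_f                    # stressed P&L, idiosyncratic term dropped
```

Running a thousand factor scenarios against a precomputed 3-vector of exposures is effectively instantaneous, whereas full asset-level revaluation would touch all 500 positions per scenario.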

Remark (Regulatory stress testing — Basel III and DFAST): Banks and large investment managers are subject to mandatory stress testing under Basel III (global) and Dodd-Frank Act Stress Tests (DFAST, US). The Federal Reserve specifies adverse and severely adverse macroeconomic scenarios — including GDP contraction, unemployment spikes, and credit spread widening — and requires institutions to demonstrate that their capital buffers are sufficient to absorb the resulting losses. The quantitative portfolio optimisation methods of this course (risk measures, factor models, VaR/CVaR) are directly embedded in the stress testing frameworks used in regulatory compliance. The 2010 Basel III reforms significantly tightened the capital requirements for trading portfolios after it became clear that pre-crisis VaR models dramatically underestimated tail losses.

Chapter 18: Portfolio Insurance and Dynamic Strategies

Static mean-variance portfolios rebalance periodically to a fixed target allocation. Dynamic strategies alter the portfolio allocation in response to market conditions or wealth levels. Portfolio insurance is the canonical example: a strategy designed to guarantee a minimum portfolio value while participating in upside.

18.1 Constant Proportion Portfolio Insurance (CPPI)

CPPI (Perold and Sharpe, 1988) dynamically adjusts the allocation between a risky asset and a risk-free asset to ensure the portfolio value never falls below a floor \(F\):

Definition (CPPI strategy): At each rebalancing date, define the cushion \(C_t = W_t - F\) (the excess of portfolio value over the floor). The CPPI strategy invests the amount \(m \cdot C_t\) in the risky asset (where \(m > 1\) is the multiplier) and the remainder \(W_t - m\cdot C_t\) in the risk-free asset. The risky asset allocation is: \[ E_t = m(W_t - F) = m \cdot C_t, \quad B_t = W_t - E_t. \]

The CPPI guarantee holds in continuous time (if the risky asset cannot gap below \(F\)), since the risky allocation automatically falls to zero as \(W_t \to F\). In discrete time, a sudden large drop in the risky asset can breach the floor — the so-called gap risk.

Example (CPPI with multiplier \(m=3\) and floor 80%): Start with \(W_0 = 100\), \(F = 80\), \(m = 3\). Initial cushion \(C_0 = 20\); risky allocation \(E_0 = 3\times20 = 60\); bond allocation \(B_0 = 40\).

After month 1, the risky asset rises 10%: \(E_1 = 66\), \(B_1\) grows by the risk-free rate (say 0.3%): \(B_1 \approx 40.12\). Total \(W_1 \approx 106.12\). Rebalance: cushion \(C_1 = 26.12\); \(E_1^\text{new} = 3\times26.12 = 78.36\); \(B_1^\text{new} = 27.76\). The rising cushion leads to buying more risky assets — a momentum-like behaviour.

If instead the risky asset falls 15% in month 1: \(E_1 = 51\), \(W_1 \approx 91.12\). Cushion \(C_1 = 11.12\); \(E_1^\text{new} = 33.36\); \(B_1^\text{new} = 57.76\). The falling cushion triggers de-risking — the strategy automatically reduces risky exposure, protecting the floor.
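Both branches of the example come from one rebalancing rule. A minimal sketch (the cushion floor and the cap \(E \le W\), i.e. no leverage, are safeguards added here for robustness, not part of the definition above):

```python
def cppi_step(W, F, m, risky_ret, rf=0.003):
    """One CPPI period: allocate, apply returns, return end-of-period wealth."""
    C = max(W - F, 0.0)              # cushion (zero once the floor is breached)
    E = min(m * C, W)                # risky allocation, capped at total wealth
    B = W - E                        # remainder in the risk-free asset
    return E * (1 + risky_ret) + B * (1 + rf)

W_up = cppi_step(100.0, F=80.0, m=3, risky_ret=0.10)    # risky asset +10%
W_dn = cppi_step(100.0, F=80.0, m=3, risky_ret=-0.15)   # risky asset -15%
```

Iterating `cppi_step` over a simulated return path reproduces the momentum-like behaviour of the example: the risky allocation grows after gains and shrinks toward zero as wealth approaches the floor.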

18.2 Comparison: Buy-and-Hold, Fixed-Mix, and CPPI

Remark (Pay-off profiles and market regimes): Different dynamic strategies have fundamentally different pay-off profiles in different market regimes.

Buy-and-hold: Allocation drifts with market prices. Pay-off is concave (benefits from trending markets, hurts in volatile/mean-reverting markets). No guarantee on minimum value.

Fixed-mix (constant-mix): Rebalance to a fixed target allocation. Pay-off is also concave in trending markets but excels in mean-reverting markets (systematic buy-low, sell-high). No floor guarantee.

CPPI: Pay-off is convex — it participates in upside (rising risky asset increases cushion and hence allocation) and protects against downside (falling cushion reduces allocation). The cost of convexity: CPPI underperforms fixed-mix in mean-reverting markets because it buys high and sells low. The multiplier \(m\) controls the convexity: higher \(m\) means more participation in upside but higher gap risk in crashes.

Options-based portfolio insurance (OBPI): Hold the risky asset plus a put option struck at the floor. The put option’s premium is the explicit cost of insurance. CPPI approximates OBPI dynamically (CPPI is a delta-hedging replication of a path-dependent option), but requires continuous rebalancing and has gap risk that OBPI avoids.

18.3 Lifecycle Investing and Target-Date Funds

A practical application of dynamic portfolio theory is the lifecycle fund (also called a target-date fund). The dominant insight from lifecycle investing theory (Samuelson 1969, Merton 1971, Ayres-Nalebuff 2013) is that young investors should hold substantially more equity than their current wealth level alone would suggest, because their future labour income constitutes a large risk-free bond. As investors age and the present value of future labour income falls, the optimal risky asset allocation decreases.

Target-date funds operationalise this by specifying a glide path: a schedule of equity allocation declining from ~90% at age 25 to ~40% at retirement. The mathematics: if human capital (PV of future labour income) is modelled as a risk-free asset of value \(HC_t\), the optimal equity fraction of total wealth (financial + human) is constant at the mean-variance optimal level, but the equity fraction of financial wealth is

\[ w_{\text{equity}}^{\text{financial}} = w^* \cdot \frac{W_t + HC_t}{W_t}, \]

which decreases as \(HC_t\) shrinks relative to \(W_t\) with age. This is the logic behind the rule of thumb “hold (100 - age)% in equities.”
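The glide-path formula is a one-liner; the wealth and human-capital numbers below are hypothetical, and the cap at 100% is a no-leverage constraint added for illustration (the raw formula prescribes heavy leverage for the young investor, as Ayres and Nalebuff advocate):

```python
def equity_fraction(W_fin, HC, w_star=0.60):
    """Equity share of financial wealth when human capital HC is bond-like.
    The cap at 1.0 is an illustrative no-leverage constraint."""
    return min(w_star * (W_fin + HC) / W_fin, 1.0)

# Hypothetical lifecycle: financial wealth grows while human capital declines
young = equity_fraction(W_fin=50, HC=950)   # uncapped value would be 12.0
mid = equity_fraction(W_fin=500, HC=250)    # 0.60 * 750/500 = 0.90
old = equity_fraction(W_fin=900, HC=0)      # converges to w* = 0.60
```

The declining sequence 1.0, 0.90, 0.60 is exactly the shape of a target-date fund's glide path.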


Chapter 19: Numerical Methods and Software

Portfolio optimisation in practice requires efficient computation on large asset universes (hundreds to thousands of assets). This chapter surveys the main numerical building blocks and the MATLAB toolbox routines used throughout the course.

19.1 Solving Quadratic Programmes

The mean-variance problem is a quadratic programme (QP) — a special case of a convex optimisation problem. MATLAB’s quadprog function solves:

\[ \min_w \; \frac{1}{2}w^T H w + f^T w \quad \text{s.t.} \quad A_{\text{ub}}w \leq b_{\text{ub}}, \quad A_{\text{eq}}w = b_{\text{eq}}, \quad \ell \leq w \leq u. \]

For the Markowitz problem with target return \(\bar{R}\): set \(H = 2\Sigma\), \(f = 0\), \(A_{\text{eq}} = [\mu^T; e^T]\), \(b_{\text{eq}} = [\bar{R}; 1]\). With non-negativity constraints: \(\ell = 0\), \(u = +\infty\). MATLAB’s solver uses an interior-point or active-set method depending on the problem structure.

Example (MATLAB code for the efficient frontier): The following MATLAB pseudocode computes \(K\) points on the efficient frontier:
% Inputs: mu (n×1 expected returns), Sigma (n×n covariance), K (number of frontier points)
n = length(mu);
R_min = min(mu); R_max = max(mu);
R_grid = linspace(R_min, R_max, K);
w_frontier = zeros(n, K);

for k = 1:K
    Aeq = [mu'; ones(1,n)]; beq = [R_grid(k); 1];
    lb = zeros(n,1);  % no short-selling
    w_frontier(:,k) = quadprog(2*Sigma, zeros(n,1), [], [], Aeq, beq, lb);
end

sig_frontier = sqrt(diag(w_frontier' * Sigma * w_frontier));
plot(sig_frontier, R_grid);

For the unconstrained case (short-selling allowed), the KKT system can be solved directly via [2*Sigma, mu, ones(n,1); mu', 0, 0; ones(1,n), 0, 0] \ [zeros(n,1); R_grid(k); 1] using MATLAB’s backslash (the first \(n\) components of the solution are the optimal weights, the last two are the multipliers), which is faster than calling quadprog.

19.2 Complexity and Scalability

Operation | Complexity | Comment
Cholesky factorisation of \(\Sigma\) | \(O(n^3)\) | Done once per rebalance
Compute \(\Sigma^{-1}\mu\), \(\Sigma^{-1}e\) | \(O(n^2)\) | Given Cholesky
Compute scalars \(A, B, C, D\) | \(O(n)\) | After vectors
Single efficient frontier point | \(O(n^2)\) | Full KKT solve
Factor model \(\Sigma = BFB^T + D\) | \(O(nk^2)\) | Woodbury inversion
quadprog on \(n\)-asset QP | \(O(n^3)\) | Worst case, interior point
quadprog with sparsity | \(O(n^{1.5})\) to \(O(n^2)\) | Sparse Cholesky exploited

For \(n = 500\) assets (a typical institutional equity universe), a single efficient frontier point takes under 1 second in MATLAB on a modern laptop. For \(n = 5000\) (large quantitative fund), factor model methods (\(k = 30\) factors) reduce the Woodbury system to a \(30 \times 30\) solve, enabling real-time frontier computation. The bottleneck shifts from the optimisation itself to data cleaning, factor estimation, and transaction cost modelling.
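The Woodbury reduction is worth seeing concretely. The NumPy sketch below (synthetic loadings, identity factor covariance for simplicity) computes \(\Sigma^{-1}\mu\) for \(\Sigma = BFB^T + D\) by solving only a \(k \times k\) system, never forming the \(n \times n\) inverse:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 500, 30                             # 500 assets, 30 factors
B = rng.normal(size=(n, k))
F = np.eye(k)                              # factor covariance (identity for simplicity)
d = rng.uniform(0.05, 0.20, n)             # idiosyncratic variances (diagonal of D)
Sigma = B @ F @ B.T + np.diag(d)
mu = rng.normal(0.008, 0.02, n)

# Woodbury: Sigma^{-1} x = D^{-1}x - D^{-1}B (F^{-1} + B'D^{-1}B)^{-1} B'D^{-1}x
DinvB = B / d[:, None]                     # D^{-1} B, exploiting diagonal D
M = np.linalg.inv(F) + B.T @ DinvB         # the only k x k system
x = mu / d - DinvB @ np.linalg.solve(M, B.T @ (mu / d))
```

The dominant cost is forming \(B^T D^{-1} B\) at \(O(nk^2)\), matching the table's entry for the factor model row.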

19.3 Convex Optimisation Toolboxes

Beyond quadprog, the discipline of convex optimisation offers powerful modelling languages that accept a wide class of portfolio constraints and objectives:

  • CVX (MATLAB) and CVXPY (Python): Domain-specific languages for convex optimisation. Constraints like L1 transaction costs, CVaR constraints, and cardinality constraints (if relaxed) can be stated naturally. CVX translates the problem to a standard cone programme (LP, QP, SOCP, or SDP) and calls an interior-point solver (SDPT3, SeDuMi, Mosek).
  • OSQP: An open-source QP solver optimised for embedded and real-time applications, particularly efficient for large sparse QPs arising in portfolio problems with many constraints.
  • Gurobi/CPLEX: Commercial mixed-integer QP solvers for portfolio problems with discrete constraints (minimum lot sizes, cardinality constraints, integer weights for bond portfolios).

The correct choice of solver depends on the problem structure: quadprog for simple unconstrained or lightly constrained frontier problems; CVX/CVXPY for rapid prototyping with complex constraints; Gurobi/CPLEX for production-grade optimisation with integer constraints and tight performance requirements.


Chapter 20: Duality and the Fund Separation Theorems

The KKT duality theory developed in Chapter 4 has a beautiful geometric interpretation in portfolio theory: the dual variables (Lagrange multipliers) correspond to the shadow prices of the constraints, and duality gives rise to the classical fund separation theorems.

20.1 Strong Duality for the Markowitz QP

Recall the primal problem (Formulation I):

\[ p^* = \min_{w} \; w^T\Sigma w \quad \text{s.t.} \quad \mu^T w = \bar{R},\; e^T w = 1. \]

The Lagrangian is \(\mathcal{L}(w,\gamma,\delta) = w^T\Sigma w - \gamma(\mu^T w - \bar{R}) - \delta(e^T w - 1)\). Since the objective is strictly convex (when \(\Sigma \succ 0\)) and the constraints are linear, strong duality holds: the dual optimal value \(d^* = p^*\). The dual function is

\[ g(\gamma,\delta) = \min_w \mathcal{L}(w,\gamma,\delta) = -\frac{1}{4}(\gamma\mu+\delta e)^T\Sigma^{-1}(\gamma\mu+\delta e) + \gamma\bar{R} + \delta, \]

obtained by substituting the optimal \(w(\gamma,\delta) = \frac{1}{2}\Sigma^{-1}(\gamma\mu+\delta e)\). Maximising \(g\) over \((\gamma,\delta)\) gives the dual problem, whose solution directly yields the optimal primal \(w^*\) and the shadow prices \(\gamma^*, \delta^*\).

20.2 Interpretation of Dual Variables

The dual variable \(\gamma^*\) is the shadow price of the expected return constraint: it measures how much the minimum variance decreases per unit increase in \(\bar{R}\). Geometrically, \(\gamma^*\) is the slope of the efficient frontier in \((\sigma_p^2, \mu_p)\) space at the point \((\sigma_p^{*2}, \bar{R})\). This has the financial interpretation: \(\gamma^*\) is the marginal cost of seeking higher expected return, measured in additional variance. When \(\gamma^*\) is large, pushing the return target higher is expensive in variance terms; when small, higher expected return comes cheaply.

Remark (Duality and the Capital Market Line): When a risk-free asset is introduced, the tangency portfolio \(w_T\) solves the problem of maximising the Sharpe ratio: \[ \max_w \; \frac{\mu^T w - r_f}{\sqrt{w^T\Sigma w}} \quad \text{s.t.} \quad e^T w = 1. \] Substituting \(w = y / (e^T y)\) with unconstrained \(y\), this reduces to minimising \(y^T\Sigma y\) subject to \((\mu - r_f e)^T y = 1\). The KKT condition gives \(w_T \propto \Sigma^{-1}(\mu - r_f e)\). The Capital Market Line is the efficient frontier when the risk-free asset is available — a straight line in \((\sigma_p, \mu_p)\) space from \((0, r_f)\) tangent to the risky asset frontier at \(w_T\). Every efficient portfolio with a risk-free asset is a combination of \(w_T\) and the risk-free asset — this is the one-fund theorem for the case with a risk-free asset, which is sharper than the two-fund theorem for risky assets only.

20.3 Generalised Separation and Mutual Fund Theorem

The multi-period and multi-investor version of fund separation (due to Ross 1978 and others) states: under suitable distributional assumptions (elliptical returns, or CARA utility), every investor’s optimal portfolio is a linear combination of the same \(K\) “mutual funds.” The one-fund theorem (\(K=1\), the market portfolio in CAPM) and two-fund theorem (\(K=2\), GMVP and tangency portfolio) are special cases. The existence of such separation results is closely tied to the algebraic structure of the efficient frontier: the fact that the frontier is a convex set and that the optimal weight vector is a linear function of the target return. This linearity — a direct consequence of the KKT stationarity condition being a linear system — is what makes mutual fund separation theorems possible and distinguishes the Markowitz framework from more general optimisation settings where the optimal weight is a nonlinear function of the target.


Appendix: Quick Reference

A.1 Efficient Frontier Scalars

For \(n\) assets with \(\Sigma \succ 0\), \(\mu\) not proportional to \(e\):

Symbol | Formula | Meaning
\(A\) | \(\mu^T\Sigma^{-1}e\) | Cross term
\(B\) | \(\mu^T\Sigma^{-1}\mu\) | Return precision
\(C\) | \(e^T\Sigma^{-1}e\) | Sum of precision
\(D\) | \(BC - A^2 > 0\) | Discriminant
\(\sigma^2_{\min}\) | \(1/C\) | GMVP variance
\(\mu_{\min}\) | \(A/C\) | GMVP return

Frontier parabola: \(\sigma_p^2 = \frac{C\mu_p^2 - 2A\mu_p + B}{D}\). GMVP weights: \(w_{\text{gmv}} = \Sigma^{-1}e / C\).
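The scalar identities are easy to sanity-check numerically on a random positive definite \(\Sigma\): the GMVP weights sum to 1, achieve variance \(1/C\), and have expected return \(A/C\).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10
G = rng.normal(size=(n, n))
Sigma = G @ G.T + n * np.eye(n)    # random positive definite covariance
mu = rng.uniform(0.02, 0.12, n)
e = np.ones(n)

Si_mu = np.linalg.solve(Sigma, mu)     # Sigma^{-1} mu
Si_e = np.linalg.solve(Sigma, e)       # Sigma^{-1} e
A = mu @ Si_e
B = mu @ Si_mu
C = e @ Si_e
D = B * C - A * A                  # positive when mu is not proportional to e

w_gmv = Si_e / C                   # GMVP weights
```

Checking \(w_{\text{gmv}}^T \Sigma w_{\text{gmv}} = 1/C\) is a one-line algebraic identity: \((\Sigma^{-1}e/C)^T \Sigma (\Sigma^{-1}e/C) = e^T\Sigma^{-1}e / C^2 = 1/C\).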

A.2 Key Risk Measures

Measure | Formula (Gaussian, \(\mu_p, \sigma_p\)) | Subadditive?
VaR\(_\alpha\) | \(-\mu_p + z_\alpha \sigma_p\) | No
CVaR\(_\alpha\) | \(-\mu_p + \sigma_p \phi(z_\alpha)/(1-\alpha)\) | Yes
Volatility | \(\sigma_p\) | Yes
Max Drawdown | (path-dependent, no closed form) | No

\(z_\alpha = \Phi^{-1}(\alpha)\) is the standard normal quantile; \(\phi\) is the standard normal PDF.

A.3 CAPM Summary

Single-factor: \(\mathbb{E}[r_i] - r_f = \beta_i(\mathbb{E}[r_m] - r_f)\), where \(\beta_i = \mathrm{Cov}(r_i, r_m)/\sigma_m^2\). Security market line: all assets plot on the line \(\mu = r_f + \beta(\mu_m - r_f)\) in \((\beta, \mu)\) space. Jensen’s alpha: \(\alpha_i = \mathbb{E}[r_i] - r_f - \beta_i(\mathbb{E}[r_m]-r_f)\); positive alpha indicates the asset lies above the security market line (underpriced relative to CAPM).

A.4 Black–Litterman Summary

Prior: \(\mu \sim \mathcal{N}(\Pi, \tau\Sigma)\) with \(\Pi = \delta\Sigma w_m\). Views: \(P\mu = q + \varepsilon\), \(\varepsilon \sim \mathcal{N}(0,\Omega)\). Posterior mean: \(\hat\mu_{\text{BL}} = [(\tau\Sigma)^{-1} + P^T\Omega^{-1}P]^{-1}[(\tau\Sigma)^{-1}\Pi + P^T\Omega^{-1}q]\). Plug \(\hat\mu_{\text{BL}}\) into standard Markowitz to obtain the BL-optimal portfolio.
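The posterior formula is a direct translation into linear algebra. A minimal two-asset sketch with one relative view (all numbers hypothetical):

```python
import numpy as np

Sigma = np.array([[0.04, 0.01], [0.01, 0.02]])    # toy 2-asset covariance
w_m = np.array([0.6, 0.4])                         # market-cap weights
delta, tau = 2.5, 0.05
Pi = delta * Sigma @ w_m                           # implied equilibrium returns

P = np.array([[1.0, -1.0]])                        # one view: asset 1 beats asset 2
q = np.array([0.02])                               # ... by 2% (prior spread is 3.5%)
Omega = np.array([[0.001]])                        # view uncertainty

ts_inv = np.linalg.inv(tau * Sigma)
Oinv = np.linalg.inv(Omega)
mu_bl = np.linalg.solve(ts_inv + P.T @ Oinv @ P,   # posterior precision
                        ts_inv @ Pi + P.T @ Oinv @ q)
```

The posterior spread \(\mu_{\text{BL},1} - \mu_{\text{BL},2}\) lands strictly between the equilibrium spread (3.5%) and the view (2%), weighted by the respective precisions; feeding `mu_bl` into the standard Markowitz machinery gives the BL-optimal portfolio.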
