CO 372: Portfolio Optimization Models
Stephen Vavasis
Estimated study time: 1 hr 39 min
Table of contents
Sources and References
Supplementary texts — H.M. Markowitz, Portfolio Selection: Efficient Diversification of Investments (2nd ed., Blackwell, 1991); J. Nocedal & S.J. Wright, Numerical Optimization (2nd ed., Springer, 2006); G.H. Golub & C.F. Van Loan, Matrix Computations (4th ed., Johns Hopkins, 2013); R. Koenker, Quantile Regression (Cambridge, 2005)
Online resources — MIT OCW 15.450 Analytics of Finance; Stanford EE364A Convex Optimization (Boyd); MATLAB Optimization Toolbox documentation
Chapter 1: Linear Algebra Foundations
Portfolio optimization is at its core a problem in mathematical optimization, and the computational machinery underlying every efficient algorithm we will study is drawn from numerical linear algebra. Before turning to finance, this chapter consolidates the key matrix-theoretic facts that appear repeatedly throughout the course: QR factorization, eigenvalue decompositions of symmetric matrices, positive (semi)definiteness, and Cholesky factorization. A student who is comfortable with these tools will find that the financial ideas in subsequent chapters translate cleanly into tractable computations.
1.1 Vectors, Matrices, and Inner Products
We work over \(\mathbb{R}\). Vectors are column vectors; \(e\) denotes the all-ones vector whose dimension is inferred from context. The standard inner product on \(\mathbb{R}^n\) is \(\langle u, v \rangle = u^T v\), and the induced norm is \(\|u\| = \sqrt{u^T u}\). A set of vectors \(\{q_1, \ldots, q_k\}\) is orthonormal if \(q_i^T q_j = \delta_{ij}\) for all \(i, j\).
1.2 QR Factorization
The QR factorization is the workhorse of numerically stable linear algebra. Given a matrix \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\), the (thin) QR factorization writes
\[ A = QR, \]where \(Q \in \mathbb{R}^{m \times n}\) has orthonormal columns (\(Q^T Q = I_n\)) and \(R \in \mathbb{R}^{n \times n}\) is upper triangular with positive diagonal entries (when \(A\) has full column rank). The factorization exists and is unique under those conditions, and it can be computed by the Gram–Schmidt process or, more stably, by Householder reflections or Givens rotations.
The principal application in this course is least-squares. Given \(A \in \mathbb{R}^{m \times n}\) and \(b \in \mathbb{R}^m\), the least-squares problem
\[ \min_{x \in \mathbb{R}^n} \|Ax - b\|^2 \]has solution \(x^* = (A^T A)^{-1} A^T b\) in theory, but forming \(A^T A\) squares the condition number of \(A\) and is numerically inadvisable. Via QR we write \(A = QR\) and observe that
\[ \|Ax - b\|^2 = \|QRx - b\|^2 = \|Rx - Q^T b\|^2 + \|b - QQ^T b\|^2, \]obtained by splitting \(b\) into its component \(QQ^Tb\) in the range of \(Q\) and the orthogonal remainder \((I - QQ^T)b\); the cross terms vanish because \(Q^T(I - QQ^T) = 0\). The second term does not depend on \(x\), so the minimum is attained by solving the upper-triangular system \(Rx = Q^T b\) by back-substitution — an \(O(n^2)\) operation once the factorization is in hand. This avoids squaring the condition number and is the recommended approach whenever \(A\) is ill-conditioned.
Worked example (Gram–Schmidt). Take \(a_1 = (1,1,0)^T\) and \(a_2 = (1,0,1)^T\). Then \(q_1 = a_1/\|a_1\| = \frac{1}{\sqrt{2}}(1,1,0)^T\), and the component of \(a_2\) orthogonal to \(q_1\) is \(a_2' = a_2 - (q_1^Ta_2)q_1 = (1/2,-1/2,1)^T\). The norm is \(\|a_2'\| = \sqrt{1/4 + 1/4 + 1} = \sqrt{3/2}\), so
\[ q_2 = \frac{a_2'}{\|a_2'\|} = \sqrt{\frac{2}{3}}\begin{pmatrix}1/2\\-1/2\\1\end{pmatrix} = \frac{1}{\sqrt{6}}\begin{pmatrix}1\\-1\\2\end{pmatrix}. \]Verification. \(q_1^T q_2 = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{6}}(1\cdot1 + 1\cdot(-1) + 0\cdot2) = 0\). Good.
The \(R\) matrix collects the inner products: \(R_{11} = \|a_1\| = \sqrt{2}\), \(R_{12} = q_1^T a_2 = 1/\sqrt{2}\), \(R_{22} = \|a_2'\| = \sqrt{3/2}\). So
\[ Q = \begin{pmatrix}1/\sqrt{2}&1/\sqrt{6}\\1/\sqrt{2}&-1/\sqrt{6}\\0&2/\sqrt{6}\end{pmatrix}, \qquad R = \begin{pmatrix}\sqrt{2}&1/\sqrt{2}\\0&\sqrt{3/2}\end{pmatrix}, \]and one can verify \(A = QR\) and \(Q^TQ = I_2\).
Classical Gram–Schmidt is numerically fragile: in floating-point arithmetic the computed columns gradually lose orthogonality, with error that grows with the condition number \(\kappa(A)\). Householder reflections avoid this problem entirely. Each Householder step applies an orthogonal transformation \(H_k = I - 2v_kv_k^T\) (where \(v_k\) is a unit vector chosen to zero out all elements below the diagonal in column \(k\)) to introduce zeros directly. Because each \(H_k\) is orthogonal and the product \(Q = H_1 H_2 \cdots H_n\) is computed implicitly, the orthogonality error in the final \(Q\) is of order \(\varepsilon_{\text{mach}}\) regardless of \(\kappa(A)\). This makes Householder QR the algorithm of choice in MATLAB’s built-in \texttt{qr} function and in any production-grade numerical library. The lesson for portfolio computation: always use Householder (or Givens) QR, never classical Gram–Schmidt, when solving the KKT null-space system.
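To make the mechanics concrete, here is a minimal pure-Python sketch of Householder QR and QR-based least squares. This is illustrative only — the test matrix and function names are our own, and production code would call LAPACK or MATLAB's \texttt{qr} rather than hand-rolled loops:

```python
import math

def householder_qr(A):
    """Thin QR of an m x n matrix (m >= n, full column rank) via Householder
    reflections.  Returns (Q, R): Q is m x n with orthonormal columns, R is
    n x n upper triangular.  Diagonal signs of R may differ from the textbook
    convention, which insists on positive diagonal entries."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Qf = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates H_1 H_2 ... H_k
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]
        normx = math.sqrt(sum(t * t for t in x))
        if normx == 0.0:
            continue
        v = x[:]
        v[0] += math.copysign(normx, x[0])       # reflect x onto a multiple of e_1
        nv = math.sqrt(sum(t * t for t in v))
        v = [t / nv for t in v]
        for j in range(k, n):                    # R <- H_k R
            s = sum(v[i - k] * R[i][j] for i in range(k, m))
            for i in range(k, m):
                R[i][j] -= 2.0 * s * v[i - k]
        for r in range(m):                       # Qf <- Qf H_k
            s = sum(v[i - k] * Qf[r][i] for i in range(k, m))
            for i in range(k, m):
                Qf[r][i] -= 2.0 * s * v[i - k]
    return [row[:n] for row in Qf], [R[i][:n] for i in range(n)]

def lstsq_qr(A, b):
    """Solve min ||Ax - b|| via QR and back-substitution (never forms A^T A)."""
    Q, R = householder_qr(A)
    n = len(R)
    qtb = [sum(Q[i][k] * b[i] for i in range(len(b))) for k in range(n)]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (qtb[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

x = lstsq_qr([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [1.0, 2.0, 3.0])
```

Note the design point made above: \(Q\) is built as a product of exactly orthogonal reflectors, so its columns stay orthonormal to machine precision even for ill-conditioned \(A\).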
1.3 Symmetric Matrices and the Eigenvalue Decomposition
A matrix \(A \in \mathbb{R}^{n \times n}\) is symmetric if \(A = A^T\). The spectral theorem for real symmetric matrices states that every such \(A\) admits an eigenvalue decomposition
\[ A = Q \Lambda Q^T, \]where \(Q \in \mathbb{R}^{n \times n}\) is orthogonal (\(Q^T Q = QQ^T = I\)) and \(\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)\) collects the real eigenvalues. The columns of \(Q\) are the corresponding orthonormal eigenvectors. Covariance matrices in portfolio theory are always symmetric; their eigenvalue structure encodes the directions of maximum and minimum variance in return space.
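As a small illustration of the spectral theorem, the following pure-Python sketch (our own, using the closed form for the \(2\times 2\) case) computes an eigenvalue decomposition of a symmetric matrix and reconstructs it as \(Q\Lambda Q^T\):

```python
import math

def symm_eig_2x2(a, b, c):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, c]].
    Returns (lam1, lam2, Q) with lam1 >= lam2; the columns of Q are the
    corresponding unit eigenvectors."""
    tr = a + c
    disc = math.sqrt((a - c) ** 2 + 4.0 * b * b)
    lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0
    if b == 0.0:
        Q = [[1.0, 0.0], [0.0, 1.0]] if a >= c else [[0.0, 1.0], [1.0, 0.0]]
    else:
        # (b, lam1 - a) solves (A - lam1 I) v = 0; normalise it.
        n1 = math.hypot(b, lam1 - a)
        q1 = (b / n1, (lam1 - a) / n1)
        Q = [[q1[0], -q1[1]], [q1[1], q1[0]]]   # second column orthogonal to first
    return lam1, lam2, Q

# Example: a 2x2 covariance-like matrix [[2, 1], [1, 2]] has eigenvalues 3 and 1.
lam1, lam2, Q = symm_eig_2x2(2.0, 1.0, 2.0)
```

For \(n > 2\) one would use an iterative eigensolver (MATLAB's \texttt{eig}); the closed form here is just to keep the example self-contained.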
1.4 Positive Semidefiniteness and Convexity
A symmetric matrix \(A\) is positive semidefinite (PSD, written \(A \succeq 0\)) if \(x^T A x \geq 0\) for all \(x \in \mathbb{R}^n\), and positive definite (PD, \(A \succ 0\)) if \(x^T A x > 0\) for all \(x \neq 0\). In terms of the spectral decomposition, \(A \succeq 0\) if and only if all eigenvalues \(\lambda_i \geq 0\), and \(A \succ 0\) iff all \(\lambda_i > 0\). The function \(f(x) = x^T A x\) is convex if and only if \(A \succeq 0\) (the Hessian of \(f\) is \(2A\)), and strictly convex if \(A \succ 0\). A sample covariance matrix \(\hat{\Sigma} = \frac{1}{T-1} \sum_{t=1}^T (r_t - \bar{r})(r_t - \bar{r})^T\) is always PSD, and it is PD exactly when the centred return vectors \(r_t - \bar{r}\) span \(\mathbb{R}^n\) — which requires the number of observations \(T\) to exceed the number of assets \(n\).
The significance of positive definiteness for portfolio optimization cannot be overstated. When the covariance matrix \(\Sigma \succ 0\), the portfolio variance \(w^T \Sigma w\) is a strictly convex function of the weights \(w\), which guarantees that every minimum-variance portfolio problem has a unique solution.
1.5 Cholesky Factorization
Every symmetric positive definite matrix \(A\) admits a Cholesky factorization \(A = LL^T\), where \(L\) is lower triangular with positive diagonal entries, and this factor is unique. Computing it is roughly twice as fast as general LU decomposition and numerically very stable for positive definite matrices. The factorization arises in computing the efficient frontier: to generate points on the frontier we repeatedly solve linear systems involving \(\Sigma\), and Cholesky-based solvers are the standard approach. It also appears in simulation — to generate correlated return vectors with covariance \(\Sigma = LL^T\), one draws \(z \sim \mathcal{N}(0, I)\) and returns \(r = L z\), which satisfies \(\mathrm{Cov}(r) = L L^T = \Sigma\).
Worked example. Let \(\Sigma = \begin{pmatrix}4&2\\2&3\end{pmatrix}\) and seek \(L = \begin{pmatrix}\ell_{11}&0\\\ell_{21}&\ell_{22}\end{pmatrix}\). Expanding \(LL^T\):
\[ LL^T = \begin{pmatrix}\ell_{11}^2 & \ell_{11}\ell_{21} \\ \ell_{11}\ell_{21} & \ell_{21}^2+\ell_{22}^2\end{pmatrix}. \]Matching entries: \begin{align*} \ell_{11}^2 &= 4 \implies \ell_{11} = 2,\\ \ell_{11}\ell_{21} &= 2 \implies \ell_{21} = 1,\\ \ell_{21}^2 + \ell_{22}^2 &= 3 \implies \ell_{22}^2 = 2 \implies \ell_{22} = \sqrt{2}. \end{align*}
Therefore
\[ L = \begin{pmatrix}2&0\\1&\sqrt{2}\end{pmatrix}. \]\[ LL^T = \begin{pmatrix}2&0\\1&\sqrt{2}\end{pmatrix}\begin{pmatrix}2&1\\0&\sqrt{2}\end{pmatrix} = \begin{pmatrix}4&2\\2&1+2\end{pmatrix} = \begin{pmatrix}4&2\\2&3\end{pmatrix} = \Sigma. \checkmark \]To simulate a correlated return vector with this covariance, one draws \(z = (z_1, z_2)^T \sim \mathcal{N}(0,I)\) and forms \(r = Lz = (2z_1,\; z_1 + \sqrt{2}z_2)^T\). Then \(\mathrm{Cov}(r) = L\mathrm{Cov}(z)L^T = LL^T = \Sigma\), as required.
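The general algorithm behind this hand computation fits in a few lines of pure Python — an illustrative sketch, not the library routine (MATLAB's \texttt{chol}) one would use in practice:

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive definite A.
    Raises ValueError if a non-positive pivot appears (i.e. A is not PD)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: what remains of A[j][j] after earlier columns.
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

L = cholesky([[4.0, 2.0], [2.0, 3.0]])
```

Running it on the matrix above reproduces \(L = \begin{pmatrix}2&0\\1&\sqrt{2}\end{pmatrix}\); feeding it an indefinite matrix raises the error, which is why attempting a Cholesky factorization is also the cheapest practical test of positive definiteness.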
Not every symmetric matrix is PSD. Consider \(\Sigma = \begin{pmatrix}1&2\\2&1\end{pmatrix}\), whose eigenvalues are \(3\) and \(-1\). One can also witness the failure directly: take \(x = (1,-1)^T\). Then
\[ x^T \Sigma x = \begin{pmatrix}1&-1\end{pmatrix}\begin{pmatrix}1&2\\2&1\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}1&-1\end{pmatrix}\begin{pmatrix}-1\\1\end{pmatrix} = -2 < 0. \]What this means for portfolio optimization. A covariance matrix that is not PSD cannot be the covariance matrix of any real random vector (since variances are non-negative). If an estimated \(\Sigma\) has negative eigenvalues — a common occurrence when the number of assets \(n\) exceeds the number of observations \(T\), or due to numerical errors — then the portfolio variance \(w^T\Sigma w\) can be negative for some weight vectors \(w\), which is physically meaningless. More concretely, the minimum-variance problem \(\min_{e^Tw=1} w^T\Sigma w\) is no longer bounded below: the objective tends to \(-\infty\) along \(w + t\,x\) as \(t \to \infty\), where \(x = (1,-1)^T\) is an eigenvector for the negative eigenvalue, and feasibility is preserved because \(e^Tx = 0\). So no minimum variance portfolio exists. The remedy is either to use a sample size \(T > n\), to shrink the sample covariance toward a positive definite target (Ledoit–Wolf), or to project \(\hat\Sigma\) onto the positive semidefinite cone by zeroing out negative eigenvalues.
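The eigenvalue-clipping remedy can be spelled out for this \(2\times2\) example. The sketch below is our own pure-Python illustration, with the eigenpairs of \(\begin{pmatrix}1&2\\2&1\end{pmatrix}\) hard-coded (eigenvalues \(3\) and \(-1\), unit eigenvectors \((1,1)/\sqrt{2}\) and \((1,-1)/\sqrt{2}\)):

```python
import math

# Eigen-decomposition of the indefinite matrix [[1, 2], [2, 1]].
lam = [3.0, -1.0]
q = [(1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)),    # eigenvector for lam[0]
     (1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0))]   # eigenvector for lam[1]

# Project onto the PSD cone: zero out the negative eigenvalues, then
# reconstruct  P = sum_k max(lam_k, 0) * q_k q_k^T.
clipped = [max(l, 0.0) for l in lam]
P = [[sum(clipped[k] * q[k][i] * q[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# The direction that witnessed indefiniteness is now harmless.
x = (1.0, -1.0)
quad = sum(P[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
```

The projected matrix is the nearest PSD matrix in Frobenius norm, a standard fact about this clipping construction.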
Chapter 2: The Portfolio Optimization Problem
With the linear algebra tools in place, we now turn to the financial problem that motivates the entire course. The fundamental question is deceptively simple: given a collection of risky financial assets, how should an investor allocate their wealth to achieve the best possible trade-off between expected return and risk? Harry Markowitz’s 1952 paper provided the first rigorous mathematical answer, founding modern quantitative finance and earning him the 1990 Nobel Prize in Economics.
2.1 Return, Expected Return, and Covariance
Consider \(n\) risky assets. Over a single period, the (random) return of asset \(i\) is \(r_i\), representing the fractional gain or loss. The return vector is \(r = (r_1, \ldots, r_n)^T \in \mathbb{R}^n\). We assume:
- Expected returns: \(\mu = \mathbb{E}[r] \in \mathbb{R}^n\), with \(\mu_i\) the expected return of asset \(i\).
- Covariance matrix: \(\Sigma = \mathbb{E}\left[(r - \mu)(r - \mu)^T\right] \in \mathbb{R}^{n \times n}\), which is symmetric and PSD by construction. The diagonal entry \(\Sigma_{ii} = \sigma_i^2\) is the variance of asset \(i\); the off-diagonal \(\Sigma_{ij} = \sigma_{ij}\) is the covariance between assets \(i\) and \(j\).
A portfolio is a vector of weights \(w = (w_1, \ldots, w_n)^T \in \mathbb{R}^n\) with \(e^T w = 1\), where \(e\) is the all-ones vector. The weight \(w_i\) represents the fraction of wealth invested in asset \(i\). Negative weights correspond to short positions. The portfolio’s return is the scalar \(r_p = w^T r\), so:
\[ \mathbb{E}[r_p] = w^T \mu, \qquad \mathrm{Var}(r_p) = w^T \Sigma w. \]This is the mean-variance framework. The investor is characterised by two quantities: the expected return \(\mu_p = w^T \mu\) they wish to achieve, and the variance \(\sigma_p^2 = w^T \Sigma w\) they wish to minimise as a proxy for risk.
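In code the two portfolio statistics are one line each. The numbers below are a made-up two-asset illustration, not data from the text:

```python
# Toy two-asset universe (made-up numbers for this sketch).
mu = [0.10, 0.15]                        # expected returns
Sigma = [[0.04, 0.018], [0.018, 0.09]]   # covariance matrix
w = [0.5, 0.5]                           # equally weighted portfolio

mean = sum(w[i] * mu[i] for i in range(2))                                 # w^T mu
var = sum(w[i] * Sigma[i][j] * w[j] for i in range(2) for j in range(2))   # w^T Sigma w
```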
2.2 Three Equivalent Formulations
The mean-variance portfolio problem admits three standard formulations that are intimately related and each useful for different purposes.
Formulation I — Minimum Variance for a Target Return:
\[ \min_{w \in \mathbb{R}^n} \; w^T \Sigma w \quad \text{subject to} \quad \mu^T w = \bar{R}, \quad e^T w = 1. \]Here \(\bar{R}\) is the investor’s required expected return. This is a convex quadratic programme with two linear equality constraints. For each value of \(\bar{R}\), the solution traces out the efficient frontier.
Formulation II — Maximum Return for a Target Risk:
\[ \max_{w \in \mathbb{R}^n} \; \mu^T w \quad \text{subject to} \quad w^T \Sigma w \leq \sigma^2, \quad e^T w = 1. \]Here \(\sigma^2\) is the investor’s risk tolerance. This formulation swaps the roles of objective and constraint relative to Formulation I; under mild regularity conditions the two trace the same curve.
Formulation III — Risk-Adjusted Return Maximisation:
\[ \max_{w \in \mathbb{R}^n} \; \mu^T w - \frac{\lambda}{2} w^T \Sigma w \quad \text{subject to} \quad e^T w = 1. \]The parameter \(\lambda > 0\) is the risk-aversion coefficient: large \(\lambda\) penalises variance heavily, producing conservative portfolios; small \(\lambda\) produces aggressive ones. This formulation is especially convenient analytically because it has no inequality constraints, and as \(\lambda\) varies from \(0\) to \(\infty\) the solutions again sweep the entire efficient frontier. Writing \(f(w) = \mu^T w - \frac{\lambda}{2} w^T \Sigma w\), we note that \(-f\) is a convex quadratic, so this is equivalent to an equality-constrained convex QP.
The three formulations are equivalent in the sense that any efficient portfolio achievable by one is achievable by the others for appropriate parameter values. In practice, Formulation III is most amenable to closed-form analysis, while Formulations I and II are natural for scenarios where the investor specifies concrete targets.
2.3 KKT Conditions and Solution via QR
To illustrate how the linear algebra of Chapter 1 enters, consider Formulation I in detail. Form the Lagrangian:
\[ \mathcal{L}(w, \gamma, \delta) = w^T \Sigma w - \gamma(\mu^T w - \bar{R}) - \delta(e^T w - 1). \]The KKT stationarity condition \(\nabla_w \mathcal{L} = 0\) gives:
\[ 2\Sigma w = \gamma \mu + \delta e. \]Combining with the two equality constraints, the optimal \((w^*, \gamma^*, \delta^*)\) satisfies the bordered linear system:
\[ \begin{pmatrix} 2\Sigma & -\mu & -e \\ -\mu^T & 0 & 0 \\ -e^T & 0 & 0 \end{pmatrix} \begin{pmatrix} w \\ \gamma \\ \delta \end{pmatrix} = \begin{pmatrix} 0 \\ -\bar{R} \\ -1 \end{pmatrix}. \]When \(\Sigma \succ 0\), this \((n+2) \times (n+2)\) system is nonsingular and can be solved directly. In MATLAB, the backslash operator uses LU with partial pivoting; for large \(n\) one can exploit the Cholesky factor of \(\Sigma\) to reduce to a \(2 \times 2\) Schur-complement system. In either case, QR factorization of the constraint matrix \(A = [\mu \; e]^T\) appears naturally in the null-space method described in Chapter 5.
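A sketch of the direct solve, on a made-up two-asset instance, using a small hand-written Gaussian elimination (in MATLAB this would be a single backslash):

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # partial pivoting
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Toy instance: minimise w^T Sigma w  s.t.  mu^T w = 0.12,  e^T w = 1.
Sigma = [[0.04, 0.018], [0.018, 0.09]]
mu, Rbar = [0.10, 0.15], 0.12

# Bordered KKT system [[2*Sigma, -mu, -e], [-mu^T, 0, 0], [-e^T, 0, 0]].
K = [[2 * Sigma[0][0], 2 * Sigma[0][1], -mu[0], -1.0],
     [2 * Sigma[1][0], 2 * Sigma[1][1], -mu[1], -1.0],
     [-mu[0], -mu[1], 0.0, 0.0],
     [-1.0, -1.0, 0.0, 0.0]]
sol = solve(K, [0.0, 0.0, -Rbar, -1.0])
w = sol[:2]                              # optimal weights; sol[2:] are multipliers
```

With only two assets the two equality constraints already pin down \(w\); the quadratic term starts to matter for \(n \geq 3\).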
2.4 Worked Numerical Example: Three-Asset Universe
The following example fixes a concrete three-asset universe with expected return vector \(\mu = (0.10, 0.12, 0.07)^T\) and works through the scalars \(A, B, C, D\), the GMVP weights, and the minimum variance frontier parabola.
Step 1: invert the covariance matrix. The inverse works out to
\[ \Sigma^{-1} \approx \begin{pmatrix}29.63&-5.56&-3.70\\-5.56&14.81&-6.17\\-3.70&-6.17&21.60\end{pmatrix}. \](The reader may verify \(\Sigma\Sigma^{-1} = I\) numerically.)
Step 2: the frontier scalars.
\[ A = \mu^T\Sigma^{-1}e, \qquad B = \mu^T\Sigma^{-1}\mu, \qquad C = e^T\Sigma^{-1}e, \qquad D = BC - A^2. \]Computing row by row: \(\Sigma^{-1}\mu \approx (29.63\times0.10 - 5.56\times0.12 - 3.70\times0.07,\; \ldots)\). Carrying through:
\[ \Sigma^{-1}\mu \approx \begin{pmatrix}2.037\\0.789\\0.402\end{pmatrix}, \qquad \Sigma^{-1}e \approx \begin{pmatrix}20.37\\3.08\\11.73\end{pmatrix}. \]Therefore \begin{align*} A &= \mu^T\Sigma^{-1}e = e^T\Sigma^{-1}\mu = 2.037+0.789+0.402 = 3.228,\\ B &= \mu^T\Sigma^{-1}\mu = 0.10\times2.037+0.12\times0.789+0.07\times0.402 = 0.327,\\ C &= e^T\Sigma^{-1}e = 20.37+3.08+11.73 = 35.18,\\ D &= BC - A^2 = 0.327\times35.18 - 3.228^2 = 11.50 - 10.42 = 1.08. \end{align*}
Note: when \(\Sigma \succ 0\), the Cauchy–Schwarz inequality guarantees \(D \geq 0\), with equality exactly when \(\mu\) and \(e\) are linearly dependent — impossible here since \(\mu\) has distinct entries. A negative computed \(D\) can therefore only signal an arithmetic or rounding error. The important point is the computational procedure.
Step 3: GMVP weights.
\[ w_{\mathrm{gmv}} = \frac{\Sigma^{-1}e}{e^T\Sigma^{-1}e} = \frac{\Sigma^{-1}e}{C}. \]Normalising the vector \(\Sigma^{-1}e \approx (20.37, 3.08, 11.73)^T\) by \(C \approx 35.18\):
\[ w_{\mathrm{gmv}} \approx \begin{pmatrix}0.579\\0.088\\0.333\end{pmatrix}. \]These weights sum to 1 (as required by the budget constraint) and are all positive, so no short-selling is needed for this particular universe to attain the minimum variance.
Step 4: GMVP return and variance. \begin{align*} \mu_{\mathrm{gmv}} &= w_{\mathrm{gmv}}^T\mu \approx 0.579\times0.10 + 0.088\times0.12 + 0.333\times0.07 \approx 0.0918,\\ \sigma_{\mathrm{gmv}}^2 &= w_{\mathrm{gmv}}^T\Sigma w_{\mathrm{gmv}} = \frac{1}{C} \approx \frac{1}{35.18} \approx 0.02842. \end{align*} The GMVP has an expected return of about 9.2% and an annualised standard deviation of \(\sqrt{0.02842}\approx 16.9\%\).
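The whole computation is mechanical enough to script. The sketch below redoes the scalar and GMVP steps from the rounded \(\Sigma^{-1}\) entries used in this example (rounding makes the resulting scalars approximate):

```python
# Rounded inverse covariance and expected returns from the worked example.
Sinv = [[29.63, -5.56, -3.70],
        [-5.56, 14.81, -6.17],
        [-3.70, -6.17, 21.60]]
mu = [0.10, 0.12, 0.07]

Sinv_mu = [sum(Sinv[i][j] * mu[j] for j in range(3)) for i in range(3)]
Sinv_e = [sum(Sinv[i][j] for j in range(3)) for i in range(3)]

A = sum(Sinv_mu)                                # mu^T Sigma^{-1} e
B = sum(mu[i] * Sinv_mu[i] for i in range(3))   # mu^T Sigma^{-1} mu
C = sum(Sinv_e)                                 # e^T Sigma^{-1} e
D = B * C - A * A                               # must be > 0 for mu not prop. to e

w_gmv = [v / C for v in Sinv_e]                 # GMVP weights, summing to 1
```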
When we switch axes to \((\sigma_p, \mu_p)\) — plotting standard deviation on the horizontal axis, as is conventional in finance — we take the square root, turning the parabola into a hyperbola. Formally, if we set \(x = \sigma_p\) and \(y = \mu_p\), then \(x^2 = \sigma_p^2\) satisfies the quadratic above, which rearranges to
\[ \frac{x^2}{1/C} - \frac{(y - A/C)^2}{D/C^2} = 1, \]the standard form of a hyperbola with horizontal transverse axis. The frontier is the right-hand branch (\(x > 0\)), and the GMVP sits at its vertex — the leftmost point. Portfolios on the upper half of this branch (where \(\mu_p \geq \mu_{\mathrm{gmv}}\)) constitute the efficient frontier proper.
2.5 Two-Asset Frontier and Short-Selling Constraints
Consider two assets with \(\mu_1 = 0.10\), \(\mu_2 = 0.15\), variances \(\sigma_1^2 = 0.04\), \(\sigma_2^2 = 0.09\), and covariance \(\sigma_{12} = 0.018\). With weight \(w\) in asset 1 and \(1-w\) in asset 2,
\[ \mu_p(w) = 0.10w + 0.15(1-w), \qquad \sigma_p^2(w) = 0.04w^2 + 0.09(1-w)^2 + 0.036\,w(1-w). \]Differentiating \(\sigma_p^2\) with respect to \(w\) and setting to zero:
\[ \frac{d\sigma_p^2}{dw} = 0.08w - 0.18(1-w) + 0.036(1-2w) = 0.08w - 0.18 + 0.18w + 0.036 - 0.072w = 0. \]Collecting terms: \((0.08 + 0.18 - 0.072)w = 0.18 - 0.036\), giving \(w^* = 0.144/0.188 \approx 0.766\). The GMVP weight is about 76.6% in asset 1.
At \(w^* \approx 0.766\):
\[ \mu_{\mathrm{gmv}} \approx 0.15 - 0.05\times0.766 = 0.112, \qquad \sigma_{\mathrm{gmv}}^2 \approx 0.04(0.766)^2 + 0.09(0.234)^2 + 0.036(0.766)(0.234) \approx 0.0349. \]As \(w\) ranges from \(-\infty\) to \(+\infty\) (allowing short selling), the pair \((\sigma_p(w), \mu_p(w))\) traces the full hyperbola. The efficient portion is \(w \leq w^*\) (lower \(w\) gives higher return), i.e., increasing the weight in asset 2 from the GMVP position improves expected return at the cost of higher variance.
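Setting the derivative to zero gives the closed form \(w^* = (\sigma_2^2 - \sigma_{12})/(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})\), which a quick pure-Python check confirms for the numbers in this section:

```python
# Two-asset data used in this section: variances 0.04 and 0.09, covariance 0.018.
s11, s22, s12 = 0.04, 0.09, 0.018

def var_p(w):
    """Portfolio variance as a function of the weight w in asset 1."""
    return s11 * w * w + s22 * (1 - w) ** 2 + 2 * s12 * w * (1 - w)

# GMVP weight from d(var_p)/dw = 0.
w_star = (s22 - s12) / (s11 + s22 - 2 * s12)
mu_gmv = 0.10 * w_star + 0.15 * (1 - w_star)
```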
2.6 Short-Selling Constraints and the Failure of the Two-Fund Theorem
When investors are prohibited from taking short positions — a constraint \(w \geq 0\) — the structure of the efficient frontier changes fundamentally. The feasible set is no longer all of \(\mathbb{R}^n\) intersected with the budget hyperplane but instead the simplex \(\{w : e^Tw=1, w\geq 0\}\). This is a compact convex polytope.
Within this constrained setting:
The efficient frontier is still a continuous curve in \((\sigma_p, \mu_p)\) space, but it is no longer a smooth hyperbola. Instead it is piecewise smooth, with each smooth segment corresponding to a different subset of assets with strictly positive weights (a different “basis”).
The two-fund theorem fails. Any convex combination of two short-selling-constrained efficient portfolios is feasible, but it need not be efficient — a combination of two efficient portfolios might lie in the interior of the frontier (dominated by some other portfolio). This is because the unconstrained analysis relied on the linearity of the KKT system in the target return \(\bar{R}\), which breaks down once inequality constraints are active and the active set changes.
The critical line algorithm (Chapter 6) handles this case by tracking which assets have \(w_i = 0\) (the active set) as the target return \(\bar{R}\) varies. The result is still an \(O(n^2)\) algorithm, and the frontier consists of at most \(n-1\) linear segments.
Practitioners routinely impose non-negativity constraints to prevent the extreme short positions that arise from raw mean-variance optimisation. These short positions are often artifacts of estimation error in \(\mu\), and the constrained frontier is more robust to perturbations in the input parameters.
Chapter 3: The Efficient Frontier
The concept of the efficient frontier is central to modern portfolio theory. It encapsulates the complete trade-off between risk and return achievable by a rational investor who holds only efficient portfolios — those that minimise variance for each level of expected return. Markowitz’s insight was that an investor need never consider any portfolio not on this frontier, regardless of their specific risk preferences; their personal risk-aversion coefficient merely picks a point on the frontier.
3.1 The Minimum Variance Frontier
Solving Formulation I for every value of \(\bar{R} \in \mathbb{R}\) produces a family of portfolios parameterised by \(\bar{R}\). The set of all such portfolios is the minimum variance frontier. In \((\sigma^2, \mu_p)\) space this frontier is a parabola: one can show algebraically that
\[ \sigma_p^2 = \frac{1}{D}\left(C\bar{R}^2 - 2A\bar{R} + B\right), \]where \(A = \mu^T \Sigma^{-1} e\), \(B = \mu^T \Sigma^{-1} \mu\), \(C = e^T \Sigma^{-1} e\), and \(D = BC - A^2 > 0\) (assuming \(\mu\) and \(e\) are linearly independent). The parabola attains its minimum value \(\sigma_p^2 = 1/C\) at \(\bar{R} = A/C\). In \((\sigma_p, \mu_p)\) space this parabola becomes a hyperbola, with the leftmost point being the global minimum-variance portfolio (GMVP). The GMVP has weights
\[ w_{\mathrm{gmv}} = \frac{\Sigma^{-1} e}{e^T \Sigma^{-1} e} \]and is the unique portfolio that minimises variance with no constraint on expected return.
The efficient frontier is the upper branch of the hyperbola — the set of minimum-variance portfolios with expected return at least as large as that of the GMVP. An investor with any risk-aversion level \(\lambda > 0\) will hold a portfolio on this upper branch. The lower branch, while mathematically valid, is inefficient: for every portfolio on the lower branch there exists another portfolio with the same variance but strictly higher expected return.
3.2 The Two-Fund Theorem
Two-Fund Theorem. Every portfolio on the minimum variance frontier can be written as an affine combination (coefficients summing to one) of any two fixed, distinct frontier portfolios. This theorem has a powerful implication: it suffices to identify two specific efficient portfolios (the “two funds”), and then every investor, regardless of risk aversion, can optimally invest in a mix of just these two funds. In practice, one natural choice is the GMVP and the portfolio that maximises expected return subject only to the budget constraint. The two-fund theorem is the precursor to the mutual fund theorem in the CAPM.
Proof sketch: the solution to Formulation I is linear in \(\bar{R}\) (since the KKT system is linear with \(\bar{R}\) appearing only in the right-hand side), so
\[ w^*(\bar{R}) = w^*(\bar{R}_1) + \frac{\bar{R} - \bar{R}_1}{\bar{R}_2 - \bar{R}_1}\left(w^*(\bar{R}_2) - w^*(\bar{R}_1)\right), \]which expresses \(w^*(\bar{R})\) as a linear combination of \(w^*(\bar{R}_1)\) and \(w^*(\bar{R}_2)\) for any two reference return levels \(\bar{R}_1 \neq \bar{R}_2\).
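The proof sketch can be exercised numerically. The snippet below uses a made-up three-asset universe with diagonal covariance (so \(\Sigma^{-1}\) is trivial) and the standard closed form \(w^*(\bar R) = \frac{1}{D}\big[(B - A\bar R)\,\Sigma^{-1}e + (C\bar R - A)\,\Sigma^{-1}\mu\big]\) for the Formulation I solution — a formula not derived in the text but easily checked against the two constraints:

```python
# Made-up three-asset universe: variances 0.04, 0.09, 0.16 on the diagonal.
mu = [0.08, 0.12, 0.15]
Sinv = [25.0, 100.0 / 9.0, 6.25]        # Sigma^{-1} is diagonal with these entries

Sinv_mu = [Sinv[i] * mu[i] for i in range(3)]
Sinv_e = [Sinv[i] for i in range(3)]
A = sum(Sinv_mu)
B = sum(mu[i] * Sinv_mu[i] for i in range(3))
C = sum(Sinv_e)
D = B * C - A * A

def frontier_w(R):
    """Closed-form Formulation I solution: affine in the target return R."""
    return [((B - A * R) * Sinv_e[i] + (C * R - A) * Sinv_mu[i]) / D
            for i in range(3)]

# Two-fund check: blending the frontier portfolios for R1 and R2 must
# reproduce the frontier portfolio for any intermediate target R3.
R1, R2, R3 = 0.09, 0.13, 0.11
w1, w2 = frontier_w(R1), frontier_w(R2)
t = (R3 - R1) / (R2 - R1)
blend = [(1 - t) * w1[i] + t * w2[i] for i in range(3)]
```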
3.3 Risk-Free Asset and the Capital Market Line
Introducing a risk-free asset with certain return \(r_f\) fundamentally changes the geometry of the efficient frontier. The risk-free asset has zero variance and zero covariance with all risky assets. An investor now allocates fraction \(w_0\) to the risk-free asset and weights \(w \in \mathbb{R}^n\) (with \(e^T w = 1 - w_0\)) to risky assets.
When a risk-free asset is available, the efficient frontier in \((\sigma_p, \mu_p)\) space degenerates to a straight line called the Capital Market Line (CML):
\[ \mu_p = r_f + \frac{\mu_T - r_f}{\sigma_T} \cdot \sigma_p, \]where \((\sigma_T, \mu_T)\) are the risk and expected return of the tangency portfolio — the unique portfolio of risky assets that lies on both the CML and the hyperbolic frontier. The slope of the CML is the Sharpe ratio of the tangency portfolio:
\[ S_T = \frac{\mu_T - r_f}{\sigma_T}. \]The tangency portfolio is found by maximising the Sharpe ratio over all risky portfolios. Since \(S_p\) is a ratio of a linear to a nonlinear function of \(w\), this is not immediately a convex programme. However, one can reduce it: let \(v = \Sigma^{-1}(\mu - r_f e)\). Then the tangency portfolio weights are \(w_T = v / (e^T v)\). This follows from differentiating the Sharpe ratio \(S_p = (\mu - r_f e)^T w / \sqrt{w^T \Sigma w}\) with respect to \(w\) — any stationary point satisfies \(\Sigma w \propto \mu - r_f e\) — and then normalising so the weights sum to one.
The economic implication is profound: in a world with a risk-free asset, every investor optimally holds the same risky portfolio (the tangency portfolio), with the fraction invested in risky assets determined by their risk aversion. This is the one-fund theorem or CAPM separation result.
3.4 Tangency Portfolio: Numerical Example
Take \(r_f = 0.04\), so the excess return vector is \(\mu - r_f e = (0.06, 0.08, 0.03)^T\), and let \(v = \Sigma^{-1}(\mu - r_f e)\) with \(\Sigma^{-1}\) as in the example of Section 2.4. Computing each component: \begin{align*} v_1 &\approx 29.63\times0.06 + (-5.56)\times0.08 + (-3.70)\times0.03 = 1.778 - 0.445 - 0.111 = 1.222,\\ v_2 &\approx (-5.56)\times0.06 + 14.81\times0.08 + (-6.17)\times0.03 = -0.334 + 1.185 - 0.185 = 0.666,\\ v_3 &\approx (-3.70)\times0.06 + (-6.17)\times0.08 + 21.60\times0.03 = -0.222 - 0.494 + 0.648 = -0.068. \end{align*} So \(v \approx (1.222, 0.666, -0.068)^T\).
\[ e^Tv = 1.222 + 0.666 + (-0.068) = 1.820, \]\[ w_T = \frac{v}{e^Tv} \approx \begin{pmatrix}0.672\\0.366\\-0.037\end{pmatrix}. \]\[ \mu_T = w_T^T\mu \approx 0.672\times0.10 + 0.366\times0.12 + (-0.037)\times0.07 \approx 0.108, \]and its variance is \(\sigma_T^2 = w_T^T\Sigma w_T\). The tangency portfolio has the highest Sharpe ratio \(S_T = (\mu_T - r_f)/\sigma_T\) among all risky portfolios; every investor with access to borrowing or lending at rate \(r_f\) holds this portfolio in their risky allocation, scaled by their risk aversion.
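Scripting the tangency computation (pure Python, using the rounded \(\Sigma^{-1}\) entries above; small discrepancies with the hand numbers are rounding):

```python
# Rounded Sigma^{-1} from the example; excess returns mu - r_f e with r_f = 0.04.
Sinv = [[29.63, -5.56, -3.70],
        [-5.56, 14.81, -6.17],
        [-3.70, -6.17, 21.60]]
mu = [0.10, 0.12, 0.07]
excess = [0.06, 0.08, 0.03]

v = [sum(Sinv[i][j] * excess[j] for j in range(3)) for i in range(3)]
w_T = [vi / sum(v) for vi in v]                    # tangency weights, summing to 1
mu_T = sum(w_T[i] * mu[i] for i in range(3))       # tangency expected return
```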
3.5 The Capital Asset Pricing Model
The one-fund theorem asserts that in equilibrium, every investor holds the same risky portfolio — the tangency portfolio. If investors have homogeneous beliefs (they agree on \(\mu\) and \(\Sigma\)), then in equilibrium the tangency portfolio must equal the market portfolio \(w_M\): the portfolio in which each asset is held in proportion to its market capitalisation. This is the central insight of the Capital Asset Pricing Model (CAPM), developed independently by Sharpe (1964), Lintner (1965), and Mossin (1966).
For each asset \(i\), define \(\beta_i = \mathrm{Cov}(r_i, r_M)/\sigma_M^2\). The CAPM asserts
\[ \mathbb{E}[r_i] - r_f = \beta_i\left(\mathbb{E}[r_M] - r_f\right). \]This equation has a clean geometric interpretation: all assets lie on the Security Market Line (SML), the line in \((\beta_i, \mathbb{E}[r_i])\) space passing through \((0, r_f)\) with slope \(\mathbb{E}[r_M] - r_f\). Assets plotting above the SML (positive alpha) are underpriced relative to their systematic risk; assets below the SML are overpriced. Active managers seek assets with positive alpha.
To derive it, recall that the tangency condition identifies the market portfolio: \(\mu - r_f e = \lambda^* \Sigma w_M\) for some scalar \(\lambda^* > 0\). Multiplying both sides of this vector equation by \(w_M^T\) gives
\[ \mu_M - r_f = \lambda^* w_M^T\Sigma w_M = \lambda^*\sigma_M^2, \]so \(\lambda^* = (\mu_M - r_f)/\sigma_M^2\). Substituting back:
\[ \mu_i - r_f = \frac{\mu_M - r_f}{\sigma_M^2}(\Sigma w_M)_i = (\mu_M - r_f)\frac{\mathrm{Cov}(r_i, r_M)}{\sigma_M^2} = (\mu_M - r_f)\beta_i, \]which is exactly the CAPM equation. The derivation shows that CAPM is a direct consequence of the mean-variance tangency condition — it is not an independent financial theory but rather an equilibrium implication of Markowitz optimisation combined with homogeneous beliefs.
Chapter 4: Quadratic Programming
The portfolio problems studied so far involve only equality constraints. Real-world portfolio management, however, routinely imposes inequality constraints: no-short-selling restrictions (\(w \geq 0\)), sector exposure limits, concentration limits, turnover bounds, and so on. Handling these requires the machinery of quadratic programming (QP).
4.1 General QP Formulation
A quadratic programme is:
\[ \min_{x \in \mathbb{R}^n} \; \frac{1}{2} x^T Q x + c^T x \quad \text{subject to} \quad Ax = b, \quad Cx \leq d. \]We assume \(Q \succ 0\) (which holds when \(Q = \Sigma\) and the covariance matrix is positive definite). Under this assumption, the QP is strictly convex and has a unique global minimum. The feasible set — defined by linear equality and inequality constraints — is a convex polyhedron.
The KKT conditions for this QP are:
\[ Q x + c - A^T \lambda + C^T \mu = 0, \]\[ Ax = b, \quad Cx \leq d, \]\[ \mu \geq 0, \quad \mu^T(Cx - d) = 0 \quad \text{(complementary slackness).} \]Because \(Q \succ 0\), the KKT conditions are both necessary and sufficient for global optimality: any point satisfying them is the unique minimiser.
Worked example. Minimise \(f(x) = x_1^2 + x_2^2 - x_1 - x_2\) subject to \(x_1 + x_2 = 1\) (so \(Q = 2I\), \(c = (-1,-1)^T\), \(A = e^T\), \(b = 1\), and no inequality constraints). With Lagrangian \(\mathcal{L} = f(x) - \lambda(x_1 + x_2 - 1)\), the KKT stationarity conditions \(\partial\mathcal{L}/\partial x_i = 0\) read: \begin{align*} 2x_1 - 1 - \lambda &= 0,\\ 2x_2 - 1 - \lambda &= 0. \end{align*} Subtracting: \(2(x_1 - x_2) = 0 \implies x_1 = x_2\). Substituting into the constraint \(x_1 + x_2 = 1\) gives \(x_1 = x_2 = 1/2\). The multiplier is \(\lambda = 2(1/2) - 1 = 0\).
The optimal solution is \(x^* = (1/2, 1/2)^T\) with objective value \(1/4 + 1/4 - 1/2 - 1/2 = -1/2\).
Equivalently, one can assemble the bordered KKT system directly:
\[ \begin{pmatrix}2&0&-1\\0&2&-1\\-1&-1&0\end{pmatrix}\begin{pmatrix}x_1\\x_2\\\lambda\end{pmatrix} = \begin{pmatrix}1\\1\\-1\end{pmatrix}. \]Solving by elimination: from rows 1 and 2, \(2x_1 - \lambda = 1\) and \(2x_2 - \lambda = 1\), so \(x_1 = x_2 = (1+\lambda)/2\). Row 3 gives \(-x_1 - x_2 = -1\), i.e., \(x_1+x_2=1\), so \((1+\lambda)/2 + (1+\lambda)/2 = 1\), giving \(1+\lambda=1\), \(\lambda=0\), \(x_1=x_2=1/2\). Consistent.
Financial interpretation. With two assets of equal expected risk and return, the minimum variance budget-constrained portfolio is the equally-weighted portfolio — a result that holds more generally when the covariance matrix has a special symmetric structure.
4.2 Equality-Constrained QP
When there are only equality constraints (\(Cx \leq d\) absent), the KKT conditions reduce to the bordered linear system
\[ \begin{pmatrix} Q & -A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}. \]This is a symmetric (but indefinite) system of order \(n + m\), where \(m\) is the number of equality constraints. It can be solved by block elimination (exploiting positive definiteness of \(Q\)) or by applying a symmetric indefinite factorization (\(LDL^T\)). In the portfolio context with \(Q = 2\Sigma\), \(m = 2\) (the return and budget constraints), this reduces to an efficient \((n+2) \times (n+2)\) system.
4.3 Active Set Method
For QPs with inequality constraints, the active set method is the classical algorithm. The key observation is that at the optimum \(x^*\), some inequality constraints are satisfied with equality (the active constraints at \(x^*\)), while the rest are strict inequalities. If we knew in advance which constraints were active, we could solve an equality-constrained QP restricted to the active constraints — but of course we do not know this in advance.
The active set algorithm proceeds as follows:
- Start with a feasible point \(x_0\) and initial working set \(\mathcal{W}_0\).
- Solve the equality-constrained QP on \(\mathcal{W}_k\) to get step direction \(p\).
- If \(p = 0\), check the KKT multipliers for the working-set constraints. If all multipliers \(\mu_i \geq 0\), declare optimality. If some \(\mu_i < 0\), remove that constraint from \(\mathcal{W}_k\) and continue.
- If \(p \neq 0\), take a step along \(p\), blocking at the first violated constraint. Add the blocking constraint to \(\mathcal{W}_{k+1}\).
Under non-degeneracy assumptions, the algorithm terminates in a finite number of steps. Degeneracy (when multiple constraints are active at a vertex of the feasible set) can cause cycling, but anti-degeneracy rules (perturbation or Bland’s rule analogues) prevent this in practice.
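The steps above can be sketched in pure Python for the special case of simple non-negativity bounds \(x \geq 0\). This is our own illustrative code: it omits the equality constraints a real portfolio QP carries, ignores the degenerate cases just discussed, and uses a tiny dense solver in place of factorization updates:

```python
def solve_dense(M, rhs):
    """Tiny Gaussian elimination with partial pivoting for small dense systems."""
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def active_set_qp(Q, c, tol=1e-10, max_iter=100):
    """Minimise 0.5 x^T Q x + c^T x subject to x >= 0, with Q positive definite."""
    n = len(c)
    x = [0.0] * n
    W = set(range(n))                        # working set: bounds held at zero
    for _ in range(max_iter):
        # Solve the equality-constrained subproblem on the free variables.
        free = [i for i in range(n) if i not in W]
        xh = [0.0] * n
        if free:
            sub = solve_dense([[Q[i][j] for j in free] for i in free],
                              [-c[i] for i in free])
            for idx, i in enumerate(free):
                xh[i] = sub[idx]
        p = [xh[i] - x[i] for i in range(n)]
        if max(abs(t) for t in p) < tol:
            # p = 0: check multipliers of the working-set bounds.
            grad = [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
            mults = {i: grad[i] for i in W}
            if all(m >= -tol for m in mults.values()):
                return x                         # KKT satisfied: optimal
            W.remove(min(mults, key=mults.get))  # drop most negative multiplier
        else:
            # Step along p, blocking at the first bound that would be violated.
            alpha, block = 1.0, None
            for i in free:
                if p[i] < -tol and -x[i] / p[i] < alpha:
                    alpha, block = -x[i] / p[i], i
            x = [x[i] + alpha * p[i] for i in range(n)]
            if block is not None:
                W.add(block)
    raise RuntimeError("active set method did not converge")

# Example: unconstrained minimiser of 0.5 x^T (2I) x + (-2, 2)^T x is (1, -1);
# with x >= 0 the method fixes the second coordinate at its bound.
x_opt = active_set_qp([[2.0, 0.0], [0.0, 2.0]], [-2.0, 2.0])
```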
Warm-starting. If the covariance matrix or expected returns change slightly (e.g., on a daily rebalancing cycle), the active set from the previous day’s solution is usually near-optimal for today’s problem. The active set method can be initialised with this warm start and typically converges in just a few pivots, whereas interior-point methods must restart from a central point and cannot exploit a warm start effectively.
Parametric continuation. The critical line algorithm (Section 6.1) traces the entire efficient frontier by parametrically varying the target return \(\bar{R}\), updating the active set at each breakpoint. This produces the complete frontier in a single pass — an inherently active-set paradigm. Interior-point methods, solved independently at each \(\bar{R}\), would require many separate runs.
Structure exploitation. In portfolio QPs, the non-negativity constraints \(w_i \geq 0\) are simple bounds that the active set method handles very efficiently (adding or removing a single constraint per pivot), whereas interior-point methods treat all constraints uniformly and their per-iteration cost scales with the total number of constraints.
For very large universes (\(n > 10{,}000\)) or when the problem has complex non-linear constraints, interior-point methods become preferable. But for the moderate-scale portfolio QPs typical of institutional practice, the active set method remains the algorithm of choice.
4.4 Transaction Costs in the QP Framework
When the portfolio manager moves from current holdings \(w_0\) to new holdings \(w\), transaction costs alter the structure of the QP and invalidate the clean two-fund theorem. A particularly tractable formulation uses a turnover constraint rather than a cost term in the objective.
The turnover-constrained mean-variance QP is:
\[ \min_w \; w^T\Sigma w \quad \text{subject to} \quad \mu^Tw \geq \bar{R},\quad e^Tw=1,\quad \tfrac{1}{2}\|w-w_0\|_1 \leq \tau. \]The \(\ell_1\) turnover constraint is linearised by introducing split variables \(u_i^+, u_i^- \geq 0\) with \(w_i - (w_0)_i = u_i^+ - u_i^-\):
\[ \min_{w,u^+,u^-} \; w^T\Sigma w \quad \text{s.t.} \quad \mu^Tw \geq \bar{R},\; e^Tw=1,\; \sum_i(u_i^++u_i^-)\leq 2\tau,\; w_i=(w_0)_i+u_i^+-u_i^-,\; u_i^\pm\geq 0. \]This is a convex QP in \((w, u^+, u^-)\) with \(3n\) variables and \(O(n)\) constraints, directly solvable by any QP solver.
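The split-variable construction can be checked numerically. A minimal Python sketch with illustrative holdings (the course's computations use MATLAB; the vectors here are made up for the check):

```python
# Split-variable linearisation of the l1 turnover constraint:
# w_i - (w0)_i = u_i^+ - u_i^-  with  u_i^+, u_i^- >= 0.
w0 = [0.40, 0.30, 0.20, 0.10]           # current holdings (illustrative)
w  = [0.25, 0.35, 0.25, 0.15]           # proposed holdings, e^T w = 1

delta = [wi - w0i for wi, w0i in zip(w, w0)]
u_plus  = [max(d, 0.0) for d in delta]  # buys
u_minus = [max(-d, 0.0) for d in delta] # sells

# The linearisation recovers the l1 norm exactly:
l1 = sum(abs(d) for d in delta)
assert abs(sum(up + um for up, um in zip(u_plus, u_minus)) - l1) < 1e-12

# Turnover is half the l1 distance; the constraint reads sum(u+ + u-) <= 2*tau.
turnover = 0.5 * l1
print(turnover)  # ~0.15: 15% of the book is traded
```

At an optimum a QP solver never makes both \(u_i^+\) and \(u_i^-\) positive simultaneously, so the decomposition above is exactly what the solver recovers.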
Effect on the KKT system. The turnover constraint introduces an additional multiplier \(\nu \geq 0\) for the inequality \(\sum_i(u_i^++u_i^-)\leq 2\tau\), and the complementary slackness condition \(\nu(\sum_i(u_i^++u_i^-) - 2\tau) = 0\) determines whether the turnover constraint binds. When \(\nu > 0\) (the constraint is active), the optimal \(w\) is “pulled” toward \(w_0\): the manager cannot move as far from the current portfolio as the unconstrained problem would dictate. The efficient frontier in \((\sigma_p, \mu_p)\) space is then parameterised jointly by \((\bar{R}, \tau)\), and the reachable region is a two-dimensional object rather than the one-dimensional curve of classical mean-variance theory.
Chapter 5: Numerical Linear Algebra Methods
The KKT systems arising in portfolio QPs must be solved reliably and efficiently. For portfolios of hundreds or thousands of assets, naive approaches fail due to poor conditioning or excessive arithmetic. This chapter presents the null-space method and the range-space method — two structural approaches that exploit the problem geometry.
5.1 Null-Space Method for Equality-Constrained QP
Consider the equality-constrained QP: minimise \(\frac{1}{2} x^T Q x + c^T x\) subject to \(Ax = b\), where \(A \in \mathbb{R}^{m \times n}\) with \(m < n\) and full row rank. The feasible set is an affine subspace of dimension \(n - m\).
Idea: Decompose \(\mathbb{R}^n = \mathcal{R}(A^T) \oplus \mathcal{N}(A)\). Any feasible point can be written as \(x = x_p + Z u\), where \(x_p\) is a particular solution (\(A x_p = b\)) and the columns of \(Z \in \mathbb{R}^{n \times (n-m)}\) form a basis for the null space of \(A\). Substituting into the objective:
\[ f(x_p + Zu) = \frac{1}{2} u^T (Z^T Q Z) u + (Q x_p + c)^T Z u + \text{const}. \]This is an unconstrained QP in \(u \in \mathbb{R}^{n-m}\). The reduced Hessian \(Z^T Q Z\) is positive definite (under our assumptions), so the solution is \(u^* = -(Z^T Q Z)^{-1} Z^T (Q x_p + c)\) and \(x^* = x_p + Z u^*\).
A natural choice for \(Z\) is obtained from the QR factorization of \(A^T\). If \(A^T = Q_1 R\) (writing the full QR of \(A^T\) as \([Q_1 \; Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix}\)), then the columns of \(Q_2\) form an orthonormal basis for \(\mathcal{N}(A)\). This “orthonormal null space” choice minimises round-off error in the reduced Hessian computation. In the portfolio context, \(A = [e^T; \mu^T]\) (a \(2 \times n\) matrix), and the null space has dimension \(n - 2\), so \(Z \in \mathbb{R}^{n \times (n-2)}\).
Worked example (\(n = 3\)). Take \(\mu = (0.10, 0.12, 0.07)^T\), so that \(A^T = [e \;\, \mu] \in \mathbb{R}^{3\times2}\). Gram–Schmidt on the columns of \(A^T\) yields the full QR factorization \(A^T = [Q_1 \; Q_2]\begin{bmatrix} R \\ 0 \end{bmatrix}\), where \(Q_1\in\mathbb{R}^{3\times2}\), \(Q_2\in\mathbb{R}^{3\times1}\), \(R\in\mathbb{R}^{2\times2}\).
\[ q_1 = \frac{1}{\sqrt{3}}(1,1,1)^T. \]\[ q_1^T a_2 = \frac{0.10+0.12+0.07}{\sqrt{3}} = \frac{0.29}{\sqrt{3}}, \quad a_2' = a_2 - \frac{0.29}{\sqrt{3}}q_1 = \begin{pmatrix}0.10\\0.12\\0.07\end{pmatrix} - \frac{0.29}{3}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}0.00\overline{3}\\0.02\overline{3}\\-0.02\overline{6}\end{pmatrix}. \]Normalising gives \(q_2 = a_2'/\|a_2'\|\).
\[ Z = q_3 \approx \begin{pmatrix}0.811\\-0.487\\-0.324\end{pmatrix} \](up to sign). One checks: \(e^T Z \approx 0.811 - 0.487 - 0.324 = 0\) and \(\mu^T Z \approx 0.10(0.811) + 0.12(-0.487) + 0.07(-0.324) \approx 0\), confirming that \(q_3\) spans \(\mathcal{N}(A)\).
Reduced problem. Any feasible weight vector is written \(w = w_p + Zu\) where \(w_p\) satisfies \(Aw_p = (1, \bar{R})^T\) and \(u \in \mathbb{R}\) is a scalar. The reduced Hessian is \(Z^T\Sigma Z \in \mathbb{R}^{1\times1}\) — a positive scalar — and the unconstrained minimiser in \(u\) is \(u^* = -(Z^T\Sigma Z)^{-1}Z^T(\Sigma w_p + c)\), giving the optimal weights directly.
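The null vector in this \(n = 3\) example can be verified numerically. For a full-rank \(2\times3\) matrix, \(\mathcal{N}(A)\) is spanned by the cross product of the two rows — a shortcut special to \(n = 3\) (in general one uses the QR factorization as above). A Python sketch:

```python
import math

# 3-asset example: A = [e^T; mu^T].  For a full-rank 2x3 matrix,
# N(A) is spanned by the cross product of the two rows (n = 3 only).
mu = [0.10, 0.12, 0.07]
e  = [1.0, 1.0, 1.0]

z = [e[1]*mu[2] - e[2]*mu[1],
     e[2]*mu[0] - e[0]*mu[2],
     e[0]*mu[1] - e[1]*mu[0]]            # e x mu = (-0.05, 0.03, 0.02)

nrm = math.sqrt(sum(zi*zi for zi in z))
q3 = [zi / nrm for zi in z]              # normalised null vector (up to sign)

# Verify orthogonality to both rows of A:
assert abs(sum(q3)) < 1e-12                             # e^T q3 = 0
assert abs(sum(m*qi for m, qi in zip(mu, q3))) < 1e-12  # mu^T q3 = 0
print([round(q, 3) for q in q3])  # ±(0.811, -0.487, -0.324) up to sign
```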
5.2 Range-Space Method and the Schur Complement
An alternative is the range-space method, which solves the KKT system directly but exploits the structure of \(Q\). If \(Q\) is easily invertible (e.g., diagonal or block-diagonal, which can arise in factor-model approximations of \(\Sigma\)), one eliminates \(x\) from the stationarity equation to get:
\[ A Q^{-1} A^T \lambda = A Q^{-1} c + b. \]The \(m \times m\) matrix \(M = A Q^{-1} A^T\) is the Schur complement of \(Q\) in the KKT system, and it is positive definite. Solving for \(\lambda\) and back-substituting for \(x\) requires only \(m \times m\) linear systems rather than \((n+m) \times (n+m)\). For portfolio problems where \(m = 2\), this reduces to solving a \(2 \times 2\) system — trivial arithmetic.
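The range-space solve can be sketched in a few lines. This Python example uses an illustrative diagonal \(Q\) (and \(c = 0\), as in pure variance minimisation), solving the \(2\times2\) Schur system by Cramer's rule and back-substituting \(x = Q^{-1}(A^T\lambda - c)\):

```python
# Range-space (Schur complement) solve for min (1/2) x^T Q x + c^T x, Ax = b,
# with Q diagonal — a sketch with illustrative data.
Q = [0.08, 0.06, 0.10]                  # diag(Q), e.g. 2*Sigma for diagonal Sigma
c = [0.0, 0.0, 0.0]
A = [[1.0, 1.0, 1.0],                   # budget row e^T
     [0.10, 0.12, 0.07]]                # return row mu^T
b = [1.0, 0.09]                         # e^T w = 1, mu^T w = 0.09

Qinv = [1.0/qi for qi in Q]

# M = A Q^{-1} A^T (2x2) and rhs = A Q^{-1} c + b
M = [[sum(A[i][k]*Qinv[k]*A[j][k] for k in range(3)) for j in range(2)]
     for i in range(2)]
rhs = [sum(A[i][k]*Qinv[k]*c[k] for k in range(3)) + b[i] for i in range(2)]

# Solve the 2x2 system by Cramer's rule, then back-substitute for x.
det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
lam = [(rhs[0]*M[1][1] - rhs[1]*M[0][1]) / det,
       (M[0][0]*rhs[1] - M[1][0]*rhs[0]) / det]
x = [Qinv[k]*(A[0][k]*lam[0] + A[1][k]*lam[1] - c[k]) for k in range(3)]

# Feasibility check: Ax = b
for i in range(2):
    assert abs(sum(A[i][k]*x[k] for k in range(3)) - b[i]) < 1e-10
```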
5.3 Conditioning, Normal Equations, and Sparsity
The normal equations approach to least-squares (\(A^T A x = A^T b\)) is algebraically correct but numerically inferior: the condition number satisfies \(\kappa(A^T A) = \kappa(A)^2\). In contrast, solving via QR gives \(\kappa(R) = \kappa(A)\). For large, ill-conditioned portfolio problems (e.g., assets with nearly collinear returns), the QR-based approach can mean the difference between a reliable solution and numerical garbage.
For institutional portfolios with \(n\) in the thousands, sparsity becomes critical. If the covariance matrix \(\Sigma\) is given in a factor-model form \(\Sigma = B F B^T + D\) (where \(B \in \mathbb{R}^{n \times k}\) with \(k \ll n\) and \(D\) is diagonal), then the Woodbury matrix identity allows \(\Sigma^{-1}\) to be computed in \(O(nk^2)\) rather than \(O(n^3)\) time:
\[ \Sigma^{-1} = D^{-1} - D^{-1} B (F^{-1} + B^T D^{-1} B)^{-1} B^T D^{-1}. \]This is crucial for practical implementations in MATLAB when \(n\) is large.
Worked example. Consider \(n = 4\) assets with \(\Sigma = bb^T + D\), where \(b = (0.8, 0.6, 0.5, 0.4)^T\) and \(D = \mathrm{diag}(0.04, 0.03, 0.02, 0.05)\). Rather than inverting the full \(4\times4\) matrix \(\Sigma\), we apply the Woodbury identity with \(B = b\), \(F = 1\):
Step 1: Compute \(D^{-1} = \mathrm{diag}(25, 33.\overline{3}, 50, 20)\).
Step 2: Compute \(b^TD^{-1}b = 0.8^2\times25 + 0.6^2\times33.\overline{3} + 0.5^2\times50 + 0.4^2\times20\)
\[ = 16 + 12 + 12.5 + 3.2 = 43.7. \]Step 3: The scalar Schur complement is \(F^{-1} + B^TD^{-1}B = 1 + 43.7 = 44.7\).
\[ \Sigma^{-1} = D^{-1} - D^{-1}b\,(44.7)^{-1}\,b^TD^{-1}. \]The rank-1 correction \(D^{-1}b\,(44.7)^{-1}b^TD^{-1}\) is an outer product scaled by a scalar, so applying \(\Sigma^{-1}\) to a vector costs only \(O(n)\) multiplications, versus \(O(n^3)\) for a naive inversion. For \(n=10{,}000\) and \(k=5\) factors, the Woodbury formula reduces the cost from \(O(n^3) = O(10^{12})\) to \(O(nk^2) \approx 2.5\times10^5\) operations — a speed-up of several orders of magnitude.
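The Woodbury arithmetic in this example can be verified directly, using the data above (\(b = (0.8, 0.6, 0.5, 0.4)^T\), \(D = \mathrm{diag}(0.04, 0.03, 0.02, 0.05)\), consistent with the \(D^{-1}\) of Step 1). A Python check:

```python
# Verify the Woodbury inverse for Sigma = b b^T + D on the example's data.
b = [0.8, 0.6, 0.5, 0.4]
D = [0.04, 0.03, 0.02, 0.05]
n = 4

Dinv  = [1.0/d for d in D]                          # diag(25, 33.3.., 50, 20)
Dinvb = [Dinv[i]*b[i] for i in range(n)]            # (20, 20, 25, 8)
schur = 1.0 + sum(b[i]*Dinvb[i] for i in range(n))  # 1 + 43.7 = 44.7

# Sigma^{-1} = D^{-1} - D^{-1} b (schur)^{-1} b^T D^{-1}
Sinv = [[(Dinv[i] if i == j else 0.0) - Dinvb[i]*Dinvb[j]/schur
         for j in range(n)] for i in range(n)]

Sigma = [[b[i]*b[j] + (D[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]

# Check Sigma * Sinv = I
for i in range(n):
    for j in range(n):
        prod = sum(Sigma[i][k]*Sinv[k][j] for k in range(n))
        assert abs(prod - (1.0 if i == j else 0.0)) < 1e-10

print(round(Sinv[0][0], 2))  # ~16.05
```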
Numerically, the \((1,1)\) entry of \(\Sigma^{-1}\) via Woodbury is:
\[ (\Sigma^{-1})_{11} = D_{11}^{-1} - (D^{-1}b)_1^2/(44.7) = 25 - (25\times0.8)^2/44.7 = 25 - 400/44.7 \approx 25 - 8.95 = 16.05. \]
Chapter 6: Active Set and Parametric Methods
The active set method of Chapter 4 can be adapted to efficiently trace the entire efficient frontier. Rather than solving a fresh QP for each value of the target return \(\bar{R}\), parametric programming exploits the fact that the optimal solution changes smoothly — in fact, piecewise linearly — as \(\bar{R}\) varies. This is the basis of Markowitz’s critical line algorithm, one of the most elegant contributions to computational finance.
6.1 Parametric Quadratic Programming
Consider Formulation I as \(\bar{R}\) varies continuously from \(-\infty\) to \(+\infty\). At any optimal solution, the active set of inequality constraints (when present, e.g., \(w \geq 0\)) is constant over intervals of \(\bar{R}\) called parameter intervals or segments. At the boundary between two segments, a breakpoint occurs: one asset enters or leaves the active set (i.e., one asset transitions between \(w_i = 0\) and \(w_i > 0\)).
Within each segment, the optimal weights \(w^*(\bar{R})\) are linear functions of \(\bar{R}\) (since the KKT system restricted to a fixed active set is linear in \(\bar{R}\)). Thus the entire efficient frontier is traced by solving a sequence of equality-constrained QPs, updating the active set at each breakpoint.
6.2 Connection to the Simplex Method
The critical line algorithm has a structural parallel with the simplex method in linear programming. In LP, the simplex method moves along the edges of the feasible polytope from vertex to vertex, with each step changing one variable from basic to nonbasic (or vice versa). In parametric QP, the algorithm moves along edges of the optimal face as the parameter \(\bar{R}\) changes, with each step adding or removing one asset from the active set. The piecewise-linear structure of the efficient frontier in weight space mirrors the piecewise-linear structure of the LP optimal solution path under parametric right-hand-side variation.
6.3 Practical Computation of the Efficient Frontier
In MATLAB, the efficient frontier can be computed using the quadprog function for each discretised value of \(\bar{R}\), or more elegantly via a dedicated implementation of the critical line algorithm. The key steps are:
- Initialise at the maximum-return portfolio (which has only one non-zero weight, the asset with the highest \(\mu_i\)).
- Compute the direction of change \(dw/d\bar{R}\) from the current active set KKT system.
- Find the next breakpoint by checking which inactive asset first becomes attractive (enters) or which active asset first reaches its bound (exits).
- Update the active set and repeat.
The output is a sequence of \((w^*, \sigma_p^*, \mu_p^*)\) triples at each breakpoint, from which the frontier hyperbola can be drawn by interpolation.
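When no inequality constraints are present (short sales allowed), the frontier has a single segment and can be traced in closed form, which makes a useful sanity check for a critical line implementation. A Python sketch with an illustrative diagonal \(\Sigma\), using the standard two-constraint identities \(a = e^T\Sigma^{-1}e\), \(b = e^T\Sigma^{-1}\mu\), \(c = \mu^T\Sigma^{-1}\mu\):

```python
# Equality-constrained frontier (no short-sale ban) for a diagonal Sigma.
# Minimum variance at target return R is  (a R^2 - 2 b R + c) / (a c - b^2).
mu  = [0.10, 0.12, 0.07]
var = [0.04, 0.09, 0.02]                # diagonal of Sigma (asset variances)

a = sum(1.0/v for v in var)             # e' Sigma^{-1} e
b = sum(m/v for m, v in zip(mu, var))   # e' Sigma^{-1} mu
c = sum(m*m/v for m, v in zip(mu, var)) # mu' Sigma^{-1} mu

def frontier_var(R):
    return (a*R*R - 2.0*b*R + c) / (a*c - b*b)

# Global minimum-variance portfolio: R = b/a, variance 1/a.
R_gmv = b / a
assert abs(frontier_var(R_gmv) - 1.0/a) < 1e-12

# Variance increases as the target return moves above the GMVP return:
assert frontier_var(R_gmv + 0.02) > frontier_var(R_gmv + 0.01) > frontier_var(R_gmv)
```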
6.4 Sensitivity Analysis and Shadow Prices
A powerful by-product of the parametric active set method is a complete sensitivity analysis of the portfolio to changes in inputs. The Lagrange multipliers from the KKT system at each breakpoint are the shadow prices of the constraints:
The multiplier \(\gamma^*\) for the return constraint \(\mu^Tw = \bar{R}\) measures the marginal cost (in variance) of increasing the target return by one unit. Formally, \(\partial \sigma_p^{*2}/\partial \bar{R} = \gamma^*\). A large \(\gamma^*\) means the investor is paying dearly in variance for a small increment in expected return — a signal that the target return is near the top of the efficient frontier.
The multiplier for the budget constraint \(e^Tw = 1\) similarly measures the marginal value of relaxing the fully-invested constraint (e.g., allowing some cash holding).
The multipliers \(\mu_i^*\) for the non-negativity constraints \(w_i \geq 0\) are zero whenever asset \(i\) is held with positive weight (by complementary slackness) and non-negative when the constraint binds (\(w_i^* = 0\)). A large \(\mu_i^*\) indicates that asset \(i\) would be sold short — and the portfolio substantially improved — if short selling were permitted; it is the “shadow cost” of the short-selling ban.
6.5 Warm-Starting and Reoptimisation
A key practical advantage of the active set method is that it can be warm-started: if the optimal active set for yesterday’s portfolio problem is known, it can be used as the initial working set for today’s problem (which differs only in updated \(\mu\) or \(\Sigma\)). Under mild perturbations, the optimal active set is unchanged, and the algorithm converges in zero or one pivot. This makes parametric active set methods dramatically faster than interior-point methods in rolling-window rebalancing applications, where the optimisation problem changes smoothly from day to day.
In institutional practice, a fund with 500 assets might rebalance daily. Running a full interior-point optimisation from scratch each day takes several seconds; warm-starting the active set method from the previous day’s solution takes milliseconds — a speedup of several orders of magnitude that matters enormously in production trading systems with strict latency requirements.
Chapter 7: Risk Measures — VaR and CVaR
Mean-variance analysis measures risk by the variance of portfolio returns. While elegant and computationally tractable, variance treats upside and downside deviations symmetrically — a property that fails to capture the investor’s primary concern, which is the risk of catastrophic loss. This chapter introduces two downside risk measures — Value at Risk and Conditional Value at Risk — that better reflect tail risk and are ubiquitous in regulatory and institutional finance.
7.1 Value at Risk
Let \(L\) denote the portfolio loss over a fixed horizon. For a confidence level \(\alpha \in (0,1)\), the Value at Risk is the \(\alpha\)-quantile of the loss distribution:
\[ \mathrm{VaR}_\alpha(L) = \inf\{\ell \in \mathbb{R} : \mathbb{P}(L \leq \ell) \geq \alpha\}. \]Intuitively, \(\mathrm{VaR}_{0.95} = 2\%\) means that with 95% probability, the portfolio does not lose more than 2% of its value over the period. VaR is the industry-standard risk metric mandated by the Basel Accords for bank capital requirements.
Despite its popularity, VaR has a critical theoretical flaw: it is not subadditive. That is, \(\mathrm{VaR}(L_1 + L_2)\) can exceed \(\mathrm{VaR}(L_1) + \mathrm{VaR}(L_2)\), meaning that VaR can penalise diversification rather than reward it. This violates the axiomatic definition of a coherent risk measure (Artzner et al., 1999), which requires translation invariance, positive homogeneity, monotonicity, and subadditivity.
7.2 Conditional Value at Risk
Conditional Value at Risk (also called expected shortfall) is the expected loss given that the loss exceeds the VaR threshold: for a continuous loss distribution,
\[ \mathrm{CVaR}_\alpha(L) = \mathbb{E}\bigl[L \mid L \geq \mathrm{VaR}_\alpha(L)\bigr]. \]CVaR captures the severity of losses in the tail, not just the threshold. It is always at least as large as \(\mathrm{VaR}_\alpha\), and it is a coherent risk measure — subadditivity means that CVaR rewards diversification, making it theoretically superior for portfolio optimisation.
7.3 Rockafellar–Uryasev Formula and LP Reformulation
The key breakthrough for portfolio CVaR minimisation is due to Rockafellar and Uryasev (2000). They showed that CVaR admits the variational representation
\[ \mathrm{CVaR}_\alpha(w) = \min_{\gamma \in \mathbb{R}} \left\{ \gamma + \frac{1}{1-\alpha}\, \mathbb{E}\bigl[\max(L(w,\xi) - \gamma,\, 0)\bigr] \right\}, \]where \(\xi\) denotes the random scenario.
This is remarkable because it converts a quantile-based risk measure into a convex optimisation problem: the objective is jointly convex in \((w, \gamma)\) whenever the loss \(L(w, \xi)\) is convex in \(w\). The minimiser in \(\gamma\) is precisely \(\mathrm{VaR}_\alpha(w)\), so the formula simultaneously computes VaR and CVaR.
Now suppose we have \(T\) historical (or simulated) scenarios \(\xi^1, \ldots, \xi^T\), giving portfolio losses \(L_t = L(w, \xi^t)\). The sample CVaR is
\[ \widehat{\mathrm{CVaR}}_\alpha(w) = \min_{\gamma} \left\{ \gamma + \frac{1}{(1-\alpha)T} \sum_{t=1}^T \max(L_t - \gamma, 0) \right\}. \]Introducing auxiliary variables \(z_t \geq 0\) with \(z_t \geq L_t - \gamma\), this becomes:
\[ \min_{w, \gamma, z} \quad \gamma + \frac{1}{(1-\alpha)T} \sum_{t=1}^T z_t \]\[ \text{subject to} \quad z_t \geq -r^t{}^T w - \gamma, \quad z_t \geq 0, \quad e^T w = 1, \]where \(r^t\) is the return vector under scenario \(t\) (so \(L_t = -r^t{}^T w\)). This is a linear programme with \(T + n + 1\) variables and \(2T + 1\) constraints — directly solvable by any LP solver. For \(T = 1000\) scenarios and \(n = 100\) assets, this is a modest LP easily handled in MATLAB.
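The discrete Rockafellar–Uryasev formula can be checked without an LP solver by fixing \(w\) (so the losses \(L_t\) are numbers) and minimising over \(\gamma\) directly: the minimum equals the average of the worst \((1-\alpha)T\) losses. A Python sketch with made-up scenario losses:

```python
# Discrete Rockafellar-Uryasev check: for scenario losses L_1..L_T the sample
# CVaR  min_gamma { gamma + sum(max(L_t - gamma, 0)) / ((1-alpha) T) }  equals
# the average of the worst (1-alpha)T losses.  Illustrative data, alpha = 0.9.
alpha = 0.90
losses = [-0.02, 0.01, -0.01, 0.03, 0.05, -0.03, 0.02, 0.00,
          0.04, -0.02, 0.06, 0.01, -0.01, 0.02, 0.08, 0.00,
          0.03, -0.04, 0.01, 0.10]           # T = 20 scenario losses
T = len(losses)

def ru_objective(gamma):
    return gamma + sum(max(L - gamma, 0.0) for L in losses) / ((1 - alpha) * T)

# The minimiser can be taken at one of the sample points (the empirical VaR).
cvar = min(ru_objective(g) for g in losses)

# Direct tail average: mean of the worst (1-alpha)T = 2 losses.
worst = sorted(losses, reverse=True)[:2]     # [0.10, 0.08]
assert abs(cvar - sum(worst)/len(worst)) < 1e-9
print(cvar)  # ~0.09
```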
The CVaR objective can also be combined with a return target or other constraints, giving a complete LP-based portfolio optimisation framework that is arguably more robust than mean-variance when the return distribution has heavy tails or is asymmetric.
7.4 Analytical VaR and CVaR for a Normal Portfolio
When portfolio returns are normally distributed, VaR and CVaR admit closed-form expressions. Let the portfolio return \(r_p \sim \mathcal{N}(\mu_p, \sigma_p^2)\), so the loss \(L = -r_p \sim \mathcal{N}(-\mu_p, \sigma_p^2)\).
At \(\alpha = 0.95\), the standard normal quantile is \(z_{0.95} = 1.645\) and the density value is \(\phi(1.645) \approx 0.1031\).
\[ \mathrm{VaR}_{0.95} = -\mu_p + z_{0.95}\sigma_p = -0.10 + 1.645\times0.20 = -0.10 + 0.329 = 0.229. \]So there is a 5% chance of losing more than 22.9% of portfolio value.
\[ \mathrm{CVaR}_{0.95} = -\mu_p + \sigma_p\frac{\phi(z_{0.95})}{1-\alpha} = -0.10 + 0.20\times\frac{0.1031}{0.05} = -0.10 + 0.20\times2.062 = -0.10 + 0.4124 = 0.3124. \]Conditional on being in the worst 5% of outcomes, the expected loss is 31.2% — substantially larger than the VaR threshold of 22.9%. This gap \(\mathrm{CVaR}_{0.95} - \mathrm{VaR}_{0.95} = 0.0834\) reflects the weight of the tail beyond the 95th percentile.
Remark on units. These are annualised fractional losses. To convert to a daily VaR under the i.i.d. assumption, scale \(\mu_p\) by \(1/252\) and \(\sigma_p\) by \(1/\sqrt{252}\) before applying the formulae.
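The worked example above can be reproduced in a few lines of Python (the normal density is computed directly; the 95% quantile 1.645 is taken as given):

```python
import math

# Normal VaR/CVaR example: r_p ~ N(0.10, 0.20^2), alpha = 0.95.
mu_p, sigma_p, alpha = 0.10, 0.20, 0.95
z = 1.645                                         # standard normal 95% quantile
phi = math.exp(-0.5*z*z) / math.sqrt(2*math.pi)   # density at z, ~0.1031

var95  = -mu_p + z*sigma_p                        # VaR_0.95
cvar95 = -mu_p + sigma_p*phi/(1 - alpha)          # CVaR_0.95

assert abs(var95 - 0.229) < 1e-3                  # ~22.9% loss threshold
assert abs(cvar95 - 0.3124) < 1e-3                # ~31.2% expected tail loss
```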
- CVaR is convex in \(w\). For any \(\lambda \in [0,1]\) and portfolios \(w_1, w_2\): \[\mathrm{CVaR}_\alpha(\lambda w_1 + (1-\lambda)w_2) \leq \lambda\,\mathrm{CVaR}_\alpha(w_1) + (1-\lambda)\,\mathrm{CVaR}_\alpha(w_2).\] Combined with positive homogeneity, convexity yields subadditivity: diversification never increases CVaR.
- CVaR is monotone, positively homogeneous, and translation-invariant, completing its status as a coherent risk measure in the sense of Artzner–Delbaen–Eber–Heath (1999).
- VaR is not subadditive in general. There exist portfolios \(w_1, w_2\) and a loss distribution such that \(\mathrm{VaR}_\alpha(w_1+w_2) > \mathrm{VaR}_\alpha(w_1) + \mathrm{VaR}_\alpha(w_2)\).
7.5 Factor Models and Structured Covariance
Factor models reduce the dimensionality of portfolio optimisation by decomposing asset covariances into a small number of common factors plus idiosyncratic noise. This section connects the factor model structure of Chapter 10.3 to the efficient frontier computation.
The Single-Factor (CAPM) Model
The simplest factor model is the single-index (CAPM) model, with one systematic factor — the market:
\[ r_i = \alpha_i + \beta_i r_M + \varepsilon_i, \quad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \text{ for } i \neq j. \]This implies
\[ \Sigma = \beta\beta^T\sigma_M^2 + D, \]where \(\beta = (\beta_1, \ldots, \beta_n)^T\) is the vector of market betas and \(D = \mathrm{diag}(\sigma_{\varepsilon_1}^2, \ldots, \sigma_{\varepsilon_n}^2)\) collects the idiosyncratic variances. This is exactly the Woodbury form \(\Sigma = BFB^T + D\) with \(B = \beta\) and \(F = \sigma_M^2\).
Consequences for efficient frontier computation. Under this factor structure:
- \(\Sigma^{-1}\) can be computed in \(O(n)\) time (since \(k=1\)) via the Woodbury formula, reducing the cost from the standard \(O(n^3)\) to essentially \(O(n)\).
- The global minimum-variance direction is \[ \Sigma^{-1}e = D^{-1}e - \frac{(D^{-1}\beta)(\beta^TD^{-1}e)}{\sigma_M^{-2} + \beta^TD^{-1}\beta}, \]a rank-1 update of the simple diagonal solution. In component form: the GMVP weight of asset \(i\) is approximately \(1/\sigma_{\varepsilon_i}^2\) (normalised), adjusted for the beta loading. Assets with small idiosyncratic variance receive larger GMVP weights.
The entire efficient frontier can be computed without ever forming or storing the full \(n\times n\) matrix \(\Sigma\) — only the \(n\)-dimensional vectors \(\beta, D, e, \mu\) and the scalar \(\sigma_M^2\) are needed. This enables portfolio optimisation at scales of tens of thousands of assets in real time.
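The rank-1 GMVP computation can be sketched directly. The Python example below uses illustrative betas and idiosyncratic variances, computes \(\Sigma^{-1}e\) without ever forming \(\Sigma\), and then (for the small test case only) forms \(\Sigma\) explicitly to verify the result:

```python
# Compute Sigma^{-1} e for the single-factor model without forming Sigma.
beta  = [0.8, 1.1, 0.9, 1.3]             # market betas (illustrative)
idio  = [0.04, 0.03, 0.05, 0.06]         # idiosyncratic variances (diag of D)
sigM2 = 0.02                             # market variance sigma_M^2
n = len(beta)

Dinv_e = [1.0/d for d in idio]           # D^{-1} e
Dinv_b = [b/d for b, d in zip(beta, idio)]               # D^{-1} beta
denom  = 1.0/sigM2 + sum(b*db for b, db in zip(beta, Dinv_b))
coef   = sum(Dinv_b)                     # beta^T D^{-1} e

x = [de - db*coef/denom for de, db in zip(Dinv_e, Dinv_b)]   # Sigma^{-1} e

# Verification (small n only): form Sigma = sigM2*beta beta^T + D, check Sigma x = e.
for i in range(n):
    row = sum((sigM2*beta[i]*beta[j] + (idio[i] if i == j else 0.0))*x[j]
              for j in range(n))
    assert abs(row - 1.0) < 1e-9

w_gmv = [xi/sum(x) for xi in x]          # normalised GMVP weights
assert abs(sum(w_gmv) - 1.0) < 1e-12
```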
Chapter 8: Transaction Costs and Rebalancing
Portfolio theory as developed in Chapters 2–7 treats portfolio selection as a static, one-shot problem. In practice, portfolios must be periodically rebalanced as asset prices move, new capital flows in or out, or the investor’s risk preferences change. Rebalancing incurs transaction costs — bid-ask spreads, market impact, commissions, and taxes — that must be accounted for in the optimisation. Ignoring transaction costs can lead to excessive turnover and poor out-of-sample performance.
8.1 Single-Period Rebalancing with L1 Transaction Costs
Suppose the investor currently holds weights \(w_0 \in \mathbb{R}^n\) and wishes to move to new weights \(w\) (with \(e^T w = 1\)). The vector of trades is \(\Delta w = w - w_0\). Under proportional transaction costs (a fraction \(\kappa\) of the value traded), the total cost is \(\kappa \|w - w_0\|_1 = \kappa \sum_i |w_i - (w_0)_i|\).
The rebalancing problem is:
\[ \min_w \; w^T \Sigma w - \mu^T w + \kappa \|w - w_0\|_1 \quad \text{subject to} \quad e^T w = 1. \]The \(\ell_1\) norm is convex but non-differentiable. The standard linearisation introduces auxiliary variables \(u_i^+, u_i^- \geq 0\) via the substitution \(w_i - (w_0)_i = u_i^+ - u_i^-\):
\[ \|w - w_0\|_1 = \sum_i (u_i^+ + u_i^-), \]subject to \(w_i = (w_0)_i + u_i^+ - u_i^-\). This converts the problem into a standard QP (with objective \(w^T \Sigma w - \mu^T w + \kappa \sum_i (u_i^+ + u_i^-)\), linear constraints, and non-negativity on \(u_i^\pm\)). MATLAB’s quadprog handles this directly.
8.2 Turnover Constraints
Rather than penalising transaction costs in the objective, one can impose a hard turnover constraint:
\[ \frac{1}{2}\|w - w_0\|_1 \leq \tau, \]where \(\tau > 0\) is the maximum fraction of the portfolio that may be traded. Using the same \(u^\pm\) substitution, this becomes a linear constraint \(\sum_i (u_i^+ + u_i^-) \leq 2\tau\) added to the QP. In institutional investment management, turnover constraints are ubiquitous: trading desks impose them to limit market impact and operational risk.
8.3 Multi-Period Rebalancing and Dynamic Programming
If the investor rebalances at times \(t = 0, 1, \ldots, T-1\) with known return distributions, the multi-period problem is a stochastic dynamic programme. Under the mean-variance criterion, the Bellman equation at time \(t\) with current wealth \(W_t\) and portfolio weights \(w_t\) is:
\[ V_t(W_t, w_t) = \max_{w_{t+1}} \left\{ \mu^T w_{t+1} \cdot W_t - \frac{\lambda}{2} w_{t+1}^T \Sigma w_{t+1} \cdot W_t^2 - \kappa \|w_{t+1} - w_t\|_1 \cdot W_t + \mathbb{E}\bigl[V_{t+1}(W_{t+1}, w_{t+1})\bigr] \right\}. \]For general distributions, this must be solved numerically (by discretising the state space or by Monte Carlo methods). Under special assumptions (e.g., i.i.d. returns, quadratic utility), closed-form solutions exist and reveal that the optimal policy is to rebalance toward a “target portfolio” that lies between the static efficient portfolio and the current holdings, with the degree of rebalancing decreasing as transaction costs increase.
8.4 Tax-Loss Harvesting
A secondary consideration in taxable accounts is tax-loss harvesting: realising capital losses to offset taxable gains elsewhere. An asset with an embedded loss (current price below purchase price) has a “tax benefit” from being sold. This can be incorporated into the rebalancing objective as a negative transaction cost for loss-generating trades. The resulting optimisation is a QP with asymmetric cost terms (\(u_i^+\) and \(u_i^-\) have different coefficients), which is handled naturally within the \(u^\pm\) reformulation.
Chapter 9: Nonlinear Programming
The portfolio problems considered so far are quadratic or linear programmes — a special, computationally tractable class. Several important portfolio objectives and constraints lead to nonlinear programmes (NLPs) that require more general algorithms. This chapter surveys the theoretical foundations and two major algorithmic families: sequential quadratic programming and interior-point methods.
9.1 General NLP and KKT Conditions
The general NLP is:
\[ \min_{x \in \mathbb{R}^n} \; f(x) \quad \text{subject to} \quad g_i(x) \leq 0, \; i = 1,\ldots,m, \quad h_j(x) = 0, \; j = 1,\ldots,p. \]We assume \(f\), \(g_i\), and \(h_j\) are smooth (\(C^2\)). The KKT conditions are necessary for local optimality under a constraint qualification (e.g., LICQ — linear independence of the active constraint gradients):
\[ \nabla f(x^*) + \sum_i \mu_i \nabla g_i(x^*) + \sum_j \lambda_j \nabla h_j(x^*) = 0, \]\[ \mu_i \geq 0, \quad \mu_i g_i(x^*) = 0 \quad \forall i, \quad h_j(x^*) = 0 \quad \forall j. \]The second-order sufficient conditions require that the reduced Hessian of the Lagrangian is positive definite on the tangent space of the active constraints. When these hold, \(x^*\) is a strict local minimum.
9.2 Sequential Quadratic Programming
Sequential Quadratic Programming (SQP) is the state-of-the-art algorithm for smooth NLPs. The idea is to approximate the NLP locally by a QP at each iterate and solve the QP to get a search direction.
Given current iterate \(x_k\), the QP subproblem is:
\[ \min_{d \in \mathbb{R}^n} \; \frac{1}{2} d^T W_k d + \nabla f(x_k)^T d \quad \text{subject to} \quad g_i(x_k) + \nabla g_i(x_k)^T d \leq 0, \quad h_j(x_k) + \nabla h_j(x_k)^T d = 0, \]where \(W_k\) is an approximation to the Hessian of the Lagrangian \(\nabla^2_{xx} \mathcal{L}(x_k, \lambda_k, \mu_k)\). The QP solution \(d_k\) gives the search direction; a line search along \(d_k\) ensures sufficient decrease in a merit function. The multipliers are updated from the QP solution. Under regularity conditions, SQP converges superlinearly (or even quadratically with exact Hessians).
SQP applies directly to portfolio problems with nonlinear objectives such as Sharpe ratio maximisation. The Sharpe ratio \(S(w) = (\mu^T w - r_f) / \sqrt{w^T \Sigma w}\) is a smooth but non-convex function of \(w\) on the feasible set, with a dense Hessian; the risk parity problem (equal risk contribution) likewise leads to a non-convex NLP. SQP handles both.
9.3 Interior-Point Methods
Interior-point methods (also called barrier methods) convert inequality constraints into the objective via a logarithmic barrier:
\[ \min_x \; f(x) - \mu \sum_i \ln(-g_i(x)) \quad \text{subject to} \quad h(x) = 0, \]where \(\mu > 0\) is the barrier parameter. As \(\mu \to 0\), the minimiser of the barrier problem converges to the solution of the original NLP. For each fixed \(\mu\), Newton’s method (augmented to handle equality constraints via the KKT system) is applied to the barrier subproblem, requiring only a few Newton steps before decreasing \(\mu\).
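The barrier mechanism is easy to see on a one-dimensional toy problem. The Python sketch below minimises \(f(x) = x\) subject to \(x \geq 1\) (i.e. \(g(x) = 1 - x \leq 0\)); the barrier subproblem \(\min_x\, x - \mu\ln(x-1)\) has exact minimiser \(x = 1 + \mu\), which a damped Newton iteration recovers for each \(\mu\) on a decreasing schedule:

```python
# Barrier-method sketch:  min x  s.t.  x >= 1.
# Barrier subproblem: min  x - mu*ln(x - 1),  exact minimiser x = 1 + mu.
def newton_barrier(mu, x0, iters=50):
    x = x0
    for _ in range(iters):
        grad = 1.0 - mu/(x - 1.0)        # phi'(x)
        hess = mu/(x - 1.0)**2           # phi''(x) > 0 on the domain x > 1
        step = grad/hess
        # Damp the Newton step to stay strictly inside the domain.
        while x - step <= 1.0:
            step *= 0.5
        x -= step
    return x

x = 2.0                                  # strictly feasible start
for mu in [1.0, 0.1, 0.01, 0.001]:
    x = newton_barrier(mu, x)            # warm-start from the previous mu
    assert abs(x - (1.0 + mu)) < 1e-8    # matches the analytic minimiser

assert abs(x - 1.0) < 2e-3               # approaches the constrained solution
```

Note how each barrier solve warm-starts the next one — the "few Newton steps per \(\mu\)" behaviour described above.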
Interior-point methods have several advantages: they are efficient for large, dense NLPs; they do not require an active-set strategy; and their per-iteration cost is dominated by solving one large linear system (typically factored by sparse Cholesky). The MATLAB fmincon function implements both SQP and interior-point methods and is the standard tool for nonlinear portfolio optimisation in this course.
9.4 Portfolio Applications
The tracking error objective \(\min_w (w - w_b)^T \Sigma (w - w_b)\) (where \(w_b\) is a benchmark portfolio) is a QP, but tracking error subject to a Sharpe ratio constraint is nonlinear. The omega ratio (probability-weighted ratio of gains to losses relative to a threshold) is another nonlinear performance metric that can be optimised using SQP. Sharpe ratio maximisation with additional risk constraints (e.g., maximum drawdown, skewness constraints) leads to NLPs that are best handled by interior-point methods for large universes.
Chapter 10: Advanced Topics
The final chapter surveys several advanced topics in portfolio management, connecting the optimisation framework developed in earlier chapters to broader questions in financial theory, statistics, and machine learning. These topics reflect the current research frontier and are increasingly important in institutional practice.
10.1 No-Arbitrage and Risk-Neutral Pricing
A fundamental concept in modern finance is no-arbitrage: the assumption that there exists no portfolio that yields a non-negative return in all states of the world and a strictly positive return in at least one state, with zero initial cost. The first and second fundamental theorems of asset pricing establish that (under mild regularity conditions) absence of arbitrage is equivalent to the existence of a risk-neutral probability measure \(\mathbb{Q}\) under which all discounted asset prices are martingales. While the full machinery of risk-neutral pricing belongs to a course in mathematical finance, the no-arbitrage condition places structural constraints on the expected return vector \(\mu\) and covariance matrix \(\Sigma\) that inform portfolio construction.
10.2 Universal Portfolio
Cover (1991) introduced the universal portfolio algorithm, a remarkable on-line investment strategy that requires no statistical model of returns. The algorithm maintains a distribution over all constant-rebalanced portfolios (CRPs) and at each period invests according to the wealth-weighted average of all CRPs. It achieves the asymptotic wealth growth rate of the best CRP in hindsight:
\[ \frac{1}{T} \ln \frac{W_T^{\mathrm{universal}}}{W_T^{\mathrm{best CRP}}} \to 0 \quad \text{as } T \to \infty. \]This is a minimax regret guarantee that holds for any sequence of price relatives, without probabilistic assumptions. The universal portfolio is a beautiful application of information-theoretic ideas to portfolio management and connects to the Kelly criterion for log-optimal investment.
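A small simulation makes the construction concrete. With a uniform prior over two-asset CRPs, the universal portfolio's final wealth equals the average of the CRP final wealths, so it is sandwiched between the worst and best CRP in hindsight. A Python sketch with an illustrative price-relative sequence:

```python
# Cover's universal portfolio for 2 assets over a CRP grid.  Uniform prior:
# universal final wealth = average of CRP final wealths.
# Illustrative price relatives (asset 1, asset 2) per period:
relatives = [(1.05, 0.98), (0.97, 1.04), (1.10, 0.95),
             (0.94, 1.06), (1.03, 1.01), (0.99, 1.07)]

K = 101                                   # CRP grid b = 0, 0.01, ..., 1
grid = [k/(K-1) for k in range(K)]

def crp_wealth(b):
    w = 1.0
    for x1, x2 in relatives:
        w *= b*x1 + (1.0 - b)*x2          # constant-rebalanced mix
    return w

wealths = [crp_wealth(b) for b in grid]
universal = sum(wealths)/K                # uniform-prior universal wealth

# Sandwiched between worst and best CRP in hindsight:
assert min(wealths) <= universal <= max(wealths)
```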
10.3 Factor Models and Structured Covariance
In practice, estimating the full \(n \times n\) covariance matrix \(\Sigma\) from historical data requires \(O(n^2)\) parameters, but with \(T < n\) observations the sample covariance matrix is singular. Factor models address this by positing that returns are driven by a small number \(k \ll n\) of common factors:
\[ r = \mu + B f + \epsilon, \]where \(f \in \mathbb{R}^k\) are the factors (with mean zero and covariance \(F\)), \(B \in \mathbb{R}^{n \times k}\) is the factor loading matrix, and \(\epsilon \in \mathbb{R}^n\) is idiosyncratic noise with \(\mathrm{Cov}(\epsilon) = D\) diagonal. The implied covariance matrix is \(\Sigma = B F B^T + D\), which has only \(nk + n + k(k+1)/2\) free parameters — far fewer than \(n(n+1)/2\) for the full matrix.
The CAPM (\(k=1\)) and Fama–French 3-factor model (\(k=3\)) are the canonical examples. The factor model structure also implies a conditional-uncorrelatedness property: given the factor realisations \(f\), the asset returns \(r_i\) and \(r_j\) are uncorrelated (and independent under joint normality). This is a strong but empirically useful constraint that improves covariance estimation dramatically.
10.4 Risk Parity
Risk parity (also called equal risk contribution) portfolios allocate capital so that each asset contributes equally to total portfolio variance. The risk contribution of asset \(i\) is defined as
\[ RC_i(w) = w_i \cdot \frac{\partial \sigma_p}{\partial w_i} = w_i \cdot \frac{(\Sigma w)_i}{\sqrt{w^T \Sigma w}}. \]Risk parity requires \(RC_i(w) = \sigma_p / n\) for all \(i\), which after simplification gives the nonlinear system:
\[ w_i (\Sigma w)_i = w_j (\Sigma w)_j \quad \forall i, j. \]This system is non-convex and cannot be reduced to a QP in general. SQP or alternating projections are used to solve it. In the special case where all pairwise correlations are equal (and all variances are equal), the risk parity portfolio reduces to the equally-weighted portfolio \(w = e/n\). Risk parity portfolios became popular after the 2008 financial crisis because they tend to be less sensitive to equity drawdowns than mean-variance efficient portfolios.
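For two assets the risk parity system has a closed form: the inverse-volatility weights \(w_i \propto 1/\sigma_i\) equalise the risk contributions for any correlation \(\rho\) (for \(n > 2\) a numerical method is needed, as noted above). A quick Python check with illustrative data:

```python
# Two-asset risk parity in closed form: w_i proportional to 1/sigma_i
# equalises the risk contributions for ANY correlation rho.
s1, s2, rho = 0.20, 0.10, 0.35            # vols and correlation (illustrative)
Sigma = [[s1*s1, rho*s1*s2],
         [rho*s1*s2, s2*s2]]

w = [(1/s1)/(1/s1 + 1/s2),
     (1/s2)/(1/s1 + 1/s2)]                # inverse-volatility weights

Sw = [sum(Sigma[i][j]*w[j] for j in range(2)) for i in range(2)]
rc = [w[i]*Sw[i] for i in range(2)]       # (unnormalised) risk contributions

assert abs(rc[0] - rc[1]) < 1e-12         # equal risk contribution
```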
10.5 Machine Learning in Portfolio Optimization
Machine learning intersects portfolio optimization at several levels:
Regularised covariance estimation: The Ledoit–Wolf shrinkage estimator improves on the sample covariance matrix by shrinking it toward a structured target (e.g., a diagonal matrix or the identity):
\[ \hat{\Sigma}_{\mathrm{LW}} = (1 - \rho) \hat{\Sigma}_{\mathrm{sample}} + \rho \hat{\Sigma}_{\mathrm{target}}. \]The shrinkage coefficient \(\rho\) is chosen optimally (in a mean-squared-error sense) by an analytical formula derived from random matrix theory. Ledoit–Wolf estimation significantly improves the out-of-sample performance of mean-variance portfolios, especially when \(n\) is large relative to \(T\).
Deep learning for return prediction: Recurrent neural networks (LSTMs, Transformers) trained on financial time series can forecast expected returns \(\hat{\mu}\) more accurately than linear models in some regimes. However, the estimation uncertainty in \(\hat{\mu}\) propagates into the portfolio weights, and robust or Bayesian portfolio formulations are needed to avoid over-fitting.
Reinforcement learning for dynamic allocation: The portfolio rebalancing problem of Chapter 8 can be framed as a Markov decision process (MDP) and solved by reinforcement learning (RL) algorithms. The state is the current portfolio and market features; the action is the rebalancing decision; the reward is the risk-adjusted return net of transaction costs. Deep RL (e.g., Proximal Policy Optimization) has shown promise in multi-period portfolio problems where the dynamics are complex and the horizon is long.
These machine learning approaches are not replacements for the optimization models developed in this course — rather, they are tools for estimating the inputs (\(\mu\), \(\Sigma\)) and solving the resulting optimization problems at scale. A rigorous grounding in quadratic programming, KKT theory, and numerical linear algebra remains essential for understanding and deploying these methods reliably.
10.6 Robust Portfolio Optimisation
A recurring criticism of mean-variance optimisation is that the optimal weights are extremely sensitive to small errors in the expected return vector \(\mu\). Indeed, the solution \(w^* = \Sigma^{-1}(\gamma\mu + \delta e)/2\) is a linear function of \(\mu\), but with coefficient matrix \(\Sigma^{-1}\) whose condition number can be very large — small perturbations to \(\mu\) are amplified by \(\kappa(\Sigma)\) in the weights. This is the fundamental instability of Markowitz optimisation in practice.
Robust optimisation addresses this by explicitly modelling uncertainty in the inputs. Suppose \(\mu\) is known only to lie in an uncertainty set \(\mathcal{U}\), for instance an ellipsoidal set:
\[ \mathcal{U} = \{\mu : (\mu - \hat\mu)^T\Theta^{-1}(\mu - \hat\mu) \leq \kappa^2\}, \]where \(\hat\mu\) is the estimated mean and \(\Theta\) is the estimation covariance (e.g., \(\Theta = \hat\Sigma/T\)). The robust portfolio maximises the worst-case risk-adjusted return:
\[ \max_w \min_{\mu\in\mathcal{U}} \bigl[\mu^Tw - \tfrac\lambda2 w^T\Sigma w\bigr] = \max_w \bigl[\hat\mu^Tw - \kappa\sqrt{w^T\Theta w} - \tfrac\lambda2 w^T\Sigma w\bigr]. \]The inner minimisation over \(\mu\) has a closed form (by Cauchy-Schwarz on the ellipsoid), yielding a second-order cone programme (SOCP). This is convex and can be solved efficiently; the robust solution typically has a more diversified structure than the nominal Markowitz solution, because concentrating in a single asset creates large exposure to estimation error in that asset’s expected return.
The parameter \(\kappa\) controls the degree of robustness: \(\kappa = 0\) recovers the nominal Markowitz solution, while large \(\kappa\) approaches the minimum-variance portfolio (which is robust to \(\mu\) estimation error since it does not depend on \(\mu\) at all). In practice, \(\kappa\) is calibrated so that \(\mathcal{U}\) contains the true \(\mu\) with 95% probability, using the chi-squared distribution with \(n\) degrees of freedom.
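A sketch of the robust problem using the closed-form worst-case objective above; the three-asset data and the values of \(\lambda\), \(T\), and \(\kappa\) are hypothetical. Since \(\Theta = \Sigma/T\) here, the robust solution is a uniform shrinkage of the nominal one.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs: 3 assets; Theta = Sigma / T as in the text.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
mu_hat = np.array([0.06, 0.09, 0.12])
lam, T = 4.0, 60
Theta = Sigma / T

def robust_weights(kappa):
    """Maximise  mu'w - kappa*sqrt(w'Theta w) - (lam/2) w'Sigma w."""
    neg_obj = lambda w: -(mu_hat @ w
                          - kappa * np.sqrt(w @ Theta @ w + 1e-16)
                          - 0.5 * lam * w @ Sigma @ w)
    res = minimize(neg_obj, np.full(3, 0.1), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 20000})
    return res.x

w_nominal = robust_weights(0.0)
# kappa = 0 recovers the closed-form Markowitz solution Sigma^{-1} mu / lam
print(w_nominal, np.linalg.solve(Sigma, mu_hat) / lam)
w_robust = robust_weights(2.0)
print(np.linalg.norm(w_robust), np.linalg.norm(w_nominal))  # robust weights are smaller
```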
Chapter 11: The Black–Litterman Model
The Black–Litterman (BL) model, introduced by Fischer Black and Robert Litterman at Goldman Sachs in 1990, resolves a persistent practical failure of mean-variance optimisation: the sensitivity of optimal weights to small errors in estimated expected returns. The core insight is Bayesian: rather than estimating \(\mu\) from historical data alone, the BL model starts from a prior on \(\mu\) derived from the CAPM equilibrium and updates it with the investor’s “views” — subjective forecasts about individual assets or relative performance.
11.1 Equilibrium Returns as a Prior
In the CAPM, if the market portfolio \(w_m\) is on the efficient frontier, then the equilibrium expected excess returns satisfy the pricing equation
\[ \Pi = \delta \Sigma w_m, \]where \(\delta\) is the aggregate risk-aversion coefficient (typically calibrated to historical data, often \(\delta \approx 2.5\) to 4.0) and \(\Sigma\) is the covariance matrix of excess returns. The vector \(\Pi\) provides the prior mean in the BL model. Under the view that the market is in equilibrium, \(\Pi\) is the best available unconditional estimate of \(\mu\) — it encodes the cross-sectional risk structure of the market without requiring time-series estimation of individual means, which is notoriously noisy.
11.2 Investor Views as a Likelihood
An investor’s views are \(k\) statements about the expected returns of portfolios of assets. Each view is expressed as a linear combination of expected returns:
\[ P\mu = q + \varepsilon_v, \qquad \varepsilon_v \sim \mathcal{N}(0, \Omega), \]where \(P \in \mathbb{R}^{k \times n}\) is the pick matrix (each row encodes a view portfolio), \(q \in \mathbb{R}^k\) is the vector of view forecasts, and \(\Omega = \mathrm{diag}(\omega_1, \ldots, \omega_k)\) is the diagonal view uncertainty matrix (smaller \(\omega_i\) means higher confidence in view \(i\)).
Two types of views arise frequently:
- Absolute view: “Asset \(i\) will return 5% per annum.” Here \(P\) has a single 1 in column \(i\) and \(q_i = 0.05\).
- Relative view: “Asset \(i\) will outperform asset \(j\) by 3%.” Here \(P\) has +1 in column \(i\), -1 in column \(j\), and \(q = 0.03\).
11.3 Bayesian Posterior Update
With a Gaussian prior \(\mu \sim \mathcal{N}(\Pi, \tau\Sigma)\) and Gaussian likelihood \(q \mid \mu \sim \mathcal{N}(P\mu, \Omega)\), the posterior is also Gaussian, with mean
\[ \hat\mu_{\text{BL}} = \bigl[(\tau\Sigma)^{-1} + P^T\Omega^{-1}P\bigr]^{-1}\bigl[(\tau\Sigma)^{-1}\Pi + P^T\Omega^{-1}q\bigr]. \]Equivalently, using the matrix inversion lemma,
\[ \hat\mu_{\text{BL}} = \Pi + \tau\Sigma P^T(P\tau\Sigma P^T + \Omega)^{-1}(q - P\Pi). \]The intuition is transparent: \(\hat\mu_{\text{BL}}\) is a weighted average of the prior mean \(\Pi\) and the views \(q\), with weights determined by relative precision. When all view uncertainties \(\omega_i \to \infty\), \(\hat\mu_{\text{BL}} \to \Pi\) (pure CAPM prior). When \(\tau \to 0\) (very sharp prior), the views have minimal influence. The BL posterior mean is then used in place of the historical \(\hat\mu\) to compute the Markowitz optimal portfolio.
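The posterior update can be sketched in a few lines; all numbers below (covariances, market weights, \(\tau\), \(\delta\), views) are hypothetical, and the two algebraically equivalent forms of the posterior mean are computed as a cross-check.

```python
import numpy as np

# Hypothetical three-asset example (all numbers illustrative).
Sigma = np.array([[0.0400, 0.0120, 0.0100],
                  [0.0120, 0.0900, 0.0150],
                  [0.0100, 0.0150, 0.0625]])
w_m = np.array([0.5, 0.3, 0.2])       # market-cap weights
delta, tau = 2.5, 0.05
Pi = delta * Sigma @ w_m              # equilibrium prior returns

P = np.array([[1.0, 0.0, 0.0],        # absolute view on asset 1
              [0.0, 1.0, -1.0]])      # asset 2 outperforms asset 3
q = np.array([0.10, 0.05])
Omega = np.diag([0.001, 0.002])

# Posterior mean via the matrix-inversion-lemma form
ts = tau * Sigma
mu_bl = Pi + ts @ P.T @ np.linalg.solve(P @ ts @ P.T + Omega, q - P @ Pi)

# Equivalent precision-weighted form (must agree)
A = np.linalg.inv(ts) + P.T @ np.linalg.inv(Omega) @ P
b = np.linalg.solve(ts, Pi) + P.T @ np.linalg.solve(Omega, q)
mu_bl2 = np.linalg.solve(A, b)
print(mu_bl, mu_bl2)
```

The confident absolute view pulls asset 1's posterior mean up from its equilibrium value toward the 10% forecast.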
11.4 Numerical Example
Step 2: Investor views. View 1 (absolute): “Asset 1 returns 10%”, so row 1 of \(P\) is \((1,0,0)\), \(q_1 = 0.10\). View 2 (relative): “Asset 2 outperforms Asset 3 by 5%”, so row 2 of \(P\) is \((0,1,-1)\), \(q_2 = 0.05\). Confidence: \(\Omega = \mathrm{diag}(0.001, 0.002)\).
Step 3: Posterior mean. The correction term \(q - P\Pi = (0.10 - 0.078,\; 0.05 - (0.150-0.066))^T = (0.022,\; -0.034)^T\). The posterior mean blends the equilibrium prior toward view 1 (raising asset 1’s expected return toward 10%) and away from view 2’s implied difference (0.084 vs. the viewed 0.05).
Step 4: Markowitz optimisation on \(\hat\mu_{\text{BL}}\). Use \(\hat\mu_{\text{BL}}\) in place of \(\hat\mu\) in the standard portfolio optimisation. The resulting weights are significantly more balanced than naive Markowitz on \(\hat\mu\) and less sensitive to small changes in the views.
Chapter 12: Performance Attribution
Portfolio performance attribution is the analytical framework for decomposing a portfolio’s return relative to a benchmark into components attributable to active decisions: asset allocation, security selection, and interaction effects. The Brinson–Hood–Beebower (BHB) model (1986) is the canonical framework.
12.1 The BHB Attribution Model
Divide the investment universe into \(K\) sectors (e.g., equities, fixed income, real estate). Let:
- \(w_{b,k}\): benchmark weight in sector \(k\)
- \(w_{p,k}\): portfolio weight in sector \(k\)
- \(R_{b,k}\): benchmark return for sector \(k\)
- \(R_{p,k}\): portfolio return for sector \(k\)
- \(R_b = \sum_k w_{b,k} R_{b,k}\): total benchmark return
The total active return is \(R_p - R_b = \sum_k (w_{p,k}R_{p,k} - w_{b,k}R_{b,k})\). BHB decomposes this as:
- Allocation effect: \(\sum_k (w_{p,k} - w_{b,k}) R_{b,k}\) — profit from overweighting sectors that outperformed (tilting \(w_p\) away from \(w_b\) toward sectors with \(R_{b,k} > R_b\)).
- Selection effect: \(\sum_k w_{b,k}(R_{p,k} - R_{b,k})\) — profit from picking securities within a sector better than the benchmark.
- Interaction effect: \(\sum_k (w_{p,k} - w_{b,k})(R_{p,k} - R_{b,k})\) — combined effect of both allocation and selection (often small in practice and sometimes merged with selection).
Worked example (two sectors): benchmark weights 60%/40% with sector returns 8% and 3%; portfolio weights 70%/30% with sector returns 9% and 2.5%. Benchmark return: \(R_b = 0.6\times8\% + 0.4\times3\% = 6\%\). Portfolio return: \(R_p = 0.7\times9\% + 0.3\times2.5\% = 7.05\%\). Active return: \(1.05\%\).
Allocation effect: \((0.7-0.6)\times8\% + (0.3-0.4)\times3\% = 0.8\% - 0.3\% = 0.5\%\). (Overweighting the outperforming equity sector was beneficial.)
Selection effect: \(0.6\times(9\%-8\%) + 0.4\times(2.5\%-3\%) = 0.6\% - 0.2\% = 0.4\%\). (Better equity selection, worse bond selection; net positive.)
Interaction effect: \((0.1)\times(1\%) + (-0.1)\times(-0.5\%) = 0.1\% + 0.05\% = 0.15\%\).
Check: \(0.5\% + 0.4\% + 0.15\% = 1.05\%\). ✓ (The interaction effect's sign depends on whether the overweighted sector was also the one with positive selection.)
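The decomposition can be verified numerically; a short sketch using the two-sector example data:

```python
import numpy as np

w_b = np.array([0.60, 0.40])          # benchmark weights (equity, bonds)
w_p = np.array([0.70, 0.30])          # portfolio weights
R_b = np.array([0.08, 0.03])          # benchmark sector returns
R_p = np.array([0.09, 0.025])         # portfolio sector returns

allocation  = (w_p - w_b) @ R_b               # 0.50%
selection   = w_b @ (R_p - R_b)               # 0.40%
interaction = (w_p - w_b) @ (R_p - R_b)       # 0.15%
active      = w_p @ R_p - w_b @ R_b           # 1.05%
print(allocation, selection, interaction, active)
```

The three effects sum exactly to the active return, which is the defining identity of the BHB decomposition.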
12.2 Risk-Adjusted Performance Metrics
Chapter 13: Empirical Factor Models
The Markowitz framework treats the covariance matrix \(\Sigma\) as a given input, but in practice \(\Sigma\) must be estimated from data. When the number of assets \(n\) approaches or exceeds the number of time-series observations \(T\), the sample covariance matrix is ill-conditioned or singular. Empirical factor models address this by explaining cross-sectional covariation through a small number of observable risk factors, dramatically reducing the number of free parameters.
13.1 Fama–French Three-Factor Model
Fama and French (1992, 1993) showed empirically that the cross-section of expected stock returns is better explained by three factors than by the single market factor of the CAPM:
- MKT: market excess return \(r_m - r_f\) (systematic market risk)
- SMB (Small Minus Big): return of small-cap stocks minus large-cap stocks (size factor)
- HML (High Minus Low): return of value stocks (high book-to-market) minus growth stocks (value factor)
- \(\alpha_i\): Jensen's alpha — the risk-adjusted excess return of stock \(i\) unexplained by the three factors
The three-factor model explains approximately 90–95% of diversified portfolio return variation, versus 70–80% for the single-factor CAPM. The implied covariance matrix is
\[ \Sigma = B F B^T + D, \]where \(B \in \mathbb{R}^{n \times 3}\) is the factor loading matrix, \(F \in \mathbb{R}^{3 \times 3}\) is the factor covariance matrix, and \(D = \mathrm{diag}(\sigma_{\varepsilon_1}^2, \ldots, \sigma_{\varepsilon_n}^2)\) collects idiosyncratic variances. This structured covariance has only \(3n + 3(3+1)/2 + n = 4n + 6\) free parameters (loadings, factor covariance, and idiosyncratic variances) versus \(n(n+1)/2\) for the full matrix.
13.2 Carhart Four-Factor Model
Carhart (1997) extended Fama–French by adding a momentum factor, UMD ("up minus down", also written WML): the return of recent winner stocks minus recent loser stocks.
Momentum (stocks that went up tend to keep going up over 3–12 months) is one of the most robust and extensively documented anomalies in asset pricing. Its persistence has been attributed to behavioural factors (investor underreaction to information), though momentum strategies are exposed to severe crashes during market reversals (e.g., momentum crashed -70% in 2009).
13.3 Estimating Factor Loadings by OLS
Given \(T\) time-series observations, the loadings \((\alpha_i, \beta_i^{MKT}, \beta_i^{SMB}, \beta_i^{HML})\) are estimated by ordinary least squares for each stock \(i\):
\[ r_{i,t} - r_{f,t} = \alpha_i + \beta_i^{MKT}\,\mathrm{MKT}_t + \beta_i^{SMB}\,\mathrm{SMB}_t + \beta_i^{HML}\,\mathrm{HML}_t + \varepsilon_{i,t}, \qquad t = 1, \ldots, T. \]
For AAPL over 2019–2023: typical estimates are \(\hat\beta^{MKT} \approx 1.20\) (high systematic risk), \(\hat\beta^{SMB} \approx -0.25\) (large-cap tilt — negative SMB loading), \(\hat\beta^{HML} \approx -0.40\) (growth stock — negative HML loading), \(\hat\alpha \approx 0.004\) per month (outperformance). These loadings connect AAPL to the factor model: its high market beta makes it sensitive to broad market swings, while its negative HML loading reflects the growth premium.
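The OLS estimation can be sketched on synthetic data; the factor volatilities, loadings, and noise level below are illustrative, chosen to mimic the AAPL-style estimates quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120                                # ten years of monthly data (synthetic)
# Synthetic factor returns: MKT, SMB, HML (illustrative volatilities)
F = rng.normal(0.0, [0.045, 0.03, 0.03], size=(T, 3))
beta_true = np.array([1.20, -0.25, -0.40])   # illustrative AAPL-like loadings
alpha_true = 0.004
r_excess = alpha_true + F @ beta_true + rng.normal(0.0, 0.05, size=T)

# OLS with an intercept: regress excess returns on [1, MKT, SMB, HML]
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, r_excess, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
print(alpha_hat, beta_hat)    # recovers the true loadings up to sampling error
```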
Chapter 14: Covariance Matrix Estimation and Shrinkage
Estimating the covariance matrix \(\Sigma\) from historical return data is one of the most delicate steps in portfolio optimisation. For a universe of \(n\) assets, the sample covariance matrix \(\hat\Sigma\) requires estimating \(n(n+1)/2\) parameters from \(T\) observations. When \(T < n\), the matrix is singular; even when \(T\gg n\), the sample eigenvalues are severely biased (Marčenko–Pastur theory).
14.1 The Random Matrix Theory Perspective
The Marčenko–Pastur law (1967) characterises the limiting eigenvalue distribution of the sample covariance matrix when \(\Sigma = \sigma^2 I\) and \(n, T \to \infty\) with \(n/T \to c > 0\): the sample eigenvalues (apart from a point mass of weight \(1 - 1/c\) at zero when \(c > 1\)) fill the interval \([\lambda_-, \lambda_+]\), where \(\lambda_\pm = \sigma^2(1 \pm \sqrt{c})^2\), with density
\[ \rho(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi c\,\sigma^2 \lambda}, \qquad \lambda \in [\lambda_-, \lambda_+]. \]
For \(c = 2\) (half as many observations as assets, common with monthly data), the eigenvalue support is \([(\sqrt{2}-1)^2, (\sqrt{2}+1)^2]\sigma^2 \approx [0.17\sigma^2, 5.83\sigma^2]\). A sample eigenvalue of 5 (in \(\sigma^2\) units) is still within the noise bulk — no information. This explains why portfolios optimised on \(\hat\Sigma\) directly perform poorly out-of-sample: the optimiser mistakes noise eigenvalues for genuine structure.
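The Marčenko–Pastur prediction is easy to check by simulation; the sketch below draws i.i.d. Gaussian data with true covariance \(I\) and aspect ratio \(c = n/T = 2\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, sigma2 = 1000, 500, 1.0     # c = n/T = 2: fewer observations than assets
X = rng.normal(0.0, np.sqrt(sigma2), size=(n, T))
S = X @ X.T / T                   # sample covariance (true covariance is I)

eig = np.linalg.eigvalsh(S)
c = n / T
lam_minus = sigma2 * (1 - np.sqrt(c)) ** 2   # ~0.17
lam_plus  = sigma2 * (1 + np.sqrt(c)) ** 2   # ~5.83
nonzero = eig[eig > 1e-8]
print(nonzero.min(), nonzero.max(), lam_minus, lam_plus)
# For c > 1 a fraction 1 - 1/c of the eigenvalues are exactly zero
print((eig < 1e-8).mean())                   # exactly 0.5 here (500 of 1000)
```

The extreme nonzero eigenvalues cluster near the theoretical edges \(\lambda_\pm\) even at this moderate dimension.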
14.2 Ledoit–Wolf Linear Shrinkage
The Ledoit–Wolf (LW) estimator (2004) shrinks the sample covariance toward a structured target \(\hat\Sigma_{\text{target}}\):
- Identity target: \(\hat\Sigma_{\text{target}} = \bar\sigma^2 I\) (assumes all assets equally volatile and uncorrelated). Best when there is no prior structural information.
- Constant correlation target: all pairwise correlations equal the average sample correlation \(\bar\rho\). More realistic for equity universes.
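A minimal shrinkage sketch toward the scaled-identity target; here the intensity \(\rho\) is fixed at an illustrative value rather than computed from the Ledoit–Wolf analytical formula (scikit-learn's sklearn.covariance.LedoitWolf implements the analytical choice).

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 50, 60                     # n close to T: sample covariance is ill-conditioned
X = rng.normal(size=(T, n))       # true covariance is the identity

S = np.cov(X, rowvar=False)            # sample covariance (unbiased)
target = np.trace(S) / n * np.eye(n)   # identity target scaled to average variance
rho = 0.4                              # illustrative shrinkage intensity
S_lw = (1 - rho) * S + rho * target

print(np.linalg.cond(S), np.linalg.cond(S_lw))   # shrinkage improves conditioning
```

Since each eigenvalue is pulled toward the average eigenvalue, the condition number of the shrunk estimate is strictly smaller than that of the sample covariance.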
14.3 Nonlinear Shrinkage
Ledoit and Wolf (2022) propose a nonlinear shrinkage estimator that individually rescales each sample eigenvalue \(\hat\lambda_i\) to its optimal shrinkage target, rather than applying a single global coefficient \(\rho\). The Oracle estimator (which requires knowing the true \(\Sigma\)) replaces each \(\hat\lambda_i\) with the optimal value that minimises out-of-sample loss. The Oracle cannot be computed directly, but its limit — as \(n, T \to \infty\) with \(n/T \to c\) — is characterised via the Stieltjes transform of the Marčenko–Pastur distribution, and is consistently estimated by the analytical nonlinear shrinkage formula. Empirically, nonlinear shrinkage reduces out-of-sample portfolio variance by an additional 5–15% beyond linear shrinkage.
Chapter 15: Backtesting and Out-of-Sample Evaluation
A portfolio strategy that looks excellent in hindsight (in-sample) but fails out-of-sample has not been validated. Backtesting — simulating the strategy on historical data not used to calibrate it — is the primary tool for evaluating strategies before deployment. However, backtesting is riddled with pitfalls that can produce misleadingly optimistic results.
15.1 Walk-Forward Testing
The walk-forward (or rolling window) methodology is the gold standard for backtesting portfolio strategies. At each rebalancing date \(t\), stepped forward by the holding period \(H\):
- Estimate \(\hat\mu\) and \(\hat\Sigma\) using data from \([t-W, t)\).
- Compute the optimal portfolio weights \(w_t^*\) on the estimated parameters.
- Record the realised return of \(w_t^*\) during \([t, t+H)\).
The key property is that at each step, the portfolio is calibrated only on past data — no future information enters the calibration. This avoids look-ahead bias, the most common form of backtest inflation.
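The walk-forward loop can be sketched as follows; the return data are synthetic, and a minimum-variance rule stands in for the full mean-variance optimiser.

```python
import numpy as np

rng = np.random.default_rng(3)
n, total = 5, 400
returns = rng.normal(0.005, 0.04, size=(total, n))   # synthetic monthly returns

W, H = 120, 1                 # 120-month estimation window, 1-month holding
realised = []
for t in range(W, total, H):
    window = returns[t - W:t]                # strictly past data only
    Sigma_hat = np.cov(window, rowvar=False)
    # Unconstrained minimum-variance weights (no look-ahead in the inputs)
    w = np.linalg.solve(Sigma_hat, np.ones(n))
    w /= w.sum()
    realised.append(returns[t:t + H] @ w)    # out-of-sample return(s)

realised = np.concatenate(realised)
sharpe = realised.mean() / realised.std() * np.sqrt(12)
print(len(realised), sharpe)
```

Every weight vector is a deterministic function of data observed strictly before the holding period, which is exactly the no-look-ahead property described above.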
15.2 Common Backtesting Biases
Survivorship bias: If the asset universe includes only stocks currently listed (ignoring delisted stocks), the backtest excludes the worst performers — those that went bankrupt or were acquired. Mitigation: use a point-in-time database that includes historical constituents of an index, not just current ones. CRSP (Center for Research in Security Prices) and Compustat are the gold-standard sources for survivorship-bias-free US equity data.
Look-ahead bias: Using data that would not have been available at the time of the portfolio decision (e.g., revised earnings, post-close prices for intraday decisions). Mitigation: always use only data available strictly before \(t\) when computing \(w_t^*\). For accounting data, use the originally announced figures, not revisions.
Overfitting/data snooping bias: Testing many strategies and reporting only the best is equivalent to mining for spurious patterns. A strategy with Sharpe ratio 1.0 that was selected from 100 independent candidate strategies has a true expected Sharpe of roughly 0 — the observed 1.0 is the maximum of 100 noise realisations. The Bonferroni correction and, more formally, the False Discovery Rate (FDR) framework (Benjamini-Hochberg) provide statistical corrections. Harvey, Liu, and Zhu (2016) argue that the required \(t\)-statistic for a new factor in finance should be at least 3.0 (not the conventional 1.96) to account for the multiple-testing problem.
Transaction cost neglect: Mean-variance optimal portfolios often have high turnover — sometimes 100%+ per month. If transaction costs are 0.1% per trade and turnover is 10% per month, costs consume 0.2% per month, or about 2.4% per annum — often exceeding the alpha being generated.
Chapter 16: The Kelly Criterion and Log-Optimal Portfolios
The Kelly criterion, introduced by John Kelly Jr. in 1956 (originally in the context of information theory and gambling), provides an alternative optimality criterion to mean-variance: it maximises the long-run growth rate of wealth, rather than the expected return for a given level of variance.
16.1 Kelly for a Single Bet
Consider a gambler who bets a fraction \(f\) of their wealth on a game that pays \(b:1\) with probability \(p\) and loses the stake with probability \(1-p\). After \(T\) rounds, the wealth is:
\[ W_T = W_0 \cdot (1+bf)^{N_+}(1-f)^{N_-}, \]where \(N_+, N_-\) are the number of wins and losses with \(N_+ + N_- = T\). The long-run growth rate is:
\[ G(f) = \lim_{T\to\infty} \frac{\ln W_T}{T} = p\ln(1+bf) + (1-p)\ln(1-f) \quad \text{a.s. (by LLN).} \]Maximising \(G(f)\) over \(f \in [0,1)\): setting \(G'(f) = \frac{pb}{1+bf} - \frac{1-p}{1-f} = 0\) and solving gives the Kelly fraction \(f^* = p - \frac{1-p}{b}\).
Example: a biased coin with win probability \(p = 0.6\) at even odds (\(b = 1\)). Kelly fraction: \(f^* = p - (1-p)/b = 0.6 - 0.4/1 = 0.20\). Bet 20% of wealth on each flip. Growth rate: \(G(f^*) = 0.6\ln(1.2) + 0.4\ln(0.8) = 0.6(0.182) + 0.4(-0.223) = 0.109 - 0.089 = 0.020\) per bet, or about 2% growth per round. Overbetting to \(f = 0.5\) gives \(G = 0.6\ln(1.5) + 0.4\ln(0.5) = 0.243 - 0.277 = -0.034\) — wealth shrinks on average despite the positive edge! This illustrates the fundamental asymmetry between arithmetic and geometric means.
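The arithmetic can be checked directly, together with a Monte Carlo confirmation of the almost-sure growth rate:

```python
import numpy as np

p, b = 0.6, 1.0                       # win probability and payoff odds
f_star = p - (1 - p) / b              # Kelly fraction: 0.20

def growth(f):
    """Long-run growth rate per bet at betting fraction f."""
    return p * np.log(1 + b * f) + (1 - p) * np.log(1 - f)

print(f_star, growth(f_star))         # ~0.020 per bet
print(growth(0.5))                    # ~-0.034: overbetting destroys wealth

# Monte Carlo check of the almost-sure growth rate
rng = np.random.default_rng(4)
N = 200_000
wins = rng.random(N) < p
mc_rate = np.where(wins, np.log(1 + b * f_star), np.log(1 - f_star)).mean()
print(mc_rate)                        # close to growth(f_star)
```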
16.2 Kelly for Continuous-Time Portfolio
For a portfolio of \(n\) risky assets with log-normal returns and a risk-free asset earning rate \(r_f\), the continuous-time Kelly portfolio maximises the drift of log-wealth, \(w^T(\mu - r_f e) - \tfrac{1}{2}w^T\Sigma w\), giving
\[ w_{\text{Kelly}} = \Sigma^{-1}(\mu - r_f e). \]
The Kelly portfolio is identical in direction to the tangency portfolio on the efficient frontier — it is the portfolio with the maximum Sharpe ratio. However, it is typically much more leveraged than a risk-averse investor would accept. In practice, fractional Kelly (e.g., investing half the Kelly amount, \(w = f\, w_{\text{Kelly}}\) with \(f = 0.5\)) is used to reduce variance at the cost of slightly lower long-run growth. Half-Kelly has half the variance of full Kelly but achieves about 75% of the long-run growth.
Chapter 17: Stress Testing and Scenario Analysis
Mean-variance optimisation treats risk as variance — a symmetric, backward-looking measure calibrated from historical returns. Stress testing complements this by asking: what is the portfolio’s behaviour under extreme scenarios that may not appear in historical data? This is particularly important because large losses are correlated across assets during crises (the correlation structure of returns during the 2008 financial crisis was dramatically different from the pre-crisis period).
17.1 Historical Scenario Analysis
The simplest stress test applies a set of historical stress events directly to the portfolio:
Standard historical stress scenarios used in practice include: Black Monday (October 1987, equity index fell 22% in one day), the LTCM crisis (August-September 1998, credit and liquidity spreads widened sharply), the dot-com crash (2000-2002), the global financial crisis (September-December 2008), COVID-19 (February-March 2020).
17.2 Factor-Based Stress Testing
For large portfolios with hundreds of assets, full-revaluation stress tests are computationally expensive. Factor-based methods project the portfolio onto a small number of risk factors and apply shocks at the factor level:
\[ \Delta P \approx w^T B \Delta f + w^T \Delta\varepsilon, \]where \(\Delta f \in \mathbb{R}^k\) is the vector of factor shocks and the idiosyncratic term \(w^T \Delta\varepsilon\) is typically treated as zero for diversified portfolios (by the law of large numbers, idiosyncratic shocks average out across many assets). This reduces an \(n\)-dimensional stress test to a \(k\)-dimensional one, enabling real-time scenario generation.
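A factor-level stress test is a single matrix-vector product; the loadings and factor shocks below are hypothetical, loosely styled on a 2008-type scenario.

```python
import numpy as np

n, k = 100, 3
rng = np.random.default_rng(5)
B = rng.normal(1.0, 0.3, size=(n, k))   # hypothetical factor loadings
w = np.full(n, 1.0 / n)                 # equally-weighted portfolio

# Hypothetical crisis scenario: market -40%, size -10%, value +5%
delta_f = np.array([-0.40, -0.10, 0.05])

delta_P = w @ B @ delta_f               # idiosyncratic term dropped
print(delta_P)                          # approximate portfolio P&L under the scenario
```

With loadings centred near one, the projected loss is close to the sum of the factor shocks; the whole computation is \(O(nk)\) rather than a full revaluation of every position.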
Chapter 18: Portfolio Insurance and Dynamic Strategies
Static mean-variance portfolios rebalance periodically to a fixed target allocation. Dynamic strategies alter the portfolio allocation in response to market conditions or wealth levels. Portfolio insurance is the canonical example: a strategy designed to guarantee a minimum portfolio value while participating in upside.
18.1 Constant Proportion Portfolio Insurance (CPPI)
CPPI (Perold and Sharpe, 1988) dynamically adjusts the allocation between a risky asset and a risk-free asset to ensure the portfolio value never falls below a floor \(F\). The rule: at each rebalancing date, invest a fixed multiple \(m > 1\) of the cushion \(C_t = W_t - F\) in the risky asset,
\[ E_t = m\,C_t = m(W_t - F), \]with the remainder \(B_t = W_t - E_t\) held in the risk-free asset.
The CPPI guarantee holds in continuous time (if the risky asset cannot gap below \(F\)), since the risky allocation automatically falls to zero as \(W_t \to F\). In discrete time, a sudden large drop in the risky asset can breach the floor — the so-called gap risk.
Example: \(W_0 = 100\), floor \(F = 80\), multiplier \(m = 3\), so \(E_0 = 3 \times 20 = 60\) and \(B_0 = 40\). After month 1, the risky asset rises 10%: \(E_1 = 66\), \(B_1\) grows by the risk-free rate (say 0.3%): \(B_1 \approx 40.12\). Total \(W_1 \approx 106.12\). Rebalance: cushion \(C_1 = 26.12\); \(E_1^\text{new} = 3\times26.12 = 78.36\); \(B_1^\text{new} = 27.76\). The rising cushion leads to buying more risky assets — a momentum-like behaviour.
If instead the risky asset falls 15% in month 1: \(E_1 = 51\), \(W_1 \approx 91.12\). Cushion \(C_1 = 11.12\); \(E_1^\text{new} = 33.36\); \(B_1^\text{new} = 57.76\). The falling cushion triggers de-risking — the strategy automatically reduces risky exposure, protecting the floor.
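A sketch of one CPPI period, reproducing the month-1 numbers (floor 80, multiplier 3, and a 0.3% monthly risk-free rate are the parameters implied by the example) and illustrating gap risk:

```python
import numpy as np

def cppi_step(W, F, m, risky_return, rf=0.003):
    """One CPPI period: invest m*(W - F) in the risky asset, the rest
    risk-free, then mark both legs to market and return the new wealth."""
    E = m * max(W - F, 0.0)           # risky exposure (zero once cushion is gone)
    B = W - E                         # risk-free holding
    return E * (1 + risky_return) + B * (1 + rf)

W0, F, m = 100.0, 80.0, 3.0
W1_up = cppi_step(W0, F, m, 0.10)     # risky asset +10%
W1_dn = cppi_step(W0, F, m, -0.15)    # risky asset -15%
print(W1_up, W1_dn)                   # ~106.12 and ~91.12
# Gap risk: a single -40% move breaches the floor despite the rule
print(cppi_step(W0, F, m, -0.40))     # ends below the floor of 80
```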
18.2 Comparison: Buy-and-Hold, Fixed-Mix, and CPPI
Buy-and-hold: Allocation drifts with market prices. Pay-off is concave (benefits from trending markets, hurts in volatile/mean-reverting markets). No guarantee on minimum value.
Fixed-mix (constant-mix): Rebalance to a fixed target allocation. Pay-off is also concave in trending markets but excels in mean-reverting markets (systematic buy-low, sell-high). No floor guarantee.
CPPI: Pay-off is convex — it participates in upside (rising risky asset increases cushion and hence allocation) and protects against downside (falling cushion reduces allocation). The cost of convexity: CPPI underperforms fixed-mix in mean-reverting markets because it buys high and sells low. The multiplier \(m\) controls the convexity: higher \(m\) means more participation in upside but higher gap risk in crashes.
Options-based portfolio insurance (OBPI): Hold the risky asset plus a put option struck at the floor. The put option’s premium is the explicit cost of insurance. CPPI approximates OBPI dynamically (CPPI is a delta-hedging replication of a path-dependent option), but requires continuous rebalancing and has gap risk that OBPI avoids.
18.3 Lifecycle Investing and Target-Date Funds
A practical application of dynamic portfolio theory is the lifecycle fund (also called a target-date fund). The dominant insight from lifecycle investing theory (Samuelson 1969, Merton 1971, Ayres-Nalebuff 2013) is that young investors should hold substantially more equity than their current wealth level alone would suggest, because their future labour income constitutes a large risk-free bond. As investors age and the present value of future labour income falls, the optimal risky asset allocation decreases.
Target-date funds operationalise this by specifying a glide path: a schedule of equity allocation declining from ~90% at age 25 to ~40% at retirement. The mathematics: if human capital (PV of future labour income) is modelled as a risk-free asset of value \(HC_t\), the optimal equity fraction of total wealth (financial + human) is constant at the mean-variance optimal level, but the equity fraction of financial wealth is
\[ w_{\text{equity}}^{\text{financial}} = w^* \cdot \frac{W_t + HC_t}{W_t}, \]which decreases as \(HC_t\) shrinks relative to \(W_t\) with age. This is the mathematical derivation of the rule of thumb “hold (100 - age)% in equities.”
Chapter 19: Numerical Methods and Software
Portfolio optimisation in practice requires efficient computation on large asset universes (hundreds to thousands of assets). This chapter surveys the main numerical building blocks and the MATLAB toolbox routines used throughout the course.
19.1 Solving Quadratic Programmes
The mean-variance problem is a quadratic programme (QP) — a special case of a convex optimisation problem. MATLAB’s quadprog function solves
\[ \min_x \; \tfrac{1}{2}x^T H x + f^T x \quad \text{s.t.} \quad Ax \leq b,\; A_{\text{eq}}x = b_{\text{eq}},\; \ell \leq x \leq u. \]
For the Markowitz problem with target return \(\bar{R}\): set \(H = 2\Sigma\), \(f = 0\), \(A_{\text{eq}} = [\mu^T; e^T]\), \(b_{\text{eq}} = [\bar{R}; 1]\). With non-negativity constraints: \(\ell = 0\), \(u = +\infty\). MATLAB’s solver uses an interior-point or active-set method depending on the problem structure.
% Inputs: mu (n×1 expected returns), Sigma (n×n covariance), rf (risk-free rate)
n = length(mu);
K = 50;                          % number of frontier points
R_min = min(mu); R_max = max(mu);
R_grid = linspace(R_min, R_max, K);
w_frontier = zeros(n, K);
for k = 1:K
Aeq = [mu'; ones(1,n)]; beq = [R_grid(k); 1];
lb = zeros(n,1); % no short-selling
w_frontier(:,k) = quadprog(2*Sigma, zeros(n,1), [], [], Aeq, beq, lb);
end
sig_frontier = sqrt(diag(w_frontier' * Sigma * w_frontier));
plot(sig_frontier, R_grid);
For the unconstrained case (short-selling allowed), the KKT system can be solved directly via [2*Sigma, -mu, -ones(n,1); mu', 0, 0; ones(1,n), 0, 0] \ [zeros(n,1); R_grid(k); 1] using MATLAB’s backslash (the solution stacks \(w\), \(\gamma\), \(\delta\)), which is faster than calling quadprog.
19.2 Complexity and Scalability
| Operation | Complexity | Comment |
|---|---|---|
| Cholesky factorisation of \(\Sigma\) | \(O(n^3)\) | Done once per rebalance |
| Compute \(\Sigma^{-1}\mu\), \(\Sigma^{-1}e\) | \(O(n^2)\) | Given Cholesky |
| Compute scalars \(A, B, C, D\) | \(O(n)\) | After vectors |
| Single efficient frontier point | \(O(n^2)\) | Full KKT solve |
| Factor model \(\Sigma = BFB^T + D\) | \(O(nk^2)\) | Woodbury inversion |
| quadprog on \(n\)-asset QP | \(O(n^3)\) | Worst case, interior point |
| quadprog with sparsity | \(O(n^{1.5})\) to \(O(n^2)\) | Sparse Cholesky exploited |
For \(n = 500\) assets (a typical institutional equity universe), a single efficient frontier point takes under 1 second in MATLAB on a modern laptop. For \(n = 5000\) (large quantitative fund), factor model methods (\(k = 30\) factors) reduce the Woodbury system to a \(30 \times 30\) solve, enabling real-time frontier computation. The bottleneck shifts from the optimisation itself to data cleaning, factor estimation, and transaction cost modelling.
19.3 Convex Optimisation Toolboxes
Beyond quadprog, the discipline of convex optimisation offers powerful modelling languages that accept a wide class of portfolio constraints and objectives:
- CVX (MATLAB) and CVXPY (Python): Domain-specific languages for convex optimisation. Constraints like L1 transaction costs, CVaR constraints, and cardinality constraints (if relaxed) can be stated naturally. CVX translates the problem to a standard cone programme (LP, QP, SOCP, or SDP) and calls an interior-point solver (SDPT3, SeDuMi, Mosek).
- OSQP: An open-source QP solver optimised for embedded and real-time applications, particularly efficient for large sparse QPs arising in portfolio problems with many constraints.
- Gurobi/CPLEX: Commercial mixed-integer QP solvers for portfolio problems with discrete constraints (minimum lot sizes, cardinality constraints, integer weights for bond portfolios).
The correct choice of solver depends on the problem structure: quadprog for simple unconstrained or lightly constrained frontier problems; CVX/CVXPY for rapid prototyping with complex constraints; Gurobi/CPLEX for production-grade optimisation with integer constraints and tight performance requirements.
Chapter 20: Duality and the Fund Separation Theorems
The KKT duality theory developed in Chapter 4 has a beautiful geometric interpretation in portfolio theory: the dual variables (Lagrange multipliers) correspond to the shadow prices of the constraints, and duality gives rise to the classical fund separation theorems.
20.1 Strong Duality for the Markowitz QP
Recall the primal problem (Formulation I):
\[ p^* = \min_{w} \; w^T\Sigma w \quad \text{s.t.} \quad \mu^T w = \bar{R},\; e^T w = 1. \]The Lagrangian is \(\mathcal{L}(w,\gamma,\delta) = w^T\Sigma w - \gamma(\mu^T w - \bar{R}) - \delta(e^T w - 1)\). Since the objective is strictly convex (when \(\Sigma \succ 0\)) and the constraints are linear, strong duality holds: the dual optimal value \(d^* = p^*\). The dual function is
\[ g(\gamma,\delta) = \min_w \mathcal{L}(w,\gamma,\delta) = -\frac{1}{4}(\gamma\mu+\delta e)^T\Sigma^{-1}(\gamma\mu+\delta e) + \gamma\bar{R} + \delta, \]obtained by substituting the optimal \(w(\gamma,\delta) = \frac{1}{2}\Sigma^{-1}(\gamma\mu+\delta e)\). Maximising \(g\) over \((\gamma,\delta)\) gives the dual problem, whose solution directly yields the optimal primal \(w^*\) and the shadow prices \(\gamma^*, \delta^*\).
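Strong duality can be verified numerically on a small instance; the two-asset data below are illustrative. The KKT system is solved for \((w^*, \gamma^*, \delta^*)\), and the dual function is evaluated at \((\gamma^*, \delta^*)\).

```python
import numpy as np

# Illustrative instance of Formulation I
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
mu = np.array([0.08, 0.12])
R_bar = 0.10
e = np.ones(2)

# KKT system: 2*Sigma*w - gamma*mu - delta*e = 0, mu'w = R_bar, e'w = 1
K = np.block([[2 * Sigma, -mu[:, None], -e[:, None]],
              [mu[None, :], np.zeros((1, 2))],
              [e[None, :], np.zeros((1, 2))]])
sol = np.linalg.solve(K, np.array([0.0, 0.0, R_bar, 1.0]))
w_star, gamma_star, delta_star = sol[:2], sol[2], sol[3]
p_star = w_star @ Sigma @ w_star

# Dual function g(gamma, delta) evaluated at the optimal multipliers
v = gamma_star * mu + delta_star * e
d_star = -0.25 * v @ np.linalg.solve(Sigma, v) + gamma_star * R_bar + delta_star
print(p_star, d_star)                  # equal: strong duality holds
```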
20.2 Interpretation of Dual Variables
The dual variable \(\gamma^*\) is the shadow price of the expected return constraint: it measures how much the minimum variance increases per unit increase in \(\bar{R}\), i.e., \(\gamma^* = dp^*/d\bar{R}\). Geometrically, \(\gamma^*\) is the slope of the efficient frontier in \((\mu_p, \sigma_p^2)\) space at the point \((\bar{R}, \sigma_p^{*2})\). This has the financial interpretation: \(\gamma^*\) is the marginal cost of seeking higher expected return, measured in additional variance. When \(\gamma^*\) is large, pushing the return target higher is expensive in variance terms; when small, higher expected return comes cheaply.
20.3 Generalised Separation and Mutual Fund Theorem
The multi-period and multi-investor version of fund separation (due to Ross 1978 and others) states: under suitable distributional assumptions (elliptical returns, or CARA utility), every investor’s optimal portfolio is a linear combination of the same \(K\) “mutual funds.” The one-fund theorem (\(K=1\), the market portfolio in CAPM) and two-fund theorem (\(K=2\), GMVP and tangency portfolio) are special cases. The existence of such separation results is closely tied to the algebraic structure of the efficient frontier: the set of attainable (mean, variance) pairs is convex, and the optimal weight vector is an affine function of the target return. This linearity — a direct consequence of the KKT stationarity condition being a linear system — is what makes mutual fund separation theorems possible and distinguishes the Markowitz framework from more general optimisation settings where the optimal weight is a nonlinear function of the target.
Appendix: Quick Reference
A.1 Efficient Frontier Scalars
For \(n\) assets with \(\Sigma \succ 0\), \(\mu\) not proportional to \(e\):
| Symbol | Formula | Meaning |
|---|---|---|
| \(A\) | \(\mu^T\Sigma^{-1}e\) | Cross term |
| \(B\) | \(\mu^T\Sigma^{-1}\mu\) | Return precision |
| \(C\) | \(e^T\Sigma^{-1}e\) | Sum of precision |
| \(D\) | \(BC - A^2 > 0\) | Discriminant |
| \(\sigma^2_{\min}\) | \(1/C\) | GMVP variance |
| \(\mu_{\min}\) | \(A/C\) | GMVP return |
Frontier parabola: \(\sigma_p^2 = \frac{C\mu_p^2 - 2A\mu_p + B}{D}\). GMVP weights: \(w_{\text{gmv}} = \Sigma^{-1}e / C\).
A.2 Key Risk Measures
| Measure | Formula (Gaussian, \(\mu_p, \sigma_p\)) | Subadditive? |
|---|---|---|
| VaR\(_\alpha\) | \(-\mu_p + z_\alpha \sigma_p\) | No |
| CVaR\(_\alpha\) | \(-\mu_p + \sigma_p \phi(z_\alpha)/(1-\alpha)\) | Yes |
| Volatility | \(\sigma_p\) | Yes |
| Max Drawdown | (path-dependent, no closed form) | No |
\(z_\alpha = \Phi^{-1}(\alpha)\) is the standard normal quantile; \(\phi\) is the standard normal PDF.
A.3 CAPM Summary
Single-factor: \(\mathbb{E}[r_i] - r_f = \beta_i(\mathbb{E}[r_m] - r_f)\), where \(\beta_i = \mathrm{Cov}(r_i, r_m)/\sigma_m^2\). Security market line: all assets plot on the line \(\mu = r_f + \beta(\mu_m - r_f)\) in \((\beta, \mu)\) space. Jensen’s alpha: \(\alpha_i = \mathbb{E}[r_i] - r_f - \beta_i(\mathbb{E}[r_m]-r_f)\); positive alpha indicates the asset lies above the security market line (underpriced relative to CAPM).
A.4 Black–Litterman Summary
Prior: \(\mu \sim \mathcal{N}(\Pi, \tau\Sigma)\) with \(\Pi = \delta\Sigma w_m\). Views: \(P\mu = q + \varepsilon\), \(\varepsilon \sim \mathcal{N}(0,\Omega)\). Posterior mean: \(\hat\mu_{\text{BL}} = [(\tau\Sigma)^{-1} + P^T\Omega^{-1}P]^{-1}[(\tau\Sigma)^{-1}\Pi + P^T\Omega^{-1}q]\). Plug \(\hat\mu_{\text{BL}}\) into standard Markowitz to obtain the BL-optimal portfolio.