AFM 323: Quantitative Foundations for Finance
Tony Wirjanto
Estimated study time: 1 hr 49 min
Sources and References
Primary textbooks — Hull, J. C. Options, Futures, and Other Derivatives, 11th Edition. Pearson, 2022. McDonald, R. L. Derivatives Markets, 3rd Edition. Pearson, 2013. Shreve, S. E. Stochastic Calculus for Finance I & II. Springer, 2004.
Supplementary texts — Campbell, J. Y., Lo, A. W., and MacKinlay, A. C. The Econometrics of Financial Markets. Princeton University Press, 1997. Tsay, R. S. Analysis of Financial Time Series, 3rd Edition. Wiley, 2010. Greene, W. H. Econometric Analysis, 8th Edition. Pearson, 2018.
Online resources — FRED (Federal Reserve Economic Data); Kenneth French Data Library (Fama-French factors); Yahoo Finance; MIT OCW 18.S096 Topics in Mathematics with Applications in Finance.
Chapter 1: Probability Foundations for Finance
1.1 Probability Spaces
Financial modeling begins with a rigorous probabilistic framework. A probability space is a triple \((\Omega, \mathcal{F}, P)\) where:
Sample space \(\Omega\): The set of all possible outcomes of the random experiment.
Sigma-algebra (event space) \(\mathcal{F}\): A collection of subsets of \(\Omega\) (called events) that contains \(\Omega\) and is closed under complementation and countable unions. The sigma-algebra \(\mathcal{F}\) encodes what can be observed — what questions about outcomes are well-posed.
Probability measure \(P\): A function \(P: \mathcal{F} \to [0,1]\) satisfying:
- \(P(\Omega) = 1\)
- \(P(\emptyset) = 0\)
- For disjoint events \(A_1, A_2, \ldots\): \(P\!\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)\) (countable additivity)
The probability measure \(P\) is called the physical or real-world measure. In derivative pricing, we will also encounter the risk-neutral measure \(Q\), under which discounted asset prices are martingales. These two measures are related by Girsanov’s theorem.
Conditional probability is fundamental to finance because information arrives over time and beliefs must be updated:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]Bayes’ theorem — the core engine of belief updating — follows directly:
\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]Worked example (credit downgrade): a bond carries a prior default probability of 2%; after a rating downgrade is observed, Bayes' theorem updates the posterior default probability to 14% — a seven-fold increase reflecting the informational content of the downgrade.
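A minimal numerical sketch of this update. The downgrade likelihoods (0.40 given default, 0.05 given no default) are not given in the text; they are assumed here for illustration and chosen to roughly reproduce the 2%-to-14% figures.

```python
# Bayes' theorem for credit updating: P(default | downgrade).
prior_default = 0.02                 # P(A): prior default probability
p_dg_given_def = 0.40                # P(B | A), assumed for illustration
p_dg_given_surv = 0.05               # P(B | not A), assumed for illustration

# Law of total probability gives the evidence P(B)
p_downgrade = (p_dg_given_def * prior_default
               + p_dg_given_surv * (1 - prior_default))

posterior_default = p_dg_given_def * prior_default / p_downgrade
print(round(posterior_default, 4))   # 0.1404
```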
1.2 Random Variables and Their Distributions
Cumulative distribution function (CDF):
\[ F_X(x) = P(X \leq x) \]For a continuous random variable with pdf \(f_X\):
\[ F_X(x) = \int_{-\infty}^{x} f_X(u)\, du \]Key properties of the CDF: \(F\) is non-decreasing, right-continuous, \(\lim_{x \to -\infty} F(x) = 0\), and \(\lim_{x \to \infty} F(x) = 1\).
1.2.1 The Normal Distribution
The normal (Gaussian) distribution is the cornerstone of classical statistics and financial modeling.
If \(X \sim N(\mu, \sigma^2)\), its pdf is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \]with mean \(E[X] = \mu\) and variance \(\text{Var}(X) = \sigma^2\).
The standard normal \(Z \sim N(0,1)\) has pdf \(\phi(z) = (2\pi)^{-1/2} e^{-z^2/2}\) and CDF \(\Phi(z)\). Any normal random variable can be standardized: if \(X \sim N(\mu,\sigma^2)\) then \(Z = (X-\mu)/\sigma \sim N(0,1)\).
The normal distribution obeys the well-known 68-95-99.7 rule:
\[ P(\mu - \sigma \leq X \leq \mu + \sigma) \approx 0.683 \]\[ P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) \approx 0.954 \]\[ P(\mu - 3\sigma \leq X \leq \mu + 3\sigma) \approx 0.997 \]Moment generating function (MGF) of the normal:
\[ M_X(t) = E[e^{tX}] = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right) \]The MGF is useful for deriving moments: \(E[X] = M_X'(0) = \mu\) and \(E[X^2] = M_X''(0) = \mu^2 + \sigma^2\), confirming \(\text{Var}(X) = \sigma^2\).
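The 68-95-99.7 probabilities can be verified from the standard normal CDF, here built from the error function (a minimal sketch using only the standard library):

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(mu - k*sigma <= X <= mu + k*sigma) = Phi(k) - Phi(-k), for any mu, sigma
for k in (1, 2, 3):
    p = norm_cdf(k) - norm_cdf(-k)
    print(f"k={k}: {p:.3f}")   # k=1: 0.683, k=2: 0.954, k=3: 0.997
```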
1.2.2 The Lognormal Distribution
The lognormal arises naturally as the distribution of asset prices when log returns are normally distributed: if \(X \sim N(\mu, \sigma^2)\), then \(Y = e^X\) is lognormal. Its moments are:
\[ E[Y] = e^{\mu + \sigma^2/2} \]\[ \text{Var}(Y) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) \]If log returns \(r_t\) are i.i.d. normal, then \(S_T = S_0 \exp\!\left(\sum r_t\right)\) is lognormal. Prices are bounded below by zero (limited liability) and the log-return distribution is approximately symmetric — both properties matched by the lognormal but not the normal.
1.2.3 The Poisson Distribution
The Poisson distribution models the number of rare events occurring in a fixed time interval.
A Poisson random variable \(X \sim \text{Poisson}(\lambda)\) has pmf
\[ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \]with mean \(E[X] = \lambda\) and variance \(\text{Var}(X) = \lambda\) (equidispersion property).
In finance, Poisson processes model jump arrivals in jump-diffusion models (Merton, 1976), credit events (default arrivals in reduced-form credit models), and trade arrivals in market microstructure. The Poisson process \(\{N_t\}_{t \geq 0}\) counts events up to time \(t\): \(N_t - N_s \sim \text{Poisson}(\lambda(t-s))\) for \(t > s\), with independent increments.
1.3 Expectation, Variance, and Covariance
For a function \(g(X)\): \(E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx\) (Law of the Unconscious Statistician).
Variance:
\[ \text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2 \]Standard deviation: \(\sigma_X = \sqrt{\text{Var}(X)}\)
Covariance:
\[ \text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] \]Correlation:
\[ \rho_{XY} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} \in [-1, 1] \]Key algebraic properties:
- Linearity of expectation: \(E[aX + bY] = aE[X] + bE[Y]\) (always, regardless of dependence)
- Variance of a linear combination: \(\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)\)
- If \(X\) and \(Y\) are independent: \(\text{Cov}(X,Y) = 0\) and \(\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)\)
Law of Iterated Expectations (tower property):
\[ E[X] = E\big[E[X \mid Y]\big] \]This is used extensively in dynamic programming and option pricing. It says: to compute the unconditional expectation of \(X\), first compute the conditional expectation given \(Y\), then average over \(Y\).
Law of Total Variance:
\[ \text{Var}(X) = E[\text{Var}(X \mid Y)] + \text{Var}(E[X \mid Y]) \]Total variance = average within-group variance + variance of group means.
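A quick numerical check of the decomposition on a made-up two-group example (data are illustrative):

```python
from statistics import mean, pvariance

# Group Y=0: X in {1, 3}; group Y=1: X in {10, 14}; groups equally likely.
group0, group1 = [1, 3], [10, 14]

within = mean([pvariance(group0), pvariance(group1)])   # E[Var(X|Y)]
between = pvariance([mean(group0), mean(group1)])       # Var(E[X|Y])
total = pvariance(group0 + group1)                      # Var(X)

print(within, between, total)   # 2.5 25.0 27.5
assert abs(total - (within + between)) < 1e-12
```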
1.4 Moment Generating Functions
The moment generating function of a random variable \(X\) is
\[ M_X(t) = E[e^{tX}] \]defined for all \(t\) in some neighborhood of 0. The \(n\)-th moment of \(X\) is:
\[ E[X^n] = M_X^{(n)}(0) = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0} \]MGFs of common distributions:
| Distribution | MGF \(M_X(t)\) |
|---|---|
| \(N(\mu, \sigma^2)\) | \(\exp(\mu t + \sigma^2 t^2/2)\) |
| \(\text{Poisson}(\lambda)\) | \(\exp(\lambda(e^t - 1))\) |
| \(\text{Exponential}(\lambda)\) | \(\lambda/(\lambda - t)\), \(t < \lambda\) |
| \(\text{Bernoulli}(p)\) | \(1 - p + pe^t\) |
The MGF uniquely determines a distribution (when it exists in a neighborhood of 0). A key application: if \(X\) and \(Y\) are independent with MGFs \(M_X\) and \(M_Y\), then \(M_{X+Y}(t) = M_X(t) \cdot M_Y(t)\). This immediately proves that the sum of independent normals is normal.
Cumulant generating function (CGF): \(K_X(t) = \ln M_X(t)\). The cumulants \(\kappa_n = K_X^{(n)}(0)\) have the property that \(\kappa_1 = E[X]\), \(\kappa_2 = \text{Var}(X)\), \(\kappa_3\) is related to skewness, \(\kappa_4\) to excess kurtosis. For the normal distribution, all cumulants of order 3 and higher are zero — deviation from this provides a measure of non-normality.
Chapter 2: Financial Data and Descriptive Statistics
2.1 Asset Returns as Random Variables
Gross return: \( R_t = P_t/P_{t-1} \). Net return: \( r_t = R_t - 1 = (P_t - P_{t-1})/P_{t-1} \)
Continuously compounded (log) return:
\[ r_t^{cc} = \ln P_t - \ln P_{t-1} = \ln(1 + r_t) \]Log returns are additive over time: \(r_t^{cc}(k) = \sum_{i=0}^{k-1} r_{t-i}^{cc}\), whereas simple returns are multiplicative: \(R_t(k) = \prod_{i=0}^{k-1} R_{t-i}\). For small \(r\), \(\ln(1+r) \approx r\), so log and simple returns are approximately equal at daily or weekly frequencies.
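The additivity of log returns and the compounding of simple returns can be verified directly (prices are illustrative):

```python
import math

prices = [100.0, 102.0, 99.0, 105.0]

simple = [prices[t] / prices[t - 1] - 1 for t in range(1, len(prices))]
log_ret = [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

# Log returns add across periods; simple returns compound multiplicatively.
total_log = sum(log_ret)
total_simple = math.prod(1 + r for r in simple) - 1

assert math.isclose(total_log, math.log(prices[-1] / prices[0]))
assert math.isclose(total_simple, prices[-1] / prices[0] - 1)
print(round(total_simple, 4), round(total_log, 4))   # 0.05 0.0488
```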
2.2 Descriptive Statistics
Central tendency:
\[ \bar{r} = \frac{1}{T}\sum_{t=1}^T r_t \quad \text{(arithmetic mean)} \]\[ \bar{r}_g = \left(\prod_{t=1}^T (1+r_t)\right)^{1/T} - 1 \quad \text{(geometric mean)} \]The arithmetic mean overstates long-run compounded growth. The geometric mean equals the arithmetic mean only if all returns are identical; otherwise \(\bar{r}_g < \bar{r}\), with approximate gap \(\approx \sigma^2/2\).
Dispersion:
\[ s^2 = \frac{1}{T-1}\sum_{t=1}^T (r_t - \bar{r})^2 \quad \text{(sample variance)} \]Skewness and kurtosis:
\[ \text{Skew} = \frac{1}{T}\sum_{t=1}^T \left(\frac{r_t - \bar{r}}{s}\right)^3, \quad \text{Kurt} = \frac{1}{T}\sum_{t=1}^T \left(\frac{r_t - \bar{r}}{s}\right)^4 \]Excess kurtosis \(= \text{Kurt} - 3\). Financial returns routinely show negative skew and positive excess kurtosis (fat tails), violating the normality assumption underlying many classical models.
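A minimal sketch of these sample moments, computed on a small symmetric sample (data are illustrative; a normal sample would have kurtosis near 3):

```python
from statistics import mean, stdev

def sample_skew_kurt(r):
    """Sample skewness and kurtosis with the 1/T normalization used above."""
    T, m, s = len(r), mean(r), stdev(r)   # stdev uses the 1/(T-1) variance
    z = [(x - m) / s for x in r]
    return sum(v ** 3 for v in z) / T, sum(v ** 4 for v in z) / T

# A symmetric sample has skewness ~ 0
skew, kurt = sample_skew_kurt([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(skew, 6), round(kurt, 3))
```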
Jarque-Bera test for normality:
\[ JB = \frac{T}{6}\left(\text{Skew}^2 + \frac{(\text{Kurt}-3)^2}{4}\right) \sim \chi^2_2 \text{ under } H_0: \text{normal} \]
Chapter 3: Statistical Inference for Finance
3.1 Sampling Distributions and the Central Limit Theorem
Central Limit Theorem (CLT): if \(X_1, \ldots, X_n\) are i.i.d. with mean \(\mu\) and finite variance \(\sigma^2\), then \(\sqrt{n}(\bar{X}_n - \mu)/\sigma\) converges in distribution to \(N(0,1)\). The sample mean is therefore approximately normally distributed in large samples, regardless of the underlying distribution of \(X_i\). This is the foundation for large-sample inference in finance.
The CLT requires finite variance — a condition that may fail for heavy-tailed distributions (e.g., stable distributions with tail index \(\alpha < 2\)). When the CLT applies, confidence intervals and t-tests are asymptotically valid even without normality.
3.2 Confidence Intervals
A 95% confidence interval for the population mean \(\mu\), given a sample of size \(n\) with known \(\sigma\):
\[ \bar{X} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}} \]When \(\sigma\) is unknown (the usual case), use the sample standard deviation \(s\) and replace the standard normal critical value with a t-critical value:
\[ \bar{X} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} \]The t-distribution with \(\nu\) degrees of freedom has heavier tails than the normal; as \(\nu \to \infty\) it converges to \(N(0,1)\).
With realistic sample sizes for mean returns, such an interval typically includes zero — we cannot statistically reject the hypothesis that the true mean return is zero, even though the point estimate is positive. This illustrates the difficulty of detecting a risk premium with limited data.
3.3 Hypothesis Testing
Step-by-step hypothesis testing:
- State \(H_0\) (null) and \(H_1\) (alternative).
- Choose significance level \(\alpha\) (typically 1%, 5%, or 10%).
- Compute the test statistic.
- Determine the rejection region or compute the p-value.
- Make a decision: reject \(H_0\) if the test statistic falls in the rejection region (or equivalently, if p-value \(< \alpha\)).
One-sample t-test: \(H_0: \mu = \mu_0\) vs. \(H_1: \mu \neq \mu_0\):
\[ t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1} \text{ under } H_0 \]Type I error (false positive): Reject \(H_0\) when it is true. Probability = \(\alpha\). Type II error (false negative): Fail to reject \(H_0\) when it is false. Probability = \(\beta\). Power of a test: \(1 - \beta\). The probability of correctly rejecting a false \(H_0\).
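A minimal sketch of the one-sample t-test using only the standard library; the return series is illustrative:

```python
from statistics import mean, stdev
from math import sqrt

returns = [0.012, -0.005, 0.021, 0.003, -0.010, 0.015, 0.007, -0.002]

mu0 = 0.0                                # H0: true mean return is zero
n = len(returns)
t_stat = (mean(returns) - mu0) / (stdev(returns) / sqrt(n))
print(round(t_stat, 3))
# Compare |t_stat| with the t critical value for n-1 = 7 df
# (about 2.365 at the 5% two-sided level) to decide on H0.
```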
Two-sample t-test for equality of means: Used to compare performance of two portfolios, two trading strategies, etc. Under equal variances:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{1/n_1 + 1/n_2}}, \quad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} \]
Chapter 4: Simple Linear Regression
4.1 The OLS Estimator
The simple linear regression model:
\[ y_t = \alpha + \beta x_t + \varepsilon_t, \quad t = 1, \ldots, T \]OLS minimizes the sum of squared residuals (SSR):
\[ \min_{\alpha,\beta} \sum_{t=1}^T (y_t - \alpha - \beta x_t)^2 \]Taking first-order conditions and solving:
\[ \hat{\beta} = \frac{\sum_{t=1}^T (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^T (x_t - \bar{x})^2} = \frac{\widehat{\text{Cov}}(x,y)}{\widehat{\text{Var}}(x)} \]\[ \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \]The regression line always passes through the point \((\bar{x}, \bar{y})\).
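The closed-form estimators translate directly into code (a minimal sketch; the data are a noiseless line, so the fit is exact):

```python
from statistics import mean

def ols_simple(x, y):
    """Simple OLS slope and intercept via the closed-form estimators above."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta

# The exact line y = 1 + 2x is recovered perfectly
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
alpha, beta = ols_simple(x, y)
print(alpha, beta)   # 1.0 2.0
```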
4.2 Gauss-Markov Theorem and Classical Assumptions
- Linearity: The DGP is \(y = X\beta + \varepsilon\).
- Full rank: \(X\) has rank \(k+1\) (no perfect multicollinearity).
- Exogeneity: \(E[\varepsilon \mid X] = 0\).
- Spherical errors: \(\text{Var}(\varepsilon \mid X) = \sigma^2 I_T\) (homoskedasticity and no autocorrelation).
Under these four assumptions (Gauss-Markov theorem), the OLS estimator \(\hat{\beta}\) is BLUE — Best (minimum variance) Linear Unbiased Estimator. No other linear unbiased estimator has smaller variance.
Unbiasedness: Under assumptions 1–3:
\[ E[\hat{\beta}] = E\left[(X^TX)^{-1}X^Ty\right] = (X^TX)^{-1}X^TX\beta = \beta \]Variance of OLS estimator:
\[ \text{Var}(\hat{\beta}) = \sigma^2(X^TX)^{-1} \]estimated by substituting \(\hat{\sigma}^2 = SSR/(T-k-1)\) for \(\sigma^2\).
4.3 Goodness of Fit
Decomposition of total variation:
\[ \underbrace{\sum(y_t - \bar{y})^2}_{SST} = \underbrace{\sum(\hat{y}_t - \bar{y})^2}_{SSM} + \underbrace{\sum\hat{\varepsilon}_t^2}_{SSR} \]\[ R^2 = \frac{SSM}{SST} = 1 - \frac{SSR}{SST} \in [0,1] \]In simple regression, \(R^2 = \hat{\rho}^2_{xy}\) — the squared sample correlation. In the CAPM context, \(R^2\) equals the fraction of total return variance attributable to market-wide (systematic) risk.
4.4 Inference in Regression
t-test for a single coefficient (\(H_0: \beta = 0\)):
\[ t = \frac{\hat{\beta}}{SE(\hat{\beta})} \sim t_{T-2} \text{ under } H_0 \]where \(SE(\hat{\beta}) = \hat{\sigma}/\sqrt{\sum(x_t - \bar{x})^2}\).
F-test for overall significance (\(H_0: \beta = 0\) in simple regression, or joint zero in multiple):
\[ F = \frac{SSM/k}{SSR/(T-k-1)} \sim F_{k, T-k-1} \text{ under } H_0 \]In simple regression, \(F = t^2\) — the two tests are equivalent.
4.5 CAPM as a Regression
The Security Characteristic Line (SCL):
\[ r_{it} - r_{ft} = \alpha_i + \beta_i(r_{mt} - r_{ft}) + \varepsilon_{it} \]Jensen’s alpha \(\alpha_i\): the intercept measures abnormal return relative to the CAPM benchmark. Under the CAPM null, \(\alpha_i = 0\) for all assets in equilibrium.
Market beta \(\beta_i\): measures systematic risk — the sensitivity of asset \(i\)’s excess return to the market excess return.
\[ \beta_i = \frac{\text{Cov}(r_i, r_m)}{\text{Var}(r_m)} \]Portfolio beta is the value-weighted average of constituent betas: \(\beta_p = \sum_i w_i \beta_i\).
To test \(H_0: \beta_i = 1\) (the stock is exactly as risky as the market), compute \(t = (\hat{\beta}_i - 1)/SE(\hat{\beta}_i)\); when \(|t|\) exceeds the 5% two-tailed critical value of approximately 2.00, we reject \(\beta = 1\) at the 5% level. With \(\hat{\beta}_i > 1\), the stock is significantly more volatile than the market.
Chapter 5: Multiple Linear Regression and Factor Models
5.1 The Multiple Regression Model in Matrix Form
With \(k\) regressors and \(T\) observations, write \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\) where:
- \(\mathbf{y}\) is \(T \times 1\)
- \(\mathbf{X}\) is \(T \times (k+1)\) (including a column of ones for the intercept)
- \(\boldsymbol{\beta}\) is \((k+1) \times 1\)
- \(\boldsymbol{\varepsilon}\) is \(T \times 1\)
The OLS estimator is:
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \]The fitted values are \(\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{H}\mathbf{y}\) where \(\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\) is the hat matrix (projection onto the column space of \(\mathbf{X}\)).
Adjusted \(R^2\) penalizes for additional regressors:
\[ \bar{R}^2 = 1 - \frac{SSR/(T-k-1)}{SST/(T-1)} = 1 - (1-R^2)\frac{T-1}{T-k-1} \]Unlike \(R^2\), \(\bar{R}^2\) can decrease when an irrelevant regressor is added.
5.2 The Fama-French Factor Models
Three-factor model (Fama and French, 1993):
\[ r_{it} - r_{ft} = \alpha_i + \beta_i^{MKT} MKT_t + \beta_i^{SMB} SMB_t + \beta_i^{HML} HML_t + \varepsilon_{it} \]- MKT: Market excess return \(r_m - r_f\)
- SMB (Small Minus Big): Return spread between small-cap and large-cap portfolios, capturing the size premium
- HML (High Minus Low): Return spread between high and low book-to-market portfolios, capturing the value premium
Five-factor model (Fama and French, 2015) adds:
- RMW (Robust Minus Weak): Profitability factor
- CMA (Conservative Minus Aggressive): Investment factor
Carhart (1997) four-factor model adds momentum:
- MOM (Winners Minus Losers): Past-12-month return spread
These multifactor regressions are estimated by OLS; the intercepts (alphas) test whether a portfolio or fund earns returns beyond the systematic factor exposures.
5.3 Regression Diagnostics
5.3.1 Heteroskedasticity
Breusch-Pagan test: Regress squared OLS residuals \(\hat{\varepsilon}_t^2\) on the regressors. The LM statistic \(= T \cdot R^2\) from that auxiliary regression is \(\chi^2_k\) under \(H_0\) of homoskedasticity.
White (1980) heteroskedasticity-robust standard errors correct inference without specifying the form of heteroskedasticity:
\[ \widehat{\text{Var}}_{HC}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1}\left(\sum_{t=1}^T \hat{\varepsilon}_t^2 \mathbf{x}_t\mathbf{x}_t^T\right)(\mathbf{X}^T\mathbf{X})^{-1} \]This “sandwich” variance estimator is asymptotically valid whether or not errors are homoskedastic.
5.3.2 Autocorrelation
Durbin-Watson statistic:
\[ DW = \frac{\sum_{t=2}^T (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^T \hat{\varepsilon}_t^2} \approx 2(1 - \hat{\rho}_1) \]DW near 2 signals no first-order autocorrelation; below 2 indicates positive, above 2 negative autocorrelation.
Newey-West HAC standard errors are valid under both heteroskedasticity and autocorrelation of unknown form:
\[ \widehat{\text{Var}}_{NW}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \hat{\mathbf{S}} (\mathbf{X}^T\mathbf{X})^{-1} \]where \(\hat{\mathbf{S}}\) is a weighted sum of lagged autocovariance matrices of the score.
5.3.3 Multicollinearity
Variance Inflation Factor:
\[ VIF_j = \frac{1}{1 - R_j^2} \]where \(R_j^2\) is from regressing \(x_j\) on all other predictors. \(VIF_j > 10\) indicates severe multicollinearity; coefficient estimates are imprecise even if the model fits well overall.
Chapter 6: Linear Algebra for Portfolio Management
6.1 Vectors and Matrices
A portfolio of \(n\) assets is fully described by a weight vector:
\[ \mathbf{w} = (w_1, w_2, \ldots, w_n)^T, \quad \sum_{i=1}^n w_i = 1 \](allowing short positions: \(w_i < 0\) is permissible). The portfolio return is the inner product:
\[ r_p = \mathbf{w}^T \mathbf{r} = \sum_{i=1}^n w_i r_i \]where \(\mathbf{r} = (r_1, \ldots, r_n)^T\) is the vector of asset returns.
Portfolio expected return:
\[ E[r_p] = \mathbf{w}^T \boldsymbol{\mu}, \quad \boldsymbol{\mu} = E[\mathbf{r}] \]Portfolio variance:
\[ \text{Var}(r_p) = \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \]where \(\boldsymbol{\Sigma}\) is the \(n \times n\) covariance matrix with \(\Sigma_{ij} = \text{Cov}(r_i, r_j)\).
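A minimal two-asset sketch of \(\mathbf{w}^T\boldsymbol{\mu}\) and \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\), written out elementwise (all inputs are illustrative):

```python
from math import sqrt

w = [0.6, 0.4]                         # portfolio weights
mu = [0.08, 0.05]                      # expected returns
sigma = [0.20, 0.10]                   # volatilities
rho = 0.3                              # correlation

cov = rho * sigma[0] * sigma[1]
Sigma = [[sigma[0] ** 2, cov],
         [cov, sigma[1] ** 2]]

port_mean = sum(wi * mi for wi, mi in zip(w, mu))
port_var = sum(w[i] * Sigma[i][j] * w[j] for i in range(2) for j in range(2))
print(round(port_mean, 4), round(sqrt(port_var), 4))   # 0.068 0.1374
```

Note that the portfolio volatility (13.74%) is below the weighted average of the two volatilities (16%) — the diversification benefit of imperfect correlation.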
6.2 Properties of the Covariance Matrix
- Symmetric: \(\boldsymbol{\Sigma} = \boldsymbol{\Sigma}^T\) since \(\text{Cov}(r_i, r_j) = \text{Cov}(r_j, r_i)\).
- Positive semi-definite: \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} \geq 0\) for all \(\mathbf{w}\), since portfolio variance is non-negative.
- Positive definite (if no redundant assets): \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} > 0\) for all \(\mathbf{w} \neq \mathbf{0}\), guaranteeing \(\boldsymbol{\Sigma}^{-1}\) exists.
The covariance matrix can be written as:
\[ \boldsymbol{\Sigma} = \mathbf{D}\boldsymbol{\rho}\mathbf{D} \]where \(\mathbf{D} = \text{diag}(\sigma_1, \ldots, \sigma_n)\) and \(\boldsymbol{\rho}\) is the correlation matrix. In practice \(\boldsymbol{\Sigma}\) is estimated from historical data; with \(T\) observations and \(n\) assets:
\[ \hat{\boldsymbol{\Sigma}} = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{r}_t - \bar{\mathbf{r}})(\mathbf{r}_t - \bar{\mathbf{r}})^T \]Reliable estimation requires \(T \gg n\); with \(n = 100\) assets, one needs hundreds of months of data for a stable sample covariance matrix.
6.3 Mean-Variance Optimization
The Markowitz minimum-variance problem:
\[ \min_{\mathbf{w}} \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to} \quad \mathbf{w}^T\boldsymbol{\mu} = \mu_p, \quad \mathbf{w}^T\mathbf{1} = 1 \]Using Lagrange multipliers \(\lambda_1\) and \(\lambda_2\):
\[ \mathcal{L} = \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} - \lambda_1(\mathbf{w}^T\boldsymbol{\mu} - \mu_p) - \lambda_2(\mathbf{w}^T\mathbf{1} - 1) \]First-order condition: \(\partial \mathcal{L}/\partial \mathbf{w} = 2\boldsymbol{\Sigma}\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = \mathbf{0}\), so:
\[ \mathbf{w}^* = \frac{\lambda_1}{2}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \frac{\lambda_2}{2}\boldsymbol{\Sigma}^{-1}\mathbf{1} \]The Lagrange multipliers are solved by imposing the two constraints, yielding closed-form expressions involving the scalars \(A = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\), \(B = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}\), \(C = \mathbf{1}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}\), and \(D = AC - B^2\).
The efficient frontier in \((\sigma_p, \mu_p)\) space is a hyperbola:
\[ \sigma_p^2 = \frac{C\mu_p^2 - 2B\mu_p + A}{D} \]The global minimum variance (GMV) portfolio has weights:
\[ \mathbf{w}_{GMV} = \frac{\boldsymbol{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}} \]with minimum variance \(\sigma^2_{GMV} = 1/C\) and expected return \(\mu_{GMV} = B/C\).
Tangency (maximum Sharpe ratio) portfolio with risk-free rate \(r_f\):
\[ \mathbf{w}_{tan} = \frac{\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu} - r_f\mathbf{1})}{\mathbf{1}^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu} - r_f\mathbf{1})} \]All portfolios on the Capital Market Line (CML) combine the tangency portfolio and the risk-free asset:
\[ \mu_p = r_f + \frac{\mu_{tan} - r_f}{\sigma_{tan}}\sigma_p \]The slope \((\mu_{tan} - r_f)/\sigma_{tan}\) is the Sharpe ratio of the tangency portfolio, which is the highest achievable Sharpe ratio.
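For two assets the matrix algebra can be carried out by hand; a minimal sketch of the GMV and tangency formulas with illustrative means, covariances, and risk-free rate:

```python
mu = [0.08, 0.05]
rf = 0.02
Sigma = [[0.04, 0.006],
         [0.006, 0.01]]

# Invert the 2x2 covariance matrix explicitly
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Sinv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
        [-Sigma[1][0] / det, Sigma[0][0] / det]]

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

# GMV: Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
g = matvec(Sinv, [1.0, 1.0])
w_gmv = [gi / sum(g) for gi in g]

# Tangency: Sigma^{-1}(mu - rf 1), normalized to sum to one
t = matvec(Sinv, [m - rf for m in mu])
w_tan = [ti / sum(t) for ti in t]

print([round(x, 4) for x in w_gmv], [round(x, 4) for x in w_tan])
# [0.1053, 0.8947] [0.3333, 0.6667]
```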
6.4 Eigendecomposition and Principal Components
The covariance matrix \(\boldsymbol{\Sigma}\) is symmetric positive semi-definite, so it admits an eigendecomposition:
\[ \boldsymbol{\Sigma} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^T \]where \(\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \ldots, \lambda_n)\) contains eigenvalues (\(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0\)) and \(\mathbf{P}\) is orthogonal (eigenvectors as columns).
Principal Component Analysis (PCA) projects returns onto the eigenvectors: the first principal component (eigenvector corresponding to \(\lambda_1\)) explains the largest fraction of total return variance (\(\lambda_1/\sum \lambda_i\)). In equity markets, the first PC typically represents a market-wide factor; subsequent PCs capture sector and style factors.
Chapter 7: Financial Mathematics — Time Value and Optimization
7.1 Time Value of Money
Simple interest: \(FV = PV(1 + r \cdot T)\)
Compound interest (discrete, \(m\) periods per year):
\[ FV = PV\left(1 + \frac{r}{m}\right)^{mT} \]Continuous compounding (limit as \(m \to \infty\)):
\[ FV = PV \cdot e^{rT} \]This follows from \(\lim_{m\to\infty}(1 + r/m)^m = e^r\). Continuous compounding is used extensively in derivative pricing and stochastic calculus.
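The convergence of discrete compounding to the continuous limit can be seen numerically (the principal, rate, and horizon are illustrative):

```python
import math

pv, r, T = 100.0, 0.05, 1.0
for m in [1, 4, 12, 365, 100000]:
    fv = pv * (1 + r / m) ** (m * T)   # m compounding periods per year
    print(m, round(fv, 4))
print("continuous", round(pv * math.exp(r * T), 4))
# The discrete values increase with m and approach the continuous limit
```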
Present value as an integral: For a continuous cash-flow stream \(c(t)\) received over \([0, T]\) at continuously compounded discount rate \(r\):
\[ PV = \int_0^T c(t)\, e^{-rt}\, dt \]For a constant annuity \(c(t) = C\):
\[ PV = C\int_0^T e^{-rt}\, dt = C\left[\frac{-e^{-rt}}{r}\right]_0^T = \frac{C}{r}(1 - e^{-rT}) \]As \(T \to \infty\) (perpetuity): \(PV = C/r\).
7.2 Bond Pricing and Yield
A bond paying coupon \(C\) semi-annually for \(T\) years with face value \(F\) has price:
\[ P = \sum_{t=1}^{2T} \frac{C/2}{(1 + y/2)^t} + \frac{F}{(1 + y/2)^{2T}} \]where \(y\) is the yield to maturity (YTM) — the single discount rate that equates the present value of all cash flows to the market price. This equation has no closed-form solution for \(y\); it must be solved numerically.
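A minimal sketch of the numerical solution, here using simple bisection rather than a derivative-based method; the bond's parameters (10-year, 5% annual coupon paid semi-annually, face value 100) are illustrative:

```python
def bond_price(y, C=5.0, F=100.0, T=10):
    """Price of a semi-annual coupon bond at annual yield y (formula above)."""
    periods = 2 * T
    coupon = C / 2
    return (sum(coupon / (1 + y / 2) ** t for t in range(1, periods + 1))
            + F / (1 + y / 2) ** periods)

def ytm(price, lo=1e-6, hi=1.0, tol=1e-10):
    """Solve bond_price(y) = price by bisection; price is decreasing in y."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_price(mid) > price:
            lo = mid          # model price too high -> yield must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A bond priced at par yields its coupon rate
y = ytm(100.0)
print(round(y, 6))   # 0.05
```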
Newton-Raphson method for yield: Given price \(P^*\), start with an initial guess \(y_0\) and iterate:
\[ y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} \]where \(f(y) = P(y) - P^*\) and \(P'(y) = dP/dy\). The derivative \(dP/dy\) equals \(-D_{\text{mod}} \cdot P\) where \(D_{\text{mod}}\) is the modified duration:
\[ D_{\text{mod}} = \frac{1}{P}\left[\sum_{t=1}^{2T} \frac{(t/2)(C/2)}{(1+y/2)^{t+1}} + \frac{T \cdot F}{(1+y/2)^{2T+1}}\right] \]
7.3 Calculus Review for Finance
Optimization: A differentiable function \(f: \mathbb{R} \to \mathbb{R}\) achieves a local minimum at \(x^*\) if:
\[ f'(x^*) = 0 \quad \text{(first-order condition)} \]\[ f''(x^*) > 0 \quad \text{(second-order condition)} \]For multivariate optimization of \(f: \mathbb{R}^n \to \mathbb{R}\), the first-order condition is \(\nabla f(\mathbf{x}^*) = \mathbf{0}\) and the Hessian matrix \(\mathbf{H} = [\partial^2 f / \partial x_i \partial x_j]\) must be positive definite at \(\mathbf{x}^*\) for a local minimum.
Constrained optimization — Lagrangian: To minimize \(f(\mathbf{x})\) subject to \(g(\mathbf{x}) = 0\):
\[ \mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda g(\mathbf{x}) \]FOC: \(\nabla_{\mathbf{x}} \mathcal{L} = \mathbf{0}\) and \(g(\mathbf{x}) = 0\). The scalar \(\lambda\) (Lagrange multiplier) is the marginal cost of tightening the constraint: it measures by how much the optimal value of \(f\) changes per unit increase in the constraint.
Taylor expansion (used in duration/convexity analysis):
\[ f(x + \Delta x) \approx f(x) + f'(x)\Delta x + \frac{1}{2}f''(x)(\Delta x)^2 + \cdots \]Applied to bond price as a function of yield:
\[ \Delta P \approx -D_{\text{mod}} \cdot P \cdot \Delta y + \frac{1}{2}\text{Convexity} \cdot P \cdot (\Delta y)^2 \]
Chapter 8: Time Series Analysis
8.1 Stationarity
Weak (covariance) stationarity: A time series \(\{X_t\}\) is weakly stationary if:
- \(E[X_t] = \mu\) (constant mean)
- \(\text{Var}(X_t) = \sigma^2 < \infty\) (constant, finite variance)
- \(\text{Cov}(X_t, X_{t-k}) = \gamma(k)\) depends only on lag \(k\), not on \(t\)
Most time-series modeling assumes weak stationarity. Unit root processes (random walks) are not stationary: their variance grows without bound over time.
Autocovariance and autocorrelation functions:
\[ \gamma(k) = \text{Cov}(X_t, X_{t-k}) \]\[ \rho(k) = \frac{\gamma(k)}{\gamma(0)} \in [-1,1] \]The sample ACF:
\[ \hat{\rho}(k) = \frac{\sum_{t=k+1}^T (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^T (X_t - \bar{X})^2} \]Under the null of white noise, \(\hat{\rho}(k) \approx N(0, 1/T)\), so approximate 95% confidence bands are \(\pm 1.96/\sqrt{T}\).
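The sample ACF formula translates directly into code; an alternating series makes a convenient check, since it has strong negative first-lag autocorrelation (a minimal sketch):

```python
from statistics import mean

def sample_acf(x, k):
    """Sample autocorrelation at lag k, as defined above."""
    xbar = mean(x)
    num = sum((x[t] - xbar) * (x[t - k] - xbar) for t in range(k, len(x)))
    den = sum((xt - xbar) ** 2 for xt in x)
    return num / den

# A perfectly alternating series of length 100
x = [1.0, -1.0] * 50
print(round(sample_acf(x, 1), 3))   # -0.99
```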
8.2 AR(p) Models
An AR(\(p\)) process:
\[ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t \]where \(\varepsilon_t \sim WN(0, \sigma^2)\) (white noise: zero mean, constant variance, uncorrelated).
Stationarity condition: The AR(\(p\)) is stationary if all roots of the characteristic polynomial
\[ \Phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0 \]lie outside the unit circle (\(|z| > 1\)).
For the AR(1) model \(X_t = \phi X_{t-1} + \varepsilon_t\), stationarity requires \(|\phi| < 1\). When stationary:
\[ E[X_t] = 0, \quad \text{Var}(X_t) = \frac{\sigma^2}{1-\phi^2}, \quad \rho(k) = \phi^k \]The ACF decays geometrically — the hallmark of an AR(1). The PACF (partial autocorrelation function) cuts off after lag \(p\) for an AR(\(p\)) process.
8.3 MA(q) Models
An MA(\(q\)) process,
\[ X_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}, \quad \varepsilon_t \sim WN(0, \sigma^2), \]is always stationary. Its ACF cuts off after lag \(q\) (all autocorrelations beyond lag \(q\) are exactly zero), while the PACF decays geometrically. This is the mirror image of the AR(\(p\)) pattern.
Invertibility condition: The MA(\(q\)) is invertible (has a well-defined AR(\(\infty\)) representation) if all roots of \(\Theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q = 0\) lie outside the unit circle.
8.4 ARMA(p, q) Models
An ARMA(\(p, q\)) process combines both components:
\[ X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q} \]It is stationary and invertible if the roots of both \(\Phi(z)\) and \(\Theta(z)\) lie strictly outside the unit circle.
Model identification (Box-Jenkins methodology):
- Identify tentative \(p, q\) from ACF/PACF patterns.
- Estimate by maximum likelihood or conditional least squares.
- Check residuals: ACF of \(\hat{\varepsilon}_t\) should show no significant autocorrelation (Ljung-Box Q-test).
Ljung-Box statistic:
\[ Q(m) = T(T+2)\sum_{k=1}^m \frac{\hat{\rho}_k^2}{T-k} \sim \chi^2_{m-p-q} \text{ under } H_0 \]Information criteria for model selection:
\[ AIC = -2\ln(\hat{L}) + 2(p+q+1) \]\[ BIC = -2\ln(\hat{L}) + (p+q+1)\ln T \]BIC imposes a heavier penalty for model complexity than AIC and tends to select more parsimonious models. Both criteria should be minimized.
8.5 Unit Roots and Integrated Processes
A random walk \(X_t = X_{t-1} + \varepsilon_t\) has a unit root: \(\phi = 1\) in the AR(1). Such a process is I(1) — integrated of order 1. Its variance grows linearly in \(t\): \(\text{Var}(X_t) = t\sigma^2\).
Augmented Dickey-Fuller (ADF) test for a unit root in \(X_t\):
Test regression: \(\Delta X_t = \alpha + \delta t + \gamma X_{t-1} + \sum_{j=1}^p c_j \Delta X_{t-j} + u_t\)
\(H_0: \gamma = 0\) (unit root, non-stationary) vs. \(H_1: \gamma < 0\) (stationary).
The ADF test statistic does not follow a standard t-distribution under \(H_0\); Dickey-Fuller critical values must be used (more negative than standard t-critical values).
8.6 Forecasting with Time Series Models
For a stationary AR(1) model \(X_t = \phi X_{t-1} + \varepsilon_t\), the optimal (minimum MSE) \(h\)-step-ahead forecast from time \(T\) is:
\[ \hat{X}_{T+h} = \phi^h X_T \]which reverts toward zero (the unconditional mean) geometrically. The forecast error variance:
\[ \text{Var}(X_{T+h} - \hat{X}_{T+h}) = \sigma^2\frac{1 - \phi^{2h}}{1-\phi^2} \to \frac{\sigma^2}{1-\phi^2} = \text{Var}(X_t) \text{ as } h \to \infty \]Long-horizon forecasts become uninformative as uncertainty grows toward the unconditional variance.
Chapter 9: Volatility Modeling — ARCH and GARCH
9.1 Volatility Clustering
One of the most robust empirical regularities in financial data is volatility clustering: large changes in asset prices tend to be followed by large changes (of either sign), and small changes by small changes. Mandelbrot (1963) first documented this; Engle (1982) provided a formal econometric model.
ARCH effects test: To detect time-varying volatility, regress squared residuals on their lags (auxiliary regression):
\[ \hat{\varepsilon}_t^2 = \gamma_0 + \gamma_1 \hat{\varepsilon}_{t-1}^2 + \cdots + \gamma_q \hat{\varepsilon}_{t-q}^2 + u_t \]The Engle LM statistic \(= T \cdot R^2\) is \(\chi^2_q\) under \(H_0\) of no ARCH effects.
9.2 ARCH Model
An ARCH(\(q\)) model (Engle, 1982) specifies
\[ \varepsilon_t = \sigma_t z_t, \quad z_t \sim \text{i.i.d. } N(0,1), \quad \sigma_t^2 = \omega + \sum_{i=1}^q \alpha_i \varepsilon_{t-i}^2 \]with \(\omega > 0\), \(\alpha_i \geq 0\), and \(\sum_{i=1}^q \alpha_i < 1\) for covariance stationarity.
Interpretation: Today’s conditional variance \(\sigma_t^2\) depends on yesterday’s squared shock \(\varepsilon_{t-1}^2\). A large shock yesterday inflates today’s variance, creating volatility clustering. ARCH(\(q\)) requires many lags to fit observed persistence well, motivating GARCH.
9.3 GARCH Model
The GARCH(\(p, q\)) model (Bollerslev, 1986) adds lagged conditional variances:
\[ \sigma_t^2 = \omega + \sum_{i=1}^q \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^p \beta_j \sigma_{t-j}^2 \]with \(\omega > 0\), \(\alpha_i \geq 0\), \(\beta_j \geq 0\), and \(\sum \alpha_i + \sum \beta_j < 1\) for stationarity.
The GARCH(1,1) model is the most widely used specification:
\[ \sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2 \]Interpretation:
- \(\omega/(1-\alpha-\beta)\) is the long-run (unconditional) variance.
- \(\alpha\) captures the immediate impact of shocks on volatility (ARCH effect).
- \(\beta\) measures the persistence of volatility — how much yesterday’s conditional variance carries over.
In practice, \(\hat{\alpha} + \hat{\beta} \approx 0.97\) to \(0.99\) for daily equity returns, implying very high volatility persistence. When \(\alpha + \beta = 1\), we have the IGARCH (Integrated GARCH) model, where volatility shocks are permanent.
GARCH volatility forecasting:
\[ E_t[\sigma_{t+h}^2] = \bar{\sigma}^2 + (\alpha+\beta)^{h-1}(\sigma_{t+1}^2 - \bar{\sigma}^2) \]where \(\bar{\sigma}^2 = \omega/(1-\alpha-\beta)\). Forecasts revert toward the long-run variance, with speed governed by \(\alpha+\beta\).
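A minimal sketch of this forecast recursion; the GARCH(1,1) parameters and the starting variance are illustrative:

```python
# GARCH(1,1) multi-step variance forecasts revert to the long-run level.
omega, alpha, beta = 1e-5, 0.08, 0.90
longrun = omega / (1 - alpha - beta)     # unconditional variance = 0.0005

sigma2_next = 0.0004                     # sigma^2_{t+1}, below the long-run level
horizons = [1, 5, 22, 252]
forecasts = [longrun + (alpha + beta) ** (h - 1) * (sigma2_next - longrun)
             for h in horizons]
for h, f in zip(horizons, forecasts):
    print(h, round(f, 6))
# Forecasts rise monotonically from 0.0004 toward the long-run 0.0005,
# with mean-reversion speed set by alpha + beta = 0.98.
```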
9.4 Extensions: EGARCH and GJR-GARCH
GJR-GARCH (Glosten-Jagannathan-Runkle): Adds a leverage effect — negative shocks have a larger impact on volatility than positive shocks:
\[ \sigma_t^2 = \omega + (\alpha + \gamma\mathbf{1}_{\varepsilon_{t-1}<0})\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2 \]where \(\mathbf{1}_{\varepsilon_{t-1}<0}\) is an indicator for negative shocks. Empirically, \(\hat{\gamma} > 0\) for equity indices (the leverage effect or volatility asymmetry).
EGARCH (Nelson, 1991): Models log-variance to ensure positivity without parameter constraints:
\[ \ln\sigma_t^2 = \omega + \sum_{j=1}^p \beta_j \ln\sigma_{t-j}^2 + \sum_{i=1}^q \left(\alpha_i |z_{t-i}| + \gamma_i z_{t-i}\right) \]The \(\gamma_i\) terms allow an asymmetric response: with \(\gamma_i < 0\), as typically estimated for equities, negative shocks (\(z_{t-i} < 0\)) raise log-variance by more than positive shocks of the same magnitude.
Realized volatility: With intraday data available, one can estimate daily volatility non-parametrically as the sum of squared intraday log-returns:
\[ RV_t = \sum_{j=1}^m r_{t,j}^2 \]where \(r_{t,j}\) are intraday returns sampled at frequency \(m\) per day. As \(m \to \infty\), \(RV_t \to \int_t^{t+1} \sigma_s^2 ds\) (integrated variance).
Chapter 10: Monte Carlo Simulation
10.1 Random Number Generation
Monte Carlo (MC) simulation uses pseudo-random number generators to approximate integrals, option prices, and risk measures that have no closed-form solution.
Linear congruential generator: Produces a sequence of pseudo-random integers via:
\[ X_{n+1} = (aX_n + c) \mod m \]Uniform \((0,1)\) samples: \(U_n = X_n/m\). The sequence is fully deterministic given the seed \(X_0\); simple LCGs pass basic uniformity tests but fail more stringent ones, so production code uses stronger generators (e.g., the Mersenne Twister or PCG).
Generating normal random variables — Box-Muller transform: Given two i.i.d. \(U_1, U_2 \sim U(0,1)\):
\[ Z_1 = \sqrt{-2\ln U_1}\cos(2\pi U_2), \quad Z_2 = \sqrt{-2\ln U_1}\sin(2\pi U_2) \]Then \(Z_1, Z_2 \sim i.i.d.\, N(0,1)\) — an exact transformation. A general normal sample is obtained as \(X = \mu + \sigma Z\).
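The transform can be checked empirically; this sketch shifts the first uniform into \((0,1]\) so the logarithm is always defined:

```python
import math
import random

def box_muller(u1, u2):
    """Transform two independent U(0,1) draws into two independent N(0,1) draws."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

random.seed(0)
zs = []
for _ in range(50_000):
    u1 = 1.0 - random.random()  # shift [0,1) to (0,1] so log(u1) is defined
    z1, z2 = box_muller(u1, random.random())
    zs.extend((z1, z2))

mean = sum(zs) / len(zs)            # should be near 0
var = sum(z * z for z in zs) / len(zs)  # should be near 1
```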
Generating correlated normals: For a two-asset portfolio with correlation \(\rho\), generate \(Z_1, Z_2 \sim N(0,1)\) i.i.d. and set:
\[ \tilde{Z}_1 = Z_1, \quad \tilde{Z}_2 = \rho Z_1 + \sqrt{1-\rho^2} Z_2 \]Then \(\text{Cor}(\tilde{Z}_1, \tilde{Z}_2) = \rho\). In the \(n\)-asset case, use the Cholesky decomposition: \(\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T\) where \(\mathbf{L}\) is lower triangular, and generate \(\mathbf{r} = \mathbf{L}\mathbf{z}\) for i.i.d. standard normal \(\mathbf{z}\).
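For the two-asset case, the construction above can be verified by checking the sample correlation; the target \(\rho = 0.6\) is an assumed illustrative value:

```python
import math
import random

rho = 0.6  # illustrative target correlation
random.seed(1)
n = 100_000

xs, ys = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(z1)
    ys.append(rho * z1 + math.sqrt(1 - rho**2) * z2)  # Cholesky construction

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
corr = cov / (sx * sy)  # sample correlation, should be close to rho
```

The \(2 \times 2\) factor \(\mathbf{L} = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}\) used here is exactly the Cholesky factor of the correlation matrix.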
10.2 Simulating Asset Prices
Under the Black-Scholes model, the stock price at time \(T\) is:
\[ S_T = S_0 \exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,Z\right), \quad Z \sim N(0,1) \]For a path-dependent option or risk measure requiring \(n\) time steps, discretize the GBM via:
\[ S_{t+\Delta t} = S_t \exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)\Delta t + \sigma\sqrt{\Delta t}\,Z_t\right) \]where \(Z_t \sim N(0,1)\) i.i.d. The Euler-Maruyama scheme for a general SDE \(dS = a(S,t)dt + b(S,t)dW\):
\[ S_{t+\Delta t} \approx S_t + a(S_t,t)\Delta t + b(S_t,t)\sqrt{\Delta t}\,Z_t \]
10.3 Variance Reduction Techniques
Antithetic variates: For each uniform sample \(U\), also use \(1-U\) (its antithetic pair). The estimator:
\[ \hat{\theta}_{AV} = \frac{1}{N}\sum_{i=1}^N \frac{f(Z_i) + f(-Z_i)}{2} \]Since \(f(Z)\) and \(f(-Z)\) are negatively correlated for monotone \(f\), the variance of \(\hat{\theta}_{AV}\) is smaller than that of a crude estimator using the same number of payoff evaluations, and no additional random numbers need to be generated.
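A sketch comparing crude and antithetic estimators for a European call under risk-neutral GBM (the option parameters are illustrative; the true Black-Scholes price for them is about 10.45):

```python
import math
import random
import statistics

# Illustrative option parameters (not from the text).
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
disc = math.exp(-r * T)
drift = (r - 0.5 * sigma**2) * T
vol = sigma * math.sqrt(T)

def payoff(z):
    """Discounted call payoff from a terminal standard normal draw z."""
    return disc * max(S0 * math.exp(drift + vol * z) - K, 0.0)

N = 20_000
random.seed(7)
crude = [payoff(random.gauss(0, 1)) for _ in range(N)]
random.seed(7)  # reuse the same draws for a fair comparison
anti = []
for _ in range(N):
    z = random.gauss(0, 1)
    anti.append(0.5 * (payoff(z) + payoff(-z)))

se_crude = statistics.stdev(crude) / math.sqrt(N)
se_anti = statistics.stdev(anti) / math.sqrt(N)
```

Because the call payoff is monotone in \(z\), the antithetic standard error comes out strictly smaller for the same number of pairs.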
Control variates: If \(g(S)\) is a payoff with known analytical expectation \(E[g(S)] = \mu_g\), use:
\[ \hat{\theta}_{CV} = \hat{\theta} - c(\hat{\mu}_g - \mu_g) \]Optimal control coefficient: \(c^* = \text{Cov}(f,g)/\text{Var}(g)\). With \(c = c^*\), the variance of \(\hat{\theta}_{CV}\) is \(1 - \rho_{f,g}^2\) times that of the crude estimator, so a highly correlated control yields a large reduction.
Importance sampling: Shift the sampling distribution to the region of interest (e.g., the tail for VaR/ES estimation), then reweight:
\[ E_P[f(X)] = E_Q\left[f(X) \frac{dP}{dQ}(X)\right] \]where \(dP/dQ\) is the likelihood ratio (Radon-Nikodym derivative).
MC convergence: The standard error of a MC estimator with \(N\) simulations is:
\[ SE = \frac{\hat{\sigma}_f}{\sqrt{N}} \]To halve the standard error, quadruple the number of simulations. Variance reduction techniques improve the effective \(N\) without additional computation.
Chapter 11: Numerical Methods in Finance
11.1 The Binomial Tree Model
The binomial tree (Cox, Ross, Rubinstein, 1979) discretizes the asset price evolution into up and down moves per period.
One-period binomial model: In each period \(\Delta t = T/N\), the stock moves:
\[ S_u = S \cdot u, \quad S_d = S \cdot d \]with risk-neutral probabilities:
\[ p = \frac{e^{r\Delta t} - d}{u - d}, \quad 1-p = \frac{u - e^{r\Delta t}}{u - d} \]The CRR parameterization: \(u = e^{\sigma\sqrt{\Delta t}}\), \(d = 1/u = e^{-\sigma\sqrt{\Delta t}}\), so the tree is recombining (same node reached by “up then down” and “down then up”).
Multi-period tree pricing by backward induction: At terminal nodes, option payoffs are known. Working backward:
\[ V_t = e^{-r\Delta t}[p\, V_{t+1,u} + (1-p)\, V_{t+1,d}] \]For European options, this converges to the Black-Scholes price as \(N \to \infty\). For American options, at each node compare the hold value \(V_t\) (above) with the immediate exercise value; the option value is the maximum:
\[ V_t^{Am} = \max(\text{exercise value},\, e^{-r\Delta t}[p\, V_{t+1,u}^{Am} + (1-p)\, V_{t+1,d}^{Am}]) \]Early exercise of American puts (but generally not American calls on non-dividend-paying stocks) can be optimal when deep in-the-money.
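The CRR tree with backward induction can be sketched compactly; the parameters used in the comments are illustrative:

```python
import math

def crr_price(S0, K, r, sigma, T, N, call=True, american=False):
    """Cox-Ross-Rubinstein binomial tree priced by backward induction.
    Assumes 0 <= N exceedances elsewhere; node j = number of up-moves."""
    dt = T / N
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)

    def payoff(S):
        return max(S - K, 0.0) if call else max(K - S, 0.0)

    # Terminal option values at maturity.
    vals = [payoff(S0 * u**j * d**(N - j)) for j in range(N + 1)]
    # Step backward through the tree.
    for i in range(N - 1, -1, -1):
        for j in range(i + 1):
            hold = disc * (p * vals[j + 1] + (1 - p) * vals[j])
            if american:
                hold = max(hold, payoff(S0 * u**j * d**(i - j)))
            vals[j] = hold
    return vals[0]
```

With \(N = 500\) steps, the European call price (e.g., \(S_0 = K = 100\), \(r = 0.05\), \(\sigma = 0.2\), \(T = 1\)) agrees with Black-Scholes to within a few cents, the American call on a non-dividend stock equals the European call, and the American put strictly exceeds the European put.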
11.2 Finite Difference Methods
The Black-Scholes PDE for a European option price \(V(S,t)\):
\[ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0 \]with terminal condition \(V(S,T) = \text{payoff}(S)\).
Finite difference methods discretize the \((S,t)\) grid. Let \(S_i = i\Delta S\) and \(t_j = j\Delta t\). Denote \(V_{i,j} \approx V(i\Delta S, j\Delta t)\).
Explicit scheme (forward difference in time):
\[ \frac{V_{i,j+1} - V_{i,j}}{\Delta t} + \frac{\sigma^2 S_i^2}{2} \frac{V_{i+1,j} - 2V_{i,j} + V_{i-1,j}}{(\Delta S)^2} + rS_i\frac{V_{i+1,j} - V_{i-1,j}}{2\Delta S} - rV_{i,j} = 0 \]Solving for \(V_{i,j}\) in terms of known values at time \(j\) gives explicit updates. Stable only if the time step satisfies the CFL condition: \(\Delta t \leq (\Delta S)^2/(\sigma^2 S_{max}^2)\).
The fully implicit scheme (backward difference in time) is unconditionally stable but only first-order accurate in time. The Crank-Nicolson scheme averages the explicit and implicit discretizations, achieving second-order accuracy in both time and space while remaining unconditionally stable.
Chapter 12: Introduction to Stochastic Calculus
12.1 Brownian Motion
A standard Brownian motion (Wiener process) \(\{W_t\}_{t \geq 0}\) is a stochastic process satisfying:
- \(W_0 = 0\)
- Independent increments: \(W_t - W_s \perp W_s - W_u\) for \(u < s < t\)
- Normal increments: \(W_t - W_s \sim N(0, t-s)\) for \(s < t\)
- Continuous paths: \(t \mapsto W_t\) is almost surely continuous
Properties:
- \(E[W_t] = 0\), \(\text{Var}(W_t) = t\)
- \(\text{Cov}(W_s, W_t) = \min(s,t)\)
- Brownian motion is nowhere differentiable (paths have infinite total variation on any interval)
- Quadratic variation: \([W,W]_t = t\) (the heuristic rule \((dW_t)^2 = dt\))
Geometric Brownian motion (GBM): The standard model for stock prices:
\[ dS_t = \mu S_t\, dt + \sigma S_t\, dW_t \]By Itô’s lemma (see below), the solution is:
\[ S_t = S_0 \exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right) \]
12.2 Itô’s Lemma
For a process \(dX_t = a_t\,dt + b_t\,dW_t\) and a twice continuously differentiable function \(f(t,x)\), Itô’s lemma states:
\[ df(t, X_t) = \left(f_t + a_t f_x + \frac{1}{2}b_t^2 f_{xx}\right)dt + b_t f_x\, dW_t \]The extra \(\frac{1}{2}b_t^2 f_{xx}\) term (compared to ordinary calculus) arises from the quadratic variation of Brownian motion: \((dW_t)^2 = dt\).
Applying the lemma to \(f(x) = \ln x\) with \(a_t = \mu S_t\) and \(b_t = \sigma S_t\) gives \(d\ln S_t = (\mu - \sigma^2/2)\,dt + \sigma\,dW_t\). Integrating from 0 to \(T\): \(\ln S_T - \ln S_0 = (\mu - \sigma^2/2)T + \sigma W_T\), confirming \(S_T = S_0 e^{(\mu-\sigma^2/2)T + \sigma W_T}\).
12.3 Black-Scholes Derivation Sketch
Black and Scholes (1973) derived the no-arbitrage price of a European option by constructing a delta-hedged portfolio that is instantaneously risk-free.
Step 1 — Delta-hedged portfolio: Hold one option long and \(\Delta = \partial V/\partial S\) shares short. Portfolio value: \(\Pi = V - \Delta S\).
Step 2 — Portfolio dynamics via Itô’s Lemma:
\[ d\Pi = dV - \Delta\, dS = \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right)dt \]The stochastic term \(\sigma S\, dW_t\) cancels perfectly. The portfolio is instantaneously riskless.
Step 3 — No-arbitrage: The riskless portfolio must earn the risk-free rate:
\[ d\Pi = r\Pi\, dt = r(V - \Delta S)\, dt \]Step 4 — Black-Scholes PDE:
\[ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0 \]Step 5 — Closed-form solution for a European call with strike \(K\) and maturity \(T\):
\[ C(S, t) = S_t\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2) \]\[ d_1 = \frac{\ln(S_t/K) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \quad d_2 = d_1 - \sigma\sqrt{T-t} \]Put-Call Parity (model-free, from no-arbitrage):
\[ C - P = S - Ke^{-r(T-t)} \]Option Greeks:
| Greek | Formula | Interpretation |
|---|---|---|
| Delta \(\Delta\) | \(\Phi(d_1)\) (call) | Price sensitivity to \(S\) |
| Gamma \(\Gamma\) | \(\phi(d_1)/(S\sigma\sqrt{T-t})\) | Rate of change of delta |
| Vega \(\mathcal{V}\) | \(S\phi(d_1)\sqrt{T-t}\) | Sensitivity to \(\sigma\) |
| Theta \(\Theta\) | \(-\frac{S\phi(d_1)\sigma}{2\sqrt{T-t}} - rKe^{-r(T-t)}\Phi(d_2)\) (call) | Time decay (value lost per day) |
| Rho \(\rho\) | \(K(T-t)e^{-r(T-t)}\Phi(d_2)\) | Sensitivity to \(r\) |
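The pricing formula and put-call parity can be checked numerically. This is a minimal sketch using only the standard library; the parameter values in the test call are illustrative:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    """Black-Scholes European call price."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def bs_put(S, K, r, sigma, T):
    """European put via put-call parity: P = C - S + K e^{-rT}."""
    return bs_call(S, K, r, sigma, T) - S + K * math.exp(-r * T)
```

For \(S = K = 100\), \(r = 0.05\), \(\sigma = 0.2\), \(T = 1\), this gives a call price of about 10.45 and a put price of about 5.57.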
Chapter 13: Value at Risk and Expected Shortfall
13.1 Value at Risk
Value at Risk at confidence level \(\alpha\) is the loss threshold exceeded with probability \(1-\alpha\) over a given horizon: \(P(\text{Loss} > VaR_\alpha) = 1-\alpha\). Equivalently, \(VaR_\alpha\) is the \(\alpha\)-quantile of the loss distribution. At 95% confidence: \(P(\text{Loss} > VaR_{0.95}) = 0.05\).
Three methods for estimating VaR:
13.1.1 Historical Simulation
Given a window of \(T\) past returns \(\{r_1, \ldots, r_T\}\), apply those returns to the current portfolio value \(V_0\) to generate \(T\) hypothetical P&L scenarios:
\[ \text{Loss}_t = -V_0 \cdot r_t \]Sort the losses; VaR at 95% is the 95th percentile of the sorted losses (e.g., the 5th-largest loss in a window of 100 scenarios).
Advantages: Non-parametric; automatically captures fat tails, skewness, and correlations as observed historically. No distributional assumptions.
Disadvantages: Entirely backward-looking; uses only \(T\) scenarios; sudden regime changes are poorly handled; equal-weighting of all historical periods.
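The historical-simulation procedure can be sketched as follows; the return history below is simulated as a stand-in for real data, with illustrative parameters:

```python
import random

# Simulated stand-in for a real return history (illustrative parameters:
# mean daily return 0.05%, daily volatility 1%).
random.seed(3)
returns = [random.gauss(0.0005, 0.01) for _ in range(1000)]

V0 = 1_000_000                              # current portfolio value
losses = sorted(-V0 * r for r in returns)   # hypothetical losses, ascending

alpha = 0.95
cut = int(alpha * len(losses))
var_hist = losses[cut]                          # empirical 95% loss quantile
es_hist = sum(losses[cut:]) / len(losses[cut:])  # average loss beyond VaR
```

With these parameters the parametric benchmark is \(V_0(z_{0.95}\sigma - \mu) \approx \$15{,}950\), so the historical estimate should land in that neighborhood.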
13.1.2 Parametric (Variance-Covariance) VaR
Assume portfolio returns are normally distributed: \(r_p \sim N(\mu_p, \sigma_p^2)\). Then:
\[ VaR_\alpha = -\mu_p + z_\alpha \sigma_p \]where \(z_\alpha = \Phi^{-1}(\alpha)\) (e.g., \(z_{0.95} = 1.645\), \(z_{0.99} = 2.326\)).
For a portfolio with value \(V_0\):
\[ VaR_\alpha = V_0(-\mu_p + z_\alpha\sigma_p) \]Multi-asset parametric VaR: With weight vector \(\mathbf{w}\) and covariance matrix \(\boldsymbol{\Sigma}\):
\[ \sigma_p = \sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}} \]\[ VaR_\alpha = V_0(z_\alpha\sigma_p - \mu_p) \]Advantages: Simple, fast, easily aggregated across positions. Disadvantages: Normality assumption understates tail risk; VaR is not subadditive (a merged portfolio can have higher VaR than the sum of parts in non-elliptical distributions).
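A minimal two-asset sketch of the variance-covariance calculation; the weights, means, and covariance entries are assumed for illustration:

```python
import math

# Illustrative two-asset portfolio.
w = [0.6, 0.4]
mu = [0.0004, 0.0006]                      # daily mean returns
Sigma = [[0.0001, 0.00006],
         [0.00006, 0.000225]]              # daily return covariance matrix

mu_p = sum(wi * mi for wi, mi in zip(w, mu))
var_p = sum(w[i] * Sigma[i][j] * w[j] for i in range(2) for j in range(2))
sigma_p = math.sqrt(var_p)                 # portfolio volatility w' Sigma w

V0 = 1_000_000
z95 = 1.645                                # Phi^{-1}(0.95)
phi_z95 = math.exp(-z95**2 / 2) / math.sqrt(2 * math.pi)

VaR95 = V0 * (z95 * sigma_p - mu_p)            # parametric 95% VaR
ES95 = V0 * (sigma_p * phi_z95 / 0.05 - mu_p)  # expected shortfall under normality
```

The quadratic form expands to \(w_1^2\sigma_1^2 + 2w_1w_2\sigma_{12} + w_2^2\sigma_2^2\); with these inputs the portfolio volatility is just over 1% per day.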
13.1.3 Monte Carlo VaR
Simulate \(N\) scenarios of portfolio returns using a specified distributional model (e.g., multivariate normal, t-distribution, GARCH-based, jump-diffusion). Compute portfolio P&L for each scenario and estimate VaR as the empirical quantile of simulated losses.
This approach can accommodate complex portfolios (options, non-linear instruments), non-normal return distributions, and dynamic hedging strategies at the cost of computational intensity.
13.2 Expected Shortfall
Expected Shortfall (also called conditional VaR) is the expected loss given that the loss exceeds VaR:
\[ ES_\alpha = E[\text{Loss} \mid \text{Loss} > VaR_\alpha] = \frac{1}{1-\alpha}\int_\alpha^1 VaR_u\, du \]ES is the average of all VaR levels above \(\alpha\) — it captures the full extent of tail losses beyond the VaR threshold.
Parametric ES under normality:
\[ ES_\alpha = -\mu_p + \sigma_p\frac{\phi(z_\alpha)}{1-\alpha} \]where \(\phi\) is the standard normal pdf. Since \(\phi(z_\alpha)/(1-\alpha) > z_\alpha\), ES always exceeds the corresponding VaR \(= -\mu_p + z_\alpha\sigma_p\).
Advantages of ES over VaR:
- Coherent risk measure: ES satisfies subadditivity — \(ES(\text{portfolio}) \leq ES(A) + ES(B)\) — encouraging diversification. VaR is not always subadditive.
- Captures severity: VaR says “losses exceed \(X\) with 5% probability” but says nothing about how large those losses are. ES provides information about the entire tail.
- Basel III/IV regulatory adoption: ES (at 97.5%) has replaced VaR (at 99%) as the internal model standard for market risk capital.
Worked example (illustrative figures): suppose the daily P&L has a standard deviation of $10,000 (say, a $1 million portfolio with 1% daily volatility) and negligible mean. Then \(VaR_{0.99} = 2.326 \times \$10{,}000 \approx \$23{,}300\), while \(ES_{0.99} = \$10{,}000 \times \phi(2.326)/0.01 \approx \$26{,}700\). So the expected loss on the worst 1% of days is $26,700 — about 15% larger than the VaR threshold.
13.3 Backtesting VaR
Kupiec proportion of failures (POF) test: Over \(T\) days, let \(N\) be the number of VaR exceedances (loss > VaR). Under a correctly specified VaR model:
\[ N \sim \text{Binomial}(T, 1-\alpha) \]The likelihood ratio test statistic:
\[ LR_{POF} = -2\ln\left[\frac{\alpha^{T-N}(1-\alpha)^N}{(1-N/T)^{T-N}(N/T)^N}\right] \sim \chi^2_1 \text{ under } H_0 \]A model producing too many or too few exceptions fails the backtest and may require recalibration.
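The POF statistic is straightforward to compute; in this sketch the day counts and exception counts are hypothetical:

```python
import math

def kupiec_pof(T, N, alpha):
    """Kupiec proportion-of-failures LR statistic (asymptotically chi-square, 1 df).
    T trading days, N VaR exceedances (0 <= N < T), alpha the VaR confidence level."""
    p = 1.0 - alpha        # expected exception rate under H0
    if N == 0:
        return -2.0 * T * math.log(1.0 - p)  # limiting case, no exceptions
    phat = N / T           # observed exception rate
    log_null = N * math.log(p) + (T - N) * math.log(1.0 - p)
    log_alt = N * math.log(phat) + (T - N) * math.log(1.0 - phat)
    return -2.0 * (log_null - log_alt)

# 250 trading days of 99% VaR: about 2.5 exceptions are expected.
lr_ok = kupiec_pof(250, 3, 0.99)    # close to expected count
lr_bad = kupiec_pof(250, 10, 0.99)  # far too many exceptions
```

Comparing each statistic with the 5% critical value of \(\chi^2_1\) (3.841) retains the first model and rejects the second.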
Christoffersen (1998) conditional coverage test jointly tests whether exceptions occur at the correct frequency and are independently distributed over time (no clustering of exceptions). VaR exceptions should be serially independent; clustering indicates the model does not respond quickly enough to volatility changes.
Chapter 14: Integration of Methods — A Complete Quantitative Workflow
14.1 From Data to Decision
A complete quantitative finance workflow integrates every tool developed in this course.
Phase 1 — Data acquisition and cleaning: Obtain return data from reliable sources (e.g., FRED for macro data, French Data Library for factor returns). Compute log returns. Check for data quality: missing values, stock splits and dividends (use adjusted prices), survivorship bias in fund data, look-ahead bias in accounting data.
Phase 2 — Exploratory data analysis: Compute descriptive statistics: mean, standard deviation, skewness, excess kurtosis, minimum, maximum, and percentiles. Produce time series plots, histograms with normal overlay, Q-Q plots. Run the Jarque-Bera test for normality. Plot the ACF of returns and squared returns to detect serial correlation and ARCH effects.
Phase 3 — Model specification and estimation: Specify the econometric model (regression model, ARMA-GARCH, etc.) motivated by the EDA and theory. Estimate by OLS or maximum likelihood. Report parameter estimates with standard errors, t-statistics, and p-values. For regression, report \(R^2\), adjusted \(R^2\), and the F-statistic.
Phase 4 — Diagnostic testing: Test regression residuals for normality (JB test), heteroskedasticity (Breusch-Pagan or White test), autocorrelation (DW or Ljung-Box), and multicollinearity (VIF). Test for ARCH effects in volatility models.
Phase 5 — Inference and interpretation: Distinguish statistical significance from economic significance. A large sample may produce tiny p-values for economically negligible effects. Report effect sizes (e.g., the magnitude of alpha in basis points, the economic impact of a one-standard-deviation move in a regressor).
Phase 6 — Forecasting and risk assessment: Use estimated models to produce point forecasts and forecast intervals. Compute VaR and Expected Shortfall for risk management. Backtest risk models against historical exceedances.
Phase 7 — Reporting: Produce a well-organized research report with clear motivation, transparent methodology, complete results tables, proper diagnostics, and honest limitations. Tables should use asterisks to denote significance: \(^* p < 0.10\), \(^{**} p < 0.05\), \(^{***} p < 0.01\).
14.2 Common Pitfalls in Quantitative Finance
- Survivorship bias: Estimating performance only from funds or stocks that still exist overstates average returns.
- Look-ahead bias: Using information (e.g., restated accounting data) that was not available at the decision date.
- Data snooping and overfitting: Testing many specifications and reporting only the best inflates in-sample fit; validate out-of-sample.
- Confusing statistical with economic significance: Tiny p-values in large samples can correspond to economically negligible effects.
- Ignoring non-normality: Fat tails and volatility clustering invalidate i.i.d.-normal standard errors and understate tail risk.
14.3 A Unified Formula Reference
Statistics:
\[ \bar{x} = \frac{1}{n}\sum x_i, \quad s^2 = \frac{\sum(x_i-\bar{x})^2}{n-1}, \quad t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} \]OLS:
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}, \quad R^2 = 1 - \frac{SSR}{SST}, \quad F = \frac{SSM/k}{SSR/(T-k-1)} \]Portfolio:
\[ \mu_p = \mathbf{w}^T\boldsymbol{\mu}, \quad \sigma_p^2 = \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}, \quad SR = \frac{\mu_p - r_f}{\sigma_p} \]Time series:
\[ \rho(k) = \frac{\text{Cov}(X_t, X_{t-k})}{\text{Var}(X_t)}, \quad AIC = -2\ln\hat{L} + 2k, \quad BIC = -2\ln\hat{L} + k\ln T \]GARCH(1,1):
\[ \sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \quad \bar{\sigma}^2 = \frac{\omega}{1-\alpha-\beta} \]Black-Scholes:
\[ C = S\Phi(d_1) - Ke^{-rT}\Phi(d_2), \quad d_{1,2} = \frac{\ln(S/K)+(r\pm\sigma^2/2)T}{\sigma\sqrt{T}} \]Risk measures:
\[ VaR_\alpha = -\mu_p + z_\alpha\sigma_p, \quad ES_\alpha = -\mu_p + \sigma_p\frac{\phi(z_\alpha)}{1-\alpha} \]
Summary
AFM 323 builds the quantitative scaffold that underlies virtually every area of modern finance. Starting from probability theory — probability spaces, random variables, the normal and lognormal distributions, and moment generating functions — the course develops the statistical machinery of hypothesis testing, confidence intervals, and regression analysis. The OLS estimator is treated rigorously through the Gauss-Markov theorem and extended to multiple regression in matrix form, with applications in the CAPM and Fama-French factor models. Diagnostic methods (heteroskedasticity, autocorrelation, multicollinearity) translate abstract assumptions into actionable tests.
The linear algebra chapter makes portfolio mathematics precise: the quadratic form \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\) is the portfolio variance; constrained optimization via Lagrange multipliers traces the efficient frontier analytically. Financial mathematics connects continuous compounding, bond pricing, and the Newton-Raphson algorithm for yield computation into a coherent set of tools. Time series analysis — AR, MA, ARMA models, the ACF/PACF identification scheme, and ADF unit root tests — provides the vocabulary for dynamic modeling of return processes.
ARCH and GARCH models formalize the empirically pervasive phenomenon of volatility clustering, with GARCH(1,1) serving as the industry workhorse for daily volatility estimation and forecasting. Monte Carlo simulation with variance reduction techniques (antithetic variates, control variates, importance sampling) provides flexible numerical integration for complex payoffs and risk measures. Finite difference methods and the binomial tree offer structured alternatives for option pricing that converge to the Black-Scholes price as the grid is refined.
The Black-Scholes derivation — resting on Brownian motion, Itô’s Lemma, and the delta-hedging argument — illustrates how continuous-time stochastic calculus produces exact, model-dependent pricing formulas. Finally, Value at Risk and Expected Shortfall translate statistical models into the risk capital numbers that drive regulatory compliance and portfolio management decisions.
Throughout, the emphasis is on connecting mathematical rigor to economic interpretation and on the discipline of honest empirical practice: specifying models before testing, distinguishing in-sample fit from out-of-sample validity, and translating statistical findings into economically meaningful conclusions.