AFM 323: Quantitative Foundations for Finance

Tony Wirjanto

Estimated study time: 1 hr 49 min

Sources and References

Primary textbooks — Hull, J. C. Options, Futures, and Other Derivatives, 11th Edition. Pearson, 2022. McDonald, R. L. Derivatives Markets, 3rd Edition. Pearson, 2013. Shreve, S. E. Stochastic Calculus for Finance I & II. Springer, 2004.

Supplementary texts — Campbell, J. Y., Lo, A. W., and MacKinlay, A. C. The Econometrics of Financial Markets. Princeton University Press, 1997. Tsay, R. S. Analysis of Financial Time Series, 3rd Edition. Wiley, 2010. Greene, W. H. Econometric Analysis, 8th Edition. Pearson, 2018.

Online resources — FRED (Federal Reserve Economic Data); Kenneth French Data Library (Fama-French factors); Yahoo Finance; MIT OCW 18.S096 Topics in Mathematics with Applications in Finance.


Chapter 1: Probability Foundations for Finance

1.1 Probability Spaces

Financial modeling begins with a rigorous probabilistic framework. A probability space is a triple \((\Omega, \mathcal{F}, P)\) where:

Sample space \(\Omega\): The set of all possible outcomes. In a coin flip, \(\Omega = \{H, T\}\); in a model of stock prices over one period, \(\Omega\) might be the set of all possible price trajectories.

Sigma-algebra (event space) \(\mathcal{F}\): A collection of subsets of \(\Omega\) (called events) that is closed under complementation and countable unions. The sigma-algebra \(\mathcal{F}\) encodes what can be observed — what questions about outcomes are well-posed.

Probability measure \(P\): A function \(P: \mathcal{F} \to [0,1]\) satisfying:

  1. \(P(\Omega) = 1\)
  2. \(P(\emptyset) = 0\)
  3. For disjoint events \(A_1, A_2, \ldots\): \(P\!\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i)\) (countable additivity)

The probability measure \(P\) is called the physical or real-world measure. In derivative pricing, we will also encounter the risk-neutral measure \(Q\), under which discounted asset prices are martingales. These two measures are related by Girsanov’s theorem.

Conditional probability is fundamental to finance because information arrives over time and beliefs must be updated:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]

Bayes’ theorem — the core engine of belief updating — follows directly:

\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]
Example — Bayesian updating in credit risk: Suppose a bond issuer defaults with probability \(P(\text{default}) = 0.02\) a priori. A credit rating downgrade occurs with probability \(P(\text{downgrade} \mid \text{default}) = 0.80\) and \(P(\text{downgrade} \mid \text{no default}) = 0.10\). After observing a downgrade: \[ P(\text{default} \mid \text{downgrade}) = \frac{0.80 \times 0.02}{0.80 \times 0.02 + 0.10 \times 0.98} = \frac{0.016}{0.114} \approx 0.140 \]

The posterior default probability jumps from 2% to 14% — a seven-fold increase reflecting the informational content of the downgrade.
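This updating step is easy to verify numerically. A minimal Python sketch, using the numbers from the example above (the function name is purely illustrative):

```python
def posterior_default(prior, p_signal_given_default, p_signal_given_no_default):
    """Return P(default | signal) via Bayes' theorem."""
    numerator = p_signal_given_default * prior
    denominator = numerator + p_signal_given_no_default * (1.0 - prior)
    return numerator / denominator

p = posterior_default(prior=0.02,
                      p_signal_given_default=0.80,
                      p_signal_given_no_default=0.10)
print(f"Posterior default probability: {p:.3f}")   # prints 0.140
```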

1.2 Random Variables and Their Distributions

Random variable: A measurable function \(X: \Omega \to \mathbb{R}\) that maps each outcome to a real number. A random variable is discrete if it takes countably many values; continuous if it has a probability density function (pdf).

Cumulative distribution function (CDF):

\[ F_X(x) = P(X \leq x) \]

For a continuous random variable with pdf \(f_X\):

\[ F_X(x) = \int_{-\infty}^{x} f_X(u)\, du \]

Key properties of the CDF: \(F\) is non-decreasing, right-continuous, \(\lim_{x \to -\infty} F(x) = 0\), and \(\lim_{x \to \infty} F(x) = 1\).

1.2.1 The Normal Distribution

The normal (Gaussian) distribution is the cornerstone of classical statistics and financial modeling.

Normal distribution: \(X \sim N(\mu, \sigma^2)\) has pdf: \[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R} \]

with mean \(E[X] = \mu\) and variance \(\text{Var}(X) = \sigma^2\).

The standard normal \(Z \sim N(0,1)\) has pdf \(\phi(z) = (2\pi)^{-1/2} e^{-z^2/2}\) and CDF \(\Phi(z)\). Any normal random variable can be standardized: if \(X \sim N(\mu,\sigma^2)\) then \(Z = (X-\mu)/\sigma \sim N(0,1)\).

The normal distribution is characterized by the 68-95-99.7 rule:

\[ P(\mu - \sigma \leq X \leq \mu + \sigma) \approx 0.683 \]\[ P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) \approx 0.954 \]\[ P(\mu - 3\sigma \leq X \leq \mu + 3\sigma) \approx 0.997 \]

Moment generating function (MGF) of the normal:

\[ M_X(t) = E[e^{tX}] = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right) \]

The MGF is useful for deriving moments: \(E[X] = M_X'(0) = \mu\) and \(E[X^2] = M_X''(0) = \mu^2 + \sigma^2\), confirming \(\text{Var}(X) = \sigma^2\).

1.2.2 The Lognormal Distribution

Lognormal distribution: \(Y\) is lognormal if \(\ln Y \sim N(\mu, \sigma^2)\). Its pdf is: \[ f_Y(y) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln y - \mu)^2}{2\sigma^2}\right), \quad y > 0 \]

The lognormal arises naturally as the distribution of asset prices when log returns are normally distributed. Its moments are:

\[ E[Y] = e^{\mu + \sigma^2/2} \]\[ \text{Var}(Y) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1) \]
Why lognormal for prices? If \(S_t\) is a stock price and \(\ln(S_t/S_{t-1}) = r_t \sim N(\mu, \sigma^2)\) i.i.d., then over \(T\) periods: \[ \ln\!\left(\frac{S_T}{S_0}\right) = \sum_{t=1}^T r_t \sim N(T\mu, T\sigma^2) \]

so \(S_T = S_0 \exp\!\left(\sum r_t\right)\) is lognormal. A lognormal price model keeps prices bounded below by zero (consistent with limited liability) and implies symmetric, normally distributed log returns — features a normal model for the price level itself cannot deliver.

1.2.3 The Poisson Distribution

The Poisson distribution models the number of rare events occurring in a fixed time interval.

Poisson distribution: \(X \sim \text{Poisson}(\lambda)\) has PMF: \[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \]

with mean \(E[X] = \lambda\) and variance \(\text{Var}(X) = \lambda\) (equidispersion property).

In finance, Poisson processes model jump arrivals in jump-diffusion models (Merton, 1976), credit events (default arrivals in reduced-form credit models), and trade arrivals in market microstructure. The Poisson process \(\{N_t\}_{t \geq 0}\) counts events up to time \(t\): \(N_t - N_s \sim \text{Poisson}(\lambda(t-s))\) for \(t > s\), with independent increments.

1.3 Expectation, Variance, and Covariance

Expectation: For a continuous random variable with pdf \(f\): \[ E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx \]

For a function \(g(X)\): \(E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx\) (Law of the Unconscious Statistician).

Variance:

\[ \text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2 \]

Standard deviation: \(\sigma_X = \sqrt{\text{Var}(X)}\)

Covariance:

\[ \text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] \]

Correlation:

\[ \rho_{XY} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} \in [-1, 1] \]

Key algebraic properties:

  • Linearity of expectation: \(E[aX + bY] = aE[X] + bE[Y]\) (always, regardless of dependence)
  • Variance of a linear combination: \(\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)\)
  • If \(X\) and \(Y\) are independent: \(\text{Cov}(X,Y) = 0\) and \(\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)\)
Law of Total Expectation (Tower Property): \[ E[X] = E[E[X \mid Y]] \]

This is used extensively in dynamic programming and option pricing. It says: to compute the unconditional expectation of \(X\), first compute the conditional expectation given \(Y\), then average over \(Y\).

Law of Total Variance:

\[ \text{Var}(X) = E[\text{Var}(X \mid Y)] + \text{Var}(E[X \mid Y]) \]

Total variance = average within-group variance + variance of group means.

1.4 Moment Generating Functions

Moment generating function (MGF): \[ M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx \]

defined for all \(t\) in some neighborhood of 0. The \(n\)-th moment of \(X\) is:

\[ E[X^n] = M_X^{(n)}(0) = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0} \]

MGFs of common distributions:

  • \(N(\mu, \sigma^2)\): \(M_X(t) = \exp(\mu t + \sigma^2 t^2/2)\)
  • \(\text{Poisson}(\lambda)\): \(M_X(t) = \exp(\lambda(e^t - 1))\)
  • \(\text{Exponential}(\lambda)\): \(M_X(t) = \lambda/(\lambda - t)\) for \(t < \lambda\)
  • \(\text{Bernoulli}(p)\): \(M_X(t) = 1 - p + pe^t\)

The MGF uniquely determines a distribution (when it exists in a neighborhood of 0). A key application: if \(X\) and \(Y\) are independent with MGFs \(M_X\) and \(M_Y\), then \(M_{X+Y}(t) = M_X(t) \cdot M_Y(t)\). This immediately proves that the sum of independent normals is normal.

Cumulant generating function (CGF): \(K_X(t) = \ln M_X(t)\). The cumulants \(\kappa_n = K_X^{(n)}(0)\) have the property that \(\kappa_1 = E[X]\), \(\kappa_2 = \text{Var}(X)\), \(\kappa_3\) is related to skewness, \(\kappa_4\) to excess kurtosis. For the normal distribution, all cumulants of order 3 and higher are zero — deviation from this provides a measure of non-normality.


Chapter 2: Financial Data and Descriptive Statistics

2.1 Asset Returns as Random Variables

Simple (gross) return: \( R_t = P_t / P_{t-1} \)

Net return: \( r_t = R_t - 1 = (P_t - P_{t-1})/P_{t-1} \)

Continuously compounded (log) return:

\[ r_t^{cc} = \ln P_t - \ln P_{t-1} = \ln(1 + r_t) \]

Log returns are additive over time: \(r_t^{cc}(k) = \sum_{i=0}^{k-1} r_{t-i}^{cc}\), whereas simple returns are multiplicative: \(R_t(k) = \prod_{i=0}^{k-1} R_{t-i}\). For small \(r\), \(\ln(1+r) \approx r\), so log and simple returns are approximately equal at daily or weekly frequencies.

2.2 Descriptive Statistics

Central tendency:

\[ \bar{r} = \frac{1}{T}\sum_{t=1}^T r_t \quad \text{(arithmetic mean)} \]\[ \bar{r}_g = \left(\prod_{t=1}^T (1+r_t)\right)^{1/T} - 1 \quad \text{(geometric mean)} \]

The arithmetic mean overstates long-run compounded growth. The geometric mean equals the arithmetic mean only if all returns are identical; otherwise \(\bar{r}_g < \bar{r}\), with the gap approximately \(\sigma^2/2\).

Dispersion:

\[ s^2 = \frac{1}{T-1}\sum_{t=1}^T (r_t - \bar{r})^2 \quad \text{(sample variance)} \]

Skewness and kurtosis:

\[ \text{Skew} = \frac{1}{T}\sum_{t=1}^T \left(\frac{r_t - \bar{r}}{s}\right)^3, \quad \text{Kurt} = \frac{1}{T}\sum_{t=1}^T \left(\frac{r_t - \bar{r}}{s}\right)^4 \]

Excess kurtosis \(= \text{Kurt} - 3\). Financial returns routinely show negative skew and positive excess kurtosis (fat tails), violating the normality assumption underlying many classical models.

Jarque-Bera test for normality:

\[ JB = \frac{T}{6}\left(\text{Skew}^2 + \frac{(\text{Kurt}-3)^2}{4}\right) \sim \chi^2_2 \text{ under } H_0: \text{normal} \]
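These summary statistics and the JB statistic are straightforward to compute directly. A minimal sketch on a simulated fat-tailed return series (the Student-t data below are purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
r = 0.01 * rng.standard_t(df=5, size=1000)       # illustrative fat-tailed "returns"

T = len(r)
rbar, s = r.mean(), r.std(ddof=1)
skew = np.mean(((r - rbar) / s) ** 3)
kurt = np.mean(((r - rbar) / s) ** 4)

jb = T / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)   # Jarque-Bera statistic
p_value = stats.chi2.sf(jb, df=2)                # chi-squared(2) tail probability

print(f"skew={skew:.3f}, excess kurt={kurt - 3:.3f}, JB={jb:.1f}, p={p_value:.4f}")
```

A large JB statistic (small p-value) rejects normality, as expected for t-distributed data.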

Chapter 3: Statistical Inference for Finance

3.1 Sampling Distributions and the Central Limit Theorem

Central Limit Theorem (CLT): Let \(X_1, X_2, \ldots, X_n\) be i.i.d. with mean \(\mu\) and variance \(\sigma^2 < \infty\). Then: \[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \text{ as } n \to \infty \]

The sample mean is approximately normally distributed in large samples, regardless of the underlying distribution of \(X_i\). This is the foundation for large-sample inference in finance.

The CLT requires finite variance — a condition that may fail for heavy-tailed distributions (e.g., stable distributions with tail index \(\alpha < 2\)). When the CLT applies, confidence intervals and t-tests are asymptotically valid even without normality.

3.2 Confidence Intervals

A 95% confidence interval for the population mean \(\mu\), given a sample of size \(n\) with known \(\sigma\):

\[ \bar{X} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}} \]

When \(\sigma\) is unknown (the usual case), use the sample standard deviation \(s\) and replace the standard normal critical value with a t-critical value:

\[ \bar{X} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} \]

The t-distribution with \(\nu\) degrees of freedom has heavier tails than the normal; as \(\nu \to \infty\) it converges to \(N(0,1)\).

Example — Estimating mean monthly return: Suppose monthly returns on a stock have sample mean \(\bar{r} = 0.012\) (1.2%) and sample std dev \(s = 0.06\) (6%) over \(T = 60\) months. With \(t_{0.025,\,59} \approx 2.00\), a 95% CI for the true mean monthly return is: \[ 0.012 \pm 2.00 \cdot \frac{0.06}{\sqrt{60}} = 0.012 \pm 0.0155 = (-0.35\%,\ 2.75\%) \]

The interval includes zero — we cannot statistically reject the hypothesis that the true mean return is zero, even though the point estimate is positive. This illustrates the difficulty of detecting a risk premium with limited data.
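A short sketch reproducing this interval (same numbers as the example; nothing else is assumed):

```python
import numpy as np
from scipy import stats

rbar, s, T = 0.012, 0.06, 60
t_crit = stats.t.ppf(0.975, df=T - 1)            # two-sided 95% critical value, ~2.00
half_width = t_crit * s / np.sqrt(T)

print(f"95% CI: ({rbar - half_width:.4f}, {rbar + half_width:.4f})")
# -> roughly (-0.0035, 0.0275), i.e. (-0.35%, 2.75%)
```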

3.3 Hypothesis Testing

Step-by-step hypothesis testing:

  1. State \(H_0\) (null) and \(H_1\) (alternative).
  2. Choose significance level \(\alpha\) (typically 1%, 5%, or 10%).
  3. Compute the test statistic.
  4. Determine the rejection region or compute the p-value.
  5. Make a decision: reject \(H_0\) if the test statistic falls in the rejection region (or equivalently, if p-value \(< \alpha\)).

One-sample t-test: \(H_0: \mu = \mu_0\) vs. \(H_1: \mu \neq \mu_0\):

\[ t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1} \text{ under } H_0 \]

Type I error (false positive): Reject \(H_0\) when it is true. Probability = \(\alpha\). Type II error (false negative): Fail to reject \(H_0\) when it is false. Probability = \(\beta\). Power of a test: \(1 - \beta\). The probability of correctly rejecting a false \(H_0\).

p-value interpretation: The p-value is not the probability that \(H_0\) is true. It is the probability of observing a test statistic at least as extreme as the one observed, assuming \(H_0\) is true. A small p-value means the data are unlikely under \(H_0\), not that \(H_0\) is probably false.

Two-sample t-test for equality of means: Used to compare performance of two portfolios, two trading strategies, etc. Under equal variances:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{1/n_1 + 1/n_2}}, \quad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} \]

Chapter 4: Simple Linear Regression

4.1 The OLS Estimator

The simple linear regression model:

\[ y_t = \alpha + \beta x_t + \varepsilon_t, \quad t = 1, \ldots, T \]

OLS minimizes the sum of squared residuals (SSR):

\[ \min_{\alpha,\beta} \sum_{t=1}^T (y_t - \alpha - \beta x_t)^2 \]

Taking first-order conditions and solving:

\[ \hat{\beta} = \frac{\sum_{t=1}^T (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^T (x_t - \bar{x})^2} = \frac{\widehat{\text{Cov}}(x,y)}{\widehat{\text{Var}}(x)} \]\[ \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \]

The regression line always passes through the point \((\bar{x}, \bar{y})\).
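A minimal sketch of these formulas on simulated data (the data-generating process below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 + 2.0 * x + rng.normal(scale=0.5, size=200)   # illustrative DGP

beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
# The fitted line passes through (x-bar, y-bar) by construction.
```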

4.2 Gauss-Markov Theorem and Classical Assumptions

Gauss-Markov Theorem: Under the classical linear regression assumptions:
  1. Linearity: The DGP is \(y = X\beta + \varepsilon\).
  2. Full rank: \(X\) has rank \(k+1\) (no perfect multicollinearity).
  3. Exogeneity: \(E[\varepsilon \mid X] = 0\).
  4. Spherical errors: \(\text{Var}(\varepsilon \mid X) = \sigma^2 I_T\) (homoskedasticity and no autocorrelation).

the OLS estimator \(\hat{\beta}\) is BLUE — Best (minimum variance) Linear Unbiased Estimator. No other linear unbiased estimator has smaller variance.

Unbiasedness: Under assumptions 1–3:

\[ E[\hat{\beta}] = E\!\left[(X^TX)^{-1}X^T(X\beta + \varepsilon)\right] = \beta + E\!\left[(X^TX)^{-1}X^T\varepsilon\right] = \beta \]

since exogeneity (\(E[\varepsilon \mid X] = 0\)) makes the second term vanish.

Variance of OLS estimator:

\[ \text{Var}(\hat{\beta}) = \sigma^2(X^TX)^{-1} \]

estimated by substituting \(\hat{\sigma}^2 = SSR/(T-k-1)\) for \(\sigma^2\).

4.3 Goodness of Fit

Decomposition of total variation:

\[ \underbrace{\sum(y_t - \bar{y})^2}_{SST} = \underbrace{\sum(\hat{y}_t - \bar{y})^2}_{SSM} + \underbrace{\sum\hat{\varepsilon}_t^2}_{SSR} \]\[ R^2 = \frac{SSM}{SST} = 1 - \frac{SSR}{SST} \in [0,1] \]

In simple regression, \(R^2 = \hat{\rho}^2_{xy}\) — the squared sample correlation. In the CAPM context, \(R^2\) equals the fraction of total return variance attributable to market-wide (systematic) risk.

4.4 Inference in Regression

t-test for a single coefficient (\(H_0: \beta = 0\)):

\[ t = \frac{\hat{\beta}}{SE(\hat{\beta})} \sim t_{T-2} \text{ under } H_0 \]

where \(SE(\hat{\beta}) = \hat{\sigma}/\sqrt{\sum(x_t - \bar{x})^2}\).

F-test for overall significance (\(H_0: \beta = 0\) in simple regression, or joint zero in multiple):

\[ F = \frac{SSM/k}{SSR/(T-k-1)} \sim F_{k, T-k-1} \text{ under } H_0 \]

In simple regression, \(F = t^2\) — the two tests are equivalent.

4.5 CAPM as a Regression

The Security Characteristic Line (SCL):

\[ r_{it} - r_{ft} = \alpha_i + \beta_i(r_{mt} - r_{ft}) + \varepsilon_{it} \]

Jensen’s alpha \(\alpha_i\): the intercept measures abnormal return relative to the CAPM benchmark. Under the CAPM null, \(\alpha_i = 0\) for all assets in equilibrium.

Market beta \(\beta_i\): measures systematic risk — the sensitivity of asset \(i\)’s excess return to the market excess return.

\[ \beta_i = \frac{\text{Cov}(r_i, r_m)}{\text{Var}(r_m)} \]

Portfolio beta is the value-weighted average of constituent betas: \(\beta_p = \sum_i w_i \beta_i\).

Example — CAPM estimation: Suppose we run the SCL regression for a Canadian equity using 60 months of data and obtain \(\hat{\alpha} = 0.009\) (0.9% per month), \(\hat{\beta} = 1.35\), \(SE(\hat{\beta}) = 0.15\), \(R^2 = 0.48\). The t-statistic for the null \(\beta = 1\) is: \[ t = \frac{1.35 - 1.00}{0.15} = 2.33 \]

which exceeds the 5% two-tailed critical value of approximately 2.00, so we reject \(\beta = 1\) at the 5% level. The stock is significantly more volatile than the market.
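A sketch of this estimation (the simulated market and stock excess returns below are illustrative stand-ins; with real data you would substitute actual excess-return series):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 60
mkt_excess = rng.normal(0.006, 0.04, size=T)                            # illustrative market excess returns
stock_excess = 0.009 + 1.35 * mkt_excess + rng.normal(0, 0.05, size=T)  # illustrative stock DGP

X = np.column_stack([np.ones(T), mkt_excess])      # intercept + market factor
coef, *_ = np.linalg.lstsq(X, stock_excess, rcond=None)

resid = stock_excess - X @ coef
sigma2_hat = resid @ resid / (T - 2)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

t_beta_eq_1 = (coef[1] - 1.0) / se[1]              # test H0: beta = 1
print(f"alpha={coef[0]:.4f}, beta={coef[1]:.3f}, SE(beta)={se[1]:.3f}, t(beta=1)={t_beta_eq_1:.2f}")
```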


Chapter 5: Multiple Linear Regression and Factor Models

5.1 The Multiple Regression Model in Matrix Form

With \(k\) regressors and \(T\) observations, write \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\) where:

  • \(\mathbf{y}\) is \(T \times 1\)
  • \(\mathbf{X}\) is \(T \times (k+1)\) (including a column of ones for the intercept)
  • \(\boldsymbol{\beta}\) is \((k+1) \times 1\)
  • \(\boldsymbol{\varepsilon}\) is \(T \times 1\)

The OLS estimator is:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \]

The fitted values are \(\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{H}\mathbf{y}\) where \(\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\) is the hat matrix (projection onto the column space of \(\mathbf{X}\)).

Adjusted \(R^2\) penalizes for additional regressors:

\[ \bar{R}^2 = 1 - \frac{SSR/(T-k-1)}{SST/(T-1)} = 1 - (1-R^2)\frac{T-1}{T-k-1} \]

Unlike \(R^2\), \(\bar{R}^2\) can decrease when an irrelevant regressor is added.

5.2 The Fama-French Factor Models

Three-factor model (Fama and French, 1993):

\[ r_{it} - r_{ft} = \alpha_i + \beta_i^{MKT} MKT_t + \beta_i^{SMB} SMB_t + \beta_i^{HML} HML_t + \varepsilon_{it} \]
  • MKT: Market excess return \(r_m - r_f\)
  • SMB (Small Minus Big): Return spread between small-cap and large-cap portfolios, capturing the size premium
  • HML (High Minus Low): Return spread between high and low book-to-market portfolios, capturing the value premium

Five-factor model (Fama and French, 2015) adds:

  • RMW (Robust Minus Weak): Profitability factor
  • CMA (Conservative Minus Aggressive): Investment factor

Carhart (1997) four-factor model adds momentum:

  • MOM (Winners Minus Losers): Past-12-month return spread

These multifactor regressions are estimated by OLS; the intercepts (alphas) test whether a portfolio or fund earns returns beyond the systematic factor exposures.

5.3 Regression Diagnostics

5.3.1 Heteroskedasticity

Heteroskedasticity: \(\text{Var}(\varepsilon_t \mid \mathbf{x}_t) = \sigma_t^2\) varies with \(t\). OLS remains unbiased but is no longer BLUE, and reported standard errors are incorrect.

Breusch-Pagan test: Regress squared OLS residuals \(\hat{\varepsilon}_t^2\) on the regressors. The LM statistic \(= T \cdot R^2\) from that auxiliary regression is \(\chi^2_k\) under \(H_0\) of homoskedasticity.

White (1980) heteroskedasticity-robust standard errors correct inference without specifying the form of heteroskedasticity:

\[ \widehat{\text{Var}}_{HC}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1}\left(\sum_{t=1}^T \hat{\varepsilon}_t^2 \mathbf{x}_t\mathbf{x}_t^T\right)(\mathbf{X}^T\mathbf{X})^{-1} \]

This “sandwich” variance estimator is asymptotically valid whether or not errors are homoskedastic.
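A compact sketch of this estimator (the HC0 form), given a design matrix and OLS residuals; the function name is illustrative:

```python
import numpy as np

def white_robust_cov(X, resid):
    """HC0 heteroskedasticity-robust covariance of the OLS estimator.

    X     : (T, k+1) design matrix including the intercept column
    resid : (T,) vector of OLS residuals
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X      # sum of e_t^2 * x_t x_t'
    return XtX_inv @ meat @ XtX_inv             # the "sandwich"

# robust_se = np.sqrt(np.diag(white_robust_cov(X, resid)))
```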

5.3.2 Autocorrelation

Autocorrelation: \(\text{Cov}(\varepsilon_t, \varepsilon_s) \neq 0\) for \(t \neq s\). Standard errors are biased (usually downward), causing t-statistics to be overstated.

Durbin-Watson statistic:

\[ DW = \frac{\sum_{t=2}^T (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^T \hat{\varepsilon}_t^2} \approx 2(1 - \hat{\rho}_1) \]

DW near 2 signals no first-order autocorrelation; below 2 indicates positive, above 2 negative autocorrelation.

Newey-West HAC standard errors are valid under both heteroskedasticity and autocorrelation of unknown form:

\[ \widehat{\text{Var}}_{NW}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^T\mathbf{X})^{-1} \hat{\mathbf{S}} (\mathbf{X}^T\mathbf{X})^{-1} \]

where \(\hat{\mathbf{S}}\) is a weighted sum of lagged autocovariance matrices of the score.

5.3.3 Multicollinearity

Variance Inflation Factor:

\[ VIF_j = \frac{1}{1 - R_j^2} \]

where \(R_j^2\) is from regressing \(x_j\) on all other predictors. \(VIF_j > 10\) indicates severe multicollinearity; coefficient estimates are imprecise even if the model fits well overall.


Chapter 6: Linear Algebra for Portfolio Management

6.1 Vectors and Matrices

A portfolio of \(n\) assets is fully described by a weight vector:

\[ \mathbf{w} = (w_1, w_2, \ldots, w_n)^T, \quad \sum_{i=1}^n w_i = 1 \]

(allowing short positions: \(w_i < 0\) is permissible). The portfolio return is the inner product:

\[ r_p = \mathbf{w}^T \mathbf{r} = \sum_{i=1}^n w_i r_i \]

where \(\mathbf{r} = (r_1, \ldots, r_n)^T\) is the vector of asset returns.

Portfolio expected return:

\[ E[r_p] = \mathbf{w}^T \boldsymbol{\mu}, \quad \boldsymbol{\mu} = E[\mathbf{r}] \]

Portfolio variance:

\[ \text{Var}(r_p) = \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \]

where \(\boldsymbol{\Sigma}\) is the \(n \times n\) covariance matrix with \(\Sigma_{ij} = \text{Cov}(r_i, r_j)\).

6.2 Properties of the Covariance Matrix

Properties of \(\boldsymbol{\Sigma}\):
  • Symmetric: \(\boldsymbol{\Sigma} = \boldsymbol{\Sigma}^T\) since \(\text{Cov}(r_i, r_j) = \text{Cov}(r_j, r_i)\).
  • Positive semi-definite: \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} \geq 0\) for all \(\mathbf{w}\), since portfolio variance is non-negative.
  • Positive definite (if no redundant assets): \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} > 0\) for all \(\mathbf{w} \neq \mathbf{0}\), guaranteeing \(\boldsymbol{\Sigma}^{-1}\) exists.

The covariance matrix can be written as:

\[ \boldsymbol{\Sigma} = \mathbf{D}\boldsymbol{\rho}\mathbf{D} \]

where \(\mathbf{D} = \text{diag}(\sigma_1, \ldots, \sigma_n)\) and \(\boldsymbol{\rho}\) is the correlation matrix. In practice \(\boldsymbol{\Sigma}\) is estimated from historical data; with \(T\) observations and \(n\) assets:

\[ \hat{\boldsymbol{\Sigma}} = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{r}_t - \bar{\mathbf{r}})(\mathbf{r}_t - \bar{\mathbf{r}})^T \]

Reliable estimation requires \(T \gg n\); with \(n = 100\) assets, one needs hundreds of months of data for a stable sample covariance matrix.

6.3 Mean-Variance Optimization

The Markowitz minimum-variance problem:

\[ \min_{\mathbf{w}} \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} \quad \text{subject to} \quad \mathbf{w}^T\boldsymbol{\mu} = \mu_p, \quad \mathbf{w}^T\mathbf{1} = 1 \]

Using Lagrange multipliers with multipliers \(\lambda_1\) and \(\lambda_2\):

\[ \mathcal{L} = \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} - \lambda_1(\mathbf{w}^T\boldsymbol{\mu} - \mu_p) - \lambda_2(\mathbf{w}^T\mathbf{1} - 1) \]

First-order condition: \(\partial \mathcal{L}/\partial \mathbf{w} = 2\boldsymbol{\Sigma}\mathbf{w} - \lambda_1\boldsymbol{\mu} - \lambda_2\mathbf{1} = \mathbf{0}\), so:

\[ \mathbf{w}^* = \frac{\lambda_1}{2}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \frac{\lambda_2}{2}\boldsymbol{\Sigma}^{-1}\mathbf{1} \]

The Lagrange multipliers are solved by imposing the two constraints, yielding closed-form expressions involving the scalars \(A = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\), \(B = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}\), \(C = \mathbf{1}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}\), and \(D = AC - B^2\).

The efficient frontier in \((\sigma_p, \mu_p)\) space is a hyperbola:

\[ \sigma_p^2 = \frac{C\mu_p^2 - 2B\mu_p + A}{D} \]

The global minimum variance (GMV) portfolio has weights:

\[ \mathbf{w}_{GMV} = \frac{\boldsymbol{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}} \]

with minimum variance \(\sigma^2_{GMV} = 1/C\) and expected return \(\mu_{GMV} = B/C\).

Tangency (maximum Sharpe ratio) portfolio with risk-free rate \(r_f\):

\[ \mathbf{w}_{tan} = \frac{\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu} - r_f\mathbf{1})}{\mathbf{1}^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu} - r_f\mathbf{1})} \]

All portfolios on the Capital Market Line (CML) combine the tangency portfolio and the risk-free asset:

\[ \mu_p = r_f + \frac{\mu_{tan} - r_f}{\sigma_{tan}}\sigma_p \]

The slope \((\mu_{tan} - r_f)/\sigma_{tan}\) is the Sharpe ratio of the tangency portfolio, which is the highest achievable Sharpe ratio.
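Both special portfolios reduce to a linear solve against \(\boldsymbol{\Sigma}\). A minimal sketch with a small illustrative mean vector and covariance matrix (the numbers below are assumptions for demonstration only):

```python
import numpy as np

mu = np.array([0.08, 0.10, 0.12])                  # illustrative expected returns
Sigma = np.array([[0.040, 0.006, 0.012],
                  [0.006, 0.090, 0.018],
                  [0.012, 0.018, 0.160]])          # illustrative covariance matrix
rf = 0.02
ones = np.ones(len(mu))

w_gmv = np.linalg.solve(Sigma, ones)
w_gmv /= ones @ w_gmv                              # global minimum-variance weights

w_tan = np.linalg.solve(Sigma, mu - rf * ones)
w_tan /= ones @ w_tan                              # tangency (max Sharpe) weights

sharpe_tan = (w_tan @ mu - rf) / np.sqrt(w_tan @ Sigma @ w_tan)
print("GMV weights:     ", np.round(w_gmv, 3))
print("Tangency weights:", np.round(w_tan, 3))
print(f"Tangency Sharpe ratio: {sharpe_tan:.3f}")
```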

6.4 Eigendecomposition and Principal Components

The covariance matrix \(\boldsymbol{\Sigma}\) is symmetric positive semi-definite, so it admits an eigendecomposition:

\[ \boldsymbol{\Sigma} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^T \]

where \(\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \ldots, \lambda_n)\) contains eigenvalues (\(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0\)) and \(\mathbf{P}\) is orthogonal (eigenvectors as columns).

Principal Component Analysis (PCA) projects returns onto the eigenvectors: the first principal component (eigenvector corresponding to \(\lambda_1\)) explains the largest fraction of total return variance (\(\lambda_1/\sum \lambda_i\)). In equity markets, the first PC typically represents a market-wide factor; subsequent PCs capture sector and style factors.
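A sketch of this decomposition on simulated returns driven by a single common factor (the factor structure below is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 1000, 8
market = rng.normal(0, 0.01, size=T)                    # common "market" factor
betas = rng.uniform(0.8, 1.2, size=n)
returns = market[:, None] * betas + rng.normal(0, 0.005, size=(T, n))

Sigma_hat = np.cov(returns, rowvar=False)               # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)            # eigh returns ascending eigenvalues
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]      # reorder to descending

print("Fraction of variance explained by PC1:", round(eigvals[0] / eigvals.sum(), 3))
print("PC1 loadings:", np.round(eigvecs[:, 0], 3))      # roughly equal-signed: a market-wide factor
```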


Chapter 7: Financial Mathematics — Time Value and Optimization

7.1 Time Value of Money

Simple interest: \(FV = PV(1 + r \cdot T)\)

Compound interest (discrete, \(m\) periods per year):

\[ FV = PV\left(1 + \frac{r}{m}\right)^{mT} \]

Continuous compounding (limit as \(m \to \infty\)):

\[ FV = PV \cdot e^{rT} \]

This follows from \(\lim_{m\to\infty}(1 + r/m)^m = e^r\). Continuous compounding is used extensively in derivative pricing and stochastic calculus.

Present value as an integral: For a continuous cash-flow stream \(c(t)\) received over \([0, T]\) at continuously compounded discount rate \(r\):

\[ PV = \int_0^T c(t)\, e^{-rt}\, dt \]

For a constant annuity \(c(t) = C\):

\[ PV = C\int_0^T e^{-rt}\, dt = C\left[\frac{-e^{-rt}}{r}\right]_0^T = \frac{C}{r}(1 - e^{-rT}) \]

As \(T \to \infty\) (perpetuity): \(PV = C/r\).

7.2 Bond Pricing and Yield

A bond paying coupon \(C\) semi-annually for \(T\) years with face value \(F\) has price:

\[ P = \sum_{t=1}^{2T} \frac{C/2}{(1 + y/2)^t} + \frac{F}{(1 + y/2)^{2T}} \]

where \(y\) is the yield to maturity (YTM) — the single discount rate that equates the present value of all cash flows to the market price. This equation has no closed-form solution for \(y\); it must be solved numerically.

Newton-Raphson method for yield: Given price \(P^*\), start with an initial guess \(y_0\) and iterate:

\[ y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} \]

where \(f(y) = P(y) - P^*\), so \(f'(y) = dP/dy\). The derivative satisfies \(dP/dy = -D_{\text{mod}} \cdot P\), where \(D_{\text{mod}}\) is the modified duration:

\[ D_{\text{mod}} = \frac{1}{P}\left[\sum_{t=1}^{2T} \frac{(t/2)(C/2)}{(1+y/2)^{t+1}} + \frac{T \cdot F}{(1+y/2)^{2T+1}}\right] \]
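A minimal sketch of the Newton-Raphson yield solver, applied to a hypothetical 10-year, 5% semi-annual coupon bond priced at 95 (for simplicity the derivative is computed by a central difference rather than through the duration formula):

```python
def bond_price(y, coupon_rate, face, years, freq=2):
    """Price of a bond with freq coupons per year at annual yield y."""
    c = coupon_rate * face / freq
    n = int(years * freq)
    return sum(c / (1 + y / freq) ** t for t in range(1, n + 1)) + face / (1 + y / freq) ** n

def ytm_newton(price, coupon_rate, face, years, y0=0.05, tol=1e-10, max_iter=100):
    """Solve bond_price(y) = price for y by Newton-Raphson."""
    y = y0
    for _ in range(max_iter):
        f = bond_price(y, coupon_rate, face, years) - price
        h = 1e-6                                    # numerical derivative step
        f_prime = (bond_price(y + h, coupon_rate, face, years)
                   - bond_price(y - h, coupon_rate, face, years)) / (2 * h)
        y_new = y - f / f_prime
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y

print(round(ytm_newton(price=95.0, coupon_rate=0.05, face=100.0, years=10.0), 6))
```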

7.3 Calculus Review for Finance

Optimization: A differentiable function \(f: \mathbb{R} \to \mathbb{R}\) achieves a local minimum at \(x^*\) if:

\[ f'(x^*) = 0 \quad \text{(first-order condition)} \]\[ f''(x^*) > 0 \quad \text{(second-order condition)} \]

For multivariate optimization of \(f: \mathbb{R}^n \to \mathbb{R}\), the first-order condition is \(\nabla f(\mathbf{x}^*) = \mathbf{0}\) and the Hessian matrix \(\mathbf{H} = [\partial^2 f / \partial x_i \partial x_j]\) must be positive definite at \(\mathbf{x}^*\) for a local minimum.

Constrained optimization — Lagrangian: To minimize \(f(\mathbf{x})\) subject to \(g(\mathbf{x}) = 0\):

\[ \mathcal{L}(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda g(\mathbf{x}) \]

FOC: \(\nabla_{\mathbf{x}} \mathcal{L} = \mathbf{0}\) and \(g(\mathbf{x}) = 0\). The Lagrange multiplier \(\lambda\) is the shadow price of the constraint: it measures how much the optimal value of \(f\) changes per unit change in the constraint level.

Taylor expansion (used in duration/convexity analysis):

\[ f(x + \Delta x) \approx f(x) + f'(x)\Delta x + \frac{1}{2}f''(x)(\Delta x)^2 + \cdots \]

Applied to bond price as a function of yield:

\[ \Delta P \approx -D_{\text{mod}} \cdot P \cdot \Delta y + \frac{1}{2}\text{Convexity} \cdot P \cdot (\Delta y)^2 \]

Chapter 8: Time Series Analysis

8.1 Stationarity

Strict stationarity: The joint distribution of \((X_{t_1}, \ldots, X_{t_k})\) is the same as \((X_{t_1+h}, \ldots, X_{t_k+h})\) for all \(h\) and all \(k\).

Weak (covariance) stationarity: A time series \(\{X_t\}\) is weakly stationary if:

  1. \(E[X_t] = \mu\) (constant mean)
  2. \(\text{Var}(X_t) = \sigma^2 < \infty\) (constant, finite variance)
  3. \(\text{Cov}(X_t, X_{t-k}) = \gamma(k)\) depends only on lag \(k\), not on \(t\)

Most time-series modeling assumes weak stationarity. Unit root processes (random walks) are not stationary: their variance grows without bound over time.

Autocovariance and autocorrelation functions:

\[ \gamma(k) = \text{Cov}(X_t, X_{t-k}) \]\[ \rho(k) = \frac{\gamma(k)}{\gamma(0)} \in [-1,1] \]

The sample ACF:

\[ \hat{\rho}(k) = \frac{\sum_{t=k+1}^T (X_t - \bar{X})(X_{t-k} - \bar{X})}{\sum_{t=1}^T (X_t - \bar{X})^2} \]

Under the null of white noise, \(\hat{\rho}(k) \approx N(0, 1/T)\), so approximate 95% confidence bands are \(\pm 1.96/\sqrt{T}\).

8.2 AR(p) Models

Autoregressive model of order \(p\) — AR(\(p\)): \[ X_t = \phi_0 + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t \]

where \(\varepsilon_t \sim WN(0, \sigma^2)\) (white noise: zero mean, constant variance, uncorrelated).

Stationarity condition: The AR(\(p\)) is stationary if all roots of the characteristic polynomial

\[ \Phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0 \]

lie outside the unit circle (\(|z| > 1\)).

For the AR(1) model \(X_t = \phi X_{t-1} + \varepsilon_t\), stationarity requires \(|\phi| < 1\). When stationary:

\[ E[X_t] = 0, \quad \text{Var}(X_t) = \frac{\sigma^2}{1-\phi^2}, \quad \rho(k) = \phi^k \]

The ACF decays geometrically — the hallmark of an AR(1). The PACF (partial autocorrelation function) cuts off after lag \(p\) for an AR(\(p\)) process.
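A quick sketch confirming the geometric ACF decay of a simulated AR(1) (\(\phi = 0.7\) here is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
T, phi = 2000, 0.7
x = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]                 # AR(1) recursion

def sample_acf(x, k):
    xbar = x.mean()
    return np.sum((x[k:] - xbar) * (x[:-k] - xbar)) / np.sum((x - xbar) ** 2)

for k in (1, 2, 3):
    print(f"lag {k}: sample ACF = {sample_acf(x, k):.3f}, theory phi^k = {phi ** k:.3f}")
```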

8.3 MA(q) Models

Moving average model of order \(q\) — MA(\(q\)): \[ X_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_q\varepsilon_{t-q} \]

An MA(\(q\)) process is always stationary. Its ACF cuts off after lag \(q\) (all autocorrelations beyond lag \(q\) are exactly zero), while the PACF decays geometrically. This is the mirror image of the AR(\(p\)) pattern.

Invertibility condition: The MA(\(q\)) is invertible (has a well-defined AR(\(\infty\)) representation) if all roots of \(\Theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q = 0\) lie outside the unit circle.

8.4 ARMA(p, q) Models

ARMA(\(p,q\)) model: \[ X_t = \phi_0 + \sum_{i=1}^p \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^q \theta_j \varepsilon_{t-j} \]

Stationary and invertible if roots of both \(\Phi(z)\) and \(\Theta(z)\) lie strictly outside the unit circle.

Model identification (Box-Jenkins methodology):

  1. Identify tentative \(p, q\) from ACF/PACF patterns.
  2. Estimate by maximum likelihood or conditional least squares.
  3. Check residuals: ACF of \(\hat{\varepsilon}_t\) should show no significant autocorrelation (Ljung-Box Q-test).

Ljung-Box statistic:

\[ Q(m) = T(T+2)\sum_{k=1}^m \frac{\hat{\rho}_k^2}{T-k} \sim \chi^2_{m-p-q} \text{ under } H_0 \]

Information criteria for model selection:

\[ AIC = -2\ln(\hat{L}) + 2(p+q+1) \]\[ BIC = -2\ln(\hat{L}) + (p+q+1)\ln T \]

BIC imposes a heavier penalty for model complexity than AIC and tends to select more parsimonious models. Both criteria should be minimized.

8.5 Unit Roots and Integrated Processes

A random walk \(X_t = X_{t-1} + \varepsilon_t\) has a unit root: \(\phi = 1\) in the AR(1). Such a process is I(1) — integrated of order 1. Its variance grows linearly in \(t\): \(\text{Var}(X_t) = t\sigma^2\).

Augmented Dickey-Fuller (ADF) test for a unit root in \(X_t\):

Test regression: \(\Delta X_t = \alpha + \delta t + \gamma X_{t-1} + \sum_{j=1}^p c_j \Delta X_{t-j} + u_t\)

\(H_0: \gamma = 0\) (unit root, non-stationary) vs. \(H_1: \gamma < 0\) (stationary).

The ADF test statistic does not follow a standard t-distribution under \(H_0\); Dickey-Fuller critical values must be used (more negative than standard t-critical values).

Why non-stationarity matters: Regressing one non-stationary series on another can yield a seemingly significant relationship even when the two series are entirely unrelated — a spurious regression. This is a fundamental danger in financial time series analysis. Two I(1) series may be cointegrated (share a long-run equilibrium) if a linear combination of them is I(0), in which case the regression is not spurious but meaningful.

8.6 Forecasting with Time Series Models

For a stationary AR(1) model \(X_t = \phi X_{t-1} + \varepsilon_t\), the optimal (minimum MSE) \(h\)-step-ahead forecast from time \(T\) is:

\[ \hat{X}_{T+h} = \phi^h X_T \]

which reverts toward zero (the unconditional mean) geometrically. The forecast error variance:

\[ \text{Var}(X_{T+h} - \hat{X}_{T+h}) = \sigma^2\frac{1 - \phi^{2h}}{1-\phi^2} \to \frac{\sigma^2}{1-\phi^2} = \text{Var}(X_t) \text{ as } h \to \infty \]

Long-horizon forecasts become uninformative as uncertainty grows toward the unconditional variance.


Chapter 9: Volatility Modeling — ARCH and GARCH

9.1 Volatility Clustering

One of the most robust empirical regularities in financial data is volatility clustering: large changes in asset prices tend to be followed by large changes (of either sign), and small changes by small changes. Mandelbrot (1963) first documented this; Engle (1982) provided a formal econometric model.

ARCH effects test: To detect time-varying volatility, regress squared residuals on their lags (auxiliary regression):

\[ \hat{\varepsilon}_t^2 = \gamma_0 + \gamma_1 \hat{\varepsilon}_{t-1}^2 + \cdots + \gamma_q \hat{\varepsilon}_{t-q}^2 + u_t \]

The Engle LM statistic \(= T \cdot R^2\) is \(\chi^2_q\) under \(H_0\) of no ARCH effects.

9.2 ARCH Model

ARCH(\(q\)) model (Engle, 1982): \[ r_t = \mu + \varepsilon_t, \quad \varepsilon_t = \sigma_t z_t, \quad z_t \sim i.i.d.\, N(0,1) \]\[ \sigma_t^2 = \omega + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \cdots + \alpha_q\varepsilon_{t-q}^2 \]

with \(\omega > 0\), \(\alpha_i \geq 0\), and \(\sum_{i=1}^q \alpha_i < 1\) for covariance stationarity.

Interpretation: Today’s conditional variance \(\sigma_t^2\) depends on yesterday’s squared shock \(\varepsilon_{t-1}^2\). A large shock yesterday inflates today’s variance, creating volatility clustering. ARCH(\(q\)) requires many lags to fit observed persistence well, motivating GARCH.

9.3 GARCH Model

GARCH(\(p,q\)) model (Bollerslev, 1986): \[ \sigma_t^2 = \omega + \sum_{i=1}^q \alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^p \beta_j\sigma_{t-j}^2 \]

with \(\omega > 0\), \(\alpha_i \geq 0\), \(\beta_j \geq 0\), and \(\sum \alpha_i + \sum \beta_j < 1\) for stationarity.

The GARCH(1,1) model is the most widely used specification:

\[ \sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2 \]

Interpretation:

  • \(\omega/(1-\alpha-\beta)\) is the long-run (unconditional) variance.
  • \(\alpha\) captures the immediate impact of shocks on volatility (ARCH effect).
  • \(\beta\) measures the persistence of volatility — how much yesterday’s conditional variance carries over.

In practice, \(\hat{\alpha} + \hat{\beta} \approx 0.97\) to \(0.99\) for daily equity returns, implying very high volatility persistence. When \(\alpha + \beta = 1\), we have the IGARCH (Integrated GARCH) model, where volatility shocks are permanent.

GARCH volatility forecasting:

\[ E_t[\sigma_{t+h}^2] = \bar{\sigma}^2 + (\alpha+\beta)^{h-1}(\sigma_{t+1}^2 - \bar{\sigma}^2) \]

where \(\bar{\sigma}^2 = \omega/(1-\alpha-\beta)\). Forecasts revert toward the long-run variance, with speed governed by \(\alpha+\beta\).
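A minimal sketch that simulates a GARCH(1,1) path and produces the mean-reverting variance forecast above (the parameter values are illustrative daily figures, not estimates):

```python
import numpy as np

rng = np.random.default_rng(5)
omega, alpha, beta = 1e-6, 0.08, 0.90            # illustrative daily parameters
T = 2000
sigma2 = np.empty(T)
eps = np.empty(T)
sigma2[0] = omega / (1 - alpha - beta)           # start at the long-run variance
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# h-step variance forecasts revert to the long-run level at rate (alpha + beta)
sigma2_bar = omega / (1 - alpha - beta)
sigma2_next = omega + alpha * eps[-1] ** 2 + beta * sigma2[-1]
for h in (1, 5, 20):
    forecast = sigma2_bar + (alpha + beta) ** (h - 1) * (sigma2_next - sigma2_bar)
    print(f"h={h:2d}: forecast daily vol = {np.sqrt(forecast):.4%}")
```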

9.4 Extensions: EGARCH and GJR-GARCH

GJR-GARCH (Glosten-Jagannathan-Runkle): Adds a leverage effect — negative shocks have a larger impact on volatility than positive shocks:

\[ \sigma_t^2 = \omega + (\alpha + \gamma\mathbf{1}_{\varepsilon_{t-1}<0})\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2 \]

where \(\mathbf{1}_{\varepsilon_{t-1}<0}\) is an indicator for negative shocks. Empirically, \(\hat{\gamma} > 0\) for equity indices (the leverage effect or volatility asymmetry).

EGARCH (Nelson, 1991): Models log-variance to ensure positivity without parameter constraints:

\[ \ln\sigma_t^2 = \omega + \sum_{j=1}^p \beta_j \ln\sigma_{t-j}^2 + \sum_{i=1}^q \left(\alpha_i |z_{t-i}| + \gamma_i z_{t-i}\right) \]

The \(\gamma_i\) terms allow an asymmetric response: with \(\gamma_i < 0\) (the typical estimate for equity indices), a negative shock \(z_{t-i} < 0\) raises log-variance by more than a positive shock of the same magnitude.

Realized volatility: With intraday data available, one can estimate daily volatility non-parametrically as the sum of squared intraday log-returns:

\[ RV_t = \sum_{j=1}^m r_{t,j}^2 \]

where \(r_{t,j}\) are intraday returns sampled at frequency \(m\) per day. As \(m \to \infty\), \(RV_t \to \int_t^{t+1} \sigma_s^2 ds\) (integrated variance).


Chapter 10: Monte Carlo Simulation

10.1 Random Number Generation

Monte Carlo (MC) simulation uses pseudo-random number generators to approximate integrals, option prices, and risk measures that have no closed-form solution.

Linear congruential generator: Produces a sequence of pseudo-random integers via:

\[ X_{n+1} = (aX_n + c) \mod m \]

Uniform \((0,1)\) samples: \(U_n = X_n/m\). These pass many statistical tests for randomness though they are deterministic.

Generating normal random variables — Box-Muller transform: Given two i.i.d. \(U_1, U_2 \sim U(0,1)\):

\[ Z_1 = \sqrt{-2\ln U_1}\cos(2\pi U_2), \quad Z_2 = \sqrt{-2\ln U_1}\sin(2\pi U_2) \]

Then \(Z_1, Z_2 \sim i.i.d.\, N(0,1)\) — an exact transformation. A general normal sample is obtained as \(X = \mu + \sigma Z\).

Generating correlated normals: For a two-asset portfolio with correlation \(\rho\), generate \(Z_1, Z_2 \sim N(0,1)\) i.i.d. and set:

\[ \tilde{Z}_1 = Z_1, \quad \tilde{Z}_2 = \rho Z_1 + \sqrt{1-\rho^2} Z_2 \]

Then \(\text{Cor}(\tilde{Z}_1, \tilde{Z}_2) = \rho\). In the \(n\)-asset case, use the Cholesky decomposition: \(\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T\) where \(\mathbf{L}\) is lower triangular, and generate \(\mathbf{r} = \mathbf{L}\mathbf{z}\) for i.i.d. standard normal \(\mathbf{z}\).
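A sketch of the Cholesky construction for two assets (the covariance matrix below, with correlation 0.3, is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
Sigma = np.array([[0.040, 0.018],
                  [0.018, 0.090]])              # illustrative 2-asset covariance (rho = 0.3)
L = np.linalg.cholesky(Sigma)                   # Sigma = L L^T, L lower triangular

z = rng.standard_normal(size=(100_000, 2))      # i.i.d. standard normals
r = z @ L.T                                     # each row is L z, so Cov(r) = Sigma

print("sample covariance:\n", np.round(np.cov(r, rowvar=False), 4))
print("sample correlation:", round(np.corrcoef(r, rowvar=False)[0, 1], 3))
```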

10.2 Simulating Asset Prices

Under the Black-Scholes model, the stock price at time \(T\) is:

\[ S_T = S_0 \exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,Z\right), \quad Z \sim N(0,1) \]

For a path-dependent option or risk measure requiring \(n\) time steps, discretize the GBM via:

\[ S_{t+\Delta t} = S_t \exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)\Delta t + \sigma\sqrt{\Delta t}\,Z_t\right) \]

where \(Z_t \sim N(0,1)\) i.i.d. The Euler-Maruyama scheme for a general SDE \(dS = a(S,t)dt + b(S,t)dW\):

\[ S_{t+\Delta t} \approx S_t + a(S_t,t)\Delta t + b(S_t,t)\sqrt{\Delta t}\,Z_t \]

10.3 Variance Reduction Techniques

Antithetic variates: For each uniform sample \(U\), also use \(1-U\) (its antithetic pair). The estimator:

\[ \hat{\theta}_{AV} = \frac{1}{N}\sum_{i=1}^N \frac{f(Z_i) + f(-Z_i)}{2} \]

Since \(f(Z)\) and \(f(-Z)\) are negatively correlated for monotone \(f\), the variance of \(\hat{\theta}_{AV}\) is smaller than that of the crude estimator. No additional random draws are required — each draw is simply reused with its sign flipped.
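A sketch of antithetic variates applied to risk-neutral pricing of a European call under GBM (the contract parameters below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
S0, K, r, sigma, T = 100.0, 105.0, 0.03, 0.2, 1.0     # illustrative contract
N = 100_000

def discounted_call_payoff(Z):
    ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

Z = rng.standard_normal(N)
crude = discounted_call_payoff(Z)
antithetic = 0.5 * (crude + discounted_call_payoff(-Z))  # pair each draw with its negative

print(f"crude:      {crude.mean():.4f}  (SE {crude.std(ddof=1) / np.sqrt(N):.4f})")
print(f"antithetic: {antithetic.mean():.4f}  (SE {antithetic.std(ddof=1) / np.sqrt(N):.4f})")
```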

Control variates: If \(g(S)\) is a payoff with known analytical expectation \(E[g(S)] = \mu_g\), use:

\[ \hat{\theta}_{CV} = \hat{\theta} - c(\hat{\mu}_g - \mu_g) \]

Optimal control coefficient: \(c^* = \text{Cov}(f,g)/\text{Var}(g)\). With \(c^*\), the variance of \(\hat{\theta}_{CV}\) is \((1 - \rho_{f,g}^2)\) times that of the crude estimator, so a highly correlated control delivers a large reduction.

Importance sampling: Shift the sampling distribution to the region of interest (e.g., the tail for VaR/ES estimation), then reweight:

\[ E_P[f(X)] = E_Q\left[f(X) \frac{dP}{dQ}(X)\right] \]

where \(dP/dQ\) is the likelihood ratio (Radon-Nikodym derivative).

MC convergence: The standard error of a MC estimator with \(N\) simulations is:

\[ SE = \frac{\hat{\sigma}_f}{\sqrt{N}} \]

To halve the standard error, quadruple the number of simulations. Variance reduction techniques improve the effective \(N\) without additional computation.


Chapter 11: Numerical Methods in Finance

11.1 The Binomial Tree Model

The binomial tree (Cox, Ross, Rubinstein, 1979) discretizes the asset price evolution into up and down moves per period.

One-period binomial model: In each period \(\Delta t = T/N\), the stock moves:

\[ S_u = S \cdot u, \quad S_d = S \cdot d \]

with risk-neutral probabilities:

\[ p = \frac{e^{r\Delta t} - d}{u - d}, \quad 1-p = \frac{u - e^{r\Delta t}}{u - d} \]

The CRR parameterization: \(u = e^{\sigma\sqrt{\Delta t}}\), \(d = 1/u = e^{-\sigma\sqrt{\Delta t}}\), so the tree is recombining (same node reached by “up then down” and “down then up”).

Multi-period tree pricing by backward induction: At terminal nodes, option payoffs are known. Working backward:

\[ V_t = e^{-r\Delta t}[p\, V_{t+1,u} + (1-p)\, V_{t+1,d}] \]

For European options, this converges to the Black-Scholes price as \(N \to \infty\). For American options, at each node compare the hold value \(V_t\) (above) with the immediate exercise value; the option value is the maximum:

\[ V_t^{Am} = \max(\text{exercise value},\, e^{-r\Delta t}[p\, V_{t+1,u}^{Am} + (1-p)\, V_{t+1,d}^{Am}]) \]

Early exercise of American puts (but generally not American calls on non-dividend-paying stocks) can be optimal when deep in-the-money.
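A compact sketch of CRR backward induction for an American put (the contract parameters in the final line are illustrative):

```python
import numpy as np

def crr_american_put(S0, K, r, sigma, T, N):
    """Price an American put on a CRR binomial tree by backward induction."""
    dt = T / N
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)                 # risk-neutral up probability
    disc = np.exp(-r * dt)

    j = np.arange(N + 1)                               # number of up moves at maturity
    V = np.maximum(K - S0 * u ** j * d ** (N - j), 0.0)

    for n in range(N - 1, -1, -1):
        j = np.arange(n + 1)
        S = S0 * u ** j * d ** (n - j)
        hold = disc * (p * V[1:n + 2] + (1 - p) * V[0:n + 1])
        V = np.maximum(hold, K - S)                    # compare holding vs. exercising
    return V[0]

print(round(crr_american_put(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, N=500), 4))
```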

11.2 Finite Difference Methods

The Black-Scholes PDE for a European option price \(V(S,t)\):

\[ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0 \]

with terminal condition \(V(S,T) = \text{payoff}(S)\).

Finite difference methods discretize the \((S,t)\) grid. Let \(S_i = i\Delta S\) and \(t_j = j\Delta t\). Denote \(V_{i,j} \approx V(i\Delta S, j\Delta t)\).

Explicit scheme (forward difference in time):

\[ \frac{V_{i,j+1} - V_{i,j}}{\Delta t} + \frac{\sigma^2 S_i^2}{2} \frac{V_{i+1,j+1} - 2V_{i,j+1} + V_{i-1,j+1}}{(\Delta S)^2} + rS_i\frac{V_{i+1,j+1} - V_{i-1,j+1}}{2\Delta S} - rV_{i,j+1} = 0 \]

where the space derivatives are evaluated at the known time level \(j+1\). Solving for \(V_{i,j}\) in terms of the values at time \(j+1\) gives explicit updates marching backward from the terminal condition. The scheme is stable only if the time step satisfies the CFL condition: \(\Delta t \leq (\Delta S)^2/(\sigma^2 S_{max}^2)\).

The fully implicit scheme (space derivatives at the unknown time level \(j\)) is unconditionally stable but only first-order accurate in time. The Crank-Nicolson scheme averages the explicit and fully implicit discretizations, achieving second-order accuracy in both time and space while remaining unconditionally stable.


Chapter 12: Introduction to Stochastic Calculus

12.1 Brownian Motion

Standard Brownian motion (Wiener process): A stochastic process \(\{W_t\}_{t\geq 0}\) satisfying:
  1. \(W_0 = 0\)
  2. Independent increments: \(W_t - W_s \perp W_s - W_u\) for \(u < s < t\)
  3. Normal increments: \(W_t - W_s \sim N(0, t-s)\) for \(s < t\)
  4. Continuous paths: \(t \mapsto W_t\) is almost surely continuous

Properties:

  • \(E[W_t] = 0\), \(\text{Var}(W_t) = t\)
  • \(\text{Cov}(W_s, W_t) = \min(s,t)\)
  • Brownian motion is nowhere differentiable (paths have infinite variation on any interval)
  • Quadratic variation: \([W,W]_t = t\) (the heuristic rule \((dW_t)^2 = dt\))

Geometric Brownian motion (GBM): The standard model for stock prices:

\[ dS_t = \mu S_t\, dt + \sigma S_t\, dW_t \]

By Itô’s lemma (see below), the solution is:

\[ S_t = S_0 \exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t\right) \]

12.2 Itô’s Lemma

Itô's Lemma: Let \(X_t\) be an Itô process: \(dX_t = a_t\, dt + b_t\, dW_t\). For a twice continuously differentiable function \(f(t, x)\): \[ df(t, X_t) = \frac{\partial f}{\partial t} dt + \frac{\partial f}{\partial x} dX_t + \frac{1}{2}\frac{\partial^2 f}{\partial x^2} b_t^2\, dt \]\[ = \left(\frac{\partial f}{\partial t} + a_t \frac{\partial f}{\partial x} + \frac{1}{2}b_t^2 \frac{\partial^2 f}{\partial x^2}\right) dt + b_t \frac{\partial f}{\partial x}\, dW_t \]

The extra \(\frac{1}{2}b_t^2 f_{xx}\) term (compared to ordinary calculus) arises from the quadratic variation of Brownian motion: \((dW_t)^2 = dt\).

Example — Applying Itô's Lemma to GBM: Let \(S_t\) follow GBM: \(dS = \mu S\, dt + \sigma S\, dW\). Let \(f(S) = \ln S\). \[ df = f_S\, dS + \frac{1}{2}f_{SS}(dS)^2 = \frac{1}{S}\,dS + \frac{1}{2}\left(-\frac{1}{S^2}\right)\sigma^2 S^2\, dt \]\[ = \frac{1}{S}(\mu S\, dt + \sigma S\, dW) - \frac{\sigma^2}{2}\, dt = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\, dW_t \]

Integrating from 0 to \(T\): \(\ln S_T - \ln S_0 = (\mu - \sigma^2/2)T + \sigma W_T\), confirming \(S_T = S_0 e^{(\mu-\sigma^2/2)T + \sigma W_T}\).

12.3 Black-Scholes Derivation Sketch

Black and Scholes (1973) derived the no-arbitrage price of a European option by constructing a delta-hedged portfolio that is instantaneously risk-free.

Step 1 — Delta-hedged portfolio: Hold one option long and \(\Delta = \partial V/\partial S\) shares short. Portfolio value: \(\Pi = V - \Delta S\).

Step 2 — Portfolio dynamics via Itô’s Lemma:

\[ d\Pi = dV - \Delta\, dS = \left(\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}\right)dt \]

The stochastic term \(\sigma S\, dW_t\) cancels perfectly. The portfolio is instantaneously riskless.

Step 3 — No-arbitrage: The riskless portfolio must earn the risk-free rate:

\[ d\Pi = r\Pi\, dt = r(V - \Delta S)\, dt \]

Step 4 — Black-Scholes PDE:

\[ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0 \]

Step 5 — Closed-form solution for a European call with strike \(K\) and maturity \(T\):

\[ C(S, t) = S_t\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2) \]\[ d_1 = \frac{\ln(S_t/K) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \quad d_2 = d_1 - \sigma\sqrt{T-t} \]

Put-Call Parity (model-free, from no-arbitrage):

\[ C - P = S - Ke^{-r(T-t)} \]
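A short sketch of the closed-form call price, with the put obtained from put-call parity (the inputs in the final lines are illustrative):

```python
import numpy as np
from scipy.stats import norm

def black_scholes_call(S, K, r, sigma, tau):
    """European call price; tau = T - t is the time to maturity in years."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

S, K, r, sigma, tau = 100.0, 105.0, 0.03, 0.2, 0.5    # illustrative inputs
call = black_scholes_call(S, K, r, sigma, tau)
put = call - S + K * np.exp(-r * tau)                 # put-call parity
print(f"call = {call:.4f}, put = {put:.4f}")
```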

Option Greeks:

  • Delta \(\Delta = \Phi(d_1)\) (call): sensitivity of the option price to \(S\)
  • Gamma \(\Gamma = \phi(d_1)/(S\sigma\sqrt{T-t})\): rate of change of delta with respect to \(S\)
  • Vega \(\mathcal{V} = S\phi(d_1)\sqrt{T-t}\): sensitivity to \(\sigma\)
  • Theta \(\Theta\): time decay — the value lost as maturity approaches (a lengthy, typically negative expression)
  • Rho \(\rho = K(T-t)e^{-r(T-t)}\Phi(d_2)\) (call): sensitivity to \(r\)

Chapter 13: Value at Risk and Expected Shortfall

13.1 Value at Risk

Value at Risk (VaR): At confidence level \(\alpha\) and horizon \(h\), VaR is the loss threshold exceeded with probability \(1 - \alpha\): \[ P(\text{Loss} > VaR_\alpha) = 1 - \alpha \]

Equivalently, \(VaR_\alpha\) is the \(\alpha\)-quantile of the loss distribution. At 95% confidence: \(P(\text{Loss} > VaR_{0.95}) = 0.05\).

Three methods for estimating VaR:

13.1.1 Historical Simulation

Given a window of \(T\) past returns \(\{r_1, \ldots, r_T\}\), apply those returns to the current portfolio value \(V_0\) to generate \(T\) hypothetical P&L scenarios:

\[ \text{Loss}_t = -V_0 \cdot r_t \]

Sort the losses; VaR at 95% is the 95th percentile of the simulated losses (i.e., the 5th-largest loss in a window of 100 scenarios).

Advantages: Non-parametric; automatically captures fat tails, skewness, and correlations as observed historically. No distributional assumptions.

Disadvantages: Entirely backward-looking; uses only \(T\) scenarios; sudden regime changes are poorly handled; equal-weighting of all historical periods.

13.1.2 Parametric (Variance-Covariance) VaR

Assume portfolio returns are normally distributed: \(r_p \sim N(\mu_p, \sigma_p^2)\). Then:

\[ VaR_\alpha = -\mu_p + z_\alpha \sigma_p \]

where \(z_\alpha = \Phi^{-1}(\alpha)\) (e.g., \(z_{0.95} = 1.645\), \(z_{0.99} = 2.326\)).

For a portfolio with value \(V_0\):

\[ VaR_\alpha = V_0(-\mu_p + z_\alpha\sigma_p) \]

Multi-asset parametric VaR: With weight vector \(\mathbf{w}\) and covariance matrix \(\boldsymbol{\Sigma}\):

\[ \sigma_p = \sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}} \]\[ VaR_\alpha = V_0(z_\alpha\sigma_p - \mu_p) \]

Advantages: Simple, fast, easily aggregated across positions. Disadvantages: The normality assumption understates tail risk, and VaR itself is not subadditive — for non-elliptical loss distributions a merged portfolio can have a higher VaR than the sum of its parts.

13.1.3 Monte Carlo VaR

Simulate \(N\) scenarios of portfolio returns using a specified distributional model (e.g., multivariate normal, t-distribution, GARCH-based, jump-diffusion). Compute portfolio P&L for each scenario and estimate VaR as the empirical quantile of simulated losses.

This approach can accommodate complex portfolios (options, non-linear instruments), non-normal return distributions, and dynamic hedging strategies at the cost of computational intensity.

13.2 Expected Shortfall

Expected Shortfall (ES) / Conditional VaR (CVaR): The expected loss given that the loss exceeds VaR: \[ ES_\alpha = E[\text{Loss} \mid \text{Loss} > VaR_\alpha] \]\[ = \frac{1}{1-\alpha}\int_\alpha^1 VaR_u\, du \]

ES is the average of all VaR levels above \(\alpha\) — it captures the full extent of tail losses beyond the VaR threshold.

Parametric ES under normality:

\[ ES_\alpha = -\mu_p + \sigma_p\frac{\phi(z_\alpha)}{1-\alpha} \]

where \(\phi\) is the standard normal pdf. Since \(\phi(z_\alpha)/(1-\alpha) > z_\alpha\), ES is always larger than VaR.

Advantages of ES over VaR:

  1. Coherent risk measure: ES satisfies subadditivity — \(ES(\text{portfolio}) \leq ES(A) + ES(B)\) — encouraging diversification. VaR is not always subadditive.
  2. Captures severity: VaR says “losses exceed \(X\) with 5% probability” but says nothing about how large those losses are. ES provides information about the entire tail.
  3. Basel III/IV regulatory adoption: ES (at 97.5%) has replaced VaR (at 99%) as the internal model standard for market risk capital.
Example — Comparing VaR and ES: Suppose a portfolio has daily returns \(\sim N(0, 0.01^2)\) (zero mean, 1% daily vol). Portfolio value is \(\$1{,}000{,}000\). \[ VaR_{0.99} = V_0 \cdot z_{0.99} \cdot \sigma = 1{,}000{,}000 \times 2.326 \times 0.01 = \$23{,}260 \]\[ ES_{0.99} = V_0 \cdot \sigma \cdot \frac{\phi(z_{0.99})}{1-\alpha} = 1{,}000{,}000 \times 0.01 \times \frac{0.0267}{0.01} \approx \$26{,}700 \]

So the expected loss on the worst 1% of days is $26,700 — about 15% larger than the VaR threshold.
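A sketch reproducing these figures (same inputs as the example above):

```python
import numpy as np
from scipy.stats import norm

V0, mu, sigma, alpha = 1_000_000, 0.0, 0.01, 0.99
z = norm.ppf(alpha)                                    # ~2.326

var_ = V0 * (z * sigma - mu)                           # parametric (normal) VaR
es = V0 * (sigma * norm.pdf(z) / (1 - alpha) - mu)     # parametric (normal) ES

print(f"VaR = ${var_:,.0f}, ES = ${es:,.0f}")          # roughly $23,263 and $26,652
```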

13.3 Backtesting VaR

Kupiec proportion of failures (POF) test: Over \(T\) days, let \(N\) be the number of VaR exceedances (loss > VaR). Under a correctly specified VaR model:

\[ N \sim \text{Binomial}(T, 1-\alpha) \]

The likelihood ratio test statistic:

\[ LR_{POF} = -2\ln\left[\frac{\alpha^{T-N}(1-\alpha)^{N}}{(1-N/T)^{T-N}(N/T)^N}\right] \sim \chi^2_1 \text{ under } H_0 \]

where \(1-\alpha\) is the expected exception rate.

A model producing too many or too few exceptions fails the backtest and may require recalibration.
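A minimal sketch of the Kupiec test (the 250-day, 6-exception scenario at the bottom is an illustrative assumption):

```python
import numpy as np
from scipy.stats import chi2

def kupiec_pof(T, N, alpha):
    """Kupiec proportion-of-failures LR test for a VaR model.

    T     : number of backtesting days
    N     : number of VaR exceedances observed (assumed 0 < N < T here)
    alpha : VaR confidence level, so the expected exception rate is 1 - alpha
    """
    p = 1 - alpha
    pi_hat = N / T
    ll_null = (T - N) * np.log(1 - p) + N * np.log(p)
    ll_alt = (T - N) * np.log(1 - pi_hat) + N * np.log(pi_hat)
    lr = -2 * (ll_null - ll_alt)
    return lr, chi2.sf(lr, df=1)

lr, p_value = kupiec_pof(T=250, N=6, alpha=0.99)       # 2.5 exceptions expected
print(f"LR = {lr:.2f}, p-value = {p_value:.3f}")
```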

Christoffersen (1998) conditional coverage test jointly tests whether exceptions occur at the correct frequency and are independently distributed over time (no clustering of exceptions). VaR exceptions should be serially independent; clustering indicates the model does not respond quickly enough to volatility changes.


Chapter 14: Integration of Methods — A Complete Quantitative Workflow

14.1 From Data to Decision

A complete quantitative finance workflow integrates every tool developed in this course.

Phase 1 — Data acquisition and cleaning: Obtain return data from reliable sources (e.g., FRED for macro data, French Data Library for factor returns). Compute log returns. Check for data quality: missing values, stock splits and dividends (use adjusted prices), survivorship bias in fund data, look-ahead bias in accounting data.

Phase 2 — Exploratory data analysis: Compute descriptive statistics: mean, standard deviation, skewness, excess kurtosis, minimum, maximum, and percentiles. Produce time series plots, histograms with normal overlay, Q-Q plots. Run the Jarque-Bera test for normality. Plot the ACF of returns and squared returns to detect serial correlation and ARCH effects.

Phase 3 — Model specification and estimation: Specify the econometric model (regression model, ARMA-GARCH, etc.) motivated by the EDA and theory. Estimate by OLS or maximum likelihood. Report parameter estimates with standard errors, t-statistics, and p-values. For regression, report \(R^2\), adjusted \(R^2\), and the F-statistic.

Phase 4 — Diagnostic testing: Test regression residuals for normality (JB test), heteroskedasticity (Breusch-Pagan or White test), autocorrelation (DW or Ljung-Box), and multicollinearity (VIF). Test for ARCH effects in volatility models.

Phase 5 — Inference and interpretation: Distinguish statistical significance from economic significance. A large sample may produce tiny p-values for economically negligible effects. Report effect sizes (e.g., the magnitude of alpha in basis points, the economic impact of a one-standard-deviation move in a regressor).

Phase 6 — Forecasting and risk assessment: Use estimated models to produce point forecasts and forecast intervals. Compute VaR and Expected Shortfall for risk management. Backtest risk models against historical exceedances.

Phase 7 — Reporting: Produce a well-organized research report with clear motivation, transparent methodology, complete results tables, proper diagnostics, and honest limitations. Tables should use asterisks to denote significance: \(^* p < 0.10\), \(^{**} p < 0.05\), \(^{***} p < 0.01\).

14.2 Common Pitfalls in Quantitative Finance

Data snooping / p-hacking: Testing many specifications on the same dataset inflates the probability of spurious findings. With 100 independent tests at the 5% level, five are expected to reject a true null. Report only pre-specified hypotheses and use out-of-sample testing to validate.
Overfitting: A model with many parameters can fit historical data perfectly but forecast poorly. Prefer parsimony; use AIC/BIC and cross-validation to select model complexity; evaluate performance on held-out data.
Ignoring transaction costs: Many apparent anomalies disappear once bid-ask spreads, market impact, and management fees are accounted for. Economic profitability (after costs) rather than statistical significance is the relevant criterion for trading strategies.
Survivorship bias: Databases that include only currently active funds or stocks overstate historical average returns. The average realized return of a random fund selected in 1990 is lower than the average realized return of funds still active in 2026.
Correlation instability: Pairwise correlations estimated over quiet periods dramatically understate co-movement during crises. Mean-variance portfolios optimized with tranquil-period correlations can suffer larger-than-expected losses when correlations spike.

14.3 A Unified Formula Reference

Core formulas summary:

Statistics:

\[ \bar{x} = \frac{1}{n}\sum x_i, \quad s^2 = \frac{\sum(x_i-\bar{x})^2}{n-1}, \quad t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} \]

OLS:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}, \quad R^2 = 1 - \frac{SSR}{SST}, \quad F = \frac{SSM/k}{SSR/(T-k-1)} \]

Portfolio:

\[ \mu_p = \mathbf{w}^T\boldsymbol{\mu}, \quad \sigma_p^2 = \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}, \quad SR = \frac{\mu_p - r_f}{\sigma_p} \]

Time series:

\[ \rho(k) = \frac{\text{Cov}(X_t, X_{t-k})}{\text{Var}(X_t)}, \quad AIC = -2\ln\hat{L} + 2k, \quad BIC = -2\ln\hat{L} + k\ln T \]

GARCH(1,1):

\[ \sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2, \quad \bar{\sigma}^2 = \frac{\omega}{1-\alpha-\beta} \]

Black-Scholes:

\[ C = S\Phi(d_1) - Ke^{-rT}\Phi(d_2), \quad d_{1,2} = \frac{\ln(S/K)+(r\pm\sigma^2/2)T}{\sigma\sqrt{T}} \]

Risk measures:

\[ VaR_\alpha = z_\alpha\sigma_p - \mu_p, \quad ES_\alpha = \sigma_p\frac{\phi(z_\alpha)}{1-\alpha} - \mu_p \]

Summary

AFM 323 builds the quantitative scaffold that underlies virtually every area of modern finance. Starting from probability theory — probability spaces, random variables, the normal and lognormal distributions, and moment generating functions — the course develops the statistical machinery of hypothesis testing, confidence intervals, and regression analysis. The OLS estimator is treated rigorously through the Gauss-Markov theorem and extended to multiple regression in matrix form, with applications in the CAPM and Fama-French factor models. Diagnostic methods (heteroskedasticity, autocorrelation, multicollinearity) translate abstract assumptions into actionable tests.

The linear algebra chapter makes portfolio mathematics precise: the quadratic form \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\) is the portfolio variance; constrained optimization via Lagrange multipliers traces the efficient frontier analytically. Financial mathematics connects continuous compounding, bond pricing, and the Newton-Raphson algorithm for yield computation into a coherent set of tools. Time series analysis — AR, MA, ARMA models, the ACF/PACF identification scheme, and ADF unit root tests — provides the vocabulary for dynamic modeling of return processes.

ARCH and GARCH models formalize the empirically pervasive phenomenon of volatility clustering, with GARCH(1,1) serving as the industry workhorse for daily volatility estimation and forecasting. Monte Carlo simulation with variance reduction techniques (antithetic variates, control variates, importance sampling) provides flexible numerical integration for complex payoffs and risk measures. Finite difference methods and the binomial tree offer structured alternatives for option pricing that converge to the Black-Scholes price as the grid is refined.

The Black-Scholes derivation — resting on Brownian motion, Itô’s Lemma, and the delta-hedging argument — illustrates how continuous-time stochastic calculus produces exact, model-dependent pricing formulas. Finally, Value at Risk and Expected Shortfall translate statistical models into the risk capital numbers that drive regulatory compliance and portfolio management decisions.

Throughout, the emphasis is on connecting mathematical rigor to economic interpretation and on the discipline of honest empirical practice: specifying models before testing, distinguishing in-sample fit from out-of-sample validity, and translating statistical findings into economically meaningful conclusions.
