AFM 323: Quantitative Foundations for Finance

Estimated study time: 30 minutes

Sources and References

Primary textbook — Wirjanto, T.S. Foundations of Data Analytics for Finance with Excel. University of Waterloo, 2025 (lecture notes and Excel instructional materials).

Supplementary — CFA Institute. Quantitative Investment Analysis, 4th Edition. CFA Institute, 2020. Campbell, J. Y., Lo, A. W., and MacKinlay, A. C. The Econometrics of Financial Markets. Princeton University Press, 1997.

Online resources — FRED (Federal Reserve Economic Data); Kenneth French Data Library (Fama-French factors); Yahoo Finance; MIT OCW 18.S096 Topics in Mathematics with Applications in Finance.


Chapter 1: Financial Data and the Nature of Randomness

Asset Returns as Random Variables

The starting point for quantitative finance is recognizing that asset prices — and more fundamentally, asset returns — are random variables governed by some (partially unknown) probability distribution. Financial data analysis is the empirical discipline of inferring the properties of these distributions from observed data.

Simple (gross) return: The ratio of ending price to beginning price: \[ R_t = \frac{P_t}{P_{t-1}} \]

Net return: \( r_t = R_t - 1 = \frac{P_t - P_{t-1}}{P_{t-1}} \)

Continuously compounded (log) return:

\[ r_t^{cc} = \ln(P_t) - \ln(P_{t-1}) = \ln(1 + r_t) \]

Log returns have desirable properties for statistical modeling: they are additive over time (whereas simple returns compound multiplicatively), and they are unbounded in both directions (whereas simple returns have a lower bound of -100%), which makes a normality assumption internally consistent. For small returns, \(r^{cc} \approx r\).

Multi-period returns: The gross return over \(k\) periods is the product of single-period gross returns:

\[ R_t(k) = R_t \times R_{t-1} \times \cdots \times R_{t-k+1} = \prod_{i=0}^{k-1} R_{t-i} \]

In log return terms, this is simply additive:

\[ r_t^{cc}(k) = \sum_{i=0}^{k-1} r_{t-i}^{cc} \]
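These identities are easy to verify numerically. A minimal Python sketch using illustrative (hypothetical) prices, showing that simple returns compound multiplicatively while log returns add:

```python
import math

# Hypothetical price series (illustrative values, not real data)
prices = [100.0, 105.0, 102.9, 108.0]

# Net simple returns: r_t = P_t / P_{t-1} - 1
simple = [prices[t] / prices[t - 1] - 1 for t in range(1, len(prices))]

# Log returns: r_t^cc = ln(P_t) - ln(P_{t-1})
log_ret = [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

# Multi-period: simple gross returns compound multiplicatively...
gross_multi = math.prod(1 + r for r in simple)   # equals P_T / P_0
# ...while log returns simply add up
log_multi = sum(log_ret)                         # equals ln(P_T / P_0)

assert abs(gross_multi - prices[-1] / prices[0]) < 1e-12
assert abs(log_multi - math.log(prices[-1] / prices[0])) < 1e-12
```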

The Lognormal Model of Asset Prices

The standard assumption in much of financial theory is that asset prices follow geometric Brownian motion, which implies that log returns are normally distributed:

\[ \ln\left(\frac{P_t}{P_0}\right) \sim N\left(\mu t, \sigma^2 t\right) \]

This means prices follow a lognormal distribution: they cannot become negative, and log returns are symmetrically (normally) distributed around the drift. The parameters \(\mu\) (drift) and \(\sigma^2\) (variance per unit time) describe the distribution completely.

Empirical departures from normality are pervasive in financial data:

  • Fat tails (leptokurtosis): Extreme returns occur far more often than the normal distribution predicts.
  • Negative skewness: Large negative returns are more common than large positive returns for equity indices.
  • Volatility clustering: Periods of high volatility tend to be followed by more high-volatility periods (ARCH effects).

Chapter 2: Financial Data Visualization and Descriptive Statistics

Descriptive Statistics for Financial Variables

Central tendency: The mean of a return series \(\{r_1, r_2, \ldots, r_T\}\):

\[ \bar{r} = \frac{1}{T} \sum_{t=1}^{T} r_t \]

The geometric mean is more appropriate for multi-period performance measurement:

\[ \bar{r}_g = \left(\prod_{t=1}^{T} (1 + r_t)\right)^{1/T} - 1 \]

The geometric mean is always less than or equal to the arithmetic mean; the difference is approximately \(\sigma^2/2\).
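A quick numerical check of the two means with the standard library's statistics module, on made-up returns; the arithmetic-minus-geometric gap comes out close to \(\sigma^2/2\):

```python
import statistics

# Hypothetical annual returns (illustrative, not real data)
returns = [0.10, -0.05, 0.20, 0.03]

arith = statistics.mean(returns)
# Geometric mean of gross returns, minus 1: (prod(1+r))^(1/T) - 1
geom = statistics.geometric_mean(1 + r for r in returns) - 1

# The gap between the means is approximately sigma^2 / 2
var = statistics.pvariance(returns)
approx_gap = var / 2

assert geom <= arith  # always holds (AM-GM inequality)
```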

Dispersion: The sample variance and standard deviation:

\[ s^2 = \frac{1}{T-1} \sum_{t=1}^{T} (r_t - \bar{r})^2, \quad s = \sqrt{s^2} \]

Higher moments:

The sample skewness measures asymmetry:

\[ \text{Skew} = \frac{1}{T} \sum_{t=1}^{T} \left(\frac{r_t - \bar{r}}{s}\right)^3 \]

Negative skewness (left tail heavy) is typical for equity returns.

The sample excess kurtosis measures tail thickness relative to the normal distribution (which has kurtosis = 3):

\[ \text{Excess Kurtosis} = \frac{1}{T} \sum_{t=1}^{T} \left(\frac{r_t - \bar{r}}{s}\right)^4 - 3 \]

Positive excess kurtosis (leptokurtosis) indicates heavier tails than normal — more frequent extreme events.
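The two moment formulas translate directly to code. A minimal sketch (inputs are hypothetical; the standardization by the sample \(s\) follows the formulas above):

```python
import math

def skew_kurt(r):
    """Sample skewness and excess kurtosis, matching the formulas above:
    standardize by the sample std dev s, average the 3rd/4th powers over T."""
    T = len(r)
    rbar = sum(r) / T
    s = math.sqrt(sum((x - rbar) ** 2 for x in r) / (T - 1))
    z = [(x - rbar) / s for x in r]
    skew = sum(v ** 3 for v in z) / T
    excess_kurt = sum(v ** 4 for v in z) / T - 3
    return skew, excess_kurt

# A perfectly symmetric series has skewness of (numerically) zero
sk, ek = skew_kurt([-0.02, -0.01, 0.0, 0.01, 0.02])
assert abs(sk) < 1e-12
```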

Visualization Tools

Effective financial data visualization includes:

  • Time series plots of prices and returns to identify trends, seasonality, and structural breaks.
  • Histograms overlaid with a normal curve to visualize distributional shape.
  • Q-Q plots (quantile-quantile plots) comparing sample quantiles to normal quantiles — deviations from a 45-degree line reveal non-normality.
  • Autocorrelation function (ACF) plots to detect serial correlation in returns or squared returns.

Covariance and Correlation

The sample covariance between assets \(i\) and \(j\):

\[ s_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} (r_{it} - \bar{r}_i)(r_{jt} - \bar{r}_j) \]

The sample correlation:

\[ \rho_{ij} = \frac{s_{ij}}{s_i s_j} \in [-1, 1] \]

Correlation is central to portfolio construction: a portfolio’s variance depends on pairwise correlations, not just individual variances. Assets with low mutual correlation provide the greatest diversification benefit.

Important caveat: Correlation is not causation. Moreover, correlations between asset classes tend to spike during market crises — just when diversification is most needed. This correlation instability is a key limitation of mean-variance optimization in practice.
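The sample covariance and correlation formulas above, sketched in plain Python with hypothetical return series:

```python
import math

def sample_cov(x, y):
    """Sample covariance s_ij with the T-1 (unbiased) denominator."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (T - 1)

def correlation(x, y):
    """rho_ij = s_ij / (s_i * s_j); always lies in [-1, 1]."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# A series is perfectly correlated with any positive scaling of itself
r = [0.01, -0.02, 0.03, 0.00]
assert abs(correlation(r, [2 * v for v in r]) - 1.0) < 1e-12
```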


Chapter 3: Simple Linear Regression — The Single-Factor Model

Regression Analysis in Finance

Ordinary Least Squares (OLS) regression is the workhorse of empirical finance. In its simplest form, we specify a linear relationship between a dependent variable \(y\) and an independent variable \(x\):

\[ y_t = \alpha + \beta x_t + \varepsilon_t \]

The OLS estimators minimize the sum of squared residuals:

\[ \min_{\alpha, \beta} \sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2 \]

The closed-form solutions are:

\[ \hat{\beta} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T} (x_t - \bar{x})^2} = \frac{s_{xy}}{s_x^2} \]

\[ \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x} \]
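The closed-form solutions map directly to code. A minimal sketch on made-up data (exactly linear, so the coefficients are recovered exactly):

```python
def ols_simple(x, y):
    """OLS closed form: beta_hat = s_xy / s_x^2, alpha_hat = ybar - beta_hat * xbar."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return alpha, beta

# Exactly linear (noiseless) data is recovered exactly
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5 + 1.45 * v for v in x]
alpha, beta = ols_simple(x, y)
assert abs(alpha - 0.5) < 1e-12 and abs(beta - 1.45) < 1e-12
```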

The Capital Asset Pricing Model (CAPM) as a Regression

The CAPM implies a specific linear relationship between an asset’s excess return and the market excess return:

\[ r_{it} - r_{ft} = \alpha_i + \beta_i (r_{mt} - r_{ft}) + \varepsilon_{it} \]

This is the Security Characteristic Line (SCL) or the market model regression. The parameters have direct financial interpretations:

  • \(\hat{\beta}_i\): Estimated systematic risk (sensitivity to market movements).
  • \(\hat{\alpha}_i\) (Jensen’s alpha): Excess return unexplained by the CAPM. Under the strict CAPM, \(\alpha = 0\) for all assets in equilibrium.

Goodness of Fit: R-squared

The coefficient of determination \(R^2\) measures the proportion of variance in \(y\) explained by the regression:

\[ R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\sum \hat{\varepsilon}_t^2}{\sum (y_t - \bar{y})^2} \]

In the market model, \(R^2\) equals \(\rho^2_{im}\) — the squared correlation between the asset return and the market return. It represents the fraction of the asset’s total variance that is systematic.

Classical Regression Assumptions (Gauss-Markov)

For OLS to produce BLUE (Best Linear Unbiased Estimators), the following conditions must hold:

  1. Linearity: The true relationship is linear in parameters.
  2. No perfect multicollinearity: Regressors are not perfectly correlated.
  3. Zero conditional mean: \(E[\varepsilon_t | x_t] = 0\) (exogeneity).
  4. Homoskedasticity: \(\text{Var}(\varepsilon_t | x_t) = \sigma^2\) (constant variance).
  5. No autocorrelation: \(\text{Cov}(\varepsilon_t, \varepsilon_s) = 0\) for \(t \neq s\).

When these assumptions hold, the OLS estimators have the smallest variance among all linear unbiased estimators (Gauss-Markov theorem). Adding the assumption that \(\varepsilon_t \sim N(0, \sigma^2)\) allows exact inference via \(t\)- and \(F\)-tests.

Hypothesis Testing in Regression

The t-test for \(H_0: \beta = 0\):

\[ t = \frac{\hat{\beta}}{SE(\hat{\beta})} \sim t_{T-2} \]

where \(SE(\hat{\beta}) = \frac{s}{\sqrt{\sum(x_t - \bar{x})^2}}\) and \(s = \sqrt{SSR/(T-2)}\) is the standard error of regression.

Reject \(H_0\) at significance level \(\alpha\) if \(|t| > t_{\alpha/2, T-2}\). The p-value reports the probability of observing a test statistic at least as extreme as the one obtained, assuming \(H_0\) is true.
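A sketch of the t-statistic computation from the formulas above, on hypothetical data (the strong linear relation in the example data produces a large t):

```python
import math

def beta_t_stat(x, y):
    """t = beta_hat / SE(beta_hat), with SE = s / sqrt(sum (x - xbar)^2)
    and s^2 = SSR / (T - 2), as in the formulas above."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxx = sum((a - xbar) ** 2 for a in x)
    beta = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    ssr = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    s = math.sqrt(ssr / (T - 2))      # standard error of regression
    return beta / (s / math.sqrt(sxx))

# Illustrative (hypothetical) data: near-linear, so H0: beta = 0 is rejected
t = beta_t_stat([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0])
assert t > 3.2  # exceeds the 5% two-sided critical value with T-2 = 3 df
```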

Example — CAPM estimation: Using monthly excess returns for a Canadian equity (say, Shopify) over 60 months, we run the market model regression. Suppose we obtain: \[ \hat{r}_{Shopify} - r_f = 0.8\% + 1.45 (r_{TSX} - r_f) \]

with \(t(\hat{\alpha}) = 2.1\) and \(t(\hat{\beta}) = 8.3\), \(R^2 = 0.52\).

The beta of 1.45 indicates that Shopify's excess return moves about 1.45% for each 1% move in the market's excess return, i.e., above-average systematic risk (beta measures market sensitivity, not total volatility, so "45% more volatile than the market" overstates what beta alone tells us). The alpha of 0.8% per month (annualized ≈ 9.6%) suggests outperformance relative to the CAPM, though this must be interpreted cautiously given estimation error and potential omitted risk factors.


Chapter 4: Multiple Linear Regression — Multifactor Models

From CAPM to Multi-Factor Models

Empirical research (Fama and French, 1992; 1993) demonstrated that the single-factor CAPM leaves substantial systematic variation in returns unexplained. Additional factors — related to firm size, book-to-market ratio, momentum, and profitability — improve the model’s explanatory power.

The general multiple regression model:

\[ y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + \varepsilon_t \]

In matrix notation:

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \]

The OLS estimator:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} \]
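The matrix estimator can be sketched with NumPy, solving the normal equations directly rather than forming the inverse explicitly (numerically preferable). The data below are simulated, not real factor returns:

```python
import numpy as np

def ols_multi(X, y):
    """beta_hat solving (X'X) beta = X'y; X must include a column of
    ones if an intercept is wanted."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical two-factor example: y = 0.1 + 0.5*x1 - 0.3*x2 exactly
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
X = np.column_stack([np.ones(50), x1, x2])
y = 0.1 + 0.5 * x1 - 0.3 * x2

beta = ols_multi(X, y)
assert np.allclose(beta, [0.1, 0.5, -0.3])  # noiseless, so recovered exactly
```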

The Fama-French Three-Factor Model

Fama and French (1993) identified three factors that explain a large portion of cross-sectional variation in equity returns:

\[ r_{it} - r_{ft} = \alpha_i + \beta_i^{MKT}(r_{mt} - r_{ft}) + \beta_i^{SMB} \cdot SMB_t + \beta_i^{HML} \cdot HML_t + \varepsilon_{it} \]

where:

  • SMB (Small Minus Big): Return spread between small-cap and large-cap portfolios. Positive SMB loading indicates exposure to the size premium.
  • HML (High Minus Low): Return spread between high book-to-market (value) and low book-to-market (growth) stocks. Positive HML loading indicates value exposure.

The intuition: small firms and value firms have historically earned higher average returns; the FF model attributes this to systematic risk, not mispricing.

The Fama-French-Carhart Four-Factor Model

Carhart (1997) added a momentum factor (MOM or WML — Winners Minus Losers) to capture the empirical tendency for recent outperformers to continue outperforming over 3–12 months:

\[ r_{it} - r_{ft} = \alpha_i + \beta_i^{MKT}MKT_t + \beta_i^{SMB}SMB_t + \beta_i^{HML}HML_t + \beta_i^{MOM}MOM_t + \varepsilon_{it} \]

Fama and French (2015) subsequently added profitability (RMW — Robust Minus Weak) and investment (CMA — Conservative Minus Aggressive) factors, creating the five-factor model.

Adjusted R-Squared

In multiple regression, adding a variable never decreases \(R^2\), even when the variable is spurious. The adjusted \(R^2\) penalizes for the number of predictors:

\[ \bar{R}^2 = 1 - \frac{SSR/(T-k-1)}{SST/(T-1)} \]

Use adjusted \(R^2\) to compare models with different numbers of regressors.
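The penalty is easy to see in code. A one-function sketch using the algebraically equivalent form \(\bar{R}^2 = 1 - (1 - R^2)\frac{T-1}{T-k-1}\):

```python
def adjusted_r2(r2, T, k):
    """Adjusted R^2: equivalent to 1 - (SSR/(T-k-1)) / (SST/(T-1))."""
    return 1 - (1 - r2) * (T - 1) / (T - k - 1)

# Adding a regressor (k up, R^2 barely up) can lower adjusted R^2
assert adjusted_r2(0.500, 60, 3) > adjusted_r2(0.505, 60, 4)
```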

The F-Test for Overall Significance

The F-statistic tests \(H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0\) jointly:

\[ F = \frac{(SST - SSR)/k}{SSR/(T-k-1)} \sim F_{k, T-k-1} \]

A significant F-statistic indicates the model as a whole has explanatory power.


Chapter 5: Diagnosing and Correcting Regression Problems

Heteroskedasticity

Heteroskedasticity occurs when the error variance is non-constant: \(\text{Var}(\varepsilon_t | x_t) = \sigma_t^2\). In financial data, this is extremely common — volatility clusters, so residuals from a return regression will exhibit larger variance during turbulent periods.

Detection:

  • White test or Breusch-Pagan test for heteroskedasticity.
  • Plot residuals vs. fitted values or vs. time; fan-shaped patterns indicate heteroskedasticity.

Consequences: OLS estimators remain unbiased but are no longer efficient, and standard errors are incorrect, invalidating t-tests and F-tests.

Correction:

  • Use heteroskedasticity-consistent (HC) or “robust” standard errors (White, 1980).
  • Use Weighted Least Squares (WLS) if the form of heteroskedasticity is known.

Autocorrelation

Autocorrelation (serial correlation) in residuals means \(\text{Cov}(\varepsilon_t, \varepsilon_{t-k}) \neq 0\). This is common in time series regressions where omitted variables follow a time pattern.

Detection:

  • Durbin-Watson (DW) statistic: Tests for first-order autocorrelation (\(AR(1)\)):
\[ DW = \frac{\sum_{t=2}^{T} (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{T} \hat{\varepsilon}_t^2} \approx 2(1 - \hat{\rho}) \]

A DW statistic near 2 indicates no first-order autocorrelation; values below 2 suggest positive autocorrelation, and values above 2 suggest negative autocorrelation.

  • Breusch-Godfrey LM test for higher-order autocorrelation.
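The DW statistic is a direct translation of its formula. A sketch with stylized residual patterns:

```python
def durbin_watson(resid):
    """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_t e_t^2; near 2 => no AR(1)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Persistent (positively autocorrelated) residuals push DW toward 0
assert durbin_watson([1, 1, 1, 1, -1, -1, -1, -1]) < 2
# Alternating (negatively autocorrelated) residuals push DW toward 4
assert durbin_watson([1, -1, 1, -1, 1, -1]) > 2
```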

Correction: Newey-West standard errors (HAC — Heteroskedasticity and Autocorrelation Consistent) are widely used in financial econometrics to produce valid inference without specifying the exact autocorrelation structure.

Multicollinearity

Multicollinearity occurs when regressors are highly (but not perfectly) correlated. OLS estimates remain unbiased but become imprecise (large standard errors), making individual coefficient tests uninformative even when the overall model F-test is significant.

Detection:

  • Variance Inflation Factor (VIF): \(VIF_j = 1/(1 - R_j^2)\) where \(R_j^2\) is the \(R^2\) from regressing \(x_j\) on all other regressors. VIF > 10 signals severe multicollinearity.

Remedies: Reduce the number of predictors; use principal components regression; obtain more data.
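In the two-regressor case, \(R_j^2\) is simply the squared correlation between the two regressors, so the VIF has a closed form. A sketch with hypothetical data:

```python
import math

def vif_two_regressors(x1, x2):
    """With exactly two regressors, R_j^2 = rho^2, so VIF = 1 / (1 - rho^2)
    for both variables. (With more regressors, R_j^2 requires a regression.)"""
    T = len(x1)
    m1, m2 = sum(x1) / T, sum(x2) / T
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    rho2 = cov ** 2 / (v1 * v2)
    return 1 / (1 - rho2)

# Nearly collinear regressors produce a VIF far above the 10 threshold
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 2.0, 3.02, 3.99, 5.0]  # x2 is x1 plus tiny perturbations
assert vif_two_regressors(x1, x2) > 10
```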


Chapter 6: Portfolio Theory and Empirical Asset Pricing

Mean-Variance Optimization

The mean-variance framework (Markowitz, 1952) constructs portfolios that minimize variance for a given expected return. For an \(n\)-asset universe with expected return vector \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), the portfolio weights \(\mathbf{w}\) satisfying:

\[ \min_{\mathbf{w}} \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^T \boldsymbol{\mu} = \mu_p, \quad \mathbf{w}^T \mathbf{1} = 1 \]

trace out the efficient frontier in mean-standard deviation space.

The Sharpe ratio measures risk-adjusted return:

\[ SR = \frac{\mu_p - r_f}{\sigma_p} \]

The portfolio with the highest Sharpe ratio is the tangency portfolio; under the model's assumptions, every investor holds a combination of the tangency portfolio and the risk-free asset (two-fund separation).
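Portfolio mean, variance \(\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}\), and Sharpe ratio can be sketched in a few lines. The two-asset numbers below are illustrative, not calibrated to any real market:

```python
import math

def portfolio_stats(w, mu, cov, rf):
    """Portfolio mean w'mu, std sqrt(w' Sigma w), and Sharpe ratio."""
    n = len(w)
    mean = sum(w[i] * mu[i] for i in range(n))
    var = sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))
    sd = math.sqrt(var)
    return mean, sd, (mean - rf) / sd

# Hypothetical two-asset universe: stds 0.20 and 0.30, correlation 0.1
mu = [0.08, 0.12]
cov = [[0.04, 0.006], [0.006, 0.09]]
mean, sd, sharpe = portfolio_stats([0.5, 0.5], mu, cov, rf=0.02)

# Diversification: portfolio std is below the weighted average of the stds
assert sd < 0.5 * 0.2 + 0.5 * 0.3
```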

Testing Asset Pricing Models

Testing the CAPM or any factor model requires evaluating whether the intercept (alpha) is statistically zero. Gibbons, Ross, and Shanken (GRS) test (1989) provides a joint test of whether all alphas in a set of test portfolios are simultaneously zero, accounting for estimation error and cross-correlation of residuals.

The GRS statistic:

\[ GRS = \frac{T - N - K}{N} \left(1 + \hat{\mu}_f^T \hat{\Sigma}_{f}^{-1} \hat{\mu}_f\right)^{-1} \hat{\alpha}^T \hat{\Sigma}^{-1} \hat{\alpha} \sim F_{N, T-N-K} \]

where \(T\) is the number of time periods, \(N\) the number of test portfolios, \(K\) the number of factors, \(\hat{\alpha}\) the \(N \times 1\) vector of estimated intercepts, \(\hat{\Sigma}\) the residual covariance matrix, and \(\hat{\mu}_f\), \(\hat{\Sigma}_f\) the sample mean vector and covariance matrix of the factors.

Rejection of the null suggests the model does not fully explain the cross-section of expected returns.

Empirical Properties of the Efficient Market Hypothesis

The Efficient Market Hypothesis (EMH) (Fama, 1970) asserts that asset prices fully reflect available information:

  • Weak form: Current prices reflect all past price information; technical analysis has no predictive power.
  • Semi-strong form: Prices reflect all publicly available information; fundamental analysis yields no abnormal returns.
  • Strong form: Prices reflect all information, including private (insider) information.

Empirical tests use regression-based approaches:

  • Return predictability tests: Regress future returns on lagged variables (dividend yield, price-to-earnings, term spread). Significant predictability is inconsistent with weak-form EMH.
  • Event studies: Measure abnormal returns around information events (earnings announcements, M&A). Rapid price adjustment is consistent with semi-strong-form EMH.
  • Momentum and reversal: Short-run momentum (Jegadeesh and Titman, 1993) and long-run reversal (De Bondt and Thaler, 1985) are anomalies inconsistent with at least some formulations of EMH.

Chapter 7: First Steps Toward Machine Learning in Finance

Why Machine Learning?

Classical regression imposes a specific functional form (linearity) and assumes a well-specified model. Financial data often involves complex, nonlinear relationships between many predictors, and the relevant predictors may not be known in advance. Machine learning (ML) offers flexible tools for learning patterns from data without rigid prior specification.

ML in finance is not a replacement for theory-based modeling but a complement: theory guides feature engineering and model interpretation; ML provides flexible estimation.

Key Concepts

Bias-Variance Tradeoff

A central concept in ML is the bias-variance tradeoff:

  • Bias: Error from erroneous assumptions in the learning algorithm (underfitting). High-bias models are too simple.
  • Variance: Error from sensitivity to fluctuations in the training data (overfitting). High-variance models are too complex.
\[ \text{Expected MSE} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Noise} \]

Optimal models balance bias and variance. In financial data with limited samples and high noise, overfitting is a persistent danger.

Cross-Validation

K-fold cross-validation divides the data into \(K\) subsets, trains on \(K-1\) folds and tests on the remaining fold, rotating through all folds. This produces an out-of-sample performance estimate that guards against overfitting.

Walk-forward (rolling window) validation is the appropriate time-series adaptation: the training set is always in the past relative to the test set, preserving the temporal ordering of financial data.
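A minimal generator of walk-forward train/test index splits (window sizes are illustrative); every training window strictly precedes its test window:

```python
def walk_forward_splits(T, train_size, test_size):
    """Yield (train_indices, test_indices) pairs in which the training
    window always precedes the test window, preserving temporal order."""
    start = 0
    while start + train_size + test_size <= T:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # roll the window forward by one test block

splits = list(walk_forward_splits(T=10, train_size=4, test_size=2))
# Every test index comes strictly after every training index in its split
assert all(max(tr) < min(te) for tr, te in splits)
```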

Regularization

LASSO (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty to OLS:

\[ \min_{\boldsymbol{\beta}} \sum_{t=1}^T (y_t - \mathbf{x}_t^T \boldsymbol{\beta})^2 + \lambda \sum_{j=1}^k |\beta_j| \]

LASSO performs automatic variable selection by shrinking small coefficients exactly to zero.

Ridge regression adds an L2 penalty that shrinks all coefficients toward zero without exact sparsity:

\[ \min_{\boldsymbol{\beta}} \sum_{t=1}^T (y_t - \mathbf{x}_t^T \boldsymbol{\beta})^2 + \lambda \sum_{j=1}^k \beta_j^2 \]

Both methods reduce variance at the cost of some bias. The hyperparameter \(\lambda\) (regularization strength) is tuned via cross-validation.
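Ridge has a closed form, \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}\), which makes a compact sketch possible (LASSO has no closed form and needs an iterative solver). Data and \(\lambda\) values below are illustrative; predictors are assumed centered so no intercept is penalized:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator: solve (X'X + lam*I) beta = X'y.
    Assumes centered data, so there is no (unpenalized) intercept."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Simulated data with known coefficients plus small noise
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.1, size=100)

b_ols = ridge(X, y, lam=0.0)    # lam = 0 recovers plain OLS
b_reg = ridge(X, y, lam=50.0)   # larger lam shrinks coefficients toward 0
assert np.linalg.norm(b_reg) < np.linalg.norm(b_ols)
```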

Decision Trees and Ensemble Methods

A decision tree partitions the predictor space into regions, assigning a predicted value to each region. Trees are interpretable but prone to overfitting.

Random forests aggregate predictions from many trees, each trained on a random subsample of observations and predictors. This bagging approach dramatically reduces variance without increasing bias much.

Gradient boosting (e.g., XGBoost) sequentially adds trees that each correct the residual errors of the current ensemble.

Applications in Finance

  • Return prediction: Predicting next-month equity returns using a large set of accounting and market predictors. Gu, Kelly, and Xiu (2020) find that ML methods (neural networks, gradient boosting) outperform linear models in predictive \(R^2\).
  • Credit scoring: Classifying loan applicants as creditworthy or not based on payment history, income, and other features.
  • Algorithmic trading: Pattern recognition in order flow and price data.
  • Risk management: Estimating tail risk measures more flexibly than parametric models.
  • Fraud detection: Identifying anomalous transactions in real time.

Caution in finance: Financial data has a low signal-to-noise ratio. Models that perform well in-sample frequently fail out-of-sample. Rigorous out-of-sample testing and understanding of the economic mechanism behind any pattern are essential before trusting ML predictions.


Chapter 8: Practical Data Analysis Workflow

The Research Report Framework

A quantitative finance research report integrates data acquisition, exploratory analysis, model estimation, inference, and interpretation. A complete report structure:

  1. Introduction and economic motivation: State the research question, its economic significance, and the analytical approach.
  2. Data description: Sources, sample period, variable definitions, frequency, and handling of missing data. Summary statistics table.
  3. Exploratory analysis: Time series plots, histograms, Q-Q plots, autocorrelation plots. Identify data anomalies.
  4. Methodology: Specify the econometric model, estimation method, and hypothesis tests.
  5. Results: Regression coefficient estimates, standard errors, t-statistics, p-values, \(R^2\), and joint tests. Tables with asterisks indicating significance levels.
  6. Diagnostics: Tests for heteroskedasticity, autocorrelation, and normality of residuals.
  7. Economic interpretation: What do the coefficient estimates mean in financial terms? How large are the effects? Are they economically as well as statistically significant?
  8. Conclusion and limitations: Summary of findings, caveats, and suggestions for future analysis.

Working with Excel for Financial Data Analysis

Excel’s Data Analysis ToolPak provides regression (including coefficient estimates, standard errors, t-statistics, and F-statistics), descriptive statistics, and histogram tools sufficient for many financial analysis tasks.

Key Excel functions for quantitative finance:

  • AVERAGE, STDEV.S: Mean and sample standard deviation
  • COVARIANCE.S, CORREL: Sample covariance and correlation
  • SLOPE, INTERCEPT: Simple regression beta and alpha
  • RSQ: R-squared
  • NORM.S.DIST, NORM.S.INV: Standard normal CDF and inverse
  • T.TEST, F.TEST: Hypothesis tests
  • LINEST: Multiple regression (array function)

For more complex analysis (rolling windows, autocorrelation functions, cross-validation), Python (pandas, statsmodels, scikit-learn) or R offer superior capabilities.


Summary

AFM 323 provides the statistical and econometric foundation for empirical financial analysis. The course begins with the distributional properties of financial returns — their fat tails, skewness, and volatility clustering — before developing the full toolkit of simple and multiple regression. These tools are applied to the central models of asset pricing: CAPM, Fama-French three-factor, and extensions. The course emphasizes not just estimation but inference, diagnostics, and the interpretation of results in economic terms. The final section introduces machine learning as a flexible extension of regression, with particular attention to the bias-variance tradeoff and the discipline of out-of-sample validation. Throughout, Excel is the primary computational tool, with emphasis on building a practical research workflow from raw financial data to interpretable, defensible quantitative conclusions.
