AFM 112: Analytic Methods for Business 1
Laila Rohani
Estimated study time: 1 hr 38 min
Sources and References
- Primary reference — Balka, J. (open access). *Statistics: A First Course* (various editions, freely available online). This text is the backbone of the quantitative methods material.
- Supplementary — Wackerly, D., Mendenhall, W., & Scheaffer, R. (2008). *Mathematical Statistics with Applications* (7th ed.). Brooks/Cole.
- Supplementary — DeVore, J. L. (2016). *Probability and Statistics for Engineering and the Sciences* (9th ed.). Cengage Learning.
Chapter 1: Types of Data and Measurement Scales
1.1 Why Data Classification Matters
Before applying any statistical technique, the analyst must understand the nature of the data at hand. The appropriate summary statistics, charts, and inferential procedures all depend on whether a variable is categorical, ordinal, or numerical. Using a histogram for a nominal variable or computing a mean for an ordinal scale both produce misleading results. For AFM students, misclassifying a variable is a routine source of error that shows up in both spreadsheet work and programmatic analysis.
1.2 Scales of Measurement
Nominal Scale
Nominal variables classify observations into named categories with no inherent order. Arithmetic operations on nominal codes are meaningless.
Examples in accounting and finance:
- Industry sector (Technology, Financial Services, Energy, Healthcare)
- Type of audit opinion (Unqualified, Qualified, Adverse, Disclaimer)
- Payment method (Cash, Credit card, Wire transfer, Cheque)
- Province of incorporation
Because nominal categories carry no numerical meaning, the only meaningful summary statistics are counts and proportions. The appropriate chart is a bar chart or pie chart (with the caveat that pie charts become difficult to read with more than four or five categories).
Ordinal Scale
Ordinal variables have categories that can be ranked, but the gaps between adjacent ranks are not necessarily equal. A customer satisfaction score of 4 is better than 3, but the difference between 4 and 3 need not equal the difference between 3 and 2.
Examples in accounting and finance:
- Bond credit rating (AAA, AA, A, BBB, BB, B, CCC, D)
- Audit risk level (Low, Medium, High, Very High)
- Employee performance tier (Below expectations, Meets expectations, Exceeds expectations, Outstanding)
- Education level of survey respondent (High school, Bachelor’s, Master’s, Doctoral)
The median and mode are meaningful for ordinal data, but the mean is technically not, because computing a mean requires that differences between values be comparable. In practice, means are often computed on Likert-scale responses (1–5), but this is a modelling assumption, not a mathematical fact.
Interval Scale
Interval scales have equal spacing between values but lack a true zero. The zero on an interval scale is arbitrary, so ratios are not interpretable.
Classic example: Temperature in Celsius. 20°C is not “twice as warm” as 10°C, because 0°C does not represent the absence of temperature.
Finance-adjacent example: Calendar dates. The difference between January 1 and February 1 is 31 days (interval is meaningful), but a ratio of dates is not.
Ratio Scale
Ratio scales have equal spacing and a meaningful zero that represents the complete absence of the quantity. All arithmetic operations are valid.
Examples in accounting and finance:
- Account balance (a balance of $0 means no money; a balance of $200,000 is twice a balance of $100,000)
- Revenue, cost, gross profit
- Number of transactions
- Days payable outstanding
- Interest rate (%)
The majority of quantitative financial variables are on a ratio scale. All standard descriptive statistics (mean, variance, standard deviation, coefficient of variation) and inferential procedures apply.
1.3 Cross-Sectional vs. Time-Series Data
The type of data structure governs the appropriate model. Time-series data requires techniques that account for autocorrelation (the tendency for observations close in time to be correlated). Cross-sectional data requires accounting for heterogeneity across units.
1.4 Population vs. Sample
A population is the complete collection of units about which conclusions are to be drawn; a sample is the subset of units actually observed. Numerical characteristics of the population (such as \( \mu \) and \( \sigma \)) are parameters; the corresponding quantities computed from a sample (such as \( \bar{x} \) and \( s \)) are statistics, which serve as estimates of the parameters.
Chapter 2: Frequency Distributions and Histograms
2.1 Organizing Raw Data
Raw data in its unordered form is difficult to interpret. A frequency distribution reorganizes the data to reveal patterns in the distribution.
Categorical Frequency Table
| Opinion Type | Frequency | Relative Frequency | Percentage |
|---|---|---|---|
| Unqualified | 94 | 0.783 | 78.3% |
| Qualified | 18 | 0.150 | 15.0% |
| Adverse | 5 | 0.042 | 4.2% |
| Disclaimer | 3 | 0.025 | 2.5% |
| Total | 120 | 1.000 | 100.0% |
The relative frequency for each category is computed as: frequency / total. The most common opinion is unqualified (78.3%), which is typical for a firm with a rigorous client-acceptance process.
2.2 Frequency Distributions for Numerical Data
For numerical data, observations are grouped into class intervals (also called bins or classes). The choice of the number of classes affects how much detail the distribution reveals.
Sturges’ rule suggests the number of classes \( k \) for a dataset of size \( n \):
\[ k \approx 1 + 3.322 \log_{10}(n) \]
For \( n = 100 \): \( k \approx 1 + 3.322 \times 2 = 7.6 \), so use 7 or 8 classes. For \( n = 1000 \): \( k \approx 1 + 3.322 \times 3 = 10.97 \), so use 10 or 11 classes.
Class width is then approximately:
\[ \text{Width} \approx \frac{\text{Maximum} - \text{Minimum}}{k} \]
Round up to a convenient number so the classes cover the full range.
Example: processing times (in days) for \( n = 80 \) insurance claims, with a minimum of 3 days and a maximum of 62 days. Using Sturges’ rule: \( k \approx 1 + 3.322 \log_{10}(80) \approx 7.3 \), so 7 classes.
Class width \( \approx (62 - 3)/7 = 8.4 \), rounded up to 9. Starting at 3:
| Class Interval | Midpoint | Frequency | Relative Frequency |
|---|---|---|---|
| 3 – 11 | 7 | 12 | 0.150 |
| 12 – 20 | 16 | 22 | 0.275 |
| 21 – 29 | 25 | 19 | 0.238 |
| 30 – 38 | 34 | 13 | 0.163 |
| 39 – 47 | 43 | 8 | 0.100 |
| 48 – 56 | 52 | 4 | 0.050 |
| 57 – 65 | 61 | 2 | 0.025 |
| Total |  | 80 | 1.001 |
(Rounding to three decimal places causes the relative frequencies to sum to 1.001 rather than exactly 1.000.)
The distribution is right-skewed: most claims are processed within 20 days, but a small number take over 50 days.
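As a quick illustration of Sturges’ rule in code, the sketch below builds a frequency distribution with NumPy. The claim data here are synthetic stand-ins (the original 80 observations are not reproduced in these notes), so the resulting table will differ from the one above.

```python
import numpy as np

# Synthetic stand-in for the 80 claim-processing times (days);
# the original dataset is not reproduced in these notes.
rng = np.random.default_rng(1)
claims = rng.gamma(shape=2.0, scale=9.0, size=80).round()

n = len(claims)
k = round(1 + 3.322 * np.log10(n))   # Sturges' rule: 7 classes for n = 80

# Frequencies and class edges for k equal-width classes.
freq, edges = np.histogram(claims, bins=k)
for lo, hi, f in zip(edges[:-1], edges[1:], freq):
    print(f"{lo:5.1f} - {hi:5.1f}: {f:3d}  ({f / n:.3f})")
```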
2.3 Cumulative Frequency Distributions
A cumulative frequency column records the number of observations at or below the upper bound of each class. A cumulative relative frequency records this as a proportion.
Extending Example 2.2:
| Class (Upper Bound) | Frequency | Cumulative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| ≤ 11 | 12 | 12 | 0.150 |
| ≤ 20 | 22 | 34 | 0.425 |
| ≤ 29 | 19 | 53 | 0.663 |
| ≤ 38 | 13 | 66 | 0.825 |
| ≤ 47 | 8 | 74 | 0.925 |
| ≤ 56 | 4 | 78 | 0.975 |
| ≤ 65 | 2 | 80 | 1.000 |
From this table we can immediately answer questions such as: “What proportion of claims are processed within 29 days?” Answer: 66.3%.
2.4 Histograms
A histogram is the graphical counterpart of a numerical frequency distribution. The horizontal axis represents the measurement scale and the vertical axis represents frequency or relative frequency. The bars are contiguous (touching), reflecting that the underlying variable is continuous. When reading a histogram, examine:
- Shape: Is the distribution symmetric, right-skewed (long right tail), or left-skewed (long left tail)?
- Centre: Around what value is the mass of the distribution located?
- Spread: How wide is the distribution?
- Modality: Does the histogram show one peak (unimodal), two peaks (bimodal), or more?
- Outliers: Are there isolated bars far from the main body?
Right-skewed (positively skewed) distributions are common in financial data: income, transaction values, insurance claim amounts, and waiting times all tend to have long right tails because values cannot be negative but can be arbitrarily large.
Left-skewed (negatively skewed) distributions occur when values cluster near a natural maximum—for example, exam scores that are truncated at 100.
Chapter 3: Measures of Central Tendency
3.1 The Arithmetic Mean
The arithmetic mean is the most widely used measure of centre. For a sample of \( n \) observations \( x_1, x_2, \ldots, x_n \):
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
For a population of \( N \) observations, the population mean is denoted \( \mu \):
\[ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i \]
Properties of the Mean
- Uniqueness: Every dataset has exactly one mean.
- Uses all observations: Every data value contributes to the mean.
- Algebraic tractability: The mean is the centre of gravity of the distribution — the sum of deviations from the mean is exactly zero: \( \sum_{i=1}^{n}(x_i - \bar{x}) = 0 \).
- Sensitivity to outliers: A single extreme value can pull the mean far from the bulk of the data.
Example: seven customers hold balances totalling $49,800 (49.8 in $ thousands), so the average account balance is \( 49.8/7 \approx 7.114 \), or $7,114. Now suppose a wealthy customer joins with a balance of $210,000. The new mean becomes \( (49.8 + 210)/8 = 259.8/8 = 32.475 \), or $32,475—a value larger than seven of the eight customers’ balances. This illustrates the mean’s sensitivity to outliers.
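A minimal sketch of this calculation with Python’s built-in statistics module; the seven individual balances are hypothetical, chosen only to total $49,800 as in the example:

```python
from statistics import mean, median

# Hypothetical balances (in $ thousands) that total 49.8, as in the example.
balances = [5.2, 6.0, 6.5, 7.0, 7.5, 8.0, 9.6]
print(round(mean(balances), 3))    # 7.114  -> about $7,114

balances.append(210.0)             # the wealthy customer joins
print(mean(balances))              # 32.475 -> $32,475
print(median(balances))            # 7.25   -> the median barely moves
```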
Weighted Mean
When observations carry different weights (e.g., different numbers of units, different portfolio allocations), the weighted mean is appropriate:
\[ \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
Example: a portfolio with the following allocations and annual returns:
| Asset | Weight | Annual Return |
|---|---|---|
| Canadian equities | 0.50 | 8.2% |
| US equities | 0.30 | 11.4% |
| Fixed income | 0.20 | 3.6% |
The portfolio’s expected return is \( 0.50 \times 8.2 + 0.30 \times 11.4 + 0.20 \times 3.6 = 8.24\% \) per annum.
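The same computation with NumPy’s weighted average (a quick check; the numbers come from the table above):

```python
import numpy as np

weights = np.array([0.50, 0.30, 0.20])   # portfolio allocations (sum to 1)
returns = np.array([8.2, 11.4, 3.6])     # annual returns, in %

# np.average implements the weighted-mean formula directly.
print(np.average(returns, weights=weights))   # 8.24
```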
3.2 The Median
The median is the value that divides the sorted dataset into two equal halves. It is the 50th percentile.
Procedure:
- Sort the observations from smallest to largest.
- If \( n \) is odd, the median is the middle value at position \( (n+1)/2 \).
- If \( n \) is even, the median is the average of the two middle values at positions \( n/2 \) and \( n/2 + 1 \).
Example: nine departmental salaries (in $ thousands), already sorted: 52, 58, 61, 65, 68, 72, 80, 95, 145. With \( n = 9 \), the median is at position \( (9+1)/2 = 5 \).
Median = 68 (the 5th value) = $68,000.
The mean: \( \bar{x} = (52+58+61+65+68+72+80+95+145)/9 = 696/9 = 77.3 \) thousand dollars.
The senior partner’s salary of $145,000 pulls the mean to $77,300, well above six of the nine employees’ salaries. The median of $68,000 better represents a “typical” salary in this department.
3.3 The Mode
The mode is the most frequently occurring value (or values) in the dataset. A dataset can be unimodal (one mode), bimodal (two modes), multimodal (more than two), or have no mode if all values are distinct.
The mode is especially useful for nominal variables, where the mean and median are not defined. “The modal payment method is credit card” is a meaningful statement; “the mean payment method is 2.3” is not.
3.4 Relationship Among Mean, Median, and Mode
For a unimodal distribution that is roughly symmetric, mean ≈ median ≈ mode.
For a right-skewed distribution: mode < median < mean. The long right tail pulls the mean upward.
For a left-skewed distribution: mean < median < mode. The long left tail pulls the mean downward.
This relationship is a useful quick check: if you observe \(\bar{x} > \text{Median}\), the distribution is likely right-skewed, which motivates checking for outliers or using a log transformation.
Chapter 4: Measures of Spread
4.1 The Range
\[ \text{Range} = x_{\max} - x_{\min} \]
The range is the simplest measure of spread and is trivially easy to compute. Its weakness is that it depends on only two observations and is highly sensitive to outliers.
Fund A: 0.5, 0.8, 0.6, 0.7, 0.9 — Range = 0.9 − 0.5 = 0.4%
Fund B: −1.2, 0.0, 0.7, 0.5, 2.0 — Range = 2.0 − (−1.2) = 3.2%
Fund B is considerably more volatile. The range captures the difference but provides no information about how observations are distributed within that interval.
4.2 Variance and Standard Deviation
The sample variance measures the average squared deviation of each observation from the sample mean:
\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} \]
The denominator is \( n - 1 \) (degrees of freedom) rather than \( n \) to make \( s^2 \) an unbiased estimator of the population variance \( \sigma^2 \). This is called Bessel’s correction.
The sample standard deviation is the square root of the sample variance:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} \]
The standard deviation is expressed in the same units as the original data, making it interpretable as a “typical distance from the mean.”
The population variance and population standard deviation use \( N \) in the denominator:
\[ \sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2} \]
Example: quarterly revenue (in $ millions) over five quarters: 12, 15, 11, 18, 14.
Step 1 — Compute the mean:
\[ \bar{x} = \frac{12 + 15 + 11 + 18 + 14}{5} = \frac{70}{5} = 14 \]
Step 2 — Compute deviations and squared deviations:
| \( x_i \) | \( x_i - \bar{x} \) | \( (x_i - \bar{x})^2 \) |
|---|---|---|
| 12 | −2 | 4 |
| 15 | +1 | 1 |
| 11 | −3 | 9 |
| 18 | +4 | 16 |
| 14 | 0 | 0 |
| Sum | 0 | 30 |
Step 3 — Compute variance and standard deviation:
\[ s^2 = \frac{30}{5 - 1} = \frac{30}{4} = 7.5 \text{ (millions}^2\text{)} \]
\[ s = \sqrt{7.5} \approx 2.739 \text{ million dollars} \]
Revenue in a typical quarter deviates from the mean of $14M by about $2.74M.
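The same result via Python’s statistics module, which uses the \( n - 1 \) denominator for its sample functions (a quick check of the hand calculation above):

```python
import statistics

revenue = [12, 15, 11, 18, 14]              # quarterly revenue, $ millions

print(statistics.variance(revenue))         # 7.5   (sample variance, n - 1)
print(round(statistics.stdev(revenue), 3))  # 2.739 (sample standard deviation)
print(statistics.pvariance(revenue))        # 6.0   (population variance, N)
```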
Computational Formula for Variance
When computing by hand with large datasets, the computational formula avoids accumulating rounding errors:
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} \]
4.3 The Interquartile Range (IQR)
Percentiles and quartiles divide the sorted data into equal-sized groups:
- \( Q_1 \) (first quartile, 25th percentile): 25% of observations fall below this value.
- \( Q_2 \) (second quartile, 50th percentile): the median.
- \( Q_3 \) (third quartile, 75th percentile): 75% of observations fall below this value.
The interquartile range is:
\[ \text{IQR} = Q_3 - Q_1 \]
The IQR is the range of the middle 50% of the data. It is robust to outliers because extreme values in the tails do not affect \( Q_1 \) or \( Q_3 \).
Example: \( n = 12 \) sorted loan processing times (in days). \( Q_1 \) = average of 3rd and 4th values = (4 + 5)/2 = 4.5 days. \( Q_3 \) = average of 9th and 10th values = (12 + 15)/2 = 13.5 days.
\[ \text{IQR} = 13.5 - 4.5 = 9 \text{ days} \]
Note that the outlier of 35 days has no impact on the IQR, though it would substantially inflate the standard deviation.
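A sketch of the quartile calculation with NumPy. The twelve processing times below are hypothetical apart from the values the example pins down (minimum 2, quartile neighbours 4/5 and 12/15, middle pair 7/8, largest non-outlier 18, maximum 35). The `"averaged_inverted_cdf"` method (NumPy 1.22+) averages the two relevant order statistics, matching the convention used in the text; NumPy’s default method interpolates and can give slightly different quartiles.

```python
import numpy as np

# Hypothetical data consistent with the worked example (n = 12, sorted).
times = np.array([2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 18, 35])

# "averaged_inverted_cdf" averages the two order statistics at each quartile,
# matching the split-halves convention used in the text.
q1, q3 = np.quantile(times, [0.25, 0.75], method="averaged_inverted_cdf")
print(q1, q3, q3 - q1)    # 4.5 13.5 9.0
```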
4.4 The Coefficient of Variation
The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean, enabling meaningful comparison of variability across datasets with different scales or units:
\[ \text{CV} = \frac{s}{\bar{x}} \times 100\% \]
Example: three funds with different return profiles:
| Fund | Mean Annual Return | Standard Deviation | CV |
|---|---|---|---|
| Fund A (Large Cap) | 9.5% | 3.2% | 33.7% |
| Fund B (Small Cap) | 14.1% | 8.6% | 61.0% |
| Fund C (Bond) | 4.2% | 1.1% | 26.2% |
Fund C has the lowest CV (least variable relative to its return), while Fund B has the highest CV (most variable relative to its return). An investor seeking return per unit of risk would favour Fund C on a relative-volatility basis and Fund A as an equity option.
Note: CV is undefined (or meaningless) when the mean is zero or negative, which is why it is used carefully with return data that can include negative years.
Chapter 5: Skewness, Kurtosis, and Box Plots
5.1 Skewness
Skewness quantifies the asymmetry of a distribution. The most common formula for the sample skewness coefficient is:
\[ g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{s^3} \]
- \( g_1 = 0 \): Perfectly symmetric distribution.
- \( g_1 > 0 \): Right-skewed (positive skew); the right tail is longer.
- \( g_1 < 0 \): Left-skewed (negative skew); the left tail is longer.
As a practical guideline:
- \( |g_1| < 0.5 \): approximately symmetric.
- \( 0.5 \leq |g_1| < 1 \): moderate skewness.
- \( |g_1| \geq 1 \): substantial skewness.
5.2 Kurtosis
Kurtosis measures the heaviness of the tails relative to a normal distribution. Excess kurtosis (kurtosis minus 3) is more commonly reported:
\[ \text{Excess kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{s^4} - 3 \]
- Excess kurtosis = 0: tails like a normal distribution (mesokurtic).
- Excess kurtosis > 0: heavier tails than normal (leptokurtic); more extreme observations than a normal distribution would predict.
- Excess kurtosis < 0: lighter tails (platykurtic).
Financial returns are well-documented to exhibit leptokurtosis (excess kurtosis > 0), meaning extreme gains and losses occur more often than a normal distribution would suggest. This is called the “fat tails” property and is critical for risk management—Value at Risk models that assume normality systematically underestimate tail risk.
5.3 Five-Number Summary
The five-number summary compresses a distribution into five key values:
\[ \{ x_{\min},\; Q_1,\; Q_2 \text{ (Median)},\; Q_3,\; x_{\max} \} \]
This provides a compact picture of a distribution’s location, spread, and tail behaviour.
Five-number summary: \(\{2,\; 4.5,\; 7.5,\; 13.5,\; 35\}\)
Median = (7 + 8)/2 = 7.5 days. The large gap between \( Q_3 = 13.5 \) and the maximum of 35 reveals the right-skewed tail.
5.4 Box Plots (Box-and-Whisker Plots)
A box plot graphically displays the five-number summary with additional identification of outliers.
Construction:
- Draw a box from \( Q_1 \) to \( Q_3 \). The width of the box equals the IQR.
- Draw a line inside the box at the median \( Q_2 \).
- Compute fences (outlier thresholds):
- Lower fence: \( Q_1 - 1.5 \times \text{IQR} \)
- Upper fence: \( Q_3 + 1.5 \times \text{IQR} \)
- Draw whiskers extending from the box to the most extreme data points that are still within the fences.
- Plot any observations beyond the fences as individual points — these are potential outliers.
\( Q_1 = 4.5 \), \( Q_3 = 13.5 \), \( \text{IQR} = 9 \).
Lower fence: \( 4.5 - 1.5(9) = 4.5 - 13.5 = -9 \) days (not meaningful here; no negative times). Upper fence: \( 13.5 + 1.5(9) = 13.5 + 13.5 = 27 \) days.
The value of 35 days exceeds the upper fence of 27 days and is plotted as an outlier. The upper whisker extends to 18 (the largest non-outlier value). The lower whisker extends to 2 (the minimum, which is within the lower fence).
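The fence rule is easy to script. A minimal sketch using the quartiles above and the same hypothetical twelve observations from the IQR example:

```python
q1, q3 = 4.5, 13.5
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr      # -9 (no processing time can be negative)
upper_fence = q3 + 1.5 * iqr      # 27

times = [2, 3, 4, 5, 6, 7, 8, 10, 12, 15, 18, 35]   # hypothetical sample
outliers = [t for t in times if t < lower_fence or t > upper_fence]
whisker_high = max(t for t in times if t <= upper_fence)

print(outliers)        # [35]
print(whisker_high)    # 18 -- the upper whisker ends here
```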
Box plots are particularly valuable for comparing distributions across groups. A side-by-side box plot showing loan processing times by branch can immediately reveal whether one branch is systematically slower than others.
Chapter 6: Data Visualization
6.1 Principles of Effective Charts
Edward Tufte’s foundational principle of visualization is the data-ink ratio: the proportion of a chart’s ink that is devoted to displaying actual data. Maximizing the data-ink ratio — eliminating chart junk such as unnecessary gridlines, 3D effects, shadows, and decorative images — produces cleaner, more readable charts.
A second key principle is encode quantitative comparisons using position on a common scale wherever possible. Position is the most accurately perceived visual attribute. Length, area, and angle are progressively less accurate encodings — which is why bar charts (position/length) are easier to read than bubble charts (area) or pie charts (angle).
6.2 Chart Types and When to Use Them
Bar Charts (Categorical Data)
A bar chart displays the frequency, relative frequency, or mean of a categorical variable. Bars are separated by gaps (unlike a histogram), emphasizing the discrete nature of the categories.
- Use a vertical bar chart when category names are short and the number of categories is small (≤ 6).
- Use a horizontal bar chart when category labels are long or the number of categories is larger (makes labels readable).
| Product | Revenue ($M) |
|---|---|
| Retail Banking | 420 |
| Commercial Lending | 310 |
| Wealth Management | 185 |
| Insurance | 97 |
| Other | 42 |
A horizontal bar chart with bars sorted from longest to shortest (Retail Banking at top, Other at bottom) makes it immediately clear that Retail Banking contributes more than 40% of total revenue.
Histograms (Continuous Data)
Use a histogram to show the distribution (shape, centre, spread, outliers) of a continuous variable. Key choices:
- Number of bins: Too few bins over-smooth and hide structure; too many bins create noise. Sturges’ rule provides a starting point.
- Axis labels: Both axes must be labelled. The y-axis can show count, relative frequency, or density.
Scatter Plots (Relationship Between Two Variables)
A scatter plot places one variable on the x-axis and another on the y-axis, with one point per observation. Use scatter plots to:
- Detect linear or nonlinear relationships between two numerical variables.
- Identify outliers in the joint distribution.
- Assess whether a linear model is appropriate before running a regression.
Time Series Plots (Data Over Time)
A time series plot (line chart) connects successive observations in time order. It is the appropriate chart for tracking a variable over time. Key features to look for:
- Trend: A long-run upward or downward movement.
- Seasonality: A regular, repeating pattern within each year (e.g., retail sales peak in December).
- Cyclical variation: Longer-run fluctuations tied to business cycles.
- Irregular variation: Random, unpredictable noise.
| Data Situation | Recommended Chart |
|---|---|
| One categorical variable — frequencies | Bar chart |
| One continuous variable — distribution | Histogram or box plot |
| Two continuous variables — relationship | Scatter plot |
| One variable over time | Time series (line) plot |
| Comparing distributions across groups | Side-by-side box plots |
| Part-to-whole (≤ 5 categories) | Stacked bar or pie chart |
| Geographic variation | Choropleth map |
6.3 Common Visualization Errors to Avoid
- Truncated y-axis: A bar chart starting at a non-zero y-value exaggerates differences. If the axis starts at 95, bars for values of 96 and 100 differ fivefold in height even though the underlying values differ by only about 4%.
- Dual y-axes: Two scales on the same chart can create spurious visual correlations that do not exist in the data.
- 3D charts: Three-dimensional bar charts distort perceived heights due to perspective effects.
- Pie charts with too many slices: More than five slices become impossible to read; use a bar chart instead.
- Overplotting: In a scatter plot with thousands of points, individual points overlap and the underlying pattern is obscured. Solutions include transparency (alpha blending), jitter, or hexagonal bin plots.
Chapter 7: Probability Fundamentals
7.1 Definitions and Notation
A random experiment is a process whose outcome cannot be predicted with certainty. The sample space \( S \) is the set of all possible outcomes; an event is a subset of \( S \).
Set operations on events:
- Union \( A \cup B \): “Either A or B (or both) occurs.”
- Intersection \( A \cap B \): “Both A and B occur.”
- Complement \( A^c \): “A does not occur.”
- Mutually exclusive: \( A \cap B = \emptyset \) — A and B cannot both occur.
- Exhaustive: \( A_1 \cup A_2 \cup \cdots \cup A_k = S \) — at least one of the events must occur.
7.2 Probability Axioms
Probability is a function that assigns a number to each event. Three axioms, due to Kolmogorov, define a valid probability measure:
- \( P(A) \geq 0 \) for any event \( A \).
- \( P(S) = 1 \) (the probability of some outcome occurring is 1).
- If \( A \) and \( B \) are mutually exclusive, \( P(A \cup B) = P(A) + P(B) \).
From these axioms, all other probability rules can be derived.
Derived rules:
- \( P(A^c) = 1 - P(A) \) — complement rule.
- \( P(\emptyset) = 0 \) — the empty event has probability zero.
- \( 0 \leq P(A) \leq 1 \) for all events.
7.3 General Addition Rule
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
When \( A \) and \( B \) are mutually exclusive, \( P(A \cap B) = 0 \) and the rule reduces to the additive axiom.
Example: suppose 4% of components have a dimensional defect, 3% have a surface defect, and 1% have both. What is the probability that a randomly selected component has at least one defect?
\[ P(\text{Dimensional} \cup \text{Surface}) = 0.04 + 0.03 - 0.01 = 0.06 \]
Six percent of components have at least one defect.
7.4 Conditional Probability
The conditional probability of event \( A \) given that event \( B \) has occurred is:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 \]
Conditioning reduces the sample space to only those outcomes in \( B \) and asks what fraction of those also belong to \( A \).
Example: an audit test is applied to 200 accounts, of which 30 contain a material error. The test flags 24 of the 30 erroneous accounts and 17 of the 170 error-free accounts. Construct the contingency table:
| | Error Present | Error Absent | Total |
|---|---|---|---|
| Test Positive | 24 | 17 | 41 |
| Test Negative | 6 | 153 | 159 |
| Total | 30 | 170 | 200 |
\[ P(\text{Error} \mid \text{Positive}) = \frac{24}{41} \approx 0.585 \]
Given a positive test result, there is a 58.5% probability that the account actually contains a material error. This is the positive predictive value of the test.
7.5 Independence
Events \( A \) and \( B \) are independent if the occurrence of one does not change the probability of the other:
\[ P(A \mid B) = P(A) \iff A \text{ and } B \text{ are independent} \]
Equivalently:
\[ A \text{ and } B \text{ are independent} \iff P(A \cap B) = P(A) \cdot P(B) \]
Example: suppose a primary control fails with probability 0.02 and an independent backup control fails with probability 0.05. The probability that both fail is \( 0.02 \times 0.05 = 0.001 \). The independent backup control reduces the failure probability from 2% to 0.1%.
7.6 The Law of Total Probability
If \( B_1, B_2, \ldots, B_k \) are mutually exclusive and exhaustive events (a partition of \( S \)), then for any event \( A \):
\[ P(A) = \sum_{i=1}^{k} P(A \mid B_i) \cdot P(B_i) \]
The overall default rate is 5.9%.
7.7 Bayes’ Theorem
Bayes’ theorem combines the law of total probability with conditional probability to update a probability in light of new information:
\[ P(B_i \mid A) = \frac{P(A \mid B_i) \cdot P(B_i)}{\sum_{j=1}^{k} P(A \mid B_j) \cdot P(B_j)} \]
The terms have specific names in Bayesian reasoning:
- \( P(B_i) \): Prior probability — our belief about \( B_i \) before observing \( A \).
- \( P(A \mid B_i) \): Likelihood — how probable is \( A \) if \( B_i \) is true?
- \( P(B_i \mid A) \): Posterior probability — our updated belief after observing \( A \).
Example: a fraud-detection system flags 90% of fraudulent transactions but also flags 5% of legitimate ones, and 2% of all transactions are fraudulent. A transaction is flagged. What is the probability it is actually fraudulent?
Let \( F \) = transaction is fraudulent, \( F^c \) = legitimate. Let \( + \) = flagged by system.
Given: \( P(F) = 0.02 \), \( P(F^c) = 0.98 \), \( P(+ \mid F) = 0.90 \), \( P(+ \mid F^c) = 0.05 \).
\[ P(+) = P(+ \mid F) P(F) + P(+ \mid F^c) P(F^c) = (0.90)(0.02) + (0.05)(0.98) = 0.018 + 0.049 = 0.067 \]
\[ P(F \mid +) = \frac{P(+ \mid F) \cdot P(F)}{P(+)} = \frac{(0.90)(0.02)}{0.067} = \frac{0.018}{0.067} \approx 0.269 \]
Despite a 90% sensitivity, only 26.9% of flagged transactions are actually fraudulent. This low positive predictive value is driven by the low base rate (2%) of actual fraud. The system generates many false positives that require manual review.
This example illustrates why base rates matter so much in risk scoring and why high sensitivity alone does not guarantee a useful screening tool.
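A minimal sketch of the fraud example in Python; all numbers come from the example above:

```python
p_fraud = 0.02              # prior: base rate of fraud
p_flag_if_fraud = 0.90      # sensitivity, P(+ | F)
p_flag_if_legit = 0.05      # false-positive rate, P(+ | F^c)

# Law of total probability: overall flag rate P(+).
p_flag = p_flag_if_fraud * p_fraud + p_flag_if_legit * (1 - p_fraud)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = p_flag_if_fraud * p_fraud / p_flag
print(round(p_flag, 3), round(posterior, 3))    # 0.067 0.269
```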
Chapter 8: Discrete Probability Distributions
8.1 Random Variables
Probability Mass Function (PMF)
For a discrete random variable \( X \), the probability mass function \( p(x) \) gives the probability that \( X \) equals each possible value:
\[ p(x) = P(X = x), \quad \text{for each } x \text{ in the support of } X \]
Valid PMFs satisfy: \( p(x) \geq 0 \) for all \( x \), and \( \sum_{\text{all } x} p(x) = 1 \).
Expected Value and Variance of a Discrete Random Variable
\[ E(X) = \mu = \sum_{\text{all }x} x \cdot p(x) \]
\[ \text{Var}(X) = \sigma^2 = \sum_{\text{all }x} (x - \mu)^2 \cdot p(x) = E(X^2) - [E(X)]^2 \]
\[ \text{SD}(X) = \sigma = \sqrt{\sigma^2} \]
8.2 Bernoulli Distribution
The simplest discrete distribution models a single trial with two outcomes: success (1) or failure (0). If \( X \sim \text{Bernoulli}(p) \), then \( P(X = 1) = p \), \( P(X = 0) = 1 - p \), \( E(X) = p \), and \( \text{Var}(X) = p(1-p) \).
The Bernoulli distribution underlies the Binomial distribution and is the building block for binary outcome modelling in credit scoring, insurance underwriting, and quality control.
8.3 Binomial Distribution
The Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials.
If \( X \sim \text{Binomial}(n, p) \), the PMF is
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n \]
where \( \binom{n}{k} = \frac{n!}{k!(n-k)!} \) is the binomial coefficient (number of ways to choose \( k \) successes from \( n \) trials).
\[ E(X) = np, \quad \text{Var}(X) = np(1-p), \quad \text{SD}(X) = \sqrt{np(1-p)} \]
Example: an auditor samples 20 components from a batch with a 3% defective rate. Here \( X \sim \text{Binomial}(n = 20, p = 0.03) \).
\[ E(X) = 20 \times 0.03 = 0.6 \text{ defective components (on average)} \]
\[ \text{Var}(X) = 20 \times 0.03 \times 0.97 = 0.582 \]
\[ \text{SD}(X) = \sqrt{0.582} \approx 0.763 \]
Probability of finding zero defectives:
\[ P(X = 0) = \binom{20}{0}(0.03)^0(0.97)^{20} = (0.97)^{20} \approx 0.5438 \]
There is a 54.4% probability of finding no defectives in the sample of 20, even if the true defective rate is 3%. This illustrates the challenge of quality control with small samples.
Probability of finding at least 2 defectives:
\[ P(X \geq 2) = 1 - P(X = 0) - P(X = 1) \]
\[ P(X = 1) = \binom{20}{1}(0.03)^1(0.97)^{19} = 20 \times 0.03 \times (0.97)^{19} \approx 20 \times 0.03 \times 0.5604 \approx 0.3362 \]
\[ P(X \geq 2) = 1 - 0.5438 - 0.3362 = 0.1200 \]
There is a 12.0% probability of finding two or more defectives.
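These Binomial probabilities can be checked with scipy.stats (a sketch, assuming SciPy is available):

```python
from scipy.stats import binom

n, p = 20, 0.03
print(binom.pmf(0, n, p))                  # 0.5438  P(X = 0)
print(binom.pmf(1, n, p))                  # 0.3364  P(X = 1)
print(binom.sf(1, n, p))                   # 0.1198  P(X >= 2) = 1 - P(X <= 1)
print(binom.mean(n, p), binom.std(n, p))   # 0.6  0.7629
```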
Normal Approximation to the Binomial
When \( n \) is large and \( p \) is not too close to 0 or 1, the Binomial distribution is approximately Normal:
\[ X \approx N(np,\; np(1-p)) \]
The rule of thumb for this approximation: use it when \( np \geq 10 \) and \( n(1-p) \geq 10 \).
When applying the normal approximation, the continuity correction improves accuracy by treating the discrete value \( k \) as the continuous interval \( [k - 0.5, k + 0.5] \):
\[ P(X \leq k) \approx P\!\left(Z \leq \frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right) \]
8.4 Poisson Distribution
The Poisson distribution models the number of events occurring in a fixed interval of time or space, when events occur independently at a constant average rate.
If \( X \sim \text{Poisson}(\lambda) \), the PMF is
\[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \]
A distinctive property: the mean and variance are both equal to \( \lambda \).
Example: a call centre receives an average of \( \lambda = 8 \) calls per hour. Probability of receiving exactly 10 calls in the next hour:
\[ P(X = 10) = \frac{e^{-8} \cdot 8^{10}}{10!} = \frac{0.000335 \times 1{,}073{,}741{,}824}{3{,}628{,}800} \approx 0.0993 \]
There is approximately a 9.93% probability of receiving exactly 10 calls.
Probability of receiving fewer than 5 calls:
\[ P(X < 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) \]
\[ = e^{-8}\!\left(\frac{8^0}{0!} + \frac{8^1}{1!} + \frac{8^2}{2!} + \frac{8^3}{3!} + \frac{8^4}{4!}\right) = e^{-8}(1 + 8 + 32 + 85.33 + 170.67) \]
\[ \approx 0.000335 \times 297 \approx 0.0995 \]
There is about a 9.95% probability of fewer than 5 calls in an hour. Management can use this to staff the call centre to keep wait times acceptable at various confidence levels.
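A sketch checking these Poisson probabilities with scipy.stats, plus a staffing-style quantile query:

```python
from scipy.stats import poisson

lam = 8                          # average calls per hour
print(poisson.pmf(10, lam))      # 0.0993  P(X = 10)
print(poisson.cdf(4, lam))       # 0.0996  P(X < 5) = P(X <= 4)

# Capacity planning: the smallest k with P(X <= k) >= 0.95.
print(poisson.ppf(0.95, lam))    # 13.0 -> plan for up to 13 calls per hour
```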
Chapter 9: Continuous Probability Distributions
9.1 Probability Density Functions
For a continuous random variable \( X \), probabilities are computed as areas under the probability density function (PDF) \( f(x) \):
\[ P(a \leq X \leq b) = \int_a^b f(x)\, dx \]
Properties of a valid PDF: \( f(x) \geq 0 \) for all \( x \), and \( \int_{-\infty}^{\infty} f(x)\, dx = 1 \).
An important consequence: for a continuous random variable, \( P(X = c) = 0 \) for any specific value \( c \), because the area under a single point is zero. Thus \( P(X \leq c) = P(X < c) \).
9.2 Uniform Distribution
If \( X \sim \text{Uniform}(a, b) \), the density is constant, \( f(x) = \frac{1}{b-a} \) for \( a \leq x \leq b \), so probabilities are proportional to interval length.
Example: suppose a processing time is equally likely to fall anywhere between 30 and 90 minutes. Probability processing takes more than 75 minutes:
\[ P(X > 75) = \frac{90 - 75}{90 - 30} = \frac{15}{60} = 0.25 \]
9.3 Exponential Distribution
The exponential distribution models the time between successive events in a Poisson process. If events occur at rate \( \lambda \) per unit time (Poisson), the time between events is exponential with parameter \( \lambda \), with mean \( E(X) = 1/\lambda \) and tail probability \( P(X > t) = e^{-\lambda t} \).
The exponential distribution is memoryless: the remaining wait time, given that you have already waited \( s \) units, has the same distribution as the original wait time. The past waiting time is irrelevant.
Example: customers arrive at an average rate of one every 5 minutes, so \( \lambda = 1/5 \) per minute. Probability a customer waits more than 8 minutes for the next arrival:
\[ P(X > 8) = e^{-\lambda \cdot 8} = e^{-(1/5)(8)} = e^{-1.6} \approx 0.2019 \]
There is about a 20.2% probability of a gap longer than 8 minutes between successive arrivals.
9.4 Normal Distribution
The normal distribution is the most important continuous distribution in statistics. Many natural phenomena, measurement errors, and sample averages follow normal distributions or are well-approximated by them.
Parameters: \( \mu \) = mean (location), \( \sigma^2 \) = variance (spread). If \( X \sim N(\mu, \sigma^2) \), the density is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty \]
\[ E(X) = \mu, \quad \text{Var}(X) = \sigma^2, \quad \text{SD}(X) = \sigma \]
Key properties:
- Symmetric and bell-shaped, centred at \( \mu \).
- Increasing \( \sigma \) flattens and widens the curve; decreasing \( \sigma \) makes it taller and narrower.
- The total area under the curve is 1.
- Tails extend to \( \pm\infty \) but become negligible beyond \( \pm 3\sigma \).
The Empirical Rule (68-95-99.7 Rule)
Approximately 68% of observations fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
9.5 The Standard Normal Distribution and Z-Scores
The standard normal distribution is the special case \( Z \sim N(0, 1) \) — mean 0 and variance 1.
Standardization: Any normal random variable \( X \sim N(\mu, \sigma^2) \) can be converted to a standard normal variable by subtracting the mean and dividing by the standard deviation:
\[ Z = \frac{X - \mu}{\sigma} \]
The \( z \)-score measures how many standard deviations an observation lies above or below the mean. A \( z \)-score of \( +2.0 \) means the observation is 2 standard deviations above the mean; a \( z \)-score of \( -1.5 \) means 1.5 standard deviations below.
Computing normal probabilities:
- Standardize: convert to \( Z \).
- Look up \( \Phi(z) = P(Z \leq z) \) in a standard normal table, or use software.
- Use symmetry and complementation as needed.
Example: annual returns on a portfolio are approximately normal with \( \mu = 8\% \) and \( \sigma = 12\% \).
(a) What is the probability of a negative return in any given year?
\[ P(X < 0) = P\!\left(Z < \frac{0 - 8}{12}\right) = P(Z < -0.667) \]
From a standard normal table: \( \Phi(-0.67) \approx 0.2514 \).
There is approximately a 25.1% probability of a negative return.
(b) What is the probability of a return exceeding 20%?
\[ P(X > 20) = P\!\left(Z > \frac{20 - 8}{12}\right) = P(Z > 1.00) = 1 - \Phi(1.00) = 1 - 0.8413 = 0.1587 \]
There is approximately a 15.9% probability of a return exceeding 20%.
(c) What return is at the 95th percentile?
Find \( x \) such that \( P(X \leq x) = 0.95 \). From the standard normal table, \( \Phi(1.645) = 0.95 \), so \( z_{0.95} = 1.645 \).
\[ x = \mu + z \cdot \sigma = 8 + 1.645 \times 12 = 8 + 19.74 = 27.74\% \]
The 95th percentile annual return is approximately 27.7%.
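All three answers can be checked with scipy.stats.norm (a sketch; exact software values differ slightly from two-decimal table lookups):

```python
from scipy.stats import norm

mu, sigma = 8, 12                 # annual return parameters, in %

print(norm.cdf(0, mu, sigma))     # 0.2525  P(X < 0); the table lookup gave 0.2514
print(norm.sf(20, mu, sigma))     # 0.1587  P(X > 20)
print(norm.ppf(0.95, mu, sigma))  # 27.74   95th percentile
```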
VaR example: a bank’s daily portfolio returns are approximately normal with mean 0.05% and standard deviation 1.2%. The 1% VaR corresponds to the 1st percentile of the daily return distribution.
\( z_{0.01} = -2.326 \) (from tables: \( P(Z < -2.326) = 0.01 \)).
\[ x_{0.01} = 0.05 + (-2.326)(1.2) = 0.05 - 2.791 = -2.741\% \]
The daily VaR is 2.74%: on a portfolio worth $100 million, the bank should expect to lose more than $2.74 million on only 1% of trading days (about 2-3 days per year under the normality assumption).
9.6 Using the Standard Normal Table
Standard normal tables give \( \Phi(z) = P(Z \leq z) \) for \( z \geq 0 \). For negative values, use symmetry:
\[ P(Z \leq -z) = 1 - P(Z \leq z) = 1 - \Phi(z) \]
Key probabilities to memorize:
| \( z \) | \( \Phi(z) = P(Z \leq z) \) |
|---|---|
| 0.00 | 0.5000 |
| 1.00 | 0.8413 |
| 1.28 | 0.8997 ≈ 0.90 |
| 1.645 | 0.9500 |
| 1.96 | 0.9750 |
| 2.00 | 0.9772 |
| 2.326 | 0.9900 |
| 2.576 | 0.9950 |
| 3.00 | 0.9987 |
The values 1.28, 1.645, 1.96, and 2.576 are particularly important as they correspond to the 90th, 95th, 97.5th, and 99.5th percentiles and appear repeatedly in confidence interval construction.
Chapter 10: Sampling Distributions and the Central Limit Theorem
10.1 Why Sampling Distributions Matter
Statistical inference draws conclusions about population parameters using sample statistics. The key insight is that a sample statistic — like \( \bar{x} \) — is itself a random variable: it varies from sample to sample. The sampling distribution describes this variability.
10.2 Sampling Distribution of the Sample Mean
Suppose the population has mean \( \mu \) and variance \( \sigma^2 \) (finite). A random sample of size \( n \) is drawn. The sample mean \( \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \) has the following properties:
\[ E(\bar{X}) = \mu, \qquad \text{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \text{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]
The standard deviation of the sample mean is called the standard error (SE).
Two key results:
- The sample mean is an unbiased estimator of \( \mu \): its expected value equals the population mean.
- The standard error decreases as sample size increases: larger samples produce more precise estimates. To halve the standard error, quadruple the sample size.
10.3 The Central Limit Theorem
Central Limit Theorem: if \( X_1, \ldots, X_n \) are independent draws from a population with mean \( \mu \) and finite variance \( \sigma^2 \), then as \( n \) grows the standardized sample mean \( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \) converges in distribution to \( N(0, 1) \). In practice, \(\bar{X}\) is approximately normally distributed for sufficiently large \( n \), regardless of the shape of the population distribution.
Practical rule of thumb: The CLT approximation is generally adequate for \( n \geq 30 \) when the population distribution is not too severely skewed. For nearly symmetric populations, normality of \( \bar{X} \) kicks in even sooner (sometimes as low as \( n = 5 \) or \( n = 10 \)).
The CLT is arguably the most important theorem in applied statistics. It explains why the normal distribution appears so widely in inference: even if the underlying data are skewed, Binomial, or Poisson, the distribution of sample averages converges to a normal distribution for large samples.
Example: claim amounts have mean \( \mu = \$2{,}500 \) and standard deviation \( \sigma = \$4{,}200 \). The company receives \( n = 150 \) claims per week. What is the probability that the average weekly claim amount exceeds $3,000?
By the CLT:
\[ \bar{X} \approx N\!\left(2500,\; \frac{4200^2}{150}\right) = N(2500,\; 117{,}600) \]
\[ \text{SE} = \frac{4200}{\sqrt{150}} = \frac{4200}{12.247} \approx 343.0 \]
\[ P(\bar{X} > 3000) = P\!\left(Z > \frac{3000 - 2500}{343.0}\right) = P(Z > 1.458) = 1 - \Phi(1.458) \approx 1 - 0.9277 = 0.0723 \]
There is approximately a 7.2% probability that the average claim exceeds $3,000 in any given week.
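A simulation sketch of the CLT for this example. The population is modelled here as a gamma distribution, an assumption for illustration; the example only fixes the mean at $2,500 and the standard deviation at $4,200.

```python
import numpy as np

rng = np.random.default_rng(42)

mean, sd = 2500, 4200
shape = (mean / sd) ** 2      # gamma: mean = shape*scale, var = shape*scale^2
scale = sd ** 2 / mean

# 50,000 simulated weeks, each the average of n = 150 claims.
weekly_means = rng.gamma(shape, scale, size=(50_000, 150)).mean(axis=1)

print(weekly_means.mean())           # ~2500
print(weekly_means.std())            # ~343, the standard error
print((weekly_means > 3000).mean())  # ~0.08, near the CLT answer of 0.072;
                                     # the gap reflects the population's skewness
```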
For a population with \( \sigma = \$8{,}000 \), the standard error shrinks with \( \sqrt{n} \):
| Sample Size \( n \) | Standard Error \( \sigma/\sqrt{n} \) |
|---|---|
| 10 | $2,530 |
| 25 | $1,600 |
| 100 | $800 |
| 400 | $400 |
| 1,000 | $253 |
Quadrupling the sample size halves the standard error. This illustrates the diminishing returns of larger samples: going from \( n = 100 \) to \( n = 400 \) cuts the SE in half (a worthwhile improvement), but going from \( n = 400 \) to \( n = 1,600 \) achieves the same reduction for four times the sampling cost.
Chapter 11: Point Estimation
11.1 Estimators and Estimates
An estimator is a rule (a statistic) computed from sample data to approximate an unknown population parameter, such as \( \bar{X} \) for \( \mu \); an estimate is the numerical value the rule produces for a particular sample. A generic parameter is denoted \( \theta \) and its estimator \( \hat{\theta} \).
11.2 Unbiasedness
An estimator \( \hat{\theta} \) is unbiased for \( \theta \) if \( E(\hat{\theta}) = \theta \): across repeated samples it neither systematically overshoots nor undershoots the parameter.
Key unbiased estimators:
- The sample mean \( \bar{X} \) is an unbiased estimator of the population mean \( \mu \).
- The sample proportion \( \hat{p} = X/n \) is an unbiased estimator of the population proportion \( p \).
- The sample variance \( S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 \) is an unbiased estimator of \( \sigma^2 \). (This is why we divide by \( n-1 \), not \( n \).)
Bias: If \( E(\hat{\theta}) \neq \theta \), the estimator is biased. The bias is \( E(\hat{\theta}) - \theta \).
11.3 Efficiency
Among all unbiased estimators of a parameter, the one with the smallest variance is called the most efficient (or minimum variance unbiased estimator, MVUE). For a normal population, the sample mean is the MVUE of \( \mu \).
When biased estimators are allowed, they are compared by mean squared error, \( \text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2 \). An estimator with small MSE can be preferred even if it is slightly biased.
Chapter 12: Confidence Intervals
12.1 The Logic of Confidence Intervals
A point estimate is a single number that serves as our “best guess” of a parameter. But a point estimate conveys no information about precision. A confidence interval (CI) supplements the point estimate with a margin of error, producing a range of plausible values for the parameter.
12.2 Confidence Interval for a Population Mean — Large Sample
When the sample size is large (\( n \geq 30 \)) or the population standard deviation \( \sigma \) is known, the sampling distribution of \( \bar{X} \) is approximately normal and the CI for \( \mu \) is:
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]
where \( z_{\alpha/2} \) is the critical value from the standard normal distribution corresponding to confidence level \( (1-\alpha) \times 100\% \):
| Confidence Level | \( \alpha \) | \( z_{\alpha/2} \) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
When \( \sigma \) is unknown (the usual case in practice), we substitute the sample standard deviation \( s \). For large samples, the result is approximately valid.
Margin of Error (ME):
\[ \text{ME} = z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]
The CI is then: \( (\bar{x} - \text{ME},\; \bar{x} + \text{ME}) \).
Example: a sample of \( n = 64 \) observations yields a mean processing time of \( \bar{x} = 18.5 \) hours with \( s = 6.4 \) hours. Construct a 95% confidence interval for the true mean processing time.
\[ \text{ME} = 1.960 \times \frac{6.4}{\sqrt{64}} = 1.960 \times \frac{6.4}{8} = 1.960 \times 0.80 = 1.568 \]
\[ \text{95% CI} = (18.5 - 1.568,\; 18.5 + 1.568) = (16.93,\; 20.07) \text{ hours} \]
We are 95% confident that the true mean processing time is between 16.9 and 20.1 hours.
If management’s target is 18 hours average processing time, note that 18 hours is inside the confidence interval — the data are consistent with meeting the target, but there is uncertainty.
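A sketch of the interval calculation in Python (plain math module; the numbers come from the example):

```python
import math

xbar, s, n = 18.5, 6.4, 64      # sample mean, sd, and size
z = 1.960                       # 95% critical value

me = z * s / math.sqrt(n)
print(xbar - me, xbar + me)     # (16.932, 20.068) hours
```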
12.3 Confidence Interval for a Population Mean — Small Sample (t-Distribution)
When \( n < 30 \) and \( \sigma \) is unknown, we cannot rely on the CLT to guarantee approximate normality of \( \bar{X} \). If we additionally assume the population is approximately normal, we use the t-distribution with \( n - 1 \) degrees of freedom:
\[ \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \]
The t-distribution is symmetric and bell-shaped like the standard normal, but has heavier tails — it accounts for the additional uncertainty in estimating \( \sigma \) from a small sample. As \( n \to \infty \), the t-distribution converges to the standard normal.
Small-sample CI for \( \mu \):
\[ \bar{x} \pm t_{\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}} \]
where \( t_{\alpha/2,\; n-1} \) is the \( (1 - \alpha/2) \) quantile of the t-distribution with \( n-1 \) degrees of freedom.
Selected critical values:
| Degrees of Freedom | \( t_{0.025} \) (95% CI) | \( t_{0.005} \) (99% CI) |
|---|---|---|
| 5 | 2.571 | 4.032 |
| 10 | 2.228 | 3.169 |
| 15 | 2.131 | 2.947 |
| 20 | 2.086 | 2.845 |
| 25 | 2.060 | 2.787 |
| 30 | 2.042 | 2.750 |
| ∞ (Normal) | 1.960 | 2.576 |
Example: an auditor samples \( n = 12 \) expense reimbursements, finding \( \bar{x} = \$287.40 \) and \( s = \$64.20 \). Assuming expense amounts are approximately normally distributed, construct a 90% confidence interval for the mean reimbursement.
Degrees of freedom: \( df = n - 1 = 11 \). \( t_{0.05, 11} = 1.796 \) (from t-table).
\[ \text{ME} = 1.796 \times \frac{64.20}{\sqrt{12}} = 1.796 \times \frac{64.20}{3.464} = 1.796 \times 18.53 = 33.29 \]
\[ \text{90% CI} = (287.40 - 33.29,\; 287.40 + 33.29) = (\$254.11,\; \$320.69) \]
The auditor is 90% confident that the true mean expense claim lies between $254 and $321.
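The same interval with scipy.stats, letting the software supply the t critical value (a sketch; the software values differ from the hand calculation only in the final rounding):

```python
import math
from scipy import stats

xbar, s, n = 287.40, 64.20, 12
conf = 0.90

t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)   # 1.7959
me = t_crit * s / math.sqrt(n)
print(round(xbar - me, 2), round(xbar + me, 2))      # 254.12 320.68
```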
12.4 Confidence Interval for a Population Proportion
Let \( X \) be the number of successes in \( n \) trials. The sample proportion is \( \hat{p} = X/n \).
For large samples (where \( n\hat{p} \geq 10 \) and \( n(1-\hat{p}) \geq 10 \)), by the CLT:
\[ \hat{p} \approx N\!\left(p,\; \frac{p(1-p)}{n}\right) \]
The large-sample CI for a population proportion is:
\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Example: a manager samples \( n = 200 \) accounts and finds 34 overdue, so \( \hat{p} = 34/200 = 0.17 \). Construct a 95% confidence interval for the true proportion of overdue accounts.
\[ \text{SE}(\hat{p}) = \sqrt{\frac{0.17 \times 0.83}{200}} = \sqrt{\frac{0.1411}{200}} = \sqrt{0.000706} \approx 0.02657 \]
\[ \text{95% CI} = 0.17 \pm 1.96 \times 0.02657 = 0.17 \pm 0.0521 = (0.1179,\; 0.2221) \]
The manager is 95% confident that the true proportion of overdue accounts is between 11.8% and 22.2%.
12.5 Determining the Required Sample Size
Before collecting data, analysts must plan how large a sample is needed to achieve a desired margin of error. For estimating a population mean with margin of error \( E \) and confidence level \( (1-\alpha) \):
\[ n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 \]
Since \( \sigma \) is often unknown at the planning stage, use a pilot-study estimate, a conservative bound, or literature-based values.
For a population proportion:
\[ n = \frac{z_{\alpha/2}^2 \cdot p(1-p)}{E^2} \]
If \( p \) is completely unknown, use \( p = 0.5 \), which maximizes \( p(1-p) \) and gives the largest (most conservative) required sample size.
Example: a bank wants to estimate a customer proportion to within \( E = 0.03 \) (3 percentage points) at 99% confidence, with no prior estimate of \( p \):
\[ n = \frac{(2.576)^2 (0.5)(0.5)}{(0.03)^2} \approx 1843.3 \]
Rounding up, the bank needs to survey at least 1,844 customers. This relatively large sample is required because the margin of error (3 percentage points) is small and the confidence level (99%) is high.
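A sketch of the sample-size calculation, rounding up as required:

```python
import math

z = 2.576      # 99% confidence critical value
p = 0.5        # conservative planning value (maximizes p(1 - p))
E = 0.03       # desired margin of error: 3 percentage points

n = z**2 * p * (1 - p) / E**2
print(n, math.ceil(n))     # 1843.27... -> survey 1,844 customers
```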
Chapter 13: Integrative Review and Applied Problems
13.1 Connecting the Pieces
The topics covered in this course form a logical chain:
- Data classification determines which summaries are valid.
- Frequency distributions and histograms reveal the shape of distributions.
- Measures of centre and spread quantify location and variability.
- Probability theory provides the theoretical foundation for drawing inferences.
- Probability distributions model specific data-generating processes.
- Sampling distributions and the CLT explain how sample statistics behave.
- Confidence intervals translate sample information into statements about population parameters.
Each step builds on the previous, forming the foundation for more advanced techniques in regression analysis, hypothesis testing, and predictive modelling (AFM 113 and beyond).
13.2 Comprehensive Worked Example
A national retailer’s internal audit team investigates the efficiency of its point-of-sale system. They record the transaction processing time (in seconds) for a random sample of \( n = 36 \) transactions. The results are:
8, 12, 7, 15, 11, 9, 14, 8, 10, 13, 6, 18, 9, 11, 12, 7, 10, 14, 8, 11, 15, 9, 13, 7, 10, 12, 8, 16, 11, 9, 13, 7, 10, 14, 9, 11
Step 1: Summary statistics.
Sum = 8+12+7+15+11+9+14+8+10+13+6+18+9+11+12+7+10+14+8+11+15+9+13+7+10+12+8+16+11+9+13+7+10+14+9+11 = 387
\[ \bar{x} = \frac{387}{36} = 10.75 \text{ seconds} \]
Sorted data (for percentiles): 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 16, 18
\( Q_2 \) (Median) = average of 18th and 19th values = (10 + 11)/2 = 10.5 seconds. \( Q_1 \) = average of 9th and 10th values = (8 + 9)/2 = 8.5 seconds. \( Q_3 \) = average of 27th and 28th values = (13 + 13)/2 = 13.0 seconds. \( \text{IQR} = 13.0 - 8.5 = 4.5 \) seconds.
Sample standard deviation (computed): \( s \approx 2.90 \) seconds.
\[ \text{CV} = \frac{2.90}{10.75} \times 100\% \approx 27.0\% \]
The mean (10.75 s) is slightly larger than the median (10.5 s), suggesting mild right skewness, consistent with the value of 18 seconds being a high outlier.
Step 2: 95% confidence interval for the mean processing time.
With \( n = 36 \geq 30 \), use the normal approximation:
\[ \text{95% CI} = 10.75 \pm 1.96 \times \frac{2.90}{\sqrt{36}} = 10.75 \pm 1.96 \times 0.484 = 10.75 \pm 0.948 = (9.80,\; 11.70) \text{ seconds} \]
Step 3: Probability calculation.
If processing times follow \( N(10.75, 2.90^2) \), what proportion of transactions take more than 15 seconds?
\[ P(X > 15) = P\!\left(Z > \frac{15 - 10.75}{2.90}\right) = P(Z > 1.466) \approx 1 - \Phi(1.47) \approx 1 - 0.9292 = 0.071 \]
Approximately 7.1% of transactions are expected to take more than 15 seconds.
Step 4: Business interpretation.
The average processing time of 10.75 seconds is well within the company’s service standard of 15 seconds. The 95% CI (9.8 s to 11.7 s) confirms this with high confidence. However, about 7% of transactions exceed 15 seconds, warranting investigation of what drives slower transactions (specific cashiers, product types, payment methods).
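Finally, a sketch that reproduces the summary statistics and the interval for the 36 recorded transaction times:

```python
import numpy as np

times = np.array([8, 12, 7, 15, 11, 9, 14, 8, 10, 13, 6, 18,
                  9, 11, 12, 7, 10, 14, 8, 11, 15, 9, 13, 7,
                  10, 12, 8, 16, 11, 9, 13, 7, 10, 14, 9, 11])

print(times.sum(), times.mean())     # 387  10.75
print(np.median(times))              # 10.5
s = times.std(ddof=1)                # sample sd (n - 1 denominator)
print(round(s, 3))                   # 2.902

# 95% CI for the mean using the normal critical value (n = 36 >= 30).
se = s / np.sqrt(len(times))
print(times.mean() - 1.96 * se, times.mean() + 1.96 * se)   # ~(9.80, 11.70)
```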
13.3 Summary of Key Formulas
| Topic | Formula |
|---|---|
| Sample mean | \( \bar{x} = \frac{1}{n}\sum x_i \) |
| Sample variance | \( s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1} \) |
| IQR | \( Q_3 - Q_1 \) |
| CV | \( s/\bar{x} \times 100\% \) |
| Conditional probability | \( P(A\mid B) = P(A \cap B)/P(B) \) |
| Bayes’ theorem | \( P(B_i \mid A) = P(A \mid B_i)P(B_i)/P(A) \) |
| Binomial PMF | \( P(X=k) = \binom{n}{k}p^k(1-p)^{n-k} \) |
| Binomial mean/variance | \( np \) ; \( np(1-p) \) |
| Poisson PMF | \( P(X=k) = e^{-\lambda}\lambda^k/k! \) |
| Normal standardization | \( Z = (X - \mu)/\sigma \) |
| Standard error | \( \text{SE} = \sigma/\sqrt{n} \) |
| Large-sample CI for \( \mu \) | \( \bar{x} \pm z_{\alpha/2} \cdot s/\sqrt{n} \) |
| Small-sample CI for \( \mu \) | \( \bar{x} \pm t_{\alpha/2,n-1} \cdot s/\sqrt{n} \) |
| CI for proportion | \( \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \) |
| Sample size (mean) | \( n = (z_{\alpha/2}\sigma/E)^2 \) |
| Sample size (proportion) | \( n = z_{\alpha/2}^2 p(1-p)/E^2 \) |
13.4 Practice Problems
Problem 1. A random sample of 15 audit engagements records the number of hours billed: 42, 58, 35, 71, 49, 63, 38, 55, 67, 44, 52, 78, 41, 60, 47.
(a) Compute the mean, median, and mode. (b) Compute the range, IQR, variance, and standard deviation. (c) Compute the coefficient of variation. (d) Identify any outliers using the 1.5 × IQR rule. (e) Describe the shape of the distribution.
Problem 2. A credit card issuer believes 8% of its cardholders carry a balance greater than $10,000. A random sample of 250 accounts is drawn.
(a) What is the expected number of accounts with balances over $10,000? (b) What is the standard deviation of the count? (c) What is the probability that exactly 20 accounts have balances over $10,000? (d) Using the normal approximation, what is the probability that more than 25 accounts have balances over $10,000?
Problem 3. Transaction processing times at an ATM follow an exponential distribution with a mean of 45 seconds.
(a) What is the rate parameter \( \lambda \)? (b) What is the probability a transaction takes more than 1 minute? (c) What is the median transaction time? (Hint: use the formula \( \text{Median} = \ln(2)/\lambda \).) (d) What transaction time is exceeded by only 5% of transactions?
Problem 4. Annual returns for a market index are approximately normally distributed with mean 7.2% and standard deviation 15.6%.
(a) What is the probability of a loss exceeding 10% in a given year? (b) What return separates the top 10% of years from the rest? (c) In what range do 80% of annual returns fall? (d) Over 25 independent years, what is the probability that the average annual return is negative?
Problem 5. A mortgage lender samples 50 recently approved mortgages to estimate the mean loan-to-value ratio (LTV). Results: \( \bar{x} = 0.783 \), \( s = 0.094 \).
(a) Construct a 90%, 95%, and 99% confidence interval for the mean LTV. (b) How do the widths of these intervals compare? What drives the difference? (c) How large a sample would be needed to achieve a margin of error of 0.01 at 95% confidence? (Use \( \sigma \approx 0.094 \).)
These notes draw on Balka’s open-access statistics text, Wackerly, Mendenhall, and Scheaffer’s *Mathematical Statistics with Applications*, and DeVore’s *Probability and Statistics for Engineering and the Sciences*. Worked examples are original and designed for an AFM audience.