AFM 113: Analytic Methods for Business 2
Daniel Jiang
Estimated study time: 1 hr 28 min
Sources and References
Primary textbook:
- Balka, J. Introductory Statistics Explained (open access). Available at jbstatistics.com.

Supplementary:
- Devore, J.L. (2016). Probability and Statistics for Engineering and the Sciences (9th ed.). Cengage.
- Newbold, P., Carlson, W.L., & Thorne, B. (2013). Statistics for Business and Economics (8th ed.). Pearson.

Online resources:
- OpenIntro Statistics (openintro.org)
- MIT OCW 18.650: Statistics for Applications
- Khan Academy Statistics and Probability
Chapter 1: Hypothesis Testing — Core Framework
1.1 The Logic of Hypothesis Testing
Hypothesis testing is the engine of classical statistical inference. It provides a structured, reproducible procedure for deciding whether sample evidence is strong enough to warrant rejecting a baseline claim about a population. The process translates a business question into competing statistical hypotheses, computes a test statistic from data, and renders a verdict calibrated to a pre-specified tolerance for error.
The two hypotheses are exhaustive and mutually exclusive. In practice, we never “prove” \(H_0\) true — we either reject it (sufficient evidence against it) or fail to reject it (insufficient evidence, not proof of truth). The asymmetry is intentional: the null is a straw-man position that requires compelling evidence to overturn.
Steps in a Hypothesis Test
- State \(H_0\) and \(H_a\) precisely, in terms of a named population parameter.
- Choose a significance level \(\alpha\) — the maximum probability of incorrectly rejecting \(H_0\) the analyst will tolerate. Common choices: 0.10, 0.05, 0.01.
- Select the appropriate test statistic and verify conditions for its validity.
- Compute the test statistic from sample data.
- Find the p-value (probability of a result at least as extreme as observed, under \(H_0\)) or compare the test statistic to the critical value.
- Decide: reject \(H_0\) if p-value \(< \alpha\) (equivalently, if |test statistic| exceeds the critical value for two-tailed tests).
- Interpret in context — translate the statistical conclusion into a business-meaningful statement.
1.2 One-Tailed vs. Two-Tailed Tests
The choice of tail direction is dictated by the research question, not by what you observe in the data — it must be specified before looking at results.
1.3 Type I and Type II Errors
Any binary decision procedure operating under uncertainty will sometimes be wrong. There are exactly two ways to err:
| Decision | \(H_0\) Actually True | \(H_0\) Actually False |
|---|---|---|
| Fail to Reject \(H_0\) | Correct (probability \(1 - \alpha\)) | Type II Error (probability \(\beta\)) |
| Reject \(H_0\) | Type I Error (probability \(\alpha\)) | Correct: Power (probability \(1 - \beta\)) |
There is an inherent trade-off: decreasing \(\alpha\) (making it harder to reject \(H_0\)) reduces Type I errors but increases \(\beta\) (Type II errors), lowering power. The only way to reduce both simultaneously is to increase sample size. In financial auditing, regulatory settings, and drug approval, the costs of each error type differ sharply — this asymmetry should inform the choice of \(\alpha\).
Power Analysis and Sample Size
The power of a one-sample z-test against the specific alternative \(\mu = \mu_1\) is:
\[ \text{Power} = P\!\left(Z > z_\alpha - \frac{|\mu_1 - \mu_0|}{\sigma/\sqrt{n}}\right) \]for an upper-tailed test. To achieve power \(1 - \beta\) at effect size \(\delta = \mu_1 - \mu_0\) with significance level \(\alpha\):
\[ n = \left(\frac{(z_\alpha + z_\beta)\,\sigma}{\delta}\right)^2 \]For a two-tailed test replace \(z_\alpha\) with \(z_{\alpha/2}\). This formula is the foundation of sample size planning in surveys, clinical trials, and audit procedures.
Round up to \(n = 46\) invoices. Sampling fewer than 46 items would leave the test underpowered and likely to miss a real misstatement of $500.
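The sample-size formula is easy to put into code. The numbers below are illustrative only (not the audit example's \(\sigma\) and \(\delta\), which are not restated here):

```python
import math
from scipy.stats import norm

def sample_size(sigma, delta, alpha=0.05, power=0.80, two_tailed=False):
    """Required n for a z-test to detect effect delta with the given power."""
    z_alpha = norm.ppf(1 - alpha / 2) if two_tailed else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)          # z_beta, with beta = 1 - power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Illustrative: sigma = 1000, detect delta = 500, upper-tailed, 80% power
n = sample_size(sigma=1000, delta=500, alpha=0.05, power=0.80)
print(n)  # 25
```

Note the always-round-up convention (`math.ceil`): rounding down would leave the test slightly underpowered.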
1.4 The p-Value: Interpretation and Misuse
The p-value is not the probability that \(H_0\) is true. It is not the probability that results occurred by chance. It is not a measure of the effect’s practical importance. These are among the most common misinterpretations in applied statistics.
Common misinterpretations to avoid:
- “The p-value is 0.03, so there is a 3% probability that \(H_0\) is true.” — FALSE. \(H_0\) is either true or not; the p-value is a conditional probability about data, not about hypotheses.
- “p = 0.06 means no effect exists.” — FALSE. Failing to reject \(H_0\) is not evidence that \(H_0\) is true; it only means insufficient evidence to reject it.
- “p = 0.001 means the effect is large and important.” — FALSE. With a huge sample, even a trivially small and practically unimportant effect will produce a tiny p-value.
Chapter 2: The One-Sample t-Test and z-Test
2.1 One-Sample z-Test (Known \(\sigma\))
When the population standard deviation \(\sigma\) is known (rare in practice), the test statistic under \(H_0: \mu = \mu_0\) is:
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]This follows a standard normal distribution exactly (for normal populations) or approximately via the CLT (for large \(n\)).
Critical values for common significance levels:
| Test Type | \(\alpha = 0.10\) | \(\alpha = 0.05\) | \(\alpha = 0.01\) |
|---|---|---|---|
| Two-tailed | \(\pm 1.645\) | \(\pm 1.960\) | \(\pm 2.576\) |
| Upper-tailed | \(1.282\) | \(1.645\) | \(2.326\) |
| Lower-tailed | \(-1.282\) | \(-1.645\) | \(-2.326\) |
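The table's entries are simply quantiles of the standard normal distribution and can be reproduced directly:

```python
from scipy.stats import norm

alphas = [0.10, 0.05, 0.01]
two_tailed = [norm.ppf(1 - a / 2) for a in alphas]    # +/- 1.645, 1.960, 2.576
upper_tailed = [norm.ppf(1 - a) for a in alphas]      # 1.282, 1.645, 2.326

print([round(z, 3) for z in two_tailed])
print([round(z, 3) for z in upper_tailed])
```

Lower-tailed critical values are the negatives of the upper-tailed ones, by symmetry of the normal distribution.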
2.2 One-Sample t-Test (Unknown \(\sigma\))
In practice \(\sigma\) must be estimated by the sample standard deviation \(s\). The test statistic is:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]Under \(H_0\) and the normality assumption, this follows a Student’s t-distribution with \(n - 1\) degrees of freedom. The t-distribution has heavier tails than the normal, reflecting the additional uncertainty from estimating \(\sigma\).
Example: a random sample of \(n = 36\) invoices has mean processing time \(\bar{x} = 3.9\) days with standard deviation \(s = 1.2\) days. Test at \(\alpha = 0.05\) whether the mean processing time exceeds 3.5 days.
Hypotheses: \(H_0: \mu = 3.5\) vs. \(H_a: \mu > 3.5\).
Test statistic:
\[ t = \frac{3.9 - 3.5}{1.2 / \sqrt{36}} = \frac{0.4}{0.2} = 2.00, \quad \text{df} = 35 \]p-value (upper-tailed): In R, pt(2.00, df = 35, lower.tail = FALSE) \(\approx 0.027\).
Decision: Since \(p = 0.027 < \alpha = 0.05\), reject \(H_0\).
Interpretation: There is statistically significant evidence that the mean invoice processing time exceeds 3.5 days. Management should investigate the cause of the slowdown.
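The example's numbers can be verified from the summary statistics given above:

```python
from scipy.stats import t

# Summary statistics from the invoice example
n, xbar, s, mu0 = 36, 3.9, 1.2, 3.5

t_stat = (xbar - mu0) / (s / n ** 0.5)      # 0.4 / 0.2 = 2.00
p_value = t.sf(t_stat, df=n - 1)            # upper-tailed, df = 35
print(round(t_stat, 2), round(p_value, 3))  # 2.0 0.027
```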
Conditions for the One-Sample t-Test
- Random sample: Observations are independently drawn from the population.
- Normality or large sample: Either the population is approximately normal, or \(n \geq 30\) so the CLT applies. For \(n < 30\), assess normality with a histogram, QQ plot, or Shapiro-Wilk test.
- Continuous outcome: The variable being measured is quantitative.
R Implementation
# One-sample t-test (two-tailed)
t.test(x, mu = 3.5, alternative = "two.sided")
# One-sample t-test (upper-tailed)
t.test(x, mu = 3.5, alternative = "greater")
# One-sample t-test (lower-tailed)
t.test(x, mu = 3.5, alternative = "less")
The output reports: t statistic, degrees of freedom, p-value, sample mean, and 95% confidence interval for \(\mu\).
Chapter 3: Two-Sample Inference
3.1 Independent Samples t-Test
Many business questions require comparing two independent groups: Does the experimental branch outperform the control branch in customer satisfaction? Is the mean claim amount different between two insurance product lines?
Hypotheses (two-tailed): \(H_0: \mu_1 - \mu_2 = D_0\) (typically \(D_0 = 0\)) vs. \(H_a: \mu_1 - \mu_2 \neq 0\).
Test statistic (Welch, not assuming equal variances):
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]Degrees of freedom (Welch-Satterthwaite approximation):
\[ df \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \]This df is not generally an integer; R rounds it or uses it continuously in the t-distribution. Welch’s test is preferred over the pooled (equal-variance) t-test because it remains valid whether or not the variances are equal, while incurring little efficiency loss when variances happen to be equal.
Example: weekly sales (in dollars) for the East region (\(n_1 = 25\), \(\bar{x}_1 = 142{,}000\), \(s_1 = 18{,}000\)) are compared with the West region (\(n_2 = 30\), \(\bar{x}_2 = 128{,}000\), \(s_2 = 22{,}000\)).
\(H_0: \mu_E = \mu_W\) vs. \(H_a: \mu_E \neq \mu_W\).
\[ t = \frac{142000 - 128000}{\sqrt{\dfrac{18000^2}{25} + \dfrac{22000^2}{30}}} = \frac{14000}{\sqrt{12{,}960{,}000 + 16{,}133{,}333}} = \frac{14000}{\sqrt{29{,}093{,}333}} = \frac{14000}{5394} \approx 2.595 \]Using t.test(east_sales, west_sales) in R gives \(df \approx 52.8\), p-value \(\approx 0.012\).
At \(\alpha = 0.05\), \(p = 0.012 < 0.05\): reject \(H_0\). The East region has significantly higher average weekly sales than the West.
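The same Welch test can be reproduced from the summary statistics alone, with no raw data needed:

```python
from scipy.stats import ttest_ind_from_stats

# East: n=25, mean=142000, sd=18000; West: n=30, mean=128000, sd=22000
res = ttest_ind_from_stats(mean1=142000, std1=18000, nobs1=25,
                           mean2=128000, std2=22000, nobs2=30,
                           equal_var=False)           # Welch's t-test
print(round(res.statistic, 3), round(res.pvalue, 3))  # 2.596 0.012
```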
3.2 Paired Samples t-Test
When each observation in Group 1 is logically matched with an observation in Group 2 (before/after measurements, matched vendors, twin stores), the paired design eliminates between-subject variability and is more powerful. Define differences \(d_i = x_{1i} - x_{2i}\) and compute:
\[ t = \frac{\bar{d} - D_0}{s_d / \sqrt{n}}, \quad df = n - 1 \]where \(\bar{d}\) is the mean difference and \(s_d\) is the standard deviation of the differences.
| Vendor | Before | After | \(d_i\) |
|---|---|---|---|
| 1 | 18 | 14 | 4 |
| 2 | 22 | 19 | 3 |
| 3 | 15 | 16 | -1 |
| 4 | 28 | 21 | 7 |
| 5 | 19 | 15 | 4 |
| 6 | 24 | 22 | 2 |
| 7 | 20 | 17 | 3 |
| 8 | 17 | 13 | 4 |
| 9 | 25 | 24 | 1 |
| 10 | 21 | 18 | 3 |
| 11 | 23 | 19 | 4 |
| 12 | 16 | 14 | 2 |
\(\bar{d} = 36/12 = 3.0\), \(s_d \approx 1.95\) (sum of squared deviations \(= 42\), so \(s_d = \sqrt{42/11} \approx 1.95\)).
\[ t = \frac{3.0 - 0}{1.95 / \sqrt{12}} = \frac{3.0}{0.564} \approx 5.32, \quad df = 11 \]p-value (two-tailed) \(< 0.001\). Strong evidence that the new process reduces cycle time. Average reduction is 3 days per vendor.
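A numeric check using the Before/After columns from the table:

```python
import numpy as np
from scipy.stats import ttest_rel

# Cycle times (days) for the 12 vendors, from the table above
before = np.array([18, 22, 15, 28, 19, 24, 20, 17, 25, 21, 23, 16])
after  = np.array([14, 19, 16, 21, 15, 22, 17, 13, 24, 18, 19, 14])

d = before - after
print(d.mean(), round(d.std(ddof=1), 2))   # 3.0 1.95

res = ttest_rel(before, after)             # two-tailed paired t-test
print(round(res.statistic, 2))             # 5.32
```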
3.3 F-Test for Equality of Variances
Before applying the pooled t-test (which assumes \(\sigma_1^2 = \sigma_2^2\)), analysts sometimes test equality of variances. The F-test for this purpose uses:
\[ F = \frac{s_1^2}{s_2^2} \]Under \(H_0: \sigma_1^2 = \sigma_2^2\), \(F\) follows an F-distribution with \((n_1 - 1, n_2 - 1)\) degrees of freedom.
var.test(x, y)
3.4 Comparing Two Proportions
Example: 45 of 400 Tier 1 loans and 72 of 500 Tier 2 loans defaulted, so \(\hat{p}_1 = 45/400 = 0.1125\) and \(\hat{p}_2 = 72/500 = 0.144\). Testing \(H_0: p_1 = p_2\) vs. \(H_a: p_1 \neq p_2\), the pooled proportion is \(\hat{p} = (45 + 72)/(400 + 500) = 117/900 = 0.130\).
\[ z = \frac{0.1125 - 0.144}{\sqrt{0.130 \times 0.870 \times (1/400 + 1/500)}} = \frac{-0.0315}{\sqrt{0.1131 \times 0.0045}} = \frac{-0.0315}{\sqrt{0.000509}} = \frac{-0.0315}{0.02256} \approx -1.396 \]p-value (two-tailed) \(= 2 \times P(Z < -1.396) \approx 2 \times 0.0814 = 0.163\).
At \(\alpha = 0.05\), fail to reject \(H_0\). Insufficient evidence that default rates differ between tiers.
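The pooled two-proportion z-test, written out step by step with the counts above:

```python
from math import sqrt
from scipy.stats import norm

x1, n1 = 45, 400    # Tier 1 defaults
x2, n2 = 72, 500    # Tier 2 defaults

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # 117/900 = 0.130
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # pooled standard error
z = (p1 - p2) / se
p_value = 2 * norm.cdf(-abs(z))                       # two-tailed
print(round(z, 2), round(p_value, 3))                 # -1.4 0.163
```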
Chapter 4: Chi-Square Tests
4.1 Goodness-of-Fit Test
The chi-square goodness-of-fit test asks whether the observed distribution of a categorical variable matches a specified theoretical distribution.
\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]where \(O_i\) is the observed frequency in category \(i\) and \(E_i = n \cdot p_i\) is the expected frequency under \(H_0\). Under \(H_0\), \(\chi^2 \sim \chi^2_{k-1}\) (degrees of freedom \(= k - 1\) minus the number of parameters estimated from data).
Conditions: All expected cell counts \(E_i \geq 5\) (combine categories if necessary).
Example: a test of whether sales are spread uniformly across the four quarters yields \(\chi^2 = 2.000\). With df = 3, the critical value at \(\alpha = 0.05\) is \(\chi^2_{3, 0.05} = 7.815\). Since \(2.000 < 7.815\), fail to reject \(H_0\): no evidence against the uniform quarterly distribution.
4.2 Test of Independence
The chi-square test of independence assesses whether two categorical variables are associated in a contingency table.
Expected cell counts under independence:
\[ E_{ij} = \frac{(\text{Row } i \text{ Total}) \times (\text{Column } j \text{ Total})}{n} \]

\[ \chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (r-1)(c-1) \]

Example: customer satisfaction (Low/Medium/High) by region for 500 customers:

| Satisfaction | North | South | East | West | Row Total |
|---|---|---|---|---|---|
| Low | 20 | 30 | 25 | 15 | 90 |
| Medium | 50 | 60 | 55 | 45 | 210 |
| High | 55 | 60 | 45 | 40 | 200 |
| Col Total | 125 | 150 | 125 | 100 | 500 |
Expected count for (Low, North): \(E_{11} = 90 \times 125 / 500 = 22.5\).
After computing all 12 expected counts and the \(\chi^2\) statistic (df = \((3-1)(4-1) = 6\)), compare to \(\chi^2_{6, 0.05} = 12.592\). In R: chisq.test(satisfaction_table).
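The remaining arithmetic can be delegated to software. A check on the table above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Low, Medium, High satisfaction; columns: North, South, East, West
observed = np.array([[20, 30, 25, 15],
                     [50, 60, 55, 45],
                     [55, 60, 45, 40]])

chi2, p, df, expected = chi2_contingency(observed)
print(round(chi2, 2), df, round(p, 3))   # 2.98 6 0.811
print(expected[0, 0])                    # 22.5, the (Low, North) cell
```

Since \(2.98 < 12.592\) (and \(p \approx 0.81\)), there is no evidence of an association between satisfaction level and region.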
Chapter 5: One-Way Analysis of Variance (ANOVA)
5.1 Motivation: Comparing More Than Two Means
When comparing means across \(k \geq 3\) groups, running all pairwise t-tests is problematic: with \(k = 5\) groups there are \(\binom{5}{2} = 10\) pairwise tests, and at \(\alpha = 0.05\) each, the probability of at least one false rejection (familywise error rate) rises to \(1 - 0.95^{10} \approx 0.40\). ANOVA tests all \(k\) group means simultaneously at a single \(\alpha\) level.
5.2 Partitioning Total Variation
ANOVA decomposes the total variation in the response into two sources:
\[ SST = SSB + SSW \]where:
- \(SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2\) — Total Sum of Squares: total variation of all observations around the grand mean \(\bar{y}\).
- \(SSB = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2\) — Between-Groups Sum of Squares: variation of group means around the grand mean (explained by group membership).
- \(SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2\) — Within-Groups Sum of Squares: variation of individual observations around their group mean (unexplained noise).
Mean squares are obtained by dividing by degrees of freedom:
\[ MSB = \frac{SSB}{k - 1}, \qquad MSW = \frac{SSW}{N - k} \]where \(N = \sum n_i\) is the total sample size.
5.3 The F-Statistic
The test statistic is
\[ F = \frac{MSB}{MSW} \]Under \(H_0\), both \(MSB\) and \(MSW\) estimate \(\sigma^2\), so \(F \approx 1\). When \(H_0\) is false, \(MSB\) exceeds \(MSW\) systematically (between-group differences inflate \(MSB\) but not \(MSW\)). The test is always upper-tailed: reject \(H_0\) if \(F > F_{\alpha, k-1, N-k}\).
5.4 The ANOVA Table
| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Between Groups | SSB | \(k - 1\) | \(MSB = SSB/(k-1)\) | \(MSB/MSW\) | \(P(F_{k-1, N-k} > F)\) |
| Within Groups (Error) | SSW | \(N - k\) | \(MSW = SSW/(N-k)\) | | |
| Total | SST | \(N - 1\) | | | |
5.5 Assumptions of One-Way ANOVA
- Independence: Observations within and across groups are independent.
- Normality: Within each group, the response is approximately normally distributed. ANOVA is robust to moderate departures when group sizes are equal and reasonably large.
- Equal variances (homoscedasticity): Population variances are equal across all groups: \(\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2\). Assess with Levene’s test (leveneTest() in R) or Bartlett’s test.
Worked example: mean monthly sales by region for \(N = 34\) stores:

| Region | \(n_i\) | \(\bar{y}_i\) | \(s_i\) |
|---|---|---|---|
| North | 8 | 142 | 18 |
| South | 10 | 128 | 22 |
| East | 9 | 155 | 15 |
| West | 7 | 133 | 20 |
\(N = 34\), grand mean \(\bar{y} = (8 \times 142 + 10 \times 128 + 9 \times 155 + 7 \times 133) / 34\).
\(\bar{y} = (1136 + 1280 + 1395 + 931)/34 = 4742/34 = 139.47\).
\[ SSB = 8(142-139.47)^2 + 10(128-139.47)^2 + 9(155-139.47)^2 + 7(133-139.47)^2 \]\[ = 8(6.40) + 10(131.55) + 9(241.06) + 7(41.89) \]\[ = 51.2 + 1315.5 + 2169.5 + 293.2 = 3829.4 \]\[ MSB = 3829.4 / 3 = 1276.5 \]\(SSW\) is computed from within-group variances: \(SSW \approx \sum(n_i - 1)s_i^2 = 7(324) + 9(484) + 8(225) + 6(400) = 2268 + 4356 + 1800 + 2400 = 10824\).
\(MSW = 10824 / 30 = 360.8\).
\[ F = 1276.5 / 360.8 = 3.538 \]Critical value \(F_{3, 30, 0.05} = 2.922\). Since \(3.538 > 2.922\), reject \(H_0\). At least one region differs in mean monthly sales. In R: summary(aov(sales ~ region, data = store_data)).
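The same ANOVA can be reconstructed from the summary table alone (exact arithmetic, so the result differs slightly from the rounded intermediate values above):

```python
import numpy as np
from scipy.stats import f

n = np.array([8, 10, 9, 7])            # North, South, East, West
means = np.array([142, 128, 155, 133])
sds = np.array([18, 22, 15, 20])

N, k = n.sum(), len(n)
grand = (n * means).sum() / N                 # 4742/34 ~ 139.47
SSB = (n * (means - grand) ** 2).sum()        # between-groups sum of squares
SSW = ((n - 1) * sds ** 2).sum()              # within-groups: 10824
MSB, MSW = SSB / (k - 1), SSW / (N - k)
F = MSB / MSW
p = f.sf(F, k - 1, N - k)
print(round(F, 2))                            # 3.54, p < 0.05
```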
5.6 Post-Hoc Tests
A significant ANOVA F-test tells us only that at least one pair of means differs — it does not say which pairs. Post-hoc tests make pairwise comparisons while controlling the familywise error rate.
Tukey’s Honestly Significant Difference (HSD)
\[ HSD = q_{\alpha,\, k,\, N-k} \sqrt{\frac{MSW}{n}} \]where \(q_{\alpha, k, N-k}\) is the Studentized range critical value and \(n\) is the common group size (for balanced designs). Any pair of group means differing by more than HSD is declared significantly different. For unbalanced designs, use the Tukey-Kramer adjustment.
Bonferroni Correction
To control the familywise error rate across \(m\) pairwise comparisons, test each comparison at level \(\alpha/m\). The method is simple and broadly applicable but conservative, especially when \(m\) is large.
In R:
# One-way ANOVA
model_anova <- aov(sales ~ region, data = store_data)
summary(model_anova)
# Post-hoc: Tukey's HSD
TukeyHSD(model_anova)
# Post-hoc: Bonferroni-adjusted pairwise t-tests
pairwise.t.test(store_data$sales, store_data$region, p.adjust.method = "bonferroni")
Chapter 6: Two-Way ANOVA
6.1 Extending ANOVA to Two Factors
Two-way ANOVA examines the effects of two categorical factors simultaneously and tests whether the factors interact. It is more efficient than running two separate one-way ANOVAs because it partitions variation more precisely and can detect interaction effects.
Notation: Factor A has \(a\) levels, Factor B has \(b\) levels, with \(n\) replications per cell. The model is:
\[ y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \]where \(\alpha_i\) is the main effect of Factor A level \(i\), \(\beta_j\) is the main effect of Factor B level \(j\), \((\alpha\beta)_{ij}\) is the interaction effect, and \(\varepsilon_{ijk} \sim N(0, \sigma^2)\).
6.2 Sum of Squares Decomposition
\[ SST = SSA + SSB + SSAB + SSE \]

| Source | df | MS | F |
|---|---|---|---|
| Factor A | \(a - 1\) | \(MSA\) | \(MSA/MSE\) |
| Factor B | \(b - 1\) | \(MSB\) | \(MSB/MSE\) |
| Interaction AB | \((a-1)(b-1)\) | \(MSAB\) | \(MSAB/MSE\) |
| Error | \(ab(n-1)\) | \(MSE\) | |
| Total | \(abn - 1\) | | |
6.3 Interpreting Interaction
An interaction plot (plotting group means with lines connecting levels of one factor, across levels of the other) reveals interaction visually: parallel lines indicate no interaction; converging, crossing, or diverging lines indicate interaction.
Example: suppose Factor A is Advertising Medium (Digital vs. Traditional), Factor B is Market Type (Urban vs. Rural), and the interaction term is significant. The significant interaction means: the advantage of Digital over Traditional advertising differs by market (e.g., large advantage in Urban markets, negligible in Rural markets). Reporting main effects alone would be misleading.
Chapter 7: Simple Linear Regression
7.1 The SLR Model
Simple Linear Regression (SLR) quantifies the linear relationship between a single predictor \(X\) and a continuous response \(Y\). In business settings, it answers questions like: How does advertising spend predict sales? How do machine-hours predict overhead costs? How do interest rate changes affect bond prices?
The model is
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n \]where \(\beta_0\) is the true intercept, \(\beta_1\) is the true slope, and \(\varepsilon_i\) is the irreducible random error for observation \(i\).
Model assumptions (LINE):
- Linearity: The true relationship between \(X\) and \(E(Y)\) is linear.
- Independence: Errors are mutually independent.
- Normality: Errors are normally distributed.
- Equal variance (homoscedasticity): \(\text{Var}(\varepsilon_i) = \sigma^2\) constant across all \(X\).
7.2 Ordinary Least Squares (OLS) Estimation
OLS finds \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the Residual Sum of Squares:
\[ SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \]Taking partial derivatives and setting them to zero yields the normal equations:
\[ \frac{\partial SSE}{\partial \hat{\beta}_0} = -2\sum(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \]\[ \frac{\partial SSE}{\partial \hat{\beta}_1} = -2\sum x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \]Solving:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]The fitted values are \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) and the residuals are \(e_i = y_i - \hat{y}_i\).
7.3 Interpretation of Coefficients
- \(\hat{\beta}_1\) (slope): The estimated average change in \(Y\) for a one-unit increase in \(X\), holding everything else constant. Units: units of \(Y\) per unit of \(X\).
- \(\hat{\beta}_0\) (intercept): The estimated average value of \(Y\) when \(X = 0\). This is only meaningful if \(X = 0\) is within the plausible range of the data; otherwise it is an extrapolation artifact.
Worked example: monthly revenue vs. advertising spend (both in $1{,}000s) over \(n = 24\) months:

| Summary statistic | Value |
|---|---|
| \(\bar{x}\) (avg. ad spend) | 45.2 |
| \(\bar{y}\) (avg. revenue) | 312.8 |
| \(S_{xx} = \sum(x_i - \bar{x})^2\) | 8,640 |
| \(S_{xy} = \sum(x_i-\bar{x})(y_i-\bar{y})\) | 37,152 |
Fitted model: \(\hat{\text{Revenue}} = 118.4 + 4.30 \times \text{AdSpend}\)
Interpretation: Each additional $1{,}000 in advertising is associated with an estimated $4{,}300 increase in monthly sales revenue, on average. The intercept of $118{,}400 represents estimated revenue when advertising spend is zero — plausible as a baseline (brand recognition, repeat customers).
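The slope and intercept follow directly from the summary statistics:

```python
# Summary statistics from the advertising example
xbar, ybar = 45.2, 312.8
Sxx, Sxy = 8640, 37152

b1 = Sxy / Sxx           # slope: 37152 / 8640 = 4.30
b0 = ybar - b1 * xbar    # intercept: 312.8 - 4.30 * 45.2
print(b1, round(b0, 1))  # 4.3 118.4
```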
7.4 Measuring Fit: R-Squared and Related Statistics
Coefficient of Determination
\[ SST = SSR + SSE \]where \(SSR = \sum(\hat{y}_i - \bar{y})^2\) is the regression sum of squares (variation explained by \(X\)) and \(SSE = \sum(y_i - \hat{y}_i)^2\) is the residual sum of squares (unexplained variation).
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]\(R^2\) ranges from 0 to 1. In SLR, \(R^2 = r^2\) where \(r\) is the Pearson correlation coefficient.
Residual Standard Error
The estimated standard deviation of the errors:
\[ \hat{\sigma} = s_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{MSE} \]The denominator is \(n - 2\) because two parameters (\(\beta_0\), \(\beta_1\)) were estimated. \(s_e\) is reported as “Residual Standard Error” in R output and measures the typical size of prediction errors in the units of \(Y\).
7.5 Inference on the Slope
Standard Error of \(\hat{\beta}_1\)
\[ SE(\hat{\beta}_1) = \frac{s_e}{\sqrt{S_{xx}}} = \sqrt{\frac{MSE}{S_{xx}}} \]This decreases as (a) \(n\) increases, (b) \(S_{xx}\) increases (wider spread in \(X\) values), or (c) \(s_e\) decreases (better fit).
t-Test for the Slope
\[ H_0: \beta_1 = 0 \quad \text{(no linear relationship)} \quad \text{vs.} \quad H_a: \beta_1 \neq 0 \]\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}, \quad df = n - 2 \]A significant result means there is evidence of a linear relationship between \(X\) and \(Y\) in the population. Note: the F-test in the ANOVA table of regression output is equivalent to this t-test in SLR (i.e., \(F = t^2\)).
Confidence Interval for \(\beta_1\)
\[ \hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot SE(\hat{\beta}_1) \]95% CI for \(\beta_1\): \(4.30 \pm t_{0.025, 22} \times 0.306 = 4.30 \pm 2.074 \times 0.306 = 4.30 \pm 0.63 = (3.67,\ 4.93)\).
We are 95% confident that each additional $1{,}000 of advertising is associated with between $3{,}670 and $4{,}930 in additional revenue, on average.
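A quick check of the interval, with \(n = 24\) inferred from the \(df = 22\) used above:

```python
from scipy.stats import t

b1, se_b1, n = 4.30, 0.306, 24           # slope, its SE, sample size (df = 22)
t_crit = t.ppf(0.975, df=n - 2)          # ~2.074
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(round(t_crit, 3), round(lo, 2), round(hi, 2))   # 2.074 3.67 4.93
```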
7.6 Confidence Intervals for Mean Response and Prediction Intervals
At a given predictor value \(x^*\), two types of interval estimate are available:
Confidence interval for the mean response \(E(Y \mid x^*)\):
\[ \hat{y}^* \pm t_{\alpha/2,\, n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \]This interval narrows as \(x^*\) approaches \(\bar{x}\) (the center of the data) and widens toward the extremes.
Prediction interval for a single new observation at \(x^*\):
\[ \hat{y}^* \pm t_{\alpha/2,\, n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \]The extra “1” under the radical accounts for the individual variability of the new observation around the mean response. Prediction intervals are always wider than the corresponding confidence intervals.
7.7 Residual Analysis
Residual analysis is the primary tool for checking whether the LINE assumptions are satisfied.
Key residual plots:
- Residuals vs. Fitted values: Should show a horizontal band with no pattern. A funnel shape indicates heteroscedasticity; a curve indicates non-linearity.
- Normal QQ plot of residuals: Points should fall approximately on a straight diagonal line. Systematic deviations indicate non-normality.
- Residuals vs. predictor \(X\): Equivalent to (1) in SLR; helps identify non-linearity.
- Scale-Location plot (square root of |standardized residuals| vs. fitted): Assesses homoscedasticity — should be a horizontal band.
model <- lm(revenue ~ ad_spend, data = marketing_data)
par(mfrow = c(2, 2))
plot(model) # Four residual diagnostic plots
Chapter 8: Multiple Linear Regression
8.1 Extending to Multiple Predictors
Multiple Linear Regression (MLR) models the relationship between a response \(Y\) and \(p\) predictor variables \(X_1, X_2, \ldots, X_p\):
\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i \]In matrix notation: \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\), where \(\mathbf{X}\) is the \(n \times (p+1)\) design matrix. The OLS estimator is:
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y} \]
8.2 Interpretation of Coefficients in MLR
Example: overhead cost (\(Y\), in $1{,}000s) is regressed on machine-hours (\(X_1\)) and number of production runs (\(X_2\)).
Fitted model: \(\hat{Y} = 42.3 + 3.15 X_1 + 8.90 X_2\)
- \(\hat{\beta}_1 = 3.15\): Holding number of production runs fixed, each additional machine-hour is associated with $3{,}150 in additional overhead, on average.
- \(\hat{\beta}_2 = 8.90\): Holding machine-hours fixed, each additional production run is associated with $8{,}900 in additional overhead, on average.
- \(\hat{\beta}_0 = 42.3\): Estimated overhead when both predictors are zero — an extrapolation anchor with limited practical meaning here.
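The matrix formula \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}\) from 8.1 can be checked on a toy noise-free dataset (the values here are assumed purely for illustration):

```python
import numpy as np

# Toy data generated exactly from y = 2 + 3*x1 - 1*x2 (no noise)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 2 + 3 * x1 - 1 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept
beta = np.linalg.solve(X.T @ X, X.T @ y)          # solve the normal equations
print(beta)                                       # approximately [2, 3, -1]
```

Solving the normal equations with `np.linalg.solve` is preferred in practice to forming the inverse \((\mathbf{X}^\top\mathbf{X})^{-1}\) explicitly, which is slower and less numerically stable.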
8.3 Adjusted R-Squared
Adding any predictor to a model will never decrease \(R^2\), even if the predictor has no real relationship with \(Y\) (overfitting). The adjusted \(R^2\) penalizes for model complexity:
\[ \bar{R}^2 = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - (1 - R^2)\frac{n-1}{n-p-1} \]Unlike \(R^2\), adjusted \(R^2\) can decrease when a predictor is added that does not sufficiently improve the fit. It is used for comparing models with different numbers of predictors.
8.4 F-Test for Overall Significance
Tests whether at least one predictor has a non-zero coefficient:
\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_a: \text{at least one } \beta_j \neq 0 \]\[ F = \frac{SSR/p}{SSE/(n-p-1)} = \frac{MSR}{MSE}, \quad \text{df} = (p,\ n-p-1) \]The p-value for this F-test appears in the last line of R’s summary(lm(...)) output. A significant overall F-test should precede interpretation of individual coefficients.
8.5 Individual t-Tests for Coefficients
Each coefficient has its own t-test: \(H_0: \beta_j = 0\) vs. \(H_a: \beta_j \neq 0\):
\[ t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}, \quad df = n - p - 1 \]These test the partial effect of \(X_j\) given all other predictors in the model. A predictor that is individually significant in SLR may become insignificant in MLR if it is correlated with another predictor (multicollinearity).
8.6 Multicollinearity
Detection: Compute the Variance Inflation Factor (VIF):
\[ VIF_j = \frac{1}{1 - R_j^2} \]where \(R_j^2\) is the \(R^2\) from regressing \(X_j\) on all other predictors. As a rule of thumb, \(VIF > 10\) (or even \(VIF > 5\) in some fields) indicates problematic multicollinearity.
Consequences: Large standard errors → wide confidence intervals → difficulty identifying which variables matter → unstable coefficient estimates (small changes in data lead to large changes in coefficients).
Remedies: Drop one of the correlated predictors; combine correlated predictors into a single index; use ridge regression or principal components regression.
library(car)
vif(model_mlr) # Variance inflation factors
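The VIF definition can also be computed from first principles via the auxiliary regressions; the nearly collinear data below are simulated for illustration:

```python
import numpy as np

def vif(X):
    """VIF for each column of X (no intercept column), via auxiliary R^2."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]                                     # regress X_j ...
        others = np.delete(X, j, axis=1)                # ... on the other predictors
        A = np.column_stack([np.ones(len(y)), others])  # include an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                        # unrelated predictor

v = vif(np.column_stack([x1, x2, x3]))
print([round(x, 1) for x in v])   # x1 and x2 have large VIFs; x3 is near 1
```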
8.7 Heteroscedasticity
Detection: Residuals vs. fitted values plot (funnel/fan shape); Breusch-Pagan test (bptest() in the lmtest package).
Consequences: OLS estimates remain unbiased but are no longer minimum variance (BLUE). Standard errors are incorrect, invalidating t-tests and F-tests.
Remedies: Weighted least squares (WLS); heteroscedasticity-consistent (robust) standard errors (coeftest(model, vcov = vcovHC(model, type = "HC3")) from the sandwich package); log-transforming the response.
8.8 Model Selection: AIC, BIC, and Stepwise Regression
When many candidate predictors are available, model selection criteria help balance fit against complexity.
\[ AIC = -2\ln\hat{L} + 2(p+1) \]where \(\hat{L}\) is the maximized likelihood. Lower AIC is better. The penalty term \(2(p+1)\) discourages overfitting.
\[ BIC = -2\ln\hat{L} + (p+1)\ln n \]BIC applies a stronger penalty for model complexity than AIC, especially for large \(n\), and tends to select more parsimonious models.
Stepwise regression: An algorithmic procedure that adds or removes predictors one at a time based on a criterion (AIC, p-value, or \(R^2\)). Variants: forward selection (start empty, add best predictor iteratively), backward elimination (start full, remove least-significant predictor iteratively), stepwise (bidirectional). These are exploratory tools — results should be interpreted cautiously and validated on new data.
# AIC-based stepwise selection
library(MASS)
full_model <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = df)
step_model <- stepAIC(full_model, direction = "both")
summary(step_model)
# AIC and BIC for a given model
AIC(model_mlr)
BIC(model_mlr)
Chapter 9: Non-Parametric Tests
9.1 When to Use Non-Parametric Tests
Parametric tests (t-tests, ANOVA, regression) rely on distributional assumptions — primarily normality of the response or residuals. Non-parametric tests make fewer assumptions and are appropriate when:
- The sample is small and normality cannot be verified.
- The data are ordinal (ranked) rather than continuous.
- Outliers strongly distort the mean, making it an inappropriate summary.
- The Shapiro-Wilk test or QQ plot indicates significant non-normality.
Non-parametric tests trade statistical efficiency for robustness: they require larger samples to detect the same effect size.
9.2 Wilcoxon Signed-Rank Test (One-Sample / Paired)
Procedure:
- Compute differences \(d_i = x_i - M_0\) (or \(d_i = x_{1i} - x_{2i}\) for paired data).
- Discard zero differences; rank the absolute values of remaining differences.
- Compute \(W^+\) = sum of ranks with positive differences, \(W^-\) = sum of ranks with negative differences.
- Use \(W = \min(W^+, W^-)\) as the test statistic. Compare to the Wilcoxon critical value table, or use R.
For large samples (\(n \geq 10\)), \(W^+\) is approximately normal with:
\[ \mu_{W^+} = \frac{n(n+1)}{4}, \qquad \sigma_{W^+}^2 = \frac{n(n+1)(2n+1)}{24} \]
wilcox.test(x, mu = M_0, alternative = "two.sided") # One-sample
wilcox.test(after, before, paired = TRUE, alternative = "two.sided") # Paired
9.3 Mann-Whitney U Test (Two Independent Samples)
Procedure:
- Pool all observations from both groups and rank from smallest to largest (averaging ranks for ties).
- Compute \(R_1\) = sum of ranks for Group 1.
- \(U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1\); \(U_2 = n_1 n_2 - U_1\).
- Test statistic: \(U = \min(U_1, U_2)\).
For large samples, \(U\) is approximately normal with \(\mu_U = n_1 n_2 / 2\) and \(\sigma_U^2 = n_1 n_2(n_1 + n_2 + 1)/12\).
wilcox.test(group1, group2, alternative = "two.sided") # Mann-Whitney
Example: wait times (minutes) at two branches. Branch A: 3, 4, 5, 6, 7. Branch B: 7, 8, 9, 10, 12.
Pooling and ranking: 3(1), 4(2), 5(3), 6(4), 7(5.5), 7(5.5), 8(7), 9(8), 10(9), 12(10).
\(R_A = 1 + 2 + 3 + 4 + 5.5 = 15.5\); \(U_A = 5 \times 5 + 15 - 15.5 = 24.5\); \(U_B = 25 - 24.5 = 0.5\).
\(U = 0.5\). With \(n_1 = n_2 = 5\), the critical value at \(\alpha = 0.05\) two-tailed is \(U^* = 2\). Since \(0.5 < 2\), reject \(H_0\). Branch B has significantly longer wait times.
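The two samples implied by the ranks above can be run through scipy, which returns the U associated with the first sample (here equal to the minimum):

```python
from scipy.stats import mannwhitneyu

branch_a = [3, 4, 5, 6, 7]       # wait times (minutes)
branch_b = [7, 8, 9, 10, 12]

res = mannwhitneyu(branch_a, branch_b, alternative="two-sided")
print(res.statistic)             # 0.5, matching the hand calculation
```

Because of the tie at 7, scipy falls back to the normal approximation for the p-value, which still leads to rejecting \(H_0\) at \(\alpha = 0.05\).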
9.4 Kruskal-Wallis Test (Non-Parametric ANOVA)
All observations are pooled and ranked, and the test statistic is
\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]where \(R_i\) is the rank sum for group \(i\) and \(N = \sum n_i\). Under \(H_0\), \(H \sim \chi^2_{k-1}\) approximately (for \(n_i \geq 5\)).
kruskal.test(response ~ group, data = df)
Chapter 10: Tests for Proportions and the Chi-Square Distribution
10.1 One-Sample Test for a Proportion
Under \(H_0: p = p_0\), the test statistic is
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \]Valid when \(np_0 \geq 10\) and \(n(1 - p_0) \geq 10\).
p-value (upper-tailed) = \(P(Z > 1.565) \approx 0.059\). At \(\alpha = 0.05\), fail to reject \(H_0\) — marginal evidence. At \(\alpha = 0.10\), would reject. The auditor might request a larger sample or investigate the 22 errors identified.
10.2 The Chi-Square Distribution
The chi-square distribution arises naturally in: (a) tests for variance (\(\chi^2 = (n-1)s^2/\sigma_0^2\)), (b) goodness-of-fit and independence tests, and (c) as the distribution of the log-likelihood ratio in maximum likelihood estimation.
One-Sample Test for Variance
\[ H_0: \sigma^2 = \sigma_0^2, \quad \chi^2 = \frac{(n-1)s^2}{\sigma_0^2}, \quad df = n-1 \]This test is sensitive to departures from normality; use with caution.
Chapter 11: Introduction to Time Series Analysis
11.1 Components of a Time Series
A time series is a sequence of observations recorded at successive equally-spaced points in time. In business and finance: monthly sales, quarterly earnings, daily stock prices, weekly unemployment claims. Time series data violates the independence assumption of standard regression — observations close in time tend to be correlated.
- Trend (\(T_t\)): The long-term upward or downward direction of the series.
- Seasonal component (\(S_t\)): Regular, predictable fluctuations that repeat over a fixed period (e.g., higher retail sales every December).
- Cyclical component (\(C_t\)): Longer-term undulations associated with business cycles (expansions and contractions), with irregular duration.
- Irregular (random) component (\(I_t\)): Unpredictable, random noise remaining after the other components are accounted for.
Additive model: \(Y_t = T_t + S_t + C_t + I_t\) — appropriate when seasonal variation is roughly constant in absolute magnitude.
Multiplicative model: \(Y_t = T_t \times S_t \times C_t \times I_t\) — appropriate when seasonal variation grows proportionally with the trend. Can be linearized by taking logarithms: \(\ln Y_t = \ln T_t + \ln S_t + \ln C_t + \ln I_t\).
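R's decompose() performs exactly this additive/multiplicative split. A sketch on simulated monthly data (hypothetical trend and seasonal pattern) recovers the components:

```r
set.seed(1)
# Simulated monthly series: linear trend + fixed seasonal pattern + noise
trend_part    <- 0.5 * (1:48)
seasonal_part <- rep(c(5, 2, -3, -4, -1, 1, 4, 6, 3, -2, -6, -5), 4)
y <- ts(trend_part + seasonal_part + rnorm(48),
        start = c(2020, 1), frequency = 12)

dec <- decompose(y, type = "additive")  # estimated trend, seasonal, random parts
plot(dec)                               # one panel per component
```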
11.2 Moving Averages
A moving average smooths a time series by averaging consecutive windows of observations, removing short-term fluctuations and revealing the trend.
For \(m = 5\): \(\hat{T}_t = (Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2})/5\).
Choosing \(m\) equal to the length of the seasonal period (e.g., \(m = 12\) for monthly data, \(m = 4\) for quarterly data) eliminates the seasonal component and exposes the underlying trend-cycle. A centered moving average (\(2 \times 12\) MA) is used for even-period data to maintain alignment with the original time index.
4-period moving average:
- Centered at Q2'23–Q3'23: \((42+58+67+71)/4 = 59.5\)
- Centered at Q3'23–Q4'23: \((58+67+71+48)/4 = 61.0\)
- Centered at Q4'23–Q1'24: \((67+71+48+63)/4 = 62.25\)
- Centered at Q1'24–Q2'24: \((71+48+63+72)/4 = 63.5\)
- Centered at Q2'24–Q3'24: \((48+63+72+76)/4 = 64.75\)
The smoothed values reveal a steady upward trend (~1.3M per quarter, averaging the successive differences) that the raw seasonal swings obscure.
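The smoothed values above can be reproduced in one call with stats::filter(), which slides the 4-term average across the series:

```r
# Quarterly sales from the example, Q1'23 through Q4'24
sales <- c(42, 58, 67, 71, 48, 63, 72, 76)

# 4-period moving average; sides = 2 centers the window on each observation
ma4 <- stats::filter(sales, rep(1/4, 4), sides = 2)
ma4   # interior values 59.50 61.00 62.25 63.50 64.75, NA at the ends
```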
11.3 Exponential Smoothing
Exponential smoothing is a powerful forecasting method that assigns exponentially decreasing weights to past observations — recent data receives more weight than older data. Simple exponential smoothing (SES) forecasts via the recursion:
\[ \hat{Y}_{t+1} = \alpha Y_t + (1 - \alpha)\hat{Y}_t \]
where \(\alpha \in (0, 1)\) is the smoothing parameter. Large \(\alpha\) (near 1) weights recent observations heavily (more responsive, less smooth); small \(\alpha\) (near 0) weights historical observations more (smoother, slower to react).
Expanding the recursion:
\[ \hat{Y}_{t+1} = \alpha Y_t + \alpha(1-\alpha)Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots \]
The weights \(\alpha(1-\alpha)^j\) decline geometrically, confirming the exponential weighting. SES produces optimal forecasts for series with no trend and no seasonality (i.e., a random walk with noise).
Choosing \(\alpha\): Minimize the sum of squared forecast errors over the historical data:
\[ SSE(\alpha) = \sum_{t=2}^{T}(Y_t - \hat{Y}_t)^2 \]
In R: HoltWinters(ts, beta = FALSE, gamma = FALSE) fits SES and optimizes \(\alpha\) automatically.
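A minimal sketch (hypothetical series, \(\alpha = 0.3\)) confirms that the recursive and expanded forms of SES agree:

```r
alpha <- 0.3
y <- c(100, 104, 101, 99, 105, 107)   # hypothetical series

# Recursive form, initialized at the first observation
f <- y[1]
for (t in seq_along(y)) f <- alpha * y[t] + (1 - alpha) * f

# Expanded form: weights alpha * (1 - alpha)^j on Y_t, Y_{t-1}, ...
j <- 0:(length(y) - 1)
w <- alpha * (1 - alpha)^j
f_expanded <- sum(w * rev(y)) + (1 - alpha)^length(y) * y[1]

all.equal(f, f_expanded)   # TRUE: the two forms coincide
```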
11.4 Holt’s Two-Parameter Exponential Smoothing (Trend Corrected)
SES does not account for trend. Holt’s method adds a trend equation:
\[ \text{Level: } L_t = \alpha Y_t + (1-\alpha)(L_{t-1} + T_{t-1}) \]
\[ \text{Trend: } T_t = \beta(L_t - L_{t-1}) + (1-\beta)T_{t-1} \]
\[ \text{Forecast: } \hat{Y}_{t+h} = L_t + h T_t \]
where \(\alpha\) smooths the level and \(\beta\) smooths the trend. Both parameters are optimized by minimizing SSE.
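The level/trend recursions translate into a few lines of R. This sketch (hypothetical data; level and trend initialized from the first two observations) reproduces Holt's forecast equation; note that on a perfectly linear series it extends the line exactly, whatever \(\alpha\) and \(\beta\):

```r
# Holt's two-parameter exponential smoothing, hand-rolled
holt_forecast <- function(y, alpha, beta, h = 1) {
  L  <- y[1]           # initial level: first observation
  Tr <- y[2] - y[1]    # initial trend: first difference
  for (t in 2:length(y)) {
    L_prev <- L
    L  <- alpha * y[t] + (1 - alpha) * (L_prev + Tr)
    Tr <- beta * (L - L_prev) + (1 - beta) * Tr
  }
  L + h * Tr           # h-step-ahead forecast
}

holt_forecast(c(10, 12, 15, 16, 19, 21), alpha = 0.8, beta = 0.2, h = 2)
```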
11.5 Holt-Winters Seasonal Exponential Smoothing
Holt-Winters extends Holt’s method to handle seasonality of period \(s\):
Additive version:
\[ L_t = \alpha(Y_t - S_{t-s}) + (1-\alpha)(L_{t-1} + T_{t-1}) \]
\[ T_t = \beta(L_t - L_{t-1}) + (1-\beta)T_{t-1} \]
\[ S_t = \gamma(Y_t - L_t) + (1-\gamma)S_{t-s} \]
\[ \hat{Y}_{t+h} = L_t + hT_t + S_{t+h-s} \]
Three parameters to optimize: \(\alpha\) (level), \(\beta\) (trend), \(\gamma\) (seasonality).
In R:
library(forecast) # forecast() comes from the forecast package
ts_data <- ts(sales_vector, start = c(2020, 1), frequency = 12)
hw_model <- HoltWinters(ts_data)
forecast_hw <- forecast(hw_model, h = 12)
plot(forecast_hw)
11.6 Trend Regression
When a time series has a clear linear trend, a regression of \(Y_t\) on time \(t\) is a natural starting point:
\[ Y_t = \beta_0 + \beta_1 t + \varepsilon_t \]
Forecasting: \(\hat{Y}_{t} = \hat{\beta}_0 + \hat{\beta}_1 t\). Extend \(t\) beyond the sample to forecast.
For exponential (multiplicative) growth, linearize with a log transform: regress \(\ln Y_t\) on \(t\), then exponentiate the forecast.
Caution: Regression residuals in time series are often autocorrelated — the Durbin-Watson statistic tests for first-order autocorrelation (\(DW \approx 2\) indicates no autocorrelation; values \(<1.5\) or \(>2.5\) signal problems). Autocorrelated errors invalidate standard t-tests and confidence intervals.
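A sketch of the trend-regression workflow on simulated data, with the Durbin-Watson statistic computed by hand from its definition (sum of squared successive residual differences over the residual sum of squares):

```r
set.seed(42)
t_idx <- 1:60
y <- 50 + 2 * t_idx + rnorm(60, sd = 5)   # simulated linear-trend series

fit <- lm(y ~ t_idx)
e   <- residuals(fit)

# DW = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 suggest no autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)
dw

# Forecast 4 periods beyond the sample (an extrapolation; label it as such)
predict(fit, newdata = data.frame(t_idx = 61:64))
```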
Chapter 12: Putting It All Together — Statistical Workflow in R
12.1 A Complete Analysis Pipeline
A rigorous quantitative analysis in business follows a standard pipeline. Each stage connects to specific tools covered in AFM 113.
Stage 1 — Understand and clean the data:
library(tidyverse)
glimpse(df)
summary(df)
df %>% filter(is.na(revenue)) # Check missing values
Stage 2 — Explore distributions:
ggplot(df, aes(x = revenue)) + geom_histogram(bins = 20) + theme_minimal()
ggplot(df, aes(sample = revenue)) + stat_qq() + stat_qq_line()
shapiro.test(df$revenue) # Shapiro-Wilk normality test (valid for 3 <= n <= 5000)
Stage 3 — Choose and apply the right test:
| Question | Parametric Test | Non-Parametric Alternative |
|---|---|---|
| One group vs. target mean | One-sample t-test | Wilcoxon signed-rank |
| Two independent group means | Welch two-sample t-test | Mann-Whitney U |
| Two related group means | Paired t-test | Wilcoxon signed-rank (paired) |
| Three or more group means | One-way ANOVA + Tukey | Kruskal-Wallis |
| Two categorical variables | Chi-square independence test | Fisher’s exact test |
| One proportion vs. target | One-sample z-test for proportions | Exact binomial test |
| Two proportions | Two-sample z-test for proportions | Fisher’s exact test |
Stage 4 — Build a regression model:
model <- lm(revenue ~ ad_spend + stores + region, data = df)
summary(model) # Coefficients, SE, t-stats, p-values, R-squared, F-test
confint(model) # 95% CI for all coefficients
library(car); vif(model) # Multicollinearity check (vif is in the car package)
plot(model) # Residual diagnostics
Stage 5 — Forecast (time series):
ts_sales <- ts(df$sales, start = c(2020, 1), frequency = 12)
hw <- HoltWinters(ts_sales)
forecast(hw, h = 6) # 6-month-ahead forecast
12.2 Worked Comprehensive Example
Step 1 — Descriptive statistics reveal right-skewed revenue distribution (mean $312K, median $285K, SD $98K). Log-transform considered.
Step 2 — ANOVA tests whether mean revenue differs by region. F-statistic = 11.43 (\(p < 0.001\)). Tukey post-hoc shows: East significantly higher than West and South (adjusted p \(< 0.05\)); North vs. South not significant.
Step 3 — Correlation shows AdSpend (\(r = 0.71\)), StoreSize (\(r = 0.65\)), OnlineComp (\(r = -0.38\)) all correlated with Revenue.
Step 4 — MLR:
\[ \widehat{\text{Revenue}} = 88.3 + 3.42\,\text{AdSpend} + 5.18\,\text{StoreSize} - 1.22\,\text{OnlineComp} + \text{(region indicators)} \]
\(R^2 = 0.734\), adjusted \(R^2 = 0.726\), overall \(F = 87.3\) (\(p < 0.001\)). All predictors significant (\(p < 0.05\)). VIFs all below 3 (no serious multicollinearity). Residual plots show approximate homoscedasticity and normality.
Interpretation: Holding other factors constant, each $1{,}000 of additional advertising is associated with $3{,}420 of additional monthly revenue. An additional 1{,}000 sq ft of store size adds $5{,}180, on average. Each 1-point increase in the online competition index is associated with a $1{,}220 revenue decrease — highlighting the strategic importance of managing competitive pressure from e-commerce.
12.3 Common Errors and How to Avoid Them
| Error | Description | Correct Approach |
|---|---|---|
| Multiple testing | Running 10 tests at \(\alpha = 0.05\) without adjustment; familywise error \(\approx 40\%\) | Apply Bonferroni or FDR correction; use ANOVA instead of pairwise t-tests |
| p-hacking | Trying many models until \(p < 0.05\) appears | Pre-register hypotheses; report all tests performed |
| Extrapolation | Predicting far outside the range of observed \(X\) | Report the observed range; qualify predictions as extrapolations |
| Confusing CI and PI | Using a confidence interval for the mean to bound individual predictions | Use prediction intervals for individual outcomes |
| Ignoring residual diagnostics | Trusting regression results without checking assumptions | Always plot residuals vs. fitted, and QQ plot of residuals |
| Omitted variable bias | Leaving a confounding variable out of MLR | Use domain knowledge to include all relevant predictors; acknowledge limitations |
| Treating \(R^2\) as the only metric | High \(R^2\) can coexist with poor predictions, heteroscedasticity, multicollinearity | Report residual standard error, VIF, and diagnostic plots alongside \(R^2\) |
12.4 Glossary of Key Terms
| Term | Definition |
|---|---|
| p-value | Probability of observed (or more extreme) data under \(H_0\) |
| Significance level \(\alpha\) | Pre-set maximum Type I error rate |
| Statistical power \(1 - \beta\) | Probability of correctly rejecting a false \(H_0\) |
| Confidence interval | Range of plausible values for a parameter at a given confidence level |
| Prediction interval | Range for a single future observation; always wider than CI |
| F-statistic | Ratio of explained to unexplained variance (ANOVA, regression) |
| Residual | Difference between observed and fitted value: \(e_i = y_i - \hat{y}_i\) |
| SST / SSR / SSE | Total / Regression / Error sums of squares |
| Multicollinearity | High correlation among predictors in MLR |
| Heteroscedasticity | Non-constant error variance across predictor values |
| VIF | Variance inflation factor; measures severity of multicollinearity |
| AIC / BIC | Model selection criteria penalizing complexity |
| Moving average | Smoothed time series estimate using local window of observations |
| Exponential smoothing | Forecast weighting past observations geometrically |
| Holt-Winters | Triple exponential smoothing for trend + seasonality |
| MAPE | Mean absolute percentage error — common forecast accuracy metric |
Appendix A: Critical Value Reference Tables
Standard Normal Critical Values
| Confidence Level | \(\alpha\) | \(z_{\alpha/2}\) (two-tailed) |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
| 99.9% | 0.001 | 3.291 |
Student’s t Critical Values (two-tailed, \(\alpha = 0.05\))
| df | \(t_{0.025, df}\) | df | \(t_{0.025, df}\) |
|---|---|---|---|
| 5 | 2.571 | 20 | 2.086 |
| 10 | 2.228 | 30 | 2.042 |
| 15 | 2.131 | 60 | 2.000 |
| 18 | 2.101 | 120 | 1.980 |
| 19 | 2.093 | \(\infty\) | 1.960 |
Chi-Square Critical Values (\(\alpha = 0.05\), upper tail)
| df | \(\chi^2_{df, 0.05}\) | df | \(\chi^2_{df, 0.05}\) |
|---|---|---|---|
| 1 | 3.841 | 8 | 15.507 |
| 2 | 5.991 | 9 | 16.919 |
| 3 | 7.815 | 10 | 18.307 |
| 4 | 9.488 | 15 | 24.996 |
| 5 | 11.070 | 20 | 31.410 |
| 6 | 12.592 | 25 | 37.652 |
| 7 | 14.067 | 30 | 43.773 |
Appendix B: R Quick Reference for AFM 113
# ---- Descriptive Statistics ----
mean(x); median(x); sd(x); var(x); IQR(x)
summary(x)
table(group)
# ---- Normality Assessment ----
shapiro.test(x)
qqnorm(x); qqline(x)
# ---- One-Sample Tests ----
t.test(x, mu = mu0, alternative = "two.sided") # t-test
prop.test(x = k, n = n, p = p0, correct = FALSE) # Proportion z-test (chi-square form; correct = FALSE matches the z-statistic)
wilcox.test(x, mu = M0) # Wilcoxon signed-rank
# ---- Two-Sample Tests ----
t.test(x, y, alternative = "two.sided") # Welch t-test
t.test(x, y, paired = TRUE) # Paired t-test
prop.test(c(x1, x2), c(n1, n2)) # Two proportions
var.test(x, y) # F-test for variances
wilcox.test(x, y) # Mann-Whitney U
# ---- Chi-Square Tests ----
chisq.test(table_data) # Independence or GoF
chisq.test(x, p = c(0.25, 0.25, 0.25, 0.25)) # GoF with specified probs
# ---- ANOVA ----
model_aov <- aov(y ~ group, data = df)
summary(model_aov)
TukeyHSD(model_aov)
pairwise.t.test(y, group, p.adjust.method = "bonferroni")
kruskal.test(y ~ group, data = df) # Non-parametric ANOVA
# ---- Two-Way ANOVA ----
model_2way <- aov(y ~ factorA * factorB, data = df)
summary(model_2way)
interaction.plot(df$factorA, df$factorB, df$y)
# ---- Simple Linear Regression ----
model_slr <- lm(y ~ x, data = df)
summary(model_slr)
confint(model_slr)
predict(model_slr, newdata = data.frame(x = x_star), interval = "confidence")
predict(model_slr, newdata = data.frame(x = x_star), interval = "prediction")
plot(model_slr) # Four diagnostic plots
# ---- Multiple Linear Regression ----
model_mlr <- lm(y ~ x1 + x2 + x3, data = df)
summary(model_mlr)
library(car); vif(model_mlr)
library(MASS); stepAIC(model_mlr, direction = "both")
AIC(model_mlr); BIC(model_mlr)
# ---- Heteroscedasticity-Robust SEs ----
library(lmtest); library(sandwich)
coeftest(model_mlr, vcov = vcovHC(model_mlr, type = "HC3"))
# ---- Time Series ----
ts_data <- ts(y, start = c(2020, 1), frequency = 12)
plot(ts_data)
hw <- HoltWinters(ts_data)
library(forecast)
forecast_hw <- forecast(hw, h = 12)
plot(forecast_hw)
autoplot(forecast_hw)
Appendix C: Formula Sheet Summary
One-sample t-statistic:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad df = n-1 \]
Two-sample Welch t-statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]
Paired t-statistic:
\[ t = \frac{\bar{d}}{s_d/\sqrt{n}}, \quad df = n-1 \]
Proportion z-statistic (one-sample):
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \]
Chi-square statistic:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
OLS slope and intercept:
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
R-squared:
\[ R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST} \]
Adjusted R-squared:
\[ \bar{R}^2 = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} \]
ANOVA F-statistic:
\[ F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(N-k)} \]
Prediction interval for new observation:
\[ \hat{y}^* \pm t_{\alpha/2,\, n-2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \]
Simple exponential smoothing:
\[ \hat{Y}_{t+1} = \alpha Y_t + (1 - \alpha)\hat{Y}_t \]
Sample size for given power (two-tailed):
\[ n = \left(\frac{(z_{\alpha/2} + z_\beta)\sigma}{\delta}\right)^2 \]
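As a worked instance of the sample-size formula (illustrative inputs: \(\sigma = 10\), detectable difference \(\delta = 5\), \(\alpha = 0.05\) two-tailed, power 0.80):

```r
sigma <- 10; delta <- 5
z_alpha <- qnorm(0.975)   # 1.960 for alpha = 0.05, two-tailed
z_beta  <- qnorm(0.80)    # 0.842 for power = 0.80

n <- ((z_alpha + z_beta) * sigma / delta)^2
ceiling(n)   # 32: always round up to guarantee the target power
```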