AFM 113: Analytic Methods for Business 2
Estimated study time: 31 minutes
Sources and References
Primary textbook:
- Instructor’s teaching notes (distributed on LEARN).
- Balka, J. Introductory Statistics Explained (open-access supplemental text).

Supplementary:
- Devore, J.L. (2016). Probability and Statistics for Engineering and the Sciences (9th ed.). Cengage.
- Newbold, P., Carlson, W.L., & Thorne, B. (2013). Statistics for Business and Economics (8th ed.). Pearson.

Online resources:
- OpenIntro Statistics (openintro.org)
- MIT OCW 18.650: Statistics for Applications
- Khan Academy: Statistics and Probability
Chapter 1: Simple Linear Regression
Motivation and Setup
Simple linear regression (SLR) models the relationship between a single predictor variable and a continuous outcome variable. In business and accounting contexts, regression answers questions such as:
- How does advertising expenditure relate to sales revenue?
- How does a company’s stock return relate to the market return (the foundation of the CAPM beta)?
- How does the number of machine-hours predict manufacturing overhead cost (a key tool in cost estimation)?
The SLR model for observation \(i\) is:
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]where \(\beta_0\) is the population intercept, \(\beta_1\) is the population slope, and \(\varepsilon_i\) is the random error term for observation \(i\).
The error term \(\varepsilon_i\) captures all variation in \(Y\) that is not explained by the linear relationship with \(X\). The standard regression assumptions require that the errors be independent, normally distributed with mean zero and constant variance \(\sigma^2\) (homoscedasticity).
Ordinary Least Squares Estimation
The population parameters \(\beta_0\) and \(\beta_1\) are unknown and must be estimated from sample data. Ordinary Least Squares (OLS) finds the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the sum of squared residuals:
\[ \text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \]
The OLS formulas are:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \]\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]The resulting fitted line is \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\).
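As a quick check, the OLS formulas can be computed by hand and compared against R's built-in lm(). The small dataset below is made up for illustration; the variable names echo the machine-hours example but are not real course data.

```r
# Sketch: hand-computed OLS estimates vs. lm(), on made-up data
x <- c(2, 4, 6, 8, 10)          # e.g., machine-hours (illustrative)
y <- c(7, 11, 14, 20, 23)       # e.g., overhead cost (illustrative)

Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
b1  <- Sxy / Sxx                # slope: S_xy / S_xx
b0  <- mean(y) - b1 * mean(x)   # intercept: y-bar minus b1 * x-bar

fit <- lm(y ~ x)
c(b0, b1)                       # should match coef(fit)
```

Here \(S_{xy} = 82\) and \(S_{xx} = 40\), so \(\hat{\beta}_1 = 2.05\) and \(\hat{\beta}_0 = 2.7\), agreeing with lm().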
Interpreting the Coefficients
- Intercept \(\hat{\beta}_0\): The predicted value of \(Y\) when \(X = 0\). This may or may not have practical meaning depending on whether \(X = 0\) is within or near the observed range of the data.
- Slope \(\hat{\beta}_1\): The estimated change in \(Y\) for a one-unit increase in \(X\). If \(\hat{\beta}_1 = 2.5\) in a model predicting overhead cost from machine-hours, each additional machine-hour is associated with $2.50 of overhead on average.
Assessing Model Fit
Coefficient of Determination: R-squared
\[ R^2 = 1 - \frac{SSE}{SST} \]where \(SSE = \sum(y_i - \hat{y}_i)^2\) is the residual sum of squares and \(SST = \sum(y_i - \bar{y})^2\) is the total sum of squares.
\(R^2\) ranges from 0 (no linear relationship) to 1 (perfect linear fit). In SLR, \(R^2 = r^2\), the square of the correlation coefficient. Note that a high \(R^2\) does not confirm causality—it only confirms that a linear association exists in the sample data.
The Correlation Coefficient
The sample correlation coefficient measures the strength and direction of the linear relationship:
\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \]Values near ±1 indicate a strong linear relationship; values near 0 indicate a weak linear relationship (though a nonlinear relationship may still exist).
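The identity \(R^2 = r^2\) in SLR can be verified directly in R. The data below are made up for illustration.

```r
# Sketch: verify R^2 = r^2 in simple linear regression (made-up data)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)

fit <- lm(y ~ x)
r   <- cor(x, y)                  # sample correlation coefficient
R2  <- summary(fit)$r.squared     # R-squared from the fitted model
all.equal(R2, r^2)                # TRUE: identical up to rounding
```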
Regression in R
# Fit the model
model <- lm(overhead ~ machine_hours, data = factory_data)

# View results
summary(model)

# Plot the fitted line
library(ggplot2)
ggplot(factory_data, aes(x = machine_hours, y = overhead)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Overhead vs. Machine Hours",
       x = "Machine Hours", y = "Overhead Cost ($)")
The summary(model) output provides estimated coefficients, standard errors, t-statistics, p-values, and \(R^2\).
Chapter 2: Probability Distributions
Discrete Probability Distributions
Properties of a valid PMF: (1) \(P(X = x) \geq 0\) for all \(x\), and (2) \(\sum_x P(X = x) = 1\).
Binomial Distribution
The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n \]where \(n\) is the number of trials and \(p\) is the probability of success on each trial.
In R: dbinom(3, size = 20, prob = 0.10)
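Building on the dbinom call above (n = 20 trials, p = 0.10), the related functions pbinom (cumulative probability) and the PMF properties can be checked directly:

```r
# Binomial probabilities with n = 20 trials, p = 0.10
dbinom(3, size = 20, prob = 0.10)          # P(X = 3)
pbinom(3, size = 20, prob = 0.10)          # P(X <= 3)
sum(dbinom(0:20, size = 20, prob = 0.10))  # PMF sums to 1 (valid PMF check)
20 * 0.10                                  # mean of a binomial: n * p
```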
Poisson Distribution
The Poisson distribution models the number of events occurring in a fixed interval of time or space, when events occur independently at a constant average rate.
\[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \]where \(\lambda\) is the average number of events per interval. Both the mean and variance equal \(\lambda\).
Applications in accounting and finance: number of wire transfers per hour, number of customer complaints per month, number of audit exceptions per 100 transactions.
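A short sketch of the Poisson functions in R; the rate \(\lambda = 4\) (e.g., four wire transfers per hour) is an illustrative value, not from the course notes:

```r
# Poisson sketch: lambda = 4 events per interval (illustrative rate)
lambda <- 4
dpois(2, lambda)                 # P(X = 2) = e^-4 * 4^2 / 2!
ppois(2, lambda)                 # P(X <= 2)
1 - ppois(5, lambda)             # P(X > 5)
c(mean = lambda, var = lambda)   # mean and variance both equal lambda
```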
Chapter 3: The Normal Distribution
Properties of the Normal Distribution
Key properties:
- Symmetric about the mean: the left and right halves are mirror images.
- Mean = Median = Mode.
- The empirical rule: approximately 68% of observations fall within \(\pm 1\sigma\) of the mean; 95% within \(\pm 2\sigma\); 99.7% within \(\pm 3\sigma\).
- The total area under the curve equals 1.
The normal distribution is the cornerstone of classical statistics. Many real-world phenomena (measurement errors, returns on diversified portfolios, certain biological measurements) are approximately normally distributed. More importantly, the Central Limit Theorem (Chapter 5) guarantees that sampling distributions of means are approximately normal even when the underlying population is not.
The Standard Normal Distribution
Any normal random variable can be standardized:
\[ Z = \frac{X - \mu}{\sigma} \]The z-score tells us how many standard deviations an observation is from the mean. Standardization allows the use of a single set of probability tables (or the pnorm() function in R) for any normal distribution.
Computing Normal Probabilities
P(X < a): Standardize to find \(z = (a - \mu)/\sigma\); then \(P(X < a) = P(Z < z) = \Phi(z)\).
P(X > a): \(P(X > a) = 1 - \Phi(z)\).
P(a < X < b): \(\Phi(z_b) - \Phi(z_a)\) where \(z_a = (a - \mu)/\sigma\) and \(z_b = (b - \mu)/\sigma\).
In R:
pnorm(q, mean = mu, sd = sigma) # P(X <= q)
1 - pnorm(q, mean = mu, sd = sigma) # P(X > q)
qnorm(p, mean = mu, sd = sigma) # quantile: value with P(X <= x) = p
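As a numeric check of the standardization logic, computing \(P(X < a)\) directly and via the z-score gives the same answer. The values \(\mu = 100\), \(\sigma = 15\), \(a = 120\) below are illustrative:

```r
# Check: P(X < a) computed directly equals Phi(z) after standardizing
mu <- 100; sigma <- 15; a <- 120   # illustrative values
z  <- (a - mu) / sigma             # z-score: distance from mean in SDs
pnorm(a, mean = mu, sd = sigma)    # direct calculation
pnorm(z)                           # standardized: same probability
```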
Assessing Normality
Before applying methods that assume normality, check whether the assumption is reasonable:
- Histogram: Should be approximately symmetric and bell-shaped.
- QQ plot (Quantile-Quantile plot): Plots sample quantiles against theoretical normal quantiles. If the data are normally distributed, the points fall approximately on a straight line.
- Shapiro-Wilk test (formal test): Tests the null hypothesis that the data come from a normal distribution. In R: shapiro.test(x). A small p-value provides evidence against normality.
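All three checks can be run in a few lines. The simulated sample below is illustrative; since it is drawn from a normal distribution, the checks should (typically) show no evidence against normality.

```r
# Sketch: the three normality checks on a simulated (known-normal) sample
set.seed(1)
x <- rnorm(100, mean = 50, sd = 5)   # illustrative data

hist(x)                 # should look roughly symmetric and bell-shaped
qqnorm(x); qqline(x)    # points should fall near the reference line
shapiro.test(x)         # typically a large p-value for normal data
```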
Chapter 4: Sampling Distributions and the Central Limit Theorem
Populations and Samples
Statistical inference is the process of drawing conclusions about a population from a sample. Key vocabulary:
Because statistics vary from sample to sample, they are random variables with their own probability distributions.
The Sampling Distribution of the Sample Mean
If we repeatedly draw samples of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\), and compute \(\bar{x}\) each time, the collection of all possible \(\bar{x}\) values forms the sampling distribution of the sample mean.
Key results:
- \(E(\bar{X}) = \mu\) — the sample mean is an unbiased estimator of the population mean.
- \(\text{Std Dev}(\bar{X}) = \frac{\sigma}{\sqrt{n}}\) — called the standard error of the mean.
- As \(n\) increases, the standard error decreases: larger samples yield more precise estimates.
The Central Limit Theorem (CLT)
The CLT is one of the most profound results in all of statistics: if the sample size \(n\) is sufficiently large, the sampling distribution of \(\bar{X}\) is approximately normal with mean \(\mu\) and standard error \(\sigma/\sqrt{n}\), regardless of the shape of the population. This explains why normal-based inference works even when the underlying population is skewed or non-normal, provided the sample is large enough (typically \(n \geq 30\) as a rule of thumb, though more for highly skewed populations).
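The CLT can be seen in a short simulation. The choice of a skewed exponential population, the sample size of 30, and the 5,000 replications below are illustrative choices, not from the course notes:

```r
# CLT simulation sketch: sample means from a skewed (exponential) population
set.seed(42)
n     <- 30                                        # sample size
means <- replicate(5000, mean(rexp(n, rate = 1)))  # 5000 sample means

mean(means)   # close to the population mean mu = 1
sd(means)     # close to sigma / sqrt(n) = 1 / sqrt(30)
hist(means)   # approximately bell-shaped despite the skewed population
```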
Law of Large Numbers
The Law of Large Numbers (LLN) is a related but distinct result: as sample size \(n\) increases without bound, the sample mean \(\bar{x}\) converges to the true population mean \(\mu\). The LLN provides the theoretical guarantee that estimating population parameters from large samples is a sound strategy.
Chapter 5: Confidence Intervals
The Logic of a Confidence Interval
A point estimate (e.g., \(\bar{x} = 42.3\)) is a single number estimated from the sample. Because different samples would produce different estimates, we acknowledge sampling uncertainty by constructing an interval estimate—a range of plausible values for the population parameter.
Confidence Interval for a Population Mean (known \(\sigma\))
When the population standard deviation \(\sigma\) is known:
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]where \(z_{\alpha/2}\) is the critical value from the standard normal distribution. For a 95% CI, \(z_{0.025} = 1.960\).
Confidence Interval for a Population Mean (unknown \(\sigma\))
In practice, \(\sigma\) is almost never known. When it must be estimated from the sample as \(s\), the correct distribution is the Student’s t-distribution with \(n-1\) degrees of freedom:
\[ \bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} \]
Confidence Interval for a Population Proportion
For a proportion \(p\) estimated by \(\hat{p} = x/n\):
\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]This approximation is valid when \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\) (the success-failure condition).
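The proportion interval can be built by hand in R. The counts below (x = 45 successes out of n = 200) reuse the numbers from the prop.test example later in these notes:

```r
# Sketch: hand-built 95% CI for a proportion (x = 45 out of n = 200)
x <- 45; n <- 200
phat <- x / n                            # sample proportion: 0.225
se   <- sqrt(phat * (1 - phat) / n)      # standard error of phat
phat + c(-1, 1) * qnorm(0.975) * se      # lower and upper 95% limits
c(n * phat, n * (1 - phat))              # success-failure check: both >= 10
```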
Interpreting Confidence Intervals
A common misinterpretation is: “There is a 95% probability that \(\mu\) lies in this interval.” Technically, \(\mu\) is fixed (not random); it either lies in the interval or does not. The correct interpretation is: “This method of constructing intervals will capture the true \(\mu\) in 95% of repeated applications.” Nevertheless, in practice, treating a 95% CI as representing plausible values for \(\mu\) is a useful working heuristic.
Margin of Error and Sample Size
The margin of error (half-width of the CI) is \(E = z_{\alpha/2} \cdot \sigma/\sqrt{n}\). Solving for \(n\):
\[ n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 \]This formula allows analysts to determine the sample size required to achieve a desired precision—an important consideration in audit sampling and survey design.
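A quick sample-size calculation following the formula above; the assumed \(\sigma = 12\) and desired margin of error \(E = 2\) are illustrative values:

```r
# Sample-size sketch: n needed for margin of error E at 95% confidence
sigma <- 12                # assumed population SD (illustrative)
E     <- 2                 # desired margin of error
z     <- qnorm(0.975)      # 1.96 for 95% confidence
n     <- (z * sigma / E)^2
ceiling(n)                 # always round UP so the target precision is met
```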
Chapter 6: Hypothesis Testing
The Framework of Hypothesis Testing
Hypothesis testing is a formal procedure for making decisions about population parameters based on sample evidence. It begins with two competing hypotheses: the null hypothesis \(H_0\), the default or status-quo claim assumed true unless the data provide strong evidence otherwise, and the alternative hypothesis \(H_a\), the claim the analyst seeks evidence for.
Steps in Hypothesis Testing
- State \(H_0\) and \(H_a\) clearly.
- Choose the significance level \(\alpha\) (commonly 0.05 or 0.01) — the probability of a Type I error the analyst is willing to tolerate.
- Compute the test statistic from the sample data.
- Determine the p-value (or compare test statistic to critical value).
- Make a decision: Reject \(H_0\) if p-value \(< \alpha\).
- Interpret the result in the context of the business question.
Type I and Type II Errors
| Decision | \(H_0\) is actually true | \(H_0\) is actually false |
|---|---|---|
| Fail to reject \(H_0\) | Correct decision | Type II Error (miss, probability \(\beta\)) |
| Reject \(H_0\) | Type I Error (false alarm, probability \(\alpha\)) | Correct decision (Power = \(1-\beta\)) |
One-Sample Tests for a Mean
Z-test (when \(\sigma\) is known):
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
t-test (when \(\sigma\) is unknown, which is almost always):
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad \text{df} = n - 1 \]
One-Tailed vs. Two-Tailed Tests
- Two-tailed (\(H_a: \mu \neq \mu_0\)): Reject if \(|t| > t_{\alpha/2, n-1}\). Used when interested in deviations in either direction.
- One-tailed, upper (\(H_a: \mu > \mu_0\)): Reject if \(t > t_{\alpha, n-1}\).
- One-tailed, lower (\(H_a: \mu < \mu_0\)): Reject if \(t < -t_{\alpha, n-1}\).
Worked example: a sample of \(n = 36\) transactions has mean processing time \(\bar{x} = 3.9\) days with sample standard deviation \(s = 1.2\) days. Does the mean processing time exceed the 3.5-day target?
Hypotheses: \(H_0: \mu = 3.5\), \(H_a: \mu > 3.5\).
Test statistic: \(t = (3.9 - 3.5) / (1.2 / \sqrt{36}) = 0.4 / 0.2 = 2.0\), df = 35.
p-value (one-tailed): In R, pt(2.0, df = 35, lower.tail = FALSE) ≈ 0.027.
Conclusion at \(\alpha = 0.05\): p = 0.027 < 0.05, so we reject \(H_0\). There is significant evidence that mean processing time exceeds 3.5 days.
Test for a Population Proportion
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \]This test applies when \(np_0 \geq 10\) and \(n(1-p_0) \geq 10\).
Chi-Square Test for Goodness of Fit and Independence
The chi-square test for independence tests whether two categorical variables are associated in a contingency table:
\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]where \(O_{ij}\) is the observed count and \(E_{ij} = (\text{row total} \times \text{column total}) / n\) is the expected count under independence.
Degrees of freedom = \((r-1)(c-1)\) where \(r\) and \(c\) are the number of rows and columns.
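The expected counts and degrees of freedom can be inspected directly from chisq.test() in R. The 2×2 table below is made up for illustration; note that R applies the Yates continuity correction by default for 2×2 tables.

```r
# Chi-square independence sketch on a small made-up 2x2 table
tab <- matrix(c(30, 20,
                10, 40), nrow = 2, byrow = TRUE)

test <- chisq.test(tab)   # Yates continuity correction applied (2x2 default)
test$expected             # expected counts E_ij under independence
test$parameter            # df = (r - 1)(c - 1) = 1
```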
Chapter 7: Two-Sample Inference
Comparing Two Population Means
Many business decisions require comparing two groups: Did the experimental marketing campaign outperform the control? Is the default rate higher for one credit tier than another? Does average transaction value differ between mobile and desktop channels?
Independent Samples t-Test
When two samples are drawn independently from separate populations:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]where \(D_0\) is the hypothesized difference (usually 0 under \(H_0\)).
Degrees of freedom are estimated using the Welch-Satterthwaite approximation:
\[ df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} \]This is the default in R’s t.test() function (Welch’s t-test, which does not assume equal variances).
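The Welch-Satterthwaite formula can be checked against the df that t.test() reports. The simulated groups below (sizes, means, and SDs) are illustrative:

```r
# Sketch: Welch-Satterthwaite df by hand vs. t.test()'s reported df
set.seed(7)
g1 <- rnorm(12, mean = 10, sd = 2)   # illustrative group 1
g2 <- rnorm(15, mean = 11, sd = 3)   # illustrative group 2

s1 <- var(g1); s2 <- var(g2)
n1 <- length(g1); n2 <- length(g2)
df_ws <- (s1/n1 + s2/n2)^2 /
         ((s1/n1)^2 / (n1 - 1) + (s2/n2)^2 / (n2 - 1))

tt <- t.test(g1, g2)                 # Welch's t-test (R default)
c(df_ws, unname(tt$parameter))       # the two df values should match
```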
Paired Samples t-Test
When each observation in one group is logically matched to an observation in the other group (before/after measurements, matched subjects), the paired design is more powerful because it eliminates between-subject variability. The analysis reduces to a one-sample t-test on the differences \(d_i = x_{1i} - x_{2i}\):
\[ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} \]
Comparing Two Proportions
\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]where \(\hat{p} = (x_1 + x_2)/(n_1 + n_2)\) is the pooled proportion estimate.
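The pooled estimate and test statistic can be computed by hand in R. The counts below (60/300 vs. 45/300) are made up for illustration:

```r
# Two-proportion z sketch: pooled estimate and test statistic (made-up counts)
x1 <- 60; n1 <- 300   # group 1: 20% success rate
x2 <- 45; n2 <- 300   # group 2: 15% success rate

p1 <- x1 / n1; p2 <- x2 / n2
pool <- (x1 + x2) / (n1 + n2)    # pooled proportion: 0.175
z <- (p1 - p2) / sqrt(pool * (1 - pool) * (1/n1 + 1/n2))
2 * pnorm(-abs(z))               # two-sided p-value
```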
Chapter 8: Bringing It Together — Statistical Inference in R
Conducting Tests in R
R provides built-in functions for most standard inference procedures. Mastery involves knowing which function to call, how to specify hypotheses, and how to read the output.
# One-sample t-test
t.test(x, mu = 3.5, alternative = "greater")
# Independent samples t-test (Welch, default)
t.test(group1, group2, alternative = "two.sided")
# Paired t-test
t.test(after, before, paired = TRUE, alternative = "two.sided")
# Test for a proportion
prop.test(x = 45, n = 200, p = 0.20, alternative = "greater")
# Chi-square test of independence
chisq.test(contingency_table)
# Linear regression
model <- lm(y ~ x, data = df)
summary(model)
confint(model) # Confidence intervals for coefficients
Connecting Regression and Inference
The slope estimate \(\hat{\beta}_1\) in simple linear regression is itself a random variable (it varies across samples). Inference for the slope uses the t-distribution:
\[ t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}, \quad df = n - 2 \]A significant t-test for the slope (\(p < \alpha\)) indicates that there is evidence of a linear relationship between \(X\) and \(Y\) in the population.
A \((1-\alpha)\) confidence interval for the slope:
\[ \hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot SE(\hat{\beta}_1) \]
Prediction Intervals vs. Confidence Intervals in Regression
There are two types of interval estimates in the regression context:
- Confidence interval for the mean response: Estimates the average value of \(Y\) for all individuals with a given \(X = x^*\). Narrower.
- Prediction interval for an individual response: Estimates where a single new observation with \(X = x^*\) will fall. Wider, because it includes both estimation uncertainty and individual-level variability.
predict(model, newdata = data.frame(x = x_star), interval = "confidence")
predict(model, newdata = data.frame(x = x_star), interval = "prediction")
Term Project: Statistical Analysis for a Non-Technical Audience
The term project synthesizes the entire AFM 113 curriculum. Teams of 4–5 students work on a business analytics case that requires them to:
- Describe the dataset and its key variables using descriptive statistics and visualizations.
- Apply inferential methods: confidence intervals and/or hypothesis tests relevant to a business question.
- Model a relationship between variables using simple linear regression and interpret the results.
- Communicate findings in a report intended for a non-technical business audience — using plain language, meaningful visualizations, and actionable conclusions.
Translating technical statistical results for non-specialists is a critical professional skill. Saying “the p-value of 0.03 is less than \(\alpha = 0.05\), so we reject the null” is technically correct but uninformative to a non-statistician. A professional communicator translates this to: “Our analysis provides strong evidence that customers who received the targeted promotion spent, on average, 18% more than those who did not, and this difference is unlikely to be explained by random chance.”