AFM 113: Analytic Methods for Business 2

Estimated study time: 31 minutes

Sources and References

  • Primary textbook — Instructor’s teaching notes (distributed on LEARN); Balka, J. Introductory Statistics Explained (open-access supplemental text).
  • Supplementary — Devore, J.L. (2016). Probability and Statistics for Engineering and the Sciences (9th ed.). Cengage; Newbold, P., Carlson, W.L., & Thorne, B. (2013). Statistics for Business and Economics (8th ed.). Pearson.
  • Online resources — OpenIntro Statistics (openintro.org); MIT OCW 18.650: Statistics for Applications; Khan Academy Statistics and Probability.


Chapter 1: Simple Linear Regression

Motivation and Setup

Simple linear regression (SLR) models the relationship between a single predictor variable and a continuous outcome variable. In business and accounting contexts, regression answers questions such as:

  • How does advertising expenditure relate to sales revenue?
  • How does a company’s stock return relate to the market return (the foundation of the CAPM beta)?
  • How does the number of machine-hours predict manufacturing overhead cost (a key tool in cost estimation)?
Simple Linear Regression Model: A statistical model that describes the relationship between a response (dependent) variable \(Y\) and a predictor (independent) variable \(X\) using the equation: \[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where \(\beta_0\) is the population intercept, \(\beta_1\) is the population slope, and \(\varepsilon_i\) is the random error term for observation \(i\).

The error term \(\varepsilon_i\) captures all variation in \(Y\) that is not explained by the linear relationship with \(X\). The standard regression assumptions require that the errors be independent, normally distributed with mean zero and constant variance \(\sigma^2\) (homoscedasticity).

Ordinary Least Squares Estimation

The population parameters \(\beta_0\) and \(\beta_1\) are unknown and must be estimated from sample data. Ordinary Least Squares (OLS) finds the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the sum of squared residuals:

\[ \text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \]

The OLS formulas are:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The resulting fitted line is \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\).
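The OLS formulas can be checked directly in R against the built-in lm() function. A minimal sketch, using a small made-up dataset (the numbers are purely illustrative):

```r
# Small illustrative dataset (made-up numbers)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# OLS estimates computed from the formulas
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
beta1_hat <- Sxy / Sxx                      # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)  # intercept

c(beta0_hat, beta1_hat)

# lm() gives the same estimates
coef(lm(y ~ x))
```

For this toy dataset both approaches give an intercept of 2.2 and a slope of 0.6.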

Interpreting the Coefficients

  • Intercept \(\hat{\beta}_0\): The predicted value of \(Y\) when \(X = 0\). This may or may not have practical meaning depending on whether \(X = 0\) is within or near the observed range of the data.
  • Slope \(\hat{\beta}_1\): The estimated change in \(Y\) for a one-unit increase in \(X\). If \(\hat{\beta}_1 = 2.5\) in a model predicting overhead cost from machine-hours, each additional machine-hour is associated with $2.50 of overhead on average.

Assessing Model Fit

Coefficient of Determination: R-squared

R-squared (\(R^2\)): The proportion of the total variation in \(Y\) that is explained by the linear relationship with \(X\). \[ R^2 = 1 - \frac{SSE}{SST} \]

where \(SSE = \sum(y_i - \hat{y}_i)^2\) is the residual sum of squares and \(SST = \sum(y_i - \bar{y})^2\) is the total sum of squares.

\(R^2\) ranges from 0 (no linear relationship) to 1 (perfect linear fit). In SLR, \(R^2 = r^2\), the square of the correlation coefficient. Note that a high \(R^2\) does not imply causation; it only indicates that a strong linear association exists in the sample data.

The Correlation Coefficient

Pearson correlation coefficient \(r\): Measures the strength and direction of the linear association between two variables. It ranges from \(-1\) to \(+1\). \[ r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2 \cdot \sum(y_i - \bar{y})^2}} \]

Values near ±1 indicate a strong linear relationship; values near 0 indicate a weak linear relationship (though a nonlinear relationship may still exist).

Regression in R

# Fit the model
model <- lm(overhead ~ machine_hours, data = factory_data)

# View results
summary(model)

# Plot the fitted line
library(ggplot2)
ggplot(factory_data, aes(x = machine_hours, y = overhead)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Overhead vs. Machine Hours",
       x = "Machine Hours", y = "Overhead Cost ($)")

The summary(model) output provides estimated coefficients, standard errors, t-statistics, p-values, and \(R^2\).


Chapter 2: Probability Distributions

Discrete Probability Distributions

Probability distribution: A function that describes the likelihood of each possible value of a random variable. For a discrete random variable, the probability mass function (PMF) gives \(P(X = x)\) for each value \(x\).

Properties of a valid PMF: (1) \(P(X = x) \geq 0\) for all \(x\), and (2) \(\sum_x P(X = x) = 1\).

Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.

\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n \]

where \(n\) is the number of trials and \(p\) is the probability of success on each trial.

Binomial Distribution parameters: \(n\) (number of trials) and \(p\) (probability of success). Mean \(\mu = np\); Variance \(\sigma^2 = np(1-p)\).
Example: An auditor selects 20 invoices from a population where 10% of invoices contain an error. What is the probability that exactly 3 invoices have errors? \[ P(X = 3) = \binom{20}{3}(0.10)^3(0.90)^{17} = 1140 \times 0.001 \times 0.1668 \approx 0.190 \]

In R: dbinom(3, size = 20, prob = 0.10)

Poisson Distribution

The Poisson distribution models the number of events occurring in a fixed interval of time or space, when events occur independently at a constant average rate.

\[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \]

where \(\lambda\) is the average number of events per interval. Both the mean and variance equal \(\lambda\).

Applications in accounting and finance: number of wire transfers per hour, number of customer complaints per month, number of audit exceptions per 100 transactions.
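Poisson probabilities follow the same pattern in R as the binomial. A sketch, assuming for illustration an average of \(\lambda = 2\) wire transfers per hour:

```r
lambda <- 2                    # illustrative average rate per hour

dpois(0, lambda = lambda)      # P(X = 0) = e^(-2), about 0.135
dpois(3, lambda = lambda)      # P(X = 3), about 0.180
ppois(3, lambda = lambda)      # P(X <= 3), about 0.857
1 - ppois(3, lambda = lambda)  # P(X >= 4)
```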


Chapter 3: The Normal Distribution

Properties of the Normal Distribution

Normal distribution: A symmetric, bell-shaped continuous probability distribution characterized by its mean \(\mu\) and standard deviation \(\sigma\). Notation: \(X \sim N(\mu, \sigma^2)\).

Key properties:

  • Symmetric about the mean: the left and right halves are mirror images.
  • Mean = Median = Mode.
  • The empirical rule: approximately 68% of observations fall within \(\pm 1\sigma\) of the mean; 95% within \(\pm 2\sigma\); 99.7% within \(\pm 3\sigma\).
  • The total area under the curve equals 1.
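The empirical rule percentages can be verified numerically with pnorm():

```r
# Area within k standard deviations of the mean, standard normal
pnorm(1) - pnorm(-1)   # about 0.6827
pnorm(2) - pnorm(-2)   # about 0.9545
pnorm(3) - pnorm(-3)   # about 0.9973
```

Note that the 95% figure in the rule is a rounding: the exact area within \(\pm 2\sigma\) is about 95.45%, while exactly 95% of the area lies within \(\pm 1.96\sigma\).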

The normal distribution is the cornerstone of classical statistics. Many real-world phenomena (measurement errors, returns on diversified portfolios, certain biological measurements) are approximately normally distributed. More importantly, the Central Limit Theorem (Chapter 4) guarantees that sampling distributions of means are approximately normal even when the underlying population is not.

The Standard Normal Distribution

Standard normal distribution: A normal distribution with mean 0 and standard deviation 1. Denoted \(Z \sim N(0, 1)\). The variable \(Z\) is called a z-score.

Any normal random variable can be standardized:

\[ Z = \frac{X - \mu}{\sigma} \]

The z-score tells us how many standard deviations an observation is from the mean. Standardization allows the use of a single set of probability tables (or the pnorm() function in R) for any normal distribution.

Computing Normal Probabilities

P(X < a): Standardize to find \(z = (a - \mu)/\sigma\); then \(P(X < a) = P(Z < z) = \Phi(z)\).

P(X > a): \(P(X > a) = 1 - \Phi(z)\).

P(a < X < b): \(\Phi(z_b) - \Phi(z_a)\) where \(z_a = (a - \mu)/\sigma\) and \(z_b = (b - \mu)/\sigma\).

In R:

pnorm(q, mean = mu, sd = sigma)           # P(X <= q)
1 - pnorm(q, mean = mu, sd = sigma)       # P(X > q)
qnorm(p, mean = mu, sd = sigma)           # quantile: value with P(X <= x) = p
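For instance, suppose scores are modeled as \(X \sim N(100, 15^2)\) (illustrative numbers):

```r
mu <- 100; sigma <- 15

pnorm(120, mean = mu, sd = sigma)             # P(X <= 120), about 0.909
1 - pnorm(120, mean = mu, sd = sigma)         # P(X > 120), about 0.091
pnorm(115, mu, sigma) - pnorm(85, mu, sigma)  # P(85 < X < 115), about 0.683
qnorm(0.95, mean = mu, sd = sigma)            # 95th percentile, about 124.7
```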

Assessing Normality

Before applying methods that assume normality, check whether the assumption is reasonable:

  • Histogram: Should be approximately symmetric and bell-shaped.
  • QQ plot (Quantile-Quantile plot): Plots sample quantiles against theoretical normal quantiles. If the data are normally distributed, the points fall approximately on a straight line.
  • Shapiro-Wilk test (formal test): Tests the null hypothesis that the data come from a normal distribution. In R: shapiro.test(x). A small p-value provides evidence against normality.
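All three checks can be sketched in R on any numeric vector (simulated data here, purely for illustration):

```r
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)  # simulated data for illustration

hist(x)           # roughly symmetric and bell-shaped?
qqnorm(x)         # points approximately on a straight line?
qqline(x)         # reference line for the QQ plot
shapiro.test(x)   # small p-value is evidence against normality
```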

Chapter 4: Sampling Distributions and the Central Limit Theorem

Populations and Samples

Statistical inference is the process of drawing conclusions about a population from a sample. Key vocabulary:

Population: The entire collection of individuals or measurements of interest. A population parameter (e.g., the true mean \(\mu\) or proportion \(p\)) is a fixed but typically unknown quantity.
Sample: A subset of the population, selected (ideally) by a random mechanism. A sample statistic (e.g., the sample mean \(\bar{x}\) or proportion \(\hat{p}\)) is computed from the sample and used to estimate the population parameter.

Because statistics vary from sample to sample, they are random variables with their own probability distributions.

The Sampling Distribution of the Sample Mean

If we repeatedly draw samples of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\), and compute \(\bar{x}\) each time, the collection of all possible \(\bar{x}\) values forms the sampling distribution of the sample mean.

Key results:

  • \(E(\bar{X}) = \mu\) — the sample mean is an unbiased estimator of the population mean.
  • \(\text{Std Dev}(\bar{X}) = \frac{\sigma}{\sqrt{n}}\) — called the standard error of the mean.
  • As \(n\) increases, the standard error decreases: larger samples yield more precise estimates.

The Central Limit Theorem (CLT)

Central Limit Theorem: If a random sample of size \(n\) is drawn from any population with mean \(\mu\) and finite standard deviation \(\sigma\), then as \(n\) becomes large, the sampling distribution of the sample mean \(\bar{X}\) approaches a normal distribution with mean \(\mu\) and standard deviation \(\sigma / \sqrt{n}\), regardless of the shape of the original population distribution. \[ \bar{X} \sim N\!\left(\mu,\ \frac{\sigma^2}{n}\right) \quad \text{approximately, for large } n \]

The CLT is one of the most profound results in all of statistics. It explains why normal-based inference works even when the underlying population is skewed or non-normal—provided the sample is large enough (typically \(n \geq 30\) as a rule of thumb, though more for highly skewed populations).
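A small simulation makes the CLT concrete. Here we repeatedly sample from a heavily right-skewed exponential population and examine the distribution of the sample means; the population and settings are illustrative:

```r
set.seed(123)
n <- 30        # sample size
reps <- 10000  # number of repeated samples

# Exponential population with mean 1 and sd 1 (right-skewed)
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbar)   # close to the population mean, 1
sd(xbar)     # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183
hist(xbar)   # approximately bell-shaped despite the skewed population
```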

Law of Large Numbers

The Law of Large Numbers (LLN) is a related but distinct result: as sample size \(n\) increases without bound, the sample mean \(\bar{x}\) converges to the true population mean \(\mu\). The LLN provides the theoretical guarantee that estimating population parameters from large samples is a sound strategy.


Chapter 5: Confidence Intervals

The Logic of a Confidence Interval

A point estimate (e.g., \(\bar{x} = 42.3\)) is a single number estimated from the sample. Because different samples would produce different estimates, we acknowledge sampling uncertainty by constructing an interval estimate—a range of plausible values for the population parameter.

Confidence interval (CI): An interval computed from sample data that, with a stated probability (the confidence level), contains the true population parameter. A 95% confidence interval means that if we repeated the sampling procedure many times, approximately 95% of the computed intervals would contain the true parameter.

Confidence Interval for a Population Mean (known \(\sigma\))

When the population standard deviation \(\sigma\) is known:

\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]

where \(z_{\alpha/2}\) is the critical value from the standard normal distribution. For a 95% CI, \(z_{0.025} = 1.960\).

Confidence Interval for a Population Mean (unknown \(\sigma\))

In practice, \(\sigma\) is almost never known. When it must be estimated from the sample as \(s\), the correct distribution is the Student’s t-distribution with \(n-1\) degrees of freedom:

\[ \bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} \]
Student's t-distribution: A family of symmetric, bell-shaped distributions indexed by degrees of freedom (df). It has heavier tails than the normal distribution, reflecting the additional uncertainty from estimating \(\sigma\). As df → ∞, the t-distribution approaches the standard normal.
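In practice the t-interval is rarely computed by hand; t.test() reports it directly. A sketch with made-up processing-time data:

```r
# Made-up sample of processing times (days), for illustration
times <- c(3.2, 3.8, 4.1, 3.5, 3.9, 4.4, 3.7, 4.0)

t.test(times, conf.level = 0.95)$conf.int  # 95% CI for the mean

# The same interval from the formula
n <- length(times)
mean(times) + c(-1, 1) * qt(0.975, df = n - 1) * sd(times) / sqrt(n)
```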

Confidence Interval for a Population Proportion

For a proportion \(p\) estimated by \(\hat{p} = x/n\):

\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

This approximation is valid when \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\) (the success-failure condition).

Interpreting Confidence Intervals

A common misinterpretation is: “There is a 95% probability that \(\mu\) lies in this interval.” Technically, \(\mu\) is fixed (not random); it either lies in the interval or does not. The correct interpretation is: “This method of constructing intervals will capture the true \(\mu\) in 95% of repeated applications.” Nevertheless, in practice, treating a 95% CI as representing plausible values for \(\mu\) is a useful working heuristic.

Margin of Error and Sample Size

The margin of error (half-width of the CI) is \(E = z_{\alpha/2} \cdot \sigma/\sqrt{n}\). Solving for \(n\):

\[ n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 \]

This formula allows analysts to determine the sample size required to achieve a desired precision—an important consideration in audit sampling and survey design. Because \(n\) must be a whole number, always round the result up.
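A quick sketch of the calculation, assuming for illustration \(\sigma = 15\), a desired margin of error of 2, and 95% confidence:

```r
sigma <- 15           # assumed population sd (illustrative)
E     <- 2            # desired margin of error
z     <- qnorm(0.975) # 1.96 for 95% confidence

n_required <- ceiling((z * sigma / E)^2)  # round up to the next whole number
n_required                                # 217 for these inputs
```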


Chapter 6: Hypothesis Testing

The Framework of Hypothesis Testing

Hypothesis testing is a formal procedure for making decisions about population parameters based on sample evidence. It begins with competing hypotheses:

Null hypothesis (\(H_0\)): A statement of "no effect" or "no difference" — the baseline claim. It is assumed true until sufficient evidence against it accumulates.
Alternative hypothesis (\(H_a\) or \(H_1\)): The claim the analyst wants to support — that there is an effect, a difference, or a departure from the null.

Steps in Hypothesis Testing

  1. State \(H_0\) and \(H_a\) clearly.
  2. Choose the significance level \(\alpha\) (commonly 0.05 or 0.01) — the probability of a Type I error the analyst is willing to tolerate.
  3. Compute the test statistic from the sample data.
  4. Determine the p-value (or compare test statistic to critical value).
  5. Make a decision: Reject \(H_0\) if p-value \(< \alpha\).
  6. Interpret the result in the context of the business question.

Type I and Type II Errors

|  | \(H_0\) is actually true | \(H_0\) is actually false |
| --- | --- | --- |
| Fail to reject \(H_0\) | Correct decision | Type II Error (miss, probability \(\beta\)) |
| Reject \(H_0\) | Type I Error (false alarm, probability \(\alpha\)) | Correct decision (Power = \(1-\beta\)) |
p-value: The probability of obtaining a test statistic at least as extreme as the observed value, assuming \(H_0\) is true. A small p-value provides evidence against \(H_0\).

One-Sample Tests for a Mean

Z-test (when \(\sigma\) is known):

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

t-test (when \(\sigma\) is unknown, which is almost always):

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad \text{df} = n - 1 \]

One-Tailed vs. Two-Tailed Tests

  • Two-tailed (\(H_a: \mu \neq \mu_0\)): Reject if \(|t| > t_{\alpha/2, n-1}\). Used when interested in deviations in either direction.
  • One-tailed, upper (\(H_a: \mu > \mu_0\)): Reject if \(t > t_{\alpha, n-1}\).
  • One-tailed, lower (\(H_a: \mu < \mu_0\)): Reject if \(t < -t_{\alpha, n-1}\).
Example: An internal auditor suspects that the average invoice processing time has increased above the standard of 3.5 days. A sample of 36 recent invoices yields \(\bar{x} = 3.9\) days with \(s = 1.2\) days.

Hypotheses: \(H_0: \mu = 3.5\), \(H_a: \mu > 3.5\).

Test statistic: \(t = (3.9 - 3.5) / (1.2 / \sqrt{36}) = 0.4 / 0.2 = 2.0\), df = 35.

p-value (one-tailed): In R, pt(2.0, df = 35, lower.tail = FALSE) ≈ 0.027.

Conclusion at \(\alpha = 0.05\): p = 0.027 < 0.05, so we reject \(H_0\). There is significant evidence that mean processing time exceeds 3.5 days.
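With only the summary statistics available, the whole test can be reproduced in R (numbers taken from the example above):

```r
xbar <- 3.9; s <- 1.2; n <- 36; mu0 <- 3.5

t_stat <- (xbar - mu0) / (s / sqrt(n))               # 2.0
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE) # one-tailed p-value

t_stat
p_val  # about 0.027, so reject H0 at alpha = 0.05
```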

Test for a Population Proportion

\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \]

This test applies when \(np_0 \geq 10\) and \(n(1-p_0) \geq 10\).

Chi-Square Test for Goodness of Fit and Independence

The chi-square test for independence tests whether two categorical variables are associated in a contingency table:

\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

where \(O_{ij}\) is the observed count and \(E_{ij} = (\text{row total} \times \text{column total}) / n\) is the expected count under independence.

Degrees of freedom = \((r-1)(c-1)\) where \(r\) and \(c\) are the number of rows and columns.
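In R, chisq.test() handles the expected counts and degrees of freedom automatically. A sketch with a made-up 2×2 contingency table (note that R applies Yates' continuity correction to 2×2 tables by default; correct = FALSE matches the hand formula above):

```r
# Made-up counts: audit exceptions (yes/no) by processing location (A/B)
counts <- matrix(c(30, 20, 10, 40), nrow = 2,
                 dimnames = list(location  = c("A", "B"),
                                 exception = c("yes", "no")))

result <- chisq.test(counts, correct = FALSE)
result$statistic  # chi-square statistic (here about 16.67)
result$parameter  # df = (2-1)(2-1) = 1
result$expected   # expected counts under independence
```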


Chapter 7: Two-Sample Inference

Comparing Two Population Means

Many business decisions require comparing two groups: Did the experimental marketing campaign outperform the control? Is the default rate higher for one credit tier than another? Does average transaction value differ between mobile and desktop channels?

Independent Samples t-Test

When two samples are drawn independently from separate populations:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

where \(D_0\) is the hypothesized difference (usually 0 under \(H_0\)).

Degrees of freedom are estimated using the Welch-Satterthwaite approximation:

\[ df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} \]

This is the default in R’s t.test() function (Welch’s t-test, which does not assume equal variances).

Paired Samples t-Test

When each observation in one group is logically matched to an observation in the other group (before/after measurements, matched subjects), the paired design is more powerful because it eliminates between-subject variability. The analysis reduces to a one-sample t-test on the differences \(d_i = x_{1i} - x_{2i}\):

\[ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} \]
Example: A company implements a new accounts payable process and measures payment cycle time for 15 vendors before and after. The paired design controls for the fact that some vendors always take longer to process than others—the before/after comparison within each vendor is what matters.

Comparing Two Proportions

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]

where \(\hat{p} = (x_1 + x_2)/(n_1 + n_2)\) is the pooled proportion estimate.
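This is what prop.test() computes for two samples (it reports \(z^2\) as a chi-square statistic; correct = FALSE matches the formula above). The counts below are illustrative:

```r
# Illustrative: 45 defaults out of 200 in tier 1, 30 out of 200 in tier 2
result <- prop.test(x = c(45, 30), n = c(200, 200),
                    alternative = "two.sided", correct = FALSE)

result$statistic       # chi-square statistic = z^2
sqrt(result$statistic) # |z|, about 1.92 for these counts
result$p.value         # two-sided p-value
```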


Chapter 8: Bringing It Together — Statistical Inference in R

Conducting Tests in R

R provides built-in functions for most standard inference procedures. Mastery involves knowing which function to call, how to specify hypotheses, and how to read the output.

# One-sample t-test
t.test(x, mu = 3.5, alternative = "greater")

# Independent samples t-test (Welch, default)
t.test(group1, group2, alternative = "two.sided")

# Paired t-test
t.test(after, before, paired = TRUE, alternative = "two.sided")

# Test for a proportion
prop.test(x = 45, n = 200, p = 0.20, alternative = "greater")

# Chi-square test of independence
chisq.test(contingency_table)

# Linear regression
model <- lm(y ~ x, data = df)
summary(model)
confint(model)   # Confidence intervals for coefficients

Connecting Regression and Inference

The slope estimate \(\hat{\beta}_1\) in simple linear regression is itself a random variable (it varies across samples). Inference for the slope uses the t-distribution:

\[ t = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)}, \quad df = n - 2 \]

A significant t-test for the slope (\(p < \alpha\)) indicates that there is evidence of a linear relationship between \(X\) and \(Y\) in the population.

A \((1-\alpha)\) confidence interval for the slope:

\[ \hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot SE(\hat{\beta}_1) \]

Prediction Intervals vs. Confidence Intervals in Regression

There are two types of interval estimates in the regression context:

  • Confidence interval for the mean response: Estimates the average value of \(Y\) for all individuals with a given \(X = x^*\). Narrower.
  • Prediction interval for an individual response: Estimates where a single new observation with \(X = x^*\) will fall. Wider, because it includes both estimation uncertainty and individual-level variability.
predict(model, newdata = data.frame(x = x_star), interval = "confidence")
predict(model, newdata = data.frame(x = x_star), interval = "prediction")

Term Project: Statistical Analysis for a Non-Technical Audience

The term project synthesizes the entire AFM 113 curriculum. Teams of 4–5 students work on a business analytics case that requires them to:

  1. Describe the dataset and its key variables using descriptive statistics and visualizations.
  2. Apply inferential methods: confidence intervals and/or hypothesis tests relevant to a business question.
  3. Model a relationship between variables using simple linear regression and interpret the results.
  4. Communicate findings in a report intended for a non-technical business audience — using plain language, meaningful visualizations, and actionable conclusions.

Translating technical statistical results for non-specialists is a critical professional skill. Saying “the p-value of 0.03 is less than \(\alpha = 0.05\), so we reject the null” is technically correct but uninformative to a non-statistician. A professional communicator translates this to: “Our analysis provides strong evidence that customers who received the targeted promotion spent, on average, 18% more than those who did not, and this difference is unlikely to be explained by random chance.”
