STAT 332: Sampling and Experimental Design
Riley Metzger
Estimated study time: 1 hr 25 min
Sources and References
- Primary notes: Cameron Roopnarine (Hextical), STAT 332: Sampling and Experimental Design, Winter 2021, hextical.github.io/university-notes
- Textbooks: Lohr, Sampling: Design and Analysis, 3rd ed. (2022); Särndal, Swensson, and Wretman, Model Assisted Survey Sampling (1992)
Chapter 1: Introduction to Survey Sampling
The goal of survey sampling is to learn about a finite population by observing only a subset of its units. This chapter establishes the conceptual framework (populations, samples, errors) and introduces the Horvitz-Thompson estimator, the foundational tool of design-based inference.
1.1 The PPDAC Framework
Every statistical investigation follows five stages: Problem, Plan, Data, Analysis, Conclusion.
Problem. Identify the question and the target population (TP) — the group about which information is sought. The response is the quantity each unit provides, and the attribute is the summary of interest (e.g., the population mean).
Plan. Specify the study population (SP), which is the set of units that can actually be observed. The SP need not equal the TP; for example, a drug study may target humans but observe mice. A sample is a subset of the SP drawn according to a well-defined protocol.
Data. Collect the observations according to the plan.
Analysis. Apply statistical methods to the data.
Conclusion. Relate findings back to the original problem, being mindful of three common errors:
- Study error: the attribute of the TP differs from that of the SP.
- Sampling error: the sample statistic differs from the population parameter.
- Measurement error: what we want to measure differs from what we actually measure.
1.2 Finite Populations and Basic Quantities
Consider a finite population of \(N\) units with responses \(y_1, \ldots, y_N\). The population mean and variance are
\[ \mu = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \mu)^2. \]
Note that the population variance uses \(N-1\) in the denominator, following the convention in Lohr (2022). The population total is \(\tau = N\mu = \sum_{i=1}^{N} y_i\). When the response is binary (\(y_i \in \{0,1\}\)), the mean is the population proportion
\[ \pi = \frac{1}{N}\sum_{i=1}^{N} y_i. \]
1.3 Samples, Inclusion Probabilities, and Sampling Designs
A sampling design assigns a selection probability to every possible sample \(\mathcal{S} \subseteq \{1, \ldots, N\}\). For unit \(i\), let \(I_i = \mathbb{1}(i \in \mathcal{S})\) be the sample inclusion indicator, \(\pi_i = P(i \in \mathcal{S})\) the first-order inclusion probability, and \(\pi_{ij} = P(i \in \mathcal{S},\, j \in \mathcal{S})\) the second-order inclusion probability. Then \(E[I_i] = \pi_i\), \(\text{Var}(I_i) = \pi_i(1-\pi_i)\), and \(\text{Cov}(I_i, I_j) = \pi_{ij} - \pi_i \pi_j\) for \(i \neq j\).
1.4 The Horvitz-Thompson Estimator
The cornerstone of design-based inference is the Horvitz-Thompson (HT) estimator for the population total:
\[ \hat{\tau}_{\text{HT}} = \sum_{i \in \mathcal{S}} \frac{y_i}{\pi_i}. \]
It is unbiased for \(\tau\) under any design with \(\pi_i > 0\) for every unit.
The corresponding HT estimator of the population mean is \(\hat{\mu}_{\text{HT}} = \hat{\tau}_{\text{HT}} / N\).
This framework applies to any probability sampling design. The following chapters specialize it to particular designs.
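As a minimal illustration, the HT estimate is a one-line computation once the inclusion probabilities are known. The sketch below uses hypothetical values for the sampled units and an assumed population size \(N = 50\):

```python
import numpy as np

# Hypothetical toy sample: responses and first-order inclusion
# probabilities pi_i for the sampled units only.
y = np.array([12.0, 7.5, 9.3, 15.1])       # observed responses
pi = np.array([0.25, 0.10, 0.25, 0.40])    # P(unit i is sampled)

tau_hat = np.sum(y / pi)   # HT estimator of the population total
N = 50                     # population size (assumed known)
mu_hat = tau_hat / N       # HT estimator of the population mean
print(tau_hat, mu_hat)
```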
Chapter 2: Simple Random Sampling
Simple random sampling without replacement (SRSWOR, or simply SRS) is the most fundamental probability sampling design. Every subset of size \(n\) is equally likely.
2.1 Definition and Inclusion Probabilities
Since every subset of size \(n\) is equally likely, each unit appears in the sample with probability \(\pi_i = n/N\), and each pair of distinct units with probability \(\pi_{ij} = \frac{n(n-1)}{N(N-1)}\).
2.2 The HT Estimator under SRS
Substituting \(\pi_i = n/N\) into the HT estimator gives
\[ \hat{\tau}_{\text{HT}} = \sum_{i \in \mathcal{S}} \frac{y_i}{n/N} = N \bar{y}, \]
\[ \hat{\mu}_{\text{HT}} = \bar{y} = \frac{1}{n}\sum_{i \in \mathcal{S}} y_i. \]
2.3 Estimating Means, Totals, and Proportions
The sample mean \(\bar{y}\) is an unbiased estimator of \(\mu\), and \(N\bar{y}\) is an unbiased estimator of \(\tau\). The variance of the sample mean is \(\text{Var}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n}\), where \(1 - n/N\) is the finite population correction.
2.4 Sample Size Determination
To achieve a margin of error \(E\) with confidence multiplier \(c\), set
\[ E = c\,\frac{\hat{\sigma}}{\sqrt{n}}\sqrt{1 - \frac{n}{N}} \]
and solve for \(n\):
\[ n = \left(\frac{E^2}{c^2 \hat{\sigma}^2} + \frac{1}{N}\right)^{-1}. \]
For proportions, the variance \(\pi(1-\pi)\) is maximized at \(\pi = 1/2\), so a conservative sample size uses \(\hat{\sigma}^2 = 1/4\).
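A short sketch of this calculation; the inputs below (margin of error 0.03, population of 10,000) are illustrative, and the conservative \(\hat{\sigma}^2 = 1/4\) for a proportion is used:

```python
import numpy as np

def srs_sample_size(E, sigma2, N, c=1.96):
    """Smallest n so that c*sigma/sqrt(n)*sqrt(1 - n/N) <= E
    under SRS without replacement."""
    n = 1.0 / (E**2 / (c**2 * sigma2) + 1.0 / N)
    return int(np.ceil(n))

# Worst case for a proportion: sigma^2 = 1/4.
print(srs_sample_size(E=0.03, sigma2=0.25, N=10_000))  # -> 965
```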
2.5 CLT for Finite Populations and Confidence Intervals
Under SRS, a finite-population central limit theorem gives
\[ \frac{\bar{y} - \mu}{\sqrt{(1 - n/N)\,\sigma^2/n}} \;\xrightarrow{d}\; \mathcal{N}(0,1). \]
This yields the approximate confidence interval for the mean,
\[ \bar{y} \;\pm\; z_{\alpha/2}\,\frac{\hat{\sigma}}{\sqrt{n}}\,\sqrt{1 - \frac{n}{N}}, \]
and for a proportion,
\[ \hat{\pi} \;\pm\; z_{\alpha/2}\,\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}\left(1-\frac{n}{N}\right)}. \]
Chapter 3: Stratification
Stratification divides the population into non-overlapping subgroups (strata) and samples independently within each stratum. This can improve precision substantially when strata are internally homogeneous.
3.1 Stratified Simple Random Sampling
Partition the population into \(H\) strata, where stratum \(h\) contains \(N_h\) units (\(\sum_h N_h = N\)), and draw an independent SRS of size \(n_h\) within each stratum. Let \(w_h = N_h/N\) denote the stratum weights.
3.2 The HT Estimator under Stratified SRS
\[ \hat{\mu}_{\text{str}} = \sum_{h=1}^{H} w_h \bar{y}_h, \qquad \bar{y}_h = \frac{1}{n_h}\sum_{i \in \mathcal{S}_h} y_i, \]
with variance
\[ \text{Var}(\hat{\mu}_{\text{str}}) = \sum_{h=1}^{H} w_h^2 \left(1 - \frac{n_h}{N_h}\right)\frac{\sigma_h^2}{n_h}. \]
For proportions, replace \(\sigma_h^2\) with \(\pi_h(1-\pi_h)\) and use the estimator \(\hat{\pi}_{\text{str}} = \sum_{h=1}^{H} w_h \hat{\pi}_h\).
3.3 Optimal Allocation
The allocation question is: given a fixed total sample size \(n\), how should we distribute \(n_1, \ldots, n_H\) to minimize \(\text{Var}(\hat{\mu}_{\text{str}})\)?
Proportional Allocation
Allocate in proportion to stratum size: \(n_h = n \cdot w_h\). This is simple and guarantees that each unit has the same overall inclusion probability \(n/N\).
Neyman (Optimal) Allocation
Allocate more to strata with higher variability. Minimizing \(\text{Var}(\hat{\mu}_{\text{str}})\) subject to \(\sum n_h = n\) via Lagrange multipliers yields
\[ n_h = n\,\frac{w_h \sigma_h}{\sum_{k=1}^{H} w_k \sigma_k}. \]
In practice, the stratum variances are unknown. A common approach is to take a small pilot sample to estimate them.
3.4 Poststratification
Poststratification is useful when stratum membership is unknown before sampling, or when administrative convenience prevents separate sampling.
3.5 Comparing Stratified Designs: A Worked Example
Suppose a total sample of \(n = 100\) is to be allocated across the \(N = 1000\) employees of a firm with three departments:
| Stratum \(h\) | Department | \(N_h\) | \(w_h\) | \(\sigma_h\) |
|---|---|---|---|---|
| 1 | Engineering | 500 | 0.50 | 2.0 |
| 2 | Sales | 300 | 0.30 | 3.0 |
| 3 | Admin | 200 | 0.20 | 1.5 |
SRS (no stratification): \(\sigma^2 \approx \sum w_h[\sigma_h^2 + (\mu_h - \mu)^2]\). If department means differ, the overall variance includes a between-stratum component, making SRS less efficient.
Proportional allocation gives \(n_h = (50, 30, 20)\), while Neyman allocation gives \(n_h \approx (45, 41, 14)\). Relative to proportional allocation, Neyman allocation shifts 11 units into Sales (the most variable stratum), drawn from Engineering and Admin, reducing \(\text{Var}(\hat{\mu}_{\text{str}})\) by roughly 7%.
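To make the comparison concrete, here is a minimal Python sketch of both allocations and the resulting variances of \(\hat{\mu}_{\text{str}}\), using the table above and the assumed total sample size \(n = 100\):

```python
import numpy as np

N_h = np.array([500, 300, 200])      # stratum sizes
sigma_h = np.array([2.0, 3.0, 1.5])  # stratum standard deviations
N, n = N_h.sum(), 100                # total population and sample size
w_h = N_h / N

def var_str(n_h):
    # Var of the stratified mean: sum w_h^2 (1 - n_h/N_h) sigma_h^2 / n_h
    return np.sum(w_h**2 * (1 - n_h / N_h) * sigma_h**2 / n_h)

n_prop = n * w_h                                    # proportional: (50, 30, 20)
n_ney = n * w_h * sigma_h / np.sum(w_h * sigma_h)   # Neyman: ~(45, 41, 14)
print(n_ney.round(1), var_str(n_prop), var_str(n_ney))
```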
Chapter 4: Auxiliary Information
When a variable \(x\) correlated with the response \(y\) is known for the entire population (or at least its population mean \(\mu_x\) is known), we can use this auxiliary information to improve estimation.
4.1 The Regression Estimator
The regression estimator assumes a linear working model between the response and the auxiliary variable,
\[ y_i = \alpha + \beta(x_i - \bar{x}) + R_i, \qquad R_i \sim \mathcal{N}(0, \sigma^2), \]
with least-squares estimates
\[ \hat{\alpha} = \bar{y}, \qquad \hat{\beta} = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i \in \mathcal{S}} y_i(x_i - \bar{x})}{\sum_{i \in \mathcal{S}} (x_i - \bar{x})^2}, \]
where \(s_{xy} = S_{xy}/(n-1)\) and \(s_x^2 = S_{xx}/(n-1)\). The regression estimator of \(\mu_y\) is
\[ \hat{\mu}_{\text{reg}} = \bar{y} + \hat{\beta}(\mu_x - \bar{x}). \]
When the sample mean of \(x\) equals the population mean (\(\bar{x} = \mu_x\)), the regression estimator collapses to the SRS estimator \(\bar{y}\). When \(\bar{x} \neq \mu_x\), the correction term \(\hat{\beta}(\mu_x - \bar{x})\) adjusts for the discrepancy.
4.2 The Ratio Estimator
When \(y\) is roughly proportional to \(x\) (a line through the origin), the ratio estimator is
\[ \hat{\mu}_{\text{ratio}} = \frac{\bar{y}}{\bar{x}}\,\mu_x, \]
with confidence interval
\[ \hat{\mu}_{\text{ratio}} \;\pm\; c\,\frac{\hat{\sigma}_{\text{ratio}}}{\sqrt{n}}\,\sqrt{1 - \frac{n}{N}}, \]
where \(\hat{\sigma}_{\text{ratio}}^2 = W/(n-1)\) and \(W = \sum_{i \in \mathcal{S}}(y_i - \hat{R}x_i)^2\), with \(\hat{R} = \bar{y}/\bar{x}\), is the residual sum of squares from the ratio model.
Ratio Estimation for Subgroup Means
When the goal is to estimate the mean of a subgroup (e.g., the mean grade of male students), ratio estimation applies naturally. If \(z_i\) is a binary indicator for subgroup membership, the subgroup mean is \(\theta = \mu / \pi\) where \(\mu\) is the average of \(y_i z_i\) and \(\pi\) is the proportion of units in the subgroup. The estimate is \(\hat{\theta} = \hat{\mu}/\hat{\pi}\).
A Taylor (delta-method) expansion gives
\[ \frac{\tilde{\mu}}{\tilde{\pi}} \approx \frac{\mu}{\pi} + \frac{1}{\pi}(\tilde{\mu} - \mu) - \frac{\mu}{\pi^2}(\tilde{\pi} - \pi), \]
so that
\[ \text{Var}\!\left(\frac{\tilde{\mu}}{\tilde{\pi}}\right) \approx \frac{1}{\pi^2}\,\text{Var}\!\left(\tilde{\mu} - \frac{\mu}{\pi}\tilde{\pi}\right). \]
The resulting confidence interval is
\[ \hat{\theta} \;\pm\; c\,\frac{1}{\hat{\pi}}\,\frac{\hat{\sigma}_{\text{ratio}}}{\sqrt{n}}\,\sqrt{1 - \frac{n}{N}}, \qquad \hat{\sigma}_{\text{ratio}}^2 = \frac{\sum_{i \in \mathcal{S}}(y_i - \hat{\theta} z_i)^2}{n-1}. \]
4.3 Comparing SRS, Regression, and Ratio Estimators
| Technique | Estimate | CI |
|---|---|---|
| SRS | \(\hat{\mu}_y = \bar{y}\) | \(\bar{y} \pm c\,\frac{\hat{\sigma}_y}{\sqrt{n}}\sqrt{1 - n/N}\) |
| Regression | \(\hat{\mu}_{\text{reg}} = \bar{y} + \hat{\beta}(\mu_x - \bar{x})\) | \(\hat{\mu}_{\text{reg}} \pm c\,\frac{\hat{\sigma}_r}{\sqrt{n}}\sqrt{1 - n/N}\) |
| Ratio | \(\hat{\mu}_{\text{ratio}} = \frac{\bar{y}}{\bar{x}}\mu_x\) | \(\hat{\mu}_{\text{ratio}} \pm c\,\frac{\hat{\sigma}_{\text{ratio}}}{\sqrt{n}}\sqrt{1 - n/N}\) |
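A small simulation makes the comparison concrete. The population below is synthetic, with \(y\) roughly linear in \(x\), so the regression and ratio estimators should track \(\mu_y\) more closely than the plain SRS mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite population where y is linearly related to x.
N = 1000
x = rng.uniform(10, 50, size=N)
y = 2.0 + 0.8 * x + rng.normal(0, 2, size=N)
mu_x, mu_y = x.mean(), y.mean()

n = 50
idx = rng.choice(N, size=n, replace=False)   # SRS without replacement
xs, ys = x[idx], y[idx]

mu_srs = ys.mean()
beta = np.sum(ys * (xs - xs.mean())) / np.sum((xs - xs.mean())**2)
mu_reg = mu_srs + beta * (mu_x - xs.mean())   # regression estimator
mu_ratio = mu_srs / xs.mean() * mu_x          # ratio estimator
print(mu_y, mu_srs, mu_reg, mu_ratio)
```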
4.4 The Hajek Estimator
The Hajek estimator of the population mean is
\[ \hat{\mu}_{\text{H}} = \frac{\sum_{i \in \mathcal{S}} y_i/\pi_i}{\sum_{i \in \mathcal{S}} 1/\pi_i}. \]
It can be viewed as a ratio estimator where the auxiliary variable is the constant \(x_i = 1\). It is particularly useful when \(N\) is unknown or when the design produces highly variable weights.
Chapter 5: Cluster and Two-Stage Sampling
In many practical surveys, a list of individual units does not exist, but a list of clusters (groups of units) does. Cluster sampling selects entire clusters first, then observes units within them.
5.1 Single-Stage Cluster Sampling
Suppose the population consists of \(M\) clusters with totals \(t_1, \ldots, t_M\), and an SRS of \(m\) clusters \(\mathcal{S}_c\) is selected, with every unit in a chosen cluster observed. The HT estimator of the population total and its variance are
\[ \hat{\tau} = \frac{M}{m}\sum_{j \in \mathcal{S}_c} t_j, \qquad \text{Var}(\hat{\tau}) = M^2\,\frac{1 - f_c}{m}\,S_t^2, \]
where \(f_c = m/M\), \(\bar{t} = \tau/M\), and \(S_t^2 = \frac{1}{M-1}\sum_{j=1}^{M}(t_j - \bar{t})^2\). The estimated variance replaces \(S_t^2\) with the sample variance of observed cluster totals, \(s_t^2\).
When cluster sizes \(N_j\) vary, a ratio estimator of the population mean is
\[ \hat{\bar{y}}_r = \frac{\sum_{j \in \mathcal{S}_c} t_j}{\sum_{j \in \mathcal{S}_c} N_j}. \]
Design Effect for Cluster Sampling
The design effect compares the variance under cluster sampling to that under an SRS of the same number of units:
\[ \text{deff} = \frac{\text{Var}_{\text{cluster}}(\hat{\bar{y}})}{\text{Var}_{\text{SRS}}(\hat{\bar{y}})}. \]
For clusters of average size \(\bar{N}\),
\[ \text{deff} \approx 1 + (\bar{N} - 1)\rho, \]
where \(\rho\) is the intraclass correlation coefficient measuring the similarity of units within the same cluster. When \(\rho > 0\) (units within a cluster are more alike than units across clusters), the design effect exceeds 1, meaning cluster sampling is less efficient than SRS.
5.2 Two-Stage Sampling: SRS-SRS
In two-stage sampling, an SRS of \(m\) clusters is drawn, and then an SRS of \(n_j\) units is drawn within each selected cluster \(j\). The estimator of the population total is
\[ \hat{\tau} = \frac{M}{m}\sum_{j \in \mathcal{S}_c} N_j\,\bar{y}_j, \]
where \(\bar{y}_j\) is the sample mean within cluster \(j\).
\[ \text{Var}(\hat{\tau}) = M^2 \frac{1 - f_c}{m}\, S_b^2 + \frac{M}{m}\sum_{j=1}^{M} N_j^2 \frac{1 - f_j}{n_j}\, S_{wj}^2, \]
where \(S_b^2\) is the population variance of the cluster totals \(t_j = N_j\bar{Y}_j\), \(S_{wj}^2\) is the within-cluster variance, \(f_c = m/M\), and \(f_j = n_j/N_j\). The first term reflects uncertainty from sampling clusters; the second reflects uncertainty from subsampling within clusters.
5.3 STSRS-SRS Designs
When the first stage uses stratified SRS of clusters and the second stage uses SRS within selected clusters, we combine stratification’s variance reduction with cluster sampling’s operational convenience.
Suppose the \(M\) clusters are partitioned into \(H\) strata, with stratum \(h\) containing \(M_h\) clusters. We select \(m_h\) clusters from stratum \(h\) by SRS, and then subsample \(n_{hj}\) units within each selected cluster.
\[ \hat{\tau}_{\text{STSRS-SRS}} = \sum_{h=1}^{H} \frac{M_h}{m_h} \sum_{j=1}^{m_h} N_{hj}\,\bar{y}_{hj}, \]and its variance combines the stratified between-cluster term with within-cluster terms from each stratum. Stratifying clusters by geography, institution type, or size reduces the between-cluster variance component, yielding tighter confidence intervals than unstratified cluster sampling.
5.4 Probability Proportional to Size Sampling
When clusters vary greatly in size, SRS of clusters is inefficient because a sample dominated by small clusters underrepresents the population. PPS sampling corrects this.
Under PPS with replacement, cluster \(j\) is selected on each of \(m\) draws with probability \(p_j = N_j/N\), and the Hansen-Hurwitz estimator is
\[ \hat{\tau}_{\text{PPS}} = \frac{1}{m}\sum_{\text{draws } j} \frac{t_j}{p_j}, \qquad \text{Var}(\hat{\tau}_{\text{PPS}}) = \frac{1}{m}\sum_{j=1}^{M} p_j\left(\frac{t_j}{p_j} - \tau\right)^2. \]
If \(t_j/p_j\) is approximately constant across clusters (i.e., cluster totals are roughly proportional to their sizes), then the variance is near zero, making PPS highly efficient for populations with heterogeneous cluster sizes.
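A minimal sketch of PPS selection and the Hansen-Hurwitz estimator, with hypothetical clusters whose totals are roughly proportional to their sizes (so the estimator should land close to the true total):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical clusters: sizes N_j and totals t_j roughly proportional to size.
N_j = np.array([100, 40, 60, 200, 80])
t_j = 5.0 * N_j + rng.normal(0, 20, size=5)   # t_j / N_j roughly constant
p_j = N_j / N_j.sum()                          # selection probability per draw

m = 3
draws = rng.choice(5, size=m, replace=True, p=p_j)   # PPS with replacement
tau_hat = np.mean(t_j[draws] / p_j[draws])           # Hansen-Hurwitz estimator
print(tau_hat, t_j.sum())   # estimate vs. true total
```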
Chapter 6: Surveys in Practice
6.1 Sampling Weights
Each sampled unit carries a design (base) weight \(d_i = 1/\pi_i\), the number of population units it represents. The HT estimator is then a weighted sum:
\[ \hat{\tau}_{\text{HT}} = \sum_{i \in \mathcal{S}} d_i\, y_i. \]
In complex surveys, weights undergo several adjustments:
Base Weights and Weight Adjustments
\[ w_i = d_i \times a_i \times g_i, \]where \(a_i\) is a nonresponse adjustment factor and \(g_i\) is a calibration (or post-stratification) factor.
Calibration and Post-Stratification Weights
When the population count \(N_k\) of each group \(k\) is known, the post-stratified weight is
\[ w_i^{\text{cal}} = d_i \times \frac{N_k}{\sum_{j \in \mathcal{S} \cap \text{group } k} d_j}. \]
This is a special case of raking (iterative proportional fitting), which adjusts weights to simultaneously match marginal totals along multiple dimensions.
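As an illustration of raking, the following sketch iteratively scales a table of summed design weights to match two sets of known marginal totals; all numbers are hypothetical:

```python
import numpy as np

# Hypothetical respondents cross-classified by two variables
# (e.g., 2 sex categories x 3 age groups): summed design weights per cell.
w = np.array([[30., 45., 25.],
              [20., 35., 15.]])
row_totals = np.array([900., 600.])        # known population margins, dim 1
col_totals = np.array([500., 700., 300.])  # known population margins, dim 2

for _ in range(50):  # iterate until both margins match
    w *= (row_totals / w.sum(axis=1))[:, None]   # scale to match row margins
    w *= (col_totals / w.sum(axis=0))[None, :]   # scale to match column margins
print(w.round(1), w.sum(axis=1), w.sum(axis=0))
```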
6.2 Nonresponse
Nonresponse occurs when selected units fail to provide data. It introduces bias because respondents may differ systematically from non-respondents.
Two types of nonresponse:
- Unit nonresponse: the sampled individual does not participate at all. Common in household surveys, telephone polls, and mail questionnaires.
- Item nonresponse: the individual participates but fails to answer specific questions. This is typically addressed by imputation (filling in missing values using observed data).
Two-Phase Sampling for Nonresponse
- Phase 1: Draw an SRS of size \(n\). Let \(n_R\) units respond.
- Phase 2: Draw a subsample of \(m\) non-respondents and follow up intensively (e.g., in-person interviews).
The combined estimator weights the respondent mean and the follow-up mean by their sample shares:
\[ \hat{\bar{y}} = \frac{n_R}{n}\,\bar{y}_R + \frac{n_M}{n}\,\bar{y}_M, \]
where \(n_M = n - n_R\) and \(\bar{y}_M\) is the mean of the follow-up subsample of non-respondents. This estimator is approximately unbiased for the population mean, as the follow-up subsample represents the non-respondent stratum. Similarly, for proportions, \(\hat{p} = (n_R/n)\hat{p}_R + (n_M/n)\hat{p}_M\).
Nonresponse Weight Adjustment
\[ w_i^{\text{adj}} = \frac{d_i}{\hat{\phi}_k}, \qquad \hat{\phi}_k = \frac{\sum_{j \in \mathcal{S} \cap \text{group } k} R_j \, d_j}{\sum_{j \in \mathcal{S} \cap \text{group } k} d_j}, \]where \(R_j = 1\) if unit \(j\) responded. The weight inflation compensates for the “missing” non-respondents, under the assumption that within each group, respondents and non-respondents have similar values of \(y\).
6.3 Variance Estimation Methods
For complex designs (stratification + clustering + unequal weights), analytic variance formulas may be intractable. Three approaches are standard in practice:
Linearization (Taylor Series)
For a smooth function of estimated totals, \(\hat{\theta} = g(\hat{\tau}_1, \hat{\tau}_2, \ldots)\), a first-order Taylor expansion about the true totals gives
\[ \hat{\theta} \approx g(\tau_1, \tau_2, \ldots) + \sum_k \frac{\partial g}{\partial \tau_k}\bigg|_{\boldsymbol{\tau}} (\hat{\tau}_k - \tau_k). \]
The variance is then estimated as the design-based variance of the linearized statistic. For example, for the ratio \(\hat{R} = \hat{\tau}_y/\hat{\tau}_x\), the linearized residuals are \(e_i = y_i - \hat{R}\, x_i\), and \(\widehat{\text{Var}}(\hat{R}) \approx (1/\hat{\tau}_x^2)\,\widehat{\text{Var}}(\hat{\tau}_e)\).
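A minimal sketch of the linearized variance estimate for a ratio under SRS, using toy data; \(\widehat{\text{Var}}(\hat{\tau}_e)\) is computed with the usual SRS variance formula for an estimated total:

```python
import numpy as np

# Toy SRS of n = 5 from a population of N = 200 (values hypothetical).
y = np.array([4.0, 6.5, 5.2, 8.1, 7.3])
x = np.array([2.0, 3.1, 2.6, 4.0, 3.5])
n, N = len(y), 200

R_hat = y.sum() / x.sum()
e = y - R_hat * x                     # linearized residuals
tau_x_hat = N * x.mean()
# HT variance of the residual total under SRS: N^2 (1 - n/N) s_e^2 / n
var_tau_e = N**2 * (1 - n / N) * e.var(ddof=1) / n
var_R = var_tau_e / tau_x_hat**2
print(R_hat, var_R)
```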
Jackknife Variance Estimation
The delete-one jackknife removes one PSU at a time, reweights the remaining PSUs in its stratum, recomputes the point estimate, and combines the squared deviations of these replicates into a variance estimate. The jackknife is attractive because it requires only the ability to recompute the point estimate, and it automatically accounts for the complex design structure.
Bootstrap for Surveys
The survey bootstrap resamples at the primary sampling unit (PSU) level, respecting the stratified cluster structure. A common approach (Rao and Wu, 1988):
- Within each stratum \(h\), draw \(m_h - 1\) PSUs with replacement from the \(m_h\) observed PSUs.
- Adjust the weights of the resampled PSUs to maintain the total weight.
- Compute \(\hat{\theta}^{*(b)}\) for each of \(B\) bootstrap replicates.
- Estimate the variance as \(\widehat{\text{Var}}_{\text{boot}} = \frac{1}{B}\sum_{b=1}^{B}(\hat{\theta}^{*(b)} - \bar{\theta}^*)^2\).
The bootstrap is the most flexible of the three methods, handling nonlinear statistics, percentiles, and complex multi-stage designs with ease.
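A minimal sketch of the Rao-Wu procedure above for a weighted-mean statistic; the two-stratum PSU data, weights, and the `theta` function are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical design: per stratum, one estimate and one weight per PSU.
psu_y = [np.array([3.2, 4.1, 3.8]), np.array([5.0, 4.6, 5.5, 4.9])]
psu_w = [np.array([10., 12., 11.]), np.array([8., 9., 8., 9.])]

def theta(ys, ws):
    # Weighted mean across all strata (the statistic of interest).
    num = sum((w * y).sum() for y, w in zip(ys, ws))
    den = sum(w.sum() for w in ws)
    return num / den

B, boot = 2000, []
for _ in range(B):
    ys, ws = [], []
    for y, w in zip(psu_y, psu_w):
        m_h = len(y)
        pick = rng.integers(0, m_h, size=m_h - 1)  # m_h - 1 draws w/ replacement
        ys.append(y[pick])
        ws.append(w[pick] * m_h / (m_h - 1))       # rescale to keep total weight
    boot.append(theta(ys, ws))

boot = np.asarray(boot)
print(boot.var())   # (1/B) * sum (theta*_b - mean)^2
```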
Chapter 7: One-Way ANOVA
The second half of the course addresses experimental design. In an experiment, the investigator controls which units receive which treatments, whereas in a survey the investigator only observes.
7.1 The Completely Randomized Design
The completely randomized design (CRD) assigns treatments to units completely at random. The balanced model with \(t\) treatments and \(r\) replicates is
\[ Y_{ij} = \mu + \tau_i + R_{ij}, \qquad R_{ij} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \qquad \sum_{i=1}^{t}\tau_i = 0, \]
with least-squares estimates \(\hat{\mu} = \bar{y}_{++}\), \(\hat{\tau}_i = \bar{y}_{i+} - \bar{y}_{++}\), and \(\hat{\sigma}^2 = W/(n-t)\), where \(W = \sum_{ij}(y_{ij} - \hat{\mu} - \hat{\tau}_i)^2\) is the residual sum of squares.
The estimator of \(\tau_i\) is \(\tilde{\tau}_i = \bar{Y}_{i+} - \bar{Y}_{++}\). It is unbiased: \(E[\tilde{\tau}_i] = \tau_i\).
\[ \text{Var}(\tilde{\mu}) = \frac{\sigma^2}{tr}, \qquad \text{Var}(\tilde{\tau}_i) = \frac{(t-1)\,\sigma^2}{tr}, \]
where the latter reduces to \(\sigma^2/(2r)\) when \(t = 2\). A confidence interval for a treatment effect is
\[ \hat{\tau}_i \;\pm\; t^*\,\sqrt{\frac{(t-1)\,\hat{\sigma}^2}{tr}}, \qquad t^* \sim t(n - q + c), \]
where \(q\) is the number of mean parameters and \(c\) the number of identifiability constraints; for the CRD, \(n - q + c = n - t\).
7.2 Sum of Squares Decomposition
The total sum of squares decomposes as
\[ \text{SS(Tot)} = \sum_{ij}(y_{ij} - \bar{y}_{++})^2 = \underbrace{r\sum_{i}(\bar{y}_{i+} - \bar{y}_{++})^2}_{\text{SS(Trt)}} + \underbrace{\sum_{ij}(y_{ij} - \bar{y}_{i+})^2}_{\text{SS(Res)}}. \]
The degrees of freedom partition correspondingly: \(\text{df}_{\text{Tot}} = n - 1 = (t-1) + (n-t) = \text{df}_{\text{Trt}} + \text{df}_{\text{Res}}\).
| Source | df | SS | MS |
|---|---|---|---|
| Treatment | \(t-1\) | SS(Trt) | SS(Trt)/\((t-1)\) |
| Residual | \(n-t\) | SS(Res) | SS(Res)/\((n-t)\) |
| Total | \(n-1\) | SS(Tot) |
7.3 The F Test
The F test procedure:
- \(H_0: \tau_1 = \cdots = \tau_t = 0\) vs. \(H_a:\) at least one \(\tau_i \neq 0\).
- Compute \(d = \text{MS(Trt)}/\text{MS(Res)}\).
- The \(p\)-value is \(P(F(t-1, n-t) > d)\).
- Reject \(H_0\) if the \(p\)-value is small.
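The procedure above is a few lines of code. A minimal sketch with toy data (a balanced CRD with \(t = 3\), \(r = 5\); the numbers are hypothetical):

```python
import numpy as np
from scipy import stats

# Toy balanced CRD: t = 3 treatments, r = 5 replicates each.
groups = [np.array([23., 25, 22, 27, 24]),
          np.array([30., 28, 31, 29, 33]),
          np.array([26., 24, 27, 25, 28])]
n, t = sum(len(g) for g in groups), len(groups)
grand = np.concatenate(groups).mean()

ss_trt = sum(len(g) * (g.mean() - grand)**2 for g in groups)
ss_res = sum(((g - g.mean())**2).sum() for g in groups)
d = (ss_trt / (t - 1)) / (ss_res / (n - t))   # F statistic
p = stats.f.sf(d, t - 1, n - t)               # P(F(t-1, n-t) > d)
print(d, p)
```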
7.4 Contrasts and Multiple Comparisons
A contrast is a linear combination \(\theta = \sum a_i \tau_i\) with \(\sum a_i = 0\). Its estimator is \(\hat{\theta} = \sum a_i \hat{\tau}_i\), which is unbiased with variance \(\text{Var}(\hat{\theta}) = \sigma^2 \sum a_i^2 / r\).
Tukey’s Honestly Significant Difference
\[ (\hat{\tau}_i - \hat{\tau}_j) \;\pm\; q_{\alpha}(t, n-t)\,\frac{\hat{\sigma}}{\sqrt{r}}, \]where \(q_{\alpha}(t, n-t)\) is the critical value from the Studentized range distribution.
Bonferroni Correction
For \(k\) pre-planned comparisons, each at significance level \(\alpha/k\), the Bonferroni method ensures that the overall error rate does not exceed \(\alpha\). This is more conservative than Tukey for all pairwise comparisons but can be applied to any set of contrasts.
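The two critical values can be computed side by side. This sketch assumes SciPy 1.7+ (which provides `scipy.stats.studentized_range`) and a hypothetical balanced CRD with \(t = 4\) treatments and \(n = 24\) observations:

```python
from scipy import stats

t_, n = 4, 24          # t treatments, n observations (balanced CRD)
df = n - t_
alpha, k = 0.05, 6     # all 6 pairwise comparisons of 4 treatments

# Tukey: critical value from the studentized range distribution.
q_crit = stats.studentized_range.ppf(1 - alpha, t_, df)
# Bonferroni: ordinary t critical value at level alpha / (2k).
t_crit = stats.t.ppf(1 - alpha / (2 * k), df)

# On the standard-error scale of a pairwise difference, Tukey uses q/sqrt(2).
print(q_crit / 2**0.5, t_crit)
```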
7.5 Unbalanced CRD
With \(r_i\) replicates for treatment \(i\) and \(n = \sum_i r_i\), the LS estimates are \(\hat{\mu} = \bar{y}_{++}\), \(\hat{\tau}_i = \bar{y}_{i+} - \bar{y}_{++}\), and \(\hat{\sigma}^2 = W / (n - t)\).
The variance of \(\tilde{\tau}_i - \tilde{\tau}_j\) is \(\sigma^2(1/r_i + 1/r_j)\), so confidence intervals for treatment comparisons must account for the unequal group sizes.
7.6 Model Diagnostics
The ANOVA model assumes: (i) \(E[R_{ij}] = 0\); (ii) \(\text{Var}(R_{ij}) = \sigma^2\) (constant variance); (iii) normality; (iv) independence. These are checked with:
- Residuals vs. fitted values plot: checks zero mean and constant variance.
- Normal Q-Q plot: checks normality.
- Residuals vs. order plot: checks independence.
Chapter 8: Two-Way ANOVA
8.1 The Two-Factor Model
The full (cell-means) model for two factors with \(\ell_1\) and \(\ell_2\) levels and \(r\) replicates per cell is
\[ Y_{ijk} = \mu + \tau_{ij} + R_{ijk}, \qquad R_{ijk} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2). \]
The LS estimates are \(\hat{\mu} = \bar{y}_{+++}\) and \(\hat{\tau}_{ij} = \bar{y}_{ij+} - \bar{y}_{+++}\).
8.2 Interaction
\[ Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + R_{ijk}, \]where \((\alpha\beta)_{ij}\) represents the interaction. No interaction means \((\alpha\beta)_{ij} = 0\) for all \(i,j\).
Detecting Interaction
Three methods to detect interaction:
Interaction plot: Plot the group means against levels of one factor, with separate lines for each level of the other. Parallel lines suggest no interaction; non-parallel lines suggest interaction.
Contrast method: For a \(2 \times 2\) design, compute \(\hat{\theta} = \hat{\tau}_{11} - \hat{\tau}_{01} - \hat{\tau}_{10} + \hat{\tau}_{00}\). If the confidence interval for \(\theta\) includes 0, there is no evidence of interaction.
F-test method: Partition the treatment sum of squares as
\[ \text{SS(Trt)} = \text{SS(A)} + \text{SS(B)} + \text{SS(A:B)}, \]
and test \(H_0: \text{no interaction}\) using \(F = \text{MS(A:B)} / \text{MS(Res)}\).
8.3 The Two-Way ANOVA Table
| Source | df | SS | MS | F |
|---|---|---|---|---|
| Factor A | \(\ell_1 - 1\) | SS(A) | MS(A) | MS(A)/MS(Res) |
| Factor B | \(\ell_2 - 1\) | SS(B) | MS(B) | MS(B)/MS(Res) |
| A:B Interaction | \((\ell_1-1)(\ell_2-1)\) | SS(A:B) | MS(A:B) | MS(A:B)/MS(Res) |
| Residual | \(\ell_1\ell_2(r-1)\) | SS(Res) | MS(Res) | |
| Total | \(n-1\) | SS(Tot) |
8.4 Sum of Squares Decomposition
The treatment sum of squares partitions as
\[ \text{SS(Trt)} = \text{SS(A)} + \text{SS(B)} + \text{SS(A:B)}, \]
with
\[ \text{SS(A)} = r\ell_2 \sum_{i=1}^{\ell_1}(\bar{y}_{i++} - \bar{y}_{+++})^2, \qquad \text{SS(B)} = r\ell_1 \sum_{j=1}^{\ell_2}(\bar{y}_{+j+} - \bar{y}_{+++})^2, \]
\[ \text{SS(A:B)} = r \sum_{i=1}^{\ell_1}\sum_{j=1}^{\ell_2}(\bar{y}_{ij+} - \bar{y}_{i++} - \bar{y}_{+j+} + \bar{y}_{+++})^2. \]
Type I, II, and III Sums of Squares
In unbalanced designs (unequal cell sizes), the decomposition is no longer unique and the three standard types of sums of squares differ:
Type I (Sequential): Each term is adjusted only for terms already in the model. The result depends on the order of entry — \(\text{SS(A | 1)}\) differs from \(\text{SS(A | 1, B)}\). Used mainly when there is a natural ordering of factors.
Type II (Hierarchical): Each main effect is adjusted for the other main effect but not for the interaction:
\[ \text{SS}_{\text{II}}(A) = \text{SS}(A \mid B), \qquad \text{SS}_{\text{II}}(B) = \text{SS}(B \mid A). \]
Type II is appropriate when there is no important interaction; in that case it provides more powerful tests of the main effects.
Type III (Marginal): Each term is adjusted for all other terms, including the interaction:
\[ \text{SS}_{\text{III}}(A) = \text{SS}(A \mid B, A\!:\!B). \]
This is the default in many software packages and is recommended for unbalanced designs, particularly when the interaction may be present.
In a balanced design, all three types are identical. The choice matters only when cell sizes differ.
Worked example: a balanced two-factor experiment (irrigation at 2 levels, fertilizer at 3 levels, \(r = 4\) replicates per cell, \(n = 24\)) gives:
| Source | df | SS | MS | F |
|---|---|---|---|---|
| Irrigation | 1 | 112.5 | 112.5 | 18.75 |
| Fertilizer | 2 | 84.3 | 42.15 | 7.03 |
| Interaction | 2 | 9.8 | 4.90 | 0.82 |
| Residual | 18 | 108.0 | 6.00 | |
| Total | 23 | 314.6 |
At \(\alpha = 0.05\): irrigation is significant (\(F_{1,18} = 18.75\), \(p < 0.001\)), fertilizer is significant (\(F_{2,18} = 7.03\), \(p = 0.006\)), and the interaction is not significant (\(F_{2,18} = 0.82\), \(p = 0.46\)). Since the interaction is non-significant, an additive model \(Y_{ijk} = \mu + \alpha_i + \beta_j + R_{ijk}\) is adequate, and the main effects have clear interpretations.
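The F statistics and p-values in the table can be checked directly from the sums of squares; a small sketch using SciPy:

```python
from scipy import stats

# Reproduce the F statistics and p-values from the ANOVA table above.
ms_res = 108.0 / 18   # MS(Res) = 6.00
for name, ss, df in [("Irrigation", 112.5, 1),
                     ("Fertilizer", 84.3, 2),
                     ("Interaction", 9.8, 2)]:
    F = (ss / df) / ms_res
    p = stats.f.sf(F, df, 18)   # P(F(df, 18) > F)
    print(name, F, p)
```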
8.5 Multiple Comparisons in Two-Way Designs
\[ \bar{y}_{i++} - \bar{y}_{i'++} \pm q_{\alpha, \ell_1, \nu}\sqrt{\frac{\text{MS(Res)}}{r\ell_2}}, \]where \(q_{\alpha,\ell_1,\nu}\) is the studentized range critical value and \(\nu\) is the residual degrees of freedom.
When the interaction is significant, comparisons of one factor at fixed levels of the other (simple effects) are more informative than marginal comparisons.
8.6 Diagnostics
The same diagnostic tools used in one-way ANOVA apply:
- Residuals vs. fitted values: check for non-constant variance (fanning patterns suggest heteroscedasticity).
- Normal Q-Q plot of residuals: check for non-normality. Moderate departures are tolerable with balanced designs due to the robustness of the F test.
- Residuals by factor level: plot residuals separately for each factor to detect level-specific variance patterns.
- Tukey’s one-degree-of-freedom test for nonadditivity: specifically tests whether the interaction has a multiplicative form \((\alpha\beta)_{ij} = c\,\alpha_i\beta_j\), which would suggest a transformation (e.g., log) could remove the interaction.
Chapter 9: Randomized Block Designs
9.1 The Rationale for Blocking
In a CRD, all variability not explained by treatments goes into the residual. If there is a known source of variability (e.g., different batches, different days), blocking removes it from the error term, increasing the power of treatment comparisons.
9.2 Analysis of the RCBD
With \(t\) treatments and \(r\) blocks, each treatment appearing once in every block, the RCBD model is
\[ Y_{ij} = \mu + \tau_i + \beta_j + R_{ij}, \qquad R_{ij} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \]
with LS estimates
\[ \hat{\mu} = \bar{y}_{++}, \qquad \hat{\tau}_i = \bar{y}_{i+} - \bar{y}_{++}, \qquad \hat{\beta}_j = \bar{y}_{+j} - \bar{y}_{++}, \]
\[ \hat{\sigma}^2 = \frac{W}{rt - (t + r + 1) + 2} = \frac{W}{(r-1)(t-1)}. \]
ANOVA Table for RCBD
| Source | df | SS | MS |
|---|---|---|---|
| Treatment | \(t-1\) | SS(Trt) | SS(Trt)/\((t-1)\) |
| Block | \(r-1\) | SS(Block) | SS(Block)/\((r-1)\) |
| Residual | \((r-1)(t-1)\) | SS(Res) | SS(Res)/\(((r-1)(t-1))\) |
| Total | \(rt-1\) | SS(Tot) |
The F test for treatment effects is \(F = \text{MS(Trt)}/\text{MS(Res)} \sim F(t-1, (r-1)(t-1))\) under \(H_0\).
The F test for block effects is \(F = \text{MS(Block)}/\text{MS(Res)} \sim F(r-1, (r-1)(t-1))\). A significant block effect confirms that blocking was beneficial.
9.3 Treatment Comparisons in RCBD
The variance of a treatment difference is
\[ \text{Var}(\hat{\tau}_i - \hat{\tau}_j) = \frac{2\sigma^2}{r}, \]
giving the confidence interval
\[ (\bar{y}_{i+} - \bar{y}_{j+}) \;\pm\; t_{\alpha/2,(r-1)(t-1)}\,\sqrt{\frac{2\,\text{MS(Res)}}{r}}. \]
For simultaneous comparisons of all \(\binom{t}{2}\) treatment pairs, Tukey's HSD replaces the \(t\)-critical value with \(q_{\alpha,t,(r-1)(t-1)}/\sqrt{2}\).
9.4 RCBD vs. CRD
When blocking is effective, the RCBD has a smaller residual variance than the CRD, yielding narrower confidence intervals and more powerful tests. The CRD ANOVA table is obtained from the RCBD by combining the Block and Residual sums of squares.
9.5 Latin Squares
\[ Y_{ijk} = \mu + \tau_i + \beta_j + \gamma_k + R_{ijk}, \]where \(\beta_j\) and \(\gamma_k\) are the row and column block effects respectively.
Example: four tire compounds (A-D) compared across four cars (rows) and four test surfaces (columns):
| | Surface 1 | Surface 2 | Surface 3 | Surface 4 |
|---|---|---|---|---|
| Car 1 | A | B | C | D |
| Car 2 | B | C | D | A |
| Car 3 | C | D | A | B |
| Car 4 | D | A | B | C |
Each compound appears exactly once in each row and column. The ANOVA decomposes \(\text{SS(Tot)}\) into SS(Compound), SS(Car), SS(Surface), and SS(Residual), with \((t-1)(t-2) = 6\) residual degrees of freedom for \(t = 4\).
The ANOVA table for a Latin square has:
| Source | df |
|---|---|
| Rows | \(t - 1\) |
| Columns | \(t - 1\) |
| Treatments | \(t - 1\) |
| Residual | \((t-1)(t-2)\) |
| Total | \(t^2 - 1\) |
Latin squares are efficient but have few residual degrees of freedom when \(t\) is small. Graeco-Latin squares extend the idea to control for three blocking factors simultaneously.
Chapter 10: Two-Level Factorial Designs
Factorial designs study the effects of multiple factors simultaneously. Two-level factorial designs restrict each factor to exactly two levels (coded as \(-1\) and \(+1\), or low and high).
10.1 The \(2^2\) Factorial Design
With two factors A and B at two levels each, the treatment sum of squares partitions into two main effects and one interaction,
\[ \text{SS(Trt)} = \text{SS(A)} + \text{SS(B)} + \text{SS(A:B)}, \]
each with 1 degree of freedom.
In one worked example, with factors frequency and height:
- SS(Freq) = 304.2, \(F = 2.97\), \(p = 0.10\) (not significant)
- SS(Height) = 793.8, \(F = 7.75\), \(p = 0.013\) (significant)
- SS(Interaction) = 45.0, \(F = 0.44\), \(p = 0.52\) (no interaction)
10.2 The \(2^3\) Factorial Design
With three factors A, B, C at two levels each, there are \(2^3 = 8\) treatment combinations. The effects decompose into:
- 3 main effects: A, B, C
- 3 two-factor interactions: AB, AC, BC
- 1 three-factor interaction: ABC
Each effect has 1 degree of freedom, for a total of 7 treatment df. The main effects and interactions are computed by the Yates algorithm or by using contrast coefficients.
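A compact implementation of Yates's algorithm is a useful sketch; the eight response values below are hypothetical and listed in standard (Yates) order:

```python
import numpy as np

def yates(y):
    """Yates's algorithm for a 2^k design: repeatedly form pairwise sums
    followed by pairwise differences; after k passes, y[0] is the grand
    total and the rest are the effect contrasts (in standard order)."""
    y = np.asarray(y, dtype=float)
    k = int(np.log2(len(y)))
    for _ in range(k):
        sums = y[0::2] + y[1::2]
        diffs = y[1::2] - y[0::2]
        y = np.concatenate([sums, diffs])
    return y  # divide entries 1..2^k-1 by 2^(k-1) to get effects (r = 1)

# Hypothetical 2^3 responses in standard order: (1), a, b, ab, c, ac, bc, abc
print(yates([60, 72, 54, 68, 52, 83, 45, 80]))
```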
10.3 Fractional Factorial Designs
When the number of factors is large, a full factorial requires too many runs. A fractional factorial uses only a fraction (e.g., \(1/2\) or \(1/4\)) of the full design, sacrificing information about high-order interactions (which are often negligible) to reduce the experiment size.
- Resolution III: Main effects are aliased with two-factor interactions.
- Resolution IV: Main effects are free of two-factor interaction aliases, but two-factor interactions are aliased with each other.
- Resolution V: Main effects and two-factor interactions are free of aliases with each other.
Example: \(2^{3-1}\) Half-Fraction
A \(2^{3-1}\) design uses 4 runs instead of 8. If the generator is \(C = AB\), the defining relation is \(I = ABC\). This means:
- A is aliased with BC (since \(A \cdot ABC = BC\))
- B is aliased with AC
- C is aliased with AB
Under the effect sparsity principle (higher-order interactions are often negligible), we assume the two-factor interactions are small and interpret the estimated contrasts as main effects.
| Run | A | B | C = AB | Yield |
|---|---|---|---|---|
| 1 | \(-1\) | \(-1\) | \(+1\) | 72 |
| 2 | \(+1\) | \(-1\) | \(-1\) | 65 |
| 3 | \(-1\) | \(+1\) | \(-1\) | 78 |
| 4 | \(+1\) | \(+1\) | \(+1\) | 84 |
The estimated effect of A (aliased with BC) is \(\frac{1}{2}[(65 + 84) - (72 + 78)] = -0.5\). The estimated effect of B (aliased with AC) is \(\frac{1}{2}[(78 + 84) - (72 + 65)] = 12.5\). If two-factor interactions are small, factor A (temperature) has a negligible effect while factor B (pressure) strongly increases yield.
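The same contrasts can be computed from the sign matrix; a minimal sketch using the table above:

```python
import numpy as np

# Sign matrix for the 2^(3-1) design above (generator C = AB) and yields.
A = np.array([-1, +1, -1, +1])
B = np.array([-1, -1, +1, +1])
C = A * B                       # generator: C = AB
y = np.array([72., 65, 78, 84])

# Each estimated "effect" is really the effect plus its alias.
for name, s in [("A (+BC)", A), ("B (+AC)", B), ("C (+AB)", C)]:
    print(name, (s * y).sum() / 2)   # contrast divided by (runs / 2)
```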
General \(2^{k-p}\) Designs
For \(k\) factors, a \(2^{k-p}\) fractional factorial uses \(p\) generators to define the fraction. The defining relation is the set of all words formed from the generators and their products. The word length pattern determines the resolution: the length of the shortest word in the defining relation is the design's resolution.
Choosing an appropriate fraction involves balancing run economy against the ability to estimate effects of interest. Minimum aberration designs minimize aliasing among the most important (lowest-order) effects and are tabulated for common values of \(k\) and \(p\) in standard references.
Appendix: Summary of Key Formulas
Sampling Formulas
| Quantity | Parameter | Estimator | Variance of Estimator |
|---|---|---|---|
| Mean (SRS) | \(\mu\) | \(\bar{y}\) | \(\left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n}\) |
| Proportion (SRS) | \(\pi\) | \(\hat{\pi} = \bar{y}\) | \(\left(1 - \frac{n}{N}\right)\frac{\pi(1-\pi)}{n}\) |
| Mean (Stratified) | \(\mu\) | \(\sum w_h \bar{y}_h\) | \(\sum w_h^2 \frac{\sigma_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right)\) |
| Regression | \(\mu_y\) | \(\bar{y} + \hat{\beta}(\mu_x - \bar{x})\) | \(\left(1 - \frac{n}{N}\right)\frac{\sigma_r^2}{n}\) |
| Ratio | \(\mu_y\) | \(\frac{\bar{y}}{\bar{x}}\mu_x\) | \(\left(1 - \frac{n}{N}\right)\frac{\sigma_{\text{ratio}}^2}{n}\) |
Experimental Design Formulas
| Design | Model | df (Residual) | F Statistic |
|---|---|---|---|
| CRD (balanced) | \(Y_{ij} = \mu + \tau_i + R_{ij}\) | \(t(r-1)\) | MS(Trt)/MS(Res) |
| CRD (unbalanced) | \(Y_{ij} = \mu + \tau_i + R_{ij}\) | \(n - t\) | MS(Trt)/MS(Res) |
| RCBD | \(Y_{ij} = \mu + \tau_i + \beta_j + R_{ij}\) | \((r-1)(t-1)\) | MS(Trt)/MS(Res) |
| Factorial CRD | \(Y_{ijk} = \mu + \tau_{ij} + R_{ijk}\) | \(\ell_1\ell_2(r-1)\) | MS(A:B)/MS(Res) |
| Factorial RBD | \(Y_{ijk} = \mu + \tau_{ij} + \beta_k + R_{ijk}\) | By design | MS(Trt)/MS(Res) |
Confidence Interval Summary
| Model | CI | Distribution of Critical Value |
|---|---|---|
| SRS Mean | \(\bar{y} \pm c\,\frac{\hat{\sigma}}{\sqrt{n}}\sqrt{1 - n/N}\) | \(\mathcal{N}(0,1)\) |
| SRS Proportion | \(\hat{\pi} \pm c\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}(1 - n/N)}\) | \(\mathcal{N}(0,1)\) |
| CRD Treatment Effect | \(\hat{\tau}_i \pm t^*\sqrt{(t-1)\hat{\sigma}^2/(tr)}\) | \(t(n - q + c)\) |
| CRD Mean Difference | \(\hat{\tau}_i - \hat{\tau}_j \pm t^*\sqrt{\hat{\sigma}^2 \cdot 2/r}\) | \(t(n - t)\) |
| RCBD Treatment Diff. | \(\hat{\tau}_i - \hat{\tau}_j \pm t^*\sqrt{2\hat{\sigma}^2/r}\) | \(t((r-1)(t-1))\) |
Distributional Results
- If \(Z \sim \mathcal{N}(0,1)\), then \(Z^2 \sim \chi^2(1)\).
- If \(X \sim \chi^2(m)\) and \(Y \sim \chi^2(n)\) are independent, then \(X + Y \sim \chi^2(m+n)\).
- If \(Z \sim \mathcal{N}(0,1)\) and \(X \sim \chi^2(m)\) are independent, then \(Z/\sqrt{X/m} \sim t(m)\).
- If \(X \sim \chi^2(m)\) and \(Y \sim \chi^2(n)\) are independent, then \((X/m)/(Y/n) \sim F(m,n)\).
- The LS residual sum of squares satisfies \((n-q+c)\,\hat{\sigma}^2/\sigma^2 = W/\sigma^2 \sim \chi^2(n-q+c)\).