SMF 230: Introduction to Statistics
Carl Rodrigue
Estimated study time: 58 minutes
Chapter 1: Foundations of Statistics in Social Research
Why Statistics Matter for Sexuality, Marriage, and Family Studies
Statistics provide the tools necessary for transforming raw observations about human relationships, family structures, and intimate life into rigorous, defensible claims. Without statistical reasoning, researchers in Sexuality, Marriage, and Family (SMF) studies would be limited to anecdote and speculation. With it, they can quantify patterns, test theories, and evaluate interventions designed to support families and communities.
This course emphasizes applied statistical literacy: the ability to choose the right test, run it in software, interpret the output, and communicate findings to both academic and public audiences. The dataset used throughout is the World Values Survey (WVS) Wave 7, focusing on the Canadian sample, which contains variables on family values, gender attitudes, relationship satisfaction, religiosity, and social trust, making it ideal for SMF research questions.
Descriptive vs. Inferential Statistics
All of statistics divides into two broad families.
Descriptive statistics summarize and organize data that have already been collected. They answer the question: “What does the data look like?” Examples include means, medians, frequency tables, and standard deviations.
Inferential statistics use data from a sample to draw conclusions about a larger population. They answer the question: “Can we generalize beyond the people we actually measured?” Examples include t-tests, chi-square tests, and regression models.
Levels of Measurement
Before choosing any statistical technique, you must identify the level of measurement of each variable. The level of measurement determines which statistics are appropriate.
Nominal
Nominal variables consist of categories with no inherent order. Examples: religion (Christian, Muslim, Hindu, None), marital status (married, divorced, single, widowed), province of residence.
The only meaningful operation is counting how many cases fall into each category. You cannot compute a meaningful average of marital statuses.
Ordinal
Ordinal variables have categories that can be ranked, but the distances between ranks are not necessarily equal. Examples: education level (less than high school, high school, some post-secondary, bachelor’s, graduate degree), life satisfaction rated on a scale from 1 (very dissatisfied) to 10 (very satisfied).
You can say that a rating of 8 is higher than a rating of 5, but you cannot assume the difference between 5 and 8 is the same as the difference between 2 and 5.
Interval
Interval variables have equal distances between values, but no true zero point. Examples: temperature in Celsius (0 degrees Celsius does not mean “no temperature”) and year of birth.
Ratio
Ratio variables have equal intervals and a meaningful zero point. Examples: income in dollars, number of children, age in years.
Measures of Central Tendency
Measures of central tendency describe the “typical” or “centre” value in a distribution.
Mean
The arithmetic mean is the sum of all values divided by the number of observations:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
The mean is sensitive to extreme values (outliers). If a few respondents report extremely high incomes, the mean income will be pulled upward and may not represent the typical respondent.
Median
The median is the middle value when all observations are arranged in order. If \( n \) is even, the median is the average of the two middle values.
The median is resistant to outliers and is often preferred for skewed distributions such as income or household size.
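The contrast between the mean and the median is easy to see with a few numbers. As a sketch (this course runs its analyses in SPSS, but the arithmetic is the same), using hypothetical incomes in thousands of dollars:

```python
import statistics

# Hypothetical annual incomes (in thousands); the last value is an outlier.
incomes = [40, 45, 50, 55, 60, 500]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # resistant to the outlier

print(mean_income)    # 125
print(median_income)  # 52.5
```

The mean (125) sits far above five of the six respondents, while the median (52.5) still describes the typical case.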
Mode
The mode is the most frequently occurring value. It is the only measure of central tendency appropriate for nominal data. A distribution can be unimodal (one peak), bimodal (two peaks), or multimodal (more than two peaks).
Measures of Variability
Central tendency alone is insufficient. Two datasets can have identical means but very different spreads.
Range
The range is the difference between the maximum and minimum values:
\[ \text{Range} = x_{\max} - x_{\min} \]
It is simple but highly sensitive to outliers.
Variance
The variance measures the average squared deviation from the mean. For a sample:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
We divide by \( n - 1 \) (rather than \( n \)) to correct for the bias introduced by estimating the population variance from a sample. This correction is called Bessel’s correction.
Standard Deviation
The standard deviation is the square root of the variance:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
It is expressed in the same units as the original data, making it more interpretable than the variance.
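The variance and standard deviation formulas can be checked by hand. A sketch with hypothetical scores (Python's `statistics` module applies the same \( n - 1 \) correction):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

n = len(data)
mean = sum(data) / n
# Sample variance with Bessel's correction: divide by n - 1.
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
s = math.sqrt(s2)

# statistics.variance and statistics.stdev use the same n - 1 divisor.
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))

print(round(s, 3))  # 2.138
```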
Frequency Distributions and Visualization
A frequency distribution shows how often each value or category occurs in the data. It can be presented as a table or as a graph.
Histograms
A histogram displays the distribution of a continuous variable by dividing the range into bins and plotting the count (or proportion) of observations in each bin. The shape of a histogram reveals whether the distribution is symmetric, positively skewed (long right tail), or negatively skewed (long left tail).
Bar Charts
A bar chart is used for categorical (nominal or ordinal) data. Unlike histograms, the bars do not touch, emphasizing that the categories are discrete.
The Normal Distribution
The normal distribution (bell curve) is a theoretical probability distribution defined by two parameters: the mean \( \mu \) and the standard deviation \( \sigma \). It has the following properties:
- Symmetric about the mean
- Approximately 68% of values fall within \( \pm 1\sigma \) of the mean
- Approximately 95% fall within \( \pm 2\sigma \)
- Approximately 99.7% fall within \( \pm 3\sigma \)
This is known as the 68-95-99.7 rule (or the empirical rule).
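The three percentages in the empirical rule come from the standard normal distribution. They can be recovered from the error function, since \( P(|Z| \le k) = \operatorname{erf}(k/\sqrt{2}) \); a quick check in Python:

```python
import math

def within_k_sigma(k):
    """P(|Z| <= k) for a standard normal variable, via the error function."""
    return math.erf(k / math.sqrt(2))

print(round(within_k_sigma(1), 4))  # 0.6827
print(round(within_k_sigma(2), 4))  # 0.9545
print(round(within_k_sigma(3), 4))  # 0.9973
```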
Many inferential techniques assume that the sampling distribution of the test statistic is approximately normal, which is justified by the Central Limit Theorem.
The Central Limit Theorem
The Central Limit Theorem (CLT) states that, regardless of the shape of the population distribution, the distribution of sample means will approach a normal distribution as the sample size \( n \) increases, provided the population has a finite variance.
In practice, samples of \( n \geq 30 \) are often considered large enough for the CLT to apply, though this depends on how non-normal the underlying distribution is.
Chapter 2: Inferential Statistics, Causality, and Model Building
From Samples to Populations
In SMF research, it is rarely possible to measure every individual in a population of interest. Instead, researchers collect data from a sample and use inferential statistics to estimate population parameters.
Sampling Error
Sampling error is the discrepancy between a sample statistic (e.g., a sample mean \( \bar{x} \)) and the corresponding population parameter (e.g., \( \mu \)). Sampling error is unavoidable whenever we work with samples, but it can be quantified.
Standard Error
The standard error of the mean (SEM) estimates how much a sample mean is expected to vary from sample to sample:
\[ SE = \frac{s}{\sqrt{n}} \]
As the sample size increases, the standard error decreases, meaning our estimate becomes more precise.
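Because \( n \) appears under a square root, quadrupling the sample size only halves the standard error. A sketch with a hypothetical sample standard deviation:

```python
import math

s = 10.0  # hypothetical sample standard deviation

se_25 = s / math.sqrt(25)    # n = 25
se_100 = s / math.sqrt(100)  # n = 100: four times the cases, half the SE

print(se_25)   # 2.0
print(se_100)  # 1.0
```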
Hypothesis Testing
Hypothesis testing is a formal procedure for deciding whether sample data provide sufficient evidence to reject a claim about a population.
The Null and Alternative Hypotheses
- The null hypothesis (\( H_0 \)) states that there is no effect, no difference, or no relationship in the population.
- The alternative hypothesis (\( H_1 \) or \( H_a \)) states that there is an effect, a difference, or a relationship.
Example: A researcher wants to know whether married and unmarried Canadians differ in life satisfaction.
- \( H_0 \): There is no difference in mean life satisfaction between married and unmarried Canadians.
- \( H_1 \): There is a difference in mean life satisfaction between married and unmarried Canadians.
The p-value
The p-value is the probability of obtaining a test statistic as extreme as (or more extreme than) the one observed, assuming the null hypothesis is true.
Griffiths and Needleman (2019) argue that p-values are widely misunderstood and misused. They note that a p-value of 0.049 and a p-value of 0.051 reflect essentially the same strength of evidence, yet researchers often treat them as categorically different because one falls below the conventional threshold of 0.05.
Statistical Significance
By convention, a result is declared statistically significant if \( p < \alpha \), where \( \alpha \) is typically set at 0.05. This means that if the null hypothesis were true, we would expect to see data this extreme less than 5% of the time.
Type I and Type II Errors
| | \( H_0 \) is true | \( H_0 \) is false |
|---|---|---|
| Reject \( H_0 \) | Type I Error (false positive) | Correct decision (power) |
| Fail to reject \( H_0 \) | Correct decision | Type II Error (false negative) |
- The probability of a Type I error is \( \alpha \) (the significance level).
- The probability of a Type II error is \( \beta \). Statistical power is \( 1 - \beta \).
Effect Size
Effect size measures the magnitude of an observed effect, independent of sample size. Common effect size measures include Cohen’s d for differences between means and Pearson’s r for correlations.
Cohen’s d guidelines:
- Small: \( d = 0.2 \)
- Medium: \( d = 0.5 \)
- Large: \( d = 0.8 \)
A statistically significant result can have a trivially small effect size if the sample is very large. Conversely, a meaningful effect can fail to reach significance if the sample is too small.
Confidence Intervals
A 95% confidence interval for a mean is constructed as:
\[ \bar{x} \pm t_{\alpha/2} \times SE \]
where \( t_{\alpha/2} \) is the critical value from the t-distribution with \( n - 1 \) degrees of freedom.
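As a sketch with hypothetical summary statistics (assumption: with a large sample, the t critical value is close enough to the normal value 1.96 that we use 1.96 rather than looking up the exact t value):

```python
import math

# Hypothetical summary statistics for a life-satisfaction item.
xbar, s, n = 7.2, 1.5, 225
se = s / math.sqrt(n)  # 0.1

# Assumption: use the large-sample critical value 1.96 instead of the
# exact t value with n - 1 = 224 degrees of freedom.
t_crit = 1.96
lower = xbar - t_crit * se
upper = xbar + t_crit * se

print(round(lower, 3), round(upper, 3))  # 7.004 7.396
```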
Causality and Statistical Model Building
Correlation Is Not Causation
One of the most important lessons in statistics is that correlation does not imply causation. Two variables can be strongly associated without one causing the other. Haig (2003) discusses the problem of spurious correlations, relationships that appear meaningful but are driven by a third variable or by coincidence.
Three requirements for establishing causality:
- Association (covariation): The variables must be statistically related.
- Temporal precedence: The cause must precede the effect in time.
- Non-spuriousness: The relationship must not be explained by a third variable (a confound).
Experimental designs with random assignment are the gold standard for causal inference because randomization balances confounds across groups. In SMF research, however, many variables of interest (e.g., marital status, sexual orientation, religion) cannot be randomly assigned, so researchers rely on observational designs and must control for confounds statistically.
Building Statistical Models
A statistical model is a mathematical representation of the relationships among variables. Model building in SMF typically follows these steps:
- Specify the research question and identify the dependent (outcome) variable and independent (predictor) variables.
- Choose the appropriate statistical technique based on the levels of measurement and the research question.
- Check assumptions (e.g., normality, homogeneity of variance, independence of observations).
- Run the analysis and interpret the output.
- Evaluate model fit and consider alternative explanations.
Chapter 3: Introduction to SPSS
What Is SPSS?
SPSS (Statistical Package for the Social Sciences), now officially called IBM SPSS Statistics, is one of the most widely used statistical software packages in the social sciences. Version 28 is used in this course.
SPSS uses a point-and-click graphical interface as well as a syntax-based programming language. Learning to write and save syntax is strongly recommended because it creates a reproducible record of every analytical decision.
The SPSS Interface
SPSS has two primary windows:
Data View
The Data View displays the dataset in a spreadsheet format. Each row represents a case (typically a survey respondent), and each column represents a variable (e.g., age, gender, marital status).
Variable View
The Variable View displays metadata about each variable:
- Name: A short identifier (e.g., Q57 or marital_status).
- Type: Numeric, string, date, etc.
- Width and Decimals: Display formatting.
- Label: A descriptive label (e.g., “Current marital status”).
- Values: Value labels that map numeric codes to meaningful categories (e.g., 1 = Married, 2 = Living together, 3 = Divorced).
- Missing: Codes for missing data (e.g., -99 = Refused, -98 = Don’t know).
- Measure: Nominal, Ordinal, or Scale (interval/ratio).
Output Viewer
When you run an analysis, results appear in a separate Output Viewer window. Output can be exported as PDF, Word, or other formats for inclusion in reports.
Working with the World Values Survey Data
The World Values Survey (WVS) is a global research project that explores people’s values, beliefs, and attitudes on topics including family, gender, religion, politics, and social trust. Wave 7 (2017-2022) includes data from nearly 100 countries.
For this course, the Canadian subset of WVS Wave 7 is used. Key variables relevant to SMF research include:
- Family values (importance of family, ideal number of children)
- Gender role attitudes (approval of women working outside the home)
- Marital and relationship status
- Life satisfaction and happiness
- Religious identity and practice
- Trust in institutions and other people
Basic SPSS Operations
Frequencies
The Frequencies procedure (Analyze > Descriptive Statistics > Frequencies) produces frequency tables showing the count and percentage of cases in each category of a variable. It can also generate bar charts and histograms.
Descriptives
The Descriptives procedure (Analyze > Descriptive Statistics > Descriptives) computes summary statistics including the mean, standard deviation, minimum, maximum, and range for scale variables.
Recoding Variables
Researchers frequently need to collapse or transform variables. The Recode into Different Variables command (Transform > Recode into Different Variables) creates a new variable based on an existing one. For example, you might recode a 10-point life satisfaction scale into three categories: Low (1-3), Medium (4-7), High (8-10).
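The same recode logic, sketched outside SPSS for clarity (the function name and cutoffs mirror the hypothetical example above):

```python
def recode_satisfaction(score):
    """Collapse a 1-10 life-satisfaction score into three categories,
    mirroring the Recode into Different Variables example."""
    if 1 <= score <= 3:
        return "Low"
    if 4 <= score <= 7:
        return "Medium"
    if 8 <= score <= 10:
        return "High"
    return None  # out-of-range codes treated as missing

print(recode_satisfaction(2))  # Low
print(recode_satisfaction(7))  # Medium
print(recode_satisfaction(9))  # High
```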
Chapter 4: The Chi-Square Test
Purpose and Logic
The chi-square test of independence (sometimes written \( \chi^2 \)) is used to determine whether there is a statistically significant association between two categorical (nominal or ordinal) variables.
Example research question: Is there an association between gender and attitude toward same-sex marriage among Canadian WVS respondents?
The Crosstabulation Table
A crosstabulation (contingency table) displays the joint distribution of two categorical variables. Each cell contains the observed frequency, the count of cases that fall into that combination of categories.
| | Agree | Neither | Disagree | Row Total |
|---|---|---|---|---|
| Male | 120 | 45 | 85 | 250 |
| Female | 155 | 30 | 65 | 250 |
| Column Total | 275 | 75 | 150 | 500 |
Expected Frequencies
Under the null hypothesis of no association, the expected frequency for each cell is:
\[ E_{ij} = \frac{(\text{Row Total}_i)(\text{Column Total}_j)}{N} \]
For the Male/Agree cell in the table above:
\[ E = \frac{250 \times 275}{500} = 137.5 \]
The Chi-Square Statistic
The test statistic compares observed and expected frequencies:
\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
where the sum is taken over all cells in the table.
A larger \( \chi^2 \) value indicates a greater discrepancy between observed and expected frequencies, providing stronger evidence against the null hypothesis.
Degrees of Freedom
For a chi-square test of independence:
\[ df = (r - 1)(c - 1) \]
where \( r \) is the number of rows and \( c \) is the number of columns. The p-value is obtained by comparing the computed \( \chi^2 \) to the chi-square distribution with the appropriate degrees of freedom.
Assumptions
- Independence of observations: Each case contributes to only one cell.
- Expected frequency size: No expected frequency should be less than 1, and no more than 20% of expected frequencies should be less than 5. When this assumption is violated, consider combining categories or using Fisher’s exact test.
Effect Size: Cramer’s V
The chi-square statistic is sensitive to sample size: larger samples produce larger \( \chi^2 \) values even for the same degree of association. Cramer’s V is a standardized effect size measure:
\[ V = \sqrt{\frac{\chi^2}{n \times \min(r-1, c-1)}} \]
Cramer’s V ranges from 0 (no association) to 1 (perfect association).
Guidelines for interpretation (for a table with \( df^* = \min(r-1, c-1) = 1 \)):
- Small: \( V \approx 0.10 \)
- Medium: \( V \approx 0.30 \)
- Large: \( V \approx 0.50 \)
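The full calculation for the gender-by-attitude crosstabulation above can be sketched by hand (SPSS produces the same numbers in the Chi-Square Tests and Symmetric Measures tables):

```python
import math

# Observed frequencies from the crosstabulation above.
observed = [
    [120, 45, 85],   # Male
    [155, 30, 65],   # Female
]

row_totals = [sum(row) for row in observed]        # [250, 250]
col_totals = [sum(col) for col in zip(*observed)]  # [275, 75, 150]
n = sum(row_totals)                                # 500

# Sum (O - E)^2 / E over every cell.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
cramers_v = math.sqrt(chi2 / (n * min(len(observed) - 1, len(observed[0]) - 1)))

print(round(chi2, 2))       # 10.12
print(df)                   # 2
print(round(cramers_v, 3))  # 0.142
```

Note that the Male/Agree expected frequency computed inside the loop is the 137.5 obtained earlier, and the resulting Cramer’s V (about 0.14) would be interpreted as a small association.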
Running Chi-Square in SPSS
- Go to Analyze > Descriptive Statistics > Crosstabs.
- Place one variable in the Row box and the other in the Column box.
- Click Statistics and check Chi-square and Phi and Cramer’s V.
- Click Cells and check Expected (under Counts) and Row or Column percentages.
- Click OK.
In the output, examine the Pearson Chi-Square row of the Chi-Square Tests table. Report the \( \chi^2 \) value, degrees of freedom, and p-value.
Chapter 5: The t-Test
Purpose
The t-test compares the means of a continuous (scale) variable between two groups. It answers the question: Is the difference between the two group means large enough to conclude that the groups differ in the population, or could the difference be due to sampling error?
Types of t-Tests
Independent Samples t-Test
The independent samples t-test compares means between two separate, unrelated groups. Example: Do married and unmarried Canadians differ in their reported life satisfaction?
Paired Samples t-Test
The paired samples t-test compares means from two related measurements on the same individuals. Example: Does a couple’s communication workshop improve relationship satisfaction (measured before and after)?
One-Sample t-Test
The one-sample t-test compares a sample mean to a known or hypothesized population value. Example: Is the mean number of children in our Canadian sample different from the national average of 1.6?
The Independent Samples t-Test in Detail
The Test Statistic
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}} \]
where the denominator is the standard error of the difference between means. For equal variances assumed:
\[ SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
and \( s_p \) is the pooled standard deviation:
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
Degrees of Freedom
For an independent samples t-test with equal variances assumed:
\[ df = n_1 + n_2 - 2 \]
Assumptions
- Independence of observations: Scores in one group are unrelated to scores in the other.
- Normality: The dependent variable is approximately normally distributed within each group. With large samples (\( n > 30 \) per group), the t-test is robust to moderate violations due to the CLT.
- Homogeneity of variance: The variance of the dependent variable is similar in both groups. SPSS reports Levene’s test for this assumption. If Levene’s test is significant (\( p < .05 \)), use the “Equal variances not assumed” row (Welch’s t-test).
Effect Size: Cohen’s d
\[ d = \frac{\bar{x}_1 - \bar{x}_2}{s_p} \]
Running the Independent Samples t-Test in SPSS
- Go to Analyze > Compare Means > Independent-Samples T Test.
- Move the continuous dependent variable to the Test Variable(s) box.
- Move the grouping variable to the Grouping Variable box.
- Click Define Groups and enter the numeric codes for the two groups (e.g., 1 and 2).
- Click OK.
In the output, first check Levene’s test. Then read the appropriate row of the t-test table.
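The t, df, and Cohen’s d formulas above can be verified by hand on two tiny hypothetical groups (SPSS reports the same values in the Independent Samples Test table):

```python
import math

# Two small hypothetical groups of life-satisfaction scores.
group1 = [5, 6, 7, 8]
group2 = [3, 4, 5, 6]

n1, n2 = len(group1), len(group2)
m1 = sum(group1) / n1
m2 = sum(group2) / n2
v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)  # sample variance
v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)

# Pooled standard deviation (equal variances assumed).
sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
se = sp * math.sqrt(1 / n1 + 1 / n2)

t = (m1 - m2) / se
df = n1 + n2 - 2
d = (m1 - m2) / sp  # Cohen's d

print(round(t, 2))  # 2.19
print(df)           # 6
print(round(d, 2))  # 1.55
```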
Chapter 6: Analysis of Variance (ANOVA)
Purpose
Analysis of Variance (ANOVA) extends the logic of the t-test to comparisons involving three or more groups. While a t-test asks “Do two groups differ?”, ANOVA asks “Do any of these groups differ from each other?”
Example: Do Canadians with different religious affiliations (Christian, Muslim, Hindu, None) differ in their attitudes toward traditional gender roles?
Why Not Multiple t-Tests?
If you have four groups, you would need \( \binom{4}{2} = 6 \) pairwise t-tests. Each test carries a 5% risk of Type I error. With six tests, the familywise error rate inflates:
\[ \alpha_{\text{FW}} = 1 - (1 - 0.05)^6 \approx 0.26 \]
This means a 26% chance of at least one false positive. ANOVA controls this by testing all groups simultaneously in a single omnibus test.
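The familywise error rate calculation is a one-liner, and generalizes to any number of comparisons:

```python
alpha = 0.05
comparisons = 6  # all pairwise tests among 4 groups

# Probability of at least one Type I error across independent tests.
familywise = 1 - (1 - alpha) ** comparisons

print(round(familywise, 2))  # 0.26
```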
The Logic of ANOVA
ANOVA compares two sources of variability:
- Between-group variance: How much group means differ from the overall (grand) mean.
- Within-group variance: How much individual scores differ from their own group mean.
If the groups truly differ, the between-group variance should be large relative to the within-group variance.
The F-Ratio
\[ F = \frac{MS_{\text{between}}}{MS_{\text{within}}} \]
where:
\[ MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} \]
\[ MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} \]
- \( df_{\text{between}} = k - 1 \), where \( k \) is the number of groups
- \( df_{\text{within}} = N - k \), where \( N \) is the total sample size
An F-ratio near 1 suggests no group differences. Larger F-ratios provide evidence against the null hypothesis.
Assumptions
- Independence of observations.
- Normality within each group.
- Homogeneity of variance across groups. Tested by Levene’s test in SPSS. If violated, use the Welch F-test or the Brown-Forsythe test.
Post Hoc Tests
A significant ANOVA result tells you that at least one group differs, but not which groups differ from which. Post hoc tests perform pairwise comparisons while controlling the familywise error rate.
Common post hoc tests include:
- Tukey’s HSD (Honestly Significant Difference): Best for equal sample sizes; controls error rate tightly.
- Bonferroni: Conservative; divides \( \alpha \) by the number of comparisons.
- Games-Howell: Does not assume equal variances; appropriate when Levene’s test is significant.
Effect Size: Eta-Squared
\[ \eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}} \]
Eta-squared represents the proportion of total variance in the dependent variable that is explained by group membership.
Guidelines:
- Small: \( \eta^2 \approx 0.01 \)
- Medium: \( \eta^2 \approx 0.06 \)
- Large: \( \eta^2 \approx 0.14 \)
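The sums of squares, F-ratio, and eta-squared can all be computed directly from the definitions. A sketch with three tiny hypothetical groups:

```python
# Three small hypothetical groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Between-group SS: group means vs. the grand mean, weighted by group size.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group SS: individual scores vs. their own group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

k = len(groups)
n_total = len(all_scores)
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_ratio = ms_between / ms_within
eta_squared = ss_between / (ss_between + ss_within)

print(round(f_ratio, 1))      # 13.0
print(round(eta_squared, 4))  # 0.8125
```

Here the between-group variability dwarfs the within-group variability, so F is far above 1 and group membership explains most of the variance.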
Running One-Way ANOVA in SPSS
- Go to Analyze > Compare Means > One-Way ANOVA.
- Move the dependent variable to the Dependent List and the grouping variable to the Factor box.
- Click Post Hoc and select the desired test (e.g., Tukey).
- Click Options and check Descriptive, Homogeneity of variance test, and Means plot.
- Click OK.
MANOVA: Multivariate Analysis of Variance
MANOVA (Multivariate Analysis of Variance) extends ANOVA to situations where there are two or more dependent variables examined simultaneously. Instead of running separate ANOVAs for each dependent variable (which inflates error rates), MANOVA tests whether the groups differ on the combination of dependent variables.
When to Use MANOVA
Use MANOVA when you want to compare groups on a set of related outcomes. Example: Do religious affiliations differ not just in gender role attitudes but also in family values and life satisfaction simultaneously?
Test Statistics
MANOVA produces several multivariate test statistics. The most commonly reported are:
- Wilks’ Lambda (\( \Lambda \)): Ranges from 0 to 1. Smaller values indicate greater group separation.
- Pillai’s Trace: More robust to violations of assumptions; preferred when sample sizes are unequal or assumptions are questionable.
- Hotelling’s Trace and Roy’s Largest Root: Alternative multivariate test statistics.
If the overall MANOVA is significant, follow up with separate ANOVAs on each dependent variable to identify where the differences lie.
Running MANOVA in SPSS
- Go to Analyze > General Linear Model > Multivariate.
- Move the dependent variables to the Dependent Variables box and the grouping variable to the Fixed Factor(s) box.
- Click Post Hoc to set up pairwise comparisons.
- Click Options for descriptive statistics and effect sizes.
- Click OK.
Chapter 7: Correlation
Purpose
Correlation measures the strength and direction of the linear relationship between two continuous variables. It answers: “As one variable increases, does the other tend to increase, decrease, or remain unchanged?”
Pearson’s Correlation Coefficient
The Pearson product-moment correlation coefficient (\( r \)) is the most common measure of linear association:
\[ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \]
Properties of r
- \( r \) ranges from \( -1 \) to \( +1 \).
- \( r = +1 \): Perfect positive linear relationship.
- \( r = -1 \): Perfect negative linear relationship.
- \( r = 0 \): No linear relationship (but there may be a nonlinear relationship).
Interpretation Guidelines
- \( |r| < 0.10 \): Negligible
- \( 0.10 \leq |r| < 0.30 \): Small (weak)
- \( 0.30 \leq |r| < 0.50 \): Medium (moderate)
- \( |r| \geq 0.50 \): Large (strong)
The Coefficient of Determination
The square of the correlation coefficient, \( r^2 \), represents the coefficient of determination: the proportion of variance in one variable that is explained by the other.
Example: If \( r = 0.40 \), then \( r^2 = 0.16 \), meaning 16% of the variance in one variable is shared with the other.
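The Pearson formula can be applied by hand to a small set of hypothetical paired scores:

```python
import math

# Hypothetical paired scores (e.g., relationship satisfaction and social trust).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Numerator: sum of cross-products of deviations.
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
# Denominator: geometric mean of the two sums of squared deviations.
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                sum((yi - my) ** 2 for yi in y))
r = num / den

print(round(r, 2))       # 0.8
print(round(r ** 2, 2))  # 0.64
```

An \( r \) of 0.80 is a strong positive association, and \( r^2 = 0.64 \) means the two variables share 64% of their variance.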
Assumptions of Pearson’s r
- Linearity: The relationship between the two variables is linear. Check with a scatterplot.
- Normality: Both variables are approximately normally distributed (required for significance testing, less critical for the estimate itself with large samples).
- Homoscedasticity: The spread of data points around the regression line is roughly constant across values of the predictor.
- No outliers: Extreme values can dramatically inflate or deflate \( r \).
Spearman’s Rank Correlation
When variables are ordinal, or when the relationship is monotonic but not linear, use Spearman’s rho (\( r_s \)). It is computed by applying the Pearson formula to the ranked values of each variable.
Running Correlations in SPSS
- Go to Analyze > Correlate > Bivariate.
- Move the variables of interest into the Variables box.
- Ensure Pearson is checked (and/or Spearman if appropriate).
- Check Flag significant correlations and select Two-tailed or One-tailed.
- Click OK.
The output is a correlation matrix showing the \( r \) value, significance level, and sample size for each pair of variables.
Partial Correlation
A partial correlation measures the relationship between two variables after controlling for the effect of one or more additional variables. This helps address the problem of spurious correlation.
Example: The correlation between religious attendance and life satisfaction might be partly explained by social support. A partial correlation controlling for social support reveals whether religious attendance has an independent association with life satisfaction.
In SPSS: Analyze > Correlate > Partial. Move the two primary variables to the Variables box and the control variable(s) to the Controlling for box.
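For two variables and a single control variable, the partial correlation can be computed directly from the three pairwise correlations. A sketch using hypothetical correlation values (the standard first-order partial correlation formula):

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations: attendance-satisfaction (xy),
# attendance-support (xz), support-satisfaction (yz).
print(round(partial_r(0.50, 0.40, 0.60), 3))  # 0.355
```

Controlling for the hypothetical social-support variable shrinks the attendance-satisfaction correlation from 0.50 to about 0.36, suggesting part of the original association ran through social support.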
Chapter 8: Regression Analysis
Linear Regression
Purpose
Linear regression goes beyond correlation by modeling the relationship between variables as a predictive equation. While correlation tells you that two variables are related, regression tells you the specific form of the relationship and allows you to predict the value of one variable from the other.
Simple Linear Regression
In simple linear regression, there is one predictor variable (\( x \)) and one outcome variable (\( y \)):
\[ \hat{y} = b_0 + b_1 x \]
where:
- \( \hat{y} \) is the predicted value of the outcome
- \( b_0 \) is the y-intercept (the predicted value of \( y \) when \( x = 0 \))
- \( b_1 \) is the slope (the change in \( \hat{y} \) for a one-unit increase in \( x \))
Ordinary Least Squares (OLS)
The regression coefficients are estimated using Ordinary Least Squares, which minimizes the sum of squared residuals:
\[ \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min \sum_{i=1}^{n} e_i^2 \]
where \( e_i = y_i - \hat{y}_i \) is the residual for observation \( i \).
The Slope Coefficient
\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = r \cdot \frac{s_y}{s_x} \]
This formula shows the connection between correlation and regression: the slope equals the correlation coefficient scaled by the ratio of standard deviations.
R-Squared
The R-squared (\( R^2 \)) value indicates the proportion of variance in the outcome variable that is explained by the predictor(s):
\[ R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \]
In simple regression, \( R^2 = r^2 \).
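The OLS slope, intercept, and \( R^2 \) can be sketched from the definitions with a small set of hypothetical data points:

```python
# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# OLS slope and intercept.
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
     sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# R-squared from the residual and total sums of squares.
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(b1)                   # 0.8
print(round(b0, 1))         # 0.6
print(round(r_squared, 2))  # 0.64
```

The fitted line here is \( \hat{y} = 0.6 + 0.8x \), and since this is simple regression, \( R^2 \) equals the squared Pearson correlation between \( x \) and \( y \).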
Multiple Linear Regression
Multiple linear regression includes two or more predictor variables:
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \]
Each coefficient \( b_j \) represents the effect of \( x_j \) on \( y \), holding all other predictors constant. This is the principle of statistical control.
Standardized Coefficients (Beta)
When predictors are measured on different scales (e.g., age in years and income in dollars), comparing unstandardized coefficients is misleading. Standardized coefficients (\( \beta \)) express the effect of each predictor in standard deviation units:
\[ \beta_j = b_j \cdot \frac{s_{x_j}}{s_y} \]
A \( \beta \) of 0.30 means that a one-standard-deviation increase in the predictor is associated with a 0.30-standard-deviation increase in the outcome.
Adjusted R-Squared
Adding predictors always increases \( R^2 \), even if the new predictors are irrelevant. Adjusted R-squared penalizes for the number of predictors:
\[ R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1} \]
where \( k \) is the number of predictors.
Assumptions of Linear Regression
- Linearity: The relationship between each predictor and the outcome is linear.
- Independence of residuals: Residuals are not correlated with each other (no autocorrelation). Tested with the Durbin-Watson statistic (values near 2 indicate independence).
- Homoscedasticity: Residuals have constant variance across levels of the predictor(s). Check by plotting residuals against predicted values.
- Normality of residuals: Residuals are approximately normally distributed. Check with a histogram of residuals or a Normal P-P plot.
- No multicollinearity: Predictors are not too highly correlated with each other. Assessed using the Variance Inflation Factor (VIF). A VIF greater than 10 (or tolerance below 0.1) suggests problematic multicollinearity.
Running Linear Regression in SPSS
- Go to Analyze > Regression > Linear.
- Move the outcome variable to the Dependent box.
- Move the predictor(s) to the Independent(s) box.
- Click Statistics and check Estimates, Model fit, R squared change, Descriptives, and Collinearity diagnostics.
- Click Plots: set \( *ZRESID \) on the Y-axis and \( *ZPRED \) on the X-axis. Check Histogram and Normal probability plot.
- Click OK.
Logistic Regression
When to Use
Logistic regression is used when the outcome variable is categorical (typically binary: yes/no, agree/disagree, married/not married). Ordinary linear regression is inappropriate for binary outcomes because it can produce predicted values outside the 0-1 range and violates the assumption of normally distributed residuals.
The Logistic Function
Logistic regression models the probability of the outcome occurring:
\[ P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots)}} \]
This S-shaped (sigmoid) function ensures that predicted probabilities always fall between 0 and 1.
The Logit
The logit is the natural logarithm of the odds:
\[ \text{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1 x_1 + b_2 x_2 + \cdots \]
Odds Ratios
In logistic regression, coefficients are often exponentiated to produce odds ratios (\( e^{b} \)):
- An odds ratio of 1.0 means the predictor has no effect.
- An odds ratio greater than 1.0 means the event is more likely as the predictor increases.
- An odds ratio less than 1.0 means the event is less likely as the predictor increases.
Example: An odds ratio of 1.8 for “has children” predicting “identifies as religious” means that people with children have 1.8 times the odds of identifying as religious compared to people without children, controlling for other variables.
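The link between a coefficient \( b \) and its odds ratio \( e^b \) can be checked directly; the odds ratio of 1.8 in the example corresponds to \( b = \ln(1.8) \approx 0.59 \). A sketch (not SPSS output):

```python
import math

def odds_ratio(b):
    """Exponentiate a logistic coefficient to obtain its odds ratio."""
    return math.exp(b)

b = math.log(1.8)                 # the coefficient behind an odds ratio of 1.8
print(round(b, 2))                # 0.59
print(round(odds_ratio(b), 1))    # back to 1.8
print(odds_ratio(0.0))            # b = 0 gives OR = 1.0: no effect
```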
Model Evaluation
Logistic regression does not use \( R^2 \) in the traditional sense. Instead, model fit is assessed with:
- -2 Log Likelihood (-2LL): Smaller values indicate better fit.
- Nagelkerke R-squared: A pseudo-R-squared ranging from 0 to 1.
- Hosmer-Lemeshow Test: A non-significant result (\( p > .05 \)) indicates acceptable model fit.
- Classification Table: Shows the percentage of cases correctly classified by the model.
Running Logistic Regression in SPSS
- Go to Analyze > Regression > Binary Logistic.
- Move the binary outcome to the Dependent box.
- Move the predictor(s) to the Covariates box.
- Under Options, check Hosmer-Lemeshow goodness of fit, Classification plots, and CI for exp(B).
- Click OK.
Chapter 9: Choosing the Right Statistical Test
A Decision Framework
Selecting the appropriate statistical test depends on three key questions:
- What is the level of measurement of the dependent (outcome) variable? Categorical or continuous?
- What is the level of measurement of the independent (predictor) variable(s)? Categorical or continuous?
- How many groups or variables are involved?
Decision Table
| Outcome Variable | Predictor Variable | Number of Groups/Predictors | Recommended Test |
|---|---|---|---|
| Categorical | Categorical | 2+ categories each | Chi-square test of independence |
| Continuous | Categorical | 2 groups | Independent samples t-test |
| Continuous | Categorical | 3+ groups | One-way ANOVA |
| Continuous (2+) | Categorical | 2+ groups | MANOVA |
| Continuous | Continuous | 1 predictor | Pearson correlation / Simple regression |
| Continuous | Continuous (2+) | 2+ predictors | Multiple regression |
| Categorical (binary) | Continuous and/or categorical | 1+ predictors | Logistic regression |
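The decision table can be condensed into a small lookup helper. This is a simplified sketch of the rows above (the MANOVA row, which requires multiple outcome variables, is left out, and the function name is ours):

```python
def recommend_test(outcome, predictor, count):
    """Map the decision-table rows to a recommended test.
    outcome, predictor: "categorical" or "continuous".
    count: number of groups (categorical predictor) or predictors (continuous)."""
    if outcome == "categorical" and predictor == "categorical":
        return "chi-square test of independence"
    if outcome == "continuous" and predictor == "categorical":
        return "independent samples t-test" if count == 2 else "one-way ANOVA"
    if outcome == "continuous" and predictor == "continuous":
        return "Pearson correlation / simple regression" if count == 1 else "multiple regression"
    if outcome == "categorical":  # binary outcome, continuous or mixed predictors
        return "logistic regression"
    raise ValueError("combination not covered by the table")

print(recommend_test("continuous", "categorical", 3))  # one-way ANOVA
```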
Checking Assumptions: A Summary
Every parametric test assumes certain properties of the data. Violating assumptions can lead to incorrect conclusions. Here is a consolidated checklist:
For t-tests and ANOVA
- Independence of observations (design feature, not testable after the fact)
- Normality of the dependent variable within groups (check with Shapiro-Wilk test or visual inspection of histograms/Q-Q plots)
- Homogeneity of variance (Levene’s test)
For Correlation and Regression
- Linearity (scatterplot)
- Normality of residuals (histogram, P-P plot)
- Homoscedasticity (residuals vs. predicted plot)
- No multicollinearity (VIF, for multiple regression)
- Independence of residuals (Durbin-Watson)
For Chi-Square
- Independence of observations
- Adequate expected frequencies (no expected count below 1; fewer than 20% of cells with expected counts below 5)
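The expected-frequency rule can be checked mechanically from the marginal totals, since each expected count is (row total × column total) / n. A minimal sketch with made-up counts:

```python
def expected_counts(table):
    """Expected frequencies E = (row total * column total) / n for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_counts_ok(table):
    """True if no expected count is below 1 and fewer than 20% are below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    return min(cells) >= 1 and sum(e < 5 for e in cells) / len(cells) < 0.20

print(chi_square_counts_ok([[20, 30], [25, 25]]))  # True: smallest expected count is 22.5
print(chi_square_counts_ok([[1, 2], [2, 1]]))      # False: every expected count is 1.5
```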
Chapter 10: Moderation and Mediation
Introduction to Advanced Concepts
Moderation and mediation address more nuanced research questions than simple bivariate tests. Instead of asking “Is X related to Y?”, they ask:
- Moderation: “Does the relationship between X and Y depend on a third variable W?”
- Mediation: “Does X influence Y through an intervening variable M?”
These concepts were famously formalized by Baron and Kenny (1986) in one of the most cited papers in social science.
Moderation Analysis
Conceptual Definition
A moderator is a variable that affects the strength or direction of the relationship between an independent variable and a dependent variable. Moderation is synonymous with a statistical interaction.
Example: The relationship between religiosity and opposition to divorce may be moderated by age. Among older Canadians, religiosity might strongly predict opposition to divorce, but among younger Canadians, the relationship might be weaker.
Visualizing Moderation
Moderation is typically visualized with separate regression lines for different levels of the moderator. If the lines are not parallel, moderation is present. Crossed lines indicate a crossover interaction (qualitative interaction), where the direction of the effect reverses.
Testing Moderation in Regression
To test moderation, include the interaction term in a regression model:
\[ \hat{y} = b_0 + b_1 X + b_2 W + b_3 (X \times W) \]
where \( X \times W \) is the product of the predictor and the moderator. If \( b_3 \) is statistically significant, moderation is supported.
Probing the Interaction
When moderation is significant, you need to understand its form. Simple slopes analysis tests the relationship between X and Y at different levels of the moderator (e.g., at the mean, one standard deviation above, and one standard deviation below the mean).
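Because the slope of X at a given moderator value is \( b_1 + b_3 W \), simple slopes follow directly from the fitted coefficients. A sketch with invented estimates, assuming the moderator is standardized (mean 0, SD 1):

```python
def simple_slope(b1, b3, w):
    """Slope of X on Y at moderator value w, from y-hat = b0 + b1*X + b2*W + b3*(X*W)."""
    return b1 + b3 * w

# Hypothetical estimates: b1 = 0.40, b3 = -0.15
for w in (-1, 0, 1):  # 1 SD below the mean, at the mean, 1 SD above
    print(f"W = {w:+d}: slope = {simple_slope(0.40, -0.15, w):.2f}")
# The X-Y relationship weakens as W increases: 0.55, 0.40, 0.25
```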
Mediation Analysis
Conceptual Definition
A mediator is a variable that explains the mechanism or process through which an independent variable influences a dependent variable. Unlike moderation, which asks “when” or “for whom,” mediation asks “how” or “why.”
Example: Does education (X) reduce support for traditional gender roles (Y) because education increases exposure to diverse perspectives (M)?
\[ X \rightarrow M \rightarrow Y \]
Baron and Kenny’s Four Steps
The classic Baron and Kenny (1986) approach to testing mediation involves four regression equations:
Step 1: Show that X predicts Y (the total effect, \( c \)):
\[ Y = b_0 + c \cdot X \]
Step 2: Show that X predicts the mediator M (path \( a \)):
\[ M = b_0 + a \cdot X \]
Step 3: Show that M predicts Y while controlling for X (path \( b \)):
\[ Y = b_0 + c' \cdot X + b \cdot M \]
Step 4: If the effect of X on Y is reduced (or becomes non-significant) when M is included, mediation is supported. The remaining effect of X on Y when M is controlled is the direct effect (\( c' \)); the indirect effect is \( a \times b \).
Types of Mediation
- Full mediation: The direct effect \( c' \) becomes non-significant when the mediator is included. X affects Y entirely through M.
- Partial mediation: The direct effect \( c' \) is reduced but remains significant. X affects Y partly through M and partly through other pathways.
The Sobel Test
The Sobel test formally tests whether the indirect effect (\( a \times b \)) is statistically significant:
\[ z = \frac{a \times b}{\sqrt{b^2 s_a^2 + a^2 s_b^2}} \]
where \( s_a \) and \( s_b \) are the standard errors of the \( a \) and \( b \) coefficients.
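The Sobel formula is simple enough to compute by hand; the p-value comes from the standard normal distribution. A sketch with invented coefficients and standard errors:

```python
import math

def sobel_test(a, b, se_a, se_b):
    """Sobel z for the indirect effect a*b, plus a two-tailed normal p-value."""
    z = (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # 2 * (1 - Phi(|z|))
    return z, p

z, p = sobel_test(a=0.50, b=0.40, se_a=0.10, se_b=0.10)
print(round(z, 2), p < 0.05)  # z about 3.12: the indirect effect is significant here
```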
Running Mediation in SPSS
While mediation can be tested using a series of standard regression analyses following Baron and Kenny’s steps, the PROCESS macro (Hayes, 2013) automates the procedure and provides bootstrap confidence intervals for the indirect effect.
- Install the PROCESS macro (downloaded from Andrew Hayes’s website).
- Go to Analyze > Regression > PROCESS.
- Specify X (independent variable), Y (outcome), and M (mediator).
- Select Model 4 (simple mediation).
- Set the number of bootstrap samples (5000 is standard).
- Click OK.
If the bootstrap confidence interval for the indirect effect does not contain zero, the indirect effect is significant.
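The bootstrap idea behind PROCESS can be sketched in plain Python: resample cases with replacement, re-estimate \( a \) and \( b \) on each resample, and take percentiles of the resulting \( a \times b \) values. Everything below (helper names, the synthetic data, 1,000 replications instead of 5,000) is illustrative, not Hayes's implementation:

```python
import random

def slope(xs, ys):
    """OLS slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def b_path(xs, ms, ys):
    """Partial slope of M predicting Y controlling for X (Frisch-Waugh:
    residualize M and Y on X, then regress residual on residual)."""
    n = len(xs)
    mx, mm, my = sum(xs) / n, sum(ms) / n, sum(ys) / n
    bmx, byx = slope(xs, ms), slope(xs, ys)
    rm = [m - mm - bmx * (x - mx) for x, m in zip(xs, ms)]
    ry = [y - my - byx * (x - mx) for x, y in zip(xs, ys)]
    return slope(rm, ry)

def bootstrap_indirect_ci(xs, ms, ys, reps=1000, seed=0):
    """Percentile 95% confidence interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, bm, by = [xs[i] for i in idx], [ms[i] for i in idx], [ys[i] for i in idx]
        estimates.append(slope(bx, bm) * b_path(bx, bm, by))
    estimates.sort()
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps)]

# Synthetic data with genuine mediation: X -> M -> Y
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100)]
ms = [0.6 * x + rng.gauss(0, 0.5) for x in xs]
ys = [0.5 * m + 0.2 * x + rng.gauss(0, 0.5) for x, m in zip(xs, ms)]
lo, hi = bootstrap_indirect_ci(xs, ms, ys)
# If the interval (lo, hi) excludes zero, the indirect effect is significant.
```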
Chapter 11: Exploratory Factor Analysis
Purpose
Exploratory Factor Analysis (EFA) is a data reduction technique used to identify the underlying structure among a set of observed variables. When a survey includes many items (e.g., 20 questions about family values), EFA can determine whether those items cluster into a smaller number of latent constructs (e.g., “traditional family values,” “egalitarian values,” “individualism”).
Key Concepts
Factors
A factor (or latent variable) is an unobserved construct that is inferred from patterns of correlations among observed variables. Items that correlate highly with each other but not with other items are assumed to reflect the same underlying factor.
Factor Loadings
A factor loading is the correlation between an observed variable and a factor. Loadings range from -1 to +1. A loading of 0.40 or higher is generally considered meaningful (though thresholds vary by discipline).
Eigenvalues
An eigenvalue represents the amount of variance explained by a factor. A common rule of thumb (the Kaiser criterion) is to retain factors with eigenvalues greater than 1.0, meaning the factor explains more variance than a single variable would.
The Scree Plot
A scree plot graphs eigenvalues in descending order. The point where the curve levels off (the “elbow”) suggests the optimal number of factors to retain. This visual method often yields different results than the Kaiser criterion, and researchers may use both.
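Given a list of eigenvalues, the Kaiser criterion reduces to a one-line count. A sketch with invented eigenvalues from a hypothetical 10-item scale:

```python
def kaiser_retain(eigenvalues):
    """Count the factors with eigenvalue > 1.0 (Kaiser criterion)."""
    return sum(ev > 1.0 for ev in eigenvalues)

eigenvalues = [4.2, 2.1, 1.3, 0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.1]
print(kaiser_retain(eigenvalues))  # 3 factors by the Kaiser criterion
# A scree plot of the same values levels off after the second factor,
# so the elbow rule might suggest retaining only 2: the methods can disagree.
```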
Steps in Conducting EFA
Assess suitability: Run the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity. KMO should be at least 0.60 (above 0.80 is ideal). Bartlett’s test should be significant (\( p < .05 \)).
Choose extraction method: Principal Axis Factoring is common in social sciences because it extracts only shared variance (unlike Principal Components Analysis, which also includes unique variance).
Determine the number of factors: Use the Kaiser criterion, scree plot, and theoretical considerations.
Rotate the factor solution: Rotation simplifies the factor structure by maximizing high loadings and minimizing low ones.
- Varimax (orthogonal rotation): Produces uncorrelated factors. Easier to interpret.
- Oblimin or Promax (oblique rotation): Allows factors to correlate. Often more realistic in social science, where constructs are rarely truly independent.
Interpret the factors: Examine the rotated factor loadings. Group items that load highly on the same factor and assign a meaningful label.
Evaluate the solution: Check for items that cross-load (load highly on more than one factor) or fail to load meaningfully on any factor. These items may need to be removed.
Running EFA in SPSS
- Go to Analyze > Dimension Reduction > Factor.
- Move the items into the Variables box.
- Click Descriptives and check KMO and Bartlett’s test of sphericity.
- Click Extraction: select Principal Axis Factoring and check Scree plot.
- Click Rotation: select Varimax or Direct Oblimin.
- Click Options: check Suppress small coefficients and set the threshold (e.g., 0.30).
- Click OK.
Chapter 12: Latent Class Analysis
Purpose
Latent Class Analysis (LCA) is a person-centred approach that identifies subgroups (classes) of individuals who share similar patterns of responses across a set of categorical variables. While factor analysis groups variables, LCA groups people.
Example: Among Canadian WVS respondents, LCA might reveal distinct “types” of people based on their combination of family values, religiosity, and gender attitudes: perhaps a “traditional” class, a “progressive” class, and an “ambivalent” class.
Key Concepts
Latent Classes
A latent class is an unobserved subgroup within the population. Membership is probabilistic: each individual has a probability of belonging to each class, and is typically assigned to the class with the highest probability.
Model Selection
The researcher does not know the number of classes in advance. Multiple models are fit (e.g., 2-class, 3-class, 4-class solutions) and compared using fit indices:
- BIC (Bayesian Information Criterion): Lower values indicate better fit. The most commonly used criterion.
- AIC (Akaike Information Criterion): Also lower is better, but tends to favour more complex models than BIC.
- Entropy: Ranges from 0 to 1. Higher values indicate clearer class separation. Values above 0.80 are considered good.
- Lo-Mendell-Rubin (LMR) Test: Compares a k-class model to a (k-1)-class model. A significant p-value suggests the k-class model is preferred.
Interpreting Classes
Once the optimal number of classes is selected, examine the item-response probabilities (the probability of endorsing each category of each variable within each class). Classes are then labelled based on the distinctive response patterns.
LCA vs. EFA
| Feature | EFA | LCA |
|---|---|---|
| Groups | Variables | People |
| Variable type | Continuous (or treated as such) | Categorical |
| Output | Factors with loadings | Classes with response probabilities |
| Approach | Variable-centred | Person-centred |
Software Note
LCA is typically performed in specialized software such as Mplus or the poLCA package in R. SPSS does not have built-in LCA functionality, though some plugins exist.
Chapter 13: Communicating Statistical Results
Writing Statistical Results
Effective statistical communication requires translating numerical output into clear, meaningful prose. Follow these principles:
Report All Relevant Information
For each test, report:
- The type of test conducted
- The test statistic and its degrees of freedom
- The exact p-value (or “p < .001” if very small)
- An effect size measure
- Descriptive statistics (means, standard deviations, or percentages) for each group
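A small formatting helper makes these conventions mechanical. The function names and the APA-style layout are our own illustration, not a course requirement:

```python
def format_p(p):
    """'p < .001' for very small values, otherwise the exact p without a leading zero."""
    if p < 0.001:
        return "p < .001"
    return f"p = {p:.3f}".replace("0.", ".", 1)

def report_t(t, df, p, d):
    """One-line report of an independent samples t-test with Cohen's d."""
    return f"t({df}) = {t:.2f}, {format_p(p)}, d = {d:.2f}"

print(report_t(2.45, 98, 0.016, 0.49))  # t(98) = 2.45, p = .016, d = 0.49
```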
Use Plain Language
After reporting the statistical details, explain what the result means in substantive terms. A general audience should be able to understand your conclusion even if they skip the numbers.
Avoid Common Pitfalls
- Do not say a result “proves” the hypothesis. Statistics provide evidence, not proof.
- Do not confuse statistical significance with practical importance. A tiny effect can be statistically significant with a large sample.
- Do not accept the null hypothesis. If \( p > .05 \), say you “failed to reject the null hypothesis” or found “no significant evidence of a difference.” The null may still be false.
- Do not report results as “approaching significance” or “marginally significant.” A result either meets the pre-specified alpha level or it does not.
Tables and Figures
Tables
Use tables to present complex results concisely. A regression table should include:
- Variable names
- Unstandardized coefficients (B) with standard errors
- Standardized coefficients (\( \beta \))
- t-values and p-values
- Model summary statistics (\( R^2 \), adjusted \( R^2 \), F-test)
Figures
Use figures to illustrate patterns:
- Bar charts for group comparisons (t-test, ANOVA results)
- Scatterplots for correlations and regression relationships
- Error bar charts showing means with 95% confidence intervals
- Interaction plots for moderation effects
Becoming Critical Consumers of Statistics
One of the most important outcomes of studying statistics is learning to evaluate the statistical claims made by others: in academic papers, government reports, media articles, and popular discourse.
Questions to ask when evaluating a statistical claim:
- What was the sample? Is it representative of the population the author claims to generalize to?
- What was the effect size? A statistically significant finding may not be practically meaningful.
- Were confounding variables controlled? Observational studies that fail to account for confounds may report spurious associations.
- Were multiple comparisons corrected for? Running many tests without correction inflates the false positive rate.
- Is the causal language justified? Only randomized experiments support causal conclusions. Observational studies can identify associations, not causes.
- Has the study been replicated? A single study, no matter how well-designed, provides weaker evidence than a body of converging findings.