SMF 230: Introduction to Statistics

Carl Rodrigue

Estimated study time: 58 minutes

Chapter 1: Foundations of Statistics in Social Research

Why Statistics Matter for Sexuality, Marriage, and Family Studies

Statistics provide the tools necessary for transforming raw observations about human relationships, family structures, and intimate life into rigorous, defensible claims. Without statistical reasoning, researchers in Sexuality, Marriage, and Family (SMF) studies would be limited to anecdote and speculation. With it, they can quantify patterns, test theories, and evaluate interventions designed to support families and communities.

This course emphasizes applied statistical literacy: the ability to choose the right test, run it in software, interpret the output, and communicate findings to both academic and public audiences. The dataset used throughout is the World Values Survey (WVS) Wave 7, focusing on the Canadian sample, which contains variables on family values, gender attitudes, relationship satisfaction, religiosity, and social trust, making it ideal for SMF research questions.

Descriptive vs. Inferential Statistics

All of statistics divides into two broad families.

Descriptive statistics summarize and organize data that have already been collected. They answer the question: “What do the data look like?” Examples include means, medians, frequency tables, and standard deviations.

Inferential statistics use data from a sample to draw conclusions about a larger population. They answer the question: “Can we generalize beyond the people we actually measured?” Examples include t-tests, chi-square tests, and regression models.

Key Distinction: Descriptive statistics describe what is in your data. Inferential statistics estimate what would be if you could measure the entire population.

Levels of Measurement

Before choosing any statistical technique, you must identify the level of measurement of each variable. The level of measurement determines which statistics are appropriate.

Nominal

Nominal variables consist of categories with no inherent order. Examples: religion (Christian, Muslim, Hindu, None), marital status (married, divorced, single, widowed), province of residence.

The only meaningful operation is counting how many cases fall into each category. You cannot compute a meaningful average of marital statuses.

Ordinal

Ordinal variables have categories that can be ranked, but the distances between ranks are not necessarily equal. Examples: education level (less than high school, high school, some post-secondary, bachelor’s, graduate degree), life satisfaction rated on a scale from 1 (very dissatisfied) to 10 (very satisfied).

You can say that a rating of 8 is higher than a rating of 5, but you cannot assume the difference between 5 and 8 is the same as the difference between 2 and 5.

Interval

Interval variables have equal distances between values, but no true zero point. Examples: temperature in Celsius (0 degrees C does not mean “no temperature”), year of birth.

Ratio

Ratio variables have equal intervals and a meaningful zero point. Examples: income in dollars, number of children, age in years.

Practical Rule: In social science research, ordinal Likert-type scales (e.g., 1-5 agreement scales) are frequently treated as interval for the purpose of computing means and running parametric tests, though this is a debated practice.

Measures of Central Tendency

Measures of central tendency describe the “typical” or “centre” value in a distribution.

Mean

The arithmetic mean is the sum of all values divided by the number of observations:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

The mean is sensitive to extreme values (outliers). If a few respondents report extremely high incomes, the mean income will be pulled upward and may not represent the typical respondent.

Median

The median is the middle value when all observations are arranged in order. If \( n \) is even, the median is the average of the two middle values.

The median is resistant to outliers and is often preferred for skewed distributions such as income or household size.
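Although the course software is SPSS, the outlier sensitivity described above is easy to check in a few lines of Python. The income figures below are invented for illustration:

```python
import statistics

# Hypothetical household incomes (in dollars); the last value is an outlier.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000, 1_000_000]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # resistant to the outlier

print(round(mean_income, 2), median_income)  # 208333.33 52500.0
```

One extreme respondent moves the mean from the mid-50,000s to over 200,000, while the median barely notices, which is why the median is preferred for skewed variables such as income.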

Mode

The mode is the most frequently occurring value. It is the only measure of central tendency appropriate for nominal data. A distribution can be unimodal (one peak), bimodal (two peaks), or multimodal (more than two peaks).

Measures of Variability

Central tendency alone is insufficient. Two datasets can have identical means but very different spreads.

Range

The range is the difference between the maximum and minimum values:

\[ \text{Range} = x_{\max} - x_{\min} \]

It is simple but highly sensitive to outliers.

Variance

The variance measures the average squared deviation from the mean. For a sample:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

We divide by \( n - 1 \) (rather than \( n \)) to correct for the bias introduced by estimating the population variance from a sample. This correction is called Bessel’s correction.

Standard Deviation

The standard deviation is the square root of the variance:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

It is expressed in the same units as the original data, making it more interpretable than the variance.

Example: If the mean age of respondents is 42 years with a standard deviation of 15 years, roughly 68% of respondents fall between 27 and 57 years of age (assuming a normal distribution).
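The variance and standard deviation formulas can be verified directly. This Python sketch uses invented age values; the assert lines confirm that the hand computation with Bessel's correction matches the standard library:

```python
import math
import statistics

ages = [27, 35, 42, 49, 57]  # invented example values
n = len(ages)
xbar = sum(ages) / n

# Sample variance with Bessel's correction (divide by n - 1, not n).
s2 = sum((x - xbar) ** 2 for x in ages) / (n - 1)
s = math.sqrt(s2)

# The standard library uses the same n - 1 correction.
assert math.isclose(s2, statistics.variance(ages))
assert math.isclose(s, statistics.stdev(ages))

print(s2, round(s, 2))  # 137.0 11.7
```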

Frequency Distributions and Visualization

A frequency distribution shows how often each value or category occurs in the data. It can be presented as a table or as a graph.

Histograms

A histogram displays the distribution of a continuous variable by dividing the range into bins and plotting the count (or proportion) of observations in each bin. The shape of a histogram reveals whether the distribution is symmetric, positively skewed (long right tail), or negatively skewed (long left tail).

Bar Charts

A bar chart is used for categorical (nominal or ordinal) data. Unlike histograms, the bars do not touch, emphasizing that the categories are discrete.

The Normal Distribution

The normal distribution (bell curve) is a theoretical probability distribution defined by two parameters: the mean \( \mu \) and the standard deviation \( \sigma \). It has the following properties:

  • Symmetric about the mean
  • Approximately 68% of values fall within \( \pm 1\sigma \) of the mean
  • Approximately 95% fall within \( \pm 2\sigma \)
  • Approximately 99.7% fall within \( \pm 3\sigma \)

This is known as the 68-95-99.7 rule (or the empirical rule).
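The three percentages of the empirical rule can be recovered from the standard normal CDF; a short check using the Python standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    p = z.cdf(k) - z.cdf(-k)  # probability of falling within +/- k sigma
    print(k, round(p, 4))
# 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```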

Many inferential techniques assume that the sampling distribution of the test statistic is approximately normal, which is justified by the Central Limit Theorem.

The Central Limit Theorem

The Central Limit Theorem (CLT) states that, regardless of the shape of the population distribution, the distribution of sample means will approach a normal distribution as the sample size \( n \) increases, provided the population has a finite variance.

In practice, samples of \( n \geq 30 \) are often considered large enough for the CLT to apply, though this depends on how non-normal the underlying distribution is.
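The CLT is easy to see in a short simulation. This Python sketch (a hypothetical setup, not WVS data) uses a deliberately skewed exponential population with mean 1 and standard deviation 1, and checks that the sample means cluster around the population mean with spread close to \( \sigma/\sqrt{n} \):

```python
import random
import statistics

random.seed(1)
n = 36  # size of each sample

# Draw 2000 samples from the skewed population and record each sample mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(2000)
]

# The CLT predicts the means cluster near 1.0 with spread near 1/sqrt(36) = 0.167,
# even though the population itself is far from normal.
print(round(statistics.mean(sample_means), 2))   # close to 1.0
print(round(statistics.stdev(sample_means), 2))  # close to 0.17
```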


Chapter 2: Inferential Statistics, Causality, and Model Building

From Samples to Populations

In SMF research, it is rarely possible to measure every individual in a population of interest. Instead, researchers collect data from a sample and use inferential statistics to estimate population parameters.

Sampling Error

Sampling error is the discrepancy between a sample statistic (e.g., a sample mean \( \bar{x} \)) and the corresponding population parameter (e.g., \( \mu \)). Sampling error is unavoidable whenever we work with samples, but it can be quantified.

Standard Error

The standard error of the mean (SEM) estimates how much a sample mean is expected to vary from sample to sample:

\[ SE = \frac{s}{\sqrt{n}} \]

As the sample size increases, the standard error decreases, meaning our estimate becomes more precise.
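The square-root relationship means precision improves slowly: quadrupling the sample size only halves the standard error. A quick check with a hypothetical standard deviation of 15:

```python
import math

s = 15.0  # hypothetical sample standard deviation
for n in (100, 400, 1600):
    se = s / math.sqrt(n)  # standard error of the mean
    print(n, se)
# 100 -> 1.5, 400 -> 0.75, 1600 -> 0.375
```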

Hypothesis Testing

Hypothesis testing is a formal procedure for deciding whether sample data provide sufficient evidence to reject a claim about a population.

The Null and Alternative Hypotheses

  • The null hypothesis (\( H_0 \)) states that there is no effect, no difference, or no relationship in the population.
  • The alternative hypothesis (\( H_1 \) or \( H_a \)) states that there is an effect, a difference, or a relationship.

Example: A researcher wants to know whether married and unmarried Canadians differ in life satisfaction.

  • \( H_0 \): There is no difference in mean life satisfaction between married and unmarried Canadians.
  • \( H_1 \): There is a difference in mean life satisfaction between married and unmarried Canadians.

The p-value

The p-value is the probability of obtaining a test statistic as extreme as (or more extreme than) the one observed, assuming the null hypothesis is true.

Common Misconception: A p-value does not tell you the probability that the null hypothesis is true. It tells you the probability of the observed data (or something more extreme) given that the null hypothesis is true. This is a critical distinction.

Griffiths and Needleman (2019) argue that p-values are widely misunderstood and misused. They note that a p-value of 0.049 and a p-value of 0.051 reflect essentially the same strength of evidence, yet researchers often treat them as categorically different because one falls below the conventional threshold of 0.05.

Statistical Significance

By convention, a result is declared statistically significant if \( p < \alpha \), where \( \alpha \) is typically set at 0.05. This means that if the null hypothesis were true, we would expect to see data this extreme less than 5% of the time.

Type I and Type II Errors

                             \( H_0 \) is true                 \( H_0 \) is false
Reject \( H_0 \)             Type I Error (false positive)     Correct decision (power)
Fail to reject \( H_0 \)     Correct decision                  Type II Error (false negative)

  • The probability of a Type I error is \( \alpha \) (the significance level).
  • The probability of a Type II error is \( \beta \). Statistical power is \( 1 - \beta \).

Effect Size

Effect size measures the magnitude of an observed effect, independent of sample size. Common effect size measures include Cohen’s d for differences between means and Pearson’s r for correlations.

Cohen’s d guidelines:

  • Small: \( d = 0.2 \)
  • Medium: \( d = 0.5 \)
  • Large: \( d = 0.8 \)

A statistically significant result can have a trivially small effect size if the sample is very large. Conversely, a meaningful effect can fail to reach significance if the sample is too small.

Confidence Intervals

A 95% confidence interval for a mean is constructed as:

\[ \bar{x} \pm t_{\alpha/2} \times SE \]

where \( t_{\alpha/2} \) is the critical value from the t-distribution with \( n - 1 \) degrees of freedom.

Interpretation: If we were to repeat this study many times, approximately 95% of the constructed intervals would contain the true population parameter. A single interval either contains the parameter or it does not; we do not know which.
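With a large sample, the t critical value is close to the normal one, so a rough check can use \( z \approx 1.96 \) instead of \( t_{\alpha/2} \). This Python sketch builds a 95% interval from hypothetical summary statistics (mean 42, SD 15, n = 400):

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics.
xbar, s, n = 42.0, 15.0, 400
se = s / math.sqrt(n)  # standard error of the mean

# Normal approximation to the t critical value (reasonable for large n).
z = NormalDist().inv_cdf(0.975)  # ~1.96

lower, upper = xbar - z * se, xbar + z * se
print(round(lower, 2), round(upper, 2))  # 40.53 43.47
```

For small samples SPSS (or any t table) should be used instead, since the t critical value is noticeably larger than 1.96 when the degrees of freedom are few.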

Causality and Statistical Model Building

Correlation Is Not Causation

One of the most important lessons in statistics is that correlation does not imply causation. Two variables can be strongly associated without one causing the other. Haig (2003) discusses the problem of spurious correlations, relationships that appear meaningful but are driven by a third variable or by coincidence.

Three requirements for establishing causality:

  1. Association (covariation): The variables must be statistically related.
  2. Temporal precedence: The cause must precede the effect in time.
  3. Non-spuriousness: The relationship must not be explained by a third variable (a confound).

Experimental designs with random assignment are the gold standard for causal inference because randomization balances confounds across groups. In SMF research, however, many variables of interest (e.g., marital status, sexual orientation, religion) cannot be randomly assigned, so researchers rely on observational designs and must control for confounds statistically.

Building Statistical Models

A statistical model is a mathematical representation of the relationships among variables. Model building in SMF typically follows these steps:

  1. Specify the research question and identify the dependent (outcome) variable and independent (predictor) variables.
  2. Choose the appropriate statistical technique based on the levels of measurement and the research question.
  3. Check assumptions (e.g., normality, homogeneity of variance, independence of observations).
  4. Run the analysis and interpret the output.
  5. Evaluate model fit and consider alternative explanations.

Chapter 3: Introduction to SPSS

What Is SPSS?

SPSS (Statistical Package for the Social Sciences), now officially called IBM SPSS Statistics, is one of the most widely used statistical software packages in the social sciences. Version 28 is used in this course.

SPSS uses a point-and-click graphical interface as well as a syntax-based programming language. Learning to write and save syntax is strongly recommended because it creates a reproducible record of every analytical decision.

The SPSS Interface

SPSS has two primary windows:

Data View

The Data View displays the dataset in a spreadsheet format. Each row represents a case (typically a survey respondent), and each column represents a variable (e.g., age, gender, marital status).

Variable View

The Variable View displays metadata about each variable:

  • Name: A short identifier (e.g., Q57 or marital_status).
  • Type: Numeric, string, date, etc.
  • Width and Decimals: Display formatting.
  • Label: A descriptive label (e.g., “Current marital status”).
  • Values: Value labels that map numeric codes to meaningful categories (e.g., 1 = Married, 2 = Living together, 3 = Divorced).
  • Missing: Codes for missing data (e.g., -99 = Refused, -98 = Don’t know).
  • Measure: Nominal, Ordinal, or Scale (interval/ratio).

Output Viewer

When you run an analysis, results appear in a separate Output Viewer window. Output can be exported as PDF, Word, or other formats for inclusion in reports.

Working with the World Values Survey Data

The World Values Survey (WVS) is a global research project that explores people’s values, beliefs, and attitudes on topics including family, gender, religion, politics, and social trust. Wave 7 (2017-2022) includes data from nearly 100 countries.

For this course, the Canadian subset of WVS Wave 7 is used. Key variables relevant to SMF research include:

  • Family values (importance of family, ideal number of children)
  • Gender role attitudes (approval of women working outside the home)
  • Marital and relationship status
  • Life satisfaction and happiness
  • Religious identity and practice
  • Trust in institutions and other people

Basic SPSS Operations

Frequencies

The Frequencies procedure (Analyze > Descriptive Statistics > Frequencies) produces frequency tables showing the count and percentage of cases in each category of a variable. It can also generate bar charts and histograms.

Descriptives

The Descriptives procedure (Analyze > Descriptive Statistics > Descriptives) computes summary statistics including the mean, standard deviation, minimum, maximum, and range for scale variables.

Recoding Variables

Researchers frequently need to collapse or transform variables. The Recode into Different Variables command (Transform > Recode into Different Variables) creates a new variable based on an existing one. For example, you might recode a 10-point life satisfaction scale into three categories: Low (1-3), Medium (4-7), High (8-10).

Best Practice: Always recode into a different variable rather than overwriting the original. This preserves the raw data and makes your work reversible.
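The same recode logic can be written out explicitly; this Python sketch mirrors the cut-points above (the helper name recode_satisfaction is hypothetical, and in the course itself this is done through the SPSS Recode dialog):

```python
def recode_satisfaction(score):
    """Collapse a 1-10 life satisfaction score into three categories."""
    if score <= 3:
        return "Low"     # 1-3
    if score <= 7:
        return "Medium"  # 4-7
    return "High"        # 8-10

print([recode_satisfaction(s) for s in (2, 5, 9)])  # ['Low', 'Medium', 'High']
```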

Chapter 4: The Chi-Square Test

Purpose and Logic

The chi-square test of independence (sometimes written \( \chi^2 \)) is used to determine whether there is a statistically significant association between two categorical (nominal or ordinal) variables.

Example research question: Is there an association between gender and attitude toward same-sex marriage among Canadian WVS respondents?

The Crosstabulation Table

A crosstabulation (contingency table) displays the joint distribution of two categorical variables. Each cell contains the observed frequency, the count of cases that fall into that combination of categories.

                Agree    Neither    Disagree    Row Total
Male             120       45          85          250
Female           155       30          65          250
Column Total     275       75         150          500

Expected Frequencies

Under the null hypothesis of no association, the expected frequency for each cell is:

\[ E_{ij} = \frac{(\text{Row Total}_i)(\text{Column Total}_j)}{N} \]

For the Male/Agree cell in the table above:

\[ E = \frac{250 \times 275}{500} = 137.5 \]

The Chi-Square Statistic

The test statistic compares observed and expected frequencies:

\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

where the sum is taken over all cells in the table.

A larger \( \chi^2 \) value indicates a greater discrepancy between observed and expected frequencies, providing stronger evidence against the null hypothesis.

Degrees of Freedom

For a chi-square test of independence:

\[ df = (r - 1)(c - 1) \]

where \( r \) is the number of rows and \( c \) is the number of columns. The p-value is obtained by comparing the computed \( \chi^2 \) to the chi-square distribution with the appropriate degrees of freedom.

Assumptions

  1. Independence of observations: Each case contributes to only one cell.
  2. Expected frequency size: No expected frequency should be less than 1, and no more than 20% of expected frequencies should be less than 5. When this assumption is violated, consider combining categories or using Fisher’s exact test.

Effect Size: Cramer’s V

The chi-square statistic is sensitive to sample size: larger samples produce larger \( \chi^2 \) values even for the same degree of association. Cramer’s V is a standardized effect size measure:

\[ V = \sqrt{\frac{\chi^2}{n \times \min(r-1, c-1)}} \]

Cramer’s V ranges from 0 (no association) to 1 (perfect association).

Guidelines for interpretation (for a table with \( df^* = \min(r-1, c-1) = 1 \)):

  • Small: \( V \approx 0.10 \)
  • Medium: \( V \approx 0.30 \)
  • Large: \( V \approx 0.50 \)
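The whole chi-square calculation for the illustrative crosstabulation above can be checked by hand. This Python sketch computes the expected frequencies, \( \chi^2 \), degrees of freedom, and Cramer's V (the closed-form p-value line uses the fact that a chi-square distribution with 2 degrees of freedom is exponential):

```python
import math

# Observed frequencies from the crosstabulation (rows: Male, Female;
# columns: Agree, Neither, Disagree).
observed = [[120, 45, 85],
            [155, 30, 65]]

row_totals = [sum(row) for row in observed]        # [250, 250]
col_totals = [sum(col) for col in zip(*observed)]  # [275, 75, 150]
N = sum(row_totals)                                # 500

# Expected frequency for each cell: (row total * column total) / N
expected = [[r * c / N for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)(3-1) = 2

# Cramer's V: chi2 / (N * min(r-1, c-1)), square-rooted.
v = math.sqrt(chi2 / (N * min(len(observed) - 1, len(observed[0]) - 1)))

# For df = 2 the chi-square survival function is exp(-x/2).
p = math.exp(-chi2 / 2)

print(round(chi2, 2), df, round(v, 3), round(p, 3))  # 10.12 2 0.142 0.006
```

SPSS reports the same quantities in the Chi-Square Tests and Symmetric Measures tables; this table would be reported as \( \chi^2(2, N = 500) = 10.12, p = .006, V = .14 \).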

Running Chi-Square in SPSS

  1. Go to Analyze > Descriptive Statistics > Crosstabs.
  2. Place one variable in the Row box and the other in the Column box.
  3. Click Statistics and check Chi-square and Phi and Cramer’s V.
  4. Click Cells and check Expected (under Counts) and Row or Column percentages.
  5. Click OK.

In the output, examine the Pearson Chi-Square row of the Chi-Square Tests table. Report the \( \chi^2 \) value, degrees of freedom, and p-value.

Reporting Example: A chi-square test of independence indicated a significant association between gender and attitude toward same-sex marriage, \( \chi^2(2, N = 500) = 12.34, p = .002, V = .16 \).

Chapter 5: The t-Test

Purpose

The t-test compares the means of a continuous (scale) variable between two groups. It answers the question: Is the difference between the two group means large enough to conclude that the groups differ in the population, or could the difference be due to sampling error?

Types of t-Tests

Independent Samples t-Test

The independent samples t-test compares means between two separate, unrelated groups. Example: Do married and unmarried Canadians differ in their reported life satisfaction?

Paired Samples t-Test

The paired samples t-test compares means from two related measurements on the same individuals. Example: Does a couple’s communication workshop improve relationship satisfaction (measured before and after)?

One-Sample t-Test

The one-sample t-test compares a sample mean to a known or hypothesized population value. Example: Is the mean number of children in our Canadian sample different from the national average of 1.6?

The Independent Samples t-Test in Detail

The Test Statistic

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}} \]

where the denominator is the standard error of the difference between means. For equal variances assumed:

\[ SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]

and \( s_p \) is the pooled standard deviation:

\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]

Degrees of Freedom

For an independent samples t-test with equal variances assumed:

\[ df = n_1 + n_2 - 2 \]

Assumptions

  1. Independence of observations: Scores in one group are unrelated to scores in the other.
  2. Normality: The dependent variable is approximately normally distributed within each group. With large samples (\( n > 30 \) per group), the t-test is robust to moderate violations due to the CLT.
  3. Homogeneity of variance: The variance of the dependent variable is similar in both groups. SPSS reports Levene’s test for this assumption. If Levene’s test is significant (\( p < .05 \)), use the “Equal variances not assumed” row (Welch’s t-test).

Effect Size: Cohen’s d

\[ d = \frac{\bar{x}_1 - \bar{x}_2}{s_p} \]
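The whole independent-samples calculation can be reproduced from summary statistics alone. This Python sketch uses hypothetical values (life satisfaction means of 7.4 and 6.6, SDs of 1.8 and 2.1, 250 respondents per group):

```python
import math

# Hypothetical summary statistics for two independent groups.
m1, s1, n1 = 7.4, 1.8, 250
m2, s2, n2 = 6.6, 2.1, 250

# Pooled standard deviation.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Standard error of the difference, t statistic, and degrees of freedom.
se = sp * math.sqrt(1 / n1 + 1 / n2)
t = (m1 - m2) / se
df = n1 + n2 - 2

# Cohen's d: the mean difference in pooled-SD units.
d = (m1 - m2) / sp

print(round(t, 2), df, round(d, 2))  # 4.57 498 0.41
```

By Cohen's guidelines, \( d \approx 0.41 \) is a small-to-medium effect even though the t statistic is highly significant.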

Running the Independent Samples t-Test in SPSS

  1. Go to Analyze > Compare Means > Independent-Samples T Test.
  2. Move the continuous dependent variable to the Test Variable(s) box.
  3. Move the grouping variable to the Grouping Variable box.
  4. Click Define Groups and enter the numeric codes for the two groups (e.g., 1 and 2).
  5. Click OK.

In the output, first check Levene’s test. Then read the appropriate row of the t-test table.

Reporting Example: An independent samples t-test revealed that married respondents (M = 7.4, SD = 1.8) reported significantly higher life satisfaction than unmarried respondents (M = 6.6, SD = 2.1), t(498) = 4.56, p < .001, d = 0.41.

Chapter 6: Analysis of Variance (ANOVA)

Purpose

Analysis of Variance (ANOVA) extends the logic of the t-test to comparisons involving three or more groups. While a t-test asks “Do two groups differ?”, ANOVA asks “Do any of these groups differ from each other?”

Example: Do Canadians with different religious affiliations (Christian, Muslim, Hindu, None) differ in their attitudes toward traditional gender roles?

Why Not Multiple t-Tests?

If you have four groups, you would need \( \binom{4}{2} = 6 \) pairwise t-tests. Each test carries a 5% risk of Type I error. With six tests, the familywise error rate inflates:

\[ \alpha_{\text{FW}} = 1 - (1 - 0.05)^6 \approx 0.26 \]

This means a 26% chance of at least one false positive. ANOVA controls this by testing all groups simultaneously in a single omnibus test.
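The inflation of the familywise error rate is quick to tabulate for several numbers of comparisons:

```python
# Familywise error rate for k independent tests, each at alpha = .05.
alpha = 0.05
for k in (1, 3, 6, 10):
    fw = 1 - (1 - alpha) ** k
    print(k, round(fw, 3))
# 1 -> 0.05, 3 -> 0.143, 6 -> 0.265, 10 -> 0.401
```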

The Logic of ANOVA

ANOVA compares two sources of variability:

  1. Between-group variance: How much group means differ from the overall (grand) mean.
  2. Within-group variance: How much individual scores differ from their own group mean.

If the groups truly differ, the between-group variance should be large relative to the within-group variance.

The F-Ratio

\[ F = \frac{MS_{\text{between}}}{MS_{\text{within}}} \]

where:

\[ MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} \]

\[ MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} \]

  • \( df_{\text{between}} = k - 1 \), where \( k \) is the number of groups
  • \( df_{\text{within}} = N - k \), where \( N \) is the total sample size

An F-ratio near 1 suggests no group differences. Larger F-ratios provide evidence against the null hypothesis.

Assumptions

  1. Independence of observations.
  2. Normality within each group.
  3. Homogeneity of variance across groups. Tested by Levene’s test in SPSS. If violated, use the Welch F-test or the Brown-Forsythe test.

Post Hoc Tests

A significant ANOVA result tells you that at least one group differs, but not which groups differ from which. Post hoc tests perform pairwise comparisons while controlling the familywise error rate.

Common post hoc tests include:

  • Tukey’s HSD (Honestly Significant Difference): Best for equal sample sizes; controls error rate tightly.
  • Bonferroni: Conservative; divides \( \alpha \) by the number of comparisons.
  • Games-Howell: Does not assume equal variances; appropriate when Levene’s test is significant.

Effect Size: Eta-Squared

\[ \eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}} \]

Eta-squared represents the proportion of total variance in the dependent variable that is explained by group membership.

Guidelines:

  • Small: \( \eta^2 \approx 0.01 \)
  • Medium: \( \eta^2 \approx 0.06 \)
  • Large: \( \eta^2 \approx 0.14 \)
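The F-ratio and eta-squared can be computed by hand for a toy example. This Python sketch uses three small invented groups of scores:

```python
import statistics

# Three hypothetical groups of attitude scores.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N

# Between-group sum of squares: group means vs. the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: scores vs. their own group mean.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, N - k
F = (ss_between / df_between) / (ss_within / df_within)
eta_sq = ss_between / (ss_between + ss_within)

print(round(F, 2), round(eta_sq, 4))  # 13.0 0.8125
```

Here the between-group variability dwarfs the within-group variability, so the F-ratio is far above 1 and group membership explains most of the variance.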

Running One-Way ANOVA in SPSS

  1. Go to Analyze > Compare Means > One-Way ANOVA.
  2. Move the dependent variable to the Dependent List and the grouping variable to the Factor box.
  3. Click Post Hoc and select the desired test (e.g., Tukey).
  4. Click Options and check Descriptive, Homogeneity of variance test, and Means plot.
  5. Click OK.

Reporting Example: A one-way ANOVA revealed a significant effect of religious affiliation on traditional gender role attitudes, F(3, 496) = 8.21, p < .001, \( \eta^2 \) = .05. Tukey HSD post hoc tests indicated that respondents with no religious affiliation (M = 2.3, SD = 0.9) scored significantly lower than Christian respondents (M = 3.1, SD = 1.1), p = .002.

MANOVA: Multivariate Analysis of Variance

MANOVA (Multivariate Analysis of Variance) extends ANOVA to situations where there are two or more dependent variables examined simultaneously. Instead of running separate ANOVAs for each dependent variable (which inflates error rates), MANOVA tests whether the groups differ on the combination of dependent variables.

When to Use MANOVA

Use MANOVA when you want to compare groups on a set of related outcomes. Example: Do religious affiliations differ not just in gender role attitudes but also in family values and life satisfaction simultaneously?

Test Statistics

MANOVA produces several multivariate test statistics. The most commonly reported are:

  • Wilks’ Lambda (\( \Lambda \)): Ranges from 0 to 1. Smaller values indicate greater group separation.
  • Pillai’s Trace: More robust to violations of assumptions; preferred when sample sizes are unequal or assumptions are questionable.
  • Hotelling’s Trace and Roy’s Largest Root: Alternative multivariate test statistics.

If the overall MANOVA is significant, follow up with separate ANOVAs on each dependent variable to identify where the differences lie.

Running MANOVA in SPSS

  1. Go to Analyze > General Linear Model > Multivariate.
  2. Move the dependent variables to the Dependent Variables box and the grouping variable to the Fixed Factor(s) box.
  3. Click Post Hoc to set up pairwise comparisons.
  4. Click Options for descriptive statistics and effect sizes.
  5. Click OK.

Chapter 7: Correlation

Purpose

Correlation measures the strength and direction of the linear relationship between two continuous variables. It answers: “As one variable increases, does the other tend to increase, decrease, or remain unchanged?”

Pearson’s Correlation Coefficient

The Pearson product-moment correlation coefficient (\( r \)) is the most common measure of linear association:

\[ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \]

Properties of r

  • \( r \) ranges from \( -1 \) to \( +1 \).
  • \( r = +1 \): Perfect positive linear relationship.
  • \( r = -1 \): Perfect negative linear relationship.
  • \( r = 0 \): No linear relationship (but there may be a nonlinear relationship).

Interpretation Guidelines

  • \( |r| < 0.10 \): Negligible
  • \( 0.10 \leq |r| < 0.30 \): Small (weak)
  • \( 0.30 \leq |r| < 0.50 \): Medium (moderate)
  • \( |r| \geq 0.50 \): Large (strong)

The Coefficient of Determination

The square of the correlation coefficient, \( r^2 \), represents the coefficient of determination: the proportion of variance in one variable that is explained by the other.

Example: If \( r = 0.40 \), then \( r^2 = 0.16 \), meaning 16% of the variance in one variable is shared with the other.
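The Pearson formula can be applied directly to a small invented dataset; this Python sketch computes \( r \) and \( r^2 \) by hand:

```python
import math

# Small hypothetical dataset: e.g., trust (x) and life satisfaction (y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Numerator: sum of cross-products of deviations.
num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
# Denominator: square root of the product of the sums of squared deviations.
den = math.sqrt(sum((xi - xbar) ** 2 for xi in x) *
                sum((yi - ybar) ** 2 for yi in y))
r = num / den

print(round(r, 4), round(r ** 2, 2))  # 0.7746 0.6
```

Here \( r \approx .77 \) is a strong positive correlation, and \( r^2 = .60 \) means the two variables share 60% of their variance.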

Assumptions of Pearson’s r

  1. Linearity: The relationship between the two variables is linear. Check with a scatterplot.
  2. Normality: Both variables are approximately normally distributed (required for significance testing, less critical for the estimate itself with large samples).
  3. Homoscedasticity: The spread of data points around the regression line is roughly constant across values of the predictor.
  4. No outliers: Extreme values can dramatically inflate or deflate \( r \).

Spearman’s Rank Correlation

When variables are ordinal, or when the relationship is monotonic but not linear, use Spearman’s rho (\( r_s \)). It is computed by applying the Pearson formula to the ranked values of each variable.
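Because Spearman's rho is just Pearson's r applied to ranks, a monotonic but nonlinear relationship yields \( r_s = 1 \) while Pearson's r stays below 1. A Python sketch with invented, tie-free data (the rank helper below does not handle tied values):

```python
import math

def rank(values):
    """Ranks for tie-free data (1 = smallest)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def pearson(x, y):
    """Pearson's r computed from deviations."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = math.sqrt(sum((a - xbar) ** 2 for a in x) *
                    sum((b - ybar) ** 2 for b in y))
    return num / den

# Monotonic but nonlinear relationship (hypothetical values).
x = [10, 20, 30, 40]
y = [1, 4, 9, 100]

r = pearson(x, y)                # below 1: the relationship is not linear
rho = pearson(rank(x), rank(y))  # exactly 1: the ranks line up perfectly

print(round(r, 2), rho)  # 0.82 1.0
```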

Running Correlations in SPSS

  1. Go to Analyze > Correlate > Bivariate.
  2. Move the variables of interest into the Variables box.
  3. Ensure Pearson is checked (and/or Spearman if appropriate).
  4. Check Flag significant correlations and select Two-tailed or One-tailed.
  5. Click OK.

The output is a correlation matrix showing the \( r \) value, significance level, and sample size for each pair of variables.

Reporting Example: There was a statistically significant positive correlation between life satisfaction and trust in others, r(498) = .34, p < .001, indicating a moderate relationship.

Partial Correlation

A partial correlation measures the relationship between two variables after controlling for the effect of one or more additional variables. This helps address the problem of spurious correlation.

Example: The correlation between religious attendance and life satisfaction might be partly explained by social support. A partial correlation controlling for social support reveals whether religious attendance has an independent association with life satisfaction.

In SPSS: Analyze > Correlate > Partial. Move the two primary variables to the Variables box and the control variable(s) to the Controlling for box.


Chapter 8: Regression Analysis

Linear Regression

Purpose

Linear regression goes beyond correlation by modeling the relationship between variables as a predictive equation. While correlation tells you that two variables are related, regression tells you the specific form of the relationship and allows you to predict the value of one variable from the other.

Simple Linear Regression

In simple linear regression, there is one predictor variable (\( x \)) and one outcome variable (\( y \)):

\[ \hat{y} = b_0 + b_1 x \]

where:

  • \( \hat{y} \) is the predicted value of the outcome
  • \( b_0 \) is the y-intercept (the predicted value of \( y \) when \( x = 0 \))
  • \( b_1 \) is the slope (the change in \( \hat{y} \) for a one-unit increase in \( x \))

Ordinary Least Squares (OLS)

The regression coefficients are estimated using Ordinary Least Squares, which minimizes the sum of squared residuals:

\[ \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \min \sum_{i=1}^{n} e_i^2 \]

where \( e_i = y_i - \hat{y}_i \) is the residual for observation \( i \).

The Slope Coefficient

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = r \cdot \frac{s_y}{s_x} \]

This formula shows the connection between correlation and regression: the slope equals the correlation coefficient scaled by the ratio of standard deviations.

R-Squared

The R-squared (\( R^2 \)) value indicates the proportion of variance in the outcome variable that is explained by the predictor(s):

\[ R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \]

In simple regression, \( R^2 = r^2 \).
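The OLS formulas above can be carried out by hand on a small invented dataset; this Python sketch computes the slope, intercept, and \( R^2 \), and the result illustrates that \( R^2 = r^2 \) in simple regression:

```python
# Small hypothetical dataset: predict y from x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Slope: sum of cross-products over sum of squared x deviations.
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) /
      sum((xi - xbar) ** 2 for xi in x))
# Intercept: the line passes through the point of means.
b0 = ybar - b1 * xbar

# Predicted values and R-squared = 1 - SS_residual / SS_total.
y_hat = [b0 + b1 * xi for xi in x]
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(round(b1, 2), round(b0, 2), round(r_squared, 2))  # 0.6 2.2 0.6
```

The fitted equation is \( \hat{y} = 2.2 + 0.6x \): each one-unit increase in \( x \) predicts a 0.6-unit increase in \( y \), and the model explains 60% of the variance in \( y \).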

Multiple Linear Regression

Multiple linear regression includes two or more predictor variables:

\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \]

Each coefficient \( b_j \) represents the effect of \( x_j \) on \( y \), holding all other predictors constant. This is the principle of statistical control.

Standardized Coefficients (Beta)

When predictors are measured on different scales (e.g., age in years and income in dollars), comparing unstandardized coefficients is misleading. Standardized coefficients (\( \beta \)) express the effect of each predictor in standard deviation units:

\[ \beta_j = b_j \cdot \frac{s_{x_j}}{s_y} \]

A \( \beta \) of 0.30 means that a one-standard-deviation increase in the predictor is associated with a 0.30-standard-deviation increase in the outcome.
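As a quick illustration, a raw slope can be converted to \( \beta \) by hand. All numbers below are invented:

```python
# Hypothetical raw slope for income (in $1,000s) predicting life satisfaction
b_income = 0.012       # +0.012 satisfaction points per extra $1,000
s_income = 25.0        # SD of income, in $1,000s
s_satisfaction = 1.5   # SD of the outcome

# beta = b * (s_x / s_y): the effect re-expressed in standard-deviation units
beta_income = b_income * (s_income / s_satisfaction)
print(round(beta_income, 2))  # 0.2
```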

Adjusted R-Squared

Adding predictors always increases \( R^2 \), even if the new predictors are irrelevant. Adjusted R-squared penalizes for the number of predictors:

\[ R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1} \]

where \( k \) is the number of predictors.
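The penalty is easy to see numerically. This sketch holds \( R^2 \) and \( n \) fixed and varies only the number of predictors (values chosen for illustration):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n cases and k predictors, per the formula above."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With R^2 = .10 and n = 500, the penalty grows with the number of predictors
print(round(adjusted_r_squared(0.10, 500, 3), 4))   # 0.0946
print(round(adjusted_r_squared(0.10, 500, 20), 4))  # 0.0624
```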

Assumptions of Linear Regression

  1. Linearity: The relationship between each predictor and the outcome is linear.
  2. Independence of residuals: Residuals are not correlated with each other (no autocorrelation). Tested with the Durbin-Watson statistic (values near 2 indicate independence).
  3. Homoscedasticity: Residuals have constant variance across levels of the predictor(s). Check by plotting residuals against predicted values.
  4. Normality of residuals: Residuals are approximately normally distributed. Check with a histogram of residuals or a Normal P-P plot.
  5. No multicollinearity: Predictors are not too highly correlated with each other. Assessed using the Variance Inflation Factor (VIF). A VIF greater than 10 (or tolerance below 0.1) suggests problematic multicollinearity.

Running Linear Regression in SPSS

  1. Go to Analyze > Regression > Linear.
  2. Move the outcome variable to the Dependent box.
  3. Move the predictor(s) to the Independent(s) box.
  4. Click Statistics and check Estimates, Model fit, R squared change, Descriptives, and Collinearity diagnostics.
  5. Click Plots: set *ZRESID on the Y-axis and *ZPRED on the X-axis. Check Histogram and Normal probability plot.
  6. Click OK.
Reporting Example: A multiple regression analysis was conducted to predict life satisfaction from age, income, and marital status. The model was statistically significant, F(3, 496) = 18.42, p < .001, and explained 10% of the variance in life satisfaction (\( R^2_{\text{adj}} = .10 \)). Marital status was the strongest predictor (\( \beta \) = .22, p < .001), followed by income (\( \beta \) = .15, p = .003).

Logistic Regression

When to Use

Logistic regression is used when the outcome variable is categorical (typically binary: yes/no, agree/disagree, married/not married). Ordinary linear regression is inappropriate for binary outcomes because it can produce predicted values outside the 0-1 range and violates the assumption of normally distributed residuals.

The Logistic Function

Logistic regression models the probability of the outcome occurring:

\[ P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots)}} \]

This S-shaped (sigmoid) function ensures that predicted probabilities always fall between 0 and 1.

The Logit

The logit is the natural logarithm of the odds:

\[ \text{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1 x_1 + b_2 x_2 + \cdots \]

Odds Ratios

In logistic regression, coefficients are often exponentiated to produce odds ratios (\( e^{b} \)):

  • An odds ratio of 1.0 means the predictor has no effect.
  • An odds ratio greater than 1.0 means the event is more likely as the predictor increases.
  • An odds ratio less than 1.0 means the event is less likely as the predictor increases.

Example: An odds ratio of 1.8 for “has children” predicting “identifies as religious” means that people with children have 1.8 times the odds of identifying as religious compared to people without children, controlling for other variables.
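The link between a logistic coefficient and its odds ratio can be shown in a small sketch. The intercept is invented, and the coefficient for "has children" is deliberately set to \( \ln(1.8) \) so that \( e^{b} \) matches the example above:

```python
import math

def predicted_probability(logit):
    """Logistic (sigmoid) function: maps any logit to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-logit))

# Hypothetical model: intercept -1.2; coefficient for "has children"
# chosen as ln(1.8) so its odds ratio is exactly 1.8
b0 = -1.2
b_children = math.log(1.8)

p_without = predicted_probability(b0)              # P(religious | no children)
p_with = predicted_probability(b0 + b_children)    # P(religious | children)

def odds(p):
    return p / (1 - p)

# The ratio of the two groups' odds recovers exp(b_children)
print(round(odds(p_with) / odds(p_without), 4))  # 1.8
```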

Model Evaluation

Logistic regression does not use \( R^2 \) in the traditional sense. Instead, model fit is assessed with:

  • -2 Log Likelihood (-2LL): Smaller values indicate better fit.
  • Nagelkerke R-squared: A pseudo-R-squared ranging from 0 to 1.
  • Hosmer-Lemeshow Test: A non-significant result (\( p > .05 \)) indicates acceptable model fit.
  • Classification Table: Shows the percentage of cases correctly classified by the model.

Running Logistic Regression in SPSS

  1. Go to Analyze > Regression > Binary Logistic.
  2. Move the binary outcome to the Dependent box.
  3. Move the predictor(s) to the Covariates box.
  4. Under Options, check Hosmer-Lemeshow goodness of fit, Classification plots, and CI for exp(B).
  5. Click OK.
Reporting Example: A logistic regression was conducted to predict support for same-sex marriage (yes/no) from age, education, and religiosity. The model was statistically significant, \( \chi^2 \)(3) = 45.67, p < .001, Nagelkerke \( R^2 \) = .14. Younger age (OR = 0.97, 95% CI [0.95, 0.99]) and higher education (OR = 1.45, 95% CI [1.18, 1.78]) were significant predictors of support.

Chapter 9: Choosing the Right Statistical Test

A Decision Framework

Selecting the appropriate statistical test depends on three key questions:

  1. What is the level of measurement of the dependent (outcome) variable? Categorical or continuous?
  2. What is the level of measurement of the independent (predictor) variable(s)? Categorical or continuous?
  3. How many groups or variables are involved?

Decision Table

Outcome Variable | Predictor Variable | Number of Groups/Predictors | Recommended Test
Categorical | Categorical | 2 categories each | Chi-square test of independence
Continuous | Categorical | 2 groups | Independent samples t-test
Continuous | Categorical | 3+ groups | One-way ANOVA
Continuous (2+) | Categorical | 2+ groups | MANOVA
Continuous | Continuous | 1 predictor | Pearson correlation / Simple regression
Continuous | Continuous (2+) | 2+ predictors | Multiple regression
Categorical (binary) | Continuous and/or categorical | 1+ predictors | Logistic regression
Practical Tip: When in doubt, start with a scatterplot or crosstabulation to visualize the data. The shape of the data will often suggest the appropriate analysis.
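The decision table can also be sketched as a simple lookup function. The keys and labels below are illustrative shorthand, not a standard API:

```python
# Illustrative mapping from (outcome type, predictor type, detail) to a test
DECISION_TABLE = {
    ("categorical", "categorical", "2 categories each"): "Chi-square test of independence",
    ("continuous", "categorical", "2 groups"): "Independent samples t-test",
    ("continuous", "categorical", "3+ groups"): "One-way ANOVA",
    ("continuous (2+)", "categorical", "2+ groups"): "MANOVA",
    ("continuous", "continuous", "1 predictor"): "Pearson correlation / simple regression",
    ("continuous", "continuous (2+)", "2+ predictors"): "Multiple regression",
    ("categorical (binary)", "mixed", "1+ predictors"): "Logistic regression",
}

def suggest_test(outcome, predictor, detail):
    """Answer the three key questions, get a candidate test."""
    return DECISION_TABLE.get((outcome, predictor, detail),
                              "No match -- revisit the three questions above")

print(suggest_test("continuous", "categorical", "3+ groups"))  # One-way ANOVA
```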

Checking Assumptions: A Summary

Every parametric test assumes certain properties of the data. Violating assumptions can lead to incorrect conclusions. Here is a consolidated checklist:

For t-tests and ANOVA

  • Independence of observations (design feature, not testable after the fact)
  • Normality of the dependent variable within groups (check with Shapiro-Wilk test or visual inspection of histograms/Q-Q plots)
  • Homogeneity of variance (Levene’s test)

For Correlation and Regression

  • Linearity (scatterplot)
  • Normality of residuals (histogram, P-P plot)
  • Homoscedasticity (residuals vs. predicted plot)
  • No multicollinearity (VIF, for multiple regression)
  • Independence of residuals (Durbin-Watson)

For Chi-Square

  • Independence of observations
  • Adequate expected frequencies (no cell below 1; fewer than 20% below 5)
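The expected-frequency rule can be checked by hand, since expected counts come directly from the row and column totals. The crosstab below is invented:

```python
# Hypothetical 2x2 crosstab of observed counts
observed = [[30, 20],   # e.g., men: agree / disagree
            [10, 40]]   # e.g., women: agree / disagree

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / N
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
print(expected)  # [[20.0, 30.0], [20.0, 30.0]]

# Assumption check: no expected count below 1, fewer than 20% below 5
flat = [e for row in expected for e in row]
assert min(flat) >= 1
assert sum(e < 5 for e in flat) / len(flat) < 0.20
```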

Chapter 10: Moderation and Mediation

Introduction to Advanced Concepts

Moderation and mediation address more nuanced research questions than simple bivariate tests. Instead of asking “Is X related to Y?”, they ask:

  • Moderation: “Does the relationship between X and Y depend on a third variable W?”
  • Mediation: “Does X influence Y through an intervening variable M?”

These concepts were famously formalized by Baron and Kenny (1986) in one of the most cited papers in social science.

Moderation Analysis

Conceptual Definition

A moderator is a variable that affects the strength or direction of the relationship between an independent variable and a dependent variable. Moderation is synonymous with a statistical interaction.

Example: The relationship between religiosity and opposition to divorce may be moderated by age. Among older Canadians, religiosity might strongly predict opposition to divorce, but among younger Canadians, the relationship might be weaker.

Visualizing Moderation

Moderation is typically visualized with separate regression lines for different levels of the moderator. If the lines are not parallel, moderation is present. Crossed lines indicate a crossover interaction (qualitative interaction), where the direction of the effect reverses.

Testing Moderation in Regression

To test moderation, include the interaction term in a regression model:

\[ \hat{y} = b_0 + b_1 X + b_2 W + b_3 (X \times W) \]

where \( X \times W \) is the product of the predictor and the moderator. If \( b_3 \) is statistically significant, moderation is supported.

Important: Before creating the interaction term, mean-centre both the predictor and the moderator (subtract their means). This reduces multicollinearity between the main effects and the interaction term without affecting the test of moderation.
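Mean-centring and forming the product term might look like this in code; the variable values are made up for illustration:

```python
# Hypothetical predictor (X) and moderator (W)
religiosity = [2, 4, 6, 8, 10]   # X
age = [25, 35, 45, 55, 65]       # W

def mean_centre(values):
    """Subtract the mean so the variable is centred at zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]

x_c = mean_centre(religiosity)
w_c = mean_centre(age)

# Interaction term: the product of the centred predictor and moderator
interaction = [xc * wc for xc, wc in zip(x_c, w_c)]

# Centred variables have mean (approximately) zero
assert abs(sum(x_c)) < 1e-9 and abs(sum(w_c)) < 1e-9
print(interaction)  # [80.0, 20.0, 0.0, 20.0, 80.0]
```

In SPSS the same steps are done with Compute Variable; the interaction column then enters the regression alongside the centred main effects.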

Probing the Interaction

When moderation is significant, you need to understand its form. Simple slopes analysis tests the relationship between X and Y at different levels of the moderator (e.g., at the mean, one standard deviation above, and one standard deviation below the mean).

Mediation Analysis

Conceptual Definition

A mediator is a variable that explains the mechanism or process through which an independent variable influences a dependent variable. Unlike moderation, which asks “when” or “for whom,” mediation asks “how” or “why.”

Example: Does education (X) reduce support for traditional gender roles (Y) because education increases exposure to diverse perspectives (M)?

\[ X \rightarrow M \rightarrow Y \]

Baron and Kenny’s Four Steps

The classic Baron and Kenny (1986) approach to testing mediation involves four steps, estimated with three regression equations:

Step 1 (total effect of X on Y):

\[ Y = b_0 + c \cdot X \]

Step 2 (X predicts the mediator M):

\[ M = b_0 + a \cdot X \]

Step 3 (M predicts Y, controlling for X):

\[ Y = b_0 + c' \cdot X + b \cdot M \]

Step 4: If the effect of X on Y is reduced (or becomes non-significant) when M is included, mediation is supported. The reduced effect of X on Y when M is controlled is the direct effect (\( c' \)). The indirect effect is \( a \times b \).

Types of Mediation

  • Full mediation: The direct effect \( c' \) becomes non-significant when the mediator is included. X affects Y entirely through M.
  • Partial mediation: The direct effect \( c' \) is reduced but remains significant. X affects Y partly through M and partly through other pathways.

The Sobel Test

The Sobel test formally tests whether the indirect effect (\( a \times b \)) is statistically significant:

\[ z = \frac{a \times b}{\sqrt{b^2 s_a^2 + a^2 s_b^2}} \]

where \( s_a \) and \( s_b \) are the standard errors of the \( a \) and \( b \) coefficients.
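The formula is simple enough to compute directly. The path estimates and standard errors below are invented for illustration:

```python
import math

def sobel_z(a, b, se_a, se_b):
    """Sobel z for the indirect effect a*b, following the formula above."""
    return (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)

# Hypothetical path estimates and standard errors
a, se_a = 0.40, 0.10   # X -> M
b, se_b = 0.30, 0.08   # M -> Y, controlling for X

z = sobel_z(a, b, se_a, se_b)
print(round(z, 2))  # 2.74; |z| > 1.96 implies a significant indirect effect at alpha = .05
```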

Note: Modern mediation analysis often uses bootstrapping (e.g., the PROCESS macro for SPSS by Andrew Hayes) to test the indirect effect, as the Sobel test assumes normality of the indirect effect, which is often violated.

Running Mediation in SPSS

While mediation can be tested using a series of standard regression analyses following Baron and Kenny’s steps, the PROCESS macro (Hayes, 2013) automates the procedure and provides bootstrap confidence intervals for the indirect effect.

  1. Install the PROCESS macro (downloaded from Andrew Hayes’s website).
  2. Go to Analyze > Regression > PROCESS.
  3. Specify X (independent variable), Y (outcome), and M (mediator).
  4. Select Model 4 (simple mediation).
  5. Set the number of bootstrap samples (5000 is standard).
  6. Click OK.

If the bootstrap confidence interval for the indirect effect does not contain zero, the indirect effect is significant.


Chapter 11: Exploratory Factor Analysis

Purpose

Exploratory Factor Analysis (EFA) is a data reduction technique used to identify the underlying structure among a set of observed variables. When a survey includes many items (e.g., 20 questions about family values), EFA can determine whether those items cluster into a smaller number of latent constructs (e.g., “traditional family values,” “egalitarian values,” “individualism”).

Key Concepts

Factors

A factor (or latent variable) is an unobserved construct that is inferred from patterns of correlations among observed variables. Items that correlate highly with each other but not with other items are assumed to reflect the same underlying factor.

Factor Loadings

A factor loading is the correlation between an observed variable and a factor. Loadings range from -1 to +1. A loading of 0.40 or higher is generally considered meaningful (though thresholds vary by discipline).

Eigenvalues

An eigenvalue represents the amount of variance explained by a factor. A common rule of thumb (the Kaiser criterion) is to retain factors with eigenvalues greater than 1.0, meaning the factor explains more variance than a single variable would.

The Scree Plot

A scree plot graphs eigenvalues in descending order. The point where the curve levels off (the “elbow”) suggests the optimal number of factors to retain. This visual method often yields different results than the Kaiser criterion, and researchers may use both.

Steps in Conducting EFA

  1. Assess suitability: Run the Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity. KMO should be at least 0.60 (above 0.80 is ideal). Bartlett’s test should be significant (\( p < .05 \)).

  2. Choose extraction method: Principal Axis Factoring is common in social sciences because it extracts only shared variance (unlike Principal Components Analysis, which also includes unique variance).

  3. Determine the number of factors: Use the Kaiser criterion, scree plot, and theoretical considerations.

  4. Rotate the factor solution: Rotation simplifies the factor structure by maximizing high loadings and minimizing low ones.

    • Varimax (orthogonal rotation): Produces uncorrelated factors. Easier to interpret.
    • Oblimin or Promax (oblique rotation): Allows factors to correlate. Often more realistic in social science, where constructs are rarely truly independent.
  5. Interpret the factors: Examine the rotated factor loadings. Group items that load highly on the same factor and assign a meaningful label.

  6. Evaluate the solution: Check for items that cross-load (load highly on more than one factor) or fail to load meaningfully on any factor. These items may need to be removed.

Running EFA in SPSS

  1. Go to Analyze > Dimension Reduction > Factor.
  2. Move the items into the Variables box.
  3. Click Descriptives and check KMO and Bartlett’s test of sphericity.
  4. Click Extraction: select Principal Axis Factoring and check Scree plot.
  5. Click Rotation: select Varimax or Direct Oblimin.
  6. Click Options: check Suppress small coefficients and set the threshold (e.g., 0.30).
  7. Click OK.
Example Interpretation: A principal axis factor analysis with varimax rotation was conducted on 15 items measuring family attitudes. The KMO was .84 and Bartlett's test was significant (p < .001). Three factors with eigenvalues greater than 1.0 were extracted, explaining 52% of the total variance. Factor 1, labelled "Traditional Family Values," included items about the importance of marriage and opposition to divorce. Factor 2, labelled "Gender Egalitarianism," included items about shared household roles and women in leadership.

Chapter 12: Latent Class Analysis

Purpose

Latent Class Analysis (LCA) is a person-centred approach that identifies subgroups (classes) of individuals who share similar patterns of responses across a set of categorical variables. While factor analysis groups variables, LCA groups people.

Example: Among Canadian WVS respondents, LCA might reveal distinct “types” of people based on their combination of family values, religiosity, and gender attitudes: perhaps a “traditional” class, a “progressive” class, and an “ambivalent” class.

Key Concepts

Latent Classes

A latent class is an unobserved subgroup within the population. Membership is probabilistic: each individual has a probability of belonging to each class, and is typically assigned to the class with the highest probability.
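Probabilistic membership and modal assignment can be illustrated with made-up posterior probabilities from a hypothetical 3-class model:

```python
# Invented posterior class probabilities for two respondents (3-class model)
posteriors = {
    "resp_1": [0.75, 0.15, 0.10],   # most likely class 1
    "resp_2": [0.20, 0.55, 0.25],   # most likely class 2
}

# Modal assignment: each respondent goes to the class with the highest probability
assignment = {rid: probs.index(max(probs)) + 1 for rid, probs in posteriors.items()}
print(assignment)  # {'resp_1': 1, 'resp_2': 2}
```

Note that resp_2's assignment is far less certain (0.55) than resp_1's (0.75); entropy, discussed below, summarizes this certainty across the whole sample.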

Model Selection

The researcher does not know the number of classes in advance. Multiple models are fit (e.g., 2-class, 3-class, 4-class solutions) and compared using fit indices:

  • BIC (Bayesian Information Criterion): Lower values indicate better fit. The most commonly used criterion.
  • AIC (Akaike Information Criterion): Also lower is better, but tends to favour more complex models than BIC.
  • Entropy: Ranges from 0 to 1. Higher values indicate clearer class separation. Values above 0.80 are considered good.
  • Lo-Mendell-Rubin (LMR) Test: Compares a k-class model to a (k-1)-class model. A significant p-value suggests the k-class model is preferred.

Interpreting Classes

Once the optimal number of classes is selected, examine the item-response probabilities (the probability of endorsing each category of each variable within each class). Classes are then labelled based on the distinctive response patterns.

LCA vs. EFA

Feature | EFA | LCA
Groups | Variables | People
Variable type | Continuous (or treated as such) | Categorical
Output | Factors with loadings | Classes with response probabilities
Approach | Variable-centred | Person-centred

Software Note

LCA is typically performed in specialized software such as Mplus or the poLCA package in R. SPSS does not have built-in LCA functionality, though some plugins exist.


Chapter 13: Communicating Statistical Results

Writing Statistical Results

Effective statistical communication requires translating numerical output into clear, meaningful prose. Follow these principles:

Report All Relevant Information

For each test, report:

  • The type of test conducted
  • The test statistic and its degrees of freedom
  • The exact p-value (or “p < .001” if very small)
  • An effect size measure
  • Descriptive statistics (means, standard deviations, or percentages) for each group

Use Plain Language

After reporting the statistical details, explain what the result means in substantive terms. A general audience should be able to understand your conclusion even if they skip the numbers.

Avoid Common Pitfalls

  • Do not say a result “proves” the hypothesis. Statistics provide evidence, not proof.
  • Do not confuse statistical significance with practical importance. A tiny effect can be statistically significant with a large sample.
  • Do not accept the null hypothesis. If \( p > .05 \), say you “failed to reject the null hypothesis” or found “no significant evidence of a difference.” The null may still be false.
  • Do not report results as “approaching significance” or “marginally significant.” A result either meets the pre-specified alpha level or it does not.

Tables and Figures

Tables

Use tables to present complex results concisely. A regression table should include:

  • Variable names
  • Unstandardized coefficients (B) with standard errors
  • Standardized coefficients (\( \beta \))
  • t-values and p-values
  • Model summary statistics (\( R^2 \), adjusted \( R^2 \), F-test)

Figures

Use figures to illustrate patterns:

  • Bar charts for group comparisons (t-test, ANOVA results)
  • Scatterplots for correlations and regression relationships
  • Error bar charts showing means with 95% confidence intervals
  • Interaction plots for moderation effects

Becoming Critical Consumers of Statistics

One of the most important outcomes of studying statistics is learning to evaluate the statistical claims made by others: in academic papers, government reports, media articles, and popular discourse.

Questions to ask when evaluating a statistical claim:

  1. What was the sample? Is it representative of the population the author claims to generalize to?
  2. What was the effect size? A statistically significant finding may not be practically meaningful.
  3. Were confounding variables controlled? Observational studies that fail to account for confounds may report spurious associations.
  4. Were multiple comparisons corrected for? Running many tests without correction inflates the false positive rate.
  5. Is the causal language justified? Only randomized experiments support causal conclusions. Observational studies can identify associations, not causes.
  6. Has the study been replicated? A single study, no matter how well-designed, provides weaker evidence than a body of converging findings.
Final Thought: Statistics is not merely a set of formulas to memorize. It is a way of thinking carefully about evidence, uncertainty, and the gap between what we observe in a sample and what is true in the world. The goal is not to become a statistician, but to become a researcher and citizen who can use and evaluate quantitative evidence with confidence and humility.