How to Run a Paired Samples T-Test in SPSS (Thesis Guide)

By Leonard Cucosen
SPSS Statistics | Research Methods

The paired samples t-test compares two measurements taken from the same participants. It is the standard analysis for pre-test/post-test designs, before-and-after studies, and any research where the same group is measured twice under different conditions. Unlike the independent samples t-test, which compares two separate groups, the paired test accounts for the within-subject correlation between measurements, giving it more statistical power to detect a real difference.

This guide walks through the full process in SPSS: checking assumptions, running the test, interpreting all three output tables, calculating Cohen's d for paired designs, and reporting results in APA 7th format. If you followed the independent samples t-test guide, note that the paired version has different normality requirements and a different effect size formula. The tutorial uses an extended version of the thesis dataset from previous guides, with two new variables added for pre-test and post-test scores.

Key Takeaways:

  • The paired samples t-test compares two measurements from the same participants (e.g., pre-test vs. post-test, before vs. after intervention)
  • The normality assumption applies to the difference scores, not to each variable individually
  • SPSS computes the difference as Variable 1 minus Variable 2, which often produces a negative t-value when scores improve. This is not an error.
  • Cohen's d for paired designs uses a different formula than the independent version: d = Mean of Differences / SD of Differences
  • Report the t statistic, degrees of freedom, p-value, both means with SDs, and effect size in your Results chapter

Before you begin: This guide assumes you have your data loaded in SPSS with two related measurements (e.g., pre-test and post-test scores) defined in Variable View. You should have already examined descriptive statistics for your variables. If you need to check normality of the difference scores, see our guide on how to check normality in SPSS.

When to Use a Paired Samples T-Test

The paired samples t-test is appropriate when your research design meets these conditions:

  1. You have one continuous dependent variable measured at two time points or under two conditions.
  2. The same participants provide both measurements (or participants are matched in pairs).
  3. You want to determine whether the mean difference between the two measurements is statistically significant.

If the two measurements come from different participants, use the independent samples t-test instead. If you have three or more related measurements, use repeated measures ANOVA.

Common thesis examples:

Research Question | Variable 1 | Variable 2 | Design
Does the training program improve test scores? | PreTestScore | PostTestScore | Pre/post intervention
Do students rate the course differently at midterm vs. final? | MidtermRating | FinalRating | Two time points
Is there a difference between self-reported and observed behavior? | SelfReport | ObservedScore | Two measurement methods

Table 1: Common research designs suitable for the paired samples t-test

Assumptions

The paired samples t-test has three assumptions. Two of them are straightforward; the third requires a specific check that differs from the independent t-test.

1. Paired Observations

Each case in the dataset must have both measurements. Participant 1 has a pre-test score and a post-test score; participant 2 has a pre-test score and a post-test score; and so on. This assumption is met through research design. If some participants dropped out between measurements, SPSS handles this through listwise deletion (those cases are excluded from the analysis).

2. Continuous Dependent Variable

Both measurements must be on an interval or ratio scale. Test scores, ratings on a continuous scale, and physiological measurements all qualify. Ordinal data with few categories (e.g., a 3-point scale) is better analyzed with the Wilcoxon signed-rank test.

3. Normality of the Difference Scores

This is where the paired t-test differs from the independent version. The normality assumption does not apply to each variable separately. It applies to the difference scores (Variable 1 minus Variable 2 for each participant).

To check this:

  1. Create a new variable: Transform > Compute Variable. Set the target variable to something like Difference and the expression to PostTestScore - PreTestScore.
  2. Test normality on this new variable using the Explore procedure, as described in the normality guide.

With 30 or more participants, the paired t-test is robust to moderate normality violations (Schmider et al., 2010). With our sample of 150, this assumption is easily met. If your sample is small and the difference scores are severely non-normal, use the Wilcoxon signed-rank test as the non-parametric alternative.
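If you want to double-check this step outside SPSS, the difference scores and a rough skewness estimate can be sketched in a few lines of Python. The scores below are made-up illustrative values, not the tutorial dataset:

```python
import statistics

# Hypothetical pre/post scores for 8 participants (illustrative only)
pre  = [62, 55, 70, 48, 66, 59, 73, 51]
post = [68, 60, 75, 55, 70, 67, 78, 58]

# Difference scores: this is the variable whose normality matters
diff = [b - a for a, b in zip(pre, post)]

mean_d = statistics.mean(diff)
sd_d = statistics.stdev(diff)  # sample SD (n - 1 denominator), matching SPSS

# Simple moment-based sample skewness of the difference scores
n = len(diff)
m3 = sum((d - mean_d) ** 3 for d in diff) / n
skew = m3 / (statistics.pstdev(diff) ** 3)

print(mean_d, round(sd_d, 3), round(skew, 3))
```

With these fabricated scores the skewness is close to zero, consistent with roughly symmetric differences; SPSS's Explore procedure reports the same quantities alongside the Shapiro-Wilk test.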

Note that Levene's test for equality of variances, which is part of the independent t-test workflow, does not apply here. There is only one group, measured twice, so there are no between-group variances to compare.

Example Dataset

This tutorial uses an extended version of the thesis dataset from the descriptive statistics and normality guides. Two new variables have been added to the dataset: PreTestScore and PostTestScore, representing scores before and after a study skills intervention. You can download the extended dataset from the sidebar.

Research question: Did the study skills intervention improve test scores?

  • Variable 1: PreTestScore (continuous, Scale, range 40-90)
  • Variable 2: PostTestScore (continuous, Scale, range 45-95)
  • Sample: 150 participants measured before and after the intervention
  • Design: Single-group pre-test/post-test

Figure 1: Variable View in SPSS showing the PreTestScore and PostTestScore variables

Figure 2: Data View in SPSS with pre-test and post-test scores for 150 participants

Step-by-Step: Running the Paired Samples T-Test

Step 1: Navigate to the T-Test Dialog

Go to Analyze > Compare Means > Paired-Samples T Test.

Figure 3: Navigate to Analyze > Compare Means > Paired-Samples T Test

Step 2: Select the Paired Variables

In the Paired-Samples T Test dialog:

  1. Select PreTestScore from the left variable list.
  2. Hold Ctrl (or Cmd on Mac) and also select PostTestScore.
  3. Click the blue arrow button to move both variables into the Paired Variables box.
  4. SPSS displays them as Pair 1: PreTestScore - PostTestScore.

The order matters for the sign of the output. SPSS calculates Variable 1 minus Variable 2 (PreTestScore minus PostTestScore). Since we expect post-test scores to be higher, the mean difference will be negative. This is not an error.

Figure 4: Paired-Samples T Test dialog with PreTestScore as Variable 1 and PostTestScore as Variable 2

Step 3: Run the Test

Click OK. SPSS produces three output tables: Paired Samples Statistics, Paired Samples Correlations, and Paired Samples Test.

Interpreting the Output

SPSS generates three tables for the paired samples t-test. Each provides different information, and you will need values from all three for a complete interpretation and APA report.

Paired Samples Statistics Table

This table reports the descriptive statistics for each variable separately.

Figure 5: Complete paired samples t-test output showing all three tables

What to look at:

  • Mean: PreTestScore = 62.11, PostTestScore = 68.86. Post-test scores are on average 6.75 points higher.
  • N: 150 for both variables. No cases were excluded due to missing data.
  • Std. Deviation: PreTestScore = 10.291, PostTestScore = 10.392. The spread of scores is similar across both measurements.
  • Std. Error Mean: The precision of each mean estimate. Smaller values indicate more precise estimates.

Paired Samples Correlations Table

This table shows the Pearson correlation between the two measurements.

  • Correlation: .668, Sig.: .000 (p < .001)

The correlation of .668 is moderate to strong and statistically significant. This confirms that pre-test and post-test scores are positively related: participants who scored higher before the intervention also tended to score higher after it. This is expected in a within-subjects design and is one reason the paired t-test has more power than the independent version. By accounting for this correlation, the paired test removes between-subject variability from the error term.

If this correlation were near zero or negative, it would suggest something unusual about your data (e.g., the pairing structure may be wrong, or the two measurements may not be from the same construct).

Paired Samples Test Table

This is the main results table. It reports the paired differences and the t-test result.

Reading the Paired Differences columns:

  • Mean: -6.753. This is the average of all individual differences (PreTestScore minus PostTestScore). The negative sign means post-test scores are higher than pre-test scores on average.
  • Std. Deviation: 8.428. This is the standard deviation of the difference scores, not of either variable individually. You will need this value for calculating Cohen's d.
  • Std. Error Mean: 0.688. The standard error of the mean difference.
  • 95% Confidence Interval: [-8.113, -5.394]. The entire interval is negative, meaning we are 95% confident the true population mean difference falls between -8.11 and -5.39. Because the interval does not contain zero, the difference is statistically significant.

Reading the test statistics:

  • t: -9.814. The t statistic is negative because the mean difference is negative (pre-test scores minus post-test scores). The absolute value (9.814) represents how many standard errors the mean difference is from zero.
  • df: 149 (N - 1 = 150 - 1).
  • Sig. (2-tailed): .000 (p < .001). The result is statistically significant at any conventional alpha level.
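The values in this table are related by simple arithmetic, which you can verify in a few lines. This sketch plugs in the numbers from the output above; the critical t of 1.976 for df = 149 is an assumed value read from a t-table (statistical software would compute it exactly):

```python
import math

mean_diff = -6.753  # mean of the paired differences (from the output)
sd_diff = 8.428     # SD of the difference scores
n = 150

se = sd_diff / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se           # t statistic
df = n - 1

# Two-tailed critical t for df = 149 at alpha = .05 (assumed, from a t-table)
t_crit = 1.976
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(round(t, 2), df, round(ci[0], 2), round(ci[1], 2))
```

This reproduces the reported t(149) = -9.81 and the [-8.11, -5.39] confidence interval to rounding.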

Putting the Output Together

The paired samples t-test shows a statistically significant increase in test scores from pre-test (M = 62.11, SD = 10.29) to post-test (M = 68.86, SD = 10.39), with a mean improvement of 6.75 points. The 95% confidence interval for the mean difference [-8.11, -5.39] does not include zero, and the t-test is significant, t(149) = -9.81, p < .001.

The correlation between pre-test and post-test scores (r = .668) confirms that the within-subjects design is appropriate and that the paired t-test is the correct choice over the independent version.

Calculating Cohen's d (Effect Size)

The formula for Cohen's d in paired designs differs from the independent samples version. For independent samples, you divide by the pooled standard deviation of the two groups. For paired samples, you divide by the standard deviation of the difference scores.

Formula

d = M_diff / SD_diff

Where:

  • M_diff is the mean of the paired differences (from the Paired Samples Test table)
  • SD_diff is the standard deviation of the paired differences (from the same table)

Both values come directly from the SPSS output. No manual pooling is needed.

Calculation

Using the values from the Paired Samples Test table:

d = -6.753 / 8.428 = -0.801

The absolute value is 0.80.

Interpretation

Cohen's d | Effect Size | Practical Meaning
0.2 | Small | Difference exists but is difficult to observe
0.5 | Medium | Difference is noticeable and may be practically meaningful
0.8 | Large | Difference is substantial and clearly meaningful

Table 2: Cohen's d benchmarks for interpreting effect size (Cohen, 1988)

With d = 0.80, this is a large effect. The study skills intervention produced an improvement of approximately 0.80 standard deviations in test scores. Combined with the highly significant p-value (p < .001) and the narrow confidence interval, these results provide strong evidence that the intervention had a substantial positive impact on student performance.

Why the Paired Formula Differs

In the independent samples t-test, Cohen's d uses the pooled standard deviation because you are comparing two separate groups with their own variability. In the paired design, there is only one set of difference scores, and the relevant variability is how much those differences vary across participants. Using the pooled SD from the two variables would inflate the denominator and underestimate the effect size, because it ignores the correlation between measurements.

Some methodologists distinguish between d_z (using the SD of the differences, which is what we calculated here) and d_av (using the average of the two SDs). The d_z version is the standard approach for within-subjects designs and is what thesis committees typically expect (Lakens, 2013).
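To make the distinction concrete, here is the arithmetic for both variants, using the values from the output tables above:

```python
mean_diff = -6.753                # mean of the paired differences (Paired Samples Test table)
sd_diff = 8.428                   # SD of the difference scores
sd_pre, sd_post = 10.291, 10.392  # SDs of each variable (Paired Samples Statistics table)

d_z = mean_diff / sd_diff                    # standard effect size for paired designs
d_av = mean_diff / ((sd_pre + sd_post) / 2)  # alternative using the average of the two SDs

print(round(abs(d_z), 2), round(abs(d_av), 2))
```

Here d_av comes out smaller than d_z because the individual SDs (about 10.3) exceed the SD of the differences (8.43), which the strong pre/post correlation makes possible.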

What to Do When Assumptions Are Violated

Non-Normal Difference Scores

If the difference scores are severely non-normal (skewness beyond +/-2) and your sample is below 30, the Wilcoxon signed-rank test is the standard non-parametric alternative. It compares the ranks of the absolute differences rather than the raw values.

To run it in SPSS: Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples. Move both variables to the Test Pairs List and select "Wilcoxon" under Test Type. Click OK.

With 30 or more participants, the paired t-test is robust to moderate normality violations (Schmider et al., 2010). Document the violation, report the skewness and kurtosis of the difference scores, cite the robustness literature, and proceed with the parametric test.

Outliers in the Difference Scores

Extreme outliers in the difference scores can disproportionately affect the mean and standard deviation. Identify outliers using boxplots of the difference scores or by examining standardized values beyond +/-3.

If outliers exist, first verify that they are legitimate data points (not data entry errors). If legitimate, run the analysis with and without the outliers and report both results. If the conclusion does not change, the outliers are not influential. If the conclusion changes, discuss this sensitivity in your Results chapter.
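A quick way to screen for outliers outside SPSS is to standardize the difference scores and flag values beyond +/-3. The difference scores below are made-up illustrative values with one deliberately extreme case:

```python
import statistics

# Hypothetical difference scores for 20 participants (illustrative only),
# with one deliberately extreme value (35)
diff = [6, 5, 7, 4, 8, 5, 7, 6, 5, 6, 7, 4, 6, 5, 8, 6, 5, 7, 6, 35]

mean_d = statistics.mean(diff)
sd_d = statistics.stdev(diff)

# Standardized difference scores; values beyond +/-3 flag potential outliers
z = [(d - mean_d) / sd_d for d in diff]
outliers = [d for d, zd in zip(diff, z) if abs(zd) > 3]

print(outliers)
```

This is equivalent to saving standardized values in SPSS (Analyze > Descriptive Statistics > Descriptives with "Save standardized values as variables" checked) and scanning for |z| > 3.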

Reporting in APA Format

Reporting the Actual Results From This Tutorial

A paired samples t-test was conducted to evaluate the effect of a study skills intervention on test scores. Post-test scores (M = 68.86, SD = 10.39) were significantly higher than pre-test scores (M = 62.11, SD = 10.29), t(149) = -9.81, p < .001, d = 0.80, 95% CI [-8.11, -5.39]. The effect size was large, indicating that the intervention produced a substantial improvement in student performance.

Non-Significant Result Template

If the result had been non-significant, the report would follow this structure:

A paired samples t-test was conducted to compare test scores before and after the intervention. There was no significant difference between pre-test scores (M = 62.11, SD = 10.29) and post-test scores (M = 63.40, SD = 10.55), t(149) = -1.12, p = .264, d = 0.13. The effect size was negligible.

With Wilcoxon Alternative (Non-Normal Difference Scores)

The Shapiro-Wilk test indicated that the difference scores were not normally distributed (W = 0.94, p = .003). A Wilcoxon signed-rank test was therefore conducted. Post-test scores were significantly higher than pre-test scores (Z = -5.42, p < .001, r = .44).

APA Table Format

For thesis Results chapters that require a summary table:

Variable   | Condition | N   | M     | SD    | t     | df  | p      | d
Test Score | Pre-test  | 150 | 62.11 | 10.29 | -9.81 | 149 | < .001 | 0.80
           | Post-test | 150 | 68.86 | 10.39 |       |     |        |

Table 3: Paired samples t-test results comparing pre-test and post-test scores

Reporting Checklist

Every paired samples t-test report should include:

  1. The purpose of the test (what comparison was made and why)
  2. The means and standard deviations for both conditions
  3. The t statistic, degrees of freedom, and exact p-value (or "< .001" when very small)
  4. Effect size (Cohen's d) with interpretation
  5. The 95% confidence interval of the mean difference
  6. The correlation between the two measurements (optional but recommended, especially when justifying the paired design)

Common Mistakes

1. Testing Normality on Each Variable Instead of the Differences

As covered in the Assumptions section above, the paired t-test checks normality of the difference scores, not each variable on its own. Two non-normal variables can still produce normally distributed differences. Always compute and test the difference variable.

2. Using the Independent Samples T-Test for Paired Data

If the same participants are measured twice, the measurements are correlated. Using the independent t-test ignores this correlation, inflates the error term, and reduces statistical power. You may miss a real effect that the paired test would detect. Check your research design: same people measured twice means paired, different people means independent.

3. Ignoring the Sign of the T-Value

As explained in Step 2, SPSS computes Variable 1 minus Variable 2 in the order you entered them, so an improvement from pre to post produces a negative t-value. This is not an error. Report the value as SPSS gives it and clarify the direction using the means from the Paired Samples Statistics table.

4. Omitting Cohen's d or Using the Wrong Formula

The paired design requires dividing by the SD of the differences, not the pooled SD used for independent samples (see Why the Paired Formula Differs above). Mixing up the formulas underestimates your effect size. Both values you need are in the Paired Samples Test table.

5. Running Multiple Paired T-Tests Across Several Time Points

If you measured participants at three or more time points (pre, mid, post), running all pairwise paired t-tests inflates the Type I error rate. With three comparisons at alpha = .05, the probability of at least one false positive rises to approximately .14. Use repeated measures ANOVA instead, then follow up with pairwise comparisons using a Bonferroni correction if the omnibus test is significant.
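The .14 figure comes from the standard familywise error formula, which is easy to verify, along with the Bonferroni-corrected per-test alpha:

```python
alpha = 0.05
k = 3  # pairwise comparisons among three time points (pre-mid, mid-post, pre-post)

# Probability of at least one false positive across k independent tests
familywise = 1 - (1 - alpha) ** k

# Bonferroni-corrected per-test alpha that holds the familywise rate near .05
bonferroni = alpha / k

print(round(familywise, 3), round(bonferroni, 4))
```
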

What Your Thesis Committee Will Ask

"Why did you use a paired samples t-test and not an independent samples t-test?" Explain that the same participants were measured before and after the intervention, making the observations dependent. The paired t-test accounts for this dependency by analyzing the difference scores, which removes between-subject variability and increases statistical power. Using the independent test on paired data would violate the independence assumption and waste the advantage of the within-subjects design.

"How did you verify the normality assumption?" Describe that you computed the difference scores (PostTestScore minus PreTestScore) and tested their normality. With a sample of 150, cite the robustness of the t-test to moderate normality violations (Schmider et al., 2010) and report the skewness and kurtosis of the differences if applicable.

"The effect size is large. Could this be due to practice effects rather than the intervention?" This is a legitimate concern with pre/post designs. Acknowledge that without a control group, you cannot definitively attribute the improvement to the intervention alone. Practice effects, maturation, regression to the mean, and other threats to internal validity are possible. If your thesis uses a single-group design, discuss these limitations honestly in the Discussion chapter. A stronger design would include a control group that takes the same tests without receiving the intervention.

"Why should I trust a t-test when you only have two time points?" The paired t-test is specifically designed for two related measurements. It is the most powerful test available for this exact comparison. If there were additional time points, repeated measures ANOVA would be appropriate. Two time points with a paired t-test is the standard approach in pre/post research (Field, 2018).

Next Steps

After completing the paired samples t-test, the next analysis depends on your research design and the questions that remain.

If your study includes a control group alongside the pre/post measurements, you may need a mixed-design ANOVA to test both within-subjects and between-subjects effects simultaneously. If your design involves three or more related measurements, move to repeated measures ANOVA, which extends the paired t-test logic to multiple time points.

For examining whether continuous variables predict your outcome rather than comparing conditions, linear regression provides the framework. Make sure your foundational analyses are in place: descriptive statistics for sample characteristics and normality testing for assumption documentation should appear in your Results chapter before the t-test.

References

American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). American Psychological Association.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.

Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863.

Pallant, J. (2020). SPSS survival manual (7th ed.). Open University Press.

Schmider, E., Ziegler, M., Danay, E., Beyer, L., & Bühner, M. (2010). Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology, 6(4), 147-151.