# How to Run Mediation Analysis in R: 7 Easy Steps

Are you excited to learn how to run a mediation analysis in R? You’re in the right place! This comprehensive guide will walk you through every step of the mediation analysis in R using a dummy dataset of 30 respondents.

## Lesson Outcomes

By the end of this lesson, you will be able to:

• Understand the concept of mediation analysis and its purpose in exploring indirect effects.
• Visualize a mediation model using a simple diagram.
• Install and load the necessary R packages for mediation analysis.
• Import and explore a dataset in R, computing descriptive statistics and correlations among variables.
• Specify a mediation model in R.
• Estimate and fit the mediation model to your dataset in R.
• Interpret the results of a mediation analysis, including direct effects, indirect effects, and total effects.
• Create a visualization of the mediation analysis results in R.
• Apply the mediation analysis process to your own research questions and datasets.

Are you ready? Let’s get started and explore the process!

## What is Mediation Analysis?

Before we start crunching numbers, let’s briefly discuss what mediation analysis is. It’s a statistical technique that helps us understand how an independent variable (X) influences a dependent variable (Y) through a mediator variable (M).

Mediation analysis is particularly useful for determining if the effect of X on Y is entirely, partially, or not mediated by M.

## Our Dataset: A Brief Overview

In our example, we’ll be working with a dataset of 30 respondents. Let’s say these respondents are employees, and we want to study the relationship between job satisfaction (X), job performance (Y), and workplace motivation (M).

We hypothesize that job satisfaction influences job performance indirectly through workplace motivation. So, we’re going to perform a mediation analysis to see if this is true.

To visualize our hypothesis, we can create a simple diagram with three variables: job satisfaction (X), workplace motivation (M), and job performance (Y).

In this diagram, the arrow from X to M represents the effect of job satisfaction on workplace motivation (a path). The arrow from M to Y represents the effect of workplace motivation on job performance (b path). The indirect effect of job satisfaction on job performance through workplace motivation is the product of the a and b paths (a * b).
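To make the a path, the b path, and their product concrete, here is a minimal sketch using base R's `lm()` on simulated data. The variable names `x`, `m`, `y` and the coefficients used to generate them are assumptions chosen for illustration; they are not the article's dataset.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(42)
n <- 30
x <- rnorm(n)                        # independent variable (X)
m <- 0.5 * x + rnorm(n)              # mediator (M) depends on X
y <- 0.3 * x + 0.4 * m + rnorm(n)    # outcome (Y) depends on X and M

# a path: effect of X on M
a <- coef(lm(m ~ x))["x"]

# b path and direct effect: Y regressed on both X and M
fit_y <- lm(y ~ x + m)
b <- coef(fit_y)["m"]
c_direct <- coef(fit_y)["x"]

# Indirect effect is the product of the a and b paths
indirect <- a * b

# Total effect: Y regressed on X alone
total <- coef(lm(y ~ x))["x"]

# With ordinary least squares, total = direct + indirect
all.equal(unname(total), unname(c_direct + indirect))
```

With ordinary least squares regressions like these, the total effect always splits exactly into the direct effect plus the a * b product, which is why the indirect effect can be read as the part of the X to Y relationship that runs through M.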

## How To Run Mediation Analysis in R

Now that we have a clear understanding of our dataset and hypothesis, let’s jump into R and start working with the data.

### Step 1: Install and Load Packages

First, we’ll need to install and load the necessary packages for conducting mediation analysis in R as well as visualizing the results:

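# Install packages (only needed once per machine)
install.packages("psych")
install.packages("lavaan")
install.packages("ggplot2")
install.packages("readxl")
install.packages("semPlot")

# Load packages (needed in every new R session)
library(psych)
library(lavaan)
library(ggplot2)
library(readxl)
library(semPlot)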

Here is a brief description of the above packages:

1. psych: Package for performing psychological and psychometric analyses, such as factor analysis and descriptive statistics.
2. lavaan: Package for structural equation modeling (SEM) with user-friendly syntax and a range of fit indices.
3. ggplot2: Flexible data visualization package based on the Grammar of Graphics for creating complex and customizable plots.
4. readxl: Lightweight package for importing Excel files (.xls and .xlsx) into R data frames.
5. semPlot: Visualization tool for creating path diagrams of structural equation models (SEMs) with customization options.
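If you rerun your script often, reinstalling every package each time is slow. One possible approach, sketched below, is to check which of the five packages are missing first. The names `required_pkgs` and `missing_pkgs` are made up for this example, and the `interactive()` guard is a design choice so that nothing gets installed when the script runs unattended.

```r
# Packages used in this lesson
required_pkgs <- c("psych", "lavaan", "ggplot2", "readxl", "semPlot")

# requireNamespace() returns TRUE when a package is already installed
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0 && interactive()) {
  # Install only what is missing, and only in an interactive session
  install.packages(missing_pkgs)
} else if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```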

### Step 2: Import and Explore the Dataset

Next, we’ll import our dataset into R and look at the first few rows to familiarize ourselves with the data. You may use your own dataset, or you can download my Excel dummy dataset HERE (for educational purposes only).

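NOTE:

• If your dataset is an Excel .xlsx file, use the following syntax:
# Import the dataset ("your_dataset.xlsx" is a placeholder; use the path to your own file)
data <- readxl::read_excel("your_dataset.xlsx")

# Explore dataset
head(data)
• If your dataset is a .csv file, use the following syntax:
# Import dataset ("your_dataset.csv" is a placeholder; use the path to your own file)
data <- read.csv("your_dataset.csv")

# Explore dataset
head(data)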

Assuming our dataset contains three columns (job_satisfaction, workplace_motivation, and job_performance), head(data) prints the first six rows of those columns.
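If you don't have a file at hand, you could follow along with a simulated stand-in. The sketch below builds a 30-row data frame with the same three column names; every value in it is randomly generated under assumed coefficients, so it will not reproduce the numbers discussed later in this lesson.

```r
# Simulated stand-in dataset (hypothetical values, not the article's data)
set.seed(123)
n <- 30
job_satisfaction <- round(rnorm(n, mean = 5, sd = 1.5), 1)
workplace_motivation <- round(0.6 * job_satisfaction + rnorm(n, sd = 1), 1)
job_performance <- round(0.3 * job_satisfaction +
                           0.5 * workplace_motivation + rnorm(n, sd = 1), 1)

data <- data.frame(job_satisfaction, workplace_motivation, job_performance)

# First six rows
head(data)

# Column types and dimensions
str(data)

# Number of missing values per column
colSums(is.na(data))
```

Checking `str()` and the missing-value counts right after import is a cheap way to catch columns that were read as text or rows with blanks before they cause errors in later steps.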

### Step 3: Descriptive Statistics and Correlations

Before we run the mediation analysis, let’s compute some descriptive statistics and correlations for our variables.

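# Descriptive statistics: minimum, quartiles, median, mean, and maximum
summary(data)

# Mean, standard deviation, skewness, and kurtosis (psych package)
describe(data)

# Correlations
correlations <- cor(data)
print(correlations)

This will give us an overview of each variable’s mean and standard deviation, along with the correlations among the variables.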

Performing descriptive statistics and correlation analysis prior to mediation analysis is important for several reasons:

1. Data understanding: Descriptive statistics provide a summary of your dataset and help you understand the central tendency, dispersion, and shape of the distribution for each variable. This understanding is crucial before diving into more complex analyses like mediation analysis, as it helps you identify any potential issues or outliers in the data.
2. Assumptions checking: Many statistical techniques, including mediation analysis, rely on certain assumptions about the data. Descriptive statistics can help you assess whether these assumptions are met. For example, normality of the variables is often an assumption in mediation analysis, and you can examine this through descriptive statistics like skewness and kurtosis.
3. Preliminary insights: Correlation analysis provides an initial understanding of the relationships between your variables. It helps you examine the strength and direction of the associations, which can be useful in generating hypotheses or informing the mediation model. Strong correlations between the independent variable (X) and the mediator (M), as well as between the mediator (M) and the dependent variable (Y), might indicate the presence of mediation effects.
4. Multicollinearity assessment: Examining correlations can also help you detect multicollinearity, a situation where two or more predictor variables are highly correlated. Multicollinearity can cause issues in mediation analysis, as it may lead to unstable estimates or inflated standard errors. By identifying multicollinearity early on, you can address it before proceeding with the mediation analysis.
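The checks described above can be sketched with base R alone. The example below runs on simulated data (hypothetical values with the article's column names), uses the Shapiro-Wilk test as one possible normality check, and treats an absolute correlation above 0.9 as a warning sign for multicollinearity. That 0.9 cutoff is an assumption for illustration, not a fixed rule.

```r
# Simulated stand-in dataset (hypothetical values)
set.seed(123)
n <- 30
job_satisfaction <- rnorm(n, mean = 5, sd = 1.5)
workplace_motivation <- 0.6 * job_satisfaction + rnorm(n)
job_performance <- 0.3 * job_satisfaction + 0.5 * workplace_motivation + rnorm(n)
data <- data.frame(job_satisfaction, workplace_motivation, job_performance)

# Normality: Shapiro-Wilk p-value for each variable
normality_p <- sapply(data, function(v) shapiro.test(v)$p.value)
print(round(normality_p, 3))

# Preliminary insight: is X correlated with M?
xm_test <- cor.test(data$job_satisfaction, data$workplace_motivation)
print(xm_test$estimate)
print(xm_test$p.value)

# Multicollinearity: flag variable pairs with |r| above 0.9
cors <- cor(data)
high_pairs <- which(abs(cors) > 0.9 & upper.tri(cors), arr.ind = TRUE)
print(high_pairs)
```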

### Step 4: Specify the Mediation Model

Now that we better understand our dataset, it’s time to specify the mediation model. We’ll use the lavaan package to define the model using the following syntax:

mediation_model <- '
# Direct effects
workplace_motivation ~ a * job_satisfaction
job_performance ~ c * job_satisfaction + b * workplace_motivation

# Indirect effect (a * b)
indirect := a * b

# Total effect (c + indirect)
total := c + indirect
'

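In this model, the first regression estimates the a path from job satisfaction (X) to workplace motivation (M). The second regression estimates the b path from workplace motivation (M) to job performance (Y) together with c, the direct effect of job satisfaction (X) on job performance (Y). The := operator then defines the indirect effect (a * b) and the total effect (c + indirect) from those labelled coefficients.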

NOTE: If you are wondering why the above R script produces no output, it is because it only stores the mediation model as a string; it does not perform the analysis or print any results. The actual mediation analysis in R is performed in the next step.
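If you want to confirm that lavaan reads the string the way you intended before fitting anything, one option is to parse it with `lavaanify()`, which turns the model syntax into a parameter table. This is a small sketch; the object name `param_table` is made up for the example.

```r
library(lavaan)

mediation_model <- '
# Direct effects
workplace_motivation ~ a * job_satisfaction
job_performance ~ c * job_satisfaction + b * workplace_motivation

# Indirect effect (a * b)
indirect := a * b

# Total effect (c + indirect)
total := c + indirect
'

# The model is just a character string at this point
class(mediation_model)

# Parse the syntax into a parameter table (no data needed)
param_table <- lavaanify(mediation_model)
print(param_table[, c("lhs", "op", "rhs", "label")])
```

Rows with the `~` operator are the regressions, and rows with `:=` are the defined parameters. A typo in a label usually shows up here as an error or a missing row.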

### Step 5: Estimate the Mediation Model

With our mediation model specified, we can now estimate it using the lavaan package. We’ll fit the model to our dataset and then summarize the results.

# Estimate the mediation model
mediation_results <- sem(mediation_model, data = data)

# Summarize the results
summary(mediation_results, standardized = TRUE, fit.measures = TRUE)

The summary will show the estimated a, b, and c paths, the indirect effect (a * b), and the total effect (c + indirect), along with their standard errors and p-values.
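The printed summary is long, so it can help to pull just the labelled rows into a data frame. Here is a minimal sketch using lavaan's `parameterEstimates()`. It fits the same model to simulated data (hypothetical values), so the numbers it prints will differ from the ones discussed in this lesson.

```r
library(lavaan)

# Simulated stand-in dataset (hypothetical values)
set.seed(123)
n <- 30
job_satisfaction <- rnorm(n, mean = 5, sd = 1.5)
workplace_motivation <- 0.6 * job_satisfaction + rnorm(n)
job_performance <- 0.3 * job_satisfaction + 0.5 * workplace_motivation + rnorm(n)
data <- data.frame(job_satisfaction, workplace_motivation, job_performance)

mediation_model <- '
workplace_motivation ~ a * job_satisfaction
job_performance ~ c * job_satisfaction + b * workplace_motivation
indirect := a * b
total := c + indirect
'

mediation_results <- sem(mediation_model, data = data)

# All estimates as a data frame, including standardized values
est <- parameterEstimates(mediation_results, standardized = TRUE)

# Keep only the labelled rows: a, b, c, indirect, total
labelled <- est[est$label != "", c("label", "est", "se", "pvalue", "std.all")]
print(labelled)
```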

Alright, but what do all these numbers mean? Let’s discuss this next.

### Step 6: Interpret Mediation Output in R

Based on the output of your mediation analysis using the dummy dataset we used in this lesson, here’s how to interpret mediation analysis results in R:

Parameter Estimates:

1. Path a (job satisfaction -> workplace motivation): The estimated coefficient for the effect of job satisfaction (X) on workplace motivation (M) is 1.218. This suggests that, on average, a one-unit increase in job satisfaction is associated with a 1.218-unit increase in workplace motivation, assuming a linear relationship. The standardized coefficient (Std.all) is 1.000. Because job satisfaction is the only predictor of workplace motivation, this value equals their correlation, so the two variables are almost perfectly linearly related in this dummy dataset.
2. Path b (workplace motivation -> job performance): The estimated coefficient for the effect of workplace motivation (M) on job performance (Y), holding job satisfaction constant, is 0.727. This suggests that, on average, a one-unit increase in workplace motivation is associated with a 0.727-unit increase in job performance, assuming a linear relationship. The standardized coefficient (Std.all) is 0.632, which indicates a moderate positive relationship between workplace motivation and job performance.
3. Path c (job satisfaction -> job performance): The estimated coefficient for the direct effect of job satisfaction (X) on job performance (Y), holding workplace motivation constant, is 0.516. This suggests that, on average, a one-unit increase in job satisfaction is associated with a 0.516-unit increase in job performance beyond what is transmitted through workplace motivation, assuming a linear relationship. The standardized coefficient (Std.all) is 0.368, which indicates a weak to moderate positive relationship between job satisfaction and job performance.

Defined Parameters:

1. Indirect effect (a * b): The estimated indirect effect of job satisfaction (X) on job performance (Y) through workplace motivation (M) is 0.885, which is the product of path a (1.218) and path b (0.727). This suggests that, on average, a one-unit increase in job satisfaction is associated with a 0.885-unit increase in job performance through its effect on workplace motivation. The standardized indirect effect (Std.all) is 0.632, which indicates a moderate positive relationship.
2. Total effect (c + indirect): The estimated total effect of job satisfaction (X) on job performance (Y), considering both the direct and indirect effects, is 1.401. This suggests that, on average, a one-unit increase in job satisfaction is associated with a 1.401-unit increase in job performance when considering both the direct and indirect effects. The standardized total effect (Std.all) is 1.000, which indicates a strong positive relationship.
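The defined parameters can be checked by hand from the three path estimates reported above. This short sketch redoes the arithmetic and also computes the share of the total effect that runs through the mediator. The "proportion mediated" figure is an extra summary that this lesson's output does not report, included here only as an illustration.

```r
# Path estimates reported in this lesson's output
a <- 1.218          # job satisfaction -> workplace motivation
b <- 0.727          # workplace motivation -> job performance
c_direct <- 0.516   # job satisfaction -> job performance (direct)

# Defined parameters
indirect <- a * b
total <- c_direct + indirect

# Share of the total effect that goes through the mediator
proportion_mediated <- indirect / total

round(c(indirect = indirect,
        total = total,
        proportion_mediated = proportion_mediated), 3)
```

The rounded values are 0.885 for the indirect effect and 1.401 for the total effect, matching the output, and about 0.632 for the proportion mediated.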

Just so you know, the results presented here are based on a fictional dataset created for demonstration purposes only, and the interpretations should not be considered meaningful. However, the process of interpreting the mediation analysis results remains the same for real-life datasets.

### Step 7: Visualize Mediation in R

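To make our results more accessible, let’s create a bar chart of the path coefficients using the ggplot2 package:

# Load the necessary libraries
library(ggplot2)
library(lavaan)

# Collect the labelled estimates (a, b, c, indirect, total) from the fitted model
estimates <- parameterEstimates(mediation_results)
estimates <- estimates[estimates$label != "", ]
path_data <- data.frame(path = estimates$label, coefficient = estimates$est)

# Create a bar plot to visualize the path coefficients
ggplot(path_data, aes(x = path, y = coefficient, fill = path)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_text(aes(label = round(coefficient, 3)), vjust = -0.3, size = 4) +
  theme_minimal() +
  theme(legend.position = "none") +
  ylab("Coefficient") +
  xlab("Path") +
  ggtitle("Mediation Analysis Results")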

The above script will create a bar chart displaying the coefficients for each path in our mediation model:

The bar chart we created above is a good representation of our mediation analysis. Still, we can push this further and generate a mediation diagram with the path estimates displayed on the arrows, making it easier to interpret the relationships between the variables using the following R script:

# Load the necessary libraries
library(ggplot2)
library(semPlot)

# Create a bar plot to visualize the path coefficients
bar_plot <- ggplot(path_data, aes(x = path, y = coefficient, fill = path)) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_text(aes(label = round(coefficient, 3)), vjust = -0.3, size = 4) +
theme_minimal() +
theme(legend.position = "none") +
ylab("Coefficient") +
xlab("Path") +
ggtitle("Mediation Analysis Results")

# Plot the bar plot
print(bar_plot)

# Plot the mediation diagram with path estimates
# (mediation_results is the fitted model object from Step 5)
semPaths(mediation_results, whatLabels = "est", style = "lisrel", intercepts = FALSE)


This will generate a mediation diagram with the path estimates displayed on the arrows, making it easier to interpret the relationships between the variables in our model:

Here’s a brief explanation of how to interpret the above diagram:

1. X -> M (a path): This arrow shows the effect of job satisfaction (X) on workplace motivation (M). A positive number means that as job satisfaction increases, workplace motivation also increases. A negative number indicates that as job satisfaction increases, workplace motivation decreases. The magnitude of the number reflects the strength of this relationship.
2. M -> Y (b path): This arrow represents the effect of workplace motivation (M) on job performance (Y), assuming job satisfaction (X) is held constant. A positive number means that as workplace motivation increases, job performance also increases. A negative number indicates that as workplace motivation increases, job performance decreases. The magnitude of the number reflects the strength of this relationship.
3. X -> Y (c path): This arrow shows the direct effect of job satisfaction (X) on job performance (Y), with the mediator (workplace motivation) held constant. A positive number means that as job satisfaction increases, job performance also increases. A negative number indicates that as job satisfaction increases, job performance decreases. The magnitude of the number reflects the strength of this relationship.

To interpret the results, consider the signs (positive or negative) and the magnitudes of the path coefficients. A larger absolute value indicates a stronger relationship between the variables.

If the indirect effect (a * b) is significant, it suggests that workplace motivation mediates the relationship between job satisfaction and job performance. In this case, part of the effect of job satisfaction on job performance can be explained through workplace motivation.
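A common way to judge whether the indirect effect is significant is a bootstrap confidence interval, since the product a * b is often not normally distributed. The sketch below uses lavaan's `se = "bootstrap"` option on simulated data (hypothetical values). The choice of 500 resamples and a percentile interval are assumptions made to keep the example quick, not settings taken from this lesson.

```r
library(lavaan)

# Simulated stand-in dataset (hypothetical values)
set.seed(123)
n <- 30
job_satisfaction <- rnorm(n, mean = 5, sd = 1.5)
workplace_motivation <- 0.6 * job_satisfaction + rnorm(n)
job_performance <- 0.3 * job_satisfaction + 0.5 * workplace_motivation + rnorm(n)
data <- data.frame(job_satisfaction, workplace_motivation, job_performance)

mediation_model <- '
workplace_motivation ~ a * job_satisfaction
job_performance ~ c * job_satisfaction + b * workplace_motivation
indirect := a * b
total := c + indirect
'

# Refit with bootstrap standard errors (500 resamples)
set.seed(2024)
boot_fit <- sem(mediation_model, data = data,
                se = "bootstrap", bootstrap = 500)

# Percentile bootstrap confidence intervals
boot_est <- parameterEstimates(boot_fit, boot.ci.type = "perc", level = 0.95)
indirect_row <- boot_est[boot_est$label == "indirect",
                         c("label", "est", "ci.lower", "ci.upper")]
print(indirect_row)
```

If the interval for the indirect effect excludes zero, that is usually taken as evidence of mediation. With only 30 observations the interval can be wide, so more resamples and a larger sample would give a steadier answer.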


## Wrapping Up

In this article, we explored the process of conducting a mediation analysis in R using a dataset of 30 respondents. We began by discussing the concept of mediation analysis, which is used to investigate the role of a mediator variable in explaining the relationship between an independent and dependent variable.

Throughout the process, we used a dummy dataset containing three variables: job satisfaction, workplace motivation, and job performance. We applied the lavaan package to estimate the mediation model and the ggplot2 and semPlot packages to visualize the results.

Finally, we discussed how to interpret the path coefficients in the mediation model, as well as the overall findings of the mediation analysis. By following these steps, you can perform your own mediation analysis in R and gain insights into the relationships between your variables of interest.

We hope you found this article helpful in understanding and conducting mediation analysis in R. With this knowledge, you can apply mediation analysis to your research projects and find out the underlying mechanisms behind complex relationships in your data.

##### Leonard

Leonard is a Ph.D. student in Data Science and holds an MBA and B.Sc. He has an impressive public speaking profile on education, engineering, and research. He loves to help students achieve their academic objectives and believes education is the key to building a better future for mankind.
