Normality Test in R: A Step-by-Step Guide

Performing a normality test in R is an essential skill for data analysts and statisticians alike, because it shows whether the data in a sample are plausibly normally distributed. A normal distribution is described by a bell-shaped curve: values near the mean occur most often, and values become rarer the farther they lie from it.

In this article, we will discuss various methods to check normality in R, along with related calculations such as the mean and standard deviation, z-scores, and normal probability plots.

Lesson Outcomes:

After reading this guide, you will be able to:

  • Understand the concept of normality in statistics
  • Test for Normality in R using Histograms
  • Analyze Normality in R with QQ Plots
  • Determine Normal Distribution with Statistical Tests like Shapiro-Wilk test in R

So without further ado, let’s jump straight in.

What is Normality?

Before we dive into how to check for normality in R, let’s first define what normality is.

Normality means that the data in a sample follow a normal distribution. A normal distribution, sometimes called a Gaussian distribution, is a symmetric, bell-shaped curve that is fully characterized by its mean and standard deviation. The mean represents the center of the distribution, while the standard deviation represents the spread of the data around the mean.

The normal distribution is important because many statistical tests assume that the data being analyzed follows a normal distribution. If your data does not follow a normal distribution, some statistical tests may not be appropriate or may produce inaccurate results.
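To make the role of the standard deviation concrete, here is a minimal sketch that uses the base R function pnorm() (the normal cumulative distribution function) to compute the share of a normal distribution lying within 1, 2, and 3 standard deviations of the mean. The variable names k and within_k_sd are chosen for illustration only.

```r
# Number of standard deviations on either side of the mean
k <- 1:3

# P(-k < Z < k) for a standard normal variable Z
within_k_sd <- pnorm(k) - pnorm(-k)

# Roughly 68%, 95%, and 99.7% of values fall within 1, 2, and 3 SDs
print(round(within_k_sd, 4))
```

These proportions are the same for every normal distribution, whatever its mean and standard deviation, which is why the two parameters are enough to describe it.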

How to Check for Normality in R

To check if your data follows a normal distribution, you can use several methods in R. Here are some common methods:

  1. Histogram

One way to check for normality is by creating a histogram of your data using the hist() function in R. A histogram is a graphical representation of the distribution of your data. It shows how frequently different values occur within a particular range or bin.

Here is example code to create a histogram in R:

# Create sample data
data <- rnorm(100)

# Create histogram
hist(data)

The code above generates 100 random numbers from a standard normal distribution (mean = 0, standard deviation = 1) using the rnorm() function and draws a histogram of them using the hist() function.

If the data follow a normal distribution, the resulting histogram should look roughly bell-shaped: tallest near the mean and tapering off symmetrically on both sides. With only 100 observations, expect some unevenness between bars even when the data are truly normal.
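Judging a bell shape by eye is easier with a reference curve. The following sketch, one possible approach rather than a required step, overlays a normal density with the sample's own mean and standard deviation on the histogram. The seed value and the names sample_data, m, and s are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so the example is reproducible
sample_data <- rnorm(100)

# Estimate the parameters of the reference curve from the sample
m <- mean(sample_data)
s <- sd(sample_data)

# freq = FALSE plots densities instead of counts,
# so the histogram and the curve share the same y-axis scale
hist(sample_data, freq = FALSE, main = "Histogram with normal curve")

# curve() evaluates the expression over a grid named x on the current plot
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
```

The mean and standard deviation are computed before calling curve() because curve() reserves the name x for its own plotting grid.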

  2. QQ Plot

Another method to check for normality is by creating a QQ plot using the qqnorm() function in R. A QQ plot is a graphical representation of how well your data matches a theoretical normal distribution.

Here is example code to create a QQ plot in R:

# Create sample data
data <- rnorm(100)

# Create QQ plot
qqnorm(data)
qqline(data)

The code above generates 100 random numbers from a standard normal distribution using the rnorm() function and creates a QQ plot using the qqnorm() function, which plots the sample quantiles against the theoretical quantiles of a normal distribution.

The qqline() function adds a reference line to the plot. By default, the line passes through the first and third quartiles of the data and the corresponding theoretical quantiles. If your data follow a normal distribution, the points on the QQ plot should fall approximately on this line.
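It helps to see what a QQ plot looks like when the data are not normal. The sketch below draws the same plot for right-skewed data generated with rexp(), the base R function for exponential random numbers. The exponential distribution, the seed, and the name skewed are illustrative choices, not part of the original example.

```r
set.seed(42)  # arbitrary seed so the example is reproducible

# Exponential data are right-skewed and therefore not normal
skewed <- rexp(100)

# Same two calls as for the normal sample
qqnorm(skewed, main = "Normal Q-Q plot of exponential data")
qqline(skewed)
```

For skewed data like this, the points typically bend away from the reference line in the upper tail instead of following it.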

  3. Shapiro-Wilk Test

Finally, you can use statistical tests to check for normality in R. One commonly used test is the Shapiro-Wilk test, which tests whether your data follow a normal distribution.

Here is example code to perform a Shapiro-Wilk test in R:

# Create sample data
data <- rnorm(100)

# Perform Shapiro-Wilk test
shapiro.test(data)

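As in the previous examples, rnorm() generates 100 random numbers from a standard normal distribution, and shapiro.test() then performs the Shapiro-Wilk test on them.

The null hypothesis of the test is that the data come from a normal distribution. A p-value greater than 0.05 means there is no evidence against normality at the 5% level, while a p-value below 0.05 suggests that the data are not normally distributed. Because the sample is random, even truly normal data will occasionally produce a p-value below 0.05.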
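In a script, you will usually want the p-value as a number rather than as printed output. The following sketch stores the result of shapiro.test(), reads its p.value element, and compares a normal sample with a skewed one. The exponential sample, the seed, the 0.05 threshold held in alpha, and all variable names are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed so the example is reproducible
normal_data <- rnorm(100)
skewed_data <- rexp(100)  # right-skewed, not normal

# shapiro.test() returns an object of class "htest"
result_normal <- shapiro.test(normal_data)
result_skewed <- shapiro.test(skewed_data)

# The p-value is stored in the p.value element
alpha <- 0.05
cat("Normal sample p-value:", result_normal$p.value, "\n")
cat("Skewed sample p-value:", result_skewed$p.value, "\n")
cat("Reject normality for skewed sample:", result_skewed$p.value < alpha, "\n")
```

Note that shapiro.test() only accepts between 3 and 5000 non-missing values, so very large datasets need to be sampled first or checked graphically.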

How To Conduct a Normality Test in R

Now that we’ve covered how to check for normality in R, let’s move on to the calculations that describe a normal distribution and support a normality check. Which of them you need depends on what you mean by “normality”. Here are some common methods:

  1. Mean and Standard Deviation

The mean and standard deviation measure central tendency and variability, respectively, and together they fully specify a normal distribution.

Here is example code to calculate the mean and standard deviation of a dataset in R:

# Create sample data
data <- rnorm(100)

# Calculate mean and standard deviation
mean_data <- mean(data)
sd_data <- sd(data)

# Display mean and standard deviation
cat("Mean of the data:", mean_data, "\n")
cat("Standard deviation of the data:", sd_data, "\n")

Returning to our favorite rnorm() function, we generated 100 random numbers from a standard normal distribution and calculated their mean and standard deviation using the mean() and sd() functions. For a sample drawn from a standard normal distribution, the two results should be close to 0 and 1, respectively.

  2. Z-Score

A z-score measures how many standard deviations an observation lies from the mean. It’s commonly used in hypothesis testing and statistical inference.

Here is example code to calculate the z-score of a dataset in R:

# Create sample data
data <- rnorm(100)

# Calculate z-score
z_score <- (data - mean(data)) / sd(data)

# Display z-scores
print(z_score)

This time we calculated the z-score of each observation by subtracting the mean of the dataset from it and dividing the result by the standard deviation of the dataset, using the random numbers we generated with the rnorm() function.
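As a quick check on the formula, the sketch below compares the manual calculation with the base R function scale(), which centers and scales a vector in the same way by default. The sample parameters (mean 50, standard deviation 10), the seed, and the variable names are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed so the example is reproducible
values <- rnorm(100, mean = 50, sd = 10)

# Manual z-scores: subtract the mean, divide by the standard deviation
z_manual <- (values - mean(values)) / sd(values)

# scale() returns a one-column matrix, so convert it back to a vector
z_scale <- as.vector(scale(values))

# Both approaches should agree up to floating-point tolerance
print(all.equal(z_manual, z_scale))

# Standardized values have mean 0 and standard deviation 1
cat("Mean of z-scores:", round(mean(z_manual), 10), "\n")
cat("SD of z-scores:", round(sd(z_manual), 10), "\n")
```

Standardizing changes only the location and scale of the data, not the shape of its distribution, so z-scores of skewed data remain skewed.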

  3. Normal Probability Plot

A normal probability plot is a Q-Q plot that compares your data with a normal distribution, so it is the same kind of plot that qqnorm() produced earlier. It plots the observed values against the quantiles expected under a normal distribution.

Here is the code snippet to create a normal probability plot in R:

# Create sample data
data <- rnorm(100)

# Create normal probability plot
qqnorm(data)
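qqline(data, col = "red")

The qqline() function adds a reference line to the plot, drawn in red here through the col argument. By default, the line passes through the first and third quartiles, and points that fall close to it indicate that the data match the theoretical normal distribution.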

Conclusion

And there you have it. In this article, we’ve shown you how to check for normality using different methods in R, such as histograms, QQ plots, and statistical tests like the Shapiro-Wilk test.

We’ve also covered the related calculations in R: the mean and standard deviation, z-scores, and normal probability plots. Knowing how to assess normality in R can help you choose appropriate statistical tests and make more accurate inferences from your data.