In the previous lesson, we learned how to find standard deviation with Excel. This time we will learn how to find the standard deviation in R step by step, with clear examples.
Though R offers a number of data types and structures, in this tutorial we will focus on how to find standard deviation in RStudio for the most commonly used ones: data frames, vectors, and arrays.
Assuming you already have R and RStudio installed on your computer, go ahead and launch RStudio. In the meantime, let’s quickly review a few important things about standard deviation – it won’t take long.
What is Standard Deviation Anyway?
In simple terms, standard deviation tells us how spread out a set of data points is relative to their average (mean) in a given dataset. A low standard deviation tells us the data points are clustered closely around the mean, which often makes the mean a more reliable summary of the data. In contrast, a high standard deviation tells us the data points are spread over a wider range of values.
The general notation for standard deviation is sd. However, standard deviation has two formulas (as well as two notations), depending on whether it is calculated for the whole population or for a sample of it.
Here is how the population and sample standard deviation formulas look side by side, with the differences highlighted in red:
As you can see, the population standard deviation is represented by the lowercase Greek letter sigma (σ), while the notation for the sample standard deviation is the more familiar letter s.
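For reference, the two formulas can be written out in standard notation as follows. Here N is the population size, n is the sample size, μ is the population mean, and x̄ is the sample mean. The symbols are the conventional ones and are assumed here, since the original figure is not reproduced. The differences are the mean that is used and the divisor (N versus n - 1).

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```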
Now, there’s a fair amount of confusion about standard deviation: its notation, calculation, and proper use in statistical research. Lucky for you, I have the perfect fix for that!
Take a few minutes and go through the Population vs. Sample Standard Deviation Explained lesson first and you’ll feel confident when jumping in hot waters with R next.
Calculate Standard Deviation on R
In R, the dedicated function for standard deviation is sd(), which calculates the square root of the variance of the input object. Because var() divides by n - 1, sd() returns the sample standard deviation. The object and the values it contains are defined first and then passed to the sd() function for computation.
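As a minimal sketch of what sd() does under the hood, the snippet below compares it with the square root of var() and with the formula computed by hand. The vector x holds illustrative values only, and pop_sd() is a hypothetical helper (not part of base R) for the case where you need the population formula instead.

```r
# Illustrative values only (not taken from the lesson's dataset)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# sd() is the square root of var(); both use the n - 1 (sample) divisor
sd(x)           # about 2.138
sqrt(var(x))    # same value

# The same result computed by hand
n <- length(x)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))
manual_sd

# Hypothetical helper: population standard deviation (divides by n)
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))
pop_sd(x)       # 2
```

Base R has no built-in population standard deviation function, which is why a small helper like this is a common workaround.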
Next, let’s learn how exactly we calculate the standard deviation in R using the built-in sd() function and some step-by-step examples.
Using Excel Dataset
Let’s start by calculating the age standard deviation on R for a group of respondents in an Excel dataset.
You can follow along by downloading the Excel dataset used in this lesson HERE. To import an Excel dataset in RStudio, navigate to File → Import Dataset → From Excel and select the .xlsx file downloaded above.
Our sample Excel dataset contains two columns: age and weight as seen in the following picture.
To find the standard deviation in R for the age column in the imported Excel dataset, type in the RStudio console:
sd(Standard_Deviation_on_R$age)
Where:
sd() = standard deviation function in R
Standard_Deviation_on_R = Excel dataset object
$ = operator used to extract a specific part of an object, e.g., the age column.
And the standard deviation for age is 14.46402. Now, go ahead and calculate the standard deviation for the weight subset in the same Excel file.
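Imported spreadsheets often contain empty cells, which typically arrive in R as NA. The sketch below shows how sd() behaves in that case. The respondents data frame is a hypothetical stand-in with made-up values, not the lesson's Excel file.

```r
# Hypothetical stand-in for an imported sheet; values are made up for illustration
respondents <- data.frame(age    = c(23, 35, NA, 41, 58),
                          weight = c(61.5, 80.2, 72.0, NA, 90.4))

# With a missing value present, sd() returns NA
sd(respondents$age)

# na.rm = TRUE drops missing values before computing the result
age_sd    <- sd(respondents$age, na.rm = TRUE)
weight_sd <- sd(respondents$weight, na.rm = TRUE)
age_sd
weight_sd
```

If sd() unexpectedly returns NA on your own data, checking for missing values is a good first step.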
Using Data Frames
In R, a data frame is a table-like structure made up of rows and columns, where each column holds values of a single type. In a nutshell, data frames are R’s standard way to store tabular data.
We can import a data frame in R from a text or Excel file (as we did previously), or we can create a data frame manually and extract the standard deviation of a numerical column from it using the sd() function.
First, let’s create a data frame in R consisting of five top tech companies and their price per share (NASDAQ) at the moment of writing this post:
We will use the data.frame() function to create the df object in R. This data frame will have five rows and three columns, similar to the table above, containing the company ID (1 to 5), company name, and share price for each company.
Here is how we create this data frame in R using one command:
# Five tech companies with their share price at the time of writing
df <- data.frame(company_id = 1:5,
                 company_name = c("AAPL", "MSFT", "AMZN", "GOOGL", "TSLA"),
                 share_price = c(174.24, 308.31, 3259.95, 2781.35, 1078),
                 stringsAsFactors = FALSE)
df = data frame object containing the company ID, company name, and share price of the top five US tech companies.
stringsAsFactors = an argument of the data.frame() function that determines whether strings in a data frame should be converted to factors or kept as ordinary strings. In this case, we want to keep the company names as ordinary strings, so we set stringsAsFactors to FALSE. (Since R 4.0.0, FALSE is also the default.)
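To see what the stringsAsFactors argument changes in practice, here is a small sketch that builds the same column both ways and checks its class. The object names with_strings and with_factors are made up for this illustration.

```r
# Same data, built once with each setting of stringsAsFactors
with_strings <- data.frame(company_name = c("MSFT", "AMZN"),
                           stringsAsFactors = FALSE)
with_factors <- data.frame(company_name = c("MSFT", "AMZN"),
                           stringsAsFactors = TRUE)

class(with_strings$company_name)   # "character"
class(with_factors$company_name)   # "factor"
```

Neither setting affects sd() on the numeric columns; it only controls how text columns are stored.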
Finally, let us calculate the standard deviation in R for the share price of the top five US tech companies using the now-famous sd() function:
sd(df$share_price)
As you see, the computed standard deviation for the given share price is 1422.415.
Of course, we can add additional rows and columns to a data frame and expand our analysis for standard deviation in R beyond just the share price.
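As a rough sketch of that idea, the snippet below adds one more numeric column and computes the standard deviation of several columns in one call with sapply(). The shares_held column and its values are hypothetical and were invented purely for illustration.

```r
# Recreate the data frame from this lesson
df <- data.frame(company_id = 1:5,
                 company_name = c("AAPL", "MSFT", "AMZN", "GOOGL", "TSLA"),
                 share_price = c(174.24, 308.31, 3259.95, 2781.35, 1078),
                 stringsAsFactors = FALSE)

# Hypothetical extra column; values invented for illustration only
df$shares_held <- c(10, 4, 1, 2, 3)

# Select the numeric measurement columns and apply sd() to each one
cols <- c("share_price", "shares_held")
col_sd <- sapply(df[cols], sd)
col_sd
```

The result is a named numeric vector with one standard deviation per selected column. The company_id column is left out on purpose, since the spread of an identifier is not meaningful.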
Using Vectors
A vector is the most basic data structure in R and consists of a collection of elements of the same type.
For example, in R the vector 1:10 will contain the values from 1 to 10, that is, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
Furthermore, a vector can contain specific values as well. For instance, the vector c(2, 4, 6) will contain the values 2, 4, and 6.
Let’s start by creating a vector vc that contains the values from 1 to 10, using the colon operator (:).
vc <- 1:10
You can print the vc vector components using the concatenate command cat as follows:
cat(vc)
Next, calculate the standard deviation in R for the vc object using the command:
sd(vc)
Here is the complete output. As we can see, the standard deviation in R for the vc vector is 3.02765.
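To see where that number comes from, the calculation can be broken into its individual steps. This is only a sketch of the sample formula applied to the vc vector; the intermediate variable names are made up for this example.

```r
vc <- 1:10

m  <- mean(vc)                  # 5.5
ss <- sum((vc - m)^2)           # sum of squared deviations: 82.5
v  <- ss / (length(vc) - 1)     # sample variance: 82.5 / 9
step_sd <- sqrt(v)              # square root of the variance

step_sd
sd(vc)                          # same value as step_sd
```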
Using Arrays
In R, an array is a collection of values that may carry two or more dimensions of data (multi-dimensional) and that are all of the same data type. Arrays should not be confused with vectors, which are one-dimensional.
To find the standard deviation for an array in R, we first need to create the array using the built-in function array(). To do so, we will combine two vectors (e.g., vc1 and vc2) and then set the dimensions of the array using the dim argument.
First, let us define the vc1 vector with the elements 12 and 8 using the command:
vc1 <- c(12,8)
And define the vc2 vector with the elements 39 and 17:
vc2 <- c(39,17)
Next, we need to create an array from the vc1 and vc2 vectors and use the dim argument to set its dimensions (rows by columns) as follows:
arr <- array(c(vc1, vc2), dim = c(2, 2))
And finally, we can use the sd() function to calculate the standard deviation in R for the newly created array object:
sd(arr)
Below is the complete output in R for the above commands. As you can see, the standard deviation for the arr array is 13.832.
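It is worth knowing that sd() ignores the array's dimensions and treats all of its values as one long vector. If you want one standard deviation per column or per row instead, apply() is one way to get it. The sketch below reuses the vc1, vc2, and arr objects from this lesson; the names whole_sd, by_col, and by_row are made up for this example.

```r
vc1 <- c(12, 8)
vc2 <- c(39, 17)
arr <- array(c(vc1, vc2), dim = c(2, 2))

arr                            # vc1 fills the first column, vc2 the second

# sd() flattens the array, so this equals sd(c(vc1, vc2))
whole_sd <- sd(arr)

# One standard deviation per column (MARGIN = 2) or per row (MARGIN = 1)
by_col <- apply(arr, 2, sd)
by_row <- apply(arr, 1, sd)

whole_sd
by_col
by_row
```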
In this R tutorial for statistics, we learned how to calculate standard deviation in RStudio for imported Excel datasets, data frames, vectors, and arrays.
Though calculating the standard deviation in SPSS or Excel can be somewhat more straightforward, R gives us a lot of flexibility and control over the data we input and manipulate.
I hope you found some value in this R tutorial. If so, kindly help spread the knowledge by sharing this article with your friends and colleagues.