In the previous lesson, we learned how to find **standard deviation with Excel**. This time we will learn **how to find the standard deviation in R**, step-by-step with clear examples.

Though R offers a number of **data types and structures**, in this tutorial we will focus on how to find the standard deviation in RStudio for the most commonly used ones, namely data frames, vectors, and arrays.

Assuming you already have R and RStudio installed on your computer, go ahead and launch RStudio. In the meantime, let's quickly go over a few important things about standard deviation; it won't take long.

## What is Standard Deviation Anyway?

In simple terms, standard deviation tells us how spread out the data points in a dataset are relative to their average (mean). A low standard deviation means the data points are clustered closely around the mean, so the mean is a good summary of the data. In contrast, a high standard deviation means the data points are spread out over a wider range of values.

A common abbreviation for standard deviation is **sd**. However, standard deviation has two formulas (as well as two notations), depending on whether it is calculated for the whole population or for a sample of it.

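The population and sample standard deviation formulas differ in two places: the population formula uses the population mean (μ) and divides by the population size N, while the sample formula uses the sample mean (x̄) and divides by n − 1.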
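For reference, here are the two formulas written out, for a population of N values with mean μ and a sample of n values with mean x̄:

$$
\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}
\qquad\qquad
s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}
$$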

The population standard deviation is denoted by the lowercase Greek letter sigma, **σ**, while the sample standard deviation is denoted by the more familiar letter **s**.

Admittedly, there is a fair amount of confusion about standard deviation notation, calculation, and proper use in statistical research. **Lucky for you, I have the perfect fix for that!**

Take a few minutes to go through the **Population vs. Sample Standard Deviation Explained** lesson first, and you'll feel confident when jumping into R next.

## Calculate Standard Deviation in R

In R, the dedicated function for standard deviation is **sd()**, which calculates the square root of the variance of the values in the input object. Note that **sd()** uses the sample formula (dividing by n − 1), so it returns the sample standard deviation **s**. In the examples below, we first define the object and its values and then pass it as the input to the **sd()** function.

Next, let's learn exactly how to calculate the standard deviation in R using the built-in **sd()** function, with some step-by-step examples.
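The sketch below checks this relationship on a small made-up vector: **sd()** matches **sqrt(var())**, and rescaling by sqrt((n − 1) / n) turns the sample value into the population one. The helper `pop_sd()` is hypothetical, written only for this example; it is not a base R function.

```r
# Small made-up vector, used only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# sd() is the square root of var(); both use the n - 1 (sample) denominator
sample_sd <- sd(x)
all.equal(sample_sd, sqrt(var(x)))   # TRUE

# Hypothetical helper: population standard deviation (denominator n)
pop_sd <- function(v) {
  n <- length(v)
  sd(v) * sqrt((n - 1) / n)
}

pop_sd(x)   # 2
```

Here the squared deviations from the mean (5) sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2, while **sd(x)** returns sqrt(32 / 7).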

### Using an Excel Dataset

Let's start by calculating the standard deviation of age in R for a group of respondents in an Excel dataset.

You can follow along by downloading the Excel dataset used in this lesson HERE. To import an Excel dataset in RStudio, navigate to **File → Import Dataset → From Excel** and select the **.xlsx** file downloaded above.

Our sample Excel dataset contains two columns, **age** and **weight**, as seen in the following picture.

To find the standard deviation in R for the *age* column of the imported Excel dataset, type in the RStudio console:

```
sd(Standard_Deviation_on_R$Age)
```

*Where:*

**sd()** = standard deviation function in R

**Standard_Deviation_on_R** = the data frame object created when importing the Excel dataset

**$** = operator used to extract a specific part of an object, e.g., the *Age* column.

The standard deviation for age is **14.46402**. Now, go ahead and calculate the standard deviation for the **weight** column in the same Excel file.
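Imported spreadsheets often contain empty cells, which arrive in R as `NA`. In that case **sd()** returns `NA` unless you set its `na.rm = TRUE` argument. The `survey` data frame below is a made-up stand-in for an imported dataset; its values are not taken from the lesson's Excel file.

```r
# Made-up stand-in for an imported dataset with one empty Age cell
survey <- data.frame(Age = c(23, 35, NA, 41, 29),
                     Weight = c(61.5, 80.2, 72.0, 90.1, 68.4))

sd(survey$Age)                 # NA, because of the missing value
sd(survey$Age, na.rm = TRUE)   # 7.745967 (SD of the four non-missing ages)
```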

### Using Data Frames

In R, data frames consist of three components: **rows**, **columns**, and **data**. In a nutshell, a data frame is R's standard structure for storing tabular data, where each column can hold a different type (for example, numbers in one column and text in another).

We can import a data frame into R from a text or Excel file (as we did previously) or create one manually, and then extract the standard deviation of a numerical column using the **sd()** function.

First, let's create a data frame in R for five top tech companies and their price per share (NASDAQ) at the time of writing this post:

| Ticker | AAPL | MSFT | AMZN | GOOGL | TSLA |
|--------|------|------|------|-------|------|
| Share price (USD) | 174.24 | 308.31 | 3259.95 | 2781.35 | 1078 |

We will use the **data.frame()** function to create the **df** object in R. This data frame will have five rows (one per company) and three columns: the company ID (1 to 5), the company name, and the share price.

Here is how we create this data frame in R using one command:

```
df <- data.frame(company_id = 1:5,
                 company_name = c("AAPL", "MSFT", "AMZN", "GOOGL", "TSLA"),
                 share_price = c(174.24, 308.31, 3259.95, 2781.35, 1078),
                 stringsAsFactors = FALSE)
```

*Where:*

**df** = data frame object containing the company ID, company name, and share price of the top five US tech companies.

*stringsAsFactors* = an argument of the **data.frame()** function that determines whether strings in a data frame are converted to factors or kept as ordinary character strings. In this case, we want to keep the company names as ordinary strings rather than factor variables, so we set the *stringsAsFactors* argument to **FALSE**. Since R 4.0.0, **FALSE** is also the default.

Finally, let's calculate the standard deviation in R for the share price of the top five US tech companies using the now-famous **sd()** function:

```
sd(df$share_price)
```

As you can see, the computed standard deviation for the share price is **1422.415**.

Of course, we can add additional rows and columns to a data frame and expand our analysis for standard deviation in R beyond just the share price.
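When a data frame has several numeric columns, one common way to get every column's standard deviation at once is to keep only the numeric columns and apply **sd()** to each with **sapply()**. This sketch recreates the `df` from above so it runs on its own; the name `numeric_cols` is just illustrative.

```r
df <- data.frame(company_id = 1:5,
                 company_name = c("AAPL", "MSFT", "AMZN", "GOOGL", "TSLA"),
                 share_price = c(174.24, 308.31, 3259.95, 2781.35, 1078),
                 stringsAsFactors = FALSE)

# Keep only the numeric columns (company_name is character, so it is dropped)
numeric_cols <- df[sapply(df, is.numeric)]

# Standard deviation of every numeric column, returned as a named vector
sapply(numeric_cols, sd)
```

Here the result has one entry for `company_id` and one for `share_price`; the latter matches the **1422.415** computed above.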

### Using Vectors

A vector is the most basic data structure in R and consists of a collection of elements of the same type.

For example, in R the expression **1:10** creates a vector containing the values from 1 to 10, i.e., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

A vector can also contain specific values. For instance, the vector **c(2, 4, 6)** contains the values 2, 4, and 6.
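Both ways of building a vector can be tried directly in the console. In this small sketch, the names `seq_vec` and `val_vec` are just illustrative:

```r
seq_vec <- 1:10          # colon operator: the integers 1, 2, ..., 10
val_vec <- c(2, 4, 6)    # c() combines specific values into a vector

length(seq_vec)          # 10
sd(val_vec)              # 2 (sample standard deviation of 2, 4, 6)
```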

Let's start by creating a vector **vc** containing the values from 1 to 10, using the colon operator (**:**).

```
vc <- 1:10
```

You can display the elements of the **vc** vector using the **cat()** (concatenate and print) function as follows:

```
cat(vc)
```

Next, calculate the standard deviation in R for the **vc** object using the command:

```
sd(vc)
```

Here is the complete output. As we can see, the standard deviation in R for the **vc** vector is **3.02765**.
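To see where 3.02765 comes from, the sample formula can be applied to **vc** by hand, step by step, and compared with **sd()**. This is only a check of the arithmetic, not a replacement for **sd()**:

```r
vc <- 1:10

n <- length(vc)                       # number of values (10)
deviations <- vc - mean(vc)           # distance of each value from the mean (5.5)
sum_sq <- sum(deviations^2)           # sum of squared deviations (82.5)
manual_sd <- sqrt(sum_sq / (n - 1))   # divide by n - 1, then take the square root

manual_sd                             # 3.02765
all.equal(manual_sd, sd(vc))          # TRUE
```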

### Using Arrays

In R, an **array** is a collection of objects that can have two or more dimensions (*multi-dimensional*) and holds values of the same data type. Arrays should not be confused with vectors, which are *one-dimensional* in nature.

To find the standard deviation of an array in R, we first need to create the array using the built-in function **array()**. To do so, we will combine two vectors (e.g., *vc1* and *vc2*) into the array's data and then set the dimensions of the matrix using the **dim** argument.

First, let's define the **vc1** vector with the elements 12 and 8 using the command:

```
vc1 <- c(12, 8)
```

And define the **vc2** vector containing the elements 39 and 17:

```
vc2 <- c(39, 17)
```

Next, we create an array from the **vc1** and **vc2** vectors and use the **dim** argument to set the dimensions of the matrix (*rows by columns*) as follows:

```
arr <- array(c(vc1, vc2), dim = c(2, 2))
```
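Note that **array()** fills its values column by column, and `dim = c(2, 2)` means 2 rows by 2 columns. A quick check, recreating the vectors from above:

```r
vc1 <- c(12, 8)
vc2 <- c(39, 17)
arr <- array(c(vc1, vc2), dim = c(2, 2))

print(arr)
#      [,1] [,2]
# [1,]   12   39
# [2,]    8   17

dim(arr)          # 2 2 (rows, then columns)
arr[, 1]          # first column holds vc1: 12 8
arr[, 2]          # second column holds vc2: 39 17
```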

And finally, we can use the **sd()** function to calculate the standard deviation in R for the newly created array object:

```
sd(arr)
```

Below is the complete output in R for the above commands. As you can see, the standard deviation for the **arr** array is **13.832**.
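Keep in mind that **sd()** ignores the array's dimensions: it treats `arr` as a single vector of four values, so the result equals `sd(c(12, 8, 39, 17))`. If you need one standard deviation per column or per row instead, a common approach is **apply()**, where `MARGIN = 2` means columns and `MARGIN = 1` means rows:

```r
vc1 <- c(12, 8)
vc2 <- c(39, 17)
arr <- array(c(vc1, vc2), dim = c(2, 2))

# sd() flattens the array, so these two calls give the same value
sd(arr)                       # 13.83233
sd(c(12, 8, 39, 17))          # 13.83233

# One standard deviation per column (MARGIN = 2) and per row (MARGIN = 1)
apply(arr, 2, sd)
apply(arr, 1, sd)
```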

## Wrapping Up

In this R tutorial for statistics, we learned how to calculate the standard deviation in RStudio for imported Excel datasets, data frames, vectors, and arrays.

Though calculating the standard deviation in SPSS or Excel can be somewhat more straightforward, R gives us a lot of flexibility and control over the data we input and manipulate.

*I hope you found some value in this R tutorial. If so, kindly help spread the knowledge by sharing this article with your friends and colleagues.*

