Simple Example of Linear Regression in R

In this practical example of linear regression in R, we will learn how to predict the fuel efficiency of a car based on its weight. We will import a dataset, fit a linear regression with the lm() function, interpret the results, plot the regression line, and make predictions with the predict() function.

Although we will use the mtcars demo dataset to demonstrate how to calculate linear regression, you can use any dataset that contains a predictor variable and a response variable.

Without further ado, launch R or RStudio on your computer and let’s get started.

Step 1: Import a Dataset in R

To get started, we need a dataset to work with. We will use the mtcars dataset, which contains, among other variables, the weight and fuel efficiency (in miles per gallon) of different cars. This dataset is built into R and can be loaded using the data() function. Type the following in the R console:

data(mtcars)

You can take a look at the data by using the head() function, which will show you the first few rows of the dataset:

head(mtcars)

The output should look something like this:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

As you can see, the dataset contains the variables mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb for different car models. We only need two of them: mpg (miles per gallon) and wt (weight in units of 1,000 pounds).
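Since the rest of this example only uses two columns, it can help to look at them in isolation before fitting anything. The following is a minimal sketch using base R functions only; the name cars_subset is an illustrative choice and not part of the dataset.

```r
data(mtcars)

# Keep only the response (mpg) and the predictor (wt)
cars_subset <- mtcars[, c("mpg", "wt")]

# Structure: number of rows and the type of each column
str(cars_subset)

# Minimum, quartiles, mean and maximum for each column
summary(cars_subset)

# Correlation between weight and fuel efficiency
wt_mpg_cor <- cor(cars_subset$wt, cars_subset$mpg)
wt_mpg_cor
```

A strongly negative correlation suggests that heavier cars tend to have lower fuel efficiency, which makes a straight-line model a reasonable starting point.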

Step 2: Calculate Linear Regression in R

Now that we have our data loaded, we can fit the linear regression. To do so, we use the lm() function:

model <- lm(mpg ~ wt, data = mtcars)

The first argument of the function is a formula that specifies the model. In the formula mpg ~ wt, the response variable goes to the left of the tilde and the predictor to the right, so the model predicts mpg (fuel efficiency) from wt (weight). The data = mtcars argument specifies that the dataset to use is mtcars.
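To see what lm() computes for a model with a single predictor, the coefficients can be reproduced with the standard closed-form least-squares formulas. This is only a sketch for illustration; the names slope, intercept, manual and fitted_by_lm are hypothetical and not used elsewhere in this article.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Closed-form least-squares estimates for one predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
intercept <- mean(mtcars$mpg) - slope * mean(mtcars$wt)

# Compare with the coefficients estimated by lm()
manual <- c(intercept, slope)
fitted_by_lm <- unname(coef(model))
all.equal(manual, fitted_by_lm)
```

In practice you would always use lm(), which also handles several predictors, but the comparison shows that there is nothing mysterious behind the fitted coefficients.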

Step 3: Get the Summary of the Regression Model

Once you have fit the model, you can get a summary of the model by using the summary() function. The summary includes information on the residuals, coefficients, R-squared value, F-statistic, and p-value.

Here is an example of how to get the summary of the model:

summary(model)

This summary of the linear regression model should look like this:

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,	Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
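The printed summary is convenient for reading, but the same numbers can also be extracted programmatically, which is useful when you want to reuse them in later calculations. The sketch below relies on the components that summary() returns for an lm model; the variable names on the left-hand side are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
model_summary <- summary(model)

# Coefficient table as a numeric matrix:
# estimate, standard error, t value and p-value per term
coef_table <- model_summary$coefficients
coef_table

# Individual pieces of the summary
r_squared <- model_summary$r.squared
adj_r_squared <- model_summary$adj.r.squared
residual_se <- model_summary$sigma
wt_p_value <- coef_table["wt", "Pr(>|t|)"]
```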

Step 4: Interpret the Linear Regression Results

Now that we have our output, here is how to interpret the linear regression results for our example:

  • The coefficient for wt (weight) is approximately -5.34, and the intercept is approximately 37.29. Since wt is measured in units of 1,000 pounds, each additional 1,000 pounds of weight is associated with a decrease of about 5.34 mpg.
  • The p-value for wt (1.29e-10) is far below 0.05, indicating that the relationship between weight and fuel efficiency is statistically significant.
  • The R-squared value, which measures the proportion of the variation in the response variable explained by the predictor variable, is 0.7528. This means that the car’s weight explains 75.28% of the variation in fuel efficiency.
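The R-squared interpretation above can be checked by computing the value directly from the residuals, and confidence intervals give a sense of how precisely the coefficients are estimated. This is a minimal sketch; ss_res, ss_tot, r_squared_manual and conf_intervals are illustrative names.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# R-squared = 1 - (unexplained variation / total variation)
ss_res <- sum(residuals(model)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r_squared_manual <- 1 - ss_res / ss_tot
r_squared_manual

# 95% confidence intervals for the intercept and the slope
conf_intervals <- confint(model, level = 0.95)
conf_intervals
```

If the confidence interval for wt lies entirely below zero, that agrees with the small p-value: the data are hard to reconcile with a slope of zero.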

Step 5: Plot the Regression Line in a Graph

Numbers are easier to understand when you visualize them. You can plot the regression line using the ggplot2 package in R. If the package is not installed yet, run install.packages("ggplot2") first.

library(ggplot2)

# Create a scatterplot with the fitted regression line
ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point() +
  geom_smooth(method='lm', se=FALSE) +
  ggtitle("Linear Regression of mpg vs wt")

The scatterplot shows the relationship between weight (wt) and fuel efficiency (mpg). The line drawn through the points is the regression line, the straight line that best fits the data.

geom_smooth(method='lm') fits the same mpg ~ wt model internally, so the line matches the coefficients of the model that we fitted earlier. The scatterplot provides a visual representation of the relationship between the predictor and response variables, and the regression line provides a summary of that relationship.
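To make the link between the fitted coefficients and the plotted line explicit, the line can also be drawn by hand from coef(model). The sketch below uses base R graphics instead of ggplot2 so that it runs without extra packages; the names b, wt_range and line_mpg are illustrative.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)
b <- coef(model)

# Endpoints of the regression line over the observed weight range
wt_range <- range(mtcars$wt)
line_mpg <- b[["(Intercept)"]] + b[["wt"]] * wt_range

# Scatterplot of the data with the hand-computed line on top
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1,000 lbs)", ylab = "Miles per gallon",
     main = "Linear Regression of mpg vs wt")
lines(wt_range, line_mpg)
```

The base R shortcut abline(model) would draw the same line directly from the fitted model.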

Step 6: Using Linear Regression in R to Make Predictions

Now that we have our linear regression model in R, it is time to use it to make predictions using the following syntax:

predictions <- predict(model, newdata=data.frame(wt=c(3,4)))
predictions

In this code:

  • predict is the R function used to make predictions based on a linear regression model.
  • model is the object that stores the fitted linear regression model, created with the lm function in Step 2.
  • newdata is an argument that specifies the values of the independent variable (in this case, wt) for which you want to make predictions. The values are passed in as a data frame using the data.frame function, and the column name must match the predictor name used in the formula (wt). Because wt is measured in units of 1,000 pounds, the values c(3,4) mean the predictions will be made for cars weighing 3,000 and 4,000 pounds, respectively.

The output of this code will be the predicted values of the dependent variable (in this case, mpg) for the values of the independent variable specified in newdata.
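For a model with one predictor, a prediction is simply the intercept plus the slope times the new weight. The following sketch reproduces the result of predict() by hand; new_weights, manual_predictions and model_predictions are illustrative names, and the model is the mpg ~ wt regression from Step 2.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

new_weights <- c(3, 4)  # in units of 1,000 pounds
b <- coef(model)

# Prediction by hand: intercept + slope * weight
manual_predictions <- b[["(Intercept)"]] + b[["wt"]] * new_weights

# Prediction with predict(); the column name must be wt
model_predictions <- predict(model, newdata = data.frame(wt = new_weights))

all.equal(manual_predictions, unname(model_predictions))
```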

And here is the output for this prediction:

       1        2 
21.25171 15.90724 

These numbers represent the predicted values of mpg for two cars with weights of 3,000 pounds and 4,000 pounds, respectively. The predicted values can be interpreted as follows:

  • For a car weighing 3,000 pounds, the linear regression model predicts a value of 21.25 mpg.
  • For a car weighing 4,000 pounds, the linear regression model predicts a value of 15.91 mpg.

It’s important to note that these are only predictions and may not necessarily match the actual mpg values for these cars. However, the linear regression model provides us with a way to estimate the relationship between wt and mpg and make predictions based on this relationship. This can be useful for making decisions and predictions in real-world applications.
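One way to quantify how far an actual car might be from its predicted value is a prediction interval, which predict() can return through its interval argument. This is a sketch under the usual linear regression assumptions (roughly normal, constant-variance errors); interval_predictions is an illustrative name.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# 95% prediction intervals for cars weighing 3,000 and 4,000 pounds
interval_predictions <- predict(
  model,
  newdata = data.frame(wt = c(3, 4)),
  interval = "prediction",
  level = 0.95
)

# Columns: fit (point prediction), lwr and upr (interval bounds)
interval_predictions
```

The width of the interval between lwr and upr shows that an individual car's mpg can differ noticeably from the point prediction in the fit column.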

If needed, you can compare the model’s predictions with the actual values. Calling predict() without newdata returns the predicted mpg for every car in mtcars, and the data.frame() function combines the actual and predicted values into a single data frame:

results <- data.frame(actual = mtcars$mpg, predicted = predict(model))
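A side-by-side comparison becomes more useful once it is condensed into error measures. The sketch below compares actual and predicted mpg for every car in mtcars and computes two common summaries; the error column and the names rmse and mae are illustrative additions, not part of the original example.

```r
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Actual versus predicted mpg for every car in the dataset
results <- data.frame(actual = mtcars$mpg, predicted = fitted(model))
results$error <- results$actual - results$predicted

# Root mean squared error and mean absolute error, both in mpg
rmse <- sqrt(mean(results$error^2))
mae <- mean(abs(results$error))

# The three cars the model predicts worst
head(results[order(-abs(results$error)), ], 3)
```

Because these errors are measured on the same cars that were used to fit the model, they tend to be optimistic compared with the errors on new, unseen cars.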

Conclusion

Conducting linear regression in R is a powerful way to understand the relationship between variables and make predictions. In this article, we have shown how to perform linear regression in R using the lm() function, how to interpret the results, and how to make predictions using the predict() function.