In this practical example of linear regression in R, we will learn how to predict the fuel efficiency of a car based on its weight. We will import a dataset, fit a linear regression with the lm() function, interpret the results, and make predictions with the predict() function.
Although we will use the mtcars demo dataset to demonstrate how to calculate linear regression, you can use any dataset that contains a predictor variable and a response variable.
Without further ado, launch R or RStudio and let’s get started.
Step 1: Import a Dataset in R
To get started, we need a dataset to work with. We will use the mtcars dataset, which contains the weight and fuel efficiency (in miles per gallon) of different cars. This dataset is built into R and can be loaded using the data() function. Type the following in the R console:
data(mtcars)
You can take a look at the data by using the head() function, which shows the first six rows of the dataset:
head(mtcars)
The output should look like this:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
As you can see, each row is a car model, and the columns are mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb. In this tutorial we only need mpg (fuel efficiency in miles per gallon) and wt (weight in units of 1,000 lbs).
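Before fitting anything, it can help to look at the two columns of interest on their own. The following is a minimal sketch of such a check; the names `cars_subset` and `wt_mpg_cor` are illustrative and not part of the original walkthrough.

```r
# Keep only the response (mpg) and the predictor (wt, in 1,000 lbs)
cars_subset <- mtcars[, c("mpg", "wt")]

# Basic descriptive statistics for both columns
summary(cars_subset)

# Pearson correlation between weight and fuel efficiency;
# a value close to -1 suggests a strong negative linear relationship
wt_mpg_cor <- cor(cars_subset$wt, cars_subset$mpg)
wt_mpg_cor
```

A strongly negative correlation here is a hint that a straight-line model of mpg on wt is a reasonable starting point.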
Step 2: Calculate Linear Regression in R
Now that we have our data loaded, we can fit the model. To perform linear regression, we use the lm() function:
model <- lm(mpg ~ wt, data = mtcars)
The first argument of the function is a formula that specifies the model. In this case, the model predicts mpg (fuel efficiency) from wt (weight). The data = mtcars argument specifies that the dataset to use is mtcars.
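The formula mpg ~ wt corresponds to the straight line mpg = intercept + slope * wt. As a small sketch of how the two fitted numbers can be pulled out of the model object, the example below uses coef(); the variable names `coefs`, `intercept` and `slope` are only illustrative.

```r
# Fit the same simple linear regression as above
model <- lm(mpg ~ wt, data = mtcars)

# coef() returns a named numeric vector with "(Intercept)" and "wt"
coefs <- coef(model)
intercept <- coefs[["(Intercept)"]]
slope <- coefs[["wt"]]

# Print the fitted line in a readable form
cat(sprintf("mpg = %.2f + (%.2f) * wt\n", intercept, slope))
```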
Step 3: Get the Summary of the Regression Model
Once you have fit the model, you can get a summary of it by using the summary() function. The summary includes information on the residuals, coefficients, R-squared value, F-statistic and p-value.
Here is how to get the summary of the model:
summary(model)
The summary of the linear regression model should look like this:
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,	Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
Step 4: Interpret the Linear Regression Results
Now that we have our output, here is how to interpret the linear regression results for our example:
- The coefficient for wt (weight) is about -5.34, and the intercept is about 37.29. Because wt is measured in units of 1,000 lbs, each additional 1,000 lbs of weight is associated with a decrease of about 5.34 miles per gallon.
- The p-value for wt (1.29e-10) is far below 0.05, indicating that the relationship between weight and fuel efficiency is statistically significant.
- The R-squared value, which measures the proportion of the variation in the response variable explained by the predictor variable, is 0.7528. This means that the car’s weight can explain 75.28% of the variation in fuel efficiency.
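The numbers discussed above can also be read from the summary object directly rather than from the printed text. The sketch below shows one way to do this; the variable names are illustrative, and the confidence intervals from confint() are an addition that the original walkthrough does not cover.

```r
# Refit the model and store its summary
model <- lm(mpg ~ wt, data = mtcars)
model_summary <- summary(model)

# Coefficient table as a matrix with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coef_table <- coef(model_summary)
slope_p_value <- coef_table["wt", "Pr(>|t|)"]

# Proportion of the variation in mpg explained by wt
r_squared <- model_summary$r.squared

# 95% confidence intervals for the intercept and the slope
conf_intervals <- confint(model, level = 0.95)

slope_p_value
r_squared
conf_intervals
```

If the confidence interval for wt lies entirely below zero, that agrees with the small p-value: the data are not compatible with a slope of zero at the 5% level.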
Step 5: Plot the Regression Line in a Graph
Numbers are easier to understand when you can see them. You can plot the data and the regression line using the ggplot2 package in R; if it is not installed yet, run install.packages("ggplot2") first.
library(ggplot2)
# Create a scatterplot with the fitted regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Linear Regression of mpg vs wt")
The scatterplot shows the relationship between weight (wt) and fuel efficiency (mpg). The line is the regression line, which is the best-fit straight line through the data.
geom_smooth(method = "lm") fits the same mpg ~ wt model internally, so the line matches the coefficients of the linear regression model that we fitted earlier. The scatterplot provides a visual representation of the relationship between the predictor and response variables, and the regression line provides a summary of that relationship.
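If ggplot2 is not available, a similar picture can be drawn with base R graphics. This is only a sketch of an alternative and not the approach used in this tutorial; the pdf(NULL) call is an assumption made so the code also runs without a display, and can be dropped in an interactive session.

```r
# Fit the model whose line we want to draw
model <- lm(mpg ~ wt, data = mtcars)

# Open a null PDF device so nothing is written to disk;
# remove this line and dev.off() below when working interactively
pdf(NULL)

# Scatterplot of the raw data
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1,000 lbs)",
     ylab = "Miles per gallon",
     main = "Linear Regression of mpg vs wt")

# abline() accepts a fitted lm object and draws intercept + slope * wt
abline(model, col = "blue", lwd = 2)

# Close the graphics device
invisible(dev.off())
```

Here the line is drawn from the coefficients stored in the model object, which makes the link between the fitted model and the plotted line explicit.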
Step 6: Using Linear Regression in R to Make Predictions
Now that we have our linear regression model in R, it is time to use it to make predictions using the following syntax:
predictions <- predict(model, newdata = data.frame(wt = c(3, 4)))
predictions
The code has three parts:
- predict is the R function used to make predictions based on a fitted model.
- model is the object that stores the fitted linear regression model. In this example, it is the object created in Step 2 using the lm function.
- newdata is an argument that specifies the values of the independent variable (in this case, wt) for which you want to make predictions. The values are passed in as a data frame using the data.frame function. Because wt is measured in units of 1,000 lbs, c(3, 4) means the predictions will be made for cars weighing 3,000 and 4,000 pounds, respectively.
The output of this code will be the predicted values of the dependent variable (in this case, mpg) based on the values of the independent variable specified in newdata.
And here is the output for this prediction:
       1        2 
21.25171 15.90724 
These numbers represent the predicted values of mpg for two cars with weights of 3,000 pounds and 4,000 pounds, respectively. The predicted values can be interpreted as follows:
- For a car with a weight of 3,000 pounds, the linear regression model predicts a value of 21.25 mpg.
- For a car with a weight of 4,000 pounds, the linear regression model predicts a value of 15.91 mpg.
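To see where such predicted values come from, they can be recomputed by hand as intercept + slope * wt. The following is a minimal sketch of that check; `new_cars`, `predicted` and `manual` are illustrative names.

```r
# Fit the model and define the two hypothetical cars (wt in 1,000 lbs)
model <- lm(mpg ~ wt, data = mtcars)
new_cars <- data.frame(wt = c(3, 4))

# Prediction via predict()
predicted <- predict(model, newdata = new_cars)

# The same prediction computed by hand: intercept + slope * wt
coefs <- coef(model)
manual <- coefs[["(Intercept)"]] + coefs[["wt"]] * new_cars$wt

# Both approaches should agree up to floating-point error
all.equal(unname(predicted), manual)
```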
It’s important to note that these are only predictions and may not match the actual mpg values of real cars with these weights. However, the linear regression model gives us a way to estimate the relationship between wt and mpg and to make predictions based on this relationship. This can be useful when real-world decisions depend on such estimates.
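One way to express how uncertain a single prediction is would be to ask predict() for prediction intervals. The sketch below goes slightly beyond the original walkthrough; `new_cars` and `pred_with_interval` are illustrative names, and the 95% level is just a common default.

```r
# Fit the model and define the two hypothetical cars (wt in 1,000 lbs)
model <- lm(mpg ~ wt, data = mtcars)
new_cars <- data.frame(wt = c(3, 4))

# interval = "prediction" adds lower (lwr) and upper (upr) bounds for the
# mpg of an individual new car; level = 0.95 requests 95% intervals
pred_with_interval <- predict(model, newdata = new_cars,
                              interval = "prediction", level = 0.95)
pred_with_interval
```

The result is a matrix with the columns fit, lwr and upr, so each predicted value comes with a range in which an individual car's mpg is expected to fall under the model's assumptions.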
If needed, you can compare the model's predictions for the cars in the dataset with their actual values. Calling predict() without newdata returns one prediction per row of mtcars, and the cbind() function combines the predictions and actual values into a single matrix:
results <- cbind(predicted = predict(model), actual = mtcars$mpg)
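A comparison of predicted and actual values is often summarised with a single error figure. The sketch below computes the prediction errors and the root mean squared error (RMSE); the data frame `comparison` and the choice of RMSE as the metric are assumptions for illustration rather than part of the original article.

```r
# Fit the model on the full dataset
model <- lm(mpg ~ wt, data = mtcars)

# One row per car: actual mpg, predicted mpg and their difference
comparison <- data.frame(
  actual = mtcars$mpg,
  predicted = unname(fitted(model)),
  row.names = rownames(mtcars)
)
comparison$error <- comparison$actual - comparison$predicted

# Root mean squared error: the typical size of a prediction error in mpg
rmse <- sqrt(mean(comparison$error^2))

head(comparison)
rmse
```

Because the model was fitted on the same cars, this measures how well the line fits the data, not how well it would predict cars it has not seen.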
Linear regression in R is a powerful way of understanding the relationship between variables and making predictions. In this article, we have shown how to perform linear regression in R using the lm() function, how to interpret the results, and how to make predictions using the predict() function.