In this practical example of linear regression in R, we will learn how to predict the fuel efficiency of a car from its weight. We will import a dataset, fit a linear regression with the lm() function, interpret the results, and make predictions with the predict() function.
Although we will use R's built-in mtcars demo dataset to demonstrate how to calculate linear regression, you can use any dataset that contains a predictor variable and a response variable.
Without further ado, launch R or RStudio on your computer and let’s get started.
Step 1: Import a Dataset in R
To get started, we need a dataset to work with. We will use the mtcars dataset, which records the weight, fuel efficiency (in miles per gallon), and several other characteristics of 32 cars. This dataset is built into R and can be loaded using the data() function. Type the following in the R console:
data(mtcars)
You can take a look at the data by using the head() function, which shows the first six rows of the dataset:
head(mtcars)
The output should look something like this:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
As you can see, the dataset contains the columns mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb for different car models. We only need two of them: mpg (miles per gallon) and wt (weight in units of 1,000 pounds).
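Before fitting anything, it can help to look at just the two columns we care about. The following is a minimal sketch rather than a required step; the names cars_small and r are assumptions introduced here for illustration.

```r
# Keep only the response (mpg) and the predictor (wt)
cars_small <- mtcars[, c("mpg", "wt")]

# Minimum, quartiles, mean, and maximum of each column
summary(cars_small)

# Pearson correlation between weight and fuel efficiency;
# a value close to -1 suggests a strong negative linear relationship
r <- cor(cars_small$wt, cars_small$mpg)
r
```

A strongly negative correlation is a hint that a straight line is a reasonable first model for these two variables.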
Step 2: Calculate Linear Regression in R
Now that the data is loaded, we can fit a linear regression with the lm() function:
model <- lm(mpg ~ wt, data = mtcars)
The first argument of the function is a formula that specifies the model. The response goes on the left of the tilde (~) and the predictor on the right, so mpg ~ wt predicts mpg (fuel efficiency) from wt (weight). The data = mtcars argument tells lm() to look up those variables in the mtcars dataset.
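To see what the formula produced, you can pull the estimated coefficients out of the fitted object with coef(). The sketch below refits the model so that it runs on its own; the names coefs, b0, and b1 are assumptions introduced here for illustration.

```r
# Fit the same model as in the step above
model <- lm(mpg ~ wt, data = mtcars)

# coef() returns a named vector: "(Intercept)" and "wt"
coefs <- coef(model)
b0 <- coefs[["(Intercept)"]]  # predicted mpg when wt is 0
b1 <- coefs[["wt"]]           # change in mpg per 1,000 lbs of weight

# Print the fitted line as an equation
cat(sprintf("mpg = %.2f + (%.2f) * wt\n", b0, b1))
```

Writing the model as an equation makes it clear that lm() has estimated just two numbers: an intercept and a slope.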
Step 3: Get the Summary of the Regression Model
Once you have fitted the model, you can get a summary of it by using the summary() function. The summary includes information on the residuals, coefficients, R-squared value, F-statistic, and p-value.
Here is an example of how to get the summary of the model:
summary(model)
This summary of the linear regression model should look like this:
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
Step 4: Interpret the Linear Regression Results
Now that we have our output, here is how to interpret the linear regression results for our example:
- The coefficient for wt (weight) is about -5.34, and the intercept is about 37.29. Because wt is measured in units of 1,000 pounds, each additional 1,000 pounds of weight is associated with a decrease of roughly 5.34 mpg. The intercept is the predicted mpg for a car weighing nothing, so it has no practical meaning on its own.
- The p-value for wt (1.29e-10) is far below 0.05, indicating that the relationship between weight and fuel efficiency is statistically significant.
- The R-squared value, which measures the proportion of the variation in the response variable explained by the predictor variable, is 0.7528. This means that the car’s weight explains 75.28% of the variation in fuel efficiency.
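The same quantities can be read from the summary object in code instead of from the printed text, which is handy when you want to reuse them. This is a sketch; the names fit_summary, r_squared, wt_p_value, and wt_ci are assumptions introduced here for illustration.

```r
model <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(model)

# Proportion of the variance in mpg explained by wt
r_squared <- fit_summary$r.squared

# p-value for the wt coefficient, taken from the coefficient table
wt_p_value <- fit_summary$coefficients["wt", "Pr(>|t|)"]

# 95% confidence interval for the wt slope
wt_ci <- confint(model, "wt", level = 0.95)

r_squared
wt_p_value
wt_ci
```

The confidence interval adds something the p-value does not: a plausible range for how many mpg are lost per additional 1,000 pounds.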
Step 5: Plot the Regression Line in a Graph
Just using numbers without visualizing them in a graph is not fun. You can plot the regression line using the ggplot2 package in R. If you have not installed it yet, run install.packages("ggplot2") first.
library(ggplot2)
# Create a scatterplot with a fitted regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  ggtitle("Linear Regression of mpg vs wt")
The scatterplot shows the relationship between weight (wt) and fuel efficiency (mpg). The line is the regression line, which is the best-fit straight line through the data.
geom_smooth() refits the same mpg ~ wt model internally, so the line matches the coefficients of the linear regression model that we fitted earlier. The scatterplot provides a visual representation of the relationship between the predictor and response variables, and the regression line summarizes that relationship.
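If ggplot2 is not available, a similar picture can be drawn with base R graphics. This sketch passes the fitted model to abline(), so the line comes directly from the stored coefficients; the axis labels, color, and line width are arbitrary choices made here.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Scatterplot of the raw data
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1,000 lbs)",
     ylab = "Miles per gallon",
     main = "Linear Regression of mpg vs wt")

# abline() accepts an lm object and draws its intercept and slope
abline(model, col = "blue", lwd = 2)
```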
Step 6: Using Linear Regression in R to Make Predictions
Now that we have our linear regression model in R, it is time to use it to make predictions using the following syntax:
predictions <- predict(model, newdata = data.frame(wt = c(3, 4)))
predictions
The output of this code will be the predicted values of mpg for two values of wt, where:
- predict is the R function used to make predictions based on a linear regression model.
- model is the object that stores the fitted linear regression model, created with the lm() function in Step 2.
- newdata is an argument that specifies the values of the independent variable (in this case, wt) for which you want to make predictions. The values are passed in as a data frame using the data.frame() function, and the column name must match the predictor name used in the formula. Because wt is measured in units of 1,000 pounds, c(3, 4) means the predictions will be made for cars weighing 3,000 and 4,000 pounds, respectively.
And here is the output for this prediction:
       1        2
21.25171 15.90724
These numbers represent the predicted values of mpg for two cars with weights of 3,000 pounds and 4,000 pounds, respectively. The predicted values can be interpreted as follows:
- For a car weighing 3,000 pounds, the linear regression model predicts a value of 21.25 mpg.
- For a car weighing 4,000 pounds, the linear regression model predicts a value of 15.91 mpg.
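A prediction from a simple linear regression is just the intercept plus the slope times the new weight, so the result of predict() can be reproduced by hand. The following is a minimal sketch; new_wt, b, manual, and from_predict are names assumed here for illustration.

```r
model <- lm(mpg ~ wt, data = mtcars)
new_wt <- c(3, 4)  # weights in units of 1,000 lbs

# Manual calculation: intercept + slope * weight
b <- coef(model)
manual <- b[["(Intercept)"]] + b[["wt"]] * new_wt

# Same calculation done by predict()
from_predict <- predict(model, newdata = data.frame(wt = new_wt))

# Both approaches should agree up to floating-point tolerance
all.equal(manual, unname(from_predict))
```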
It’s important to note that these are only predictions and may not match the actual mpg values for these cars. However, the linear regression model gives us a way to estimate the relationship between wt and mpg and to make predictions based on it. Predictions are most trustworthy for weights inside the range observed in the data, which in mtcars is roughly 1,500 to 5,400 pounds.
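One way to express how uncertain a prediction is, which goes a little beyond the steps in this example, is a prediction interval. predict() can return one through its interval argument. The sketch below assumes a 95% level; new_cars and pred_int are names introduced here for illustration.

```r
model <- lm(mpg ~ wt, data = mtcars)
new_cars <- data.frame(wt = c(3, 4))

# interval = "prediction" adds lower and upper bounds for a single new car
pred_int <- predict(model, newdata = new_cars,
                    interval = "prediction", level = 0.95)

# Matrix with columns fit (point prediction), lwr, and upr
pred_int
```

A wide interval is a reminder that weight alone leaves a fair amount of the variation in mpg unexplained.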
If needed, you can compare the model’s predictions with the actual values. Calling predict() without newdata returns the fitted value for every car in mtcars, and data.frame() combines the actual and fitted values into a single data frame:
results <- data.frame(actual = mtcars$mpg, predicted = predict(model))
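Once actual and fitted values sit side by side, a single error figure can summarize how far off the model is on average. The sketch below computes the root mean squared error (RMSE) on the same data used for fitting, so it is likely to look a little better than it would on unseen cars; the column names and the rmse variable are assumptions introduced here.

```r
model <- lm(mpg ~ wt, data = mtcars)

# fitted() returns the model's prediction for every row used in fitting
results <- data.frame(actual = mtcars$mpg, predicted = fitted(model))

# Residual: how far each prediction is from the observed value
results$residual <- results$actual - results$predicted

# Root mean squared error, in the same units as mpg
rmse <- sqrt(mean(results$residual^2))

head(results)
rmse
```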
Conclusion
Conducting linear regression in R is a powerful way to understand the relationship between variables and make predictions. In this article, we have shown how to perform linear regression in R using the lm() function, how to interpret the results, and how to make predictions using the predict() function.