Whether you are a student working on a research paper or simply learning statistics out of passion, if you are wondering how to calculate a linear regression in SPSS in a few simple steps, this tutorial is for you!
But we won’t stop there. We will go one step further and learn what regression analysis can be used for, what regression analysis shows, and what each term and value in the results means.
There are two types of linear regression: simple linear regression and multiple linear regression. In this article, we will conduct a simple linear regression analysis in SPSS.
Learning Outcomes
Upon completing this lesson, you will learn:
- How to prepare and import a data set in SPSS
- How to calculate a linear regression in SPSS
- How to understand and interpret the regression analysis results
Without further ado, let’s get started.
What Is Linear Regression Analysis?
You have probably heard the term regression analysis used in your statistics class and thought it must be something difficult, right?
Well, I am here to tell you that regression analysis is actually pretty simple and one of the first predictive techniques you will need to learn to become a researcher or data scientist.
Among all types of regression analysis out there, linear regression is arguably one of the most basic and widely used types of predictive analysis and linear modeling.
As the name implies, linear regression uses a line (also called the regression line) to model the relationship between variables. Think of this relationship as cause (independent variable) and effect (dependent variable), where linear regression fits a line that shows how the effect changes with the cause.
In statistics, the independent variable is often called the predictor or explanatory variable. The dependent variable is sometimes referred to as the predicted or outcome variable, among other names.
In the context of a research paper, this prediction is formulated through a hypothesis, e.g., the impact of advertising on revenue.
As mentioned before, there are two types of linear regression: simple and multiple linear regression.
Though our focus here is on simple linear regression, the difference between the two is that simple linear regression uses ONE independent variable to predict an outcome while multiple linear regression uses two or more independent variables.
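In equation form (standard textbook notation, not anything SPSS-specific), the two models look like this, where y is the dependent variable, the x's are the independent variables, b_0 is the intercept, and ε is the error term:

```
% Simple linear regression: one predictor
y = b_0 + b_1 x + \varepsilon

% Multiple linear regression: k predictors
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + \varepsilon
```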
That’s everything you need to know about linear regression analysis for now. Next, we will learn how to import a data set into SPSS.
Import Data In SPSS
As this is a hands-on tutorial on calculating a linear regression analysis in SPSS, we will need some data to generate a regression line. Don’t worry if you do not have such a file at hand; I have already prepared one for you in Excel.
Let’s assume we want to investigate the effect of advertising (recorded in the Marketing variable) on Sales for a given company. Here is what the Excel dataset sample you downloaded above looks like.
Assuming you downloaded the Excel data set above, open SPSS Statistics, and in the top menu, navigate to File → Import Data → Excel.
Browse to the location of the sample Excel file, select it, and click Open. Click OK when prompted to read the Excel file. Once the data set is imported into SPSS, it should look like this:
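If you prefer syntax over menus, here is a minimal sketch of the equivalent import command. The file path and sheet name below are placeholders, so adjust them to wherever you saved the sample file:

```
* Import the sample Excel file (path and sheet name are placeholders).
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\marketing_sales.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
DATASET NAME SalesData WINDOW=FRONT.
```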
Now, let’s find out how to calculate a linear regression in SPSS. On the SPSS top menu navigate to Analyze → Regression → Linear.
Next, we have to tell SPSS which are the independent and dependent variables in our data set.
Remember, in linear regression, we investigate a causal relationship between an independent variable and a dependent variable. In our Excel example, the independent variable is Marketing (cause) and the dependent variable is Sales (effect). In other words, we want to predict if the Sales variable is affected by any changes in the Marketing variable.
In the Linear Regression window, select the Sales variable and click the arrow button next to the Dependent box to add Sales as the dependent variable in the regression analysis.
Do the same for the Marketing variable but this time click the arrow next to the Independent box. Your regression analysis window should look like this:
We can use other input options to customize the linear regression analysis further, e.g., Method, Statistics, Plots, Style, etc. For now, we will keep things simple and choose the default settings as they are sufficient for this case.
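By the way, if you click Paste instead of OK, SPSS writes out the syntax behind this dialog. With the default settings and our two variables, the generated command should look roughly like this (treat it as a sketch; your pasted syntax may differ slightly):

```
* Simple linear regression: predict Sales from Marketing (default settings).
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Sales
  /METHOD=ENTER Marketing.
```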
Click OK to start the analysis. And here is what the linear regression analysis results look like in SPSS:
Pretty simple, right? We have the independent and dependent variables used in this analysis, the Model Summary, ANOVA, Coefficients, and a bunch of terms and numbers. SPSS did its part. Now it is our turn to understand what the regression analysis shows.
Understanding Linear Regression Analysis Results
Take a deep breath, and let’s understand what our regression analysis results tell us. Don’t worry, by the end of this section, you will know precisely what each term and value means and which aspects of the regression analysis are the most important to include in your research paper.
- The Variables Entered/Removed table shows a descriptive summary of the linear regression analysis.
- Model 1 (Enter) simply means that all the requested variables were entered in a single step and are given equal importance. The Enter method is commonly used in regression analysis, which is why Model 1 is the default regression model in SPSS.
- The Variables Entered column shows the independent variable (Marketing) used in this analysis. No variables were removed, so the Variables Removed column is blank.
- In SPSS, the dependent variable, in our case Sales, is specified underneath the descriptive table.
- The Model Summary table summarizes the results of the regression analysis in SPSS.
- R refers to the correlation between the variables. Correlation is important for regression analysis because we can presume that one variable affects another only if the two variables are correlated. If two variables are not correlated, it is probably pointless to look for a cause-and-effect relationship.
Correlation does not guarantee a cause-and-effect relationship, but it is a necessary condition for a causal relationship to exist.
The R-value ranges from -1 to +1 where -1 is a perfect negative correlation, +1 is a perfect positive correlation and 0 represents no linear correlation between variables.
In our case, R = 0.663 shows the variables Marketing and Sales are correlated.
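For reference, R here is Pearson's correlation coefficient, which is computed from the covariance of the two variables and their standard deviations:

```
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
```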
- R Square measures how much of the variation in the dependent variable is explained by the independent variable(s). Note that the R Square value can be read as a percentage (%). For instance, in our example, R Square = 0.440, which means 44% of the variation in Sales is explained by the Marketing strategy of the company.
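A quick sanity check: in simple linear regression, R Square is literally the square of R, which matches our output:

```
R^2 = r^2 = 0.663^2 \approx 0.440
```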
- Adjusted R Square applies a penalty in case your model is not parsimonious. In simple words, if your research conceptual framework contains variables that are unnecessary for predicting the outcome, there will be a penalty for it, expressed in the Adjusted R Square value.
In our example, the difference between the Adjusted R Square (0.416) and the R Square (0.440) is 0.024, which is negligible.
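The formula behind the penalty is shown below, where n is the sample size and k is the number of predictors. Our output (R Square = 0.440, Adjusted 0.416, k = 1) is consistent with a sample of roughly 25 observations, although the exact sample size is not quoted in the tables above:

```
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```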
Remember, we should look for simple and not convoluted explanations for the phenomenon under investigation.
- Standard Error of the Estimate tells you how accurate predictions based on the regression line are. It is roughly the typical distance between the observed data points and the regression line, measured in the units of the dependent variable, so the smaller it is relative to the scale of the dependent variable, the closer the predictions are to the true values.
In our case, the Standard Error of the Estimate = 1.40, which is a good sign.
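For the curious, the Standard Error of the Estimate is essentially the standard deviation of the residuals; with n observations and k predictors (here k = 1):

```
SE_{\text{est}} = \sqrt{\frac{SS_{\text{residual}}}{n - k - 1}}
```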
- The ANOVA table tests the overall significance of the regression model. In other words, it tells us whether the linear regression results we got using our sample can be generalized to the population that the sample represents.
Keep in mind that for a linear regression analysis to be considered valid, the ANOVA result should be significant (Sig. < 0.05).
- Sum of Squares measures variation: the Residual Sum of Squares captures how much the data points deviate from the regression line, while the Regression Sum of Squares captures how much of the variation the model explains. Together, they help you understand how well a regression model represents the modeled data.
The rule of thumb is that the lower the Residual Sum of Squares relative to the Total Sum of Squares, the better the regression line represents your data.
Keep in mind that a Sum of Squares is never negative; a Residual Sum of Squares of 0 would be the lowest possible value and would represent a perfect model fit.
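These quantities are tied together by a simple identity, which is also where R Square comes from:

```
SS_{\text{total}} = SS_{\text{regression}} + SS_{\text{residual}},
\qquad
R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}}
```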
- DF in ANOVA stands for Degrees of Freedom. In simple words, DF shows the number of values that were free to vary when calculating the estimate.
Keep in mind that a smaller sample size usually means fewer degrees of freedom (as in our example). In contrast, a larger sample size allows for more degrees of freedom, which can be useful for rejecting a false null hypothesis and yielding a significant result.
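In a simple linear regression with n observations and k = 1 predictor, the degrees of freedom in the ANOVA table work out to:

```
df_{\text{regression}} = k = 1,
\qquad
df_{\text{residual}} = n - k - 1 = n - 2
```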
- Mean Square in ANOVA is used to determine the significance of the variation between the sample means. The Mean Square values are important because they are used to calculate the F ratio.
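Concretely, each Mean Square is its Sum of Squares divided by its degrees of freedom, and the F ratio compares the two:

```
MS = \frac{SS}{df},
\qquad
F = \frac{MS_{\text{regression}}}{MS_{\text{residual}}}
```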
- The F test in ANOVA is used to determine whether the regression model explains a significant portion of the variance in the dependent variable. The F value calculated from the data (F = 18.071) is usually referred to as the F-statistic and is used when deciding whether to reject the null hypothesis.
- Sig. stands for Significance. If you don’t want to get into the nuts and bolts of the ANOVA test, this is probably the column in the ANOVA result you should check first. A Sig. value < 0.05 is considered significant. In our example, Sig. = 0.000 (SPSS displays p-values smaller than 0.001 as 0.000), which is less than 0.05 and therefore significant.
Finally, we are ready to move to the regression analysis results table in SPSS.
- In the Coefficients table, one value is essential for interpretation: the Sig. value in the last column, so let’s start with it first.
- Sig., also known as the p-value, shows the level of significance of the effect the independent variable has on the dependent variable. Similar to ANOVA, if the Sig. value is < 0.05, the relationship between the variables in the linear regression is statistically significant.
In our case, Sig. = 0.000 indicates a statistically significant relationship between the independent variable (Marketing) and the dependent variable (Sales).
- Unstandardized B basically represents the slope of the regression line between the independent and dependent variables and tells us how much the dependent variable will increase for a one-unit increase in the independent variable.
In our case, for every one-unit increase in Marketing, Sales will increase by 0.808 units. The unit can be expressed in, e.g., a currency.
The Constant row in the Coefficients table shows the value of the dependent variable when the independent variable = 0.
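Putting B and the Constant together gives the fitted regression equation. The intercept value is not quoted in the text above, so it is written symbolically here as b_0:

```
\widehat{Sales} = b_0 + 0.808 \times Marketing
```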
- The coefficient's Std. Error is similar to the standard deviation of a mean. This can get complicated quickly, so I will simplify it for you.
The larger the Standard Error value, the more spread out the data points are around the regression line. And the more spread out the data points are, the less likely it is that a significant relationship between the variables will be found.
- The Standardized Coefficient Beta value ranges from -1 to +1, with 0 meaning no relationship, negative values indicating a negative relationship, and positive values a positive relationship. The closer the Standardized Coefficient Beta value is to -1 or +1, the stronger the relationship between the variables.
In our case, the Standardized Coefficient Beta = 0.663 shows a positive relationship between the independent variable (Marketing) and the dependent variable (Sales).
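This is no coincidence: with a single predictor, the standardized coefficient equals the correlation coefficient R from the Model Summary:

```
\beta_{\text{standardized}} = r = 0.663
```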
- t represents the t-statistic and is used to calculate the p-value (Sig.). In a regression, it tests whether a coefficient differs significantly from zero; in broader terms, the t-test is used to compare the means of two data sets and determine whether they originate from the same population.
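In the Coefficients table, the t value for a predictor is simply the coefficient divided by its standard error, and in a simple linear regression the slope's t value squared equals the ANOVA F value:

```
t = \frac{B}{SE(B)},
\qquad
t^2 = F \;\Rightarrow\; t = \sqrt{18.071} \approx 4.25
```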
Wrapping Up
As you can see, learning how to calculate a linear regression in SPSS is not difficult. On the other hand, understanding the linear regression output can be a bit challenging, especially if you don’t know which values are relevant to your analysis.
The most important thing to keep in mind when assessing the result of your linear regression analysis is to look for statistical significance (Sig. < 0.05). Even if the rest of the values don’t mean much to you right now, trust me, they eventually will.