Linear regression is a type of data analysis that considers the linear relationship between a dependent variable and one or more independent variables. It is typically used to visually show the strength of the relationship or correlation between various factors and the dispersion of results – all for the purpose of explaining the behavior of the dependent variable. The goal of a linear regression model is to estimate the magnitude of a relationship between variables and whether or not it is statistically significant. Say we wanted to test the strength of the relationship between the amount of ice cream eaten and obesity. We would take the independent variable, the amount of ice cream, and relate it to the dependent variable, obesity, to see if there was a relationship. Given a regression is a graphical display of this relationship, the lower the variability in the data, the stronger the relationship and the tighter the fit to the regression line. In finance, linear regression is used to determine relationships between asset prices and economic data across a range of applications. For instance, it is used to determine the factor weights in the Fama-French Model and is the basis for determining the Beta of a stock in the capital asset pricing model (CAPM). Here, we look at how to use data imported into Microsoft Excel to perform a linear regression and how to interpret the results.
There are a few critical assumptions about your data set that must be true to proceed with a regression analysis. Otherwise, the results will be interpreted incorrectly or they will exhibit bias:
If those three points sound complicated, they can be. But the effect of one of those considerations not being true is a biased estimate. Essentially, you would misstate the relationship you are measuring. The first step in running regression analysis in Excel is to double-check that the free Excel plugin Data Analysis ToolPak is installed. This plugin makes calculating a range of statistics very easy. It is not required to chart a linear regression line, but it makes creating statistics tables simpler. To verify if installed, select "Data" from the toolbar. If "Data Analysis" is an option, the feature is installed and ready to use. If not installed, you can request this option by clicking on the Office button and selecting "Excel options". Using the Data Analysis ToolPak, creating a regression output is just a few clicks.
The independent variable in Excel goes in the X range. Given the S&P 500 returns, say we want to know if we can estimate the strength and relationship of Visa (V) stock returns. The Visa (V) stock returns data populates column 1 as the dependent variable. S&P 500 returns data populates column 2 as the independent variable.
[Note: If the table seems small, right-click the image and open in new tab for higher resolution.] Using that data (the same from our R-squared article), we get the following table: The R2 value, also known as the coefficient of determination, measures the proportion of variation in the dependent variable explained by the independent variable or how well the regression model fits the data. The R2 value ranges from 0 to 1, and a higher value indicates a better fit. The p-value, or probability value, also ranges from 0 to 1 and indicates if the test is significant. In contrast to the R2 value, a smaller p-value is favorable as it indicates a correlation between the dependent and independent variables.
The bottom line here is that changes in Visa stock seem to be highly correlated with the S&P 500.
However, an analyst at this point may heed a bit of caution for the following reasons:
We can chart a regression in Excel by highlighting the data and charting it as a scatter plot. To add a regression line, choose "Add Chart Element" from the "Chart Design" menu. In the dialog box, select "Trendline" and then "Linear Trendline". To add the R2 value, select "More Trendline Options" from the "Trendline menu. Lastly, select "Display R-squared value on chart". The visual result sums up the strength of the relationship, albeit at the expense of not providing as much detail as the table above.
The output of a regression model will produce various numerical results. The coefficients (or betas) tell you the association between an independent variable and the dependent variable, holding everything else constant. If the coefficient is, say, +0.12, it tells you that every 1-point change in that variable corresponds with a 0.12 change in the dependent variable in the same direction. If it were instead -3.00, it would mean a 1-point change in the explanatory variable results in a 3x change in the dependent variable, in the opposite direction.
In addition to producing beta coefficients, a regression output will also indicate tests of statistical significance based on the standard error of each coefficient (such as the p-value and confidence intervals). Often, analysts use a p-value of 0.05 or less to indicate significance; if the p-value is greater, then you cannot rule out chance or randomness for the resultant beta coefficient. Other tests of significance in a regression model can be t-tests for each variable, as well as an F-statistic or chi-square for the joint significance of all variables in the model together.
R2 (R-squared) is a statistical measure of the goodness of fit of a linear regression model (from 0.00 to 1.00), also known as the coefficient of determination. In general, the higher the R2, the better the model's fit. The R-squared can also be interpreted as how much of the variation in the dependent variable is explained by the independent (explanatory) variables in the model. Thus, an R-square of 0.50 suggests that half of all of the variation observed in the dependent variable can be explained by the dependent variable(s). |