Introduction
To further analyze customers' buying habits, Diligent Consulting Group was asked to examine the factors that influence the Annual Amount Spent on Organic Food. The first three reports used simple linear regression to build a better understanding of these habits. This report extends that work for Loving Organic Foods by using multiple linear regression, which considers several explanatory variables at once, to explain what drives customer spending.
A Brief Comparison of Simple Linear Regression and Multiple Linear Regression
Simple linear regression is used to describe the linear relationship between two variables. Alexander et al. (2017) note that linear regression for two variables is based on a linear equation with one independent variable. The equation has the form:
$$y = a + bx,$$ where $a$ and $b$ are constants: $a$ is the intercept and $b$ is the slope.
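As an illustration, the least-squares estimates of a and b can be sketched in Python. This is a minimal sketch on hypothetical data (generated exactly from y = 2 + 3x), not the report's Excel computation: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the fitted line pass through the point of means.

```python
def fit_simple_regression(x, y):
    """Return (a, b) minimizing the sum of squared residuals of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data generated exactly by y = 2 + 3x
x = [1, 2, 3, 4, 5]
y = [5, 8, 11, 14, 17]
a, b = fit_simple_regression(x, y)
print(a, b)  # 2.0 3.0
```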
Multiple linear regression analyzes the relationship between a dependent variable and several independent variables. Its assumptions and estimation procedure are essentially the same as in simple linear regression, with the added requirement that no independent variable be an exact linear combination of the others. A distinctive feature of multiple regression is that it accounts for correlation among the independent variables: all of the selected independent variables are processed simultaneously, so each slope measures the effect of one variable while the others are held constant.
According to Alexander et al. (2017), the multiple linear regression model can be stated by the equation:
$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i,$$
where $\beta_0$ is the intercept, each $\beta_j$ ($j = 1, \ldots, k$) is the slope between $y$ and the corresponding $X_j$, and $\varepsilon_i$ is the error term, which captures errors in measuring $y$ and the effect on $y$ of any variables missing from the equation that would help explain variation in $y$.
Multiple linear regression is one of the most widely used analyses because, as a rule, several factors influence an outcome simultaneously.
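The coefficients of the multiple regression model are usually estimated by ordinary least squares. The sketch below, on hypothetical data generated exactly from y = 1 + 2X1 - 0.5X2, solves the normal equations (XᵀX)β = Xᵀy with Gaussian elimination. The function name ols is our own choice for illustration; in practice Excel's regression tool or a statistics library would do this step.

```python
def ols(rows, y):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk by least squares."""
    # Design matrix: a leading column of ones for the intercept
    X = [[1.0] + [float(v) for v in r] for r in rows]
    n, p = len(X), len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        known = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (A[r][p] - known) / A[r][r]
    return beta

# Hypothetical data generated exactly by y = 1 + 2*x1 - 0.5*x2
rows = [[1, 4], [2, 1], [3, 7], [4, 2], [5, 5], [6, 3]]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in rows]
print([round(b, 6) for b in ols(rows, y)])  # [1.0, 2.0, -0.5]
```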
The regression output generated in Excel
Interpretation of the coefficient of determination (r-squared) and the global test for statistical significance (the F-test)
Based on the Regression Statistics, the coefficient of determination (R-squared) is about 0.69. This means that about 69% of the total variation in the Annual Amount Spent on Organic Food is explained by the independent variables in the model.
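R-squared is computed as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean. A minimal Python sketch, using hypothetical observed and fitted values:

```python
def r_squared(y, y_hat):
    """Share of the total variation in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - sse / sst

# Hypothetical observed and fitted values
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [4.0, 5.0, 7.0, 8.0]
print(r_squared(y, y_hat))  # 0.9
```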
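- H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (none of the independent variables is related to the Annual Amount Spent on Organic Food)
- HA: at least one $\beta_j \neq 0$ (at least one independent variable is statistically significant)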
In the ANOVA table, the p-value of the global F-test (Significance F) is about 2.44 × 10⁻²⁹ (reported as 2.44119367422295E-29). Because this is far below any conventional significance level, such as 0.05, we reject H0 in favor of HA. Thus, the overall regression equation is statistically significant.
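The F-statistic behind this p-value can be computed from R-squared as F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of independent variables and n the number of observations. The sketch below is illustrative only: the inputs to global_f are hypothetical (the sample size is not stated in this section), and the decision rule applies an assumed 0.05 level to the reported Significance F.

```python
def global_f(r2, n, k):
    """Global F-statistic from R-squared, n observations, and k independent variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def reject_h0(p_value, alpha=0.05):
    """Reject H0 (all slope coefficients equal zero) when the p-value is below alpha."""
    return p_value < alpha

# Hypothetical inputs, only to show the formula at work
print(global_f(0.5, n=24, k=3))

# Reported Significance F from the ANOVA table
print(reject_h0(2.44119367422295e-29))  # True
```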
Interpretation of the coefficient estimates for all the independent variables
The coefficient estimates are interpreted as follows, each holding the other independent variables fixed. When Age increases by one year, the Annual Amount Spent on Organic Food increases by about 14.12 units. When Annual Income increases by one unit, the Annual Amount Spent on Organic Food rises by about 0.02 units. When the Number of People in Household increases by one, the Annual Amount Spent on Organic Food increases by about 2222.51 units. Gender is a dummy variable: in this report's calculations, Male customers are the baseline (Gender = 0) and Female customers are coded 1. The Gender coefficient (about 41) therefore means that a female customer is predicted to spend about 41 units more per year than a male customer of the same age, income, and household size. The intercept (about -1932) is the predicted amount for a male customer with all other variables equal to zero, and the corresponding value for a female customer is about -1892 (the intercept plus the Gender coefficient). Because no customer has zero age, income, and household size, these intercept values have no practical meaning on their own.
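The phrase "holding the other independent variables fixed" can be checked directly with the rounded regression equation from this report. This minimal sketch assumes Gender is coded 0 = Male and 1 = Female, as the report's arithmetic implies; the example customer is hypothetical. Changing one input by one unit changes the prediction by exactly that variable's coefficient.

```python
# Rounded coefficients from the report's regression equation.
# Assumption: Gender is coded 0 = Male, 1 = Female.
COEF = {"intercept": -1932, "age": 14, "income": 0.02,
        "household": 2223, "gender": 41}

def predict(age, income, household, gender):
    """Predicted Annual Amount Spent on Organic Food from the rounded equation."""
    return (COEF["intercept"] + COEF["age"] * age + COEF["income"] * income
            + COEF["household"] * household + COEF["gender"] * gender)

# A hypothetical male customer, then the same customer with one input changed
base = dict(age=40, income=100000, household=3, gender=0)
older = dict(base, age=41)     # one year older, everything else fixed
female = dict(base, gender=1)  # same profile, but female

print(predict(**older) - predict(**base))   # equals the Age coefficient
print(predict(**female) - predict(**base))  # equals the Gender coefficient
```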
Interpretation of the statistical significance of the coefficient estimates for all the independent variables
- H0: $\beta_j = 0$ (the coefficient estimate is not statistically significant)
- HA: $\beta_j \neq 0$ (the coefficient estimate is statistically significant)
The p-values for Age and Gender are 0.23 and 0.92, respectively. Both are well above the conventional 0.05 level, so we fail to reject H0: neither variable is statistically significant in this regression model.
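In contrast, the p-values for Number of People in Household and Annual Income are below 0.05 (Excel displays them in scientific notation as 4.54E-… and 4.52E-…, respectively, since a p-value cannot exceed 1), so we reject H0 for both: Number of People in Household and Annual Income are statistically significant in this model.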
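For reference, each coefficient's p-value comes from a two-sided t-test with n - k - 1 degrees of freedom, where b_j is the coefficient estimate, SE(b_j) its standard error, n the number of observations, k the number of independent variables, and α the chosen significance level (commonly 0.05):

```latex
t_j = \frac{b_j}{SE(b_j)}, \qquad
p_j = 2\,P\!\left(T_{n-k-1} > \lvert t_j \rvert\right), \qquad
\text{reject } H_0\colon \beta_j = 0 \text{ if } p_j < \alpha
```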
The regression equation with estimates substituted into the equation
Annual Amount Spent = -1932 + 14 * Age + 0.02 * Annual Income + 2223 * Number of People in Household + 41 * Gender.
An estimate of “Annual Amount Spent on Organic Food” for the average consumer
“Annual Amount Spent on Organic Food” for the average consumer = -1932 + 14 * 48.23 + 0.02 * 161006.62 + 2223 * 4.31 + 41 * 0.57 = 11567.85.
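The arithmetic above can be reproduced with a short Python sketch that multiplies each rounded coefficient by the mean value listed for the average consumer and adds the intercept:

```python
# Intercept and (coefficient, mean value) pairs taken from the calculation above
intercept = -1932
terms = [
    (14, 48.23),        # Age
    (0.02, 161006.62),  # Annual Income
    (2223, 4.31),       # Number of People in Household
    (41, 0.57),         # Gender (mean of the 0/1 dummy variable)
]
estimate = intercept + sum(coef * mean for coef, mean in terms)
print(round(estimate, 2))  # 11567.85
```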
A discussion of whether or not the coefficient estimate on the Age variable in this estimation is different than it was in the simple linear regression model from Module 3 Case
The coefficient on Age in the simple linear regression equation differs from the coefficient on Age in the multiple regression equation. Olive (2017) explains that a multiple regression coefficient measures the effect of one variable while the other variables in the equation are held constant. The simple regression includes no other variables, so its Age coefficient also absorbs the effects of omitted variables that are correlated with Age.
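A small numerical sketch illustrates this effect. The data are hypothetical: x1 stands in for Age, x2 for a second predictor positively correlated with it, and y is generated exactly with slopes 2 and 10. Regressing y on x1 alone gives a slope of 2.8, because x1 partly proxies for x2, while the two-predictor regression recovers the true slope of 2 on x1.

```python
def centered(v):
    """Deviations of each value from the mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x1 = [20, 30, 40, 50, 60]  # hypothetical stand-in for Age
x2 = [1, 3, 2, 5, 4]       # hypothetical predictor correlated with x1
y = [5 + 2 * a + 10 * b for a, b in zip(x1, x2)]  # true slopes: 2 and 10

c1, c2, cy = centered(x1), centered(x2), centered(y)

# Simple regression of y on x1 alone
simple_slope = dot(c1, cy) / dot(c1, c1)

# Multiple regression slope on x1 with x2 held fixed (two-predictor closed form)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
multiple_slope = (dot(c1, cy) * s22 - dot(c2, cy) * s12) / (s11 * s22 - s12 ** 2)

print(simple_slope, multiple_slope)  # 2.8 2.0
```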
References
Alexander, H., Illowsky, B., & Dean, S. (2017). Introductory business statistics. OpenStax. Web.
Olive, D. J. (2017). Multiple linear regression. In Linear regression. Springer, Cham. Web.