The Analysis of the Annual Amount Spent on Organic Food Using Multiple Linear Regression Essay

Table of Contents

Introduction
Regression Output
The F-test
Coefficient Estimates Interpretation
Statistical Significance of the Coefficient Estimates
Multiple Regression Equation
Coefficient of Elasticity
Conclusion
Reference

Introduction

Customers’ purchasing habits are influenced by a number of factors, including age, gender, and income level. By understanding the significance of each quantitative and qualitative variable, a viable prediction of buyers’ preferences may be obtained. This report analyzes the variables that may affect the annual amount spent on organic food using multiple linear regression. In contrast to simple linear regression, which relates one quantitative independent variable to one quantitative dependent variable, multiple linear regression uses several explanatory variables (both qualitative and quantitative) to predict the outcome of a response variable.

Thus, multiple linear regression is often a more informative statistical tool for explaining the dependent variable when several factors are expected to influence it. This report runs a multiple linear regression to predict annual expenditures on organic food and discusses the results to provide a detailed picture of the influence that each independent variable has on the dependent one.

Regression Output

The regression output generated in Excel is shown in Table 1 and Table 2, which present the regression statistics and the coefficient estimates, respectively.

Table 1. Regression Statistics.

Statistical Measure	Value
Multiple R	0.830460276
R-Squared	0.68966427
Adjusted R-Squared	0.679232817
Standard Error	2111.587193
Observations	124

Table 2. Regression Coefficients.

Variable	Coefficient	SE	t-statistic	p-value
Intercept	-1932.1086	978.5413	-1.97447830	0.050643486
Age	14.1230223	11.77664	1.19924034	0.232816957
Annual income	0.01667728	0.002635	6.328431	4.53506E-09
Number of people in household	2222.50684	153.2477	14.5027009	4.5157E-28
Gender	40.5005831	384.7118	0.10527510	0.916334763

The value of R-squared is equal to 0.690, which means that the independent variables collectively explain approximately 69%, or roughly 70%, of the variance of the dependent variable. The adjusted R-squared of 0.679, which penalizes the number of predictors, is only slightly lower, so the fit does not appear to be an artifact of including four explanatory variables.
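As a quick consistency check on Table 1, the sketch below recomputes Multiple R and Adjusted R-Squared from the reported R-Squared, assuming the standard adjustment formula with n = 124 observations and k = 4 predictors. The variable names are illustrative and are not part of the original Excel output.

```python
import math

# Values taken from Table 1 of the report.
r_squared = 0.68966427
n = 124  # observations
k = 4    # predictors: age, annual income, household size, gender

# Multiple R is the square root of R-squared.
multiple_r = math.sqrt(r_squared)

# Adjusted R-squared penalizes R-squared for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"Multiple R:         {multiple_r:.6f}")
print(f"Adjusted R-squared: {adjusted_r_squared:.6f}")
```

Both recomputed values agree with the figures in Table 1 to the reported precision.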

The F-test

The F-test assesses the overall significance of the multiple linear regression model by testing a null hypothesis against an alternative hypothesis. The null hypothesis states that a model with no explanatory variables fits the data as well as the generated regression model. The alternative hypothesis states that the model containing only the intercept is worse at explaining the variance of the dependent variable (Moore, Notz, & Fligner, 2015). Based on the p-value, the decision is made whether or not to reject the null hypothesis. Given that the p-value for the F-test is below the 0.05 significance level, the null hypothesis is rejected in favor of the alternative. This means that the intercept-only model fits the data significantly worse than the calculated regression model.
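The overall F-statistic can be recovered from the figures in Table 1. The sketch below assumes the usual relationship between R-squared and the F-statistic for a model with an intercept, with k = 4 predictors and n = 124 observations; it does not reproduce the p-value, which would require the F distribution with (4, 119) degrees of freedom.

```python
# Values taken from Table 1 of the report.
r_squared = 0.68966427
n = 124  # observations
k = 4    # predictors

# F = (explained variance per predictor) / (unexplained variance per residual df)
df_model = k
df_residual = n - k - 1
f_statistic = (r_squared / df_model) / ((1 - r_squared) / df_residual)

print(f"F({df_model}, {df_residual}) = {f_statistic:.2f}")
```

An F-statistic of roughly 66 on these degrees of freedom is far beyond any conventional critical value, which is consistent with rejecting the null hypothesis.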

Coefficient Estimates Interpretation

Being equal to -1932.1, the intercept does not have a meaningful interpretation: it is the predicted expenditure when all explanatory variables equal zero, and it mainly anchors the regression line in the right place. The age coefficient is equal to 14.123, which means that, all else being equal, a one-year increase in age raises the predicted annual amount spent on organic food by 14.123. The annual income coefficient is equal to 0.0167, which means that, all else being equal, a one-unit increase in annual income raises the dependent variable by 0.0167. Because the model contains no interaction terms, these slopes apply to both genders.

The coefficient for the number of people in a household is equal to 2222.507, which means that, all else being equal, one additional household member raises the dependent variable by 2222.507. The gender coefficient of 40.501 shows the change in the dependent variable when shifting from males to females. In other words, holding the other variables constant, the predicted annual amount spent on organic food is greater by 40.501 for females than for males.

Statistical Significance of the Coefficient Estimates

P-values for the coefficient estimates test the null hypothesis that the corresponding coefficient is zero, that is, that the independent variable has no linear relationship with the dependent variable once the other variables are accounted for. When a p-value exceeds the significance level, there is insufficient evidence to conclude that the explanatory variable has a significant impact on the response variable at the population level. At the significance level of 0.05, two coefficient estimates are statistically significant (see Table 2): annual income (p = 4.54E-09) and the number of people in a household (p = 4.52E-28). The intercept falls just above the threshold (p = 0.0506). Since their p-values (0.233 and 0.916) are greater than the significance level, the age and gender variables may be excluded from the linear regression model, as they do not have a statistically significant impact on the dependent variable.
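The significance check can be reproduced directly from Table 2. The following sketch recomputes each t-statistic as the coefficient divided by its standard error and compares the reported p-values with a significance level of 0.05; the dictionary layout and names are illustrative, not part of the original output.

```python
# (coefficient, standard error, p-value) copied from Table 2.
table_2 = {
    "Intercept": (-1932.1086, 978.5413, 0.050643486),
    "Age": (14.1230223, 11.77664, 0.232816957),
    "Annual income": (0.01667728, 0.002635, 4.53506e-09),
    "Number of people in household": (2222.50684, 153.2477, 4.5157e-28),
    "Gender": (40.5005831, 384.7118, 0.916334763),
}

ALPHA = 0.05  # significance level used in the report

# t-statistic = coefficient / standard error
t_statistics = {name: coef / se for name, (coef, se, _) in table_2.items()}

# A coefficient is significant when its p-value is below ALPHA.
significant = [name for name, (_, _, p) in table_2.items() if p < ALPHA]

for name, t in t_statistics.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:32s} t = {t:8.3f}  ({flag})")
```

Note that the intercept, with a p-value of about 0.0506, narrowly misses the 0.05 threshold under this rule.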

Multiple Regression Equation

The regression equation can be written as: y = -1932.108 + 14.123x₁ + 0.0167x₂ + 2222.507x₃ + 40.501x₄, where y is the annual amount spent on organic food, x₁ is a customer’s age, x₂ is a customer’s annual income, x₃ is the number of people in a household, and x₄ is a customer’s gender (coded so that the coefficient is added for females). To estimate the annual amount spent on organic food by an average customer, one needs to determine the gender of the customer and find the average values of age, annual income, and the number of people in a household. For an average female, the average age is 48 years, the average annual income is $161006, and the average number of people in a household is 4. Substituting these values into the regression equation yields the predicted annual expenditure.

The calculated value of the annual organic food expenditures is equal to $11063.7946, whereas the sample average of the annual organic food expenditures is equal to $11046.48387. The small difference between these two values suggests that the generated regression equation describes the dependent variable adequately.
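The substitution can be sketched as follows, using the unrounded coefficients from Table 2 and the averages quoted in the text. Two assumptions are made here: gender is coded 1 for females, and the quoted averages (48, 161006, 4) are the inputs. With these rounded averages the sketch gives about 10361, somewhat below the reported $11063.7946, which suggests that the report substituted unrounded averages that are not listed in the text.

```python
# Coefficients copied from Table 2.
coefficients = {
    "intercept": -1932.1086,
    "age": 14.1230223,
    "annual_income": 0.01667728,
    "household_size": 2222.50684,
    "gender": 40.5005831,
}

# Averages for a female customer as quoted in the text (rounded values);
# gender = 1 for females is an assumption about the dummy coding.
average_female = {
    "age": 48,
    "annual_income": 161006,
    "household_size": 4,
    "gender": 1,
}

# Predicted expenditure = intercept + sum of coefficient * value.
predicted = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in average_female.items()
)

print(f"Predicted annual expenditure: {predicted:.2f}")
```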

For the multiple regression model, the coefficient estimate for the age variable is equal to 14.123, whereas, for the simple regression model, this coefficient is equal to 26.29. Such a difference may be explained by the presence of other variables in the multiple regression model that explain the variance of the dependent variable better. As a result, the age variable does not have as large an impact as it has in the simple regression model. Moreover, given that the multiple regression model includes a qualitative variable with two levels (males and females), the coefficient estimate for age is interpreted with gender held constant. In the simple regression model, however, gender has not been taken into account.

Coefficient of Elasticity

The coefficient of elasticity shows the percent change in a dependent variable resulting from a 1% increase in an independent variable. To generate an elasticity coefficient, the annual amount spent on organic food and the annual income variables have been log-transformed (decimal logarithm). The regression output generated in Excel is shown in Table 3 and Table 4, which present the regression statistics and the coefficient estimates, respectively. The p-values for the decimal logarithm of the annual income and the number of people in a household are equal to 5.49E-16 and 1.54E-32, respectively, which implies that these variables have a significant relationship with the dependent variable. However, neither the gender nor the age variable is significant at the significance level of 0.05.

Table 3. Regression Statistics.

Statistical Measure	Value
Multiple R	0.875470796
R-Squared	0.766449115
Adjusted R-Squared	0.758598665
Standard Error	0.079392842
Observations	124

Table 4. Regression Coefficients.

Variable	Coefficient	SE	t-statistic	p-value
Intercept	2.0922107	0.15777592	13.2606464	3.36765E-25
Age	0.0003550	0.00044448	0.79884159	0.425973762
Annual income	0.2894350	0.03085475	9.38056567	5.49173E-16
Number of people in household	0.0952047	0.00577063	16.4981564	1.54342E-32
Gender	0.00801313	0.014469957	0.553777663	0.580770084

By substituting the values presented in Table 4 in the regression equation, the following equation is obtained: Log(Annual amount spent on organic food) = 2.0922 + 0.00035x₁ + 0.2894Log(Annual income) + 0.0952x₃ + 0.008x₄, where x₁, x₃, and x₄ are age, the number of people in a household, and gender, respectively.

The value of the coefficient estimate for Log(Annual income) is equal to 0.2894. This means that, all else being equal, a 1% increase in annual income results in an increase of approximately 0.2894% in the annual amount spent on organic food. The value of R-squared is equal to 0.766, which means that the independent variables, including Log(Annual income), collectively explain approximately 77% of the variance of the decimal logarithm of the dependent variable. It may be stated that the data are quite close to the fitted regression line.
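The elasticity reading of the log-log coefficient can be checked numerically. In the sketch below, the only input from the report is the Log(Annual income) coefficient from Table 4; the helper function name is hypothetical. Because both variables are logged with the same base, multiplying income by a factor c multiplies the predicted expenditure by c raised to the power of the coefficient, regardless of the base of the logarithm.

```python
# Coefficient of Log(Annual income) from Table 4.
income_elasticity = 0.2894350


def percent_change_in_spending(income_increase_pct: float) -> float:
    """Exact percent change in predicted spending for a given percent
    increase in income, holding the other variables fixed."""
    income_factor = 1 + income_increase_pct / 100
    spending_factor = income_factor ** income_elasticity
    return (spending_factor - 1) * 100


one_percent_effect = percent_change_in_spending(1.0)
print(f"1% more income  -> {one_percent_effect:.4f}% more spending")
print(f"10% more income -> {percent_change_in_spending(10.0):.4f}% more spending")
```

For a 1% increase the exact change is very close to the coefficient itself, which is why the coefficient is read as an elasticity; for larger increases the approximation becomes looser.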

Conclusion

This report ran a multiple linear regression to predict the annual amount spent on organic food from age, annual income, gender, and the number of people in a household. The generated model explains approximately 70% of the variability of the dependent variable, which indicates reasonably good explanatory power. However, it is worth mentioning that some independent variables, namely age and gender, do not have a statistically significant impact on the annual expenditures on organic food and may be excluded from the regression equation.

Reference

Moore, D. S., Notz, W. I., & Fligner, M. A. (2015). The basic practice of statistics (7th ed.). New York, NY: W. H. Freeman & Company.