Introduction
Multiple regression is used to explain situations where several explanatory variables work together to determine a response. In this project, a multiple regression model was used to compare home prices in Austin, which are thought to depend on various independent factors. A sample of 34 homes in Austin was chosen, and multiple regression analysis was used to determine how various explanatory factors affect their prices. The analysis yields a regression equation that can be used to predict future home prices in Austin or to explain why home prices in Austin change from one place to another.
Austin Background
Austin is the capital city of Texas and the seat of Travis County. It is the thirteenth most populous city in the United States of America and was among the three fastest-growing cities in the USA from 2000 to 2006 (AustinTexas, 2012). The population of Austin is about 820,611. The city was founded in the 1830s, when immigrants settled in what is now the center of Austin, and it was made the capital of Texas in 1839. It was initially known as Waterloo, but the name was later changed to Austin (AustinTexas, 2012).
Austin is a center of high technology and innovation and provides employment for graduates from all over the USA. Many multinational companies are located in Austin, including Dell, Apple, Google and PayPal, as well as pharmaceutical and biotechnology firms. As a result, Austin is a large provider of employment for Americans. Home prices in Austin differ from one place to another, and the cost is determined by various factors. The cost depends on whether the home is a single-family home, a condo or a townhouse. Home prices also depend on geographical location within Austin, the number of bedrooms and bathrooms, and whether the house has an attached garage. Last but not least, home prices are strongly influenced by the size of the plot on which the home is built (Zillow, 2012).
Model Specification and Data
The purpose of a multiple regression model is to analyse the relationship between several metric or dichotomous independent variables and a metric dependent variable. Once a relationship between the independent variables and the dependent variable has been established, it can be used to predict values of the dependent variable. The relationship takes the form given below:
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + … + βkXk + ε
Where
β0, β1, β2, β3 and β4 are parameters
Y = Response or dependent variable
X1, X2, X3 and X4 are the independent variables/explanatory variables
ε = Random error variable/error term
The values of X1, X2, X3 and X4 are known constants, but the values of β0, β1, β2, β3 and β4 have to be estimated to obtain the regression equation. Once they are known, one can predict how the independent variables affect the dependent variable Y, and one can also determine how one independent variable affects the dependent variable while the other variables are held constant. For instance, the value of β1 indicates the change in the mean response per unit increase in X1 when the rest of the independent variables in the model are held constant.
The parameters β1, β2, β3 and β4 are frequently called partial regression coefficients because each reflects the partial effect of one independent variable when the rest of the independent variables included in the model are held constant. Multiple regression estimates the parameters by minimizing the sum of squared errors, and this sum never increases as more independent variables are included in the analysis (Lind et al., 2012). Regression estimates are more reliable when the number of observations is large, when the variance of a given explanatory variable is high, and when the independent variables are not closely related to one another.
To analyse home prices in Austin using multiple regression analysis, data was first collected to obtain a representative sample. The data included house type, plot size in square feet, price per square foot, number of bedrooms, number of bathrooms, availability of an attached garage, house views, type of cooling system, whether the house has a swimming pool, and location in Austin. All these factors affect the prices of homes in Austin, making some homes costlier than others.
The data was obtained online from Zillow, a real estate advertising company, using probability sampling, whereby a convenient sample was drawn at random (Zillow, 2012). A sample describing 34 homes was obtained and is tabulated in Appendix 1. Four independent variables were selected: square feet (X1), number of bedrooms (X2), number of bathrooms (X3) and availability of an attached garage (X4). These were entered into the multiple regression model to determine how they affect home prices in Austin. Once the values of β0, β1, β2, β3 and β4 were estimated, the regression equation was formed.
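The estimation step described above can be sketched in a few lines of Python. This is only an illustration of ordinary least squares via the normal equations, not the Excel procedure used in this project. The homes, the "true" coefficients and the helper names solve_linear_system and fit_ols are all made up for the example; the Appendix 1 data is not reproduced here.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Estimate b0..bk by solving the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical homes (NOT the Appendix 1 data):
# (square feet, bedrooms, bathrooms, attached garage as 0/1).
random.seed(0)
homes = [
    (random.uniform(800, 4000), random.randint(1, 5),
     random.randint(1, 4), random.randint(0, 1))
    for _ in range(34)
]
# Made-up coefficients used to generate noise-free prices for the demo.
true_b = [20000.0, 150.0, -30000.0, 25000.0, 50000.0]
prices = [true_b[0] + sum(b * x for b, x in zip(true_b[1:], h)) for h in homes]

estimates = fit_ols(homes, prices)
print([round(b, 2) for b in estimates])
```

Because the made-up prices contain no random error, the estimates should reproduce the made-up coefficients up to rounding. With real data the error term ε is not zero, and the estimates are only approximations of the underlying parameters.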
Presentation and Discussion of Results
The data in Appendix 1 was subjected to multiple regression analysis using the Microsoft Excel 2007 data analysis tool. The significance level used in the analysis was 5% (alpha = 0.05). The calculated values of R2, adjusted R2, the standard error of the regression, the F statistic and its significance, the regression coefficients β0, β1, β2, β3 and β4, their standard errors, the t-test values of the regression coefficients and their significance, as well as the 95% confidence intervals for the regression coefficients, are tabulated in Table 1, Table 2 and Table 3 below.
Table 1: Regression statistics.
Alpha =0.05
Table 2: Analysis of Variance.
Alpha =0.05
Regression Sum of Squares = 6.2538E+11; Regression Mean Square = 1.56345E+11
Residual Sum of Squares = 3.247E+11; Residual/Error Mean Square = 11196880074
Total Sum of Squares = 9.500E+11; Calculated F = 13.9632
Table 3: Regression Coefficients (β0, β1, β2, β3, β4), Standard error of the regression coefficients (Sb0, Sb1, Sb2, Sb3, Sb4), regression coefficient t-values, t-test p-values (test for β0, β1, β2, β3, β4=0) and 95% confidence intervals for the regression coefficients.
Alpha =0.05
Y-intercept = β0 = 16889.746. This is the value of Y when all the independent variables in the model take the value of zero. The regression equation is given below.
Y = 16889.74 + 144.634 X1 - 37966.89 X2 + 27704.99 X3 + 51847 X4
HOME PRICE = 16889.74 + 144.634 SQUARE FEET - 37966.89 BEDROOMS + 27704.99 BATHROOMS + 51847 GARAGE
This multiple regression model can therefore be used to produce a prediction interval for a particular value of Y for a given set of values of the X’s. It can also be used to produce an interval estimate for the expected value of Y for a given set of values of the X’s. Last but not least, the model can be used to elucidate the relationship between the explanatory variables and the dependent variable by interpreting the regression coefficients of the X’s.
Discussion
Adjusted R-squared (adjusted R2) represents the proportion of the variability of home price explained by the independent variables (X’s), adjusted so that models with different numbers of variables can be compared. Adjusted R2 is used in multiple regression analysis because, unlike R2, it does not automatically rise when an extra independent variable is added to the model. However, when a new variable is added in multiple regression, it affects the coefficients of the existing variables. Adjusted R2 depends on the t-statistic of the added explanatory variable: when the absolute value of the t-statistic of an extra variable exceeds 1, adjusted R2 rises, but this does not necessarily imply that the extra independent variable is significant (Lind et al., 2012). From the results in Table 1, the adjusted R2 value is 0.6111. This denotes that 61.11% of the total variability of home prices in Austin, Texas can be attributed to the four independent variables (the regressors) considered in the model.
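As a rough check, R2 and adjusted R2 can be recomputed from the sums of squares listed under Table 2. The sketch below assumes that all 34 homes and all four predictors entered the fit, so the residual degrees of freedom are 34 - 4 - 1 = 29. Only the ratio of the sums of squares matters, so they are entered in their common scale.

```python
# Sums of squares from Table 2, in a common scale (only ratios matter here).
ss_regression = 6.2538
ss_residual = 3.247

n_homes = 34       # sample size stated in the text (assumed to be the fitted n)
n_predictors = 4   # X1..X4

ss_total = ss_regression + ss_residual

# R-squared: share of total variability explained by the regression.
r_squared = 1 - ss_residual / ss_total

# Adjusted R-squared penalises the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_homes - 1) / (n_homes - n_predictors - 1)

print(f"R-squared          = {r_squared:.4f}")
print(f"Adjusted R-squared = {adjusted_r_squared:.4f}")
```

Under these assumptions the adjusted value agrees with the 0.6111 reported in Table 1, that is, about 61.11% of the variability in home prices.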
The F-test is based on the analysis of variance of the regression and is used to test the joint significance of the four independent variables used in the analysis. It is a test of the goodness of fit of the regression line (Lind et al., 2012). The hypotheses are H0: β1 = β2 = β3 = β4 = 0 versus Ha: at least one of β1, β2, β3 and β4 does not equal zero. The overall F-statistic from the ANOVA in Table 2 is 13.963, and the associated significance level (p-value) is 1.828E-06. This p-value is far smaller than 0.05, the required level of significance. Hence the F statistic is significant and H0 is rejected, implying that at least one of β1, β2, β3 and β4 is not equal to zero. As a result, there is a linear relationship between price and at least one of the four independent variables. The F-test does not by itself show which variables matter or in which direction they move home prices in Austin; that information comes from the individual coefficients and their t-tests in Table 3.
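The overall F-test can be reproduced approximately from the sums of squares listed under Table 2. The sketch below assumes 4 regression and 29 residual degrees of freedom (34 homes, four predictors). The helper f_upper_tail_p_value_df4 is a hypothetical name; it uses a closed form for the upper-tail probability that holds only when the numerator has exactly 4 degrees of freedom, which avoids any external library. A p-value is a probability, so it always lies between 0 and 1.

```python
def f_upper_tail_p_value_df4(f_stat, df_residual):
    """Upper-tail p-value of an F statistic with 4 numerator degrees of freedom.

    P(F > f) equals the regularized incomplete beta function I_x(a, 2) with
    a = df_residual / 2 and x = df_residual / (df_residual + 4 * f).
    For second argument 2 this reduces to x**a * ((a + 1) - a * x).
    """
    a = df_residual / 2
    x = df_residual / (df_residual + 4 * f_stat)
    return x ** a * ((a + 1) - a * x)


# Sums of squares from Table 2, in a common scale (F depends only on ratios).
ss_regression = 6.2538
ss_residual = 3.247
n_homes = 34       # assumed number of homes in the fit
n_predictors = 4   # X1..X4

df_regression = n_predictors
df_residual = n_homes - n_predictors - 1

f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
p_value = f_upper_tail_p_value_df4(f_stat, df_residual)

print(f"F = {f_stat:.3f}, p-value = {p_value:.3e}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

Under these assumptions the F value comes out close to the 13.963 in Table 2 and the p-value is on the order of 2E-06, far below alpha = 0.05, so the four variables are jointly significant.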
The t-test values of each partial regression coefficient are tabulated in Table 3 together with the associated significance levels (p-values), which are 0.0014, 0.2601, 0.4300 and 0.227 for β1, β2, β3 and β4 respectively. Each test is of H0: βj = 0 versus Ha: βj ≠ 0 for the coefficient concerned. The p-value for β1 is less than 0.05, hence β1 is significant, while the p-values for β2, β3 and β4 are greater than 0.05, hence these coefficients are not significant. A significant t-test indicates that a given independent variable in the model influences the dependent variable while controlling for the other independent variables (Lind et al., 2012). This implies that square footage individually affects home price in Austin, whereas the number of bedrooms, the number of bathrooms and the availability of an attached garage do not show a statistically significant individual effect at the 5% level in this sample.
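The decision rule for the individual t-tests can be written out explicitly. The short sketch below applies it to the four p-values quoted above; the dictionary labels and the helper is_significant are illustrative names only.

```python
ALPHA = 0.05

# p-values of the individual t-tests as quoted in the text for Table 3.
p_values = {
    "square feet (b1)": 0.0014,
    "bedrooms (b2)": 0.2601,
    "bathrooms (b3)": 0.4300,
    "attached garage (b4)": 0.227,
}


def is_significant(p_value, alpha=ALPHA):
    """Reject H0: beta_j = 0 when the p-value falls below alpha."""
    return p_value < alpha


decisions = {name: is_significant(p) for name, p in p_values.items()}

for name, significant in decisions.items():
    verdict = "significant" if significant else "not significant"
    print(f"{name}: p = {p_values[name]:.4f} -> {verdict} at alpha = {ALPHA}")
```

A small p-value counts as evidence against H0, so a coefficient is declared significant when its p-value is below alpha, not above it.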
Conclusion
Multiple regression models are useful for predicting and forecasting quantities such as future prices of goods and services with the help of other variables that are likely to affect those prices. Multiple regression models can be used to analyse seasonal effects, where the F-test is mostly used to make the inferences. Changes in demand and the total production cost of companies in future years can also be predicted by carrying out multiple regression analysis. Multiple regression models are also used to explain the current state of a situation and even in theory building. In multiple regression analysis, one is interested in the number and significance of the relationships between the independent variables and the dependent variable; however, correlation among the independent variables also needs to be considered. This is because in many non-experimental settings, such as in business, economics, and the social and biological sciences, most of the explanatory variables used in multiple regression are correlated with one another.
In this context, the multiple regression model can be used to tell why home prices in Austin differ from one place to another, which factors influence home prices positively or negatively, and which factors do not affect home prices in Austin. For instance, the price of a single-family house with a plot size of 3000 square feet, 5 bedrooms, 3 bathrooms and an attached garage for sale in Austin can be estimated using the regression equation obtained above:
Y = 16889.74 + 144.634 X1 - 37966.89 X2 + 27704.99 X3 + 51847 X4
Home Price in USD = 16889.74 + 144.634(3000) - 37966.89(5) + 27704.99(3) + 51847(1) = $395,919.26
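The same calculation can be expressed as a small Python function, which makes it easy to try other homes. The function name predict_home_price and its argument names are illustrative. The coefficients are taken exactly as they appear in the fitted regression equation, including 51847 for the garage term.

```python
def predict_home_price(square_feet, bedrooms, bathrooms, has_garage):
    """Evaluate the fitted regression equation for one home (price in USD)."""
    garage = 1 if has_garage else 0  # dichotomous variable X4
    return (
        16889.74
        + 144.634 * square_feet
        - 37966.89 * bedrooms
        + 27704.99 * bathrooms
        + 51847 * garage
    )


# The example home from the text: 3000 square feet, 5 bedrooms,
# 3 bathrooms and an attached garage.
example_price = predict_home_price(3000, 5, 3, True)
print(f"Predicted price: ${example_price:,.2f}")

# Partial effect of one extra square foot, with everything else held constant.
extra_square_foot = predict_home_price(3001, 5, 3, True) - example_price
print(f"Effect of one extra square foot: ${extra_square_foot:,.2f}")
```

With these coefficients the example home evaluates to $395,919.26, and one extra square foot changes the prediction by the X1 coefficient, 144.634, which is what a partial regression coefficient means.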
References
AustinTexas. (2012). The official Website for the city of Austin. Web.
Lind, D. A., Marchal, W. G., & Wathen, S. (2012). Basic Statistics for Business & Economics (8th ed.). McGraw-Hill Irwin. Web.
Zillow. (2012). Online Real Estate Advertising Company. Web.