## Introduction

Regression analysis is a statistical tool used to estimate and approximate linear relationships among variables by formulating an association between them. When specifying the model, it is necessary to distinguish between the dependent variable and the independent variables.

Regression models are used to predict future values of a variable. This paper carries out a simple regression analysis of the price of houses on the area of houses.

## Scatter diagram

A scatter diagram is a graph that plots two related variables on a Cartesian plane. The independent variable is plotted on the x-axis and the dependent variable on the y-axis. In this case, the price of houses (in thousands of dollars) is plotted on the y-axis and the area of houses (in square feet) on the x-axis.

A scatter diagram helps establish whether a linear relationship exists between the two plotted variables. This can be judged from the overall trend of the points.

The points on the scatter diagram trend upward, which indicates a positive linear relationship between the price of houses and their area in square feet: as the area of a house increases, its price also tends to increase. However, few points fall on the regression line drawn in the scatter diagram.

Several points lie well away from the regression line, and many points are concentrated around 2,000 square feet. This clustering and scatter weaken the linear fit, which suggests a weak regression line. A strong regression line would have points lying closely along it.

## Correlation coefficient

The correlation coefficient is 0.6364. The coefficient is positive and greater than 0.5, which indicates a moderately strong positive linear relationship between the price of houses and their area. That is, as the area of a house increases, its price also tends to increase.
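The correlation coefficient can be computed directly from the paired observations. The sketch below is a minimal Python illustration, assuming the data are available as two lists; the function name `pearson_r` and the five toy observations are hypothetical and are not the paper's sample of twenty houses. It also checks that squaring the reported coefficient reproduces the reported $R^2$ of 40.5%, as it should in a simple regression.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (NOT the paper's sample): area in sq ft, price in $1,000s
areas = [1500, 1800, 2000, 2100, 2400]
prices = [250, 270, 300, 280, 330]
r = pearson_r(areas, prices)
print(f"r = {r:.4f}")

# In a simple regression, r squared equals R^2: 0.6364 ** 2 is about 0.405
print(f"0.6364^2 = {0.6364 ** 2:.3f}")
```

On Python 3.10 or later, the standard library's `statistics.correlation` computes the same quantity.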

## Relationship between the variables

There are a number of factors that affect the price of houses. A direct factor is the area of the house. However, that is not the only factor that affects the price of houses. Examples of these other factors include the location of the house and proximity to various social amenities.

Only one explanatory variable is used in this analysis, so the model is a simple regression. The dependent variable is the price of houses and the independent variable is the area of houses (in square feet).

The regression line attempts to establish a linear relationship between the price of houses and their area. A sample of twenty houses is used to estimate the regression equation.

The regression line takes the form

$$Y = b_0 + b_1X$$

where

$Y$ = price (in thousands of dollars)

$X$ = area (in square feet)

The theoretical expectations are that $b_0$ can take any value and $b_1 > 0$.
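The coefficients $b_0$ and $b_1$ are typically estimated by ordinary least squares, with $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{Y} - b_1\bar{X}$. The paper does not state how its estimates were computed, so the following is a sketch assuming ordinary least squares; `ols_fit` and the toy data are hypothetical.

```python
def ols_fit(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # S_xy: sum of cross-products; S_xx: sum of squared deviations of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line passes through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical toy data (NOT the paper's sample): area in sq ft, price in $1,000s
areas = [1500, 1800, 2000, 2100, 2400]
prices = [250, 270, 300, 280, 330]
b0, b1 = ols_fit(areas, prices)
print(f"Y = {b0:.4f} + {b1:.6f}X")
# Theoretical expectation from the text: the slope should be positive
print("b1 > 0:", b1 > 0)
```

On Python 3.10 or later, `statistics.linear_regression` returns the same slope and intercept.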

## Regression Results

The regression results are shown in the table below.

From the table above, the regression equation can be written as $Y = 160.40 + 0.0667X$. The intercept, 160.39619, does not depend on the area of the house. It is the predicted price (in thousands of dollars) when area is zero, and it absorbs the average influence of factors not included in the model, such as the location of the house.

The slope coefficient of 0.066744 implies that each additional square foot of area is associated with an increase of 0.0667 thousand dollars (about $66.74) in the price of a house. The positive sign of the coefficient indicates a positive relationship between the price and the area of houses.

## Evaluation of regression model

The regression model can be evaluated by testing the statistical significance of its explanatory variable, that is, whether area is a significant determinant of the price of houses. A t-test is used because the sample is small (twenty houses) and the error variance must be estimated from the data.

A two-tailed t-test is carried out at the 5% level of significance (95% confidence level).

Null hypothesis: $H_0: b_i = 0$

Alternative hypothesis: $H_1: b_i \neq 0$

The table below summarizes the results of the t-tests.

The null hypothesis states that the variable is not a significant determinant of the price of houses; the alternative hypothesis states that it is. From the table above, the calculated t-values are greater than the tabulated (critical) t-values.

Therefore, the null hypothesis is rejected: area (in square feet) is a significant determinant of the dependent variable, the price of houses. Area is statistically significant at the 5% level of significance.
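In a simple regression, the t statistic for the slope can be recovered from the correlation coefficient as $t = r\sqrt{n-2}/\sqrt{1-r^2}$, with $n - 2$ degrees of freedom. The sketch below applies this to the reported $r = 0.6364$ and $n = 20$; because $r$ is rounded, the result may differ slightly from the value in the t-test table. The function name `slope_t_stat` is hypothetical, and the critical value 2.101 is the standard two-tailed 5% table value for 18 degrees of freedom.

```python
import math


def slope_t_stat(r, n):
    """t statistic for H0: b1 = 0 in a simple regression, computed from r and n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Values from the text: r = 0.6364 and n = 20 houses, so df = n - 2 = 18
t_calc = slope_t_stat(0.6364, 20)
# Two-tailed 5% critical value for 18 df, taken from a standard t table
t_crit = 2.101
print(f"t calculated = {t_calc:.2f}, t tabulated = {t_crit}")
print("Reject H0" if abs(t_calc) > t_crit else "Fail to reject H0")
```

This gives a calculated t of roughly 3.5, above the critical value, which is consistent with rejecting $H_0$. If SciPy is available, `scipy.stats.t.ppf(0.975, 18)` returns the critical value instead of hard-coding it.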

The significance of the intercept is not relevant when testing whether area determines price. Because the explanatory variable is statistically significant, the regression line can be used for prediction.

However, although the slope coefficient indicates a positive relationship between the price and the area of houses, the relationship is not strong, as the scatter diagram suggested.

Thus, the model can be used to predict prices, since larger houses tend to have higher prices. More variables should be added to the regression model so as to improve the regression equation.

## R-square value

The coefficient of determination, $R^2$, measures the proportion of the variation in the dependent variable that is explained by the independent variables. A high $R^2$ implies that the explanatory variables adequately explain the variation in the price of houses.

A low $R^2$ implies that the explanatory variables do not explain the variation in the price of houses adequately. For this regression, $R^2$ is 40.5%, which is consistent with the correlation coefficient since $0.6364^2 \approx 0.405$. This means that area (in square feet) explains only 40.5% of the variation in the price of houses.

This indicates that area alone is a weak explanatory variable. The adjusted $R^2$ is also low, at 37.20%. Explanatory power could be improved by adding relevant variables, such as location, to the model. Because $R^2$ never decreases when a variable is added, the adjusted $R^2$ is the better guide to whether an added variable genuinely helps.
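The coefficient of determination is $R^2 = 1 - SS_{res}/SS_{tot}$, and the adjusted value penalizes additional explanatory variables: $\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$. The sketch below implements both; the function names are hypothetical, and only the reported $R^2$, $n = 20$ and $k = 1$ are taken from the text.

```python
def r_squared(xs, ys):
    """Proportion of the variation in ys explained by the least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b0 = mean_y - b1 * mean_x
    # Residual and total sums of squares
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Reported values from the text: R^2 = 0.405, n = 20 houses, k = 1 (area)
adj = adjusted_r_squared(0.405, 20, 1)
print(f"adjusted R^2 = {adj:.2%}")
```

With $R^2 = 0.405$ this gives about 37.19%, matching the reported 37.20% up to the rounding of $R^2$.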

## Regression equation

The regression line is $Y = 160.39619 + 0.06674X$.

## Prediction

From the regression line, prices of houses can be estimated. For instance, the regression line can be used to estimate the price of a house whose size is 3000 square feet. The computation of the price is shown below.

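The regression equation is

$$Y = 160.39619 + 0.06674X$$

where $Y$ is the house price in thousands of dollars and $X$ is the house area in square feet. Substituting $X = 3{,}000$:

$$Y = 160.39619 + (0.06674 \times 3{,}000) = 160.39619 + 200.22 = 360.61619$$

Converting from thousands of dollars:

$$360.61619 \times 1{,}000 = \$360{,}616.19$$

A house of 3,000 square feet is therefore predicted to cost about $360,616. Using the more precise slope of 0.066744 instead gives 360.62819 thousand, so rounding the coefficients changes the estimate by only about 12 dollars.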
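The prediction step can be wrapped in a small function. This is a sketch assuming the rounded coefficients of the regression equation $Y = 160.39619 + 0.06674X$; `predict_price` is a hypothetical name, and the factor of 1,000 converts the output from thousands of dollars to dollars.

```python
def predict_price(area_sqft, b0=160.39619, b1=0.06674):
    """Predicted house price in dollars for a given area in square feet.

    The fitted line gives price in thousands of dollars, so the result is
    multiplied by 1,000.
    """
    price_thousands = b0 + b1 * area_sqft
    return price_thousands * 1000


estimate = predict_price(3000)
print(f"Predicted price for 3,000 sq ft: ${estimate:,.2f}")
```

Passing `b1=0.066744`, the more precise slope reported with the regression results, changes the estimate by only about 12 dollars.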