Factors Influencing California House Prices Analyzed via Multiple Regression

Introduction

This report identifies the factors that influence house values in California. The data come from the 1990 census and describe housing in California districts (Zaky, 2019). The dataset has 20,640 rows and 10 columns, a mix of continuous and categorical variables. The median house value serves as the outcome variable. The predictor variables are longitude, latitude, total rooms, total bedrooms, population, median income, households, housing median age, and proximity to the ocean.

Methodology

This report uses multiple linear regression to predict house values in California. Prediction involves estimating an unknown outcome variable from predictor variables (Kang & Zhao, 2020). A multiple linear regression model (MLRM) is a supervised machine learning technique widely used across fields to predict continuous outcomes. It models the linear relationship between a single output variable and multiple input variables; on observational data such as these, that relationship is an association rather than proof of causation.
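For reference, the general form of a multiple linear regression model can be written as follows. The symbols are notation introduced here for illustration and do not appear in the original report: y_i is the outcome for observation i, x_{ij} is the value of predictor j, the betas are the coefficients to be estimated, and epsilon_i is the error term.

```latex
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
\]
```

Ordinary least squares, the method behind R's lm function, chooses the coefficients that minimize the sum of squared residuals.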

The model is a straightforward algorithm that provides insight into the relationship between the dependent and independent variables. Each estimated coefficient describes the expected change in the outcome variable for a one-unit change in the corresponding predictor, with the other predictors held constant. Multiple linear regression also yields a coefficient of determination (R-squared), the percentage of variation in the outcome variable accounted for by the model's independent variables. The model in this report includes the median house value as the outcome variable and the remaining variables as predictor variables.
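To make the definition of R-squared concrete, here is a minimal sketch that computes it by hand and compares it with the value reported by summary(). The data are simulated and the names x1, x2, y_sim, and fit_r2 are hypothetical; nothing here uses the report's dataset.

```r
# Simulated data, for illustration only (not the California housing data).
set.seed(1)
n     <- 200
x1    <- rnorm(n)
x2    <- rnorm(n)
y_sim <- 3 + 2 * x1 - 1 * x2 + rnorm(n)

fit_r2 <- lm(y_sim ~ x1 + x2)

# R-squared = 1 - (residual sum of squares / total sum of squares)
rss <- sum(residuals(fit_r2)^2)
tss <- sum((y_sim - mean(y_sim))^2)
r_squared_manual <- 1 - rss / tss

# The same quantity as stored in the model summary
r_squared_summary <- summary(fit_r2)$r.squared

print(c(manual = r_squared_manual, summary = r_squared_summary))
```

The two values agree because summary() uses the same ratio of sums of squares for a model with an intercept.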

Results and Analysis

A multiple linear regression model was fitted in RStudio to estimate the relationship between the outcome variable and the independent variables. The code used to fit the model and generate the results is as follows:

  • model1 <- lm(housing_median_age ~ longitude + latitude + total_rooms + total_bedrooms + population + households + median_income + ocean_proximity, data = house)
  • summary(model1)

The model is fitted with lm, R's function for fitting linear models. In the call as written, housing_median_age sits on the left of the ~ and is therefore the outcome variable, while the variables on the right are the inputs. The data argument names the data frame (house) that holds these variables. The summary function then summarizes the model and generates the output shown in Table 1.
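In an R formula, the variable to the left of ~ is the outcome, and a factor on the right is expanded into indicator columns that are each compared against a reference level. The sketch below shows both points on simulated data. The data frame toy, the vector shift, and the fitted object toy_fit are hypothetical names, and the numbers are invented; only the column names mirror the dataset's.

```r
# Simulated data, for illustration only (not the California housing data).
set.seed(42)
n <- 300
prox_levels <- c("<1H OCEAN", "INLAND", "NEAR BAY")
toy <- data.frame(
  median_income   = runif(n, 1, 10),
  ocean_proximity = factor(sample(prox_levels, n, replace = TRUE),
                           levels = prox_levels)
)

# Invented category effects, relative to "<1H OCEAN"
shift <- c("<1H OCEAN" = 0, "INLAND" = -50, "NEAR BAY" = 20)
toy$median_house_value <- 100 + 40 * toy$median_income +
  unname(shift[as.character(toy$ocean_proximity)]) + rnorm(n, sd = 10)

# Outcome on the left of ~, predictors on the right
toy_fit <- lm(median_house_value ~ median_income + ocean_proximity, data = toy)

response_name   <- all.vars(formula(toy_fit))[1]  # variable left of ~
reference_level <- levels(toy$ocean_proximity)[1] # baseline category
coef_names      <- names(coef(toy_fit))

print(response_name)
print(reference_level)
print(coef_names)
```

The first factor level has no coefficient of its own; it is absorbed into the intercept, which is why the other categories are reported as differences from it.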

Table 1 presents a summary of the model, including the coefficient estimate for each independent variable and its p-value. The intercept, the predicted median value of houses within a block when every numeric predictor equals zero and ocean proximity is at its reference level (<1H to the ocean), is significant (p < 0.05) and equals -$150.8. A negative value has no practical meaning here, because zero longitude and latitude lie far outside California. The other predictors are significant (p < 0.05) in predicting the median house price in California, except the island category of ocean proximity, which does not differ significantly from <1H to the ocean (p > 0.05).
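The estimates and p-values in a model summary can also be extracted programmatically. The following is a minimal sketch on simulated data; the names signal, noise, y_p, fit_p, coef_table, and significant are hypothetical, and the 0.05 threshold matches the one used in this report.

```r
# Simulated data, for illustration only (not the California housing data).
set.seed(3)
n      <- 200
signal <- rnorm(n)                  # predictor that truly affects the outcome
noise  <- rnorm(n)                  # predictor with no true effect
y_p    <- 1 + 2 * signal + rnorm(n)

fit_p <- lm(y_p ~ signal + noise)

# Matrix with one row per coefficient and columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_table <- coef(summary(fit_p))

# Flag coefficients whose p-value is below 0.05
significant <- coef_table[, "Pr(>|t|)"] < 0.05

print(round(coef_table, 4))
print(significant)
```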

Results indicate a negative relationship between longitude and house value (-2.260). Because longitude increases toward the east, this implies that, with the other predictors held constant, the farther east a house is, the lower its value: each additional degree of longitude is associated with a decrease of $2.260. Houses farther north are worth less than those farther south; a one-degree increase in latitude is associated with a drop of $2.275 in house value. An increase in the total number of rooms per block is associated with lower house prices: an additional room within a block reduces the median price by $0.0008374. Similarly, the more bedrooms a block has, the lower the house price; one additional bedroom within a block corresponds to a $0.008975 decrease.
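The reading of a coefficient as "the change in the predicted outcome for a one-unit increase in that predictor, other predictors held fixed" can be checked directly. This sketch uses simulated data with invented coefficients; sim, sim_fit, base_case, and shifted_case are hypothetical names.

```r
# Simulated data, for illustration only (not the California housing data).
set.seed(7)
n   <- 250
sim <- data.frame(
  latitude    = runif(n, 32, 42),
  total_rooms = runif(n, 100, 5000)
)
sim$value <- 500 - 2.3 * sim$latitude + 0.001 * sim$total_rooms + rnorm(n)

sim_fit <- lm(value ~ latitude + total_rooms, data = sim)

# Two hypothetical blocks that differ only by one degree of latitude
base_case    <- data.frame(latitude = 36, total_rooms = 2000)
shifted_case <- transform(base_case, latitude = latitude + 1)

# Difference in predictions equals the latitude coefficient
diff_pred <- unname(predict(sim_fit, shifted_case) - predict(sim_fit, base_case))
slope     <- unname(coef(sim_fit)["latitude"])

print(c(difference = diff_pred, coefficient = slope))
```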

The number of people residing within a block is negatively related to house values in California. An additional person in a residential block is associated with a $0.0006697 reduction in the median house value. There is a positive relationship between the total number of households and the median house value: an additional household within a block is associated with a $0.005682 increase. A one-unit increase in the median household income of a block is associated with a fall of $11,100 in house prices.

Inland houses have a median value $3.102 lower than houses <1H to the ocean, whereas houses near the bay are worth $8.639 more. Houses near the ocean are worth $1.024 less than those <1H to the ocean. The overall multiple linear regression model predicts median house prices significantly (F(11, 20421) = 600.8, p < 0.05). The model explains 24.41% of the variation in house prices in California, which leaves 75.59% of the variation in the median house value unexplained by the predictors in the model.
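The reported figures can be unpacked with a little arithmetic. The sketch below uses only numbers stated in this report (R-squared of 24.41%, F(11, 20421) = 600.8, and 20,640 rows); the variable names are hypothetical. It assumes the usual relation for a model with an intercept, residual degrees of freedom = observations used - predictors - 1.

```r
# Unexplained share of the variation
r_squared_pct   <- 24.41
unexplained_pct <- 100 - r_squared_pct

# Degrees of freedom reported with the F statistic
df_model <- 11      # number of estimated slopes
df_resid <- 20421   # residual degrees of freedom

# Observations actually used in the fit, and rows not used
n_used    <- df_resid + df_model + 1
n_rows    <- 20640
n_dropped <- n_rows - n_used

# Upper-tail probability of the F statistic
p_value <- pf(600.8, df_model, df_resid, lower.tail = FALSE)

print(c(unexplained_pct = unexplained_pct, n_used = n_used, n_dropped = n_dropped))
print(p_value < 0.05)
```

The degrees of freedom imply that 20,433 of the 20,640 rows entered the fit. That would be consistent with lm dropping rows that contain missing values, which is its default behavior, although the report does not state this.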

Table 1: Multiple Regression Model to Predict California’s House Value.

Conclusion

The fitted multiple linear regression model provides a statistically significant prediction of house prices in California. However, it accounts for only about 24% of the variation in house prices in the area. House prices are predicted by median income, longitude, population, latitude, total number of bedrooms, total number of households, housing median age, and total number of rooms. Other significant predictors are the near-ocean, near-bay, and inland categories of ocean proximity, each compared with <1H to the ocean.

References

Kang, H., & Zhao, H. (2020). . Journal of Physics: Conference Series, 1631(1), 1-7.

Zaky, A. (2019). . [Data set].
