Updated: May 29th, 2026

Sydney Real Estate Market Analysis: Prices, Area, and Key Determinants Case Study


Table of Contents

Abstract
Introduction
Formation of the Random Sample
Descriptive Statistics
Inferential Analysis
Regression Analysis
Multiple Regression
Conclusion
References

Abstract

The present report aimed to conduct an analysis of the Sydney real estate market. Using a random sampling algorithm, 75 rows were selected from a dataset of 500 rows for analysis. The in-depth examination of the data included calculating descriptive statistics and confidence intervals, examining the shape of the distributions, testing the sample mean against a target value, comparing means between groups, and performing simple and multiple regression analyses. This report describes the results and conclusions of the statistical tests conducted, with broad practical perspectives for informed market and management decisions.

Introduction

Conducting statistical tests is a feasible strategy for generating valid conclusions with practical implications for meaningful and rational decision-making. Unlike management approaches based solely on leaders’ opinions and often false perceptions of market performance, statistical analysis is data-driven, thereby reducing the risk of biased results and systematic errors (Black, 2023). This report utilizes a comprehensive statistical framework that includes both descriptive measures, which allow for identifying surface trends in variable distributions, and inferential tools, which facilitate the exploration of relationships between variables. The multifaceted approach to statistical analysis in this paper was motivated by the desire to investigate the real estate market as deeply and comprehensively as possible and to identify the determinants of property prices.

The present statistical analysis focused on the real estate market in Sydney, for which 500 rows were collected across 11 variables of interest. The variables included a unique property identification number (ID), property price in thousands of dollars (Price), property area in square meters (Lot size), number of bedrooms, number of bathrooms, number of floors, presence of a driveway to the house, presence of a recreation room in the house, whether gas is used to heat the house, presence of central air conditioning, and number of garage spaces. Thus, the data collected for 500 random properties were sufficiently general to describe the market and sufficient to conduct an in-depth statistical study. This report describes the results of the tests conducted, including examining descriptive statistics for variables, constructing confidence intervals to estimate population parameters, and conducting regression analysis to examine relationships between predictors and price. Thus, the paper’s overall objective was to conduct a comprehensive analysis of the Sydney real estate market.

Formation of the Random Sample

The original dataset consisted of 500 records; however, to create a representative sample, only 75 rows were selected. To make the selection random, a data selection procedure based on a table of random numbers and the student’s personal identification number was used. Specifically, the last three digits of the student’s ID number were 279, meaning that row 79 and the 2nd column of the random number table were taken as the starting point.

Starting with the value at the intersection of these indices and proceeding downward, 75 random three-digit numbers within the range 0 to 500 were selected; the selected values are shown in Appendix A. Because the sample could not contain duplicates and some random values were repeated, index selection continued until all 75 numbers were unique. Once these values were sorted, the =VLOOKUP() function was applied to transfer the remaining information across the ten other variables for each randomly selected property.
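The selection procedure can be sketched in Python. This is only an illustration under stated assumptions: the original work used a printed random number table and Excel's VLOOKUP, whereas the sketch substitutes a seeded pseudo-random generator and a dictionary lookup. The names `draw_unique_ids`, `lookup_rows`, and the toy `dataset` are hypothetical, and property IDs are assumed to run from 1 to 500.

```python
import random


def draw_unique_ids(n_sample, max_id, seed):
    """Draw IDs one at a time, skipping repeats, until n_sample unique IDs exist."""
    rng = random.Random(seed)  # stand-in for reading down a random number table
    chosen = []
    seen = set()
    while len(chosen) < n_sample:
        candidate = rng.randint(1, max_id)  # assumption: IDs run from 1 to max_id
        if candidate in seen:
            continue  # duplicate value: move on to the next random number
        seen.add(candidate)
        chosen.append(candidate)
    return sorted(chosen)


def lookup_rows(ids, dataset):
    """Fetch the full record for each selected ID (the role VLOOKUP plays in Excel)."""
    return [dataset[i] for i in ids]


# Toy stand-in for the 500-row dataset: ID -> record with a placeholder field.
dataset = {i: {"ID": i, "Price": None} for i in range(1, 501)}
sample_ids = draw_unique_ids(n_sample=75, max_id=500, seed=279)
sample = lookup_rows(sample_ids, dataset)
print(len(sample), len(set(sample_ids)))
```

Skipping duplicates and continuing to draw is equivalent to sampling without replacement, which is why the final list always contains exactly 75 distinct properties.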

Descriptive Statistics

Two variables in the dataset were of increased interest within the research project: property value and area. The results of the descriptive analysis conducted for these are shown in Table 1: this includes both measures of central tendency and measures of variability in the distributions. The sample mean property value was 874.80 thousand dollars (SD = 419.26), and the sample mean property area was 522.62 square meters (SD = 212.33). The minimum property value was 398.75 thousand dollars, and the maximum was 2,375.00 thousand dollars; for the properties’ areas, the minimum and maximum were 165 and 1,146 square meters, respectively.

Among the additional characteristics of interest are the kurtosis and skewness indicators, which describe the distribution’s shape. For a normal (bell-shaped) distribution, kurtosis equals three (equivalently, excess kurtosis equals zero) and skewness is approximately zero; because the sample values depart from these benchmarks, neither variable appears to be normally distributed (Turney, 2023a; Turney, 2023b). However, further graphical analysis is required to examine the distributions’ shapes more closely.
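How these two shape indicators are computed can be illustrated with a minimal Python sketch. It uses plain central moments on made-up toy lists, not the study's data, and the function name `shape_statistics` is hypothetical. Spreadsheet functions such as Excel's SKEW and KURT apply small-sample corrections, and KURT reports excess kurtosis (normal reference value of zero rather than three), so their output will differ somewhat from these moment-based values.

```python
def shape_statistics(values):
    """Return (skewness, kurtosis, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3.0


# Toy data only: one symmetric list and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

sym_skew, sym_kurt, sym_excess = shape_statistics(symmetric)
right_skew, right_kurt, right_excess = shape_statistics(right_skewed)
print(sym_skew, right_skew)
```

A positive skewness, as in the second list, corresponds to a distribution whose tail extends toward larger values.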

Table 1 — Results of descriptive statistics for the variables of interest


As an additional analysis, the quartile statistics for both variables were examined. Table 2 shows the first, second (median), and third quartiles for the price and property area distributions. From the calculations, the middle 50% of all price values lie within the interquartile range of 562.50 to 1,062.50 thousand dollars, a spread of exactly 500.00 thousand dollars. For the property area, the middle 50% of all observations lie within the interquartile range of 347.50 to 660.00 square meters, a spread of 312.50 square meters.
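The interquartile spreads quoted above follow directly from the first and third quartiles. A minimal sketch of that arithmetic, using only the quartile values reported in the text (the dictionary keys are illustrative names, not variable names from the dataset):

```python
# First and third quartiles as reported in the text: (Q1, Q3).
quartiles = {
    "price_thousand_dollars": (562.50, 1062.50),
    "area_square_meters": (347.50, 660.00),
}

# Interquartile range = Q3 - Q1, the width of the middle 50% of observations.
iqr = {name: q3 - q1 for name, (q1, q3) in quartiles.items()}

for name, spread in iqr.items():
    print(name, spread)
```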

Table 2 — Quartile statistics for the two variables of interest

Figure 1 and Figure 2 display the constructed histograms for the price and property area variables in the generated random sample. From Figure 1, the histogram is not bell-shaped, indicating a significant departure from normality. The shape of this distribution can be classified as skewed right, and thus, the location of measures of central tendency in this distribution is as follows: mode < median < mean (Hasanalipour & Razmkhah, 2022). Notably, the distribution is entirely devoid of values from 1,798.75 to 2,148.75 thousand dollars.

In Figure 2, which characterizes the shape of the area variable, the right skew is also noticeable, but to a much lesser extent. Nevertheless, the distribution of this variable is not normal either, and the deviation from a bell shape is substantial. Thus, in both cases, observations are concentrated at lower values, which means that as price and area increase, such properties become less frequent in the sample.

Figure 1 — Distribution histogram for the price variable of the sample.

Figure 2 — Distribution histogram for the area variable of the properties in the sample.

Inferential Analysis

One-Sample T-Test

In addition to the descriptive examination of trends in the data, the calculation of inferential statistics was also of analytical interest. More specifically, the alternative hypothesis was that the average house price in Sydney is not equal to 835 thousand dollars (μ ≠ 835). In contrast, the null hypothesis was that the average house price in Sydney is equal to the target (μ = 835).

Because the alternative hypothesis was nondirectional, a two-tailed statistical test was required. The one-sample t-test was chosen to assess the statistical significance of the difference between the sample mean price of the 75 properties and the target value (SL, 2021). The test results indicated insufficient evidence to reject the null hypothesis (t(74) = 0.822, p = .414). Therefore, the sample mean price was not significantly different from $835,000. The calculated 95% confidence interval indicated that the population mean price could be expected to fall between 779.92 and 969.68 thousand dollars. An interval constructed in this way fails to capture the true mean 5% of the time; raising the confidence level would yield a broader range of potential values.
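The reported test statistic can be approximately reproduced from the summary values alone. The sketch below is an illustration, not the original calculation: it uses only the sample size, mean, and standard deviation quoted in the text, and it assumes a normal critical value (about 1.96) for the interval, because that choice matches the reported limits to within rounding. A t critical value with 74 degrees of freedom (about 1.99) would give a slightly wider interval.

```python
import math
from statistics import NormalDist

# Summary values quoted in the text (prices in thousands of dollars).
n = 75
sample_mean = 874.80
sample_sd = 419.26
mu0 = 835.0  # hypothesized population mean

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error
df = n - 1

# Assumption: a normal critical value is used for the 95% interval.
z_crit = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z_crit * standard_error
ci_high = sample_mean + z_crit * standard_error

print(round(t_stat, 3), df, round(ci_low, 2), round(ci_high, 2))
```

Because the hypothesized value of 835 lies inside the interval, the interval and the two-tailed test lead to the same conclusion.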

Two-Sample T-Test

Inferential analysis can also help test the statistical significance of differences in mean values between two groups. The presence of a recreation room in the house (“recreation”) was used as a criterion for dividing the entire sample into groups. As a result of filtering, Group A (no recreation room) contained 64 rows, and Group B (with recreation room) had only 11 rows, which was nevertheless sufficient to perform a two-sample t-test (SL, 2022).

The results of the analysis are shown in Table 3. For the nondirectional hypotheses, no statistically significant difference was found between the mean house prices of the two groups (t(73) = -0.45, p = .65). This implied that the mean price of properties with a recreation room (M = 928.07, SD = 330.80) was not significantly different from the mean price of properties without one (M = 865.64, SD = 433.82). Additionally, a 95% confidence interval was constructed for the difference in means between the two groups: with 95% confidence, the population difference in means could be expected to lie between -336.63 and 211.78 thousand dollars (Table 4).
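The pooled two-sample t statistic can likewise be recomputed from the group summaries. This is a sketch under an explicit assumption: the group sizes are taken as 64 (no recreation room) and 11 (with recreation room), which is what the reported 73 degrees of freedom and the overall sample mean of 874.80 imply. The variable names are illustrative.

```python
import math

# Group summaries quoted in the text (prices in thousands of dollars).
# Assumption: group sizes of 64 and 11, consistent with df = 73.
n_without, mean_without, sd_without = 64, 865.64, 433.82
n_with, mean_with, sd_with = 11, 928.07, 330.80

df = n_without + n_with - 2

# Pooled variance weights each group's variance by its degrees of freedom.
pooled_var = (
    (n_without - 1) * sd_without ** 2 + (n_with - 1) * sd_with ** 2
) / df
standard_error = math.sqrt(pooled_var * (1 / n_without + 1 / n_with))

mean_diff = mean_without - mean_with
t_stat = mean_diff / standard_error

print(df, round(mean_diff, 2), round(t_stat, 2))
```

The equal-variance (pooled) form is used here because the reported degrees of freedom equal the total sample size minus two.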

Table 3 — Results of the two-sample t-test


Table 4 — Calculation of 95% confidence interval limits for the mean difference


Regression Analysis

Another branch of inferential analysis is regression analysis, which allows for quantifying relationships between continuous variables and predicting values based on the regression models constructed. Figure 3 shows a scatter plot of real estate prices and property area. At first glance, the scatter plot shows a linear upward trend: as property area increases, property price is expected to increase as well.

Figure 3 also contains a regression line for the scattered data, showing the regression equation and the coefficient of determination; their discussion requires special care. First, the model’s explanatory power seems low, with a coefficient of determination (R²) of 0.3587, indicating that about 35.87% of the variance in price can be explained by property area. This is relatively low and reflects the linear model’s low accuracy (Black, 2023). At the level of graphical representation, the low accuracy is explained by the strong scatter of data points around the linear trend; therefore, further conclusions should be interpreted with caution.

Second, the slope of the regression equation is positive, indicating that price increases with property area. This effect is quantified as follows: for every additional square meter of property area, the price increases by 1.18 thousand dollars, implying that the price per square meter for the sample in Sydney is 1.18 thousand dollars. On the other hand, the y-intercept of this model indicates the predicted price of a property with zero area, which is numerically equal to 256.74 thousand dollars. This result can be interpreted in two ways: either as the value of an empty plot of land without development, or as an artifact of the regression model that has no physical meaning.

Figure 3 — Scatter plot for area and property price with the regression model plotted.

Given the low coefficient of determination, it was necessary to test the model’s statistical significance further; the results of the regression analysis are shown in Table 5. As can be seen from the analysis, the regression model was statistically significant (F(1,73) = 40.84, p < .05). Property area was likewise a statistically significant predictor (B = 1.183, p < .001).
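The reported F statistic is consistent with the coefficient of determination, since for a regression with k predictors and n observations F = (R²/k) / ((1 − R²)/(n − k − 1)). A minimal sketch of this check, using only values quoted in the text (the function name `f_from_r_squared` is hypothetical):

```python
def f_from_r_squared(r_squared, n_obs, n_predictors):
    """Overall F statistic implied by R-squared for an OLS model with an intercept."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1 - r_squared) / df_resid)


# Simple regression of price on property area: R^2 = 0.3587, n = 75, one predictor.
f_simple = f_from_r_squared(r_squared=0.3587, n_obs=75, n_predictors=1)
print(round(f_simple, 2))
```

The result agrees with the reported value up to rounding of R², which shows that a model can be statistically significant while still explaining only about a third of the variance.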

Table 5 also contains the limits of the 95% confidence interval for the slope: with 95% confidence, the actual value of the slope (cost per square meter) can be expected to lie between 0.814 and 1.551 thousand dollars per square meter. With the regression equation constructed, it is possible to predict a property’s price for a given property area. For example, if the house has an area of 585 square meters, its predicted price is calculated from 256.738 + 1.183 ∙ 585 and reported as 948.56 thousand dollars (the rounded coefficients shown here give approximately 948.79; the small difference is consistent with unrounded coefficients having been used in the original calculation).
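The prediction step can be written as a small function. This sketch uses the rounded intercept and slope printed in the text, so its output may differ in the decimals from the quoted figure if unrounded coefficients were used there; the function name `predict_price` is illustrative.

```python
# Rounded coefficients as printed in the text (price in thousands of dollars).
INTERCEPT = 256.738
SLOPE = 1.183  # thousand dollars per square meter


def predict_price(area_sq_m):
    """Predicted price (thousand dollars) from the simple regression equation."""
    return INTERCEPT + SLOPE * area_sq_m


predicted = predict_price(585)
print(round(predicted, 2))
```

Predictions are most trustworthy for areas inside the sampled range of 165 to 1,146 square meters; values outside it are extrapolations.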

Table 5 — Results of regression analysis for the two variables


Multiple Regression

Regression analysis performed for more than one predictor is called multiple regression. This approach allows us to assess the effects of multiple independent variables on the dependent variable while accounting for their correlations. Table 6 contains the results of the analysis: the first finding is that the model is statistically significant (F(9,65) = 19.73, p < .001). In addition, the model fits reasonably well (R² = 0.73), indicating that the combination of predictors explains about 73% of the variance in price.

Table 6 — Multiple regression results


Based on the obtained data, the multiple regression equation can be constructed:

Price = -282.31 + 0.44∙Size + 70.06∙Bed + 314.07∙Bath + 84.61∙Stories + 91.18∙Driveway + 62.05∙Recreation + 7.69∙Gasheat + 102.88∙Aircon + 95.43∙Garage.
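To show how the equation is applied, the sketch below evaluates it for one made-up property. The coefficients are those printed above; the property's characteristics are hypothetical and chosen only for illustration, with the yes/no features (driveway, recreation room, gas heating, air conditioning) assumed to be coded as 1 or 0.

```python
# Coefficients from the multiple regression equation (thousand dollars).
INTERCEPT = -282.31
COEFFICIENTS = {
    "Size": 0.44,
    "Bed": 70.06,
    "Bath": 314.07,
    "Stories": 84.61,
    "Driveway": 91.18,
    "Recreation": 62.05,
    "Gasheat": 7.69,
    "Aircon": 102.88,
    "Garage": 95.43,
}


def predict_price(features):
    """Predicted price (thousand dollars) for a dict of predictor values."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )


# Hypothetical property, not taken from the dataset.
example_property = {
    "Size": 585, "Bed": 3, "Bath": 1, "Stories": 2, "Driveway": 1,
    "Recreation": 0, "Gasheat": 0, "Aircon": 1, "Garage": 1,
}
example_price = predict_price(example_property)
print(round(example_price, 2))
```

Each coefficient is the expected change in price for a one-unit change in that predictor with all other predictors held fixed.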

The derived equation indicates that each predictor has a positive effect on price. However, only property area, number of bathrooms, number of stories, and number of garage spaces were statistically significant predictors; the number of bedrooms, driveway availability, recreation room, gas heating, and air conditioning were not. Consequently, excluding the non-significant predictors could produce a more parsimonious model. For the significant predictors, holding the others constant, the following effects were estimated:

A price increase of 0.44 thousand dollars for each additional square meter of property area.
A price increase of 314.07 thousand dollars for each additional bathroom.
A price increase of 84.61 thousand dollars for each additional story.
A price increase of 95.43 thousand dollars for each additional garage space.

Additional analysis of the residuals indicates that the model’s results should be treated with caution: the plot of residuals versus predicted price values shows a fan-shaped pattern, which suggests non-constant error variance (heteroscedasticity) (PSU, 2021; Figure 4).

Figure 4 — Graph of the dependence of residuals on the predicted values.

Conclusion

This paper aimed to comprehensively analyze Sydney’s real estate market and yielded several notable results. First, in a sample of 75 houses, the average property value did not differ significantly between houses with and without a recreation room. Second, it was shown that there is a linear relationship between property area and price, and that the model’s explanatory power increases when additional factors are included in the equation, such as the number of bathrooms, the number of stories, and the number of garage spaces, all of which are associated with a higher price as their values increase.

Third, it was shown that the average property value in Sydney was not significantly different from $835,000. Fourth, it was shown that most properties were concentrated at lower values of area and price, creating a rightward skew in the distributions. Thus, a comprehensive statistical analysis of the Sydney real estate market was conducted, yielding results of practical value.

References

Black, K. (2023). Business statistics: For contemporary decision making. John Wiley & Sons.

Hasanalipour, P., & Razmkhah, M. (2022). Inference on skew-normal distribution based on Fisher information in order statistics. Communications in Statistics-Simulation and Computation, 51(4), 1525-1541.

PSU. (2021). Residuals vs. predictor plot. Penn State ECS.

SL. (2021). One-sample T-test using SPSS Statistics. Laerd Statistics.

SL. (2022). Independent t-test for two samples. Laerd Statistics.

Turney, S. (2023a). What is kurtosis? | Definition, examples & formula. Scribbr.

Turney, S. (2023b). Skewness | Definition, examples & formula. Scribbr.

Appendix A — Selected random numbers
