Multiple Regression and Correlation Analysis Project Research Paper

Table of Contents

Introduction
Correlation matrix
Multicollinearity
Multiple regression analysis
Assumptions and testing
Conclusion
Works Cited

Introduction

In this paper, a regression model will be developed to express the gross domestic product (GDP) of Qatar as a linear function of foreign direct investment, the budget deficit, and money supply. Further, a correlation matrix will be developed for the variables.

Finally, a comprehensive analysis will be carried out on the regression model developed. The data used in the analysis will be collected from The World Bank, International Monetary Fund and Trading Economics (The World Bank 1; International Monetary Fund 1; Trading Economics 1).

Correlation matrix

The correlation coefficient measures the strength and direction of the linear association between two variables. The correlation matrix for the variables is presented in Table 1. Based on the results of the correlation analysis, there is a relatively weak positive relationship between gross domestic product (GDP) and foreign direct investment (FDI). The value of the correlation coefficient between the two variables is 0.4611.

Secondly, the correlation coefficient between GDP and the budget deficit is 0.9487. It indicates a strong positive relationship between the two variables. Further, the correlation coefficient between GDP and money supply (M3) is 0.9933. It shows a very strong positive relationship between the two variables.

Also, the correlation coefficient between GDP and the consumer price index is 0.9280. It shows a strong positive relationship between the variables. Thus, with the exception of FDI, there is a strong positive relationship between the explained variable and each of the explanatory variables (Verbeek 28).

Table 1. Correlation matrix

	Gross domestic product	Foreign direct investment	Budget deficit	Money supply (M3)	Consumer price index
Gross domestic product	1
Foreign direct investment	0.461132	1
Budget deficit	0.948729	0.211088	1
Money supply (M3)	0.99335	0.491049	0.944616	1
Consumer price index	0.928099	0.699044	0.778696	0.920747	1
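
As a minimal sketch of how each entry in Table 1 is obtained, the Python function below computes the Pearson correlation coefficient for two series. The series `gdp` and `fdi` are hypothetical placeholder values chosen only for illustration; they are not the Qatar data used in the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each series.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical placeholder series (NOT the data from the paper).
gdp = [10.0, 12.0, 15.0, 19.0, 24.0]
fdi = [1.0, 0.8, 1.5, 1.2, 2.0]

r = pearson(gdp, fdi)
print(f"r = {r:.4f}")
```

A value close to +1 or -1 indicates a strong linear association, while a value near 0 indicates a weak one.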

Multicollinearity

Multicollinearity is a scenario where there is a strong association between the explanatory variables, so that the correlation coefficients between the independent variables are high. Multicollinearity among the independent variables inflates the standard errors of the estimated coefficients and makes the regression results unstable.

Based on the correlation matrix, there is a strong positive correlation between money supply and the budget deficit. The value of the coefficient is 0.9446. Also, there is a moderate positive correlation between money supply and FDI, that is, 0.4910.

Therefore, money supply will be dropped in the formulation of the regression model. This should make the coefficient estimates of the remaining variables more reliable (Verbeek 28).
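
The screening described above can be sketched in Python using the coefficients from Table 1. The 0.8 cutoff for a "high" correlation is an assumed rule of thumb, not a threshold stated in the paper. The variance inflation factor (VIF) for a model with exactly two regressors, 1 / (1 - r²), is included as a common supplementary check; the function names are hypothetical.

```python
# Pairwise correlations between the explanatory variables, from Table 1.
pairs = {
    ("FDI", "Budget deficit"): 0.211088,
    ("FDI", "Money supply (M3)"): 0.491049,
    ("Budget deficit", "Money supply (M3)"): 0.944616,
    ("FDI", "Consumer price index"): 0.699044,
    ("Budget deficit", "Consumer price index"): 0.778696,
    ("Money supply (M3)", "Consumer price index"): 0.920747,
}

THRESHOLD = 0.8  # assumed rule-of-thumb cutoff, not taken from the paper

def high_pairs(correlations, threshold):
    """Return the variable pairs whose absolute correlation exceeds the threshold."""
    return [pair for pair, r in correlations.items() if abs(r) > threshold]

def vif_two_regressors(r):
    """VIF of either regressor when the model has exactly two regressors."""
    return 1.0 / (1.0 - r ** 2)

flagged = high_pairs(pairs, THRESHOLD)
vif = vif_two_regressors(pairs[("FDI", "Budget deficit")])

for pair in flagged:
    print("High correlation:", pair, pairs[pair])
print(f"VIF for FDI and budget deficit: {vif:.4f}")
```

Under these assumptions, both flagged pairs involve money supply, and the VIF for the two retained regressors is close to 1, which suggests little collinearity between FDI and the budget deficit.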

Multiple regression analysis

The multiple regression model is used when the regression equation contains more than one explanatory variable. Following the correlation analysis, the explained variable is GDP (Y) while the explanatory variables are FDI (X₁) and the budget deficit (X₂). The regression line will take the form

Y = β₀ + β₁X₁ + β₂X₂ + ε

when the ordinary least squares method is used. The simplified regression equation is Y = b₀ + b₁X₁ + b₂X₂. Further, it is necessary to specify the theoretical expectation. It is expected that b₀ will take any value while b₁ and b₂ will be positive (b₁ > 0 and b₂ > 0).

Regression equation

The results of the regression analysis are presented in Table 2. From the table, the value of b₀ is 12,296,921,623, b₁ is 7.1175, and b₂ is 0.41198. The least squares regression equation is shown below.

Y = 12,296,921,623 + 7.1175X₁ + 0.41198X₂

Table 2. Regression output

Summary Output

Regression Statistics
Multiple R	0.98555145
R Square	0.97131167
Adjusted R Square	0.96870364
Standard Error	9620452837
Observations	25

	Coefficients	Standard Error	t Stat	P-value
Intercept	1.2297E+10	2365727509	5.19795	3.267E-05
X Variable 1 (FDI)	7.11747715	0.9630547	7.39052	2.144E-07
X Variable 2 (Budget deficit)	0.41198431	0.017080311	24.1204	2.571E-17

Interpretation of the coefficient

The coefficient b₁, 7.1175, shows that if FDI increases by one unit while the budget deficit is held constant, GDP will increase by 7.1175 units. Thus, a unit increase in FDI results in a more than proportionate increase in GDP.

The second coefficient, b₂, is 0.41198. It shows that an increase in the budget deficit by one unit, with FDI held constant, results in an increase in GDP of 0.41198 of a unit. Thus, a unit increase in the budget deficit results in a less than proportionate change in GDP. The positive coefficients are in agreement with the theoretical expectations (Verbeek 28).

Confidence interval for each slope

The confidence interval is calculated from the values of a sample drawn from a given population. It gives the range within which the population parameter is expected to lie at a stated level of confidence. Because the population parameter is not known, the confidence interval does not give its exact value.

The intervals will be calculated at the 95% confidence level. The confidence interval is calculated using the formula shown below.

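Sample statistic ± t value * standard error of the coefficient

The standard error of each coefficient reported in Table 2 already reflects the sample size, so it is not divided by √n. With 25 observations and two explanatory variables, there are 22 degrees of freedom, and the two-tailed t value at the 95% confidence level is 2.0739.

b₁ = 7.1175 ± 2.0739 * 0.9630547

= 7.1175 ± 1.9973

= 5.1202 ≤ b₁ ≤ 9.1148

The confidence interval indicates that the slope of FDI for the entire population is expected to lie between 5.1202 and 9.1148.

b₂ = 0.41198 ± 2.0739 * 0.017080311

= 0.41198 ± 0.03542

= 0.37656 ≤ b₂ ≤ 0.44740

The confidence interval indicates that the slope of the budget deficit for the entire population is expected to lie between 0.37656 and 0.44740.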
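
A minimal sketch of the textbook interval for a regression slope, b ± t × SE(b), using the coefficients and standard errors from Table 2. This form does not divide the standard error by √n, because the coefficient standard error reported in the regression output already reflects the sample size. The critical value 2.0739 (two-tailed, 5%, 22 degrees of freedom) is taken from a standard t-table and hard-coded as an assumption, and the helper name `slope_interval` is hypothetical.

```python
# Assumed critical value: two-tailed 5% t value for n - k - 1 = 25 - 2 - 1 = 22 df.
T_CRIT = 2.0739

def slope_interval(coef, std_err, t_crit=T_CRIT):
    """Return (lower, upper) bounds of coef +/- t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin

# Coefficients and standard errors from Table 2.
fdi_low, fdi_high = slope_interval(7.11747715, 0.9630547)
deficit_low, deficit_high = slope_interval(0.41198431, 0.017080311)

print(f"FDI slope:            {fdi_low:.4f} to {fdi_high:.4f}")
print(f"Budget deficit slope: {deficit_low:.5f} to {deficit_high:.5f}")
```

Neither interval contains zero, which is consistent with the small p-values reported in Table 2.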

Standard error

The standard error is similar to the standard deviation, but it describes the sampling variability of an estimate rather than the spread of the observations. It measures the accuracy with which the sample statistic represents the population parameter. A small value of the standard error is desirable because it indicates that the sample statistic is close to the population parameter. The standard error of the regression model is 9,620,452,837.

The value is large in absolute terms, which reflects the scale on which GDP is measured. Relative to that scale, the R-square of 0.9713 in Table 2 indicates that the residual variation is small. Further, the standard error of the coefficient of FDI is 0.9630547 while that of the coefficient of the budget deficit is 0.0170803.

These values are low relative to the coefficients, which shows that the coefficients of the explanatory variables are estimated precisely. Each coefficient is many times larger than its standard error, which indicates that the coefficients are different from zero.

ANOVA

ANOVA analyzes the difference between means. First, ANOVA estimates the mean of each sample separately. This gives the sample mean. Secondly, it estimates the mean of all the samples put together. This gives the overall mean. Further, it estimates the deviation of each observation in each sample from the corresponding sample mean. This gives the variation within the samples (unexplained variance).

The next step entails estimating the deviation of each sample mean from the overall mean. This gives the variation between samples (explained variance). The final step is the division of the variation between the samples by the variation within the samples, after each has been divided by its degrees of freedom. In a regression, the explained variation is the regression sum of squares and the unexplained variation is the residual sum of squares.

This gives the F-statistic. The significance F is estimated from the F ratio and the degrees of freedom of the numerator and the denominator. The significance F indicates whether all the samples are drawn from populations with the same mean (Verbeek 28).

Prediction

Point prediction

FDI – Highest value = 8124736264

Budget deficit – Highest value = 4.34105E+11

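Substituting these values into the regression equation gives a point prediction of 2.48969E+11. It is the predicted value of GDP when both explanatory variables are at their maximum observed values.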
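
The point prediction can be reproduced by substituting the two maximum values into the fitted equation. The sketch below uses the coefficients from Table 2; the function name `predict_gdp` is hypothetical.

```python
# Coefficients of the fitted equation, from Table 2.
B0 = 12_296_921_623  # intercept
B1 = 7.11747715      # slope for FDI (X1)
B2 = 0.41198431      # slope for the budget deficit (X2)

def predict_gdp(fdi, budget_deficit):
    """Evaluate Y = b0 + b1*X1 + b2*X2 for the given inputs."""
    return B0 + B1 * fdi + B2 * budget_deficit

# Highest observed values reported in the paper.
point = predict_gdp(8_124_736_264, 4.34105e11)
print(f"{point:.5E}")
```

The same function can be evaluated at any other pair of FDI and budget deficit values within the range of the data.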

Confidence interval

X₁ = 8124736264 ± 1.96 * 417221610.5 / √25

= 8124736264 ± 163550871.3

= 7961185393 ≤ X₁ ≤ 8288287135

X₂ = 4.34105E+11 ± 1.96 * 23524584830 / √25

= 4.34105E+11 ± 9221637253

= 4.24883E+11 ≤ X₂ ≤ 4.43327E+11

Each confidence interval is centered on the highest observed value of the variable, FDI (X₁) or the budget deficit (X₂), and gives the range within which the corresponding population value is expected to lie.

Prediction interval

X₁ = 8124736264 ± 1.96 * 417221610.5 * √(1 + 1/25)

= 8124736264 ± 833949082

= 7290787182 ≤ X₁ ≤ 8958685346

X₂ = 4.34105E+11 ± 1.96 * 23524584830 * √(1 + 1/25)

= 4.34105E+11 ± 4.7021E+10

= 3.87E+11 ≤ X₂ ≤ 4.81E+11

The prediction interval covers a wider range than the confidence interval. It gives the range within which a future value is expected to lie.
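
The arithmetic behind the two interval widths for FDI can be reproduced as follows. This is only a sketch of the calculation as the paper performs it, with z = 1.96, n = 25, and the value 417221610.5 that the paper uses as the spread measure. The bounds reported for the prediction interval are consistent with multiplying by √(1 + 1/n). The variable names are hypothetical.

```python
import math

Z = 1.96                # normal critical value used in the paper
N = 25                  # number of observations
CENTER = 8_124_736_264  # highest FDI value
SPREAD = 417_221_610.5  # spread measure used in the paper for FDI

# Confidence interval margin: z * s / sqrt(n)
ci_margin = Z * SPREAD / math.sqrt(N)

# Prediction interval margin: z * s * sqrt(1 + 1/n)
pi_margin = Z * SPREAD * math.sqrt(1 + 1 / N)

print(f"Confidence interval: {CENTER - ci_margin:.0f} to {CENTER + ci_margin:.0f}")
print(f"Prediction interval: {CENTER - pi_margin:.0f} to {CENTER + pi_margin:.0f}")
```

The prediction margin is larger because it adds the variability of a single future observation to the uncertainty of the estimate.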

Validity of the regression

Hypothesis

Null hypothesis H0: β1 = β2 = … = βp = 0

Alternative hypothesis H1: βj ≠ 0, for at least one value of j

The null hypothesis implies that the overall regression line is not significant. The alternative hypothesis implies that the overall regression line is significant.

Test statistic

The overall significance of the regression model will be analyzed using a one-tailed (right-tailed) F-test. The test will be carried out at the 5% significance level.

Rejection rule

The null hypothesis will be rejected if the value of F-calculated is greater than the value of F-tabulated.

Decision

The ANOVA table is presented in Table 3. The F-calculated is 372.4311 while the value of F-tabulated, with 2 and 22 degrees of freedom, is 3.4434. Therefore, the null hypothesis will be rejected at the 5% level of significance. This implies that the overall regression is significant at the 5% level of significance.

The value of R-square is 97.13% while the value of the adjusted R-square is 96.87%. The value of R-square indicates that the explanatory variables explain 97.13% of the variations in the explained variable. It is an indication of a strong regression model.

Table 3. Summary of ANOVA

ANOVA
	df	SS	MS	F	Significance F
Regression	2	6.89393E+22	3.45E+22	372.4311648	1.08332E-17
Residual	22	2.03617E+21	9.26E+19
Total	24	7.09755E+22
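
The entries in Table 3 and the regression statistics in Table 2 are linked by simple identities, sketched below in Python. The sums of squares and degrees of freedom come from Table 3; the variable names are hypothetical.

```python
import math

# Sums of squares and degrees of freedom from Table 3.
ss_reg, df_reg = 6.89393e22, 2
ss_res, df_res = 2.03617e21, 22
ss_total = ss_reg + ss_res
df_total = df_reg + df_res

# Mean squares are sums of squares divided by their degrees of freedom.
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res

# The F-statistic is the ratio of explained to unexplained mean squares.
f_stat = ms_reg / ms_res

# R-square, adjusted R-square, and the standard error of the regression.
r_squared = ss_reg / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_res
std_error = math.sqrt(ms_res)

print(f"F = {f_stat:.4f}")
print(f"R-square = {r_squared:.4f}, adjusted R-square = {adj_r_squared:.4f}")
print(f"Standard error = {std_error:.4E}")
```

Small differences from the tabulated figures can arise because the sums of squares in Table 3 are rounded.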

Test of significance

Hypothesis

Null hypothesis: H0: bi = 0 (the variable is not a significant determinant)

Alternative hypothesis: H1: bi ≠ 0 (the variable is a significant determinant)

The null hypothesis implies that the variable is not a significant determinant of GDP. The alternative hypothesis implies that the variable is a significant determinant of GDP.

Test statistic

The test of significance of each explanatory variable will be carried out using a t-test. A two-tailed t-test is carried out at the 95% level of confidence.

Rejection rule

The null hypothesis will be rejected if the absolute value of t-calculated is greater than the value of t-tabulated.

Decision

The value of the t-statistic for b₁ is 7.3905 while that for b₂ is 24.1204. The value of t-tabulated for a two-tailed test with 22 degrees of freedom is 2.0739 at the 5% significance level. Since each t-statistic is greater than t-tabulated, the null hypothesis is rejected, and it can be concluded that FDI and the budget deficit are significant determinants of GDP at the 5% level of significance.
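
Each t-statistic in Table 2 is the coefficient divided by its standard error, and the decision rule compares its absolute value with the tabulated value. In the sketch below, the critical value 2.0739 (two-tailed, 5%, 22 degrees of freedom) is an assumption taken from a standard t-table, and the dictionary layout is hypothetical.

```python
# Assumed critical value: two-tailed 5% t value for 22 degrees of freedom.
T_CRIT = 2.0739

# (coefficient, standard error) pairs from Table 2.
coefficients = {
    "FDI": (7.11747715, 0.9630547),
    "Budget deficit": (0.41198431, 0.017080311),
}

def t_statistic(coef, std_err):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err

t_values = {name: t_statistic(c, se) for name, (c, se) in coefficients.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_values.items()}

for name, t in t_values.items():
    verdict = "reject H0" if significant[name] else "fail to reject H0"
    print(f"{name}: t = {t:.4f} -> {verdict}")
```

Under this assumed critical value, the null hypothesis is rejected for both explanatory variables.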

Assumptions and testing

The first assumption of a multiple regression model is that the error terms are normally distributed. Secondly, the model assumes that there exists a linear relationship between the explained variable and the explanatory variables. Third, the variables used in the analysis are measured without error. Finally, the model assumes that the variance of the error term is constant (Vinod 78).

Conclusion

The paper developed a regression model for the relationship between GDP (as the explained variable) and FDI and the budget deficit (as the explanatory variables). The discussion above indicates that FDI and the budget deficit are valid and significant determinants of GDP.

Besides, the variables explain a significant proportion of the variations of the dependent variable. Also, the overall regression line is significant. The regression line can be improved by adding other significant variables and increasing the sample size.

Works Cited

International Monetary Fund. Data and Statistics. 2013. Web.

The World Bank. World Economic Indicators. 2013. Web.

Trading Economics. Qatar Economic Indicators. 2013. Web.

Verbeek, Marno. A Guide to Modern Econometrics. England: John Wiley & Sons, 2008. Print.

Vinod, Hrishikesh. Hands on Intermediate Econometrics Using R: Templates for Extending Dozens of Practical Examples. New Jersey: World Scientific Publishers, 2008. Print.