The Collinearity and Multicollinearity Concepts


What are collinearity and multicollinearity?

Collinearity is the phenomenon that arises in a multiple regression when two predictor variables are correlated with one another; when more than two predictors are involved in such relationships, the phenomenon is known as multicollinearity. Multicollinearity does not tend to affect the overall fit of the regression model, as measured by statistics such as the F-ratio and R2, and it should not affect the predictions the model produces. However, when one is interested in the individual predictors and their separate effects, multicollinearity is likely to represent a significant challenge, because it inflates the standard errors of the affected coefficients [1].
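To illustrate that contrast, here is a minimal numpy sketch (made-up data, not from the source): the same model is refitted on several samples with two nearly collinear predictors, and the individual coefficients swing widely while the fitted predictions barely change.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

n = 80
coefs, preds = [], []
x_new = np.array([1.0, 1.0, 1.0])            # a fixed point to predict at

# Re-draw the sample several times with two highly correlated predictors.
for _ in range(5):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)      # nearly collinear with x1
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = fit_ols(X, y)
    coefs.append(b)
    preds.append(x_new @ b)

print(np.round(np.array(coefs), 2))   # individual coefficients swing widely
print(np.round(np.array(preds), 2))   # predictions at x_new stay close to 5
```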

How common is collinearity or multicollinearity?

Multicollinearity or collinearity can be found in the vast majority of quasi-experimental and non-experimental studies in which observational data are collected; the only type of study unlikely to exhibit these phenomena is a designed experiment. The major issue, however, is not merely the presence or absence of collinearity or multicollinearity but the effect they have on the data analysis. In practice, it rarely happens that two variables are in a perfect linear relationship, but in economic estimation it is often the case that several variables are strongly correlated [2].

Common causes of multicollinearity include:

  • Insufficient data. If this is the problem, it can be addressed by collecting more data.
  • Incorrect use of dummy variables. For instance, due to a researcher’s error a redundant category may end up included, for example when a dummy variable is created for every category instead of omitting one as the reference.
  • Combining two variables that are supposed to remain separate. For instance, a researcher could use a total investment income variable that is actually the sum of income from savings interest and income from stocks and bonds, while one or both of these components also appear in the model.
  • Using two variables that are effectively identical. For instance, one could mistakenly include both investment income and income from stocks, bonds, and savings interest as two separate variables.

Types of multicollinearity

It is possible to distinguish between two primary types of multicollinearity:

  • Structural multicollinearity – a mathematical artifact caused by using the existing predictors to generate new ones (for example, creating the predictor y² from the predictor y).
  • Data-based multicollinearity – a result of a flawed experimental design or of relying on observational data alone.

Level of collinearity or multicollinearity

Imperfect multicollinearity occurs when the variables used in an equation are correlated, but not perfectly so; it can be present at a high or a low level. This can be written as X3 = X2 + v, where v is a random variable that can be seen as the ‘error’ in the exact linear relationship.

Low level: when the correlation between the X’s is low, β can be estimated with confidence, because OLS has ample independent information in the data to work with.

High level: when the correlation between the X’s is high, the amount of independent information available to OLS is very limited, which makes it difficult to distinguish the effect of X1 on Y from the effect of X2. As a result, β can be estimated only with a high degree of uncertainty.

Consequences of Imperfect Multicollinearity

When multicollinearity is imperfect, the OLS estimators can still be obtained, and they remain the Best Linear Unbiased Estimators (BLUE). However, even though they have the smallest variance among linear unbiased estimators, their variances are usually larger than they would have been in the absence of multicollinearity.

To conclude, when imperfect multicollinearity is present, we have the following (illustrated by the sketch after this list):

  1. Estimates of the OLS may be imprecise because of large standard errors.
  2. Affected coefficients may fail to attain statistical significance due to low t-stats.
  3. Sign reversal might occur.
  4. The addition or deletion of a few observations may result in substantial changes in the estimated coefficients.
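These consequences can be reproduced with a minimal numpy sketch (made-up data, not from the source): two nearly collinear regressors yield large standard errors and small t-statistics even though the overall fit of the equation is good.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two highly correlated regressors: X3 = X2 + v with small noise v.
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)            # corr(x2, x3) close to 1
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])

# OLS via the normal equations: beta = (X'X)^(-1) X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y

# Standard errors: square roots of the diagonal of s^2 (X'X)^(-1)
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(s2 * np.diag(XtX_inv))

print("coefficients:   ", np.round(beta, 2))
print("standard errors:", np.round(se, 2))          # large for x2 and x3
print("t-statistics:   ", np.round(beta / se, 2))   # individually small
print("R^2:", round(1 - resid @ resid / ((y - y.mean()) @ (y - y.mean())), 3))
```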

Perfect Multicollinearity

When the linear relationship is perfect, consider the example of the model below:

Y = β1 + β2X2 + β3X3 + e

where the sample values for X2 and X3 are chosen so that X3 = 2X2.

Although it may seem that there are two explanatory variables, in reality there is only one. The explanation is simple: X2 and X3 are collinear, since X3 is an exact linear function of X2.

Consequences of Perfect Multicollinearity

When multicollinearity is perfect, the OLS estimators do not exist, because the normal equations cannot be solved uniquely. An attempt to estimate an equation whose specification contains perfect multicollinearity in EViews will fail; in response to such an attempt, EViews will display an error.
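The same failure can be reproduced outside EViews with a minimal numpy sketch (made-up data, not from the source): when X3 = 2X2, the design matrix is rank deficient, so the normal equations have no unique solution.

```python
import numpy as np

n = 20
x2 = np.arange(1.0, n + 1.0)
x3 = 2.0 * x2                                   # X3 = 2*X2: exact linear dependence
X = np.column_stack([np.ones(n), x2, x3])

# Only 2 of the 3 columns are linearly independent.
print("rank of X:", np.linalg.matrix_rank(X))   # 2

# X'X is singular, so (X'X)^(-1) does not exist; attempting to invert it either
# raises an error or returns meaningless values, which is why packages refuse to fit.
print("det(X'X):", np.linalg.det(X.T @ X))      # (numerically) zero

# lstsq still returns a solution, but it is only one of infinitely many,
# and the reported rank flags the deficiency.
y = 1.0 + 5.0 * x2 + np.random.default_rng(0).normal(size=n)
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print("reported rank:", rank, "| one of many solutions:", np.round(beta, 3))
```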

The Detection of Multicollinearity

Two-regressor model

High Correlation Coefficients

Pairwise correlations among the independent variables may have high absolute values. When a correlation coefficient exceeds 0.8 in absolute value, there is a good chance that severe multicollinearity is present.
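As a quick sketch (made-up data; numpy rather than any particular econometrics package), the pairwise correlations of the regressors can be inspected directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # deliberately close to x1
x3 = rng.normal(size=n)                    # unrelated regressor

# Pairwise correlation matrix of the regressors; |r| > 0.8 flags likely trouble.
r = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(np.round(r, 2))
```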

Model with more than two regressors

High R2 with low t-Statistic Values

The overall fit of an equation may remain high even though its individual regression coefficients are insignificant; this combination is itself a symptom of multicollinearity.

High Variance Inflation Factors (VIFs)

Variance inflation factors estimate how much the variance of an estimated coefficient is inflated by multicollinearity. The VIF for a given explanatory variable reflects the degree to which the other explanatory variables in the equation are able to explain it: VIFj = 1 / (1 - Rj²), where Rj² is obtained by regressing Xj on the remaining regressors. Other tools for assessing multicollinearity include the Farrar-Glauber test, the condition number test, and perturbing the data.
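A minimal sketch of this computation, assuming numpy and made-up data (statsmodels ships a ready-made variance_inflation_factor for the same purpose); a common rule of thumb is that a VIF above 10 indicates serious multicollinearity:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on the remaining columns (plus an intercept)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out

# Example: x2 is nearly a copy of x1, so both receive very large VIFs.
rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + 0.05 * rng.normal(size=300)
x3 = rng.normal(size=300)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))   # roughly [400, 400, 1]
```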

Remedies for Multicollinearity

Do Nothing

At times, doing nothing is the most suitable course of action. In that case one can report that a group of predictors jointly predicts the outcome, and note that more data would need to be collected to estimate their effects separately.

Drop a Redundant Variable

One more important aspect to pay attention to is the impact of unnecessary variables. If one of the variables is redundant, its presence in the equation amounts to a specification error. Such variables need to be detected and dropped, following the guidance of economic theory.

Transform the Multicollinear Variables

In addition, multicollinearity can sometimes be reduced by re-specifying the model, for example by combining the multicollinear variables into a single regressor, as in the sketch below.
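A hedged numpy sketch of this re-specification, with made-up variable names echoing the investment-income example above:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
savings_interest = rng.normal(10.0, 2.0, size=n)
stock_income = savings_interest + rng.normal(0.0, 0.5, size=n)   # nearly collinear pair

print(np.round(np.corrcoef(savings_interest, stock_income)[0, 1], 3))   # close to 1

# Re-specification: replace the two collinear regressors with one combination
# (alternatively, one of them could simply be dropped).
total_investment_income = savings_interest + stock_income
y = 3.0 + 0.5 * total_investment_income + rng.normal(size=n)
X = np.column_stack([np.ones(n), total_investment_income])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))   # intercept and the single combined effect
```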

Increase the Sample Size

Moreover, the precision of an estimator can also be improved by increasing the sample size, which reduces the negative impact of multicollinearity. However, collecting additional data may not always be feasible.

Use other approaches

Ridge regression and principal component regression are additional strategies that help address multicollinearity; a sketch of the former is given below.
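As a hedged illustration (not from the source), ridge regression adds a penalty λ to the diagonal of X'X, shrinking and stabilizing the estimates when X'X is near-singular; the closed form is β̂ = (X'X + λI)⁻¹X'y. In practice the regressors are usually standardized first and a library routine (e.g., scikit-learn's Ridge) is used; the numpy version below is only a sketch.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimator: (X'X + lam*I)^(-1) X'y.
    The penalty is not applied to the intercept column (index 0)."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                          # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)              # highly collinear with x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

ols = np.linalg.solve(X.T @ X, X.T @ y)          # unstable individual coefficients
print("OLS:  ", np.round(ols, 2))
print("Ridge:", np.round(ridge(X, y, lam=1.0), 2))   # shrunk and more stable
```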

Real life examples

Example One

One predictor variable in the model is found to be in a direct causal relationship with another predictor variable.

For instance, a model that includes both the total gross income of a town and the size of its population is used to predict contributions to the town’s charity funds. The estimation is likely to find that the two variables are highly correlated, since the town’s population size contributes directly to its total gross income. To address the issue, the model needs to be restructured so that the two related variables are not regressed together (by omitting one of them or by combining them).

Example Two

A model includes two predictor variables that represent a common latent variable construct (the halo effect).

For instance, several measures of customer satisfaction are used to predict customer loyalty, and two of them (network satisfaction and product satisfaction) turn out to be highly correlated because customers did not distinguish between these aspects when describing their satisfaction. In this case, the two measures act as reflections of a single predictor: overall customer satisfaction. To avoid the problem, this common predictor could be used as a variable instead of the two separate reflections.

References

[1] P. Kennedy, A Guide to Econometrics. Cambridge, MA: MIT Press, 2003.

[2] D. Neeleman. “Some remarks on linear economic model,” in Multicollinearity in Linear Economic Models. New York, NY: Springer Science & Business Media, 2012, ch. 1, pp. 1-14.
