Correlations, linear regressions, and multiple regressions are three of the most common methods used to investigate relationships between quantitative variables in research. These methods are employed across many areas of quantitative science, such as economics, medicine, psychology, and sociology. Correlation and regression serve different purposes. Correlation quantifies the strength of the association between two variables. Regression, on the other hand, expresses the relationship in the form of an equation, which allows the value of one or several variables to be predicted from the known values of the others. The choice between correlation and regression largely depends on the design of the study and the research questions behind it. The purpose of this paper is to evaluate correlations, linear regressions, and multivariate regressions, identify the essential assumptions behind them, assess their essential components, and outline their weaknesses and limitations.
The word “correlation” stands for the association between two or more variables (Cramer, 2016). The purpose of a correlational statistical model is to express and explore the strength of this association and identify the relationship between different variables. The relationship, or co-variation, is typically expressed as positive or negative (Cramer, 2016); when there is no association between the variables, the pattern of the results indicates as much. The strength of the relationship is presented through correlation coefficients, ranging from -1 to +1 (Cramer, 2016). Because correlation measures a linear relationship, the coefficient reflects how closely the individual results cluster around a straight line. The most popular method of testing a correlation is the Pearson method.
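The Pearson coefficient can be computed directly from its definition, the covariance of the two samples divided by the product of their standard deviations. Below is a minimal, dependency-free sketch; the study-hours and exam-score figures are invented for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance term (numerator) and the two spread terms (denominator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# invented data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)  # close to +1: a strong positive association
```

A value near +1 or -1 indicates points lying close to a straight line; a value near 0 indicates no linear association.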
Strengths of Correlational Method
A correlational research design is typically implemented in situations where the objective is to examine the relationship between two or more variables without isolating them (Cohen, Cohen, West, & Aiken, 2013). It is used to observe and analyze naturally occurring phenomena that would otherwise be impossible to isolate or induce. In medical research, correlational methods are utilized to study the relationship between patient characteristics and vulnerability to disease. For example, the relationship between smoking and lung cancer cannot be tested in an experiment, as the very design of such an experiment would be unethical towards the patients and break many rules and norms of modern medicine. Also, a correlational method allows the researcher to see the connection between two or more variables clearly, as the results of a correlation can be easily presented in graphic form (Cohen et al., 2013).
Weaknesses of Correlational Method
There are two main weaknesses of the correlational method. First, a correlation cannot imply causation between any of the studied variables. Identifying causation is the prerogative of experimental research methods that allow isolating one or several variables (Cohen et al., 2013). Correlational research does not isolate any of the variables from one another, meaning that even if there is a very strong association between them, there is a chance that it is caused by a factor excluded from the research. Second, correlation does not allow for making accurate future projections and cannot go beyond the data given in the research (Cohen et al., 2013).
Linear regression is the most basic type of regression and is commonly used in predictive analysis. It establishes a mathematical relationship between two or more variables, which allows future results to be predicted when one or several values are missing, based on the available data (Chatterjee & Hadi, 2015). Regressions are also often used in analyzing data acquired through experimental research to establish causation. Expressing the relationship as an equation makes it possible to estimate the strength of the effect that an independent variable has on a dependent variable. This, in turn, allows for forecasting the impact of potential changes by influencing one or several variables. This is particularly valuable in economics, where regression analysis is one of the most popular methods of financial forecasting. Nevertheless, regression sees use in many other scientific areas that deal with large amounts of quantitative data.
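The fit-then-forecast workflow described above can be sketched with ordinary least squares; the advertising-spend and sales figures below are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: covariance of x and y over the variance of x
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# invented data: sales figures against advertising spend
spend = [10, 20, 30, 40]
sales = [25, 44, 67, 84]
a, b = fit_line(spend, sales)
forecast = a + b * 50  # predicted sales at a spend of 50, beyond the observed data
```

This is precisely what correlation cannot do: the fitted equation extrapolates a predicted value for an input that was never observed.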
Multivariate regressions are regressions that extend the standard single-outcome regression model to more than one outcome variable (Draper & Smith, 2013). This enables the simultaneous modeling of several outcome variables in a single experimental setting. Multivariate regression methods are concerned with multivariate probability distributions and are used as part of statistical inference. An example situation where multivariate regression is used involves comparing several measures of health, such as data on cholesterol, blood pressure, and weight, with the eating habits of the participants (Chatterjee & Hadi, 2015).
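When the outcome errors are treated as uncorrelated, the multivariate least-squares estimates coincide with separate per-outcome fits on the shared predictor, which gives a compact way to sketch the health example above. All figures here are invented for illustration:

```python
def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# invented predictor: daily saturated-fat intake (g) for five participants
fat = [15, 20, 25, 30, 35]
# three outcome variables observed on the same participants
outcomes = {
    "cholesterol": [180, 190, 205, 210, 225],  # mg/dL
    "systolic_bp": [118, 121, 125, 127, 132],  # mmHg
    "weight_kg":   [68, 72, 75, 80, 83],
}
# with uncorrelated errors, multivariate OLS reduces to one fit per outcome
fits = {name: ols(fat, ys) for name, ys in outcomes.items()}
```

Each entry of `fits` holds the intercept and slope linking the shared eating-habit predictor to one health measure, mirroring the simultaneous-outcomes design described above.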
Strengths of Regression Analysis
Regression is a solid statistical tool with plenty of strengths that have made it popular among researchers. Linear regression is a sturdy and statistically simple model for studying linear relationships between dependent and independent variables (Seber & Lee, 2012). It is easy to implement and to use for determining the strength of the relationship between variables. At the same time, regression is valuable for making forecasts and predicting mean results based on the available data. It is useful for testing hypotheses in experimental research and helps determine whether the research design has been applied properly (Seber & Lee, 2012).
Weaknesses of Regression Analysis
The disadvantages of regression analysis largely depend on the type of regression used in particular research. Linear regressions can represent only linear relationships between variables and usually look only at the mean of the dependent variable. Also, regressions are vulnerable to outliers, observations that deviate markedly from the overall pattern of the data. Outliers can be univariate or multivariate and can have a very detrimental effect on the accuracy of the regression analysis (Seber & Lee, 2012).
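The sensitivity to outliers is easy to demonstrate numerically: in the invented example below, appending a single aberrant point to perfectly linear data triples the fitted slope.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]                      # perfectly linear: slope is 2
clean = slope(xs, ys)
contaminated = slope(xs + [6], ys + [40])  # one outlier drags the slope upward
```

Because least squares minimizes squared residuals, a single distant point exerts an outsized pull on the fit, which is why outlier screening precedes serious regression work.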
Assumptions in Correlation and Regression
Most statistical tests have to make several assumptions about the nature and consistency of the data they work with in order to function properly. While this speeds up the process of statistical analysis, it also introduces an element of uncertainty and weakness into the research design. When assumptions are not met, there is an increased chance of Type I or Type II errors, that is, of falsely detecting an effect that does not exist or of failing to detect an effect that does, which amounts to overestimating or underestimating the influence that a change has on the results of the research.
The assumptions for the correlation method are often overlooked. The list of potential assumptions includes (Hampel, Ronchetti, Rousseeuw, & Stahel, 2011):
- Continuous levels of measurement for each variable.
- Tested variables are related.
- Outliers are absent or negligible to the overall result of the research.
- Scatterplot linearity and homoscedasticity, which refers to the position of the dots on the scatterplot chart.
If any of the preceding assumptions are not met, then the results of the correlation are considered questionable, thus requiring alterations to the research design.
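Homoscedasticity, the assumption that residual scatter stays roughly constant across the range of the predictor, can be checked numerically as well as visually. The sketch below (with invented data) fits an ordinary least-squares line and compares residual variance in the lower and upper halves of the predictor, a crude simplification of the Goldfeld-Quandt idea:

```python
def half_variances(xs, ys):
    """Fit an OLS line, then return residual variance in the lower
    and upper halves of x; strongly unequal values hint at heteroscedasticity."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    n = len(sx)
    mx, my = sum(sx) / n, sum(sy) / n
    b = sum((x - mx) * (y - my) for x, y in zip(sx, sy)) / \
        sum((x - mx) ** 2 for x in sx)
    a = my - b * mx
    res = [y - (a + b * x) for x, y in zip(sx, sy)]
    half = n // 2
    var = lambda r: sum(e * e for e in r) / len(r)
    return var(res[:half]), var(res[-half:])

# invented data whose scatter widens as x grows (heteroscedastic)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 11.0, 11.0, 16.0, 14.0]
lo_var, hi_var = half_variances(xs, ys)  # hi_var far exceeds lo_var here
```

Comparable variances in the two halves are consistent with homoscedasticity; a large imbalance, as in this example, signals that the assumption is violated and the design needs attention.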
Regressions also have a list of assumptions necessary for testing the validity and plausibility of the results. These assumptions are the presence of a linear relationship, multivariate normality, no autocorrelation, and homoscedasticity. The assumptions are relatively similar to those of correlation, but multivariate regression adds more to the list (Hampel et al., 2011). Variables have to be measured reliably and without error. Also, the distributions of the variables must be approximately normal and free of substantial kurtosis or outliers. Other assumptions could also influence the results of the research, but the ones presented here are considered the least resistant to violations and the hardest to address through alterations to the research design (Hampel et al., 2011).
Correlations and regressions are the main tools used in contemporary quantitative research. These methods exist for different reasons: correlations allow researchers to establish and measure the strength of a relationship between several variables, while regressions permit each variable to be represented as part of a statistical equation. Correlations are used to study naturally occurring phenomena, whereas regressions are used to analyze data gathered through experiments and the isolation of one or several independent variables of interest. Correlations cannot be used to imply a causal relationship between two variables, whereas experimental research is not always applicable to naturally occurring phenomena due to organizational and ethical constraints. Despite being two relatively robust designs, both correlations and regressions require certain assumptions to be met for the results to be considered accurate.
Chatterjee, S., & Hadi, A. S. (2015). Regression analysis by example (5th ed.). New York, NY: John Wiley & Sons.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2013). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Cramer, H. (2016). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Draper, R. N., & Smith, H. (2013). Applied regression analysis (3rd ed.). New York, NY: John Wiley & Sons.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (2011). Robust statistics: The approach based on influence functions. Hoboken, NJ: John Wiley & Sons.
Seber, G. A. F., & Lee, A. J. (2012). Linear regression analysis (2nd ed.). Hoboken, NJ: John Wiley & Sons.