Regression analysis is useful for business analysts because it lets them identify relationships between variables, assess the current state of affairs, and predict how those relationships will develop. Every regression analysis has a single dependent variable and at least one explanatory variable that is used to explain the dependent one. Often, the relationship between two variables is linear, meaning that the data points cluster around a straight line. However, real data are rarely perfect, and some observations, known as outliers, stand apart from the overall pattern. This essay will define outliers, discuss how the presence or absence of an outlier can influence the slope of a regression of Y versus a single X, and explain whether an outlier always has such an impact.
To begin with, it is necessary to define an outlier. Albright and Winston (2016) argue that an outlier “is an observation that falls outside of the general pattern of the rest of the observations” (p. 426). This definition suggests that an outlier’s presence can have a significant impact on regression analysis results, in particular on the slope of a regression of Y versus a single X. For example, Figure 1 by Albright and Winston (2016) demonstrates how an outlier appears in a scatter plot and how it affects the fitted line. The graph indicates that most observations cluster around a straight line, while a single outlier, the highest point in the plot, lies well away from that line. Thus, the authors explain that the outlier “tilts the regression line toward it” (Albright & Winston, 2016, p. 513). If an outlier results in such changes, it is called an influential point. Figure 1 also shows that removing the outlier would produce a substantially different regression line.
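The tilting effect described above can be illustrated with a minimal sketch. The data below are hypothetical and are not taken from Figure 1; the sketch assumes ten observations lying close to the line y = 2x + 1 and one added point with an unusually high Y value, and it uses NumPy's `polyfit` to estimate the least-squares slope.

```python
import numpy as np

# Hypothetical data (not from the textbook figure): ten points near y = 2x + 1.
x = np.arange(1.0, 11.0)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1])
y = 2.0 * x + 1.0 + noise


def ols_slope(x_values, y_values):
    """Return the least-squares slope of y regressed on a single x."""
    slope, _intercept = np.polyfit(x_values, y_values, 1)
    return slope


# Slope fitted to the clean data; it stays close to the underlying value of 2.
slope_clean = ols_slope(x, y)

# Add one assumed outlier: x is at the right edge of the data, but y is far
# above the value the general pattern would suggest (about 21).
x_with_outlier = np.append(x, 10.0)
y_with_outlier = np.append(y, 60.0)
slope_with_outlier = ols_slope(x_with_outlier, y_with_outlier)

print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with outlier:    {slope_with_outlier:.3f}")
```

In this sketch the single added point pulls the fitted line upward on the right side, so the slope with the outlier is noticeably steeper than the slope without it. This is the behavior of an influential point.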
However, outliers do not always change the slope of a regression. Albright and Winston (2016) note that a regression fitted with an outlier and one fitted without it can yield practically the same results. A point falls outside the general pattern because it has an extreme value on the X-axis, the Y-axis, or both. Figure 2 by Albright and Winston (2016) helps to show when such a point makes no difference. The original graph has no outliers, and all of its observations fall within the typical range. Suppose, however, that the plot contains an outlier at 65 on the X-axis and 60 on the Y-axis. For the purpose of this paper, a black point has been added to Figure 2 to represent it. Even though this outlier has extreme values, it does not change the regression line significantly, because it lies close to the extension of the line fitted to the other observations.
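A similar sketch can illustrate this second case. Again, the data are hypothetical and do not reproduce Figure 2; the sketch assumes ten observations near the line y = 2x + 1 and one added point with extreme X and Y values that nevertheless sits on the extension of that line.

```python
import numpy as np

# Hypothetical data (not from the textbook figure): ten points near y = 2x + 1.
x = np.arange(1.0, 11.0)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1])
y = 2.0 * x + 1.0 + noise


def ols_slope(x_values, y_values):
    """Return the least-squares slope of y regressed on a single x."""
    slope, _intercept = np.polyfit(x_values, y_values, 1)
    return slope


slope_clean = ols_slope(x, y)

# Add one assumed extreme point: x = 30 is far beyond the other observations,
# but y = 61 is exactly what the line y = 2x + 1 would give at that x.
x_with_point = np.append(x, 30.0)
y_with_point = np.append(y, 2.0 * 30.0 + 1.0)
slope_with_point = ols_slope(x_with_point, y_with_point)

print(f"slope without extreme point: {slope_clean:.3f}")
print(f"slope with extreme point:    {slope_with_point:.3f}")
```

Here the two slopes are almost identical. The added point is extreme on both axes, but because it continues the existing pattern, it does not tilt the line.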
In conclusion, outliers and their impact deserve close attention in regression analysis. These points can affect the slope of a regression in different ways, as the two graphs have demonstrated. On the one hand, an outlier can make no difference when its extreme values are a continuation of the regression line. On the other hand, an outlier can tilt the regression line toward it when it lies far from the line that fits the remaining observations. Such an outlier is called an influential point, and it necessarily changes the regression analysis results.
Reference
Albright, S. C., & Winston, W. L. (2016). Business analytics: Data analysis and decision making (6th ed.). Cengage Learning.