Sampling Errors and Measurement Errors in Statistical Research
The term “sampling error” refers to the error that occurs when statistical parameters of a population are estimated from data obtained from a sample (Cozby & Bates, 2015, p. 148). Sampling error is always present in statistical research because, in practice, a sample can never yield results exactly equal to the population parameters. Sampling error also increases when selection bias is present in the sample (i.e., when members of the population with a certain trait were more likely to be selected as participants than members without that trait).
However, there are ways to assess the magnitude of sampling error, such as confidence intervals. An n% confidence interval for a given statistic means that, if the study were repeated with many similar samples, the intervals constructed from n% of those samples would contain the true population value (Field, 2013). For example, if a statistic has a mean of 100 and a 95% CI of (90; 110), this means that in 95% of similar samples, the interval so constructed would capture the population mean (George & Mallery, 2016).
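For illustration, the following Python sketch (using simulated, hypothetical data; not drawn from the cited sources) computes a 95% confidence interval for a sample mean via the t distribution:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 25 IQ-style scores (simulated for illustration).
rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=100, scale=15, size=25)

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean, s / sqrt(n)
# 95% CI based on the t distribution with n - 1 degrees of freedom.
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```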
Confidence intervals are only trustworthy when there is no selection bias, that is, when sampling is random and no members of the population are more likely to be selected than others. This is because the confidence interval is calculated from the given sample, so it accounts for random error but not for systematic error. Mitigating the error that results from selection bias, or even assessing its magnitude, is very difficult (or, in fact, impossible if it is unknown whether selection bias is present in the sample), so it is paramount to be careful to avoid selection bias.
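This limitation can be demonstrated with a small simulation (a sketch with assumed parameters, not a result from the cited sources): when samples are drawn only from one part of the population, nominal 95% intervals almost never contain the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
population = rng.normal(loc=100, scale=15, size=100_000)
true_mean = population.mean()

def coverage(biased: bool, trials: int = 2_000, n: int = 30) -> float:
    """Fraction of nominal 95% CIs that actually contain the true mean."""
    hits = 0
    # Under selection bias, draw only from the upper half of the population.
    pool = population[population > np.median(population)] if biased else population
    for _ in range(trials):
        sample = rng.choice(pool, size=n, replace=False)
        sem = stats.sem(sample)
        low, high = stats.t.interval(0.95, df=n - 1, loc=sample.mean(), scale=sem)
        hits += low <= true_mean <= high
    return hits / trials

print("random sampling coverage:", coverage(biased=False))  # close to 0.95
print("biased sampling coverage:", coverage(biased=True))   # near 0
```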
The term “measurement error” denotes the error that occurs because of the imperfect reliability of an instrument; that is, the instrument may give different results at different instances of measurement even though the true value of the measured characteristic remains unchanged (Cozby & Bates, 2015, pp. 100-103). (For instance, an IQ test that gives different results depending on the mood of the participant is unreliable. By contrast, a tape measure is virtually 100% reliable when measuring the length of an object, for it produces the same result every time.)
Using unreliable instruments in behavioral and marketing research is, generally speaking, pointless, for the results of such studies will be unstable, imprecise, non-replicable (Cozby & Bates, 2015, p. 101), and practically unusable. Nevertheless, in such studies there is usually no way to obtain a perfectly reliable measure of a characteristic, and researchers have no choice but to settle for instruments that are reasonably, though not perfectly, reliable.
An effective way to increase the reliability of a test (and thus to reduce measurement error) is to increase the number of items assessing a particular trait (Cozby & Bates, 2015). There are several ways to measure the reliability of a test, most of them based on Pearson’s product-moment correlation coefficient (Cozby & Bates, 2015, p. 102). One is test-retest reliability: the same respondents are measured twice, at different points in time, and the correlation coefficient between the two sets of results is calculated. However, this requires testing the same individuals twice, which is not always possible. There are therefore methods that do not require re-testing, such as split-half reliability (the test is split into two halves, and the correlation between the results of the two halves is computed) or Cronbach’s alpha (the average of the Pearson coefficients obtained by calculating all possible split-half reliabilities) (Cozby & Bates, 2015). Cronbach’s alpha is probably one of the most popular reliability assessment methods. As the mean of multiple Pearson correlation coefficients, it can range from -1 to 1 (Warner, 2013). The closer the value is to 1, the greater the reliability and the lower the measurement error; a value of .8 is often considered to indicate sufficient reliability (Cozby & Bates, 2015).
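The computation of Cronbach’s alpha is simple enough to sketch directly; the following Python function (an illustration with simulated data, not code from the cited sources) uses the standard variance form, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical 5-item scale answered by 200 respondents: each item score is
# the respondent's latent trait plus independent noise (simulated data).
rng = np.random.default_rng(seed=3)
trait = rng.normal(size=(200, 1))
scores = trait + rng.normal(scale=1.0, size=(200, 5))
print(f"alpha = {cronbach_alpha(scores):.2f}")  # about .83 for these settings
```

Consistent with the point above about adding items, re-running this sketch with more columns (items) of the same quality yields a higher alpha.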
From the discussion above, it is clear that measurement error is probably the easiest to identify, thanks to the possibility of estimating its magnitude using, for example, Cronbach’s alpha. It can also be reduced by using a better (more reliable) instrument, or by adding more items to the existing one if the latter is already acceptably reliable.
Sampling error resulting from the fact that a population parameter is estimated from a sample statistic is always present, and its magnitude can be estimated reasonably well using confidence intervals. It can also be lowered by increasing the sample size (Cozby & Bates, 2015).
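The effect of sample size can be sketched numerically: the standard error of the mean equals s / sqrt(n), so quadrupling the sample size halves the standard error and narrows the confidence interval accordingly (illustrative values, not from the cited sources):

```python
import numpy as np

s = 15.0  # hypothetical sample standard deviation
for n in (25, 100, 400):
    # SE = s / sqrt(n): quadrupling n halves the standard error.
    print(f"n = {n:4d}  SE = {s / np.sqrt(n):.2f}")
# n =   25  SE = 3.00
# n =  100  SE = 1.50
# n =  400  SE = 0.75
```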
However, sampling error resulting from selection bias is usually the hardest error to identify, because selection bias is rarely introduced on purpose; in most cases, it results from factors that the researchers overlooked. Even when it is known that selection bias exists, its exact effects are usually specific to each case and virtually impossible to measure.
When Does Statistical Significance Mean Managerial Significance?
There is no universal rule for determining whether statistical significance entails managerial significance; judgments of managerial significance need to take into account multiple factors, some of which will often not be mathematical (Timpany, n.d.). In addition, statistical significance depends on the sample size (Field, 2013), from which it follows that even a small, practically negligible difference between two groups will be statistically significant if the sample size is large enough.
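This dependence is easy to demonstrate with a simulation (a sketch with assumed means and standard deviations, not data from the cited sources): a negligible difference between group means typically fails to reach significance in small samples but becomes “significant” once the samples are large enough.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
# Two groups whose true means differ by a practically negligible 0.3 points
# (SD = 15); the p-value shrinks as the sample size grows.
for n in (100, 10_000, 1_000_000):
    a = rng.normal(100.0, 15, size=n)
    b = rng.normal(100.3, 15, size=n)
    t, p = stats.ttest_ind(a, b)
    print(f"n = {n:>9,}  p = {p:.4f}")
```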
Nevertheless, non-significant results should not be dismissed outright; for example, if α=.05, a result with p=.06, or especially p=.051, should still be taken into account, because such values are very close to α, and it would be unreasonable to conclude that p=.051 means “no difference” while p=.049 means “true difference.” In such borderline cases, many other factors should be taken into account.
A good statistic to consider is the effect size, which assesses the magnitude of the difference between groups. There is no single standard way to calculate an effect size; different statistical tests usually use different effect size measures (such as R², partial η², etc.) (Warner, 2013). However, there exist more or less standard “verbal” labels (small/medium/large effect) that help determine whether a difference is practically or managerially significant; for instance, a small effect is rarely practically important, whereas a medium one may deserve attention (Warner, 2013).
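As one concrete example, the following sketch computes Cohen’s d, a common effect size for two-group comparisons (an illustration with simulated data; this particular measure is an assumption, not one named in the cited sources):

```python
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d for two independent groups, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Conventional verbal labels: d of about 0.2 is "small", 0.5 "medium", 0.8 "large".
rng = np.random.default_rng(seed=5)
a = rng.normal(103.0, 15, size=5_000)
b = rng.normal(100.0, 15, size=5_000)
print(f"d = {cohens_d(a, b):.2f}")  # about 0.20, i.e., a "small" effect
```

Note that with samples this large, even a “small” d of 0.2 would be highly statistically significant, which underlines the gap between statistical and practical significance.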
On the whole, a variety of factors should be taken into account when considering managerial significance (Wang, 2010). In particular, it should be considered whether the magnitude of the difference (not only the effect size but also the absolute difference) really matters in practice. For instance, two types of employees may earn mean annual salaries of $100,000 and $100,600, respectively; the difference may be statistically significant, but it is doubtful that it will be practically significant. Conversely, a difference of 30 mg in the dose of an active ingredient between two batches of medical pills may be crucial, and even if it is not conventionally statistically significant (e.g., p=.08), the potential effect of such a difference should be taken into account, and this result will likely be managerially significant.
In other words, there are no standardized methods of determining whether statistical significance in a particular case entails managerial significance. Therefore, when making a decision about managerial significance, the magnitude of the difference and its potential impact should be considered.
References
Cozby, P. C., & Bates, S. C. (2015). Methods in behavioral research (12th ed.). New York, NY: McGraw-Hill Education.
Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). Thousand Oaks, CA: SAGE Publications.
George, D., & Mallery, P. (2016). IBM SPSS Statistics 23 step by step: A simple guide and reference (14th ed.). New York, NY: Routledge.
Timpany, G. (n.d.). Statistical vs. managerial significance. Web.
Wang, X. (2010). Performance analysis for public and nonprofit organizations. Sudbury, MA: Jones & Bartlett Publishers.
Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). Thousand Oaks, CA: SAGE Publications.