Abstract
The problem of multiple comparisons arises in many clinical trials, epidemiological studies, and public health studies, where data fishing is a possibility. The aim of this essay is to provide a brief yet comprehensive review of the problem of multiple comparisons and of how data fishing puts the outcomes of public health studies at risk.
Introduction
Public health or medical research centers on an input-to-output relationship. The aim is to examine whether input (explanatory) variables relate to the effect (output or outcome) variables. Alternatively, the purpose may be to test the null hypothesis (that the observed results are due to chance alone), that is, to test whether the effect is attributable to the input variables. As an example, testing the link between obesity and diabetes can address a cause-effect relationship (obesity is a cause of diabetes) or the relationship between obesity (expressed by weight) and diabetes (expressed by blood glucose level). Confounding (confusing) factors may obscure either relationship; in the previous example, age or gender can be confounding factors. Therefore, in data analysis, it is essential to identify the variables as input, output, or confounding (Campbell, 2006).
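To illustrate why identifying confounders matters, the short Python sketch below simulates the obesity and diabetes example: age drives both weight and blood glucose, so the two correlate even though, in this simulation, weight has no direct effect on glucose. All data and coefficients are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: age confounds the weight / blood-glucose relationship.
age = rng.uniform(20, 70, n)
weight = 60 + 0.3 * age + rng.normal(0, 8, n)       # weight rises with age
glucose = 4.0 + 0.03 * age + rng.normal(0, 0.5, n)  # glucose rises with age

# Crude association: weight and glucose correlate even though weight
# has no direct effect on glucose in this simulation.
print("crude correlation:", np.corrcoef(weight, glucose)[0, 1])

# Adjusting for the confounder: correlate the residuals of each variable
# after removing the linear effect of age.
w_resid = weight - np.polyval(np.polyfit(age, weight, 1), age)
g_resid = glucose - np.polyval(np.polyfit(age, glucose, 1), age)
print("age-adjusted correlation:", np.corrcoef(w_resid, g_resid)[0, 1])
```

The crude correlation is clearly positive, while the age-adjusted correlation is close to zero, showing how an unmeasured confounder can manufacture an apparent input-output relationship.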
The problem of multiple comparisons
Testing the null hypothesis serves to guard against unjustifiable conclusions. Testing one hypothesis (for example, the effect of drug A in controlling hypertension) is primary analysis; occasionally, researchers use data obtained from the study population to examine multiple outcome variables (secondary analysis). Multiple comparisons means testing more than one hypothesis; in other words, it is comparing two study groups on more than one output (outcome). Failure to use appropriate statistical methods weakens the resulting conclusions (Curran-Everett, 2000).
There are two inherent problems in using multiple comparisons. The first is the false positive result, that is, an inflated probability of detecting an effect that does not exist, which is a procedural problem (type I error). The second is the false negative, that is, the limited power of a trial to detect true treatment effects in secondary analyses, which is a problem related to sample size (type II error). Therefore, concern about multiple comparisons in health or trial studies arises when they are conducted with logical errors such as data mining or data fishing (Lord and others, 2004). Examples of multiple comparisons in clinical trials and public health studies include studying multiple effects (outcomes) and comparing multiple treatments (drugs A and B to control hypertension). Examples also include subgroup analyses to detect treatment differences, studies of prognostic factors, repeated outcome measures over time, and interim analyses of treatment effects during various stages of a trial (Lord and others, 2004).
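To see how the type I error inflates, note that testing k independent hypotheses, each at significance level α, gives a probability of at least one false positive of 1 - (1 - α)^k. The minimal Python sketch below illustrates the effect; the Bonferroni correction shown is a standard remedy, offered here as an illustration rather than one prescribed by the cited authors.

```python
# Family-wise error rate (FWER) when testing k independent hypotheses,
# each at significance level alpha, and the Bonferroni correction that
# keeps the overall error rate near alpha.

alpha = 0.05

for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k      # P(at least one false positive)
    bonferroni_alpha = alpha / k     # per-test threshold after correction
    fwer_corrected = 1 - (1 - bonferroni_alpha) ** k
    print(f"k={k:2d}  uncorrected FWER={fwer:.3f}  "
          f"Bonferroni per-test alpha={bonferroni_alpha:.4f}  "
          f"corrected FWER={fwer_corrected:.3f}")
```

With twenty comparisons the uncorrected chance of at least one false positive approaches 64 percent, which is why uncontrolled secondary analyses so often produce spurious findings.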
Data fishing
Data fishing (dredging or mining) is the inappropriate search for statistically significant relationships in a data set, whether intentional or unintentional (for example, because of a badly chosen procedure). In such cases the assumed statistical significance is spurious; significance tests do not, by themselves, protect conclusions against data fishing (Roddick and others, 2003).
Smith and Ebrahim (2002, pp. 1437-1438) describe an example of data fishing: a study of the relationship between breast cancer and alcohol and tobacco consumption published in The Lancet (October 2002), whose finding was contradicted shortly afterwards (14 November 2002). They pointed out that such contradictory results contribute to conflict and meaningless findings. They suggested that data fishing in this and similar cases produces false results because of looking at many possible associations, and/or because of selection bias that yields study data in which an exposure relates to different traits that increase or decrease a disease risk. Another source of false results is confounding, which becomes apparent when observational studies report associations that controlled trials do not confirm. Some argue that if a research hypothesis is built on a sound understanding of pathogenesis, then the results will not be false. Smith and Ebrahim (2002, pp. 1437-1438) suggested this is not true, because even then a statistical technique that controls confounding factors poorly leaves a great degree of measurement error. They inferred that further measures to improve study design, such as measuring confounding factors better and increasing the use of sensitivity analysis, should yield more accurate results.
Analysis of variance
When data follow a normal distribution, the sample mean and standard deviation are estimates of the underlying population mean and standard deviation. If two sample means differ by more than a certain multiple of their combined standard error, the two samples are unlikely to have come from the same population. In statistical analysis, there are two sources of variability: the variability within groups and the variability between groups. Analysis of variance is used to decide whether the observed overall variation arises from either of these two sources, that is, to test whether the means of several conditions or groups are alike on one variable (Pipkin, 1986).
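To make the partition of variability concrete, the following Python sketch computes the between-group and within-group sums of squares by hand and checks the resulting F statistic against scipy.stats.f_oneway. The blood-pressure figures are invented for illustration, not drawn from any cited study.

```python
from scipy import stats

# Hypothetical blood-pressure reductions (mmHg) for three treatment groups.
groups = [
    [8.2, 9.1, 7.5, 8.8, 9.4],   # drug A
    [6.1, 5.8, 7.0, 6.5, 5.9],   # drug B
    [4.0, 3.5, 4.8, 4.2, 3.9],   # placebo
]

# Partition the total variability into between-group and within-group parts.
grand_mean = sum(sum(g) for g in groups) / sum(len(g) for g in groups)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

# scipy performs the same one-way ANOVA directly.
f_scipy, p_value = stats.f_oneway(*groups)
print(f"F (manual) = {f_manual:.2f}, F (scipy) = {f_scipy:.2f}, p = {p_value:.4g}")
```

The F statistic is simply the between-group mean square divided by the within-group mean square; a large value indicates that the group means vary more than chance alone would explain.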
The one-way ANOVA tests the hypothesis that there are no differences between the several treatment groups, but it does not determine which groups differ or the sizes of those differences. Multiple comparison tests isolate these differences by running comparisons between the experimental groups. There are six multiple comparison tests to choose from for the one-way ANOVA, and the choice of test depends on the comparisons required. Two types of comparison are available, depending on the selected multiple comparison test: all pairwise comparisons, or comparisons versus a control group (Hardle and Hlavka, 2007).
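As a sketch of an all-pairwise follow-up, Tukey's honestly significant difference test compares every pair of groups while holding the family-wise error rate at the chosen α. The statsmodels call below reuses the invented data from the ANOVA sketch above; Dunnett's test would be the analogous choice for comparisons versus a control group.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same hypothetical data as above, flattened with group labels.
values = np.array([8.2, 9.1, 7.5, 8.8, 9.4,
                   6.1, 5.8, 7.0, 6.5, 5.9,
                   4.0, 3.5, 4.8, 4.2, 3.9])
labels = np.repeat(["drug A", "drug B", "placebo"], 5)

# Tukey's HSD runs all pairwise comparisons while controlling the
# family-wise error rate at alpha.
result = pairwise_tukeyhsd(values, labels, alpha=0.05)
print(result)
```

The printed table reports, for each pair of groups, the mean difference, an adjusted confidence interval, and whether the null hypothesis of equal means is rejected.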
Conclusion
Conclusions drawn from testing the null hypothesis using the P value can be false if the study includes multiple comparisons and therefore produces multiple P values. This occurs when many independent research questions are posed or multiple groups are compared. In such cases, using ANOVA tests produces more reliable results.
Reference List
Campbell, M. J. (2006). Statistics at Square Two (2nd ed.). Sheffield: Blackwell Publishing.
Curran-Everett, D. (2000). Multiple comparisons: Philosophies and illustrations. Am J Physiol Regulatory Integrative Comp Physiol, 279, R1-R8.
Hardle, W., and Hlavka, Z. (2007). Multivariate Statistics: Exercises and Solutions. New York: Springer.
Lord, S. V., Gebski, V. L., and Keech, A. C. (2004). Multiple analyses in clinical trials: Sound science or data dredging? MJA, 181(8), 452-454.
Pipkin, F. B. (1986). Medical Statistics Made Easy (2nd ed.). Philadelphia: W. B. Saunders.
Roddick, J. F., Fule, P., and Graco, W. J. (2003). Exploratory medical knowledge discovery: Experiences and issues. SIGKDD Explorations, 5(1), 94-99.
Smith, D. G., and Ebrahim, S. (2002). Data dredging, bias, or confounding. BMJ, 325, 1437-1438.