Introduction
Any statistical test whose test statistic follows a chi-squared distribution under the null hypothesis is known as a chi-squared test; the most common is Pearson's chi-squared test. This test is used to compare the observed frequency distribution of certain events in a population with the frequencies expected under a specified theoretical distribution.
The events under consideration must be mutually exclusive, that is, they cannot happen at the same time, and their probabilities must sum to 1. For the method to work effectively, certain assumptions must hold; although the wording differs across textbooks, they generally state that:
- The sample has been drawn randomly from the selected population;
- The sample size, n, must be sufficiently large;
- The observations are independent.
All of these assumptions must be satisfied in the process of collecting data.
Sample size assumption
The chi-squared test can be used to test for differences in proportions by means of a contingency table. However, it is worth noting that the chi-squared test only gives an approximate p-value, to which a continuity correction (such as Yates's correction) is sometimes applied. The approximation is only reliable if the sample size, n, is sufficiently large.
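To make this concrete, the following is a minimal sketch in Python, assuming the SciPy library is available and using made-up counts, of Pearson's chi-squared test on a 2x2 contingency table with Yates's continuity correction:

    import numpy as np
    from scipy.stats import chi2_contingency

    # Hypothetical observed counts: rows are groups, columns are outcomes
    observed = np.array([[30, 10],
                         [20, 25]])

    # correction=True applies Yates's continuity correction (2x2 tables only)
    chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=True)
    print(f"chi-squared = {chi2_stat:.3f}, p = {p_value:.4f}, dof = {dof}")
    print("expected counts:\n", expected)

The expected counts returned by the function can be inspected to check whether the sample size condition discussed above is met.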
If the sample size is not large, another statistical test, such as Fisher's exact test, may be more suitable. Fisher's exact test yields exact p-values because the probability of a deviation at least as large as the one observed can be computed exactly under the null hypothesis, rather than approximated.
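As an illustration, the sketch below, again assuming SciPy and using hypothetical counts, applies Fisher's exact test to a small 2x2 table where the chi-squared approximation would be doubtful:

    from scipy.stats import fisher_exact

    # Hypothetical 2x2 table with small counts (expected counts well below 5)
    table = [[3, 1],
             [1, 4]]

    odds_ratio, p_value = fisher_exact(table, alternative='two-sided')
    print(f"odds ratio = {odds_ratio:.3f}, exact p = {p_value:.4f}")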
Moreover, if the chi-squared test is used on a small sample, we risk a Type II error, in which the test fails to reject the null hypothesis even though it is false (Aron et al., 2010).
Independence assumption
The second requirement of a chi-squared test is independence among the observations; that is, the test cannot be used on correlated data. If the observations are paired, McNemar's test gives more accurate p-values.
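A minimal sketch of McNemar's test, assuming the statsmodels library and hypothetical paired before/after counts, is shown below:

    from statsmodels.stats.contingency_tables import mcnemar

    # Hypothetical paired binary data: rows = before (yes/no), columns = after
    # Only the discordant cells (5 and 15) drive the test statistic
    table = [[20, 5],
             [15, 10]]

    result = mcnemar(table, exact=True)  # exact binomial p-value, suited to small counts
    print(f"statistic = {result.statistic}, p = {result.pvalue:.4f}")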
Random sampling assumption
This assumption implies that the sample is drawn at random from a fixed population. It leads to an interesting conclusion about the source of a chi-squared distribution: the population from which the sample has been drawn is normally distributed (Aron et al., 2010).
However, this assumption need not be made when carrying out the goodness-of-fit test or the test of independence. In those tests, the distribution of counts under the null hypothesis is multinomial, and the normal distribution can be used to approximate the multinomial distribution provided the sample size, n, is sufficiently large and the expected cell counts are not too small.
It can be shown via the Central Limit Theorem that the multinomial distribution behaves like a normal distribution for large values of n. Hence this assumption is tied to the independence and sample size assumptions, and together they give fairly accurate estimates of population parameters.
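For instance, the goodness-of-fit test below, a sketch assuming SciPy and invented die-roll counts, compares observed multinomial counts with the frequencies expected under a uniform null hypothesis:

    from scipy.stats import chisquare

    # Hypothetical counts from 120 rolls of a die; under H0 each face expects 20
    observed = [18, 22, 19, 25, 16, 20]

    stat, p_value = chisquare(observed)  # expected frequencies default to uniform
    print(f"chi-squared = {stat:.3f}, p = {p_value:.4f}")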
Conclusion
The chi-squared distribution is important in a number of statistical analyses. First, in testing goodness of fit, it is used to measure the deviation of observed values from expected values.
Second, it is used in the chi-squared test of independence to test whether two measures on a population are independent. Third, the chi-squared test for a single variance can be used to test whether the observed variance deviates from a theoretical variance.
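The single-variance test has no dedicated function in SciPy, but its statistic, (n - 1)s^2 / sigma_0^2, is easy to compute directly; the sketch below uses made-up data and a hypothesized variance of 4.0:

    import numpy as np
    from scipy.stats import chi2

    # Hypothetical sample; H0: population variance equals sigma0_sq
    sample = np.array([5.1, 4.8, 6.0, 5.5, 4.2, 5.9, 5.3, 4.7])
    sigma0_sq = 4.0
    n = len(sample)

    stat = (n - 1) * sample.var(ddof=1) / sigma0_sq  # (n-1)s^2 / sigma0^2
    cdf = chi2.cdf(stat, df=n - 1)
    p_value = 2 * min(cdf, 1 - cdf)  # two-sided p-value
    print(f"chi-squared = {stat:.3f}, p = {p_value:.4f}")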
Reference
Aron, A., Coups, E. J., & Aron, E. N. (2010). Statistics for the Behavioral and Social Sciences: A Brief Course (4th ed.). Ontario: Pearson Education Canada.