Purpose
The purpose of this research paper was to use a non-parametric Chi-Square test to examine the potential relationship between gender and age among California DUI drivers involved in fatal crashes. The statistical analysis was intended to determine whether the distribution of such crashes across age groups differs significantly between men and women. The results of this test answer the questions listed under Methodological Questions.
Focus
This research paper focuses on the problem of drunk driving in California, US. By ignoring traffic laws and regulations, drunk drivers endanger not only their own lives and the lives of their passengers but also other road users, including pedestrians. Newsom et al. (2021) collected official data on the number of fatal crashes involving intoxicated drivers in 2018 and presented it in their report. The data, broken down by the gender and age of drivers, are suitable for a Chi-Square test of independence. Specifically, Chi-Square is a non-parametric test that detects a relationship between two categorical variables, which makes it applicable here, where the variables are age group and gender.
Methodological Questions
The methodological basis for this research paper was a non-parametric test of independence between two categorical variables, age and gender, of California drivers responsible for fatal crashes due to drunk driving. The data came from a search of official statistics that led to a government report by Newsom et al. (2021). A close reading of that report revealed information on the age-gender distributions of fatal crashes in which drivers were under the influence of alcohol or drugs. The data were copied into IBM SPSS v.28 and then used to run the Chi-Square test. More specifically, the expected frequencies for each cell in the original table and the number of degrees of freedom were manually calculated to determine the χ² statistic. The null hypothesis of the study assumed that there was no significant relationship between the age and gender of drunk drivers, and thus that the variables were independent. In contrast, the alternative hypothesis postulated that the two variables were not independent, implying that there was a relationship between them.
To determine the statistical significance of the findings, a significance level of α = .05 was used: the calculated p-value was compared against this threshold to decide whether to reject or fail to reject the null hypothesis. Hence, the methodology described allowed the following questions to be answered:
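The manual calculation described above can be illustrated with a short sketch. It assumes the standard formulas for a test of independence: the expected frequency of a cell is (row total × column total) / grand total, the degrees of freedom are (rows − 1) × (columns − 1), and the statistic is the sum of (observed − expected)² / expected over all cells. The function names and the counts in `toy_table` are hypothetical and serve only as an illustration; they are not the values from Newsom et al. (2021).

```python
from typing import List

Table = List[List[float]]


def expected_frequencies(observed: Table) -> Table:
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed: Table) -> int:
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def chi_square_statistic(observed: Table) -> float:
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows are two gender groups, columns are three age groups.
toy_table = [[30, 50, 20], [20, 20, 10]]

print(expected_frequencies(toy_table))
print(degrees_of_freedom(toy_table))
print(chi_square_statistic(toy_table))
```

For a table with two rows, as in the gender-by-age layout used here, the degrees of freedom reduce to the number of age groups minus one.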
- Is there a relationship between age and gender in fatal crashes involving drunk drivers?
- Are the results statistically significant?
- If the results are significant, what is the nature of the relationship found between age and gender?
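The decision rule behind the second question can be sketched as follows. This is an illustrative helper rather than part of the SPSS workflow used in the paper: the function name `decide` is hypothetical, and the strict inequality p < α is one common convention.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level alpha.

    Returns "reject H0" when p < alpha, otherwise "fail to reject H0".
    A non-significant result does not show that H0 is true.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.0005))  # a p-value below alpha
print(decide(0.20))    # a p-value above alpha
```

The wording "fail to reject" is deliberate: a non-significant result means the data do not provide enough evidence against independence, not that independence has been demonstrated.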
Data Collection
The statistical test used data provided by Newsom et al. (2021) in their annual report covering California traffic accidents in 2018. The authors reported a large amount of data across many tables, depending on the focus and angle of their review. Table 22a (p. 87), which presented data on fatal crashes for DUI drivers by gender and age, was of particular interest.
Analysis and Procedure
A first step in processing the data was to visualize it using a scatter plot. Figure 1 shows the data points as a function of age for the two gender groups, together with the linear regression trend for each set. Based on this visualization, an initial conclusion can be drawn that men are involved in more fatal crashes due to drunk driving than women, since in every age group the frequency for women is lower than the corresponding frequency for men. This is also supported by published research reporting that male drivers are more likely to be responsible for fatal crashes (Høye, 2020). At the same time, the trend with age is downward: for both gender groups, the number of accidents due to drunk driving decreases on average with increasing age, although up to the age of 21 the frequency of these cases increases.
In the inferential analysis, the potential relationship between the variables is of most interest. Table 1 shows the raw data used for the Chi-Square test. The analysis demonstrated a significant relationship between the variables, namely χ²(8, N = 19,232) = 183.378, p < .001. In other words, there is sufficient evidence to reject the null hypothesis, and the results indicate a significant relationship between the age and gender of DUI drivers.
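The reported p-value can be checked from the statistic and the degrees of freedom alone. The sketch below is not the SPSS procedure used in the paper; it relies on the closed-form upper-tail probability of the Chi-Square distribution that holds when the degrees of freedom are even, P(X > x) = exp(−x/2) · Σ (x/2)^i / i! for i from 0 to df/2 − 1. The function name is hypothetical.

```python
import math


def chi_square_p_value_even_df(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if statistic < 0:
        raise ValueError("the chi-square statistic cannot be negative")
    half = statistic / 2.0
    term = 1.0   # the i = 0 term of the series
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Values reported in the text: chi-square = 183.378 with 8 degrees of freedom.
p_value = chi_square_p_value_even_df(183.378, 8)
print(p_value < 0.001)
```

With the reported statistic, the resulting probability lies many orders of magnitude below .001, which is consistent with the conclusion stated above.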
Table 1. Raw data on the number of fatal crashes for gender and age groups.
Table 2 reports the intermediate results of the Chi-Square test, namely the expected frequencies. Since the significance of the relationship between age and gender has already been established, it is of interest to identify the nature of this relationship. In particular, as seen in Table 2, the number of fatal crashes caused by DUI male drivers aged 31 to 40 and 60 to 69 was higher than would be expected under the null hypothesis. It may follow that men in these age groups are more likely to be involved in fatal drunk-driving crashes. For women, similar findings hold for the following age groups: under 18, 18 to 30, 41 to 59, and over 70. Although men are involved in these crashes more often overall, the results tentatively suggest that women exceed the expected frequency in more age groups than men do.
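The comparison of observed and expected frequencies can be sketched as below. The counts and labels are hypothetical placeholders, not the values from Table 2, and the helper name `cells_above_expected` is an assumption made for illustration. The sketch only flags cells where the observed count exceeds the expected count under independence.

```python
from typing import List, Tuple


def cells_above_expected(
    observed: List[List[float]],
    row_labels: List[str],
    col_labels: List[str],
) -> List[Tuple[str, str, float]]:
    """Return (row label, column label, O - E) for cells where O exceeds E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    flagged = []
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            if o > e:
                flagged.append((row_labels[i], col_labels[j], o - e))
    return flagged


# Hypothetical table: two gender groups (rows) by three age groups (columns).
toy_observed = [[30, 50, 20], [20, 20, 10]]
flagged_cells = cells_above_expected(
    toy_observed, ["group A", "group B"], ["younger", "middle", "older"]
)
for row_label, col_label, diff in flagged_cells:
    print(row_label, col_label, round(diff, 2))
```

In a table with only two rows, the differences between observed and expected counts in each column sum to zero, so an age group in which one gender exceeds its expected frequency is necessarily one in which the other gender falls below it.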
Table 2. Comparison of observed and expected frequencies for the gender-age distribution.
References
Høye, A. (2020). Speeding and impaired driving in fatal crashes — results from in-depth investigations. Traffic Injury Prevention, 21(7), 425-430. Web.
Newsom, G. C., Omishakin, T., & Gordon, S. (2021). Annual report of the California DUI management information system. Web.