# Statistics: Chi-Square Test and Regression Analysis Report

Updated: Mar 18th, 2020

## Introduction

The chi-square test and regression analysis are important statistical methods with many applications in the sciences (Satorra & Bentler 2001; Zikmund & Babin 2010). The reliability of study findings depends partly on whether the statistical tests used suit the data. Appropriate tests also help a study achieve sufficient statistical power, and studies with sufficient power are more likely to yield reliable findings and sound conclusions.

The chi-square test assesses how closely the frequencies expected under a hypothesis match those observed in an experiment (Satorra & Bentler 2001). In other words, the test determines the goodness of fit. It is a nonparametric test, which means that it does not assume that the data follow a normal distribution.
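The goodness-of-fit idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the counts are hypothetical, and the function name `chi_square_gof` is an assumption introduced for this example. The statistic sums the squared difference between each observed and expected frequency, divided by the expected frequency.

```python
import math


def chi_square_gof(observed, expected):
    """Return the chi-square goodness-of-fit statistic and its degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus one
    return statistic, df


# Hypothetical counts for three categories (100 observations in total).
observed = [50, 30, 20]
expected = [40, 40, 20]

statistic, df = chi_square_gof(observed, expected)

# For df = 2 only, the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
```

A larger statistic means the observed frequencies sit further from the expected ones, so the fit is poorer.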

Nonparametric tests rely on fewer assumptions than parametric tests. However, when their assumptions hold, parametric tests generally have more power than nonparametric tests. The chi-square test is used to analyze nominal data, and its test statistic is evaluated against the chi-square distribution (Satorra & Bentler 2001). The chi-square distribution is positively skewed.

Regression analysis is used to test the relationship between independent and dependent variables in a study. Under the null hypothesis, the independent variables are assumed to have no relationship with the dependent variable.

The analysis yields a fitted line that can be used as a formula for predicting the dependent variable from the independent variables. In this way, regression analysis is used to ascertain whether independent variables influence dependent variables in a study.
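As a rough illustration of how a regression line becomes a usable formula, the sketch below fits a straight line by ordinary least squares. The data points are made up, and `fit_line` is a hypothetical helper name, not something taken from the cited sources.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: one independent variable x and one dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
# The fitted line acts as a formula for predicting y at a new x value.
predicted_at_6 = intercept + slope * 6
print(f"y = {intercept:.1f} + {slope:.1f} * x; prediction at x = 6: {predicted_at_6:.1f}")
```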

## Chi-square test (usefulness, situations, and conditions)

The chi-square is a nonparametric test used by researchers to estimate the relationship between variable frequencies in a study. The test has many uses in the sciences because it relies on fewer assumptions than parametric tests (Zikmund & Babin 2010). The test attempts to establish the statistical significance of the relationships among variables in a study.

To determine the statistical significance of the relationship among frequencies, the chi-square test calculates the χ² value. The value is computed from the observed and expected frequencies in a contingency table, and the dimensions of the table give the degrees of freedom (Lydersen, Fagerland & Laake 2009; Satorra & Bentler 2010). The computation involves more terms as the number of rows and columns in the table increases.
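A minimal sketch of this computation follows, assuming a made-up 2 × 3 table and a hypothetical function name `chi_square_independence`. Each expected frequency is taken as row total × column total ÷ grand total, and the degrees of freedom are (rows − 1) × (columns − 1).

```python
def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 3 table of observed frequencies.
table = [
    [10, 20, 30],
    [20, 20, 20],
]

statistic, df = chi_square_independence(table)
print(f"chi-square = {statistic:.3f}, df = {df}")
```

Every extra row or column adds more cells to the double loop, which is why larger tables involve more terms.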

The χ² value is used to decide whether to reject the null hypothesis. If the computed value exceeds the critical value for the chosen significance level and degrees of freedom, the null hypothesis is rejected. Rejection of the null hypothesis indicates that the two categorical variables are not independent.

Variables that are not independent are associated with each other. In such cases, it can be concluded that the relationship between the two categories of variables is statistically significant (Satorra & Bentler 2010).
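The decision step can be sketched as a comparison between the computed statistic and a critical value of the chi-square distribution. The critical values below are the standard tabulated ones for a 0.05 significance level. The function name `decide` and the example statistics are assumptions made for illustration.

```python
# Tabulated upper-tail critical values of the chi-square distribution
# at a significance level of 0.05, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def decide(statistic, df, critical_values=CRITICAL_0_05):
    """Return the test decision for a computed chi-square statistic."""
    critical = critical_values[df]
    if statistic > critical:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical computed statistics.
decision_a = decide(4.0, df=1)   # 4.0 exceeds 3.841
decision_b = decide(5.33, df=2)  # 5.33 does not exceed 5.991
print(decision_a)
print(decision_b)
```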

The chi-square test is used in many scientific studies. For example, the test could be used to determine the statistical significance of the relationship between disease and exposure in epidemiology. The results from such studies could be important in determining whether the disease is associated with exposure to disease-causing microorganisms. Data values for disease and exposure are displayed in a contingency table to make the necessary computations.

Such a study would start from the null hypothesis that there is no relationship between disease and exposure. A small χ² value would be consistent with that hypothesis, so it could be concluded that the data show no evidence that exposure corresponds to disease. On the other hand, rejecting the null hypothesis in favor of the alternative would imply that exposure and disease are statistically associated.
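For a 2 × 2 disease-by-exposure table, the statistic has a well-known shortcut form, and with one degree of freedom the p-value can be obtained from the complementary error function. The sketch below uses invented counts and a hypothetical helper `chi_square_2x2`; it applies no continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts:          diseased  not diseased
exposed = (30, 70)
unexposed = (10, 90)

statistic = chi_square_2x2(*exposed, *unexposed)

# With 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))
print(f"chi-square = {statistic:.2f}, p = {p_value:.6f}")
```

With these invented counts the p-value falls well below 0.05, so the null hypothesis of no association would be rejected.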

The chi-square test can be used to compare the efficacies of two drugs in a clinical trial. Clinicians would be interested in determining which of two drugs is more effective in treating a disease. The observed frequencies of healing and non-healing for the two drugs could be entered into a contingency table, from which the expected frequencies are derived.

The χ² value for the table indicates whether the cure rates of the two drugs differ by more than chance would explain. The χ² value itself does not show which drug is better; that is read from the observed cure rates. If the difference is statistically significant, the drug with the higher cure rate could be adopted to treat the targeted disease. Such studies are essential in both in vivo and in vitro assessments of drug efficacies.
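One common way to set up such a comparison is a single table with one row per drug and one column per outcome. In the sketch below the trial counts are invented and the variable names are assumptions. The statistic tests whether the cure rates differ, and the direction of the difference is read from the cure rates themselves.

```python
# Hypothetical trial results: (cured, not cured) for each drug.
trial = {"drug A": (45, 15), "drug B": (30, 30)}

# Observed cure rate per drug.
cure_rates = {drug: cured / (cured + not_cured)
              for drug, (cured, not_cured) in trial.items()}

# Chi-square statistic for the drug-by-outcome table.
rows = list(trial.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
total = sum(row_totals)

statistic = 0.0
for i, row in enumerate(rows):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        statistic += (observed - expected) ** 2 / expected

# Critical value for df = 1 at a significance level of 0.05.
significant = statistic > 3.841
better_drug = max(cure_rates, key=cure_rates.get)

print(f"cure rates: {cure_rates}")
print(f"chi-square = {statistic:.2f}, significant at 0.05: {significant}")
print(f"higher observed cure rate: {better_drug}")
```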

The chi-square test is applicable in situations where there are many categories of possible outcomes. For example, patients could be categorized based on their disease status following treatment. The status could be unchanged, worse, or improved. Computing the χ² value across these categories of patients could go a long way in determining the efficacies of drugs used in the hospital.

Another categorical situation could involve organizing students based on their levels (freshman, junior, and senior). Another situation accommodated by chi-square test involves more than one aspect of classification. For example, students could be organized on whether they are “conservative” or “liberal” in addition to their academic levels (freshman, junior, and senior).

Patients could also be categorized based on their ages and sex, as well as their disease status. Grouping items by more than one classification variable at the same time is known as cross-categorization. Cross-categorization in the chi-square test increases the number of questions a study can assess.

This implies that many aims of a study could be achieved by the use of a chi-square test. For example, clinicians can understand the disease status, age, and sex of patients on particular medications.
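Cross-categorization amounts to counting how many records fall into each combination of categories. The sketch below builds such a table from a small set of invented patient records; the field values and variable names are assumptions.

```python
from collections import Counter

# Hypothetical patient records: (disease status after treatment, sex).
patients = [
    ("improved", "female"), ("improved", "female"), ("improved", "female"),
    ("improved", "male"), ("improved", "male"),
    ("unchanged", "female"),
    ("unchanged", "male"), ("unchanged", "male"),
    ("worse", "female"),
    ("worse", "male"),
]

statuses = ["improved", "unchanged", "worse"]
sexes = ["female", "male"]

# Count each (status, sex) combination, then lay the counts out as a table
# with one row per status and one column per sex.
counts = Counter(patients)
table = [[counts[(status, sex)] for sex in sexes] for status in statuses]

for status, row in zip(statuses, table):
    print(f"{status:<10}", row)
```

The resulting table of counts is the contingency table on which the chi-square statistic is computed.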

There are certain conditions which must be met for valid application of the chi-square test. One is that the data need not follow a normal distribution, because the test operates on frequency counts of categorical data rather than on measurements. Methods that assume normality are usually justified for large sample sizes (a common rule of thumb is n > 30), where the sampling distribution of the mean is approximately normal and centered on the population mean.

The test is therefore suited to studies in which the observations are too few to establish the normality of the data in every category used in the analysis (Satorra & Bentler 2010), provided the expected frequency in each category is not too small. The chi-square test should be used when a study uses more than one category of data. The categories of data are analyzed to give the statistical significance of the difference between the expected and observed frequencies.
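A widely used rule of thumb, which is an assumption added here and not a claim from the cited sources, is that the chi-square approximation becomes unreliable when expected cell counts fall below about 5. The sketch below flags such cells in an invented table. The helper names `expected_counts` and `cells_below` are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def cells_below(table, minimum=5):
    """Return (row, column) positions whose expected count is below `minimum`."""
    return [
        (i, j)
        for i, row in enumerate(expected_counts(table))
        for j, expected in enumerate(row)
        if expected < minimum
    ]


# Hypothetical small study with sparse cells.
table = [
    [3, 7],
    [2, 18],
]

small_cells = cells_below(table)
print("cells with expected count below 5:", small_cells)
```

When cells are flagged, analysts often merge categories or turn to an exact test instead.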

Another condition given for nonparametric tests such as the chi-square test is that the hypothesis should be framed in terms of the median rather than the mean (Satorra & Bentler 2010). The reason is that the mean of a small sample can vary widely from the true population mean. Thus, means calculated from small samples may not support reliable conclusions about the population.

The median, however, is less sensitive to extreme values than the mean. Therefore, the median of a small sample could be used more safely to make conclusions about the population (Satorra & Bentler 2010).

## Regression analysis (usefulness, situations, and conditions)

Useful applications of regression analysis are found in many areas, including finance, pharmacology, economics, marketing, and biology (Zikmund & Babin 2010). The statistical tool is applied to assess the impact of predictor variables on outcome variables (Phillips & Moon 1999). For example, econometrics uses economics and mathematics knowledge to determine the effects of various factors on economic outcomes.

Regression analysis is used to determine the factors that influence the price elasticity of demand. This is achieved by estimating one or more demand curves for a product. Factors like the price of the product and the prices of its competitors are considered. Regression analysis is also useful in the determination of prices in litigation. Through regression analysis, firms could be grouped in terms of their charges.

In such cases, regression analysis considers many factors in the litigation context. Firms could be categorized in the same market if their services show the same patterns of elasticity of demand (Preacher, Curran & Bauer 2006; Zikmund & Babin 2010).

Regression analysis could also be used to identify the factors that are essential in determining salaries earned by employees in the public service. Several factors could be hypothesized to affect the amount of money earned by public service employees. These factors could be experience, age, motivation, and educational level, among other factors. The analysis could tackle the factors one at a time.

For example, educational level could be measured by years of schooling and by the academic programs completed. The null hypothesis would be that higher levels of education do not correlate with higher earnings in the public service. Regression analysis could then provide the values needed to draw a conclusion.
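When several factors are considered together, the regression has more than one predictor. The sketch below estimates the coefficients by solving the normal equations with Gaussian elimination. Everything in it is an illustrative assumption: the salary figures are constructed so that salary = 20 + 2 × experience + 3 × education exactly, which makes it easy to check that the coefficients are recovered.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def fit_ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical employees: (years of experience, years of schooling).
predictors = [(1, 12), (3, 16), (5, 12), (7, 18), (10, 16)]
# Hypothetical salaries (in thousands), built as 20 + 2*experience + 3*schooling.
salary = [58, 74, 66, 88, 88]

intercept, b_experience, b_education = fit_ols(predictors, salary)
print(f"salary = {intercept:.2f} + {b_experience:.2f} * experience "
      f"+ {b_education:.2f} * education")
```

Real salary data would contain noise, so the estimated coefficients would only approximate the underlying relationship.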

An R-squared value closer to 1 indicates that the model explains most of the variation in the dependent variable, while an R-squared value closer to zero (0) implies that it explains little of it. However, the R-squared value alone cannot establish whether the computed relationship is statistically significant. A P-value can be interpreted to assess the significance of the findings (Preacher et al. 2006).
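R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The following sketch uses invented data and a hypothetical function name `r_squared`.

```python
def r_squared(x, y):
    """R-squared of a simple least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x

    # Residual sum of squares: variation left unexplained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)
print(f"R-squared = {r2:.2f}")
```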

For example, a P-value less than 0.05 (P<0.05) would indicate that the results are statistically significant, leading to the rejection of the null hypothesis. Regression analysis could also be used to determine the factors that make patients prefer some hospitals.
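The P < 0.05 rule can be illustrated for the slope of a simple regression. Python's standard library has no t distribution, so this sketch compares the t statistic with a tabulated critical value (3.182 for a two-sided test at 0.05 with 3 degrees of freedom), which is equivalent to checking whether P < 0.05. The data and the function name `slope_t_statistic` are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return the t statistic for the slope of a simple regression and its df."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    df = n - 2  # two parameters estimated: intercept and slope
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(ss_res / df / sxx)
    return slope / standard_error, df


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t_stat, df = slope_t_statistic(x, y)

# Tabulated two-sided critical value of the t distribution, alpha = 0.05, df = 3.
T_CRITICAL_DF3 = 3.182
significant = abs(t_stat) > T_CRITICAL_DF3
print(f"t = {t_stat:.3f}, df = {df}, significant at 0.05: {significant}")
```

In this small sample the slope is not significant at the 0.05 level, even though the fitted line explains a good share of the variation. This shows why R-squared and the P-value answer different questions.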

It could be hypothesized that past treatment outcomes, pricing, location, and distance do not influence patients’ patterns of attending hospitals to access health care. The factors could be analyzed to determine their significance in determining the hospital visits. Results from the study would be important in enhancing the quality of health care in the future.

Regression analysis is best applied in specific situations. One such situation is where appropriate mechanisms are used to handle missing data. This is crucial because an analysis based on variables with a lot of missing data can be biased (Phillips & Moon 1999). Complete data sets, on the other hand, avoid this source of bias and lead to a better interpretation of results.

Another situation is where correct sampling procedures are used to collect the data. Incorrect sampling procedures yield data that do not represent the population from which the sample is drawn, whereas sound sampling techniques give each member of the population a known probability of being selected.

Known sampling probabilities also help in specifying how missing data are to be compensated for in a study. Regression analysis would also fit in situations where the conditional likelihood and the approach for estimating weights are correctly specified.

Regression analysis is best applied in situations that satisfy specific conditions. Firstly, the sample should be representative of the population. Secondly, the error term should be a random variable with a mean of zero conditional on the independent variables.

Thirdly, the independent variables should be linearly independent of one another. Fourthly, the variance of the error should not change across observations, and when it does, other estimation methods should be used (Preacher et al. 2006).
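The linear-independence condition can be screened informally by checking whether two predictors are perfectly correlated. This is only a rough pairwise check, the data are invented, and `pearson_r` is a hypothetical helper. Dependence involving three or more predictors would need a different check.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predictors.
experience_years = [1, 2, 3, 4, 5]
experience_months = [12, 24, 36, 48, 60]  # same information in different units
education_years = [12, 16, 12, 18, 16]

r_duplicate = pearson_r(experience_years, experience_months)
r_distinct = pearson_r(experience_years, education_years)

# A correlation of (almost exactly) +1 or -1 means one predictor is a
# linear function of the other, so both should not enter the same model.
for label, r in [("years vs months", r_duplicate), ("years vs education", r_distinct)]:
    flag = "linearly dependent" if abs(r) > 0.999 else "ok"
    print(f"{label}: r = {r:.3f} ({flag})")
```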

## Conclusion and recommendations

The chi-square test is a powerful statistical method for assessing the significance of differences between data observed in experiments and data expected under a hypothesis. Determining the significance of the differences in data is essential in making conclusions about a sample and the population from which the sample is derived. Regression analysis is used to establish the relationship between predictor variables and outcome variables in a study.

The statistical method does not hypothesize that variables are correlated. The relationship findings obtained through regression analysis could be used to make conclusions about the population from which a sample is derived. Both the chi-square test and regression analysis have many applications in several study areas across the world.

It is recommended that scientists choose statistical tests that suit their data and therefore give reliable results. Also, future studies should find ways of extending the chi-square test and regression analysis to accommodate more situations for analysis.

## References

Lydersen, S, Fagerland, MW, & Laake, P, 2009, “Recommended tests for association in 2 × 2 tables”, Statistics in Medicine, Vol. 28, No. 7, pp. 1159-1175.

Phillips, PC, & Moon, HR, 1999, “Linear regression limit theory for nonstationary panel data”, Econometrica, Vol. 67, No. 5, pp. 1057-1111.

Preacher, KJ, Curran, PJ, & Bauer, DJ, 2006, “Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis”, Journal of Educational and Behavioral Statistics, Vol. 31, No. 4, pp. 437-448.

Satorra, A, & Bentler, PM, 2001, “A scaled difference chi-square test statistic for moment structure analysis”, Psychometrika, Vol. 66, No. 4, pp. 507-514.

Satorra, A, & Bentler, PM, 2010, “Ensuring positiveness of the scaled difference chi-square test statistic”, Psychometrika, Vol. 75, No. 2, pp. 243-248.

Zikmund, WG, & Babin, BJ, 2010, Exploring Marketing Research, 10th edn, Mason, OH: South-Western Publisher.