## Introduction

Customers of a major bottling company have complained that the amount of soda in its bottles does not match the advertised amount of sixteen ounces. To investigate, several bottles were picked at random at the bottling plant and measured to check whether the customers' complaints were justified. The data used is presented in the table below.

## The Mean, Median, and Standard Deviation

The first part of the investigation consisted of calculating the mean, the median, and the standard deviation of the number of ounces per bottle. The sample comprised 30 bottles. The data is provided in Table 1 below.

*Table 1. Data collected from a sample of 30 bottles.*

The mean is calculated by summing the ounce values and dividing the sum by the number of bottles (George & Mallery, 2016). The mean of the sample is x̅ = 15.8540.

The median is the middle value of the data once the values are arranged from the lowest to the highest. If there is an even number of cases, as here, the median is the average of the two middle values (George & Mallery, 2016). Median = 15.99.

The estimate of standard deviation (SD) of the population from the sample data can be found by using the formula:

SD = (((x_{1} – x̅)^{2} +…+ (x_{n} – x̅)^{2}) / (n – 1))^{0.5}.

In this case, SD = 0.661381.
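
The three statistics above can be sketched in a few lines of Python. Because the measurements from Table 1 are not reproduced here, the list `ounces` below is a small hypothetical sample used only for illustration, and `sample_sd` is an illustrative helper that follows the SD formula given above.

```python
import statistics

# Hypothetical fill volumes in ounces; these are NOT the Table 1 measurements.
ounces = [15.2, 16.1, 15.9, 16.4, 15.7, 16.0]


def sample_sd(values):
    """Estimate the population SD from a sample (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return (squared_deviations / (n - 1)) ** 0.5


mean = statistics.mean(ounces)      # sum of values divided by their count
median = statistics.median(ounces)  # average of the two middle values when n is even
sd = sample_sd(ounces)

print(f"mean = {mean:.4f}, median = {median:.2f}, SD = {sd:.6f}")
```

The standard library function `statistics.stdev` uses the same n – 1 divisor, so it should agree with `sample_sd` up to floating-point rounding.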

## Confidence Interval

Knowing the mean, the sample size, and the standard deviation is enough to calculate the standard error of the mean (Warner, 2013):

SE_{mean} = SD / (n – 2)^{0.5} = 0.661381 / 28^{0.5} = 0.12499.

The (n – 2) here denotes the degrees of freedom; they are used to adjust the calculations to the fact that the data was collected from a sample, not the whole population (Warner, 2013).
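
As a rough check of the arithmetic, the standard error can be computed as below. The helper `standard_error` and its `divisor` parameter are illustrative names, not part of any cited source. The sketch reproduces the n – 2 divisor used above and, for comparison only, also evaluates the more common textbook form SD / n^{0.5}, which gives a slightly smaller value.

```python
import math


def standard_error(sd: float, divisor: int) -> float:
    """Return sd divided by the square root of the given divisor."""
    return sd / math.sqrt(divisor)


sd, n = 0.661381, 30

se_as_in_text = standard_error(sd, n - 2)  # divisor used in the text above
se_common_form = standard_error(sd, n)     # common textbook form, for comparison

print(f"SE with n - 2: {se_as_in_text:.5f}")
print(f"SE with n:     {se_common_form:.5f}")
```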

Because about 95% of the values in a normal distribution fall within ±1.96 standard deviations of the mean, and the standard error is the standard deviation of the sampling distribution of the mean, the 95% confidence interval of the mean can be obtained as follows:

95% CI_{mean} = x̅ ± (1.96 × SE_{mean}) = 15.8540 ± 0.24498;

95% CI_{mean} = (15.6090; 16.0990).
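
The interval formula above can be evaluated with a short sketch such as the following. The function name `confidence_interval` is an assumption made for illustration, and the multiplier 1.96 is the normal-distribution value used in the text (a t-based multiplier would be somewhat larger for a sample of 30).

```python
def confidence_interval(mean: float, se: float, z: float = 1.96):
    """Return (lower, upper) bounds of mean ± z * se."""
    margin = z * se
    return mean - margin, mean + margin


# Values taken from the text: sample mean and standard error of the mean.
low, high = confidence_interval(mean=15.8540, se=0.12499)

print(f"margin = {1.96 * 0.12499:.5f}")
print(f"95% CI = ({low:.4f}; {high:.4f})")
```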

## Hypothesis Testing

It is assumed that the random fluctuations in the number of ounces are approximately normally distributed and do not depend on the mean (so the standard deviations and the standard errors are equal regardless of the means). Notably, calculating the t-statistic does not require raw data; knowing the means, SDs, and sample sizes is sufficient (Field, 2013). Therefore, a hypothetical sample with mean = 16.0 and SD = 0.661381 will be used.

Two means of independent samples of the same size can be compared using a t-test (Field, 2013):

t = ((x̅_{1} – x̅_{2}) – (µ_{1} – µ_{2})) / SE_{mean},

where µ_{1}, µ_{2} are the two population means.

The null hypothesis is: µ_{1} = µ_{2}, or (µ_{1} – µ_{2}) = 0. The alpha level will be α=.05. If the null is true, then (Field, 2013):

t = (x̅_{1} – x̅_{2}) / SE_{mean}.

Now, it is possible to calculate the t statistic:

t = (15.8540 – 16.0) / 0.12499 = -1.1681.
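
The same calculation can be written as a small function. This is only a sketch of the simplified formula above (the null-hypothesis form with µ_{1} – µ_{2} = 0); the name `t_statistic` is illustrative.

```python
def t_statistic(mean_1: float, mean_2: float, se: float) -> float:
    """Return (mean_1 - mean_2) / se, the t statistic under the null hypothesis."""
    return (mean_1 - mean_2) / se


# Sample mean, hypothetical mean, and standard error as given in the text.
t = t_statistic(mean_1=15.8540, mean_2=16.0, se=0.12499)

print(f"t = {t:.4f}")
```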

Because the sample size is n = 30 in the given data and is assumed to be 30 in the hypothetical data, the degrees of freedom for the t-test are df = 30 + 30 – 2 = 58 (Field, 2013). The critical value (the maximal absolute value of the t-statistic that one can expect to obtain when the null hypothesis is true) of the two-tailed t-test for df = 60 and α=.05 is 2.00; for df = 60 and α=.01, two-tailed, it is 2.66 (Field, 2013). For lower degrees of freedom, such as df = 58, the critical values are slightly higher.

The absolute value of the obtained t-statistic is smaller than both critical values. Therefore, the null hypothesis was not rejected at α=.05, and consequently it was not rejected at the stricter α=.01 either. Thus, no statistically significant difference was found at α=.05 between the means of the given sample and a hypothetical sample with the same SD and with the mean = 16.0.
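
The decision rule described above amounts to comparing |t| with a tabulated critical value. A minimal sketch follows; the dictionary `CRITICAL_T` holds only the two df = 60 values quoted in the text, and the names `CRITICAL_T` and `reject_null` are assumptions made for illustration.

```python
# Two-tailed critical values for df = 60, as quoted in the text (Field, 2013).
CRITICAL_T = {0.05: 2.00, 0.01: 2.66}


def reject_null(t: float, alpha: float) -> bool:
    """Reject the null hypothesis only if |t| exceeds the critical value."""
    return abs(t) > CRITICAL_T[alpha]


t = -1.1681  # t statistic obtained in the text

for alpha in (0.05, 0.01):
    decision = "reject" if reject_null(t, alpha) else "do not reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

Because the critical values for df = 58 are slightly higher than those for df = 60, a result that is not significant against the df = 60 values is not significant at df = 58 either.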

## Discussion

Therefore, it can be concluded that the claim of less soda per bottle was not supported by the evidence; the difference between the mean of the drawn sample and the required mean of 16.0 was not statistically significant, that is, it can be attributed to sampling error. Notably, the median of the sample was practically equal to 16 ounces (it was 15.99), so the number of customers who received less soda is nearly the same as the number who received more.

However, it should be stressed that the SD was 0.661381. Assuming a normal distribution centered on 16.00 ounces, about 34% of the bottles contained between 15.3386 (mean – SD) and 16.00 ounces of soda, and about 13.6% contained between 14.6772 and 15.3386 ounces (George & Mallery, 2016, p. 113). Thus, it is likely that the customers who complained about too little soda in the bottles were simply among those unlucky enough to purchase bottles with less soda.

To mitigate the problem in the future, the company could either purchase more accurate equipment, which would measure soda more precisely and thereby reduce the spread of the values around the mean (decreasing the SD), or increase the mean amount of soda poured into each bottle, making it more favorable for customers, while still labeling the bottles as containing 16 ounces. To decide between these options, the cost of new equipment should be compared with the cost of pouring more soda into each bottle. Legal issues should also be taken into account; it may be illegal to state an incorrect mean on the package.
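
The shares of bottles falling one and two standard deviations below the target can be checked with Python's `statistics.NormalDist`. This sketch assumes, as the discussion does, that fill volumes are normally distributed around 16.0 ounces with SD = 0.661381; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: fill volumes ~ Normal(mean = 16.0, SD = 0.661381).
fill = NormalDist(mu=16.0, sigma=0.661381)

one_sd_below = fill.mean - fill.stdev      # about 15.3386 ounces
two_sd_below = fill.mean - 2 * fill.stdev  # about 14.6772 ounces

# Share of bottles between (mean - SD) and the mean.
share_first_band = fill.cdf(fill.mean) - fill.cdf(one_sd_below)
# Share of bottles between (mean - 2 SD) and (mean - SD).
share_second_band = fill.cdf(one_sd_below) - fill.cdf(two_sd_below)

print(f"{one_sd_below:.4f} to {fill.mean:.2f} oz: {share_first_band:.2%}")
print(f"{two_sd_below:.4f} to {one_sd_below:.4f} oz: {share_second_band:.2%}")
```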

## References

Field, A. (2013). *Discovering statistics using IBM SPSS Statistics* (4th ed.). Thousand Oaks, CA: SAGE Publications.

George, D., & Mallery, P. (2016). *IBM SPSS Statistics 23 step by step: A simple guide and reference* (14th ed.). New York, NY: Routledge.

Warner, R. M. (2013). *Applied statistics: From bivariate through multivariate techniques* (2nd ed.). Thousand Oaks, CA: SAGE Publications.