What is a sample and how does it differ from a population? Provide an example of each concept.
A population and a sample have different meanings in statistics. A population is the entire set of individuals, items, or observations that a statistical inquiry is concerned with. Studying all of it requires complete enumeration, or a census (Waclawski & Church, 2002). The population is sometimes called the universe.
A sample is a portion of the population selected for a given statistical inquiry. Because it is drawn from the parent population, a well-chosen sample has properties similar to those of that population. A sample statistic, such as a sample mean or proportion, can therefore serve as an estimate of the corresponding population parameter (Di Pofi, 2002). For this to work, the sample must be representative of the population. It must give an accurate picture of the parent population so that the corresponding properties of the whole set of observations can be estimated accurately.
For example, suppose we want to find out what American citizens think about firearm policy in the U.S. In that case, the population would be all U.S. citizens. A sample of 1,000 people could be drawn randomly from different states to represent that population. The sample must be random and representative to avoid selection bias.
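The short simulation below is a minimal sketch of the idea that a sample statistic estimates a population parameter. It makes two assumptions that are purely illustrative. The population is hypothetical and scaled down to 100,000 people. About 55% of them are coded 1 for "supports the policy" and the rest 0. Neither the size nor the 55% figure comes from any real survey. `random.sample` draws a simple random sample without replacement, so every member has the same chance of selection.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical, scaled-down population: 1 = supports the policy, 0 = does not.
POPULATION_SIZE = 100_000
population = [1 if random.random() < 0.55 else 0 for _ in range(POPULATION_SIZE)]

# Population parameter: the true proportion of supporters.
population_proportion = sum(population) / len(population)

# Simple random sample of 1,000 people, drawn without replacement.
sample = random.sample(population, k=1000)

# Sample statistic: the proportion of supporters in the sample,
# used as an estimate of the population parameter.
sample_proportion = sum(sample) / len(sample)

print(f"population proportion: {population_proportion:.3f}")
print(f"sample proportion:     {sample_proportion:.3f}")
```

With a random sample of 1,000, the sample proportion typically lands within a few percentage points of the population proportion. A non-random sample, such as one drawn from a single state, gives no such assurance.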
Describe the basic characteristics of a normal distribution and the normal curve. What is the usefulness of the normal distribution and its applications in statistics?
The normal distribution is an important theoretical distribution. Some theoretical distributions, such as the Poisson and binomial distributions, are discrete. The normal distribution, by contrast, is a continuous probability distribution (Di Pofi, 2002). It is often called the Gaussian distribution after Carl Friedrich Gauss, who used it extensively in his work on errors of measurement. The normal curve is a symmetrical, bell-shaped curve. It extends indefinitely in both directions along the horizontal axis, getting ever closer to the axis without ever touching it (Di Pofi, 2002). The normal distribution is written N(µ, σ), where µ is the mean and σ is the standard deviation (SD) about the mean. Many texts instead write N(µ, σ²), using the variance. Changing µ shifts the curve along the horizontal axis. Changing σ makes the curve wider and flatter or narrower and taller, so normal curves can take different shapes. The height of the curve, p(x), is a probability density rather than a probability. The probability that the variable falls within an interval is the area under the curve over that interval. The total area enclosed by the normal curve equals 1.
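To make the distinction between curve height and probability concrete, here is a minimal sketch that writes out the normal density and approximates areas under it with the trapezoidal rule. The density formula is

f(x) = 1 / (σ√(2π)) · exp(−(x − µ)² / (2σ²)).

The function names `normal_pdf` and `area_under_curve` are illustrative helpers, not part of any library. The step count is an arbitrary choice.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the N(mu, sigma) curve at x: a density, not a probability."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(a: float, b: float, mu: float, sigma: float, steps: int = 10_000) -> float:
    """Approximate P(a <= X <= b) as the area under the curve (trapezoidal rule)."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

mu, sigma = 0.0, 1.0
# Integrating far into both tails recovers (almost exactly) the total area of 1.
total_area = area_under_curve(mu - 10 * sigma, mu + 10 * sigma, mu, sigma)
# The probability of an interval is the area over that interval only.
p_between = area_under_curve(mu - sigma, mu + sigma, mu, sigma)

print(f"total area: {total_area:.6f}")
print(f"P(mu - sigma <= X <= mu + sigma): {p_between:.4f}")
```

Trying other values of `mu` and `sigma` moves or stretches the curve, but the total area stays at 1.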
The normal distribution has several basic properties. First, its mean, median, and mode are all equal. Second, it is unimodal: it has only one mode, located at µ. Third, it is symmetrical about µ, so the two halves of the curve mirror each other (Di Pofi, 2002). As a result, the total area under the curve, which is 1, is split into 0.5 to the left and 0.5 to the right of the mean. Finally, between µ - σ and µ + σ the curve is concave, bending downward. At µ ± σ it changes direction; these are its points of inflection. Beyond µ ± σ it is convex, bending upward, relative to the x-axis.
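These properties can be checked numerically. The sketch below uses the analytic second derivative of the density:

f''(x) = f(x) · ((x − µ)² − σ²) / σ⁴.

Its sign shows where the curve bends down or up, and it is zero exactly at µ ± σ. The parameters µ = 18 and σ = 5 are borrowed from the literacy-test example later in the text, only for convenience. The helper names are hypothetical.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the N(mu, sigma) curve at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def second_derivative(x: float, mu: float, sigma: float) -> float:
    """Analytic second derivative: f''(x) = f(x) * ((x - mu)**2 - sigma**2) / sigma**4."""
    return normal_pdf(x, mu, sigma) * ((x - mu) ** 2 - sigma ** 2) / sigma ** 4

def curvature(x: float, mu: float, sigma: float) -> str:
    """Classify the bend of the curve at x from the sign of f''(x)."""
    s = second_derivative(x, mu, sigma)
    if s < 0:
        return "concave (bends down)"
    if s > 0:
        return "convex (bends up)"
    return "point of inflection"

mu, sigma = 18.0, 5.0

# Symmetry: equal heights at equal distances either side of the mean.
is_symmetric = all(
    math.isclose(normal_pdf(mu - d, mu, sigma), normal_pdf(mu + d, mu, sigma))
    for d in (1.0, 2.5, 5.0, 12.0)
)

# Single mode: a grid search over mu +/- 30 finds the highest point at mu itself.
grid = [mu + k * 0.01 for k in range(-3000, 3001)]
mode = max(grid, key=lambda x: normal_pdf(x, mu, sigma))

for x in (mu, mu + 0.5 * sigma, mu + sigma, mu + 2 * sigma, mu + 3 * sigma):
    print(f"x = {x:5.1f}: {curvature(x, mu, sigma)}")
```

The printout shows the curve is already convex at µ + 2σ. The change from concave to convex happens at µ ± σ, not at ±3σ.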
The normal distribution has many practical applications. Several sampling distributions are built on it or approach it for large samples. The Z statistic follows the standard normal distribution, and Student's t-distribution approaches the normal distribution as the sample size n grows large (n→∞). Tests such as the F-test and Student's t-test also assume that the parent population is normally distributed (Di Pofi, 2002). The normal distribution is also at the heart of the central limit theorem, an important theorem in statistics. The theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, even when the parent population is not normal, provided it has a finite variance. The normal distribution is likewise used to set confidence limits in statistical sampling. Data that are not normally distributed, such as strongly right-skewed data, can often be transformed, for example by taking logarithms. The transformed values are then approximately normal, so the normal curve can be used to find probabilities. Finally, the normal distribution is used in quality control and production processes to set control limits, commonly at µ ± 3σ.
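A quick simulation illustrates the central limit theorem and its link to confidence limits. This is a minimal sketch under assumed settings. The parent population is taken to be exponential with mean 1, which is strongly skewed and clearly not normal. Each sample has n = 50 observations, and 5,000 samples are drawn. These choices are illustrative, not prescribed by the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

def draw_sample(n: int) -> list[float]:
    """One sample of size n from an exponential population with mean 1 and SD 1."""
    return [random.expovariate(1.0) for _ in range(n)]

n = 50           # size of each sample (assumed)
repeats = 5_000  # number of samples drawn (assumed)
sample_means = [statistics.fmean(draw_sample(n)) for _ in range(repeats)]

# CLT prediction: sample means are approximately N(mu, sigma / sqrt(n)), with mu = sigma = 1 here.
predicted_se = 1.0 / n ** 0.5
observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)

# Share of sample means within mu +/- 1.96 standard errors: close to 95% if the CLT holds.
within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in sample_means) / repeats

print(f"mean of sample means: {observed_mean:.3f} (population mean 1.000)")
print(f"SD of sample means:   {observed_sd:.3f} (predicted {predicted_se:.3f})")
print(f"within 1.96 SE:       {within:.1%}")
```

Even though individual exponential values are far from normal, their averages behave almost like a normal variable. This is why normal-based confidence limits work for sample means.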
What is the usefulness of transforming raw scores to standardized scores, such as z scores? What do these numbers tell us?
A normal distribution is completely defined by µ and σ. As stated earlier, different values of µ and σ produce normal curves of different shapes. However, observations from any normal distribution can be transformed into standardized scores (Z-values). Doing so converts every normal distribution into a single standard normal distribution, whose areas under the curve can be read from tables. These areas correspond to the probabilities that the random variable falls within given ranges. A Z-value also tells us how many standard deviations an observation lies above (positive Z) or below (negative Z) the mean. This makes scores from different distributions comparable.
Values from a normal distribution are converted to Z scores using the expression Z = (X - µ)/σ. For the standard normal curve, therefore, the mean is 0 and the standard deviation is 1. An important characteristic of the standard normal distribution, p(z), is the area under the curve. About 68.26% of the area lies within µ ± σ, about 95.44% within µ ± 2σ, and about 99.73% within µ ± 3σ (Di Pofi, 2002). These values give the proportion of observations expected to fall within one, two, and three standard deviations of the mean. The familiar 95% confidence limits correspond to µ ± 1.96σ.
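In place of a printed table, the area to the left of a Z value can be computed with the error function:

Φ(z) = ½(1 + erf(z/√2)).

The sketch below applies this with Python's standard `math.erf`. The helper names are illustrative. The mean of 100 and SD of 15 in the first example are hypothetical numbers chosen only to show standardization. The unrounded areas it prints (68.27%, 95.45%, 99.73%) can differ in the last decimal place from figures obtained by adding four-decimal table entries, such as 68.26%.

```python
import math

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize a raw score: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def standard_normal_cdf(z: float) -> float:
    """Phi(z): area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Any normal distribution maps onto N(0, 1): a value 2 SDs above its mean has Z = 2.
print(z_score(130, 100, 15))  # hypothetical test with mean 100 and SD 15 -> 2.0

# Area within k standard deviations of the mean (the same for every normal curve).
for k in (1, 2, 3):
    area = standard_normal_cdf(k) - standard_normal_cdf(-k)
    print(f"within mu ± {k} sigma: {area:.2%}")
```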
A test measuring basic literacy skills in children is normally distributed, with µ = 18 and σ = 5. Calculate the z score for each of the following test scores, then explain what each z score means or tells us about that raw score:
X = 11
X = 17
X = 21
X = 25
Provide a real-world example demonstrating the application and usefulness of standardized scores. Include any potential issues in using these scores.
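The normal distribution is written N(µ, σ), so the distribution given can be represented as N(18, 5). First, the raw scores are transformed to Z scores using Z = (X - µ)/σ. Each Z score tells us how many standard deviations a raw score lies above or below the mean. The standard normal table then gives the proportion of children expected to score below that value, which is its percentile rank:
X = 11: Z = (11 - 18)/5 = -1.4. This score is 1.4 standard deviations below the mean. The area to the left of Z = -1.4 is 0.0808, so only about 8.08% of children score lower. This child's literacy skills are well below average.
X = 17: Z = (17 - 18)/5 = -0.2. This score is only 0.2 standard deviations below the mean. The area to the left of Z = -0.2 is 0.4207, so about 42.07% of children score lower. The score is close to average.
X = 21: Z = (21 - 18)/5 = +0.6. This score is 0.6 standard deviations above the mean. The area to the left of Z = +0.6 is 0.7257, so about 72.57% of children score lower.
X = 25: Z = (25 - 18)/5 = +1.4. This score is 1.4 standard deviations above the mean. The area to the left of Z = +1.4 is 0.9192, so about 91.92% of children score lower and only about 8.08% score higher.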
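The calculation for the four literacy scores can be reproduced in a few lines. This sketch uses the error-function form of the standard normal CDF instead of a printed table, so its results match the table values to about four decimal places. The parameters N(18, 5) come from the exercise. The helper names are illustrative.

```python
import math

MU, SIGMA = 18, 5  # literacy test from the exercise: N(18, 5)

def z_score(x: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def percentile(z: float) -> float:
    """Proportion of the distribution scoring below z (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for x in (11, 17, 21, 25):
    z = z_score(x)
    below = percentile(z)
    position = "below" if z < 0 else "above"
    print(f"X = {x}: z = {z:+.1f} ({abs(z):.1f} SD {position} the mean), "
          f"{below:.2%} score lower, {1 - below:.2%} score higher")
```

The Z score locates the child relative to the average child. The percentile turns that location into "what share of children did worse", which is usually the more intuitive number to report.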
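Standardized scores have many real-world applications. For instance, Z scores can be used when hiring new managers. Suppose an employer regards extroversion as a desirable trait for managers and uses an extroversion test with a mean of 40 and a standard deviation of 2. An applicant scoring 44 has Z = (44 - 40)/2 = +2.0. An applicant scoring 37 has Z = (37 - 40)/2 = -1.5. The first applicant would therefore be preferred. However, Z scores have limitations. Converting a Z score into a percentile assumes that the scores are normally distributed. A Z score also describes a position only relative to the group whose µ and σ were used to compute it. In addition, many printed Z tables stop at about ±3.9, although Z scores themselves are not bounded and more extreme values can occur.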
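The hiring comparison can be sketched as ranking applicants by Z score. The test mean of 40 and SD of 2 are the illustrative values from the example. The applicant labels are hypothetical. The last line shows that a Z score is simply computed, so values beyond the range of a printed table are possible.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize a raw score: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

MU, SIGMA = 40, 2  # illustrative extroversion-test parameters from the example
applicants = {"Applicant A": 44, "Applicant B": 37}  # hypothetical labels

z_by_applicant = {name: z_score(score, MU, SIGMA) for name, score in applicants.items()}
ranking = sorted(z_by_applicant, key=z_by_applicant.get, reverse=True)

for name in ranking:
    print(f"{name}: raw = {applicants[name]}, z = {z_by_applicant[name]:+.1f}")

# Z scores are not capped: a raw score of 50 on this test gives z = +5.0.
print(z_score(50, MU, SIGMA))
```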
References
Di Pofi, J. A. (2002). Organizational diagnostics: Integrating qualitative and quantitative methodology. Journal of Organizational Change Management, 15(2), 156-168.
Waclawski, J., & Church, A. H. (Eds.). (2002). Organization development: A data-driven approach to organizational change. San Francisco: Jossey-Bass.