The impacts of dropout prevention programs are usually assessed with experimental methods, since randomly assigned treatment and control groups have similar observed and unobserved characteristics. Non-experimental methods are nevertheless necessary in some situations, so it is worth determining whether propensity-score methods can replicate the impact estimates of experimental designs. This study compares the outcomes of experimental designs with those of propensity-score matching through secondary analysis of data from the School Dropout Demonstration Assistance Program (SDDAP) and the National Education Longitudinal Study (NELS). Propensity-score methods are found to be ineffective in replicating the experimental impacts of school dropout prevention programs.
Roberto Agodini and Mark Dynarski’s article “Are Experiments the Only Option? A Look at Dropout Prevention Programs” was published in The Review of Economics and Statistics, 2004, volume 86, number 1, pages 180-194. Agodini and Dynarski examine whether unbiased program impact estimates can be obtained when propensity-score methods are used to evaluate dropout prevention programs. They do so by comparing estimates derived from experimental methods with those derived from propensity-score methods. The authors first acknowledge that experimental designs validly portray the impacts of a program, since the control and treatment groups have similar observed and unobserved characteristics. However, experimental studies are not feasible in some settings, such as when programs are operating below their minimum capacity or when treatment reaches the entire population that would qualify for program services. The propensity-score method is thus viewed as an alternative design that can evaluate programs and, ideally, display impacts similar to those that would be obtained with experimental methods. In this study, Agodini and Dynarski (p. 180) seek to establish the ability of propensity-score methods to replicate experimental impacts on student absenteeism, dropout, self-esteem, and educational aspirations. The authors also ask to what extent propensity-score methods can replicate experimental impacts using less extensive data that is readily available for public use. Finally, they seek to determine the precision of propensity-score-based impact estimates.
Agodini and Dynarski (p. 182) primarily conduct data analysis to determine whether propensity-score methods can replicate experimental methods in estimating the impacts of service programs. Data are obtained from the School Dropout Demonstration Assistance Program (SDDAP) and the National Education Longitudinal Study (NELS). SDDAP addresses the school dropout problem in the U.S., targeting middle and high school students. An experimental design was used to evaluate the SDDAP targeted programs, whereas a comparison design was used to evaluate the SDDAP restructuring programs. For the experimental design, students randomly assigned to either a treatment or a control group were assessed. In the comparison design, the treatment groups were drawn from the sixteen SDDAP targeted programs in which students had been randomly assigned to treatment. Baseline evaluations were done and follow-ups conducted (two follow-ups for one cohort and one for the other), with data collected through detailed questionnaires and school records.
Two comparison groups were selected using propensity-score methods to match the sixteen treatment groups. One group was obtained from the SDDAP restructuring programs, and the second from the NELS data. Propensity-score matching selected a comparison group whose characteristics were similar to those of the treatment group on average, rather than an exact match. Agodini and Dynarski (p. 185) estimated a logit model of treatment status to construct the comparison groups: a propensity score was assigned to each member of a treatment group and to each potential comparison member, and the nearest neighbor from the comparison pool was then matched to each treatment-group subject. A t-test determined how similar the propensity scores of the treatment and comparison subjects were, and an F-test determined how similar the characteristics of the two groups were collectively. A p-value greater than .05 on both tests indicated that the two groups matched well on their characteristics. Several characteristics determined the eligibility of subjects, including demographic characteristics, parental education, time use, school attendance and participation in school activities, student background, and academic performance, among others. Standard errors for the experimental impacts were computed with standard analytic formulas, whereas those for the propensity-score impacts were computed using bootstrap methods.
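The matching procedure just described lends itself to a short illustration. The following is a minimal sketch in Python using synthetic data: the covariates (parent_educ, gpa, absences), the sample sizes, and the seed are hypothetical stand-ins rather than Agodini and Dynarski’s actual variables, and the bootstrap standard-error step is omitted for brevity.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical baseline covariates for a treatment pool and a larger
# comparison pool; the mild "shift" makes the pools differ on average.
def make_pool(n, shift):
    return pd.DataFrame({
        "parent_educ": rng.normal(12 + shift, 2, n),   # years of schooling
        "gpa": rng.normal(2.5 + shift / 4, 0.6, n),
        "absences": rng.poisson(8 - 2 * shift, n),
    })

treat = make_pool(300, 0.5).assign(treated=1)
comp = make_pool(3000, 0.0).assign(treated=0)
pool = pd.concat([treat, comp], ignore_index=True)

# Step 1: estimate a logit model of treatment status on the baseline
# covariates and assign each subject a propensity score.
X = sm.add_constant(pool[["parent_educ", "gpa", "absences"]])
logit = sm.Logit(pool["treated"], X).fit(disp=False)
pool["pscore"] = logit.predict(X)

# Step 2: for each treatment subject, select the nearest-neighbor
# comparison subject on the estimated propensity score.
treated = pool[pool["treated"] == 1]
untreated = pool[pool["treated"] == 0]
matches = untreated.iloc[
    [(untreated["pscore"] - p).abs().to_numpy().argmin()
     for p in treated["pscore"]]
]

# Step 3: a t-test on the matched propensity scores; p > .05 suggests
# the groups are balanced on the score. (An F-test of all covariates
# jointly would check collective balance, as the article describes.)
t_stat, p_val = stats.ttest_ind(treated["pscore"], matches["pscore"])
print(f"t = {t_stat:.3f}, p = {p_val:.3f}")
```

This sketch matches with replacement (a comparison subject may serve as the nearest neighbor for more than one treatment subject) purely for simplicity; the authors’ exact matching rules are those described on their p. 185.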
It is established that random assignment of subjects helps reduce experimental bias (Burns & Grove, p. 245). This is achieved in this study among the treatment groups but is lacking in the comparison groups. It is also important to note that an experimental research design controls internal validity by keeping the independent variables uniform for all subjects. As such, the experimental design in this study maintained internal validity, which is not guaranteed in the comparison groups even though the selection criteria maximize similarity of characteristics. It is therefore no wonder that the findings establish propensity-score methods as highly unlikely to replicate the experimental impacts of school dropout programs. Because it relies on secondary data, the study may not be reliable for making comparisons even within the experimental design, since the possibility of manipulating the independent variables is highly limited (Calmorin & Calmorin, p. 74). The use of primary data is therefore advisable, as it permits the manipulation of variables and effective comparison.
It is commendable that the sample size for the comparison-group study is large (about 3,000 subjects), which may have compensated for the internal validity achievable in experimental studies even with small samples. In the selection process for the comparison-group design, however, important factors such as the region where students were studying were not a major consideration, yet the environment (rural or urban) may have influenced the outcome variables. It is also important to note that the propensity-score method, being non-experimental, could not separate the effects of observed and unobserved factors on the outcome. As such, there is no firm basis for comparing the outcomes of the experimental and propensity-score methods. The study also ignores confounding variables, such as an individual’s motivation, which may affect its outcomes.
Based on the finding that propensity-score methods can barely replicate experimental effects in dropout prevention initiatives, policymakers are advised to favor experimental methods; this avoids working from assumptions about outcomes. Although propensity-score methods are not appealing as replacements for experimental methods, they should be considered in settings that allow the researcher to direct participation, since this would limit the influence of unobserved factors on the study.
References
Agodini, Roberto, and Mark Dynarski. “Are Experiments the Only Option? A Look at Dropout Prevention Programs.” The Review of Economics and Statistics 86.1 (2004): 180-194.
Burns, Nancy, and Susan K. Grove. The Practice of Nursing Research: Conduct, Critique, and Utilization. 5th ed. St. Louis, Missouri: Elsevier Saunders, 2005.
Calmorin, Laurentina Paler, and Melchor A. Calmorin. Research Methods and Thesis Writing. 2nd ed. Sampaloc, Manila: Rex Bookstore, Inc., 2007.