Fatigue Impact on Human Raters Essay (Book Review)


Table of Contents

Introduction
Summary
Critique
Conclusion
Works Cited

Introduction

The article “A Study of the Impact of Fatigue on Human Raters when Scoring Speaking Responses” was published in 2014 by Guangming Ling, Pamela Mollaun, and Xiaoming Xi. Responding to a research gap on the impact of time-related fatigue on rating quality in language testing, the authors set out to examine the effects of fatigue on human raters’ scoring performance within the context of the TOEFL iBT Speaking Test. They succeeded in showing how rating accuracy, quality, and consistency vary over time due to fatigue-related effects.

Summary

The study employed a quantitative experimental research design to “examine the effects of fatigue on human raters of audio responses by comparing rating accuracy and consistency at various time points throughout a scoring day, under shift conditions that differ by shift length and session length” (Ling, Mollaun, and Xi 479). The authors drew on other sources to show that the burden that scoring speaking responses places on raters’ concentration and cognitive capabilities may trigger time-related fatigue and impair scoring accuracy and consistency. Overall, the findings indicated that (1) scoring productivity and quality differ widely across hours irrespective of shift conditions, (2) shorter scoring shifts (e.g., 6-hour shifts and those that switch to a new speaking task every 2 hours) yield higher rating accuracy, greater hourly productivity, and greater rating consistency across time than longer scoring shifts, and (3) longer scoring sessions and longer scoring shifts are positively associated with increased fatigue.

Critique

The main argument of the study is that fatigue caused by longer scoring shifts and sessions affects the quality, accuracy, and consistency of human raters’ scores. The writers’ perspective rests on the premise that shift length and session length in language testing may produce undesirable fatigue-related outcomes. This perspective is consistent with that of other authors, who argue that exam proximity (time between cognitive tasks) has a powerful impact on performance due to cognitive fatigue (Fillmore and Pope 12). The comparison suggests that the writers’ ideas on fatigue align with those of other researchers. Indeed, their ideas are effective in demonstrating how scoring shift length and session length affect rating quality, accuracy, and consistency in language testing.

The authors do not show any bias and are qualified to write in this area based on their affiliation with the Educational Testing Service in the United States. Their main ideas are easy to accept because they are supported by the experiments conducted in the study; however, the authors fail to discuss how other factors (e.g., regional traditions of language testing) can impact rating quality, accuracy, and consistency (McNamara and Knoch 556). Although the introduction and literature review sections are easy to understand, other areas (e.g., the methodology and results sections) may seem complicated to a general reader. Additionally, although the writers use reputable sources to back their arguments, they fail to connect the available literature to their main findings in a manner that could reinforce their contributions to the area. Overall, the study can be recommended to students and professionals who want to understand the effects of fatigue in language testing and scoring.

Conclusion

The article is effective in demonstrating how fatigue-related factors affect the rating accuracy of human raters in language testing. Language testing and scoring professionals may find this article interesting, as it provides a glimpse of the scoring shift length and session length that best support productivity, rating quality, accuracy, and consistency.

Works Cited

Fillmore, Ian, and Devin G. Pope. The Impact of Time between Cognitive Tasks on Performance: Evidence from Advanced Placement Exams. 2012. Web.

Ling, Guangming, Pamela Mollaun, and Xiaoming Xi. “A Study of the Impact of Fatigue on Human Raters when Scoring Speaking Responses.” Language Testing 31.4 (2014): 479-499. ERIC. Web.

McNamara, Tim, and Ute Knoch. “The Rasch Wars: The Emergence of Rasch Measurement in Language Testing.” Language Testing 29.4 (2012): 555-576. ERIC. Web.
