A great variety of testing instruments for assessing the intellectual and cognitive abilities of children and adolescents in educational settings exists today. The Wechsler intelligence test is one of the most commonly used in present-day practice. The Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV) is the latest version of the test and is used to assess children from 6 to 16 years of age.
According to Mrazik and his colleagues (2012), the WISC-IV “requires advanced graduate training in psychology to learn its administration, scoring, and interpretation properties” (p. 280). The majority of administration and scoring errors arise from a lack of expertise and experience among test administrators.
The WISC-IV involves multitasking, which makes mastering its administration challenging for both psychology graduates and experienced practitioners. According to Loe, Kadlubek, and Marks (2007), “Each response must be recorded and evaluated in real time with respect to both administration and scoring procedures while simultaneously maintaining rapport with the examinee and attending to a multitude of behavioral and affective responses” (p. 237).
The causes of mistakes may therefore be many. Moreover, the test manual does not cover specific techniques for conducting children's assessments successfully, and the skills of effective administration, scoring, and interpretation can be developed only through practice (Mrazik, Janzen, Dombrowski, Barford, & Krawchuk, 2012, p. 280).
In previous research findings, the most common mistakes in WISC-IV administration were related to the recording of data and answers, administration, and computation. Administration errors include failing to clarify ambiguous answers in verbal testing and incorrectly assigning test item points, while computation mistakes involve the wrong conversion of raw scores to standard scores (Loe, Kadlubek, & Marks, 2007, p. 238).
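To make the conversion step concrete, the following minimal Python sketch shows where such a computation error can occur. The norm table is entirely hypothetical; actual WISC-IV conversions rely on age-specific tables published in the test manual.

```python
# A minimal sketch of the raw-to-scaled score conversion step where
# computation errors often occur. The norm table below is hypothetical;
# real WISC-IV conversions use age-specific tables from the manual.

# Hypothetical norm table: raw-score ranges mapped to scaled scores
# (Wechsler subtest scaled scores have a mean of 10 and an SD of 3).
NORM_TABLE = [
    (0, 4, 1), (5, 8, 4), (9, 12, 7), (13, 16, 10),
    (17, 20, 13), (21, 24, 16), (25, 28, 19),
]

def raw_to_scaled(raw_score: int) -> int:
    """Look up the scaled score for a raw score in the norm table."""
    for low, high, scaled in NORM_TABLE:
        if low <= raw_score <= high:
            return scaled
    raise ValueError(f"Raw score {raw_score} is outside the table range")

# Reading one row above or below the correct one shifts the scaled
# score by three points -- a full standard deviation of error.
print(raw_to_scaled(14))  # 10 with the correct row; 7 or 13 if misread
```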
It is also observed that psychology students make these errors in the WISC-IV subtests that allow a wide range of responses, that is, subtests related to verbal aptitude such as “Comprehension, Vocabulary, and Similarities” (Loe, Kadlubek, & Marks, 2007, p. 238). Multiple errors negatively affect the reliability and validity of the results and the interpretation of the Full-Scale IQ.
According to the APA's standards for educational and psychological testing (2003), “for each total score, subscore, or combination of scores that is to be interpreted, estimates of relevant reliabilities and standard errors of measurement or test information functions should be reported” (p. 3). To improve reliability indexes, the measurements and assessments must be conducted in a consistent and repeatable way.
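As a worked illustration of the quantities the standard asks test developers to report, the short Python sketch below computes the classical standard error of measurement, SEM = SD × √(1 − reliability), and an approximate confidence band around an observed score. The reliability value used is illustrative rather than drawn from any manual.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: the expected spread of observed scores around a
    true score, given the test's reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

sd = 15.0           # Full-Scale IQ standard deviation
reliability = 0.97  # illustrative reliability coefficient, not a manual value

sem = standard_error_of_measurement(sd, reliability)
observed = 104
# An approximate 95% confidence band around an observed score:
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(f"SEM = {sem:.2f}; 95% band for {observed}: {low:.1f}-{high:.1f}")
```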
The provided information about assessment errors is a vital element in the evaluation of testing tools. Adopting error measurement practices helps improve the interpretation of scores. Moreover, according to standard 3.8 (2003), consideration of the sample, testing purposes, and intentions is crucial for appropriate test development (p. 4). Appropriate test development involves administration regulations, identification of the scoring procedures, and data reports.
The standards for educational and psychological testing (2003) state that the interpretability and utility of test results depend heavily on the directions given to the testing participants, the conditions under which the test is conducted, and the consistency of scoring techniques (p. 5). By following these rules, the examiner reduces the chance of errors emerging and ensures that the test meets the standardization requirements.
Test Fairness
According to APA, AERA, and NCME (1999), test bias “is said to arise when deficiencies in a test itself or the manner in which it is used result in different meanings for scores earned by members of different identifiable subgroups” (p. 74). Warne, Yoon, and Price (2014) give a more detailed definition of the term: test bias includes score gaps between different groups of examinees, differences in the functioning and interrelations of test items, and the consequences of test score interpretations that provoke social inequalities (p. 571).
Researchers observe that differences in intelligence and academic achievement test scores are most noticeable among distinct ethnic groups (Lohman, 2005). Score gaps are observed between Asian, White, Hispanic, and African American students and are commonly explained by test bias. Therefore, the comparison of scores across different racial groups cannot be considered reliable.
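For illustration, score gaps of this kind are commonly summarized as a standardized effect size (Cohen's d). The brief Python sketch below uses invented group statistics rather than data from any cited study.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical summary statistics for two examinee groups.
d = cohens_d(mean_a=103.0, sd_a=14.8, n_a=250,
             mean_b=96.5, sd_b=15.2, n_b=250)
print(f"Cohen's d = {d:.2f}")  # ~0.43: a moderate score gap
```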
All educational tests are characterized by predictive validity. “If mean score gaps themselves are not evidence of bias, perhaps a test could be biased if it is better at predicting outcomes for some groups and worse at predicting outcomes for other groups, a situation called differential predictive validity” (Warne, Yoon, & Price, 2014, p. 572). It is observed that differences in predictive validity across ethnic groups may be influenced not merely by the test scores but by multiple independent variables, including financial situation, anxiety, college environment, and so on (Zwick, 2007). To avoid this bias, the scores of different groups must be interpreted separately.
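A simple way to picture this check is to fit a separate regression of the outcome on test scores within each group and compare the fitted lines. The Python sketch below does so with synthetic data; it illustrates the general idea of differential predictive validity rather than reproducing any cited analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_line(scores, outcomes):
    """Ordinary least squares for outcome = slope * score + intercept."""
    slope, intercept = np.polyfit(scores, outcomes, deg=1)
    return slope, intercept

# Synthetic data: group B's outcomes track test scores more weakly.
scores_a = rng.normal(100, 15, 200)
gpa_a = 0.030 * scores_a + rng.normal(0, 0.4, 200)
scores_b = rng.normal(95, 15, 200)
gpa_b = 0.018 * scores_b + 1.0 + rng.normal(0, 0.4, 200)

for name, s, g in [("A", scores_a, gpa_a), ("B", scores_b, gpa_b)]:
    slope, intercept = fit_line(s, g)
    print(f"group {name}: slope={slope:.3f}, intercept={intercept:.2f}")
# Markedly different slopes or intercepts across groups would signal
# differential predictive validity, i.e., potential predictive bias.
```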
According to Warne, Yoon, and Price (2014), if a test puts a demographic group at a disadvantage in society, then it is biased and has “poor consequential validity” (p. 577). In other words, when a standardized psychological or educational test has negative impacts on the social position of a group, it is unfair. Consequential validity is moral and ethical in character, whereas the other forms of bias are statistical and concern the lack of factorial invariance across groups.
According to the standards (2003), the concept of fairness is interrelated with the examiners' responsibility in the administration, scoring, reporting, and interpretation of test results (p. 6). The interpretation of a test depends on the level of professionalism and technological advancement. According to the standards' “Fairness in Testing and Test Use” section (2003), fairness may be regarded as a “lack of bias, equitable treatment in the testing process, equality in outcomes of testing, and opportunity to learn” (p. 6).
One might assume that it is unfair to use test results to determine whether an individual may graduate from college or be hired for a job, yet researchers argue that fairness and test bias are qualitatively different concepts (Warne, Yoon, & Price, 2014, p. 577). The unfair use of test results does not necessarily indicate that the test is biased. The concept of test bias is technical in character and relates to the interpretation of the scores.
According to APA, AERA, and NCME (1999), the absence of test bias may be regarded as a premise of the fairness of test results. Therefore, to increase data validity and reliability, psychologists need to follow standardized procedures in test conduction and administration, consider the great variety of independent variables that may provoke test bias, especially ethnic and racial ones, and compare scores only across groups with the same demographic background.
Technology and Psychological Assessment
“Computer-based assessment is a relatively new but exponentially growing field in psychological assessment” (Krkovic, Pasztor-Kovacs, Gyongyver, & Greiff, 2013, p. 1). New information technology allows those conducting psychological assessments to benefit significantly by adapting paper-and-pencil versions of tests into computerized formats. Computerization of assessments allows the elaboration of innovative assessment tools and the use of various video and audio resources. It is more flexible than traditional forms of testing, saves time, and offers convenient data storage and reporting.
“Recent research results show that the collection and analysis of many kinds of assessment tasks data can be easily done by computers” (Krkovic, Pasztor-Kovacs, Gyongyver, & Greiff, 2013, p. 1). Along with content analysis, computerized versions of tests provide automatic scoring of the results. The major strengths of technology-based assessment are a high level of test accessibility (online, offline), immediate feedback in testing and scoring, a high level of testing efficiency, and increased objectivity of the results.
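As a toy illustration of such automatic scoring, the Python sketch below scores a set of responses against an answer key, giving immediate feedback and removing arithmetic slips. The items and key are invented; operational computerized tests are, of course, far more elaborate.

```python
# Hypothetical answer key for a small selected-response test.
ANSWER_KEY = {"item1": "B", "item2": "D", "item3": "A", "item4": "C"}

def score_responses(responses: dict[str, str]) -> int:
    """Return the number of responses that match the answer key."""
    return sum(1 for item, answer in responses.items()
               if ANSWER_KEY.get(item) == answer)

# Responses are scored as soon as they are entered, so feedback is
# immediate and the raw-score tally cannot be miscounted by hand.
responses = {"item1": "B", "item2": "C", "item3": "A", "item4": "C"}
print(f"Raw score: {score_responses(responses)} / {len(ANSWER_KEY)}")
```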
As a recent study indicates, the implementation of technology is extremely helpful in the assessment of children with “physical or communicative impairments” (Warschausky et al., 2012, p. 472). Nevertheless, a weakness of technology usage is the potential risk of shifts in the test's psychometric properties. When technology is applied to assess individuals with various mental and physical disabilities, the standardized procedures are usually modified, and, as a result, the construct validity may suffer (Warschausky et al., 2012, p. 472).
According to the APA standards (2003), test conduction requires specific conditions and accommodations: time frames, answering formats, the selection of the test, etc. (p. 2). According to Pade (n.d.), while using computerized versions of tests, it is important to take into consideration the norms and standards of test conduction, “recognizing that most were based on paper-and-pencil methods and this has a potential impact on equivalency” (p. 13).
When deciding which testing format to use, it is important to consider the examinee's needs and capabilities, location, testing environment, and conditions. The reliability and validity of the results depend on accounting for all these aspects in the application of both computerized and paper-and-pencil versions of tests.
The application of technology implies a high level of expertise and technical competence on the psychologist's part. According to the APA (2013), “psychologists have a primary ethical obligation to provide professional services only within the boundaries of their competence based on their education, training, supervised experience, consultation, study or professional experience” (p. 7). This rationale is regarded as the main principle of the ethical practice of psychology, and compliance with it reduces test bias. To avoid the risks of misusing computerized tests, the assessments must be administered by competent specialists. In this case, the validity of score interpretation will increase, and potential negative impacts on clients will be prevented.
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
American Psychological Association. (2013). Guidelines for the practice of telepsychology. Web.
Krkovic, K., Pasztor-Kovacs, A., Gyongyver, M., & Greiff, S. (2013). New technologies in psychological assessment: The example of computer-based collaborative problem solving assessment. Web.
Loe, S., Kadlubek, R. M., & Marks, W. J. (2007). Administration and scoring errors on the WISC-IV among graduate student examiners. Journal of Psychoeducational Assessment, 25(3), 237-247.
Lohman, D. F. (2005). Review of Naglieri and Ford (2003): Does the Naglieri Nonverbal Ability Test identify equal proportions of high-scoring White, Black, and Hispanic students? Gifted Child Quarterly, 49, 19-28. Web.
Mrazik, M., Janzen, T. M., Dombrowski, S. C., Barford, S. W., & Krawchuk, L. L. (2012). Administration and scoring errors of graduate students learning the WISC-IV: Issues and controversies. Canadian Journal of School Psychology, 27(4), 279-290.
Pade, H. (n.d.). The evolution of psychological testing: Embarking on the age of digital assessment. Web.
Summary of the standards for educational and psychological testing. (2003). Web.
Warne, R. T., Yoon, M., & Price, C. J. (2014). Exploring the various interpretations of “test bias”. Cultural Diversity and Ethnic Minority Psychology, 20(4), 570-582. Web.
Warschausky, S., Tubbergen, M., Asbell, S., Kaufman, J., Ayyangar, R., & Donders, J. (2012). Modified test administration using assistive technology: Preliminary psychometric findings. Assessment, 19(4), 472-479.
Zwick, R. (2007). College admission testing. Web.