Comparing Methods for Item Analysis Research Paper


Table of Contents

Introduction
Arguments for the Issue
Arguments against the Issue
Conclusion
Reference List

Introduction

Item analysis refers to the methods used to examine the specific characteristics of test items in psychological testing procedures (though it is equally applicable to other types of testing in various fields). It enables researchers to analyze test items, evaluate their implementation, eliminate items that have been deemed “poor performers” and ensure that only items with “high to moderate performance” are included in the test.

Lastly, this method of analysis helps to appraise individual test takers based on their selection of items when taking the test. In formal test development, item analysis normally precedes the creation of the final test; in less formal settings, however, it is carried out after the test in order to indicate the overall quality of the test that was administered.
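The screening of “poor performers” described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: items are scored dichotomously (1 = keyed response, 0 = otherwise), item difficulty is taken as the proportion of test takers giving the keyed response, and the cut-offs of 0.2 and 0.9 are hypothetical values chosen for the example, not thresholds taken from the studies cited here. The function names and the data are likewise invented for illustration.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Proportion of test takers who gave the keyed response to each item.

    `responses` has one row per test taker and one column per item,
    with 1 for the keyed response and 0 otherwise.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [
        sum(person[i] for person in responses) / n_people
        for i in range(n_items)
    ]


def flag_items(difficulties: List[float],
               low: float = 0.2, high: float = 0.9) -> List[int]:
    """Indices of items whose difficulty falls outside [low, high].

    The cut-offs are illustrative assumptions, not published standards.
    """
    return [i for i, p in enumerate(difficulties) if p < low or p > high]


# Hypothetical toy data: 5 test takers (rows) by 4 items (columns).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
]

difficulties = item_difficulty(responses)
flagged = flag_items(difficulties)
print(difficulties)
print(flagged)
```

In this toy data the first item is answered the same way by everyone and the third by no one, so both are flagged: neither can tell test takers apart.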

Studies such as those by Berger, Hanschmann, Reese, Koukouraki, Wandel, & Bacher (2011), which have compared and contrasted a variety of present-day psychological testing procedures, explain that it is neither the length of a test nor the inherent complexity of its items but rather the veracity and accuracy of those items that makes a test effective (Berger, Hanschmann, Reese, Koukouraki, Wandel, & Bacher, 2011).

Thus, from the perspective of Berger, Hanschmann, Reese, Koukouraki, Wandel, & Bacher (2011), item analysis plays a crucial role in psychological testing procedures, since it enables tests to be efficient and valid rather than lengthy and prone to undesirable results. Another perspective on this matter comes from Scroggins, Thomas, & Morris (2009), who explain that people have unique behaviors, perceptions and methods of evaluation (Scroggins, Thomas, & Morris, 2009).

As such, generalized test results are often not the best means of evaluation, because the way in which certain questions are perceived may differ from one individual to another, resulting in inaccurate results (Scroggins, Thomas, & Morris, 2009).

For Berger, Hanschmann, Reese, Koukouraki, Wandel, & Bacher (2011), item analysis helps to “iron out the problems,” so to speak, when dealing with particular testing groups, so that the questions used within the test are more in line with the method of testing and evaluation that is appropriate for a distinct population set (Berger, Hanschmann, Reese, Koukouraki, Wandel, & Bacher, 2011).

On the other hand, it must be noted that researchers such as Wu, King, Witkiewitz, Racz, & McMahon (2012) point out that item analysis can actually reduce the ability of a test to identify subtle nuances in an individual’s personality and behavioral processes (Wu, King, Witkiewitz, Racz, & McMahon, 2012).

On this basis, this paper will examine the pros and cons of item analysis. My assumption is that item analysis is an effective method of increasing the reliability and validity of a test but does so at the cost of reducing other aspects of data that could have been derived from the results of the test.

Arguments for the Issue

Based on the work of You, Leung, Lai, & Fu (2011) it was noted that item analysis helps to improve the validity of testing results by identifying items that are either unfair or biased towards the results of the study (You, Leung, Lai, & Fu, 2011).

You, Leung, Lai, & Fu (2011) point out that while it is usually not the intent of the researcher to include leading, biased or otherwise manipulated questions that are meant to elicit a desired response (though this sometimes does occur), such questions do appear during the initial stages of the testing procedure (You, Leung, Lai, & Fu, 2011).

This compromises the validity of the examination, since a “leading question” produces an answer that may not necessarily reflect the true intent of the subject being examined. Through item analysis such situations can be avoided, resulting in a study that can be considered a more reliable approximation of a subject’s psychological makeup.

Related to the concept of question bias is the question of whether a test remains valid despite being shortened as a direct result of item analysis. Miller, Watkins, & Webb (2009) attempt to answer this question by pointing out that while a broad spectrum of questions enables researchers to evaluate the results of individual studies more thoroughly, it would be fallacious to assume that the more questions there are, the more accurate the results will be (Miller, Watkins, & Webb, 2009).

Rather, based on the work of Miller, Watkins, & Webb (2009), it is more apt to assume that a test will be more reliable and valid if its questions are geared towards the intended population set than if it attempts a broad spectrum analysis by means of an overly long, generalized examination.

The inherent problem with such an examination, in the words of Ma, Tan & Ma (2012), is that it creates a greater likelihood that the subjects involved in the study will score poorly because they encounter questions that they lack the knowledge to answer or cannot answer properly (Ma, Tan & Ma, 2012).

Evidence of this was noted in the Cox, Fernandez, Chambers, Bandstra, & Parker (2011) study, which examined the use, in grade school psychological evaluations, of questions pertaining to issues of a sexual nature of which the students at the time had no prior knowledge (Cox, Fernandez, Chambers, Bandstra, & Parker, 2011).

Such a testing procedure produced results that many psychologists at present would consider invalid, given that the test takers were more likely to guess at the answers than to give accurate responses (Cox, Fernandez, Chambers, Bandstra, & Parker, 2011).

On this basis, studies such as those by Jones (2011) indicate that shortened forms of tests produced through item analysis are actually better, more reliable and thus more valid than long-form, generalized tests that neglect item analysis as a method of improving current testing methods (Jones, 2011).
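The claim that a shortened test can be more reliable than its longer form can be illustrated with a small sketch. It assumes Cronbach’s alpha as the reliability estimate, which is a swapped-in choice for illustration and not necessarily the statistic used in the studies cited above. The scores are hypothetical toy data in which the fourth item is deliberately unrelated to the other three.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score table.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances are used throughout.
    """
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 6 respondents (rows) by 4 items (columns).
# Items 1-3 rise together; item 4 is unrelated noise by construction.
scores = [
    [1, 2, 1, 4],
    [2, 2, 3, 1],
    [3, 3, 3, 5],
    [4, 4, 4, 2],
    [5, 4, 5, 3],
    [5, 5, 4, 1],
]

alpha_full = cronbach_alpha(scores)                          # all four items
alpha_short = cronbach_alpha([row[:3] for row in scores])    # item 4 dropped
print(round(alpha_full, 3), round(alpha_short, 3))
```

With this data the three-item form yields a higher alpha than the four-item form. The result depends entirely on the dropped item being a poor one; removing items that correlate well with the rest of the test would generally lower alpha instead.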

Arguments against the Issue

This paper has so far elaborated on arguments which indicate that item analysis actually increases a test’s reliability and validity even when the test is shortened. The following section will delve into the inherent problems of item analysis procedures. First and foremost, it must be noted that in any method of analysis the discrimination index plays a vital role in helping researchers examine various aspects of an individual’s personality (Berger, Hanschmann, Reese, Koukouraki, Wandel, & Bacher, 2011).

In this facet of the testing procedure, it is not only the end result that is evaluated. What is also evaluated is how well a certain percentage of people did on a particular item, with test takers then being scored and separated into upper and lower percentile groups (Berger, Hanschmann, Reese, Koukouraki, Wandel, & Bacher, 2011).
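The upper and lower grouping described above can be sketched as follows. This is a minimal illustration, assuming dichotomously scored items and the common upper-lower form of the discrimination index, D = p_upper - p_lower, where each p is the proportion of a group giving the keyed response. The 27% group size is a widely used convention adopted here as an assumption; the function name and the data are hypothetical.

```python
from typing import List


def discrimination_index(responses: List[List[int]], item: int,
                         fraction: float = 0.27) -> float:
    """Upper-lower discrimination index for one item.

    Test takers are ranked by total score; the top and bottom `fraction`
    form the upper and lower groups. Returns p_upper - p_lower, the
    difference in the proportion of each group giving the keyed response.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n_group = max(1, round(len(ranked) * fraction))
    upper = ranked[:n_group]
    lower = ranked[-n_group:]
    p_upper = sum(row[item] for row in upper) / n_group
    p_lower = sum(row[item] for row in lower) / n_group
    return p_upper - p_lower


# Hypothetical toy data: 10 test takers (rows) by 4 items (columns).
responses = [
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
]

d = [discrimination_index(responses, item=i) for i in range(4)]
print([round(x, 2) for x in d])
```

In this toy data the first item separates the upper and lower groups completely, while the third item is answered at the same rate by both groups and so does not discriminate between them at all.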

By calculating the percentile at which particular groups are able to answer certain parts of the test, researchers can in effect determine subtle differences in perceptions and psychological attributes, as well as a variety of other snippets of data that enable a broader understanding of how a particular group thinks (Carless, 2009). Unfortunately, as indicated by Wei & Williams (2009), such aspects of the research process in effect disappear as a direct result of item analysis (Wei & Williams, 2009).

The reason this occurs is that as the testing material is “edited” and reduced from its original form, the likelihood of ascertaining different responses from certain groups regarding particular sections of the test is reduced. It must also be noted that studies such as those by Wu, King, Witkiewitz, Racz, & McMahon (2012) have indicated that while item analysis is an effective method of increasing testing reliability and validity, it should not be assumed that all lengthy tests should be reduced by item analysis (Wu, King, Witkiewitz, Racz, & McMahon, 2012).

Wu, King, Witkiewitz, Racz, & McMahon (2012) point out that, given the plethora of different forms of behavior, perception and personality, a lengthy psychological test actually enables researchers to gain a better understanding of the individual. While item analysis can lead to the same understanding, it does so at a considerably reduced rate depending on the type of test involved.

In addition, Cox, Fernandez, Chambers, Bandstra, & Parker (2011) explain that item analysis should not be considered a method for increasing a test’s validity when the basis of the test itself is highly erroneous and invalid (Cox, Fernandez, Chambers, Bandstra, & Parker, 2011).

Scroggins, Thomas, & Morris (2009) point out that in the mid-1900s homosexuality was in effect considered to be a mental disorder, which shows that within the realm of psychology a mistaken hypothesis can still be made (Scroggins, Thomas, & Morris, 2009).

Scroggins, Thomas, & Morris (2009) also point out that several tests which employed item analysis gave results that were in conflict with other testing procedures. On this basis, Wei & Williams (2009) recommend that item analysis be used as a method of ensuring the consistency and validity of responses rather than as a method of making a study “more valid” by way of shortening tests (Wei & Williams, 2009).

Conclusion

Based on the results of this examination, it can be stated that item analysis is an effective method of increasing the reliability and validity of a test but does so at the cost of reducing other aspects of data that could have been derived from the results of the test. As such, it is recommended that item analysis not be used in testing procedures that attempt to glean as much information as possible from a wide group of people but rather be limited to the examination of individuals or small groups.

Reference List

Berger, R. R., Hanschmann, H. H., Reese, J. P., Koukouraki, E. E., Wandel, R. R., & Bacher, R. R. (2011). Normative data collection of the Marburg Concentration Test for Pre-school Children (German: MKVK). Child: Care, Health & Development, 37(1), 129-134.

Carless, S. A. (2009). Psychological testing for selection purposes: a guide to evidence-based practice for human resource professionals. International Journal of Human Resource Management, 20(12), 2517-2532.

Cox, K., Fernandez, C. V., Chambers, C. T., Bandstra, N. F., & Parker, J. A. (2011). Impact on Parents of Receiving Individualized Feedback of Psychological Testing Conducted with Children as Part of a Research Study. Accountability in Research: Policies & Quality Assurance, 18(5), 342-356.

Jones, A. T. (2011). Comparing Methods for Item Analysis: The Impact of Different Item-Selection Statistics on Test Difficulty. Applied Psychological Measurement, 35(7), 566-571.

Ma, S., Tan, Y., & Ma, S. (2012). Testing a Structural Model of Psychological Well-Being, Leisure Negotiation, and Leisure Participation with Taiwanese College Students. Leisure Sciences, 34(1), 55-71.

Miller, H. A., Watkins, R. J., & Webb, D. (2009). The use of psychological testing to evaluate law enforcement leadership competencies and development. Police Practice & Research, 10(1), 49-60.

Scroggins, W. A., Thomas, S. L., & Morris, J. A. (2009). Psychological Testing in Personnel Selection, Part III: The Resurgence of Personality Testing. Public Personnel Management, 38(1), 67-77.

Wei, H., & Williams, J. (2009). Instrumental or Emotional Aggression: Testing Models of Bullying, Victimization, and Psychological Maladjustment among Taiwanese Seventh-Graders. Social Work Research, 33(4), 231-242.

Wu, J., King, K. M., Witkiewitz, K., Racz, S., & McMahon, R. J. (2012). Item Analysis and Differential Item Functioning of a Brief Conduct Problem Screen. Psychological Assessment, 24(2), 444-454.

You, J., Leung, F., Lai, C., & Fu, K. (2011). An Item Response Theory Analysis of the Impulsive Behaviors Checklist for Adolescents. Assessment, 18(4), 464-475.