The validity of a test instrument refers to its ability to measure what it purports to measure. There are three forms of validity: content, criterion, and construct validity.
Content validity is established based on the extent to which a scale reflects the intended domain of content (Carmines & Zeller, 1991, 20). Criterion validity is established based on the extent to which a scale corresponds to a pre-defined criterion; construct validity is established when a researcher engages in a series of activities that simultaneously define a construct and develop a scale to measure it (Kaplan & Saccuzzo, 2005, 149-150).
To determine the level of content validity, one has to examine whether the scale was developed appropriately, considering factors such as the level of reading necessary to understand the items and respond to them. Additionally, one needs to examine the level of logical reasoning necessary to interpret the items. The notions of construct underrepresentation and construct-irrelevant variance also apply when one discusses content validity.
Construct underrepresentation refers to cases in which the scope of the test instrument is too limited to cover the important aspects of a construct (Kaplan & Saccuzzo, 2005, 137). Construct-irrelevant variance occurs when the scores obtained with a scale are influenced by something outside the scope of the construct.
Content validity is important in all forms of testing, but it is critical in educational testing, since the lion’s share of educational testing is based on knowledge acquired in the course of a given academic endeavor. For instance, in GED testing, the content is based on knowledge acquired through high school. If a test developer included content from a quantum physics class, that content would fall outside the scope of GED testing and would adversely affect the content validity of the test.
In counseling, validity is important because counselors use many scales aimed at assessing constructs such as self-esteem, and assessing such constructs relies heavily on self-report scales. In addressing the validity of a scale, it is prudent to examine published reviews of that scale. Such reviews can be found in the Mental Measurements Yearbook and similar databases, and they are among the most accurate sources of information on a given scale. For example, when I searched for a scale to measure self-esteem, I found the Abuse Disability Questionnaire (ADQ). The review of this scale listed information on its construct, concurrent, and divergent validity.
Test reliability refers to the precision or consistency of a test: its ability to produce the same or statistically similar results each time it is administered. Essentially, reliability reflects the amount of error that occurs across test administrations. When administering a test that measures a psychological construct, there will always be some error because, by their very nature, psychological constructs can never be measured with perfect accuracy. Consider measuring the effects of an event on an individual. How can we measure this accurately? There is no precise way. We can devise tools that measure constructs such as self-esteem.
These tools, however, will never be able to measure self-esteem without some error in measurement. Kaplan and Saccuzzo (2005) used the example of measuring an object with a rubber ruler to illustrate the concept of reliability in testing. Each time the object is measured, the rubber ruler yields a different measurement because it is flexible and stretches differently each time.
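The rubber ruler analogy can be sketched as a small simulation. This is only an illustration under stated assumptions: the object's true length, the size of the stretch, and the function name `measure` are hypothetical and do not come from Kaplan and Saccuzzo.

```python
import random
from statistics import stdev

def measure(true_length, stretch_sd, rng):
    """Return one observed measurement.

    The ruler stretches by a random proportion on each use, so the
    observed value is the true length scaled by (1 + random stretch).
    A stretch_sd of 0.0 models a perfectly rigid ruler.
    """
    return true_length * (1 + rng.gauss(0, stretch_sd))

rng = random.Random(42)          # fixed seed so the run is repeatable
true_length = 30.0               # hypothetical object length, in cm

# Ten repeated measurements with each ruler (values are illustrative).
rigid = [measure(true_length, 0.0, rng) for _ in range(10)]
rubber = [measure(true_length, 0.05, rng) for _ in range(10)]

print("rigid ruler spread: ", stdev(rigid))
print("rubber ruler spread:", stdev(rubber))
```

The rigid ruler returns the same value every time, so its spread is zero. The rubber ruler's readings scatter around the true length, and that scatter is the measurement error that lowers reliability.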
In examining reliability, one can see that there are four forms: inter-rater, test-retest, parallel forms, and internal consistency. Inter-rater reliability is used to assess the degree to which raters consistently estimate the same phenomenon. Test-retest reliability is used to establish the consistency of a test measure over time. Parallel-forms reliability is used to establish the consistency of two tests that are constructed in the same way and from the same domain of items; the only difference between the two tests is that they contain different items. Internal consistency reliability assesses the consistency of the test results across different items within a test.
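Internal consistency is often summarized with Cronbach's alpha. The essay's sources do not name a specific statistic, so alpha is used here only as one common choice. The sketch below is minimal, and the response data and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Compute Cronbach's alpha.

    scores: list of respondents, each a list of item scores
            (every respondent answers the same k items, k >= 2).
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    items = list(zip(*scores))                       # one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values of alpha closer to 1 indicate that the items vary together, which is what one would expect if they measure the same construct.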
In parallel forms reliability, there are two tests consisting of items taken from the same pool. One way to ensure that reliability is properly assessed with this method is to create a large pool of test questions and randomly assign questions to the two separate tests. After this is done, the scores on the two tests are correlated; the strength of the correlation indicates the reliability of the tool, with a weaker correlation indicating more error in testing. This method of establishing reliability can be problematic in that it requires the test developer to create many items that measure the same construct, and it presupposes that the randomly divided halves are equivalent (Research Methods Knowledge Base, 2006).
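The parallel-forms procedure described above can be sketched as follows. This is a simulation under stated assumptions, not a prescribed method: the pool size, number of respondents, noise level, and all function names are hypothetical, and each simulated item score is simply a respondent's true score plus random error.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_pool(n_items, rng):
    """Randomly divide item indices into two equal-sized forms."""
    idx = list(range(n_items))
    rng.shuffle(idx)
    half = n_items // 2
    return idx[:half], idx[half:2 * half]

def simulate_responses(n_people, n_items, noise_sd, rng):
    """Each item score = the person's true score + random error."""
    rows = []
    for _ in range(n_people):
        true_score = rng.gauss(0, 1)
        rows.append([true_score + rng.gauss(0, noise_sd)
                     for _ in range(n_items)])
    return rows

def parallel_forms_r(responses, form_a, form_b):
    """Correlate each person's total on form A with their total on form B."""
    totals_a = [sum(row[i] for i in form_a) for row in responses]
    totals_b = [sum(row[i] for i in form_b) for row in responses]
    return pearson(totals_a, totals_b)

rng = random.Random(0)                       # fixed seed for repeatability
form_a, form_b = split_pool(40, rng)         # hypothetical pool of 40 items
responses = simulate_responses(200, 40, noise_sd=1.0, rng=rng)
r = parallel_forms_r(responses, form_a, form_b)
print(f"parallel-forms correlation: {r:.3f}")
```

Raising `noise_sd` in this sketch adds more error to each item and should lower the correlation between the two forms, which mirrors the link between measurement error and reliability.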
In counseling, reliability is vital because counselors use many scales to arrive at a diagnosis. The scales therefore have to produce the same or similar results each time they are used. Unreliable tests are rendered useless because they cannot be depended upon to produce consistent results, both over time and within a single administration. Overall, reliability and validity speak to the usability of a scale.
References:
Carmines, E. G., & Zeller, R. A. (1991). Reliability and validity assessment. Thousand Oaks, CA: Sage.
Kaplan, R. M., & Saccuzzo, D. P. (2005). Psychological testing: Principles, applications, and issues (6th ed.). Belmont, CA: Thomson Wadsworth.
Research Methods Knowledge Base. (2006). Types of reliability. Web.