The growing use of computer technology has increased interest in adaptive testing for tutoring and assessment systems. Computer adaptive testing (CAT) is a form of educational measurement in which the examination adjusts itself to the examinee's demonstrated proficiency. It is a computer based test that adapts to the examinee's ability level, which makes it more of a tailor made test.
Computer adaptive testing is slowly replacing the traditional method of testing, commonly referred to as the paper and pencil test. Developments and innovations in computer technology have begun to render the paper and pencil mode of testing obsolete as more educational institutions and training facilities incorporate computers into their examination activities (Setchi, 2010).
Computer adaptive testing is seen as a theoretically sound and efficient testing method for large scale programs. It has also become more feasible in educational research and practice. Recent developments in e-learning technology have enabled educational institutions to begin incorporating online instruction and online testing into their examination programs.
Continued developments in computing will keep moving paper and pencil testing towards computer based testing. Computer adaptive testing is mostly carried out in large educational institutions or in certified and licensed examination centers (Tao et al, 2008).
History of Computer Adaptive Testing
The concept of adaptive testing can be traced back to over a century ago. The first adaptive test developed in the early 1900s was considered to be the individually administered Binet-Simon intelligence test that was developed by Alfred Binet. This test involved administering various subtests based on the examinee’s current ability level.
If the examinee passed all of the subtests in the Binet-Simon intelligence test, then a set of subtests at a higher ability level was administered to the examinee. If the examinee failed all of the subtests at any given level of ability, then the test was terminated.
This is why the Binet test is referred to as an adaptive test: its different subtests measured an individual's ability and knowledge level. After the Binet test, two more testing methods were developed in the 1940s. These were the staircase method and the sequential analysis system (Ayala, 2009).
The staircase method of testing involved the adjustment of ability levels to meet those of the examinee. This method was mostly used by psychophysicists and experimental psychologists in examining their patients. The sequential analysis system was similar to the Binet test where examinees were given tests based on the current level of ability.
In 1951, Hick designed an adaptive testing approach that still serves as a foundation for adaptive testing today.
He developed an adaptive test based on the concept that an intelligence test should be a branching process whose questions each have a 0.5 chance of being answered correctly. Patterson further developed Hick's adaptive test in 1960 by taking a pool of questions and arranging them so that the examinee would receive a harder question if they got the previous one correct (Ayala, 2009).
In 1970, Lord developed a testing theory that would create a tailored testing program. The tailored test, as Lord described it, would attain a better measurement of the examinee's ability level by selecting and administering the most relevant questions and testing criteria.
Many critics at the time disputed his test as neither viable nor feasible because it produced poor results. This later proved an ironic twist, as his work came to mark the beginning of tailored testing procedures and item response theory testing (Ayala, 2009).
Methodology used in Computer Adaptive Testing
Computer adaptive tests are assessment tools with a theoretical background, the most common underlying psychometric theory being item response theory (IRT). CAT is not limited to paper and pencil style test questions; it can also draw on a wide range of exercises that probe the examinee's ability level.
The CAT tests mostly use dichotomous item response theory models, in which the examinee's answer to a question is evaluated as either correct or incorrect. The process of computer adaptive testing involves administering questions, known as items, to the examinee one at a time.
The presentation of each item or question, and the decision to finish the test, are dynamically adapted based on the examinee's responses and response rate (Lester & Paraguacu, 2004).
The computer adaptive test basically selects questions that are based on the information given by the examinee in the previous question so as to maximize the precision of the exam (Thissen & Mislevy, 2000). The CAT usually applies an iterative algorithm that first estimates the student’s knowledge level.
This iterative algorithm usually involves examining all the items that have not yet been administered to the examinee so as to determine the next best question to ask the examinee. This step is usually based on the previous item estimations of the examinee’s knowledge and ability levels. The next step of the CAT will involve presenting the chosen item/question to the examinee who then answers it correctly or incorrectly.
The third step involves updating the ability estimate of the examinee based on all of the previous answers to the items or questions. The fourth step repeats steps one to three until the termination criterion is met (Lester & Paraguacu, 2004).
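The four steps above can be sketched as a short program. This is only an illustrative sketch, not any cited system: the two parameter item model, the grid-based expected-a-posteriori (EAP) ability update, the maximum-information selection rule, and the standard-error stopping rule are all assumptions chosen for demonstration.

```python
import math
import random

def p_correct(theta, a, b):
    """Two parameter logistic model: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of an item at ability level theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def eap_estimate(responses, grid):
    """EAP ability estimate over a grid of theta values (uniform prior).
    responses: list of ((a, b), correct) pairs."""
    posterior = []
    for theta in grid:
        like = 1.0
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            like *= p if correct else (1.0 - p)
        posterior.append(like)
    total = sum(posterior)
    posterior = [w / total for w in posterior]
    mean = sum(t * w for t, w in zip(grid, posterior))
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, posterior))
    return mean, math.sqrt(var)

def run_cat(pool, true_theta, se_target=0.4, max_items=20):
    """Administer items until the standard error drops below se_target,
    the item pool is exhausted, or max_items is reached."""
    grid = [-3.0 + 0.1 * i for i in range(61)]
    responses, theta, se = [], 0.0, float("inf")
    remaining = list(pool)
    while remaining and len(responses) < max_items and se > se_target:
        # Step 1: choose the unadministered item most informative at theta
        item = max(remaining, key=lambda ab: information(theta, *ab))
        remaining.remove(item)
        # Step 2: the (simulated) examinee answers the item
        correct = random.random() < p_correct(true_theta, *item)
        responses.append((item, correct))
        # Step 3: update the ability estimate from all responses so far
        theta, se = eap_estimate(responses, grid)
        # Step 4: loop back until the termination criterion is met
    return theta, se, len(responses)
```

Running `run_cat` on a small pool of `(a, b)` item parameters shows the adaptive behaviour: each incorrect answer steers the selection rule towards easier items, and the standard error shrinks as responses accumulate.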
The number of questions contained in a computer adaptive test is not fixed, and is not necessarily the same as in the tests presented to other examinees. Also, in a computer adaptive test nothing is usually known about the examinee prior to the administration of the first item, so the algorithm usually starts with an item of roughly medium difficulty; each subsequent item is then chosen to be easier or harder than the previous one.
The adaptive nature of the CAT ensures that different examinees receive different tests. The psychometric technology used in administering the test allows for the equitable computation of scores based on the item response theory (Green, 2000).
One of the major elements used in developing a computer adaptive test is the item response model, also known as the IRT model, which predicts how the test taker will respond to a question given their current knowledge and ability level.
The next element used in developing a CAT is an item pool, a collection of questions that have been calibrated to measure the different knowledge levels of the examinee. If the item pool is well developed and calibrated, the computer adaptive test will succeed in measuring an individual's knowledge and ability.
Item selection is the other component used in developing the computer adaptive test, where an algorithm chooses the next item to be administered to the examinee. This selection is usually based on the current estimate of the individual's knowledge and ability during the test.
The termination criterion is the basis used to decide when the test will be completed. This is mostly based on the type of test that has been administered and its purpose to the examinee (Lester & Paraguacu, 2004).
The basic purpose of the item response theory in adaptive testing is to measure the response of the examinee during and after the test. The IRT models the relationship between an examinee’s ability level on the trait being measured by the item and the examinee’s response to the item.
It uses estimated scores to predict the examinee's performance on each item. The IRT model mathematically describes the relationship between a person's trait level and their performance on the item (Tsang, 2010).
There are three item response models used in item response theory: the one parameter IRT model, the two parameter model and the three parameter IRT model. The one parameter model, also known as the Rasch model, describes the relationship between an examinee's ability and the difficulty of the items or questions in the test.
The two parameter model adds a discrimination parameter that captures how sharply an item separates examinees of differing ability, and the three parameter model further adds a guessing parameter, the chance of answering an item correctly at random (Tsang, 2010).
The use of IRT in adaptive testing rests on two major principles. First, the student's performance is measured as an unknown numeric value explained by their knowledge level. Second, the performance of a student with an estimated knowledge level on an item i can be predicted through probabilistic theories and modeled by means of a function known as the item characteristic curve (ICC).
This curve determines the probability of a student with a certain level of knowledge, θ, being able to answer an item correctly. Each item or question must be able to define an item characteristic curve based on the previous answer given by the examinee (Lester & Paraguacu, 2004).
The three parameters that determine the shape of the ICC are the discrimination factor (a), the difficulty factor (b), and the guessing factor (c). The discrimination factor, represented by the slope of the curve, determines how sharply the probability of a correct response changes as the examinee's knowledge level changes.
The difficulty factor determines the knowledge level at which the examinee is as likely to answer the item correctly as incorrectly: if the item is difficult, the success rate will be low, and if the item is easy, the success rate will be high.
The guessing factor of the ICC curve is the probability that a student with no knowledge at all will answer an item correctly through the random selection of a response (Lester and Paraguacu, 2004).
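The three-parameter ICC described above can be written as a small function. This is a sketch for illustration only; the sample parameter values below are assumptions, not values from any cited test.

```python
import math

def icc(theta, a, b, c):
    """Three-parameter logistic item characteristic curve.
    a: discrimination (slope), b: difficulty, c: guessing floor."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A student far below the item's difficulty still succeeds at roughly
# the guessing rate, while a very able student approaches certainty.
low = icc(-3.0, a=1.5, b=0.0, c=0.25)   # close to 0.25
high = icc(3.0, a=1.5, b=0.0, c=0.25)   # close to 1.0
```

Note how the guessing factor c lifts the lower tail of the curve: even with no knowledge, a four-option multiple choice item would give roughly a 0.25 chance of success.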
Advantages and Disadvantages of Computer Adaptive Testing
The advantages of computer adaptive testing are that it provides uniform scores for most examinees, and the number of questions presented to the examinee can be reduced by around 50 percent while still maintaining a high level of precision. This reduction ensures that the examinee has enough time to finish the test.
The computer adaptive test also ensures that the test taker does not waste time answering difficult or easy questions. The minimization of the test time also ensures that the institution administering the exam saves on test costs and seat time.
Computer adaptive tests are preferable to traditional testing methods because they show the examinee their results immediately after the test is taken. CAT also reduces the examinee's exposure to particular items or questions, as different examinees usually receive different sets of items (Thissen & Mislevy, 2000).
Another advantage of computer adaptive tests is that the selection of items is tailored to the individual examinee. This ensures that irrelevant questions are omitted from the test, thereby enhancing the responder's compliance. The CAT also reduces floor and ceiling effects, and it allows the desired degree of measurement precision to be specified for the test.
The examiner can identify individuals who produce inconsistent response patterns during the test. Items, and groups of respondents with similar responses, can also be identified through the use of computer adaptive testing (Fayers & Machin, 2007).
The disadvantages of CAT are that comprehensive item pools have to be developed and tested before the test is designed. Calibration of the item pool is usually a major challenge when developing a CAT, because all of the items have to be pre-administered to a sizeable sample and then analyzed. During this process the examinees' responses are recorded but do not contribute to their total scores.
This process is referred to as pre-testing or pilot testing, and it usually presents logistical and ethical challenges. All of the items have to be pre-tested with a large sample of 1,000 examinees or more so as to obtain stable item parameter estimates. Another disadvantage of the computer adaptive test is that item exposure is conditioned on the examinee's knowledge and ability.
If this exposure is not explicitly controlled, some items in the test may be identical for two or more examinees with the same knowledge and ability. This is a serious problem, as groups sharing the same items are also likely to share a similar functional ability level (Thissen & Mislevy, 2000).
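The pilot-testing and calibration step described above can be illustrated with a deliberately simplified sketch. Assuming a one parameter (Rasch) model and a pilot sample whose abilities are centered at theta = 0, an item's difficulty can be roughly backed out from the proportion of the sample answering it correctly; real calibration jointly estimates person and item parameters, so the function below is an assumption for illustration only.

```python
import math

def rasch_difficulty(pass_rate):
    """Approximate a Rasch item difficulty from a pilot-sample pass rate,
    assuming the calibration sample's abilities are centered at theta = 0.
    This is the inverse logit of the pass rate, negated."""
    return -math.log(pass_rate / (1.0 - pass_rate))

# A pilot item answered correctly by 73% of the sample is easy
# (negative difficulty); one answered by only 27% is hard.
easy = rasch_difficulty(0.73)   # about -1.0
hard = rasch_difficulty(0.27)   # about +1.0
```

The sketch also shows why large pilot samples matter: the pass rate is only a stable estimate of difficulty when it is computed over many examinees.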
Another disadvantage of the computer adaptive test is that the examinee cannot review past items or responses to previous questions, as the CAT does not allow it. This means they cannot revisit their responses, which is usually possible with a paper and pencil test.
A related disadvantage of adaptive testing is that if the examinee answers one item incorrectly, the next item will be easier than the previous one.
If review were permitted, this would let an examinee detect which answers were incorrect simply by noticing that the questions became easier, return to those items, and change the responses to achieve an artificially high score; this is one of the reasons why review is disallowed (Wainer & Mislevy, 2000).
Comparison between CAT and Paper and Pencil Testing
When a computerized adaptive test is developed as an equivalent to a paper and pencil test, the construct validity of the CAT has to be based primarily on the comparison that exists between the paper and pencil test and the computer adaptive test. The items that are usually developed for a CAT test basically measure the same constructs that are contained in the paper and pencil test items.
Scores on the CAT and the paper and pencil test are usually highly correlated, and these correlations should be similar when corrected for measurement error. The covariance structures between paper and pencil tests and computer adaptive tests are usually used to demonstrate the similarity between the two types of test. Another comparison is that some ability measures are usually statistically equivalent.
These measures include the sequence of the items, the composition of the items, the kind of information contained in the item and the level of ability being measured by the test (Gaskill & Marshall, 2007).
Another comparison between the paper and pencil test and the computer adaptive test is that the CAT is usually constructed to measure the same traits or ability levels that the paper and pencil test seeks to measure.
Both test formats usually highlight the important parts of a question: the paper and pencil test underlines the important phrase of the item, while the CAT highlights the phrase and the relevant section of the item.
Another comparison is that the constructs measured by the paper and pencil test and the CAT are usually similar. The paper and pencil item parameters are usually used as estimates for developing the item pools used in the computer adaptive test (Kolen & Brennan, 2004).
Another comparison between the paper and pencil and computer adaptive tests is that both are used to measure the performance of the examinee. The mode and paradigm effects used to test the examinee are similar for both tests. Wang and Shin (2010) have identified three criteria that can be used in evaluating the comparability between the CAT and the paper and pencil test.
These criteria are validity, psychometric criteria and statistical assumptions or test administration. The validity criterion is mostly used to assess whether the constructs measured by both tests are similar in nature.
The psychometric criterion evaluates the comparability between the paper and pencil test and the CAT by analyzing properties such as the reliability of the test, measurement error and the distribution of the examinees' scores. Statistical assumptions and test administration are similar for both tests, as their main purpose is to measure the examinee's knowledge level and ability (Wang & Shin, 2010).
Armed Services Vocational Aptitude Battery (ASVAB) and CAT-ASVAB
The Armed Services Vocational Aptitude Battery (ASVAB) test is an aptitude test that is used by many Armed Forces around the world in their recruitment and selection exercises. The ASVAB is made up of nine subtests that are used to measure the aptitude levels and knowledge abilities of armed forces recruits.
The nine subtests usually examine various subjects that range from mathematical applications to general science topics or vocabulary related topics. The results of the ASVAB usually determine whether an individual qualifies to join the Armed Forces or not.
The purpose of the ASVAB is not to test an individual's intellectual capacity but to determine the individual's abilities and how those abilities can be used to train the individual for a specific job in the armed forces (Kaplan, 2009).
ASVAB tests were originally designed during World War II as a tool of selecting armed forces recruits to join the military. The ASVAB tests were designed to provide a general means of measuring a person’s intellectual ability as well as their aptitude levels. These levels and abilities were used to determine which branch of the military the person would be placed in.
The first form of the ASVAB was the Armed Forces Qualification Test (AFQT) which was developed in 1948 to be used as a standard screening test. The AFQT test underwent various changes over the years so that it could become more accurate when screening individuals.
The changes were meant to gear the tests to provide accurate scores that would be used in measuring the success of military training programs. The results of the tests were used to help the recruits to select which branches of the military they wanted to serve in. The ASVAB was basically developed as a result of these changes to the AFQT test (Grayson, 2004).
The very first ASVAB test was administered in 1968, but it was not used for recruitment or selection purposes. It was, however, used in 1976 to recruit individuals into the armed forces. The basic structure of the ASVAB remained the same for several years until 1980, when the items of the test were updated to keep pace with technological innovations.
The computerized version of the ASVAB test was developed in 1993 and became operational in 1996. The latest revision to the ASVAB test was performed in 2002, when two subtests were removed from the ASVAB, creating room for the addition of one new subtest.
The two subtests that were removed included the numerical operations and the coding speed which were replaced with the assembling objects subtest. The items/questions in the ASVAB were also updated to ensure they were up to date with the technological advancements in the environment (Wiley, 2010).
The ASVAB is viewed as one of the most commonly used ability tests, especially when it comes to measuring an individual's capabilities. The ASVAB has its background in the mental testing movement, as it represents the state of the art in multiple aptitude test batteries. The ASVAB was also among the first large scale adaptive tests, with its adaptive version developed in the late 1980s.
Adaptive testing at this time was still in its infancy and had not become as common as it is now. The large scale nature of the ASVAB was driven by the need to screen two million armed forces applicants in a single year, making it the largest single testing program in the world.
The ASVAB was later developed into an adaptive test by incorporating the computer based testing system into its structure. This allowed the test to save on time and money that was spent in examining the two million people in one year (Wainer et al, 2010).
There are various versions of ASVAB tests that have been developed for testing purposes and they include the institutional version, the production version, the computer adaptive screening test (CAST) and the armed forces classification test (AFCT). The institutional version of the ASVAB is a test that is mostly used by the military in high schools and institutions of higher learning.
This type of test is usually administered in collaboration with the Department of Defense and the Department of Education in the United States. The purpose of this test is to provide school counselors with the necessary information that will be used in recommending career options to high school students.
The production version of the ASVAB test is usually used in selecting and recruiting individuals into the armed forces. It also identifies which military jobs an individual is qualified for based on their test scores (Wiley, 2010).
The production version of the ASVAB test is usually administered in two forms which are the paper form and the computerized test. The computerized version of the ASVAB test is referred to as the CAT-ASVAB test and is commonly used in processing large numbers of army recruits. The computer adaptive screening test (CAST) is mostly a recruiting test that is used to screen the various applicants or individuals who have applied to work in the armed forces.
The purpose of the CAST test is to reduce the number of recruits to a more manageable number. This is accomplished by administering small mini tests that are used to determine which individuals will progress to take the main ASVAB test. The armed forces classification test (AFCT) is mostly given to those military personnel who want to be retrained for a different job within the armed forces.
The contents that make up the ASVAB subtests include word knowledge, arithmetic reasoning, general science, paragraph comprehension, mathematical knowledge, electronics information, auto and shop information, assembling objects and mechanical comprehension (Wiley, 2010).
The computerized version of the ASVAB test is known as the computer adaptive test ASVAB. The CAT-ASVAB has features similar to the paper and pencil test, as it measures the same aptitude levels of the examinee. The test displays the questions and scores the answers.
The computerized version of the ASVAB is one of the most thoroughly researched tests for examining an individual's proficiency in recent times. The CAT-ASVAB was the first large scale adaptive test battery to be administered in the recruitment and selection of armed forces recruits, a process that has used the ASVAB since 1976 (Wall & Wall, 2010).
The paper and pencil ASVAB test was the most commonly used recruitment tool before the CAT-ASVAB was developed. Before it was entirely launched into the market, the CAT-ASVAB test underwent some experimental designs to collect important data that would be used in examining the adequacy of the adaptive testing algorithms.
The purpose of the experimental design was to develop a full battery CAT version of the ASVAB that would measure the same dimensions as the paper and pencil ASVAB test. The items that were needed in the experimental design were the item pools, psychometric developments (item selection, and scoring) and the system of delivery.
The experimental CAT-ASVAB system was used in a large scale study between 1982 and 1984 to ascertain whether the CAT-ASVAB could replace the paper and pencil test. This exercise involved testing 7,518 recruits who were scheduled for a training exercise in 23 military training schools. They were tested by the paper and pencil ASVAB prior to recruitment and the experimental CAT-ASVAB before taking any basic training.
The results of the study showed that predictive validity and equivalent constructs could be obtained through the use of the CAT-ASVAB. These results were later used to fully implement the CAT-ASVAB in 1984 (Segall & Moreno, 2010).
The CAT-ASVAB is like other computerized adaptive tests in that it contains fewer questions than the paper and pencil test, which makes it easier to take and faster to complete. The items in the computerized adaptive ASVAB test are tailored to the individual's knowledge and ability level, ensuring that the test does not administer items that are far too simple or too difficult.
The items presented to the individual are based on whether they answered the previous question correctly. The subtests that make up the CAT-ASVAB include general science, word knowledge, arithmetic reasoning, paragraph comprehension, electronics information, mathematical knowledge, auto and shop information, mechanical comprehension and assembling objects (Wall & Wall, 2010).
The advantages of using the CAT-ASVAB over the paper and pencil ASVAB test is that the computerized test takes a shorter time to complete than the paper and pencil test. The test scores are usually released immediately and the computerized computation of the results minimizes the chances of errors. The CAT-ASVAB test can also be taken without any prior scheduling as is usually required for the paper and pencil test.
The main disadvantage of the CAT-ASVAB test, shared with other computer adaptive tests, is that it does not allow the examinee to review previously answered questions, which is usually possible in the paper and pencil test. It also does not allow the examinee to skip difficult questions and move on, as it requires the examinee to complete the current item before proceeding to the next.
Applications of CAT Tests
Because of the adaptive nature of the CAT tests, these examination approaches have been used in the fields of education, organizational licensing and certification. There are an estimated 30 operational computerized adaptive testing programs in use around the world.
These programs are usually used to evaluate four to six million people every year who want to attain educational credentials or receive some form of licensing or certification. The number of CAT programs in use is increasing every year which can be attributed to the changes in technology (Fetzer et al, 2008).
The major CAT programs in use around the world include the Armed Services Vocational Aptitude Battery (ASVAB) test, used in the recruitment and selection of armed service personnel; the Graduate Management Admission Test (GMAT-CAT), administered to individuals wishing to join a masters program, most often the Master in Business graduate program; the Microsoft Certified Professional exams, which provide certification to information technology professionals; and the American Institute of Certified Public Accountants (AICPA) exam, which provides certification to public accountants and audit and tax professionals.
CAT has also been used in fingerprint imaging and web based testing (Fetzer et al, 2008).
Effects of Test Anxiety in CAT and Paper and Pencil Testing
The increased use of online testing methods such as the computerized adaptive testing has raised some serious concerns as to whether test anxieties can be properly managed before taking the test. While the CAT tests have proved to be more effective and efficient than the paper and pencil tests, concerns have been raised on whether CAT tests consider the cognitive and psychology demands of the test taker.
Psychological and cognitive problems usually arise when the examinee has no control over the pace of the test. This is mostly so in the CAT test where the examinee lacks the ability to go back to previously answered questions. The CAT test also makes it difficult for the examinee to move on to the next question until they have answered the current question.
This increases their anxiety levels, raising the possibility that they will answer a question incorrectly. When compared to paper and pencil tests, the effects of anxiety in CAT tests are much higher.
Conclusion
Computer adaptive tests have become the most common form of testing for educational or professional purposes. This form of testing will continue to be in use given the increasing technological innovations and the need to produce efficient, effective and reliable test scores.
The traditional method of testing which is the paper and pencil test will soon be obsolete as more training institutions embrace the CAT and CAT-ASVAB testing methods in their recruitment or certification programs. The traditional methods will also be obsolete when it comes to examining large numbers of people in educational institutions or in army recruitment exercises.
References
Ayala, R.J., (2009). The theory and practice of item response theory. New York: Guilford Press.
Fayers, P.M., & Machin, D., (2007). Quality of life: the assessment, analysis and interpretation of patient reported outcomes. England, UK: John Wiley and Sons Limited.
Fetzer, M., Dainis, A., Lambert, S., & Meade, A., (2008). Computer adaptive testing (CAT) in an employment context. Roswell, US: Previsor.
Gaskill, J., & Marshall, M., (2007). Comparisons between paper and computer based tests, British Colombia. Canada: Society for the Advancement of Excellence in Education.
Grayson, F.N., (2004). Officer candidate tests. New Jersey: Wiley Publishing Inc.
Green, B.F. (2000). System design and operation. In Wainer, H. (Ed.) Computerized Adaptive Testing: A Primer. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Kaplan (2009), Kaplan ASVAB: the armed services vocational aptitude battery. New York: Kaplan Publishing.
Kolen, M.J., & Brennan, R.L., (2004). Test equating, scaling and linking: methods and Practices. Netherlands: Springer Science.
Lester, J.C., & Paraguacu, F., (2004) Intelligent tutoring systems. Berlin, Germany: Springer Verlag Heidelberg.
Segall, D.O., & Moreno, K.E., (2010). Development of the CAT-ASVAB. Web.
Setchi, R., (2010). Knowledge-based and intelligent information and engineering Systems. Berlin, Germany: Springer Verlag Heidelberg.
Tao, Y.H., Wu, Y.L., & Chang, H.Y., (2008). A Practical Computer Adaptive Testing Model for Small-Scale Scenarios. Educational Technology & Society, Vol.11, No. 3, pp.259–274.
Thissen, D., & Mislevy, R.J. (2000). Testing Algorithms. In Wainer, H. (Ed.) Computerized Adaptive Testing: A Primer. Mahwah, New Jersey: Lawrence Erlbaum Associates.
Tsang, P., (2010). Hybrid learning. Berlin, Germany: Springer Verlag Heidelberg.
Wainer, H., & Mislevy, R.J. (2000). Item response theory, calibration, and estimation. In Wainer, H. (Ed.) Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates.
Wainer, H., Bradlow, E.T., & Wang, X., (2010). Testlet response theory and its Applications. Cambridge, UK: Cambridge University Press.
Wall, J. & Wall, E. (2010). McGraw-Hill’s ASVAB basic training for the AFQT. New York: McGraw Hill Companies Inc.
Wang, H., & Shin, C.D., (2010). Comparability of computerized adaptive and paper-pencil tests. Test, Measurement and Research Services Bulletin, Issue 13, pp. 1–7.
Wiley (2010). The ASVAB in a nutshell. Web.