How to Use Item Response Theory (IRT) for Adaptive Testing?
by Chanan Braunstein, Owner
Item response theory (IRT) is a powerful psychometric framework for designing, analyzing, and scoring tests more efficiently and accurately. One of its most exciting applications is computerized adaptive testing (CAT), a method of delivering tests that are tailored to each examinee's ability level. In this post, we will explain what IRT and CAT are, how they work, and why they benefit both test takers and test developers.
What is Item Response Theory (IRT)?
Item response theory (IRT) is a family of mathematical models that describe how examinees respond to test items as a function of their ability and the items' characteristics. Unlike classical test theory (CTT), whose item statistics (such as proportion correct and item-total correlation) depend on the particular sample of examinees who took the test, IRT gives each item its own parameters for difficulty, discrimination, and guessing. These parameters are estimated from response data using statistical methods and, when the model fits, are invariant across samples of examinees and test forms.
IRT models can measure many kinds of constructs: cognitive, affective, or psychomotor. Depending on the number of dimensions and the shape of the item response function, models may be unidimensional or multidimensional, dichotomous or polytomous, logistic or normal-ogive. Among the most common are the one-parameter logistic (1PL), two-parameter logistic (2PL), and three-parameter logistic (3PL) models for dichotomous items, and the graded response model (GRM) and partial credit model (PCM) for polytomous items.
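To make this concrete, under the 3PL model the probability of a correct response is c + (1 - c) / (1 + e^(-a(θ - b))), where a is discrimination, b is difficulty, and c is the lower asymptote. Here is a minimal Python sketch of that function (the parameter values in the example call are arbitrary):

```python
import math

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# The 2PL is the special case c = 0, and the 1PL additionally fixes
# a common discrimination value for every item.
print(p_3pl(theta=0.5, a=1.2, b=0.0, c=0.2))  # about 0.72
```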
What is Computerized Adaptive Testing (CAT)?
Computerized adaptive testing (CAT) is a method of delivering tests that are customized to each examinee's ability level. Instead of administering a fixed set of items to all examinees, CAT uses an algorithm that selects the most appropriate items for each examinee based on their previous responses and the item parameters. The algorithm starts with an initial estimate of the examinee's ability, then selects an item that matches that estimate. After the examinee responds to the item, the algorithm updates the ability estimate and selects another item. This process continues until a stopping criterion is met, such as a predefined number of items, a desired level of precision, or a content constraint.
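The loop is easy to sketch in code. Below is a minimal, self-contained simulation with a hypothetical five-item 2PL pool, maximum-information item selection, and a fixed-length stopping rule. The shrinking up/down ability update is a deliberate simplification; an operational CAT would re-estimate ability after each response using the methods described later in this post.

```python
import math
import random

# Hypothetical item pool: (discrimination a, difficulty b) for 2PL items.
POOL = [(1.4, -1.0), (0.9, -0.3), (1.1, 0.0), (1.3, 0.6), (1.0, 1.2)]

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def run_cat(true_theta, n_items=4):
    theta, step = 0.0, 1.0               # initial estimate and step size
    remaining = list(range(len(POOL)))
    for _ in range(n_items):             # fixed-length stopping rule
        # Select the unused item with maximum information at the current estimate.
        item = max(remaining, key=lambda i: information(theta, *POOL[i]))
        remaining.remove(item)
        a, b = POOL[item]
        correct = random.random() < p_correct(true_theta, a, b)  # simulated answer
        # Crude up/down update with a shrinking step; a real CAT would
        # re-estimate theta by maximum likelihood or a Bayesian method.
        theta += step if correct else -step
        step /= 2.0
    return theta

print(run_cat(true_theta=0.8))
```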
CAT has many advantages over traditional fixed-form testing, such as:
- Efficiency: CAT can reduce the test length and time by selecting only the most informative items for each examinee. This can also reduce fatigue and boredom for the test takers.
- Accuracy: CAT can provide more precise ability estimates by avoiding items that are too easy or too hard for the examinee. This lowers the standard error of measurement, especially for examinees far from the middle of the ability range.
- Security: CAT can increase test security by drawing on a large item pool and varying the test form for each examinee. This reduces item exposure and opportunities for cheating.
- Flexibility: CAT can accommodate different types of items, such as multiple choice, constructed response, or multimedia. It can also adapt to different testing purposes, such as diagnostic, formative, or summative.
How to Use IRT for CAT?
To use IRT for CAT, we need to follow several steps:
- Item calibration: We need to estimate the item parameters for each item in the pool using IRT models. This requires field-testing the items with a representative sample of examinees before they are used in a high-stakes setting⁴; a minimal calibration sketch appears after this list.
- Item selection: We need to define a rule for selecting the next item for each examinee based on their current ability estimate and the item parameters. Common methods include maximum information, maximum posterior weighted information, and minimum expected posterior variance³ (maximum-information selection is illustrated in the loop sketch above).
- Ability estimation: We need to define a method for updating the ability estimate after each response. Common methods include maximum likelihood, expected a posteriori (EAP), and Bayesian modal estimation³; an EAP sketch follows this list.
- Stopping criterion: We need to define a condition for terminating the test for each examinee, such as a fixed test length, a fixed level of precision, or satisfaction of content constraints³; see the stopping-rule sketch below.
- Scoring: We need to define a method for scoring examinees based on their final ability estimates and the item parameters, such as a number-correct score, a theta score, or a scaled score³; the stopping-rule sketch below also shows a simple theta-to-scale conversion.
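On calibration: operational programs typically calibrate the whole pool jointly with marginal maximum likelihood in dedicated software. As a toy illustration only, the sketch below fits a single 2PL item by maximum likelihood, assuming (unrealistically) that the pilot examinees' abilities are already known; all data are simulated.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated pilot data: known abilities and one item's 0/1 responses.
rng = np.random.default_rng(0)
thetas = rng.normal(size=500)
true_a, true_b = 1.2, 0.4
prob = 1.0 / (1.0 + np.exp(-true_a * (thetas - true_b)))
x = (rng.random(500) < prob).astype(float)       # simulated 0/1 responses

def neg_log_likelihood(params):
    a, b = params
    p = 1.0 / (1.0 + np.exp(-a * (thetas - b)))
    p = np.clip(p, 1e-9, 1.0 - 1e-9)             # guard against log(0)
    return -np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))

fit = minimize(neg_log_likelihood, x0=[1.0, 0.0], method="Nelder-Mead")
print(fit.x)  # recovered (a, b), close to (1.2, 0.4)
```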
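On ability estimation: here is a minimal sketch of expected a posteriori (EAP) estimation for 2PL items, using a fixed quadrature grid and a standard normal prior. The posterior standard deviation it returns doubles as the standard error used by precision-based stopping rules. The item parameters and responses in the example call are arbitrary.

```python
import numpy as np

def eap_estimate(a, b, responses, n_points=81):
    """EAP estimate of theta (and posterior SD) for 2PL items, by quadrature."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(responses, dtype=float)
    grid = np.linspace(-4.0, 4.0, n_points)            # theta quadrature points
    prior = np.exp(-0.5 * grid**2)                     # standard normal prior (unnormalized)
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b))) # shape (n_points, n_items)
    likelihood = np.prod(np.where(x == 1, p, 1.0 - p), axis=1)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    mean = np.sum(grid * posterior)                    # EAP ability estimate
    sd = np.sqrt(np.sum((grid - mean) ** 2 * posterior))  # posterior SD ~ standard error
    return mean, sd

theta_hat, se = eap_estimate(a=[1.2, 0.8, 1.5], b=[-0.5, 0.3, 1.0], responses=[1, 1, 0])
print(theta_hat, se)
```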
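Finally, stopping and scoring: a fixed-precision rule ends the test once the standard error falls below a target (usually bounded by minimum and maximum lengths), and the final theta is often reported on a friendlier scale via a linear transformation. The thresholds and the slope and intercept below are arbitrary illustrative values, not standards.

```python
def should_stop(se: float, n_items: int, se_target: float = 0.3,
                min_items: int = 5, max_items: int = 30) -> bool:
    """Fixed-precision stopping rule with length bounds (illustrative thresholds)."""
    if n_items >= max_items:
        return True
    return n_items >= min_items and se <= se_target

def scaled_score(theta: float, slope: float = 10.0, intercept: float = 50.0) -> float:
    """Linear transformation of theta to a reporting scale (arbitrary constants)."""
    return slope * theta + intercept

print(should_stop(se=0.28, n_items=12))  # True
print(scaled_score(0.8))                 # 58.0
```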
Conclusion
Item response theory (IRT) and computerized adaptive testing (CAT) are two powerful tools that can improve the quality and efficiency of testing. By using IRT models to describe how examinees respond to items based on their ability and item characteristics, we can design tests that are adaptive to each examinee's level. This can result in shorter tests that are more accurate, secure, and flexible. If you are interested in learning more about IRT and CAT, you can check out some of the resources below: