From the Item Bank
The Professional Testing Blog
Managing Low-Volume Examination Programs: Where Theory Meets Reality
March 15, 2017
By Christine Depascale, MS, and Joy Matthews-López, PhD
Low-volume examination programs face challenges that don’t necessarily apply to mid- to large-volume programs. These challenges can be especially prominent when trying to produce interpretable quantitative outcome measures.
How low a volume constitutes a low-volume program?
What is considered a low-volume program: fewer than 500 test-takers? 100? 25? Unfortunately, there is no definitive answer. It depends on how often the examination is administered (in a single administration, via windows, or continuously). Offering a single, annual administration to 100 examinees is different from continuously testing 100 examinees over a year. Obviously, having more than 100 examinees would be preferable to having fewer than 100. Still, processing a single cohort of 100 examinees and conducting quantitative analyses on their response data is doable, whereas producing monthly or quarterly item analyses on 100 examinees spread across a year of testing is sub-optimal, at best.
What are the challenges?
Nearly every aspect of testing is negatively affected when a program has a low number of examinees, from subject-matter expert (SME) recruitment, item development, and form assembly to examination delivery, test security, and score reporting. These challenges center on two shortages: too few available SMEs for defensible test development meetings and too few candidates for stable item statistics.
For example, consider item difficulty (the p-value, or the proportion of candidates who answer an item correctly). If the statistic is based on only 10 candidates, a single candidate getting an item correct or incorrect shifts it by 0.10, enough to move a p-value from 0.60 to 0.70. When item difficulty is based on 30 candidates, the impact of one candidate is considerably smaller (about 0.03, the difference between a p-value of 0.60 and 0.63).
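The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not a method taken from any particular testing program: the function name p_value_shift and the sample sizes beyond 10 and 30 are assumptions made for the example. It relies on the fact that, when item difficulty is the proportion of candidates answering correctly, changing one candidate's response moves that proportion by 1/n.

```python
def p_value_shift(n_candidates: int) -> float:
    """Return how far an item's p-value (proportion correct) moves
    when a single candidate's response flips between correct and incorrect.

    Hypothetical helper for illustration only.
    """
    if n_candidates <= 0:
        raise ValueError("n_candidates must be a positive integer")
    return 1.0 / n_candidates


# 10 and 30 come from the example in the text; 100 and 500 are
# illustrative additions to show how the effect shrinks with volume.
for n in (10, 30, 100, 500):
    print(f"{n} candidates: one response shifts the p-value by {p_value_shift(n):.3f}")
```

With 10 candidates the shift is 0.10 and with 30 it is roughly 0.03, matching the figures above. The shift keeps shrinking as the cohort grows, which is one way to see why item statistics from small cohorts are unstable.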
Tests built from items that are produced by small committees of SMEs, and then seen by small cohorts of examinees, can suffer in fairness, which ultimately calls into question the interpretability of test scores and the decisions based on those scores. Unless managed properly, the many problems faced by low-volume programs can undermine the very essence of the exam program.
Managing the challenges facing low-volume programs
So what can be done to mitigate the problems faced by low-volume programs? Is there a way to successfully navigate the problems and manage the challenges? The answer is yes, through teamwork and stellar communication between the test sponsor and the test developer (and psychometric staff).
Where candidate volumes are low and traditional analyses should not be performed, hybrid methods can be used, such as:
Low-volume programs may also be constrained by the number of subject-matter experts available to participate in job-analysis studies, passing-score studies, and item writing and review meetings. The response rate to validation surveys may also be limited. A close working relationship between the test sponsor and the test developer (and psychometric staff) is vital to managing the challenges typical of a low-volume program. Test sponsors need information about what is and what is not defensible. They need to be on the front lines of the test development process so that when tough decisions need to be made, the best possible information is available for decision makers to proceed with eyes wide open and a solid rationale.
We recently heard of a situation in a low-volume program where test developers had too few “good” items to assemble an exam form. Their choice was between including items with poor statistics on the form and intentionally failing to meet the exam’s blueprint. Which is more egregious: to include poorly performing items that could affect the reliability of the test form (and scores), or to fail to adhere to the mandated blueprint (which is a content validity issue)? Clearly, neither option is desirable, but a decision had to be made.
In this case, our recommendation would have been for the test developer to clearly outline to the test sponsor the problem at hand, the available options, and the risks (type and magnitude) associated with each option. In the end, the test sponsor needed enough information to make an informed decision and then to own that decision. Problems can be addressed and challenges successfully managed by finding and implementing innovative solutions: using hybrid models (quantitative information supplemented by qualitative information), taking a team approach, and clearly communicating options. Sometimes certification programs have to work with what they have, and for some programs, candidate and certificant volume may always be low. This reality needs to be weighed against the benefits of offering certification, and often the benefits can far outweigh the challenges.