Managing Low-Volume Examination Programs: Where Theory Meets Reality

March 15, 2017

By Christine Depascale, MS, and Joy Matthews-López, PhD

Low-volume examination programs face challenges that don’t necessarily apply to mid- to large-volume programs. These challenges can be especially prominent when trying to produce interpretable quantitative outcome measures.

How low a volume constitutes a low-volume program?

What is considered a low-volume program—fewer than 500 test-takers? 100? 25? Unfortunately, there is no definitive answer. It depends on how often the examination is administered (in a single administration, via windows, or continuously). Offering a single annual administration to 100 examinees is different from continuously testing 100 examinees over a year. Having more than 100 examinees would obviously be preferable to having fewer, but processing a single cohort of 100 examinees and conducting quantitative analyses on their response data is doable, whereas producing monthly or quarterly item analyses on those same 100 examinees spread across a year of testing is sub-optimal at best.
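
To make the volume effect concrete, here is a minimal sketch (our illustration, not part of the original discussion) using the standard error of a proportion, SE = sqrt(p(1 - p)/n). The same 100 examinees produce a far noisier item p-value when split into monthly cohorts of roughly 8 than when analyzed as one annual cohort; the p-value of 0.60 is an assumed, illustrative number.

```python
import math

def p_value_se(p: float, n: int) -> float:
    """Standard error of an observed item p-value (a proportion) from n examinees."""
    return math.sqrt(p * (1 - p) / n)

p = 0.60  # assumed true proportion of examinees answering the item correctly

# One annual administration: all 100 examinees feed a single analysis.
print(f"SE with n=100: {p_value_se(p, 100):.3f}")  # about 0.049

# Monthly windows: the same 100 examinees yield only ~8 per analysis.
print(f"SE with n=8:   {p_value_se(p, 8):.3f}")    # about 0.173
```

A p-value of 0.60 observed from eight examinees carries an uncertainty band (plus or minus two standard errors) stretching from roughly 0.25 to 0.95, which is why monthly item analyses on such counts are rarely interpretable.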

What are the challenges?

Nearly every aspect of testing is negatively affected when a program has a low number of examinees, from SME recruitment, item development, and form assembly to examination delivery, test security, and score reporting. These challenges center on two core shortages: too few available SMEs for defensible test development meetings and too few candidates for stable item statistics.

  • Too few available SMEs for defensible test development meetings (such as Job Task Analysis, Item Writing/Review, or Standard Setting)

    • This can introduce potential bias. With a high-volume program, if one SME is not available, it is easier to find a replacement who still maintains the representative nature of the panel; with a low-volume program, however, finding a replacement may be difficult or impossible. In that situation, potential bias is introduced because the group’s demographics and combined skill set are not representative of the profession’s body of knowledge or of the program’s population of examinees.
    • This can also impact the number, breadth, and depth of items written. Without a sufficient breadth and depth of active items, assembling a new exam form becomes difficult because there may not be enough items to create forms that meet blueprint requirements and align with statistical or psychometric targets.
  • Too few candidates taking the examination for stable item statistics

    • This can affect the monitoring and evaluation of items and test forms. Traditional item analyses are unstable and vary greatly when data are based on exceptionally low numbers of examinees. Two factors play into this:
    • First, for statistics based on percentages and averages, one individual candidate has a greater impact on the statistical outcome when the candidate volume is smaller.

For example, when item difficulty is based on only 10 candidates, one candidate getting an item correct or incorrect moves the statistic by 0.10, which can be the difference between a 0.60 and a 0.70 p-value. When item difficulty is based on 30 candidates, the impact of one candidate is considerably smaller (about 0.03, the difference between a 0.60 and a 0.63 p-value).

    • Second, with low volumes of candidates, each group taking a single administration of the examination will vary greatly in how well it represents the overall population.

For example, suppose 50% of a particular profession are educators, but when the examination was administered in July, only 3 of the 10 candidates (30%) were educators, and when it was administered again in September, 6 of the 10 candidates (60%) were. The item statistics from these two administrations may vary considerably as a result (a small simulation of both factors follows this list).
    • This can also affect candidate privacy and confidentiality: when candidate volume is low, individual examinees may be identifiable in reported results.
    • Not having interpretable data on item and person performance has other negative consequences, such as adversely impacting the equating process, which can pose fairness (and validity) issues. For example, if forms are equated to have the same relative difficulty and the difficulty measures themselves are compromised (as discussed above), the equating will be distorted. As a result, two candidates of similar ability who receive these forms may have different outcomes.
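
Both factors above are easy to demonstrate with a small simulation. The sketch below (our illustration; the subgroup pass rates are assumed numbers) draws repeated cohorts from a population that is half educators, with educators answering a given item correctly 70% of the time and everyone else 50% of the time, and shows how widely the observed p-value swings when each cohort is small:

```python
import random
import statistics

random.seed(7)  # reproducible illustration

# Assumed population: 50% educators; educators answer this item
# correctly 70% of the time, non-educators 50% of the time.
P_EDUCATOR = 0.50
P_CORRECT = {"educator": 0.70, "other": 0.50}

def administer(n_candidates: int) -> float:
    """Simulate one administration and return the observed item p-value."""
    correct = 0
    for _ in range(n_candidates):
        role = "educator" if random.random() < P_EDUCATOR else "other"
        if random.random() < P_CORRECT[role]:
            correct += 1
    return correct / n_candidates

# The item's true p-value is 0.5 * 0.70 + 0.5 * 0.50 = 0.60.
for n in (10, 30, 100):
    observed = [administer(n) for _ in range(1000)]
    print(f"n={n:3d}: observed p-values span "
          f"{min(observed):.2f}-{max(observed):.2f}, SD {statistics.stdev(observed):.3f}")
```

With 10-candidate cohorts, an item whose true difficulty is 0.60 can show observed p-values anywhere from roughly 0.2 to 1.0; with 100 candidates the spread tightens considerably. Equating forms on such unstable small-sample estimates propagates that noise directly into candidates' scores.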

Tests built from items written by small committees of SMEs, and then seen by only small cohorts of examinees, can compromise fairness and ultimately call into question the interpretability of test scores and the decisions based on those scores. Unless managed properly, the plethora of problems faced by low-volume programs can undermine the very essence of the exam program.

Managing the challenges facing low-volume programs

So what can be done to mitigate the problems faced by low-volume programs? Is there a way to navigate them successfully? The answer is yes: through teamwork and stellar communication between the test sponsor and the test developer (and psychometric staff).

Where candidate volumes are low and traditional analyses should not be performed, hybrid methods can be used, such as:

  • Supplementing quantitative analyses with qualitative feedback and guidance from SMEs.
  • Producing reports on an extended timeline, such as semi-annually or annually.
  • Aggregating data across administrations, when possible and sensible, and supplementing the results with explanations and context (a minimal example follows this list).
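
As a minimal sketch of the aggregation idea (our illustration; the window counts are invented), the snippet below pools an item's responses across several small administration windows before computing its p-value, rather than reporting an unstable statistic for each window alone:

```python
# Invented example: (correct, total) counts for one item across
# four small administration windows in a single year.
windows = {
    "Q1": (5, 9),
    "Q2": (7, 11),
    "Q3": (4, 8),
    "Q4": (8, 12),
}

# Per-window p-values are too unstable to interpret on their own...
for window, (correct, total) in windows.items():
    print(f"{window}: n={total:2d}, p={correct / total:.2f}")

# ...so pool responses across the year before computing the statistic.
total_correct = sum(c for c, _ in windows.values())
total_taken = sum(n for _, n in windows.values())
print(f"Year: n={total_taken}, p={total_correct / total_taken:.2f}")
```

Pooling assumes the item and the candidate population stayed stable across the windows; as noted above, the aggregated numbers should still be paired with qualitative context from SMEs.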

Low-volume programs may also be constrained by the number of subject-matter experts available to participate in job-analysis studies, passing-score studies, and item-writing and review meetings. The response rate to validation surveys may be constrained as well. A close working relationship between the test sponsor and the test developer (and psychometric staff) is vital to successfully managing the challenges typical of a low-volume program. Test sponsors need information about what is and is not defensible. They need to be on the front lines of the test development process so that when tough decisions must be made, the best possible information is available and decision makers can proceed with eyes wide open and a solid rationale.

We recently heard of a situation in a low-volume program where test developers had too few “good” items to assemble an exam form. Their choice was between including items with poor statistics on the form and intentionally failing to meet the exam’s blueprint. Which is more egregious: to include poorly performing items that could affect the reliability of the test form (and scores), or to fail to adhere to the mandated blueprint (a content validity issue)? Clearly, neither option is desirable, but a decision had to be made.

In this case, our recommendation would have been for the test developer to clearly outline to the test sponsor the problem at hand, the available options, and the type and magnitude of the risks associated with each option. In the end, the test sponsor needed enough information to make an informed decision and then to own that decision. By finding and implementing innovative solutions, such as hybrid models (quantitative supplemented by qualitative information), a team approach, and clear communication of options, problems can be addressed and challenges can be successfully managed. Sometimes certification programs have to work with what they have, and for some programs, candidate and certificant volume may always be low. This reality needs to be weighed against the benefits of offering certification, and often the benefits far outweigh the challenges.
