Professional Testing, Inc.
Providing High Quality Examination Programs

From the Item Bank

The Professional Testing Blog

 

Inventory Planning in Testing: How Many Items Do We Need to Write?

December 20, 2015

It’s one of the most basic questions in planning and maintaining an examination program: How many items do we need to write, review, and pretest? Write too many, and you’re expending resources on inventory that will go stale. Write too few, and you can’t assemble the requisite number of test forms to your specifications.

Getting Started
To answer the question, you need to review the rules you have established for yourself – your set of constraints. For a new licensing or certification testing program, here is a set of considerations:

  • Your test specifications will provide the basic constraints. They will specify how many items must cover each topic (e.g., 20 in Content Domain I). They may specify cognitive levels (e.g., 10 Recall items in Domain I). They may specify item types (e.g., 12 short-response items per form). Initial inventory planning must happen at the basic constraint level. You may have 150 items available for a 100-item form, but if those don’t include your 10 Recall items for Domain I, you can’t assemble your form.
  • You will have to decide how many test forms to start with and how much overlap the forms will have.
  • You will want to estimate the proportion of newly written items that will survive review and be approved for use.
  • Finally, you need to make an allowance for item enemies. These are items that cannot appear on the same test form (typically because they are too similar to one another or one hints at another’s answer).

Let’s assume a domain requires 10 items per form, and that you require two forms with 20 percent overlap. With 2 of the 10 items shared between the forms, you will use 18 unique items overall. Say you also want a 10 percent reserve to cover item enemies: that brings you to 19.8, or (rounding up) 20 approved items. Allowing for only 75 percent of newly written items to be approved (survive review), you will need 20 ∕ 0.75, or 27, items written in that domain.
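The arithmetic above can be sketched as a short calculation. This is a minimal illustration using the example’s numbers; the function name and parameters are my own labels, not part of any standard:

```python
import math

def items_to_write(per_form, n_forms, overlap, enemy_reserve, approval_rate):
    """Estimate how many items SMEs must write for one content domain."""
    shared = math.ceil(per_form * overlap)                  # items reused across forms
    unique = per_form * n_forms - shared * (n_forms - 1)    # unique items needed overall
    approved = math.ceil(unique * (1 + enemy_reserve))      # buffer for item enemies
    return math.ceil(approved / approval_rate)              # allow for review attrition

# Worked example from the text: 10 items per form, 2 forms, 20% overlap,
# 10% enemy reserve, 75% review-survival rate.
print(items_to_write(10, 2, 0.20, 0.10, 0.75))  # 27
```

Running the same function per domain, per constraint cell (topic × cognitive level × item type), produces the table shown in Figure 1.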

Figure 1. You may calculate the number of items your subject-matter experts should write using a simple table. The constraints are shown in the top two rows.

Ongoing Tests
Medium- and long-term inventory needs depend primarily on candidate volume and how quickly content becomes obsolete.

  • A program with modest numbers may simply decide to introduce new forms periodically (say annually) combining some fresh items with some previously used ones.
  • A program with tens of thousands of candidates per year will usually set item-exposure constraints. Considerations include the stakes of the test as well as test sponsors’ assessment of the likelihood that content will be compromised. A constraint may specify that an item will be exposed to no more than (say) 10,000 candidates, or for no more than two years.
  • Where some content areas evolve more quickly than others, programs may choose to focus item-development efforts in those areas.
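An exposure rule like the one described above can be sketched as a simple retirement check. The thresholds and field names here are illustrative only; real programs set limits based on their own risk assessment:

```python
from datetime import date

# Illustrative exposure limits, following the example in the text.
MAX_CANDIDATES = 10_000
MAX_YEARS = 2

def should_retire(exposures: int, first_used: date, today: date) -> bool:
    """Flag an item for retirement once either exposure limit is reached."""
    years_in_use = (today - first_used).days / 365.25
    return exposures >= MAX_CANDIDATES or years_in_use >= MAX_YEARS

print(should_retire(10_500, date(2015, 1, 1), date(2015, 12, 20)))  # True (candidate cap hit)
print(should_retire(4_000, date(2013, 6, 1), date(2015, 12, 20)))   # True (over two years)
print(should_retire(4_000, date(2015, 1, 1), date(2015, 12, 20)))   # False
```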

In inventory planning for ongoing testing, an additional level of attrition needs to be considered: what proportion of pretested items will be fit to use?

Figure 2. The set of items approved for operational use is a subset of items approved for pretest, which is in turn a subset of items written.

This is an area in which your psychometrician will probably thank you for allowing generous margins. As you pretest items, you may find that most do an adequate job measuring candidates’ knowledge. However, in generating equivalent forms, the test assembler (be it an individual or a computer) is looking for specific statistical attributes. To allow the best possible forms to be assembled, it is good to provide a generous buffer, pretesting perhaps twice as many items as are needed for operational use.

How many items is that?

Let’s say you choose to retire items after two years of operational (scored) use. Building on the previous example, where you required 18 scored (operational) Domain I items to build two forms, you will be using 9 new scored items every year.

If you choose to pretest twice the number of items you ultimately need (to cover enemies and also give leeway in form assembly), you will need to approve 18 items for pretest. For that, you will need to write 24 items per year (18 ∕ 0.75 approval rate). As you build up your reserves, you will want to adjust these numbers downward.
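The steady-state annual target above can be sketched the same way. This assumes the figures already stated in the text (18 scored items retired after two years, a 2× pretest buffer, and a 75 percent approval rate); the function itself is my own illustration:

```python
import math

def annual_writing_target(scored_needed, retire_after_years, pretest_multiplier, approval_rate):
    """Steady-state number of items to write per year for one domain."""
    scored_per_year = scored_needed / retire_after_years     # items retired & replaced yearly
    pretest_per_year = scored_per_year * pretest_multiplier  # approve extra for form assembly
    return math.ceil(pretest_per_year / approval_rate)       # allow for review attrition

# 18 scored Domain I items, retired after 2 years, pretest at 2x, 75% approval rate.
print(annual_writing_target(18, 2, 2, 0.75))  # 24
```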

Note that to pretest large numbers of items, you normally couple each set of scored, operational items with different sets of unscored, pretest items.
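One way to arrange this coupling, sketched with made-up form and set labels, is to split the pretest sets evenly across the base forms so the whole pretest pool is exposed:

```python
base_forms = ["Form A", "Form B"]
pretest_sets = ["P1", "P2", "P3", "P4", "P5", "P6"]

# Split the six pretest sets evenly across the two base forms,
# yielding six delivered form variants (three per base form).
per_form = len(pretest_sets) // len(base_forms)
variants = [(form, s)
            for i, form in enumerate(base_forms)
            for s in pretest_sets[i * per_form:(i + 1) * per_form]]

for form, pretest in variants:
    print(f"{form} + pretest set {pretest}")
```

Each candidate then sees one base form plus one unscored pretest set, spreading the pretest pool across the candidate population.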

Figure 3. You might pilot test 6 sets of items with only two base forms, but only if your candidate volume supports it.

Sensible Planning
The biggest constraint for most organizations is human and material resources. I’ve had the pleasure of seeing organizations save tens of thousands of dollars by reviewing their inventory plan and optimizing it.

  1. The most frequent problem is producing far more content than can ever be used. While I advocate for a generous buffer, there can be too much of a good thing.
  2. Each test-assembly constraint has a cost, as shown above. Such constraints are a necessary part of planning for a fair test. But they should be adopted only after careful consideration of costs and benefits.
  3. As a program matures, assumptions can be fine-tuned (maybe 90 percent of newly written items are surviving review; now you can write fewer items).
  4. Finally, as the reserve of approved items grows, it becomes possible to avoid overproduction – while watching out for obsolescence of reserve items.
