
Evaluating New Item Types for Usability

June 1, 2016

Usability testing can significantly enhance the quality of alternative item types and is one way of incorporating Universal Design and/or User-Centered Design approaches into a testing program. In her March 2015 post Alternative Item Types – The Big Picture, Dr. Cynthia Parshall mentions usability testing as part of her 6-step model for the design and development of alternative item types. Within Step 4, “Iteratively Refine Item Type Design,” “conducting usability testing” is an activity done in conjunction with “developing item writing materials and sample items” and “evaluating and revising the item type design.”

This post provides a general description of usability testing and highlights questions to consider when evaluating the usability of new item types. Parshall and Harmes provide a more detailed discussion of usability testing in measurement settings in their 2009 article, Improving the Quality of Innovative Item Types: Four Tasks for Design and Development.

What is Usability Testing?

Usability testing, which has a long history in computer software design, is a research tool used to evaluate potential usability problems when developing new products, applications, or services so they can be resolved or managed before going to market. In Handbook of Usability Testing, Jeffrey Rubin and Dana Chisnell refer to something as usable when there is an “absence of frustration” in using it, and define usability as when “the user can do what he or she wants to do the way he or she expects to be able to do it, without hindrance, hesitation, or questions.” Rubin and Chisnell also define six attributes of usability. These six attributes lie at the heart of usability testing.

  • Usefulness focuses on whether users will use the product.
  • Efficiency focuses on the ease of use of the product.
  • Effectiveness focuses on whether the product behaves as the user expects it to, and whether users can use the product to achieve the intended goal without making errors.
  • Learnability focuses on how easy it is for the user to learn to use the product.
  • Satisfaction focuses on how the user feels about the product.
  • Accessibility focuses on how usable the product is for those with disabilities.

The term usability testing has been used to refer to both formal and informal methodologies that employ a variety of techniques. In the design and development of computer-based applications (as well as of alternative item types, which are often computer-based), usability testing aims to identify potential problems associated with the six attributes of usability by evaluating the elements of the software that center on user interactions, including screen display design and user interfaces. Organizations that do extensive formal usability testing often have facilities that serve as usability labs equipped with specialized equipment. Others have employed portable “labs,” remote techniques, and other lower-cost methods.

One technique commonly used in usability testing, the Think Aloud method, has been shown to be highly effective. This method, which can be implemented in almost any setting, involves having study participants verbalize their thoughts and actions while using the product, or a “low-fidelity” prototype of it (e.g., a prototype created in PowerPoint or on paper), to perform realistic tasks related to the product’s intent. During the Think Aloud, the participants’ comments, interactions, and, if possible, nonverbal expressions and reactions are noted. These sessions are often recorded so that commonalities and comparisons between participants can be captured more accurately.
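
As one possible way to organize what such sessions capture, the following minimal sketch shows a simple record structure for Think Aloud observations. It is purely illustrative; the field names, tasks, and example entries are assumptions for this post, not a prescribed protocol or any particular platform’s format.

    # Illustrative record for one Think Aloud observation, capturing the kinds of
    # evidence described above: comments, interactions, and (where possible)
    # nonverbal reactions, tied to a participant and a realistic task.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ThinkAloudObservation:
        participant_id: str
        task: str                                             # task the participant attempted
        comments: List[str] = field(default_factory=list)     # spoken thoughts
        interactions: List[str] = field(default_factory=list) # clicks, hesitations, retries
        nonverbal: List[str] = field(default_factory=list)    # frowns, sighs, etc.

    obs = ThinkAloudObservation(
        participant_id="P01",
        task="Answer the drag-and-drop sample item",
        comments=["Not sure where the answer area is"],
        interactions=["Dragged the option to the wrong region twice"],
        nonverbal=["Frowned at the instructions"],
    )
    # Storing observations in a consistent structure makes it easier to find
    # commonalities and make comparisons across participants later.
    print(obs.participant_id, len(obs.interactions), "interaction notes")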

As reflected in Parshall and Harmes’ 6-step model, usability testing is part of an iterative process consisting of multiple rounds of drafting, testing, and revising. Beginning usability testing early in the process with prototypes is recommended, as it can help identify problems and allow the design to be revised before intensive programming has occurred.

Only a small number of participants are required for each individual round of usability testing. Nielsen has noted that approximately 85% of the usability problems in an application are identified with as few as 5 participants (see the sketch below). At that point, resources are best devoted to making the needed changes and then conducting a follow-up round. However, variation across the target population should also be considered. If, for example, subgroups within the population respond to the product differently, then the usability study would benefit from representation from the various subgroups.
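
For readers curious about the arithmetic behind Nielsen’s figure, the minimal sketch below works through the simple discovery model it is based on. The roughly 31% average per-participant problem-detection rate and the independence assumption are commonly cited values used here for illustration, not something from the original post.

    # Minimal sketch of the problem-discovery model behind Nielsen's figure.
    # Assumption: each participant independently reveals a given usability
    # problem with probability p (commonly cited as roughly 0.31 on average).
    def proportion_found(n_participants, p_detect=0.31):
        """Expected share of usability problems uncovered by n participants."""
        return 1 - (1 - p_detect) ** n_participants

    for n in (1, 3, 5, 10):
        print(f"{n:>2} participants: ~{proportion_found(n):.1%} of problems found")
    # With p = 0.31, five participants uncover about 84%, in line with the
    # roughly 85% figure; each additional participant adds progressively less.

Because additional participants add progressively less new information, resources are often better spent on revising the design and running a follow-up round than on recruiting a larger sample for a single round.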

Considerations when Evaluating Usability of New Item Types

Usability testing is important to the creation of new item types because it helps identify and reduce sources of measurement error. When applying usability testing to the design and development of item types, it is beneficial to consider both the user perspective of the item writer and that of the test taker. The item writer uses the item type templates and specifications to write new items within the item type. The test taker uses the items to provide information about what he or she knows regarding the given construct. Both user groups are considered throughout the iterative process, but item writer considerations tend to be emphasized earlier, during the design phase, while test taker considerations receive greater emphasis during development. Considering the six attributes of usability, potential questions that could be asked when evaluating the usability of a new item type include:

  1. Usefulness:
    • How likely is it that items based on this item type will be used on an examination form?
    • Can the items created based on the item type be incorporated into test forms based on the examination’s blueprint/specifications?
    • How likely is it that item writers will actually write items using the item type?
    • How likely are test takers to respond to the items on the form rather than skip them?
    • Will test takers be more likely to apply to take the examination if the item type is included?
  2. Efficiency:
    • What resources (e.g., cost, time) are required to develop items based on the item type?
    • Will item writers have difficulty writing items within the item type?
    • How difficult is it for the test taker to respond to items based on the item type?
    • How long does it take test takers to respond to the item?
  3. Effectiveness:
    • Are item writers able to develop items that accurately measure the intended construct using the given item type templates and/or specifications?
    • Are test takers able to respond to the item type in the way that is intended?
    • Are test takers making errors in using the item type (e.g., the item type requires selecting three responses but test takers provide more or fewer; the item type requires placing an “X” on the correct graphic but many test takers place an “X” next to it)? A small sketch of this kind of response check appears after this list.
  4. Learnability:
    • How much training is required before item writers are able to develop an item based on the item type?
    • Are item writers able to develop high-quality items based on the provided template/instructions or do they need additional training or feedback?
    • How much preparation or “training” does a test taker need in order to be able to respond to the items within the item type?
    • Do test takers need a tutorial and sample items to learn how to respond to the item type?
  5. Satisfaction:
    • What are the item writers’ feelings regarding the items they write within the item type? Are they generally satisfied with the items or do they often want to rewrite or delete the items?
    • What are test takers’ and the public’s perceptions of the items written based on the item type? Do they perceive the items as relevant to the overall purpose of the examination?
    • How do test takers feel about responding to the items? Do they enjoy the interactions, or do they feel frustrated and confused as they interact with the item type?
  6. Accessibility:
    • To what extent are test takers with a disability able to respond to the item type without accommodations?
    • Can appropriate accommodations be provided for those who require them (e.g., consider a visually impaired test taker who must respond to a graphic-based hotspot item by clicking on a particular area of the graphic)?
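
To make the Effectiveness questions above more concrete, here is a minimal sketch of the kind of response check that might flag interaction errors during a usability session. The function name, the exactly-three-selections rule, and the sample responses are illustrative assumptions, not features of any particular testing platform.

    # Hypothetical check for a multi-select item type that requires exactly
    # three selections; it flags the interaction errors described above
    # (too few or too many selections).
    REQUIRED_SELECTIONS = 3

    def selection_error(selected_options):
        """Return a short error label, or None if the response is usable."""
        count = len(set(selected_options))
        if count < REQUIRED_SELECTIONS:
            return "too few selections"
        if count > REQUIRED_SELECTIONS:
            return "too many selections"
        return None

    # Tallying such errors across participants helps show whether the item
    # type itself, rather than the content, is causing the mistakes.
    responses = [["A", "B", "C"], ["A", "B"], ["A", "B", "C", "D"]]
    for i, response in enumerate(responses, start=1):
        print(f"Participant {i}: {selection_error(response) or 'ok'}")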

 

Note that there are other questions to ask, such as whether the items measure something that is hard to measure with cheaper-to-produce item types. Such questions fall outside the scope of usability testing.

In Summary

Usability testing is an effective tool that can greatly enhance the quality of new item types.  When conducting usability testing, key points to consider are:

  • Start usability testing early with prototypes.
  • Think Aloud methodology is effective and can be done inexpensively.
  • Usability testing is not “one and done”; it needs to be repeated as the product develops.
  • Consider item writers as well as test takers.
  • Even small numbers of study participants can provide valuable information to guide development.

Parshall, C. G., & Harmes, J. C. (2008). The design of innovative item types: Targeting constructs, selecting innovations, and refining prototypes. CLEAR Exam Review, 19(2).

Rubin, J., & Chisnell, D. (2008). Handbook of Usability Testing (2nd ed.). Indianapolis, IN: Wiley.
