Professional Testing, Inc.
Providing High Quality Examination Programs

From the Item Bank

The Professional Testing Blog

 

When to Use a Performance Test or an Alternative Assessment Method, Part I

July 22, 2015

A renowned American professor of measurement and testing, Robert Ebel, published the following definition of a performance test:

“In a performance test the subject is required to demonstrate his or her skill by manipulating objects or instruments.”

This definition was first published in a measurement textbook in 1965. The definition seems solid, except that evolving technology may have changed the concept of manipulating objects or instruments since the time of publication.  Consider, for example, robotic surgery used to perform a kidney transplant from a location 7,000 miles away from the patient.  This type of ultra-high-speed data transfer and the related virtual environment begin to change the concept of manipulating objects or instruments.

Performance examinations have great “curb appeal” because examinees are instructed to perform job-related tasks, just like in real life.  What could be better than that…right? Some credentialing bodies have investigated the feasibility of performance testing but, in the final analysis, could not afford the associated costs. The return on investment (ROI) was not adequate.  In general, the costs will probably be at least three times the cost of paper-and-pencil or computer-based testing, and the price tag will more likely be many times the cost of a traditional multiple-choice test.  Most of the added costs are associated with efforts to standardize the test administration and with the examiner costs related to delivering valid and reliable scores to those seeking credentials.

Here are a few cost considerations for making a defensible performance examination.

  • Proctor training, travel, meals and lodging
  • Travel, meals, honorarium and lodging for examiners/scorers
  • Number of independent examiners involved for each examinee (i.e., typically the more the better)
  • Standardization or training for examiners
  • Intelligent scoring systems (e.g., for essay tests)
  • Facilities for test administration such as laboratories, machine shops, operating rooms, etc.
  • Measurement expertise and software to derive reliability and/or validity estimates
  • Scanning equipment (e.g., bubble readers) and custom-designed scan forms for the examiners/raters that provide data for the performance test
  • Customized scoring software that calculates and combines examiners’ scores, criteria and feedback to examinee
  • Logistical support for scheduling examiners and testing facilities
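
As a rough illustration of the scoring software item above, the sketch below combines independent examiner ratings into one result. It is only a minimal sketch under stated assumptions: the criterion names, the rating scale, the averaging rule, and the cut score are all hypothetical, and a real program would set them through its own standard-setting process.

```python
from statistics import mean


def combine_scores(ratings, cut_score):
    """Combine independent examiner ratings into one result.

    ratings: dict mapping a criterion name to the list of scores
             given by each examiner (hypothetical structure).
    cut_score: minimum total needed to pass (hypothetical value).
    """
    if not ratings or any(len(scores) == 0 for scores in ratings.values()):
        raise ValueError("every criterion needs at least one examiner score")

    # Average across examiners so no single rater decides a criterion.
    per_criterion = {name: mean(scores) for name, scores in ratings.items()}
    total = sum(per_criterion.values())
    return {
        "per_criterion": per_criterion,  # usable as feedback to the examinee
        "total": total,
        "passed": total >= cut_score,
    }


# Three examiners rate one product on three made-up criteria (1 to 5 scale).
example_ratings = {
    "criterion_a": [4, 5, 4],
    "criterion_b": [3, 3, 4],
    "criterion_c": [5, 4, 4],
}
result = combine_scores(example_ratings, cut_score=11.0)
print(result["passed"], round(result["total"], 2))
```

Averaging is only one possible combination rule; some programs may instead use medians or require agreement among a majority of examiners.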

When should a performance test be used?  The answer has to involve considerations of what the potential impact could be to patients, clients, and citizens who will receive the goods or services provided.  In some cases, the cost of a performance test is justified by the harm (e.g., personal injury) that could result from not using one.  Most people will agree that proficiency with using a firearm by law enforcement personnel cannot be evaluated with a multiple-choice test.  Computerized simulators of firearm ranges are becoming more sophisticated, but still do not replace a real firing range.  There are certain human skills that simply need to be measured in a situation that is as close to real life as feasible.

One rule of thumb might be that measurement of human performance skills should be seriously considered when the skills are used to produce physical products that go inside patients being served.  Indeed, professionals who produce products through the use of fine motor skills have historically been tested through performance exams.  The certified dental technician who prepares the crowns, implants and other restorations that are placed in patients by dentists is a good example.  Dental laboratories and dental technicians have a voluntary certification available where their skills are tested in a standardized manner in laboratory situations.  Similarly, dentists' skills in placing the restorations are also tested with performance examinations administered regionally across the United States.  For dentists, the performance measure is part of the licensure process.  In both situations, multiple examiners independently score each product produced and numerous products are scored.  Again, most people would agree these are examples of skills where the effort to conduct performance testing is worth the costs.

However, most of the certification and licensure boards have moved away from hands-on performance examinations.  In the past, performance tests were used by some state credentialing boards in the fields of pharmacy, massage, medicine, and cosmetology.  Most performance testing in these fields has moved to multiple-choice assessments, typically in computer-based testing (CBT) environments. In some cases the performance tests have evolved into different forms of computer simulations.

For some credentialing authorities, the evolution away from hands-on examination was hastened by pressure to defend the scores produced by performance examinations.  In some states, the statutes and administrative rules provided examinees with easy access to due process remedies if they felt a test score was in some way not reliable or valid.  The administrative remedy was as easy as filing for a hearing and serving as one’s own counsel.  Hearing venues were available in video conferencing facilities located in regional centers, so travel was not an issue.  For the credentialing authority, the burden of defending a failing score could be substantial, as attorneys, expert witnesses and court transcripts had to be provided.

For some professions, the task of defending the scores produced was to some degree facilitated if a video of the process (e.g., chiropractic manipulation) existed after the scores were distributed.  A hands-on process (e.g., a massage) is more difficult to document or maintain in storage than a physical object like a denture.  Without a tangible three-dimensional object in evidence to support court testimony, the defense of a failing examination score became more problematic.  Having tangible products remaining after an examination also provided credentialing authorities with the opportunity for examiner feedback and training. These feedback opportunities gave authorities the ability to achieve and maintain consistently high reliability in raters’ scores.  The products produced from one examination became the training materials for standardizing examiners on future examinations.
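
To make the idea of reliability in raters’ scores a little more concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa, computed for two examiners who each made a pass/fail call on the same set of products. The post does not say which statistic any credentialing authority actually used, so the choice of kappa, the function name, and the sample ratings are all assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty item set")

    n = len(rater_a)
    # Proportion of items on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's own rating frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up ratings of eight products by two examiners.
examiner_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
examiner_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]
kappa = cohens_kappa(examiner_1, examiner_2)
print(round(kappa, 3))
```

A value near 1 indicates strong agreement beyond chance, while a value near 0 suggests the examiners agree no more often than chance would predict, which would be a signal that more examiner standardization is needed.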

Please stay tuned for:
When to Use a Performance Test or an Alternative Assessment Method, Part II
