Professional Testing, Inc.
Providing High Quality Examination Programs

From the Item Bank

The Professional Testing Blog

 

When to Use a Performance Test or an Alternative Assessment Method, Part II

October 7, 2015

This is a continuation of the previous blog article, which discussed the use of performance examinations for certification and licensure.  To summarize: performance (i.e., “hands-on”) examinations are expensive to use and maintain, and many certification and licensure authorities have abandoned them in favor of alternative measures of a person’s skills. However, there are situations where the expense and effort are justified by the consequences to clients.  A performance examination may be worth the expense when failing to measure an examinee’s performance skills (fine motor) could result in pain or suffering for other people (e.g., patients).

What about situations where fine motor skills are not critical, but complex cognitive processes need to be assessed by another human?  There are still several alternatives to “hands-on” performance tests, including essays, short answers, portfolios, exhibitions, and structured oral examinations.  The largest group of rater/grader-scored standardized examinations in the United States, and perhaps the world, is the essay.  Essays are used to measure a variety of complex cognitive skills and are routinely used in multistate bar exams, national K-12 student achievement exams, national teacher credentialing exams, college entrance examinations (ACT/SAT), and certification examinations.

Most people have written an essay at some point during their education, and writing skills are commonly evaluated on criteria such as organization of thought, grammar, and syntax. This type of essay is an ideal instrument for measuring writing skills; however, essays can be used for much more.  The standardized essay is different from the essay format most people encountered while progressing through the K-12 grades and postsecondary levels.  It is a handy instrument for measuring the highest levels of cognitive function (evaluation) in any professional domain.  For example, it can be used to evaluate how well the principles of law are incorporated into a complex legal scenario, or to evaluate an examinee’s approach to a complex hazardous materials spill.  For the purposes of this blog, we will refer to this type of essay as a “standardized essay.”

Standardized essays can be useful in credentialing when a job/practice analysis has been performed and the content domain has been clearly defined.  To make a good measure, validated writing prompts, scoring rubrics, scoring anchors, examiner/grader training, and evaluation of rater reliability are all needed.  In other words, a significant amount of organized research and development must occur to make test scores as reliable and valid as possible. Producing a standardized essay is not inexpensive; however, this type of examination typically costs less than a “hands-on” performance examination.  The reason is that “hands-on” performance examinations frequently require simulators (e.g., flight simulators), laboratories, or complex computer equipment for test administration, whereas the standardized essay requires only a proctor, a table, a chair, and a comfortable room.
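
To make those development pieces concrete, here is a hypothetical sketch in Python of what a scoring rubric with anchors might look like in structured form.  The criteria, score scale, and anchor wording below are invented for illustration and are not drawn from any real examination.

# A hypothetical analytic scoring rubric for one standardized essay
# prompt. Each criterion is scored 1-4, and each anchor describes
# what a response at that score level looks like.
RUBRIC = {
    "application_of_principles": {
        4: "Correctly applies all relevant principles to the scenario.",
        3: "Applies most relevant principles, with minor omissions.",
        2: "Applies some principles; key elements are missing or misused.",
        1: "Does not apply the relevant principles.",
    },
    "organization_of_response": {
        4: "Clear, logical structure throughout.",
        3: "Mostly organized, with occasional lapses.",
        2: "Disorganized in places and hard to follow.",
        1: "No discernible structure.",
    },
}

def prompt_score(ratings):
    """Sum a rater's per-criterion ratings into a total prompt score."""
    return sum(ratings[criterion] for criterion in RUBRIC)

print(prompt_score({"application_of_principles": 3,
                    "organization_of_response": 4}))  # prints 7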

The development of the essay is an intuitive process because the format is familiar to most people, including most subject matter experts (SMEs).  The primary drawback of the essay approach is that a few writing prompts can sample only a limited portion of the examinee’s knowledge and abilities.  If an essay prompt happens to ask about (i.e., sample) something that an otherwise competent person does not know, the assessment might misclassify the examinee as failing when the examinee is actually competent.

A solution to the limited sampling of the examinee’s knowledge and abilities is simply to increase the number of essay prompts and limit the length of the examinee’s responses (e.g., to fewer than 100 words). It is possible to ask 50 short essay questions in a 2.5-hour period (an average of three minutes per question).  Classical measurement theory tells us that this increase in the number of items increases the reliability of the examination.  The content validity of the measure also increases because the test covers more of the examinee’s knowledge and abilities, although the depth of coverage will not match that of a three-page essay answer.  The late Lee J. Cronbach, Professor Emeritus at Stanford, called this the bandwidth-fidelity relationship.
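
As a worked example of that claim: in classical test theory, the Spearman-Brown prophecy formula projects how score reliability changes when a test is lengthened with comparable items.  The sketch below uses invented numbers purely for illustration.

def spearman_brown(reliability, length_factor):
    """Project the reliability of a test whose length is multiplied
    by length_factor, per the Spearman-Brown prophecy formula."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical numbers: a 5-prompt essay exam with score reliability
# 0.60, expanded to 50 short-answer prompts (a length factor of 10).
print(round(spearman_brown(0.60, 10), 2))  # prints 0.94

The formula assumes the added prompts measure the same construct about as well as the originals, which is exactly the tension between breadth and depth that Cronbach described.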

Writing assessments are not without costs: raters still have to be recruited, trained, and, in many cases, paid for their services. Scoring anchors (sample essays) and scoring rubrics (scoring guides) must be produced to standardize and guide the raters/scorers.  Rater performance (reliability) must still be psychometrically evaluated and reported.  Just like “hands-on” performance tests, a writing assessment can pose substantial challenges to achieving consistent scoring from one examinee to the next.
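
As one concrete example of a rater-reliability statistic, Cohen’s kappa corrects the raw agreement between two raters for the agreement expected by chance alone.  A minimal sketch, with invented scores on a hypothetical 1-4 rubric:

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical scores."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented ratings of the same ten essays by two trained raters:
rater_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
print(round(cohens_kappa(rater_a, rater_b), 2))  # prints 0.71

Values near 1 indicate strong agreement between raters; values near 0 indicate agreement no better than chance.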

Despite the drawbacks of essay-type assessments, there are good reasons this form of assessment is still administered in high volumes across the nation. For one, written essay assessments do not require complex computer technology to develop or administer in a standardized manner. Also, in most cases, scoring anchors and scoring rubrics can be developed in a reasonably cost-effective manner using subject matter experts. Furthermore, in some cases, non-SME raters can be trained to use the grading rubric, which in turn saves the expense of relying on SMEs alone.  The next blog will discuss some alternative item types that may avoid the costs associated with examiner-scored performance examinations.

Please stay tuned for: When to Use a Performance Test or an Alternative Assessment Method, Part III.
