Professional Testing, Inc.
Providing High Quality Examination Programs

From the Item Bank

The Professional Testing Blog

 

When to Use a Performance Test or an Alternative Assessment Method, Part II

October 7, 2015

This is a continuation of the previous blog article, which discussed the use of performance examinations for certification and licensure.  To summarize: performance (i.e., “hands-on”) examinations are expensive to use and maintain, and many certification and licensure authorities have abandoned them in favor of alternative measures of a person’s skills. However, there are situations where the expense and effort are justified by the consequences to clients.  A performance examination may be worth the expense when failing to measure an examinee’s performance skills (fine motor) could result in pain or suffering for other people (e.g., patients).

What about situations where fine motor skills are not critical, but complex cognitive processes need to be assessed by another human?  There are still several alternatives to “hands-on” performance tests, including essays, short answers, portfolios, exhibitions, and structured oral examinations.  The largest group of rater/grader-scored standardized examinations in the United States, and perhaps the world, is the essay.  Essays are used to measure a variety of complex cognitive skills and are routinely used in multistate bar exams, national K-12 student achievement exams, national teacher credentialing exams, college entrance examinations (ACT/SAT), and certification examinations.

Most people have written an essay at some point during their education, and writing skills are commonly evaluated on criteria such as organization of thought, grammar, and syntax. This type of essay is an ideal instrument for measuring writing skills; however, essays can be used for much more.  The standardized essay is different from the essay format most people encountered while progressing through the K-12 grades and postsecondary levels.  It is a handy instrument for measuring the highest levels of cognitive function (evaluation) in any professional domain.  For example, it can be used to evaluate how well the principles of law are incorporated into a complex legal scenario, or to evaluate an examinee’s approach to a complex hazardous materials spill.  For the purposes of this blog, we will refer to this type of essay as a “standardized essay.”

Standardized essays can be useful in credentialing when a job/practice analysis has been performed and the content domain has been clearly defined.  To make a good measure, validated writing prompts, scoring rubrics, scoring anchors, examiner/grader training, and evaluation of rater reliability are all needed.  In other words, a significant amount of organized research and development must occur to make test scores as reliable and valid as possible. Producing a standardized essay is not inexpensive; however, this type of examination typically costs less than a “hands-on” performance examination.  The reason is that “hands-on” performance examinations frequently require simulators (e.g., flight simulators), laboratories, or complex computer equipment for test administration, whereas the standardized essay requires only a proctor, a table, a chair, and a comfortable room.
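
To make those development pieces concrete, here is a hypothetical sketch in Python of what a scoring rubric with anchors might look like in structured form.  The criteria, score scale, and anchor wording below are invented for illustration and are not drawn from any real examination.

# A hypothetical analytic scoring rubric for one standardized essay
# prompt. Each criterion is scored 1-4, and each anchor describes
# what a response at that score level looks like.
RUBRIC = {
    "application_of_principles": {
        4: "Correctly applies all relevant principles to the scenario.",
        3: "Applies most relevant principles, with minor omissions.",
        2: "Applies some principles; key elements are missing or misused.",
        1: "Does not apply the relevant principles.",
    },
    "organization_of_response": {
        4: "Clear, logical structure throughout.",
        3: "Mostly organized, with occasional lapses.",
        2: "Disorganized in places and hard to follow.",
        1: "No discernible structure.",
    },
}

def prompt_score(ratings):
    """Sum a rater's per-criterion ratings into a total prompt score."""
    return sum(ratings[criterion] for criterion in RUBRIC)

print(prompt_score({"application_of_principles": 3,
                    "organization_of_response": 4}))  # prints 7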

The development of the essay is an intuitive process because the format is familiar to most people, including most subject matter experts (SMEs).  The primary drawback of the essay approach is that a few writing prompts can sample only a limited portion of the examinee’s knowledge and abilities.  If an essay prompt happens to ask about (i.e., sample) something that an otherwise competent person does not know, the assessment might misclassify the examinee as failing when the examinee is actually competent.

A solution to the limited sampling of the examinee’s knowledge and abilities is simply to increase the number of essay prompts and limit the length of the examinee’s responses (e.g., to fewer than 100 words). It is possible to ask 50 short essay questions in a 2.5-hour period (an average of three minutes per question).  Classical measurement theory tells us that this increase in the number of items increases the reliability of the examination.  The content validity of the measure also increases because the test covers more of the examinee’s knowledge and abilities, although the depth of coverage will not match that of a three-page essay answer.  The late Lee J. Cronbach, Professor Emeritus at Stanford, called this the bandwidth-fidelity relationship.
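
As a worked example of that claim: in classical test theory, the Spearman-Brown prophecy formula projects how score reliability changes when a test is lengthened with comparable items.  The sketch below uses invented numbers purely for illustration.

def spearman_brown(reliability, length_factor):
    """Project the reliability of a test whose length is multiplied
    by length_factor, per the Spearman-Brown prophecy formula."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical numbers: a 5-prompt essay exam with score reliability
# 0.60, expanded to 50 short-answer prompts (a length factor of 10).
print(round(spearman_brown(0.60, 10), 2))  # prints 0.94

The formula assumes the added prompts measure the same construct about as well as the originals, which is exactly the tension between breadth and depth that Cronbach described.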

Writing assessments are not without costs: raters still have to be recruited, trained, and, in many cases, paid for their services. Scoring anchors (sample essays) and scoring rubrics (scoring guides) must be produced to standardize and guide the raters/scorers.  Rater performance (reliability) must still be psychometrically evaluated and reported.  Just like “hands-on” performance tests, a writing assessment can pose substantial challenges to achieving consistent scoring from one examinee to the next.
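
As one concrete example of a rater-reliability statistic, Cohen’s kappa corrects the raw agreement between two raters for the agreement expected by chance alone.  A minimal sketch, with invented scores on a hypothetical 1-4 rubric:

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical scores."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented ratings of the same ten essays by two trained raters:
rater_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
print(round(cohens_kappa(rater_a, rater_b), 2))  # prints 0.71

Values near 1 indicate strong agreement between raters; values near 0 indicate agreement no better than chance.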

Despite the drawbacks of essay-type assessments, there are good reasons this form of assessment is still administered in high volumes across the nation. For one, written essay assessments do not require complex computer technology to develop or administer in a standardized manner. Also, in most cases, scoring anchors and scoring rubrics can be developed in a reasonably cost-effective manner using subject matter experts. Furthermore, in some cases, non-SME raters can be trained to use the grading rubric, which in turn saves the expense of relying on SMEs alone.  The next blog will discuss some alternative item types that may avoid the costs associated with examiner-scored performance examinations.

Please stay tuned for: When to Use a Performance Test or an Alternative Assessment Method, Part III.
