
Why Report Scaled Scores?

November 30, 2017

Raw Scores vs. Scaled Scores

When reporting examination scores, one of the big decisions that must be made is how to report them: as raw scores or as scaled scores. Most examinations are initially scored with raw scores. A raw score is simply the number of questions the candidate answered correctly, without any adjustment or transformation. Raw scores do not always present the full picture to candidates because they do not account for factors such as the difficulty of the questions or performance relative to other candidates.

Raw Scores and Passing Standards

The standard-setting process depends on input from subject matter experts, who review the difficulty of the questions in the item pool relative to the skills and abilities of the target audience and provide guidance on where the passing score should be set. As a result, the actual number of questions that a candidate must answer correctly to pass may vary from one form to another as the difficulty of the question set changes. In other words, if a candidate sees a more difficult set of questions, it is not fair to expect that candidate to answer the same percentage correctly as someone who sees an easier set. If only percentages are reported, candidates cannot compare their scores across time: a higher percentage on an easier set of items does not mean better performance than a lower percentage on a more difficult set. Without knowing the difficulty of each question, raw scores are impossible to interpret across different examination forms.

The Benefits of Scaled Scores

To address this issue, a candidate’s raw score is often transformed into, and reported as, a scaled score so that results can be compared and interpreted consistently. Candidates are held to the same passing standard regardless of which examination form they take, so scaled scores are reported instead of raw scores to allow a direct comparison of performance across examination forms and administrations. Reporting on a common scale ensures that the passing standard communicated to candidates remains the same.

Suppose that an examination has two forms, one more difficult than the other, and that equating has determined that a score of 66% on form 1 is equivalent to a score of 71% on form 2. Scores on both forms can be converted to a scale so that these two equivalent raw scores receive the same reported score. For example, both could be assigned a score of 350 on a scale of 100 to 500 (scales in this range are most common, but in principle any scale can be used). How points are distributed across the scale depends on where the passing score is set. In the example above, with 350 as the common passing score, the raw points below the passing score are distributed evenly between 100 and 350, while the raw points above it are distributed evenly between 350 and 500. It is important to remember that scaled scores are not percentages (i.e., 350 does NOT mean 35 percent!).
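As a rough illustration of how such a conversion might work, here is a minimal sketch in Python. It assumes the simple piecewise linear mapping described above, with the 100–500 scale and the passing score of 350 from the example; the function name scale_score and its parameters are hypothetical, not a standard from any particular testing program.

```python
def scale_score(raw, passing_raw, max_raw,
                scale_min=100, scale_pass=350, scale_max=500):
    """Convert a raw score to a scaled score with a piecewise linear map.

    Raw points below the form's passing raw score are spread evenly
    over [scale_min, scale_pass]; raw points above it are spread
    evenly over [scale_pass, scale_max]. Exactly meeting the passing
    raw score yields the reported passing score (350 here).
    """
    if raw <= passing_raw:
        # At or below the passing standard: map onto 100-350.
        return scale_min + (scale_pass - scale_min) * raw / passing_raw
    # Above the passing standard: map onto 350-500.
    return scale_pass + (scale_max - scale_pass) * (raw - passing_raw) / (max_raw - passing_raw)
```

The key property is that the passing raw score on any form maps to the same reported score, so the passing standard communicated to candidates never changes even when form difficulty (and therefore the passing raw score) does.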

Another benefit of scaled scores is that a candidate can actually determine how their performance has changed between attempts. Because the difficulty of the question set a candidate sees in one attempt may differ from the next, raw scores or percentages say little about whether performance has improved. A lower raw score or percentage on a more difficult set of questions might actually reflect improved performance if the first set of questions was much easier. Scaling lets a candidate see improvement (or the lack of it) by putting all attempts on the same scale, as the sketch below illustrates.
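Continuing the hypothetical sketch above, the two forms from the earlier example (100 items each, with equated passing raw scores of 66 and 71) show how the same raw score can mean different things on forms of different difficulty, while equivalent performance always maps to the same scaled score:

```python
# Equating set the passing raw score at 66 on the harder form 1
# and 71 on the easier form 2 (100 items each, per the example above).
print(scale_score(66, passing_raw=66, max_raw=100))  # 350.0 -- passing on form 1
print(scale_score(71, passing_raw=71, max_raw=100))  # 350.0 -- passing on form 2

# The same raw score of 68 is a pass on the harder form but not the easier one:
print(scale_score(68, passing_raw=66, max_raw=100))  # ~358.8 on form 1
print(scale_score(68, passing_raw=71, max_raw=100))  # ~339.4 on form 2
```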

The biggest drawback of reporting scaled scores is candidate confusion in interpreting them. In particular, candidates should not confuse scaling with weighting. Each point earned on an exam is worth one point, whether it comes from a dichotomously scored item (correct or incorrect) or a polytomously scored item (multiple points possible); those points are then scaled through a single mathematical conversion that allows a candidate’s attempts to be compared across time. Even though it may look as though points are being weighted in this process, they are not, and this potential for misinterpretation is one of the few downsides of using scaled scores. A good way to avoid the confusion is to include an explanation of the scaled scoring process with each candidate’s score report.
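To make the scaling-versus-weighting distinction concrete, one more hypothetical snippet using the same sketch: every point contributes one raw point regardless of item type, and the conversion is applied once to the total, never to individual items. The item counts and passing raw score below are invented for illustration.

```python
# Points from dichotomous (0/1) and polytomous (multi-point) items
# count identically toward the raw total -- no item is weighted.
dichotomous_points = [1, 0, 1, 1]  # four right/wrong items
polytomous_points = [2, 3]         # two items worth up to 3 points each
raw_total = sum(dichotomous_points) + sum(polytomous_points)  # 8 raw points

# The scaling conversion is applied once, to the total raw score
# (maximum possible raw score here is 4 + 6 = 10 points).
print(scale_score(raw_total, passing_raw=6, max_raw=10))  # 425.0
```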

For high stakes testing, using scaled scores in reporting is an industry standard and best practice. It is a vital component in providing candidates with meaningful information that can assist them in interpreting their results and possibly improving their performance on subsequent attempts.

For more information on score reporting, please see Eight Tips for Reporting Failing Test Scores on Licensing and Certification Tests and Revision of the Standards: An Advisory Note on Sub-Score Reporting.
