Making sense of scores generated by assessment

Assessment data can provide lots of useful information about your pupils. But with so many technical terms (standardised scores, normal distribution, confidence intervals and so on), it can be tricky to navigate. So what do they all mean?

What are raw scores?

You may well remember getting ‘7 out of 10’ for a primary school spelling test, or ‘63%’ for a secondary maths test. Such scores, known as raw scores, are readily understandable and show what proportion of the total marks a person has gained in a test. However, raw scores do not account for factors such as the difficulty of a test or performance relative to other test takers. This makes them less useful for comparing pupils’ performance meaningfully between one test and another, and for monitoring progress over time. Where a comparison is to be made with a cohort as a whole, it is often more useful to use a standardised score.

What are standardised scores?

Standardised scores enable test-takers to be compared with a large, nationally representative sample that has taken the test prior to publication.

Because standardised scores are converted onto a common scale, they enable meaningful comparisons between scores from different standardised tests. Standardised scores from most educational tests cover the same range, from 70 to 140, with the average standardised score usually set at 100, irrespective of the difficulty of the test.

Standardised scores are an example of norm-referenced assessment. This is an assessment where pupils are assessed in comparison with the performance of other pupils. ‘Norm’ groups (i.e. the samples with which pupils are compared) are usually ones in which the scores have a normal distribution, that is, they follow a ‘normal’ or ‘bell-shaped’ curve.
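To make the idea of a common scale concrete, here is a minimal sketch of how a raw score could be rescaled against a reference sample. It is an illustration only: the function name `standardise`, the sample of raw scores and the standard deviation of 15 are all assumptions rather than details of any published test, and a published test's own conversion tables should always take precedence over a simple formula like this one.

```python
from statistics import mean, pstdev

def standardise(raw_score, sample_scores, target_mean=100, target_sd=15,
                floor=70, ceiling=140):
    """Rescale a raw score linearly so the sample mean maps to target_mean.

    target_sd=15 is an assumed value; floor and ceiling reflect the
    70 to 140 range described in the text.
    """
    sample_mean = mean(sample_scores)
    sample_sd = pstdev(sample_scores)
    # How many standard deviations the pupil is from the sample mean
    z = (raw_score - sample_mean) / sample_sd
    score = round(target_mean + target_sd * z)
    # Keep the result inside the reported range
    return max(floor, min(ceiling, score))

# Hypothetical raw scores from a reference sample (invented for illustration)
sample = [12, 15, 18, 20, 22, 25, 28]

print(standardise(20, sample))  # 100: a raw score equal to the sample mean
```

Whatever the difficulty of the test, a pupil whose raw score matches the sample average lands on 100, which is what allows scores from different tests to be compared.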

When we describe standardised scores, we often use the terms ‘average’, ‘above average’ or ‘below average’. To make slightly finer distinctions, the following terms can be used:

| Pupils scoring | Their performance is |
| --- | --- |
| Below 70 | Much below average |
| 70 – 84 | Below average |
| 85 – 94 | Low average |
| 95 – 105 | Average |
| 106 – 115 | High average |
| 116 – 130 | Above average |
| 131+ | Much above average |


All pupils scoring between 85 and 115 are demonstrating broadly average performance. If the distribution is normal, this is expected to be about two-thirds of the population.
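The bands and the two-thirds figure can be checked with a short sketch. The band boundaries come from the table above; the function name `describe` and the standard deviation of 15 are assumptions made for illustration (the text gives the average of 100 but does not state the spread).

```python
from statistics import NormalDist

# (exclusive upper limit, description), taken from the table above
BANDS = [
    (70, "Much below average"),
    (85, "Below average"),
    (95, "Low average"),
    (106, "Average"),
    (116, "High average"),
    (131, "Above average"),
]

def describe(score):
    """Return the descriptive band for a whole-number standardised score."""
    for upper_limit, label in BANDS:
        if score < upper_limit:
            return label
    return "Much above average"

print(describe(108))  # High average

# Share of a normal distribution between 85 and 115,
# assuming a mean of 100 and a standard deviation of 15
scores = NormalDist(mu=100, sigma=15)
share = scores.cdf(115) - scores.cdf(85)
print(round(share, 2))  # 0.68
```

Under these assumptions, the 85 to 115 range is one standard deviation either side of the mean, which is where the figure of about two-thirds comes from.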

With standardised scores it is possible to compare a pupil’s score with the national average, a pupil’s performance in different subjects (e.g. whether they are performing similarly in maths and English), a pupil’s performance from one test to another, and a pupil’s performance relative to other pupils in, say, the same class.

However, as no single assessment can cover all aspects of a curriculum, even a pupil’s standardised score may not be their ‘true’ score. To address this, rigorously developed tests, such as those published by NFER, provide confidence intervals or bands. These tell us the range in which a pupil’s ‘true’ score is likely to fall a given proportion of the time (say, 90 per cent). To give one real-life example, on one year 4 reading test, a standardised score of 108 has a 90 per cent confidence interval of -5/+4, meaning there is a nine-in-ten chance that the ‘true’ score is between 103 (i.e. 108-5) and 112 (i.e. 108+4).
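The arithmetic in the reading-test example can be written out as a small sketch. The function name `confidence_band` is hypothetical, and the margins have to be looked up in the test's own documentation, because they differ between tests and between scores.

```python
def confidence_band(score, minus, plus):
    """Return the (lower, upper) limits of the band around a standardised score."""
    return score - minus, score + plus

# Values from the year 4 reading test example: 108 with an interval of -5/+4
lower, upper = confidence_band(108, minus=5, plus=4)
print(f"90% confidence band: {lower} to {upper}")  # 90% confidence band: 103 to 112
```

Note that the band need not be symmetrical: here it extends further below the score than above it.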

What are scaled scores?

Scaled scores show whether or not a pupil has met an expected standard. In England’s National Curriculum Tests, a scaled score of 100 represents the threshold of the expected standard, not the national average on the test. This differs from a standardised test, where 100 represents the average score in the standardisation sample. If a pupil achieves a scaled score of 100 or above, they have met the expected standard. If they score lower than 100, they have not met the expected standard and are still working towards it.
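As a minimal sketch, interpreting a scaled score comes down to a single comparison against the threshold. The function name and the example scores below are hypothetical.

```python
EXPECTED_STANDARD = 100  # threshold of the expected standard on the scaled-score scale

def met_expected_standard(scaled_score):
    """True if the scaled score is at or above the expected standard."""
    return scaled_score >= EXPECTED_STANDARD

# Hypothetical scaled scores for three pupils
for score in (98, 100, 104):
    outcome = "met" if met_expected_standard(score) else "not yet met"
    print(score, outcome)
```

The comparison says nothing about where a pupil sits relative to other pupils, which is the key difference from a standardised score.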

Scaled scores are an example of criterion-referenced assessment. This is an assessment where pupils are assessed against a criterion or set of criteria, rather than in comparison with the performance of other pupils. The criteria represent a level of expertise or mastery of skills or knowledge.

What are age-related expectations and age-standardised scores?

Age-related expectations identify what is expected of a pupil by a specified age or year group. In the national curriculum, there is a set standard of expectation which is defined by threshold descriptors indicating what a pupil should be able to do by the end of key stage 2.

These age-related expectations are examples of criteria that pupils are expected to meet and should not be confused with age-standardised scores, which come from norm-referenced assessments.

In ability tests taken in the primary and early secondary years, older pupils almost invariably achieve slightly higher raw scores, on average, than younger pupils. Age-standardised scores take the ages of the pupils into account by comparing a pupil’s score with those of others of the same age (in years and months) in the nationally representative sample. Thus a younger pupil may gain a lower raw score than an older pupil, but have a higher standardised score. This is because the younger pupil is being compared with other younger pupils in the reference group and has performed better relative to their own age group.
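The effect of age standardisation can be illustrated with a sketch in which each age group has its own reference values. Everything here is hypothetical: the norms table, the linear rescaling, the standard deviation of 15 and the function name `age_standardise` are assumptions for illustration, not the method used by any particular test.

```python
# Hypothetical norms: (years, months) -> (mean raw score, SD of raw scores)
# for pupils of that age in the reference sample
HYPOTHETICAL_NORMS = {
    (8, 2): (22.0, 6.0),
    (8, 11): (28.0, 6.0),
}

def age_standardise(raw_score, age, target_mean=100, target_sd=15):
    """Rescale a raw score against pupils of the same age (assumed linear method)."""
    mean_raw, sd_raw = HYPOTHETICAL_NORMS[age]
    return round(target_mean + target_sd * (raw_score - mean_raw) / sd_raw)

younger = age_standardise(24, (8, 2))   # lower raw score, younger pupil
older = age_standardise(26, (8, 11))    # higher raw score, older pupil
print(younger, older)  # 105 95
```

In this invented example the younger pupil has the lower raw score (24 against 26) but the higher age-standardised score, because 24 is above the average for their age group while 26 is below the average for the older group.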

| | Standardised score | Age-standardised score | Scaled score | Age-related expectations |
| --- | --- | --- | --- | --- |
| Description | This is a score that is converted onto a common scale so that the achievement of pupils can be compared directly. They are useful for comparing attainment of pupils who took different versions of a test or for monitoring relative progress. | These are scores that allow for differences in the age of pupils taking the same test. These are derived in such a way that the ages of the pupils are taken into account by comparing a pupil’s standardised score with others of the same age (in years and months) in the nationally representative sample. | The score shows whether a pupil has met an expected standard or not. | These describe what is expected of a pupil by a specified age or year group. For example, for the national curriculum, these are the threshold descriptors indicating what a pupil should be able to do by the end of key stage 2. |
| Norm- or criterion-referenced? | Norm-referenced | Norm-referenced | Criterion-referenced | Criterion-referenced |
| Use of 100 | The average standardised score is usually set at 100, irrespective of the difficulty of the test. Standardised scores for most educational tests cover the same range, from 70 to 140. | | The score of 100 represents the threshold of the expected standard. If a pupil scores 100 or above, they have achieved the expected standard. If they score lower than 100, they have not met the expected standard and are still working towards it. | The threshold for the expected standard can be mapped to standardised OR scaled scores. In NFER tests, for example, the expected standard on the summer term tests is mapped to standardised scores. |