The difference between the scales used by Assembly and RS Assessment from Hodder Education


Assembly and RS Assessment from Hodder Education work closely together on the scales used to interpret the results of PUMA and PiRA standardised assessments. Even so, there are some important differences between the results shown in MARK (the RS Assessment system) and Assembly Analytics (Assembly's MAT analytics tool). This article will help you understand those differences.

The RS Assessment Approach

RS Assessment use the raw scores achieved in the PUMA and PiRA assessments to derive results on a range of scales. Here are brief summaries of a few of the most important ones:

  • The Standardised Score is based on a typical normal distribution with a standard deviation of 15 and an average of 100. This means in practice that most results are between 70 and 130 (two standard deviations either side of 100). The raw score is converted into a standardised score based on a conversion table that is the same for all students within a year group.
  • The Age Standardised Score is similar to the Standardised Score, but with the important difference that the score allocated to a student is adjusted based on their age within a year group.
  • The percentile rank places a student's result relative to a representative national sample of students from the same year group. So if a student is at the 70th percentile, their result was better than 70% of that sample. There is a fixed relationship between standardised scores and percentile ranks (e.g. the 50th percentile corresponds to a standardised score of 100).
  • The performance indicators give a sense of whether a child is meeting age-related expectations based on their standardised score. Students are shown as "working towards", "working at" or "working at greater depth" based on defined thresholds (see the Performance Indicators page on the RS Assessment website for more information). RS Assessment set these thresholds slightly above the expected standard of the tests, to ensure that they remain secure estimates of a student's performance.
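Because standardised scores follow a normal distribution with a mean of 100 and a standard deviation of 15, the fixed relationship between a standardised score and its percentile rank can be sketched with the normal cumulative distribution function. This is an illustration only; RS Assessment's published conversion tables, not this formula, are the authoritative mapping:

```python
import math

def standardised_to_percentile(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Approximate percentile rank for a standardised score, assuming a
    normal distribution with mean 100 and standard deviation 15."""
    # Normal CDF expressed via the error function, scaled to 0-100.
    return 100.0 * 0.5 * (1.0 + math.erf((score - mean) / (sd * math.sqrt(2.0))))

# A score of 100 sits at the 50th percentile; 130 (two standard
# deviations above the mean) sits at roughly the 98th percentile.
```

This also shows why "most results are between 70 and 130": scores within two standard deviations of the mean cover roughly 95% of the sample.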

The Assembly Approach

Assembly, on the other hand, mainly shows aggregated data derived from PUMA and PiRA scores in Assembly Analytics, and the scales used are subtly different to (but derived from) the ones used by RS Assessment. Assembly's scales are as follows:

  • The Key Stage 2 scaled score equivalent, which takes the standardised score / percentile rank from PiRA/PUMA and converts it into the equivalent score on the latest release of the DfE's Key Stage 2 SATs scale, which runs from 80 to 120. This is not a normal distribution, so 100 is NOT the average; instead, it is what the government considers to be the "Expected Standard", and 110 is what the DfE considers a "High score". This allows us to give you an average score across multiple year groups and classes in a language that primary schools often use for their broader accountability purposes.
  • Assembly's performance indicators consider all students who have a KS2 scaled score equivalent of 100+ as "Expected Standard", and students of 110+ as "High Score". This is less cautious than the RS Assessment calculation, since there is no inbuilt adjustment to ensure these are secure judgements. This is an important difference: because Assembly focuses on aggregated data, an inbuilt buffer would skew average performance down.
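The thresholds above amount to a simple classification on the KS2 scaled score equivalent. A minimal sketch follows; the function name and the label for scores below 100 (which the article does not name) are assumptions:

```python
def assembly_performance_indicator(ks2_scaled_equivalent: float) -> str:
    """Classify a Key Stage 2 scaled score equivalent using Assembly's
    thresholds: 100+ is "Expected Standard", 110+ is "High Score".
    The label for scores below 100 is an assumed placeholder."""
    if ks2_scaled_equivalent >= 110:
        return "High Score"
    if ks2_scaled_equivalent >= 100:
        return "Expected Standard"
    return "Below Expected Standard"
```

Note there is no buffer here: a score of exactly 100 counts as Expected Standard, whereas the RS Assessment indicator would require a slightly higher level.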

In practice, therefore, the Assembly performance indicator for Expected Standard is always set at the level that matches our calculation of the threshold for a Key Stage 2 Scaled Score Equivalent of 100+. The equivalent RS Assessment performance indicator equates to a slightly more challenging level, which would translate to an Assembly Key Stage 2 Scaled Score equivalent of 101+ in both Reading and Mathematics (based on the most recent DfE data). Similarly, while Assembly calculates High Score as a Key Stage 2 Scaled Score equivalent of 110+, the RS Assessment threshold is again set at a slightly more challenging level.

Which one you use to interpret progress and understanding will depend on the purpose of your judgements. To assess whether a student is secure in their knowledge of a particular topic, the PiRA/PUMA performance indicator is the more valuable. However, to judge the aggregate performance of a larger cohort, the Assembly measure is likely to be more accurate on average.