How much difference does it make? Notes on understanding, using, and calculating effect sizes for schools

Publication Details

A good way of presenting differences between groups or changes over time in test scores or other measures is by ‘effect sizes’, which allow us to compare things happening in different classes, schools or subjects regardless of how they are measured. This booklet is designed to help school staff to understand and use effect sizes, and includes handy tips and warnings as well as useful tables to calculate effect size values from change scores on standardised tests.

Author(s): Ian Schagen, Research Division, Ministry of Education and Edith Hodgen, New Zealand Council for Educational Research.

Date Published: March 2009

Section 7: Cautions, caveats, and Heffalump traps for the unwary

Effect sizes are a handy way of looking at data, but they are not a magic bullet, and should always lead to more questions and discussion. There may be circumstances which help to explain apparent differences in effect sizes – for example, one group of students might have had more teaching time, or a more intensive programme, than another. Looking for such explanations is one of the main responses that an effect size should prompt.

One thing to watch out for is "regression to the mean". This is particularly a problem when specific groups of individuals, such as those with very low (or very high) attainment, are targeted for an intervention. If we take any group of individuals (class, school, nation), test them, select the lowest attaining 10 percent, perform any kind of intervention we like, and then retest, we will normally find that the bottom 10 percent have improved relative to the rest. This is because there is a lot of random variation in individual performance, and the bottom 10 percent on one occasion will not all be the bottom 10 percent on another occasion.

This is a serious problem with any evaluation which focuses on under- (or over-) performing students, however defined. It is essential that progress for such students be compared with equivalent students not receiving the intervention, and not with the progress of the whole population, or else misleading findings are extremely likely.
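
The effect described above can be seen in a small simulation. The sketch below is illustrative only: the function name, the number of students, and the spread of "true" attainment and test-day noise are all assumptions made for the example, not figures from this booklet. No intervention of any kind is applied between the two tests.

```python
import random
import statistics


def simulate_regression_to_mean(n_students=2000, true_sd=10.0,
                                noise_sd=6.0, seed=1):
    """Simulate two test sittings with NO intervention in between.

    All parameter values are hypothetical, chosen only for illustration.
    Returns (mean gain of bottom 10 percent, mean gain of all students).
    """
    rng = random.Random(seed)
    # Each student has a stable underlying attainment level...
    ability = [rng.gauss(50.0, true_sd) for _ in range(n_students)]
    # ...and each test score adds independent random variation to it.
    test1 = [a + rng.gauss(0.0, noise_sd) for a in ability]
    test2 = [a + rng.gauss(0.0, noise_sd) for a in ability]

    # Select the lowest attaining 10 percent on the first test.
    cutoff = sorted(test1)[n_students // 10]
    bottom = [i for i in range(n_students) if test1[i] < cutoff]

    gain_bottom = statistics.mean(test2[i] - test1[i] for i in bottom)
    gain_all = statistics.mean(t2 - t1 for t1, t2 in zip(test1, test2))
    return gain_bottom, gain_all


gain_bottom, gain_all = simulate_regression_to_mean()
print(f"Mean gain, bottom 10% on first test: {gain_bottom:.2f}")
print(f"Mean gain, all students:             {gain_all:.2f}")
```

Under these assumptions the selected low scorers show a clear average gain while the whole group shows almost none, which is why the fair comparison is with equivalent low-scoring students who did not receive the intervention.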

Whenever you calculate an effect size, make sure you also estimate a standard error and confidence interval. Then you will be aware of the uncertainty around the estimates and not be tempted to over-interpret the results.
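
As a rough sketch of what this involves, the code below computes a standardised mean difference between two groups together with an approximate standard error and 95 percent confidence interval. It assumes a pooled standard deviation and a widely used large-sample approximation for the standard error; the formulas given elsewhere in this booklet may differ in detail. The function name and the scores are made up for the example.

```python
import math
import statistics


def effect_size_with_ci(group_a, group_b, z=1.96):
    """Standardised mean difference with an approximate confidence interval.

    Assumes a pooled standard deviation and the common large-sample
    approximation for the standard error; z=1.96 gives roughly 95 percent.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pooled variance weights each group's sample variance by its
    # degrees of freedom.
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    d = mean_diff / math.sqrt(pooled_var)
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d, se, (d - z * se, d + z * se)


# Hypothetical scores for two small groups of eight students each.
programme = [52, 55, 61, 48, 57, 63, 50, 58]
comparison = [49, 51, 56, 45, 53, 58, 47, 52]

d, se, (ci_low, ci_high) = effect_size_with_ci(programme, comparison)
print(f"Effect size: {d:.2f}")
print(f"Standard error: {se:.2f}")
print(f"Approximate 95% interval: {ci_low:.2f} to {ci_high:.2f}")
```

With groups this small the interval comes out wide, so an effect size that looks large on its own may still be compatible with no real difference.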

Effect sizes are not absolute truth, and need to be assessed critically and with a full application of teacher professional judgement. However, if you believe some teaching initiative or programme is making a difference, then it should be possible to measure that difference. Effect sizes may be one way of quantifying the real differences experienced by your students.

Judging effect sizes
  • The difference is "real" if the confidence interval does not include zero.
  • The importance of the difference depends on the context.
  • Groups consisting only of students with the highest or lowest test scores will almost always show regression to the mean: low scorers will show an increase and high scorers a decrease, regardless of any intervention that has taken place.
