Education that fits: Review of international trends in the education of students with special educational needs

Publication Details

The purpose of this review is to outline international trends in the education of students with special educational needs, with the aim of informing the Ministry of Education’s current review of special education.

Author(s): David Mitchell PhD, College of Education, University of Canterbury, for the Ministry of Education.

Date Published: July 2010

Chapter 9: Assessment

In Chapter Eight, we saw how the trend in western countries was for SWSEN to participate and progress in the general curriculum, albeit with appropriate modifications and adaptations [1]. In this chapter, parallel issues will be explored with respect to assessment, namely the extent to which SWSEN are expected to participate in a country’s national or state assessment regimes and what, if any, alternate assessment procedures are permitted. Both trends are part of the wider concern for standards-based reform in education that is dominating much of the educational and political discourse around the world [2]. The vast bulk of the literature on modified and alternate assessment has emanated from the US, and this section of the review reflects that.

9.1 Policies Requiring Access to General Education Accountability Systems

United States. Until recently, in the US, accountability in special education was defined in terms of progress in meeting IEP goals. That all changed in IDEA 97, which required all students, including those with disabilities, to participate in their states’ accountability systems. This was followed by a policy memorandum from the U.S. Department of Education (2000), to the effect that an exemption from a state’s assessment programmes was no longer an option for students with disabilities. Both IDEA 97 and the No Child Left Behind Act (NCLBA) of 2002 required the provision of alternate assessment for students who could not participate in state or district assessments with or without accommodations. Districts are permitted to measure up to 3% of their students using alternate assessments (1% against alternate achievement standards and 2% against modified standards – a distinction that will be described in more detail below). The use of alternate assessment is a decision to be made by a student’s IEP team. To quote IDEIA, IEPs must include ‘a statement of any appropriate accommodations that are necessary to measure the academic achievement and functional performance of the child on state- and district-wide assessments’ (IDEIA, 2004, p.118). As well, the NCLBA stipulated that student performance be disaggregated by special education status, among others, and, to avoid sanctions, by 2013/2014 schools must show that students in various subgroups are making adequate yearly progress toward mastering content standards.

At this juncture, it is worth quoting at length a personal communication from David Egnor, Assistant Division Director, National Initiatives, Research to Practice Division, Office of Special Education Programs, US Department of Education:

… one of the main pushes in the U.S. particularly among special education administrators, but also teachers, is to develop standards-based IEPs. I believe that standards-based IEPs are becoming much more attractive from an administrative point of view as a direct result of our country's increasing focus on standards-based educational reform … and which will ratchet up even further under the Obama administration. That is, requiring standards-based IEPs for every student with a disability (not currently required for all students with disabilities, although things are moving that way) provides a way, from an administrative perspective, to more efficiently administer and monitor special education service delivery and to do so within a standards-based accountability environment, where, in the past, special education practice historically focused more on individualized services and outcomes for students with disabilities. My view is that the growth of standards-based IEPs in the U.S. is a clear sign that special education practice is undergoing fairly significant changes that are directly tied to standards-based reform under the ESEA/NCLB and the next iteration of our main federal education law currently under consideration in the US Congress. I think that what we are seeing with regard to standards-based IEPs is an outgrowth of the special education inclusion movement, where as a field special education attempts to make the general education environment more accessible to students with disabilities.  Given the focus on standards-based educational reform, it is not surprising that special education administrators, in particular, seek a way to join with the standards-based movement through the IEP development process and, as a result, students' IEPs are emphasizing general education standards more and more. 
Although a standards-based IEP should not limit the services a student receives (just standardize, to some extent, the educational outcomes we expect), I think that this movement may be unintentionally limiting services for some students with disabilities. I also think that more work needs to be done to explicate how individualization (equity) for students with disabilities can co-exist within the growing context of standards-based reform (excellence).

According to Defur (2002), the thinking behind the earlier requirements was two-fold. Firstly, it was assumed that higher expectations would lead to higher achievement for students with disabilities. Previously, the educational progress of such students had been limited by low expectations, which in turn narrowed their access to the general curriculum and to higher achievement. The second assumption was that assessment information on students with disabilities would lead to improved instructional programmes, which in turn would lead to improved student outcomes. It would seem that this rationale still applies.

England and Wales. In England, tasks and tests set for assessment at the end of Key Stages 2 and 3 (for students aged 11 and 14, respectively) are designed to monitor attainment targets for each of the National Curriculum subjects, and are expected to be accessible to the vast majority of students, including those with special educational needs.  However, those children in Key Stage 2 working at level 1 or below of the National Curriculum eight-level scale are assessed by teacher assessment alone. Similarly, at Key Stage 3, students working at or below level 2 of the National Curriculum scale are assessed by teacher assessment and not by statutory national testing. If a student's statement of special educational needs modifies the statutory assessment arrangements, the provisions within the statement should be followed in respect of the statutory tests and tasks. With regard to the GCSEs and GCE A levels, although the same examinations are available for SWSEN as for other students, special arrangements in examinations may be made for some of them. The nature of these arrangements is determined according to the assessment needs of the individual student, but must not give him or her an unfair advantage over other students. Some may be awarded extra time to complete the assessment task, or may be permitted to take supervised breaks or rest periods during the examination. For visually impaired students, the visual presentation of the papers may be changed by, for example, the use of large print or simplified layout of the examination paper, or by the use of braille versions of the papers. Other candidates may have questions read to them; flashcards may be used to assist hearing-impaired candidates in mental arithmetic tests; or typewritten, word processed or transcribed responses may be accepted from students who are unable to write. 
Some candidates may also be allowed to take their examinations at a venue other than the examination centre, for example, at home or in hospital.

In England, the ‘P Scales’ referred to in Chapter Eight can also be used to assess students with special educational needs for accountability and school improvement purposes before they become eligible for assessment on national instruments. These P Scales have eight levels against which students’ progress can be mapped. However, Riddell et al. (2006), while recognising that P Scales are helpful for curriculum planning, noted that ‘whether they will be useful in terms of tracking and comparing the progress of pupils with special educational needs has yet to be fully assessed’ (p.5).

Scotland. According to Riddell et al. (2006), in Scotland there are ‘ongoing difficulties in devising a national system of assessment which is able to recognise the progress of all pupils’ (p.5). The Standard Grade system, they pointed out, is regarded as too difficult for some students with special educational needs, particularly those with significant difficulties in numeracy and literacy.

9.2 Adaptations, Modifications and Alternate Assessment

Geenen & Ysseldyke (1997) identified six types of accountability systems relating to the extent to which students with disabilities are included in assessment regimes:

Total inclusion. This type establishes a single set of standards, with one assessment programme for all students, including those with disabilities. At the time of writing [1997], two US states had developed portfolio-assessment programmes that covered all students.

Partial inclusion. Here there is one set of standards for all students, with alternate or modified standards for students with disabilities. Many states were adopting this arrangement.

Dual systems. This type involves two sets of standards: one for students without disabilities and another one for students with disabilities, the latter usually focussed on ‘functional’ objectives.

Multiple systems. Here there is one set of standards for students without disabilities and multiple sets of standards for those with disabilities, usually based on their disability category.

Total exclusion. In this type, students with disabilities are excluded from standard-setting efforts, state-wide assessments, and data-based reporting procedures. Usually, the IEP is seen as sufficient for accountability purposes, despite the difficulty in aggregating their outcomes.

System-based. This sets standards on a system rather than an individual basis. Here, students with disabilities ‘count’ in the overall statistics.

Research relating to one or more of the models as outlined by Geenen & Ysseldyke (1997) has been reported in the literature.

For example, in a paper by Defur (2002), the Virginia state assessment programme was outlined. This state employed the total inclusion model, albeit with accommodations/modifications/exemptions in parts of the tests for students with disabilities (the author pointed out that after her study, Virginia eliminated the use of total exemptions). It is interesting to note that 98 special education administrators in the state identified some intended and unintended consequences of this assessment policy. Among the intended consequences were (a) ‘some degree of benefit for students with disabilities’ - reported by 83% of the respondents, (b) ‘access to the general curriculum’ (73%), and (c) ‘improved daily performance by students with disabilities’ (but only 21% noted this) (p.206). There were also unintended, negative consequences of the policy. These included (a) higher rates of academic failure (reported by 51% of the administrators), (b) lower self-esteem among students with disabilities (50%), and (c) concerns that these students would experience higher drop-out rates (44%). As well, some were of the opinion that standards should be lowered (33%) and that accommodation options should be increased (37%). And, finally, 55% of the respondents expressed the belief that special education teachers were not adequately trained to assist students with disabilities to meet Virginia’s assessment standards.

In full inclusion assessment models, with no exemptions or accommodations permitted, there is a risk that ‘the accountability procedures may have the incidental effect of discouraging schools from taking on children who are likely to perform poorly in examinations, of encouraging schools to expel children whom they find difficult to teach, or of tempting schools to omit children with learning difficulties from testing programmes’ (OECD, 1999). As evidence of this danger, OECD cited a 1997 study by Thurlow, which found that two-thirds of students with disabilities in US schools had been excluded from a National Assessment of Educational Progress. Thus, ‘high stakes’ assessments and their associated ‘league tables’ can have the effect of jeopardising inclusive education (Dyson, 2005; Slee, 2005; McLaughlin & Jordan, 2005). As Watkins & D’Alessio (2009) pointed out, this risk can be exacerbated by the effects of international comparative studies of educational standards, most notably OECD’s PISA studies.

A second study, involving the partial inclusion model, was reported by Browder et al. (2004). Subject specialists and experts in severe disabilities from 31 US states were surveyed and interviewed regarding their views on the extent to which alternate assessment content was aligned with academic and functional curricula in maths and the language arts. The findings were quite mixed, with some states rated as having a high degree of alignment and some having missed the mark. The authors also noted that their results suggested that the alternate assessments included in their study had a strong focus on academic skills, but also reflected an approach that linked academic and functional skills, one which they referred to as ‘a blended curriculum approach’ (p.221). Browder et al. concluded with the recommendation that states should include both content area specialists and experts in severe disabilities in validating performance indicators used in alternate assessment. In another paper by the same authors (Browder et al., 2003), some lessons to be drawn from their research are outlined. These included the need to develop research into (a) ways of teaching students with severe disabilities the more advanced academic skills that were being expected under the US legislation, (b) the impact of alternate assessment in general, and (c) the optimal way of blending functional and academic curricular priorities, and hence assessment approaches. And, finally, they argued that ‘We also need to avoid a transformative approach in which academics become the replacement curriculum’ (p.179).

In a similar vein, Ford et al. (2001) posed some pertinent, albeit rhetorical, questions. Firstly, when a state develops separate standards for students with disabilities, is it suggesting there is no overlap between the 98% of the students included in the regular assessment and the 2% who are not? Secondly, when states elect to use identical standards for those participating in alternate assessment, ‘does this mean that all students should be held to the same set of standards – and that these are the only valued areas of learning?’ (p.215).

In another US study involving Geenen & Ysseldyke’s (1997) partial inclusion model, Ketterlin-Geller et al. (2007) investigated the consistency of test accommodations across the IEPs, teachers’ recommendations, and performance data of 38 third-grade students. They defined accommodations as representing ‘changes in the medium through which information is presented, the response formats, the external environment, or the timing of the testing situation that are designed to mediate the effects of a student’s disability that inhibit understanding or expression of domain-specific knowledge’ (p.194). They found significant differences in all three comparisons among students’ IEPs, teachers’ recommendations, and students’ performance data. For example, individual teachers often made accommodation decisions without support from the IEP team, and there was little correspondence between the accommodations listed on IEPs and teacher recommendations. As Ketterlin-Geller et al. observed, ‘IEPs were more likely to make errors of omission, whereas teachers were more apt to make errors of commission in recommending accommodations’ (p.203). With respect to the latter errors, the researchers commented that by making decisions without recognition of the IEP, teachers may be subverting the legal requirements and that this may significantly affect student success by withholding accommodations or by providing unnecessary accommodations. This, they concluded, compromises both students’ needs and the accountability systems set up to ensure that their needs are being met. ‘The current system’, they stated, ‘needs improvement’ (p.205).

In yet another US study, Karvonen & Huynh (2007) investigated the relationship between IEP characteristics and test scores on an alternate assessment instrument for students with significant cognitive disabilities. They found that whereas the curricula emphasised in IEPs and in alternate assessments were aligned for some students, for others they were not. They concluded that teachers of such students, who may have operated outside the general education curriculum for many years, ‘need professional development on state academic standards, alternate achievement standards, and curriculum design that goes beyond functional domains’ (p.291). As well, they argued that there is a need to create standards-based IEPs and that test developers must contribute to improving the curriculum-assessment link.

For other studies of alternate assessments and some attendant concerns, see papers by Browder et al. (2003), Crawford & Tindall (2006), Kohl et al. (2006), NAREM Associates, in cooperation with OECD (2005), Rabinowitz et al. (2008), Salend (2008), Thompson & Thurlow (2000), Turner et al. (2000), and Zatta & Pullin (2004).

In the US, the National Center on Educational Outcomes has published extensively on alternate assessment for students with significant cognitive disabilities (see Lazarus et al., 2010a and 2010b; Olson et al., 2002; and Quenemoen et al., 2003). These documents are too lengthy to summarise here, but suffice it to say that they provide information on states’ accommodation policies on alternate assessments and guidelines for such assessments. Other useful guides to alternate assessment are to be found in the recently published book by Bolt & Roach (2009) and in publications from the US Department of Education, particularly those relating to its policy for including students with disabilities in standards-based assessment used in determining ‘adequate yearly progress’ (Technical Work Group on Including Students with Disabilities in Large Scale Assessments, 2006).

9.3 Some Definitions of Assessment Accommodations and Alternate Assessments

Basically, there are two types of adjustments to nation- or state-wide assessments.

Assessments with accommodations. This involves making changes to the assessment process, but not the essential content. Braden et al. (2001) described accommodations as alterations to the setting, timing, administration and types of responses in assessments. Here, assessors need to distinguish between accommodations necessary for students to access or express the intended learning content and the content itself.

Alternate assessments. The US Department of Education (2003) defines alternate assessments as assessments ‘designed for the small number of students with disabilities who are unable to participate in the regular State assessment, even with appropriate accommodations’ (p.68699). They refer to materials collected under several circumstances, including teacher observations, samples of students’ work produced during regular classroom instruction, and standardised performance tasks. Further, alternate assessments should have:

  • a clearly defined structure,
  • guidelines for which students may participate,
  • clearly defined scoring criteria and procedures,
  • a report format that clearly communicates student performance in terms of the academic achievement standards defined by the State, and
  • high technical quality, including validity, reliability, accessibility, objectivity, which apply, as well, to regular State assessments.

Quenemoen et al. (2003) provided more detailed definitions and examples of the following alternate assessment approaches:

Portfolio: a collection of student work gathered to demonstrate student performance on specific skills and knowledge, generally linked to state content standards. Portfolio contents are individualized and may include wide ranging samples of student learning, including but not limited to actual student work, observations recorded by multiple persons on multiple occasions, test results, record reviews, or even video or audio records of student performance…

IEP-Linked Body of Evidence: Similar to a portfolio approach, this is a collection of student work demonstrating student achievement on standards-based IEP goals and objectives measured against predetermined scoring criteria…This evidence may meet dual purposes of documentation of IEP progress and the purpose of assessment.

Performance Assessment: Direct measures of student skills or knowledge, usually in a one-on-one assessment. These can be highly structured, requiring a teacher or test administrator to give students specific items or tasks similar to pencil/paper traditional tests, or it can be a more flexible item or task that can be adjusted based on student needs. For example, the teacher and the student may work through an assessment that uses manipulatives and the teacher observes whether the student is able to perform the assigned tasks….

Checklist: Lists of skills, reviewed by persons familiar with a student who observe or recall whether students are able to perform the skills and to what level. Scores reported are usually the number of skills that the student is able to successfully perform, and the settings and purposes where the skill was performed.

Traditional (pencil/paper or computer) test: Traditionally constructed items requiring student responses, typically with a correct and incorrect forced-choice answer format. These can be completed independently by groups of students with teacher supervision, or they can be administered in one-on-one assessment with teacher recording of answers.

For useful descriptions of alternate assessments for students with significant cognitive disabilities, see Perner (2007), who gave examples of various States’ methods, such as portfolio and performance-based assessments referred to above.

9.4 Formative Assessment

As might have become apparent in the foregoing, there is a tension between the need for schools to ascertain students’ level of achievement for accountability purposes and the need to take account of what is best educationally for SWSEN (Bauer, 2003). This distinction is sometimes referred to as ‘assessment of learning’ (or summative assessment), compared with ‘assessment for learning’ (or formative assessment) (Harlen, 2007; Watkins & D’Alessio, 2009). If the purpose is to compare students against pre-determined standards, then the former is best suited; if the purpose is to improve learning, the latter should be used.

Mitchell (2008) has summarised the distinction between summative and formative assessment. Briefly, summative assessment is concerned with evaluating learners’ performances at the end of a module or a course. The results count towards making a final judgement on what the learners have achieved. Formative assessment evaluates students’ progress during a course or module so that they have opportunities to improve, and teachers to ‘fine tune’ their teaching. In its pure form, formative assessment does not contribute to the overall grade. However, sometimes assessment serves both summative and formative purposes. How one classifies the two types depends on the extent to which assessment leads to feedback that enables learners to improve their performances. The more it does this, the more justified is its classification as formative assessment.

There is evidence to suggest that formative assessment has a positive effect on learning outcomes for SWSEN. Three US studies will serve as examples of such research. Firstly, in an early meta-analysis of 21 studies of the effects of formative evaluation, an effect size of 0.70 was obtained. However, when formative evaluation was combined with positive reinforcement for improvement (i.e., feedback), the effect size was even higher at 1.12 (Fuchs & Fuchs, 1986). Secondly, a study using a formative evaluation system with low-achieving students in a large urban school system resulted in significant gains in math achievement (Ysseldyke, 2001). Thirdly, there is evidence to show that teachers trained in formative assessment are more open to changing their instructional strategies to promote learners’ mastery of material (Bloom et al., 1992). Furthermore, it has been shown that without formative assessment, teachers’ perceptions of learners’ performances are often erroneous (Fuchs et al., 1984).

Finally, in a related vein, the European Agency for Development in Special Needs Education has argued in recent years that assessment processes can either contribute to or hinder the process of inclusion (see various documents on the Agency’s website). Thus, it has focused on what it refers to as ‘inclusive assessment’, which it defines as:

an approach to assessment in mainstream settings where policy and practice are designed to promote the learning of all pupils as far as possible. The overall goal of inclusive assessment is that all assessment policies and procedures should support and enhance the successful inclusion and participation of all pupils vulnerable to exclusion, including those with SEN (Watkins, 2007, p.47).

Educational policy-makers, then, should optimise both the needs of the system and those of its students in determining assessment policies.

9.5 Functional Behavioural Assessment

In the US, a major variant of the IEP is the ‘Behavior Intervention Plan’ (BIP), with its reliance on ‘Functional Behavior Assessment’ (FBA). BIPs came into force in the US with the 1997 reauthorisation of IDEA, and were reiterated in the 2004 IDEIA. As described by Killu (2008) and Etscheidt (2006), BIPs consider the relationship between students’ learning and any behaviour problems they manifest that may impede their classroom performance or that of other students. A point of distinction between IEPs and BIPs is that the latter must not only focus on individuals, but must also address school-wide issues that serve as contextual factors that may contribute to the behavioural problems (Killu, 2008).

In a review of FBA, 22 studies focused on learners with or at risk for emotional and behavioural disorders were reported. These studies comprised a mix of antecedent-based interventions, consequence-based procedures and a combination of the two interventions. Regardless of the type of intervention, 18 of the 22 studies showed positive results, with clear reductions of problem behaviours and/or increases of appropriate behaviours (Heckaman et al., 2000).

The principles of FBA are not limited to behaviour, but in recent years have been extended to learning difficulties as well (Daly & Martens, 1997; Jones & Wickstrom, 2002; Duhon et al., 2004).

9.6 Summary

  1. Increasingly, SWSEN, including those with significant cognitive disabilities, are being expected to participate in their countries’ national or state assessment regimes.
  2. ‘High stakes’ assessments can have the effect of jeopardising inclusive education, a risk that can be exacerbated by the effects of international comparative studies of educational standards.
  3. In the US, legislation since IDEA 1997 does not allow SWSEN to be exempted from their states’ assessment programmes. Instead, educational authorities are required to provide alternate assessment for students who cannot participate in state or district assessments with or without accommodations. IEPs now must include a statement of any accommodations that are necessary to measure the academic achievement and functional performance of such students on state- and district-wide assessments.
  4. The main types of alternate assessments comprise portfolios, IEP-linked bodies of evidence, performance assessments, checklists and traditional paper and pencil tests.
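  5. The assumptions underlying the US participation requirements are twofold: (a) that higher expectations will lead to higher achievement for students with disabilities and (b) that assessment information on these students will lead to improved instructional programmes, which in turn will lead to improved student outcomes.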
  6. The requirements for all students to participate in state- and district-wide assessments have been shown in some research to have had unintended negative consequences for students with disabilities, including higher rates of academic failure, lower self-esteem, and concerns that they would experience higher drop-out rates.
  7. Countries or states should include both content area specialists and experts in severe disabilities in validating performance indicators used in alternate assessment.
  8. With the shift to all students being required to participate in their countries’ national or state assessment regimes, teachers of SWSEN will need professional development on their country’s or state’s academic standards, alternate achievement standards, and curriculum design that goes beyond functional domains.  
  9. Formative assessment has been associated with positive outcomes for SWSEN and with improvements in teachers’ perceptions of students’ performances.
  10. Functional assessment is increasingly being applied, not only to behaviour, but also to learning in general.
  11. In determining assessment policies, it is important to recognise and resolve as far as possible the tensions between measuring the health of the education system and protecting the interests of students with special educational needs. In other words, educational policy-makers should optimise both the needs of the system and those of its students in determining assessment policies.


  1. This chapter is mainly drawn from Mitchell et al. (2010) and Mitchell (2008).
  2. See Chapter Six, section 6.5.