The effectiveness of applied behaviour analysis interventions for people with Autism Spectrum Disorder

Publication Details

This systematic review considers the evidence for the effectiveness of interventions grounded in the principles of applied behaviour analysis for people with autism spectrum disorder.

Released on Education Counts: April 2010

Author(s): Marita Broadstock and Anne Lethaby, New Zealand Guidelines Group.

Date Published: 19 December 2008



A systematic method of literature searching, selection and appraisal was employed in the preparation of this report, consistent with New Zealand Guidelines Group processes (New Zealand Guidelines Group 2001).


A literature search was undertaken in late June 2008 using the following bibliographic and guideline databases and websites:

Searches were restricted to English language material published between the years 1998 and 2007, inclusive.

Search terms/keywords were combined for autism [including autism/ aspergers syndrome/ autistic thinking/ pervasive developmental disorders/ kanner/], and intervention keywords, types and methods (eg, applied behaviour analysis, intervention programme, functional analysis or assessment, discrete trial training, prompting, modelling). Study design filters were applied to these results to identify randomised controlled trials, meta-analyses, systematic reviews, other comparative, observational studies, and evaluation or outcome studies.

Full details of search strategies are described in Appendix 1.

The literature identified was supplemented by an additional 151 publications listed in the RFT (Ministries of Health and Education 2007). These were drawn from submissions made during the consultation process for the draft ASD Guideline that were relevant to the review and identified new citations.

Bibliographies of retrieved publications, and recent narrative reviews, were also examined to identify any additional eligible studies. It should be noted that narrative reviews retrieved for this purpose or to provide background material were not critically appraised for inclusion in the review.

Hand searching of journals, searching of sources of grey literature, and contacting of authors for unpublished research were not undertaken. However, a small number of authors were contacted for methodological clarifications.

Study Selection Criteria

Publication type

Studies published between 1998 – 2007 inclusive, in the English language, including primary (original) research published as full original reports and secondary research (systematic reviews and meta-analyses).

Participant characteristics

The study population were individuals with a clinical diagnosis of autism spectrum disorder from a relevant professional (e.g., psychologist, paediatrician) and/or as classified by standardised assessments (e.g., DSM-IV-R, ICD-10, ADOS, ADI), or where results are reported separately for this group.

Individuals of any age diagnosed with any of the following:

  • Autism;
  • Asperger syndrome;
  • Pervasive Developmental Disorder (PDD);
  • PDD Not Otherwise Specified (PDD NOS).

A range of other disorders may be diagnosed as co-occurring with autism or PDD including attention deficit/hyperactivity disorder (ADHD), intellectual disability, obsessive compulsive disorder, developmental disorders of motor function and, most commonly, specific and general learning problems. Studies involving participants who have a dual diagnosis were included. 

Individuals diagnosed with Rett’s Disorder or Childhood Degenerative Disorder were excluded.


Intervention

Many interventions in this area offer a mix of approaches which commonly include features grounded in learning theory and reflecting ABA principles. However, evaluations of interventions (or combined approaches) were only included in the current review where the interventions were considered to be predominantly based on the principles of applied behaviour analysis and were implemented for the purpose of treating individuals with ASD.

Applied behaviour analysis is defined as an intervention in which the principles of learning theory are applied in a systematic and measurable manner to increase, reduce, maintain and/or generalise target behaviours. Interventions included those which were described as predominantly behavioural interventions or behavioural support or behavioural modification or behavioural treatment, or whose techniques were predominantly based on the use of well-established principles of ABA (eg, reinforcement, shaping, chaining, response prompting, stimulus control, prompting, modelling, token economy, punishment, contingencies, fading, discrimination training, generalisation, operant conditioning, establishing operations, functional assessment or functional analysis).


Comparator

Usual care, another intervention or application of interventions (eg, intensity of intervention, eclectic approach).

Study design

Single case/subject experimental study designs (including “n of 1” studies, ABAB designs, alternate allocation, multiple baselines) were excluded. 

Systematic reviews and meta-analyses were eligible for appraisal where they reported on eligible interventions (solely or separately as a synthesised sub-group), had a clear review question and accessed at least two searching sources. Search sources needed to include one bibliographic database plus at least one of the following: another bibliographic database, reference checking of retrieved articles, Google Scholar/Web of Science to check antecedent or descendant citations, or handsearching of a number of key journals.

Table 1: Designations of Levels of Evidence for Evaluating Intervention Studies

Level of Evidence    Study Design
I                    A systematic review of level II studies
II                   RCT(s) of good quality
III-1                Pseudo-randomised controlled trial (eg, alternate allocation or other method)
III-2                A comparative study with concurrent controls:
                       • Non-randomised experimental trial
                       • A controlled before-and-after study
                       • An indirect comparison of two RCTs (ie, A vs B and B vs C)
                       • Cohort study
                       • Case-control study
                       • Interrupted time series with a control group
III-3                Comparative study without concurrent controls:
                       • Historical control study
                       • Two or more single arm study
                       • Interrupted time series without a parallel control group
IV                   Case series with either post-test or pre-test/post-test outcomes

Source: National Health and Medical Research Council (2008)

Study designs can be ranked in a hierarchy according to their “level of evidence” (National Health and Medical Research Council 2008), which reflects how effectively the study design can answer a research question. Eligible study designs were limited initially to those that provide at least level III-3 evidence (see Table 1). That is, uncontrolled studies without a comparison group, including case reports and case series (level IV evidence), were excluded.

The precise ‘cut-off’ point in the hierarchy for study designs included was considered after other selection criteria had been applied. The goal was to identify evidence at higher levels of the hierarchy, consistent with a reasonable number of articles (see “study selection” section). 

Sample size

Studies with six or more participants in each of the intervention and comparator arms were eligible. Studies in which either study arm had five or fewer participants diagnosed with ASD were excluded, as a study quality criterion for group study designs.
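The sample size rule can be read as a check on the smaller of the two study arms. The following is a minimal sketch of that reading only; the function name and the constant are hypothetical and are not taken from the review.

```python
# Minimum number of participants diagnosed with ASD required in each arm.
# Studies with five or fewer in either arm were excluded by the review.
MIN_PER_ARM = 6  # hypothetical constant name


def meets_sample_size(n_intervention: int, n_comparator: int) -> bool:
    """Return True when both study arms have at least MIN_PER_ARM participants."""
    return min(n_intervention, n_comparator) >= MIN_PER_ARM
```

Under this reading, a study with 5 participants in one arm and 20 in the other would be excluded, because the smaller arm falls below the threshold.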


Outcomes

Studies using at least one standardised and/or quantitative outcome measure of, and analyses for, at least one of the following outcomes relating to effectiveness of relevant interventions:

  • social development and relating to others;
  • development of cognitive (thinking) skills;
  • development of functional and spontaneous communication which is used in natural environments;
  • engagement and flexibility in developmentally appropriate tasks and play;
  • engagement in vocational activities (as an adult);
  • development of fine and gross motor skills;
  • prevention of challenging behaviours and substitution with more appropriate and conventional behaviours;
  • development of independent organisational skills and other behaviours;
  • generalisation of abilities across multiple natural environments outside the treatment setting;
  • improvement in behaviours considered non-core ASD behaviours, such as sleep disturbance, self mutilation, aggression, attention and concentration problems;
  • maintenance of effects after conclusion of intervention.

Study exclusion criteria

Research papers were excluded if they:

  • were published prior to 1998, or after 2007 (however earlier primary studies may be reported in included systematic reviews);
  • were non-systematic reviews, letters, editorials, expert opinion articles, comments, case reports, book chapters, articles published only in abstract form, conference proceedings, correspondence, news items, unpublished work;
  • were not published in the English language;
  • reported on samples of five or fewer participants in either arm of the study (intervention or comparator);
  • were case series, case studies or uncontrolled studies;
  • were single case experimental study designs (except where considered and reported in included systematic reviews);
  • reported solely on people diagnosed with Rett’s Disorder or Childhood Degenerative Disorder;
  • were not deemed appropriate to the research question or nature of review, including:
    • (i) studies that assessed the effectiveness of interventions that were not predominantly grounded in the central features of an ABA approach;
    • (ii) comparisons of behavioural phenotypes;
    • (iii) studies describing cognitive concepts (eg, executive function, theory of mind);
    • (iv) screening, diagnosis and assessment studies;
    • (v) studies describing service provision (without evaluation);
    • (vi) studies describing the process of training and accreditation of ABA practitioners;
    • (vii) studies describing the general scientific method for assessment of studies;
    • (viii) litigation issues;
    • (ix) outcome measure or diagnostic test development;
  • assessed the following interventions: sensory integration therapies; auditory integration therapies; specifically developmentally based programmes; “TEACCH”; “floor time”; “Son-Rise”; “gentle teaching”; contact with animals (eg, dolphins, horse-back riding); facilitated communication; the Miller method; chelation therapy; 
  • reported solely on outcomes relevant to safety (without accompanying effectiveness data); the acceptability of, or ethical, economic or legal considerations associated with ABA interventions; or the impact of the intervention on persons other than those diagnosed with ASD.

Study Selection

Selection criteria were applied by a single reviewer to abstracts/titles identified by the search strategy to identify a subset of potentially eligible articles for retrieval as full text. Selection criteria were then applied to the full text articles by two reviewers to identify the final set of included papers for critical appraisal and inclusion in the evidence tables. Double coding processes (and therefore inter-rater reliability analyses) were not undertaken.

There is no clear consensus in the literature about what defines an intervention as ABA-based. Making decisions about whether an intervention was ABA-based was at times challenging as many interventions offer a mix of approaches which included features based on ABA principles. Whilst explicit criteria were employed as described under the study selection section, the judgement of whether an intervention was “predominantly ABA” therefore involved weighing the ABA component of the approach from the description given in the paper’s methods. Where there were doubts about study inclusion, reviewers consulted Professor Jeffrey Sigafoos who was provided with the full text papers, as required. 

Reasons for exclusion were coded hierarchically such that the first reason for exclusion that was reached was applied, even though multiple reasons for exclusion may apply. Reasons for exclusion were coded as follows:

  1. Wrong publication
     • including non-systematic/narrative reviews¹, case reports, book chapters, animal studies, short notes, letters, editorials, conference abstracts, in vitro studies;
     • studies not deemed appropriate to the research question or nature of the review;
     • single case experimental design studies
  2. Wrong intervention
  3. Wrong comparator
  4. Wrong indication/population/setting
  5. Wrong outcomes
  6. Wrong study design (if a reasonable number of higher level evidence studies has been identified by the review).

The exclusion criterion for “wrong study design” was applied last, after exclusion criteria 1-5. Initially, studies were excluded at this stage if they were at level of evidence III-3 or below (studies without a parallel control group, or uncontrolled case series). Once selection criteria had been fully applied, the quantity and level of evidence included within the hierarchy of evidence (see Table 1) was considered. The goal was to include only the highest levels of evidence, as consistent with a reasonable number of studies. If a reasonable number of higher order study designs had not been identified as eligible for inclusion at that point, studies excluded for being the “wrong study design” (ie, level III-3) could be reconsidered for inclusion. Exceptions to this process were single case experimental design studies and case series or reports, which were excluded preferentially as wrong publications (exclusion reason 1).
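The hierarchical coding described above, in which only the first applicable reason is recorded, can be sketched as follows. This is an illustrative sketch rather than the reviewers' actual procedure: the function name and the dictionary of reviewer judgements are assumptions, and only the reason labels and their order come from the text.

```python
from typing import Dict, Optional

# Exclusion reasons in the order in which the review applied them.
EXCLUSION_ORDER = [
    "Wrong publication",
    "Wrong intervention",
    "Wrong comparator",
    "Wrong indication/population/setting",
    "Wrong outcomes",
    "Wrong study design",  # applied last
]


def code_exclusion(judgements: Dict[str, bool]) -> Optional[str]:
    """Return the first exclusion reason that applies, or None if none does.

    `judgements` (hypothetical structure) maps a reason label to True when a
    reviewer judged that the reason applies to the paper.
    """
    for reason in EXCLUSION_ORDER:
        if judgements.get(reason, False):
            return reason
    return None


# A paper to which two reasons apply is coded with the earlier reason only.
example = {"Wrong comparator": True, "Wrong study design": True}
first_reason = code_exclusion(example)
```

Because the list is scanned in a fixed order, a single case experimental design study flagged as a wrong publication is always coded under that reason, whatever other reasons might also apply.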

Appraisal Methodology

Levels of Evidence were applied to each included study so as to rank them according to a pre-determined “evidence hierarchy”. These rankings are based on the probability that the design of the study has reduced or eliminated the impact of bias on the results. We employed the NHMRC (2008) interim levels of evidence hierarchy (see Table 1). Systematic reviews of randomised controlled trials represent the highest level of evidence (level I) for studies of intervention effectiveness.

Whilst these evidence levels describe groups of research which are broadly associated with particular methodological limitations, these levels are only a general guide to quality. Each study may be designed and/or conducted with particular strengths and weaknesses which can be assessed using critical appraisal tools. In this review, included studies were formally appraised using the quality checklists from the Scottish Intercollegiate Guidelines Network as appropriate to study design, including those for systematic reviews, randomised controlled trials, and cohort studies. The quality and resistance to risk of bias of each individual study was scored as either ++ (very good), + (good) or – (fair).

Presentation of Results

After critical appraisal of individual studies and assignment of a level of evidence, details of each study were entered in evidence tables. Results are presented in evidence tables and summarised in text and tabular form, where appropriate. 

The evidence and results tables for appraised studies were ranked and presented in order of level of evidence (higher level study designs reported first), and within each study design type, in reverse chronological order (most recent publications first), and where necessary, alphabetically by first author within each year.
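The presentation order described above amounts to a three-part sort key. The sketch below assumes that each study is a record with `level`, `year` and `first_author` fields, and treats the level of evidence as a stand-in for study design type; these field names, the function name and the sample records are made-up placeholders, not studies from the review.

```python
# Rank of each NHMRC level of evidence, highest level first.
LEVEL_RANK = {"I": 0, "II": 1, "III-1": 2, "III-2": 3, "III-3": 4, "IV": 5}


def presentation_order(studies):
    """Sort by level of evidence, then most recent year, then first author."""
    return sorted(
        studies,
        key=lambda s: (LEVEL_RANK[s["level"]], -s["year"], s["first_author"].lower()),
    )


# Placeholder records for illustration only.
example_studies = [
    {"first_author": "Brown", "year": 2005, "level": "III-2"},
    {"first_author": "Adams", "year": 2005, "level": "III-2"},
    {"first_author": "Clark", "year": 2007, "level": "III-2"},
    {"first_author": "Davis", "year": 2001, "level": "II"},
]
ordered = presentation_order(example_studies)
```

Negating the year gives reverse chronological order within a level, while the author name breaks ties within a year.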

Data Synthesis

It was not possible to perform a quantitative synthesis of the data retrieved because of the degree of heterogeneity of the populations and interventions studied and the lack of high quality RCTs in the topic area.

Data were summarised and synthesised narratively and in tables, as appropriate. Studies were grouped according to broadly similar interventions (eg, early intensive behavioural interventions), and where intervention programme intensity and mode of delivery (parent- or expert-directed) were controlled.

Studies were narratively synthesised to determine the strength of evidence. Strength of evidence is determined by three domains (West et al. 2002):

  • quality (of the individual studies predicated on the extent to which bias was minimised);
  • quantity (magnitude of effect, numbers of studies, and sample size or power); and
  • consistency (the extent to which similar findings are reported using similar and different study designs).

Limitations of the Review Methodology

A structured and systematic approach was employed in reviewing the literature. However, conclusions from systematic reviews are limited by the review’s methodology and the quality of the studies identified and included for appraisal. 

This review has been limited by the restriction to publications in the English language, which may result in study bias. However, reference checking of retrieved reviews did not identify any primary study that had been missed due to language restriction.

The review was limited to the published academic literature, which may lead to publication bias and/or overestimation of the benefits of treatments; small-sampled studies are more likely to be published if they report ‘positive’ findings, whereas larger studies tend to be published regardless of findings. As a minimum quality criterion, small-sampled group studies (n<6 in either arm) were excluded.

The studies were initially selected by examining abstracts and so some studies may have been inappropriately excluded prior to examination of the full-text article. To minimise this possibility, where detail was lacking or ambiguous, papers were retrieved as full text. The expert consultant (Professor Sigafoos) was consulted where there were doubts about eligibility. As discussed under “study selection”, judgements about whether an intervention was ABA-based were at times challenging due to the lack of consensus in the research community about which interventions represent ABA, and it is recognised that some readers will come to different conclusions about which interventions are “predominantly based” in ABA. For example, whilst the “social stories” intervention has clear ABA components, it was not considered to be “predominantly ABA” in this report. In response to the variation of viewpoints on this issue, results and related conclusions are organised by intervention type in this review.

Another challenge related to defining systematic reviews. The current review required that they reported on eligible interventions (solely or separately as a synthesised sub-group), had a clear review question and accessed at least two searching sources (including at least one bibliographic database). This approach may have excluded reviews due to poor reporting. It was not feasible in the timeframe to systematically contact all authors of potentially eligible review papers to identify such information. Alternatively, narrative reviews may have met eligibility criteria due to use of a systematic search and been included in the review, despite not having systematic identification, appraisal or synthesis processes. Therefore the selection criteria could be regarded as being overly inclusive. Study quality rated for included reviews is a good indicator of the degree to which they were systematic.

Another decision point that affected selection of studies was around whether the review aims were relevant to the current topic scope. Several reviews were included which had a broader focus than ABA interventions, but they were included only if they reported on and synthesised at least a subset of ABA studies. A particularly problematic study was the Cochrane review by Diggle et al (2002) which met methodological selection criteria as a systematic review (SR) and was relevant to ABA. However it was decided that the study was not eligible for inclusion as its review aim was to consider parent-directed interventions for children with ASD (which may or may not include ABA based procedures). The review scope was therefore not a systematic attempt to identify and report on ABA interventions or a subset of such. The two RCTs in the Diggle review were included as primary studies in the current review as they separately met the inclusion criteria.

One inclusion criterion related to study participants having received a clinical diagnosis of autism spectrum disorder from a relevant professional and/or as classified by standardised assessments. It is noted that there is not complete agreement between classification systems (which have altered over time) or diagnostic methods for autism. The definitions used for ASD, where available, are reported in the Evidence Tables for ease of comparison. In a small number of studies, details were lacking about the diagnostic tools employed for identifying participants as eligible for inclusion in the study; however, studies were only included where participants were stated as having received a clinical diagnosis of ASD.

As an additional check that papers were neither missed by the search strategy nor erroneously excluded based on abstract, cross-checking of references of retrieved papers, including those of a number of narrative reviews retrieved as background, was employed to identify additional potentially eligible articles.

The publication dates of papers eligible for inclusion were restricted to 1998 – 2007 inclusive. This date range was specified by the RFT (2007) so as to be consistent with the search period of the New Zealand ASD Guideline (Ministries of Health and Education 2008). It is recognised that investigations of ABA interventions precede these dates; however, the systematic reviews included in the current report would be expected to incorporate that literature. It is recognised that this is an active area of research and that the results and conclusions will need to be revisited in the future in order to incorporate new research developments.

Data extraction, critical appraisal and report draft preparation were performed by two reviewers over a limited timeframe (July to early October, 2008). Double-coding and inter-rater reliability analyses were not conducted. For a detailed description of interventions, methodology, measurement, and analyses in the studies appraised, the reader is referred to the original papers cited.

This review has benefited from comments provided by the expert consultant (Professor Sigafoos), NZGG Research Services Manager Dr Jessica Berentson-Shaw, and two double-blind external peer reviewers contracted by the Ministry of Education.



  1. Some of these studies could be retrieved as background material or to assist in identifying additional eligible studies, but were still coded as ineligible in the report’s appendix.