Technical review of published research on applied behaviour analysis interventions for people with Autism Spectrum Disorder

Publication Details

New Zealand Ministries of Education and of Health requested a technical review of the evidence base on the effectiveness of Applied Behaviour Analysis (ABA) for people with Autism Spectrum Disorders (ASD).

Released on Education Counts: April 2010

Author(s): Oliver Mudford, Neville Blampied, Katrina Phillips, Dave Harper, Mary Foster, John Church, Maree Hunt, Jane Prochnow, Dennis Rose, Angela Arnold-Saritepe, Heather Peters, Celia Lie, Katrina Jeffrey, Eric Messick, Catherine Sumpter, James McEwan and Susan Wilczynski (2009), Auckland UniServices Limited.

Date Published: 15 January 2009 - Revised 16 January 2009


Review method

Literature search procedure

A detailed description of search and article inclusion and exclusion criteria is included in Appendix B. Briefly:

  1. A comprehensive search for the ASD treatment literature was conducted; 
  2. Titles of articles were examined to apply exclusion criteria; 
  3. Abstracts were scrutinised and exclusion criteria applied; 
  4. Articles excluded from this point are noted in Appendix D3, along with the reason for exclusion; 
  5. References remaining were compared with NSP’s lists of included and excluded articles; 
  6. Three references were retained from a list provided by the Ministry of Education; 
  7. A New Zealand-unique list of references was retained that appeared to report research on ABA for ASD; 
  8. One hundred and twenty-nine references were sent to NSP that had been neither included nor excluded by them at that time;
  9. Original articles from that list were examined and further exclusions were made; 
  10. One hundred and twelve articles were reviewed by members of our review team.

New Zealand reviewers’ methods

NZ-unique articles that reported on comprehensive early intensive behavioural intervention programmes (N=8) were partially coded in New Zealand, and the coding was completed by NSP reviewers experienced in reviewing this type of research report against NSP criteria. NZ-unique articles that reported on focussed interventions (N=112) were coded using NSP methods by New Zealand reviewers. Nine further articles were excluded because, on closer examination by New Zealand or NSP reviewers, they were found not to have met inclusion criteria.

Reviewer training had been conducted for six of the 13 New Zealand reviewers early in 2008 by NSP. This involved using the National Autism Center’s Coding Manual for Focussed Interventions for the NSP (2007) to code one or more training articles onto an interactive standard coding form which, when complete, was electronically transmitted to the NSP’s data-analysis team in the US. Reviewers were considered trained provided their coding exceeded 80% agreement with a pre-established criterion.

Despite this initial training, feedback from reviewers to the technical manager of the New Zealand team indicated that reviewers found the NSP coding forms difficult to use and prone to creating errors. All the New Zealand reviewers were then retrained to enter their data directly into locally designed evidence tables (customised Excel spreadsheets), which were later merged. Retraining was conducted in Auckland, Hamilton and Dunedin one-on-one by two University of Auckland team members. Each reviewer and trainer spent two hours working together, coding three or four articles from the reviewer’s allocation according to the NSP coding manual and applying the Scientific Merit Rating Scale (SMRS) scoring criteria.

The 112 articles were distributed quasi-randomly but equally among NZ reviewers. Fourteen duplicates were also distributed for the purpose of establishing reliability (assessed as inter-observer agreement, explained below) among the New Zealand review team. In most cases articles were distributed as .pdf versions of the original article, although some articles that were available only from paper journals were distributed among University of Auckland reviewers. When each reviewer had completed their allocated reviews, their evidence tables were checked for obvious errors before being collated into a single evidence table.

Inter-observer agreement

Every reviewer had coded a randomly allocated article also reviewed by another reviewer, so all reviewers’ coding was subjected to at least two reliability checks. Interobserver agreement was calculated for every variable using procedures common in ABA research. Exact agreement percentage was used for some variables, and mean smaller/larger calculations for others. See Table 1 for variables, calculation methods, and results concerning interobserver agreement.
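The two agreement indices named above can be illustrated with a minimal sketch. It assumes the usual ABA conventions: exact agreement is the percentage of paired codings that are identical, and mean smaller/larger divides the smaller of two scores by the larger for each item and then averages the ratios. The function names and the sample codings are hypothetical and are not taken from the review data.

```python
def exact_agreement(codes_a, codes_b):
    """Percentage of paired codings on which two reviewers gave identical values."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two equal-length, non-empty lists of codings")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 100.0 * matches / len(codes_a)


def mean_smaller_larger(scores_a, scores_b):
    """Mean of (smaller score / larger score) across paired items, as a percentage.

    Assumes non-negative scores; two identical scores count as full agreement.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty lists of scores")
    ratios = []
    for a, b in zip(scores_a, scores_b):
        ratios.append(1.0 if a == b else min(a, b) / max(a, b))
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical codings by two reviewers of the same four items.
effects_1 = ["beneficial", "unknown", "beneficial", "beneficial"]
effects_2 = ["beneficial", "beneficial", "beneficial", "beneficial"]
print(exact_agreement(effects_1, effects_2))  # 3 of 4 codings match

# Hypothetical SMRS scores by two reviewers of the same three items.
print(mean_smaller_larger([3.0, 2.0, 4.0], [3.0, 2.5, 2.0]))
```

Exact agreement suits categorical codes such as the rating of main benefits, whereas a smaller/larger ratio gives partial credit when two numerical scores are close but not identical.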

Table 1: Interobserver agreement among New Zealand reviewers’ codings of original research articles.
| Variable | Method | % Agreement |
|---|---|---|
| Participant demographics: N with ASD; Diagnostic categories | Exact agreement | |
| Dependent variable: NSP category | | |
| Scientific merit: SMRS score | Mean S/L | 89 |
| Effects of treatment: Main benefits | Exact agreement | 86 |
| Effects of treatment: Generalisation effects | | 68 |
| Effects of treatment: Maintenance effects | | 64 |

Interobserver agreement exceeded the conventional 80% criterion for “acceptable” for 10 of the 12 variables in Table 1. The problematic variables concerned whether generalisation and maintenance effects had been demonstrated with agreement values of 68% and 64% respectively.

Databases (evidence files) from NSP and New Zealand reviewers’ codings

Many articles reviewed reported more than one independent variable and/or dependent variable and/or two or more studies within the article. Every variation can be viewed as a discrete study, or research “item”, where the simplest case (or item) is one study with one dependent variable and one independent variable. Each article was scored as many times as there were variations within it; therefore, the Excel evidence files from both the New Zealand database and the NSP database have more rows (items) than articles. Henceforth, numbers of items rather than numbers of articles will be reported.
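The relationship between articles and items can be sketched as follows. This is an illustration only: the `Item` record, its field names and the sample rows are hypothetical and do not reproduce the layout of the actual Excel evidence files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One evidence-table row: a single study, intervention and target."""
    article: str
    study: int
    independent_variable: str
    dependent_variable: str


# Hypothetical rows. "Article B" reports two studies, one of which
# measured two dependent variables, so it contributes three items.
evidence_rows = [
    Item("Article A", 1, "video modelling", "play"),
    Item("Article B", 1, "prompting", "communication"),
    Item("Article B", 1, "prompting", "problem behaviour"),
    Item("Article B", 2, "prompting", "communication"),
]

n_articles = len({row.article for row in evidence_rows})
n_items = len(evidence_rows)
print(f"{n_articles} articles, {n_items} items")
```

Counting rows rather than distinct articles is what makes the item totals reported below larger than the article totals.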

The vast majority of behavioural intervention studies reported using a range of behavioural procedures, each of which had been found to be empirically supported in earlier, more experimental research. Interventions employing multiple strategies can be called “treatment packages”, “multi-component behavioural interventions” or “behavioural intervention packages”. For brevity, NSP use the term “package”, and their reviewers noted the predominant features of each package as identified by the authors of the relevant research articles. NSP’s description of the packages is copied in Table 2. NSP did not consider “Social Stories” packages to fall within the realm of ABA. Consequently, they did not provide review data on that intervention approach, so we could not include a review of it.

Table 2: NSP’s categorisation of behavioural interventions (© Copyright National Autism Center, 2008.)
| Package | NSP description |
|---|---|
| Antecedent | These interventions involve the modification of situational events prior to the occurrence of a target behavior as a means of reducing the likelihood an individual will face difficulties in the future. Examples include but are not restricted to: habit reversal, noncontingent access, incorporating echolalia or ritualistic/obsessional activities into tasks, prompting procedures, Power Card strategy, presence/absence of others, maintenance interspersal, choice, behavioral momentum, and varied task difficulty. |
| Behavioural | These interventions are designed to reduce problem behavior and teach functional alternative behaviors or skills through the application of basic principles of behavior change. Treatments falling into this category emanate from the fields of applied behavior analysis and positive behavior supports. Examples include but are not restricted to: environmental arrangement, shaping, chaining, task analysis, discrete trial teaching, and reinforcement. |
| Early IBI [intensive behavioural intervention] | This treatment reflects research from comprehensive treatment programs that involve a combination of applied behavior analytic procedures (e.g., discrete trial, incidental teaching, etc.) which is delivered to young children (generally under the age of 8). These treatments may be delivered in a variety of settings (e.g., home, self-contained classroom, inclusive classroom, community), involve a low student-to-teacher ratio (e.g., 1:1). All of the studies falling into this category met the strict criteria of: (a) targeting the defining symptoms of ASD, (b) have treatment manuals, (c) providing treatment with a high degree of intensity, and (d) measuring the overall effectiveness of the program [i.e., studies that measure subcomponents of the program are listed elsewhere in this report]. These treatment programs may also be referred to as ABA programs or behavioral inclusive classrooms. |
| Exposure | These interventions require that the individual with ASD increasingly face anxiety-provoking situations while preventing the use of maladaptive strategies used in the past under these conditions. |
| FCT [functional communication training] | These interventions involve substituting an appropriate method of communicating in lieu of a maladaptive strategy used in the past. |
| Joint attention (e.g., behavioural definitions from Rocha et al., 2007) | “Coordinated joint attention was defined as the child is actively involved with a person and object and alternates gaze between the adult and an object. Joint attention responding occurred when the child responded appropriately and without prompting to the joint attention bid of another person within 3 sec (i.e. engages with object). Joint attention initiation occurred when an adult initiated with the child to communicate about an object (i.e., initiated towards the child with an object by placing the child's hand on the object, tapping the object, showing the object, or gaze shifting towards an object with or without a point).” |
| Modelling | These interventions rely on demonstrations of the target behavior that should result in an imitation of the target behavior by the individual with ASD. Examples include live and video modeling. |
| Naturalistic teaching | These interventions involve using primarily child-directed interactions to teach functional skills in the natural environment. Examples of this type of approach include but are not limited to, focused stimulation, incidental teaching, milieu teaching, embedded teaching, and responsive education and prelinguistic milieu teaching. |
| Peer training | These interventions involve teaching children without disabilities strategies for facilitating their play and interactions with children on the autism spectrum. |
| PECS [picture exchange communication system] | This treatment involves the application of a specific augmentative and alternative communication system based on behavioral principles that is designed to teach functional communication to children with limited verbal and/or communication skills. |
| PRT [pivotal response training] | Pivotal Response Training focuses on targeting 'pivotal' behavioral areas, such as motivation to engage in social communication, self-initiation, self-management, and responsiveness to multiple cues, with the development of these areas having the goal of very widespread and fluently integrated collateral improvements. Key aspects of PRT intervention delivery also focus on parent involvement in the intervention delivery, and on intervention in the natural environment such as homes and schools with the goal of producing naturalized behavioral improvements. This treatment is an expansion of Natural Language Paradigm, one of the naturalistic teaching strategies. |
| Reductive | These interventions rely on strategies designed to reduce problem behaviors in the absence of increasing alternate appropriate behaviors. Examples include but are not restricted to: water mist, behavior chain interruption, protective equipment, ammonia. |
| Schedules | These interventions involve the presentation of a task list that communicates a series of activities or the steps required to complete a specific activity. Schedules can take several forms including written words, pictures, or photographs. |
| Scripting | These interventions involve developing a verbal and/or written script about a specific skill or situation which serves as a model for the child with ASD. Scripts are usually practised repeatedly before the skill is used in the actual situation. |
| Self-management | These interventions involve promoting independence by teaching individuals with ASD to regulate their behavior by recording the occurrence/nonoccurrence of the target behavior and securing reinforcement for doing so. |
| Social skills | These interventions seek to build social interaction skills in children with ASD by targeting basic (e.g., eye contact, name response) to complex (e.g., how to initiate or maintain a conversation) social skills. |
| Verbal behaviour | These interventions are based on Skinner’s book ‘Verbal Behavior’ and the principles of applied behavior analysis to guide teaching interactions. Interventions included here are much broader than the single approach sometimes referred to as ‘Verbal Behavior Analysis’ or ‘Analysis of Verbal Behavior.’ Examples include but are not restricted to use of multiple discriminative stimuli, intraverbal training, mand training, mand-model training, matrix training, and tact training. |

The reviewers’ codings of research articles were further categorised in the NSP database according to the behavioural skill deficits or behavioural excesses targeted for amelioration. Table 3 provides an abbreviated description of the categories.

Table 3: NSP’s categorisation of behaviours targeted for change (i.e., dependent variables) in research articles reporting on behaviour analytic interventions. © Copyright National Autism Center, 2008
| Target category | Abbreviated description |
|---|---|
| Skills increased | |
| Academic | This category represents tasks that are precursors to or required in order to succeed with school activities. Dependent measures associated with these tasks include but are not restricted to preschool activities (e.g., sequencing, color, letter, number identification, etc.), fluency, latency, reading, writing, mathematics, science, history or skills required to study or perform well on exams. |
| Communication | The communication tasks involve verbal or nonverbal signaling to a social partner regarding content of sharing of experiences, emotions, information, or affecting the partner’s behavior and behaviors that involve understanding a partner’s intentional signals for the same purposes. This systematic means of communication involves the use of sounds or symbols. Dependent measures associated with these tasks include but are not restricted to requesting, labeling, receptive, conversation, greetings, nonverbal, expressive, syntax, speech, articulation, discourse, vocabulary, and pragmatics. |
| Higher Cognitive Functions | These tasks require complex problem-solving skills outside the social domain. Dependent measures associated with these tasks include but are not restricted to critical thinking, IQ, problem-solving, working memory, executive functions, organizational skills, and theory of mind tasks. |
| Interpersonal | The tasks comprising this category require social interaction with one or more individuals. Dependent measures associated with these tasks include but are not limited to joint attention, friendship, social and pretend play, social skills, social engagement, social problem-solving, and appropriate participation in group activities. The area of pragmatics is not included in this list because it is addressed in the communication section. |
| Learning readiness | Learning readiness tasks serve as the foundation for successful mastery of complex skills in other domains identified. Dependent measures associated with these tasks include but are not restricted to imitation, following instructions, sitting skills, or attending to environmental sounds. |
| Personal responsibility | This category targets tasks that involve activities which are embedded in everyday routines. Dependent measures associated with these tasks include but are not restricted to feeding, sleeping, dressing, toileting, motor skills, cleaning, family and/or community activities, health and fitness, phone skills, time and money management, and self advocacy. |
| Placement | The dependent measure involves level of placement in school, home, or community settings. Examples include but are not restricted to: (a) placement in general education classroom, (b) placement back into the home setting. |
| Play | Tasks that involve non-academic and non-work related activities that do not involve self-stimulatory behavior or require interaction with other persons. Dependent measures associated with these tasks may include but are not restricted to: functional independent play (i.e., manipulation of toys to determine how they ‘work’ or appropriate use of toys, games). Whenever social play is targeted (independently or in conjunction with make believe play), it is best to provide the ‘interpersonal’ code. Each of these descriptors may be further broken down into subcomponents, which may serve as the dependent variable. |
| Self-regulation | Tasks that involve the management of one’s own behaviors in order to meet a goal. Dependent measures associated with these tasks include but are not limited to: persistence, effort, task fluency, transfer of attention, being ‘on schedule,’ self-management, self-monitoring, self-advocacy, remaining in seat (or its opposite of ‘out of seat’), time management, or adapting to changes in the environment. |
| Vocational (Wilczynski & Christian, 2008) | The tasks in this category are those required to execute semi-independent or independent work. Dependent measures associated with these tasks may include but are not restricted to using a timecard, computer skills, monitoring work quality, accepting feedback, safety in the workplace, securing assistance or requesting a break in the workplace (do not code in communication), adhering to dress code. |
| Behaviours decreased | |
| Problems | These behaviors can harm the individual or others OR result in damage to objects OR interfere with the expected routines in the community. Problem behaviors may include but are not restricted to: self-injury, aggression, disruption, destruction of property, hazardous, or sexually inappropriate behaviors. |
| Restrictive/repetitive | This category is reserved for limited, frequently repeated, maladaptive patterns of motor, speech, and thoughts. The following is a list of representative behaviors: stereotypic and compulsive behaviors, inappropriate speech, or restricted interest. |
| Sensory/emotional | Sensory and emotional regulation involves the extent to which an individual can flexibly modify his or her level of arousal or response to function effectively in the environment. Examples of behaviors that fall into this category include: stimulus refusal, sleep disturbance, anxiety, and depression. |

The New Zealand reviewers’ evidence table contained 169 items. NSP had expanded its own review following receipt of our original NZ-unique list to include 81 of the items we had reviewed already. The references for all articles that were identified by our literature searches and that had already been, or were subsequently, reviewed by NSP are contained in Appendix D2. The other 88 items, from 57 original research articles, remained as unique to the New Zealand database.

NSP provided us with sections of their draft database that included their pre-publication results (see the footnote below). The extent of data to which we had access is shown in Table 4. The overall strength of evidence for the intervention packages, as rated by the Strength of Evidence Classification System (SECS), was provided to us. The packages are defined in Table 2. The number of items related to each package is shown in Table 4, with the number of items with a Scientific Merit Rating Scale (SMRS) composite score of ≥ 2.0 shown in brackets. NSP also provided the SECS rating for research on each of the intervention packages for every target category (Table 3). In addition, we had SECS ratings for intervention packages for three diagnostic categories (Autistic Disorder, Asperger’s Syndrome, and Pervasive Developmental Disorder) and for the following age ranges (0-3, 4-5, 6-9, 10-14, 15-18, 19-21 years). For example, we had access to the SECS rating for antecedent packages overall, antecedent packages for academic skills, antecedent packages for children aged 0-3, and antecedent packages for research participants with an Autistic Disorder (AD) diagnosis.

Table 4: Sub-divisions of the NSP database as provided for this report [SMRS = Scientific Merit Rating Scale].
| Intervention package | Number of items (SMRS ≥ 2.0) |
|---|---|
| Antecedent | 221 (85) |
| Behavioural | 398 (162) |
| Early IBI | 154 (133) |
| Exposure | 8 (7) |
| FCT | 65 (26) |
| Joint attention | 22 (22) |
| Modelling | 80 (62) |
| Naturalistic teaching | 62 (45) |
| Peer training | 71 (44) |
| PECS | 45 (27) |
| PRT | 20 (18) |
| Reductive | 56 (18) |
| Schedules | 14 (9) |
| Scripting | 14 (13) |
| Self-management | 28 (18) |
| Social skills | 48 (23) |
| Verbal behaviour | 26 (15) |
| Total | 1332 (727) |
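The totals in Table 4 can be cross-checked against the per-package counts with a short script. The counts are transcribed from the table; the variable and function names are illustrative only.

```python
# (all items, items with SMRS >= 2.0) for each package, transcribed from Table 4.
table_4 = {
    "Antecedent": (221, 85),
    "Behavioural": (398, 162),
    "Early IBI": (154, 133),
    "Exposure": (8, 7),
    "FCT": (65, 26),
    "Joint attention": (22, 22),
    "Modelling": (80, 62),
    "Naturalistic teaching": (62, 45),
    "Peer training": (71, 44),
    "PECS": (45, 27),
    "PRT": (20, 18),
    "Reductive": (56, 18),
    "Schedules": (14, 9),
    "Scripting": (14, 13),
    "Self-management": (28, 18),
    "Social skills": (48, 23),
    "Verbal behaviour": (26, 15),
}

total_items = sum(n_items for n_items, _ in table_4.values())
total_retained = sum(n_retained for _, n_retained in table_4.values())


def share_meeting_threshold(package):
    """Percentage of a package's items that scored SMRS >= 2.0."""
    n_items, n_retained = table_4[package]
    return 100.0 * n_retained / n_items


print(total_items, total_retained)  # matches the Total row: 1332 and 727
```

The helper also makes visible how unevenly scientific merit is distributed across packages: all 22 joint attention items met the threshold, compared with 18 of the 56 reductive items.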

Database of completed reviews

The 1332 items were located by NSP reviewers, whose literature search was not constrained to the same 10-year block as ours (1998-2007); 823 of the NSP items (i.e., 62%) were published in those years. With the 88 NZ-unique items, the total number of items in the databases which we review in the results sections sub-headed “Evidence from studies published from 1998-2007” was 911.

Final item exclusion criterion introduced. The SECS does not take account of articles (or items) with an SMRS score of < 2.0. Consequently, we report only on reviews of items which scored 2.0 or greater on the SMRS. Applying this last criterion, 45 items from the original NZ-unique list were retained and the rest moved to our exclusion list. For the same reason, we report on the findings from only 463 items from the NSP database. Hence the database of items from 1998-2007 with SMRS ≥ 2.0 contained 508 items.
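The item counts reported in this section can be retraced step by step. The figures below are those stated in the text; the variable names are illustrative.

```python
# Counts as stated in the text.
nsp_items_all_years = 1332    # all items in the NSP database
nsp_items_1998_2007 = 823     # NSP items published 1998-2007
nz_unique_items = 88          # items unique to the New Zealand database
nz_unique_retained = 45       # NZ-unique items with SMRS >= 2.0
nsp_retained = 463            # NSP items (1998-2007) with SMRS >= 2.0

# Share of NSP items that fall inside the 1998-2007 window.
share_in_window = 100.0 * nsp_items_1998_2007 / nsp_items_all_years

# Items from 1998-2007 before the SMRS criterion is applied.
items_1998_2007 = nsp_items_1998_2007 + nz_unique_items

# Items remaining once only SMRS >= 2.0 items are kept.
final_database = nz_unique_retained + nsp_retained

print(round(share_in_window), items_1998_2007, final_database)
```

The three printed values correspond to the 62%, 911 and 508 reported above.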

See Figure 10 for a flow chart showing the origin of the numbers of items in the final database.

NSP/NZ inter-site agreement between reviewers

We used the data from duplicated reviews to calculate interobserver agreement between NSP and New Zealand reviewers. The duplications occurred because NSP chose to review independently 55 articles (81 items) from our original NZ-unique list. Agreement on composite SMRS score between NSP and New Zealand reviewers was calculated for the first 70 items for which we had both datasets. Allowing a variation of 0.5 in SMRS scores between reviewers, inter-site agreement was 84%. NSP SMRS scores were higher on 59% of items where there was a difference in scores between sites. For only three items (4% of items) did the scores differ by more than 1 SMRS point. Hence, agreement between sites was satisfactory and not notably biased by either site’s particular methods of scoring. This is an encouraging finding, since New Zealand reviewers’ SMRS data were scored directly into evidence tables whereas NSP review data were scored on electronic forms and then converted into their evidence tables.
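A minimal sketch of the inter-site comparison follows. It assumes that two scores agree when they differ by no more than 0.5 (i.e., the tolerance is inclusive), which the text does not state explicitly. The function names and the sample scores are hypothetical and are not the review data.

```python
def agreement_within_tolerance(site_a, site_b, tolerance=0.5):
    """Percentage of paired items whose scores differ by at most `tolerance`."""
    pairs = list(zip(site_a, site_b))
    if not pairs:
        raise ValueError("no paired scores to compare")
    within = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return 100.0 * within / len(pairs)


def share_higher_when_different(site_a, site_b):
    """Among items scored differently, percentage on which site_a scored higher."""
    differing = [(a, b) for a, b in zip(site_a, site_b) if a != b]
    if not differing:
        return None  # no disagreements, so the share is undefined
    higher = sum(1 for a, b in differing if a > b)
    return 100.0 * higher / len(differing)


# Hypothetical composite SMRS scores for five items rated at both sites.
nsp_scores = [3.0, 2.5, 4.0, 2.0, 3.5]
nz_scores = [3.0, 2.0, 2.5, 2.0, 4.0]

print(agreement_within_tolerance(nsp_scores, nz_scores))   # 4 of 5 pairs are within 0.5
print(share_higher_when_different(nsp_scores, nz_scores))  # NSP higher on 2 of 3 differing items
```

Reporting the direction of disagreement alongside the agreement percentage is what allows a judgement about bias: a share near 50% suggests that neither site scored systematically higher.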

Agreement on rating of main effect, i.e., whether the item showed beneficial, unknown, ineffective, or adverse effects for participants with ASD, was calculated between sites for the same 70 items that had been reviewed independently by both teams. Agreement on category (e.g., both teams reported “beneficial”) was 74%. All cases of disagreement were between the categories of beneficial and unknown. For all but one of the items with disagreement, the NSP reviewers were more conservative in their ratings. It is for this reason that we decided to report NSP findings for items that both teams of reviewers rated.

Figure 10: Flow Chart showing Origins of ABA research items in databases 


  4. We have permission from NSP to review their pre-publication results. We did not seek NSP’s consent to pre-empt their publication by presenting every detail of their results.