Education Counts

TIMSS 1994: Performance assessment in TIMSS: New Zealand results

Publication Details

The aim of this report is to present the tasks, and indicate the performance expectations for which measures were sought, with New Zealand and mean international success rates, and comment where this seems desirable. Direct comparisons between New Zealand success rates and those of selected countries will be made for some items.

Author(s): Robert Garden

Date Published: November 1997

Executive Summary

Why Performance Assessment?

Results from large-scale surveys of achievement have sometimes been criticised because multiple-choice has been the only item-type used in the tests. Tests based on multiple-choice items allow wide coverage of curricula and give reliable measures of achievement, but there are some skills and procedures taught in schools which are thought to be best measured by having students write their own answers to questions, or carry out tasks, which allow students to demonstrate whether or not they have learned these skills and procedures and are able to apply them. Tests for the Third International Mathematics and Science Study (TIMSS) therefore included a range of assessment methods — multiple-choice items, free-response items (short-answer and extended-response), and performance assessment tasks.

What is Performance Assessment?

All achievement test items, including multiple-choice and free-response items, assess student performance. "Performance assessment" is the term most often used in the literature for assessment tasks in which students are required to carry out "hands-on" activities with equipment to show how well they are able to apply strategies and procedures to investigate and solve problems in practical settings. Other terms used in the literature include "alternative assessment", "practical assessment", and "authentic assessment". In TIMSS, tasks consisted of a number of sub-tasks, each of which was assessed. These sub-tasks are referred to as items. They range from demonstrating knowledge or skills needed to carry out the task, to complex outcomes such as describing, or critiquing, an experimental plan.

Who Took Part?

The TIMSS written tests in mathematics and science were administered in more than 40 countries, but inability to raise funding, and lack of interest in performance assessment in some centres, resulted in only about half of these countries administering the performance assessment tasks.

Countries

The following countries participated in the performance assessment component of TIMSS:

Population 1          Population 2
Australia             Australia             Portugal
Canada                Canada                Romania
Cyprus                Cyprus                Scotland
Colombia              Czech Republic        Singapore
Hong Kong             England               Slovenia
Iran, Islamic Rep.    Hong Kong             Spain
Israel                Iran, Islamic Rep.    Sweden
New Zealand           Israel                Switzerland
Portugal              Netherlands           United States
Slovenia              New Zealand
United States         Norway

Students

Three populations took written tests, but the most senior of these, students in their final year of schooling, did not take part in the performance assessment component of TIMSS. The two other TIMSS populations were:

Population 1: Students in the two adjacent grades with the largest proportion of 9-year-olds at the time of testing (standards 2 and 3 in New Zealand); and

Population 2: Students in the two adjacent grades with the largest proportion of 13-year-olds at the time of testing (forms 2 and 3 in New Zealand).

Written tests were primarily targeted at the upper class level in each of these populations, and it was to samples of students drawn from these levels, i.e. standard 3 and form 3, that the performance tasks were administered. Names for these levels vary across countries, but in this report they are referred to as standard 3 and form 3 levels respectively.

Both teachers and students in the participating countries showed great interest in the exercise. Students in many places were reported to have been enthusiastic about taking part. Administrators said that most students enjoyed attempting the tasks, even when they were not always successful. However, it has to be recognised that this might not have been the case had TIMSS been a high-stakes assessment for the students.

This Report

The aim of this report is to present the tasks, and indicate the performance expectations for which measures were sought, with New Zealand and mean international success rates, and comment where this seems desirable. Direct comparisons between New Zealand success rates and those of selected countries will be made for some items.

Aggregate scores for some tasks vary slightly (of the order of 1 or 2 percent) from those quoted in earlier national reports because of weighting or, in the case of international means, different methods of calculation. In a very few cases the differences are greater because, when all the data had been received, TIMSS management at the international centre judged it desirable to collapse some codes.

International Reports

Readers interested in more detailed descriptions of the development of the Performance Assessment component of TIMSS, in technical aspects of the data analysis, and comparative data for all participating countries, will find them in the TIMSS Technical Report (Martin & Kelly, 1996) and the international report of the Performance Assessment (Harmon et al, 1997).

The Challenges

Performance assessment tasks were included in TIMSS for two reasons. First, there was a desire to measure achievement in as many mathematics and science curriculum objectives as feasible, in order to increase the validity of the assessment. Second, studies of performance assessment in the past decade have given rise to questions about its feasibility for use in large-scale surveys, and whether inclusion of performance assessment tasks provides useful information not provided by traditional written tests. TIMSS provided an opportunity to investigate these, and other, research questions.

In assessing aspects of student achievement using performance assessment, the cost of what is seen as enhanced validity is lower reliability than is usually attained with traditional pencil and paper tests (Moss, 1992), unless large numbers of tasks are used (Shavelson et al, 1993). This can be accepted so long as the reliability does not fall to a level where measures are so unreliable that all validity is lost. If different raters cannot agree on whether or not a student has completed a task, or an item within a task, successfully, or if individuals give inconsistent ratings, the measures cannot be valid for any purpose.

The challenge for TIMSS was therefore to produce tasks which would give measures of achievement of curricular objectives which experts from the participating countries would agree were valid for this purpose, and which were sufficiently reliable to allow comparisons between country means and, in some cases, between groups within countries. In essence each task had to be testing the same things in the same way, and under comparable conditions, whether in New Zealand, Hong Kong, Norway, Iran, or any other participating country.

In addition, because a student's achievement as measured by performance tasks tends to be very dependent on the particular tasks (Linn & Burton, 1994), it was necessary to have as many students attempt as many tasks as possible. On the other hand, the financial cost per student of performance testing is high and the maximum time available to administer the tasks was 90 minutes.

Cost limited the study in several ways. Of the countries taking a full part in TIMSS at the form 3 level, only 21 secured funding to administer the performance assessment component of TIMSS and, of these countries, only 10 also did so at the standard 3 level. Cost was one factor that ruled out tasks requiring more elaborate equipment and, in some countries, travel costs meant that remote schools could not be included in the performance assessment sub-sample.

The Solutions

Standardisation of tasks and procedures across countries, and across settings within countries, was accomplished through the following actions:

  • trialling of about twice as many tasks as required so that those selected required equipment and materials that were widely available or easily replicated;
  • analysis of trial data and information supplied by national research coordinators from each country to identify and reject tasks or sub-tasks affected by differing geographic, climatic, or cultural conditions;
  • development of a manual for test administrators, setting out in detail the procedures to be followed in preparing for and administering the performance assessment, including specifications for equipment and materials to be used (TIMSS, 1994a);
  • provision of a manual, with exemplars, detailing how coding was to be carried out (TIMSS, 1994b & 1995);
  • provision of training in administration of the assessment, and training in coding the student data.


The need to maximise the number of tasks and students, yet to keep costs within reasonable limits, gave rise to a design which involved:

  • a form of multiple-matrix sampling in which each student attempted either three or four of the 12 tasks in the main survey;
  • rotation of tasks amongst students by a scheme which simultaneously keeps error within acceptable bounds and provides data in a form which allows key analyses addressing research questions to be carried out;
  • data based on student responses being provided in written form, rather than on observation by trained observers.

Selecting the Tasks

Tasks were collected from several of the TIMSS national centres, and from research and evaluation agencies. Of these tasks, 22 were selected for trialling. Nineteen countries trialled the tasks, under standardised conditions, with samples of students. Following the trials, committees of subject-matter specialists, performance assessment administrators, and national research coordinators in each country reviewed and evaluated the tasks. Reports from these committees and data from the field trials were used by the TIMSS Performance Assessment Committee in selecting the 12 tasks required for the main survey.

Trial tasks, or sub-tasks (items), were rejected if they had proved too difficult for students, had received low quality ratings from subject-matter experts, or had run into problems in administration. Some were rejected because students could not complete them in time, some because standardisation of the equipment was difficult, and some because differing climatic conditions (such as humidity) affected the materials or equipment differently in various geographic regions.

From the remaining tasks, 12 were selected. This set included some which were judged to need 30 minutes for completion, and some to need 15 minutes. The investigations, and problems to be solved, balanced science and mathematics content, and represented a range of topic, skill, and procedure areas. Several complete tasks, and a number of sub-tasks, were identical for both populations tested (standard 3 and form 3 in New Zealand).

The Design

One of the research issues to be addressed was the question of whether information about student achievement collected by means of performance assessment differed from that collected by traditional pencil and paper tests. The sample of students selected to participate in each country was therefore a sub-sample of the students who had completed the written TIMSS tests and questionnaires a few days earlier. This will allow each student's performance assessment data to be associated with the written test data, as well as the student, teacher, and school background data collected for each student.

In selecting the national sub-samples for performance assessment, national centres were permitted to exclude schools which had fewer than nine students in the target class, and schools which were so remote that it would have been too expensive to send a trained administrator to them. Such exclusions were to be kept to a minimum and, for most countries, the potential for bias (so far as national representativeness of the samples is concerned) was considered to be offset by maintenance of a high quality of project management, and the various quality control measures in place.

In New Zealand, the exclusion rate at standard 3 level was high (27%), partly because of the high proportion of very small schools and partly because of the remoteness factor. The potential for bias in achievement measures from this source is obvious, but comparison between the written test means for rural and urban standard 3 students revealed no significant difference, so it is a reasonable assumption that bias in the performance measures, if it exists at all, is very small (see Table 1.1).

Table 1.1: Comparison of means for rural and urban students

Class Level    Student    Mathematics Mean %    Science Mean %
Form 3         Rural      52                    57
               Urban      54                    58
Standard 3     Rural      55                    62
               Urban      53                    60

Note:
  1. Source: Garden (1996a, 1996b, 1997).


A direct comparison of TIMSS written test mean achievement of the performance assessment sub-samples with the sub-samples not selected for performance assessment (Table 1.2) indicates that no significant bias in overall mean achievement occurred, and it is very unlikely that achievement distributions for the performance assessment sub-samples were skewed with respect to the respective populations.

Table 1.2: Comparisons of sub-sample written test means

Class Level    Sub-Sample                          Maths Mean %    Science Mean %
Standard 3     Performance Assessment Girls        54              60
               Non-Performance Assessment Girls    54              62
               Performance Assessment Boys         54              61
               Non-Performance Assessment Boys     52              59
Form 3         Performance Assessment Girls        53              52
               Non-Performance Assessment Girls    51              50
               Performance Assessment Boys         52              53
               Non-Performance Assessment Boys     53              54
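
The size of the gaps in Table 1.2 can be checked with a few lines of Python. This is only an illustrative sketch: the figures are transcribed from the table, while the dictionary layout and the names `table_1_2`, `differences`, and `largest_gap` are assumptions made for the example. It reports raw differences in percentage points and says nothing about statistical significance, which would need standard errors that are not given here.

```python
# Written-test means (%) transcribed from Table 1.2, stored as (maths, science).
# "pa" = performance assessment sub-sample, "non_pa" = students not selected.
table_1_2 = {
    ("Standard 3", "Girls"): {"pa": (54, 60), "non_pa": (54, 62)},
    ("Standard 3", "Boys"): {"pa": (54, 61), "non_pa": (52, 59)},
    ("Form 3", "Girls"): {"pa": (53, 52), "non_pa": (51, 50)},
    ("Form 3", "Boys"): {"pa": (52, 53), "non_pa": (53, 54)},
}


def differences(table):
    """Return PA minus non-PA differences (percentage points) for each group."""
    return {
        group: tuple(p - n for p, n in zip(means["pa"], means["non_pa"]))
        for group, means in table.items()
    }


diffs = differences(table_1_2)
largest_gap = max(abs(d) for pair in diffs.values() for d in pair)

for (level, sex), (maths, science) in diffs.items():
    print(f"{level} {sex}: maths {maths:+d}, science {science:+d}")
print("largest absolute gap (percentage points):", largest_gap)
```

On these figures no sub-sample mean differs from its counterpart by more than two percentage points.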


The time available for testing (90 minutes) allowed each student to complete three 30-minute tasks, or two 30-minute tasks and two 15-minute tasks. Tasks were grouped into nine 90-minute sets, and for administration each of these sets was arranged at a "station". Clusters of nine students took part simultaneously and changed stations at 30-minute intervals, so that each student attempted three or four tasks. Stations to be visited by each student were pre-allocated at national level to ensure that each task was attempted by approximately equal numbers of randomly selected students, and so that the allocation of stations to students was also random.
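
One way to picture the rotation is a simple cyclic schedule for a single cluster of nine students. The sketch below is an assumption made for illustration, not the TIMSS allocation itself, which was drawn up at random at national level. It shows only that nine students can each visit three of nine stations in three 30-minute rounds without any two students sharing a station. The names `rotation_schedule`, `NUM_STATIONS`, `ROUNDS`, and `ROUND_MINUTES` are hypothetical.

```python
NUM_STATIONS = 9     # stations set out for one testing session
ROUNDS = 3           # each student visits three stations
ROUND_MINUTES = 30   # students change stations every 30 minutes


def rotation_schedule(num_students=NUM_STATIONS, rounds=ROUNDS):
    """Return schedule[student][round] = 0-based station index.

    Hypothetical cyclic scheme: each student starts at a different station
    and moves one station along in each round.
    """
    return [
        [(student + r) % NUM_STATIONS for r in range(rounds)]
        for student in range(num_students)
    ]


schedule = rotation_schedule()
total_minutes = ROUNDS * ROUND_MINUTES

for student, stations in enumerate(schedule, start=1):
    print(f"student {student}: stations {[s + 1 for s in stations]}")
print("total testing time (minutes):", total_minutes)
```

Under this scheme every station is occupied by exactly one student in each round, and each station is used by three students over the 90 minutes.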

Six hundred and thirteen students from 50 standard 3 classes in New Zealand schools, and 824 form 3 students from 49 classes in New Zealand schools, participated in both the performance assessment and written test components of TIMSS. This meant that approximately 205 standard 3 students and approximately 270 form 3 students attempted each task at the respective levels. This was a greater number than in most countries because, where possible in New Zealand, two clusters of nine students per class were taken, whereas other national centres commonly selected only one cluster of nine per class. Only Canada had larger samples.
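
The approximate numbers of students per task can be reproduced with a short calculation. The sketch assumes that each task was set out at exactly one of the nine stations and that every student visited three stations, so that about one third of the sample met any given task. That assumption is inferred from the design described above rather than stated in the report, and the function name `expected_per_task` is hypothetical.

```python
def expected_per_task(num_students, stations_visited=3, num_stations=9):
    """Expected number of students attempting a task placed at one station."""
    return num_students * stations_visited / num_stations


# Sample sizes quoted in the text for New Zealand.
samples = {"standard 3": 613, "form 3": 824}

for level, n in samples.items():
    print(f"{level}: about {round(expected_per_task(n))} students per task")
```

This gives roughly 204 and 275 students per task, in line with the approximate figures of 205 and 270 quoted in the text.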

Administration

Ideally, each task needed to be identical from school to school, and from country to country. Similarly, interactions between students attempting the tasks and the test administrator needed to conform to the same criteria for all students. Equipment for the tasks was therefore prepared centrally in each country according to strict specifications, and the detailed manuals for test administrators (TIMSS 1994a, 1994b, & 1994c) were distributed from the international study centre. Representatives from each participating country were trained in administering the assessment, and they in turn conducted training sessions in their own countries.

In New Zealand, performance assessment was directed by Robyn Caygill from the Educational Assessment Research Unit at the University of Otago. Twenty-one teachers received two days' training in administering the tasks, and were released from their teaching positions for two weeks. They were each allocated schools in which to carry out the assessment.

Besides ensuring that the equipment and accompanying instructions for students were in good order and laid out at the correct stations, administrators checked that students were at their correct stations at the beginning of each 30-minute spell, saw to it that test conditions were maintained, and collected student work. Prior to the assessment beginning they showed students how to use the stopwatches provided, and made sure that students understood how to read the rulers and thermometers provided. Ability to use this equipment was essential to being able to complete certain tasks in which use of equipment was not the performance being measured. Such instruction was not given where use of equipment was one of the outcomes being measured, and nor was instruction or assistance permitted with other procedures, or for student questions relating to the tasks. The only concession in this respect was that administrators could read task instructions to any standard 3 students unable to do so for themselves.
