The following article is a summary of a chapter in Adams, G., & Engelmann, S. (1996). Research on Direct Instruction. Ordering information follows the article.
Project Participants and Models
The Follow Through project was the largest, most expensive
educational experiment ever conducted. This federal program was
originally designed to be a service-oriented project similar to Head
Start. However, because of funding cutbacks the emphasis was shifted
from service to program evaluation. Over 75,000 low income children
in 170 communities were involved in this massive project designed to
evaluate different approaches to educating economically disadvantaged
students from kindergarten through grade 3. State, school, and
national officials nominated school districts that had high numbers
of economically disadvantaged students. Parent representatives of
these school districts chose to participate after hearing
presentations from the 20 different program designers (sponsors).
Each participating district implemented the selected sponsor's
approach in one or more schools. For participating, each district
received $750 per student beyond the normal level of funding.
Each sponsor was required to:
·"provide the community with a well-defined, theoretically consistent and coherent approach that could be adapted to local conditions;
·provide the continuous technical assistance, training, and guidance necessary for local implementation of the approach;
·exercise a 'quality control' function by consistently monitoring the progress of program implementation;
·serve as an agent for change as well as a source of program consistency by assisting the community in retaining a consistent focus on the objectives and requirements of the approach rather than responding in an ad hoc manner to the daily pressures of project operations;
·ensure implementation of a total program, rather than a small fragment, such as reading, with a resulting possibility for a major impact on the child's life, and
·provide a foundation for comprehending and describing results of evaluation efforts" (Stebbins, St. Pierre & Proper, 1977, p. 5)
The orientation of the sponsors varied from the loosely-structured
open classroom approach to the highly-structured behavior analysis
approach. Nine of the original sponsors qualified for inclusion in
the evaluation. To be included, a sponsor had to have more than three
active sites that could be compared to control sites in the same
communities.
Abt Associates used the system developed by White to classify the
approaches of the different models. The first dimension was the
theoretical orientation of the models:
·The behavioristic approach is based on the belief that all behaviors are learned. Disadvantaged children are behind because no one has taught them the necessary social and academic skills. Training begins with selecting the behavioral objectives that are needed; teachers then reinforce the steps in those objectives. The general label for this group became the Basic Skills Models.
·The cognitive development approach is based on the sequence of normal cognitive growth. Disadvantaged children are behind because they have had insufficient normal cognitive experiences. The orientation of this approach is to provide interactions between children and teachers. During these interactions, children learn how to solve problems and learn verbal skills based on a self-directed process. Emphasis is placed on the teacher providing age-appropriate cognitive materials and experiences. The general label for this group was the Cognitive/Conceptual Skills Models.
·The psychodynamic approach is based on the assumption that socioemotional development (the development of the "whole child") is essential to educational improvement. Emphasis is placed on trying to improve children's self-esteem and peer interactions. The goal for the teacher is to provide an environment in which children can move toward self-actualization by making their own free choices. It is assumed that children know what is best for their personal growth. The general label for this group was the Affective Skills Models.
Basic Skills Models
Direct Instruction Model (University of Oregon): Developed by
Siegfried Engelmann and Wes Becker, this model used the DISTAR
(DISTAR is an acronym for Direct Instruction System for Teaching And
Remediation) reading, arithmetic, and language programs. The model
assumes that the teacher is responsible for what the children
learn.
Behavior Analysis Model (University of Kansas): Developed by
Donald Bushell, this model used a behavioral (reinforcement) approach
for teaching reading, arithmetic, handwriting, and spelling. Social
praise and tokens were given to the children for correct responses
and the tokens were traded for desired activities. Teachers used
programmed reading programs in which the task was presented in small
steps. The instructional program was not specified by the model. Two
sites used the DISTAR materials. Many used Sullivan Programmed
Phonics. Students were monitored and corrective procedures were
implemented to ensure student progress.
Language Development (Bilingual) Model (Southwest Educational
Development Laboratory): This curriculum-based model used an
eclectic approach based on language development. When appropriate,
material was presented first in Spanish and then in English.
Cognitive/Conceptual Skills Models
Cognitively-Oriented Curriculum (High Scope Foundation): This
popular program was directed by David Weikart and was based on
Piaget's belief that there are underlying cognitive processes.
Children were encouraged to schedule their own activities and then
follow their schedules. The teacher modeled language through the use
of labeling and explaining causal relationships. Also, the teacher
fostered a positive self-concept through the way the students were
given choices.
Florida Parent Education Model (University of Florida): Based on
the work of Ira Gordon, this program taught parents of disadvantaged
children to teach their children. At the same time, students were
taught in the classroom using a Piagetian approach. Parent trainers
coordinated the teaching. Emphasis included not only language
instruction, but also affective, motor, and cognitive skill
instruction.
Tucson Early Education Model (University of Arizona): Developed
by Marie Hughes, TEEM used a language-experience approach (much like
the whole language approach) that attempted to elaborate the child's
present experience and interest. The model was based on the
assumption that children have different learning styles so the
child-directed choices are important. The teacher assists by helping
children compare, recall, and locate relationships.
Affective Skills Models
Bank Street College Model (Bank Street College of
Education): This model used the traditional middle-class nursery
school approach that was adopted by Head Start. Through the use of
learning centers, children had many options, such as counting blocks
and quiet areas for reading. The teacher is responsible for program
implementation by taking advantage of learning situations. The
classroom is structured to increase learning opportunities.
Open Education Model (Education Development Center): Derived from
the British Infant School model, this model focuses on building the
children's responsibility for their own learning. Reading and writing
were not taught directly, but through stimulating a desire to
communicate.
Responsive Education Model (Far West Laboratory): Developed by
Glen Nimnicht, this is an eclectic model using the work of O.K. Moore,
Maria Montessori, and Martin Deutsch. The model used learning centers
and the child's interests to determine when and where the child is
stationed. The development of self-esteem is considered essential to
the acquisition of academic skills.
Program Design
Each model had 4 to 8 sites with children who started school in
kindergarten, and some models also had sites with children who
started in first grade. Each Follow Through (FT) school district
identified a non-Follow Through (NFT) comparison school for each
Follow Through site. The comparison school acted as a control group.
Unfortunately, the NFT sites that were selected tended to have
children who were less economically disadvantaged than the Follow
Through sites. Because of this problem, Abt Associates used a
covariance statistical analysis process to adjust for initial
differences.
A total of 9,255 FT and 6,485 NFT children were in the final analysis
group. Students in each school district site were tested at entry and
then each spring until the third grade. The DI Model group included
low-income students in 20 communities. These communities varied
widely, both rural and urban, and included blacks, whites, Mexican
Americans, Spanish Americans, Native Americans, and a diverse mixture
of other ethnic groups.
The Stanford Research Institute was initially awarded a contract for
data collection and Abt Associates received a contract for data
analysis. The Office of Education determined the final design of the
project with consultation from the Huron Institute. Because the
sponsors had different approaches, the data collection was
comprehensive. Assessment information was collected in the areas of
basic skills (academic), cognitive, and affective behavior. The
process of selecting appropriate assessment instruments was an
arduous task given the time constraints of trying to select the most
reliable, valid tests that could be administered in the least amount
of time.
The following tests were used to assess basic skills, cognitive, and
affective achievement: the Metropolitan Achievement Test (MAT), the
Wide Range Achievement Test (WRAT), the Raven's Coloured Progressive
Matrices, the Intellectual Achievement Responsibility Scale (IARS+
and IARS-), and the Coopersmith Self-Esteem Inventory. The MAT is a
respected achievement test that assesses Basic Skills and
Cognitive-Conceptual Skills. The Basic Skills scales of the MAT
included Listening for Sound (sound-symbol relationships), Word
Knowledge (vocabulary words), Word Analysis (word identification),
Mathematic Computation (math calculations), Spelling, and Language
(punctuation, capitalization, and word usage). The WRAT measured
number recognition, spelling, word reading, and oral and written math
problems.
The Cognitive Skills scales of the MAT included Reading
(comprehension of written passages), Mathematics Concepts (knowledge
of math principles and relationships), and Mathematical Problem Solving
(the use of reasoning with numbers). Also, the Raven's Coloured
Progressive Matrices was used. The Raven's test, however, did not
prove to discriminate between models or show change in scores over
time.
Affective Skills were assessed using two instruments. The IARS was
designed to assess whether children attribute their successes (+) or
failures (-) to themselves or external forces. The Coopersmith
Self-Esteem Inventory is designed to assess how children feel about
themselves, the way they think other people feel about them, and
their feelings about school.
Comparisons Across Follow Through Sponsors
Students started in either kindergarten or first grade and were
retested yearly through the end of third grade. While critics have
complained about test selection and have usually suggested more
testing, the assessment effort of this study was well beyond any
other educational study conducted before or since.
Significant Outcome Comparison
Abt Associates analyzed the data by comparing each Follow Through
model's scores to both the local comparison group and the national
pooled comparison group (created by combining the comparison groups
from all nine Follow Through models). Local comparison scores and
national pooled comparison scores were used as covariates to analyze
each variable. A plus (+) was given if (a) the Follow Through (FT)
group exceeded the Non-Follow Through (NFT) group by one-fourth
standard deviation (.25 effect size) and (b) the difference was
statistically significant. A minus (-) was given if the NFT score
exceeded the FT score by one-fourth standard deviation (.25 effect
size) and was statistically significant. If the results did not reach
either the plus or the minus criterion, the difference was null and
left blank.
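The plus/minus rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Abt Associates' actual procedure: the function name `classify_outcome`, its parameters, and the sample numbers in the usage note are hypothetical. The sketch assumes the group means are already covariance-adjusted, that the significance test has been run elsewhere, and that "exceeded by one-fourth standard deviation" means an effect size of at least .25.

```python
def classify_outcome(ft_mean, nft_mean, pooled_sd, is_significant, threshold=0.25):
    """Return "+", "-", or "" (null) for one FT versus NFT comparison.

    Assumptions (not from the Abt report): the means are already
    covariance-adjusted, and is_significant is the result of a
    significance test computed elsewhere.
    """
    # Effect size: difference between groups in standard deviation units.
    effect_size = (ft_mean - nft_mean) / pooled_sd

    # Both conditions must hold: large enough AND statistically significant.
    if is_significant and effect_size >= threshold:
        return "+"
    if is_significant and effect_size <= -threshold:
        return "-"
    return ""  # null result, left blank
```

For example, with hypothetical scores, `classify_outcome(52.0, 48.0, 10.0, True)` gives a plus (effect size .40), while the same difference without statistical significance is left blank.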
The following index is based on a comparison of each model's sites
with the local and pooled national comparison groups. If either the
pooled or local comparison was a plus (+), the effect is recorded as a
plus. If either or both were a minus (-), the effect is recorded as a
minus. Then the plus and minus values are summed and multiplied by
100 so that the possible range of scores is from -100 to 100. If the
Follow Through model group scored consistently higher than the
comparison group on a variable, then the index would be a positive
number. If the comparison group scored higher, the index would be
negative. If there was no difference between the two groups, the
score would be zero (0.00).
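One way to read the index described above is sketched below in Python. The text does not spell out two details, so both are assumptions here: the summed values are divided by the number of comparisons (which is what keeps the result between -100 and 100), and a minus takes precedence when the local and pooled comparisons disagree. The names `combine` and `significant_outcomes_index` are hypothetical.

```python
def combine(local, pooled):
    """Merge the local and pooled results ("+", "-", or "") into +1, -1, or 0.

    Assumption: a minus in either comparison takes precedence over a plus.
    """
    if "-" in (local, pooled):
        return -1
    if "+" in (local, pooled):
        return 1
    return 0


def significant_outcomes_index(effects):
    """Compute the index from a list of (local, pooled) result pairs.

    Assumption: the sum of plus and minus values is divided by the number
    of comparisons before multiplying by 100, so the range is -100 to 100.
    """
    if not effects:
        raise ValueError("at least one comparison is required")
    total = sum(combine(local, pooled) for local, pooled in effects)
    return 100 * total / len(effects)
```

With four hypothetical comparisons, two pluses, one minus, and one null, the index is 100 * (2 - 1) / 4 = 25.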
Figure 1 shows the results of this analysis. As you can see by the
number of negative scores, the local or national pooled comparison
group scores were higher than most Follow Through models.
Only the Direct Instruction model had positive scores on all three
types of outcomes (Basic Skills, Cognitive, and Affective). Overall,
the Direct Instruction model was highest on all three types of
measures.
Figure 1: Significant Outcomes Comparison Across Follow Through
Models
The results were very different from expectations suggested by the
model orientations. The three programs in the Basic Skills group had
the best basic skills, cognitive skills, and affective skills scores.
Of the three orientations, the Basic Skills models (Direct
Instruction, Behavior Analysis, and Southwest Lab) had the best basic
skills scores. The Cognitive models (Parent Education, TEEM, and
Cognitively-Oriented Curriculum) ranked second in cognitive skills
scores; however, their average rank of 5.0 is far from the average rank
of 2.8 for the Basic Skills models. The Affective models had the worst
affective ranks (6.7 compared to 2.7 for the Basic Skills
models).
Figure 1 provides more details on the models' rankings. The DI model
had, by far, the highest basic skills scores while the other two
Basic Skills models had more modest results (the Behavior Analysis
model had a slight positive score and the Southwest Labs model score
was 0.0).
Figure 1 also shows that none of the Cognitive Models had positive
cognitive scores. In fact, the Direct Instruction Model was the only
model of the nine that had a positive cognitive score (and the
results were extremely positive - over 35%). In contrast, students in
two of the three cognitively-oriented models [TEEM and Cognitive
Curriculum (High Scope)] had the lowest cognitive scores.
Critics have often complained that the DI model was a pressure cooker
environment that would negatively impact students' social growth and
self-esteem. As the Abt Associates' authors note:
Critics of the model have predicted that the emphasis of the model on
tightly controlled instruction might discourage children from freely
expressing themselves and thus inhibit the development of self-esteem
and other affective skills. (Stebbins, St. Pierre & Proper, 1977, p. 8)
Because of this expectation, the affective scores are of interest.
Three of the five lowest scoring models on the affective domain were
models that targeted improving affective behavior; none of the
affective models had positive affective scores. In contrast, all
Basic Skills models had positive affective scores with the Direct
Instruction model achieving the highest scores. The theory that an
emphasis on basic skills instruction would have a negative impact on
affective behavior is not supported by the data. Instead, it appears
that the models that focused on an affective education not only had a
negative impact on their students' basic skills and cognitive skills,
but also on their affective skills.
Fine Tuning the Results
The Bereiter-Kurland Reanalysis. A group funded by the
Ford Foundation (House, Glass, McLean, & Walker, 1978) questioned
certain aspects of the test selection and data analysis in the Abt
report. After reviewing the critiques of the Abt report (House et
al., 1978), Bereiter and Kurland (1981-1982) reanalyzed the data of
that report in response to the criticisms that the report used an
inappropriate unit of measurement for the dependent variable and
inappropriate covariates. The Bereiter-Kurland reanalysis was based
on:
·Using the site means as the dependent variable.
·Using these site scores as covariates: socio-economic status and ethnic and linguistic difference from the mainstream.
·Using only models that had data from 6 or more sites.
Each model had the possibility of 77 statistically significant
differences (7 other models times 11 MAT subscale scores). Fifty of
the 77 (65%) possible differences for the DI group were statistically
significant based on Newman-Keuls tests (p = .05). In contrast, the
Behavior Analysis group showed only 18 of 77 (23%) significant
differences.
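The counts in this paragraph can be checked with simple arithmetic. The short Python sketch below only reproduces the figures quoted above; the variable and function names are illustrative.

```python
OTHER_MODELS = 7     # each model is compared with the 7 other models
MAT_SUBSCALES = 11   # on each of the 11 MAT subscale scores

# Number of possible pairwise differences for one model.
possible_differences = OTHER_MODELS * MAT_SUBSCALES


def percent_significant(significant, possible):
    """Share of possible differences that were significant, as a whole percent."""
    return round(100 * significant / possible)


di_share = percent_significant(50, possible_differences)   # Direct Instruction
ba_share = percent_significant(18, possible_differences)   # Behavior Analysis
```

Here `possible_differences` is 77, and the two shares round to 65% for Direct Instruction and 23% for Behavior Analysis, matching the percentages reported.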
None of the other six models showed any statistically significant
differences on any of the 11 MAT subscales (0 of 396 possible
combinations). This means, for example, that none of the 11 MAT Bank
Street scores differed significantly from any of the Responsive
Education, TEEM, Cognitive Curriculum, Parent Education, or Open
Education mean scores.
Another way of showing the difference between models was through the
use of effect size comparisons. Figure 2 shows a different display of
the information provided by Bereiter and Kurland (also Figure 2 in
the Bereiter & Kurland review). In Figure 2, the effect size of
the DI model is compared to the average effect size for the other
Follow Through models. The differences are dramatic, even though the
DI data include the Grand Rapids site that did not truly implement
the DI model. The differences would be greater if only DI sites with
implementation fidelity were included.
Figure 2: Effect Size Comparison
(DI to Other Models)
To provide a clearer picture of the differences, Figures 3-4 display
the Bereiter-Kurland findings according to domain. First, Figure 3
shows a comparison of effects for the Basic Skills scores between the
DI group and the average effect size of the other Follow Through
groups. Remember that an effect size of .25 is thought to be educationally
significant. Differences in some MAT Basic Skills subscale scores
are over 3.0 (Total Language and Language B). The average difference
in Basic Skills scores between Direct Instruction and the other
models was 1.8.
Figure 3: Bereiter Analysis of
Basic Skills Abt Data*
Figure 4 shows the differences in the cognitive scores between the DI
model and the average Follow Through model. Effect sizes are above
1.0 for all but one difference.
Figure 4: Bereiter Analysis of
Cognitive Ability Abt Data
Overall, the Bereiter-Kurland reanalysis provides even stronger
support for the effectiveness of Direct Instruction. As the authors
noted, only the DI and Behavior Analysis models had positive results
and the DI model results were vastly superior.
Changing the Abt Report Criteria
Becker and Carnine (1981) had two other complaints about the Abt
Associates report, which resulted in the report underrepresenting the
superiority of the DI model. First, because of the problem of
mismatches between comparison groups that initially had higher entry
scores than the Follow Through model groups, Abt Associates deleted
these data from subsequent analyses. Unfortunately for the DI model,
sometimes the scores for the comparison groups were significantly
higher at entry, but by the end of third grade the DI group scored
significantly higher than the comparison groups. Abt Associates
decided to delete these groups because of the initial entry
differences. Also, data were excluded if there were significant
differences between the two groups in preschool experience per site,
even though preschool experience (e.g., Head Start) had only a very
low correlation with later achievement (-0.09). (This variable was
not used in the previously cited Bereiter-Kurland study.) Overall,
approximately one-third of the data was excluded from most Follow
Through models because of these decision rules.
Figures 5-7 show the differences in results based on these analyses.
When data were kept for sites where there were initial performance
differences, the highest scoring model (DI) scored even higher
whereas the lower scoring models (Cognitive Curriculum and Open
Education) scored even lower. The scores for the other models stayed
roughly the same.
Figure 5: Index for Significant Outcomes for Cognitive Measures
Figure 6: Index for Significant
Outcomes for Basic Skills Measures
Figure 7: Index for Significant
Outcomes for Affective Measures
Figure 8: Percentile scores across nine Follow Through models
Second, Becker and Carnine reanalyzed the Abt Associates results
without the Grand Rapids site. The Grand Rapids site stopped using
Direct Instruction when there was a change in program director. Even
though this problem was well documented, Abt Associates included the
Grand Rapids site in the DI data. Figures 5-7 show that the already
high scores for the DI group became even higher when the Grand Rapids
data were removed.
Norm-Referenced Comparisons
Another way of looking at the Abt Associates data is to compare
median grade-equivalent scores on the norm-referenced Metropolitan
Achievement Test that was used to evaluate academic progress. Unlike
the previous analysis that compared model data to local and pooled
national sites, the following norm-referenced comparisons show each
model's MAT scores based on the MAT norms. Figure 8 shows the results
across four academic subjects. The comparisons are made to a baseline
rate of the 20th percentile, which was the average expectation of
disadvantaged children without special help. The figure displays the
results in one-fourth standard deviation intervals.
Clearly, children in the DI model showed consistently higher scores
than children in the other models. Students in the Southwest Lab and
Open Education models were below expected levels of achievement,
based on norms of performance in traditional schools, in all four
academic subjects.
Only three of 32 possible reading scores of the other eight models
were above the 30th percentile. The DI students scored 7 percentile
points higher than the second place group (Behavior Analysis) and
over 20 percentile points higher than the Cognitive Curriculum (High
Scope), Open Education, and Southwest Lab Models.
Except for children in the DI model, the math results are
consistently dismal. The only other model above the 20th percentile
was the Behavior Analysis model. DI students scored 20 percentiles
ahead of the second place group (Behavior Analysis) and 37
percentiles higher than the last place group (Cognitive
Curriculum/High Scope).
In spelling, the DI model and the Behavior Analysis model were within
the normal range. DI students scored 2 percentiles above the second
place group (Behavior Analysis), 19 percentiles above the third place
group, and 33 percentiles above the last place group (Open
Education).
As in the previous academic subjects, the DI model was clearly
superior in language. DI students scored 29 percentiles above the
second place group (Behavior Analysis) and 38 percentiles above the
last place group (Cognitive Curriculum/High Scope).
Conclusions
For many people, normed scores are more familiar than the
index described in the previous section. No matter which
analysis is used, children who were in the DI model made the most
gains when compared to the other eight models. With the possible
exception of the Behavior Analysis model, all other models seem to
have little positive effect on the academic progress of their
children.
The increased amounts of money, people, materials, health and dental
care, and hot lunches did not cause gains in achievement. Becker
(1978) observed that most Follow Through classrooms had two aides and
an additional $350 per student, but most models did not show
significant achievement gains.
Popular educational theories of Piaget and others suggest that
children should interact with their environment in a self-directed
manner. The teacher is supposed to be a facilitator and to provide a
responsive environment. In contrast, the successful DI model used
thoroughly field-tested curricula that teachers should follow for
maximum success. The Follow Through models that were based on a
self-directed learner approach were at the bottom of academic
and affective achievement. The cognitively-oriented approaches
produced students who were relatively poor in higher-order thinking
skills and models that emphasized improving students' self-esteem
produced students with the poorest self-esteem.
Subsequent Analyses
Variability Across DI Sites
The Abt Associates findings were criticized by House, Glass, McLean,
and Walker (1978) and then defended by others (Anderson, St.
Pierre, Proper, & Stebbins, 1978; Becker, 1977; Bereiter & Kurland,
1981-82; Wisler, Burns, & Iwamoto, 1978). One Abt Associates
finding was that there was more variability within a model than
between models.
This finding is consistent with the often-cited belief that
"Different programs work for different children" or, put another way,
"Not all programs work with all children." The problem is that the
belief doesn't match the data, as the research results in the
following sections show.
Gersten (1984) provided an interesting picture of the consistency of
achievement scores of urban DI sites after the Abt report was
completed. Figure 9 shows the results in 3rd grade reading scores
from 1973 to 1981 in four cities. The reading scores are
consistently around the 40th percentile. Based on non-Follow Through
students in large Northwest urban settings, the expected score is the
28th percentile on the MAT. Some variability is due to the
differences between tests when some districts changed tests over the
nine-year period. Also, Gersten mentioned that the drop in the New
York scores in 1978 and 1979 may have been because of budgetary
reductions during those years.
Figure 10 shows the stability of math scores. The math scores for
the three sites with math data tend to be consistently around the
50th percentile. New York did not collect information on math during
this period. Based on the math scores of large Northwest cities,
non-Follow Through students would be expected to score at the 18th
percentile.
Figure 9: Total reading scores for K-3 students. Stability of effects: Percentile equivalents at the end of Grade 3.
Figure 10:
Follow-Up Studies
Fifth and Sixth Grade Follow-up
Some critics of DI have indicated that many, if not most, early DI
achievement gains will disappear over time. There are different
reasons given for this prediction. One reason given is that the DI
students were "babied" through sequences that made instruction easy
for them. They received reinforcement and enjoyed small group
instruction, but they would find it difficult to transition to the
realities of the "standard" classroom.
DI supporters give different reasons for suggesting that DI results
would decrease over time. The DI students were accelerated because
they had been taught more during the available time than they would
have been taught during the same time in a traditional program. Upon
leaving Follow Through, they would be in instructional settings that
teach relatively less than the Follow Through setting achieved.
Statistically, they would tend to show a regression toward the
mean. Phenomenologically, students would
be provided with relatively fewer learning opportunities and would
tend to learn less accordingly.
In any case, the effects observed at a later time are not the effects
of Follow Through. They are the effects of either three or four years
of Follow Through and the effects of intervening instructional
practices. Engelmann (1996) observed that because the typical
instruction provided for poor children in grades 4 and beyond has not
produced exemplary results, there is no compelling reason to use
results of a follow-up to evaluate anything but the intervening
variables and how relatively effective they were in maintaining
earlier gains.
Junior and Senior High School Follow-up
New York City Follow-up
One of the most interesting long-term follow-up studies was
conducted by Linda Meyer (1984). She tracked students from two
schools in the Ocean Hill-Brownsville section of Brooklyn. This district
was one of the lowest ranked of the 32 New York school districts. The
fifteen elementary schools in District 23 had an average rank of 519th
out of the 630 elementary schools.
PS 137 was the only DI Follow Through site in New York City. Meyer
selected a comparison school that matched the DI school on many
variables. Over 90% of the students were minority students and over
75% were from low-income families.
Meyer retrieved the rosters of the first three cohort groups (1969,
1970, and 1971) and included students who received either three or
four years of DI instruction. With the cooperation of the New York City
Central Board of Education and the Office of the Deputy Chancellor
for Instruction, students were located through the computer database.
Meyer and staff were able to locate 82% of the former DI students and
76% of the control students. These rates should be considered high
because it would be expected that over time many students would move
totally out of the area.
Table 1* shows the grade equivalent scores for the DI and comparison
groups of the three cohort groups. At the end of 9th grade, the three
DI groups were on average one year above the three comparison groups
in reading (9.20 versus 8.21; p < .01) with an effect size of .43. In
math, the DI groups were approximately 7 months ahead of the
comparison group (8.59 versus 7.95), which was not statistically
significant (p < .09) but was educationally significant based on an effect
size of .28.
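The ninth-grade comparison above can be restated as a small Python sketch. It uses only the grade-equivalent means and effect sizes quoted in the text, together with the .25 criterion for educational significance used earlier in the article. The dictionary layout and the `summarize` helper are illustrative, not part of Meyer's analysis.

```python
EDUCATIONAL_SIGNIFICANCE = 0.25  # effect-size criterion used in the article

# Grade-equivalent means and effect sizes as quoted in the text.
ninth_grade = {
    "reading": {"di": 9.20, "comparison": 8.21, "effect_size": 0.43},
    "math": {"di": 8.59, "comparison": 7.95, "effect_size": 0.28},
}


def summarize(result):
    """Return (difference in grade equivalents, meets the .25 criterion?)."""
    difference = round(result["di"] - result["comparison"], 2)
    meets_criterion = result["effect_size"] >= EDUCATIONAL_SIGNIFICANCE
    return difference, meets_criterion


for subject, result in ninth_grade.items():
    difference, meets_criterion = summarize(result)
    print(f"{subject}: +{difference} grade equivalents, "
          f"educationally significant: {meets_criterion}")
```

The differences are 0.99 grade equivalents in reading and 0.64 in math, and both effect sizes meet the .25 criterion even though only the reading difference was statistically significant.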
Table 1: Results of t-tests comparisons and effect sizes for reading
and math at the end of 9th grade*
Achievement Growth for Other Sites
Gersten, Keating, and Becker (1988) provide similar information for
other sites. Table 2* shows the effect sizes of the E. St. Louis and
Flint sites at the end of ninth grade. Most effect sizes were above
the .25 level of being educationally significant. It should be noted
that the 3-K East St. Louis group, which started in kindergarten
instead of first grade and so had four years of instruction (not
three), had the second highest effect size (.49).
Table 2: Ninth Grade Reading Achievement Results from E. St. Louis,
Flint, and New York*
Table 3: Ninth Grade Math Achievement Results from E. St. Louis,
Flint, and New York*
Table 3* shows similar effectiveness in math. The results of
these two analyses clearly show that while the superiority of DI
diminishes with the time spent in traditional curricula, the
advantage of DI lasts. Educationally significant differences occur
in reading (overall effect size = .43) and in math (overall effect
size = .25).
Graduation Rates and College Acceptance Rates at Other
Sites
Darch, Gersten, & Taylor (1987) tracked Williamsburg (SC)
students in the original Abt study (students entering first grade in
1969 and 1970) to compare graduation rates. All students were black
and had stayed in the county school system. Table 4* shows a
statistically significant difference in drop-out rate for Group 1
(the 1969 group), but the difference in drop-out rate was not
statistically significant for Group 2 (the 1970 group).
Table 4: Longitudinal Follow-up Study: Percentage of Graduates and
Dropouts for Direct Instruction Follow Through and Local Comparison
Groups*
A total of 65.8% of the Group 1 Follow Through students graduated on
time in contrast to 44.8% of the comparison group (a statistically
significant difference, p < .001). For Group 2, 87.1% of the Follow
Through group and 74.6% of the comparison group graduated on time (a
statistically nonsignificant difference). Also, 27% of the Group 1
Follow Through students were accepted into college in contrast to 13%
of the comparison group; the difference for Group 2 in college
admission was not significant.
Meyer, Gersten, & Gutkin (1983) calculated the rates of
graduation, retention, dropping out, applying to college, and
acceptance to college for the three cohort groups in the New York
City site. Statistical analyses showed that the DI group had
significantly higher rates of graduation (p < .001), applying to
college (p < .001), and acceptance to college (p < .001), and
significantly lower rates of retention (p < .001) and dropping out
(p < .001). The differences in graduation rates were consistent
across the three cohort groups, with over 60% of the DI students
graduating in contrast to a graduation rate of less than 40% for the
three comparison groups. Meyer mentioned in her report that the
difference in retention rate between Cohort II and Cohorts I and III
may have been due to the principal retaining all students below
grade level one year.
Table 5: Percentages of Cohorts 1, 2, and 3 Students: Graduated High
School, Retained, Dropped Out, Applied to College, and Accepted to
College*
Conclusions
Educational reformers search for programs that produce superior
outcomes with at-risk children, that are replicable and can therefore
be implemented reliably in given settings, that can be used as a
basis for a whole-school implementation that involves all students in
a single program sequence, and that result in students feeling good
about themselves. The Follow Through data confirm that DI has these
features. The program works across various sites and types of
children (urban blacks, rural populations, and non-English-speaking
students). It produces positive achievement benefits in all subject
areas: reading, language, math, and spelling. It produces superior
results for basic skills and for higher-order cognitive skills in
reading and math. It produces the strongest positive self-esteem of
the Follow Through programs.
Possibly the single feature that is not captured by these various
achievements is the implied level of efficiency of the system. Some
Follow Through sponsors performed poorly in math, because they spent
very little time on math. Most of the day focused on reading and
related language arts. Although time estimates are not available for
the various sponsors, some of them spent possibly twice as much time
on reading as DI sites did. Even with this additional time, these
sites achieved less than the DI sites achieved. For a system to
achieve first place in virtually every measured outcome, the system
must be very efficient and use the limited amount of
student-contact time to produce a higher rate of learning than other
approaches achieve. If the total amount of "learning" induced over a
four-year period could be represented for various sponsors, it would
show that the amount of learning achieved per unit of time is
probably twice as high for the DI sites as it is for the non-DI
sponsors.
Perhaps the most disturbing aspect of the Follow Through results is
the persistence of models that are based on what the data confirm is
whimsical theory. The approach to teaching reading used by the Tucson
Early Education Model was language experience, which is quite similar in
structure and procedures to the whole language approach. The fact
that TEEM performed so poorly on the various measures should have
carried some implications for later reforms; however, it didn't. The
notion of the teacher being a facilitator and providing children with
incidental teaching was used by the British infant school model (Open
Education). It was a flagrant failure, an outcome that should have
carried some weight for the design of later reforms in the US. It
didn't. Ironically, it was based on a system that was denounced in
England by its Department of Education and Science in 1992. At the
same time, states like California, Pennsylvania, Kentucky, Ohio, and
others were in full swing in the National Association for the
Education of Young Children's idiom of "developmentally appropriate
practices," which are based on the British system.
Equally disturbing is the fact that while states like California were
immersed in whole language and developmentally appropriate practices
from the 1980s through the mid-1990s, there was no serious attempt to
find models or practices that work. Quite the contrary, DI was
abhorred in California and only a few DI sites survived. Most of them
did so through deceit, pretending to do whole language. At the same
time, those places that were implementing whole language reading
and the current idiom of math were producing failures at a tragic
rate.
Possibly the major message of Follow Through is that there seems to
be no magic in education. Gains are achieved only by starting at the
skill level of the children and carefully building foundations that
support higher-order structures. Direct Instruction has no peer in
this enterprise.
*Tables in this article could not be reproduced clearly in electronic
format. Please refer to Effective School Practices, vol. 15,
no. 1, or to Research on Direct Instruction, by G. Adams and S.
Engelmann.
References
Adams, G., & Engelmann, S. (in press). Research on Direct
Instruction. Seattle, WA: Educational Achievement Systems.
Anderson, R., St. Pierre, R., Proper, E., & Stebbins, L. (1978).
Pardon us, but what was the question again? A response to the
critique of the Follow Through evaluation. Harvard Educational
Review, 48(2), 161-170.
Becker, W. (1977). Teaching reading and language to the
disadvantaged: What we have learned from field research. Harvard
Educational Review, 47, 518-543.
Becker, W. C. (1978). National Evaluation of Follow Through:
Behavior-theory-based programs come out on top. Education and Urban
Society, 10, 431-458.
Becker, W., & Carnine, D. (1981). Direct Instruction: A behavior
theory model for comprehensive educational intervention with the
disadvantaged. In S. Bijou (Ed.), Contributions of behavior
modification in education (pp. 1-106). Hillsdale, NJ: Lawrence
Erlbaum.
Bereiter, C., & Kurland, M. (1981-82). A constructive look at
Follow Through results. Interchange, 12, 1-22.
Darch, C., Gersten, R., & Taylor, R. (1987). Evaluation of
Williamsburg County Direct Instruction Program: Factors leading to
success in rural elementary programs. Research in Rural Education, 4,
111-118.
Gersten, R. (1984). Follow Through revisited: Reflections on the site
variability issue. Educational Evaluation and Policy Analysis, 6,
411-423.
Gersten, R., Keating, T., & Becker, W. (1988). The continued
impact of the Direct Instruction model: Longitudinal studies of
Follow Through students. Education and Treatment of Children, 11(4),
318-327.
House, E., Glass, G., McLean, L., & Walker, D. (1978). No simple
answer: Critique of the FT evaluation. Harvard Educational Review,
48(2), 128-160.
Meyer, L., Gersten, R., & Gutkin, J. (1983). Direct Instruction: A
Project Follow Through success story in an inner-city school.
Elementary School Journal, 84, 241-252.
Meyer, L. A. (1984). Long-term academic effects of the Direct
Instruction Project Follow Through. Elementary School Journal, 84,
380-394.
Stebbins, L. B., St. Pierre, R. G., & Proper, E. C. (1977).
Education as experimentation: A planned variation model (Volume IV-A
& B): Effects of Follow Through models. Cambridge, MA: Abt
Associates.
Wisler, C., Burns, G. P., Jr., & Iwamoto, D. (1978). FT redux: A
response to the critique by House, Glass, McLean, & Walker.
Harvard Educational Review, 48(2), 171-185.