Project Follow Through: In-depth and Beyond

Gary Adams
Educational Achievement Systems, Seattle

The following article is a summary of a chapter in Adams, G., & Engelmann, S. (1996). Research on Direct Instruction. Ordering information follows the article.

Project Participants and Models

The Follow Through project was the largest, most expensive educational experiment ever conducted. This federal program was originally designed to be a service-oriented project similar to Head Start. However, because of funding cutbacks the emphasis was shifted from service to program evaluation. Over 75,000 low income children in 170 communities were involved in this massive project designed to evaluate different approaches to educating economically disadvantaged students from kindergarten through grade 3. State, school, and national officials nominated school districts that had high numbers of economically disadvantaged students. Parent representatives of these school districts chose to participate after hearing presentations from the 20 different program designers (sponsors). Each participating district implemented the selected sponsor's approach in one or more schools. For participating, each district received $750 per student beyond the normal level of funding.

Each sponsor was required to:

·"provide the community with a well-defined, theoretically consistent and coherent approach that could be adapted to local conditions;

·provide the continuous technical assistance, training, and guidance necessary for local implementation of the approach;

·exercise a 'quality control' function by consistently monitoring the progress of program implementation;

·serve as an agent for change as well as a source of program consistency by assisting the community in retaining a consistent focus on the objectives and requirements of the approach rather than responding in an ad hoc manner to the daily pressures of project operations;

·ensure implementation of a total program, rather than a small fragment, such as reading, with a resulting possibility for a major impact on the child's life, and

·provide a foundation for comprehending and describing results of evaluation efforts" (Stebbins, St. Pierre & Proper, 1977, p. 5).

The orientation of the sponsors varied from the loosely-structured open classroom approach to the highly-structured behavior analysis approach. Nine of the original sponsors qualified for inclusion in the evaluation. To be included, a sponsor had to have more than three active sites that could be compared to control sites in the same communities.

Abt Associates used the system developed by White to classify the approaches of the different models. The first dimension was the theoretical orientation of the models:

·The behavioristic approach is based on the belief that all behaviors are learned. The reason that disadvantaged children are behind is because no one has taught them necessary social and academic skills. The training is based on selecting the behavioral objectives that are needed. Then teachers reinforce the steps in the behavioral objectives. The general label for this group became the Basic Skills Models.

·The cognitive development approach is based on the sequence of normal cognitive growth. The reason that disadvantaged children are behind is because they have insufficient normal cognitive experiences. The orientation of this approach is to provide interactions between children and teachers. During these interactions, children learn how to solve problems and learn verbal skills based on a self-directed process. Emphasis is placed on the teacher providing age-appropriate cognitive materials and experiences. The general label for this group was the Cognitive/Conceptual Skills Models.

·The psychodynamic approach is based on the assumption that socioemotional development (the development of the "whole child") is essential to educational improvement. Emphasis is placed on trying to improve children's self-esteem and peer interactions. The goal for the teacher is to provide an environment in which children can move toward the goal of self-actualization through children making their own free choices. However, it is assumed that children know what is best for their personal growth. The general label for this group was the Affective Skills Models.

Basic Skills Models
Direct Instruction Model (University of Oregon): Developed by Siegfried Engelmann and Wes Becker, this model used the DISTAR (DISTAR is an acronym for Direct Instruction System for Teaching And Remediation) reading, arithmetic, and language programs. The model assumes that the teacher is responsible for what the children learn.

Behavior Analysis Model (University of Kansas): Developed by Donald Bushell, this model used a behavioral (reinforcement) approach for teaching reading, arithmetic, handwriting, and spelling. Social praise and tokens were given to the children for correct responses and the tokens were traded for desired activities. Teachers used programmed reading programs in which the task was presented in small steps. The instructional program was not specified by the model. Two sites used the DISTAR materials. Many used Sullivan Programmed Phonics. Students were monitored and corrective procedures were implemented to ensure student progress.

Language Development (Bilingual) Model (Southwest Educational Development Laboratory): This curriculum-based model used an eclectic approach based on language development. When appropriate, material was presented first in Spanish and then in English.

Cognitive/Conceptual Skills Models
Cognitively-Oriented Curriculum (High Scope Foundation): This popular program was directed by David Weikart and was based on Piaget's belief that there are underlying cognitive processes. Children were encouraged to schedule their own activities and then follow their schedules. The teacher modeled language through the use of labeling and explaining causal relationships. Also, the teacher fostered a positive self-concept through the way the students were given choices.

Florida Parent Education Model (University of Florida): Based on the work of Ira Gordon, this program taught parents of disadvantaged children to teach their children. At the same time, students were taught in the classroom using a Piagetian approach. Parent trainers coordinated the teaching. Emphasis included not only language instruction, but also affective, motor, and cognitive skill instruction.

Tucson Early Education Model (University of Arizona): Developed by Marie Hughes, TEEM used a language-experience approach (much like the whole language approach) that attempted to elaborate the child's present experience and interest. The model was based on the assumption that children have different learning styles so the child-directed choices are important. The teacher assists by helping children compare, recall, and locate relationships.

Affective Skills Models
Bank Street College Model (Bank Street College of Education): This model used the traditional middle-class nursery school approach that was adopted by Head Start. Through the use of learning centers, children had many options, such as counting blocks and quiet areas for reading. The teacher is responsible for program implementation by taking advantage of learning situations. The classroom is structured to increase learning opportunities.

Open Education Model (Education Development Center): Derived from the British Infant School model, this model focuses on building the children's responsibility for their own learning. Reading and writing were not taught directly, but through stimulating a desire to communicate.

Responsive Education Model (Far West Laboratory): Developed by Glen Nimnicht, this is an eclectic model using the work of O.K. Moore, Maria Montessori, and Martin Deutsch. The model used learning centers and the child's interests to determine when and where the child is stationed. The development of self-esteem is considered essential to the acquisition of academic skills.

Program Design
Each model had 4 to 8 sites with children that started school in kindergarten and some models also had sites with children that started in first grade. Each Follow Through (FT) school district identified a non-Follow Through (NFT) comparison school for each Follow Through site. The comparison school acted as a control group. Unfortunately, the NFT sites that were selected tended to have children who were less economically disadvantaged than the Follow Through sites. Because of this problem, Abt Associates used a covariance statistical analysis process to adjust for initial differences.

A total of 9,255 FT and 6,485 NFT children were in the final analysis group. Students in each school district site were tested at entry and then each spring until the third grade. The DI Model group included low-income students in 20 communities. These communities varied widely: rural and urban, with blacks, whites, Mexican Americans, Spanish Americans, Native Americans, and a diverse mixture of other ethnic groups.

The Stanford Research Institute was initially awarded a contract for data collection and Abt Associates received a contract for data analysis. The Office of Education determined the final design of the project with consultation from the Huron Institute. Because the sponsors had different approaches, the data collection was comprehensive. Assessment information was collected in the areas of basic skills (academic), cognitive, and affective behavior. The process of selecting appropriate assessment instruments was an arduous task given the time constraints of trying to select the most reliable, valid tests that could be administered in the least amount of time.

The following tests were used to assess basic skills, cognitive, and affective achievement: the Metropolitan Achievement Test (MAT), the Wide Range Achievement Test (WRAT), the Raven's Coloured Progressive Matrices, the Intellectual Achievement Responsibility Scale (IARS+ and IARS-), and the Coopersmith Self-Esteem Inventory. The MAT is a respected achievement test that assesses Basic Skills and Cognitive-Conceptual Skills. The Basic Skills scales of the MAT included Listening for Sound (sound-symbol relationships), Word Knowledge (vocabulary words), Word Analysis (word identification), Mathematic Computation (math calculations), Spelling, and Language (punctuation, capitalization, and word usage). The WRAT measured number recognition, spelling, word reading, and oral and written math problems.

The Cognitive Skills scales of the MAT included Reading (comprehension of written passages), Mathematics Concepts (knowledge of math principles and relationships), and Mathematical Problem Solving (the use of reasoning with numbers). The Raven's Coloured Progressive Matrices was also used. The Raven's test, however, did not discriminate between models or show change in scores over time.

Affective Skills were assessed using two instruments. The IARS was designed to assess whether children attribute their successes (+) or failures (-) to themselves or to external forces. The Coopersmith Self-Esteem Inventory is designed to assess how children feel about themselves, the way they think other people feel about them, and their feelings about school.

Comparisons Across Follow Through Sponsors

Students started in either kindergarten or first grade and were retested yearly through the end of third grade. While critics have complained about test selection and have usually suggested more testing, the assessment effort of this study was well beyond any other educational study conducted before, or since.

Significant Outcome Comparison
Abt Associates analyzed the data by comparing each Follow Through model's scores to both the local comparison group and the national pooled comparison group (created by combining the comparison groups from all nine Follow Through models). Local comparison scores and national pooled comparison scores were used as covariates to analyze each variable. A plus (+) was given if (a) the Follow Through (FT) group exceeded the Non-Follow Through (NFT) group by one-fourth standard deviation (.25 effect size) and (b) the difference was statistically significant. A minus (-) was given if the NFT score exceeded the FT score by one-fourth standard deviation (.25 effect size) and was statistically significant. If the results did not reach either the plus or the minus criterion, the difference was null and left blank.
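The plus/minus rule above can be sketched in a few lines of Python. This is only an illustration: the function name, its parameters, and the example numbers are hypothetical, and the sketch takes the standard deviation and the significance decision as given rather than reproducing Abt Associates' covariance adjustment.

```python
def classify_outcome(ft_mean, nft_mean, sd, significant, threshold=0.25):
    """Return '+', '-', or '' (null) for one FT versus NFT comparison.

    Assumes the effect size is (FT mean - NFT mean) / sd and that the
    significance test was run elsewhere and is passed in as a boolean.
    """
    effect_size = (ft_mean - nft_mean) / sd
    if significant and effect_size >= threshold:
        return "+"
    if significant and effect_size <= -threshold:
        return "-"
    return ""  # null result, left blank in the report


# Hypothetical numbers, for illustration only.
print(classify_outcome(52.0, 48.0, 10.0, significant=True))   # effect size 0.40
print(classify_outcome(47.0, 50.0, 10.0, significant=True))   # effect size -0.30
print(classify_outcome(51.0, 50.0, 10.0, significant=True))   # effect size 0.10
```

Note that both conditions must hold: a large difference that is not statistically significant, or a significant difference smaller than .25, is left null.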

The following index is based on a comparison of each model's site with the local and pooled national comparison groups. If either the pooled or local comparison were plus (+), the effect is recorded as a plus. If either or both was a minus (-), the effect is recorded as a minus. Then the plus and minus values are summed and multiplied by 100 so the possible range of scores was from -100 to 100. If the Follow Through model group scored consistently higher than the comparison group on a variable, then the index would be a positive number. If the comparison group scored higher, the index would be negative. If there was no difference between the two groups, the score would be zero (0.00).
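A minimal sketch of the index calculation follows, under two assumptions that the description above leaves open. First, when one comparison is a plus and the other a minus, the sketch records a null (0). Second, the summed values are divided by the number of effects before multiplying by 100, since that is what keeps the result within the stated range of -100 to 100. The function names are hypothetical.

```python
def combine(local, pooled):
    """Combine local and pooled results (+1, 0, or -1) into one effect.

    Assumption: a plus on one comparison and a minus on the other is
    recorded as 0; the text does not say how such conflicts were handled.
    """
    if local == pooled:
        return local
    if local + pooled == 0:  # one plus and one minus
        return 0
    return local or pooled  # one result is null; keep the other


def outcome_index(effects):
    """Sum the +1/0/-1 effects, divide by their count, multiply by 100.

    Assumption: dividing by the count is what bounds the index at +/-100.
    """
    return 100 * sum(effects) / len(effects)


# Hypothetical results for four sites: (local, pooled) pairs.
effects = [combine(1, 0), combine(0, 0), combine(-1, -1), combine(1, 1)]
print(outcome_index(effects))  # 25.0
```

With this reading, a model that earned a plus at every site would score 100, and one that earned a minus at every site would score -100.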

Figure 1 shows the results of this analysis. As you can see by the number of negative scores, the local or national pooled comparison group scores were higher than most Follow Through models.

Only the Direct Instruction model had positive scores on all three types of outcomes (Basic Skills, Cognitive, and Affective). Overall, the Direct Instruction model was highest on all three types of measures.

Figure 1: Significant Outcomes Comparison Across Follow Through Models

The results were very different from expectations suggested by the model orientations. The three programs in the Basic Skills group had the best basic skills, cognitive skills, and affective skills scores. Of the three orientations, the Basic Skills models (Direct Instruction, Behavior Analysis, and Southwest Lab) had the best basic skills scores. The Cognitive models (Parent Education, TEEM, and Cognitively-Oriented Curriculum) ranked second in cognitive skills scores; however, their average rank of 5.0 is far from the average rank of 2.8 for the Basic Skills models. The Affective models had the worst affective ranks (6.7 compared to 2.7 for the Basic Skills models).

Figure 1 provides more details on the models' rankings. The DI model had, by far, the highest basic skills scores while the other two Basic Skills models had more modest results (the Behavior Analysis model had a slight positive score and the Southwest Labs model score was 0.0).

Figure 1 also shows that none of the Cognitive Models had positive cognitive scores. In fact, the Direct Instruction Model was the only model of the nine that had a positive cognitive score (and the results were extremely positive - over 35%). In contrast, students in two of the three cognitively-oriented models [TEEM and Cognitive Curriculum (High Scope)] had the lowest cognitive scores.

Critics have often complained that the DI model was a pressure cooker environment that would negatively impact students' social growth and self-esteem. As the Abt Associates' authors note:

Critics of the model have predicted that the emphasis of the model on tightly controlled instruction might discourage children from freely expressing themselves and thus inhibit the development of self-esteem and other affective skills. (Stebbins, St. Pierre & Proper, p. 8)

Because of this expectation, the affective scores are of interest. Three of the five lowest scoring models on the affective domain were models that targeted improving affective behavior; none of the affective models had positive affective scores. In contrast, all Basic Skills models had positive affective scores with the Direct Instruction model achieving the highest scores. The theory that an emphasis on basic skills instruction would have a negative impact on affective behavior is not supported by the data. Instead, it appears that the models that focused on an affective education not only had a negative impact on their students' basic skills and cognitive skills, but also on their affective skills.

Fine Tuning the Results

The Bereiter-Kurland Reanalysis. A group funded by the Ford Foundation (House, Glass, McLean, & Walker, 1978) questioned certain aspects of the test selection and data analysis in the Abt report. After reviewing the critiques of the Abt report by House et al. (1978), Bereiter and Kurland (1981-1982) reanalyzed the data of that report based on the criticisms that the report used an inappropriate unit of measurement for the dependent variable and inappropriate covariates. The Bereiter-Kurland reanalysis was based on:

·Using the site means as the dependent variable.

·Using these site scores as covariates: socio-economic status and ethnic and linguistic difference from the mainstream.

·Using only models that had data from 6 or more sites.

Each model had the possibility of 77 statistically significant differences (7 other models times 11 MAT subscale scores). Fifty of the 77 (65%) possible differences for the DI group were statistically significant based on Newman-Keuls tests (p = .05). In contrast, the Behavior Analysis group showed only 18 of 77 (23%) significant differences.
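The counts in the preceding paragraph can be checked with a short calculation. The variable and function names below are illustrative only.

```python
other_models = 7
mat_subscales = 11
possible = other_models * mat_subscales  # comparisons available to each model


def share(significant, total=possible):
    """Percentage of the possible comparisons that were significant."""
    return 100 * significant / total


print(possible)          # 77
print(round(share(50)))  # DI: 65
print(round(share(18)))  # Behavior Analysis: 23
```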

None of the other six models showed any statistically significant differences on any of the 11 MAT subscales (0 of 396 possible combinations). This means, for example, that none of the 11 MAT Bank Street scores differed significantly from any of the Responsive Education, TEEM, Cognitive Curriculum, Parent Education, or Open Education mean scores.

Another way of showing the difference between models was through the use of effect size comparisons. Figure 2 shows a different display of the information provided by Bereiter and Kurland (also Figure 2 in the Bereiter & Kurland review). In Figure 2, the effect size of the DI model is compared to the average effect size for the other Follow Through models. The differences are dramatic, even though the DI data include the Grand Rapids site that did not truly implement the DI model. The differences would be greater if only DI sites with implementation fidelity were included.

Figure 2: Effect Size Comparison (DI to Other Models)

To provide a clearer picture of the differences, Figures 3-4 display the Bereiter-Kurland findings according to domain. First, Figure 3 shows a comparison of effects for the Basic Skills scores between the DI group and the average effect size of the other Follow Through groups. Remember that an effect size of .25 is thought to be educationally significant. Differences in some MAT Basic Skills subscale scores are over 3.0 (Total Language and Language B). The average difference in Basic Skills scores between Direct Instruction and the other models was 1.8.

Figure 3: Bereiter Analysis of Basic Skills Abt Data*

Figure 4 shows the differences in the cognitive scores between the DI model and the average Follow Through model. Effect sizes are above 1.0 for all but one difference.

Figure 4: Bereiter Analysis of Cognitive Ability Abt Data

Overall, the Bereiter-Kurland reanalysis provides even stronger support for the effectiveness of Direct Instruction. As the authors noted, only the DI and Behavior Analysis models had positive results and the DI model results were vastly superior.

Changing the Abt Report Criteria
Becker and Carnine (1981) had two other complaints about the Abt Associates report, which resulted in the report underrepresenting the superiority of the DI model. First, because of the problem of mismatches between comparison groups that initially had higher entry scores than the Follow Through model groups, Abt Associates deleted these data from subsequent analyses. Unfortunately for the DI model, sometimes the scores for the comparison groups were significantly higher at entry, but by the end of third grade the DI group scored significantly higher than the comparison groups. Abt Associates decided to delete these groups because of the initial entry differences. Also, data were excluded if there were significant differences between the two groups in preschool experience per site, even though preschool experience (e.g., Head Start) had only a very low correlation with later achievement (-0.09). (This variable was not used in the previously cited Bereiter-Kurland study.) Overall, approximately one-third of the data was excluded from most Follow Through models because of these decision rules.

Figures 5-7 show the differences in results based on these analyses. When data were kept for sites where there were initial performance differences, the highest scoring model (DI) scored even higher whereas the lower scoring models (Cognitive Curriculum and Open Education) scored even lower. The scores for the other models stayed roughly the same.

Figure 5: Index for Significant Outcomes for Cognitive Measures

Figure 6: Index for Significant Outcomes for Basic Skills Measures

Figure 7: Index for Significant Outcomes for Affective Measures

Figure 8: Percentile scores across nine Follow Through models

Becker and Carnine also reanalyzed the Abt Associates results without the Grand Rapids site. The Grand Rapids site stopped using Direct Instruction when there was a change in program director. Even though this problem was well documented, Abt Associates included the Grand Rapids site in the DI data. Figures 5-7 show that the already high scores for the DI group became even higher when the Grand Rapids data were removed.

Norm-Referenced Comparisons
Another way of looking at the Abt Associates data is to compare median grade-equivalent scores on the norm-referenced Metropolitan Achievement Test that was used to evaluate academic progress. Unlike the previous analysis that compared model data to local and pooled national sites, the following norm-referenced comparisons show each model's MAT scores based on the MAT norms. Figure 8 shows the results across four academic subjects. The comparisons are made to a baseline rate of the 20th percentile, which was the average expectation of disadvantaged children without special help. The figure displays the results in one-fourth standard deviation intervals.
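To give a feel for what one-fourth standard deviation intervals above a 20th-percentile baseline mean in percentile terms, the sketch below converts each step back to a percentile. It assumes normally distributed scores, which the article does not state, and the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_after_steps(baseline_percentile, steps, step_sd=0.25):
    """Percentile reached `steps` quarter-SD intervals above the baseline.

    Assumes scores follow a normal distribution (an assumption made for
    this illustration, not a claim about the MAT norms).
    """
    normal = NormalDist()  # mean 0, standard deviation 1
    z = normal.inv_cdf(baseline_percentile / 100) + steps * step_sd
    return 100 * normal.cdf(z)


# Print the percentile for zero to four quarter-SD steps above the baseline.
for steps in range(5):
    print(steps, round(percentile_after_steps(20, steps), 1))
```

Under this assumption, one quarter-SD step above the 20th percentile lands at roughly the 28th percentile, and four steps (one full standard deviation) land in the mid-50s.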

Clearly, children in the DI model scored consistently higher than children in the other models. Students in the Southwest Lab and Open Education models were below expected levels of achievement, based on norms of performance in traditional schools, in all four academic subjects.

Only three of 32 possible reading scores of the other eight models were above the 30th percentile. The DI students scored 7 percentile points higher than the second place group (Behavior Analysis) and over 20 percentile points higher than the Cognitive Curriculum (High Scope), Open Education, and Southwest Lab Models.

Except for children in the DI model, the math results are consistently dismal. The only other model above the 20th percentile was the Behavior Analysis model. DI students scored 20 percentiles ahead of the second place group (Behavior Analysis) and 37 percentiles higher than the last place group (Cognitive Curriculum/High Scope).

In spelling, the DI model and the Behavior Analysis model were within the normal range. DI students scored 2 percentiles above the second place group (Behavior Analysis), 19 percentiles above the third place group, and 33 percentiles above the last place group (Open Education).

Like the previous academic subjects, the DI model was clearly superior in language. DI students scored 29 percentiles above the second place group (Behavior Analysis) and 38 percentiles above the last place group (Cognitive Curriculum/High Scope).

For many people, normed scores are more familiar than the index described in the previous section. No matter which analysis is used, children who were in the DI model made the most gains when compared to the other eight models. With the possible exception of the Behavior Analysis model, all other models seem to have had little positive effect on the academic progress of their children.

The increased amounts of money, people, materials, health and dental care, and hot lunches did not cause gains in achievement. Becker (1978) observed that most Follow Through classrooms had two aides and an additional $350 per student, but most models did not show significant achievement gains.

Popular educational theories of Piaget and others suggest that children should interact with their environment in a self-directed manner. The teacher is supposed to be a facilitator and to provide a responsive environment. In contrast, the successful DI model used thoroughly field-tested curricula that teachers should follow for maximum success. The Follow Through models that were based on a self-directed learner model approach were at the bottom of academic and affective achievement. The cognitively-oriented approaches produced students who were relatively poor in higher-order thinking skills and models that emphasized improving students' self-esteem produced students with the poorest self-esteem.

Subsequent Analyses

Variability Across DI Sites
The Abt Associates findings were criticized by House, Glass, McLean, and Walker (1978) and then defended by others (Anderson, St. Pierre, Proper, & Stebbins, 1978; Becker, 1977; Bereiter & Kurland, 1981-82; Wisler, Burns, & Iwamoto, 1978). One Abt Associates finding was that there was more variability within a model than between models.

This finding is consistent with the often-cited belief that "different programs work for different children" or, put another way, "not all programs work with all children." The research results in the following sections contradict that belief: the statement does not match the data.

Gersten (1984) provided an interesting picture of the consistency of achievement scores of urban DI sites after the Abt report was completed. Figure 9 shows third-grade reading scores from 1973 to 1981 in four cities. The reading scores are consistently around the 40th percentile. Based on non-Follow Through students in large Northwest urban settings, the expected score is the 28th percentile on the MAT. Some variability is due to differences between tests, because some districts changed tests over the nine-year period. Also, Gersten mentioned that the drop in the New York scores in 1978 and 1979 may have been because of budgetary reductions during those years.

Figure 10 shows the stability of math scores. The math scores for the three sites with math data are consistently in the 50th percentile range; New York did not collect math information during this period. Based on the math scores of large Northwest cities, non-Follow Through students would be expected to score at the 18th percentile.

Figure 9: Total reading scores for K-3 students. Stability of effects: Percentile equivalents at the end of Grade 3.


Figure 10:

Follow-Up Studies

Fifth and Sixth Grade Follow-up
Some critics of DI have indicated that many, if not most, early DI achievement gains will disappear over time. There are different reasons given for this prediction. One reason given is that the DI students were "babied" through sequences that made instruction easy for them. They received reinforcement and enjoyed small group instruction, but they would find it difficult to transition to the realities of the "standard" classroom.

DI supporters give different reasons for suggesting that DI results would decrease over time. The DI students were accelerated because they had been taught more during the available time than they would have been taught during the same time in a traditional program. Upon leaving Follow Through, they would be in instructional settings that teach relatively less than the Follow Through setting achieved. Statistically, there would be a tendency toward regression to the mean. Phenomenologically, students would be provided with relatively fewer learning opportunities and would tend to learn less accordingly.

In any case, the effects observed at a later time are not the effects of Follow Through. They are the effects of either three or four years of Follow Through and the effects of intervening instructional practices. Engelmann (1996) observed that because the typical instruction provided for poor children in grades 4 and beyond has not produced exemplary results, there is no compelling reason to use results of a follow-up to evaluate anything but the intervening variables and how relatively effective they were in maintaining earlier gains.

Junior and Senior High School Follow-up

New York City Follow-up
One of the most interesting long-term follow-up studies was conducted by Linda Meyer (1984). She tracked students from two schools in the Ocean Hill-Brownsville section of Brooklyn. This district was one of the lowest ranked of the 32 New York school districts. The fifteen elementary schools in District 23 had an average rank of 519th out of the 630 elementary schools.

PS 137 was the only DI Follow Through site in New York City. Meyer selected a comparison school that matched the DI school on many variables. Over 90% of the students were minority students and over 75% were from low-income families.

Meyer retrieved the rosters of the first three cohort groups (1969, 1970, and 1971) and included students who received either three or four years of DI instruction. With the cooperation of the New York City Central Board of Education and the Office of the Deputy Chancellor for Instruction, students were located through the computer database. Meyer and staff were able to locate 82% of the former DI students and 76% of the control students. These rates should be considered high because it would be expected that over time many students would move out of the area entirely.

Table 1* shows the grade equivalent scores for the DI and comparison groups of the three cohort groups. At the end of 9th grade, the three DI groups were on average one year above the three comparison groups in reading (9.20 versus 8.21; p < .01), with an effect size of .43. In math, the DI groups were approximately 7 months ahead of the comparison group (8.59 versus 7.95), a difference that was not statistically significant (p < .09) but was educationally significant based on an effect size of .28.
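As a rough consistency check on these figures, the sketch below computes the raw differences in grade equivalents and the standard deviation each reported effect size would imply. It assumes the effect sizes are mean differences divided by a standard deviation; the article does not give the formula, so the implied values are illustrative only, and the function name is hypothetical.

```python
def implied_sd(di_mean, comparison_mean, effect_size):
    """Standard deviation implied by a mean difference and an effect size.

    Assumes effect size = (DI mean - comparison mean) / SD.
    """
    return (di_mean - comparison_mean) / effect_size


reading_diff = 9.20 - 8.21  # about 0.99 grade equivalents
math_diff = 8.59 - 7.95     # about 0.64 grade equivalents

print(round(reading_diff, 2), round(implied_sd(9.20, 8.21, 0.43), 1))
print(round(math_diff, 2), round(implied_sd(8.59, 7.95, 0.28), 1))
```

Both subjects imply a standard deviation of roughly 2.3 grade equivalents, so the reported differences and effect sizes are at least mutually consistent under this assumption.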

Table 1: Results of t-tests comparisons and effect sizes for reading and math at the end of 9th grade*

Achievement Growth for Other Sites
Gersten, Keating, and Becker (1988) provide similar information for other sites. Table 2* shows the effect sizes of the E. St. Louis and Flint sites at the end of ninth grade. Most effect sizes were above the .25 level of being educationally significant. It should be noted that the 3-K East St. Louis group, which started in kindergarten instead of first grade and therefore had four years of instruction (not three), had the second highest effect size (.49).

Table 2: Ninth Grade Reading Achievement Results from E. St. Louis, Flint, and New York*

Table 3: Ninth Grade Math Achievement Results from E. St. Louis, Flint, and New York*

Table 3* shows similar effectiveness in math. The results of these two analyses clearly show that while the superiority of DI diminishes with the time spent in traditional curricula, the advantage of DI lasts. Educationally significant differences occur in reading (overall effect size = .43) and in math (overall effect size = .25).

Graduation Rates and College Acceptance Rates at Other Sites

Darch, Gersten, & Taylor (1987) tracked Williamsburg (SC) students in the original Abt study (students entering first grade in 1969 and 1970) to compare graduation rates. All students were black and had stayed in the county school system. Table 4* shows a statistically significant difference in drop-out rate for Group 1 (the 1969 group), but the difference in drop-out rate was not statistically significant for Group 2 (the 1970 group).

Table 4. Longitudinal Follow-up Study: Percentage of Graduates and Dropouts for Direct Instruction Follow Through and Local Comparison Groups.*

A total of 65.8% of the Group 1 Follow Through students graduated on time in contrast to 44.8% of the comparison group (a statistically significant difference, p < .001). For Group 2, 87.1% of the Follow Through group and 74.6% of the comparison group graduated on time (a difference that was not statistically significant). Also, 27% of the Group 1 Follow Through students were accepted into college in contrast to 13% of the comparison group; the difference for Group 2 in college admission was not significant.
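The size of these gaps can be put side by side with simple arithmetic. The sketch below uses only the percentages quoted above; the helper names are hypothetical, and because group sizes are not given in this article, it makes no attempt to reproduce the significance tests.

```python
# Illustrative arithmetic on the on-time graduation and college-acceptance
# percentages quoted in the text. Group sizes are not given in this article,
# so no significance test is attempted here.


def percentage_point_gap(follow_through_pct, comparison_pct):
    """Difference between the two rates, in percentage points."""
    return follow_through_pct - comparison_pct


def rate_ratio(follow_through_pct, comparison_pct):
    """How many times higher the Follow Through rate is than the comparison rate."""
    return follow_through_pct / comparison_pct


# (Follow Through %, comparison %) for on-time graduation.
graduation = {
    "Group 1 (1969)": (65.8, 44.8),
    "Group 2 (1970)": (87.1, 74.6),
}

gaps = {
    group: round(percentage_point_gap(ft, comp), 1)
    for group, (ft, comp) in graduation.items()
}
college_ratio_group1 = rate_ratio(27, 13)

for group, gap in gaps.items():
    print(f"{group}: graduation gap of {gap} percentage points")
print(f"Group 1 college acceptance ratio: {college_ratio_group1:.2f}")
```

Both groups show a gap favoring Follow Through; whether a gap of a given size reaches statistical significance depends on the number of students in each group, which is reported in Table 4 of the original publication rather than here.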

Meyer, Gersten, & Gutkin (1983) calculated the rates of graduation, retention, dropping out, applying to college, and acceptance to college for the three cohort groups in the New York City site. Statistical analyses showed that the DI group had significantly higher rates of graduation (p < .001), applying to college (p < .001), and acceptance to college (p < .001), and significantly lower rates of retention (p < .001) and dropping out (p < .001). The differences in graduation rates were consistent across the three cohorts, with over 60% of the DI students graduating in contrast to a graduation rate of less than 40% for the three comparison groups. Meyer mentioned in her report that the difference in retention rate between Cohort II and Cohorts I and III may have been due to the principal retaining all students below grade level one year.

Table 5: Percentages of Cohorts 1, 2, and 3 Students: Graduated High School, Retained, Dropped Out, Applied to College, and Accepted to College*

Educational reformers search for programs that produce superior outcomes with at-risk children, that are replicable and can therefore be implemented reliably in given settings, that can be used as a basis for a whole-school implementation involving all students in a single program sequence, and that result in students feeling good about themselves. The Follow Through data confirm that DI has these features. The program works across various sites and types of children (urban blacks, rural populations, and non-English speaking students). It produces positive achievement benefits in all subject areas: reading, language, math, and spelling. It produces superior results for basic skills and for higher-order cognitive skills in reading and math. It produces the strongest positive self-esteem of the Follow Through programs.

Possibly the single feature that these various achievements do not capture is the implied efficiency of the system. Some Follow Through sponsors performed poorly in math because they spent very little time on math; most of the day focused on reading and related language arts. Although time estimates are not available for the various sponsors, some of them spent possibly twice as much time on reading as DI sites did. Even with this additional time, these sites achieved less than the DI sites achieved. For a system to achieve first place in virtually every measured outcome, it must be very efficient and use the limited amount of student-contact time to produce a higher rate of learning than other approaches achieve. If the total amount of "learning" induced over a four-year period could be represented for the various sponsors, it would show that the amount of learning achieved per unit of time is probably twice as high for the DI sites as it is for the non-DI sponsors.

Perhaps the most disturbing aspect of the Follow Through results is the persistence of models that are based on what the data confirm is whimsical theory. The teaching of reading used by the Tucson Early Education Model (TEEM) was language experience, which is quite similar in structure and procedures to the whole language approach. The fact that TEEM performed so poorly on the various measures should have carried some implications for later reforms; however, it didn't. The notion of the teacher being a facilitator and providing children with incidental teaching was used by the British infant school model (Open Education). It was a flagrant failure, an outcome that should have carried some weight for the design of later reforms in the US. It didn't. Ironically, it was based on a system that was denounced in England by its Department of Education and Science in 1992. At the same time, states like California, Pennsylvania, Kentucky, Ohio, and others were in full swing in the National Association for the Education of Young Children's idiom of "developmentally appropriate practices," which are based on the British system.

Equally disturbing is the fact that while states like California were immersed in whole language and developmentally appropriate practices from the 1980s through the mid-1990s, there was no serious attempt to find models or practices that work. Quite the contrary, DI was abhorred in California and only a few DI sites survived. Most of those did so through deceit, pretending to do whole language. At the same time, those places that were implementing whole language reading and the current idiom of math were producing failures at a tragic rate.

Possibly the major message of Follow Through is that there seems to be no magic in education. Gains are achieved only by starting at the skill level of the children and carefully building foundations that support higher-order structures. Direct Instruction has no peer in this enterprise.

*Tables in this article could not be reproduced clearly in electronic format. Please refer to Effective School Practices, vol.15, no.1, or to Research on Direct Instruction, by G. Adams and S. Engelmann.


Adams, G., & Engelmann, S. (in press). Research on Direct Instruction. Seattle, WA: Educational Achievement Systems.

Anderson, R., St. Pierre, R., Proper, E., & Stebbins, L. (1978). Pardon us, but what was the question again? A response to the critique of the Follow Through evaluation. Harvard Educational Review, 48(2), 161-170.

Becker, W. (1977). Teaching reading and language to the disadvantaged - what we have learned from field research. Harvard Educational Review, 47, 518-543.

Becker, W. C. (1978). National Evaluation of Follow Through: Behavior-theory-based programs come out on top. Education and Urban Society, 10, 431-458.

Becker, W., & Carnine, D. (1981). Direct Instruction: A behavior theory model for comprehensive educational intervention with the disadvantaged. In S. Bijou (Ed.), Contributions of behavior modification in education (pp. 1-106). Hillsdale, NJ: Lawrence Erlbaum.

Bereiter, C., & Kurland, M. (1981-82). A constructive look at Follow Through results. Interchange, 12, 1-22.

Darch, C., Gersten, R., & Taylor, R. (1987). Evaluation of Williamsburg County Direct Instruction Program: Factors leading to success in rural elementary programs. Research in Rural Education, 4, 111-118.

Gersten, R. (1984). Follow Through revisited: Reflections on the site variability issue. Educational Evaluation and Policy Analysis, 6, 411-423.

Gersten, R., Keating, T., & Becker, W. (1988). The continued impact of the Direct Instruction model: Longitudinal studies of Follow Through students. Education and Treatment of Children, 11(4), 318-327.

House, E., Glass, G., McLean, L., & Walker, D. (1978). No simple answer: Critique of the FT evaluation. Harvard Educational Review, 48(2), 128-160.

Meyer, L., Gersten, R., & Gutkin, J. (1983). Direct Instruction: A Project Follow Through success story in an inner-city school. Elementary School Journal, 84, 241-252.

Meyer, L. A. (1984). Long-term academic effects of the Direct Instruction Project Follow Through. Elementary School Journal, 84, 380-394.

Stebbins, L. B., St. Pierre, R. G., & Proper, E. C. (1977). Education as experimentation: A planned variation model (Volume IV-A & B). Effects of Follow Through models. Cambridge, MA: Abt Associates.

Wisler, C., Burns, G. P., Jr., & Iwamoto, D. (1978). FT redux: A response to the critique by House, Glass, McLean, & Walker. Harvard Educational Review, 48(2), 171-185.

For information on ordering Research on Direct Instruction, contact Educational Achievement Systems, 319 Nickerson St., Suite 112, Seattle, WA 98109. Phone or Fax (206) 820-6111.
