Response to "The High/Scope Preschool Curriculum Comparison Study Through Age 23"
Siegfried Engelmann, Director
National Institute for Direct Instruction
Based on a follow-up of three groups of children who had different preschool experiences (Direct Instruction, High/Scope, and Nursery School), Schweinhart and Weikart suggest that Direct Instruction causes antisocial behavior. The follow-up, which occurred 20 years after the preschool exposure, is presented as a monograph, Lasting Differences (1997), and as an article in Early Childhood Research Quarterly, "The High/Scope Preschool Curriculum Comparison Study Through Age 23" (1997).
Most of the data that Weikart and Schweinhart present may be rejected out of hand because it is non-significant, and the only major finding that has not been presented in earlier High/Scope reports is the arrest data, which the authors declare shows that DI children had a significantly greater number of felony arrests than children in the other curriculum groups. The authors clearly implicate the preschool instructional practices as the cause of this difference. In the monograph, they write, "The increase in felony arrests might well be considered a harmful effect of providing a Direct Instruction program for young children living in poverty" (p. 66).
The problems with this conclusion are revealed only through some detective work because of the awkward way the data are presented. There were originally 68 children in the entire study: 23 in DI, 22 in High/Scope, and 23 in Nursery School. Instead of presenting tables with actual numbers of children, Weikart and Schweinhart convert them to percent values, which they sometimes add. Neither practice is reasonable. Adding percents sometimes yields a value that is impossible because it is not the correct percent for the actual number of subjects involved in the computation. Also, because the data tables do not indicate the number of subjects in the three curriculum models (only the total number for all), the only way to determine the actual number in each group is through inferences based on the percent values presented. The apparent reason for the percents is to make the study seem large, involving many subjects, when in fact the High/Scope sample is smaller than that of the other models and frequently has 14 or fewer subjects.
The authors attempt to establish statistically significant differences between the preschool programs. The procedures the authors use are probably inappropriate because the groups are small, and they are not well matched in number of subjects, sex, mobility, and differences in home environments. However, the data presentation has problems far more basic than those of statistical methods. The most severe problems have to do with elementary issues, such as the number of subjects actually involved in the comparison.
For the felony-arrest data, the number of subjects becomes a central issue. Of the 68 original preschool participants, 52 were reported to have been interviewed at age 23. In the monograph version of the table that deals with arrest records (Table 12, p. 53) the reported N is 68, which means that the table ostensibly reports on every subject who went through the preschool. This number assumes that the authors have data on every subject—data on whether or not each subject is still alive, data on where each resides, and data on the subject’s arrest record.
In the Early Childhood Research Quarterly article, the reported N for the arrest-data table is 62, not 68 (Table 6, p. 133). The revised N is an admission that there are at least 6 subjects for whom there is no valid arrest data.
A problem with these two tables is that both of them present some of the same percentage values, which means that not all the values are possible. If done correctly, the percentages for the subgroups would change as the Ns change. However, the two arrest tables present the same per-capita felony-arrest numbers for all three curriculum groups, and the same percentages for 1-2 arrests and 3-4 arrests. The percentages for the DI group in both tables are 22% and 17%. Both these percentages are impossible for a group of 21 subjects, which would be the size of the DI group if the total N were 62. Likewise, the Nursery School group has 4% and 13%. Both these numbers are impossible with an N of 22, which would be the size of the group in the Early Childhood Research Quarterly report.
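The impossibility claim can be checked mechanically. The following is a minimal sketch, assuming the tables round each count-out-of-N to the nearest whole percent (half up); the helper names are hypothetical, and a table built by truncation instead of rounding would need a different check.

```python
def achievable_percents(n):
    """Whole-number percentages that k/n can round to (half up), for k = 0..n."""
    # Integer form of floor(100*k/n + 0.5), which avoids floating-point error.
    return {(200 * k + n) // (2 * n) for k in range(n + 1)}

def check(reported, n):
    """Map each reported percentage to whether some count out of n yields it."""
    possible = achievable_percents(n)
    return {p: p in possible for p in reported}

# Group sizes implied by a total N of 68 (monograph) and 62 (article), per the text.
di_68 = check([22, 17], 23)
di_62 = check([22, 17], 21)
ns_68 = check([4, 13], 23)
ns_62 = check([4, 13], 22)

print("DI, n=23:", di_68)
print("DI, n=21:", di_62)
print("NS, n=23:", ns_68)
print("NS, n=22:", ns_62)
```

Under that rounding assumption, all four percentages are reachable with the group sizes of 23 but none is reachable with the reduced group sizes of 21 and 22.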
The total N for the study is further complicated by the authors’ description of which subjects were interviewed at age 23. The monograph’s Table 3 (p. 23), which presents demographic data on the subjects, indicates that 52 subjects were interviewed. It even indicates where they were interviewed, with 75 percent of them (39 subjects) interviewed at home and the remainder (13 subjects) accounted for in a footnote (b). One irregularity with this table, however, is that there is no information about the whereabouts of three of these interviewed subjects. For the heading in the table Current Home, the N is indicated as 49, which means that the location of three of the interviewed subjects was unknown, even though there was a record of where the interview took place.
How is that possible? If the subjects were interviewed, how could their "home" be unknown, particularly when the only possible classifications are Ypsilanti, the county, the state, or outside Michigan? It seems impossible. Even if, for some incredible reason, the data on where these three subjects resided were lost, but all the other data on them were retained, the N for Current Home would still be 52, and the three orphans would be listed under a heading, address unknown. They would have been "interviewed," and therefore counted as interviewed subjects, not discarded from the group of interviewed subjects. The background-information table in the Early Childhood Research Quarterly article (Table 2, p. 125) also indicates that 52 subjects were interviewed, but the "home" was identified for 50 subjects, not 49. So apparently one subject was found. (A note at the bottom of the table indicates that the N for the table is 68 unless otherwise indicated. Yet the headings for which no deviation is indicated have an N of 52.)
Another irregularity with the three interviewed subjects whose home is unknown is that all of them were members of the Nursery School group. If, in fact, only 49 subjects were interviewed, the Nursery School group would not have 19 interviewed subjects, as claimed, but 16. This reduction in number attenuates the apparent "statistical" effectiveness of this group.
Some of the assertions the authors make clearly suggest that the total number interviewed was 49 and not 50 or 52. For instance, in the Early Childhood Research Quarterly account, the authors state, "The 19 study participants who were not interviewed were retained in the arrest records sample" (p. 127). For now, we will not consider the soundness of this procedure, merely the number of subjects not interviewed: 19. If there were 19 subjects who were not interviewed and 52 who were interviewed, the total N for the study would not be 68, but 71. This total is impossible because previous records indicate that the total N for the group was 68. The only other conclusion is that the reported number of subjects interviewed (52) is false. If 19 subjects were not interviewed, the correct N for interviewed subjects is 49.
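The bookkeeping behind this inference is short enough to write out. This is only an illustrative sketch using the three counts quoted in the text; the variable names are hypothetical.

```python
TOTAL_N = 68               # original preschool participants, per the text
REPORTED_INTERVIEWED = 52  # interviewed count reported in both publications
NOT_INTERVIEWED = 19       # figure quoted from the article (p. 127)

# If both reported counts were right, the study would have this many subjects.
implied_total = REPORTED_INTERVIEWED + NOT_INTERVIEWED

# If the total and the not-interviewed count are right, this many were interviewed.
implied_interviewed = TOTAL_N - NOT_INTERVIEWED

print(implied_total, implied_interviewed)
```

The two reported counts sum to more than the study's total, so at least one of them has to give.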
As noted above, the reported N for felony-arrest data in the monograph is 68, although the Early Childhood Research Quarterly account indicates that the N is 62. The argument that the authors presented for determining both Ns for felonies is tenuous. In the monograph, they argue, "Unlike missing school records, which simply count as missing data, missing arrest records signify the absence of arrests, giving a particular study participant a score of 0 for number of arrests" (p. 31). This conclusion follows only if the arrest records for all the subjects are thoroughly searched. In fact, Weikart and Schweinhart searched only the records for Michigan, not those for other states. Yet they report that they did not interview 19 subjects and did not have the address for these 19. Therefore, it seems unlikely that they know whether these subjects live in Michigan or even whether all of them are still alive. The possibilities are that they lived at least some of their adult life in Michigan or none of it in Michigan. In the former case, they could have committed some adult crimes in Michigan. In the latter, they could have committed no adult crimes in Michigan. The authors’ conclusion, however, is that if there is no knowledge of where they live, they are assigned to live in Michigan.
In the Early Childhood Research Quarterly article, the authors present a somewhat moderated argument for establishing the total N of 62. They dropped 6 subjects from the group that had been interviewed because these subjects did not live in Michigan. The authors observe that "...study participants who were interviewed at age 23 in a state other than Michigan had a reduced chance of being arrested in Michigan....So...6 cases...were dropped from the sample" (p. 127).
This correction is reasonable, but it deals only with subjects who had been interviewed and who lived out of state. What about the 19 subjects who had not been interviewed and whose location was unknown? The authors argue that these subjects should be retained. Their rationale is that a search of Michigan state records resulted in percentages that are similar to percentages for the subjects whose location is known. The authors state, "Of the study participants not interviewed, 49% (8 of 19) had adult arrest records, only slightly less than the 56% (24 of 43) of the interviewed Michigan residents who had adult arrest records" (p. 127).
The argument rephrased goes something like this. "We don’t know where 19 subjects reside. We have information that 8 of them committed crimes in Michigan; therefore, all of them reside in Michigan and all of the crimes they ever committed occurred in Michigan." This argument is not logically sound or even reasonable. The idea that the percentage of arrests this group achieved in Michigan is evidence that all the subjects reside in Michigan is conjecture, not fact. (Note that the authors tacitly admit that the number of subjects interviewed was only 49, not 52. They refer to 43 interviewed Michigan residents. If we add in the six cases interviewed in a state other than Michigan, the total for those interviewed is 49.)
A more serious problem with the arrest records for the 19 subjects not interviewed is that, again, the numbers are inconsistent. The authors state that 8 of the 19 subjects not interviewed had arrest records and that the resulting percentage was 49%. In the first place, the percentage for 8/19 is not 49%, but 42%. So the percentage is not as close to 56% as the authors suggest. In the second place, both percentages are contradicted by the authors’ description of the resulting Ns for the three groups. The authors indicate that "1 of 4 Direct Instruction group members, 3 of 8 High/Scope group members, and 1 of 7 Nursery School members had adult arrest records" (p. 127). The description accounts for all 19 members, but it indicates that only 5 of them had adult arrest records. The resulting percentage of the 19 subjects who had adult records in Michigan was therefore not 49% or 42%, but 26%, which means that the authors’ argument that the percentage of arrests for the missing 19 was the same as that for the interviewed sample is spurious. The arrest percentage for the 19 is less than half of that for the interviewed subjects, which means that if percentages are used as a basis for determining the number of the subjects assigned to live in Michigan, fewer than half of the subjects not interviewed live in Michigan.
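The three competing percentages can be recomputed from the counts quoted on p. 127. The sketch below assumes whole-percent rounding (half up) and uses exact fractions; `pct` is a hypothetical helper, not anything taken from the original study.

```python
from fractions import Fraction

def pct(numerator, denominator):
    """Whole-number percentage, rounded half up, using exact arithmetic."""
    return int(Fraction(100 * numerator, denominator) + Fraction(1, 2))

# (subjects with adult arrest records, group size) among those not interviewed,
# as itemized in the quoted passage.
not_interviewed = {"DI": (1, 4), "High/Scope": (3, 8), "Nursery School": (1, 7)}

arrested = sum(a for a, _ in not_interviewed.values())
size = sum(n for _, n in not_interviewed.values())

stated = pct(8, 19)             # the authors' "8 of 19"
interviewed = pct(24, 43)       # interviewed Michigan residents
itemized = pct(arrested, size)  # from the per-group breakdown

print(arrested, size, stated, interviewed, itemized)
```

The per-group breakdown sums to 5 of 19, so neither the stated 49% nor the recomputed 8-of-19 figure matches the authors' own itemization.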
A different comparison between the Michigan subsample and the entire group appears in the monograph (p. 55). Here, the authors refer to felony arrests, not to adult arrests, and they present data that purportedly demonstrates that the rate of felony arrests is substantially the same for the Michigan subset as it is for the entire group. The data actually shows a much higher rate for Michigan residents than for the others, but the numbers presented for the Nursery School group are particularly revealing. The average number of felony arrests for the Michigan subsample of NS is reported as 0.5, and for the entire NS group as 0.3, which is impossible if the figures are taken at face value. There were 15 subjects in the Michigan subsample and (according to the authors’ reckoning) 23 in the entire sample. An average of 0.5 across 15 subjects is 7.5, or about 8 arrests, but an average of 0.3 across 23 subjects is 6.9, or only about 7 arrests. So the authors would have us believe that part of the group had 8 felony arrests, but the entire group had only 7. Even if we assume that this is simply a rounding error and that the Michigan group had only 7 arrests, we would be faced with the obvious contradiction that the Michigan sample had a much higher rate of arrests than the non-Michigan sample: 7/15 versus 0/7. A skeptic might conclude that there has been manipulation of this data.
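Because the reported means carry only one decimal place, it helps to list every whole number of arrests that each mean could stand for. This is a rough sketch, assuming both means were rounded half up to one decimal; `consistent_totals` is a hypothetical helper, and the group sizes are those given in the text.

```python
def consistent_totals(mean_tenths, n):
    """Integer totals whose mean over n subjects rounds to mean_tenths / 10."""
    # total / n rounds to m / 10 when (2m - 1) / 20 <= total / n < (2m + 1) / 20.
    low = (2 * mean_tenths - 1) * n
    high = (2 * mean_tenths + 1) * n
    return [k for k in range(high // 20 + 1) if low <= 20 * k < high]

michigan = consistent_totals(5, 15)  # reported mean 0.5, 15 subjects
whole = consistent_totals(3, 23)     # reported mean 0.3, 23 subjects

# Largest number of arrests that could belong to NS subjects outside Michigan.
max_outside = max(whole) - min(michigan)

print(michigan, whole, max_outside)
```

Under this assumption the two reports can be reconciled only if nearly every NS felony arrest falls inside the Michigan subsample, which is the disparity in rates described above.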
So what is the proper total N, and what are the Ns for the three subgroups’ arrest data? If we remove the 19 subjects who were not interviewed and the subjects who were interviewed in a state other than Michigan, the total number is 43. If we add in the 5 subjects whose address is not known but who had adult arrest records in Michigan, the N increases to 48. This may be the most reasonable number. It represents the group for which there is information about arrests in Michigan.
With a total N of 48, the Ns for the various subgroups would be: 18 for DI, 14 for H/S and 16 for NS. When these numbers are used, the statistically significant difference for felony arrests disappears.
Even if we disregard all these manipulations, however, the case that Weikart and Schweinhart present does not show that there were any statistically significant differences on convictions for felonies. The "significant" data that the authors have advertised as showing that DI promotes crime is based on "arrest" data, not on data about whether the subjects were judged to be guilty. The data reported by the authors on convictions shows that whether the total N is 68 or 62, there is no statistically significant difference between the groups on convictions for felonies. So even if the authors had the benefit of great doubt about whether there were significant differences in arrests, the data would not support the authors’ assertions that DI causes more crime, only that it results in more arrests. If the authors are to make assertions about the rate at which crimes are committed, (rather than the rate at which arrests are made) the authors would need to refer to conviction data, which is something they do not always do. For instance, in a letter to the editor of the National Review, Schweinhart wrote, "... those who received Direct Instruction ...committed three times as many felonies...." Schweinhart’s numbers are wrong and his judgment of guilt is premature.
One factor that the authors gloss over in their analysis of data is the mobility of the subjects. The goal in conducting a comparison is to be able to make statements about what caused outcome differences. Therefore, the groups that are compared should have matched experiences, except for one. The extent to which there is more than one great difference in the composition or experiences of the groups is the extent to which it is not possible for us to determine which of the differences or which combination of differences accounted for the differences in outcome.
The groups in the High/Scope comparison differed in preschool experience; however, they also differed in other ways. Their gender balance was greatly different, with the High/Scope group having nearly two thirds of its participants female. The high-school experiences were greatly different. The percentages that attended Ypsilanti High School were 83% for DI, 69% for High/Scope and 39% for NS. The percentages that lived in Ypsilanti at age 23 were significantly different: 84% for DI, 64% for H/S and 44% for NS. Finally, the number of confirmed subjects within each group at age 23 is different, with DI having 18, High/Scope having only 14, and NS having 16.
The authors have a curious way of dealing with the possibility that mobility could have any effect on the outcomes. They don’t address it. Instead, they make the following observation about the significant differences in mobility. "It seems unlikely that differential geographic mobility before high school is directly attributable to preschool curriculum model; it is probably best to treat it as a chance occurrence."
It’s hard to imagine how any thoughtful person would propose such an obtuse relationship. The issue is not whether the curriculum model causes mobility; the issue is whether the differences in mobility cause differences in later arrest data. Pre-high school children are not usually in a position to determine whether they will move out of the city, the county, or the state, so the idea that the preschool model would be related to differences in mobility is absurd. Worse, it diverts attention to a straw-man issue and completely ignores the very reasonable possibility that moving to a different environment may cause a difference in arrest rates, which are highly correlated with particular environments. The difference in mobility may therefore result in children growing up in greatly different environments, and being subjected to different pressures that relate to criminal activities. The difference in environments is a more recent possible cause than the differences in preschool curricula; the difference in environments has a longer duration and provides a more pervasive effect on the behavior of the subjects. Stated differently, the differences in environment, mobility, and sex between the curriculum groups could be used to make a far stronger case for differences in arrest data than any arguments based on preschool curricula.
Another problem with the arrest data presented by Weikart and Schweinhart is that these authors have a larger sample of subjects that shows how atypical the performance of the High/Scope group is. The Perry Preschool project had a much larger number of preschool students than those involved in the High/Scope comparison study. The curriculum for the Perry Preschoolers was the same as that of the High/Scope group in the curriculum-comparison study. The estimated arrest performance of Perry Preschool subjects was quite different from that of the High/Scope children in the comparison study. In the Early Childhood Research Quarterly article, the authors acknowledge this difference. They write, "In the High/Scope Perry Preschool study, the estimated average felony arrests by age 23 were 0.7 for the program group and 1.5 for the no-program group" (p. 134). The reported number for the High/Scope group in the High/Scope comparison was 0.2, and for DI it was 0.9. It seems quite obvious that 0.2 is farther from the Perry Preschool mean of 0.7 than the DI number of 0.9 is. The DI subjects are only 0.2 from this mean; the High/Scope subjects are 0.5 from this mean. Given the magnitude of this difference, the authors should have recognized that their best data (the data for a larger sample of subjects) would strongly imply that the arrest rate for the small sample in the comparison study is not typical for High/Scope (and most probably not typical for NS) but that DI performed quite similarly to the Perry Preschool program group.
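The distances from the Perry Preschool program mean can be checked directly. This small sketch stores each mean in tenths of an arrest so the subtraction is exact; the variable names are illustrative only.

```python
# Mean felony arrests by age 23, in tenths, as quoted in the text.
PERRY_PROGRAM = 7   # Perry Preschool program group: 0.7
reported = {"High/Scope": 2, "DI": 9}   # comparison study: 0.2 and 0.9

# Absolute distance of each comparison-study group from the Perry program mean.
distance = {group: abs(value - PERRY_PROGRAM) / 10 for group, value in reported.items()}

# The group whose mean lies closest to the Perry program group.
closest = min(distance, key=distance.get)

print(distance, closest)
```

On these figures the High/Scope comparison group sits 0.5 from the Perry program mean and the DI group 0.2, so DI is the closer of the two.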
The authors present a curious interpretation of the relationship between the Perry Preschool data and the DI group. They assert that "...The Direct Instruction program did not lead to more felony arrests than no preschool program would have, but neither did it lead to fewer felony arrests than no preschool program, as the other preschool programs did" (p. 134).
The felony arrests for no-program subjects and High/Scope subjects in Perry Preschool are 1.5 and 0.7 respectively. The arrests for the no-program group and DI are 1.5 and 0.9. The numbers in these comparisons contradict the assertion that the DI program did not lead to fewer felony arrests than no preschool program. If the High/Scope subjects in the Perry Preschool showed an advantage over the no-program subjects, the DI subjects likewise showed an advantage over the no-program subjects.
Note also that when the authors argued for categorizing all subjects whose address is unknown as Michigan residents, they appealed to the percentages they ostensibly discovered when searching the Michigan arrest records. They argued that if the percentages are close to those obtained for another sample, the entire group must be a Michigan group. In the case of overall program effect, they could have used a variation of the same argument, to wit: If the programs are the same, the numbers for arrests should be the same. Given that the arrest numbers are not the same for the Perry preschool High/Scope subjects and for the High/Scope group in the comparison study, the High/Scope comparison group is probably an outlier.
A final fact attenuates possible conclusions about arrest data being caused by particular preschool curricula. Eight of the original DI group and 8 of the NS group had only one year of preschool (as four-year-olds), but all the High/Scope participants had two years of preschool (as three-year-olds and four-year-olds). So the duration of preschool for the groups was not well matched. Sixteen students experienced half of the preschool exposure that the other 52 experienced. If the preschool experiences caused lasting differences that manifested themselves in such outcomes as arrest rates, it would seem that the effects of the two-year program would be more pronounced than those of a one-year exposure. If no differences are observed between one-year subjects and two-year subjects, the difference in preschool duration is not a possible cause of differences in arrest rates, which means that the second year of preschool is apparently inert. But if the second year has no influence on arrest outcomes, and if there are other possible causes for explaining felony differences between the groups, it’s possible that the first year had no influence either. Possibly, whatever differences are observed for arrest rates are caused by differences in gender balance and place of residence.
In fact, the authors confirm that there are no differences between the one-year and two-year preschool experience. They write, "To see if the shorter preschool program influenced the curriculum group difference in felony arrests, the analysis was conducted with the subsample who attended their preschool programs for two years. In the two year subsample, the mean number of felony arrests for each of the three curriculum groups was almost exactly the same as it was in the complete arrest sample" (p. 134). This procedure is circuitous. The most straightforward comparison would be between the one-year sample and the two-year sample. It may have been that this comparison revealed some uncomfortable differences, such as the one-year subjects tending to commit more felonies than the two-year subjects. In any case, the authors suggest that the lack of difference in felony rates between the subsamples supports their case that DI causes relatively higher arrest rates and that the NS model causes lower rates. The absurdity of this logic becomes evident when their argument is extended. If it’s true that there is no difference between one and two years, both for programming the "good" attributes that occurred with the NS subjects and the "bad" that occurred with DI, would the authors predict that a subject who received only 2 weeks of DI or NS would have the same arrest rate as a two-year subject? If not, what is the "exposure time" required to program DI students to engage in activities that lead to a higher arrest rate and for NS subjects to become squeaky clean? Clearly, if length of preschool exposure is not a variable in arrest performance, either the preschool is not a principal variable in accounting for the arrest performance or we should give serious consideration to the one-week preschool experience that programs children for life.
In summary, the case Weikart and Schweinhart present falls far short of the mark of being scientific or even orderly. The numbers don’t add up; the arguments are illogical; the presentation is so laced with inconsistencies that it smacks of questionable "manipulations." The most serious problem, however, is that there is no data to suggest that preschool experiences had an appreciable influence on the rate of felonies. There are too many intervening influences, too many differences between the groups and their experiences to single out the preschool as the cause for differences in felonies.
Yet, the authors proceed with confidence in identifying the preschool experience as the single cause of differences in felony arrests, despite the fact that their data comes from three woefully small groups of subjects who had begun preschool with an average IQ of 78, groups not well matched in number, in duration of preschool, in gender balance, or in pre-high school mobility. The case that Weikart and Schweinhart present lacks the endorsement of statistical significance, even with the most liberal interpretations. And their denial that influences other than the preschool could affect adult performance sets a new standard for fatalism.
Weikart and Schweinhart would like people to believe that DI is harmful. In fact, DI has lots of data to show that it is greatly beneficial, that it promotes a positive self image, and that it is effective in teaching children skills that permit later academic success. (See Research on Direct Instruction, 1996.)
Adams, G. L., & Engelmann, S. (1996). Research on direct instruction: 25 years beyond DISTAR. Seattle, WA: Educational Achievement Systems.
Schweinhart, L. J., & Weikart, D. P. (1997). Lasting differences: The High/Scope preschool curriculum comparison study through age 23. High/Scope Educational Research Foundation, Monograph 12.
Schweinhart, L. J., & Weikart, D. P. (1997). The High/Scope preschool curriculum comparison study through age 23. Early Childhood Research Quarterly, 12, 117-143.
Schweinhart, L. J. (1998, June). [Letter to the editor in reference to:] Nadler, R. (1998, June). Feature article: Failing grade. National Review, pp. 1-5.