What Does it Mean to be a Research-Based Profession?

Bonnie Grossen

University of Oregon, Eugene



Research currently plays an extraordinarily weak role in educational decision-making. Lovely Billups, Coordinator of the American Federation of Teachers' Educational Research and Dissemination Program, uses a Mount Olympus metaphor to describe the forces that influence modern education. From Mount Olympus the gods of ancient Greek mythology controlled the events affecting the lives of humans living below. As Billups tells it, modern education is not much different. Students, teachers, administrators, and school district officials live at the base of Mount Olympus. About half-way up, a layer of clouds prevents these school personnel from seeing clearly the activities on top of education's Mount Olympus. On top is the professional support system for education: university teacher trainers, publishers, education consultants, researchers, and national curriculum organizations (e.g., the National Association for the Education of Young Children, the National Council of Teachers of Mathematics, the National Council of Teachers of English, and so on). State departments of education usually serve to link the schools to the professional support system above the clouds, so they would probably fit about where the layer of clouds is on Mount Olympus. The activities of these "gods" often set off new fads in education, which strike the professional lives of the school personnel below like bolts of lightning from the clouds. With each new fad that strikes the schools, a burst of dollars flows from the schools up through the clouds into the pockets of those who provide the products and services for implementing the fad. As more lightning bolts strike, more dollars flow into the clouds.

The Mount Olympus metaphor illustrates serious problems in the professional support system for teachers. The system provides more incentives for faddism than for the development of practices that get results. The mechanisms for distinguishing expensive fads that fail from practices that produce visible results are weak and ineffective.

Mechanisms for Distinguishing Fads from Best Practice

Scientific research should be the centerpiece of any mechanism that distinguishes fads from practices that get results. Many "gods" on Mount Olympus dispute the utility of scientific research in education, with its emphasis on tests and measurement. The complexities of this dispute boil down simply to this: If the public were willing to buy invisible results, then it wouldn't matter whether a specific teaching practice produced visible results or not. However, today the public is clearly not willing to continue to pay for invisible results. A recent Gallup poll documents the growing impatience the public feels toward public education. For the first time ever, more than one third of Americans are willing to spend public dollars for private schooling (Lawton, 1996). If public education is to survive, it has to improve its ability to produce visible and better results.

To improve the results achieved by schools, the instructional practices that are shared widely across the profession should be limited to those that are most likely to produce better results. Scientific research is the best method for predicting the results that different practices are likely to produce. This research allows predictions for a larger group of children based on how something works with smaller samples of children. Those procedures that get better results across a number of teachers and across students are the ones that are worth sharing and only these belong in the shared professional-knowledge base of teaching. Those procedures that are not expected to work for more than one teacher or more than one student need not be shared and should not go into the professional-knowledge base. A professional-knowledge base developed through scientific research is a science; it contains instructional procedures that work well across the profession. A knowledge base developed any other way is known as quackery, dogma, superstition, and so forth. Figure 1 illustrates how research builds a science.


See Figure 1. A simple illustration of how a science develops.

Consumers of educational practice (teachers and administrators) have some awareness of the need for scientific research and have learned to ask for it. Education reformers seem to have readily obliged this request; they generally claim to have a research base for their recommendations. These claims, though, are often very misleading.

First, much of what is passed off as research is not research at all but only opinion. Many readers who see something written with citations that include the name and date in parentheses believe that they are reading research. In fact, they may be reading a citation of opinion. For example:

Many people assume that if they see printed material in a book or journal that looks like this, they are examining research (Stephen-Bailie, 1992). What they do not realize is that professional opinion can be referenced in the same manner (Ruggles, 1993).
A small group of prolific professionals with strong beliefs can write a great deal and quote each other's ideas (Stephen-Bailie, 1994; Grossen, 1993). They can create a circular research base that may appear to be research (Stephen-Bailie, 1995), but may, in fact, just be bullshit (Ruggles, 1963).

Many recommendations for teaching practice are the opinions of academicians who justify these opinions only by their intent or "theory." Classroom evidence that the teaching recommendation will contribute to achieving the intent is often missing. For example, from principles such as "In a democratic society people should be free to make choices," some might conclude, "Children should make choices and not be directed by the teacher." Researchers, on the other hand, might reason like this: "We want young adults in our society to have as many choices as possible. Educationally speaking, how do we best accomplish this: by giving children direction or by letting them make their own choices?" Scientific researchers would answer this question by sampling the effects of these different interventions to see which one seemed to result in young adults with more choices. Researchers use classroom evidence to answer questions; pure theorists do not feel they need classroom evidence.

Claims of research support can also be misleading because the research data may not have anything to do with the recommended teaching practice. For example, descriptive research documents that many students are not good at critical thinking. Data documenting this problem do not support any specific instructional procedures for solving the problem; one cannot prescribe any instructional procedures for teaching critical thinking from this research. No recommendation for teaching has a research base unless the recommendation has been followed and the results evaluated. Recommendations that have not been tried out and evaluated scientifically should not become part of the shared professional-knowledge base of education.

Ellis and Fouts (1993, 1994) provide a classification system that is helpful in evaluating the strength of the evidence behind the statement, "The research says …" Their 3-level classification system is somewhat revised below:



LEVELS OF RESEARCH: How Much Evidence is There to Support a Theory?

Level 1--"Basic" research. Are there correlations? Is there a rational explanation (theory) for these correlations?

Level 2--Test of the theory in real classrooms. In small-scale comparative studies, does the theory accurately predict which practices will result in better learning?

Level 3--Program evaluation on a school- or district-wide basis. In large-scale comparative studies, does the theory accurately predict which practices will result in better learning?

Shared Knowledge Base
= information taught to preservice teachers
= information widely disseminated to the field of inservice teachers


According to Ellis and Fouts' classification system, there are three levels of research. Level 1 is "basic" research on learning: correlations, descriptive data, and qualitative case studies. Level 1 research is abundant. However, no theory regarding teaching procedures is testable with descriptive and correlational research (level 1), no matter how abundant it is. For example, from the correlation between high achievement and high levels of self-esteem, some have concluded that self-esteem causes achievement and that offering warmth and sympathy to children who fail will build self-esteem, so that higher achievement will result. This is not necessarily true. Consider another correlation: people with higher achievement have larger shoe sizes. The correlation is explained by a third variable: as children grow older, they achieve more and their feet grow.
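To make the shoe-size example concrete, here is a minimal simulation in Python (an illustrative sketch with invented numbers, not data from any study) in which a hidden third variable, age, produces a strong correlation between shoe size and achievement even though neither causes the other:

    # Illustrative sketch: a hidden third variable (age) creates a strong
    # correlation between shoe size and achievement, though neither causes
    # the other. All numbers are invented.
    import random

    random.seed(1)
    ages = [random.uniform(5, 12) for _ in range(500)]        # ages 5 to 12
    shoe = [10 + 2 * a + random.gauss(0, 1.5) for a in ages]  # feet grow with age
    achievement = [20 + 8 * a + random.gauss(0, 6) for a in ages]  # so do scores

    def corr(x, y):
        """Pearson correlation, computed from scratch."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return cov / (vx * vy) ** 0.5

    # Strong positive correlation, yet shoe size does not cause achievement:
    # age drives both.
    print(corr(shoe, achievement))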

Correlation data can, however, be used to disprove a theory. If there is no correlation between two variables, one variable certainly does not predict the occurrence of the other. In the companion article, E.D. Hirsch summarizes an abundance of level 1 research that shows correlations in the opposite direction of what the most popular theories of today would predict. It is foolish to hold onto a hypothesis that is contradicted by level 1 research (correlation data).

At level 2, a theory describing how teachers should teach is tested by applying it in the classroom to see whether it accurately predicts results and gets better results than the practice it replaces. Different teaching interventions are compared at level 2 in controlled research studies to see if students learn more or better in classrooms using teaching procedures based on the theory. Students are randomly assigned to two or more instructional groups. One group learns one way; the other group(s) learn other ways. The results are compared. Researchers use statistics to decide whether any differences in the results were accidental or would be likely to occur again.
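The logic of such a comparison can be sketched in a few lines of Python (a hypothetical illustration with invented scores, assuming SciPy is available for the statistical test; it is not the protocol of any particular study):

    # Sketch of a level 2 comparison: students are randomly assigned to two
    # teaching conditions and a statistical test asks whether the difference
    # in outcomes is likely to be accidental. All scores are invented.
    import random
    from scipy.stats import ttest_ind  # assumes SciPy is installed

    random.seed(7)
    students = list(range(60))
    random.shuffle(students)                    # random assignment
    group_a, group_b = students[:30], students[30:]

    # Invented posttest scores after each group is taught a different way.
    scores_a = [random.gauss(75, 10) for _ in group_a]  # taught with method A
    scores_b = [random.gauss(68, 10) for _ in group_b]  # taught with method B

    t_stat, p_value = ttest_ind(scores_a, scores_b)
    # A small p-value suggests the difference is unlikely to be accidental,
    # i.e., it would probably show up again if the comparison were replicated.
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")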

Level 2 tests of the warmth-and-sympathy hypothesis described earlier found that expressing warmth and sympathy toward students who fail actually leads to a deterioration in self-esteem. For example, Graham (1984) found that warmth and sympathy toward students when they fail to succeed on school tasks serve to reduce further the students' beliefs in their own capabilities. According to Ellis and Fouts' review of the research, mastery learning and cooperative learning are two practices that have strong level 2 support.

Level 3 research evaluates the effects of the recommended teaching intervention in school-wide or district-wide implementations. Level 3 is important because at this level the new intervention is integrated into all the other things that teachers must accomplish in a day. The danger of having only level 2 research support is that we may find that something is very good for reading, but when we get to level 3, we see that it takes so much time that it interferes with teaching other subjects such as math. For example, cooperative learning may get good results in reading at level 2 but may require so much time to get these results that fewer learning goals are accomplished. This shortcoming would become evident through level 3 research. At level 3, scientists are not evaluating one hypothesis regarding one tool; they are evaluating the integration of a whole toolbox full of tools to maximize effectiveness.

Education is different from the field of science in that education reform leaders tend to call their hypotheses "theories." In science, the word "theory" is used to describe a hypothesis that has been tested. We can compare Ellis and Fouts' three-level analysis with the traditional scientific method. Level 1, theory building in education, is equivalent to hypothesis formation in science. Level 2, theory testing, would be the actual experiment. In science, we wouldn't use the word "theory" until after level 2. Level 3 goes one step beyond the basic scientific method but seems necessary, given the tendency in education to pick up one new tool, even one that works, and carry it to an extreme, throwing out all the old tools though they are still necessary to do the complex job of teaching.

Constructing Knowledge from Evidence

Education                                           Scientific Method
Level 1--theory building.                           1. Develop a hypothesis through informal observation.
Level 2--test the theory.                           2. Test the hypothesis by formally attempting to disprove it.
                                                    3. Analyze the data from the test to evaluate the truth of the hypothesis.
Level 3--develop high-performing model schools.

Many reform "gods" emphasize the importance of "new" research and do not acknowledge any contribution from "old" research. If medicine functioned as education does, we would see large numbers of people suffering the debilitating effects of polio again, because the Salk vaccine, a discovery of the 1950s, was "old" research.

An educational system that looked to level 3 research would be more likely to integrate old as well as new research in a deepening understanding of the relationship between teaching and learning. To use only the most recent research to build our professional-knowledge base unnecessarily limits our understanding. The nature of children's learning has probably not changed much in hundreds of years. Research is timeless. We can look back at old studies, examine the instruction, the measures, and the results, and integrate these results with new research. In this way our professional-knowledge base grows over time and becomes more refined.

The professional-knowledge base for the teaching profession should come from high-performing schools. The ultimate measure of predictability is a level 3 demonstration of a high-performing model that is replicable. A high-performing model is replicable if more than one high-performing school can be developed from it, and a replicable model can be used to teach the rest of the profession how to reach high achievement levels; a high-performing model that is not replicable is far less useful to the profession. Those "gods" whose advice does not produce replicable high-performing schools would have no influence if the education marketplace looked to high-performing schools for knowledge about best teaching practice.

Because community wealth is such a strong predictor of performance level, only schools with similar socio-economic levels should be compared to identify high-performing schools. This way the effects of the instructional practices, rather than the wealth of the parents, can be seen. It is also possible that different intervention models will work better to solve the different problems that distinguish low-income schools from high-income schools. We would learn whether this is true by comparing schools only with other schools of similar socio-economic status, as in the sketch below.
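One way such a within-stratum comparison might be organized is sketched here (the school records, field names, and the 20%-above-the-band-mean cutoff are all hypothetical, chosen only to illustrate the idea):

    # Sketch: identify high-performing schools only relative to other schools
    # of similar socio-economic status. The school records, field names, and
    # the 20%-above-the-band-mean cutoff are hypothetical.
    from collections import defaultdict

    schools = [
        {"name": "A", "ses": "low",  "score": 34},
        {"name": "B", "ses": "low",  "score": 22},
        {"name": "C", "ses": "high", "score": 61},
        {"name": "D", "ses": "high", "score": 55},
        {"name": "E", "ses": "low",  "score": 48},
    ]

    bands = defaultdict(list)       # group schools into SES bands
    for school in schools:
        bands[school["ses"]].append(school)

    for ses, group in bands.items():
        mean = sum(s["score"] for s in group) / len(group)
        high = [s["name"] for s in group if s["score"] > 1.2 * mean]
        print(f"{ses}-SES band: mean {mean:.1f}, high performers {high}")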

If high-performing schools (level 3 research) were the gatekeepers for new knowledge in education, then these working models of high-performing teachers would become the dissemination centers for educational reform. The practitioners and researchers who actually get the results this country is demanding would become the leaders of the educational reform movement and teach other practitioners across the country how they, too, can achieve high performance levels. These high-performing schools could serve as training sites for preservice teachers as well. In a system that looked to high-performing schools, rather than to the gods, theories that have not yet produced results could not be widely disseminated. Faddism would not be able to get a foothold.

This is not to say that theories that are only at level 1 have no merit and will never work. It only means that new teachers should not be trained in theories with only level 1 support and that districts should not mandate practices or spend large amounts of money promoting teaching practices with only level 1 support. These restrictions would not prevent individual teachers from reading about level 1 research and working with it to see if promising interventions can be developed from it. Anyone using level 1 research would understand its limitations as such. It would not prevent school communities of teachers and parents from working with their favorite researchers and theorists with a goal to develop a new model of a replicable high performing school. But level 2 research would precede this kind of initiative, to reduce the educational risks to the larger number of children involved.

Level 2 research would remain very important for high-performing schools because level 2 comparative studies would allow identification of the specific components that are important for replicating the high performance. Only the teaching procedures that gather level 2 and 3 support should become part of our teacher-training programs and become widely disseminated across the profession.

With high-performing schools as the fountainhead of knowledge about teaching, major changes would occur on Mount Olympus. The "gods" would be required to come into the classrooms and demonstrate, before they could influence the profession at large.

Uses and Abuses of Research

A huge problem in education is that most innovations jump from level 1 straight into the "shared professional-knowledge base" without being tested. Most of the educational practices that become widely disseminated in our university teacher-training programs and across the nation do not even have level 2 research support, never mind level 3. Other professions have well-established gatekeepers that monitor the information entering the shared professional-knowledge base. For example, the Food and Drug Administration is a gatekeeper for the medical profession.

In education though, the trend is to test each new hypothesis on a nationwide scale. When the result is a national failure, who gets blamed? Not the promoters, not the consultants providing staff development, not the university professors. The teachers are blamed for the failure: "The teachers didn't do it right."

To ask for level 2 research is to ask the people who are telling teachers how to teach simply to say, "Show us how it's done. Once you show us how to get these children performing at noticeably higher levels, then we'll take a look at what you've got to say. Don't bother us before that."

Piaget and Developmental Psychology

Piaget's work is only level 1 research because Piaget only observed children; he never tried to teach them. The extensive research in developmental psychology that describes what children seem to do at different ages is likewise only level 1 research. No teaching procedures are compared.

Theory of Multiple Intelligences

Howard Gardner's theory of multiple intelligences is another example of level 1 research. What implications does the theory have for instruction? How should we use the theory of multiple intelligences? Where has it been shown to improve learning? Some ideas for using the theory of multiple intelligences in instruction have been suggested. However, few, if any, level 2 comparative studies have evaluated the effectiveness of these suggestions. Yet the theory of multiple intelligences is one of the most popular discussion topics in education today. Howard Gardner himself has said that the enthusiasm for the theory of multiple intelligences has gotten a bit out of hand.

Interdisciplinary / Integrated Curriculum

According to Ellis and Fouts' review (1993), "the level 2 research is close to nil" (p. 153). The numerous claims (e.g., interdisciplinary curriculum improves higher level thinking, is less fragmented, heightens the opportunity for transfer of learning, improves mastery, positively shapes a learner's overall approach to knowledge, and improves motivation) should be treated as hypotheses for level 2 research. Block scheduling, often part of an integrated curriculum, also has no empirical basis.

Cooperative Learning

Cooperative learning has an extensive level 2 research base and cooperative learning is also one of the most widely used innovations of our time. It would seem that cooperative learning is one example of research that has successfully moved into practice. However, this is not the case. Cooperative learning was designed to complement teacher-directed instruction by providing further opportunity for students to work together using what they have learned. In most schools today, cooperative learning is used to replace teacher-directed instruction and students are expected to construct their own knowledge working in groups.

Furthermore, research shows that two elements are crucial to its success: group goals and individual accountability (Ellis & Fouts, 1993). "When group goals and individual accountability are clear, achievement effects of cooperative learning are consistently positive--37 of 44 experimental/control comparisons of at least four weeks' duration yielded significant positive effects" (Ellis & Fouts, p. 123). Unfortunately, schools implementing cooperative learning often use it without any individual accountability. Cooperative learning is more than simply group work on projects. Communicating the technical knowledge teachers need to implement cooperative learning effectively requires extensive training. This training rarely occurs. So cooperative learning is essentially reduced to another fad.

Reading Recovery Research

Reading Recovery promoters claim success rates of up to 90%. These "success rates" are often used to support the continuation and expansion of Reading Recovery in America. The Reading Recovery "success rate" data do not qualify as level 2 or level 3 research because there are no comparisons. The "success rate" data are collected according to procedures specified by the Reading Recovery promoters. Reading Recovery teachers collect their own data and give the numbers to their supervisors who give the data to their university training programs who then give the data to the National Diffusion Network, where final tallies for the nation are made. The results of any comparative studies are not included in the calculations of these "success rates."

The few studies that have compared Reading Recovery with alternative interventions do not provide the same glowing picture of Reading Recovery success. Iverson and Tunmer (1993) compared Reading Recovery with a program that included an explicit phonics decoding strand and found that the comparison program achieved better results than RR. Center, Wheldall, Freeman, Outhred, and McNaught (1995) compared Reading Recovery instruction with a control group and found that half the successful children in Reading Recovery would have achieved these levels without any assistance program for reading. These level 2 comparative studies are rarely cited, while the noncomparative "success rate" data are widely publicized.

The "success rate" data without the comparisons are misleading; consumers get the impression that about 90% of the children in Reading Recovery become proficient readers, never needing further assistance in reading. Because this success rate seems higher than most consumers have experienced in their schools, consumers believe that Reading Recovery will improve reading performance.

The success rate data are misleading for three reasons. First, according to the prescribed procedures for collecting success rate data, children who are not successful early in the program can be referred to special education or simply dropped before going through the entire program and not counted in the tallies. According to Shanahan and Barr (1995) about 30% of the total number of children entering Reading Recovery are excluded in the final tallies. The practice of systematically eliminating unsuccessful children from the count before the end of the study inflates the reported success rate considerably. It is like "looking in the box" in the illustration in Figure 1, to pick only items that support a researcher's hypothesis.
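A back-of-the-envelope calculation shows how large this inflation can be, using the roughly 30% exclusion figure from Shanahan and Barr and an illustrative 90% reported success rate (the 90% figure is taken from the promoters' claims above, not from a comparative study):

    # How excluding unsuccessful children inflates a reported success rate.
    # Uses the ~30% exclusion figure from Shanahan and Barr (1995) and an
    # illustrative 90% reported success rate among the children who remain.
    entrants = 100
    excluded = 0.30 * entrants           # dropped or referred before the tally
    counted = entrants - excluded        # 70 children remain in the count
    reported_rate = 0.90                 # success rate among those counted
    successes = reported_rate * counted  # 63 children

    print(successes / entrants)          # 0.63: only 63% of all entrants
                                         # succeeded, despite the 90% figure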

Secondly, "success" is measured with "predictable text," that is, text that is highly cued by context. These cues include sentence patterns and picture cues. Standardized measures show no advantage for Reading Recovery (Hiebert, 1994; Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1994). The ability to succeed with predictable text apparently does not transfer very well to the "authentic," less predictable text on standardized measures. This could explain why Reading Recovery children seem to fall immediately behind again when they return to the regular classroom (Hiebert, 1994).

Thirdly, "success" is defined as achievement at the level of the local class. Reaching the local classroom average means that in low income areas, children are "successful" when they match the performance of a class that reads at only about the 20th percentile. At this low level, these children are generally all still nonreaders. To call a nonreader a "success" is quite misleading. Furthermore, bringing poor children to a lower criterion than wealthy children is inequitable. The program goal should be the same for all children, regardless of ethnic group or socio-economic status.

Project Follow Through--Level 3 Research

Though level 3 research has been rare, it has occurred. Project Follow Through, the largest, most expensive research study in the history of education, involved level 3 research. Follow Through began in 1967 as part of Johnson's war on poverty and continued to receive funding until the summer of 1995. In a massive effort to break the cycle of poverty through better education, Follow Through affected over 70,000 children in over 180 schools over a period of nearly 30 years and cost taxpayers over a billion dollars. The goal was to identify teaching models that could raise the level of performance of America's poorest schools from the 20th to the 50th percentile (even with mainstream America).

The results were quite controversial, especially since a preschool teacher's model, Direct Instruction, won in a race against the models developed by the gods on Mount Olympus. Figure 2 shows the mean national percentile levels achieved by about 10,000 third-grade children. The baseline in Figure 2 is at the 20th percentile. This is the normally expected level of performance for children in poverty. When the bars go above the 20th percentile, the children's academic performance was improved over the expected level. When the bars go down, their performance was hindered by the model. That is, the children learned less than they would have without the model. You can see that the children in the DI model scored very near the 50th percentile, the targeted levels of performance in all subjects. The scores of children in the other models were often lower than the 20th percentile.

Figure 2. Graphic representation of the data reported by Abt Associates, 1977.

Most analysts of the Follow Through evaluation data concluded that structured, teacher-directed instruction resulted in stronger academic outcomes than the popular child-centered models (Adams & Engelmann, in press; Bereiter & Kurland, 1981; Kennedy, 1978; Lindsley, 1992; McDaniels, 1975; Stebbins, St. Pierre, Proper, Anderson, & Cerva, 1977).

"The two high-scoring models according to our analysis are Direct Instruction and Behavior Analysis; the two low-scoring are EDC Open Education and Responsive Education. If there is some clear meaning to Follow Through results, it ought to emerge from a comparison of these two pairs of models. On the one hand, distinctive characteristics of the first pair are easy to name: sponsors of both the Direct Instruction and Behavior Analysis models call their approaches "behavioral" and "structured" and both give a high priority to the three Rs. EDC and Responsive Education, on the other hand, are avowedly "child-centered." Although most other Follow Through models could also claim to be child-centered, these two are perhaps most militantly so and most opposed to what Direct Instruction and Behavior Analysis stand for." (Bereiter & Kurland, 1981, pp. 16-17)

The Direct Instruction model consisted of lesson plans developed by Siegfried Engelmann, who was well-known for the remarkable results his students achieved. He thought that by sharing his lesson plans he could share his expertise. Engelmann expected the scripts to work as scaffolds for teaching teachers specific effective instructional procedures. The lesson plans were not just made up out of the blue. They were developed in a process much like the "polished stones" process described by Stigler and Stevenson (Spring, 1991). Engelmann and his colleagues developed the lessons by trying them out with different teachers and different pupils and revised them based on the feedback from these tryouts. Only after going through this process were the lesson plans finalized and sent out to the teachers in Follow Through.

Bereiter and Kurland (1981) point out an aspect of the lesson plans that seemed to account for the higher achievement scores:

"Child-centered approaches rely almost exclusively on a form of instruction that … may be called relevant activity. … The instructional approaches used in Direct Instruction and Behavior Analysis reflect years of analysis and experimentation devoted to finding ways of going beyond relevant activity to forms of instruction that get more directly at cognitive skills and strategies. This effort has been successful in some areas, not so successful in others, but the effort goes on. Meanwhile, child-centered approaches have tended to fixate on the primitive, relevant-activities form of instruction for all their instructional objectives." (Bereiter & Kurland, 1981, p. 20)

The most popular child-centered models resulted in more negative outcomes than positive ones (Stebbins, St. Pierre, Proper, Anderson, & Cerva, 1977). Abt Associates, the independent group that analyzed the data, noted that the most surprising outcome was that the Direct Instruction model had the best outcomes for self-esteem (Bock, Stebbins, & Proper, 1977). The models that indicated that improved self-esteem was their targeted goal often resulted in more negative outcomes on the self-esteem measures. The Direct Instruction model did not target self-esteem as a goal. The sponsors predicted that by targeting academic success and engineering the instruction so that students were highly successful each step of the way, self-esteem would follow.

Overall, the learning and self-esteem of children of poverty were hampered by nondirective methods, methods that claimed to be "liberal" in their ideology. The three lowest-scoring models in the Abt analysis, ones that had negative results, are models that are widely promoted today. The model with the most negative outcomes, the Open Education model, was the British infant and primary school model that is promoted today by the National Association for the Education of Young Children under the new name "developmentally appropriate practices." The Cognitively-Oriented Curriculum is High/Scope. The TEEM model was a language experience approach to reading instruction, called "whole language" today.

What happened next?

Before the final evaluation was even published, the Ford Foundation funded House and Glass to critique the evaluation. A widely read critique in the Harvard Educational Review by House, Glass and others (House, Glass, McLean, & Walker, 1978) emphasized that the Follow Through evaluation had asked the wrong question: Instead of asking "what model works best," the evaluation should have asked, "what makes the models work" or "how can one make the models work better?"

In a 1981 report to the National Institute of Education, predecessor of the current OERI, Glass and Camilli argued that the results of Follow Through should not be used to guide school policy. Two major reasons they gave were:

1. "The NIE should conduct evaluations emphasizing an ethnographic, principally descriptive case-study approach. ... Evaluations of Follow Through have been quantitative, experimental approaches. The deficiencies of quantitative, experimental evaluation approaches are so thorough and irreparable as to disqualify their use" (from ERIC abstract ED244738 of Glass & Camilli, 1981). Translated, Glass and Camilli contend that only level 1 research is valid and level 2 or level 3 research is not.
2. Glass and Camilli gave another reason for keeping the results quiet: "The audience for Follow Through evaluations is an audience of teachers to whom appeals to the need for accountability for public funds or the rationality of science are largely irrelevant" (from ERIC abstract ED244738 of Glass & Camilli, 1981). In other words, teachers don't need to know about this.

The House and Glass critiques were widely accepted in the profession. The critique of the Follow Through research, funded by the Ford Foundation, seemed successful in discrediting the results. Hardly anyone heard about the Follow Through evaluation, especially teachers. Without really understanding the basis for the critique, the educational community seemed happy to accept that the Follow Through evaluation was a mistake. And in spite of the fact that Follow Through funding continued until 1995, with an increased level of funding for the models that were not validated as effective, there was no further evaluation (Watkins, 1996).

The fundamental basis of the House and Glass critiques was that we should not use level 2 or level 3 research to develop our professional-knowledge base; we should rely on level 1 research. The upshot of this line of reasoning though, is to deny a scientific basis for education and open the door wide to faddism.

Now, in the 1990s, we are in the midst of a flurry of educational reforms that have rejected directive teaching and emphasize child-centered approaches again. Most of these reforms are very similar to the models that failed in Follow Through. What has happened to explain this change? Have modifications been made in the Open Education model to make it much more effective in its reincarnation as "developmentally appropriate practices"? Does whole language improve the language experience approach substantially? Has High/Scope changed so that the old data from 1977 no longer hold?

From a logical standpoint, one cannot use level 1 research to overturn level 3 research. To claim that the poorest performing models in Follow Through are now research-based in the 90s requires successful demonstrations at level 3. (Even level 2 might provide some evidence.) If such demonstrations were achieved and were more recent than the Follow Through evaluation, then we could say that the Follow Through data no longer hold or that these models have been improved.

The NAEYC and "Developmentally Appropriate Practices"

In the mid-80s, the National Association for the Education of Young Children convened a committee to define best practices for teachers. These were called "developmentally appropriate practices" (DAP). Johnson and McChesney Johnson (1992), who served on the NAEYC committees and who advocate for DAP, described the purpose of these committee meetings as follows:

"DAP was born from meetings of the NAEYC in the mid-80s in an effort to foster professional identity and visibility for the early childhood practitioner."

In these committee meetings no effort was made to review and synthesize research in defining best practice. According to Johnson and McChesney Johnson (1992), this is how the guidelines for DAP were defined:

"DAP was never seen as needing to be exclusively or even primarily based on research literature…. Folklore and personal accounts of best practices passed on from one generation of teachers to the next counted a great deal…. The types of citations used to reference the NAEYC publications of DAP guidelines clearly indicate a reliance on sources other than articles reporting original empirical data (i.e., bona fide research)… Only 13 of 25 references cited in the DAP report were original reports of research (Kontos, 1989)."

DAP was not based on "bona fide" research. Yet DAP was publicized as a set of guidelines for how teachers should teach. The NAEYC has been very active in promoting DAP through legislative mandates and among state departments of education across the country. In 1991, Oregon's Educational Reform Act mentioned "developmentally appropriate practices" many times. By 1993, it became clear that the definition of this term was to be taken as a statewide mandate for the specific NAEYC DAP guidelines. Elementary teachers in the state were expected to learn and implement DAP. Oregon State Department officials visiting schools have criticized teachers and schools if the children were seated facing the teacher or listening to the teacher, a practice labeled as inappropriate in the NAEYC guidelines. By the 1995 legislative session, parents, teachers, and others flooded the capitol with complaints. Legislative aides indicated that no issue had ever brought so many people to speak to the legislature as the Oregon education reform bill had.


APPROPRIATE: Each child is viewed as a unique person with an individual pattern and timing of growth…. For example, not every child will learn how to read at age 6; most will learn to read by 7; and some will need intensive exposure to appropriate literacy experiences to learn to read by 8 or 9.
INAPPROPRIATE: Children are evaluated against a standardized group norm. All are expected to achieve the same narrowly defined, easily measured academic skills by the same predetermined time schedule typically determined by chronological age and grade level expectations.

APPROPRIATE: The curriculum is integrated so that children's learning in all traditional subject areas occurs primarily through projects and learning centers that teachers plan and that reflect children's interests and suggestions.
INAPPROPRIATE: Curriculum is divided into separate subjects and time is carefully allotted for each, with primary emphasis given each day to reading and secondary emphasis to math.

APPROPRIATE: The curriculum is integrated so that learning occurs primarily through projects, learning centers, and playful activities that reflect current interests of children. For example, a social studies project such as building and operating a store or a science project such as furnishing and caring for an aquarium provide focused opportunities for children to plan, dictate, and/or write their plans (using invented and teacher-taught spelling), to draw and write about their activity, to discuss what they are doing, to read nonfiction books for needed information, to work cooperatively with other children, to learn facts in a meaningful context, and to enjoy learning. Skills are taught as needed to accomplish projects.
INAPPROPRIATE: Instructional strategies revolve around teacher-directed reading groups that take up most of every morning, lecturing to the whole group, total class discussion, and paper-and-pencil exercises or worksheets to be completed silently by children working individually at desks. Projects, learning centers, play, and outdoor time are seen as embellishments and are only offered if time permits or as reward for good behavior.

INAPPROPRIATE: Teachers try to motivate children by giving numerical (85%) or letter grades, stickers, gold stars on charts, candy, or privileges such as extra minutes of recess.

INAPPROPRIATE: Children's progress is reported to parents in letter or numerical grades. Emphasis is on how well the child compares to others in the same grade and to standardized national averages.

(pp. 67-78, Bredekamp, 1987)

Figure 3. A sample of some of the appropriate and inappropriate practices listed in the DAP guidelines for Primary Grades (ages 5 to 8).

What was the research base supporting the DAP mandate? Of the 40 "research, not opinion literature" references cited to support the statewide mandate, only seven studies involved level 2 research. Piaget's work and the child development studies are level 1 research. Four of the level 2 studies actually reported significant differences favoring the "inappropriate" teacher-directed practices; three studies found no significant differences; and none supported DAP.

An eighth reference of the 40 (Goodlad & Anderson, 1987) reviewed research on nongraded models, but did not break these models down according to type of practice used (i.e., DAP or teacher-directed practices). Another source, Gutierrez and Slavin (1992), did do such a breakdown. The breakdown by practices showed that the "nongraded organization can have a positive impact on student achievement if cross-age grouping is used to allow teachers to provide more direct instruction to students but not if it is used as a framework for individualized instruction" (p. 333). Gutierrez and Slavin also state:

"The movement toward developmentally appropriate early childhood education and its association with nongrading means that nongraded primary programs will probably be more integrated and thematic, and less academically structured or hierarchical [than the nongraded models evaluated in the research]… Whether these instructional methods will have positive or negative effects on ultimate achievement is currently unknown." (p. 370, 1992)

Eighteen more references were unpublished local reports describing the activities of the pilot schools in Oregon using DAP. None of these schools scored very high on the Oregon Statewide Assessment. In fact, the scores of some went down after adopting DAP. The positive evaluations the state gave these models were largely based on perceptions of the conformity of a school's program to the NAEYC guidelines. The more the school seemed to reflect the use of DAP, the better it was determined to be, regardless of lower scores on standardized tests. Here is an excerpt from a personal letter I received from a superintendent in Oregon:

"It never ceases to amaze me that no one seems to be actually utilizing the Oregon Statewide Assessment to ascertain program effectiveness and accountability. Over the past three years, in reviewing the listings of the Statewide Assessment scores by building and district throughout the state, it is very clear (and also disheartening) that many of the very schools which are touted as exemplary models do not have the test scores (measureable outcomes) to match.

"This is particularly discouraging for a school district such as ours, where we have … consistently achieved exemplary results in the Statewide Assessment; these proven successes are paid no heed. Given such examples, one cannot help but question the wisdom of our state's instructional 'leaders.'" (personal correspondence from a superintendent of an Oregon school district)

In England, where DAP has been the officially endorsed approach for over 20 years in the British infant and primary schools, achievement scores have gone steadily down. In 1992 the English Department of Education and Science published a scathing indictment of the DAP model (Department of Education and Science, 1992).

"The rhetoric of primary education has for a long time been hostile to the idea tha young children should be exposed to subjects. Subject divisions, it is argued, are inconsistent with the child's view of the world. Children must be allowed to construct their own meanings and subject teaching involves the imposition of a received version of knowledge. And, moreover, it is the wholeness of the curriculum which is important rather than the distinct identity of the individual subjects." (para 63)

"Each of these familiar assertions needs to be contested. First, to resist subjects on the grounds that they are inconsistent with children's views of the world is to confine them within their existing modes of thought and deny them access to some of the most powerful tools for making sense of the world which human beings have ever devised. Second, while it is self evident that every individual, to an extent, constructs his / her own meanings, education is an encounter between these personal understandings and the public knowledge embodied in our cultural traditions. The teacher's key responsibility is to mediate such encounters so that the child's understanding is enriched. And, finally the integrity of the curriculum as a whole is hardly likely to be achieved by sacrificing the integrity of its constituent parts." (para 64)

The NAEYC is currently holding committee meetings once again to define a new set of guidelines for teaching practice. Does the process involve a careful review and synthesis of level 2 and level 3 research this time around? Probably not.

Whole Language

Several national organizations promote whole language as the official best practice for teaching language arts, including the NAEYC, the Whole Language Umbrella, the International Reading Association, and the National Council of Teachers of English. Whole language is very similar to the language experience approach that characterized the TEEM model in Project Follow Through. In the Follow Through evaluation, the language experience approach (TEEM) had small positive effects in language arts performance, but required so much time that the model had negative effects in mathematics.

A distinction that is often made is that whole language is not just a set of teaching activities, as language experience was, but includes a philosophy. Is there research to support whole language as improved and effective practice? Whole language advocates vocally reject the "scientific paradigm." In other words, the whole language leaders reject the use of controlled comparative studies to identify effective teaching procedures. (See logs of email discussions in October and November, 1995, where whole language leaders reject comparative studies, TAWL@listserv.arizona.edu.)

The research most frequently cited by whole language advocates is the finding that children learn oral language naturally. And clearly the data show that virtually all children learn to speak without any systematic instruction at all. That research is "true." But these data do not support a recommendation that teachers should use whole language to teach reading. No teaching recommendation has a research base until it has been tried and has shown better results than the practice it replaces. It is misleading to call this a "research base" for whole language, because the data did not involve evaluating the effects of whole language on learning to read.

The evidence that whole language was a significant improvement over the language experience approach is weak. The fact that California claimed to adopt whole language in 1987 and then became the lowest ranking state in the nation in fourth-grade reading seems to indicate that whole language is not much improved over the language experience approach.

High/Scope

In Follow Through, the model developed by The High/Scope Foundation (Cognitively Oriented Curriculum) resulted in much lower scores in mathematics (11th percentile) and language (12th percentile) than the usual performance level of children of poverty (around the 20th percentile). On measures of self-esteem (affect) there were more negative outcomes for High/Scope than positive.

To support the adoption of the High/Scope model today, most research claims affective gains rather than academic ones. The support is strongest for the preschool High/Scope model, and even that is quite weak. Most of these studies compare the effects of a High/Scope preschool with those of no preschool. Only one study claims to support the distinctively nonacademic aspect of a High/Scope preschool. (A preschool teacher in Illinois recently reported that she was told she risked losing her preschool funding by posting the alphabet on the wall.) In that study, 54 subjects in the Perry preschool program were assigned to three preschool treatment groups (18 children to a group): a Direct Instruction academic approach, a High/Scope approach, and a common nursery school model. The children were followed to high school graduation. Along the way the performance of these groups was occasionally compared.

When the students were age 15, Schweinhart, Weikart, and Larner (1986) reported that those who had received Direct Instruction in preschool reported higher rates of juvenile delinquency than the subjects who had been in High/Scope. On this self-report scale, students were asked questions such as, "Have you ever argued or fought with parents?" and indicated how many times they could recall doing this in their lifetime. Many critics of the study (Bereiter, 1986a, 1986b; Gersten, 1986; Gersten & Keating, 1987; Gersten & White, 1986) question the reliability of the self-report measure as an indicator of juvenile delinquency. The reliability was questioned first because the objective data, such as actual arrests and suspensions from school, showed no differences between the groups. Secondly, some of the responses reported for the measure seem unbelievable. For example, no group reported more than an average of 2 occasions per student where they had ever argued or fought with their parents during their lifetime.

Major policy decisions have been made on the basis of this one study. Decision-makers should look for more compelling evidence than that provided by one study, especially one conducted by the same organization that markets the product the study supports. Hirsch (1996) cites the results of decades of French data comparing the long-term effects of the academic preschool for 3- and 4-year-olds (école maternelle) with the nonacademic preschool (crèche).

"Recently, French social scientists completed longitudinal studies of some four thousand children on the long-term effects of coles maternelles on the more than 30 percent of French two-year-olds who now attend these preschools. The results are striking. Those who attend school at a younger age are more effective academically and, by all indirect measures, better adjusted and happier for having had early exposure to challenging and stimulating early academic experiences.

"The French results are even more compelling from the standpoint of social justice. When disadvantaged children attend coles maternelles at age two, their academic performance by grade six or seven equals that of highly advantaged children who have not attended preschool until age four." (p. 80, Hirsch, 1986)

The findings contradict the conclusions of the High/Scope study. The French data are more compelling because a) 4,000 children were evaluated as opposed to only 54 in the High/Scope study, b) the French researchers had no ownership interest in the program (the High/Scope Foundation conducted the study and markets the product), and c) the High/Scope measures did not seem technically reliable.

The High/Scope study is widely discussed at national conferences and in professional circles. Some seem to have the impression from the High/Scope study that the effects of Direct Instruction do not hold over time. Yet studies that followed up on the Direct Instruction Follow Through children found long-term benefits. Two studies (Darch, Gersten, & Taylor, 1987; Meyer, Gersten, & Gutkin, 1983) evaluated five cohorts of students (293 Direct Instruction students and 317 comparison students). Students who were in the Direct Instruction model through grade 3 showed higher graduation rates, lower drop-out rates, fewer retentions, more applications to college, and more acceptances to college. All these differences were statistically significant. For example, 60% of the Direct Instruction students graduated from high school compared to 40% in the comparison group.

Some have heavily criticized the fact that though the children in the Direct Instruction model generally caught up with their middle-class peers by the end of third grade, they lost ground again in grades 4-12. A 60% graduation rate, for example, is far from ideal. Perhaps if the Follow Through children had attended the same schools their middle-class peers attended during these later years, it would be reasonable to expect them to have maintained their gains. However, the quality of their education after grade 3 was not the same as that of middle-class children. Gersten describes his observations while gathering the follow-up data:

"I spent six months … riding the subway lines to every vocational high school in Brooklyn and driving through swampy country roads in South Carolina to isolated high schools. It was impossible not to see how segregated education is or to ignore consistently low teacher expectations, as well as the apathy, sarcasm, and latent hostility present in some of the high schools." (p. 31, Gersten & Keating, 1987).

Nevertheless, the evidence from these follow-up studies indicated that children of poverty receiving Direct Instruction in grades K-3 maintained a lasting advantage.

National Committees

Rather than use research to develop reform ideas, many national curriculum organizations are convening committees as the NAEYC has. Committees might be a good idea if the committees were dedicated to synthesizing research, but the reports from these committees often indicate that this is not their intent at all. And, in fact, their teaching recommendations often contradict research. These committees often reject a scientific model for building a knowledge base; that is, they reject level 2 and level 3 research.

The National Council of Teachers of Mathematics is using a similar committee approach to define teaching practice. The NCTM convened a committee to establish the NCTM standards. As standards for what students should know and be able to do, the NCTM standards are not a problem; however, the NCTM did not translate these standards into assessments of students' learning. They translated them immediately into vignettes that illustrate the teaching practices that teachers should use. As standards for telling teachers how to teach, the NCTM standards are not research-based. The recommendations on how teachers should teach represent the consensus of opinion of the people on the NCTM committee. The NCTM document itself describes the recommendations as a "research agenda," not a research synthesis. The document also mentions that one of the members of the committee suggested that the NCTM set up a pilot school to demonstrate that the NCTM teaching recommendations could result in the achievement of the NCTM standards. The fact that this suggestion is mentioned is the committee's acknowledgment that no level 3 research existed to support the teaching practices they recommended.

A parent recently wrote to the NCTM requesting data to support the adoption of the teaching practices recommended in the NCTM standards. In a reply the NCTM indicated that there were no such data: "First, this reply is to inform you that I am not aware of any research study that relates the 'adoption' of the NCTM's Standards to improved scores on the Iowa Tests of Basic Skills, a fact that your school district's administrators and board of trustees have correctly stated." The NCTM letter points out that the content of the standards "transcend[s] reliance on paper and pencil tests to assess students' aptitudes and achievement in mathematics." Data to support the teaching recommendations do not exist because the kinds of measures the NCTM needs to evaluate the learning it desires have not been developed.

If no assessment tools exist for adequately evaluating the NCTM standards for student learning, how was it possible to develop a research base for the teaching standards that were published in 1991? Why were the assessment standards not developed first, instead of much later, in 1995? Without ways to evaluate the learning that the NCTM recommends, it is impossible to use research to identify the teaching practices that best accomplish those learning goals.

In spite of the hypothetical nature of the NCTM teaching recommendations, the NCTM engaged in a widespread national marketing campaign to promote these practices as a reform. Now that the NCTM has convinced most of the education world that the best way to teach is the way the NCTM has recommended, the research journal published by the NCTM does not seem interested in further research to evaluate questions regarding best teaching practice. A research study submitted to the Journal for Research in Mathematics Education (JRME) was rejected because the findings of the study were not consistent with the opinion of the NCTM committee.

The Research Advisory Committee for the JRME recommended last year, as a matter of policy, that the journal not publish level 2 research. They recommended instead that the journal publish "disciplined inquiry." "Disciplined inquiry" is apparently something different from scientific inquiry: "Disciplined inquiry is as much an orientation as an accomplishment" (p. 301, Research Advisory Council of the National Council of Teachers of Mathematics, 1995). One can publish one's "orientation" in JRME; accomplishments, on the other hand, such as identifying the features of a superior curriculum, are not acceptable:

"The question, "Is Curriculum A better than Curriculum B?' is not a good research question because it is not really answerable." (p. 301, 1995)

If this question is not answerable, then the NCTM committee has no business making recommendations regarding curriculum design and teaching practice to teachers.

We often hear about the inadequacy of the scientific paradigm. The premise of this argument is that humans are so unique and complex that the effects of any teaching procedure on learning either cannot be measured or will be unpredictable. But if there is no expectation that a specific teaching practice will work beyond the sample with which it was tested, then there is no basis for recommending the practice to anyone at all. To make a teaching recommendation to someone else is to step into the scientific paradigm; those who truly reject that paradigm cannot have any recommendations for teachers. Unfortunately, the gods on Mount Olympus who reject science seem to be the most vocal in their recommendations to teachers. Science clearly stands in the way of faddism.

What Can Teachers Do?

Scientific research does not guide the development of the professional-knowledge base of teaching. As E.D. Hirsch (1996) points out, the recommendations of the national curriculum organizations are better characterized as "worst practice" than as "best practice." The teaching practices taught in colleges of education are generally no different. The professional support system for teachers that resides above the clouds on Mount Olympus is dysfunctional. By ignoring scientific research and promoting prejudices, it often serves as an obstacle, rather than a resource, in the dissemination of the knowledge that is so crucial to the success of public education.

This is not news to many teachers. However, some of the tactics teachers use to avoid relying on a dysfunctional professional support system also work against a scientific professional-knowledge base about teaching. An over-emphasis on individual teacher autonomy and creativity, for example, can undermine the development of a shared knowledge base. As Adam Urbanski said: "Everyone seems to think that all you need to do to be a good teacher is to love to teach. But no one thinks that all you need to do to be a good surgeon is to love to cut." Having teachers pick and choose instructional procedures according to personal preference, without any scientific information about the effectiveness of those procedures, is not likely to lead to significant improvements in the effectiveness of public education.

What we know in the 90s is that reform will not work until it gets down to the details of engineering specific procedures for teaching specific topics, such as King Lear or fractions. Al Shanker put it this way in his recent editorial, "Lots of bull [in educational reform] but no beef": "You don't know a theory is worth anything until you grapple with the details of putting it into practice" (Shanker, 1996). According to the research synthesized through the National Center to Improve the Tools of Educators, the kind of knowledge that leads to significant improvement, especially for special education, at-risk, and other vulnerable learners, is specific and technical (see Kameenui & Carnine, in press).

To be a profession is to have a professional-knowledge base composed of shared procedures that work. This is a new idea for teachers, though it is quite old for other professions. Good teachers using well-engineered tools and detailed procedures can achieve remarkable results with their students, and (this is the good news) they can get these results and still have a life. The reformers who give teachers theories, with no details for how to use them, are in effect asking teachers to create their own tools and curricula. This is like asking airplane pilots to build their own airplanes, or farmers to design their own tractors. When would teachers have time to do this? There's no time; teachers have to teach all day. Engineering a highly effective instructional sequence would more than consume a teacher's private life. Are teachers entitled to a life?

The professional support system should allow the sharing of "polished stones": instructional procedures and lesson plans that work. The emphasis, though, cannot be simply on sharing; it has to be on sharing only those teaching procedures that get better results. The clearest way to find those procedures, especially given the current dysfunction in the professional support system for teachers, is to look in high-performing schools. High-performing schools should be the gatekeepers controlling the information that enters the shared professional-knowledge base of teaching, and they must be schools that are accomplishing the things the public wants schools to accomplish.

A reliable system for identifying high-performing schools will not exist until we have measurable academic standards that align with what society wants from schools. Schools of the same socio-economic level should be compared with one another to identify those that best achieve the standards. Increasing the role of high-performing schools (level 3 research) in education reform will reward school personnel for seeking out the level 2 research that makes them successful, by placing them in a position of central importance in teacher training and school reform. It will also reward the researchers and theorists who develop the models that replicate to produce additional high-performing schools.

What happens to teachers' autonomy and creativity when they use a science? Using a science in teaching is like dancing to music. No, you are not completely free; you have to follow the music. You would look silly doing a Western cha-cha when the music is a tango. Even though dancers follow the same steps or procedures when they dance, there is still a lot of room for personal style. Just look at a dance floor sometime and notice the variety of dancing styles. The dancers are all following the same steps and moving to the same music, and anyone who is not following looks clearly out of place. Although the music limits you in one way, in another way it sets you free. Just think how hard it is to dance without music. It's about as hard to teach with no science as it is to dance with no music. Unfortunately, many of our national curriculum organizations are making so much noise that we can hardly hear the music anymore. We've got to amplify the music, hear the science, so the dancing can begin.

If the flow of education dollars can be redirected to reward the development and use of teaching tools and practices that result in higher achievement, there is no doubt that American public education could become the best in the world.

Note: The National Center to Improve the Tools of Educators is funded by the U.S. Department of Education Office of Special Education Programs.

Acknowledgement: I would like to thank Barbara Ruggles, Vice President of Illinois Local #604 and President of the Park Forest Council, for her thoughtful feedback and constant encouragement in the development of the manuscript.


References

Adams, G., & Engelmann, S. (in press). Research on direct instruction: 20 years beyond DISTAR. Seattle, WA: Educational Achievement Systems. [Phone or Fax: 206-820-6111]

Bereiter, C. (1986a). Does Direct Instruction cause delinquency? Early Childhood Research Quarterly, 1, 289-292.

Bereiter, C. (1986b). "Mountains of evidence," said to contradict study on effects of preschool. Education Week, 5(57), 19.

Bereiter, C., & Kurland, M. (1981). A constructive look at Follow Through results. Interchange, 12, 1-22.

Bock, G., Stebbins, L., & Proper, E. (1977). Education as experimentation: A planned variation model (Volume IV-B). Cambridge, MA: Abt Associates.

Brandt, R. (1986). On long-term effects of early education: A conversation with Lawrence Schweinhart. Educational Leadership, 44, 14-18.

Bredekamp, S. (Ed.) (1987). Developmentally appropriate practice in early childhood programs serving children from birth through age 8. Washington, DC: National Association for the Education of Young Children.

Center, Y., Wheldall, K., Freeman, L., Outhred, L., & McNaught, M. (1995). An experimental evaluation of Reading Recovery. Reading Research Quarterly, 30, 240-263.

Darch, C., Gersten, R., & Taylor, R. (1987). Evaluation of Williamsburg County Direct Instruction program: Factors leading to success in rural elementary programs. Research in Rural Education, 4, 111-118.

DeFord, D.E., Estice, R., Fried, M., Lyons, C.A., & Pinnell, G.S. (1993). The Reading Recovery program: Executive summary 1984-92. Columbus: The Ohio State University.

Department of Education and Science. (1992). Curriculum organization and classroom practice in primary schools: A discussion paper. London: Author.

Ellis, A., & Fouts, J. (1993). Research on educational innovations. Princeton, NJ: Eye on Education. [Phone: (609) 799-9188; Fax: (609) 799-3698]

Ellis, A., & Fouts, J. (1994). Research on school restructuring. Princeton, NJ: Eye on Education.

Gersten, R. (1986). Response to "Consequences of three preschool curriculum models through age 15." Early Childhood Research Quarterly, 1, 293-302.

Gersten, R., & Keating, T. (1987). Long-term benefits from Direct Instruction. Educational Leadership, , 28-31.

Gersten, R., & White, W.A.T. (1986). Castles in the sand: Response to Schweinhart and Weikart. Educational Leadership, 44, 19-20.

Glass, G. V., & Camilli, G. (1981). "FT" Evaluation. Washington, DC: National Institute of Education.

Glynn, T., Crooks, T., Bethune, N., Ballard, K., & Smith, J. (1989). Reading Recovery in context: Implementation and outcome. Educational Psychology, 12(3 & 4), 249-261.

Graham, S. (1984). Teacher feelings and student thought: An attributional approach to affect in the classroom. Elementary School Journal, 85, 91-104.

Gutierrez, R., & Slavin, R. (1992). Achievement effects of the nongraded elementary school: Summary of a best evidence synthesis. Review of Educational Research, 62(4), 333-376.

Hiebert, E. (1994). Reading Recovery in the United States: What difference does it make to an age cohort? Educational Researcher, 23(9), 15-25.

Hirsch, E.D. (1996). The schools we need: And why we don't have them. New York: Doubleday.

House, E., Glass, G., McLean, L., & Walker, D. (1978). No simple answer: Critique of FT evaluation. Harvard Educational Review, 48(2), 128-160.

Iverson, S., & Tunmer, W.E. (1993). Phonological processing skills and the Reading Recovery program. Journal of Educational Psychology, 85(1), 112-126.

Johnson, J., & McChesney Johnson, K. (1992). Clarifying the developmental perspective in response to Carta, Schwartz, Atwater, and McConnell. Topics in Early Childhood Special Education, 12(4), 439-457.

Kameenui, E., & Carnine, D. (Eds.) (in press). Educational tools for diverse learners. Merrill.

Kennedy, M. (June, 1978). Findings from the Follow Through planned variation study. Educational Researcher, 3-11.

Kontos, S. (1989, September). Developmentally appropriate practice: What does research tell us? Indiana Association for the Education of Young Children, Indianapolis.

Lawton, M. (1996). Support for private school vouchers is on the increase, Gallup poll reports. Education Week, 16(1), 18-19.

Lindsley, O. (1992). Why aren't effective teaching tools widely adopted? Journal of Applied Behavior Analysis, 25(1&2).

McDaniels, G. (1975). Evaluation of Follow Through. Educational Researcher, 4, 7-11.

Meyer, L., Gersten, R., & Gutkin, J. (1983). Direct Instruction: A Project Follow Through success story in an inner-city school. Elementary School Journal, 84, 241-252.

Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer, M. (1994). Comparing instructional models for the literacy education of high-risk first graders. Reading Research Quarterly, 29(1), 9-38.

Research Advisory Council of the National Council of Teachers of Mathematics. (1995). Research and practice. Journal for Research in Mathematics Education, 26(4), 300-303.

Schweinhart, L., & Weikart, D. (1986). Schweinhart and Weikart reply. Educational Leadership, 44, 22.

Schweinhart, L., Weikart, D., & Larner, M. (1986). Consequences of three preschool curriculum models through age 15. Early Childhood Research Quarterly, 1, 15-45.

Schweinhart, L., & Weikart, D. (1988). Education for young children living in poverty: Child-initiated or teacher-directed instruction? Elementary School Journal, 89, 213-225.

Shanahan, T., & Barr, R. (1995). Reading Recovery: An independent evaluation of the effects of an early instructional intervention for at-risk learners. Reading Research Quarterly, 30(4), 958-996.

Shanker, A. (May 12, 1996). Where we stand: Lots of bull but no beef. http://www.aft.org/

Stanovich, K. (1994). Romance versus reality. The Reading Teacher, 47(4), 280-291.

Stebbins, L., St. Pierre, R., Proper, E., Anderson, R., & Cerva, T. (1977). Education as experimentation: A planned variation model (Volume IV-A). Cambridge, MA: Abt Associates.

Stigler, J. W., & Stevenson, H. (Spring, 1991). How Asian teachers polish each lesson to perfection. American Educator, 12-47.

Watkins, C. (1996). Follow Through: Why didn't we? Effective School Practices, 15(1), 57-66.

Weikart, D., Epstein, A., Schweinhart, L., & Bond, J. (1978). The Ypsilanti Preschool Curriculum Demonstration Project: Preschool years and longitudinal results. Ypsilanti, MI: High/Scope.