Comment on Meehl

Meehl's points keep returning to the issue of falsifiability. He is concerned

  1. about how little is implied by a nonsignificant (or, for that matter, a significant) group comparison;

  2. about the use of an easy way out of failed predictions, namely, explaining them away as the product of methodological problems; and

  3. about the lack of strong theories that would make daring predictions, whose success or failure would be more informative.

Certainly, there is too much attention paid to p values and too little to what the results really mean for the phenomenon in question. One of the worst examples I have seen was in a paper I reviewed, where the author first selected extreme groups (top and bottom 33% of a distribution of scores) and then performed a t test to show that they differed significantly from each other...
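To see why that t test is uninformative, here is a minimal simulation sketch. It assumes the test was run on the same scores that were used to select the groups; the sample size of 300, the normal distribution, and the helper name welch_t are illustrative choices of mine, not details from the paper.

```python
import random
import statistics
from math import sqrt

random.seed(0)

# One homogeneous population: no real subgroups exist in these data.
scores = sorted(random.gauss(0, 1) for _ in range(300))

# Select extreme groups: the bottom and top thirds of the distribution.
third = len(scores) // 3
bottom = scores[:third]
top = scores[-third:]

def welch_t(x, y):
    """Welch t statistic for the difference between two group means."""
    se = sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

t = welch_t(top, bottom)
print(f"t = {t:.1f}")  # very large, purely as a consequence of the selection
```

The groups differ "significantly" by construction, so the test says nothing about the phenomenon itself.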

But there is nothing inherently wrong with group comparisons. If your hypothesis is that men and women do (not) differ on, say, leadership potential, then your results will be based on a group comparison that has enormous social implications. Of course, when interpreting a p value of, say, .01 we would need to know the effect size and sample size to gauge what implications this difference really has.
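A rough sketch of why the p value alone is not enough: the same p = .01 can come from very different effect sizes, depending on sample size. The function below uses a normal (z test) approximation for two equal-sized groups with unit variance; the function name and the sample sizes are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def d_yielding_p(p_two_sided, n_per_group):
    """Standardized mean difference that gives exactly this two-sided p value.

    Assumes a two-sample z test with equal group sizes and unit variance,
    where z = d / sqrt(2 / n_per_group).
    """
    z_crit = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return z_crit * sqrt(2 / n_per_group)

# The same p = .01 corresponds to ever smaller effects as n grows.
for n in (20, 200, 2000, 20000):
    print(f"n per group = {n:>5}: d = {d_yielding_p(0.01, n):.3f}")
```

Under these assumptions, a p of .01 reflects a large difference with 20 people per group and a trivially small one with 20,000 per group.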

Similarly, there is nothing inherently wrong with p values either. They are one piece of information (telling you how likely it is to obtain a result of a given magnitude or larger by chance alone, that is, if there is no true effect). But we need other pieces of information too (power, sample size, effect size) to judge the meaning of the result. What is wrong is to stop at p values and think that a hypothesis has been falsified or verified; that an insight has been gained. These latter two issues go far beyond p values (and that's what students need to be taught, even in 302!).

Finally, there is nothing inherently wrong with realizing that one's methodology was flawed and, therefore, mistrusting one's negative finding. (Reviewers and others do it all the time with positive findings!) The point is that we should not be satisfied with one study telling us that some effect is "significant" or "not significant." Only the accumulation of data can be informative. You would never stop after your first subject's data point and say that your hypothesis is confirmed or disconfirmed; why, then, stop after the first study? Highly complex phenomena need aggregation at many levels to reveal the underlying processes.
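The value of accumulation can be illustrated with a small simulation. This is only a sketch under assumed values: a true standardized difference of 0.3, 20 subjects per group, 30 studies, and a simple inverse-variance (fixed-effect) pooling rule, with a normal approximation for the per-study significance test. None of these specifics come from Meehl.

```python
import random
import statistics
from math import sqrt

random.seed(1)

TRUE_D = 0.3       # assumed true standardized mean difference
N_PER_GROUP = 20   # assumed subjects per group in each study
N_STUDIES = 30     # assumed number of independent studies
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)  # two-sided .05, normal approx.

def run_study():
    """Simulate one two-group study; return (mean difference, standard error)."""
    a = [random.gauss(TRUE_D, 1) for _ in range(N_PER_GROUP)]
    b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = sqrt(statistics.variance(a) / N_PER_GROUP
              + statistics.variance(b) / N_PER_GROUP)
    return diff, se

studies = [run_study() for _ in range(N_STUDIES)]

# How many single studies would be called "significant" on their own?
n_significant = sum(abs(diff / se) > Z_CRIT for diff, se in studies)

# Inverse-variance weighted pooling across all studies.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * diff for w, (diff, _) in zip(weights, studies)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

print(f"{n_significant} of {N_STUDIES} single studies reach p < .05")
print(f"pooled estimate = {pooled:.2f} (SE = {pooled_se:.2f})")
```

Individual studies of this size give an inconsistent mix of "significant" and "not significant" verdicts about the same underlying effect, whereas the pooled estimate has a much smaller standard error than any single study.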