Multiple Comparisons
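I. The problem: When multiple tests are made on the same set of means, the nominal significance level of each test underestimates the probability of making at least one Type I error across the whole set of tests (the familywise error rate). One is therefore more likely to reject a true null hypothesis erroneously.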
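The inflation can be illustrated by computing the familywise error rate, 1 - (1 - α)^c, for c tests each run at level α. This sketch assumes the tests are independent. Comparisons among the same set of means generally are not independent, so the numbers are only illustrative. The function name is hypothetical.

```python
def familywise_error_rate(alpha: float, c: int) -> float:
    """P(at least one Type I error) among c independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** c


# Nominal alpha of .05 per test, for an increasing number of tests
for c in (1, 3, 6, 10):
    print(c, round(familywise_error_rate(0.05, c), 3))
```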
II. The solutions: A set of procedures which, when followed, guarantee that the actual familywise significance level does not exceed the nominal significance level.
Notation: k = number of groups, f1 = df for the numerator, and f2 = df for the denominator (MSerror).
A. Scheffé:
1. Use when testing all possible comparisons in a set of means, not necessarily between pairs of means (e.g., when testing non-orthogonal contrasts).
2. Replace the unprotected critical value, $F_{1-\alpha}(f_1, f_2)$ (with f1 = 1 for a single contrast), with the critical value $(k-1)\,F_{1-\alpha}(k-1, f_2)$.
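The Scheffé test can be sketched in Python under a few assumptions. Every mean is based on the same n. The function names are hypothetical. The tabled F value (3.49 for F.95(3, 12)) is passed in rather than computed; with SciPy installed, `scipy.stats.f.ppf(0.95, 3, 12)` returns the same quantile. The usage lines take the drug means and MSerror from the section IV example, with n = 5 per drug inferred from dferror = (k - 1)(n - 1) = 12. They test one pairwise contrast and one non-pairwise contrast against the same critical value.

```python
from typing import Sequence


def scheffe_critical_value(k: int, f_table: float) -> float:
    """Scheffé critical F: (k - 1) * F_{1-alpha}(k - 1, f2), given the tabled F."""
    return (k - 1) * f_table


def contrast_f(coeffs: Sequence[float], means: Sequence[float],
               n: int, ms_error: float) -> float:
    """1-df F ratio for the contrast sum(c_i * mean_i), assuming n observations per mean."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    psi = sum(c * m for c, m in zip(coeffs, means))
    ss_contrast = psi ** 2 / (sum(c ** 2 for c in coeffs) / n)
    return ss_contrast / ms_error


means = [26.4, 25.6, 15.6, 32.0]  # drugs A, B, C, D
crit = scheffe_critical_value(k=4, f_table=3.49)
f_pair = contrast_f([0, 1, -1, 0], means, n=5, ms_error=9.4)     # B vs C
f_complex = contrast_f([1, 1, -3, 1], means, n=5, ms_error=9.4)  # C vs mean of A, B, D
print(f"critical F = {crit:.2f}")
print(f"B vs C: F = {f_pair:.2f}, significant: {f_pair > crit}")
print(f"C vs mean(A, B, D): F = {f_complex:.2f}, significant: {f_complex > crit}")
```

Because the same critical value applies to every contrast, pairwise and complex contrasts can be mixed without weakening the familywise protection.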
B. Bonferroni:
1. Use when testing a given number of hypotheses on a set of means.
2. Set the per-comparison level $\alpha = \alpha^*_{FW}/c$, where c is the number of desired comparisons and $\alpha^*_{FW}$ is the desired familywise alpha level. When the number of comparisons exceeds the number of degrees of freedom between groups, Keppel (1982) suggests setting $\alpha_{FW} = (k-1)\,\alpha^*_{FW}$, where $\alpha^*_{FW}$ is the desired protected alpha level and k - 1 is the number of degrees of freedom between the k groups. This $\alpha_{FW}$ is then divided among the c comparisons.
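The Bonferroni rule and Keppel's (1982) modification each reduce to a single division. This is a sketch with hypothetical function names. Here c is the number of comparisons and k is the number of groups. The modification is applied only when c exceeds the k - 1 between-groups df, as the text describes.

```python
def bonferroni_alpha(alpha_fw: float, c: int) -> float:
    """Per-comparison alpha that holds the familywise rate at alpha_fw over c tests."""
    return alpha_fw / c


def keppel_alpha(alpha_fw: float, c: int, k: int) -> float:
    """Keppel's modification: when c > k - 1, allow a familywise rate of
    (k - 1) * alpha_fw and divide it among the c comparisons."""
    if c <= k - 1:
        return bonferroni_alpha(alpha_fw, c)
    return bonferroni_alpha((k - 1) * alpha_fw, c)


# All six pairwise comparisons among k = 4 groups, desired alpha of .05
print(round(bonferroni_alpha(0.05, 6), 4))
print(round(keppel_alpha(0.05, 6, k=4), 4))
```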
C. Tukey HSD (Honestly Significant Difference):
1. Use when testing between all possible pairs of means.
2. Test differences between means against the critical value given by the studentized range statistic:
$HSD = \sqrt{MS_{error}/n}\; q_{1-\alpha}(k, f_2)$
Note: n is the number of observations included in each mean. The studentized range table itself is entered with k and f2.
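The HSD test can be sketched in Python. The function names are hypothetical. The q value is looked up in a studentized range table (4.20 for q.95(4, 12)) and passed in; SciPy 1.7 and later also provide `scipy.stats.studentized_range` for this quantile. The usage lines assume the section IV example data with n = 5 per drug.

```python
from __future__ import annotations

import math
from itertools import combinations


def tukey_hsd(ms_error: float, n: int, q_table: float) -> float:
    """Critical difference: sqrt(MS_error / n) * q_{1-alpha}(k, f2)."""
    return math.sqrt(ms_error / n) * q_table


def significant_pairs(means: dict[str, float], critical_diff: float) -> list[tuple[str, str]]:
    """All pairs of groups whose absolute mean difference exceeds critical_diff."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > critical_diff]


means = {"A": 26.4, "B": 25.6, "C": 15.6, "D": 32.0}
hsd = tukey_hsd(ms_error=9.4, n=5, q_table=4.20)  # q_.95(4, 12) from a table
print(f"HSD = {hsd:.2f}")
print(significant_pairs(means, hsd))
```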
D. Newman-Keuls:
1. Use when testing between all possible pairs of means. This is an alternative to the Tukey HSD, but it does not protect against exceeding the nominal significance level in all cases. Although results based on the Newman-Keuls procedure are frequently reported in the literature, the Tukey HSD is preferred.
2. Order the means from smallest to largest.
a. Test the difference between the means farthest apart, then move inwards, testing the differences between successively closer pairs of means.
b. Stop testing within a range once its difference is non-significant: no pair of means lying inside a non-significant range is tested.
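c. Tests are made in reference to the studentized range statistic:
$q_r = \dfrac{\bar{X}_j - \bar{X}_i}{\sqrt{MS_{error}/n}}, \qquad r = j - i + 1$
where j and i are the rank orders of the larger and smaller means being compared (so that r is always positive). Each $q_r$ is compared with $q_{1-\alpha}(r, f_2)$.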
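The step-down logic can be sketched as follows. This is an illustrative implementation with a hypothetical function name. It assumes equal n and takes the tabled q value for each range r as input; here the values are q.95(r, 12) = 4.20, 3.77, and 3.08 for r = 4, 3, and 2.

```python
from __future__ import annotations

import math


def newman_keuls(means: dict[str, float], ms_error: float, n: int,
                 q_table: dict[int, float]) -> set[frozenset[str]]:
    """Pairs declared significant by the Newman-Keuls step-down procedure.

    q_table maps each range r (number of ordered means spanned) to q_{1-alpha}(r, f2).
    Assumes n observations in every mean.
    """
    order = sorted(means, key=lambda g: means[g])  # smallest to largest
    se = math.sqrt(ms_error / n)
    k = len(order)
    significant: set[frozenset[str]] = set()
    nonsig: list[tuple[int, int]] = []  # rank ranges (i, j) found non-significant
    for r in range(k, 1, -1):  # widest range first, then move inwards
        for i in range(k - r + 1):
            j = i + r - 1
            # Never test a pair lying inside a range already found non-significant
            if any(lo <= i and j <= hi for lo, hi in nonsig):
                continue
            q_r = (means[order[j]] - means[order[i]]) / se
            if q_r > q_table[r]:
                significant.add(frozenset((order[i], order[j])))
            else:
                nonsig.append((i, j))
    return significant


means = {"A": 26.4, "B": 25.6, "C": 15.6, "D": 32.0}
q_05 = {4: 4.20, 3: 3.77, 2: 3.08}  # q_.95(r, 12) from a studentized range table
for pair in sorted(tuple(sorted(p)) for p in newman_keuls(means, 9.4, 5, q_05)):
    print(pair)
```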
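E. Unequal Sample Sizes (Tukey-Kramer):
Replace $\sqrt{MS_{error}/n}$ with $\sqrt{\dfrac{MS_{error}}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$, where $n_i$ and $n_j$ are the sizes of the two groups being compared.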
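For unequal groups, the Tukey-Kramer critical difference is computed separately for each pair. This is a sketch with a hypothetical function name. The q value is entered with k and f2 exactly as for the HSD. The group sizes in the usage line are made up for illustration. With equal sizes, the formula reduces to the ordinary HSD.

```python
import math


def tukey_kramer_diff(ms_error: float, n_i: int, n_j: int, q_table: float) -> float:
    """Critical difference for groups of sizes n_i and n_j:
    sqrt((MS_error / 2) * (1/n_i + 1/n_j)) * q_{1-alpha}(k, f2)."""
    return math.sqrt(ms_error / 2 * (1 / n_i + 1 / n_j)) * q_table


# Hypothetical group sizes of 4 and 6, with MS_error = 9.4 and q = 4.20
print(round(tukey_kramer_diff(9.4, 4, 6, 4.20), 2))
```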
III. Reporting Notation
A. With the means in rank order, underline each set of adjacent means that do not differ significantly from one another.
B. Give all means that are not significantly different the same lower-case superscript.
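Assigning the superscript letters (convention B) can be automated. The sketch below uses a hypothetical function name and assumes the groups are listed in rank order. Each maximal run of adjacent means in which no pair was declared significant receives the next letter. A mean that differs from every other mean receives no letter. The usage lines take the Tukey HSD decisions for the section IV example: the pairs whose difference exceeds 5.76.

```python
from __future__ import annotations

from itertools import combinations
from string import ascii_lowercase


def letter_display(ordered: list[str], significant: set[frozenset[str]]) -> dict[str, str]:
    """Superscript letters: means that share a letter do not differ significantly.

    ordered lists the groups from smallest to largest mean; significant holds the
    pairs declared different. A mean that differs from all others gets no letter.
    """
    def all_nonsig(i: int, j: int) -> bool:
        # True if no pair among ordered[i..j] was declared significant
        return all(frozenset((ordered[a], ordered[b])) not in significant
                   for a, b in combinations(range(i, j + 1), 2))

    runs: list[tuple[int, int]] = []
    for i in range(len(ordered)):
        j = i
        while j + 1 < len(ordered) and all_nonsig(i, j + 1):
            j += 1
        # Keep runs of two or more means that are not nested inside an earlier run
        if j > i and not any(lo <= i and j <= hi for lo, hi in runs):
            runs.append((i, j))

    letters = {g: "" for g in ordered}
    for letter, (i, j) in zip(ascii_lowercase, runs):
        for g in ordered[i:j + 1]:
            letters[g] += letter
    return letters


# Pairs exceeding the HSD of 5.76 in the section IV example (rank order C, B, A, D)
tukey_sig = {frozenset(p) for p in [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]}
print(letter_display(["C", "B", "A", "D"], tukey_sig))
```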
IV. Comparison of methods example (see Repeated Measures handout).
Drug:           A      B      C      D
Mean response:  26.4   25.6   15.6   32
MSerror = 9.4, dferror = 12 (n = 5 subjects per drug, so dferror = (k - 1)(n - 1) = 3 × 4 = 12)
Scheffé: critical F = $(4-1)\,F_{.95}(4-1, 12) = 3 \times 3.49 = 10.47$
Tukey HSD: critical difference = $\sqrt{MS_{error}/n}\; q_{.95}(4, 12) = \sqrt{9.4/5} \times 4.20 = 5.76$
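Newman-Keuls: ordered means:
Drug:           C      B      A      D
Mean response:  15.6   25.6   26.4   32
Critical differences:
r = 4: $\sqrt{9.4/5} \times q_{.95}(4, 12) = 1.371 \times 4.20 = 5.76$
r = 3: $\sqrt{9.4/5} \times q_{.95}(3, 12) = 1.371 \times 3.77 = 5.17$
r = 2: $\sqrt{9.4/5} \times q_{.95}(2, 12) = 1.371 \times 3.08 = 4.22$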
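The test between drugs C and D (r = 4) is significant, and the tests between C and A and between B and D (r = 3) are also significant. At r = 2, the tests between C and B and between A and D are significant. The test between B and A is non-significant, so stop testing within that range (even if there were more means between B and A in rank order).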
Report:
Drug:           C      B       A        D
Scheffé:        15.6   25.6a   26.4ab   32b
Tukey HSD:      15.6   25.6a   26.4ab   32b
Newman-Keuls:   15.6   25.6a   26.4a    32
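The Scheffé critical value in this example is on the F scale. For a pairwise contrast with n observations per mean, F = d^2 / (2 MSerror / n), so the corresponding critical difference is sqrt(10.47 × 2 × MSerror / n). The sketch below assumes n = 5 per drug, inferred from dferror = (4 - 1)(n - 1) = 12. It puts the Scheffé and Tukey HSD criteria on the same difference scale and lists the pairs each declares significant. The tabled values 3.49 and 4.20 are the ones used in the example, and the helper name is hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations

# Section IV example, drugs in rank order; n = 5 is inferred from df_error = 12
means = {"C": 15.6, "B": 25.6, "A": 26.4, "D": 32.0}
ms_error, n = 9.4, 5

# Scheffé: a pairwise contrast has F = d**2 / (2 * MS_error / n), so the
# critical F of 3 * 3.49 corresponds to this critical difference d.
scheffe_diff = math.sqrt(3 * 3.49 * 2 * ms_error / n)
# Tukey HSD: sqrt(MS_error / n) * q_.95(4, 12)
tukey_diff = math.sqrt(ms_error / n) * 4.20


def significant_pairs(critical_diff: float) -> list[tuple[str, str]]:
    """Pairs of drugs whose mean difference exceeds critical_diff."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[b] - means[a]) > critical_diff]


results = {"Scheffe": significant_pairs(scheffe_diff),
           "Tukey HSD": significant_pairs(tukey_diff)}
print(f"Scheffe critical difference = {scheffe_diff:.2f}")
print(f"Tukey HSD critical difference = {tukey_diff:.2f}")
for method, pairs in results.items():
    print(method, pairs)
```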