Multiple Comparisons


I. The problem: When several tests are made on the same set of means, the nominal significance level of each test understates the probability of making at least one Type I error across the whole set. One may therefore reject a true null hypothesis erroneously.
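To see how quickly the error rate grows, the short Python sketch below computes the probability of at least one Type I error among m tests. It assumes the tests are independent and every null hypothesis is true. Comparisons on a shared set of means are correlated, so this figure is only an approximation. Whatever the dependence, the rate can never exceed m times alpha, and that bound is what the Bonferroni rule relies on. The function name is illustrative.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one Type I error among m independent tests,
    each run at per-test level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    for m in (1, 3, 6, 10):
        rate = familywise_error_rate(0.05, m)
        print(f"m = {m:2d} tests at alpha = .05 -> familywise error rate = {rate:.3f}")
```

With six independent tests (as many as there are pairs among four means) the rate is already roughly .26, not .05.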

II. The solutions: A set of procedures which, when followed, keep the actual (familywise) significance level from exceeding the nominal level. Notation: k = number of groups, f1 = df for the numerator, and f2 = df for the denominator (MS_error).

A. Scheffé:

1. Use when testing all possible comparisons among a set of means, including comparisons that are not between pairs of means (e.g., complex or non-orthogonal contrasts).

2. Replace the unprotected critical value $F_{1-\alpha}(f_1, f_2)$ with the critical value $(k-1)\,F_{1-\alpha}(k-1, f_2)$.
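For a single contrast (f1 = 1), the Scheffé rule can be sketched as follows. The sketch assumes equal group sizes n and contrast coefficients that sum to zero. The F quantile is passed in from a table (3.49 for F_.95(3, 12)) rather than computed, so only the standard library is needed. The function names are illustrative, not taken from any package. The usage example uses n = 5 observations per mean, which is not stated in this handout but is implied by df_error = (4 - 1)(n - 1) = 12 in the repeated-measures example.

```python
from typing import Sequence


def scheffe_critical_value(k: int, f_table: float) -> float:
    """(k - 1) * F_{1-alpha}(k - 1, f2); f_table is the tabled F quantile."""
    return (k - 1) * f_table


def contrast_f(means: Sequence[float], coeffs: Sequence[float],
               n: int, ms_error: float) -> float:
    """F for one contrast with equal group sizes n:
    F = n * psi_hat**2 / (sum(c_j**2) * MS_error), psi_hat = sum(c_j * mean_j)."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    psi_hat = sum(c * m for c, m in zip(coeffs, means))
    return n * psi_hat ** 2 / (sum(c * c for c in coeffs) * ms_error)


def scheffe_test(means: Sequence[float], coeffs: Sequence[float], n: int,
                 ms_error: float, k: int, f_table: float) -> bool:
    """True if the contrast F exceeds the Scheffé critical value."""
    return contrast_f(means, coeffs, n, ms_error) > scheffe_critical_value(k, f_table)


if __name__ == "__main__":
    means = [26.4, 25.6, 15.6, 32.0]   # drugs A, B, C, D
    coeffs = [1, 1, -3, 1]             # complex contrast: C vs. the average of A, B, D
    print(contrast_f(means, coeffs, n=5, ms_error=9.4))
    print(scheffe_test(means, coeffs, n=5, ms_error=9.4, k=4, f_table=3.49))
```

Note that the contrast F, not the raw mean difference, is what gets compared with (k - 1)F.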

B. Bonferroni:

1. Use when testing a given number of hypotheses on a set of means.

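2. Set $\alpha = \alpha^*_{FW}/c$, where c is the number of desired comparisons and $\alpha^*_{FW}$ is the desired familywise alpha level. When the number of comparisons exceeds the number of degrees of freedom between groups, Keppel (1982) suggests setting

$\alpha_{FW} = (k-1)\,\alpha^*_{FW}$

where $\alpha^*_{FW}$ is the desired protected alpha level and $k-1$ is the number of degrees of freedom between the k groups. Dividing this $\alpha_{FW}$ among the c comparisons gives a per-comparison level of $(k-1)\,\alpha^*_{FW}/c$.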
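The two Bonferroni variants reduce to simple arithmetic. In the sketch below, c is the number of comparisons and k the number of groups. The Keppel function encodes an assumed reading of the rule: the familywise rate is relaxed to (k - 1) times the desired level and then split evenly over the c comparisons. It refuses inputs with c <= k - 1, since the text only recommends it when comparisons outnumber the between-groups df. Both function names are hypothetical.

```python
def bonferroni_alpha(alpha_fw: float, c: int) -> float:
    """Per-comparison alpha when c comparisons share a familywise alpha_fw."""
    return alpha_fw / c


def keppel_alpha(alpha_fw: float, k: int, c: int) -> float:
    """Modified Bonferroni per-comparison alpha for k groups and c comparisons.

    Assumed reading of Keppel (1982): relax the familywise rate to
    (k - 1) * alpha_fw, then split it evenly over the c comparisons.
    Only meant for c > k - 1 (more comparisons than between-groups df)."""
    if c <= k - 1:
        raise ValueError("use the ordinary Bonferroni rule when c <= k - 1")
    return (k - 1) * alpha_fw / c


if __name__ == "__main__":
    # All 6 pairwise comparisons among k = 4 groups, familywise alpha = .05
    print(bonferroni_alpha(0.05, 6))   # about 0.0083
    print(keppel_alpha(0.05, 4, 6))    # about 0.025
```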

C. Tukey HSD (Honestly Significant Difference):

1. Use when testing between all possible pairs of means.

2. Test differences between means against the critical value given by the studentized range statistic:

$HSD = \sqrt{MS_{error}/n}\; q_{1-\alpha}(k, f_2)$

Note: n is the number of observations on which each mean is based.
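A minimal sketch of the HSD test for equal n is shown below. The tabled q value is passed in (4.20 for q_.95(4, 12)) so the code needs only the standard library. Recent SciPy versions also provide `scipy.stats.studentized_range` if you prefer to compute q. The usage example assumes n = 5 observations per drug mean, the value implied by the handout's repeated-measures example. Function names are illustrative.

```python
import math
from itertools import combinations


def hsd_critical_difference(q_table: float, ms_error: float, n: int) -> float:
    """Tukey HSD: sqrt(MS_error / n) * q_{1-alpha}(k, f2)."""
    return math.sqrt(ms_error / n) * q_table


def tukey_pairs(means: dict[str, float], q_table: float,
                ms_error: float, n: int) -> dict[tuple[str, str], bool]:
    """Return {(group1, group2): significant?} for every pair of groups."""
    crit = hsd_critical_difference(q_table, ms_error, n)
    return {
        (a, b): abs(means[a] - means[b]) > crit
        for a, b in combinations(means, 2)
    }


if __name__ == "__main__":
    drug_means = {"A": 26.4, "B": 25.6, "C": 15.6, "D": 32.0}
    print(round(hsd_critical_difference(4.20, 9.4, 5), 2))
    for pair, sig in tukey_pairs(drug_means, 4.20, 9.4, 5).items():
        print(pair, "significant" if sig else "n.s.")
```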

D. Newman-Keuls:

1. Use when testing between all possible pairs of means. This is an alternative to the Tukey HSD. It may not protect against exceeding the nominal significance level in all cases. Although results based on the Newman-Keuls procedure are frequently reported in the literature, the Tukey HSD is preferred.

2. Order means from smallest to largest.

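a. Test the difference between the two means farthest apart, then move inwards, testing the differences between successively closer pairs of means.

b. Stop testing within a range of ordered means once the difference between its two extreme means is non-significant; pairs of means lying inside that range are also declared non-significant.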

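c. Tests are made in reference to the studentized range statistic:

$\sqrt{MS_{error}/n}\; q_{1-\alpha}(r, f_2), \qquad r = j - i + 1$

where j and i are the rank orders of the larger and smaller means being compared (so that r is always positive); r is the number of ordered means spanned by the comparison, counting both ends.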
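The step-down logic can be sketched as follows. The means are sorted, spans are tested from widest (r = k) to narrowest (r = 2), and any pair nested inside a span already found non-significant is declared non-significant without testing. The tabled q values are supplied by the caller. The usage example assumes q_.95(2, 12) = 3.08, q_.95(3, 12) = 3.77 and q_.95(4, 12) = 4.20, and n = 5. The function name is hypothetical.

```python
import math


def newman_keuls(means: dict[str, float], q_by_r: dict[int, float],
                 ms_error: float, n: int) -> dict[tuple[str, str], bool]:
    """Step-down Newman-Keuls for equal-n means.

    q_by_r maps r (ordered means spanned, r = j - i + 1) to the tabled
    q_{1-alpha}(r, f2). Returns {(smaller, larger): significant?}."""
    names = sorted(means, key=means.get)        # order smallest -> largest
    k = len(names)
    se = math.sqrt(ms_error / n)
    nonsig_spans: list[tuple[int, int]] = []    # (i, j) spans found n.s.
    result: dict[tuple[str, str], bool] = {}
    for r in range(k, 1, -1):                   # widest span first, move inwards
        for i in range(k - r + 1):
            j = i + r - 1
            # A pair nested inside a non-significant span is not tested.
            if any(lo <= i and j <= hi for lo, hi in nonsig_spans):
                result[(names[i], names[j])] = False
                continue
            significant = means[names[j]] - means[names[i]] > se * q_by_r[r]
            result[(names[i], names[j])] = significant
            if not significant:
                nonsig_spans.append((i, j))
    return result


if __name__ == "__main__":
    drug_means = {"A": 26.4, "B": 25.6, "C": 15.6, "D": 32.0}
    q_95_df12 = {2: 3.08, 3: 3.77, 4: 4.20}     # tabled q_.95(r, 12)
    for pair, sig in newman_keuls(drug_means, q_95_df12, ms_error=9.4, n=5).items():
        print(pair, "significant" if sig else "n.s.")
```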

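E. Unequal Sample Sizes (Tukey-Kramer)

Replace $\sqrt{MS_{error}/n}$ in the Tukey HSD critical value with $\sqrt{\frac{MS_{error}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$, where $n_i$ and $n_j$ are the sizes of the two groups being compared.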
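A sketch of the Tukey-Kramer critical difference for one pair of groups follows. It assumes the tabled q_{1-alpha}(k, f2) is supplied by the caller. The group sizes 4 and 6 in the usage example are hypothetical. With equal sizes the formula reduces to the ordinary HSD.

```python
import math


def tukey_kramer_critical_difference(q_table: float, ms_error: float,
                                     n_i: int, n_j: int) -> float:
    """q_{1-alpha}(k, f2) * sqrt((MS_error / 2) * (1/n_i + 1/n_j))."""
    return q_table * math.sqrt(ms_error / 2 * (1 / n_i + 1 / n_j))


if __name__ == "__main__":
    # Equal sizes: same as the HSD value sqrt(9.4 / 5) * 4.20
    print(tukey_kramer_critical_difference(4.20, 9.4, 5, 5))
    # Hypothetical unequal sizes give a slightly larger critical difference
    print(tukey_kramer_critical_difference(4.20, 9.4, 4, 6))
```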

III. Reporting Notation

A. List the means in rank order and underline each set of adjacent means that do not differ significantly from one another.

B. Give all means that are not significantly different the same lower-case superscript.
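Once the pairwise decisions are made, the superscripts can be assigned mechanically. The sketch below finds each maximal run of adjacent ranked means whose two ends do not differ (the sets one would underline) and gives every run its own letter. Following the example in section IV, a mean that differs from all others gets no superscript. It assumes a single-critical-value procedure, where a non-significant end pair implies that the inner pairs are also non-significant. The input decisions in the usage example are the Tukey HSD ones for the drug data (A vs. D: 5.6 < 5.76). Function names are illustrative.

```python
from string import ascii_lowercase


def homogeneous_runs(ordered: list[str],
                     differs: dict[frozenset, bool]) -> list[tuple[int, int]]:
    """Maximal runs ordered[i..j] (j > i) whose extreme pair does not differ."""
    k = len(ordered)
    runs: list[tuple[int, int]] = []
    for i in range(k):
        j = i
        # Extend the run while the next ranked mean does not differ from ordered[i].
        while j + 1 < k and not differs[frozenset((ordered[i], ordered[j + 1]))]:
            j += 1
        # Keep runs of 2+ means that are not contained in an earlier run.
        if j > i and not any(lo <= i and j <= hi for lo, hi in runs):
            runs.append((i, j))
    return runs


def superscripts(ordered: list[str],
                 differs: dict[frozenset, bool]) -> dict[str, str]:
    """One lower-case letter per run; means in no run get no superscript."""
    labels = {name: "" for name in ordered}
    for letter, (i, j) in zip(ascii_lowercase, homogeneous_runs(ordered, differs)):
        for name in ordered[i:j + 1]:
            labels[name] += letter
    return labels


if __name__ == "__main__":
    decisions = {  # True = significantly different
        frozenset(("C", "B")): True, frozenset(("C", "A")): True,
        frozenset(("C", "D")): True, frozenset(("B", "A")): False,
        frozenset(("B", "D")): True, frozenset(("A", "D")): False,
    }
    print(superscripts(["C", "B", "A", "D"], decisions))
```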

IV. Comparison of methods example (see Repeated Measures handout).

Drug              A      B      C      D
Mean response   26.4   25.6   15.6   32.0

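MS_error = 9.4, df_error = 12, n = 5 observations per mean (in this repeated-measures design, df_error = (4 - 1)(n - 1) = 12)

Scheffé: critical value = $(4-1)\,F_{.95}(3, 12) = 3(3.49) = 10.47$. This is compared with the F for each pairwise contrast, $F = n(\bar{y}_i - \bar{y}_j)^2/(2\,MS_{error})$, not with the raw difference between means. On the scale of mean differences it corresponds to $\sqrt{10.47 \times 2 \times 9.4/5} = 6.27$.

Tukey HSD: critical value = $\sqrt{MS_{error}/n}\; q_{.95}(4, 12) = \sqrt{9.4/5} \times 4.20 = 5.76$

Newman-Keuls: ordered means:

Drug              C      B      A      D
Mean response   15.6   25.6   26.4   32.0

critical values:

r = 4: $\sqrt{9.4/5}\; q_{.95}(4, 12) = \sqrt{9.4/5} \times 4.20 = 5.76$

r = 3: $\sqrt{9.4/5}\; q_{.95}(3, 12) = \sqrt{9.4/5} \times 3.77 = 5.17$

r = 2: $\sqrt{9.4/5}\; q_{.95}(2, 12) = \sqrt{9.4/5} \times 3.08 = 4.22$

The test between drugs C and D (r = 4, difference 16.4) is significant. The tests between C and A (10.8) and between B and D (6.4), both with r = 3, are also significant. At r = 2, C vs. B (10.0) and A vs. D (5.6) exceed 4.22 and are significant, but the test between B and A (0.8) is non-significant, so testing stops within that range (even if there were more means between B and A in rank order, they would not be tested).

Report

Drug              C       B        A         D
Scheffé         15.6   25.6^a   26.4^ab   32.0^b
Tukey HSD       15.6   25.6^a   26.4^ab   32.0^b
Newman-Keuls    15.6   25.6^a   26.4^a    32.0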
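The pairwise decisions in this example can be checked by putting both critical values on the scale of mean differences and listing the pairs that exceed them. The sketch assumes n = 5 observations per mean. That value is not stated explicitly, but it is what df_error = 12 and the Tukey value 5.76 = 4.20 × sqrt(9.4/n) imply. The tabled values F_.95(3, 12) = 3.49 and q_.95(4, 12) = 4.20 come from the example.

```python
import math
from itertools import combinations

MEANS = {"C": 15.6, "B": 25.6, "A": 26.4, "D": 32.0}   # drug means, ranked
MS_ERROR, N, K = 9.4, 5, 4                              # N = 5 is inferred
F_TABLE, Q_TABLE = 3.49, 4.20                           # F_.95(3, 12), q_.95(4, 12)

# Critical mean differences for a pairwise contrast (coefficients +1, -1).
scheffe_cd = math.sqrt((K - 1) * F_TABLE * 2 * MS_ERROR / N)
tukey_cd = math.sqrt(MS_ERROR / N) * Q_TABLE


def significant_pairs(cd: float) -> set[tuple[str, str]]:
    """Pairs of drugs whose mean difference exceeds the critical difference cd."""
    return {
        (a, b) for a, b in combinations(MEANS, 2)
        if abs(MEANS[a] - MEANS[b]) > cd
    }


if __name__ == "__main__":
    print(f"Scheffé critical difference:   {scheffe_cd:.2f}")
    print(f"Tukey HSD critical difference: {tukey_cd:.2f}")
    print("Scheffé significant pairs:", sorted(significant_pairs(scheffe_cd)))
    print("Tukey significant pairs:  ", sorted(significant_pairs(tukey_cd)))
```

In this data set the two procedures flag the same pairs. Scheffé's larger critical difference would matter only for a pair whose difference fell between 5.76 and 6.27.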