Representation of Experimental Designs
I. Conceptual Overview.
To determine whether some characteristic (whether
innate or manipulated) of a subject has an effect on a criterion variable, one
may sample a number of subjects that differ on this characteristic, assign them
to groups on the basis of the characteristic, measure the criterion values of
each subject, and then compare the means of the groups. These means will reflect differences in the
criterion that are related to group membership plus any other differences
between the groups of subjects. Were
this procedure to be repeated with a new sample of subjects, one would not
expect to obtain exactly the same results because the subjects in the groups
would differ. In a sample, the variance
in a criterion attributed to a factor will be a combination of the variance due
to that factor plus any variance in the criterion due to interactions between
that factor and other sources of variance (e.g., subjects) that would not be
duplicated in other samples. These
latter sources are called random variables.
To determine whether a factor has an effect on a
criterion variable one can construct a ratio in which the variance attributed
to the factor is compared to an estimate of the variance due solely to the random
variables (the “random error” variance).
This is the F ratio: F = MS_A / MS_error. In a sample, the variance attributed to the
factor (the numerator in this ratio) is composed of the variance due to the
factor and the variance due to interactions between that factor and random
variables. When the variance due to the
factor of interest is small relative to the variance due to the random
variables, this ratio will be close to 1.
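As a rough illustration of the ratio described above, the following sketch computes MS_A, MS_error, and F for the simplest case: a single between-subjects factor with subjects nested in groups. The function name f_ratio and the scores are hypothetical and are not taken from this handout.

```python
from statistics import mean

def f_ratio(groups):
    """Return (MS_A, MS_error, F) for a one-way between-subjects design.

    `groups` is a list of lists; each inner list holds the criterion
    scores of the subjects in one level of factor A.
    """
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)

    # Variance attributed to the factor: group means around the grand mean.
    ss_a = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_a = len(groups) - 1

    # Random error variance: subjects around their own group mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_error = len(scores) - len(groups)

    ms_a = ss_a / df_a
    ms_error = ss_error / df_error
    return ms_a, ms_error, ms_a / ms_error


# Hypothetical scores for three groups of three subjects each.
ms_a, ms_error, f = f_ratio([[4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(ms_a, ms_error, f)
```

When the group means barely differ, ms_a shrinks toward ms_error and the ratio moves toward 1, as the paragraph above describes.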
To obtain an appropriate estimate of the error
variance, a model of the relations between the predictor(s) and the criterion
variable must be developed. In this
handout, a general method of representing experimental designs is
discussed. By following this method,
you will be able to build models to represent most of the commonly encountered
experimental designs and obtain appropriate estimates of the error variances.
II. Random vs. Fixed Factors
Before constructing a statistical model, one must
determine whether each factor in the model is to be considered
"fixed" or "random."
A. A fixed factor is any factor which (conceptually) would be
duplicated identically in any future replication of the study. If a factor is considered fixed, then one
can generalize only to the particular items, levels, etc. used in the
study.
B. A random factor is any factor that is represented in the
study by items, levels, subjects, etc. which (conceptually) are randomly
selected from a larger set of possible items, levels, etc. any of which could
have been used in the study. If a
factor is considered random, then one can generalize to the class from which
the items, levels, etc. used in the study were drawn.
C. The “levels” of random factors are rarely drawn at random from
the class to which they belong. In these cases, one can either treat the
factor as random, recognizing that simply calling it random does not make it so,
or treat the factor as fixed and not generalize beyond the levels used.
D. Subjects are usually considered to be random because researchers
usually want to generalize beyond the subjects examined in a study. However, subjects may not be the only random
factor in a study. Frequently, the
particular items (e.g., words, pictures, sounds) used in a study are selected
from a larger conceptual set to which the researcher would like to
generalize. In this case, these items
should be considered to be random factors.
III. Structural Models
A. Crossed Factors: when every level of one
factor is paired with every level of another factor, the factors are said to be
crossed.
1. Example: A is crossed with B; A X B; S(A X B)
               Brightness (B)
Contrast (A) | Low        | High
Low          | S1, S2, S3 | S7, S8, S9
High         | S4, S5, S6 | S10, S11, S12
B. Nested Factors: when each level of one factor is paired with only one level of
another factor, then the first factor is said to be nested within the second
factor.
1. Example: B is nested within A; B(A)
                          School (A)
1                    | 2                    | 3
East School (A1)     | North School (A2)    | West School (A3)
Teachers: B1, B2, B3 | Teachers: B4, B5, B6 | Teachers: B7, B8, B9
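The two definitions above can be checked mechanically from a list of which levels occur together. The sketch below is only an illustration: the function name relation and the level labels are hypothetical. When A has a single level the two definitions coincide, and the function reports "crossed".

```python
def relation(pairs):
    """Classify factor B relative to factor A.

    `pairs` lists the (A level, B level) combinations that occur in the design.
    """
    observed = set(pairs)
    a_levels = {a for a, _ in observed}
    b_levels = {b for _, b in observed}

    # Crossed: every level of A is paired with every level of B.
    if observed == {(a, b) for a in a_levels for b in b_levels}:
        return "crossed"

    # Nested: each level of B is paired with only one level of A.
    parents = {}
    for a, b in observed:
        parents.setdefault(b, set()).add(a)
    if all(len(p) == 1 for p in parents.values()):
        return "B nested within A"

    return "neither"


# Contrast (A) by Brightness (B), as in the crossed example.
crossed = [("Low", "Low"), ("Low", "High"), ("High", "Low"), ("High", "High")]

# Teachers (B) within schools (A), as in the nested example.
nested = [("A1", "B1"), ("A1", "B2"), ("A1", "B3"),
          ("A2", "B4"), ("A2", "B5"), ("A2", "B6"),
          ("A3", "B7"), ("A3", "B8"), ("A3", "B9")]

print(relation(crossed))
print(relation(nested))
```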
C. Some common designs (where the subscript F indicates fixed; R indicates random):
1. Factorial ANOVA: A_F X B_F X C_F or S_R(A_F X B_F X C_F)
2. Repeated Measures: S_R X A_F X B_F
3. Nested Design: A_R(B_F)
4. Mixed Design: S_R(A_F X B_F) X C_F
IV. Variance Components
C. To determine the appropriate error term, one can use the following heuristic
(Cornfield & Tukey, 1956):
1. Construct a structural model of the design; specify whether each factor is
fixed or random.
2. Construct a two-way table with a row for each term in the model and an
appropriately labeled column for each factor in the model.
3. The entries in each column are evaluated as follows:
a. For each row, determine whether the factor in that column helps create a nest
for the term represented on that row. If so, enter a 1; if not:
b. Determine whether the factor is represented in that term. If so, enter a D_f,
where f is the letter corresponding to the factor represented by that column;
if not:
c. Enter the letter corresponding to the factor represented by that column.
4. Evaluate the entries in each column. If the factor is random, then all of the
associated D_f equal 1. If the factor is fixed, then the associated D_f equal 0.
5. Construct an E(MS) table in which each term in the model is represented on a
separate row.
a. The E(MS) for each term is generated by adding together the variance
components generated by every row in the two-way table that includes the term
being evaluated.
b. The variance component generated by a row is composed of the variance
for the term represented by that row and a coefficient generated by multiplying
together all of the entries on the row that are not in a column representing a
factor included in the term being evaluated.
c. The error term for each term in the E(MS) table is that term (if one exists)
whose E(MS) includes all of the variance components that comprise the E(MS) for
the term being tested except the component representing the variance due to the
factor of interest.
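The steps above can be sketched in code. The following is a minimal illustration under stated assumptions, not a definitive implementation: the Term class, the function names, and the tuple representation of a variance component are all hypothetical. A term "includes" a factor whether the factor appears outside or inside the parentheses, which is how the heuristic is read here. The design used is S_R(A_F X B_F), with level symbols a, b, and n.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    name: str                      # label, e.g. "S(AB)"
    live: frozenset                # factors written outside the parentheses
    nest: frozenset = frozenset()  # factors that create the nest

    @property
    def factors(self):
        return self.live | self.nest


def two_way_table(terms, level_symbol, random):
    """Steps 2-4: one row per term, one column per factor."""
    table = {}
    for term in terms:
        row = {}
        for factor, symbol in level_symbol.items():
            if factor in term.nest:
                row[factor] = 1                            # step 3a
            elif factor in term.live:
                row[factor] = 1 if factor in random else 0  # steps 3b and 4
            else:
                row[factor] = symbol                        # step 3c
        table[term.name] = row
    return table


def expected_mean_squares(terms, level_symbol, random):
    """Step 5a-b: list (coefficient, variance component) pairs per term."""
    table = two_way_table(terms, level_symbol, random)
    ems = {}
    for term in terms:
        components = []
        for row_term in terms:
            if not term.factors <= row_term.factors:
                continue  # this row does not include the term being evaluated
            entries = [table[row_term.name][f]
                       for f in level_symbol if f not in term.factors]
            if 0 in entries:
                continue  # a fixed factor zeroes out this component
            coeff = "".join(sorted(e for e in entries if isinstance(e, str)))
            components.append((coeff, row_term.name))
        ems[term.name] = components
    return ems


terms = [
    Term("A", frozenset("A")),
    Term("B", frozenset("B")),
    Term("AB", frozenset("AB")),
    Term("S(AB)", frozenset("S"), frozenset("AB")),
]
level_symbol = {"A": "a", "B": "b", "S": "n"}
ems = expected_mean_squares(terms, level_symbol, random={"S"})
for name, components in ems.items():
    print(name, components)
```

Each pair reads as coefficient times variance component, so ("bn", "A") stands for the component with coefficient nb attached to the variance due to A, and an empty coefficient stands for 1.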
C. Example: Two-way ANOVA; A, B are fixed; S is random
1. Model: S_R(A_F X B_F)
a. There are 3 factors in this model: A, B, and S
b. There are 4 terms: A, B, AB, S(AB)
2. Two-way Table
          Factors
Terms | a   | b   | n
A     | D_a | b   | n
B     | a   | D_b | n
AB    | D_a | D_b | n
S(AB) | 1   | 1   | D_n
Note: in these tables, n is frequently used to
represent the subject factor.
3. Evaluate entries
S is random, D_n = 1; A is fixed, D_a = 0; B is fixed, D_b = 0.
          Factors
Terms | a       | b       | n
A     | D_a = 0 | b       | n
B     | a       | D_b = 0 | n
AB    | D_a = 0 | D_b = 0 | n
S(AB) | 1       | 1       | D_n = 1
4. E(MS) Table
For A: nb σ²_A + 0*n σ²_AB + 1*1 σ²_S(AB) = nb σ²_A + σ²_S(AB)
          Factors
Terms | a       | b       | n       | Note
A     | D_a = 0 | b       | n       |
B     | a       | D_b = 0 | n       | Skip this row because A is not represented in this term
AB    | D_a = 0 | D_b = 0 | n       |
S(AB) | 1       | 1       | D_n = 1 |
Column a: skip this column because it represents the factor of interest.
The first variance component is the variance due to
the factor of interest (A). The
coefficient for this component (nb) is the product of the letters representing the
other factors in the model (S and B). Added to this component is the variance due
to interactions of A with other factors that are random. In this model, there is only one random
factor: subjects (S). Because S is
nested within the A X B interaction, AB enters the model with S (S(AB)).
Complete E(MS) Table
Line | Source | E(MS)              | Error Term Line
1    | A      | nb σ²_A + σ²_S(AB) | 4
2    | B      | na σ²_B + σ²_S(AB) | 4
3    | A X B  | n σ²_AB + σ²_S(AB) | 4
4    | S(AB)  | σ²_S(AB)           |
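The error-term column of the table above can be reproduced by comparing E(MS) entries. In this sketch, each E(MS) is written as a mapping from variance component to coefficient, with an empty string standing for a coefficient of 1. This representation and the function name error_term are assumptions made for illustration only.

```python
# E(MS) for each source in S_R(A_F X B_F), copied from the table above.
ems = {
    "A":     {"A": "nb", "S(AB)": ""},
    "B":     {"B": "na", "S(AB)": ""},
    "A X B": {"A X B": "n", "S(AB)": ""},
    "S(AB)": {"S(AB)": ""},
}

def error_term(ems, source):
    """Return the source whose E(MS) equals that of `source` minus its own component."""
    target = {k: v for k, v in ems[source].items() if k != source}
    if not target:
        return None  # nothing is left once the source's own component is removed
    for other, components in ems.items():
        if other != source and components == target:
            return other
    return None  # no suitable error term exists in this model


for source in ems:
    print(source, "->", error_term(ems, source))
```

A source for which no other E(MS) matches gets None, which corresponds to the "if one exists" caveat in the heuristic.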
D. Alternate Heuristic
1. Construct a structural model of the design;
specify whether each factor is fixed or random.
2. Construct an E(MS) table with each term listed on a separate row.
3. Construct an E(MS) for each term that is composed of:
a. the variance due to that term plus
b. the variance due to interactions or nestings of that term with other factors
that are random (e.g., subjects).
4. Factors which define nests enter an E(MS) when any factor nested within these factors enters. Remember the mnemonic: random factors cannot leave their nests.
5. The coefficient for each variance component represents the levels of each of
the other sources of variance (including subjects).
6. The error term for each factor is the term for which the E(MS) is identical
to that of the factor of interest except that it lacks the variance component
for that factor.
7. Notation: variance components are denoted by σ² with subscripts indicating
the factor(s).
a. Subjects are commonly denoted by an "S" subscript, but an "n" coefficient.
b. The within-cell error in a factorial design is often denoted by σ²_e instead of the longer form.