Regression with Categorical Predictors
I. Why: Sometimes one wants to regress a criterion on predictors that are qualitative (e.g., race, gender).
II. How:
To represent the effect of a qualitative variable having k levels in a multiple regression model, one constructs k-1 "dummy" predictors. These predictors may be coded in three ways: traditional dummy coding, effects coding, and orthogonal coding.
A. Dummy coding: Each variable has 1's for each case in one group and 0's elsewhere; no two variables have 1's for the same group.
B. Effects coding: Each variable is coded so that it has 1's for one group, -1's for the "base" group, and 0's elsewhere.
C. Orthogonal coding: Each independent variable is coded so that the groups receive orthogonal contrast weights.
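The first two schemes can be generated mechanically from a list of group labels. The following is a minimal sketch in Python with NumPy; the function names dummy_codes and effect_codes are illustrative (not from any library), and the last level listed is assumed to be the base group.

```python
import numpy as np

def dummy_codes(groups, levels):
    """k-1 dummy columns; the last entry of `levels` is the all-zero base group."""
    return np.array([[1 if g == lev else 0 for lev in levels[:-1]]
                     for g in groups])

def effect_codes(groups, levels):
    """Like dummy codes, except the base group is coded -1 on every column."""
    base = levels[-1]
    return np.array([[-1 if g == base else (1 if g == lev else 0)
                      for lev in levels[:-1]]
                     for g in groups])

# Six hypothetical subjects, two per group, with C as the base group.
groups = ["A", "A", "B", "B", "C", "C"]
levels = ["A", "B", "C"]

D = dummy_codes(groups, levels)   # columns D1, D2
E = effect_codes(groups, levels)  # columns E1, E2
print(D)
print(E)
```

Orthogonal codes are not generated here because the weights depend on which contrasts one wants to test.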
III. Example: 3 groups: A, B, C (e.g., 3 different treatments)
A. Dummy coding
Subject | Group | Criterion | Predictor (D1) Code | Predictor (D2) Code
1       | A     | y1        |  1                  |  0
2       | A     | y2        |  1                  |  0
3       | B     | y3        |  0                  |  1
4       | B     | y4        |  0                  |  1
5       | C     | y5        |  0                  |  0
6       | C     | y6        |  0                  |  0
1. Model: Y = a + b1D1 + b2D2
2. For group A the equation becomes: Y = a + b1
   For group B: Y = a + b2
   For group C: Y = a
3. The predicted y values are the means of the respective groups. The constant term (a) is the mean of the group with 0's in all of the dummy variables. The bj for a group represents the deviation of that group (coded with 1's for the jth variable) from the group with 0's in all the dummy variables.
4. This technique is especially useful when one wants to test for differences between each group and a control group. This is done by a t-test of the relevant bj.
B. Effects coding
Subject | Group | Criterion | Predictor (E1) Code | Predictor (E2) Code
1       | A     | y1        |  1                  |  0
2       | A     | y2        |  1                  |  0
3       | B     | y3        |  0                  |  1
4       | B     | y4        |  0                  |  1
5       | C     | y5        | -1                  | -1
6       | C     | y6        | -1                  | -1
1. Model: Y = a + b1E1 + b2E2
2. For group A the equation becomes: Y = a + b1
   For group B: Y = a + b2
   For group C: Y = a - b1 - b2
3. The predicted y values are the means of the respective groups. The constant term (a) is the grand mean. The bj for a group represents the deviation of that group (the group coded with 1's) from the grand mean.
4. This technique is especially useful when one wants to test for differences of groups from the grand mean. This is done by a t-test of the relevant bj.
C. Orthogonal coding
Subject | Group | Criterion | Predictor (C1) Code | Predictor (C2) Code
1       | A     | y1        |  1                  | -1
2       | A     | y2        |  1                  | -1
3       | B     | y3        | -1                  | -1
4       | B     | y4        | -1                  | -1
5       | C     | y5        |  0                  |  2
6       | C     | y6        |  0                  |  2
1. Model: Y = a + b1C1 + b2C2
2. For group A the equation becomes: Y = a + b1 - b2
   For group B: Y = a - b1 - b2
   For group C: Y = a + 2b2
3. The predicted y values are the means of the respective groups. The constant term (a) is the grand mean. The bj for a group represents the comparison implied by the contrast; its value depends on the values chosen for the contrast weights.
4. This technique is especially useful when one wants to test hypotheses about patterns of group means. This is done by a t-test of the bj with weights that describe the desired contrast.
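Whether a set of contrast weights is orthogonal can be checked directly. The sketch below assumes equal group sizes, as in the example, and uses the C1 and C2 weights for groups A, B, C; the variable names are illustrative.

```python
import numpy as np

# Contrast weights for groups A, B, C, taken from the C1 and C2 columns.
c1 = np.array([1, -1, 0])    # A versus B
c2 = np.array([-1, -1, 2])   # C versus the average of A and B

# Each contrast should sum to zero across the groups.
weights_sum_to_zero = bool(c1.sum() == 0 and c2.sum() == 0)

# Two contrasts are orthogonal when the sum of their cross products is zero.
cross_product = int(c1 @ c2)

print(weights_sum_to_zero, cross_product)  # True 0
```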
IV. Numerical Example (Pedhazur)
        A     B     C
        4     7     1
        5     8     2
        6     9     3
        7    10     4
        8    11     5
Sum    30    45    15
Mean    6     9     3     (grand mean = 6)
A. Dummy Coding (as in III above)
1. Equation: Y = 3 + 3D1 + 6D2
YA = 3 + 3 + 0 = 6 = mean of A
YB = 3 + 0 + 6 = 9 = mean of B
YC = 3 + 0 + 0 = 3 = mean of C
a = 3 = mean of C
b1 = 3 = mean of A - mean of C
b2 = 6 = mean of B - mean of C
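The dummy-coded equation, and the t-tests against the comparison group mentioned in III, can be reproduced by ordinary least squares. This is a minimal sketch using NumPy rather than a statistics package; the variable names are illustrative, and group C is taken as the all-zero group.

```python
import numpy as np

# Scores from the numerical example, five per group, in the order A, B, C.
y = np.array([4, 5, 6, 7, 8, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5], dtype=float)
group = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# Dummy codes: D1 marks group A, D2 marks group B, C is the all-zero group.
D1 = (group == "A").astype(float)
D2 = (group == "B").astype(float)
X = np.column_stack([np.ones(len(y)), D1, D2])

# Least-squares estimates of a, b1, b2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# t statistic for each coefficient: estimate divided by its standard error.
resid = y - X @ coef
df_resid = len(y) - X.shape[1]
mse = resid @ resid / df_resid
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t = coef / se

print(np.round(coef, 3))  # a, b1, b2
print(np.round(t, 3))     # t for a, b1, b2
```

With these data the estimates are a = 3, b1 = 3, b2 = 6, and the t statistics for b1 and b2 are 3 and 6 on 12 residual degrees of freedom.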
B. Effects Coding (as in III above)
1. Equation: Y = 6 + 0E1 + 3E2
YA = 6 + 0 + 0 = 6 = mean of A
YB = 6 + 0 + 3 = 9 = mean of B
YC = 6 - 0 - 3 = 3 = mean of C
a = 6 = grand mean
b1 = 0 = mean of A - grand mean
b2 = 3 = mean of B - grand mean
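The effects-coded coefficients can be checked against the deviations of the group means from the grand mean. The sketch below makes the same assumptions as the rest of the example (equal group sizes, C as the base group coded -1); the names are illustrative.

```python
import numpy as np

y = np.array([4, 5, 6, 7, 8, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5], dtype=float)
labels = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# Effects codes as (E1, E2); C is the base group, coded -1 on both.
codes = {"A": (1, 0), "B": (0, 1), "C": (-1, -1)}
E = np.array([codes[g] for g in labels.tolist()], dtype=float)
X = np.column_stack([np.ones(len(y)), E])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # a, b1, b2

# Group means and their deviations from the grand mean.
means = {g: y[labels == g].mean() for g in codes}
grand = np.mean(list(means.values()))
deviations = [means["A"] - grand, means["B"] - grand]

print(np.round(coef, 3), grand, deviations)
```

The intercept equals the grand mean (6), and b1 and b2 equal the deviations of A and B from it (0 and 3).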
C. Orthogonal Coding (as in III above)
1. Equation: Y = 6 + (-1.5)C1 + (-1.5)C2
YA = 6 + (-1.5)(1) + (-1.5)(-1) = 6 = mean of A
YB = 6 + (-1.5)(-1) + (-1.5)(-1) = 9 = mean of B
YC = 6 + (-1.5)(0) + (-1.5)(2) = 3 = mean of C
a = 6 = grand mean
b1 = -1.5
b2 = -1.5
C1 represents the contrast of the 1st group against the 2nd:
mean of A - mean of B = 6 - 9 = -3 = (-1.5)(1) - (-1.5)(-1)
C2 represents the contrast of the 3rd group against the average of the 1st and 2nd groups:
mean of C - (mean of A + mean of B)/2 = 3 - (6+9)/2 = -4.5 = (-1.5)(2) - (-1.5)(-1)
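The orthogonal-coding coefficients can be obtained either by least squares or, because the coded columns are mutually orthogonal and sum to zero with equal group sizes, one column at a time. The sketch below shows both routes and then recovers the two contrasts among the group means; the names are illustrative.

```python
import numpy as np

y = np.array([4, 5, 6, 7, 8, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5], dtype=float)
labels = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

# Orthogonal codes as (C1, C2) for each group.
codes = {"A": (1, -1), "B": (-1, -1), "C": (0, 2)}
Cmat = np.array([codes[g] for g in labels], dtype=float)
X = np.column_stack([np.ones(len(y)), Cmat])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef

# With orthogonal, zero-sum columns each slope is
# (sum of code * y) / (sum of squared codes).
b_direct = (Cmat.T @ y) / (Cmat ** 2).sum(axis=0)

# Contrasts among group means implied by the coefficients.
a_minus_b = (1 - (-1)) * b1        # A and B differ by 2 units on C1
c_minus_avg_ab = (2 - (-1)) * b2   # C and A/B differ by 3 units on C2

print(np.round(coef, 3), b_direct, a_minus_b, c_minus_avg_ab)
```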
V. Interactions between categorical variables
A. Multiplication of “dummy” coded variables
1. In multifactor designs, one can obtain variables representing interactions between categorical variables by multiplying the “dummy” coded variables.
2. Example: A (3 categories) X B (2 categories) using contrast coding (e.g., 3 dosage levels; two types of subjects)
a. Because A has 3 levels, 2 “dummy” variables (number of levels - 1) will be required to code the main effect of A.
b. Because B has 2 levels, 1 “dummy” variable (number of levels - 1) will be required to code the main effect of B.
c. The interaction of A X B will require 2 X 1 = 2 “dummy” variables.
“Dummy Variable” Coding Scheme: Main Effect A
Level | A1 (linear) | A2 (quadratic)
1     | -1          | -1
2     |  0          |  2
3     |  1          | -1

“Dummy Variable” Coding Scheme: Main Effect B
Level | B1
1     | -1
2     |  1
“Dummy Variable” Coding Scheme: A X B Interaction

Linear A X B (A1B1)
Factor B (B1)      | Factor A (A1): Level 1, code (-1) | Level 2, code (0) | Level 3, code (1)
Level 1, code (-1) |  1                                |  0                | -1
Level 2, code (1)  | -1                                |  0                |  1

Quadratic A X B (A2B1)
Factor B (B1)      | Factor A (A2): Level 1, code (-1) | Level 2, code (2) | Level 3, code (-1)
Level 1, code (-1) |  1                                | -2                |  1
Level 2, code (1)  | -1                                |  2                | -1

Entries in each table are derived by multiplying the column code by the row code.
Resulting matrix:
Factors | Dummy Variables
A | B | A1 | A2 | B1 | A1B1 | A2B1
1 | 1 | -1 | -1 | -1 |  1   |  1
1 | 2 | -1 | -1 |  1 | -1   | -1
2 | 1 |  0 |  2 | -1 |  0   | -2
2 | 2 |  0 |  2 |  1 |  0   |  2
3 | 1 |  1 | -1 | -1 | -1   |  1
3 | 2 |  1 | -1 |  1 |  1   | -1
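The design matrix for the 3 X 2 example can be built by multiplying the main-effect codes cell by cell. The following is a minimal sketch; the dictionaries a1, a2, and b1 simply restate the main-effect codes given above, and the final check assumes one observation (or equal numbers of observations) per cell.

```python
import numpy as np
from itertools import product

# Main-effect codes by factor level.
a1 = {1: -1, 2: 0, 3: 1}    # linear contrast on A
a2 = {1: -1, 2: 2, 3: -1}   # quadratic contrast on A
b1 = {1: -1, 2: 1}          # contrast on B

# One row per cell: A, B, A1, A2, B1, A1B1, A2B1.
rows = []
for a, b in product([1, 2, 3], [1, 2]):
    rows.append([a, b, a1[a], a2[a], b1[b], a1[a] * b1[b], a2[a] * b1[b]])
design = np.array(rows)
print(design)

# With equal cell sizes the five coded columns are mutually orthogonal,
# so their cross-product matrix is diagonal.
coded = design[:, 2:]
gram = coded.T @ coded
is_orthogonal = bool(np.all(gram == np.diag(np.diag(gram))))
print(is_orthogonal)
```

For level 2 of A the quadratic code is 2, so the A2B1 product is 2 times the B1 code for that cell.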
B. Nontraditional coding schemes
1. Instead of dividing the available degrees of freedom into traditional main effects and interactions, any other coding scheme that codes all of the available information can be used. This approach is especially useful when a hypothesis of interest is not coded into one of the traditional main effects or interactions.
2. Example: A (3 categories) X B (2 categories), testing the hypothesis that A and B will have no effect unless A=3 and B=2.
a. In this design, there are 2 X 3 = 6 cells. This yields 5 degrees of freedom, so 5 “dummy” variables will be required. The first will be coded to represent the hypothesis above.
Original Factors | “Dummy” Variable Coding Scheme
A | B | C1 | C2 | C3 | C4 | C5
1 | 1 | -1 |  4 |  0 |  0 |  0
1 | 2 | -1 | -1 | -1 |  2 |  0
2 | 1 | -1 | -1 | -1 | -1 |  1
2 | 2 | -1 | -1 | -1 | -1 | -1
3 | 1 | -1 | -1 |  3 |  0 |  0
3 | 2 |  5 |  0 |  0 |  0 |  0
b. Note that the codes for C2 through C5 were chosen to be orthogonal to C1 (and to one another). Other orthogonal coding schemes are possible.
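The orthogonality of the five codes can be verified by forming their cross-product matrix. In this sketch the rows are the six cells in the order (A, B) = (1,1), (1,2), (2,1), (2,2), (3,1), (3,2), the columns are C1 through C5, and equal cell sizes are assumed.

```python
import numpy as np

# Rows: cells (1,1), (1,2), (2,1), (2,2), (3,1), (3,2); columns: C1..C5.
C = np.array([
    [-1,  4,  0,  0,  0],
    [-1, -1, -1,  2,  0],
    [-1, -1, -1, -1,  1],
    [-1, -1, -1, -1, -1],
    [-1, -1,  3,  0,  0],
    [ 5,  0,  0,  0,  0],
])

# Every column should sum to zero across the cells.
column_sums = C.sum(axis=0)

# Off-diagonal entries of C'C are the cross products between codes;
# all of them are zero when the codes are mutually orthogonal.
gram = C.T @ C
off_diagonal = gram - np.diag(np.diag(gram))

print(column_sums)
print(gram)
```

C1 is the only code that separates the A=3, B=2 cell from the rest; the remaining four codes only compare the other five cells among themselves.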