Regression with Categorical Predictors

I. Why: Sometimes one will want to regress the criterion on predictors that are qualitative (e.g., race, gender).

II. How: To represent the effect of a qualitative variable having k levels in a multiple regression model, one constructs k-1 "dummy" predictors. These predictors may be coded in three ways: traditional dummy coding, effects coding, and orthogonal coding.

A. Dummy coding: Each variable has 1's for each case in one group and 0's elsewhere; no two variables have 1's for the same group, and one group (the reference group) has 0's on every variable.

B. Effects coding: Each variable is coded so that it has 1's for one group, -1's for the "base" group, and 0's elsewhere.

C. Orthogonal coding: Each independent variable is coded so that the groups receive orthogonal contrast weights.
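The dummy and effects schemes described above can be generated mechanically from a list of group labels. The following is a minimal sketch, not a standard library routine: the function names `dummy_codes` and `effects_codes` and the choice of group C as the reference/base group are assumptions made for illustration. Orthogonal codes are not generated here because their weights depend on the contrasts the analyst wants to test.

```python
def dummy_codes(levels, reference):
    """Map each level to k-1 codes: 1 on its own column, 0 elsewhere.

    The reference level (hypothetical parameter name) gets 0's on every column.
    """
    others = [lv for lv in levels if lv != reference]
    return {lv: [1 if lv == o else 0 for o in others] for lv in levels}


def effects_codes(levels, base):
    """Same as dummy coding, except the base level gets -1 on every column."""
    others = [lv for lv in levels if lv != base]
    return {
        lv: [-1] * len(others) if lv == base else [1 if lv == o else 0 for o in others]
        for lv in levels
    }


levels = ["A", "B", "C"]
dummy = dummy_codes(levels, reference="C")
effects = effects_codes(levels, base="C")

for lv in levels:
    print(lv, dummy[lv], effects[lv])
```

With three groups each level receives k-1 = 2 codes; the two schemes differ only in the row for the reference/base group.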

III. Example: 3 groups: A, B, C (e.g., 3 different treatments)

A. Traditional dummy coding

Subject	Group	Criterion	Predictor (D1) Code	Predictor (D2) Code
1	A	y1	1	0
2	A	y2	1	0
3	B	y3	0	1
4	B	y4	0	1
5	C	y5	0	0
6	C	y6	0	0

1. Model: Y= a + b₁D₁ + b₂D₂

2. The equation for each group becomes:

Group A: Y = a + b₁

Group B: Y = a + b₂

Group C: Y = a

3. The predicted y values are the means of the respective groups. The constant term (a) is the mean of the group with 0's on all of the dummy variables. The b_j for a group represents the deviation of that group's mean (the group coded with 1's on the jth variable) from the mean of the group with 0's on all the dummy variables.

4. This technique is especially useful when one wants to test for differences between each group and a control group. This is done by a t-test of the relevant b_j.

B. Effects Coding

Subject	Group	Criterion	Predictor (E1) Code	Predictor (E2) Code
1	A	y1	1	0
2	A	y2	1	0
3	B	y3	0	1
4	B	y4	0	1
5	C	y5	-1	-1
6	C	y6	-1	-1

1. Model: Y= a + b₁E₁ + b₂E₂

2. The equation for each group becomes:

Group A: Y = a + b₁

Group B: Y = a + b₂

Group C: Y = a - b₁ - b₂

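3. The predicted y values are the means of the respective groups. The constant term (a) is the grand mean (with equal group sizes; more generally, the unweighted mean of the group means). The b_j for a group represents the deviation of that group's mean (the group coded with 1's on the jth variable) from the grand mean.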

4. This technique is especially useful when one wants to test for differences of groups from the grand mean. This is done by a t-test of the relevant b_j.

C. Orthogonal (Contrast) Coding

Subject	Group	Criterion	Predictor (C1) Code	Predictor (C2) Code
1	A	y1	1	-1
2	A	y2	1	-1
3	B	y3	-1	-1
4	B	y4	-1	-1
5	C	y5	0	2
6	C	y6	0	2

1. Model: Y= a + b₁C₁ + b₂C₂

2. The equation for each group becomes:

Group A: Y = a + b₁ - b₂

Group B: Y = a - b₁ - b₂

Group C: Y = a + 2b₂

3. The predicted y values are the means of the respective groups. The constant term (a) is the grand mean. Each b_j represents the comparison implied by its contrast; its value depends on the values chosen for the contrast weights.

4. This technique is especially useful when one wants to test hypotheses about patterns of group means. This is done by a t-test of the b_j for the predictor whose weights describe the desired contrast.

IV. Numerical Example (Pedhazur)

	A	B	C	Overall
	4	7	1
	5	8	2
	6	9	3
	7	10	4
	8	11	5
Σ	30	45	15
Mean	6	9	3	6

A. Dummy Coding (as in III.A above)

1. Equation: Y = 3 + 3D₁ + 6D₂

Y_A = 3 + 3 + 0 = 6 = Ȳ_A

Y_B = 3 + 0 + 6 = 9 = Ȳ_B

Y_C = 3 + 0 + 0 = 3 = Ȳ_C

a = 3 = Ȳ_C

b₁ = 3 = Ȳ_A - Ȳ_C

b₂ = 6 = Ȳ_B - Ȳ_C
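The dummy-coded coefficients can be checked by fitting the model with ordinary least squares. This is a minimal sketch that assumes NumPy is available; the variable names (`scores`, `codes`, `X`, `y`) are illustrative choices, and the data are the 15 scores from the numerical example.

```python
import numpy as np

# Scores from the numerical example, keyed by group.
scores = {"A": [4, 5, 6, 7, 8], "B": [7, 8, 9, 10, 11], "C": [1, 2, 3, 4, 5]}

# Dummy codes (D1, D2); group C is the reference group with 0's on both.
codes = {"A": (1, 0), "B": (0, 1), "C": (0, 0)}

# One design-matrix row per subject: intercept, D1, D2.
X = np.array([[1, *codes[g]] for g, ys in scores.items() for _ in ys], dtype=float)
y = np.array([v for ys in scores.values() for v in ys], dtype=float)

# Ordinary least squares fit of Y = a + b1*D1 + b2*D2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef

print("a, b1, b2 =", np.round(coef, 6))
```

The fitted intercept should match the mean of group C, and each slope the difference between a group mean and the mean of group C.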

B. Effects Coding (as in III.B above)

1. Equation: Y = 6 + 0E₁ + 3E₂

Y_A = 6 + 0 + 0 = 6 = Ȳ_A

Y_B = 6 + 0 + 3 = 9 = Ȳ_B

Y_C = 6 - 0 - 3 = 3 = Ȳ_C

a = 6 = Ȳ..

b₁ = 0 = Ȳ_A - Ȳ..

b₂ = 3 = Ȳ_B - Ȳ..
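Because effects-coded coefficients are deviations from the grand mean, they can be computed directly from the group means without fitting a regression. The sketch below relies on the equal group sizes in this example; the variable names are illustrative.

```python
from statistics import mean

# Scores from the numerical example, keyed by group.
scores = {"A": [4, 5, 6, 7, 8], "B": [7, 8, 9, 10, 11], "C": [1, 2, 3, 4, 5]}

group_means = {g: mean(ys) for g, ys in scores.items()}

# Unweighted mean of the group means; equals the grand mean when group sizes are equal.
a = mean(list(group_means.values()))

# b_j is the deviation of the group coded 1 on E_j from the grand mean.
b1 = group_means["A"] - a
b2 = group_means["B"] - a

# Predicted values under effects coding; group C is coded -1 on both predictors.
predicted = {"A": a + b1, "B": a + b2, "C": a - b1 - b2}

print("a =", a, "b1 =", b1, "b2 =", b2)
print(predicted)
```

The prediction for group C is recovered from the other two coefficients, which is why no separate coefficient is estimated for the base group.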

C. Orthogonal Coding (as in III.C above)

1. Equation: Y = 6 + (-1.5)C₁ + (-1.5)C₂

Y_A = 6 + (-1.5)(1) + (-1.5)(-1) = 6 = Ȳ_A

Y_B = 6 + (-1.5)(-1) + (-1.5)(-1) = 9 = Ȳ_B

Y_C = 6 + (-1.5)(2) = 3 = Ȳ_C

a = 6 = Ȳ..

b₁ = -1.5

b₂ = -1.5

C₁ represents the contrast of the 1st group against the 2nd:

Ȳ_A - Ȳ_B = 6 - 9 = -3 = (-1.5)(1) - (-1.5)(-1)

C₂ represents the contrast of the 3rd group against the average of the 1st and 2nd groups:

Ȳ_C - (Ȳ_A + Ȳ_B)/2 = 3 - (6+9)/2 = -4.5 = (-1.5)(2) - (-1.5)(-1)
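With equal group sizes and mutually orthogonal, zero-sum weights, each contrast coefficient reduces to the weighted sum of the group means divided by the sum of squared weights. The sketch below checks the orthogonality conditions and applies that shortcut; the shortcut is an assumption tied to equal n, and the variable names are illustrative.

```python
from itertools import combinations

group_means = {"A": 6, "B": 9, "C": 3}

# Contrast weights per group, as in the orthogonal coding table.
contrasts = {
    "C1": {"A": 1, "B": -1, "C": 0},
    "C2": {"A": -1, "B": -1, "C": 2},
}

# Each contrast must sum to zero, and each pair must have a zero cross-product.
for w in contrasts.values():
    assert sum(w.values()) == 0
for (_, w1), (_, w2) in combinations(contrasts.items(), 2):
    assert sum(w1[g] * w2[g] for g in group_means) == 0

# Intercept: unweighted mean of the group means.
a = sum(group_means.values()) / len(group_means)

# Equal-n shortcut: b_j = sum(c * mean) / sum(c^2).
b = {
    name: sum(w[g] * group_means[g] for g in w) / sum(c * c for c in w.values())
    for name, w in contrasts.items()
}

predicted = {
    g: a + sum(b[name] * contrasts[name][g] for name in contrasts)
    for g in group_means
}

print(a, b, predicted)
```

The difference Ȳ_A - Ȳ_B equals 2b₁ and Ȳ_C - (Ȳ_A + Ȳ_B)/2 equals 3b₂, which ties the coefficients back to the two comparisons above.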

V. Interactions between categorical variables

A. Multiplication of “dummy” coded variables

1. In multifactor designs, one can obtain variables representing interactions between categorical variables by multiplying the “dummy” coded variables for the main effects.

2. Example: A (3 categories) X B (2 categories) using contrast coding (e.g., 3 dosage levels; two types of subjects)

a. Because A has 3 levels, 2 “dummy” variables (number of levels -1) will be required to code the main effect of A.

b. Because B has 2 levels, 1 “dummy” variable (number of levels -1 ) will be required to code the main effect of B.

c. The interaction of A X B will require 2 X 1 = 2 “dummy” variables.

Factor	Level	“Dummy Variable” Coding Scheme
		A1 (linear)	A2 (quadratic)
Main Effect A	1	-1	-1
	2	0	2
	3	1	-1

		“Dummy Variable” Coding Scheme
		B1
Main Effect B	1	-1
	2	1

A X B Interaction

	Linear A X B (A1B1)
	“Dummy Variable” Coding Scheme
	Factor A (A1)
Factor B (B1)	Level 1; code (-1)	Level 2; code (0)	Level 3; code (1)
Level 1; code (-1)	1	0	-1
Level 2; code (1)	-1	0	1

	Quadratic A X B (A2B1)
	“Dummy Variable” Coding Scheme
	Factor A (A2)
Factor B (B1)	Level 1; code (-1)	Level 2; code (2)	Level 3; code (-1)
Level 1; code (-1)	1	-2	1
Level 2; code (1)	-1	2	-1

Entries in these tables are derived by multiplying the column code by the row code.

Resulting matrix:

Factors		Dummy Variables
A	B	A1	A2	B1	A1B1	A2B1
1	1	-1	-1	-1	1	1
1	2	-1	-1	1	-1	-1
2	1	0	2	-1	0	-2
2	2	0	2	1	0	2
3	1	1	-1	-1	-1	1
3	2	1	-1	1	1	-1
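The full coding matrix follows from the main-effect codes alone: each interaction column is the product of one A column and the B column. The sketch below builds the six rows this way; the dictionary names `a_codes` and `b_codes` are illustrative.

```python
# Main-effect codes from the tables above.
a_codes = {1: (-1, -1), 2: (0, 2), 3: (1, -1)}  # level -> (A1 linear, A2 quadratic)
b_codes = {1: -1, 2: 1}                         # level -> B1

rows = []
for a_level, (a1, a2) in a_codes.items():
    for b_level, b1 in b_codes.items():
        # Interaction codes are the products of the main-effect codes.
        rows.append((a_level, b_level, a1, a2, b1, a1 * b1, a2 * b1))

print("A", "B", "A1", "A2", "B1", "A1B1", "A2B1", sep="\t")
for row in rows:
    print(*row, sep="\t")
```

For the two cells at level 2 of A, the A2B1 entries are 2 × (-1) = -2 and 2 × 1 = 2.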

B. Nontraditional coding schemes

1. Instead of dividing the available degrees of freedom into traditional main effects and interactions, one can use any other coding scheme that codes all of the available information. This approach is especially useful when a hypothesis of interest is not captured by one of the traditional main effects or interactions.

2. Example: A (3 categories) X B (2 categories), testing the hypothesis that A and B will have no effect unless A=3 and B=2.

a. In this design, there are 2 X 3=6 cells. This yields 5 degrees of freedom and 5 “dummy” variables will be required. The first will be coded to represent the hypothesis above.

Original Factors		“Dummy” Variable Coding Scheme
A	B	C1	C2	C3	C4	C5
1	1	-1	4	0	0	0
1	2	-1	-1	-1	2	0
2	1	-1	-1	-1	-1	1
2	2	-1	-1	-1	-1	-1
3	1	-1	-1	3	0	0
3	2	5	0	0	0	0

b. Note that the codes for C2 through C5 were chosen to be orthogonal to C1 and to one another. Other orthogonal coding schemes are possible.
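Orthogonality of a coding scheme like this one can be verified by checking that every column sums to zero and that every pair of columns has a zero cross-product (assuming equal cell sizes). A minimal sketch, with the codes listed in cell order (A1B1, A1B2, A2B1, A2B2, A3B1, A3B2) and illustrative variable names:

```python
from itertools import combinations

# Codes for the six cells, in the order (A=1,B=1), (1,2), (2,1), (2,2), (3,1), (3,2).
codes = {
    "C1": [-1, -1, -1, -1, -1, 5],  # cell A=3, B=2 against all other cells
    "C2": [4, -1, -1, -1, -1, 0],
    "C3": [0, -1, -1, -1, 3, 0],
    "C4": [0, 2, -1, -1, 0, 0],
    "C5": [0, 0, 1, -1, 0, 0],
}

# Each column should sum to zero.
sums = {name: sum(col) for name, col in codes.items()}

# Each pair of columns should have a zero cross-product.
dots = {
    (n1, n2): sum(x * y for x, y in zip(codes[n1], codes[n2]))
    for n1, n2 in combinations(codes, 2)
}

print(sums)
print(dots)
```

Any replacement scheme for C2 through C5 can be screened the same way before it is used.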
