Regression with Categorical Predictors

 

I.  Why:  Sometimes one will want to regress the criterion on predictors that are qualitative (e.g., race, gender).

 

II.  How:  To represent the effect of a qualitative variable having k levels in a multiple regression model, one constructs k-1 "dummy" predictors.  These predictors may be coded in three ways: traditional dummy coding, effects coding, and orthogonal coding.

A.  Dummy coding:  Each variable has 1's for each case in one group and 0's elsewhere; no two variables have 1's for the same group.

B.  Effects coding:  Each variable is coded so that it has 1's for one group, -1's for the "base" group, and 0's elsewhere.

C.  Orthogonal coding:  Each independent variable is coded so that the groups receive orthogonal contrast weights.
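The three schemes can be sketched in Python as follows. This is a minimal illustration, not a standard library routine: the function names are hypothetical, the last level listed is assumed to be the reference (or "base") group, and the orthogonal weights are simply the ones used in the three-group example of section III.

```python
def dummy_codes(levels):
    """k-1 dummy variables; the last level is the all-zero reference group."""
    return {g: [1 if g == h else 0 for h in levels[:-1]] for g in levels}

def effect_codes(levels):
    """k-1 effect-coded variables; the last level is the base group (all -1's)."""
    base = levels[-1]
    return {g: [-1] * (len(levels) - 1) if g == base
            else [1 if g == h else 0 for h in levels[:-1]]
            for g in levels}

levels = ["A", "B", "C"]
dummy = dummy_codes(levels)
effect = effect_codes(levels)

# Orthogonal weights are chosen by the analyst; these are the ones from section III.
orthogonal = {"A": [1, -1], "B": [-1, -1], "C": [0, 2]}

for g in levels:
    print(g, dummy[g], effect[g], orthogonal[g])
```

Each group gets k-1 = 2 codes under every scheme; only the meaning of the resulting regression coefficients changes.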

 

III. Example:  3 groups:  A, B, C (e.g., 3 different treatments)

 

A.     Traditional dummy coding

 

Subject   Group   Criterion   Predictor (D1) Code   Predictor (D2) Code
   1        A        y1                1                     0
   2        A        y2                1                     0
   3        B        y3                0                     1
   4        B        y4                0                     1
   5        C        y5                0                     0
   6        C        y6                0                     0

 

1.  Model:  Y = a + b1D1 + b2D2

2.  For group A the equation becomes:   Y = a + b1
    For group B the equation becomes:   Y = a + b2
    For group C the equation becomes:   Y = a

3.  The predicted y values are the means of the respective groups.  The constant term (a) is the mean of the group coded 0 on all of the dummy variables (the reference group).  The bj for a group (the group coded 1 on the jth variable) represents the deviation of that group's mean from the mean of the reference group.

4.  This technique is especially useful when one wants to test for differences between each group and a control group.  This is done by a t-test of the relevant bj.


 

B.     Effects Coding

 

Subject   Group   Criterion   Predictor (E1) Code   Predictor (E2) Code
   1        A        y1                1                     0
   2        A        y2                1                     0
   3        B        y3                0                     1
   4        B        y4                0                     1
   5        C        y5               -1                    -1
   6        C        y6               -1                    -1

 

1.  Model:  Y = a + b1E1 + b2E2

2.  For group A the equation becomes:   Y = a + b1
    For group B the equation becomes:   Y = a + b2
    For group C the equation becomes:   Y = a - b1 - b2

3.  The predicted y values are the means of the respective groups.  The constant term (a) is the grand mean (with equal group sizes, as here; in general it is the unweighted mean of the group means).  The bj for a group represents the deviation of that group (the group coded with 1's) from the grand mean.

4.  This technique is especially useful when one wants to test for differences of groups from the grand mean.  This is done by a t-test of the relevant bj.

 

C.     Orthogonal (Contrast) Coding

 

Subject   Group   Criterion   Predictor (C1) Code   Predictor (C2) Code
   1        A        y1                1                    -1
   2        A        y2                1                    -1
   3        B        y3               -1                    -1
   4        B        y4               -1                    -1
   5        C        y5                0                     2
   6        C        y6                0                     2

 

1.  Model:  Y = a + b1C1 + b2C2

2.  For group A the equation becomes:   Y = a + b1 - b2
    For group B the equation becomes:   Y = a - b1 - b2
    For group C the equation becomes:   Y = a + 2b2

3.  The predicted y values are the means of the respective groups.  The constant term (a) is the grand mean.  The bj for a group represents the comparison implied by the contrast; its value depends on the values chosen for the contrast weights.

4.  This technique is especially useful when one wants to test hypotheses about patterns of group means.  This is done by a t-test of the bj with weights that describe the desired contrast.
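Whether a set of weights qualifies as orthogonal contrasts can be checked mechanically. The sketch below assumes equal group sizes (as in the table above), in which case each set of weights must sum to zero and the products of corresponding weights must also sum to zero. The helper names are illustrative only.

```python
# Contrast weights from the orthogonal coding table above.
weights = {
    "C1": {"A": 1, "B": -1, "C": 0},
    "C2": {"A": -1, "B": -1, "C": 2},
}

def is_contrast(w):
    """A contrast's weights sum to zero across groups."""
    return sum(w.values()) == 0

def are_orthogonal(w1, w2):
    """With equal n, two contrasts are orthogonal if their cross-products sum to zero."""
    return sum(w1[g] * w2[g] for g in w1) == 0

print(is_contrast(weights["C1"]), is_contrast(weights["C2"]))
print(are_orthogonal(weights["C1"], weights["C2"]))
```

With unequal group sizes the cross-products would need to be weighted by the group sizes, so this check applies only to the balanced case.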

 


IV. Numerical Example (Pedhazur)

            A     B     C
            4     7     1
            5     8     2
            6     9     3
            7    10     4
            8    11     5

Sum        30    45    15

Mean        6     9     3     (grand mean = 6)

           

A. Dummy Coding (as in III.A above)

    1.  Equation:  Y = 3 + 3D1 + 6D2

            YA = 3 + 3 + 0 = 6 = mean of A
            YB = 3 + 0 + 6 = 9 = mean of B
            YC = 3 + 0 + 0 = 3 = mean of C

            a  = 3 = mean of C
            b1 = 3 = mean of A - mean of C
            b2 = 6 = mean of B - mean of C
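These coefficients can be reproduced by actually fitting the regression. The following is a minimal sketch that solves the normal equations with exact fractions using only the Python standard library; the helper `ols` is a hypothetical name written for this illustration, and in practice one would normally use a statistics package instead.

```python
from fractions import Fraction

# Scores from the numerical example above.
data = {"A": [4, 5, 6, 7, 8], "B": [7, 8, 9, 10, 11], "C": [1, 2, 3, 4, 5]}
dummy = {"A": (1, 0), "B": (0, 1), "C": (0, 0)}  # (D1, D2); C is the reference group

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]; Fractions keep the arithmetic exact.
    M = [[Fraction(sum(row[i] * row[j] for row in X)) for j in range(p)]
         + [Fraction(sum(row[i] * yi for row, yi in zip(X, y)))]
         for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination
        pivot = next(r for r in range(c, p) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(p):
            if r != c:
                factor = M[r][c]
                M[r] = [v - factor * w for v, w in zip(M[r], M[c])]
    return [row[-1] for row in M]

# One design-matrix row per subject: intercept column, D1, D2.
X = [(1, *dummy[g]) for g, scores in data.items() for _ in scores]
y = [s for scores in data.values() for s in scores]

a, b1, b2 = ols(X, y)
print(a, b1, b2)  # 3 3 6
```

Swapping in a different coding dictionary (effects or orthogonal codes) changes the coefficients but not the predicted group means.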

           

B. Effects Coding (as in III.B above)

    1.  Equation:  Y = 6 + 0E1 + 3E2

            YA = 6 + 0 + 0 = 6 = mean of A
            YB = 6 + 0 + 3 = 9 = mean of B
            YC = 6 - 0 - 3 = 3 = mean of C

            a  = 6 = grand mean
            b1 = 0 = mean of A - grand mean
            b2 = 3 = mean of B - grand mean
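Because each effects-coded coefficient is a deviation from the grand mean, the same numbers can be recovered directly from the group means without fitting a regression. The sketch below assumes equal group sizes, so the grand mean equals the unweighted mean of the group means; variable names are illustrative.

```python
from fractions import Fraction

# Scores from the numerical example above.
data = {"A": [4, 5, 6, 7, 8], "B": [7, 8, 9, 10, 11], "C": [1, 2, 3, 4, 5]}
means = {g: Fraction(sum(s), len(s)) for g, s in data.items()}

# Constant term: unweighted mean of the group means (the grand mean when n is equal).
a = sum(means.values()) / len(means)

# Coefficients for the groups coded 1 on E1 and E2.
b1 = means["A"] - a
b2 = means["B"] - a

# Group C is coded -1 on both variables, so its prediction is a - b1 - b2.
predicted = {"A": a + b1, "B": a + b2, "C": a - b1 - b2}
print(a, b1, b2)  # 6 0 3
```

The base group's deviation is not estimated separately; it is implied as -(b1 + b2), here -3.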

           

C. Orthogonal Coding (as in III.C above)

    1.  Equation:  Y = 6 + (-1.5)C1 + (-1.5)C2

            YA = 6 + (-1.5)(1) + (-1.5)(-1) = 6 = mean of A
            YB = 6 + (-1.5)(-1) + (-1.5)(-1) = 9 = mean of B
            YC = 6 + (-1.5)(0) + (-1.5)(2) = 3 = mean of C

            a  = 6 = grand mean
            b1 = -1.5
            b2 = -1.5

C1 represents the contrast of the 1st group against the 2nd (the C2 terms cancel):

A - B = 6 - 9 = -3 = (-1.5)(1) - (-1.5)(-1)

C2 represents the contrast of the 3rd group against the average of the 1st and 2nd groups (the C1 terms cancel):

C - (A+B)/2 = 3 - (6+9)/2 = -4.5 = (-1.5)(2) - (-1.5)(-1)
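For orthogonal contrasts with equal group sizes, each coefficient equals the contrast applied to the group means divided by the sum of the squared weights. The sketch below uses that relationship (stated here as an assumption of the balanced, orthogonal case) to recover b1 and b2 and the two mean differences; the function names are illustrative.

```python
from fractions import Fraction

means = {"A": 6, "B": 9, "C": 3}  # group means from the numerical example
contrasts = {
    "C1": {"A": 1, "B": -1, "C": 0},
    "C2": {"A": -1, "B": -1, "C": 2},
}

def contrast_value(w, means):
    """Weighted sum of the group means for one contrast."""
    return sum(w[g] * means[g] for g in w)

def contrast_slope(w, means):
    """Regression coefficient for an orthogonal contrast, equal n assumed."""
    return Fraction(contrast_value(w, means), sum(v * v for v in w.values()))

b1 = contrast_slope(contrasts["C1"], means)
b2 = contrast_slope(contrasts["C2"], means)
print(b1, b2)  # -3/2 -3/2

# The mean differences the contrasts describe.
a_minus_b = contrast_value(contrasts["C1"], means)                   # A - B
c_minus_avg_ab = Fraction(contrast_value(contrasts["C2"], means), 2)  # C - (A+B)/2
print(a_minus_b, c_minus_avg_ab)  # -3 -9/2
```

Rescaling the weights (for example using 1/2, 1/2, -1 instead of -1, -1, 2) changes the size and possibly the sign of bj but not the t-test of the contrast.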

 


V.  Interactions between categorical variables

A.  Multiplication of “dummy” coded variables

1.  In multifactor designs, one can obtain variables representing interactions between categorical variables by multiplying the “dummy” coded variables.

2.  Example: A (3 categories) X B (2 categories) using contrast coding (e.g., 3 dosage levels; two types of subjects)

a.  Because A has 3 levels, 2 “dummy” variables (number of levels -1) will be required to code the main effect of A.

b.  Because B has 2 levels, 1 “dummy” variable (number of levels -1 ) will be required to code the main effect of B.

c.  The interaction of A X B will require 2 X 1 = 2 “dummy” variables.

 

Factor           Level     “Dummy Variable” Coding Scheme
                           A1 (linear)     A2 (quadratic)
Main Effect A      1           -1               -1
                   2            0                2
                   3            1               -1

Factor           Level     “Dummy Variable” Coding Scheme
                           B1
Main Effect B      1           -1
                   2            1

 

 

A X B Interaction

Linear A X B  (A1B1)

“Dummy Variable” Coding Scheme

                         Factor A  (A1)
Factor B  (B1)           Level 1       Level 2      Level 3
                         code (-1)     code (0)     code (1)
Level 1; code (-1)           1             0           -1
Level 2; code (1)           -1             0            1

Entries in this section of the table are derived by multiplying the column code by the row code.

Quadratic A X B  (A2B1)

“Dummy Variable” Coding Scheme

                         Factor A  (A2)
Factor B  (B1)           Level 1       Level 2      Level 3
                         code (-1)     code (2)     code (-1)
Level 1; code (-1)           1            -2            1
Level 2; code (1)           -1             2           -1

Entries in this section of the table are derived by multiplying the column code by the row code.

 


Resulting matrix:

  Factors        Dummy Variables
  A     B        A1     A2     B1     A1B1    A2B1
  1     1        -1     -1     -1       1       1
  1     2        -1     -1      1      -1      -1
  2     1         0      2     -1       0      -2
  2     2         0      2      1       0       2
  3     1         1     -1     -1      -1       1
  3     2         1     -1      1       1      -1
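The coding matrix for the 3 X 2 design can be generated by multiplying the main-effect codes cell by cell. The sketch below takes the codes from the main-effect tables above (A1 = -1, 0, 1; A2 = -1, 2, -1; B1 = -1, 1); the variable names are illustrative. By straight multiplication, the A2B1 entries for the two cells at level 2 of A come out as -2 and 2.

```python
# Main-effect codes: level -> (A1 linear, A2 quadratic) and level -> B1.
A_codes = {1: (-1, -1), 2: (0, 2), 3: (1, -1)}
B_codes = {1: -1, 2: 1}

rows = []
for a_level, (a1, a2) in A_codes.items():
    for b_level, b1 in B_codes.items():
        # Interaction codes are products of the main-effect codes.
        rows.append((a_level, b_level, a1, a2, b1, a1 * b1, a2 * b1))

print("A  B  A1  A2  B1  A1B1  A2B1")
for r in rows:
    print("{}  {}  {:>2}  {:>2}  {:>2}  {:>4}  {:>4}".format(*r))

# With one row per cell (equal cell sizes), every pair of code columns is orthogonal.
cols = list(zip(*rows))[2:]
cross_products = [sum(x * y for x, y in zip(cols[i], cols[j]))
                  for i in range(len(cols)) for j in range(i + 1, len(cols))]
```

All ten cross-products are zero here, which is what allows the five coded variables to be tested independently in a balanced design.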

 

 

B.  Nontraditional coding schemes       

1.  Instead of dividing the available degrees of freedom into traditional main effects and interactions, one can use any other coding scheme that codes all of the available information.  This approach is especially useful when a hypothesis of interest is not captured by one of the traditional main effects or interactions.

2.  Example: A (3 categories) X B (2 categories), testing the hypothesis that A and B will have no effect unless A=3 and B=2.

a.  In this design, there are 3 X 2 = 6 cells.  This yields 5 degrees of freedom, so 5 “dummy” variables will be required.  The first will be coded to represent the hypothesis above.

 

Original Factors     “Dummy” Variable Coding Scheme
  A     B            C1     C2     C3     C4     C5
  1     1            -1      4      0      0      0
  1     2            -1     -1     -1      2      0
  2     1            -1     -1     -1     -1      1
  2     2            -1     -1     -1     -1     -1
  3     1            -1     -1      3      0      0
  3     2             5      0      0      0      0

 

 

b.  Note that the codes for C2 through C5 were chosen to be orthogonal to C1 (and, in this scheme, to one another).  Other orthogonal coding schemes are possible.
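The orthogonality of this scheme can be verified with a few lines of Python. The sketch below assumes equal cell sizes, so one row per cell is enough; the codes are copied from the table above and the variable names are illustrative.

```python
from itertools import combinations

# (A, B) cell -> codes on C1..C5, copied from the table above.
codes = {
    (1, 1): (-1, 4, 0, 0, 0),
    (1, 2): (-1, -1, -1, 2, 0),
    (2, 1): (-1, -1, -1, -1, 1),
    (2, 2): (-1, -1, -1, -1, -1),
    (3, 1): (-1, -1, 3, 0, 0),
    (3, 2): (5, 0, 0, 0, 0),
}

columns = list(zip(*codes.values()))  # one tuple of six cell codes per variable

# Each variable is a contrast: its codes sum to zero across the six cells.
column_sums = [sum(col) for col in columns]

# Cross-products for every pair of variables; zero means orthogonal.
cross_products = {
    (f"C{i + 1}", f"C{j + 1}"): sum(x * y for x, y in zip(columns[i], columns[j]))
    for i, j in combinations(range(len(columns)), 2)
}

print(column_sums)
print(all(v == 0 for v in cross_products.values()))
```

C1 alone carries the hypothesis that only the A=3, B=2 cell differs from the rest; C2 through C5 absorb the remaining four degrees of freedom.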