Multiple Regression
I. Why multiple regression?
A. To reduce stochastic error, i.e. increase ability to predict Y
B. To remove bias in estimates of b.
C. Note there are two goals of MR, prediction and explanation, which involve different strategies.
II. Two independent variables
A. Regression equation. Just as simple linear regression defines a line in the (x, y) plane, the two-variable multiple linear regression model Y = a + b1x1 + b2x2 + e is the equation of a plane in (x1, x2, Y) space. In this model, b1 is the slope of the plane in the (x1, Y) plane and b2 is the slope of the plane in the (x2, Y) plane.
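A minimal numpy sketch (invented data; all names are illustrative only) of fitting such a plane by least squares:

```python
import numpy as np

# Invented data roughly following Y = 2 + 1.5*x1 + 0.5*x2 + error
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
Y = 2 + 1.5 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones(n), x1, x2])

# Least squares estimates (a, b1, b2): the best-fitting plane in (x1, x2, Y) space
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coef
print(a, b1, b2)
```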
B. Regression coefficients
1. Unstandardized

The bi's are least squares estimates chosen to minimize

Σei² = Σ(Yi - a - b1X1i - b2X2i)²

To find the formulae for these estimates, transform the variables to deviation scores as before (x1i = X1i - mean(X1), etc.), take the first derivatives with respect to a, b1, and b2, and set each equal to 0. This yields the system of equations:

a = mean(Y) - b1 mean(X1) - b2 mean(X2)
Σx1iyi = b1Σx1i² + b2Σx1ix2i        (1)
Σx2iyi = b2Σx2i² + b1Σx1ix2i        (2)

Rearranging equation (1) to solve for b1 yields:

b1 = (Σx1iyi - b2Σx1ix2i) / Σx1i²

Thus, b1 depends on b2 and the covariance between x1 and x2.
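A numpy sketch (invented data) that builds and solves the two normal equations above and checks the rearranged expression for b1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)            # correlated predictors
Y = 1 + 2 * X1 - 1 * X2 + rng.normal(size=n)

# Deviation scores (the "transform as before" step)
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()

# Normal equations (1) and (2) in matrix form, solved for b1 and b2
A = np.array([[np.sum(x1**2),   np.sum(x1 * x2)],
              [np.sum(x1 * x2), np.sum(x2**2)]])
c = np.array([np.sum(x1 * y), np.sum(x2 * y)])
b1, b2 = np.linalg.solve(A, c)
a = Y.mean() - b1 * X1.mean() - b2 * X2.mean()

# b1 as in the rearranged equation (1): depends on b2 and the x1-x2 covariance
b1_check = (np.sum(x1 * y) - b2 * np.sum(x1 * x2)) / np.sum(x1**2)
print(a, b1, b2, b1_check)
```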
2. Standardized

a. The relation between b1 and b2 is easier to see if standardized regression coefficients are used, i.e. the coefficients obtained when the standardized criterion is regressed on the standardized predictors:

Zy = bz1Z1 + bz2Z2        Note: bi = bzi(sy/si)

Solving the normal equations in terms of correlations gives

bz1 = (ry1 - r12 ry2) / (1 - r12²)

This means that the standardized regression coefficient of Z1 is the correlation of Z1 with y, minus the correlation of Z2 with y to the degree that Z1 and Z2 are correlated, divided by the variance in Z1 not "explainable" by (i.e. not overlapping with) Z2.

Note: when r12 = 0, then bzi = ryi.
b. Interpretation of standardized b's

1. Standardization merely puts all variables on the same scale: each score has the variable's mean subtracted and is then divided by the variable's standard deviation.

2. If the variance of a variable is meaningful (i.e. it is not just a function of the measurement technique), one may not want to perform this transformation.

3. Standardized b's are sometimes used as indicators of the relative importance of the xi's. However, "importance" is likely to be related to the ease with which a change in position on a predictor can be accomplished, in addition to the size of that predictor's effect on the criterion.

4. Note also that standardized regression coefficients are affected by sample variances and covariances; one cannot compare bz's across samples.
3. Comparison of b and bz (from Pedhazur)

                           Sample 1                   Sample 2
                      x1     x2     y            x1     x2     y
 Correlations   x1    1                          1
                x2    0.5    1                   0.4    1
                y     0.8    0.7    1            0.6    0.45   1
 sd                   10     15     20           8      5      16
 Mean                 50     50     100          50     50     100
Samples 1 and 2 have the same ranking of r's, the same means, and the same regression equation: Y = 10 + 1.0x1 + .8x2. However, the bz's differ considerably (compare bz2)!

Recall that bzi = bi(si/sy):

            Sample 1                 Sample 2
 bz1        1.0(10/20) = .50         1.0(8/16) = .50
 bz2         .8(15/20) = .60          .8(5/16) = .25
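A short sketch reproducing the conversion in the table above (bzi = bi(si/sy); the b's and sd's are the ones given for the two samples):

```python
# Unstandardized slopes shared by both samples: b1 = 1.0, b2 = .8
b1, b2 = 1.0, 0.8

# (s_x1, s_x2, s_y) for each sample, taken from the table above
samples = {"Sample 1": (10, 15, 20), "Sample 2": (8, 5, 16)}

for name, (s1, s2, sy) in samples.items():
    bz1 = b1 * s1 / sy
    bz2 = b2 * s2 / sy
    print(name, round(bz1, 2), round(bz2, 2))
# Sample 1: bz1 = 0.5, bz2 = 0.6;  Sample 2: bz1 = 0.5, bz2 = 0.25
```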
C. Regression statistics

1. Model statistics
a. Proportion of variance explained

R² = SSregression/SStotal

Note: when r12 = 0, then R²y.12 = r²y1 + r²y2

R²y.12 is the R² obtained from a regression of y on x1 and x2; this notation is useful when discussing several different regression models that use the same variables.
b. Adjusted R²

R² depends on the sample size and the number of independent variables. For example, when N = 2 and k = 1, a perfect prediction of every data point can be made: a regression on these data will yield a line joining the two points, so R² = 1. The expected value of the estimated R² is k/(N-1) when the true R² = 0. Thus when k is large relative to N, the estimated R² is not a good estimate of the true R²; in replications of the study, the R² obtained is expected to be smaller. To adjust the estimated R², one can use the following formula:

Adjusted R² = 1 - (1 - R²)[(N-1)/(N-k-1)]

Note: for a given number of predictors, the larger the R² and N, the smaller the adjustment. For example (from Pedhazur), for k = 3:
                           Adjusted R²
 N       Ratio k/N      R² = .36     R² = .60
 15      1:5            .19          .491
 90      1:30           .34          .586
 150     1:50           .35          .592
Moral: whenever possible have many more observations
than predictors.
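A small Python sketch of the adjustment formula, reproducing the Pedhazur-style table above (k = 3):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

for n in (15, 90, 150):
    print(n, round(adjusted_r2(0.36, n, 3), 3), round(adjusted_r2(0.60, n, 3), 3))
# 15: 0.185, 0.491   90: 0.338, 0.586   150: 0.347, 0.592 (cf. the table above)
```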
c. Variance estimate

s² = SSresidual/dfresidual = SSresidual/(N-k-1)

where k = number of independent variables.
d. F ratio

F = (SSreg/dfreg) / (SSres/dfres),        dfreg = k,  dfres = N-k-1
2. Parameter statistics

a. Standard error of b:

Sby1.2 = sqrt( s²y.12 / [Σx1i²(1 - r12²)] )

where s²y.12 is the residual variance estimate from c.
b. t-test

t = b1/Sby1.2,    df = N-k-1

Note: the larger r12, the larger Sby1.2. This may result in a significant test of the overall regression model but nonsignificant tests of the individual b's. Under these conditions, it is difficult to determine the effects of the xi's. This is one of the symptoms of multicollinearity.
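A numpy sketch (invented data with a deliberately high r12) computing s², the overall F ratio, and the standard error and t for b1, showing the (1 - r12²) term at work:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 2
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)      # strongly correlated with x1
Y = 1 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ coef

ss_res = np.sum(resid**2)
ss_tot = np.sum((Y - Y.mean())**2)
ss_reg = ss_tot - ss_res

s2 = ss_res / (n - k - 1)                     # variance estimate (section c)
F = (ss_reg / k) / s2                         # overall F ratio (section d)

# SE of b1: residual variance over the deviation SS of x1, inflated by 1/(1 - r12^2)
r12 = np.corrcoef(x1, x2)[0, 1]
se_b1 = np.sqrt(s2 / (np.sum((x1 - x1.mean())**2) * (1 - r12**2)))
t_b1 = coef[1] / se_b1                        # compare to t with n - k - 1 df
print(s2, F, se_b1, t_b1)
```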
III. Multiple predictors

Testing the increment in proportion of variance explained (change in R²):

F = [(SSreg(fm) - SSreg(rm)) / (kfm - krm)] / [SSres(fm)/dfres(fm)]
  = [(R²y.12...k(fm) - R²y.12...k(rm)) / (kfm - krm)] / [(1 - R²y.12...k(fm)) / (N - kfm - 1)]
k: number of independent variables; fm: full model; rm: reduced model.

This is useful for testing whether the kfm - krm added variables have an effect over and above the effect of the krm variables in the reduced model, i.e. whether some sub-set of regression coefficients = 0.
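A numpy sketch (invented data) of the increment-in-R² F test for a full model with three predictors against a reduced model with one:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (X already includes the intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)

rng = np.random.default_rng(3)
n = 120
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + x1 + 0.5 * x2 + 0.1 * x3 + rng.normal(size=n)

ones = np.ones(n)
k_fm, k_rm = 3, 1
r2_fm = r_squared(np.column_stack([ones, x1, x2, x3]), y)   # full model
r2_rm = r_squared(np.column_stack([ones, x1]), y)           # reduced model

F = ((r2_fm - r2_rm) / (k_fm - k_rm)) / ((1 - r2_fm) / (n - k_fm - 1))
print(F)   # refer to F with (k_fm - k_rm, n - k_fm - 1) df
```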
C. Testing the equality of regression coefficients

1. Given Y = a + b1X1 + b2X2 + ... + bkXk, one may wish to test the hypothesis that some subset of the true bi are all equal. To do so, create a new variable W = the sum of the Xi of interest and compare the R² of this reduced model with that of the original full model, as above.
2. Example: test whether b1 = b2 in

(1) Y = a + b1X1 + b2X2 + b3X3

Let W = X1 + X2; then, if b1 = b2,

(2) Y = a + bwW + b3X3

Compare the R² from model (2) with the R² from model (1).
3. When comparing only 2 b's, one can use a t-test.
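A sketch of the W = X1 + X2 approach with invented data in which b1 = b2 holds; the full and reduced fits are compared through their residual sums of squares (equivalently, their R²'s, since the dependent variable is the same):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef)**2)

rng = np.random.default_rng(4)
n = 150
x1, x2, x3 = rng.normal(size=(3, n))
y = 2 + 0.7 * x1 + 0.7 * x2 + 0.3 * x3 + rng.normal(size=n)   # b1 = b2 is true here

ones = np.ones(n)
w = x1 + x2                                                   # W = X1 + X2
rss_full = rss(np.column_stack([ones, x1, x2, x3]), y)        # model (1), k = 3
rss_reduced = rss(np.column_stack([ones, w, x3]), y)          # model (2), k = 2

F = (rss_reduced - rss_full) / (rss_full / (n - 3 - 1))       # 1 numerator df
print(F)   # refer to F(1, n - 4); a small F is consistent with b1 = b2
```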
D. Testing constraints on regression coefficients
1. One can
use similar methods to test other constraints on the possible values of the bi's.
2. Example: test whether b1 + b3 = 1 in

(1) Y = a + b1X1 + b2X2 + b3X3

Let b3 = 1 - b1; then, substituting in (1):

Y = a + b1X1 + b2X2 + (1 - b1)X3
Y = a + b1X1 + b2X2 + X3 - b1X3
Y - X3 = a + b1(X1 - X3) + b2X2

Let Y* = Y - X3 and V = X1 - X3; then fit

(2) Y* = a + b1V + b2X2

and compare the fit of this reduced model to that of the original full model. (Because the dependent variable has been transformed, the comparison should be made on the residual sums of squares rather than on the R²'s directly.)
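The same idea for the b1 + b3 = 1 constraint, as a numpy sketch with invented data satisfying the constraint; note the comparison uses residual sums of squares because the dependent variable changes:

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ coef)**2)

rng = np.random.default_rng(5)
n = 150
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.6 * x1 + 0.4 * x2 + 0.4 * x3 + rng.normal(size=n)   # b1 + b3 = 1 is true here

ones = np.ones(n)
y_star = y - x3                                               # Y* = Y - X3
v = x1 - x3                                                   # V  = X1 - X3
rss_full = rss(np.column_stack([ones, x1, x2, x3]), y)        # 3 free slopes
rss_reduced = rss(np.column_stack([ones, v, x2]), y_star)     # constraint imposed

F = (rss_reduced - rss_full) / (rss_full / (n - 3 - 1))       # 1 constraint => 1 df
print(F)   # refer to F(1, n - 4)
```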
IV. Problems depending on goals of regression models: Prediction
A. One can have several models with adequate fit to the data; to decide which is preferable, one must know the goal of the study: prediction or explanation. Multiple regression is used both as a tool for understanding phenomena and for predicting phenomena. Although explanation and prediction are not entirely distinct goals, neither are they identical. The goal of prediction research is usually to arrive at the best prediction possible at the lowest possible cost.
1. Inclusion of irrelevant variables leads to a loss of degrees of freedom (a minor problem); and when the irrelevant variables are correlated with included relevant variables, the standard errors of the latter will be larger than they would be without the added irrelevant variables.

2. Omission of relevant variable(s) causes the effect of the omitted variable(s) to be included in the error term; and when an omitted variable is correlated with the included variable(s), its omission biases the b's of the included variable(s).
a. Example: if the true model is

Y = a + by1.2X1 + by2.1X2 + e

and one fits

Y' = a' + by1X1 + e'

then

by1 = by1.2 + by2.1 b21

where b21 is the coefficient from the regression of X2 on X1 (X2 = b21X1 + e") and b21 = r12(s2/s1). That is, the estimate of the effect of X1 on Y is biased by the effect of X2 on Y to the extent that X1 and X2 are correlated.
Note: in models with multiple independent variables, the omission of relevant variables may greatly affect only some of the b's. The effect is worrisome to the extent that the variables of interest are highly correlated with the omitted variable and no other variable that is highly correlated with the omitted variable is included.
3. Selection techniques

a. All possible subsets regression. This is the best (indeed the only good) solution to the problem of empirical variable selection. However, the amount of necessary computation may be unwieldy; e.g., with 6 independent variables there are (see the sketch after this list):

6 models with 5 variables
15 models with 4 variables
20 models with 3 variables
15 models with 2 variables
6 models with 1 variable

(plus the one model with all 6 variables, for 2^6 - 1 = 63 models in all)
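The subset counts above are just binomial coefficients; a two-line check:

```python
from math import comb

# Number of distinct models using m of the 6 candidate predictors
for m in range(1, 7):
    print(m, comb(6, m))     # 6, 15, 20, 15, 6, 1 -> 63 models in all
```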
b. Stepwise regression. Two strategies are possible. In forward selection, the variable that explains the most variance in the dependent measure is entered into the model first. Then the variable explaining the most of the remaining unexplained variance is entered next. The process is repeated until no variable explains a significant portion of the remaining unexplained variance. In backward selection, all of the variables are entered into a model. Then the variable that explains the least variance is omitted, provided its omission does not significantly decrease the variance explained. This process is repeated until the omission of some variable would lead to a significant change in the amount of variance explained. The order of entry of variables determines which other variables are included in the model.
Forward
1. Variable 1 would enter 1st because it explains the most variance in Y.
2. Variable 3 would enter 2nd because it explains the greatest amount of the remaining variance.
3. Variable 2 might not enter because it explains very little of the remaining variance, leaving variables 1 and 3 in the equation. However, variable 2 accounts for more variance than variable 3.

Backward
1. Variable 3 would leave because it explains the least variance, leaving variables 1 and 2 in the equation.
Moral: Don't do
stepwise regression for variable selection.
If you do, at least do it several ways.
c. Selection by using uniqueness and communality estimation. Sometimes predictors are selected according to the amount of variance in the criterion that a variable explains and that no other variable explains (its uniqueness). This technique may be useful for selecting the most efficient set of measures.
V. Problems depending on goals of regression models: Explanation

A. The biggest new problem is multicollinearity: high correlations between predictors. It distorts regression coefficients and may make the entire model unstable and/or inestimable. In simple linear regression, if there is little variance in X, one cannot determine which line through the mean of Y is the best line; this is unimportant if you don't want to predict away from the observed x values. In multiple regression, if the range of some xi is restricted or the xi's are multicollinear, the observations lie along a line in the predictor space and there are multiple possible best planes through that line. It will be impossible to determine which plane is "best" or to isolate the effects of individual variables (since this requires projection off of the line). Regression in these circumstances is very sensitive to outliers and random error.
B. Symptoms
of multicollinearity
1. Large
changes in the estimated b's when a variable is added or deleted.
2. The algebraic signs of the b's do not conform to expectations (e.g. a b has the opposite sign from the variable's correlation with y).

3. The b's of purportedly important variables have large SE's.
C. Detecting
multicollinearity
1. Think
about variables and check for "high" intercorrelations.
2. Observe
correlation matrix.
3. Examine
tolerances.
a. The tolerance for xj is defined as 1 - R²xj.x1...(xj)...xk, where (xj) indicates that xj is omitted from the set of predictors.

b. It is a measure of the variance in a predictor that cannot be explained by all of the other variables in the model. It is the 1 - R² that would be obtained from a regression of that predictor on all of the other predictors in the model.

c. A tolerance of 1 would be achieved if the predictor is independent of the other predictors. A tolerance of 0 would be obtained if the predictor could be perfectly explained by a linear combination of the other predictors.
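A numpy sketch of the tolerance computation (invented data; x2 is built to be nearly collinear with x1):

```python
import numpy as np

def tolerances(X):
    """Tolerance of each column of X: 1 - R^2 from regressing it on the others."""
    n, k = X.shape
    ones = np.ones(n)
    tol = []
    for j in range(k):
        others = np.column_stack([ones, np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2_j = 1 - np.sum(resid**2) / np.sum((X[:, j] - X[:, j].mean())**2)
        tol.append(1 - r2_j)
    return np.array(tol)

rng = np.random.default_rng(6)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=200)    # nearly collinear with x1
x3 = rng.normal(size=200)
print(tolerances(np.column_stack([x1, x2, x3])))   # x1, x2 near 0; x3 near 1
```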
4. Test the determinant of the correlation matrix.

a. Calculate |R|. If the matrix is multicollinear, the determinant will be near 0; if it is OK, the determinant will be near 1.

b. Find the source of the multicollinearity. Examine R⁻¹ (if estimable); the diagonal elements should be near 1 (larger values indicate collinearity) and the off-diagonal elements should be near 0.
c. Demonstration: when r12 = 1, the vector of standardized coefficients B = R⁻¹r is undefined:

R = [ 1     r12 ]        |R| = 1 - r12²
    [ r12   1   ]

adj R = [  1    -r12 ]        R⁻¹ = adj R / |R| = 1/(1 - r12²) [  1    -r12 ]
        [ -r12   1   ]                                         [ -r12   1   ]

but if r12 = 1, then |R| = 0 and one cannot divide by zero. When r12 is close to 1, there will be large elements in R⁻¹. For example, if r12 = .96, the minor-diagonal (off-diagonal) elements will be -.96/(1 - .96²) = -12.24, and the diagonal elements will be 1/(1 - .96²) = 12.76.
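The same demonstration numerically: as r12 grows, |R| shrinks toward 0 and the elements of R⁻¹ blow up (its diagonal elements are 1/tolerance):

```python
import numpy as np

for r12 in (0.5, 0.96, 0.999):
    R = np.array([[1.0, r12],
                  [r12, 1.0]])
    det = np.linalg.det(R)        # = 1 - r12^2, near 0 under collinearity
    R_inv = np.linalg.inv(R)      # diagonals = 1/(1 - r12^2)
    print(r12, round(det, 4), np.round(R_inv, 2))
# At r12 = .96: off-diagonals are about -12.24 and diagonals about 12.76
```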
D. Remedies
for multicollinearity
1. Regression on principal components. Principal components analysis is a technique by which new variables are created as combinations of the existing variables so that each component is independent of all of the others. However, the bi's from regressions on PC's may be hard to interpret (but if one is only interested in prediction, this will take care of the multicollinearity problem).
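A minimal numpy sketch of regression on principal components (invented data; the component scores are computed from the standardized predictors' correlation matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 0.97 * x1 + 0.1 * rng.normal(size=n)      # collinear pair
x3 = rng.normal(size=n)
y = 1 + x1 + x2 + 0.5 * x3 + rng.normal(size=n)

# Principal components of the standardized predictors
X = np.column_stack([x1, x2, x3])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]              # largest components first
PC = Z @ eigvecs[:, order]                     # mutually uncorrelated scores

# Regress y on the leading components (here the first two)
D = np.column_stack([np.ones(n), PC[:, :2]])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
print(coef)   # coefficients refer to the components, not the original x's
```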
2. Create
a new variable that is a specified combination of the collinear variables
and regress on the new variable. This
is a special case of imposing constraints on a model.
e.g. Y = a + b1X1
+ b2X2 + b3X3
let W = X1 + X2
Y = a + b1' W + b3X3
3. Regress
other variables on culprit xi and use the residuals from this
regression as independent variables.
(Caution: if there is collinearity in this regression, one may have
biased residuals). One may also have
trouble interpreting the bi's produced by this technique.
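A numpy sketch of the residualization idea (invented data; here x2 is treated as the culprit and x1 is replaced by its residual from a regression on x2):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.2 * rng.normal(size=n)      # x2: the collinear "culprit"
y = 1 + x1 + 0.8 * x2 + rng.normal(size=n)

# Residualize x1 on x2: keep only the part of x1 not predictable from x2
ones = np.ones(n)
Xc = np.column_stack([ones, x2])
coef, *_ = np.linalg.lstsq(Xc, x1, rcond=None)
x1_resid = x1 - Xc @ coef

# Regress y on the residualized x1 together with x2
b, *_ = np.linalg.lstsq(np.column_stack([ones, x1_resid, x2]), y, rcond=None)
print(b)   # the x2 coefficient now absorbs the variation shared with x1
```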
4. Dump the variable. This will cause misspecification (omitted-variable) error; i.e., it will bias the estimates of the b's of the included variables.