Psychology 613
Data Analysis III
Prof. Bertram Malle
Spring 2008
Multiple and Logistic Regression
The data set and program files for this assignment can be found in assign8.dat and assign8.prg. Alternatively, you can use the SPSS data file. The data set contains roughly 200
cases and has six variables. They are, in order:
1. Conduct exploratory data analyses (frequencies,
correlations, bivariate plots), report any gross anomalies, and give a
synopsis of the relations among the variables.
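If you want to cross-check the exploratory output outside SPSS, here is a minimal pure-Python sketch of a frequency count and a Pearson correlation matrix. The column names and values are hypothetical toy data, not the assign8 variables.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return {name: {name: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy data (NOT the assign8 data): a 0/1 outcome and two predictors.
data = {
    "suicide": [0, 0, 1, 1, 0, 1],
    "x1": [2, 3, 6, 7, 3, 8],
    "x2": [5, 4, 2, 1, 5, 2],
}

print("Frequencies of suicide:", Counter(data["suicide"]))
for name, row in correlation_matrix(data).items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

When one of the two variables is coded 0/1, the Pearson r computed this way is the point-biserial correlation.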
2. Run a (parametric) multiple regression, simultaneously entering
all predictors. Answer the following questions in plain English and cite
appropriate statistics as justification.
4. Examine the residual plots to determine whether the parametric
regression model provides a good fit to the data. What's the problem
with having nonnormal residuals?
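One way to see why a 0/1 dependent variable strains the linear model: for any fitted value ŷ, the residual can only be 1 - ŷ or -ŷ, so the residuals cannot be normally distributed, and fitted values can stray outside [0, 1]. The sketch below fits a one-predictor least-squares line to hypothetical 0/1 data to show both effects; it is an illustration, not the assignment's multi-predictor model.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical 0/1 outcome (NOT the assign8 data).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

b0, b1 = ols_simple(x, y)
for xi, yi in zip(x, y):
    fitted = b0 + b1 * xi
    # With a 0/1 outcome, each residual is either -fitted or 1 - fitted.
    print(f"x={xi} y={yi} fitted={fitted:.3f} residual={yi - fitted:+.3f}")
```

In this toy example the fitted value for the largest x exceeds 1, which no probability can do.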
5. (a) Now run a logistic regression on the same data, again with
backward elimination. (b) Comment on the -2LL values that appear
throughout the output. What do they stand for, and what do they tell us
about the success of the logistic regression?
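As a reminder of what the -2LL values measure: -2LL = -2 * sum[y ln p + (1 - y) ln(1 - p)], where p is each case's predicted probability of the event. Smaller values mean better fit. The drop from the intercept-only model (every case gets p = the sample proportion) to a fitted model is a likelihood-ratio chi-square whose df equals the number of added predictors. The sketch below uses hypothetical predicted probabilities rather than an actual maximum-likelihood fit, and it assumes no probability is exactly 0 or 1 (the log would be undefined).

```python
from math import log


def neg2ll(y, p):
    """-2 log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return -2 * sum(yi * log(pi) + (1 - yi) * log(1 - pi) for yi, pi in zip(y, p))


y = [0, 0, 1, 1]
base_rate = sum(y) / len(y)
p_null = [base_rate] * len(y)          # intercept-only model
p_model = [0.2, 0.3, 0.7, 0.9]         # hypothetical fitted probabilities

null_dev = neg2ll(y, p_null)
model_dev = neg2ll(y, p_model)
chi_square = null_dev - model_dev      # likelihood-ratio test statistic
print(f"-2LL null = {null_dev:.3f}, -2LL model = {model_dev:.3f}, chi-square = {chi_square:.3f}")
```

During backward elimination, the same kind of difference between two successive models' -2LL values tests whether dropping a predictor significantly worsens the fit.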
6. (a) How good is each predictor? Do we need all of them to predict
suicide? (Again, use plain English as well as appropriate statistics
to answer these questions.) (b) Compare your conclusions with those
reached in the multiple regression run above.
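To judge individual predictors, each logistic coefficient B comes with a Wald statistic, (B/SE)^2, which is referred to a chi-square distribution with 1 df. Exp(B) is the factor by which the odds of the event are multiplied for each one-unit increase in the predictor; values below 1 mean the odds decrease. Here is a small sketch with a hypothetical B and SE:

```python
from math import erfc, exp, sqrt


def wald_summary(b, se):
    """Return (Wald, p, odds ratio) for a single logistic coefficient."""
    wald = (b / se) ** 2
    # Upper-tail p for chi-square with 1 df, i.e. P(|Z| > sqrt(wald)).
    p = erfc(sqrt(wald / 2))
    return wald, p, exp(b)


# Hypothetical coefficient: B = 0.69, SE = 0.25 (NOT from the assign8 output).
wald, p, odds_ratio = wald_summary(0.69, 0.25)
print(f"Wald = {wald:.2f}, p = {p:.4f}, Exp(B) = {odds_ratio:.2f}")
```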
Note 1: If you have an older
version of SPSS, you may not see enough digits in the Exp(B) column to
assess the actual log-odds reduction for SEX of .0004. If so,
double-click the table to open the pivot table editor, right-click
the cell with the statistic of interest (in this case, Exp(B)), and
increase the number of decimal places that SPSS displays. (Courtesy of
Eric Olofsen)
Note 2: Desktop SPSS still
doesn't seem to display the semi-partial r's, which are helpful
for interpreting the results. You can compute them yourself with the
formula I showed in class and reprint below, or you can download this Excel file, which does the calculation for you
once you type in the three parameters. (But note that the Excel
formula doesn't take signs into account; you have to transfer the
sign from the coefficient B yourself.)
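Assuming the three-parameter formula in question is the partial-R statistic that older SPSS versions printed for logistic regression, R = sign(B) * sqrt((Wald - 2*df) / -2LL0), where -2LL0 is the -2LL of the intercept-only model and R is set to 0 when Wald <= 2*df, the following sketch computes it and transfers the sign from B. The function name `partial_r` is hypothetical.

```python
from math import copysign, sqrt


def partial_r(wald, df, neg2ll_null, b):
    """Hypothetical helper: signed partial R for one logistic predictor.

    Assumes R = sign(B) * sqrt((Wald - 2*df) / -2LL0), where -2LL0 is the
    -2 log-likelihood of the intercept-only model; R = 0 if Wald <= 2*df.
    """
    excess = wald - 2 * df
    if excess <= 0:
        return 0.0
    # copysign transfers the sign of B, which the Excel formula omits.
    return copysign(sqrt(excess / neg2ll_null), b)


# Hypothetical inputs (NOT from the assign8 output).
print(partial_r(wald=12.0, df=1, neg2ll_null=200.0, b=-0.5))
```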
7. Comment on the classification table of the final model. What are
our hit rate and our false alarm rate? One could argue that in the case
of suicides we want to minimize "misses," that is, cases in
which "no suicide likely" is predicted but the person actually does
attempt suicide. How could we decrease our proportion of misses?
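The classification table can be reproduced from predicted probabilities and a classification cutoff (SPSS's default is .5). The hit rate is the proportion of actual attempters who are predicted to attempt. The false alarm rate is the proportion of non-attempters who are predicted to attempt. The miss rate is 1 minus the hit rate. The sketch below uses hypothetical probabilities and shows that lowering the cutoff trades misses for false alarms.

```python
def classify(y, p, cutoff=0.5):
    """Hit, miss, and false-alarm rates for 0/1 outcomes y and probabilities p.

    A case is predicted to be an event when p >= cutoff (a choice made
    for this sketch).
    """
    hits = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    misses = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < cutoff)
    false_alarms = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= cutoff)
    correct_rejections = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    return {
        "hit_rate": hits / (hits + misses),
        "miss_rate": misses / (hits + misses),
        "false_alarm_rate": false_alarms / (false_alarms + correct_rejections),
    }


# Hypothetical outcomes and predicted probabilities (NOT the assign8 data).
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.6, 0.4, 0.3, 0.55, 0.2, 0.1, 0.35, 0.15, 0.05]

for cutoff in (0.5, 0.25):
    print(cutoff, classify(y, p, cutoff))
```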
8. As usual, write a one-page summary of your analyses, briefly
justifying why you would prefer a logistic regression and then
focusing on the results and interpretation of this analysis.
9. The page limit for this homework is 9 pages.
Extra credit (up to 5 points):
Construct the linear combination u and plot the predicted
probability p(event) against the values of u. As a
reward you will get a pretty ogive curve.
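Here u is the linear combination B0 + B1*X1 + ... + Bk*Xk from the final model, and the predicted probability is p(event) = 1 / (1 + e^(-u)). The following minimal sketch uses hypothetical coefficients and prints a rough text plot. With real data, you would plot the (u, p) pairs in SPSS or another plotting tool instead.

```python
from math import exp


def linear_combination(intercept, coefs, x):
    """u = B0 + B1*x1 + ... + Bk*xk for one case."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def logistic(u):
    """Predicted probability of the event for linear combination u."""
    return 1.0 / (1.0 + exp(-u))


# Hypothetical coefficients and cases (NOT from the assign8 output).
intercept, coefs = -1.0, [0.8, -0.5]
cases = [[0, 2], [1, 1], [2, 0], [4, 0], [6, 1], [-3, 3], [8, 0]]

points = sorted((u, logistic(u)) for u in (linear_combination(intercept, coefs, c) for c in cases))
for u, prob in points:
    # One '*' per 2.5 percentage points of probability.
    print(f"u = {u:6.2f}  p = {prob:.3f}  " + "*" * round(40 * prob))
```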
The background to these (fictitious) data is that about 200 people
survived a suicide attempt, went through a psychiatric
hospitalization, and were tracked over the subsequent 3 years. The
dependent variable, Suicide, indicates whether the person made a second
suicide attempt within 3 years of release from the hospital.
(b) Now also examine the intercorrelations among predictors and use
this information to elucidate the changes from zero-order to semi-partial
predictive effects for some of the variables. (Be prepared for somewhat
unusual patterns, so don't suppress them...)
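With two predictors, the semi-partial correlation of predictor 1 with the criterion is sr1 = (r_y1 - r_y2 * r_12) / sqrt(1 - r_12^2). This formula shows how intercorrelated predictors can change zero-order effects. The hypothetical correlations below produce a classic suppression pattern: a predictor with a zero-order correlation of 0 acquires a nonzero semi-partial effect, and its partner's effect grows.

```python
from math import sqrt


def semipartial(r_y1, r_y2, r_12):
    """Semi-partial r of predictor 1 with y, with predictor 2 partialled from predictor 1."""
    return (r_y1 - r_y2 * r_12) / sqrt(1 - r_12 ** 2)


# Hypothetical correlations (NOT from the assign8 data).
r_y1, r_y2, r_12 = 0.30, 0.00, 0.60

sr1 = semipartial(r_y1, r_y2, r_12)   # exceeds the zero-order r of .30
sr2 = semipartial(r_y2, r_y1, r_12)   # turns negative despite a zero-order r of 0
print(f"sr1 = {sr1:.3f}, sr2 = {sr2:.3f}")
```

With more than two predictors, the squared semi-partial r of a predictor equals the drop in R^2 when that predictor is removed from the full model.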
(c) Do we need all variables to predict suicide attempts?
(d) Overall, how well can we predict suicide attempts with the
joint set of retained predictors?
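For the overall question, the usual statistics are R^2, adjusted R^2, and the F test of R^2 with k and n - k - 1 degrees of freedom, for k retained predictors and n cases. Here is a small helper, applied to a hypothetical R^2 value:

```python
def overall_fit(r2, n, k):
    """Adjusted R^2 and F statistic for a regression with k predictors and n cases."""
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return adj_r2, f_stat


# Hypothetical values: R^2 = .25 with 5 predictors and 200 cases.
adj_r2, f_stat = overall_fit(0.25, n=200, k=5)
print(f"adjusted R^2 = {adj_r2:.3f}, F(5, 194) = {f_stat:.2f}")
```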