Psychology 613
Data Analysis III
Prof. Bertram Malle
Spring 2008


Assignment 8

Multiple and Logistic Regression

The data set and program files for this assignment can be found in assign8.dat and assign8.prg. Alternatively, you can use the SPSS data file. The data set contains roughly 200 cases and has six variables. They are, in order:

The background to these (fictitious) data is that about 200 people survived a suicide attempt, went through a psychiatric hospitalization, and were tracked over the subsequent 3 years. The dependent variable, Suicide, indicates whether the person made a second suicide attempt within 3 years of release from the hospital.

1. Conduct exploratory data analyses (frequencies, correlations, bivariate plots), report any gross anomalies, and give a synopsis of the relations among the variables.

2. Run a (parametric) multiple regression, simultaneously entering all predictors. Answer the following questions in plain English and cite appropriate statistics as justification.

(a) How good is each predictor? In answering this question take into account both unique predictive variance (semi-partial r values for each predictor) and shared predictive variance. To really understand the latter you need to check first whether there is shared predictive variance. Compare the sum of the unique r² values (all unique variance) with the total R² (both unique and shared variance). Then examine whether the zero-order r values for each predictor with the DV differ from the corresponding semi-partial values. Establish, for each predictor, whether it has any predictive variance to offer and, if so, whether it is unique or shared with other predictors.
(b) Now also examine the intercorrelations among predictors and use this information to elucidate the changes from zero-order to semi-partial predictive effects for some of the variables. (Be prepared for somewhat unusual patterns, so don't suppress them...)
(c) Do we need all variables to predict suicide attempts?
(d) Overall, how well can we predict suicide attempts with the joint set of retained predictors?
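
To make the unique-versus-shared comparison in 2(a) concrete, here is a minimal sketch for the special case of exactly two predictors, where the semi-partial correlations and R² follow directly from the three zero-order correlations. The function name and the correlation values are hypothetical illustrations, not taken from the assignment data, which has more predictors and needs the regression output instead.

```python
import math

def semipartial_two_predictors(r_y1, r_y2, r_12):
    """Semi-partial r of each predictor with the DV, plus total R^2.

    Valid only for a regression with exactly two predictors.
    r_y1, r_y2: zero-order correlations of predictors 1 and 2 with the DV.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    sr1 = (r_y1 - r_y2 * r_12) / math.sqrt(denom)
    sr2 = (r_y2 - r_y1 * r_12) / math.sqrt(denom)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / denom
    return sr1, sr2, r_squared

# Hypothetical correlations, chosen only for illustration.
sr1, sr2, r_squared = semipartial_two_predictors(0.5, 0.4, 0.6)

unique_variance = sr1 ** 2 + sr2 ** 2          # sum of the unique r^2 values
shared_variance = r_squared - unique_variance  # what the predictors share

print(f"sr1 = {sr1:.3f}, sr2 = {sr2:.3f}, R^2 = {r_squared:.3f}")
print(f"unique = {unique_variance:.3f}, shared = {shared_variance:.3f}")
```

If the sum of the unique r² values comes out larger than R², the difference is negative, which is one of the unusual patterns (suppression) that can show up when predictors are intercorrelated.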

3. Attempt a substantive (real-world) interpretation of the regression results -- i.e., what are risk factors for a suicide relapse?

4. Observe the residual plots to determine whether the parametric regression model provides a good fit to the data. What's the problem with having nonnormal residuals?

5. (a) Now run a logistic regression on the same data, again with backward elimination. (b) Comment on the -2LL numbers that float around in the output. What do they stand for and what do they tell us about the success of the logistic regression?
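
As a rough illustration of where a -2LL value comes from, the sketch below computes -2 times the log-likelihood for a binary outcome, once for a null model that predicts the base rate for everyone and once for a set of predicted probabilities. The outcomes and probabilities are invented for illustration and are not maximum-likelihood estimates from the assignment data; the function name is likewise hypothetical.

```python
import math

def neg2ll(outcomes, probabilities):
    """-2 * log-likelihood of binary outcomes given predicted p(event)."""
    log_likelihood = sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )
    return -2 * log_likelihood

# Hypothetical outcomes (1 = event occurred) and model probabilities.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1]
model_p = [0.8, 0.2, 0.1, 0.6, 0.3, 0.2, 0.1, 0.7]

# Null model: everyone gets the overall base rate as predicted probability.
base_rate = sum(outcomes) / len(outcomes)
null_p = [base_rate] * len(outcomes)

neg2ll_null = neg2ll(outcomes, null_p)
neg2ll_model = neg2ll(outcomes, model_p)

# The drop in -2LL is the likelihood-ratio chi-square for the added predictors.
chi_square = neg2ll_null - neg2ll_model
print(f"-2LL null = {neg2ll_null:.3f}, -2LL model = {neg2ll_model:.3f}")
print(f"improvement (chi-square) = {chi_square:.3f}")
```

Smaller -2LL means the predicted probabilities sit closer to the observed outcomes, so the size of the drop from the null model is what indicates how much the predictors help.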

6. (a) How good is each predictor? Do we need all of them to predict suicide? (Again, use plain English as well as appropriate statistics to answer these questions.) (b) Compare your conclusions with those reached in the multiple regression run above.

Note 1: If you have an older version of SPSS, you may not see enough digits in the Exp(B) column to assess the actual log-odds reduction for SEX of .0004. To fix this, double-click on the table to bring up the chart editor, right-click on the cell with the statistic of interest (in this case, Exp(B)), and increase the number of decimal places that SPSS displays. (Courtesy of Eric Olofsen)

Note 2: Desktop SPSS still doesn't seem to display the semi-partial rs, which are helpful for interpreting the results. You can compute them yourself with the formula I showed in class and reprint below or you can download this EXCEL file that does the calculation for you once you type in the three parameters. (But note that the EXCEL formula doesn't take signs into account. You have to transfer the sign from the coefficient B.)
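
For those who prefer to script the calculation, the sketch below uses one standard identity that recovers a semi-partial r from three numbers in the regression output: the predictor's t value, the model R², and the residual degrees of freedom. This is an assumption about which three parameters are meant and may not match the exact form shown in class; the function name and example values are hypothetical.

```python
import math

def semipartial_from_output(t, r_squared, df_residual, b):
    """Semi-partial r from regression output.

    Uses the identity sr^2 = t^2 * (1 - R^2) / df_residual.
    The square root loses the sign, so it is copied from the coefficient B.
    """
    magnitude = math.sqrt(t ** 2 * (1 - r_squared) / df_residual)
    return math.copysign(magnitude, b)

# Hypothetical output values, for illustration only.
sr = semipartial_from_output(t=-2.5, r_squared=0.30, df_residual=194, b=-0.12)
print(f"semi-partial r = {sr:.4f}, unique variance = {sr ** 2:.4f}")
```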

7. Comment on the classification table of the final model. What is our hit rate, our false alarm rate? One could argue that in the case of suicides we want to minimize "misses", that is, minimize cases in which "no suicide likely" is predicted but the person actually does attempt suicide. How could we decrease our proportion of misses?
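
The sketch below shows one way to tabulate hits, misses, false alarms, and correct rejections from predicted probabilities. It assumes the signal-detection definitions (hit rate = proportion of actual events predicted as events; false alarm rate = proportion of actual non-events predicted as events) and a classification cutoff of .5 as the default. The data and names are hypothetical.

```python
def classification_rates(outcomes, probabilities, cutoff=0.5):
    """Hit, miss, and false alarm rates for a given classification cutoff."""
    hits = misses = false_alarms = correct_rejections = 0
    for y, p in zip(outcomes, probabilities):
        predicted_event = p >= cutoff
        if y == 1 and predicted_event:
            hits += 1
        elif y == 1:
            misses += 1
        elif predicted_event:
            false_alarms += 1
        else:
            correct_rejections += 1
    return {
        "hit_rate": hits / (hits + misses),
        "miss_rate": misses / (hits + misses),
        "false_alarm_rate": false_alarms / (false_alarms + correct_rejections),
    }

# Hypothetical outcomes (1 = second attempt) and predicted probabilities.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.3, 0.7, 0.45, 0.2, 0.2, 0.1, 0.05]

at_default = classification_rates(outcomes, probabilities)
at_lower = classification_rates(outcomes, probabilities, cutoff=0.3)
print("cutoff .5:", at_default)
print("cutoff .3:", at_lower)
```

Running the function at more than one cutoff makes the trade-off between misses and false alarms visible in the same table.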

8. As usual, write a one-page summary of your analyses, briefly justifying why you would prefer a logistic regression and then focusing on the results and interpretation of this analysis.

9. The page limit for this homework is 9.

Extra credit (up to 5 points):

Construct the linear combination u and plot the predicted probability p(event) against the values of u. As a reward you will get a pretty ogive curve.
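
A minimal sketch of the two pieces involved, assuming u is the intercept plus the sum of each logistic coefficient B times the corresponding predictor value, and p(event) = 1 / (1 + e^(-u)). The coefficients and the grid of u values are hypothetical; in practice they come from the final logistic model and the observed cases.

```python
import math

def linear_combination(intercept, coefficients, values):
    """u = intercept + sum of B_i * x_i for one case."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def predicted_probability(u):
    """Logistic transform of u into p(event)."""
    return 1 / (1 + math.exp(-u))

# Hypothetical coefficients and predictor values for a single case.
u_example = linear_combination(-1.0, [0.5, 0.2], [2, 5])

# Evenly spaced u values from -6 to 6, each paired with its probability.
u_grid = [i / 2 for i in range(-12, 13)]
curve = [(u, predicted_probability(u)) for u in u_grid]

for u, p in curve:
    print(f"u = {u:5.1f}   p(event) = {p:.3f}")
```

Plotting the second column of `curve` against the first in any plotting tool should give the S-shaped curve.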