Answers to Practice Questions for Final:

Part III: Data Analysis

Problem 1:

a. Do the candidates differ in popularity?

b. Chi-square goodness of fit, equal proportions

c. The data is categorical (hence chi-square). We are looking to see if candidates DIFFER in popularity from one another (hence goodness of fit equal proportions).

d. Ho: All candidates are equally popular. Ha: Candidates differ in popularity (or) Not all candidates are equally popular

e. (Step 1 already completed in d, and alpha provided). NOTE: You should be getting out your ChiSquare handout at this point to guide you for the rest of the problem.

Step 2: Critical region/value: Depends on df. The formula for df in the GF chi-square is C-1, so df = 3-1 = 2. [Give formula! Show work!]

For alpha = .05, critical value will be 5.99 (look up in table). (NOTE: If you are a visual thinker, you might want to draw the chi-square distribution and mark your critical region as well. Remember mantra: Shaded, reject; Unshaded, retain)
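
If you want to double check the table value: for df = 2 only, the upper-tail probability of chi-square is exp(-x/2), so the critical value can be solved for directly. A minimal Python sketch (the function name is made up for illustration, and the shortcut does NOT work for any other df):

```python
import math

def chi2_critical_df2(alpha):
    """Critical value for chi-square with df = 2 ONLY: solve exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)

crit_05 = chi2_critical_df2(0.05)
crit_01 = chi2_critical_df2(0.01)
print(round(crit_05, 2))  # 5.99
print(round(crit_01, 2))  # 9.21
```

On the exam you still look the value up in the table; this is only a sanity check.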

Step 3: Calculate test statistic. [Give formulas! Show work! Draw boxes (as in handout) and label them Observed, Proportions, Expected, to keep everything straight]

Formula for expected frequencies is pn: the probability for each cell (100%/3 = 33.3%, or .333) times your total n [100] = 33.33 for each cell. Write down the formula for chi-square and follow out the calculations: (43-33.33)²/33.33 + (45-33.33)²/33.33 + (12-33.33)²/33.33 = 2.81 + 4.09 + 13.65 = 20.55.

[Note: Your answer might be slightly different due to rounding. It's a good idea to keep two decimal places (33.33 instead of 33.3) for greater accuracy. Another note: Because n = 100 for this problem, the percentage for each cell happens to equal the expected frequency. But if n = 200, putting 33.33 in the boxes would NOT work, because when you multiplied the probability times the n, you'd get 66.6. Make sure that the FREQUENCY is what appears in both your "observed" and your "expected" boxes.]
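
The Step 3 arithmetic can be sketched in Python as a check on the hand calculation. This is only an illustration: the variable names are my own, and because it carries full precision instead of rounding each piece, it lands on 20.54 rather than 20.55.

```python
observed = [43, 45, 12]            # Shoemaker, Doe, Bloggs (counts from the problem)
n = sum(observed)                  # total n = 100
p = 1 / len(observed)              # equal proportions under Ho
expected = [p * n for _ in observed]   # pn for each cell; works for any n, not just 100

# One (O - E)^2 / E component per cell, then add them up
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)

print([round(c, 2) for c in components])  # [2.8, 4.08, 13.65]
print(round(chi_square, 2))               # 20.54
```

The largest component belongs to the third cell (Bloggs), which is the cell to look at when interpreting a significant result.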

Step 4. Decide about Ho. 20.55 > 5.99 (falls in shaded region) so reject Ho.

Step 5. Answer research question and give results in APA style: Candidates are not all equally popular, X² (2, n = 100) = 20.55, p < .05. [Also < .01, by the way]. It appears that Joe Bloggs (12 out of 100) is trailing Sue Shoemaker (43) and Jane Doe (45).

[Note: When you get a significant result, look back at your observed counts and your chi-square calculations to see what looks like the main pattern. The biggest component in the chi-square is for Bloggs (13.65) which is why I focused on him.]

Problem 2: Gender gap

a. Do voters' preferences for the candidates depend on their gender? (OR) Are people's political preferences related to their gender?

b. Chi-square for independence

c. We have two categorical variables. We are looking for a possible relation (association) between these two variables: gender of voter and candidate preferred.

d. Ho: Preference for candidate is independent of gender

Ha: Preference for candidate will depend on gender

e. (Step 1 already completed in d, and alpha provided). NOTE: You should be getting out your ChiSquare handout at this point to guide you for the rest of the problem.

Step 2: Critical region/value: Depends on df. Formula for df in Chi-square Independence is (C-1)(R-1). So df = (2-1)(2-1) = 1. [Give formula! Show work!]

For alpha = .05, critical value will be 3.84 (look up in table). (NOTE: If you are a visual thinker, you might want to draw the chi-square distribution and mark your critical region as well. Remember mantra: Shaded, reject; Unshaded, retain)

Step 3: Calculate test statistic. [Give formulas! Show work! Draw boxes (as in handout) and label them Observed, Proportions, Expected, to keep everything straight]

Formula for expected frequencies is fc fr / n for each cell (column total times row total, divided by n). So you need to calculate your row and column totals and find your total n. Add these numbers to the margins of the Observed box (given in problem). The row totals will be 100 and 100 for men and women, and the column totals will be 105 and 95 for Heather and Eric. The overall n is 200. [Note: you can check your math by making sure the two row totals and the two column totals each add up to the same n]

For men/Heather, the expected cell value will be (100)(105) / 200 = 52.5. Note: Because you know the row and column totals, you can figure all the other expected values once you have one. You can also double check your math in figuring the expected frequencies by checking that you get the same row and column totals as you did for the "observed" matrix. The expected values will look like this:

	Heather	Eric
Men	52.5	47.5	(adds to 100)
Women	52.5	47.5	(adds to 100)
(Adds to)	105	95	(Total n is 200 both ways)

Write down the formula for chi-square and follow out the calculations: (45-52.5)²/52.5 + (55-47.5)²/47.5 + (60-52.5)²/52.5 + (40-47.5)²/47.5 = 1.07 + 1.18 + 1.07 + 1.18 = 4.5.
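
The expected frequencies and the test statistic for this problem can be sketched in Python as a check. The observed counts are read off the calculation above (men 45/55, women 60/40); the variable names are my own. With no rounding of the individual pieces the total comes out 4.51 rather than 4.5.

```python
observed = [[45, 55],   # men:   Heather, Eric
            [60, 40]]   # women: Heather, Eric

row_totals = [sum(row) for row in observed]        # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]  # [105, 95]
n = sum(row_totals)                                # 200

# Expected frequency for each cell: fc * fr / n
expected = [[fr * fc / n for fc in col_totals] for fr in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))
df = (len(col_totals) - 1) * (len(row_totals) - 1)  # (C-1)(R-1)

print(expected)              # [[52.5, 47.5], [52.5, 47.5]]
print(round(chi_square, 2))  # 4.51
print(df)                    # 1
```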

Step 4. Decide about Ho. 4.5 > 3.84 [in the shaded region] so reject Ho.

Step 5. Answer research question and give results in APA style: Gender and preference for candidate are related, X² (1, n = 200) = 4.5, p < .05. Women are more likely to prefer the female candidate, Heather, while men are more likely to prefer the male candidate, Eric.
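
To see how far inside the critical region this result falls, the tail probability can be sketched in Python. This relies on the fact that a chi-square with df = 1 is a squared z score, so the shortcut applies to df = 1 only; the function name is made up for illustration.

```python
import math

def chi2_p_value_df1(x):
    """Upper-tail p for chi-square with df = 1 ONLY: P(z**2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))

p = chi2_p_value_df1(4.5)
print(round(p, 3))  # 0.034
```

So the result is significant at .05 but would not be at .01, which is why the APA statement reports p < .05.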

Problem 3:

a. Ho: μ(10 AM) = μ(1 PM) = μ(3 PM). Ha: Not all means are the same (at least one differs).

b. dfb = k -1. 3-1 = 2. dfw = N - k. 12 - 3 = 9. Critical value at .01 level for F (2, 9) = 8.02

[NOTE: be sure you look up the value for the right alpha level. I specified .01 for this problem, so correct answer is 8.02, not 4.26. Actually, this is a stupid choice of alpha given the low n, but if you decide to veto my directions, be sure you give a justification or I'll just think you used the table wrong...]

c. 10 AM mean = 9. 1 PM T = 29. 3 PM SS = 2. G = 97

Formulas used: mean: ΣX/n. G: ΣX or ΣT. SS: Σ(X - x-bar)² (definitional formula) or ΣX² - (ΣX)²/n (computational formula).

[NOTE: The efficient (lazy?) statistician will have noticed that the definitional formula for SS makes calculation a piece of cake here: 1²+ 0² + 0² + 1² = 2. The computational formula (which sometimes makes computations easier) actually is more laborious here. Why make more work for yourself? Stats is hard enough when you have to do it by hand.]
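
To see that the two SS formulas agree, here is a minimal Python sketch. The four scores are HYPOTHETICAL, chosen only so the deviations from the mean are -1, 0, 0, 1 as in the note above; they are not the scores from the problem.

```python
scores = [7, 8, 8, 9]   # hypothetical scores, NOT the actual exam data
n = len(scores)
mean = sum(scores) / n  # 8.0

# Definitional formula: sum of squared deviations from the mean
ss_definitional = sum((x - mean) ** 2 for x in scores)

# Computational formula: sum of X squared, minus (sum of X) squared over n
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

print(ss_definitional)   # 2.0
print(ss_computational)  # 2.0
```

Same answer either way; the definitional route just needs fewer big numbers when the mean is a whole number.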

d. ANOVA summary table:

Source	SS	df	MS	F
Between	6.167	2	3.084	4.112
Within	6.75	9	.75
Total	12.917	11

Formulas and calculations for filling in the table [Be sure to give formulas and show work!]

SSw = ΣSS (add up the SS inside each group) = 2 + 2.75 + 2 = 6.75

MSb = SSb/dfb = 6.167/2 = 3.084. MSw = SSw/dfw = 6.75/9 = .75. F = MSb/MSw = 3.084/.75 = 4.112
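
The whole summary table can be rebuilt from the group summaries with a short Python sketch. Several inputs here are ASSUMPTIONS inferred from the key rather than stated in it: n = 4 per group (N = 12 split over 3 groups), a 10 AM total of 36 (mean 9 times 4), and a 3 PM total of 32 (97 - 36 - 29). The SSb formula used, ΣT²/n - G²/N, is the standard computational one. Because nothing is rounded along the way, MSb and F come out 3.083 and 4.111 instead of 3.084 and 4.112.

```python
n_per_group = 4                   # ASSUMED: N = 12 split equally over k = 3 groups
totals = [36, 29, 32]             # 10 AM (9 x 4), 1 PM (given), 3 PM (97 - 36 - 29)
ss_within_groups = [2, 2.75, 2]   # SS inside each group, as added up in the key

k = len(totals)
N = k * n_per_group
G = sum(totals)                   # grand total, 97

ss_between = sum(t ** 2 / n_per_group for t in totals) - G ** 2 / N
ss_within = sum(ss_within_groups)
ss_total = ss_between + ss_within

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(round(ss_between, 3), ss_within, round(ss_total, 3))  # 6.167 6.75 12.917
print(df_between, df_within)                                # 2 9
print(round(ms_between, 3), ms_within)                      # 3.083 0.75
print(round(f_ratio, 3))                                    # 4.111
```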

e. Decision about Ho: 4.112 < 8.02, so retain [fail to reject] Ho.

f. Answer to research Q and result in APA style: No significant differences were found in the performance of students taking classes at different times of day, F(2, 9) = 4.112, NS. However, because this was such a low n study, and we used the conservative alpha of .01, it's quite possible that there is a real difference but our test is not powerful enough to detect it.

[NOTE: Don't just look at failure to reject Ho and say there IS no difference. This is the mistake of "affirming the null." If you want to REALLY impress me, and you have finished the rest of the test, go ahead and calculate the estimated effect size (formula on your handout) sqrt of F/ sqrt of n = 2.028 / 2 = 1.014, which is LARGE. Advise the psychology professor to use all of her data and run an ANOVA using SPSS before she gives up on her research hypothesis.]
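
The effect size arithmetic from the note above, sketched in Python. It uses the handout formula exactly as quoted there (sqrt of F over sqrt of n), with n = 4 taken as the per-group sample size implied by sqrt of n = 2.

```python
import math

f_ratio = 4.112    # F from the summary table
n_per_group = 4    # per-group n implied by "sqrt of n = 2" in the note

effect_size = math.sqrt(f_ratio) / math.sqrt(n_per_group)
print(round(effect_size, 3))  # 1.014
```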

Problem 4:

Note: The focus here is on using what you know to "read" results as they might be presented in a research article. What you are given are two separate correlation matrixes. To find correlations between two variables, look for the intersection of those variables reading down from column labels and across from row labels.

1. No. 2. Yes [-0.11 for Non-gambling, 0.62** for Gambling]

3. For gambling boys, there is a significant positive relationship (.05 level) between fighting and three other variables: alcohol/drug use, theft rates, and anxiety ratings by their mothers. For non-gambling boys, none of these are significantly correlated, and the obtained correlations are either close to zero or slightly negative (for alcohol/drug use). For gambling boys, theft rates and anxiety are also positively correlated. For non-gambling boys, the correlation is negative (although not significantly so).