Learning Check on Errors: quick review of the two types of error.
For the stacked deck problem:
Decision after the H-test | Deck is stacked      | Deck is normal
Reject null               | Correct              | Type I error (alpha)
Retain null               | Type II error (beta) | Correct
Note: There's a tradeoff between the two errors
The higher the alpha, the lower the beta
The lower the alpha, the higher the beta
You can't minimize both at the same time (with everything else held fixed)....
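A minimal sketch of this tradeoff in Python, assuming a one-tailed z-test on the mean denomination of n cards and treating the sampling distribution of the mean as normal. The normal-deck mean of 7, the sigma of 3.742, and the stacked-deck mean of 8.5 (Deck A) come from the effect size example in these notes; n = 10 and the function name beta_one_tailed are made up for illustration.

```python
from statistics import NormalDist


def beta_one_tailed(alpha, mu_null, mu_true, sigma, n):
    """Type II error rate for an upper-tailed z-test of a sample mean."""
    se = sigma / n ** 0.5  # standard error of the mean
    # Reject the null when the sample mean is above this cutoff.
    cutoff = NormalDist(mu_null, se).inv_cdf(1 - alpha)
    # Beta: chance the sample mean lands below the cutoff
    # even though the deck really is stacked.
    return NormalDist(mu_true, se).cdf(cutoff)


# n = 10 is a hypothetical sample size, not a value from lab.
for alpha in (0.01, 0.05, 0.10):
    beta = beta_one_tailed(alpha, mu_null=7, mu_true=8.5, sigma=3.742, n=10)
    print(f"alpha = {alpha:.2f}   beta = {beta:.2f}")
```

Reading down the printed rows, beta falls as alpha rises, which is the tradeoff described above.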
Effect Size
Effect size: How strong your "effect" is.
One measure: Standardized difference between two population means
In lab, there were two decks: A & B
Mean denomination for A: 8.5
Mean denomination for B: 10
(7 for normal deck)
Which has larger effect size? ___B___
Cohen's d: Standardized difference between two population means
d = (mu_pop1 - mu_pop2) / sigma
Cohen's conventions for effect size:
Small: .2 or smaller (d)
Medium: .5
Large: .8 or larger
Note: It doesn't really make sense to say you have a negative effect size, since this is a
measure of the distance between means. Distance cannot be negative, so the smallest
possible effect is zero, no effect.
Deck A:
Effect size = (8.5-7)/3.742 = .4
Deck B:
Effect size = (10-7)/3.742 = .8
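The two calculations above can be checked with a short Python sketch. It assumes denominations are coded 1 (ace) through 13 (king) with each equally likely, which is one way to arrive at the normal-deck mean of 7 and sigma of 3.742 used here; the function name cohens_d is chosen for illustration.

```python
from statistics import mean, pstdev

# Assumption: denominations run 1 (ace) to 13 (king), all equally likely.
denominations = list(range(1, 14))
mu_normal = mean(denominations)   # population mean of a normal deck
sigma = pstdev(denominations)     # population standard deviation


def cohens_d(mu1, mu2, sigma):
    """Standardized difference between two population means."""
    return (mu1 - mu2) / sigma


d_a = cohens_d(8.5, mu_normal, sigma)
d_b = cohens_d(10, mu_normal, sigma)
print(f"normal deck: mean = {mu_normal}, sigma = {sigma:.3f}")
print(f"Deck A: d = {d_a:.1f}")
print(f"Deck B: d = {d_b:.1f}")
```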
Your Turn:
19th century Englishmen: Mean = 67 inches
Standard deviation = 3 inches
UO students = about 72 inches
(point estimate from our class sample)
(Estimated) effect size? (72 - 67)/3 = 5/3 = 1.67. Big effect! Note that effect size can
be larger than one.
Statistical power:
Probability that a study will yield a significant result when the research hypothesis is true.
Power = 1 - probability of Type II (false negative) error, or 1- beta.
If p (Type II) is .20, power is 1- .20 = .80
If cancer test misses 20% of cancers (.20), then the power of the test is 1 - .20 = .80.
If it misses 50% of cancers (.50), then the power of the test is 1-.50 = .50.
How are power and effect size related?
The larger the effect, the larger the power
[The larger the tumor, the easier it is to detect]
This seems weird because it is the "test" that presumably has power, but we are pretending the test has more power when it tackles an easier task. That's like saying my eyes are more powerful when I look at large type and weaker when I look at small type.
This seems less paradoxical if you remember that power = a probability of getting
something right. So when I look at large type, I'm more likely to read it correctly... than
when I look at small type. This makes intuitive sense.
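To put rough numbers on this, here is a sketch in Python that compares power for the two lab decks (d = .4 for Deck A, d = .8 for Deck B). It assumes an upper-tailed z-test at alpha = .05 with a normal sampling distribution; the sample size n = 10 and the function name power_from_d are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve


def power_from_d(d, n, alpha=0.05):
    """Power of an upper-tailed z-test when the true effect size is d."""
    z_crit = Z.inv_cdf(1 - alpha)  # cutoff in z-score units under the null
    # When the research hypothesis is true, the sample mean's z-score is
    # centered at d * sqrt(n) instead of 0.
    return 1 - Z.cdf(z_crit - d * n ** 0.5)


# n = 10 is a made-up sample size for illustration.
for name, d in (("Deck A", 0.4), ("Deck B", 0.8)):
    print(f"{name}: d = {d}, power = {power_from_d(d, n=10):.2f}")
```

Same test, same sample size, same alpha: the only thing that changes between the two rows is the size of the effect being detected.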
What affects power?
1. Effect size (larger effect, higher power) Easier to detect large effects...
2. Sample size (larger sample, higher power)
3. Sigma (smaller sigma, higher power)
Why? Because each one reduces the overlap between the sampling distributions for the two populations.
Mathematical reasoning: The degree to which the sampling distribution "bunches up"
around the mean is determined by the standard error (SE). The formula is SE = square
root of (sigma^2 / n). When you have a fraction, increasing the number on the bottom
makes the quantity smaller; decreasing the number on the top makes the quantity smaller
as well. So a larger n or a smaller sigma gives a smaller SE, and less overlap.
4. Significance level
(higher alpha, higher power)
Why?
Let's reason this out:
higher alpha = lower beta
power = 1-beta
the smaller the beta, the higher the power
Draw on your intuitive understanding of what happens when you subtract. When you
subtract a large quantity, you have less left over. When you subtract a small quantity, you
have more left over.
5. Tails (one-tailed, more power)
But ONLY if you guessed right on direction.
If you guessed wrong, power will be way low (close to zero).
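The sample size, sigma, and tails factors in the list above can be tried out one at a time with a sketch like the following. It assumes a z-test on a sample mean with a normal sampling distribution. The baseline uses Deck A from lab (true mean 8.5 against a normal-deck mean of 7, sigma 3.742); the sample sizes, the smaller sigma of 2.0, and the function name power are hypothetical values picked for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve


def power(mu_null, mu_true, sigma, n, alpha=0.05, tails=1, predicted="higher"):
    """Probability of rejecting the null when the true mean is mu_true."""
    se = sigma / n ** 0.5              # SE = square root of (sigma^2 / n)
    shift = (mu_true - mu_null) / se   # true mean, in SE units from the null
    if tails == 2:
        z_crit = Z.inv_cdf(1 - alpha / 2)   # alpha split across both tails
        return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)
    z_crit = Z.inv_cdf(1 - alpha)
    if predicted == "higher":
        return 1 - Z.cdf(z_crit - shift)    # rejection region in upper tail
    return Z.cdf(-z_crit - shift)           # rejection region in lower tail


base = dict(mu_null=7, mu_true=8.5, sigma=3.742, n=10)  # n = 10 is made up
print(f"baseline (one-tailed)  : {power(**base):.2f}")
print(f"larger sample, n = 40  : {power(**{**base, 'n': 40}):.2f}")
print(f"smaller sigma, 2.0     : {power(**{**base, 'sigma': 2.0}):.2f}")
print(f"two-tailed             : {power(**base, tails=2):.2f}")
print(f"one-tailed, wrong guess: {power(**base, predicted='lower'):.3f}")
```

Each printed row changes exactly one thing relative to the baseline, so the rows can be read against items 2, 3, and 5 in the list.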
Show Pictures: See the illustrations in your A&A book in Chapter 7. Recommend
you mark up which picture illustrates what.
Figure 7-1 is the "baseline" with n = 64, one-tailed, alpha = .05, true mu for the 5th graders who got coaching = 208.
7-2 varies effect size, making it larger. Result? Increased power.
7-6 and 7-7 both vary sample size, making it larger. Result? Increased power.
7-8 reduces the significance level (smaller alpha). Result? Decreased power (because smaller alpha means bigger beta, and power = 1- beta).
7-9 changes to 2-tailed. Result? Decreased power (a one-tailed test has more power when the direction is guessed right).