1. Probability as Relative Frequency
Probability (p) can be understood as relative frequency.
The probability of getting a particular outcome, when choosing at random, is the same as
the relative frequency of that outcome in a distribution of possible outcomes. You all
know how to do this already (sibling example, card example).
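If you want to check the card example on a computer, here is a minimal sketch in Python. It is only an illustration: the deck layout (52 cards, no jokers) and the helper name relative_frequency are my own assumptions, not something from the book.

```python
from fractions import Fraction

# Assumed deck: 4 suits x 13 ranks, no jokers.
SUITS = ["spades", "hearts", "diamonds", "clubs"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]


def relative_frequency(outcomes, is_hit):
    """Share of the possible outcomes for which is_hit(outcome) is true."""
    hits = sum(1 for outcome in outcomes if is_hit(outcome))
    return Fraction(hits, len(outcomes))


# Probability of drawing a spade = relative frequency of spades in the deck.
p_spade = relative_frequency(deck, lambda card: card[1] == "spades")

# Probability of drawing a red card (hearts or diamonds).
p_red = relative_frequency(deck, lambda card: card[1] in ("hearts", "diamonds"))

print(p_spade, p_red)  # 1/4 1/2
```

The probability is just a count of the outcomes you care about divided by the count of all possible outcomes.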
Note that if I pick a card and I see it and you don't, the probability that it is a spade, or a
heart, or red, is different for me (0 or 1) than it is for those who DON'T see the card and
must guess.
POINT: Probabilities are calculated for things that we DON'T know. If we know what
has happened or will happen, we don't normally worry about probabilities. Probabilities are
a way to make predictions about what is more or less likely, and about HOW likely different
outcomes are.
2. Samples versus Population
Problem:
1. We want to know what is true about a population of scores, but
2. We don't have, and often can't get, all the scores.
Solution:
1. Take a sample of scores.
2. Calculate the statistics for the sample.
3. Use the sample statistics to make a guess about the population parameters.
Sampling Strategy:
Random samples yield the best guesses.
Techniques like multistage cluster sampling (described in your book) provide a good approximation of random sampling.
Biased samples yield biased guesses.
NOTE: Provided good sampling technique is followed, the size of the sample is what
determines how good the guess will be, NOT the size of the sample relative to the size of
the population. So a sample of 1000 people will give you just as good an estimate of a
parameter for the U.S. as a whole as a sample of 1000 people will give you for a parameter
of Oregon. More explanation of why will follow in later chapters...
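A rough way to convince yourself of this is to simulate it. The sketch below is only an illustration under made-up assumptions: two invented populations of scores (mean 100, sd 15) with 20,000 and 400,000 members stand in for a small and a large population, and the helper name typical_error is my own. Note that the sample is a small fraction of the population in both cases.

```python
import random
import statistics


def typical_error(population, sample_size, repeats, rng):
    """SD of (sample mean - population mean) across repeated random samples."""
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)  # simple random sample
        errors.append(statistics.fmean(sample) - true_mean)
    return statistics.pstdev(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical populations of scores; only their sizes differ.
small_pop = [rng.gauss(100, 15) for _ in range(20_000)]
large_pop = [rng.gauss(100, 15) for _ in range(400_000)]

# Same sample size (1000) drawn from populations of very different size.
err_small = typical_error(small_pop, 1000, 300, rng)
err_large = typical_error(large_pop, 1000, 300, rng)

# A much smaller sample (100) from the large population, for contrast.
err_tiny_sample = typical_error(large_pop, 100, 300, rng)

print(round(err_small, 2), round(err_large, 2), round(err_tiny_sample, 2))
```

The first two numbers should come out roughly equal, while the third should be clearly larger: shrinking the sample hurts the estimate, but growing the population does not.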
Mean for population is a PARAMETER
Mean for a sample is a SAMPLE STATISTIC
Different symbols are normally used for parameters and statistics. Aron and Aron use
instead the rather clunky strategy of saying Parameter Mean vs Sample Mean. (They slip
up on Figure 6-6, p. 119, you might notice). Most people use the symbols:
σ² and σ (sigma squared and sigma) to indicate population variance and standard
deviation.
s² and sd for sample variance and standard deviation.
μ (mu, pronounced "myuu") for population mean and x̄ (x-bar, an x with a line on top of it)
for sample mean.
NOTE: If you encounter other symbols and wonder what they mean, ask me or the TAs, in
person or on Motet.
3. Normal Curve and Probability as Area
The normal curve shows how certain variables are distributed (women's height, chest
size of Scottish men, scores on an intelligence test). It is mathematically precise. (You
can go to web links and check out the links on the normal curve to find the lovely equation
that generates it.)
If your population of scores or statistics is normally distributed, you can find
probabilities of scores in a certain range by calculating the percentage of area under the
curve.
STEPS:
1. Change X values to Z-scores
2. Sketch distribution, mark mean and sd
3. Mark approximate location of your Z-score(s)
and shade in area you want to find
4. Consult the normal curve table. Add and subtract as needed to find the exact area you
want.
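The steps above can be sketched in Python if you want to check your table work. This is only a sketch under my own assumptions: the example scores (mean 100, sd 15) are invented, the helper names z_score and area_between are hypothetical, and the standard library's NormalDist is swapped in for the printed normal curve table.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Step 1: change an X value to a Z-score."""
    return (x - mean) / sd


def area_between(x_low, x_high, mean, sd):
    """Steps 3-4: area under the normal curve between two X values."""
    z_low = z_score(x_low, mean, sd)
    z_high = z_score(x_high, mean, sd)
    standard = NormalDist()  # standard normal curve: mean 0, sd 1
    # cdf(z) is the area to the left of z; subtract to get the area in between.
    return standard.cdf(z_high) - standard.cdf(z_low)


# Hypothetical test scores with mean 100 and sd 15.
p_middle = area_between(85, 115, mean=100, sd=15)    # within 1 sd of the mean
p_above = 1 - NormalDist().cdf(z_score(130, 100, 15))  # more than 2 sd above

print(round(p_middle, 4), round(p_above, 4))  # about 0.68 and about 0.023
```

The subtraction inside area_between is the "add and subtract as needed" of step 4. Sketching the curve first (steps 2 and 3) is still the best way to see which areas to combine.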
How does this relate to sampling?
Sampling ERRORS are approximately normally distributed. Sample statistics vary from
population parameters because of sampling error. Hence, sample statistics such as the sample
mean are approximately normally distributed around the population parameter. We can use the
normal curve (and some other math, to be developed in following chapters) to determine the
precise probability of our estimate (the sample statistic) being wrong by different amounts.
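You can watch this happen in a small simulation. The sketch below is an illustration under made-up assumptions, not the math the later chapters develop: the population of scores (uniform between 0 and 100), the sample size of 50, and the helper name share_within are all arbitrary choices of mine.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of 50,000 scores between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(50_000)]

# Draw 2000 random samples of 50 scores and record each sample mean.
sample_means = [
    statistics.fmean(rng.sample(population, 50)) for _ in range(2000)
]

center = statistics.fmean(sample_means)   # where the sample means pile up
spread = statistics.pstdev(sample_means)  # how far they typically stray


def share_within(k):
    """Share of sample means within k standard deviations of their center."""
    hits = sum(1 for m in sample_means if abs(m - center) <= k * spread)
    return hits / len(sample_means)


within_1 = share_within(1)
within_2 = share_within(2)

print(round(center, 1), round(statistics.fmean(population), 1))
print(round(within_1, 2), round(within_2, 2))
```

The sample means should pile up around the population mean, with roughly 68% of them within one standard deviation of the center and roughly 95% within two, which is the pattern the normal curve predicts.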