1. Probability as Relative Frequency

Probability (p) can be understood as relative frequency.

The probability of getting a particular outcome, when choosing at random, is the same as the relative frequency of that outcome in a distribution of possible outcomes. You all know how to do this already (sibling example, card example).
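The card example can be sketched in a few lines of Python. This is only an illustration, assuming a standard 52-card deck; the helper name relative_frequency is made up for this sketch.

```python
from fractions import Fraction
from itertools import product

# Assumption: a standard 52-card deck (13 ranks x 4 suits).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(RANKS, SUITS))

def relative_frequency(outcomes, is_match):
    """Share of the possible outcomes that count as the outcome of interest."""
    matches = sum(1 for outcome in outcomes if is_match(outcome))
    return Fraction(matches, len(outcomes))

# Probability of an outcome = its relative frequency among the possible outcomes.
p_spade = relative_frequency(deck, lambda card: card[1] == "spades")
p_red = relative_frequency(deck, lambda card: card[1] in ("hearts", "diamonds"))

print("p(spade) =", p_spade)
print("p(red)   =", p_red)
```

Counting the matching cards and dividing by the total number of cards is all that "relative frequency" means here.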

Note that if I pick a card and I see it and you don't, the probability that it is a spade, or a heart, or red, is different for me (0 or 1, because I know the answer) than it is for those who DON'T see the card and must guess.

POINT: Probabilities are calculated for things that we DON'T know. If we know what has happened or will happen, we don't normally worry about probabilities. Probabilities are a way to make predictions about what is more or less likely, and about HOW likely different outcomes are.

2. Samples versus Population

Problem:

1. We want to know what is true about a population of scores, but

2. We don't have, and often can't get, all the scores.

Solution:

1. Take a sample of scores.

2. Calculate the statistics for the sample.

3. Use sample statistics to make a guess about the population parameters.

Sampling Strategy:

Random samples yield the best guesses.

Techniques like multistage cluster sampling (described in your book) provide a good approximation of random sampling.

Biased samples yield biased guesses.

NOTE: Provided good sampling technique is followed, it is the size of the sample that determines how good the guess will be, NOT the size of the sample relative to the size of the population. So a sample of 1000 people will give you just as good an estimate of a parameter for the U.S. as a whole as a sample of 1000 people will give you for a parameter of Oregon. More explanation of why will follow in later chapters...
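A small simulation can make this point concrete. It is only a sketch: the two populations below are made-up lists of scores (normally distributed, mean 100, sd 15), not real data for the U.S. or Oregon, and the function names are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def make_population(size, mean=100.0, sd=15.0):
    """Hypothetical population of scores (assumed normal for illustration)."""
    return [random.gauss(mean, sd) for _ in range(size)]

def spread_of_sample_means(population, sample_size=1000, repeats=300):
    """Draw many random samples and report how much their means vary."""
    means = [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

small_population = make_population(20_000)    # stand-in for a smaller population
large_population = make_population(500_000)   # stand-in for a much larger one

small_spread = spread_of_sample_means(small_population)
large_spread = spread_of_sample_means(large_population)

print(f"spread of sample means, small population: {small_spread:.3f}")
print(f"spread of sample means, large population: {large_spread:.3f}")
```

The two spreads should come out close to each other even though one population is 25 times the size of the other, because the sample size (1000) is the same in both cases.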

Mean for population is a PARAMETER

Mean for a sample is a SAMPLE STATISTIC

Different symbols are normally used for parameters and statistics. Aron and Aron use instead the rather clunky strategy of saying Parameter Mean vs Sample Mean. (They slip up on Figure 6-6, p. 119, you might notice). Most people use the symbols:

σ² and σ (sigma squared and sigma) to indicate population variance and standard deviation.

s² and sd for sample variance and standard deviation.

μ (mu, pronounced "myuu") for population mean and x-bar [x with a line on top of it] for sample mean.
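To see the parameter versus statistic distinction in practice, here is a minimal sketch using Python's statistics module. The scores are invented, and the sample formulas assume the common convention of dividing by n - 1 (which is what statistics.variance does); your textbook may define the sample variance differently.

```python
import statistics

# Hypothetical population of scores (we rarely have this in real life).
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Population parameters: mean (mu), variance (sigma squared), sd (sigma).
mu = statistics.mean(population)
sigma_squared = statistics.pvariance(population)
sigma = statistics.pstdev(population)

# A hypothetical sample of four scores taken from that population.
sample = [2, 4, 5, 9]

# Sample statistics: mean (x-bar), variance (s squared), sd.
# Assumption: statistics.variance/stdev divide by n - 1.
x_bar = statistics.mean(sample)
s_squared = statistics.variance(sample)
sd = statistics.stdev(sample)

print("population:", mu, sigma_squared, sigma)
print("sample:    ", x_bar, s_squared, sd)
```

The sample statistics are guesses about the population parameters; they will usually be close but not identical.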

NOTE: If you encounter other symbols and wonder what they mean, ask me or the TAs, in person or on Motet.

3. Normal Curve and Probability as Area

The normal curve shows how certain variables are distributed (women's height, chest size of Scottish men, scores on an intelligence test). It is mathematically precise. (You can go to the web links and check out the links for the normal curve to find the lovely equation that generates it.)

If your population of scores or statistics is normally distributed, you can find probabilities of scores in a certain range by calculating the percentage of area under the curve.

STEPS:

1. Change X values to Z-scores

2. Sketch distribution, mark mean and sd

3. Mark approximate location of your Z-score(s) and shade in the area you want to find

4. Consult the normal curve table. Add and subtract as needed to find the exact area you want.
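The steps above can be sketched in Python, with statistics.NormalDist standing in for the normal curve table. The numbers are assumptions chosen for illustration: scores with mean 100 and sd 15, and the question "what proportion of scores fall between 85 and 115?"

```python
from statistics import NormalDist

# Assumed example values (not from the notes): mean 100, sd 15.
mean, sd = 100.0, 15.0
low_x, high_x = 85.0, 115.0

# Step 1: change X values to Z-scores.
low_z = (low_x - mean) / sd
high_z = (high_x - mean) / sd

# Step 4: look up the area. cdf(z) gives the area to the LEFT of z,
# so subtracting the two lookups leaves the area between them.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
area = standard_normal.cdf(high_z) - standard_normal.cdf(low_z)

print(f"Z-scores: {low_z:.2f} to {high_z:.2f}")
print(f"Proportion of scores in range: {area:.4f}")
```

Steps 2 and 3 (the sketch and the shading) still matter: they tell you whether to add or subtract the areas you look up.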

How does this relate to sampling?

ERRORS are normally distributed. Sample statistics vary from population parameters because of sampling error. Hence, sample statistics are normally distributed around the population parameter. We can use the normal curve (and some other math, to be developed in following chapters) to determine the precise probability of our estimate (the sample statistic) being wrong by different amounts.
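A quick simulation illustrates the idea. This is a sketch under stated assumptions: the population is a made-up list of 100,000 normally distributed scores (mean 50, sd 10), each sample has 50 scores, and "error" means sample mean minus population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of scores.
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Take many random samples and record how far off each sample mean is.
errors = []
for _ in range(2000):
    sample = random.sample(population, 50)
    errors.append(statistics.fmean(sample) - population_mean)

# If the errors follow a normal curve, about 68% fall within one
# standard deviation of zero and about 95% within two.
typical_error = statistics.pstdev(errors)
within_one = sum(abs(e) <= typical_error for e in errors) / len(errors)
within_two = sum(abs(e) <= 2 * typical_error for e in errors) / len(errors)

print(f"typical size of error: {typical_error:.2f}")
print(f"share within 1 sd: {within_one:.2f}")
print(f"share within 2 sd: {within_two:.2f}")
```

The shares should land near the familiar normal curve percentages, which is what lets us attach probabilities to being wrong by different amounts.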