The logic of inferential statistics: some building blocks.
What we want to know:
Population parameter
Examples:
Mean height for Oregon women
Mean difference in stats anxiety for men & women
Proportion of Oregonians who prefer Bradley or Gore or Bush or Dole for president
What we have (in inferential statistics)
Sample statistic
Height of a few Oregon women
Difference in stats anxiety for men & women taking 302 this term
Results of a telephone poll of 500 Oregonians
We can use our statistic to make a guess at the parameter. How good is the guess? How far off
is it likely to be? To know this, we need a sampling distribution. The sampling distribution
allows us to figure out, for example, the standard deviation of our statistic, which (when we
are guessing at the mean) is the typical distance the statistic will fall from the true parameter.
In some cases, we can figure out what the sampling distribution should be because we know what
the population is (example: our card deck). However, if we know about the population, why
would we be doing inferential statistics in the first place? Usually we don't have the sampling
distribution -- we only have a single sample.
What comes to our rescue is the central limit theorem, which tells us what the sampling
distribution will be. It will be NORMAL, for one thing, because it is a distribution of
measurements that include error.
Early on, the normal curve was called the "error law" because it described the distribution of
errors in astronomical observations (which were full of error because of primitive telescopes). A
sample mean is also a kind of measurement of the population, and it is typically in error (it usually
will not be equal to the true population mean).
Central Limit Theorem:
For any population with mean μ and variance σ², the distribution of sample means
(x-bar) for sample size n will approach a normal distribution with a mean of μ and a
variance of σ²/n as n approaches infinity.
This means the following three things are true about sampling distributions:
1. The mean of x-bar will equal μ
2. The variance of x-bar will equal σ²/n
3. The distribution will be normal as n gets large OR if the population distribution is normal
In practice, "large" means about 20-30. When the population distribution is rectangular, the
sampling distribution becomes very close to normal with samples of 20. For a skewed population,
samples of size 30 or so are needed to correct the skew, especially when the skew is strong.
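The three claims above can be checked with a quick simulation. This is a sketch in Python, assuming a rectangular population of the values 1 through 13 (like card denominations, for which μ = 7 and σ² = 14); the variable names are illustrative:

```python
import random
import statistics

# Rectangular (uniform) population: the values 1 through 13,
# like the denominations in a card deck.
population = list(range(1, 14))
mu = statistics.mean(population)            # 7
sigma2 = statistics.pvariance(population)   # 14

# Draw many samples of size n (with replacement) and keep each sample mean.
random.seed(1)
n = 25
sample_means = [statistics.mean(random.choices(population, k=n))
                for _ in range(20000)]

# CLT predictions: mean of x-bar is mu; variance of x-bar is sigma2 / n.
print("mean of sample means:", round(statistics.mean(sample_means), 2))
print("variance of sample means:", round(statistics.pvariance(sample_means), 3))
print("sigma^2 / n:", sigma2 / n)
```

With 20,000 samples of size 25, the mean of the sample means comes out very close to 7 and their variance very close to 14/25 = 0.56, as the theorem predicts.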
Empirical sampling distribution:
1. Take a deck of cards.
2. Pick two cards at random.
3. Take the average of the DENOMINATIONS of the cards
   (A=1, J=11, Q=12, K=13).
(Class does this, then we create a distribution of the means on the whiteboard)
Notice that the means are starting to pile up more in the middle than on the sides, so the shape
is moving away from rectangular. The variance should be smaller than the population's. The mean
may or may not equal 7, but if we kept sampling for a very long time, it would.
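The class exercise can also be simulated. This is a sketch in Python (the seed and the number of "students" are arbitrary choices), which draws two cards without replacement many times and tallies the means like the whiteboard:

```python
import random
import statistics
from collections import Counter

# A deck as denominations: 1 through 13 (A=1, J=11, Q=12, K=13), four of each.
deck = [d for d in range(1, 14) for _ in range(4)]

# Each "student" draws two cards at random without replacement and averages them.
random.seed(302)
means = [sum(random.sample(deck, 2)) / 2 for _ in range(200)]

# A crude whiteboard tally of the sample means.
tally = Counter(means)
for value in sorted(tally):
    print(f"{value:4.1f}: {'x' * tally[value]}")

print("grand mean:", round(statistics.mean(means), 2))     # near 7
print("variance:", round(statistics.pvariance(means), 2))  # well under 14
```

The tally piles up in the middle, the grand mean lands near 7, and the variance of the means is well below the population variance of 14, just as the notes describe.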
Theoretical sampling distribution:
Can be computed by figuring out the probability of each possible sample mean. For example, there
are 6 ways to get a sample mean of 1 (two aces), 6 ways to get a sample mean of 13 (two kings),
and 102 ways to get a sample mean of 7, the true mean (there are 6 ways to get two 7s, 16 ways to
get a 6 and an 8, and so on).
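These counts can be verified by brute force. A sketch in Python that enumerates every unordered two-card hand from a 52-card deck and tallies the sample means:

```python
from itertools import combinations
from collections import Counter

# Deck as denominations: 1 through 13 (A=1, J=11, Q=12, K=13), four of each suit.
deck = [d for d in range(1, 14) for _ in range(4)]

# Enumerate every unordered two-card hand and tally the sample means.
counts = Counter((a + b) / 2 for a, b in combinations(deck, 2))

print("ways to get a mean of 1:", counts[1.0])    # two aces
print("ways to get a mean of 13:", counts[13.0])  # two kings
print("ways to get a mean of 7:", counts[7.0])    # the true mean
```

This confirms the figures in the notes: 6 ways to get a mean of 1, 6 ways to get a mean of 13, and 102 ways to get a mean of 7, out of C(52, 2) = 1,326 possible hands.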