**CHAPTER 1: Distributions**

**Population**:

Any complete set of observations or measurements; the entire set of individuals, objects, or events a researcher wants to study.

*Examples:*

1) all UO undergrads

2) all the cards in a deck

3) the hours that 302 students sleep on Saturday and Sunday

**Sample:**

A subset of observations from a population, used to infer what is true of the population

*Examples:*

1) the students in this class

2) a pok

3) hours that students in your small group slept on Sat & Sun

**Descriptive statistics:**

Techniques to summarize and organize observations (data). This area of statistics is concerned
with describing the data you have collected.

*Examples:*

1. Distribution of hours of sleep on Saturday for this class.

2. Census data about how the population of the United States is distributed across regions, states,
and urban, suburban, and rural areas.

3. Summary data about the distribution of cards in a regular playing card deck:

52 cards:

50% red, 50% black

25% diamonds, __% hearts

___% picture cards

**Inferential statistics:**

Techniques that use data from samples to generalize about a population. Inferential statistics
allow us to make conclusions that go beyond the data we actually have.

*Examples:*

1. Use sleep data from your little group to infer how much sleep students in the whole class got this weekend.

2. From the sample of the whole U.S. population that fills out the census "long form," infer the answers to important questions such as "What percentage of U.S. households have indoor plumbing?"

3. From a sample of people polled, infer whether voters like Bush Jr. or Gore better.
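The first example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hours below are made up (not the class's real answers), and the names `population_hours` and `sample_hours` are hypothetical. The mean of one randomly chosen small group stands in as an estimate of the whole-class mean.

```python
import random
from statistics import mean

# Hypothetical population: hours slept by every student in a class of 20.
population_hours = [6, 7, 8, 5, 9, 7, 6, 8, 7, 10,
                    4, 7, 8, 6, 9, 7, 8, 5, 7, 6]

random.seed(1)  # fixed seed so the same small group is drawn on every run
sample_hours = random.sample(population_hours, 5)  # one small group of 5

population_mean = mean(population_hours)  # what we want to know
sample_mean = mean(sample_hours)          # what we can actually compute

print("sample:", sample_hours)
print("sample mean (our estimate):", sample_mean)
print("population mean (usually unknown):", population_mean)
```

The estimate will usually be close to, but not exactly equal to, the population value; inferential statistics is about judging how far off it is likely to be.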

**Variables & Values, Populations & Samples**

*Variables and Values:*

A **Variable** is a characteristic that can have different values--that varies.

A **Value** is a particular score or number for a variable.

***What varies about playing cards, for example? How are they different?

***What were some of the variables the questions on the class questionnaire measured?

*** The values?

Variables can be **numeric/quantitative** or **nominal/categorical**

Examples of a quantitative variable:

1. Denomination of cards

2. Level of stress, from 0 to 10

3. Hours of sleep

Examples of a nominal variable:

1. Color of cards

2. Sex

3. Political affiliation

A **frequency distribution** shows how frequent the different **values** are for a particular **variable**.
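As a small sketch of what a frequency distribution looks like in practice, the Python below counts how often each value occurs and prints a frequency table with a crude text histogram. The stress ratings are hypothetical, invented only for this illustration.

```python
from collections import Counter

# Hypothetical stress ratings (0 to 10) from ten students.
stress_scores = [3, 5, 5, 7, 2, 5, 7, 8, 3, 5]

# Counter maps each value to its frequency (how many times it occurs).
frequency = Counter(stress_scores)

# Frequency table, ordered by value, with one star per occurrence.
print("value | freq | histogram")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>5} | {count:>4} | {'*' * count}")
```

Here the variable is "level of stress," the values are the ratings that occur (2, 3, 5, 7, 8), and the frequencies are the counts next to them.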

**Displaying, describing, summarizing distributions**

You've learned two ways to display the information contained in a distribution: frequency table
and histogram. Distributions are also described by **words** and **numbers**.

WORDS for SHAPE: Unimodal, bimodal, multimodal-- number of modes.

Rectangular--no mode

Normal. The normal distribution is a mathematical object, defined by an equation. The perfectly
smooth shape is what we approach if we have lots of measurements on a very fine scale of a
variable that is truly normally distributed.

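Symmetric or asymmetric (skewed). A skewed distribution is called positively skewed if the tail
points toward the high values and negatively skewed if it points toward the low values. The skew
refers to where the tail is--this is the direction the MEAN is pulled towards, away from the mode.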

Heavier or lighter tailed than normal: heavy = **leptokurtic**, light = **platykurtic**

To remember:

platy--FLAT looking, like a plateau; LIGHT tails

lepto--LEAPS up to a sharp peak; HEAVY tails

Numbers measuring the degree of skew & kurtosis can also be calculated, as noted by Knapp. These measure departure from symmetry (skew), and departure from normal curve in peakedness and tail thickness.
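For the curious, here is one way such numbers can be computed. This is a sketch using the common moment-based definitions (population formulas); Knapp's or a statistics package's formulas may differ slightly, for example by a small-sample correction. The function names and the example scores are made up for illustration.

```python
def central_moment(scores, k):
    """Average of (X - M)**k, where M is the mean of the scores."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** k for x in scores) / len(scores)

def skewness(scores):
    """0 for a symmetric distribution; > 0 for a tail toward high values."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """About 0 for a normal curve; < 0 for a flatter (platykurtic) shape."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

rectangular = [1, 2, 3, 4, 5]        # symmetric, no mode
right_tailed = [1, 1, 2, 2, 3, 10]   # one high score makes a long right tail

print("skew, rectangular: ", skewness(rectangular))
print("skew, right-tailed:", round(skewness(right_tailed), 2))
print("excess kurtosis, rectangular:", round(excess_kurtosis(rectangular), 2))
```

The symmetric set gives a skew of zero, the set with a long right tail gives a positive skew, and the flat, rectangular set gives a negative excess kurtosis.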

**CHAPTER 2: Central tendency, variability, z-scores**

In the last class, we talked about variables, values, and scores, about displaying &
describing distributions. Frequency tables and graphs give all the data, ordered by values
and frequencies. Words like unimodal, skewed, and rectangular describe the general
shape of the distribution. In grouped frequency tables and graphs you start to lose some
of the information, in the service of conciseness. Numbers are the next step: they are very
concise, summarizing all the scores in a few single numbers.

Today, I'm going to talk about PARAMETERS, which are numbers that describe a
POPULATION of scores, and in particular, we will focus on measures of CENTRAL
TENDENCY and VARIABILITY. SKEW & KURTOSIS, which we talked about as words to
describe shape, can also be calculated and summarized with a number.

**Parameters: Describing a distribution with numbers. **

**Parameters** are numbers used to **describe** the distribution of a **P**opulation.

An example would be the population average. Parameters are essential to
descriptive statistics. They also play an important role in inferential statistics.

The most important parameters used in inferential statistics are the **Mean** and the
**Variance (or Standard Deviation)**.

The mean is the most important measure of **central tendency**; the **variance (or
standard deviation)** is the most important measure of variability.

These two numbers are closely related. The variance is the average of the squared
deviations **from the mean**. Thus you must know the mean to calculate the
variance. Similarly, the mean is the balance point that **minimizes the sum of squared
deviations**: measured from any other point, that sum (and so its average) is larger.
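The "balance point" claim can be checked numerically. The sketch below uses four card denominations (ace = 1, 5, 6, queen = 12) and compares the sum of squared deviations measured from several candidate centers; the candidate list and the helper name `sum_of_squares` are illustrative choices, not part of the course material.

```python
scores = [1, 5, 6, 12]             # denominations of four cards: A, 5, 6, Q
mean = sum(scores) / len(scores)   # 24 / 4 = 6.0

def sum_of_squares(center):
    """Sum of squared deviations of the scores from a chosen center."""
    return sum((x - center) ** 2 for x in scores)

candidates = [4, 5, 6, 7, 8]
for c in candidates:
    print(f"center = {c}: sum of squared deviations = {sum_of_squares(c)}")

# The candidate with the smallest sum of squares is the mean itself.
best = min(candidates, key=sum_of_squares)
print("smallest at center =", best, "; mean =", mean)
```

Moving the center away from the mean in either direction makes the sum of squares grow, which is why the mean is the natural reference point for the variance.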

**3. Mean, Median, Mode**

**Mode:** The most common score: the value with the highest frequency. There may
be one mode, no modes, or multiple modes.

**Median:** The middle score, or the midpoint between the two middle scores, if
there is an even number of scores. It divides the pile of scores in half, so half the
scores are above the median and half the scores are below the median.

**Mean:** The mean is the average, the sum of all scores divided by the number of
scores.

In symbols: $M = \sum X / N$

NOTE: µ (mu) is also commonly used to denote the population mean.

In words and symbols: The mean M is equal to the sum of all scores X
divided by the number N of scores.

How mean, median and mode are related:

Unimodal symmetric distribution:

**Mean = Median = Mode**

Skewed distribution: What gets skewed, mainly, is the MEAN.

Median and mode are more resistant.
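A quick sketch of this resistance, using Python's standard `statistics` module. The hours of sleep are hypothetical: seven ordinary scores plus one student who slept 16 hours, which creates a long right tail.

```python
from statistics import mean, median, mode

# Hypothetical hours of sleep without and with one extreme score.
typical_hours = [6, 7, 7, 7, 8, 8, 9]
skewed_hours = typical_hours + [16]

for label, hours in [("without the 16", typical_hours),
                     ("with the 16   ", skewed_hours)]:
    print(label,
          "| mean =", round(mean(hours), 2),
          "| median =", median(hours),
          "| mode =", mode(hours))
```

Adding the single extreme score moves the mean by more than a full hour, the median by only half an hour, and the mode not at all.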

**WHEN TO USE Which?**

Use **MODE** for **nominal** data.

Use **Median** to describe seriously skewed distributions

Most inferential tests use the **mean**; some use the median. None use the mode.

**4. Variance and Standard Deviation**

**Variance:** $SD^2$: The average squared deviation from the mean. A more typical
symbol is $\sigma^2$ (sigma squared).

Symbols: $SD^2 = \sum (X - M)^2 / N$

Four steps for calculating the variance:

1. Subtract the mean from each score

2. Square each of these deviation scores

3. Add them up [sum of squares, or SS]

4. Divide by N

**Standard deviation:** Positive square root of the variance.

Two steps:

1. Calculate the variance

2. Take the square root

Example using the denominations of 4 cards (mean M = (1 + 5 + 6 + 12)/4 = 6)

| Card (X) | STEP 1 (deviation score) | STEP 2 (squared deviation) |
|---|---|---|
| A (1) | 1 - 6 = -5 | 25 |
| 5 | 5 - 6 = -1 | 1 |
| 6 | 6 - 6 = 0 | 0 |
| Q (12) | 12 - 6 = 6 | 36 |
| STEP 3 (sum of squares) | | 62 |

STEP 4: divide by N: 62/4 = 15.5

How to remember this?

Deviate, Square, Average (DSA)

$SD = \sqrt{SD^2}$

**Standard deviation:** Take square root

Two steps:

1. Calculate the variance

2. Take the square root

STEP 1: Done. $SD^2 = 15.5$

STEP 2: Take the square root: $SD = \sqrt{15.5} \approx 3.937$
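The same calculation can be written out step by step in Python. This is a sketch that mirrors the four steps for the variance and the extra step for the standard deviation; the variable names are illustrative. The last lines cross-check against `pvariance` and `pstdev` from the standard `statistics` module, which use the same divide-by-N (population) formulas.

```python
from math import sqrt
from statistics import pvariance, pstdev

scores = [1, 5, 6, 12]   # A, 5, 6, Q
n = len(scores)
m = sum(scores) / n      # mean = 6.0

deviations = [x - m for x in scores]     # STEP 1: subtract the mean
squared = [d ** 2 for d in deviations]   # STEP 2: square each deviation
ss = sum(squared)                        # STEP 3: sum of squares (SS)
variance = ss / n                        # STEP 4: divide by N
sd = sqrt(variance)                      # standard deviation

print("deviations:", deviations)
print("squared:   ", squared)
print("SS =", ss, "| variance =", variance, "| SD =", round(sd, 3))

# Cross-check with the library's population formulas.
print("library:", pvariance(scores), round(pstdev(scores), 3))
```

Note that `statistics.variance` and `statistics.stdev` (without the `p`) divide by N - 1 instead of N, so they give different answers from the formula used here.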

NOTE: Knowing the mean tells us nothing about what the variance/standard
deviation will be, and vice versa.

**Segment 2: Z-scores**

**Z-score:** A standardized score that indicates where a score is in a distribution

Z-scores help us know the relative value of a score--how far it is from the mean, measured in standard deviations.

**Formulas:** Z = (X-M)/SD

Going backwards (find raw score from Z): X=(Z)(SD)+M

Find the Z-score of a Queen in terms of denomination.

What should we use for mean & standard deviation?

Knapp has calculated the mean & SD for denomination:

M = 7, SD = 3.742

Q= 12

Z = (X-M)/SD

Z= (12-7)/3.742 = 1.34
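Both formulas can be sketched as short Python functions. This assumes denominations run from ace = 1 to king = 13 and that the mean and SD are population values (divide by N); under those assumptions the computed figures match the M = 7 and SD = 3.742 quoted above. The function names are illustrative.

```python
from statistics import mean, pstdev

denominations = list(range(1, 14))   # A = 1, 2, ..., J = 11, Q = 12, K = 13
m = mean(denominations)              # 7
sd = pstdev(denominations)           # square root of 14, about 3.742

def z_score(x, m, sd):
    """Standardize a raw score: Z = (X - M) / SD."""
    return (x - m) / sd

def raw_score(z, m, sd):
    """Go backwards from a z-score: X = (Z)(SD) + M."""
    return z * sd + m

z_queen = z_score(12, m, sd)
print("M =", m, "| SD =", round(sd, 3))
print("Z for a queen =", round(z_queen, 2))
print("back to raw score:", round(raw_score(z_queen, m, sd), 2))
```

A queen is therefore about 1.34 standard deviations above the mean denomination, and converting that z-score back returns the raw score of 12.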