**Chapters 1-4 overheads from lecture. **

*Note:* When I translate into HTML, I lose Greek symbols. So I'm using M for mu (the
population mean), SD for the standard deviation (sigma), x-bar for the sample mean, and
other words such as sigma instead of the symbols.

**Chapter 1:**

**Descriptive statistics:**

Techniques to summarize and organize a whole set of observations (data). This area of statistics is concerned with describing the data you actually have.

*Examples: *

Distribution of hours of sleep for this class.

Census data about how the population of the United States is distributed across regions, states, urban, suburban, rural, how much money people in different regions make, etc., etc.

What percentage of people voting chose McCain over Bush in the primaries in the different states.

**Inferential statistics:**

Techniques that use data from samples to generalize about a population. Inferential statistics allow us to make conclusions that go beyond the data we actually have.

*Example: *

Use sleep data from your little group to infer how much sleep students in the whole class got last night.

From a sample of whole U.S. population that fills out the census long form, infer the answers to questions such as "What percentage of U.S. households have access to the Web at home?"

From a sample of people interviewed on the phone, infer whether voters prefer Bush or Gore for president.

**Frequency distribution table:**

| Score (X) (hours) | Frequency (f) (number of people with this score) |
| --- | --- |
| 3 | 1 |
| 4 | 2 |
| 5 | 6 |
| etc. | etc. |

To make a histogram from this data, put the scores on the X-axis (horizontal) and the frequency (number of people/events/objects with this score) on the Y-axis (vertical). If instead of counting up frequencies you made a mark for each person, when you turned the frequency table on its side you would see a rough histogram.
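The frequency table and the sideways-histogram idea above can be sketched in Python (the sleep scores here are hypothetical, assumed only for illustration):

```python
from collections import Counter

# Hypothetical sleep scores (hours) -- one value per person
hours = [3, 4, 4, 5, 5, 5, 5, 5, 5]

# Frequency table: score (X) -> frequency (f)
freq = Counter(hours)

# A sideways text histogram: one mark (*) per person with each score
for score in sorted(freq):
    print(f"{score} | {'*' * freq[score]}")
```

Turning the output 90 degrees gives the usual histogram, with scores on the X-axis and frequencies on the Y-axis.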

*Variables and Values:*

A **Variable** is a characteristic that can have different values--that varies.

A **Value** is a particular score for a variable. Also called an observation or measurement.

Examples:

Amount of sleep is a *variable*. 3, 5, 9 hours of sleep are different *values* for that variable.

Sex is a *variable*. Male and female are different *values* for that variable.

A **frequency distribution** shows how frequent the different **values** are for a particular
**variable**

Variables can be **continuous** or **discrete**.

Examples of continuous variables: Stress level, Amount of sleep

Examples of a discrete variables: Country, Number of siblings

Question: Are these variables discrete or continuous?

State of birth _____________

Height __________________

Difficulty of college classes ____________

Popularity of math as a subject ____________

There are at least 4 distinct measurement scales:

Nominal (categorical, think names)

Ordinal (think ordinal numbers: 1st, 2nd, etc.)

Interval (think equal intervals)

Ratio (needs an absolute zero)

*NOTE:* Both discrete and continuous variables can be measured on multiple scales. Many
psychological variables are measured on scales that are treated as interval but in actuality
are better than ordinal yet not perfectly interval (that is, the intervals may be only roughly
equal).

When we do experiments, we call variables independent or dependent based on their role in the experimental design. An experiment manipulates (controls the values of) one or more variables and then measures others, which are free to vary.

The manipulated variables are called **independent variables** (IVs).

The variables that are free to vary are called **dependent variables** (DVs).

**NOTE**: Independent and dependent are not inherent qualities of the variables. Instead
they depend on the research design. A variable can be the IV in one experiment and the
DV in another.

**Chapter 2:**

*Displaying, describing, summarizing distributions:*

Two ways to display the information contained in a distribution: a **frequency table** and
a **histogram**.

Distributions are also described by **words** and **numbers**.

WORDS for SHAPE:

Modes: unimodal, bimodal, multimodal. (A rectangular distribution has no mode.) The two modes in a bimodal distribution may be equal or unequal (major and minor).

Normal distribution (bell curve). This is a mathematical object, defined by an equation. It shows relative frequency (no numbers on the y axis). Many variables are distributed in a way that is approximately normal.

Distributions may be **symmetric** (mirror image, like the normal curve) or **asymmetric**
(skewed).

Skew may be positive (skewed to the right) or negative (skewed to the left).

The skew is where the skinny tail of the distribution is.

**Population:** Any complete set of observations or measurements; the entire set of
individuals, objects, or events a researcher wants to study.

Examples: 1) all UO undergrads, 2) all Oregonians

**Sample:** A subset of observations from a population, used to infer what is true of the
population

Examples: 1) the students in this class 2) the Oregonians who got the long census form

**Parameters** are numbers used to **describe** the distribution of a **P**opulation.

The parameters used most in inferential stats are the **Mean**, **Variance**, and **Standard
Deviation**.

Mean is a measure of central tendency;

Variance and standard deviation are measures of variability

**Chapter 3. Central tendency: Mean, Median, Mode**

**Mean:** The average: the sum of all scores divided by N.

The Greek letter mu is used for the population mean; the symbol x-bar for the sample mean.

In symbols: **M = [sum of X] / N**. For samples, **x-bar = [sum of x] / n**.

That is, the mean M equals the sum [capital sigma, which looks like a big E] of all scores X, divided by the number N of scores.
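As a quick sketch in Python (the scores here are assumed purely for illustration):

```python
# Mean as defined above: M = [sum of X] / N
scores = [1, 3, 4, 4, 4, 6, 6]   # hypothetical scores
N = len(scores)                  # N = 7
M = sum(scores) / N              # 28 / 7 = 4.0
print(M)
```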

**Median:** The score of the middle person (odd N), or the midpoint between the scores of
the two middle people (even N). N = the number of people/objects in the population. The
median divides N (the number of scores) in half. It does NOT divide the x-axis (the
range of values) in half (the most common misconception). When finding the median of a
distribution, you divide the area of the distribution in half (not the number
line).

*How mean, median and mode are related: *

In a unimodal symmetric distribution, **Mean = Median = Mode**

In a skewed distribution, they are not equal: What gets skewed, mainly, is the MEAN, and it is skewed toward the tail, away from the main bulk of the distribution.

The median is more resistant to skew, and the mode is not affected.

**WHEN TO USE the different measures:**

Use **Mode** for **nominal** data.

Use **Median** to describe seriously skewed distributions (like income, house prices).

Most inferential tests use the **mean**; some use the median. None use the mode.

**Chapter 4. Variability**

**Range:** The distance between the largest score (Max) and the smallest score (Min). For whole
numbers, the formula is Max - Min + 1 (plus one so that the Min is counted too).

Example: 1, 3, 8. Range is 8 - 1 + 1 = 8 (**1**, 2, **3**, 4, 5, 6, 7, **8**)
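A minimal sketch of the whole-number range formula in Python:

```python
def inclusive_range(scores):
    """Max - Min + 1, so the Min is counted too (whole-number formula above)."""
    return max(scores) - min(scores) + 1

r = inclusive_range([1, 3, 8])   # 8 - 1 + 1 = 8
print(r)
```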

**Population Variance and Standard Deviation**

**Variance:** sigma squared:

Mean squared deviation from the mean [MS].

Symbols: **sigma^{2} = [sum of (X-M)^{2}] / N**

Four steps for variance:

1. Calculate deviation scores (X - M)

2. Square ( )^{2} each deviation score

3. Sum the squares (gives SS, the sum of squares)

4. Divide by N for mean square (MS)

Example: Xs (scores) are 1, 3, 4, 4, 4, 6, 6

N (number of scores) = 7, **Sum of X** = 28

M = **[sum of X] /N **= 28/7 = **4**

| X - M | (X - M)^{2} |
| --- | --- |
| 1 - 4 = -3 | (-3)^{2} = 9 |
| 3 - 4 = -1 | (-1)^{2} = 1 |
| 4 - 4 = 0 [x 3] | 0^{2} x 3 = 0 |
| 6 - 4 = 2 [x 2] | 2^{2} = 4, 4 x 2 = 8 |

**Sum of (X - M)^{2}** = 18 (the sum of squares, SS)

**Variance = [sum of (X-M) ^{2 }]/ N** = SS/N = 18/7 = 2.57
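The four steps can be checked in Python with the same example data:

```python
scores = [1, 3, 4, 4, 4, 6, 6]   # the example scores from the text
N = len(scores)                  # 7
M = sum(scores) / N              # 28 / 7 = 4.0

deviations = [x - M for x in scores]     # step 1: deviation scores (X - M)
squared = [d ** 2 for d in deviations]   # step 2: square each deviation
SS = sum(squared)                        # step 3: sum of squares = 18
variance = SS / N                        # step 4: mean square (MS) = 18/7
print(SS, round(variance, 2))
```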

**Standard deviation (sigma):**

Mathematically: Positive square root of the variance.

Conceptually: The typical distance of scores from the mean.

(Not exactly the average distance, but close)

Two steps for standard deviation:

1. Calculate the variance

2. Take the square root

Our variance was 2.57, so SD = sqrt(2.57) = 1.60

From our example: The deviation scores were -3, -1, 0, 0, 0, 2, 2

Average distance from the mean using absolute values = 8/7 = 1.14. So the standard deviation is not EXACTLY the same as the average distance from the mean; instead it is a "typical" distance from the mean.

**Sample variance (s^{2}) and standard deviation (s)** (used to estimate the population variance
and standard deviation). The difference is that instead of dividing SS by N, you divide
SS by n - 1.
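The N versus n - 1 distinction is built into Python's standard `statistics` module, shown here with the example scores from the text:

```python
import statistics

scores = [1, 3, 4, 4, 4, 6, 6]   # the example scores; SS = 18

pop_var = statistics.pvariance(scores)   # SS / N = 18/7 ~ 2.57 (population)
samp_var = statistics.variance(scores)   # SS / (n - 1) = 18/6 = 3.0 (sample)

print(round(pop_var, 2), samp_var)
```

Dividing by n - 1 (the degrees of freedom) makes the sample variance slightly larger, correcting the tendency of small samples to underestimate the population variance.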

*Exercise*

Calculate variance for the number of siblings in your group, using **[sum of (X-M) ^{2 }]/ N**
formula (population variance)

Xs = , **M**=

1. X-**M** =

2. (X-**M**)^{2 }=^{}

3. Sum of (X-**M**)^{2 }=

4. Sigma^{2} (variance, MS) =

Let's see how well these variances predict the TRUE population variance, by averaging them together: _________

**Sample variance:**

Now calculate the **SAMPLE** variance for your groups (*s*^{2}) dividing the SS by *n*-1 (the
degrees of freedom) instead of by N.

How well do these CORRECTED variance scores predict the true population variance? (It should be a closer fit.)