CHAPTER 1: Distributions

Population:

Any complete set of observations or measurements; the entire set of individuals, objects, or events a researcher wants to study.

Examples:

1) all UO undergrads

2) all the cards in a deck

3) the hours that 302 students sleep on Saturday and Sunday

Sample:

A subset of observations from a population, used to infer what is true of the population

Examples:

1) the students in this class

2) a poker hand

3) hours that students in your small group slept on Sat & Sun

Descriptive statistics:

Techniques to summarize and organize observations (data). This area of statistics is concerned with describing the data you have collected.

Examples:

1. Distribution of hours of sleep on Saturday for this class.

2. Census data about how the population of the United States is distributed across regions, states, and urban, suburban, and rural areas.

3. Summary data about the distribution of cards in a regular playing card deck:

52 cards:

50% red, 50% black

25% diamonds, __% hearts

___% picture cards

Inferential statistics:

Techniques that use data from samples to generalize about a population. Inferential statistics allow us to make conclusions that go beyond the data we actually have.

Example:

1. Use sleep data from your little group to infer how much sleep students in the whole class got this weekend.

2. From a sample of the whole U.S. population that fills out the census "long form," infer the answers to important questions such as "What percentage of U.S. households have indoor plumbing?"

3. From a sample of people polled, infer whether voters like Bush Jr. or Gore better.

Variables & Values, Populations & Samples.

Variables and Values:

A Variable is a characteristic that can have different values--that varies.

A Value is a particular score or number for a variable.

***What varies about playing cards, for example? How are they different?

***What were some of the variables the questions on the class questionnaire measured?

*** The values?

Variables can be numeric/quantitative or nominal/categorical.

Examples of a quantitative variable:

1. Denomination of cards

2. Level of stress, from 0 to 10

3. Hours of sleep

Examples of a nominal variable:

1. Color of cards

2. Sex

3. Political affiliation

A frequency distribution shows how frequently each value of a particular variable occurs.
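As a rough illustration, a frequency distribution can be tallied in a few lines of Python. The sleep data below are made up for the example and are not the class's actual numbers; the name `hours_of_sleep` is likewise just an assumption.

```python
from collections import Counter

# Hypothetical data: hours of sleep reported by ten students (invented for illustration).
hours_of_sleep = [7, 8, 6, 7, 9, 7, 8, 5, 7, 8]

# Count how many times each value occurs.
frequency = Counter(hours_of_sleep)

# Print a frequency table ordered by value, with a row of stars as a crude histogram.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>2} hours | {count:>2} | {'*' * count}")
```

Each row pairs a value of the variable with its frequency, which is the same information a frequency table or histogram displays.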

Displaying, describing, summarizing distributions

You've learned two ways to display the information contained in a distribution: frequency table and histogram. Distributions are also described by words and numbers.

WORDS for SHAPE: Unimodal, bimodal, multimodal-- number of modes.

Rectangular--no mode

Normal. The normal distribution is a mathematical object, defined by an equation. The perfectly smooth shape is what we approach if we have lots of measurements on a very fine scale of a variable that is truly normally distributed.

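Symmetric or asymmetric (skewed). A skewed distribution is called positively skewed when its tail extends toward the high values and negatively skewed when its tail extends toward the low values. The skew refers to where the tail is--this is the direction the MEAN is pulled towards, away from the mode.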

Heavier or lighter tails than the normal curve: heavy = leptokurtic, light = platykurtic.

To remember:

platy--FLAT looking, like a plateau, with THIN tails

lepto--LEAPS up, with LONG, heavy tails

Numbers measuring the degree of skew & kurtosis can also be calculated, as noted by Knapp. These measure departure from symmetry (skew), and departure from normal curve in peakedness and tail thickness.
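One common way to put numbers on skew and kurtosis uses moments about the mean. The sketch below assumes the population (divide-by-N) versions of these formulas; Knapp's text may define them slightly differently, and the function names and data sets are invented for illustration.

```python
def moment_about_mean(scores, k):
    """k-th central moment: the average of (X - M)**k over all scores."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** k for x in scores) / n

def skewness(scores):
    """Third central moment divided by SD cubed; 0 for a symmetric distribution."""
    return moment_about_mean(scores, 3) / moment_about_mean(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Fourth central moment divided by variance squared, minus 3 (the normal curve's value)."""
    return moment_about_mean(scores, 4) / moment_about_mean(scores, 2) ** 2 - 3

rectangular = [1, 2, 3, 4, 5]          # hypothetical symmetric, flat data
tail_to_right = [1, 1, 2, 2, 3, 10]    # hypothetical data with a tail toward high scores

print(skewness(rectangular), excess_kurtosis(rectangular))
print(skewness(tail_to_right), excess_kurtosis(tail_to_right))
```

Under these definitions the rectangular set has zero skew and a negative excess kurtosis, while the set with a tail toward high scores has a positive skew.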

Chapter 2: Central tendency, variability, z-scores

In the last class, we talked about variables, values, and scores, about displaying & describing distributions. Frequency tables and graphs give all the data, ordered by values and frequencies. Words like unimodal, skewed, and rectangular describe the general shape of the distribution. In grouped frequency tables and graphs you start to lose some of the information, in the service of conciseness. Numbers are the next step--very concise, summarize all the scores in single numbers.

Today, I'm going to talk about PARAMETERS, which are numbers that describe a POPULATION of scores, and in particular, we will focus on measures of CENTRAL TENDENCY and VARIABILITY. SKEW & KURTOSIS, which we talked about as words to describe shape, can also be calculated and summarized with a number.

Parameters: Describing a distribution with numbers.

Parameters are numbers used to describe the distribution of a Population.

An example would be the population average. Parameters are essential to descriptive statistics. They also play an important role in inferential statistics.

The most important parameters used in inferential statistics are the Mean and the Variance (or Standard Deviation).

The mean is the most important measure of central tendency; the variance (or standard deviation) is the most important measure of variability.

These two numbers are closely related. The variance is the average of the squared deviations from the mean (the sum of squared deviations divided by N). Thus you must know the mean to calculate the variance. Similarly, the mean is the balance point: it is the value that makes the sum of squared deviations as small as possible.
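A quick numerical check of the "balance point" idea, using four card denominations (A=1, 5, 6, Q=12) as the scores. This is only a sketch: the helper name `sum_of_squares` and the list of candidate centers are assumptions made for the illustration.

```python
scores = [1, 5, 6, 12]  # four card denominations: A=1, 5, 6, Q=12

def sum_of_squares(scores, center):
    """Sum of squared deviations of the scores from a chosen center value."""
    return sum((x - center) ** 2 for x in scores)

mean = sum(scores) / len(scores)

# Try several candidate centers; the smallest sum of squares occurs at the mean.
for center in [4, 5, mean, 7, 8]:
    print(center, sum_of_squares(scores, center))
```

Moving the center away from the mean in either direction makes the sum of squared deviations larger.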

3. Mean, Median, Mode:

Mode: The most common score: the value with the highest frequency. There may be one mode, no modes, or multiple modes.

Median: The middle score, or the midpoint between the two middle scores, if there is an even number of scores. It divides the pile of scores in half, so half the scores are above the median and half the scores are below the median.

Mean: The mean is the average, the sum of all scores divided by the number of scores.

In symbols: M = ΣX / N

NOTE: µ (mu) is also commonly used to denote the population mean.

In words and symbols: The mean (M) equals (=) the sum (Σ) of all scores (X) divided by (/) the number of scores (N).
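The three measures can be computed as follows. This is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical scores, N = 6

# Mean: sum of all scores divided by the number of scores (M = ΣX / N).
mean = sum(scores) / len(scores)

# Median: with an even N, the midpoint between the two middle scores (3 and 5).
median = statistics.median(scores)

# Mode(s): the value(s) with the highest frequency; multimode returns all ties.
modes = statistics.multimode(scores)

print(mean, median, modes)
```

`statistics.multimode` (Python 3.8+) is used rather than `statistics.mode` because a distribution may have more than one mode.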

How mean, median and mode are related:

Unimodal symmetric distribution:

Mean = Median = Mode

Skewed distribution: What gets skewed, mainly, is the MEAN.

Median and mode are more resistant.
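To see how much more the mean moves than the median or mode, compare a symmetric set of scores with the same set after one extreme score is introduced. Both data sets below are invented for illustration.

```python
import statistics

symmetric = [1, 2, 3, 3, 3, 4, 5]   # hypothetical unimodal, symmetric scores
skewed = [1, 2, 3, 3, 3, 4, 40]     # same scores, but the top score is now extreme

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "mode =", statistics.mode(data))
```

In the symmetric set all three measures agree; in the skewed set only the mean is dragged toward the tail.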

WHEN TO USE Which?

Use MODE for nominal data.

Use Median to describe seriously skewed distributions

Most inferential tests use the mean; some use the median. None use the mode.

4. Variance and Standard Deviation

Variance (SD²): The average squared deviation from the mean. A more typical symbol for the population variance is σ² (sigma squared).

In symbols: SD² = Σ(X - M)² / N

Four steps for calculating the variance:

1. Subtract the mean from each score

2. Square each of these deviation scores

3. Add them up [sum of squares, or SS]

4. Divide by N

Standard deviation: Positive square root of the variance.

Two steps:

1. Calculate the variance

2. Take the square root

Example using the denominations of 4 cards (M = 24/4 = 6):

Card      STEP 1 (deviation score)    STEP 2 (squared deviation)
A (1)     1 - 6 = -5                  25
5         5 - 6 = -1                   1
6         6 - 6 =  0                   0
Q (12)    12 - 6 = 6                  36

STEP 3 (sum of squares): 25 + 1 + 0 + 36 = 62

STEP 4 (divide by N): 62/4 = 15.5

How to remember this?

Deviate, Square, Average DSA

SD = square root of SD²

Standard deviation: Take square root

Two steps:

1. Calculate the variance

2. Take the square root

STEP 1: Done. SD² = 15.5

STEP 2: Take the square root of 15.5 = 3.937
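The same Deviate, Square, Average steps can be written out in Python for the four-card example. This is a sketch of the population (divide-by-N) formulas used in these notes; the variable names are chosen only for illustration.

```python
import math
import statistics

scores = [1, 5, 6, 12]  # A=1, 5, 6, Q=12
n = len(scores)
mean = sum(scores) / n                              # 24 / 4 = 6

deviations = [x - mean for x in scores]             # Step 1: subtract the mean
squared_deviations = [d ** 2 for d in deviations]   # Step 2: square each deviation
sum_of_squares = sum(squared_deviations)            # Step 3: add them up (SS)
variance = sum_of_squares / n                       # Step 4: divide by N
sd = math.sqrt(variance)                            # SD: positive square root

print(sum_of_squares, variance, round(sd, 3))

# Cross-check with the standard library's population variance.
print(statistics.pvariance(scores))
```

Note that `statistics.pvariance` divides by N, matching the formula here, whereas `statistics.variance` divides by N - 1 and would give a different answer.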

NOTE: Knowing the mean tells us nothing about what the variance/standard deviation will be, and vice versa.





Segment 2: Z-scores

Z-score: A standardized score that indicates where a score falls in a distribution.

Z-scores tell us the relative value of a score: how far it is from the mean, measured in standard deviation units.

Formulas: Z = (X-M)/SD

Going backwards (find raw score from Z): X=(Z)(SD)+M

Find the Z-score of a Queen in terms of denomination.

What should we use for mean & standard deviation?

Knapp has calculated mean & sd for denomination:

M = 7, SD = 3.742

Q = 12

Z = (X-M)/SD

Z = (12-7)/3.742 = 1.34
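The Queen example can be reproduced in Python. The sketch assumes denominations are coded Ace = 1 through King = 13 and that every denomination is equally frequent, as in a full deck; the function names are made up for the illustration.

```python
import math

def z_score(x, mean, sd):
    """Standardize a raw score: Z = (X - M) / SD."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Go backwards from a Z-score to a raw score: X = (Z)(SD) + M."""
    return z * sd + mean

# Assumed coding: Ace = 1, ..., Jack = 11, Queen = 12, King = 13.
denominations = list(range(1, 14))
n = len(denominations)

mean = sum(denominations) / n
variance = sum((x - mean) ** 2 for x in denominations) / n  # population variance
sd = math.sqrt(variance)

z_queen = z_score(12, mean, sd)
print(mean, round(sd, 3), round(z_queen, 2))

# Converting the Z-score back recovers the Queen's denomination.
print(raw_score(z_queen, mean, sd))
```

Because each denomination appears equally often in a deck, using one card of each denomination gives the same mean and standard deviation as using all 52 cards.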