CHAPTER 1: Distributions
Population:
Any complete set of observations or measurements; the entire set of individuals, objects, or events a researcher wants to study.
Examples:
1) all UO undergrads
2) all the cards in a deck
3) the hours that 302 students sleep on Saturday and Sunday
Sample:
A subset of observations from a population, used to infer what is true of the population
Examples:
1) the students in this class
2) a poker hand
3) hours that students in your small group slept on Sat & Sun
Descriptive statistics:
Techniques to summarize and organize observations (data). This area of statistics is concerned
with describing the data you have collected.
Examples:
1. Distribution of hours of sleep on Saturday for this class.
2. Census data about how the population of the United States is distributed across regions, states,
and urban, suburban, and rural areas.
3. Summary data about the distribution of cards in a regular playing card deck:
52 cards:
50% red, 50% black
25% diamonds, __% hearts
___% picture cards
Inferential statistics:
Techniques that use data from samples to generalize about a population. Inferential statistics
allow us to make conclusions that go beyond the data we actually have.
Example:
1. Use sleep data from your little group to infer how much sleep students in the whole class got this weekend.
2. From the sample of the whole U.S. population that fills out the census "long form," infer the answers to important questions such as "What percentage of U.S. households have indoor plumbing?"
3. From a sample of people polled, infer whether voters like Bush Jr. or Gore better.
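A rough sketch of the sample-to-population idea, in Python. The population here is made up (200 hypothetical students with invented hours of sleep), and the sample size of 10 is an arbitrary choice. In real research we would see only the sample, not the population mean.

```python
import random
import statistics

random.seed(302)  # fixed seed so the run is repeatable

# Hypothetical population: hours of sleep for 200 students (made-up values).
population = [random.choice([4, 5, 6, 7, 8, 9, 10]) for _ in range(200)]

# A sample is a subset of the population, here 10 students drawn at random.
sample = random.sample(population, 10)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what we actually observe

print(f"sample mean     = {sample_mean:.2f}")
print(f"population mean = {population_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean. Inferential statistics is about how much to trust that estimate.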
Variables & Values, Populations & Samples.
Variables and Values:
A Variable is a characteristic that can have different values--that varies.
A Value is a particular score or number for a variable.
***What varies about playing cards, for example? How are they different?
***What were some of the variables the questions on the class questionnaire measured?
*** The values?
Variables can be numeric/quantitative or nominal/categorical
Examples of a quantitative variable:
1.Denomination of cards
2.Level of Stress, from 0 to 10
3.Hours of sleep
Examples of a nominal variable:
1.Color of cards
2.Sex
3.Political affiliation
A frequency distribution shows how frequent the different values are for a particular variable
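As an illustration, a frequency distribution can be tallied in a few lines of Python. The hours-of-sleep scores below are invented for the example, not class data.

```python
from collections import Counter

# Hypothetical data: hours of sleep reported by 12 students (made-up values).
hours = [7, 8, 6, 7, 9, 7, 8, 5, 6, 7, 8, 7]

# Counter maps each value to the number of times it occurs.
frequency = Counter(hours)

# Print a frequency table with a text histogram, ordered by value.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>2} | {count:>2} | {'*' * count}")
```

Each row gives a value, its frequency, and a bar of that length, so the same tally serves as both frequency table and histogram.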
Displaying, describing, summarizing distributions
You've learned two ways to display the information contained in a distribution: frequency table
and histogram. Distributions are also described by words and numbers.
WORDS for SHAPE: Unimodal, bimodal, multimodal-- number of modes.
Rectangular--no mode
Normal. The normal distribution is a mathematical object, defined by an equation. The perfectly
smooth shape is what we approach if we have lots of measurements on a very fine scale of a
variable that is truly normally distributed.
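Symmetric or asymmetric (skewed). A skewed distribution is either positively or negatively skewed. The skew
refers to where the tail is: a tail toward the high values is positive skew, a tail toward the low values is
negative skew. This is the direction the MEAN is pulled towards, away from the mode.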
Heavier or lighter tailed than normal: heavy = leptokurtic, light = platykurtic
To remember:
platy--FLAT looking, LIGHT tails
lepto--LEAPS up, HEAVY tails
Numbers measuring the degree of skew & kurtosis can also be calculated, as noted by Knapp. These measure departure from symmetry (skew), and departure from normal curve in peakedness and tail thickness.
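One common way to put numbers on skew and kurtosis uses the moments of the distribution. The sketch below assumes the population (divide-by-N) moment formulas, which may differ from the ones Knapp uses. The function names and data are made up for illustration.

```python
def central_moment(scores, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)

def skewness(scores):
    # Third moment scaled by the variance: 0 for a symmetric distribution,
    # positive when the tail is toward high values.
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    # Fourth moment scaled by the squared variance, minus 3 so that
    # a normal curve scores 0.
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]           # hypothetical scores
right_tailed = [1, 1, 2, 2, 3, 10]    # hypothetical scores with a high tail

print(skewness(symmetric))       # 0.0
print(skewness(right_tailed))    # positive
print(excess_kurtosis(symmetric))  # negative: flatter than normal
```

Negative excess kurtosis goes with a flatter-than-normal shape, positive with a more peaked one.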
Chapter 2: Central tendency, variability, z-scores
In the last class, we talked about variables, values, and scores, about displaying &
describing distributions. Frequency tables and graphs give all the data, ordered by values
and frequencies. Words like unimodal, skewed, and rectangular describe the general
shape of the distribution. In grouped frequency tables and graphs you start to lose some
of the information, in the service of conciseness. Numbers are the next step: they are very
concise, summarizing all the scores in a single number.
Today, I'm going to talk about PARAMETERS, which are numbers that describe a
POPULATION of scores, and in particular, we will focus on measures of CENTRAL
TENDENCY and VARIABILITY. SKEW & KURTOSIS, which we talked about as words to
describe shape, can also be calculated and summarized with a number.
Parameters: Describing a distribution with numbers.
Parameters are numbers used to describe the distribution of a Population.
An example would be the population average. Parameters are essential to
descriptive statistics. They also play an important role in inferential statistics.
The most important parameters used in inferential statistics are the Mean and the
Variance (or Standard Deviation).
The mean is the most important measure of central tendency; the variance (or
standard deviation) is the most important measure of variability.
These two numbers are closely related. The variance is the average of the squared
deviations from the mean. Thus you must know the mean to calculate the
variance. Similarly, the mean is the balance point: the value that minimizes the
sum of squared deviations.
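One way to see that the mean minimizes the sum of squared deviations is to try other "centers" and compare. This is a small sketch, not a proof. It uses the four card denominations from the variance example in these notes, and the function name is made up.

```python
def sum_of_squares(scores, center):
    """Sum of squared deviations of the scores from a chosen center."""
    return sum((x - center) ** 2 for x in scores)

scores = [1, 5, 6, 12]            # A, 5, 6, Q
mean = sum(scores) / len(scores)  # 6.0

# Try several candidate centers; the sum is smallest at the mean.
for center in [4, 5, 6, 7, 8]:
    print(center, sum_of_squares(scores, center))
```

The sums come out as 78, 66, 62, 66, 78: moving the center away from the mean in either direction makes the sum of squares larger.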
3. Mean, Median, Mode:
Mode: The most common score: the value with the highest frequency. There may
be one mode, no modes, or multiple modes.
Median: The middle score, or the midpoint between the two middle scores, if
there is an even number of scores. It divides the pile of scores in half, so half the
scores are above the median and half the scores are below the median.
Mean: The mean is the average, the sum of all scores divided by the number of
scores.
In symbols: M = ΣX / N
NOTE: µ (mu) is also commonly used to denote the population mean.
In words and symbols: The mean M is equal to the sum of all scores (ΣX)
divided by the number of scores (N).
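A quick sketch of the three measures in Python, using the standard library's statistics module. The scores are made up for the example.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical scores, N = 6

mean = sum(scores) / len(scores)      # sum of all scores divided by N
median = statistics.median(scores)    # midpoint of the two middle scores (3 and 5)
modes = statistics.multimode(scores)  # list of every most-common value

print(mean)    # 5.0
print(median)  # 4.0
print(modes)   # [3]
```

multimode returns a list because a distribution can have more than one mode.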
How mean, median and mode are related:
Unimodal symmetric distribution:
Mean = Median = Mode
Skewed distribution: What gets skewed, mainly, is the MEAN.
Median and mode are more resistant.
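To see the resistance of the median and mode, compare a symmetric set of scores with the same set after one score is stretched into a long tail. The numbers are invented for illustration.

```python
import statistics

symmetric = [1, 2, 3, 3, 3, 4, 5]
skewed = [1, 2, 3, 3, 3, 4, 40]  # same scores, but the top one pulled far out

for name, scores in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name,
          "mean =", statistics.mean(scores),
          "median =", statistics.median(scores),
          "mode =", statistics.mode(scores))
```

In the symmetric set all three equal 3. In the skewed set the mean jumps to 8, while the median and mode stay at 3.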
WHEN TO USE Which?
Use MODE for nominal data.
Use Median to describe seriously skewed distributions
Most inferential tests use the mean; some use the median. None use the mode.
4. Variance and Standard Deviation
Variance (SD²): The average squared deviation from the mean. The more typical
symbol is σ² (sigma squared).
In symbols: SD² = Σ(X-M)² / N
Four steps for calculating the variance:
1. Subtract the mean from each score
2. Square each of these deviation scores
3. Add them up [sum of squares, or SS]
4. Divide by N
Standard deviation: Positive square root of the variance.
Two steps:
1. Calculate the variance
2. Take the square root
Example using the denominations of 4 cards (M = (1+5+6+12)/4 = 6)
Card | STEP 1 (deviation score) | STEP 2 (squared deviation) |
A(1) | 1-6 = -5 | 25 |
5 | 5-6 = -1 | 1 |
6 | 6-6 = 0 | 0 |
Q(12) | 12-6 = 6 | 36 |
STEP 3 (sum of squares): 25+1+0+36 = 62
STEP 4 (divide by N): 62/4 = 15.5
How to remember this?
Deviate, Square, Average DSA
SD = square root of SD²
Applying the two steps for the standard deviation to the card example:
STEP 1: Done. SD² = 15.5
STEP 2: Take the square root: √15.5 = 3.937
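The card example can be checked with a short Python sketch that follows the steps one at a time. The variable names are made up; the cross-check uses the standard library's population (divide-by-N) functions.

```python
import math
import statistics

cards = [1, 5, 6, 12]  # A, 5, 6, Q

mean = sum(cards) / len(cards)          # 6.0
deviations = [x - mean for x in cards]  # Deviate: subtract the mean
squared = [d ** 2 for d in deviations]  # Square each deviation
ss = sum(squared)                       # sum of squares (SS)
variance = ss / len(cards)              # Average: divide by N
sd = math.sqrt(variance)                # standard deviation

print(ss)            # 62.0
print(variance)      # 15.5
print(round(sd, 3))  # 3.937

# Cross-check against the library's population variance and SD.
print(math.isclose(variance, statistics.pvariance(cards)))
print(math.isclose(sd, statistics.pstdev(cards)))
```

Note that statistics.variance and statistics.stdev (without the "p") divide by N-1 instead of N, so they give different answers from the formula used here.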
NOTE: Knowing the mean tells us nothing about what the variance/standard
deviation will be, and vice versa.
Segment 2: Z-scores
Z-score: A standardized score that indicates where a score is in a distribution.
Z-scores help us know the relative value of a score: how many standard deviations it lies above or below the mean.
Formulas: Z = (X-M)/SD
Going backwards (find raw score from Z): X=(Z)(SD)+M
Find the Z-score of a Queen in terms of denomination.
What should we use for mean & standard deviation?
Knapp has calculated mean & sd for denomination:
M=7, sd = 3.742
Q= 12
Z = (X-M)/SD
Z= (12-7)/3.742 = 1.34
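The Queen example can be reproduced in Python. This sketch assumes the distribution is the 13 denominations A=1 through K=13, each equally frequent, and uses the population (divide-by-N) standard deviation. The function names are made up for illustration.

```python
import statistics

denominations = list(range(1, 14))  # A=1 ... K=13

M = statistics.mean(denominations)     # 7
SD = statistics.pstdev(denominations)  # about 3.742

def z_score(x, mean, sd):
    """Z = (X - M) / SD"""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Going backwards: X = (Z)(SD) + M"""
    return z * sd + mean

z_queen = z_score(12, M, SD)
print(round(SD, 3))       # 3.742
print(round(z_queen, 2))  # 1.34
print(round(raw_score(z_queen, M, SD), 2))  # 12.0
```

The last line goes backwards from the Z-score to recover the Queen's raw score of 12.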