Study Design

Statistics

Data:

Collecting
Presenting
Analyzing

Variability:

0, 0, 0, 0, 0
0, 255, 17, 5, 318
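
One way to make the contrast between these two data sets concrete is to measure their spread. The snippet below is a minimal sketch using `statistics.pstdev` (population standard deviation) from the Python standard library; the variable names are illustrative and not part of the original notes.

```python
from statistics import pstdev

no_variability = [0, 0, 0, 0, 0]
with_variability = [0, 255, 17, 5, 318]

# A standard deviation of zero means every value is identical.
print(pstdev(no_variability))

# A positive standard deviation means the values differ from one another.
print(pstdev(with_variability))
```

The first data set has zero spread, so there is nothing to investigate; the second has variability worth describing.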

Statistical Question:

To answer a statistical question, we need to collect data with variability.

Correlation:

the state or relation of being correlated; specifically : a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone
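
The definition above is qualitative. One common way to quantify how strongly two numerical variables vary together is the Pearson correlation coefficient, which is only one of several possible measures. The sketch below implements it by hand; the function name `pearson_r` and the data are hypothetical, made up purely for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, invented for this example only.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 71, 80, 88]

r = pearson_r(hours_studied, test_score)
print(round(r, 3))
```

A value of `r` close to 1 or -1 indicates a strong linear association. On its own, it says nothing about whether one variable causes the other.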

Causality:

the relation between a cause and its effect or between regularly correlated events or phenomena

Sampling methods review

In a statistical study, sampling methods refer to how we select members from the population to be in the study. If a sample isn't randomly selected, it will probably be biased in some way, and the data may not be representative of the population. There are many ways to select a sample, some good and some bad.

Bad ways to sample

Convenience sample: The researcher chooses a sample that is readily available in some non-random way.

Example—A researcher polls people as they walk by on the street.

Why it's probably biased: The location and time of day and other factors may produce a biased sample of people.

Voluntary response sample: The researcher puts out a request for members of a population to join the sample, and people decide whether or not to be in the sample.

Example—A TV show host asks his viewers to visit his website and respond to an online poll.

Why it's probably biased: People who take the time to respond tend to have stronger opinions than the rest of the population.
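
The effect of voluntary response can be illustrated with a small simulation. This is a sketch under stated assumptions, not real data: the population, the opinion scale from -5 to +5, and the rule that people with stronger opinions are more likely to respond are all invented for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: opinion scores from -5 (strongly against)
# to +5 (strongly in favor).
population = [rng.randint(-5, 5) for _ in range(10_000)]

def responds(opinion):
    # Assumption: the chance of responding grows with opinion strength.
    return rng.random() < abs(opinion) / 5

volunteers = [o for o in population if responds(o)]

def mean_strength(opinions):
    """Average absolute opinion score, a rough measure of intensity."""
    return sum(abs(o) for o in opinions) / len(opinions)

print(mean_strength(population))
print(mean_strength(volunteers))
```

Under these assumptions the volunteers show a noticeably higher average opinion strength than the population they came from, which is the bias described above.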

Good ways to sample

Simple random sample: Every member of the population, and every set of members of a given size, has an equal chance of being included in the sample. A chance process, such as a random number generator or other technology, is needed to get a simple random sample.

Example—A teacher puts students' names in a hat and chooses without looking to get a sample of students.

Why it's good: Random samples are usually fairly representative since they don't favor certain members.
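
The names-in-a-hat procedure can be sketched with `random.sample` from the Python standard library, which draws without replacement. The roster below is hypothetical.

```python
import random

# Hypothetical class roster.
students = ["Ana", "Ben", "Chloe", "Dev", "Elena", "Farid", "Grace", "Hugo"]

# random.sample draws without replacement, so no student is picked twice
# and every set of 3 names is equally likely.
sample = random.sample(students, k=3)
print(sample)
```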

Stratified random sample: The population is first split into groups. The overall sample consists of some members from every group. The members from each group are chosen randomly.

Example—A student council surveys 100 students by getting random samples of 25 freshmen, 25 sophomores, 25 juniors, and 25 seniors.

Why it's good: A stratified sample guarantees that members from each group will be represented in the sample, so this sampling method is good when we want some members from every group.
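
A stratified random sample like the student council example might be sketched as follows. The population of 200 student IDs per grade and the helper name `stratified_sample` are assumptions made for illustration.

```python
import random

# Hypothetical school: 200 student IDs per grade level.
grades = ["freshman", "sophomore", "junior", "senior"]
population = {g: [f"{g}-{i}" for i in range(200)] for g in grades}

def stratified_sample(strata, per_group):
    """Draw a simple random sample of per_group members from every stratum."""
    return {name: random.sample(members, per_group)
            for name, members in strata.items()}

sample = stratified_sample(population, per_group=25)

# Every grade contributes exactly 25 students, 100 in total.
total = sum(len(members) for members in sample.values())
print(total)
```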

Cluster random sample: The population is first split into groups. The overall sample consists of every member from some of the groups. The groups are selected at random.

Example—An airline company wants to survey its customers one day, so they randomly select 5 flights that day and survey every passenger on those flights.

Why it's good: A cluster sample gets every member from some of the groups, so it's good when each group reflects the population as a whole.
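
For contrast with the stratified case, a cluster random sample selects whole groups at random and keeps everyone in them. The sketch below mirrors the airline example; the flight names, passenger counts, and the helper `cluster_sample` are hypothetical.

```python
import random

# Hypothetical day of flights: 30 flights with 50 to 180 passengers each.
flights = {
    f"flight-{n}": [f"flight-{n}-passenger-{p}"
                    for p in range(random.randint(50, 180))]
    for n in range(30)
}

def cluster_sample(clusters, n_clusters):
    """Pick n_clusters groups at random and keep every member of each."""
    chosen = random.sample(list(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

surveyed = cluster_sample(flights, n_clusters=5)
print(sorted(surveyed))
```

Note that the randomness is applied to the flights, not to individual passengers: once a flight is chosen, all of its passengers are surveyed.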

Systematic random sample: Members of the population are put in some order. A starting point is selected at random, and every $n^{\text{th}}$ member is selected to be in the sample.

Example—A principal takes an alphabetized list of student names and picks a random starting point. Every $20^{\text{th}}$ student is selected to take a survey.
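
A systematic random sample can be sketched with list slicing. This version assumes the random starting point is chosen among the first `step` positions of the ordered list, which is one common variant; the roster of 400 students and the helper name are hypothetical.

```python
import random

def systematic_sample(ordered_members, step):
    """Pick a random start within the first `step` positions, then every step-th member."""
    start = random.randrange(step)
    return ordered_members[start::step]

# Hypothetical alphabetized roster of 400 students.
roster = sorted(f"student-{i:03d}" for i in range(400))

sample = systematic_sample(roster, step=20)
print(len(sample))  # 20, since 400 / 20 = 20 for any start below 20
```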

Note: In the real world, we can't ethically take a random sample of people and make them participate in a study involving drugs. However, there are more advanced methods for controlling for this type of selection bias. When we rely on volunteers for testing new drugs and we see significant results, we need to be willing to assume that the volunteers are representative of the larger population. We can also repeat the study on a different group of volunteers to see if we get the same results.

Key idea: If a sample isn't randomly selected, it may not be representative of the larger population. On the AP test, be ready to apply this concept, with some nuance, when discussing whether a sample is representative of the larger population.

Summary

The table below summarizes what type of conclusions we can make based on the study design.

| | Random sampling | Not random sampling |
| --- | --- | --- |
| Random assignment | Can determine causal relationship in population. This design is relatively rare in the real world. | Can determine causal relationship in that sample only. This design is where most experiments would fit. |
| No random assignment | Can detect relationships in population, but cannot determine causality. This design is where many surveys and observational studies would fit. | Can detect relationships in that sample only, but cannot determine causality. This design is where many unscientific surveys and polls would fit. |
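
The four cases in this summary can also be written as a small decision function. This is only an illustrative sketch: the function name `conclusion` and its two boolean parameters are hypothetical, and the returned wording paraphrases the summary.

```python
def conclusion(random_sampling: bool, random_assignment: bool) -> str:
    """Return the kind of conclusion a study design supports."""
    # Random sampling decides how far the result generalizes.
    scope = "the population" if random_sampling else "that sample only"
    # Random assignment decides whether causality can be claimed.
    if random_assignment:
        return f"Can determine a causal relationship in {scope}."
    return f"Can detect relationships in {scope}, but cannot determine causality."

# A typical experiment on volunteers: random assignment, no random sampling.
print(conclusion(random_sampling=False, random_assignment=True))
```

Writing it this way highlights that the two design choices answer separate questions: sampling controls generalization, and assignment controls causal claims.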


Last updated 7 years ago
