Populations and Samples

Collecting data correctly is just as important as analyzing it accurately. Suppose we want to determine the average tail length of Golden Retrievers. How would we collect this data? Measuring the tail of every Golden Retriever would be nearly impossible. Instead, it makes more sense to measure the tails of a smaller group of Golden Retrievers and use their average as an estimate of the average tail length for the entire breed.
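The idea can be sketched with a short simulation. Everything here is an assumption made for illustration: the 10,000 "dogs" and their tail lengths (drawn from a normal distribution with a made-up mean of 30 cm and standard deviation of 3 cm) are synthetic, not real breed measurements. The sketch only shows that the mean of a random sample of 200 tails lands close to the mean of the whole simulated population.

```python
import random
import statistics

# Hypothetical population: tail lengths (cm) of 10,000 simulated Golden Retrievers.
# The mean of 30 cm and standard deviation of 3 cm are invented for illustration.
random.seed(42)
population = [random.gauss(30.0, 3.0) for _ in range(10_000)]

# Measuring every tail is impractical, so draw a simple random sample
# of 200 dogs without replacement.
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f} cm")
print(f"sample mean (estimate): {sample_mean:.2f} cm")
```

Running it with different seeds gives slightly different sample means. That variation from one sample to the next is why a sample average is an estimate rather than the exact population value.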

To draw conclusions about a complete collection of individuals, called the population, we study a subset of that population, called a sample.

The purpose of collecting the data also determines this distinction. For example, data collected from all Golden Retrievers in California are sample data when we use them to represent a larger group, such as all Golden Retrievers in the United States, but they are population data if we only want to describe the Golden Retrievers in California.
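One way to make this concrete is to treat both the collected data and the group we want to describe as sets of individuals. In the sketch below, the dog IDs and the helper `role_of` are hypothetical, invented only for illustration: the same California data is classified as a population when the target is California and as a sample when the target is the whole United States.

```python
# Hypothetical dog IDs, invented for illustration only.
california_dogs = {"CA-001", "CA-002", "CA-003"}
other_state_dogs = {"TX-001", "NY-001", "WA-001"}
us_dogs = california_dogs | other_state_dogs


def role_of(data: set[str], target: set[str]) -> str:
    """Hypothetical helper: classify collected data relative to the target population."""
    if data == target:
        return "population"  # every individual of interest was observed
    if data < target:
        return "sample"  # proper subset: some individuals were not observed
    raise ValueError("data includes individuals outside the target population")


print(role_of(california_dogs, california_dogs))  # population
print(role_of(california_dogs, us_dogs))  # sample
```

The data never changes between the two calls; only the target population does, which is exactly what decides whether the data counts as a sample or a population.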

| Population | Sample |
|---|---|
| GDP of each country in the world | GDP of each country in Asia |
| All undergraduate students in the United States | Undergraduate students at Harvard University |
| Every earthquake worldwide in the past 100 years | The 20 strongest earthquakes recorded in that period |
| Patients with diabetes in a particular hospital | 50 diabetic patients from the hospital chosen for a clinical trial |

Notice that every observation in a sample is also part of the corresponding population.

Population Parameters and Sample Statistics

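From our data we can compute numerical summaries that describe where the values are centered (measures of center, such as the mean or median) and how widely they vary (measures of spread, such as the range or standard deviation).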
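Python's standard-library `statistics` module can compute common measures of center and spread. The sketch below uses a small made-up data set purely to show the calculations. Note that `statistics.pstdev` and `statistics.stdev` differ in whether they treat the data as an entire population (dividing by n) or as a sample (dividing by n - 1), which mirrors the population/sample distinction above.

```python
import statistics

# A small made-up data set used only to illustrate the calculations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of center
mean = statistics.mean(data)  # 5
median = statistics.median(data)  # 4.5, the average of the two middle values

# Measures of spread
data_range = max(data) - min(data)  # 7
pop_sd = statistics.pstdev(data)  # data treated as the whole population: 2.0
sample_sd = statistics.stdev(data)  # data treated as a sample: about 2.14

print(f"mean={mean}, median={median}, range={data_range}")
print(f"population SD={pop_sd:.3f}, sample SD={sample_sd:.3f}")
```

The sample standard deviation comes out slightly larger because dividing by n - 1 instead of n compensates for the fact that a sample tends to understate the spread of the population it came from.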