How to Explain Statistics to Eighth Graders

Master strategies for teaching statistics to 13- and 14-year-olds. Learn clear methods for scatter plots, lines of best fit, two-way tables, and analyzing relationships in bivariate data.

Mathify Team

"Does studying more actually lead to better grades?"

To answer questions like this, we need to look at data from TWO variables at once. Eighth grade statistics introduces bivariate data analysis—the foundation for understanding relationships, making predictions, and separating correlation from causation.

Why Statistics Matters

Statistical thinking is essential for:

  • Understanding scientific studies
  • Evaluating news and media claims
  • Making informed decisions
  • College-level research
  • Business and economics
  • Being an informed citizen

Bivariate Data

What Is It?

Bivariate data involves two variables measured on the same individuals or items.

Examples:

  • (height, shoe size) for students
  • (study hours, test score) for a class
  • (age, income) for workers
  • (temperature, ice cream sales) for days

Displaying with Ordered Pairs

Each data point is an ordered pair (x, y):

Student 1: (2 hours studying, 75% score)
Student 2: (4 hours studying, 88% score)
Student 3: (1 hour studying, 68% score)

Scatter Plots

What Is a Scatter Plot?

A scatter plot displays bivariate data as points on a coordinate plane.

  • x-axis: Independent variable (what you control or measure first)
  • y-axis: Dependent variable (what you think might be affected)

Creating a Scatter Plot

Data: Study Hours vs. Test Scores

Hours (x)    Score (y)
1            65
2            70
2            75
3            78
4            85
5            88
5            92
6            95

Score
  |
95+                 *
  |              *
90+
  |              *
85+           *
  |
80+
  |        *
75+     *
  |
70+     *
  |
65+  *
  +--+--+--+--+--+--+--
     1  2  3  4  5  6  Hours
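To reproduce a plot like this on a computer, the following is a minimal Python sketch that uses only the standard library. The function name `text_scatter` and the 5-point row height are assumptions made for illustration; in practice a plotting library would usually do this job.

```python
# Study hours (x) and test scores (y) from the table above.
data = [(1, 65), (2, 70), (2, 75), (3, 78), (4, 85), (5, 88), (5, 92), (6, 95)]

def text_scatter(points, y_step=5):
    """Return a rough text scatter plot as a list of lines.

    Each row covers y_step score points. A '*' marks any whole-number
    x value that has at least one data point in that row's range.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_values = range(min(xs), max(xs) + 1)
    y_low = (min(ys) // y_step) * y_step    # label of the lowest row
    y_high = (max(ys) // y_step) * y_step   # label of the highest row
    lines = []
    for level in range(y_high, y_low - 1, -y_step):
        cells = []
        for x in x_values:
            hit = any(px == x and level <= py < level + y_step
                      for px, py in points)
            cells.append("*" if hit else " ")
        lines.append(f"{level:>3} | " + "  ".join(cells))
    # Horizontal axis and x labels, aligned with the columns above.
    lines.append("    +-" + "--".join("-" for _ in x_values))
    lines.append("      " + "  ".join(str(x) for x in x_values))
    return lines

print("\n".join(text_scatter(data)))
```

Because each row groups scores into 5-point bands, the picture is coarser than a hand-drawn plot, but the upward trend is still visible.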

Reading Scatter Plots

Ask: "As x increases, what happens to y?"

  • Goes up? → Positive relationship
  • Goes down? → Negative relationship
  • No pattern? → No relationship
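The question "as x increases, what happens to y?" can also be asked of every pair of points. The sketch below is one simple way to do that in Python; the pair-counting rule and the function name `direction` are illustrative assumptions rather than a standard classroom method, and the rule says nothing about how strong the relationship is.

```python
from itertools import combinations

def direction(points):
    """Classify direction by comparing every pair of points.

    A pair 'agrees' when the point with the larger x also has the larger y,
    and 'disagrees' when it has the smaller y. Ties are ignored.
    """
    agree = disagree = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            agree += 1
        elif product < 0:
            disagree += 1
    if agree > disagree:
        return "positive"
    if disagree > agree:
        return "negative"
    return "no clear direction"

# Study hours vs. test scores from the table above.
study = [(1, 65), (2, 70), (2, 75), (3, 78), (4, 85), (5, 88), (5, 92), (6, 95)]
print(direction(study))  # positive
```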

Types of Associations

Positive Association

As x increases, y tends to increase.

      y
      |         *
      |       *
      |     *
      |   *
      | *
      +----------x

Examples:

  • Height and weight
  • Study time and grades
  • Age and vocabulary size

Negative Association

As x increases, y tends to decrease.

      y
      |*
      |  *
      |    *
      |      *
      |        *
      +----------x

Examples:

  • Car age and value
  • Elevation and temperature
  • Price and quantity demanded

No Association

No clear pattern between x and y.

      y
      |  *    *
      |    *    *
      | *    *
      |   *  *
      |*    *
      +----------x

Examples:

  • Shoe size and IQ
  • Birthday month and height
  • House number and income

Strength of Association

Strong Association

Points cluster tightly along a line.

      y
      |       *
      |     **
      |   **
      | **
      |*
      +----------x

Weak Association

Points show a general trend but are spread out.

      y
      |    *  *
      |  *   *
      | *  *
      |*  *
      |  *
      +----------x

Describing Associations

Use three descriptors:

  1. Direction: Positive, negative, or none
  2. Form: Linear or nonlinear
  3. Strength: Strong, moderate, or weak

Example: "There is a strong, positive, linear association between study hours and test scores."

Line of Best Fit

What Is It?

The line of best fit (trend line) is a straight line that best represents the data in a scatter plot.

Characteristics

  • Roughly equal numbers of points above and below
  • Minimizes overall distance from points to line
  • Shows the general trend

Drawing by Eye

Score
  |
95+                 *
  |              * /
90+               /
  |             /*
85+           */
  |           /
80+          /
  |        *
75+     * /
  |      /
70+    /*
  |   /
65+  *
  +--+--+--+--+--+--+--
     1  2  3  4  5  6  Hours

Writing the Equation

The line of best fit has the form y = mx + b.

To find it:

  1. Identify two points ON the line (not necessarily data points)
  2. Calculate slope: m = (y₂ - y₁)/(x₂ - x₁)
  3. Use one point to find b

Example: Points on line: (1, 65) and (6, 95)

m = (95 - 65)/(6 - 1) = 30/5 = 6

Using (1, 65):
65 = 6(1) + b
b = 59

Equation: y = 6x + 59
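The three steps above can be checked with a few lines of Python. This is a minimal sketch; the function name `line_through` is an assumption made for illustration.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    m = (y2 - y1) / (x2 - x1)   # step 2: slope
    b = y1 - m * x1             # step 3: solve y1 = m*x1 + b for b
    return m, b

m, b = line_through((1, 65), (6, 95))
print(f"y = {m:g}x + {b:g}")  # y = 6x + 59
```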

Interpreting the Equation

y = 6x + 59

  • Slope (6): For each additional hour of study, the predicted test score increases by about 6 points.
  • Y-intercept (59): The predicted score with 0 hours of study is 59 points.

Making Predictions

Interpolation vs. Extrapolation

Interpolation: Predicting within the data range (more reliable)
Extrapolation: Predicting beyond the data range (less reliable)

Example Predictions

Using y = 6x + 59:

"Predict the score for 3.5 hours of study." (Interpolation)

y = 6(3.5) + 59 = 21 + 59 = 80
Predicted score: 80%

"Predict the score for 10 hours of study." (Extrapolation)

y = 6(10) + 59 = 60 + 59 = 119%

This is unrealistic! A score above 100% is impossible. Extrapolating far beyond the data range (here, 1 to 6 hours) can give nonsensical results.
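A short Python sketch can make the distinction explicit by labeling each prediction. The function name `predict` and its default parameters are assumptions for illustration: the slope and intercept come from y = 6x + 59, and the range 1 to 6 is the span of study hours in the data table.

```python
def predict(hours, slope=6, intercept=59, x_min=1, x_max=6):
    """Predict a score and say whether the input lies inside the data range."""
    score = slope * hours + intercept
    kind = "interpolation" if x_min <= hours <= x_max else "extrapolation"
    return score, kind

print(predict(3.5))  # (80.0, 'interpolation')
print(predict(10))   # (119, 'extrapolation')
```

Students can treat the "extrapolation" label as a warning to ask whether the prediction still makes sense.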

Correlation vs. Causation

Correlation

Two variables are correlated if they show an association—they change together.

Causation

Causation means changes in one variable DIRECTLY cause changes in the other.

The Crucial Difference

Correlation does NOT prove causation!

Example 1: Ice cream sales and drowning deaths

  • Correlation: Both increase in summer
  • Causation? No! Hot weather causes both.

Example 2: Shoe size and reading ability in children

  • Correlation: Larger shoes, better reading
  • Causation? No! Age affects both.

Third Variables

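A lurking variable (confounding variable) is a third variable that affects both of the variables being studied. It can create a correlation between two variables that have no direct effect on each other.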

Lurking Variable
      /    \
     /      \
    ↓        ↓
Variable A  Variable B
(appear correlated)

Two-Way Tables

What Are They?

Two-way tables (contingency tables) display data for two categorical variables.

Example: Sports and Gender

         Play Sports   Don't Play   Total
Boys          45           15         60
Girls         35           25         60
Total         80           40        120

Reading the Table

  • Joint frequencies: Inner cells (45, 15, 35, 25)
  • Marginal frequencies: Totals (60, 60, 80, 40, 120)

Calculating Relative Frequencies

Joint relative frequency: (cell / grand total)

  • Boys who play sports: 45/120 = 37.5%

Marginal relative frequency: (row or column total / grand total)

  • All who play sports: 80/120 = 66.7%

Conditional relative frequency: (cell / row or column total)

  • Of boys, what percent play sports? 45/60 = 75%
  • Of girls, what percent play sports? 35/60 = 58.3%
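The same calculations can be sketched in Python, which makes it easy to see which total goes in each denominator. The variable names below are illustrative assumptions; the counts are the ones in the sports table above.

```python
# Joint frequencies from the two-way table (rows: group, columns: answer).
counts = {
    "Boys":  {"Play Sports": 45, "Don't Play": 15},
    "Girls": {"Play Sports": 35, "Don't Play": 25},
}

# Marginal frequencies: row totals, column totals, and the grand total.
row_totals = {group: sum(row.values()) for group, row in counts.items()}
col_totals = {}
for row in counts.values():
    for answer, n in row.items():
        col_totals[answer] = col_totals.get(answer, 0) + n
grand_total = sum(row_totals.values())

joint = counts["Boys"]["Play Sports"] / grand_total                    # cell / grand total
marginal = col_totals["Play Sports"] / grand_total                     # column total / grand total
cond_boys = counts["Boys"]["Play Sports"] / row_totals["Boys"]         # cell / row total
cond_girls = counts["Girls"]["Play Sports"] / row_totals["Girls"]      # cell / row total

print(f"{joint:.1%}")       # 37.5%
print(f"{marginal:.1%}")    # 66.7%
print(f"{cond_boys:.1%}")   # 75.0%
print(f"{cond_girls:.1%}")  # 58.3%
```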

Looking for Associations

Compare conditional frequencies:

  • Boys: 75% play sports
  • Girls: 58.3% play sports

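Because the percentages differ (75% vs. 58.3%), the data suggest an association between gender and playing sports. If the two percentages were about the same, there would be no evidence of an association.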

Collecting Good Data

Random Sampling

A random sample gives every individual an equal chance of being selected.

Good: Randomly select 50 students from all grade levels
Bad: Survey only students in your friend group
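One way to carry out the "good" approach is to let a computer pick the names. The sketch below uses Python's standard `random` module; the roster of 400 student ID numbers and the seed value are hypothetical, chosen only so the example is repeatable.

```python
import random

# Hypothetical roster: ID numbers for 400 students across all grade levels.
roster = list(range(1, 401))

rng = random.Random(2024)           # fixed seed so the example is repeatable
sample = rng.sample(roster, k=50)   # 50 different students, each equally likely

print(len(sample), "students selected")
```

Because `sample` draws without replacement, no student can be picked twice.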

Bias

Bias is systematic error that makes results unrepresentative.

Types of bias:

  • Selection bias: Sample doesn't represent population
  • Response bias: Questions influence answers
  • Nonresponse bias: Some people don't respond

Sample Size

Larger samples generally give more reliable results, but the sample must be representative.

Hands-On Activities

Class Data Collection

Collect bivariate data from classmates:

  • Hand span vs. height
  • Commute time vs. distance to school
  • Hours of sleep vs. alertness rating

Create scatter plots and look for associations.

Prediction Competition

Give students scatter plots and have them:

  • Draw a line of best fit by eye
  • Write the equation
  • Make predictions
  • Compare accuracy
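To compare accuracy, students need a score for each line. One common choice is the mean absolute error: the average distance between each actual value and the line's prediction. The sketch below is an illustration, not the only way to judge a line. It reuses the study-hours data from earlier, and the second line, y = 5x + 62, is a hypothetical competing guess.

```python
# Study hours (x) and test scores (y) from the earlier table.
data = [(1, 65), (2, 70), (2, 75), (3, 78), (4, 85), (5, 88), (5, 92), (6, 95)]

def mean_absolute_error(points, slope, intercept):
    """Average vertical distance between the data and the line y = slope*x + intercept."""
    errors = [abs(y - (slope * x + intercept)) for x, y in points]
    return sum(errors) / len(errors)

print(mean_absolute_error(data, 6, 59))  # 1.5
print(mean_absolute_error(data, 5, 62))  # 2.5
```

The line with the smaller error (here y = 6x + 59) follows the data more closely.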

Correlation Hunting

Find real-world examples of:

  • Positive correlation
  • Negative correlation
  • Misleading correlations (correlation ≠ causation)

Two-Way Table Survey

Survey the class on two categorical questions:

  • "Do you prefer summer or winter?" + "Do you like ice cream?"
  • Create a two-way table
  • Look for associations

Data in the News

Find statistical claims in news articles:

  • What data was collected?
  • What associations were found?
  • Is causation claimed? Is it justified?

Common Mistakes and How to Fix Them

Mistake 1: Assuming Causation from Correlation

Wrong: "The data shows ice cream causes drowning."

Fix: Correlation only shows variables move together. Always consider other explanations and lurking variables.

Mistake 2: Extrapolating Too Far

Wrong: Using a study hours/grades equation to predict 20 hours of study.

Fix: Be cautious with predictions outside the data range. Interpolation within the range is more reliable; extrapolation far beyond it can give impossible results.

Mistake 3: Drawing Line Through All Points

Wrong: Connecting all data points like a connect-the-dots.

Fix: The line of best fit shows the TREND. It may not pass through any actual data points.

Mistake 4: Confusing Table Frequencies

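Wrong: "75% of athletes are boys" (when you actually calculated the percent of boys who are athletes). In the sports table above, 75% of boys play sports (45/60), but only 56.25% of the students who play sports are boys (45/80).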

Fix: Be precise about what's in the numerator and denominator. Label clearly.

Mistake 5: Ignoring Outliers

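Wrong: Drawing the line of best fit without checking for points that fall far from the overall pattern.

Fix: Identify outliers. They might be errors, or they might be important! Consider their effect on the line of best fit.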

Practice Ideas for Home

Personal Data Tracking

Track two variables for a week:

  • Sleep hours vs. mood rating
  • Screen time vs. homework time
  • Steps walked vs. energy level

Create scatter plots and look for patterns.

Sports Statistics

Find real data online:

  • NBA: Minutes played vs. points scored
  • Baseball: At-bats vs. hits
  • Analyze relationships

Consumer Research

Compare products:

  • Price vs. rating
  • Size vs. price
  • Look for value outliers

Survey Design

Create a survey:

  • Write unbiased questions
  • Identify target population
  • Plan for random sampling
  • Predict possible associations

Connecting to Future Concepts

Correlation Coefficient (r)

A number between -1 and 1 that measures the strength and direction of a linear correlation:

  • r = 1: Perfect positive linear correlation
  • r = -1: Perfect negative linear correlation
  • r = 0: No linear correlation
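For curious students, r can be computed directly from the data. The sketch below uses the standard Pearson formula and the study-hours data from earlier in this article; the function name `pearson_r` is an assumption for illustration, and this calculation is normally taught in later courses.

```python
from math import sqrt

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y never changes")
    return sxy / sqrt(sxx * syy)

# Study hours vs. test scores from the earlier table.
study = [(1, 65), (2, 70), (2, 75), (3, 78), (4, 85), (5, 88), (5, 92), (6, 95)]
print(round(pearson_r(study), 3))  # 0.985
```

A value this close to 1 matches the description "strong, positive, linear association."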

Regression Analysis

More sophisticated methods for finding and evaluating lines of best fit.
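One such method is least squares, which picks the line that minimizes the sum of squared vertical distances to the points. The sketch below applies the standard least-squares formulas to the study-hours data from earlier; the function name `least_squares` is an illustrative assumption, and the technique goes beyond what eighth graders are expected to do by hand.

```python
def least_squares(points):
    """Return (slope, intercept) of the least-squares line for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are the same, so no line can be fitted")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

# Study hours vs. test scores from the earlier table.
study = [(1, 65), (2, 70), (2, 75), (3, 78), (4, 85), (5, 88), (5, 92), (6, 95)]
slope, intercept = least_squares(study)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 5.95x + 60.16
```

The result is close to the by-eye line y = 6x + 59, which suggests that careful estimation by eye can land near the computed line for data like this.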

Multiple Variables

Analyzing relationships among three or more variables simultaneously.

Hypothesis Testing

Using statistics to test claims and make decisions with confidence levels.

Big Data and Machine Learning

Modern statistics analyzes massive datasets to find patterns and make predictions.

The Bottom Line

Eighth grade statistics introduces the fundamental skill of analyzing relationships between two variables. Whether using scatter plots, lines of best fit, or two-way tables, students learn to see patterns in data.

Key takeaways:

  • Scatter plots visualize bivariate data
  • Associations can be positive, negative, or none; strong or weak
  • Line of best fit summarizes the trend and enables predictions
  • Correlation is NOT causation—always look for alternative explanations
  • Two-way tables show relationships between categorical variables

These skills are essential for understanding research, evaluating claims, and making informed decisions. In a world overflowing with data, statistical literacy isn't optional—it's essential for every informed citizen.

Frequently Asked Questions

What's the difference between correlation and causation?
Correlation means two variables change together: when one increases, the other tends to increase (or decrease). Causation means one variable directly causes the other to change. Ice cream sales and drowning deaths are correlated (both increase in summer), but neither causes the other; a third variable (hot weather) affects both. Always be cautious about assuming causation from correlation.

How do students know if a correlation is strong or weak?

Look at how tightly clustered the points are around the line of best fit. If points are very close to the line, the correlation is strong. If points are scattered loosely, it's weak. No correlation means the points show no pattern at all; they're randomly scattered.

What makes eighth grade statistics different from earlier grades?

Eighth grade focuses on bivariate data (two variables) and their relationships, including scatter plots, line of best fit, and two-way tables. Earlier grades focus on one-variable statistics (mean, median, mode) and single-variable displays (bar graphs, histograms). The key new skill is analyzing how two quantities relate to each other.
