How to Explain Statistics to Eighth Graders
Master strategies for teaching statistics to 13- and 14-year-olds. Learn clear methods for scatter plots, lines of best fit, two-way tables, and analyzing relationships in bivariate data.
Mathify Team
"Does studying more actually lead to better grades?"
To answer questions like this, we need to look at data from TWO variables at once. Eighth grade statistics introduces bivariate data analysis—the foundation for understanding relationships, making predictions, and separating correlation from causation.
Why Statistics Matters
Statistical thinking is essential for:
- Understanding scientific studies
- Evaluating news and media claims
- Making informed decisions
- College-level research
- Business and economics
- Being an informed citizen
Bivariate Data
What Is It?
Bivariate data involves two variables measured on the same individuals or items.
Examples:
- (height, shoe size) for students
- (study hours, test score) for a class
- (age, income) for workers
- (temperature, ice cream sales) for days
Displaying with Ordered Pairs
Each data point is an ordered pair (x, y):
Student 1: (2 hours studying, 75% score)
Student 2: (4 hours studying, 88% score)
Student 3: (1 hour studying, 68% score)
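In code, the same idea can be sketched as a list of (x, y) tuples. This is a minimal Python illustration; the variable names are assumptions made for the example.

```python
# Study hours (x) and test scores (y) for the three students listed above.
hours = [2, 4, 1]
scores = [75, 88, 68]

# zip pairs each x with the y measured on the same student.
pairs = list(zip(hours, scores))

for number, (x, y) in enumerate(pairs, start=1):
    print(f"Student {number}: ({x} hours studying, {y}% score)")
```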
Scatter Plots
What Is a Scatter Plot?
A scatter plot displays bivariate data as points on a coordinate plane.
- x-axis: Independent variable (what you control or measure first)
- y-axis: Dependent variable (what you think might be affected)
Creating a Scatter Plot
Data: Study Hours vs. Test Scores
| Hours (x) | Score (y) |
|---|---|
| 1 | 65 |
| 2 | 70 |
| 2 | 75 |
| 3 | 78 |
| 4 | 85 |
| 5 | 88 |
| 5 | 92 |
| 6 | 95 |
[Figure: scatter plot of Score (vertical axis, 65 to 95) against Hours (horizontal axis, 1 to 6), with one point for each row of the table above.]
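For classes that want to generate the plot instead of drawing it, here is a rough sketch in plain Python. It assumes a text-only display and groups scores into 5-point bands, so it is an approximation of a real scatter plot; the function name `text_scatter` and its parameters are made up for this example.

```python
# Data from the Study Hours vs. Test Scores table: (hours, score) pairs.
data = [(1, 65), (2, 70), (2, 75), (3, 78), (4, 85), (5, 88), (5, 92), (6, 95)]

def text_scatter(points, y_min=65, y_max=95, y_step=5, x_min=1, x_max=6):
    """Return a rough text scatter plot, one row per band of y values."""
    lines = []
    for top in range(y_max, y_min - 1, -y_step):
        # This row covers scores from `top` up to, but not including, top + y_step.
        row = ""
        for x in range(x_min, x_max + 1):
            hit = any(px == x and top <= py < top + y_step for px, py in points)
            row += " * " if hit else "   "
        lines.append(f"{top:>3} |{row}")
    # Horizontal axis and its labels.
    lines.append("    +" + "---" * (x_max - x_min + 1))
    lines.append("     " + "".join(f" {x} " for x in range(x_min, x_max + 1)) + " Hours")
    return "\n".join(lines)

plot = text_scatter(data)
print(plot)
```

Because scores are banded, two points that share an hour value and fall in the same band would print as a single star. That does not happen with this data set, but graph paper or a plotting tool gives a more faithful picture.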
Reading Scatter Plots
Ask: "As x increases, what happens to y?"
- Goes up? → Positive relationship
- Goes down? → Negative relationship
- No pattern? → No relationship
Types of Associations
Positive Association
As x increases, y tends to increase.
[Figure: scatter plot with points rising from lower left to upper right.]
Examples:
- Height and weight
- Study time and grades
- Age and vocabulary size
Negative Association
As x increases, y tends to decrease.
[Figure: scatter plot with points falling from upper left to lower right.]
Examples:
- Car age and value
- Elevation and temperature
- Price and quantity demanded
No Association
No clear pattern between x and y.
[Figure: scatter plot with points spread across the plane in no visible pattern.]
Examples:
- Shoe size and IQ
- Birthday month and height
- House number and income
Strength of Association
Strong Association
Points cluster tightly along a line.
[Figure: scatter plot with points packed closely around a straight line.]
Weak Association
Points show a general trend but are spread out.
[Figure: scatter plot with points loosely spread around a general trend.]
Describing Associations
Use three descriptors:
- Direction: Positive, negative, or none
- Form: Linear or nonlinear
- Strength: Strong, moderate, or weak
Example: "There is a strong, positive, linear association between study hours and test scores."
Line of Best Fit
What Is It?
The line of best fit (trend line) is a straight line that best represents the data in a scatter plot.
Characteristics
- Roughly equal numbers of points above and below
- Minimizes overall distance from points to line
- Shows the general trend
Drawing by Eye
[Figure: the Study Hours vs. Test Scores scatter plot with a straight trend line drawn by eye through the middle of the points, rising from lower left to upper right.]
Writing the Equation
The line of best fit has the form y = mx + b.
To find it:
- Identify two points ON the line (not necessarily data points)
- Calculate slope: m = (y₂ - y₁)/(x₂ - x₁)
- Use one point to find b
Example: Points on line: (1, 65) and (6, 95)
m = (95 - 65)/(6 - 1) = 30/5 = 6
Using (1, 65):
65 = 6(1) + b
b = 59
Equation: y = 6x + 59
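The same two-point calculation can be checked with a short Python sketch. The helper name `line_through` is an assumption made for this example, not part of any library.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("A vertical line has no slope-intercept form.")
    m = (y2 - y1) / (x2 - x1)  # slope: rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b

# The two points chosen on the line in the example above.
m, b = line_through((1, 65), (6, 95))
print(f"y = {m:g}x + {b:g}")  # prints: y = 6x + 59
```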
Interpreting the Equation
y = 6x + 59
- Slope (6): For each additional hour of study, the predicted test score increases by about 6 points.
- Y-intercept (59): The predicted score with 0 hours of study is 59 points.
Making Predictions
Interpolation vs. Extrapolation
Interpolation: Predicting within the data range (more reliable)
Extrapolation: Predicting beyond the data range (less reliable)
Example Predictions
Using y = 6x + 59:
"Predict the score for 3.5 hours of study." (Interpolation)
y = 6(3.5) + 59 = 21 + 59 = 80
Predicted score: 80%
"Predict the score for 10 hours of study." (Extrapolation)
y = 6(10) + 59 = 60 + 59 = 119
Predicted score: 119%
This is unrealistic: a score cannot exceed 100%. Extrapolating beyond the data can give nonsensical results.
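One way to make the distinction concrete is to have the code label each prediction. The sketch below assumes the equation y = 6x + 59 and the study hours from the data table; the function names `predict` and `prediction_type` are invented for illustration.

```python
# Study hours observed in the Study Hours vs. Test Scores table.
OBSERVED_HOURS = [1, 2, 2, 3, 4, 5, 5, 6]

def predict(hours, m=6, b=59):
    """Predicted score from the by-eye line y = 6x + 59."""
    return m * hours + b

def prediction_type(hours, observed=OBSERVED_HOURS):
    """Label a prediction by whether x falls inside the observed range."""
    if min(observed) <= hours <= max(observed):
        return "interpolation"
    return "extrapolation"

for h in (3.5, 10):
    print(f"{h} hours -> {predict(h):g} ({prediction_type(h)})")
```

The 10-hour prediction comes out above 100, which is a useful prompt for students to ask whether the line still makes sense that far from the data.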
Correlation vs. Causation
Correlation
Two variables are correlated if they show an association—they change together.
Causation
Causation means changes in one variable DIRECTLY cause changes in the other.
The Crucial Difference
Correlation does NOT prove causation!
Example 1: Ice cream sales and drowning deaths
- Correlation: Both increase in summer
- Causation? No! Hot weather causes both.
Example 2: Shoe size and reading ability in children
- Correlation: Larger shoes, better reading
- Causation? No! Age affects both.
Third Variables
A lurking variable (confounding variable) can create a correlation between two variables that do not directly affect each other.
[Diagram: arrows run from the lurking variable to both Variable A and Variable B, so A and B appear correlated.]
Two-Way Tables
What Are They?
Two-way tables (contingency tables) display data for two categorical variables.
Example: Sports and Gender
| | Play Sports | Don't Play | Total |
|---|---|---|---|
| Boys | 45 | 15 | 60 |
| Girls | 35 | 25 | 60 |
| Total | 80 | 40 | 120 |
Reading the Table
- Joint frequencies: Inner cells (45, 15, 35, 25)
- Marginal frequencies: Totals (60, 60, 80, 40, 120)
Calculating Relative Frequencies
Joint relative frequency: (cell / grand total)
- Boys who play sports: 45/120 = 37.5%
Marginal relative frequency: (row or column total / grand total)
- All who play sports: 80/120 = 66.7%
Conditional relative frequency: (cell / row or column total)
- Of boys, what percent play sports? 45/60 = 75%
- Of girls, what percent play sports? 35/60 = 58.3%
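The three kinds of relative frequency differ only in what goes in the denominator. A minimal Python sketch using the counts from the Sports and Gender table makes that explicit; the dictionary layout and variable names are assumptions made for this example.

```python
# Counts from the Sports and Gender table (inner cells only).
table = {
    "Boys": {"Play Sports": 45, "Don't Play": 15},
    "Girls": {"Play Sports": 35, "Don't Play": 25},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Joint relative frequency: one cell divided by the grand total.
joint_boys_sports = table["Boys"]["Play Sports"] / grand_total

# Marginal relative frequency: a column total divided by the grand total.
sports_total = sum(row["Play Sports"] for row in table.values())
marginal_sports = sports_total / grand_total

# Conditional relative frequency: a cell divided by its own row total.
conditional = {
    group: row["Play Sports"] / sum(row.values())
    for group, row in table.items()
}

print(f"Boys who play sports: {joint_boys_sports:.1%}")
print(f"All who play sports: {marginal_sports:.1%}")
for group, share in conditional.items():
    print(f"Of {group.lower()}, share who play sports: {share:.1%}")
```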
Looking for Associations
Compare conditional frequencies:
- Boys: 75% play sports
- Girls: 58.3% play sports
This suggests an association between gender and playing sports (not the same percentages).
Collecting Good Data
Random Sampling
A random sample gives every individual an equal chance of being selected.
Good: Randomly select 50 students from all grade levels
Bad: Survey only students in your friend group
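A quick way to draw such a sample is Python's standard `random.sample`, which picks without replacement and gives every item the same chance. The roster below is hypothetical: 300 made-up student ID numbers standing in for a real school list.

```python
import random

# Hypothetical roster: 300 student ID numbers covering all grade levels.
roster = list(range(1, 301))

# Draw 50 different students; each ID is equally likely to be chosen.
sample = random.sample(roster, k=50)

print(sorted(sample))
```

The result changes on every run, which is the point: no person decides who gets surveyed.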
Bias
Bias is systematic error that makes results unrepresentative.
Types of bias:
- Selection bias: Sample doesn't represent population
- Response bias: Questions influence answers
- Nonresponse bias: Some people don't respond
Sample Size
Larger samples generally give more reliable results, but the sample must be representative.
Hands-On Activities
Class Data Collection
Collect bivariate data from classmates:
- Hand span vs. height
- Commute time vs. distance to school
- Hours of sleep vs. alertness rating
Create scatter plots and look for associations.
Prediction Competition
Give students scatter plots and have them:
- Draw a line of best fit by eye
- Write the equation
- Make predictions
- Compare accuracy
Correlation Hunting
Find real-world examples of:
- Positive correlation
- Negative correlation
- Misleading correlations (correlation ≠ causation)
Two-Way Table Survey
Survey the class on two categorical questions:
- "Do you prefer summer or winter?" + "Do you like ice cream?"
- Create a two-way table
- Look for associations
Data in the News
Find statistical claims in news articles:
- What data was collected?
- What associations were found?
- Is causation claimed? Is it justified?
Common Mistakes and How to Fix Them
Mistake 1: Assuming Causation from Correlation
Wrong: "The data shows ice cream causes drowning."
Fix: Correlation only shows variables move together. Always consider other explanations and lurking variables.
Mistake 2: Extrapolating Too Far
Wrong: Using a study hours/grades equation to predict 20 hours of study.
Fix: Only interpolate within the data range. Extrapolation is unreliable and can give impossible results.
Mistake 3: Drawing Line Through All Points
Wrong: Connecting all data points like a connect-the-dots.
Fix: The line of best fit shows the TREND. It may not pass through any actual data points.
Mistake 4: Confusing Table Frequencies
Wrong: "75% of athletes are boys" (when you calculated % of boys who are athletes)
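Fix: Be precise about what's in the numerator and denominator, and label clearly. In the table above, 75% is 45/60, the share of boys who play sports. The share of sports players who are boys is 45/80 = 56.25%.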
Mistake 5: Ignoring Outliers
Fix: Identify outliers. They might be errors, or they might be important! Consider their effect on the line of best fit.
Practice Ideas for Home
Personal Data Tracking
Track two variables for a week:
- Sleep hours vs. mood rating
- Screen time vs. homework time
- Steps walked vs. energy level
Create scatter plots and look for patterns.
Sports Statistics
Find real data online:
- NBA: Minutes played vs. points scored
- Baseball: At-bats vs. hits
- Analyze relationships
Consumer Research
Compare products:
- Price vs. rating
- Size vs. price
- Look for value outliers
Survey Design
Create a survey:
- Write unbiased questions
- Identify target population
- Plan for random sampling
- Predict possible associations
Connecting to Future Concepts
Correlation Coefficient (r)
A numerical measure of correlation strength:
- r = 1: Perfect positive
- r = -1: Perfect negative
- r = 0: No linear correlation
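For teachers who want a preview, here is a sketch that computes r for the Study Hours vs. Test Scores data using the standard Pearson formula. The function name `pearson_r` is an assumption for this example, and the calculation goes beyond what eighth graders are expected to do by hand.

```python
from math import sqrt

# Data from the Study Hours vs. Test Scores table.
hours = [1, 2, 2, 3, 4, 5, 5, 6]
scores = [65, 70, 75, 78, 85, 88, 92, 95]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # prints: r = 0.985
```

A value this close to 1 matches the description "strong, positive, linear association" used earlier for this data.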
Regression Analysis
More sophisticated methods for finding and evaluating lines of best fit.
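One common method is least squares, which picks the line that minimizes the sum of squared vertical distances from the points. The sketch below applies it to the Study Hours vs. Test Scores data; the article does not say which regression method students will meet later, so treat least squares here as an illustrative choice, and `least_squares_line` as a made-up helper name.

```python
# Data from the Study Hours vs. Test Scores table.
hours = [1, 2, 2, 3, 4, 5, 5, 6]
scores = [65, 70, 75, 78, 85, 88, 92, 95]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(hours, scores)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # prints: y = 5.95x + 60.16
```

The result is close to the by-eye line y = 6x + 59, which suggests the hand-drawn estimate was a reasonable one.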
Multiple Variables
Analyzing relationships among three or more variables simultaneously.
Hypothesis Testing
Using statistics to test claims and make decisions with confidence levels.
Big Data and Machine Learning
Modern statistics analyzes massive datasets to find patterns and make predictions.
The Bottom Line
Eighth grade statistics introduces the fundamental skill of analyzing relationships between two variables. Whether using scatter plots, lines of best fit, or two-way tables, students learn to see patterns in data.
Key takeaways:
- Scatter plots visualize bivariate data
- Associations can be positive, negative, or none; strong or weak
- Line of best fit summarizes the trend and enables predictions
- Correlation is NOT causation—always look for alternative explanations
- Two-way tables show relationships between categorical variables
These skills underpin understanding research, evaluating claims, and making informed decisions. In a world overflowing with data, statistical literacy isn't optional; it's essential for every informed citizen.
Frequently Asked Questions
- What's the difference between correlation and causation?
- Correlation means two variables change together—when one increases, the other tends to increase (or decrease). Causation means one variable directly causes the other to change. Ice cream sales and drowning deaths are correlated (both increase in summer) but neither causes the other—a third variable (hot weather) affects both. Always be cautious about assuming causation from correlation.
- How do students know if a correlation is strong or weak?
- Look at how tightly clustered the points are around the line of best fit. If points are very close to the line, the correlation is strong. If points are scattered loosely, it's weak. No correlation means the points show no pattern at all—they're randomly scattered.
- What makes eighth grade statistics different from earlier grades?
- Eighth grade focuses on bivariate data (two variables) and their relationships, including scatter plots, line of best fit, and two-way tables. Earlier grades focus on one-variable statistics (mean, median, mode) and single-variable displays (bar graphs, histograms). The key new skill is analyzing how two quantities relate to each other.