
Comparing Groups: Why Boxplots are Your Friends


 

So often when our students compare groups, they use bar charts. This is for understandable reasons. We have shown them bar charts, and graphed data with them in bar charts, since PK (e.g., favorite colors, household pets). And it turns out that adults, too, often make bar charts…so students often see them outside of school settings as a common way to visualize data.

Bar charts absolutely have a place in our graphing tool boxes, but it turns out that by middle school our students in math classrooms across the country are learning about an unsung hero of comparing data. The boxplot, or box-and-whisker plot, regardless of what you call it, packs a major punch and can/should be used far more often in our graphing of data when comparing groups!

Before diving into the benefits of boxplots, let’s quickly norm set on what they are and how they function. A boxplot is a way to display the distribution of data based on a five-number summary of the data values: 

  • Middle line of the median = not the average, but where the middle value in the dataset falls. The median is less influenced by outliers compared to the mean/average and makes it easier to assess data symmetry.
  • Box of the first quartile (Q1) and third quartile (Q3) = the outer edges of the box are set by Q1 (the value that 25% of the data fall at or below) and Q3 (the value that 75% of the data fall at or below), so the box holds the middle 50% of the data values (aka where the bulk of data values are)…and the distance between the two outer edges of the box is called the Interquartile Range (IQR). The length of the box is indicative of the data's variability; a longer box suggests more variability, while a shorter box indicates less.
  • Whiskers of the minimum / lower extreme and maximum / upper extreme = in many graphing tools these are not necessarily the actual minimum or maximum value in the dataset. Instead, each whisker reaches to the most extreme data value that is no farther than 1.5 times the IQR below Q1 (lower extreme) or above Q3 (upper extreme)…the technical details of this are not as important as helping students conceptually understand that data values can fall outside of the whiskers. The whiskers provide a visual representation of the range of most of the data, helping to identify the overall spread and the presence of extreme values. They offer insights into how compact or dispersed the data are relative to the core.
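
For readers who want to see the arithmetic behind these pieces, here is a minimal Python sketch. It rests on several assumptions: quartiles are found with the "median of each half" method often taught in middle school (graphing tools may use slightly different quartile rules), whiskers follow the common convention of stopping at the most extreme data values within 1.5 times the IQR of the box, and the sample values and function names are invented for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max (median-of-halves quartiles)."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # values before the middle position
    upper_half = data[(n + 1) // 2:]   # values after the middle position
    return {
        "min": data[0],
        "q1": median(lower_half),
        "median": median(data),
        "q3": median(upper_half),
        "max": data[-1],
    }


def whiskers_and_outliers(values):
    """Apply the 1.5 x IQR rule to find whisker ends and outlying points."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    lower_fence = summary["q1"] - 1.5 * iqr
    upper_fence = summary["q3"] + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "iqr": iqr,
        "lower_whisker": min(inside),  # smallest value inside the fences
        "upper_whisker": max(inside),  # largest value inside the fences
        "outliers": outliers,
    }


# Hypothetical measurements, made up for this example.
sample = [4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 20]
print(five_number_summary(sample))
print(whiskers_and_outliers(sample))
```

In this made-up sample the value 20 lands beyond the upper whisker, so it would be drawn as its own point rather than stretching the whisker out to reach it.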

Therefore this visualization provides a comprehensive snapshot of the data's spread and center, making it easier to get a more nuanced understanding of the range, groupings, and gaps of measured data values for the group. This is critical for being able to identify patterns within and across groups…rather than just going by numerical differences in the average of a group alone.

Boxplots are particularly useful in conveying information about the distribution of data in a succinct manner. However, their exciting superpower for Science and Social Studies classrooms especially is how much they allow for quick comparisons between groups. You can compare the observed values of a numerical variable (e.g., cicada head width) between different groups from a categorical variable (e.g., sex) by creating boxplots for each group.

Using a boxplot to make these comparisons provides SO much more information (in digestible ways) to help students identify similarities and differences among the groups, as well as deviations from what they had expected to see in the patterns.

By including multiple essential features of the measured data values in a dataset, boxplots set our students up for more success in identifying different pieces of evidence from the data as part of their analysis. They also simplify the process of interpreting data to make a claim, because students have more to pull from and thus more information to use in their reasoning about the data.

 

Source: Cicada Sex Education dataset from DataClassroom.

Benefits of Using Boxplots to Make Comparisons

 

Clarity in Data Distribution

One of the primary advantages of boxplots is their ability to show rather than obscure the distribution of data. This feature is crucial for students who need to make sense of complex datasets as it relates to their subject area content. Boxplots provide a straightforward way to visualize differences in medians, the spread of the data, and the presence of outliers. Through this clarity, students can quickly grasp key statistical concepts without being overwhelmed by numbers. And thus the data analysis moves away from numerically calculating the difference in heights of bars, and instead into looking at the data overall as a full group in the aggregate. This is what is necessary to make sense of data and boxplots naturally lend themselves to this kind of thinking.

Data points that fall outside the whiskers are plotted as individual points and are typically treated as outliers. Understanding and identifying outliers is crucial in data analysis as they can significantly affect the interpretation of results. Therefore boxplots, especially those that include an overlay of the original data points (with graphing programs like Tuva), also assist students in looking at and making sense of data values far from the median. Typically students want to remove a data value that is different from others, just because it is different. Instead we need to help students learn to look at such values as part of the whole group, and consider what stories they may tell.

Leg Up in Comparing Groups

Boxplots also facilitate the comparison of multiple groups simultaneously. This capability is invaluable in educational settings where students often need to and/or want to compare data across categories. Much of our curriculum and students’ interests lie in comparing two or more groups (e.g., British and American sides of the Revolutionary War, before and after fertilizer was added in an experiment).

However, determining if those groups are functionally different from one another requires looking at the full group, not just the average (which can be extremely sensitive to data values far from the other values). Only looking at the heights of bars that represent average values by category gives you one piece of information to work with as evidence. The average may be indicative of the overall group, or it may be skewed or obscuring important information about the group.
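
To make this concrete, here is a small Python sketch with two invented groups whose averages are identical, so a bar chart of averages would show two bars of the same height. The plant heights, group names, and the describe helper are all hypothetical and chosen only to illustrate the point.

```python
from statistics import mean, median

# Hypothetical plant heights in cm, invented for illustration.
no_fertilizer = [10, 11, 12, 12, 13, 14]
with_fertilizer = [2, 3, 4, 5, 6, 52]


def describe(values):
    """Collect the average alongside values a boxplot would also reveal."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


for name, heights in [("no fertilizer", no_fertilizer),
                      ("with fertilizer", with_fertilizer)]:
    print(name, describe(heights))
```

Both groups average 12 cm, yet one group is tightly clustered around 12 while the other sits mostly between 2 and 6 with a single plant at 52. Side-by-side boxplots would show that difference at a glance; two equal bars would hide it.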

The most challenging aspect is that a reader has no way of knowing which is the case if they only have the bar heights of averages. When using a boxplot, they have a lot more information about the group of data to look at and use as evidence. By examining the boxplots, students can hypothesize reasons for the observed differences and explore potential factors influencing the results.

Introduction to Statistical Concepts

Boxplots can aid in teaching students how to interpret data. By highlighting the median and the interquartile range, students are exposed to different aspects of a dataset. This sets them up for success to visually and more easily compare the variability within a variable and among different categories of a variable. This visual representation helps to demystify statistical concepts (e.g., central tendency, variability), making them more approachable for students.

Furthermore, boxplots can serve as a foundation for more advanced data analysis discussions. Once students are comfortable with interpreting boxplots, educators can introduce additional statistical measures and concepts, building upon the foundational knowledge established through boxplot analysis. This step-by-step approach fosters a deeper understanding and appreciation of data analysis.

Identifying Misconceptions

Boxplots can help highlight common struggles or hiccups in students’ data interpretation. For instance, students might misconstrue the mean and the median as interchangeable measures of central tendency. Through boxplots, educators can visually demonstrate the differences between these metrics (through graphing programs like DataClassroom that let you plot both side-by-side). Also the boxplot helps to visually demonstrate and emphasize the importance of considering variability and outliers, not just the mean of a dataset. This visual distinction helps to reinforce the idea that different measures of central tendency and other summary statistics of a dataset can tell different stories about the variable.

Boxplots also enable educators to address misconceptions about variability. Students often struggle to understand the concept of variability and its implications. By using boxplots to illustrate variability through the interquartile range and whiskers, educators can provide a concrete visual representation of variability, aiding in student comprehension. It is hard for students to make sense of variability, let alone use it in their discussions of data, if they literally cannot see it in the data visualization.

Strategies for Teaching Boxplots

 

Hands-on Learning Activities

To foster a deeper understanding of boxplots, consider using hands-on activities that allow students to create their own boxplots, either from scratch or starting from a partially completed one. This approach not only enhances comprehension but also encourages critical thinking. Engaging students in the process of collecting and visualizing data empowers them to take ownership of their learning. This can be especially productive when using manipulatives like beads or plastic bricks to represent the data.

For example, as your students are recording daily air temperatures from the schoolyard for your Weather & Climate unit, have them construct boxplots to visualize the values from the week or month. Then they can make a boxplot for different months and see how the overall month’s daily air temperatures compare month-to-month.

This exercise will enable them to apply theoretical knowledge practically. By working with real-world data, students can see the relevance of statistical concepts and develop a more intuitive understanding of data analysis. Yes, typically we make bar charts of these data, but by using the boxplot to compare one month to the next you can help students reinforce their math concepts (e.g., how to make a boxplot) while giving them more information to compare similar but different groups (e.g., October to November).
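
If you want to check students' hand-built boxplots, a sketch like the following groups readings by month and computes each month's summary values. Everything here is an assumption for illustration: the temperature readings are invented, the function names are hypothetical, and the quartiles come from Python's statistics.quantiles with the "inclusive" method, which may differ slightly from the quartile rule your students or your graphing tool use.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical daily air temperatures in °C, invented for illustration.
readings = [
    ("October", 15), ("October", 12), ("October", 18), ("October", 14),
    ("October", 16), ("October", 19), ("October", 15),
    ("November", 9), ("November", 5), ("November", 12), ("November", 7),
    ("November", 8), ("November", 10), ("November", 9),
]


def summarize_by_group(records):
    """Group (label, value) pairs and compute boxplot values per label."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    summary = {}
    for label, values in groups.items():
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        summary[label] = {
            "min": min(values), "q1": q1, "median": med,
            "q3": q3, "max": max(values),
        }
    return summary


def boxes_overlap(a, b):
    """True if the two boxes (Q1 to Q3) share any part of the scale."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]


by_month = summarize_by_group(readings)
for month, stats in by_month.items():
    print(month, stats)
print("Boxes overlap:", boxes_overlap(by_month["October"], by_month["November"]))
```

Whether the boxes overlap is only a visual cue to prompt discussion, not a formal statistical test, but it gives students something specific to point to when they compare one month to the next.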

Encouraging Critical Thinking

Boxplots are more than just a visual tool; they are a way to promote critical thinking. Encourage students to question what the data representation tells them about the dataset. Ask them to consider why certain groups might have more variability or why outliers might exist. These questions stimulate analysis and critical evaluation, helping students to develop a nuanced understanding of the data.

Looking for ideas of what kinds of questions to ask with your boxplots? Check out the “What to Ask?” Teacher Guides for Grades 6-8 or Grades 9-12.

By integrating open-ended questions into the lesson, educators can challenge students to think beyond the immediate data representation. Encourage students to formulate hypotheses about the factors influencing the data and to explore different interpretations. This critical thinking approach prepares students for more advanced statistical analysis and real-world problem-solving.

Looking for example boxplots to use in your classroom to help students make sense of data? Check out these great examples from Slow Reveal: https://slowrevealgraphs.com/tag/box-plot/

Integrating Technology

Technology can also enhance the learning experience, and fortunately many online graphing tools that help students explore data have boxplot features built in. Software tools or online platforms that let students experiment with data and observe how boxplots change with different datasets empower students to explore data in this way. Interactive tools can provide immediate feedback, helping students understand how changing the data changes the boxplot.

As a note, boxplots are not an intuitive or standard graphing option in Google Sheets, and in Excel they are available only in newer versions (as the Box and Whisker chart). But they are easy to make in other graphing options that have been specifically designed to help students work with data like CODAP, DataClassroom, Tuva, HHMI DataExplorer, DataFly, Desmos, etc.

Technology can also facilitate the exploration of more complex datasets. By using digital tools, students can handle larger datasets than would be feasible manually, allowing them to investigate more sophisticated questions and analyses. This can be especially important as our students work with larger and more complex datasets through upper middle school and into high school. This exposure to technology-driven data analysis prepares students for future academic and professional endeavors.

Addressing Common Challenges

Despite their simplicity, students may still face challenges when interpreting boxplots. Some common struggles to be mindful of:

  • Comparing Medians and Means/Averages: Clarify that the median is the measure of central tendency used in boxplots, and how it can differ from the mean, especially in skewed distributions. Provide examples of datasets where the mean and median differ significantly, and discuss the implications of each measure in understanding the data.
  • Interpreting the IQR: Ensure students understand that the IQR reflects the middle 50% of the measured data values, and teach them how this range can indicate data consistency or variability. Use visual aids and examples to show how different IQRs reflect varying levels of data dispersion.
  • Understanding Outliers: Students might struggle to comprehend the significance of outliers or why they are outside the whiskers. Use visual aids to help explore how outliers can affect overall data interpretation and discuss strategies for making sense of them. Use examples to illustrate how outliers arise from unique circumstances, emphasizing the importance of context in data analysis. Students typically want to remove outliers, but it is important to help them realize there are stories in our outliers and that outliers naturally exist in the world.
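
One way to build an example for the first and third points above is to measure how far the mean and the median each move when a single extreme value is set aside. The sketch below uses invented pet counts and hypothetical names; the comparison is meant to show how sensitive each measure is, not to suggest the outlier should be deleted.

```python
from statistics import mean, median

# Hypothetical number of pets per household, invented for illustration.
pets = [0, 1, 1, 2, 2, 3, 15]


def center_measures(values):
    """Return both measures of center for the same list of values."""
    return {"mean": mean(values), "median": median(values)}


with_outlier = center_measures(pets)
# Set the extreme value aside only to see how much each measure moves.
without_outlier = center_measures([v for v in pets if v != 15])

mean_shift = with_outlier["mean"] - without_outlier["mean"]
median_shift = with_outlier["median"] - without_outlier["median"]

print("With the 15-pet household:   ", with_outlier)
print("Without the 15-pet household:", without_outlier)
print("Mean moved by", round(mean_shift, 2), "and median moved by", median_shift)
```

The mean moves by nearly two pets while the median moves by half a pet, which opens the door to asking what story the 15-pet household might tell rather than whether it belongs in the data.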

Conclusion

Boxplots are a powerful ally when making comparisons among groups and an excellent teaching tool to simplify complex concepts. By providing a clear and concise visual representation, boxplots can facilitate understanding of data distribution, variability, and outliers. This clarity is essential in helping students develop strong analytical skills and a solid foundation in data literacy.

Integrating boxplots into your teaching methodology can enhance students' critical thinking skills and enable them to engage with data meaningfully. Encourage hands-on learning, utilize technology, and address common misconceptions to maximize the educational value of boxplots in your classroom. By fostering an interactive and inquiry-based learning environment, educators can inspire students to explore the world of data with confidence and curiosity.

By embracing boxplots as your friends, you can transform data comparison from a numerical execution task into an engaging and insightful learning experience for your students. This transformation not only enriches the educational experience but also equips students with the tools they need to navigate a data-driven world.

Looking for more about how to make boxplots or why we use them? Check out the "4 Steps to Graph Like a Pro & Transform Your Teaching" Webinar