MATHia Math and Metacognition Data Science Challenge
This summer, join us for the first LIVE Data Science Competition, a unique opportunity to build and apply analytical skills using real-world educational data from Carnegie Learning, an education technology and curriculum solutions provider for K-12 schools.
Hosted by Drs. Cristina Zepeda and Kelley Durkin in partnership with Carnegie Learning, this 9-week, interdisciplinary competition is open to students from all majors, minors, programs of study, and backgrounds. If you’re interested in data analysis or education technology, you’re encouraged to apply!
Dates: Monday, June 2 – Friday, August 1, 2025 (9 weeks)
Sign-ups are now closed. To ask about open spots, email the program coordinator, Sarah Shaw, at sarah.shaw@vanderbilt.edu
About the Challenge
Participants will work in interdisciplinary teams to explore real student interaction data from Carnegie Learning’s intelligent math platform, MATHia. Your mission? To uncover patterns, predict student learning outcomes, and help improve personalized support in math education.
Want to explore MATHia? Check out a demo.
Along the way, participants will:
- Receive coaching and mentorship from experienced researchers and data scientists
- Gain hands-on experience with real-world ed tech data
- Build skills in data exploration, modeling, interpretation, and impact analysis
- Network with peers and industry professionals
- Present their solutions to a panel of experts
Prizes:
- First place will receive $3,000, the opportunity to publish with the PI team using their model, and the chance to network with Carnegie Learning
- Runner-up will receive $1,000
Who Should Participate
This program is open to undergraduate, graduate, and doctoral students at Vanderbilt. Students whose program of study involves data science, computer science, psychology, or teaching and learning are especially encouraged to apply. Interdisciplinary teams are required.
Participants do not need to have a team to apply, and participants can choose their own teams or opt to be matched by the program coordinators.
The program will be fully virtual, so students may apply even if they will not be in Nashville during the program.
The Challenge
The dataset contains hundreds of thousands of rows of data representing the actions different students take as they move through the software. Identifying the patterns in these data and how they correspond to different student outcomes (e.g., metacognitive skills, math knowledge, or motivational outcomes) is important for understanding how different behaviors may indicate the type of expertise a student has or the type of support they may need.
Thus, the goal of this competition is to develop a model that can predict students’ learning, metacognitive, and motivational outcomes based on the actions they take as they move through the workspaces (e.g., actions like requesting hints, making multiple attempts to answer a math problem, or the time spent per math problem).
To accomplish this challenge, participating teams will be given a selection of the data from multiple workspaces in MATHia, in addition to pre and post measurements of students’ metacognitive skills (survey), motivation (survey), and math knowledge (test). Program participants will then need to choose which action or actions they would like to use to predict the three outcomes. This might involve conducting exploratory analysis of the data to better understand what actions students are taking, how these look in the provided data, and which may best predict a specific outcome (or set of outcomes).
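As a rough illustration of the exploratory step described above, the sketch below rolls action-level rows up into one set of features per student. The column names (`student_id`, `problem_id`, `action`, `seconds`), the action labels, and the sample rows are all assumptions made for illustration; they are not the actual MATHia schema, which is shared only with participants.

```python
from collections import defaultdict

# Hypothetical action log: one row per student action.
# Column names and values are assumptions, not the real MATHia schema.
rows = [
    {"student_id": "s1", "problem_id": "p1", "action": "attempt", "seconds": 12.0},
    {"student_id": "s1", "problem_id": "p1", "action": "hint", "seconds": 5.0},
    {"student_id": "s1", "problem_id": "p1", "action": "attempt", "seconds": 8.0},
    {"student_id": "s1", "problem_id": "p2", "action": "attempt", "seconds": 20.0},
    {"student_id": "s2", "problem_id": "p1", "action": "attempt", "seconds": 30.0},
]


def student_features(rows):
    """Aggregate action-level rows into one feature dict per student."""
    hints = defaultdict(int)
    attempts = defaultdict(int)
    seconds = defaultdict(float)
    problems = defaultdict(set)
    for r in rows:
        sid = r["student_id"]
        problems[sid].add(r["problem_id"])
        seconds[sid] += r["seconds"]
        if r["action"] == "hint":
            hints[sid] += 1
        elif r["action"] == "attempt":
            attempts[sid] += 1
    features = {}
    for sid, probs in problems.items():
        n = len(probs)  # number of distinct problems the student touched
        features[sid] = {
            "hints_per_problem": hints[sid] / n,
            "attempts_per_problem": attempts[sid] / n,
            "seconds_per_problem": seconds[sid] / n,
        }
    return features


features = student_features(rows)
```

Per-student rates like these could then be joined to the pre and post measurements to see which behaviors track which outcome. Teams may well prefer pandas or R for the same aggregation; the standard library is used here only to keep the sketch self-contained.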
Teams will build a model that will be tested on a hold-out set of the data. Teams may use any machine learning approach, though interpretability is highly encouraged. Teams will be provided a fully cleaned dataset and weekly office hours with members of the research team.
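Because the final model is scored on held-out data, teams may want to rehearse that setup locally. The sketch below is one possible approach, not the official evaluation procedure: it splits by student (so no student appears on both sides of the split), fits a trivial mean baseline, and scores it with root mean squared error. The scores are made up, and RMSE is an assumed metric; the actual metrics are listed on the FAQ page.

```python
import math
import random


def split_by_student(student_ids, holdout_fraction=0.2, seed=0):
    """Return (train_ids, holdout_ids) with each student on exactly one side."""
    ids = sorted(set(student_ids))
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_holdout = max(1, round(len(ids) * holdout_fraction))
    return ids[n_holdout:], ids[:n_holdout]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical post-test scores keyed by student ID (illustrative values only).
post_scores = {f"s{i}": 0.4 + 0.05 * i for i in range(10)}

train_ids, val_ids = split_by_student(post_scores)

# Baseline: predict the training-set mean for every held-out student.
baseline = sum(post_scores[s] for s in train_ids) / len(train_ids)
val_rmse = rmse([post_scores[s] for s in val_ids], [baseline] * len(val_ids))
```

A model built on MATHia action features should beat a baseline of this kind on the local hold-out before it is worth submitting predictions.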
Teams will be expected to submit:
- A short 1- to 2-page report explaining the actions and outcomes chosen and how their model performs on the validation data
- Outcome predictions for the provided test data
- A codebase with clear documentation (analyses should be conducted using R or Python)
- A slide deck for their final 5- to 8-minute presentation
Teams will be evaluated on the model’s overall performance on the test set (i.e., how well your model predicts actual performance outcomes) as well as the conceptual relevance/interpretation (i.e., what might we learn) and impact of the model for an EdTech Platform (i.e., how it might be applied to support math learning). For more information on what metrics will be used, check out the Challenge Details section on the FAQ page.
About the Dataset
The data come from 6th grade students using MATHia, a personalized online math learning platform designed to adapt to student learning. Students completed math problems in workspaces to learn and practice concepts and demonstrate understanding and readiness to move on.
In this study, 6th grade students completed workspaces with added metacognitive videos and prompts. These metacognitive supports were designed to help students learn about, build, and practice three skills: planning, monitoring, and reflecting on their learning. Students responded to the metacognitive prompts with multiple-choice and open-ended responses. Students were also given pre and post assessments of their metacognitive skills, motivation, and math knowledge.
The dataset, then, is a record of each action a student takes while moving through these workspaces, from clicking a button to entering an answer. Every action corresponds to one line of data, and a student’s full set of actions as they move through the workspace is represented by hundreds to thousands of rows of data. More details about the contents of the dataset will be made available to challenge participants.
Have more questions?
Check out our FAQ page.