MATHia Data Science Challenge FAQs

General Competition Questions

What is the LIVE MATHia Math and Metacognition Data Science Challenge?

The LIVE MATHia Math and Metacognition Data Science Challenge is a unique learning opportunity for Vanderbilt students interested in data science and education technology. The 9-week program gives students the opportunity to learn and develop new skills, network with professionals in the field, and compete for a prize at the end.

Who is organizing this challenge?

This challenge is being co-hosted by LIVE, the co-PIs of the research study, Dr. Cristina Zepeda and Dr. Kelley Durkin, and Carnegie Learning.

What is the goal of this challenge?

The overall goal of this challenge is to give Vanderbilt students an opportunity to develop and practice data analysis skills with a real-world dataset in partnership with Carnegie Learning. For more details about the challenge itself, see the Challenge Details section below.

 

Eligibility and Participation

Can any student participate?

This program is open to any current Vanderbilt student. Students at all degree levels are encouraged to apply.

How many students will be accepted into the program?

There is not a specific limit to program participation.

Is this program virtual or in-person?

The program is fully virtual with all events and meetings being held on Zoom.

What is the estimated time commitment?

Student teams may choose to dedicate whatever time they wish to this challenge. At the end of the challenge, teams will be expected to submit a 1-2 page short report about their model, a well-documented codebase (in R or Python), and slides for a 5-8 minute presentation.

What sessions or meetings are required and/or synchronous?

The orientation and kick-off meetings as well as the final presentations are required and synchronous (but will be held virtually). Additionally, teams will likely schedule their own synchronous meetings during the program. Weekly office hours with program staff will be available but are optional.

How are teams formed?

Participants may choose their own teams or request to be paired with others by the program coordinators. All teams must be interdisciplinary, meaning members must be enrolled in different programs at Vanderbilt.

What does an interdisciplinary team mean?

An interdisciplinary team means a team with members enrolled in different programs at Vanderbilt. Our hope is that all teams will have at least one member from an education program and at least one member from a data science or computer science program.

 

Challenge Details

Is there a prize for the winning team?

Yes! The winning team will receive $3,000 (to be split across all team members) with the first runner up receiving $1,000 (to be split across all team members).

What kind of data will we be working with?

You will be working with log data (aka process data or event stream data) for this competition. Each line of the data represents an action a student takes within MATHia (e.g., requesting a hint, entering an answer, clicking to advance to the next page, etc.). Specifically, the following variables are included for each student action:

  • Student ID
  • School ID
  • Class ID
  • Session ID
  • Time stamp
  • Workspace name
  • Problem name
  • Step name
  • Action taken (attempt, hint request, etc.)
  • Input (i.e., the answer a student gave)
  • Outcome (did they get the answer correct)
  • Workspace progress (did they complete the workspace)

Each individual student has about 30,000 rows of data for the workspace used for this competition. We anticipate that the final dataset will exceed 1,000,000 rows. Teams will receive training, validation, and, at the end of the competition, test datasets.
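To illustrate what working with this kind of log data might look like, here is a minimal pandas sketch. The column names and values below are hypothetical, based on the variable list above; the actual file format and field names will be shared at orientation.

```python
import pandas as pd

# Tiny synthetic sample shaped like the variable list above (one row per
# student action); the real dataset will be far larger (1,000,000+ rows).
logs = pd.DataFrame({
    "student_id": ["s1", "s1", "s1", "s2", "s2"],
    "session_id": [1, 1, 1, 1, 1],
    "timestamp": [
        "2024-01-01 09:00:01",
        "2024-01-01 09:00:12",
        "2024-01-01 09:00:30",
        "2024-01-01 09:05:02",
        "2024-01-01 09:05:20",
    ],
    "action": ["attempt", "hint request", "attempt", "attempt", "attempt"],
    "outcome": ["incorrect", None, "correct", "correct", "correct"],
})

# Sort each student's session chronologically before building features
logs = logs.sort_values(["student_id", "session_id", "timestamp"])

# Example feature: number of hint requests per student
hints = (logs["action"] == "hint request").groupby(logs["student_id"]).sum()
```

Simple per-student aggregates like the hint count above are a common starting point before moving to sequence-aware features or models.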

How will teams submit their work?

Test sets will be shared with participants late in the competition without outcome variables. Teams will run their model and submit their predictions to the judging panel. The judging panel will then evaluate the performance metrics of these submissions.

How will submissions be evaluated?

Final submissions will be evaluated by a panel of judges. Judges will be instructed to rate the predictive performance of the model (80% of the final score) as well as the relevance of the model to the competition goals (20% of the final score). See the question below for more specifics on how we will measure overall performance. For the relevance score, judges will examine the 1-2 page summary each team submits, which explains how the model might be used by Carnegie Learning, educators, or students to facilitate better math instruction, evaluation, or learning. A specific rubric for this 1-2 page summary and the presentation will be provided to teams.

What performance metrics will be used?

Teams will be asked to build a model that can predict three different outcomes: one is continuous and two are ordinal. For the continuous outcome (student learning), we will use Root Mean Squared Error (RMSE) and R-squared as the primary metrics for evaluation, with Mean Absolute Error (MAE) as a secondary metric. For the ordinal outcomes (motivational and metacognitive outcomes), we will use Quadratic Weighted Kappa (QWK). A specific rubric with a breakdown of how these three scores contribute to the overall score will be provided to teams.
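For teams unfamiliar with these metrics, the following sketch shows one way to compute them with scikit-learn. The data values are made up for illustration, and the official scoring procedure is defined by the rubric, not this snippet.

```python
import numpy as np
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    cohen_kappa_score,
)

# Continuous outcome (e.g., a student learning score) -- illustrative values
y_true = np.array([0.2, 0.5, 0.8, 0.4])
y_pred = np.array([0.25, 0.45, 0.70, 0.50])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # primary metric
r2 = r2_score(y_true, y_pred)                       # primary metric
mae = mean_absolute_error(y_true, y_pred)           # secondary metric

# Ordinal outcome (e.g., a motivational rating on a 1-5 scale)
obs = [1, 3, 4, 5, 2]
pred = [1, 2, 4, 4, 2]
qwk = cohen_kappa_score(obs, pred, weights="quadratic")
```

QWK penalizes predictions quadratically by their distance from the true ordinal category, so being off by two categories costs four times as much as being off by one.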

Will our work be published or shared?

Teams will be asked to share their model and work with Carnegie Learning at the final presentation, and teams will have the opportunity to publish their model in partnership with the research team.

 

Technical Requirements

Do I need to have prior experience with data science or coding languages, like R or Python?

At least one team member should have prior experience working with either R or Python, and program coordinators will ensure every team has a member with this experience. However, students without this experience who are interested in learning either of these languages are encouraged to apply.

What tools or platforms will we be using?

Teams will be working with real data from real students, so there will be specific requirements to ensure all data is protected. As such, we recommend RStudio for teams using R and JupyterLab, VS Code, or PyCharm for teams using Python. All tools must use local installations. We will provide more information and support during orientation to ensure all teams have the tools or platforms needed.

What type of training and support will be offered?

We will conduct an orientation with all teams to ensure participants are familiar with the data, the challenge, the tools, and the process. Additionally, program staff will offer weekly office hours to answer questions or provide guidance as needed for any teams.

Are we allowed to use external tools, including generative AI?

Generative AI and similar tools are allowed, but with restrictions to ensure the data remains protected. We will cover this in more detail during the orientation.

Will a baseline model be provided?

No; however, teams may request additional support as needed during weekly office hours.

 

Other

I have a question that’s not listed here. Who can I contact?

You can email the Program Coordinator, Sarah Shaw, at sarah.shaw@vanderbilt.edu.