Journal Publications
2025
Cho, S.-J., Goodwin, A., Salas, J., & Mueller, S. (2025). Explaining person-by-item responses using person- and item-level predictors via random forests and interpretable machine learning in explanatory item response models. Psychometrika, 90(3), 1197–1234. https://doi.org/10.1017/psy.2025.10032
Data & Code: https://osf.io/dbjpu
This study incorporates a random forest (RF) approach, which probes complex interactions and nonlinearity among predictors, into an item response model, with the goal that the hybrid approach outperforms either an RF or an explanatory item response model (EIRM) alone in explaining item responses. In the specified model, called EIRM-RF, predicted values from the RF are added as a predictor in the EIRM to model the nonlinear and interaction effects of person- and item-level predictors in person-by-item response data, while accounting for random effects over persons and items. The results of the EIRM-RF are probed with interpretable machine learning (ML) methods, including feature importance measures, partial dependence plots, accumulated local effect plots, and the H-statistic. The EIRM-RF and the interpretable methods are illustrated using an empirical data set to explain differences in reading comprehension in digital versus paper mediums, and the results of the EIRM-RF are compared with those of the EIRM and the RF to show empirical differences in how the three models handle the effects of predictors and random effects. In addition, simulation studies are conducted to compare model accuracy among the three models and to evaluate the performance of the interpretable ML methods.
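A minimal Python sketch of the two-step EIRM-RF idea described above, not the authors' implementation: an RF captures nonlinearity and interactions among person- and item-level covariates, and its out-of-fold predictions enter an item response model as an extra predictor. The covariates, effect sizes, and data here are simulated for illustration, and the crossed random effects over persons and items in the real EIRM-RF are omitted for brevity.

```python
# Step 1: fit a random forest on person- and item-level covariates.
# Step 2: feed its out-of-fold predictions into a logistic item response
# model as an additional predictor (random effects omitted here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_persons, n_items = 50, 20
person_cov = rng.normal(size=n_persons)        # e.g., prior reading skill
item_cov = rng.integers(0, 2, size=n_items)    # e.g., digital vs. paper item

# Long-format person-by-item design matrix with a hidden interaction
pid, iid = np.meshgrid(np.arange(n_persons), np.arange(n_items), indexing="ij")
X = np.column_stack([person_cov[pid.ravel()], item_cov[iid.ravel()]])
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.6 * X[:, 0] * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Step 1: out-of-fold RF predictions avoid leaking training labels into step 2
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf_prob = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]
rf_prob = np.clip(rf_prob, 1e-3, 1 - 1e-3)
rf_logit = np.log(rf_prob / (1 - rf_prob))

# Step 2: the RF prediction enters the simplified item response model
# alongside the linear covariate effects
eirm_rf = LogisticRegression(max_iter=1000).fit(
    np.column_stack([X, rf_logit]), y)
print(eirm_rf.coef_)
```

The out-of-fold prediction step matters: fitting and predicting the RF on the same responses would hand the second-stage model a near-perfect predictor and mask the covariate effects.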
Shimizu, A. Y., Havazelet, M., Smith, B. E., & Goodwin, A. P. (2025). Multimodal analyses and visual models for qualitatively understanding digital reading and writing processes. Education Sciences, 15(9), 1135. https://doi.org/10.3390/educsci15091135
Data & Code: https://www.openicpsr.org/openicpsr/project/195723/version/V1/view
As technology continues to shape how students read and write, digital literacy practices have become increasingly multimodal and complex—posing new challenges for researchers seeking to understand these processes in authentic educational settings. This paper presents three qualitative studies that use multimodal analyses and visual modeling to examine digital reading and writing across age groups, learning contexts, and literacy activities. The first study introduces collaborative composing snapshots, a method that visually maps third graders’ digital collaborative writing processes and highlights how young learners blend spoken, written, and visual modes in real-time online collaboration. The second study uses digital reading timescapes to track the multimodal reading behaviors of fifth graders—such as highlighting, re-reading, and gaze patterns—offering insights into how these actions unfold over time to support comprehension. The third study explores multimodal composing timescapes and transmediation visualizations to analyze how bilingual high school students compose across languages and modes, including text, image, and sounds. Together, these innovative methods illustrate the power of multimodal analysis and visual modeling for capturing the complexity of digital literacy development. They offer valuable tools for designing more inclusive, equitable, and developmentally responsive digital learning environments—particularly for culturally and linguistically diverse learners.
2024
Cho, S.-J., Goodwin, A., Naveiras, M., & Salas, J. (2024). Differential and functional response time item analysis: An application to understanding paper versus digital reading processes. Journal of Educational Measurement, 61, 219–251. https://doi.org/10.1111/jedm.12389
Data & Code: https://osf.io/s6ra8/
Despite the growing interest in incorporating response time data into item response models, there has been a lack of research investigating how the effect of speed on the probability of a correct response varies across different groups (e.g., experimental conditions) for various items (i.e., differential response time item analysis). Furthermore, previous research has shown a complex relationship between response time and accuracy, necessitating a functional analysis to understand the patterns that manifest from this relationship. In this study, response time data are incorporated into an item response model for two purposes: (a) to examine how individuals' speed within an experimental condition affects their response accuracy on an item, and (b) to detect the differences in individuals' speed between conditions in the presence of within-condition effects. For these two purposes, by-variable smooth functions are employed to model differential and functional response time effects by experimental condition for each item. This model is illustrated using an empirical data set to describe the effect of individuals' speed on their reading comprehension ability in two experimental conditions of reading medium (paper vs. digital) by item. A simulation study showed that the recovery of parameters and by-variable smooth functions of response time was satisfactory, and that the type I error rate and power of the test for the by-variable smooth function of response time were acceptable in conditions similar to the empirical data set. In addition, the proposed method correctly identified the range of response time over which between-condition differences in the effect of response time on the probability of a correct response were present.
Cho, S.-J., Goodwin, A., Naveiras, M., & De Boeck, P. (2024). Modeling nonlinear effects of person-by-item covariates in explanatory item response models: Exploratory plots and modeling using smooth functions. Journal of Educational Measurement, 61, 595–623. https://doi.org/10.1111/jedm.12410
Data & Code: https://osf.io/kqctg/
Explanatory item response models (EIRMs) have been applied to investigate the effects of person covariates, item covariates, and their interactions in the fields of reading education and psycholinguistics. In practice, it is often assumed that the relationships between the covariates and the logit transformation of item response probability are linear. However, this linearity assumption obscures the differential effects of covariates over their range in the presence of nonlinearity. Therefore, this paper presents exploratory plots that describe the potential nonlinear effects of person and item covariates on binary outcome variables. This paper also illustrates the use of EIRMs with smooth functions to model these nonlinear effects. The smooth functions examined in this study include univariate smooths of continuous person or item covariates, tensor product smooths of continuous person and item covariates, and by-variable smooths between a continuous person covariate and a binary item covariate. Parameter estimation was performed using the mgcv R package through the maximum penalized likelihood estimation method. In the empirical study, we identified a nonlinear effect of the person-by-item covariate interaction and discussed its practical implications. Furthermore, the parameter recovery and the model comparison method and hypothesis testing procedures presented were evaluated via simulation studies under the same conditions observed in the empirical study.
Shimizu, A. Y., Havazelet, M., & Goodwin, A. P. (2024). More than one way: Fifth-graders’ varied digital reading behaviors and comprehension outcomes. AERA Open, 10. https://doi.org/10.1177/23328584241226633
Data & Code: https://www.openicpsr.org/openicpsr/project/195723/version/V1/view
Digital reading is ubiquitous, yet understanding digital reading processes and links to comprehension remains underdeveloped. Guided by new literacies and active reading theories, this study explored the reading behaviors and comprehension of thirteen fifth graders who read static digital texts. We coded for the quantity and quality of digital reading behaviors and employed action path diagrams to connect behaviors to comprehension. We used timescape analyses to visualize how behaviors were orchestrated differently across readers. Findings showed no single behavior was related directly to comprehension, indicating varying pathways to digital reading success. Occasional rereading seemed to support active reading and improved comprehension. Instances of students subverting expected digital tools were observed. Minor distractions like mind-wandering did not link to poor performance. This research deepens our understanding of self-monitoring and active reading in static digital contexts, offering insights for future study of more complex digital reading contexts like reading on the internet.
Peer-Reviewed Computer Science Conference Paper Presentations
2025
Davalos, E., Zhang, Y., Srivastava, N., Salas, J. A., McFadden, S., Cho, S.-J., Biswas, G., & Goodwin, A. (2025). LLMs as educational analysts: Transforming multimodal data traces into actionable reading assessment reports. In A. I. Cristea, E. Walker, Y. Lu, O. C. Santos, & S. Isotani (Eds.), Artificial intelligence in education (AIED 2025): Vol. 15878. Lecture notes in computer science (pp. 191–204). Springer. https://doi.org/10.1007/978-3-031-98417-4_14
Paper: https://arxiv.org/pdf/2503.02099
Data & Code: https://github.com/edavalosanaya/LLMsAsEducationalAnalysts
Reading assessments are essential for enhancing students' comprehension, yet many EdTech applications focus mainly on outcome-based metrics, providing limited insights into student behavior and cognition. This study investigates the use of multimodal data sources, including eye-tracking data, learning outcomes, assessment content, and teaching standards, to derive meaningful reading insights. We employ unsupervised learning techniques to identify distinct reading behavior patterns, and then a large language model (LLM) synthesizes the derived information into actionable reports for educators, streamlining the interpretation process. LLM experts and human educators evaluate these reports for clarity, accuracy, relevance, and pedagogical usefulness. Our findings indicate that LLMs can effectively function as educational analysts, turning diverse data into teacher-friendly insights that are well-received by educators. While promising for automating insight generation, human oversight remains crucial to ensure reliability and fairness. This research advances human-centered AI in education, connecting data-driven analytics with practical classroom applications.
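A hedged sketch of the pipeline shape described above, not the authors' code: per-student gaze features are clustered with an unsupervised method, and the cluster summaries are then handed to an LLM as a report-writing prompt. The feature names, cluster count, and prompt wording are illustrative assumptions, and the actual LLM call is omitted.

```python
# Cluster gaze-derived features, then build an analyst-style prompt from
# the cluster summaries for an LLM to turn into a teacher-facing report.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Hypothetical per-student features: fixation rate, regression rate, dwell
features = rng.normal(size=(30, 3))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

summaries = []
for k in range(3):
    mean = features[labels == k].mean(axis=0)
    summaries.append(f"Cluster {k}: mean fixation rate {mean[0]:.2f}, "
                     f"regression rate {mean[1]:.2f}, dwell {mean[2]:.2f}")

prompt = ("You are an educational analyst. Using the gaze-behavior cluster "
          "summaries below, write a short, teacher-friendly reading report.\n"
          + "\n".join(summaries))
print(prompt)
```

Keeping the clustering step separate from the LLM step mirrors the paper's division of labor: the statistics come from the data, and the LLM only narrates them.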
2024
Davalos, E., Srivastava, N., Zhang, Y., Goodwin, A., & Biswas, G. (2024, November 25). GazeViz: A web‑based approach for visualizing learner gaze patterns in online educational environment. In ICCE 2024: The 32nd International Conference on Computers in Education. https://doi.org/10.58459/icce.2024.4974
Paper: https://library.apsce.net/index.php/ICCE/article/view/4974/4903
Data & Code: https://github.com/RedForestAI/ETProWeb
As online learning tools become more widespread, understanding student behaviors through learning analytics is increasingly important. Traditional methods relying on system log data fall short of capturing the full range of cognitive strategies students use. To address this, we developed an in-depth post-assignment reflection dashboard that visualizes gaze data to aid students in reflecting on their learning behaviors. This dashboard was made possible by ETProWeb, a system that integrates high-fidelity eye-tracking directly into the browser, enabling real-time analysis of gaze data aligned with user interactions. ETProWeb leverages the browser's Document Object Model (DOM) to track areas of interest (AOIs) dynamically, overcoming issues related to multiple timelines and manual alignment. In a pilot study with 38 sixth-grade students, the dashboard received positive feedback, with 90% of students expressing interest in the eye-tracking technology for its ability to help them observe and reflect on their reading behaviors. This interest highlights the potential of eye-tracking as a valuable tool for enhancing students' self-awareness and engagement in online learning environments.
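A minimal sketch of the AOI idea behind ETProWeb: each page element of interest is reduced to a bounding box, and each gaze sample is mapped to the AOI that contains it. The AOI names and coordinates below are made up for illustration; in ETProWeb the boxes come from the live DOM, so AOIs follow elements as the page scrolls or reflows.

```python
# Map raw gaze samples to named areas of interest via point-in-box tests.
from dataclasses import dataclass

@dataclass
class AOI:
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, gx, gy):
        return (self.x <= gx <= self.x + self.w
                and self.y <= gy <= self.y + self.h)

# Hypothetical page layout: a passage region and a question region
aois = [AOI("passage", 0, 0, 600, 400), AOI("question", 0, 420, 600, 120)]

def label_gaze(samples, aois):
    """Return the AOI name for each (x, y) gaze sample, or None if no hit."""
    return [next((a.name for a in aois if a.contains(gx, gy)), None)
            for gx, gy in samples]

print(label_gaze([(100, 50), (300, 500), (700, 50)], aois))
# → ['passage', 'question', None]
```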
Peer-Reviewed Conference Paper Presentations
2024
Gonzales, A., Havazelet, M., & Shimizu, A. Y. (2024, April). Man vs. machine or is it?: Using AI to conduct qualitative analyses [Paper presentation]. American Educational Research Association (AERA) Conference, Philadelphia, PA.
Paper: https://vanderbilt.box.com/s/yg9046abq8jtbjskr9u59ujy5sbe2dw8
The imminent proliferation of AI tools such as ChatGPT has the potential to transform nearly every element of our lives. However, hasty implementation could dilute the quality of work and reinforce existing inequities. As qualitative researchers, we have a duty to consider how to responsibly leverage these emerging tools for robust, scalable analysis. Using semi-structured interview data from 78 participants about their paper and digital reading, this study surfaces the comparative strengths and challenges of qualitative coding using the AI tools ChatGPT and Atlas.ti and a human coder. Our findings consider how all three of these coding approaches can be effectively integrated into qualitative analysis.
Naveiras, M., Cho, S.-J., Goodwin, A. P., Salas, J. A., & Davalos, E. (2024, April). A scanpath trajectory reading eye-tracking spatio-temporal similarity (STRESS) measure [Paper presentation]. Annual Meeting of the National Council on Measurement in Education (NCME), Philadelphia, PA.
Slides: https://vanderbilt.box.com/s/6njziplk5rmqphyvgv2omlug4xudg0gt
A new similarity measure developed to detect similarities and differences in readers' eye-tracking scanpaths during reading.
2023
Havazelet, M. (2023, December). More than one way: A multimodal analysis of digital reading and question-answering [Conference presentation]. In A. Y. Shimizu (Chair), Multimodal methods for examining digital reading and writing processes [Symposium]. Literacy Research Association (LRA) Conference, Atlanta, GA.
Presentation: https://vanderbilt.box.com/s/67ip363v5c5oare2of7sdfm0q4yc6l37
Digital reading is ubiquitous, yet understanding digital reading processes and links to comprehension remains underdeveloped. Guided by new literacies and active reading theories, this study explored the reading behaviors and comprehension of thirteen fifth graders who read static digital texts. We coded for the quantity and quality of digital reading behaviors and employed action path diagrams to connect behaviors to comprehension. We used timescape analyses to visualize how behaviors were orchestrated differently across readers. Findings showed no single behavior was related directly to comprehension, indicating varying pathways to digital reading success. Occasional rereading seemed to support active reading and improved comprehension. Instances of students subverting expected digital tools were observed. Minor distractions like mind-wandering did not link to poor performance. This research deepens our understanding of self-monitoring and active reading in static digital contexts, offering insights for future study of more complex digital reading contexts like reading on the internet.
Shimizu, A. Y., Havazelet, M., & Goodwin, A. P. (2023, July). A mixed-methods multimodal analysis: Fifth graders’ digital reading processes and their links to comprehension. Paper presented at the Society for the Scientific Study of Reading, Port Douglas, Australia.
Slides: https://vanderbilt.app.box.com/file/2050259423115?s=b807492ugzbwflzqfygyrp2kc007l3f0
Digital reading is ubiquitous in our society, yet our understandings of digital reading processes and links to comprehension remain underdeveloped (Coiro, 2021). Framed within the New Literacies and Active Reading theories, the current study builds understanding of digital reading behaviors and links to reading comprehension for thirteen 5th graders digitally reading a static passage and answering associated questions. Using qualitative multimodal analysis, behaviors offering potential evidence of active self-regulation, including digital reading behaviors (i.e., re-reading, highlighting, scrolling, tracking with a cursor, and digital dictionary use) and off-task behaviors (i.e., looking away from the text, playful manipulation of tools), were coded and analyzed via constant comparative and visual data reduction methods. For most digital reading and off-task behaviors, no specific patterns were associated with either reading level or comprehension outcomes. Such results suggest different pathways toward digital reading success. However, rereading did appear to play a role in comprehension. Occasional rereading during the cold read of the passage was associated with more accurate returns to the text during the question-and-answer section and with better performance. This pattern suggests occasional digital rereading demonstrates active self-regulation and supports comprehension. Overall, our results provide a better understanding of students' self-monitoring and active reading processes, which adds to the literature on digital reading processes and connections to reading comprehension.
Goodwin, A. P., Havazelet, M., Cho, S.-J., Salas, J., & Naveiras, M. (2023, April 13-16). Is digital reading generally more difficult? Nuances of effective reading behaviors for middle school readers. Paper presented at the 2023 Annual Meeting of the American Educational Research Association (place-based format), Chicago, IL.
Slides: https://docs.google.com/presentation/d/1kjbJAATyvLew7VTibw8NF2msGsdGviwe0Ln4eJlX6q0/edit?usp=sharing
This proposed study using mixed methods explores how students use tools to support digital and paper reading comprehension. We investigate qualities of highlighting and links to comprehension for 370 students in grades 5–8. We then explore qualitatively the broader strategies used by 10 overperforming students (i.e., students who perform better than district tests would suggest) when reading a NAEP passage digitally. We consider medium, reader, text, and item differences to critique and extend current understandings. Quantitatively, we used nonparametric tests to explore similarities and differences, latent profile analysis to investigate profiles of highlighting, and explanatory item response models to link highlighting variables to comprehension while taking into account medium, reader, text, and item differences. Qualitative analyses involved examining videos from selected students' digital reading and question-answering sessions using procedural coding (Guest et al., 2012) and pattern coding (Miles et al., 2020) to identify visible student actions. Quantitative results involving nonparametric tests indicate similarities and differences across mediums, and these patterns differ by part of text, yet few interactions between mediums and reader characteristics occur. For comprehension, explanatory item response models show most highlighting variables are nonsignificant predictors, except highlights in areas of interest, which support comprehension in both mediums after controlling for other predictors. Differences by items and readers are explored. Results suggest highlighting in areas of interest is particularly important for locate-and-recall items and for students with more content knowledge. Qualitative results suggest movement (i.e., swaying, leaning, etc.) and breaks (i.e., head down, playful behaviors like humming or making faces, or manipulating digital tools) are part of overperforming students' reading behaviors.
Video and eye-tracking data suggest that students continued to engage with the task or reengaged immediately after such behaviors, suggesting that behaviors commonly seen as "off-task" may support digital reading success. This study highlights effective ways of using tools (highlighting, movement, distraction) to support reading, and in particular digital reading.
Naveiras, M., Cho, S.-J., Goodwin, A. P., & Salas, J. (2023, April 12–15). Explanatory item response models with factor smooth functions [Paper presentation]. 2023 Annual Meeting of the National Council on Measurement in Education, New York, NY.
Slides: https://vanderbilt.app.box.com/file/2052212882030?s=lpkiazaz0gmcbtk0z2r0evtyuxmuk4y4
This paper presents explanatory item response models to model nonlinear interaction effects between continuous and binary covariates in person-by-item response data. Factor smooth functions were used to model the interactions. The models were illustrated to understand how highlighting behaviors on text are related to reading comprehension.
Presentations
Cho, S.-J. (2024, September). Explanatory item response models with random forests and interpretable machine learning [Colloquium presentation]. Quantitative Methods Colloquium, Vanderbilt University, Nashville, TN, United States.
Slides: https://vanderbilt.box.com/s/3ja0l8pfv15ziys4dp607zw7u19fmxa2
Explanatory item response models (EIRMs; De Boeck & Wilson, 2004; De Boeck, Cho, & Wilson, 2016) or generalized linear mixed-effects models with crossed random effects (e.g., Baayen, Davidson, & Bates, 2008; Cho & Rabe-Hesketh, 2011) have been widely applied to investigate the effects of person- and item-level explanatory variables in the fields of reading education and psycholinguistics. In this explanatory investigation, it is crucial to determine the appropriate form of predictor-outcome relationships, account for potential complex interactions between person- and item-level predictors, and select the most relevant predictors to ensure model adequacy and interpretation without redundancy or overfitting. While it is challenging to address these in EIRM, machine learning methods are capable of modeling highly complex and nonlinear relationships between predictors and outcomes (e.g., Kuhn & Johnson, 2018). Among the machine learning methods, ensemble approaches, which use multiple models, such as random forests (RF; Breiman, 2001), have been employed to enhance prediction accuracy beyond what is achievable with a single model. In my talk, I will present a hybrid approach, called EIRM-RF, to model complex nonlinear and interacted fixed effects of person- and item-level predictors (a main modeling component of RF), while allowing for random effects over persons and items (crossed random effects; a main modeling component of EIRM). The results of the EIRM-RF are examined using interpretable machine learning methods, including feature importance measures, partial dependence plots, accumulated local effect plots, and the H-statistic.
Preprints, Under Review, Code
Davalos, E., Salas, J. A., Zhang, Y., Srivastava, N., Thatigotla, Y., Gonzales, A., McFadden, S., Cho, S.-J., Biswas, G., & Goodwin, A. (2025). Beyond instructed tasks: Recognizing in-the-wild reading behaviors in the classroom using eye tracking [Manuscript submitted for publication]. Educational Technology Research and Development. https://arxiv.org/abs/2501.18468
Paper: https://arxiv.org/pdf/2501.18468
Understanding reader behaviors such as skimming, deep reading, and scanning is essential for improving educational instruction. While prior eye-tracking studies have trained models to recognize reading behaviors, they often rely on instructed reading tasks, which can alter natural behaviors and limit the applicability of these findings to in-the-wild settings. Additionally, there is a lack of clear definitions for reading behavior archetypes in the literature. We conducted a classroom study to address these issues by collecting instructed and in-the-wild reading data. We developed a mixed-method framework, including a human-driven theoretical model, statistical analyses, and an AI classifier, to differentiate reading behaviors based on their velocity, density, and sequentiality. Our lightweight 2D CNN achieved an F1 score of 0.8 for behavior recognition, providing a robust approach for understanding in-the-wild reading. This work advances our ability to provide detailed behavioral insights to educators, supporting more targeted and effective assessment and instruction.
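An illustrative computation of the three quantities the framework uses to separate reading behaviors (velocity, density, sequentiality), under assumed definitions that may differ from the paper's: mean fixation-to-fixation speed, fixations per unit of covered area, and the share of left-to-right gaze moves.

```python
# Derive velocity, density, and sequentiality features from a fixation
# sequence (assumed definitions; the paper's exact formulas may differ).
import numpy as np

def gaze_features(fixations):
    """fixations: rows of (x, y, t) for one reading episode."""
    f = np.asarray(fixations, dtype=float)
    d = np.diff(f[:, :2], axis=0)            # fixation-to-fixation moves
    dt = np.diff(f[:, 2])                    # time between fixations (s)
    velocity = float(np.mean(np.linalg.norm(d, axis=1) / dt))  # px/s
    area = (np.ptp(f[:, 0]) * np.ptp(f[:, 1])) or 1.0
    density = len(f) / area                  # fixations per px^2 covered
    sequentiality = float(np.mean(d[:, 0] > 0))  # share of left-to-right moves
    return velocity, density, sequentiality

# Toy episode: three rightward fixations, a return sweep, one more rightward
fix = [(10, 10, 0.0), (60, 10, 0.2), (110, 12, 0.4),
       (20, 40, 0.6), (70, 40, 0.8)]
v, dens, seq = gaze_features(fix)
print(round(v), round(seq, 2))
# → 305 0.75
```

Intuitively, skimming should show high velocity and low density, deep reading the reverse, and scanning low sequentiality; features like these would then feed the 2D CNN classifier.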
Davalos, E., Zhang, Y., Jain, S., Srivastava, N., Truong, T., Haque, N.-u., Van, T., Salas, J. A., McFadden, S., Cho, S.-J., Biswas, G., & Goodwin, A. (2025). Designing gaze analytics for ELA instruction: A user-centered dashboard with conversational AI support [Preprint submitted to IUI 2026]. arXiv. https://arxiv.org/abs/2509.03741
Paper: https://arxiv.org/pdf/2509.03741
Eye-tracking offers rich insights into student cognition and engagement, but remains underutilized in classroom-facing educational technology due to challenges in data interpretation and accessibility. In this paper, we present the iterative design and evaluation of a gaze-based learning analytics dashboard for English Language Arts (ELA), developed through five studies involving teachers and students. Guided by user-centered design and data storytelling principles, we explored how gaze data can support reflection, formative assessment, and instructional decision-making. Our findings demonstrate that gaze analytics can be approachable and pedagogically valuable when supported by familiar visualizations, layered explanations, and narrative scaffolds. We further show how a conversational agent, powered by a large language model (LLM), can lower cognitive barriers to interpreting gaze data by enabling natural language interactions with multimodal learning analytics. We conclude with design implications for future EdTech systems that aim to integrate novel data modalities in classroom contexts.
Davalos, E., Zhang, Y., Srivastava, N., Salas, J. A., McFadden, S., Cho, S.-J., Biswas, G., & Goodwin, A. (2025). Linking reading comprehension outcomes to process: Interpretable eye-tracking to aid instruction and learning [Manuscript submitted for presentation]. LAK2026 Conference.
Understanding how students engage with text during reading comprehension tasks is essential for delivering timely, pedagogically relevant feedback. While eye-tracking holds promise for uncovering cognitive and attentional processes, many existing gaze-based features are difficult to interpret and disconnected from classroom instruction. In this study, we present a novel gaze analytics framework that introduces interpretable, task-aligned features, such as line coverage, first and second pass reading behaviors, and answer-area dwell times, to analyze student engagement during question answering. Starting from a unimodal log-based baseline (F1 = 0.58), we show that these features not only improve item-level prediction of student correctness (F1 = 0.67), but also enable the segmentation of students into meaningful lookback behavior branches (e.g., Lookback & Answer Found) with distinct success rates and engagement profiles. Through unsupervised clustering, we further identify behavioral archetypes that capture nuanced strategies of effort and efficiency, providing interpretable and instructionally relevant models of student thinking. These findings demonstrate how gaze-informed learning analytics can help bridge the gap between raw sensor data and classroom practices, supporting scalable, strategy-aware feedback and reflection in reading assessment settings.
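A hedged sketch of two of the interpretable features named above, under assumed definitions that may not match the paper's: line coverage as the fraction of text lines receiving at least one fixation, and answer-area dwell as total fixation duration inside an assumed answer bounding box. Line geometry and box coordinates are made up for illustration.

```python
# Two task-aligned gaze features for question answering: line coverage
# and answer-area dwell time (assumed definitions, illustrative layout).

def line_coverage(fixations, n_lines, line_height, top=0):
    """Fraction of the passage's text lines that received >= 1 fixation."""
    hit = set()
    for x, y, dur in fixations:
        idx = int((y - top) // line_height)   # which text line this y falls on
        if 0 <= idx < n_lines:
            hit.add(idx)
    return len(hit) / n_lines

def answer_dwell(fixations, box):
    """Total fixation duration (s) inside the answer area box=(x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return sum(dur for x, y, dur in fixations
               if x0 <= x <= x1 and y0 <= y <= y1)

# Toy data: three fixations on the first two text lines, one on the answers
fix = [(100, 5, 0.2), (120, 25, 0.3), (110, 25, 0.2), (400, 300, 0.5)]
print(line_coverage(fix, n_lines=5, line_height=20),
      answer_dwell(fix, (350, 250, 600, 400)))
# → 0.4 0.5
```

Features at this grain stay legible to a teacher ("covered 40% of the passage, half a second on the answers"), which is the interpretability the framework is after.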
Davalos, E., Zhang, Y., Srivastava, N., Thatigotla, Y., Salas, J. A., McFadden, S., Cho, S.-J., Goodwin, A., TS, A., & Biswas, G. (2025). WEBEYETRACK: Scalable eye-tracking for the browser via on-device few-shot personalization [Manuscript submitted for publication]. arXiv. https://arxiv.org/abs/2508.19544
Paper: https://arxiv.org/pdf/2508.19544
Data & Code: https://github.com/RedForestAi/WebEyeTrack
With advancements in AI, new gaze estimation methods are exceeding state-of-the-art (SOTA) benchmarks, but their real-world application reveals a gap with commercial eye-tracking solutions. Factors like model size, inference time, and privacy often go unaddressed. Meanwhile, webcam-based eye-tracking methods lack sufficient accuracy, in particular due to head movement. To tackle these issues, we introduce WebEyeTrack, a framework that integrates lightweight SOTA gaze estimation models directly in the browser. It incorporates model-based head pose estimation and on-device few-shot learning with as few as nine calibration samples (k < 9). WebEyeTrack adapts to new users, achieving SOTA performance with an error margin of 2.32 cm on GazeCapture and real-time inference speeds of 2.4 milliseconds on an iPhone 14. Our open-source code is available at https://github.com/RedForestAi/WebEyeTrack.
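A sketch of the general shape of on-device few-shot personalization, not WebEyeTrack's actual adaptation (which is more involved): the base model's raw gaze estimates are mapped onto the user's screen with an affine transform fit by least squares on a handful of calibration points. The synthetic bias applied below stands in for per-user miscalibration.

```python
# Few-shot gaze calibration: fit an affine correction from k calibration
# points and apply it to subsequent raw gaze estimates.
import numpy as np

def fit_affine(raw, target):
    """raw, target: (k, 2) arrays of predicted vs. true screen points."""
    A = np.hstack([raw, np.ones((len(raw), 1))])     # [x, y, 1] per point
    W, *_ = np.linalg.lstsq(A, target, rcond=None)   # (3, 2) affine params
    return W

def apply_affine(W, raw):
    return np.hstack([raw, np.ones((len(raw), 1))]) @ W

rng = np.random.default_rng(3)
true_pts = rng.uniform(0, 1, size=(9, 2))    # nine calibration targets
raw_pts = true_pts * 1.2 + 0.05              # biased base-model output
W = fit_affine(raw_pts, true_pts)
err = np.abs(apply_affine(W, raw_pts) - true_pts).max()
print(err < 1e-8)
# → True (the synthetic bias is exactly affine, so it is fully corrected)
```

An affine fit needs only a few points and a tiny matrix solve, which is why calibration of this flavor is cheap enough to run on-device in a browser.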
Naveiras, M., Cho, S.-J., Goodwin, A. P., & Salas, J. A. (n.d.). Python code for analyzing multimodal time-series and non-time-series data with recurrent neural networks.
Code: https://vanderbilt.box.com/s/xfgr685piosqwezljodneop825sn00y8
Artificial neural network methods for analyzing multimodal process data (eye-tracking, emotional processes, and reading behaviors) were developed.
Naveiras, M., Cho, S.-J., Goodwin, A. P., Salas, J. A., & Davalos, E. (n.d.). A scanpath trajectory reading eye-tracking spatio-temporal similarity (STRESS) measure.
Code: https://vanderbilt.box.com/s/k5gu5d83bjiudpfbiwdzo821h7874pir
A new similarity measure developed to detect similarities and differences in readers' eye-tracking scanpaths during reading.
Goodwin, A. P., Cho, S.-J., Salas, J., Naveiras, M., & Shimizu, A. Y. (n.d.). Digital and paper text processing: Qualities of highlights that link to comprehension for middle school readers [Manuscript in preparation].
Detailed analysis of contributions of highlighting to reading comprehension considering content, reader, and item differences.