An AI-powered tutoring assistant increased human tutors’ capacity to help students through math problems and improved students’ performance in math, according to a Stanford University study.
The digital tool, Tutor CoPilot, was created by Stanford researchers to guide tutors, especially novices, in their interactions with students.
The study, which the researchers describe as the first randomized controlled trial of a human-AI partnership in live tutoring, examines whether the tool improves tutors’ skills and students’ math learning.
The study comes as tutoring has become a key learning-recovery tool. Schools, however, have struggled to scale and sustain tutoring programs because they require many human tutors and a great deal of time and money.
In an interview with Education Week, Susanna Loeb, an education professor at Stanford and one of the study’s authors, discussed the creation of the tool, the trial findings, and its implications for schools.
This interview has been edited for brevity and clarity.
Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, especially where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately hurts students from under-served communities, who stand to gain the most from high-quality education and are most likely to be taught by inexperienced educators. We introduce Tutor CoPilot, a novel Human-AI approach that leverages a model of expert thinking to provide expert-like guidance to tutors as they tutor. This study presents the first randomized controlled trial of a Human-AI system in live tutoring, involving 900 tutors and 1,800 K-12 students from historically under-served communities. Following a preregistered analysis plan, we find that students working on mathematics with tutors randomly assigned to have access to Tutor CoPilot are 4 percentage points (p.p.) more likely to master topics (p<0.01). Notably, students of lower-rated tutors experienced the greatest benefit, improving mastery by 9 p.p. relative to the control group. We find that Tutor CoPilot costs only $20 per tutor annually, based on the tutors’ usage during the study. We analyze 550,000+ messages using classifiers to identify pedagogical strategies, and find that tutors with access to Tutor CoPilot are more likely to use strategies that foster student understanding (e.g., asking guiding questions) and less likely to give away the answer to the student, aligning with high-quality teaching practices. Interviews with tutors highlight how Tutor CoPilot’s guidance helps them respond to student needs, though tutors also flag common issues with Tutor CoPilot, such as suggestions that are not grade-level appropriate.
Altogether, our study of Tutor CoPilot demonstrates how Human-AI systems can scale expertise in real-world domains, bridge gaps in skills, and create a future where high-quality education is accessible to all students.