Comparing Human-Only, AI-Assisted, and AI-Led Teams on Assessing Research Reproducibility in Quantitative Social Science
David Valenta
(University of Ottawa)
Bruno Barbarioli
(University of Ottawa)
Tom Stafford
(University of Sheffield)
Abel Brodeur
(University of Ottawa)
Alexandru Marcoci
(University of Nottingham)
Juan Posada
(University of Ottawa)
Derek Mikola
(University of Ottawa)
Rohan Alexander
(University of Toronto)
Lachlan Deer
(Tilburg University)
Lars Vilhuber
(Cornell University)
Gunther Bensch
(RWI - Leibniz Institute for Economic Research)
Short abstract
We examine the impact of AI integration in reproducibility assessments of social science research. Across 103 teams—human-only, AI-assisted, and AI-led—we evaluate AI’s effect on reproducibility success rates and speed, error detection, and the quality of proposed robustness checks.
Long abstract
This study evaluates the effectiveness of varying levels of human and artificial intelligence (AI) integration in reproducibility assessments of quantitative social science research. We computationally reproduced quantitative results from published articles in the social sciences with 288 researchers, randomly assigned to 103 teams across three groups: human-only teams, AI-assisted teams, and teams whose task was to minimally guide an AI to conduct reproducibility checks (the “AI-led” approach). Findings reveal that when working independently, human teams matched the reproducibility success rates of teams using AI assistance, while both groups substantially outperformed AI-led approaches (with human teams achieving 57 percentage points higher success rates than AI-led teams).
Where next for replication, transparency and analysis of QRPs? (I)
Session 1, Tuesday 1 July 2025