Reproducibility is the cornerstone of the credibility and trustworthiness of science. This session will focus on factors that affect reproducibility of research, as well as tools, methods and initiatives that can help increase reproducibility.
Long Abstract
This session is focused on reproducibility, transparency and questionable research practices (QRPs). Lina Koppel presents findings from a survey in which 11,050 researchers in Sweden were asked about their views on potential causes of the replication crisis and the extent to which various interventions can increase science’s credibility. Odd Erik Gundersen will talk about replications of 22 AI studies that publicly shared either code and data or only data, of which 50% were reproduced to some extent. David Valenta’s paper examines the impact of AI integration in reproducibility assessments of social science research. Maria Jones and Luis Eduardo San Martin draw on the initiative launched by the World Bank to enhance reproducibility, discussing lessons for metascience, sharing institutional lessons, and presenting a roadmap for alliances toward standardizing and increasing transparency. Tom Hardwicke estimated the prevalence of transparent research practices (e.g., data sharing and preregistration) in psychology by manually examining 400 randomly sampled empirical articles, finding that transparency increased modestly between 2017 and 2022 but continues to be widely neglected. Lastly, Jane Hergert identified and categorized 40 QRPs in quantitative research, creating a taxonomy that enhances transparency and helps mitigate QRPs, contributing to research integrity and metascience.
How to ensure credibility in science? We surveyed 11,050 researchers in Sweden about their views on potential causes of the replication crisis and the extent to which various interventions can increase science’s credibility. We compare results across academic fields and academic seniority.
Long abstract
The credibility of scientific research has come under increased scrutiny over the past decade, fuelled in part by lower-than-expected success rates of various efforts to replicate published findings. We conducted a survey of 11,050 researchers and PhD students in Sweden, who indicated (1) whether they had heard of the replication crisis, (2) to what extent they believed a number of factors contributed to the replication crisis, and (3) how successful they believed a number of interventions are or would be in increasing the credibility of scientific findings. Overall, 51% of respondents indicated that they had heard of the replication crisis (30% answered “no” and 19% were unsure). This number varied substantially across fields, with psychology having the highest proportion of researchers indicating they had heard of the replication crisis (90%). Moreover, the top-rated causes and solutions related to (1) how researchers are evaluated for employment and promotion based on quantitative publication metrics and (2) publication bias. In contrast, interventions such as lowering the threshold for statistical significance and limiting the number of publications per researcher per year were on average rated as the least successful in terms of their potential to increase credibility. Ratings were relatively stable across academic disciplines and employment categories (i.e., seniority), although there was some variation across fields in average ratings of some of the interventions.
We conducted replications of 22 AI studies that publicly shared either code and data or only data; 50% were reproduced to some extent. Reproducibility rises to 86% when both code and data are shared but falls to 33% when only data is shared. Documenting data is more important than documenting code.
Long abstract
A reproducibility crisis has been reported in science, but the extent to which it affects AI research is not yet fully understood. Therefore, we performed a systematic replication study of 30 highly cited AI studies, relying on original materials when available. In the end, eight articles were rejected because they required access to data or hardware that was practically impossible to acquire as part of the project. Six articles were successfully reproduced, while five were partially reproduced. In total, 50% of the included articles were reproduced to some extent. The availability of code and data correlates strongly with reproducibility, as 86% of articles that shared code and data were fully or partly reproduced, while this was true for only 33% of articles that shared data alone. The quality of the data documentation also correlates with successful replication: poorly documented or mis-specified data will probably result in unsuccessful replication. Surprisingly, the quality of the code documentation does not correlate with successful replication. Whether the code is poorly documented, partially missing, or not versioned is not important for successful replication, as long as the code is shared. This study emphasizes the effectiveness of open science and the importance of properly documenting data work.
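For concreteness, a minimal sketch of the overall reproduction-rate arithmetic implied by the counts above (illustrative Python only, not the authors’ analysis code; the per-group counts behind the 86% and 33% figures are not stated here):

    # Counts taken from the abstract above; illustrative only.
    screened = 30                        # highly cited AI studies considered
    rejected = 8                         # required inaccessible data or hardware
    included = screened - rejected       # 22 studies attempted

    fully_reproduced = 6
    partially_reproduced = 5
    reproduced_to_some_extent = fully_reproduced + partially_reproduced   # 11

    overall_rate = reproduced_to_some_extent / included
    print(f"{reproduced_to_some_extent}/{included} = {overall_rate:.0%}")  # 11/22 = 50%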
We examine the impact of AI integration in reproducibility assessments of social science research. Across 103 teams—human-only, AI-assisted, and AI-led—we evaluate AI’s effect on reproducibility success rates and speed, error detection, and the quality of proposed robustness checks.
Long abstract
This study evaluates the effectiveness of varying levels of human and artificial intelligence (AI) integration in reproducibility assessments of quantitative social science research. We computationally reproduced quantitative results from published articles in the social sciences with 288 researchers, randomly assigned to 103 teams across three groups: human-only teams, AI-assisted teams, and teams whose task was to minimally guide an AI to conduct reproducibility checks (the “AI-led” approach). Findings reveal that when working independently, human teams matched the reproducibility success rates of teams using AI assistance, while both groups substantially outperformed AI-led approaches (with human teams achieving success rates 57 percentage points higher than those of AI-led teams).
The World Bank launched an initiative in 2023 to enhance reproducibility, strongly encouraging reproducibility packages for research products. We discuss learnings for metascience, share institutional lessons, and present a roadmap for alliances toward standardizing and increasing transparency.
Long abstract
Reproducibility is crucial to understanding how social scientists derive their findings. Reproducibility standards enhance the credibility, quality, and impact of research. Journals and research institutions are increasingly adopting reproducibility standards, requiring authors to provide the code, data, and documentation necessary to reproduce their results. In September 2023, the World Bank launched a new initiative to increase the reproducibility of its research. Reproducibility packages are strongly encouraged for working papers, books, and flagship reports produced by World Bank staff and consultants; and required for a subset of working papers. Since then, internal, third-party replicators have verified the computational reproducibility of more than 200 research products. Once verified, reproducibility packages and corresponding metadata are published to the World Bank’s Reproducible Research Repository.
This change to the World Bank’s research process creates unique opportunities to study and strengthen metascience. We discuss observed changes in the publication and outreach of working papers and code attributes associated with computational reproducibility. We introduce what is, to our knowledge, the first metadata schema specifically designed for reproducibility packages. We also present institutional lessons for transparency and reproducibility and a roadmap for alliances toward increased transparency in research from multilateral organizations.
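As a purely hypothetical illustration of the kind of fields such a metadata schema might capture (field names and values below are assumptions for exposition, not the World Bank’s actual schema):

    # Hypothetical sketch of a reproducibility-package metadata record.
    # All field names and values are illustrative assumptions, not the actual schema.
    package_metadata = {
        "title": "Reproducibility package for <working paper title>",
        "output_type": "working paper",           # e.g., working paper, book, flagship report
        "verification_status": "computationally reproducible",
        "verified_by": "internal third-party replicator",
        "software": ["Stata", "R"],               # tools required to run the package
        "data_access": "public",                  # or restricted, with access instructions
        "repository": "<Reproducible Research Repository entry URL>",
        "license": "<license>",
    }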
In a cross-sectional study, we estimated the prevalence of transparent research practices (e.g., data sharing and preregistration) in psychology by manually examining 400 randomly sampled empirical articles. Transparency increased modestly between 2017 and 2022 but continues to be widely neglected.
Long abstract
More than a decade of advocacy and policy reforms have attempted to increase the uptake of transparent research practices in the field of psychology; however, their collective impact is unclear. We estimated the prevalence of transparent research practices in (a) all psychology journals (i.e., field-wide), and (b) prominent psychology journals, by manually examining two random samples of 200 empirical articles (N = 400) published in 2022. Most articles had an open-access version (field-wide: 74%, 95% confidence interval [CI] = [67%, 79%]; prominent: 71% [64%, 77%]) and included a funding statement (field-wide: 76% [70%, 82%]; prominent: 76% [70%, 82%]) or conflict-of-interest statement (field-wide: 76% [70%, 82%]; prominent: 73% [67%, 79%]). Relatively few articles had a preregistration (field-wide: 7% [2.5%, 12%]; prominent: 14% [8.5%, 19%]), materials (field-wide: 16% [9%, 24%]; prominent: 19% [12%, 27%]), raw/primary data (field-wide: 14% [7%, 21%]; prominent: 16% [9.5%, 24%]), or analysis scripts (field-wide: 8.5% [4.5%, 13%]; prominent: 14% [9.5%, 19%]) that were immediately accessible without contacting authors or third parties. In conjunction with prior research, our results suggest transparency increased moderately from 2017 to 2022. Overall, despite considerable infrastructure improvements, bottom-up advocacy, and top-down policy initiatives, research transparency continues to be widely neglected in psychology.
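For readers unfamiliar with the interval notation above, the sketch below shows one common way to compute a 95% confidence interval for an estimated prevalence from a sample of 200 articles (a Wilson score interval; the article’s exact interval method is not stated in this abstract and may differ, e.g., via a finite-population adjustment):

    # Minimal sketch: 95% Wilson score interval for an estimated prevalence.
    # Illustrative only; the article's exact interval method may differ.
    from math import sqrt

    def wilson_ci(successes, n, z=1.96):
        p = successes / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return centre - half, centre + half

    # e.g., 148 of 200 field-wide articles with an open-access version (74%)
    low, high = wilson_ci(148, 200)
    print(f"74% [{low:.0%}, {high:.0%}]")   # approximately [68%, 80%]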
Questionable research practices threaten scientific integrity, yet remain ill-defined. Using a community consensus method, we identified and categorized 40 QRPs in quantitative research. This taxonomy enhances transparency and helps mitigate QRPs, contributing to research integrity and metascience.
Long abstract
Questionable research practices (QRPs) undermine the credibility of scientific research, yet a comprehensive and structured taxonomy has been lacking. Using a community consensus method, we developed a refined definition of QRPs and systematically identified, categorized, and analyzed 40 QRPs in quantitative psychological research. These QRPs span various stages of the research process, from data collection to publication, and were assessed based on their potential harms, detectability, clues, and possible preventive measures. Our findings highlight the pervasiveness and versatility of QRPs and demonstrate their potential to distort scientific conclusions. By providing a structured framework for recognizing and mitigating QRPs, this work contributes to ongoing efforts in research integrity and transparency. Our Bestiary offers a practical tool for researchers, institutions, and policymakers to improve scientific practices and strengthen metascientific discourse. This talk will discuss key findings, methodological approaches, and implications for the future of open and credible psychological science.
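As an illustration of how a single entry in such a taxonomy could be represented, the hypothetical sketch below structures an entry around the dimensions named above (research stage, potential harms, detectability, clues, preventive measures); the concrete fields and example values are assumptions, not the authors’ Bestiary:

    # Hypothetical sketch of one taxonomy entry; fields follow the dimensions named
    # in the abstract, but the structure and example values are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class QRPEntry:
        name: str
        research_stage: str                      # e.g., design, data collection, analysis, publication
        potential_harms: list = field(default_factory=list)
        detectability: str = "unknown"           # how easily the practice can be detected afterwards
        clues: list = field(default_factory=list)
        preventive_measures: list = field(default_factory=list)

    example = QRPEntry(
        name="Optional stopping without correction",
        research_stage="data collection",
        potential_harms=["inflated false-positive rate"],
        detectability="low without preregistration",
        clues=["sample size stops just past a significance threshold"],
        preventive_measures=["preregistered stopping rules", "sequential analyses with alpha spending"],
    )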
Accepted papers
Session 1 Tuesday 1 July, 2025, -