Can AI be used for better matching of proposals to reviewers? Feasibility and formal evaluation with the Metascience 2025 conference.
Tom Stafford
Amanda Kvarven
(University College London)
Short abstract
We report the results of a shadow experiment on proposals to the Metascience 2025 conference. After programme selection was finalised, we evaluated whether large language models can predict which reviewers would judge themselves most suitable to review submitted proposals.
Long abstract
Peer review is central to the evaluation of research, not least at this conference, where all submitted proposals are reviewed by a subset of the programme committee. Identifying and recruiting suitable reviewers is a key bottleneck in the peer review process. AI tools promise to match texts to reviewers, and so improve the efficiency (directly) and quality (indirectly) of reviews. By conducting a “shadow experiment” on the Metascience 2025 submissions we seek to evaluate this potential. After selection for the conference has been completed, including reviewers scoring proposals and their own suitability to review, we will investigate the feasibility of running in-house, privacy-respecting language models capable of matching reviewer profiles to proposal texts. The ambition is to calculate match scores for all possible reviewer-proposal combinations and so be able to analyse a) how closely the actual matching approached the suggested optimum and b) how well AI matching predicts reviewers' self-rated suitability to review. Reporting the results at the conference will contribute to Metascience as a self-reflective and learning organisation.
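One plausible realisation of this pipeline, offered only as a minimal sketch and not as the authors' actual method, is to embed reviewer profiles and proposal texts with a locally run embedding model, take cosine similarity as the match score for every reviewer-proposal pair, and then correlate those scores with reviewers' self-ratings for the pairs that were actually reviewed. The model name, toy data, and variable names below are illustrative assumptions.

```python
# Minimal sketch of embedding-based reviewer-proposal matching (illustrative only).
from sentence_transformers import SentenceTransformer, util
from scipy.stats import spearmanr

# Small local model; once downloaded, no proposal text leaves the machine.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy reviewer profiles and proposal abstracts (placeholders for real data).
reviewer_profiles = [
    "Research on peer review, reproducibility and research assessment.",
    "Works on bibliometrics, citation analysis and science mapping.",
]
proposal_texts = [
    "A shadow experiment on AI-assisted reviewer assignment at a conference.",
    "Mapping collaboration networks with co-citation analysis.",
]

# Embed both sets; cosine similarity gives an (n_reviewers x n_proposals) match matrix.
rev_emb = model.encode(reviewer_profiles, convert_to_tensor=True, normalize_embeddings=True)
prop_emb = model.encode(proposal_texts, convert_to_tensor=True, normalize_embeddings=True)
match_scores = util.cos_sim(rev_emb, prop_emb)  # match_scores[i, j]: reviewer i vs proposal j

# Hypothetical self-rated suitability (1-5) for the pairs that were actually reviewed.
observed_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
self_ratings = [5, 2, 1, 4]

ai_scores = [match_scores[i, j].item() for i, j in observed_pairs]
rho, p = spearmanr(ai_scores, self_ratings)
print(f"Spearman rho between AI match score and self-rating: {rho:.2f} (p = {p:.3f})")
```

Running the embedding model locally is what makes the approach privacy-respecting: proposal texts and reviewer profiles never need to be sent to an external API.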
Metascience Lab (III): brokering experiments
Session 1, Wednesday 2 July 2025