Leveraging Large Language Models for Data Extraction in Metaresearch: a Feasibility Study and Automatised Protocol
Benjamin Simsa
(Slovak Academy of Sciences)
Matus Adamkovic
(Charles University)
Artem Buts
Short abstract
We test the feasibility of using Large Language Models to automate data extraction in metaresearch. Results show that these models achieve high accuracy in extracting a wide range of metascientific variables at a fraction of the cost of manual coding, with frontier models nearing human-level accuracy.
Long abstract
Manual data extraction in metaresearch is often a tedious, time-consuming, and error-prone process. In this paper, we investigate whether the current generation of Large Language Models (LLMs) can be used to extract accurate information from scientific papers. Across the metaresearch literature, extraction tasks usually range from retrieving verbatim information (e.g., the number of participants in a study, effect sizes, or whether the study was preregistered) to making subjective inferences.
Using a publicly available dataset (Blanchard et al., 2023) containing a wide range of metascientific variables from 34 network psychometrics papers, we tested six LLMs (Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku, GPT-4o, GPT-4o mini, o1-preview). We used the models' APIs to extract the variables from the documents automatically. This automated pipeline allows batch processing of research papers and thus represents a more efficient and scalable way to extract metascientific data than using the default chat interface.
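To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch of batch extraction through an LLM API. It assumes the OpenAI Python SDK, papers already converted to plain text, and a small hypothetical codebook of variables; the prompt wording, variable names, and parsing logic are illustrative assumptions, not the authors' actual protocol.

```python
# Illustrative sketch only: batch extraction of metascientific variables via an LLM API.
# Assumptions (not taken from the paper): the OpenAI Python SDK, plain-text papers in
# a "papers/" folder, and a hypothetical two-variable codebook.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical codebook: variable name -> coding instruction
CODEBOOK = {
    "n_participants": "Total number of participants, as an integer, or 'unclear'.",
    "preregistered": "Was the study preregistered? Answer 'yes', 'no', or 'unclear'.",
}

PROMPT_TEMPLATE = (
    "You are extracting metascientific variables from a research paper.\n"
    "For each variable, follow its instruction exactly and answer 'unclear' when "
    "the paper does not state the information.\n\n"
    "Variables:\n{codebook}\n\n"
    "Return a JSON object with one key per variable.\n\n"
    "Paper text:\n{paper}"
)


def extract_variables(paper_text: str, model: str = "gpt-4o") -> dict:
    """Send one paper to the model and parse its JSON answer."""
    codebook_lines = "\n".join(f"- {name}: {rule}" for name, rule in CODEBOOK.items())
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(codebook=codebook_lines, paper=paper_text),
        }],
        response_format={"type": "json_object"},  # request machine-readable output
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    # Batch processing: one extraction call per paper, collected into a single table.
    results = {p.name: extract_variables(p.read_text()) for p in Path("papers").glob("*.txt")}
    print(json.dumps(results, indent=2))
```

In practice, the same loop can be pointed at different models (e.g., a Claude model through the Anthropic API) to compare accuracy and cost, which is the kind of comparison reported below.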
Our results point to the high accuracy and high potential of LLMs for metascientific data extraction. Model accuracy ranged from 76% to 87%, and most models were able to convey uncertainty in the more contentious cases.
We provide a comparison of the accuracy and cost-effectiveness of the individual models and discuss the characteristics of variables that are (un)suitable for automatic coding. Furthermore, we describe common pitfalls and best practices of automated LLM data extraction. The proposed procedure can decrease the time and costs associated with conducting metaresearch by orders of magnitude.
Accepted Paper
A strategic brain for STI
Session 1, Tuesday 1 July 2025