Accepted Paper

Textual Footprints of Science: Analyzing Geographical Mentions in Research Outputs  
Berta Grimau (SIRIS Academic), Nicolau Duran-Silva (SIRIS Academic; Pompeu Fabra University), Tatiana Fernández-Sirera (Generalitat de Catalunya), Enric Fuster (SIRIS Academic)

Short abstract

Identifying geographical entities in research outputs is important for science monitoring and evaluation. To address this need, we developed a tool that finds locations in text, matches each one to its unique spatial footprint and classifies it according to its role (e.g. object of study).

Long abstract

Accessing the semantic contents of research outputs is essential for science monitoring and evaluation. One type of information found in research documents is their geographical scope: which places are studied, and where does the research take place? Geotagging, the task of identifying and disambiguating geographical mentions in text, not only identifies the specific locations relevant to a research activity but also allows new indicators to be defined.

To address this need, we built a multilingual system (to be released as open source) that performs the following tasks: (1) identification of geographical entities in scholarly texts, (2) toponym resolution, (3) contextual role classification (e.g. object of study or impacted location).
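The three tasks can be read as a pipeline in which each stage enriches a mention record. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `GeoMention`, `geotag`, `TOY_GAZETTEER` and the three `toy_*` functions are hypothetical, and the toy components merely stand in for a NER model, a toponym resolver and a role classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets


@dataclass
class GeoMention:
    text: str                        # surface form found in the document
    start: int                       # character offset where the mention begins
    end: int                         # character offset where the mention ends
    footprint: Optional[str] = None  # identifier of the resolved place, if any
    role: Optional[str] = None       # contextual role label


def geotag(
    text: str,
    find_mentions: Callable[[str], List[Span]],
    resolve: Callable[[str], Optional[str]],
    classify_role: Callable[[str, Span], str],
) -> List[GeoMention]:
    """Run the three stages in order: (1) find, (2) resolve, (3) classify."""
    mentions = []
    for start, end in find_mentions(text):
        surface = text[start:end]
        mentions.append(
            GeoMention(surface, start, end, resolve(surface),
                       classify_role(text, (start, end)))
        )
    return mentions


# --- Toy stand-ins (assumptions for illustration only) ---
TOY_GAZETTEER = {"Catalonia": "toy:catalonia", "Barcelona": "toy:barcelona"}


def toy_find_mentions(text: str) -> List[Span]:
    """Stage 1 stand-in: exact string lookup instead of a trained NER model."""
    spans = []
    for name in TOY_GAZETTEER:
        idx = text.find(name)
        while idx != -1:
            spans.append((idx, idx + len(name)))
            idx = text.find(name, idx + 1)
    return sorted(spans)


def toy_resolve(surface: str) -> Optional[str]:
    """Stage 2 stand-in: dictionary lookup instead of a gazetteer service."""
    return TOY_GAZETTEER.get(surface)


def toy_classify_role(text: str, span: Span) -> str:
    """Stage 3 stand-in: keyword cues instead of a trained classifier."""
    window = text[max(0, span[0] - 30):span[0]].lower()
    if "conducted in" in window:
        return "Location of research"
    if "study" in window:
        return "Object of study"
    return "Contextual/other"


example = "We study air quality in Catalonia. Fieldwork was conducted in Barcelona."
results = geotag(example, toy_find_mentions, toy_resolve, toy_classify_role)
for m in results:
    print(m.text, m.footprint, m.role)
```

Passing the stages in as callables keeps the sketch independent of any particular model or gazetteer, so each stand-in could be swapped for a real component without changing `geotag`.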

For (1), we created a new NER dataset focusing exclusively on geographical entities by combining existing datasets from multiple sources. We used it to fine-tune a multilingual LLM (covering Catalan, German, English, Spanish, French and Italian) to perform geographical NER (following Zekun et al. 2023). For (2), we query the OpenStreetMap API and match each mention to its most likely unique spatial footprint. For (3), we trained a classifier on a dataset of geographical mentions in R&I contexts, manually labelled with their role according to the following taxonomy:

- Object of study
- Location of research
- Impacted location
- Contextual/other

The role classification step allows for finer-grained analysis and makes it possible to filter out noisy mentions.
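As an illustration of how role labels could support both noise filtering and simple indicators, the sketch below encodes the four-way taxonomy and counts mentions per place for a chosen role. It rests on assumptions: treating "Contextual/other" as the noise class is our reading rather than something the abstract states, and the names `Role`, `drop_noise`, `count_by_role` and the sample place identifiers are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, List, Tuple


class Role(Enum):
    OBJECT_OF_STUDY = "Object of study"
    LOCATION_OF_RESEARCH = "Location of research"
    IMPACTED_LOCATION = "Impacted location"
    CONTEXTUAL_OTHER = "Contextual/other"


# A classified mention: (resolved place identifier, role)
Mention = Tuple[str, Role]


def drop_noise(mentions: Iterable[Mention]) -> List[Mention]:
    """Keep only mentions with an analytically meaningful role.

    Assumption: "Contextual/other" is the class treated as noise.
    """
    return [m for m in mentions if m[1] is not Role.CONTEXTUAL_OTHER]


def count_by_role(mentions: Iterable[Mention], role: Role) -> Counter:
    """Count how often each place appears with the given role."""
    return Counter(place for place, r in mentions if r is role)


# Hypothetical sample data for illustration only
sample: List[Mention] = [
    ("place:A", Role.OBJECT_OF_STUDY),
    ("place:A", Role.OBJECT_OF_STUDY),
    ("place:B", Role.LOCATION_OF_RESEARCH),
    ("place:C", Role.CONTEXTUAL_OTHER),
]

kept = drop_noise(sample)
studied = count_by_role(kept, Role.OBJECT_OF_STUDY)
print(len(kept), dict(studied))
```

Counting by role rather than by raw mention separates, for example, places that are studied from places where the research is merely carried out.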

In our talk, we will present a case study on a territorial ecosystem (analysing its publications and R&I projects), demonstrating the value of the tool for metascience and research mapping.

Panel T3.2
Methods mash: expanding the tools of metascience
Session 1, Tuesday 1 July 2025