Accepted Paper
Paper short abstract
This paper opens the black box of text embedding benchmarks, revealing that “semantic similarity” is not a settled matter of fact but an assemblage stabilized by alignment logics foreign to SSH. We propose descriptive benchmarks as an algorithmic resistance intervention.
Paper long abstract
Embedding models, employed in retrieval-augmented generation techniques that extend the capabilities of large language models, are said to encode (i.e., are tasked to perform) “semantic similarity.” But in what sense, exactly, and can this notion be repurposed for research in the social sciences and humanities (SSH)? Benchmarks such as MMTEB (Enevoldsen et al., 2025) de facto set this standard.
In this paper we review the benchmark literature, observe that most benchmarks approach semantic similarity as a matter of human alignment, and contest that such a frame is necessarily the most relevant for assessing SSH applications of embedding models. To defend the methodological autonomy of SSH, and as an algorithmic resistance intervention, we propose a different approach to their benchmarking: one that departs from the alignment problem and accepts living with the trouble of humans not being aligned with each other in the first place. Rather than setting a single universal standard, we defend descriptive benchmarks that document analytically useful yet potentially incompatible features that can never be assembled into a single scale from worst to best model.
First, we unpack how MMTEB compiles benchmarks that in turn compile smaller benchmarks, and discuss what this aggregative logic does to the nature of semantic similarity. Second, we articulate our approach to “descriptive” benchmarks through an application to observatorial controversy mapping. Third and finally, we showcase an implemented example that documents whether a model considers two opposed opinionated statements less similar to each other than to a neutral one (and find that many do not).
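A minimal sketch of such a probe follows, assuming the sentence-transformers library; the model name and the three statements are illustrative placeholders, not the paper's actual benchmark data or implementation:

    from sentence_transformers import SentenceTransformer, util

    # Hypothetical model choice; the probe would be run across many models.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    pro = "Nuclear power is essential for a sustainable energy future."
    con = "Nuclear power has no place in a sustainable energy future."
    neutral = "Nuclear power is one of several energy sources in use today."

    # Encode all three statements and compare cosine similarities.
    emb = model.encode([pro, con, neutral])
    sim_opposed = util.cos_sim(emb[0], emb[1]).item()  # pro vs. con
    sim_neutral = util.cos_sim(emb[0], emb[2]).item()  # pro vs. neutral

    # A descriptive benchmark records the feature itself rather than a
    # ranking: does the model rate opposed opinions as less similar to
    # each other than an opinion paired with a neutral statement?
    print(f"pro vs. con:     {sim_opposed:.3f}")
    print(f"pro vs. neutral: {sim_neutral:.3f}")
    print("opposed rated less similar:", sim_opposed < sim_neutral)

The boolean output documents one analytically relevant property of the model without folding it into a single worst-to-best score.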
The Futures of Qualitative Inquiries: Post-Digital Methods, Pre-Digital Methodologies
Session 1