Accepted Contribution
Short abstract
This presentation proposes a processual theoretical framework, adapting Strauss's "arc of work", to trace how conceptual assumptions are embedded in machine learning algorithms via ground-truth datasets. It draws on two empirical case studies in the fields of justice and environmental health.
Long abstract
Machine learning algorithms are often portrayed as large-scale statistical tools that marginalize theory and conceptualization. Yet research on ground-truthing and data annotation shows that conceptual work remains central to model training and crucially shapes algorithmic outcomes. This work, however, largely remains invisible, as it is both displaced upstream into data preparation processes and fragmented across multiple actors involved in algorithmic production, including domain experts, engineers, data scientists, and annotators.
In this context, how can the conceptual, political, and social assumptions shaping algorithms be traced as they are incorporated through ground-truth datasets? This presentation proposes a methodological and conceptual framework for studying the progressive constitution of ground-truth datasets as a situated, processual activity. It adapts Anselm Strauss’s notion of “arc of work” to follow the successive stages of dataset construction and annotation, highlighting the articulation work required to align a plural, unstable reality with rigid classification systems. This approach also accounts for the diversity of professional worlds involved, making visible the tensions, turning points, and iterative adjustments through which AI systems are gradually configured and reconfigured.
The framework is grounded in two empirical fields: an ethnography (2020–2024) at two sites of AI production within the French justice system (the Supreme Court and the Ministry of Justice), and interviews and document analysis (2025) on the development of an AI tool for environmental health within the French Ministry for Ecological Transition. In both cases, the study follows the full production chain, from category definition to annotation practices.
Ground truths and the epistemology of AI
Session 1