Making Models Visible: Synthetic Data and the Epistemology of Visibility

Conference

EASST2026

Krakow, Poland

7 – 11 Sep 2026

Accepted Contribution

Lorenzo Beltrame (University of Trento); Fabio Gasparini (University of Padova)

Short abstract

Drawing on ethnographic research in data-centric biomedicine, this paper examines how synthetic data are used in model validation. It argues that rather than representing reality, synthetic datasets function as epistemic devices that reorganize regimes of visibility and validation in AI modelling.

Long abstract

This paper examines how synthetic data are used in the situated practices of model construction and validation in contemporary data-centric biomedicine. Drawing on ethnographic fieldwork and interviews with bioinformaticians, computational biologists, and computer scientists, we explore how synthetic datasets are deployed to test whether methods can recover known structures embedded in the data. In these settings, synthetic data function as epistemic devices through which the behaviour of models is rendered visible and assessable. We set this empirical material against an analysis of the technical literature on explainable and trustworthy AI, whose approaches focus on making inferential pathways visible in order to render model outputs interpretable and intelligible. We argue that this shift reflects a broader epistemological transformation involving regimes of visibility, epistemic virtues, and socially organised practices of seeing and representing. Our claim is that synthetic data do not introduce a radical break in scientific representation. What deserves scrutiny is not their “synthetic” nature but the changing conditions under which validated knowledge is warranted. The situated practices we studied were oriented toward aligning modelling workflows with particular regimes of visibility that define what counts as valid knowledge. By contrast, the adoption of deep-learning AI systems (using both synthetic and real datasets) and the rise of explainability techniques shift the domain of the visible from data-model structures to the reliability and interpretability of model outputs. More broadly, the paper shows how AI and synthetic data reorganize regimes of validation by shifting attention from data-model relations to the credibility of model outputs.

Combined Format Open Panel CF02
Synthetic data and representation: The politics of AI generated computational practices
Session 3
