Accepted Contribution
Short abstract
Drawing on ethnographic research in data-centric biomedicine, this paper examines how synthetic data are used in model validation. It argues that, rather than representing reality, synthetic datasets function as epistemic devices that reorganize regimes of visibility and validation in AI modelling.
Long abstract
This paper examines how synthetic data are used in the situated practices of model construction and validation in contemporary data-centric biomedicine. Drawing on ethnographic fieldwork and interviews with bioinformaticians, computational biologists, and computer scientists, we explore how synthetic datasets are deployed to test whether methods can recover known structures embedded in the data. In these settings, synthetic data function as epistemic devices through which the behaviour of models is rendered visible and assessable. We confront this empirical material with an analysis of the technical literature on explainable and trustworthy AI, whose approaches focus on making inferential pathways visible in order to render model outputs interpretable and intelligible. We argue that this focus reflects a broader epistemological transformation involving regimes of visibility, epistemic virtues, and socially organised practices of seeing and representing. Our claim is that synthetic data do not introduce a radical break in scientific representation. What deserves scrutiny is not their “synthetic” nature but the changing conditions under which knowledge is warranted as valid. While the situated practices we studied were oriented toward aligning modelling workflows with particular regimes of visibility that define what counts as valid knowledge, the adoption of deep-learning AI systems (using both synthetic and real datasets) and the rise of explainability techniques shift the domain of the visible from data-model structures to the reliability and interpretability of model outputs. More broadly, the paper shows how AI and synthetic data reorganize regimes of validation by shifting attention from data-model relations to the credibility of model outputs.
Synthetic data and representation: The politics of AI generated computational practices
Session 2