Accepted Contribution

Ontological Filters - Examining the Knowledge Politics of Medical Synthetic Datasets and Models   
Sam Bennett (Durham University), Imo Emah (Edge Hill University)

Short abstract

Synthetic data is increasingly developed and used in medical research, but it risks (re)producing narrow, sanitised worlds. This talk examines the knowledge politics shaping synthetic data in medical research through the concept of the ontological filter, which draws upon Barad's agential realism.

Long abstract

Synthetic data is framed as a key approach for improving the representation of marginalized groups in medical datasets and for improving access to data while preserving privacy. Despite these aims, synthetic data models risk reproducing narrow, sanitised versions of the world which prioritise the logics underpinning AI whilst being easy to scale up (Jacobsen 2024; Pasquinelli 2023). Responding to these concerns, this talk considers the ethico-onto-epistemology of synthetic data in medical research, bringing together findings from conceptual analysis and semi-structured interviews.

In this analysis, we apply the conceptual lens of Karen Barad’s agential realism, in which material-discursive entanglements draw boundaries between what matters and what doesn’t, shaping specific ‘worlds’ (Barad 2007). Real-world data has direct indexical grounding: it points back to something that happened in the world as a representation of that event. Producing that representation necessarily involves choices about what counts as ‘residue’, e.g. noise, edge cases, contradictions, and correlations judged to be irrelevant. In contrast, synthetic datasets make data features legible to AI models, which go on to employ them in another instantiation of the data gaze (Beer 2018). The shaping of these datasets means there is minimal ‘residue’ outside the representational frame of the model’s latent space, as the data has already been passed through what we term an ‘ontological filter’, where noise becomes intentional rather than a problem to be fixed. In this panel, we interrogate the knowledge politics of how these filters are constructed and the implications for the representation of marginalized groups.

Combined Format Open Panel CB027
Synthetic data and representation: The politics of AI generated computational practices
  Session 1