Accepted Contribution
Short abstract
This academic paper applies the data journeys framework to synthetic data, tracing how data are constructed, translated and stabilised in practice. Drawing on qualitative work with two industry partners, we examine what synthetic data are and what their reuse demands of social research.
Long abstract
Synthetic data are increasingly positioned as a solution to longstanding problems of access, missingness and representativeness in social research. Yet as social researchers are drawn into contexts where synthetically generated data are available for reuse, critical questions arise: what are these data and how did they come to be? The sociodigital practices through which synthetic data are produced, the decisions embedded in their generation, and the ways they circulate across contexts remain largely opaque at the point of reuse. This opacity is not incidental but constitutive. Synthetic data do not record social worlds; they construct representations shaped by practices, infrastructures and assumptions that conventional methodological frameworks are poorly equipped to trace.
This academic paper proposes that the data journeys framework (Bates et al., 2016) offers conceptual resources for addressing this gap. Developed to trace the relational, contingent, practice-laden processes through which data are made and transformed, data journeys directs analytic attention towards the moments of construction, translation and stabilisation that synthetic data production involves. We report on ongoing qualitative work with two industry organisations engaged in synthetic data production, using these collaborations to develop and stress-test the framework across different generative contexts. We reflect on what data journeys can reveal, where it requires adaptation, and what forms of methodological collaboration are necessary. We argue that STS-informed methods for tracing synthetic data provenance are not optional refinements but preconditions for epistemologically robust social research in a moment of rapid expansion in generative AI.
Synthetic data and representation: The politics of AI generated computational practices
Session 1