Accepted Contribution
Short abstract
We examine counterfactual medical image generation in health Causal AI. Generating hypothetical data shifts real-world observations towards simulated ideal conditions. We highlight ‘pragmatic expediency’ and the role of synthetic data in overcoming the imperfect state of real-world data.
Long abstract
Robustness of AI models is often attributed to the representation quality of training data – a questionable assumption (e.g. Jaton 2025). Growing interest in synthetic data builds on the acknowledgement that existing data are imperfect, which makes synthetic data desirable for improving sample quality (Jordon et al. 2022; Offenhuber 2025). We examine counterfactual image generation (e.g. Roschewitz et al. 2025) in health Causal AI.
To capture real-world diversity, training data should be assembled by sourcing datasets across different countries, health systems and populations. In practice, this means accessing an uneven and relatively contingent collection of datasets. Synthetic images can then be useful to address gaps in data availability, coverage and quality.
Clinically generated data can be ‘cleaned’ of noise through the generation of ‘what-if’ images, for instance by simulating how scans would look under different lighting conditions. Similarly, existing medical scans can be used to generate a scan for a hypothetical patient who differs in an attribute of interest to researchers, for instance age. Manipulating existing data to generate hypothetical clinical observations could shift real-world observations towards a distribution generated in ideal conditions, which can serve as ground truth data for machine learning.
Synthetic data can serve as ground truth thanks to pragmatic and empirical strategies. Here the picture is more complicated than a naïve account of ground truth construction would ‘paint’ it. We highlight the importance of ‘pragmatic expediency’ and outline a case for the role of synthetic data in this more complex picture.
Ground truths and the epistemology of AI
Session 1