Accepted Paper:
Synthetic data, the new quick-fix: interrogating the data governance discourse
Sook Lin Toh
(University of Southern California)
Paper short abstract:
To understand how synthetic data has become the new 'revolutionary' solution to data injustice, I critique how dominant data governance and risk mitigation discourses establish a narrow definition of 'data risk'. The harm that data governance aims to address may stem from the discourse itself.
Paper long abstract:
In line with the critique of technocratic solutions in development, this paper tackles the emerging innovation of ‘synthetic data’. Synthetic data consists of AI-generated image or tabular datasets used in place of real-life data to train machine learning models, and is now framed as the ‘safe, ethical alternative’ that solves issues of bias and privacy. The promises of synthetic data are contingent upon a particular, narrow definition of ‘data risks’. Datafication, and now AI, are often criticised for their opacity and for exacerbating pre-existing power imbalances. However, attention to dominant discourses on data governance and reform, in both the private and public sectors, illuminates how many recommendations rely on a simplified ethical framework of ‘privacy’ and ‘bias’. This paper examines how privacy and bias are understood and evaluated in the generation of synthetic data, in order to demonstrate that current discourse and best practices may legitimise ‘synthetic data’ as an appropriate mitigation strategy, and to consider the possible implications of this. Ultimately, this study argues that the dominant discourse on data governance and risk mitigation may be enabling technical solutions that could replicate and exacerbate existing issues. As such, more rigorous definitions of ‘risk’ should be adopted by mainstream data justice narratives.