Justice in data structures? Investigating representations of data work in machine learning practice

Accepted Contribution:

SJ Bennett (Durham University); Fabio Tollon (University of Edinburgh); Benedetta Catanzariti (University of Edinburgh)

Short abstract:

This paper empirically investigates AI practitioners’ conceptions of how data annotation fits into data structures, their representations of the workers engaged with this type of data work, and how these representations shape data structures themselves.

Long abstract:

Much of today’s AI development requires a vast and distributed network of data workers who sort through, clean, and annotate the data used to train machine learning models. However, this network is often represented asymmetrically: the contributions of AI practitioners are positioned as pivotal, whilst other forms of labour, such as annotation, are seen as ad hoc and as having little cumulative impact. These representations draw upon practitioner accounts, but rarely interrogate their underlying assumptions. This paper investigates AI practitioners’ conceptions of how data annotation fits into data structures, their representations of the workers engaged with this type of data work, and how these representations shape data structures themselves. Drawing from workshops conducted with machine learning practitioners, we explore experiences of data ‘wrangling’, or practices of data acquisition, cleaning, and annotation, as the point where AI practitioners interface with domain experts and data annotators. In exploring these practices, we move beyond the simple recognition of data workers’ ‘invisibility’ to examine the political role of epistemic framings of the data work that underpin AI development and how these framings can shape data workers’ agency. Finally, we reflect on the implications of our findings for developing more participatory and equitable approaches to AI.

Combined Format Open Panel P036
Questioning data annotation for AI: empirical studies
Session 1 Friday 19 July, 2024, 8:30-10:00
