- Convenors:
- Camille Girard-Chanudet (EHESS)
- Assia Wirth
- Format:
- Combined Format Open Panel
Short Abstract:
This panel focuses on manual data annotation for machine learning purposes. It welcomes empirical studies conducted with annotators, documenting the doubts, inquiries, and choices that progressively shape training data sets and, thus, the results produced by AI.
Long Abstract:
The growing implementation of Artificial Intelligence (AI) technologies in “sensitive” fields such as health, justice, or surveillance has raised diverse concerns about algorithmic opacity. Efforts to crack open the “black box” of machine learning have mainly focused on coding architectures and practices, on the one hand, and on the constitution of training data sets, on the other. Both components of machine learning dispositifs are, however, held together by an essential link that has largely remained on the fringes of AI studies: the manual annotation of data by human professionals.
Annotation work consists of the manual and meticulous labeling of documents (pictures, texts…) with a desired outcome that the algorithmic model will then reproduce. It can be undertaken by various categories of actors. While annotation conducted by underpaid and outsourced workers has been well documented, these activities can also be carried out by qualified workers, including within prestigious professions. All of these empirical cases raise questions about the micro-doubts, inquiries, choices, and overall expertise that progressively shape training data sets and, thus, the results produced by AI. Data labelling is about putting classification systems into practice, defining categories and their empirical borders, and constructing information infrastructures. Despite this strong political impact, data annotation remains largely invisible.
This panel welcomes papers addressing the question of annotation from an empirical perspective. The contributions may include:
- Ethnographic studies documenting the practices of annotation work in particular contexts.
- Historical studies situating annotation work for AI in a broader genealogy of classification instruments and inscription practices.
- Organizational studies describing the effects of different institutional settings (in-house, outsourced, subcontracted etc.) and social configurations (gender, nationality, socioeconomic background etc.) on the annotation process.
The last session will be organized as an open workshop, aimed at drawing out research questions and perspectives.
Accepted contributions:
Session 1
Clément Le Ludec (Télécom Paris)
Long abstract:
Based on a field study carried out between France and Madagascar, we propose to analyse the construction of skills in the context of AI-induced professions, and in particular data annotation. What skills are used in data annotation? How are these skills prescribed by companies?
We combine the goal of making data workers visible with a reflection on the relative visibility of their skills in the value chain. We believe that what is invisible in the value chain is not so much the contribution of workers, but their level of skills and the level of expertise required to produce quality data.
We show that the level of workers' skills is made visible to other actors in the chain for specific purposes: guaranteeing data quality for AI companies, and selling services for data annotation companies. Ultimately, what makes the workers' skills invisible is the coloniality of the value chain: they are considered unskilled because they are Malagasy, not because they do data annotation.
SJ Bennett (Durham University) Fabio Tollon (University of Edinburgh) Benedetta Catanzariti (University of Edinburgh)
Long abstract:
Much of today’s AI development requires a vast and distributed network of data workers who sort through, clean, and annotate the data used to train machine learning models. However, this network is often represented asymmetrically, with a central focus on the contributions of AI practitioners, which are positioned as pivotal, whilst other forms of labour, such as annotation, are seen as ad hoc and as having little cumulative impact. These representations draw upon practitioner accounts, but rarely interrogate their underlying assumptions. This paper investigates AI practitioners’ conceptions of how data annotation fits into data structures, their representations of the workers engaged in this type of data work, and how these representations shape data structures themselves. Drawing on workshops conducted with machine learning practitioners, we explore experiences of data ‘wrangling’, or practices of data acquisition, cleaning, and annotation, as the point where AI practitioners interface with domain experts and data annotators. In exploring these practices, we move beyond the simple recognition of data workers’ ‘invisibility’ to examine the political role of epistemic framings of the data work that underpins AI development and how these framings can shape data workers’ agency. Finally, we reflect on the implications of our findings for developing more participatory and equitable approaches to AI.
Oceane Fiant (Université de technologie de Compiègne)
Long abstract:
The engineer's approach to the issue of machine learning models’ opacity might involve opting for simpler models (like linear regression or decision trees), or adopting techniques from the explainable artificial intelligence field (such as Local Interpretable Model-Agnostic Explanations, Shapley Additive Explanations, among others). The latter option simplifies understanding the model’s decision-making, for instance by highlighting how specific features influence the model’s output.
In my talk, I will present a case where the challenge of opacity is addressed through careful training set construction, rather than model explainability. This project, a collaboration between a pathologist and an engineer, aims to create a dataset of breast cancer tumor images. This dataset will then be used to train convolutional neural networks to identify tumor components on whole slide images. The goal of my presentation is to review this project’s innovative solution and to derive broader insights regarding the issue of explainability of artificial intelligence systems.
Janina Zakrzewski (Weizenbaum-Institute)
Long abstract:
As AI systems are increasingly integrated into high-stakes contexts, machine learning (ML) in medical care and research has gained pronounced attention for its potential to improve healthcare. In settings such as medical image analysis in radiology, ML promises to assist decision-making and to serve as a tool in scientific research. Data, both in its large volumes, sourced across the complex socio-technical system of healthcare, and in its quality, is integral to the functioning of ML. This development has led to a reconfiguration of data work and occupational expertise in healthcare to accommodate the need for annotating medical care and research data (Bossen et al., 2019). Within this opaque endeavor, further empirical analysis of data annotation is crucial for understanding how the developers of data annotation and the professionals who perform annotation tasks collaborate and shape data sets, through their data collection and their modes of documenting patient health information (e.g. medical conditions, symptoms, treatments, medical images), in order to translate it for ML medical research purposes. This paper seeks to contribute to this need by investigating the design and decision-making processes that underpin the development of data annotation for medical research via ML. To do so, it draws on empirical findings from an ongoing ethnography of a large-scale health data project that also designs data annotation for broad adoption in ML medical research.
Nanna Thylstrup
Long abstract:
This article delves into the intersection of machine learning models and news production, focusing on data annotation practices within a Danish news organization. Through an ethnographic lens, the study examines how data workers (data annotators, model builders and knowledge brokers) within a news organization navigate the integration of AI-driven solutions into the editorial process. Drawing on six months of fieldwork conducted in the "backroom of data science," the research sheds light on the intricacies of articulating, aligning, and challenging editorial values through the development of transformer models for news production.
Methodologically, the paper zooms in on the in-house manual annotation process of datasets used for article generation and recommender systems, highlighting the uncertainties, frictions, and negotiations inherent in these procedures. Theoretical underpinnings stem from critical data studies and Science and Technology Studies (STS) frameworks, specifically focusing on "critical dataset studies", the "logic of domains," and the concept of "science frictions." By leveraging these frameworks, the study elucidates the complexities that arise when different domains, such as journalism and data science, converge.
The findings contribute to our understanding of how data annotators negotiate the tension between reflecting and shaping the world, shedding light on the ways in which editorial values are negotiated and redefined in the digital age. Furthermore, the paper contributes to work on the backstage micro-decisions that shape contemporary news production.
So Yeon Leem (Dong-A University)
Long abstract:
The rise of Large Language Models (LLMs) like ChatGPT is reshaping the social discourse on AI technology, bringing copyright concerns over texts and images used in AI training data to the forefront. Notably, the New York Times' December 2023 lawsuit against OpenAI and Microsoft over copyrighted content in ChatGPT's training data exemplifies this shift. This study scrutinizes whether financial compensation for copyright infringement is the sole ethical countermeasure to data extractivism, which inherently regards all content, copyrighted or not, as mere data for AI enhancement. I explore data extractivism through the lens of the low-wage, non-professional laborers tasked with converting meticulously written texts into 'AI fodder' following pre-set manuals. Since 2022, I have been conducting participant observation within a Korean multidisciplinary team developing an AI model for generating novels in English and Korean. This research highlights a group of experts, primarily literature PhD candidates, transforming novel texts into AI training data. Their professional expertise and passion for literature underscore the complexities of reducing literary works to data. Their anxiety and frustration, I argue, affirm that annotation involves 'text with care' (Leedham et al., 2021), presenting an opportunity for developing AI ethically, in opposition to data extractivism.
Seyi Olojo (University of California, Berkeley)
Long abstract:
This paper presents insights from an interview study conducted with native Nigerian-language speakers. We explore community-based perceptions of effective annotation practice for three Nigerian languages: Yoruba, Igbo and Hausa. Participants discuss ideal annotation practices that diverge from the dominant 'Anglophone lens', ways of knowing that characterize the colonial and hegemonic power of the English language. In doing so, they redefine annotation expertise on their own terms, integrating cultural norms that are inherent to their identity. Therefore, the presentation of such “non-traditional” expertise both exposes and challenges the epistemic violence of Western annotation practices. Additionally, such indigenous expertise provides an opportunity for the ethical and accurate development of machine translation for Nigerian languages.
Héloïse Eloi-Hammer (Sciences Po Paris)
Long abstract:
This paper is based on fieldwork conducted in a French startup providing a “predictive justice” tool. The tool allows legal experts to estimate the probable outcome of a case by modeling the decision processes of a panel of a hundred fictional judges. To do so, it relies on the annotation and analysis of past court decisions that feed the artificial intelligence model. Aiming to better understand how such tools are conceived, this contribution takes as its object the annotation process implemented by the legaltech.
The annotation itself is handled mostly by law students from outside the company, while the analysis grid is designed by legal experts who work for the startup. Although this division of labor remained stable over time, the annotation process evolved: after using Excel spreadsheets to analyze court decisions, law students were provided with software developed by the legaltech.
Based on an interview survey and a study of documents obtained in the field, this paper analyzes the role and metamorphoses of the annotation process, showing how it has been shaped by the needs of interdisciplinary dialogue. Indeed, the annotation process is at the heart of multiple “translation” processes (Callon, 1986): mathematicians, developers, and legal experts must collaborate to produce a tool that is both efficient and useful. In this context, the result of the annotation can be seen as a hybrid language, aiming to satisfy both legal reasoning and the needs of computer scientists.