
Accepted Paper:

Unpacking data annotation work practices and workarounds: insights from Indian outsourcing firms  
Srravya Chandhiramowuli (University of Edinburgh), Alex Taylor (University of Edinburgh), Sara Heitlinger (City, University of London), Ding Wang

Short abstract:

This paper presents findings from an ethnographic study in annotation centres in India. We highlight the actors, practices, and policies that make up processes of creating datasets for AI, revealing their entanglements with infrastructural histories, global supply chains, and cultural constraints.

Long abstract:

Data annotation, an indispensable part of AI/ML system building, is a rapidly growing industry globally (Miceli & Posada, 2022; Irani, 2015; Poell et al., 2019). Yet a model-centric, myopic view of AI (Sambasivan, 2022) affords little recognition to data annotation’s crucial contribution and wider challenges. Addressing this gap, we examine how human labour in data labelling for AI system-building is envisioned and operationalised. We draw on an ethnographic study of data work at an annotation company in India, conducted from June to August 2022 at two of its centres located in semi-rural towns.

At these centres, first-generation office workers, particularly women, are actively hired to support their financial independence and career development through tech work. However, the expectations, priorities and preferences of data requesters dictated workers' schedules, time off and the annotation tools at their disposal. We found that the choice of annotation tools varied with each project and was typically dictated by the requesters. Whether the requesters provided the tools or licensed them from a third party, annotation teams rarely had agency over them. Far from being neutral or objective, annotation practices and tools, we found, serve to assert conformity and to locate authority and control amongst a few actors.

In examining the material practices, global flows and social relations that shape data annotation and AI, we show how data labelling comes into contact with model building, impact sourcing, social entrepreneurship and venture capital funding, and in doing so we reflect on the effectiveness and fragility of AI systems.

Traditional Open Panel P348
Digital ghost work: human presences in AI transformations
Session 2: Tuesday 16 July 2024