
Accepted Contribution:

Anxious annotators, resistance to data extractivism: an ethnographic study of a multidisciplinary novel-generating AI research team in Korea
So Yeon Leem (Dong-A University)


Short abstract:

This study explores ethical AI development by focusing on literature experts who annotate texts for AI in Korea, highlighting their struggles against data extractivism and advocating 'text with care' as a practice for developing AI ethically.

Long abstract:

The rise of Large Language Models (LLMs) such as ChatGPT is reshaping the social discourse on AI technology, bringing copyright concerns over the texts and images used as AI training data to the forefront. Notably, the New York Times' December 2023 lawsuit against OpenAI and Microsoft over copyrighted content in ChatGPT's training data exemplifies this shift. This study scrutinizes whether financial compensation for copyright infringement is the sole ethical countermeasure to data extractivism, which inherently regards all content, copyrighted or not, as mere data for AI enhancement. I explore data extractivism through the lens of the low-wage, non-professional laborers tasked with converting meticulously written texts into 'AI fodder' by following pre-set manuals. Since 2022, I have been conducting participant observation within a Korean multidisciplinary team developing an AI model that generates novels in English and Korean. This research highlights a group of experts, primarily literature PhD candidates, who transform novel texts into AI training data. Their professional expertise and passion for literature underscore the complexities of reducing literary works to data. Their anxiety and frustration, I argue, affirm that annotation involves 'text with care' (Leedham et al., 2021), presenting an opportunity to develop AI ethically, in opposition to data extractivism.

Combined Format Open Panel P036
Questioning data annotation for AI: empirical studies
Session 1: Friday 19 July 2024