Accepted Paper:

‘No language left behind?’ Predictive text and dilemmas of inclusion for African indigenous languages in digital spaces  
Peter Chonka (King's College London); Stephanie Diepeveen (University of Cambridge); Yidnekachew Haile (Royal Holloway University of London)

Paper short abstract:

Drawing on a study of how a search 'autocomplete' algorithm interacts with three East African languages, this paper highlights dilemmas surrounding their 'inclusion' in natural language processing/generation technologies, particularly accountability for digital harms and digital linguistic survival.

Paper long abstract:

Ongoing developments in natural language processing (NLP) and natural language generation (NLG) raise critical questions about linguistic bias and the dominance of English datasets. This paper addresses the dilemmas emerging from intentional and unintentional forms of inclusion of globally marginalised African languages in NLP and NLG, and queries whether/how digital 'inclusion' is essential for linguistic survival. We draw upon a study of how Google Search autocomplete algorithms interact with three languages indigenous to East Africa: Amharic, Kiswahili and Somali, each with its own historical, political and orthographic features. The paper explores different forms of harm that result through the operation of autocomplete algorithms in each language, and situates the experiences of marginalised languages and NLP algorithms within specific historical, cultural and political contexts. Our results raise important questions about the desirability of digital linguistic inclusion in the context of local and global power imbalances. The paper also reflects on the significance of our results in relation to recent developments in NLG (e.g. ChatGPT) and suggests the importance of future research into the contextual factors that inform how languages engage with these technologies.

Panel Lang04
Indigenous languages and disentanglement with African futures
  Session 1 Friday 2 June, 2023