Exploring Low-Resource Languages through Corpus Work: Challenges, Innovations, and Insights 
Nikolay Mikhailov (Nazarbayev University)
Language & Linguistics
Hall of Turan civilization (Floor 1)
Thursday 6 June, -
Time zone: Asia/Almaty


This panel explores the landscape of low-resource languages (LRLs) in Central Asia, taking Kazakh as its example, by examining the methodologies, challenges, and discoveries of corpus linguistics as applied to such languages. Recent innovations in computational technologies have drawn researchers from various disciplines to the study of LRLs. However, the scarcity of data and tools poses major challenges for comprehensive linguistic analysis and documentation.

The panel brings together researchers who have navigated these challenges and made significant contributions to corpus work on LRLs, specifically Kazakh. The studies presented will highlight the approaches and methodologies used to collect, annotate, and analyze linguistic data in low-resource settings. They will also address ethical considerations, emphasizing respectful collaboration and responsible stewardship of natural spoken language data.

Key themes include methodological innovations, language documentation, sociolinguistic implications, and technological tools, all illustrated through a corpus of spoken Kazakh. The panelists will discuss strategies developed to overcome data scarcity and documentation challenges, such as leveraging crowdsourcing, adapting existing tools for LRLs, and employing community-based participatory research methods to build and annotate corpora. The corpus of spoken Kazakh demonstrates the successful application of these methods and will be available to researchers who wish to explore this new perspective. The papers will showcase the diverse linguistic phenomena and cultural insights gleaned from corpus-based studies of LRLs, particularly in the domain of spoken language. Projects that rely on the spoken corpus will demonstrate phenomena that often remain unseen when only literary textual corpora are used. Finally, the panel will address the broader sociolinguistic implications of spoken corpus work on LRLs, including language revitalization efforts, linguistic rights advocacy, and community empowerment initiatives. By actively engaging with local communities and stakeholders, and by moving beyond the existing textual corpora of literary Kazakh, researchers can ensure that their work contributes meaningfully to preserving and promoting linguistic diversity.

Overall, this panel seeks to showcase the transformative potential of corpus linguistics in advancing our understanding of low-resource languages in Central Asia, while also advocating for ethical and inclusive research practices that center the voices and agency of language speakers. Through collaboration, innovation, and mutual respect, we can collectively work towards a more equitable and linguistically diverse world.

Accepted papers:

Session 1 Thursday 6 June, 2024, -