LANG005: Kazakhstani Languages I

Chair: Jiydegul Alymidin Kyzy (American University of Central Asia)

Discussant: Meiramgul Kussainova (Nazarbayev University)

Format: Panel

Theme: Language & Linguistics

Location: Room 2016

Sessions: Thursday 18 June, 14:00-15:45
Time zone: KZT

Accepted papers

Session 1: Thursday 18 June 2026, 14:00-15:45

Small cities under big influences. The Linguistic Landscape of Taraz and Karagandy, Kazakhstan

Zhamilya Abik (Nazarbayev University)

Abstract

This study examines the linguistic landscapes of Taraz and Karagandy, two regional cities in Kazakhstan, to explore how historical background, language policy, and globalization shape visible language practices in post-Soviet urban spaces. Drawing on the theoretical framework of Linguistic Landscape studies, the research analyzes the visibility, salience, and placement of languages on commercial and public signage. The study applies Elana Shohamy and Bernard Spolsky’s language policy and sign-making theory, which conceptualizes signage as the outcome of interactions between sign owners, sign-makers, readers, and language management authorities. In addition, Pavlenko’s post-Soviet de-russification framework is employed to interpret processes of language erasure, replacement, upgrading, downgrading, regulation, and transgression.

Quantitative and qualitative analysis of commercial signage, supplemented by informal interviews with residents, reveals significant regional differences. Taraz demonstrates strong Kazakh-dominant practices, frequent monolingual Kazakh signage, and instances of Kazakh-English bilingualism that reflect both de-russification and globalization. Karagandy, by contrast, shows a higher prevalence of Russian monolingual and Kazakh-Russian bilingual signage, including transgressive signs that violate official language placement norms.

The findings suggest that smaller cities respond differently to external influences such as migration from the Russian Federation and global economic integration. Taraz appears to be at a more advanced stage of symbolic de-russification and language loyalty, while Karagandy reflects a more persistent Russian linguistic presence linked to demographic composition and Soviet industrial heritage. The study contributes to post-Soviet sociolinguistics by demonstrating that linguistic landscapes in regional cities provide nuanced insight into language ideology, identity construction, and the ongoing negotiation between national language policy and everyday linguistic practice.

Exploring the Linguistic Landscape in Digital Environments: A Case Study of Kazakhstan

Aliya Aimoldina (Astana IT University), Sholpan Zharkynbekova (L.N. Gumilyov Eurasian National University)

Abstract

This study examines the linguistic landscape of social media through an analysis of posts about the city of Astana published on Facebook and Threads. The relevance of investigating the linguistic landscape in digital environments stems from a fundamental shift: social media platforms are evolving into autonomous semiotic territories in which language choice, toponymic markers, and discursive patterns construct alternative mappings of the city (Ivkovic & Lotherington, 2009). The virtual image of the capital thus emerges as a significant factor in urban policy and shapes residents’ perceptions of urban space.

The corpus consists of user-generated posts containing Astana-related geotags, published between 2023 and 2025. Unlike physical space, the digital landscape is characterized by rapid change, a polyphony of voices, and the possibility of immediate feedback, making it a sensitive indicator of social tensions and identity shifts. Analysis of the virtual linguistic landscape enables the identification of informal meaning-making practices related to the urban environment that often remain outside official representations.

The study aims to identify the key linguistic strategies employed by residents in representing Astana on Facebook and Threads and to examine the relationship between language choice and discursive practices. It hypothesizes that the use of Kazakh, Russian, and hybrid forms, along with specific toponymic practices, not only reflects existing social hierarchies but also actively constructs meanings of urban belonging that may diverge from the official narrative. The study combines quantitative analysis of language distribution with qualitative analysis of self-positioning strategies. The research questions operate on three levels: (1) the distribution of languages and patterns of code-switching; (2) discursive techniques used to mark urban belonging; and (3) the relationship between linguistic visibility and the semantic density of identity construction in audience responses.

The findings indicate that Astana’s virtual linguistic landscape functions as a counter-narrative to the city’s official image, enabling citizens to construct urban identities through irony, code-switching, and the symbolic reappropriation of space. In doing so, the study demonstrates how digitally mediated discourse operates as a critical layer of contemporary urban semiotics in Central Asian contexts, thereby contributing to sociolinguistic scholarship and extending the theory of the virtual linguistic landscape (Ivkovic & Lotherington, 2009) to a previously underexplored regional setting. The findings also have practical implications for the development of more inclusive communication strategies by municipal authorities.

References:

Ivkovic, D., & Lotherington, H. (2009). Multilingualism in cyberspace: Conceptualising the virtual linguistic landscape. International Journal of Multilingualism, 6(1), 17–36.

Adapting Artificial Intelligence for Creating Spoken Language Corpora: The Case of Kazakh

Nikolay Mikhailov (Nazarbayev University)

Abstract

The rapid development of artificial intelligence (AI) technologies often overlooks low-resource languages, potentially leading to "digital language death" and preventing speakers from accessing resources and knowledge. This project aims to counter this trend by developing resources based on natural spoken Kazakh using AI processing, ultimately establishing a replicable workflow that can be applied to other low-resource languages.

A significant hurdle in automated corpus creation is transcribing naturally occurring interactional speech events (NOISE). These speech events are typically messy because of low signal-to-noise ratios, unclear articulation, simultaneous speakers, and code-switching. Most existing speech-to-text (STT) models are trained on read-aloud written prompts rather than conversational data and therefore perform poorly on NOISE. To address this, the first step in our workflow is to fine-tune the Whisper STT model specifically for conversational Kazakh. The fine-tuning data is sourced from the Multimedia Corpus of Spoken Kazakh Language, which contains roughly 70 hours of annotated data. To better reflect authentic speech, the source data is intentionally left noisy, with minimal cleanup. Issues such as varying audio lengths and multi-speaker overlap are managed through data padding, sequential file combination, and neural network-based speaker separation.
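The abstract does not say how padding and sequential file combination are carried out, so the following is only a minimal sketch of one plausible reading: consecutive short clips are concatenated until the next one would overflow a fixed window, and each resulting chunk is then zero-padded to the full window. The 16 kHz sample rate and 30-second window match Whisper's expected input; the function names `combine_sequentially` and `pad_to_window` are hypothetical, and audio is represented as plain lists of samples to keep the example dependency-free.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000                      # Whisper expects 16 kHz audio
WINDOW_SECONDS = 30                       # Whisper processes 30-second windows
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def combine_sequentially(clips: Sequence[Sequence[float]],
                         max_samples: int = WINDOW_SAMPLES) -> List[List[float]]:
    """Greedily concatenate consecutive clips (hypothetical helper).

    A new chunk is started whenever adding the next clip would exceed
    max_samples, so the original order of the recording is preserved.
    """
    chunks: List[List[float]] = []
    current: List[float] = []
    for clip in clips:
        if len(clip) > max_samples:
            raise ValueError("clip is longer than one window; split it first")
        if current and len(current) + len(clip) > max_samples:
            chunks.append(current)
            current = []
        current.extend(clip)
    if current:
        chunks.append(current)
    return chunks


def pad_to_window(chunk: Sequence[float],
                  max_samples: int = WINDOW_SAMPLES) -> List[float]:
    """Right-pad a chunk with silence (zeros) up to the window length."""
    return list(chunk) + [0.0] * (max_samples - len(chunk))
```

In a real pipeline the transcripts of the combined clips would be concatenated in the same order, so that each padded chunk stays aligned with its reference text.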

The second phase of the workflow moves from standard text transcription to segmentation into Intonation Units (IUs). IUs represent speech more naturally, and align better with human cognitive processes, than transcripts forced into strict written norms. To achieve this conversion, we are exploring audio processing using a separate model, as well as an alternative method based on regression analysis. This alternative approach hypothesizes that the deltas of specific speech features at intonation boundaries can serve as accurate predictors for IU segmentation.
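The regression idea can be illustrated with a small sketch. Everything specific here is an assumption: the abstract does not name the speech features, so the example uses a pitch delta and a pause duration per candidate boundary, fits a plain logistic regression by gradient descent, and trains on synthetic toy values that are not measurements from the corpus. The helper names (`deltas`, `fit_logistic`, `predict_boundary`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def deltas(values: Sequence[float]) -> List[float]:
    """Difference between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression with batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_boundary(x: Sequence[float], w: Sequence[float], b: float,
                     threshold: float = 0.5) -> bool:
    """True if the model scores this candidate position as an IU boundary."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold


# Synthetic toy data (NOT corpus measurements): each row is
# [scaled pitch delta, pause in seconds]; label 1 marks an IU boundary.
X_toy = [[-0.8, 0.6], [-0.6, 0.5], [-0.7, 0.4],
         [0.1, 0.0], [0.2, 0.05], [-0.1, 0.0]]
y_toy = [1, 1, 1, 0, 0, 0]
weights, bias = fit_logistic(X_toy, y_toy)
```

With real data, the rows of the feature matrix would come from applying `deltas` to per-word or per-syllable feature tracks, and the fitted coefficients would indicate which feature changes actually carry predictive weight.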

In the final step, the processed data is integrated into ELAN, and the output is converted into XML files with hierarchical tier structures that a search system can index. Because IUs present challenges for standard search architectures, a flexible Solr-based corpus search system is currently in development to accommodate them. By leveraging available conversational data to generate more annotated data efficiently, this project not only elevates the status of Kazakh but also provides a vital methodological framework for broader linguistic research.
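To make the idea of hierarchical tiers concrete, here is a minimal sketch that nests a word tier inside each intonation unit. The element and attribute names (`recording`, `tier`, `unit`, `w`) form a simplified, hypothetical schema chosen for illustration; they are not ELAN's EAF format and not the project's actual output, and the sample strings are placeholders rather than corpus data.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

# (start in ms, end in ms, transcribed text, list of words)
IntonationUnit = Tuple[int, int, str, List[str]]


def build_corpus_xml(speaker: str,
                     units: Sequence[IntonationUnit]) -> ET.Element:
    """Build a two-level tier hierarchy: intonation units containing words."""
    root = ET.Element("recording", speaker=speaker)
    iu_tier = ET.SubElement(root, "tier", name="intonation_unit")
    for idx, (start_ms, end_ms, text, words) in enumerate(units, start=1):
        unit = ET.SubElement(iu_tier, "unit", id=f"iu{idx}",
                             start=str(start_ms), end=str(end_ms))
        ET.SubElement(unit, "text").text = text
        # The dependent tier lives inside its parent unit.
        word_tier = ET.SubElement(unit, "tier", name="word")
        for word in words:
            ET.SubElement(word_tier, "w").text = word
    return root


# Placeholder content, not real corpus data.
example = build_corpus_xml("S1", [
    (0, 1200, "first unit", ["first", "unit"]),
    (1350, 2100, "second", ["second"]),
])
xml_text = ET.tostring(example, encoding="unicode")
```

Keeping each unit's time span and text on a single element is one way to let a search index treat an IU as its own retrievable document while still reaching the words nested beneath it.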

This paper is intended to be presented at the panel «Voices of Steppe and Taiga - Bridging the Digital Divide: Language Documentation and Resource Development for the Languages of Central and Northern Asia».