- Convenor:
- Andrey Filchenko (Nazarbayev University)
- Format:
- Open panel
- Theme:
- Language & Linguistics
Abstract
The linguistic landscape of Central and Northern Asia, stretching from the Pontic steppes to Siberia and the Far East, represents one of the most diverse yet critically endangered regions on Earth. This panel examines the intersection of traditional documentary and corpus linguistics with contemporary computational strategies and pedagogical innovations for "low-resource" languages. While major regional languages like Uzbek or Kazakh are making strides in the digital domain, dozens of minority languages, including Turkic, Mongolic, Tungusic, Ugric, Samoyedic, and Paleoasiatic varieties, remain on the "digital periphery".
The panel explores the entire research cycle: from data collection to corpus design and its application in teaching and learning. Key areas of focus include:
- Collaborative Documentation and Pedagogy: Analyzing community-engaged models that prioritize "language-in-use" over static literary standards.
- Technological Adaptation: Evaluating the efficacy of transfer learning, cross-lingual embeddings, and OCR in creating corpora for languages with limited annotated digital data.
- Curriculum Integration: Developing frameworks to enhance language education by integrating corpus-based resources, such as spoken corpora, into curricula and teacher-training programs.
- Ethical Data Sovereignty: Addressing tensions between open-access archiving and the intellectual property rights of speaker communities.
By bringing together linguists, computational scientists, and educators, this panel highlights how digital humanities can modernize Central Asian languages. Ultimately, we argue that preservation depends on transitioning from "passive archiving" to "active resource development," ensuring these voices are functionally integrated into both the global digital ecosystem and the classroom.
Accepted papers
Abstract
THE LINGUISTIC LANDSCAPE OF THE UYGHUR PRESS ADVERTISING STORAGE IN KYRGYZSTAN
In the context of globalization, ethnic media act as sociolinguistic indicators. Ethnic journalism is closely linked to the sociocultural realities of ethnic groups. The newspaper Ittipak represents a unique model of a trilingual media space. The choice of script and language is determined not only by the target audience but also by the functional purpose of the text. I argue that the use of multiple languages contributes to the development of national culture, transcending the established boundaries of native language and culture.
One of the publication's key features is the coexistence of three writing systems. Cyrillic, as is traditional in Kyrgyzstan and Kazakhstan, is used for the main body of the newspaper's text. Writing based on the Arabic script (ېزىق كونا) serves as a tool for preserving ethnocultural identity and connection to historical heritage. The Latin alphabet is perceived as a sign of modernization.
The functional division of languages in advertising discourse is of particular interest. A study using comparative methods and quantitative analysis reveals a clear dichotomy based on the pragmatics of the text. Russian dominates commercial advertising texts, owing to its established status as the language of interethnic communication and business in Kyrgyzstan. Ethnocultural advertising, which carries social messages such as greetings, condolences, and public announcements, is published predominantly in Uyghur. Here, language plays a mediating role, strengthening intragroup unity and ethnic self-identification. The visual landscape is also changing: the frequent use of the Latin alphabet in advertising texts reflects the newspaper's aspiration to follow the sociolinguistic trends of Central Eurasia. In this context, the Latin alphabet is perceived as the "language of technology and progress," while the Cyrillic and Arabic scripts remain the guardians of tradition.
Thus, the media landscape of the Ittipak newspaper reflects the complex process of adaptation of an ethnic minority to contemporary socioeconomic realities in Central Asia. I argue that the pronounced antithesis between commercial intentions and ethnocultural values, as well as the diversity of graphic styles, demonstrate the flexibility of the ethnic press. The ethnic press strives to maintain a balance between preserving traditions and the demands of a market economy. This article is based on materials from the Ittipak newspaper for 2025.
Abstract
There remain various linguistic phenomena that have not yet been fully described or discussed in grammars of languages spoken in Eurasia. As one such phenomenon, this paper examines a typologically unique type of direct object marking in transitive clauses in Tundra Yukaghir (Siberia, Yukaghir language family), namely transformative-essive objects. Based on a detailed analysis of corpus data, this paper demonstrates that (1) this non-canonical object marking is licensed when the direct object is a semantically “effected object” (Fillmore 1968) in a broad sense and when the object NP includes a beneficiary indicated either by a pronominal modifier or by a possessive suffix, and that (2) the use of the transformative-essive case for objects has plausibly been derived from its use in result roles (e.g., with the verb “make”) or as a secondary predicate. The analysis further shows that this construction is not marginal but is systematically attested in natural discourse, thereby constituting an integral part of the argument-marking system of the language. In addition, the paper argues that transformative-essive objects emerged under the influence of Ewen (Tungusic language family), one of the contact languages of Tundra Yukaghir, which uses the designative case for direct objects under similar structural conditions.
Abstract
The Kerek language, once spoken along the northwestern coast of the Bering Sea, ceased to be actively spoken around the turn of the twenty‑first century, when its last native speakers passed away. A member of the Chukotko‑Kamchatkan family, Kerek represents a missing link between Chukchi and Alutor, occupying an important position for understanding the internal relationships within this small but complex group. Despite this significance, Kerek has remained largely understudied because almost no primary data have been accessible. Until recently, published sources consisted only of a brief grammatical sketch by P. Skorik and a single publication by V. Leontev containing ethnographic descriptions and Russian translations of Kerek folklore texts.
Leontev’s publication, however, did not include the original Kerek versions of the texts. Archival research in recent years has revealed that the original Kerek materials and the corresponding sound recordings used by Leontev are preserved in Magadan. Additional field materials collected by E. Asinovsky and A. Volodin are held at the Archives of the Institute for Linguistic Studies in St. Petersburg, comprising several transcribed texts and recordings. These holdings had been known only to a few specialists, and the confirmation of their scope and condition represents an important development for Kerek studies.
Careful examination of these materials is expected to yield valuable information on the structure of Kerek, including aspects of phonology, morphology, and syntax that have remained poorly described. The combination of textual and audio data offers a rare opportunity to analyze natural speech and narrative style in a language that was no longer transmitted to new generations by the end of the last century. This presentation introduces the contents and current state of these archival collections, outlines the kinds of data they contain, and reflects on how their study may contribute to a more complete understanding of Kerek as a missing link within the Chukotko‑Kamchatkan family.
Abstract
Eastern Yugur is a Mongolic language that might belong to either the Central Mongolic or the Southern Mongolic branch (cf. Nugteren 2011: 20, 35-57). Spoken in north-western Gansu, it is also geographically located between the Central and Southern Mongolic areas. With an estimated 1000 speakers (Janhunen 2020: 6), it is the Mongolic language with the second-fewest speakers. Along with the rest of Southern Mongolic, its morphology has been described fairly well (Tenishev & Todaeva 1966, Jaġunasutu 1981, Bulučilaġu & Jalsan 1990, Nugteren 2003, Altansubud 2017, Sečencoġtu 2024). At the same time, there is a lack of research on its morphosyntax and semantics, which shows that the language remains relatively inaccessible to functional-typological linguistics.
In our paper, we will address the first steps in our effort to create a corpus of Eastern Yugur. These include the digitization of language materials from existing text collections (Bolučilaġu & Jalsan 1988, Arslan et al. 2013, Temür et al. n.d.) and grammars, the creation of a standardized transcription, possibly morpheme analysis, the creation of English translations of the existing materials (at least initially translating via Chinese), and the creation of a unified lexicon from existing dictionaries and wordlists (using Bolučilaġu & Jalsan 1984, Ān et al. 2017, Sīqīncháokètú 2024).
Abstract
Foreign language teaching is often described in terms of grammar, vocabulary, and methodology, but in practice, it is deeply influenced by who the learners are and where they come from. This study looks at the linguistic and sociocultural dimensions of foreign language learning by comparing Kazakh, Turkish, Russian, and Uzbek learners, with the aim of understanding how their backgrounds shape the way they approach a new language. One of the key observations is that learners do not start from zero. They bring with them the structures of their first language, and these strongly affect how they understand and produce the target language. For example, students from Turkic language backgrounds (Kazakh, Turkish, and Uzbek) often benefit from structural similarities, which can make certain aspects of learning feel more intuitive. At the same time, these similarities can sometimes lead to overgeneralization. Russian-speaking learners, on the other hand, tend to face different challenges due to greater grammatical differences, particularly in sentence structure and verb usage. The study is informed by well-known theories in second language acquisition, including contrastive analysis, interlanguage, and sociocultural theory. However, rather than focusing only on theory, it connects these ideas to what actually happens in the classroom. Learners are seen as active participants who build their own “in-between” language system as they progress. This process is not simply about making mistakes, but about experimenting, adjusting, and gradually gaining control over the new language. Another important dimension is the role of educational culture. Students who come from more traditional, teacher-centered systems often show strong accuracy in controlled tasks but may hesitate when asked to speak freely. In contrast, those who are used to more interactive learning environments tend to communicate more confidently, even if their language is less precise at early stages.
Motivation also plays a significant role. Learners who clearly see the relevance of the foreign language to their academic or professional goals are generally more persistent and engaged. Without this sense of purpose, progress can be slower, regardless of linguistic background. Overall, the study suggests that effective language teaching requires more than a one-size-fits-all approach. Teachers need to be aware of both linguistic differences and sociocultural expectations in order to create supportive and meaningful learning environments.
Abstract
The Multimedia Corpus of Modern Spoken Kazakh Language (MCSKL) offers a valuable resource for corpus-informed language teaching. Data-driven learning (DDL) approaches, such as the use of concordance lines, enable the development of teaching materials, including classroom activities and assessments, based on natural and spontaneous spoken language. Because it provides access to naturally occurring language, the corpus includes real-life expressions and regional variation. Exposure to such data helps learners understand how Kazakh is used in everyday communication. Furthermore, MCSKL allows educators to integrate authentic dialogues and conversational data into class materials. This supports contextualized learning, in which grammatical structures are acquired through meaningful usage rather than in isolation, and strengthens learners’ ability to interpret and apply linguistic forms in real communicative settings. This paper demonstrates how data-driven methods can be effectively employed in teaching Kazakh grammar, arguing that authentic corpus examples significantly enhance the language learning experience.
The study is particularly relevant given the research gap in the area of corpus application in Kazakh language teaching. MCSKL helps address this gap by offering up-to-date, context- and culture-rich language materials. In contrast to traditional textbooks, which often rely on decontextualised examples, corpus-based materials demonstrate actual language use and are therefore more applicable to everyday communication. This authenticity not only improves linguistic competence but also increases learner engagement and motivation.
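As a minimal illustration of the concordance lines mentioned above, the following Python sketch builds keyword-in-context (KWIC) lines from a handful of utterances. The function name `kwic`, the `width` parameter, and the two sample sentences are assumptions made for illustration only; they are not drawn from MCSKL or from any tool associated with it.

```python
import re


def kwic(sentences, keyword, width=3):
    """Return (left context, keyword, right context) tuples for each hit.

    Tokenization here is a simple word-character split, which is an
    assumption; a real spoken corpus would use its own tokenization.
    """
    keyword = keyword.lower()
    lines = []
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines


# Invented sample utterances (hypothetical, not corpus data).
samples = [
    "Мен кітап оқыдым",      # "I read a book"
    "Ол кітап сатып алды",   # "He/she bought a book"
]

for left, key, right in kwic(samples, "кітап", width=2):
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>15} | {key} | {right}")
```

Aligning the keyword in a single column is what lets learners scan many hits at once and notice which words typically precede or follow it.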
Abstract
Shughni, an Eastern Iranian language spoken in Afghanistan and Tajikistan, possesses a binary system of grammatical gender (feminine and masculine) and a predominantly semantic system of gender assignment. In Shughni, unlike in most familiar Indo-European languages with grammatical gender (e.g., French, Russian), a noun's semantics, rather than its phonological form, is the most reliable predictor of its gender. In this paper, I explore two semantic patterns proposed to underlie gender assignment in the language: (i) thematic categorization (e.g., liquids and technological tools are feminine; milk products and sicknesses are masculine); and (ii) the relation of meronymy, where conceptual wholes are said to be feminine while their parts are masculine.
After providing a short description of the Shughni gender system, I present experimental evidence aimed at determining whether conceptual meronymy is at the heart of a veritable gender assignment pattern in Shughni. The experiment asked native Shughni speakers to assign gender to technological nouns (e.g., carburetor) in two conditions—when presented as a conceptual "part" (e.g. part of a car) and as a "whole" (as a stand-alone item). Although we find a statistically significant, albeit weak, correlation between feminine gender and "whole" condition, we argue for an alternative interpretation of our results in which thematic categorization drives gender assignment, and the role of meronymy is simply an illusion created by the system.
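To illustrate how an association can be statistically significant yet weak, the sketch below computes a chi-square statistic and the phi coefficient for a 2×2 table of condition ("whole" vs. "part") by assigned gender. The counts are hypothetical placeholders, not the study's data, and the abstract does not say which test was used; this is only one common way to quantify such an association.

```python
import math


def chi_square_and_phi(a, b, c, d):
    """Chi-square (no continuity correction) and phi for a 2x2 table.

    Table layout (an assumption for this sketch):
                 feminine  masculine
        whole        a         b
        part         c         d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    phi = (a * d - b * c) / math.sqrt(denom)
    return chi2, phi


# Hypothetical counts chosen only to show the pattern; NOT real results.
chi2, phi = chi_square_and_phi(a=120, b=80, c=100, d=100)

# Critical value of chi-square with 1 degree of freedom at alpha = 0.05.
CRITICAL_05 = 3.841

print(f"chi2 = {chi2:.3f}, significant at 0.05: {chi2 > CRITICAL_05}")
print(f"phi  = {phi:.3f}")
```

With these placeholder counts the statistic exceeds the 0.05 critical value while phi stays near 0.1, which is conventionally read as a small effect.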
This experimental study on Shughni gender assignment is the first of its kind, and our results are admittedly difficult to interpret in the absence of an established framework for such studies. Therefore, an important contribution of this paper is an initial set of experimental results and discussion on a semantic gender-assignment rule; this will set the stage for interpreting future studies of this kind. Moreover, our experimental setup, a drag-and-drop (Duolingo-style) language production task built with the JavaScript library jsPsych, contributes to the available infrastructure for conducting field-based experiments with under-described languages, which are glaringly underrepresented in psycholinguistic research. With this in mind, we discuss several of the solutions we implemented to challenges often faced in conducting experiments with such languages, including the creation of stimuli, the choice of writing system, and the overall implementation of the experiment. It is our hope that other researchers conducting experiments with unwritten languages can take lessons from our design process and reflections.