LANG01


Theoretical and methodological issues of language data representation in Central Asia 
Convenor:
Nikolay Mikhailov (Nazarbayev University)
Chair:
Sami Honkasalo (University of Helsinki)
Discussant:
Timofey Arkhangelskiy (Universität Hamburg)
Format:
Panel
Theme:
Language & Linguistics
Location:
William Pitt Union (WPU): room 540
Sessions:
Friday 20 October
Time zone: America/New_York

Abstract:

Spoken language corpora have become an essential resource for linguistic research and language technology development, with strong potential for use in other disciplines such as anthropology, sociology, and history. In a way, documenting language is documenting life: we express our knowledge of and information about life through language, and its many facets make it rich documentary material. Creating a spoken language corpus involves collecting, transcribing, and annotating large amounts of spoken language data, which presents several methodological, technical, ethical, and linguistic challenges. While some of these challenges are being resolved as technology develops, others arise in connection with data quality and quantity, ease of automated processing, representativeness, and information accessibility.

This panel aims to bring together researchers and practitioners experienced in designing, building, managing, and using spoken corpora of Central Eurasian languages in a variety of projects. The panelists will discuss the state of the art in the theory and methodology of spoken language corpus design, as well as the challenges of implementation and their solutions.

The topics that will be covered in this panel include, but are not limited to:

- Methods for collecting spoken language data, data types, media, equipment, workflows, data sampling methods and storage.

- Transcription and annotation of spoken language data, manual vs. automated, segmentation, orthographic vs. phonetic transcription, annotation schemes and conventions.

- Challenges in creating spoken language corpora, such as data type selection, speaker and genre diversity, regional and social variation, transcription and annotation errors, and ethical considerations.

- Applications of spoken Central Eurasian language corpora in research, language technology development, education, anthropology, psychology, sociology, history.

- Interdisciplinary potential of spoken language corpora.

The panelists will share experiences, insights, and recommendations based on recent and ongoing projects on spoken Central Eurasian language corpora, and engage in a discussion with the audience on the opportunities and challenges of the field. The panel will be of interest to researchers, practitioners, and students in linguistics, language technology, psychology, anthropology, sociology, education, history, and other related fields.

Accepted papers:

Session 1: Friday 20 October 2023