T0378


Digitizing the Chagatai Manuscript Tradition: A Case Study of Abulghazi Bahadur Khan’s “Genealogy of the Turks”
Author:
Almas Imangaliyev (Astana IT University)
Format:
Individual paper
Theme:
History

Abstract

This article investigates the application of OCR technology to the automatic recognition of manuscripts, taking as its object the Chagatai-language manuscript Shezhire-i Turki (“Genealogy of the Turks”) by Abulghazi Bahadur Khan. Engineers from Astana IT University converted the text from PDF images into Word documents, working from the electronic version preserved in the “National Corpus of the Kazakh Language” at the Institute of Linguistics named after Akhmet Baitursynov. During the study, the OCR-generated text was manually checked against the original PDF, errors were corrected, and the accuracy with which graphical features and diacritical marks were recognized was evaluated. Comparative-historical, descriptive, and textological methods were applied.

The study aims to assess the effectiveness of OCR technology for automatically recognizing Chagatai manuscripts, determine the extent to which the extracted Arabic-script text differs from the original, and explore its potential applications in the humanities. The Chagatai language, with its complex structure, poses challenges for text analysis, orthographic standardization, and understanding historical forms. In this context, artificial intelligence and modern technologies facilitate manuscript digitization, enable textual analysis, and support the creation of linguistic databases, providing researchers with effective tools for working with historical texts. The results demonstrate that while OCR cannot always achieve full accuracy, it significantly enhances the efficiency of digital processing and can be widely applied in organizing texts and conducting linguistic analyses in humanities research.
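
One common way to quantify how far OCR output differs from a manually corrected text is the character error rate (CER): the edit distance between the two strings divided by the length of the corrected reference. The sketch below is illustrative only and is not the procedure used in the study, which relied on manual comparison. The function names and the `ignore_marks` option are hypothetical. Computing the rate with and without combining marks gives a rough indication of how many errors involve diacritics; note that stripping marks after decomposition also reduces letters such as alef with madda to their base letter.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def strip_marks(text: str) -> str:
    """Drop combining marks (Unicode category Mn), e.g. Arabic vowel signs."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def character_error_rate(ocr: str, reference: str, ignore_marks: bool = False) -> float:
    """CER = edit distance / reference length, after Unicode normalization."""
    ocr = unicodedata.normalize("NFC", ocr)
    reference = unicodedata.normalize("NFC", reference)
    if ignore_marks:
        ocr, reference = strip_marks(ocr), strip_marks(reference)
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr, reference) / len(reference)


if __name__ == "__main__":
    # Made-up example: the reference carries a damma (U+064F) that the OCR text lacks.
    reference = "\u062A\u064F\u0631\u0643"
    ocr = "\u062A\u0631\u0643"
    print(character_error_rate(ocr, reference))
    print(character_error_rate(ocr, reference, ignore_marks=True))
```

A gap between the two rates would suggest that diacritical marks, rather than base letters, account for part of the recognition errors.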