Towards a fine-grained classification of Ryukyuan

Accepted Paper

John Huisman (Uppsala University)

Paper short abstract

This study applies model-based Bayesian clustering to data for 200 concepts in 120 lects to produce a fine-grained classification of the Ryukyuan languages. Both macro- and micro-level evolution are investigated by analysing various combinations of lexical data and regular sound correspondences.

Paper long abstract

The general subdivision of the Ryukyuan branch of Japonic is well established: a primary split separates Northern and Southern lects, with Northern Ryukyuan divided into Amami and Okinawa, and Southern Ryukyuan into Miyako, Yaeyama and Yonaguni (see e.g. Pellard 2015; De Boer 2020). Yet many open questions remain about the fine-grained classification (although see e.g. Lawrence 2000 on Yaeyama; Lawrence 2006 on Okinawa; and Pellard 2009 on Miyako). This study contributes to our understanding of the classification of Ryukyuan by applying computational methods to high-resolution data.

Computational historical linguistics has provided new insights into the structure, age, and spread of language families. However, advances made by these approaches are largely built on lexical data, and unraveling the relations between closely related varieties continues to pose a challenge, because lexical differentiation can be limited in contexts of more recent diversification, as is the case for Ryukyuan. Recent work has explored various additional data types as a source of phylogenetic signal, including phoneme inventories (Dockum 2017), phonotactics (Macklin-Cordes et al. 2021; Huisman et al. 2025) and pitch accent (Takahashi et al. 2023). Even so, the integration of regular sound correspondences, which form the cornerstone of the Comparative Method in traditional historical linguistics, remains limited. This study introduces a new approach to computationally extracting and evaluating sound correspondences, which is applied to a new comparative linguistic database of the Ryukyuan languages in which all entries are segmented, aligned, and coded for cognacy.
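To illustrate the general idea of reading sound correspondences off segmented, aligned, cognate-coded entries, a minimal sketch follows. It is not the approach introduced in this study: it simply counts aligned segment pairs between two lects and keeps the non-identical pairs that recur. The function name, the threshold, the lect labels and the toy forms are all invented for illustration and are not Ryukyuan data.

```python
from collections import Counter

def count_correspondences(aligned_sets, lect_a, lect_b, gap="-"):
    """Count aligned segment pairs between two lects across cognate sets.

    Each cognate set maps a lect name to a list of segments; all lists
    within one set are assumed to have equal length (already aligned).
    """
    counts = Counter()
    for cognate_set in aligned_sets:
        # Skip cognate sets that lack a reflex in either lect.
        if lect_a not in cognate_set or lect_b not in cognate_set:
            continue
        for seg_a, seg_b in zip(cognate_set[lect_a], cognate_set[lect_b]):
            if seg_a == gap and seg_b == gap:
                continue  # a column that is a gap in both lects carries no signal
            counts[(seg_a, seg_b)] += 1
    return counts

# Invented toy alignments (hypothetical lects, not real data).
toy_sets = [
    {"lectA": ["p", "a", "n", "a"], "lectB": ["h", "a", "n", "a"]},
    {"lectA": ["p", "i", "-"], "lectB": ["h", "i", "i"]},
]

counts = count_correspondences(toy_sets, "lectA", "lectB")

# Keep non-identical pairs seen at least twice (an arbitrary toy threshold).
recurrent = {
    pair: n for pair, n in counts.items() if n >= 2 and pair[0] != pair[1]
}
```

In this toy case only the pair `("p", "h")` survives the filter. A realistic evaluation would also need to condition on phonological environment and guard against chance recurrence, which this sketch does not attempt.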

The data is analysed with model-based Bayesian clustering methods as used in population genetics, to infer the clusters that best describe the data. Crucially, individuals (here, lects) can be admixed from multiple populations, which can account for horizontal transfer through historical contact. To understand both macro- and micro-level patterns of divergence in Ryukyuan, separate analyses are conducted for: 1) basic vocabulary; 2) non-basic vocabulary; 3) the complete vocabulary data; 4) sound correspondences; 5) sound correspondences together with basic vocabulary; and 6) sound correspondences together with the complete vocabulary data. The results are compared against previously suggested subdivisions of each major Ryukyuan subgroup.
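The six analyses listed above can be read as combinations of three feature blocks. The sketch below shows one way such input matrices might be assembled, assuming that the complete vocabulary data is the union of the basic and non-basic vocabulary and that each lect is coded as a vector of binary features per block. The block names, the `build_matrix` helper and the toy values are hypothetical and are not taken from the study.

```python
# Feature blocks used in each of the six analyses (numbering follows the text).
# Assumption: "complete vocabulary" = basic + non-basic vocabulary.
ANALYSES = {
    1: ("basic",),
    2: ("non_basic",),
    3: ("basic", "non_basic"),
    4: ("sound_corr",),
    5: ("sound_corr", "basic"),
    6: ("sound_corr", "basic", "non_basic"),
}

def build_matrix(features_by_block, blocks):
    """Concatenate each lect's feature vectors for the selected blocks."""
    lects = sorted(next(iter(features_by_block.values())))
    return {
        lect: [value for block in blocks for value in features_by_block[block][lect]]
        for lect in lects
    }

# Invented toy features for two hypothetical lects.
toy_features = {
    "basic": {"lectA": [1, 0], "lectB": [0, 1]},
    "non_basic": {"lectA": [1], "lectB": [1]},
    "sound_corr": {"lectA": [0, 1, 1], "lectB": [1, 1, 0]},
}

# One input matrix per analysis, keyed by analysis number.
matrices = {
    number: build_matrix(toy_features, blocks)
    for number, blocks in ANALYSES.items()
}
```

Each resulting matrix would then be passed to the clustering model separately, so that the inferred clusters can be compared across data types.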

Panel INDLING001
Language and Linguistics individual proposals panel
Session 10