Reassessing Japanese laryngeal specifications: Coexistence of voicing and aspiration in eastern Japan

Accepted Paper

Kuniya Nasukawa (Tohoku Gakuin University), Sachiko Kiyama (Tohoku University)

Paper short abstract

VOT data from 103 young speakers across eastern Japan reveal three regional laryngeal patterns, including an aspiration-like system in Tohoku. These findings suggest that voicing- and aspiration-based contrasts can coexist and may be derived from a unified representation.

Paper long abstract

This study investigates the phonological representation of laryngeal source contrasts in Japanese and the regional variation found among younger speakers in eastern Japan. Cross-linguistically, word-initial obstruents are often used to diagnose laryngeal contrasts because this position is prosodically strong and exhibits relatively stable phonetic cues compared with weaker environments such as intervocalic or word-final positions. Previous typological research has examined these contrasts using a range of acoustic parameters, including voice onset time (VOT) (Lisker & Abramson 1964), low-frequency energy reduction during closure, and the presence or absence of F1 cutback.

Within Element Theory (Harris 1994; Backley 2011), stop contrasts are represented by combinations of |ʔ| (closure), |H| (frication/aspiration), and |L| (voicing). Voiceless unaspirated stops (VOT near 0) correspond to |ʔH|, voiced stops (negative VOT) to |ʔHL|, and voiceless aspirated stops (positive VOT) to |ʔHH|. Two-way laryngeal systems are therefore classified either as voicing languages, which contrast |ʔH| with |ʔHL|, or as aspiration languages, which contrast |ʔH| with |ʔHH|. Although Japanese has long been analysed as a voicing language (Shimizu 1996), this classification has not been systematically re-examined for younger speakers across different regions.
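The mapping from VOT category to element expression, and the two-way typology built on it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' formalism: expressions are modelled as tuples of element symbols (so a repeated `H` stands for |ʔHH|), and the 30 ms boundary between short-lag and long-lag VOT (`LONG_LAG_MS`) is a hypothetical value chosen for illustration, not one reported in the study.

```python
from typing import Tuple

# Element expressions as tuples of symbols; ("ʔ", "H", "H") stands for |ʔHH|.
Expr = Tuple[str, ...]

UNASPIRATED: Expr = ("ʔ", "H")       # voiceless unaspirated, VOT near 0
VOICED: Expr = ("ʔ", "H", "L")       # voiced, negative VOT
ASPIRATED: Expr = ("ʔ", "H", "H")    # voiceless aspirated, positive VOT

# Hypothetical boundary (ms) between short-lag and long-lag VOT;
# an illustrative assumption, not a value taken from the study.
LONG_LAG_MS = 30.0


def expr_for_vot(vot_ms: float) -> Expr:
    """Map a VOT value in ms to one of the three element expressions."""
    if vot_ms < 0:
        return VOICED
    if vot_ms < LONG_LAG_MS:
        return UNASPIRATED
    return ASPIRATED


def classify_system(lenis_vot_ms: float, fortis_vot_ms: float) -> str:
    """Classify a two-way contrast from the mean VOTs of its two stop series."""
    pair = {expr_for_vot(lenis_vot_ms), expr_for_vot(fortis_vot_ms)}
    if pair == {UNASPIRATED, VOICED}:
        return "voicing"
    if pair == {UNASPIRATED, ASPIRATED}:
        return "aspiration"
    return "unclassified"
```

For example, `classify_system(-80, 10)` returns `"voicing"` and `classify_system(10, 70)` returns `"aspiration"`; a pair the three categories cannot place as either type returns `"unclassified"`. Because the outcome depends on the assumed `LONG_LAG_MS` boundary, the sketch only illustrates the logic of the typology, not how any particular dataset should be classified.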

To address this gap, we measured word-initial VOT values for /b d g/ and /p t k/ produced by 103 native speakers (mean age 20.8 ± 2.4 years) from Tohoku, Kanto, Chubu, and neighbouring regions. A hierarchical cluster analysis based on mean VOT values revealed three major patterns: Cluster 1 (primarily Kanto–Chubu) showed mean VOTs of −17.7 ms for voiced stops and 42.2 ms for voiceless stops; Cluster 2 (Kanto–Hokuriku–Tohoku) showed slightly positive VOT for voiced stops (8.9 ms) and voiceless values of 42.2 ms; and Cluster 3 (mainly Tohoku) showed consistently positive VOT for voiced stops (15.1 ms) and markedly longer VOT for voiceless stops (59.3 ms). The third pattern points toward an aspiration-type contrast, partly consistent with Takada (2011).
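The clustering step can be illustrated with a small agglomerative procedure. The abstract does not state the distance metric or linkage criterion, so this sketch assumes Euclidean distance and average linkage over per-speaker pairs of mean VOT values; the speaker IDs and values in `toy_speakers` are invented for illustration and are not data from the study. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used instead of this hand-written loop.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# (mean VOT of /b d g/, mean VOT of /p t k/) for one speaker, in ms
Point = Tuple[float, float]


def average_linkage(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean Euclidean distance over all cross-cluster pairs (assumed criterion)."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def agglomerate(points: Dict[str, Point], k: int) -> List[List[str]]:
    """Start with singleton clusters and merge the closest pair until k remain."""
    clusters: List[List[str]] = [[name] for name in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(
                [points[n] for n in clusters[ij[0]]],
                [points[n] for n in clusters[ij[1]]],
            ),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # safe because combinations() always yields i < j
    return clusters


def cluster_means(points: Dict[str, Point], members: List[str]) -> Point:
    """Mean voiced and voiceless VOT for the members of one cluster."""
    voiced = [points[n][0] for n in members]
    voiceless = [points[n][1] for n in members]
    return (sum(voiced) / len(voiced), sum(voiceless) / len(voiceless))


# Invented toy values for six hypothetical speakers (not study data).
toy_speakers: Dict[str, Point] = {
    "s1": (-20.0, 40.0), "s2": (-15.0, 44.0),
    "s3": (9.0, 41.0), "s4": (8.0, 43.0),
    "s5": (16.0, 60.0), "s6": (14.0, 58.0),
}

for members in agglomerate(toy_speakers, k=3):
    print(members, cluster_means(toy_speakers, members))
```

Cutting the tree at three clusters groups the toy speakers into pairs with similar voiced/voiceless VOT profiles; the per-cluster means are the kind of summary the abstract reports for Clusters 1 to 3.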

To capture this internally conditioned variation, we propose a unified underlying representation combining properties of |H| and |L|, with regional outcomes derived through selective suppression of one element. Suppressing |L| yields |ʔH| (/d/-like), while suppressing |H| yields |ʔH| (/t/-like). This model predicts that aspiration- and voicing-based systems may coexist within Japanese.
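The suppression operation itself can be sketched as deleting one token of a named element from an expression. The abstract does not spell out the unified underlying representation, so which expression each suppression applies to is an assumption here: the sketch takes the voiced |ʔHL| and aspirated |ʔHH| expressions given for stop types as inputs, and `suppress` is a hypothetical helper, not part of any published implementation.

```python
from typing import Tuple

Expr = Tuple[str, ...]


def suppress(expr: Expr, element: str) -> Expr:
    """Return expr with one occurrence of `element` removed (hypothetical helper)."""
    if element not in expr:
        raise ValueError(f"|{element}| is not present in |{''.join(expr)}|")
    items = list(expr)
    items.remove(element)  # list.remove deletes only the first occurrence
    return tuple(items)


# Assumed inputs: the representations given for voiced and voiceless
# aspirated stops, not the authors' unified representation.
voiced = ("ʔ", "H", "L")
aspirated = ("ʔ", "H", "H")

d_like = suppress(voiced, "L")     # ("ʔ", "H")
t_like = suppress(aspirated, "H")  # ("ʔ", "H")
```

Under these assumed inputs both operations surface as |ʔH|, which mirrors the statement that suppressing |L| and suppressing |H| each yield |ʔH|; a different choice of underlying expression would change what each suppression produces.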

Panel Ling06
Experimental phonetics and phonology
Session 1: Sunday 30 August 2026, 9:00–10:30