Accepted Paper:

Showa Speech Corpus: Design, Compilation and Analysis  
Takehiko Maruyama (Senshu University)

Paper short abstract:

The Showa Speech Corpus (SSC) is a 50-hour corpus of monologues and dialogues recorded at NINJAL between 1952 and 1974. I will introduce the corpus design of the SSC and its compilation procedure: how the sound material was gathered, transcribed, and annotated.

Paper long abstract:

Since 2016, the National Institute for Japanese Language and Linguistics (NINJAL) has been compiling the Showa Speech Corpus (SSC), a new corpus of spoken Japanese recorded almost 70 years ago. The SSC is a 50-hour corpus of spontaneous speech, consisting of monologues and dialogues recorded at NINJAL from the early 1950s to the 1970s. It will be made freely accessible on the internet in 2020.

In this presentation, I will introduce the concept and corpus design of the SSC and its compilation procedure: how the sound material was gathered, transcribed, and annotated. During compilation, we faced several difficulties peculiar to old recordings. Transcription was sometimes very hard because the audio had deteriorated or was unclear, or because speakers overlapped heavily. At other times, we struggled to identify unfamiliar words and grammatical phrases.

I will also present some phonological and grammatical analyses of the SSC, in comparison with contemporary corpora of spoken Japanese, including the Corpus of Spontaneous Japanese (CSJ) and the Corpus of Everyday Japanese Conversation (CEJC). In the SSC, we have found "strange" phonological patterns and grammatical variants that cannot be observed in either the CSJ or the CEJC. For example, an intonation pattern with a rapid rise at the end of a phrase or sentence occurs frequently in the SSC, especially in conversations among female speakers. In addition, the auxiliary verb 'masu' and its older form 'masuru' both appear in the speech of a single speaker, which indicates that 'masu' and 'masuru' were morphological variants in the 1950s.

Neither the rapid rising intonation nor the morphological variation can be observed in the CSJ or the CEJC. This indicates that phonological and grammatical changes have occurred in spoken Japanese over the past 70 years. Such analyses offer a new perspective on what could be called "diachronic change in spoken Japanese".

Panel Ling05
Individual papers in Language and Linguistics I
  Session 1 Thursday 26 August, 2021