Ling05: Individual papers in Language and Linguistics I

Convenors: Yoshiyuki Asahi (National Institute for Japanese Language and Linguistics)
Romuald Huszcza (Jagiellonian University)

Section: Language and Linguistics

Sessions: Thursday 26 August, 10:15-11:45
Time zone: Europe/Brussels

Accepted papers

Session 1 Thursday 26 August, 2021, 10:15-11:45

Showa Speech Corpus: Design, Compilation and Analysis

Takehiko Maruyama (Senshu University)

Paper short abstract

The Showa Speech Corpus (SSC) is a speech corpus of monologues and dialogues recorded at NINJAL from 1952 to 1974, totalling 50 hours. I will introduce the corpus design of the SSC and the compilation procedure: how the sound material was gathered, transcribed, and annotated.

Paper long abstract

Since 2016 the National Institute for Japanese Language and Linguistics (NINJAL) has been compiling a new corpus of spoken Japanese recorded almost 70 years ago, called the Showa Speech Corpus (SSC). The SSC is a spontaneous speech corpus consisting of monologues and dialogues recorded at NINJAL from the early 1950s to the 1970s, totalling 50 hours, and it will be freely accessible on the internet in 2020.

In this presentation I will introduce the conception and corpus design of the SSC and the compilation procedure: how the sound material was gathered, transcribed, and annotated. During compilation we tackled several difficulties peculiar to old recordings. Sometimes the sound was very hard to transcribe because it was deteriorated and unclear or contained serious overlaps, and sometimes we had trouble identifying unfamiliar words and grammatical phrases.

Some phonological and grammatical analyses of the SSC will also be shown, in comparison with contemporary corpora of spoken Japanese, including the Corpus of Spontaneous Japanese (CSJ) and the Corpus of Everyday Japanese Conversation (CEJC). In the SSC we have found "strange" phonological patterns and grammatical variants that cannot be observed in either the CSJ or the CEJC. For example, an intonation pattern of rapid rising at the end of a phrase or sentence frequently occurs in the SSC, especially in female conversation. The auxiliary verb 'masu' and its older form 'masuru' can be seen side by side in a speech by the same speaker, which means that 'masu' and 'masuru' were morphological variants in the 1950s.

Neither the rapid rising intonation nor the morphological variants can be observed in the CSJ or the CEJC, which indicates that phonological and grammatical changes have occurred in spoken Japanese over these 70 years. Such analyses offer a new viewpoint on what can be called "a diachronic change of spoken Japanese".

A corpus-based approach to personal deixis in Japanese benefactives

Natalia Solomkina (Russian State University for the Humanities, Moscow City University)

Paper short abstract

In this paper we review the criteria that different researchers use to explain the choice of an auxiliary verb in Japanese benefactive constructions, focusing on direct/inverse alignment. We verify this approach using data from the BCCWJ and the NPCMJ corpora.

Paper long abstract

The choice of an auxiliary verb in Japanese benefactive constructions depends on deictic categories that do not translate easily into the grammatical person system of Standard Average European languages. One of these categories is social deixis, which concerns a vertical hierarchy. Another dimension is personal deixis, sometimes described using terms such as direct-inverse alignment or empathy.

In this paper we test the applicability of these terms using corpus data from the Balanced Corpus of Contemporary Written Japanese and the NINJAL Parsed Corpus of Modern Japanese.

We compare Japanese benefactives with canonical direct/inverse systems as described by Jacques and Antonov (2014) to demonstrate their highly non-canonical status. Although Japanese benefactives show a certain hierarchical alignment, it cannot be described in terms of grammatical person alone.

The direct-inverse alignment involves the relative positions of the subject and object of the predication on a person-animacy hierarchy. Here is the version of the hierarchy we used, slightly modified for the purpose of our research: 1 > 2 > 3-animate > 3-inanimate. The direct construction (presumably the one with yaru, ageru and sashiageru) is used when the subject of the transitive clause outranks the object in the person hierarchy, and the inverse (presumably with kureru and kudasaru) is used when the object outranks the subject. According to our data and other examples, both the so-called direct (yaru, ageru, sashiageru) and inverse (kureru, kudasaru) verbs repeatedly violate the direct and inverse alignment respectively. Shigeko Nariyama, in Ellipsis and Reference Tracking in Japanese (2003), explains this by arguing that the direct-inverse alignment is overridden by empathy. We argue that all the cases where the direct-inverse alignment is not violated can also be described using the concept of empathy (people simply tend to empathize more with themselves than with a third party, especially an inanimate one). Therefore, the most concise linguistic description of deictic elements in Japanese benefactives will include only social deixis and empathy as understood by Kuno and Kaburaki (1977).
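The hierarchy-based prediction described above can be made concrete with a minimal Python sketch. It is an illustration only and not the procedure used in the paper: the names `Rank`, `predicted_alignment` and `violates_alignment` are hypothetical, and treating equal ranks as "no prediction" is an assumption, since the abstract does not say how such cases are handled.

```python
from enum import IntEnum


class Rank(IntEnum):
    """Positions on the hierarchy 1 > 2 > 3-animate > 3-inanimate.

    A larger value means a higher position on the hierarchy.
    """
    THIRD_INANIMATE = 0
    THIRD_ANIMATE = 1
    SECOND = 2
    FIRST = 3


# Verb classes as named in the abstract.
DIRECT_VERBS = ("yaru", "ageru", "sashiageru")
INVERSE_VERBS = ("kureru", "kudasaru")


def predicted_alignment(subject, obj):
    """Return the construction that a pure direct-inverse system predicts."""
    if subject > obj:
        return "direct"
    if obj > subject:
        return "inverse"
    # Assumption: equal ranks yield no prediction from the hierarchy alone.
    return None


def violates_alignment(verb, subject, obj):
    """Return True if the attested verb contradicts the hierarchy's prediction."""
    if verb in DIRECT_VERBS:
        attested = "direct"
    elif verb in INVERSE_VERBS:
        attested = "inverse"
    else:
        raise ValueError(f"not a benefactive auxiliary covered here: {verb}")
    predicted = predicted_alignment(subject, obj)
    return predicted is not None and attested != predicted


# A first-person subject with a third-person animate object predicts "direct",
# so an attested 'kureru' in that configuration counts as a violation.
print(predicted_alignment(Rank.FIRST, Rank.THIRD_ANIMATE))
print(violates_alignment("kureru", Rank.FIRST, Rank.THIRD_ANIMATE))
```

Under this sketch, the corpus examples the abstract calls violations are exactly those for which `violates_alignment` returns `True`; the empathy-based account argued for above would need additional information that a person-animacy rank alone does not carry.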

*supported by RSF (Russian Science Foundation, grant #17-18-01184)

JFLCorp (Japanese as a Foreign Language Corpus): Building a new L1 Spanish - L2 Japanese digital learner corpus following CEDEL2 standards

Nobuo Ignacio López-Sako (University of Granada)
Cristóbal Lozano (University of Granada)

Paper short abstract

The Japanese as a Foreign Language Corpus (JFLCorp), featuring L1 Spanish, is presented. The design criteria will be introduced, as well as the data collection method and data management. Preliminary data will be reported, and the open-access search engine to be used will be demonstrated.

Paper long abstract

The development of second/foreign language (L2/FL) learner corpora has gained momentum in the last two decades. These corpora are nowadays essential in second language acquisition research (Granger et al., 2015; Tracy-Ventura & Paquot, 2020) and in the development of teaching/learning resources (Granger, 2017; Hawkins & Filipovic, 2012). However, most learner corpora have focused on L2 English, and East Asian languages as L2 are still under-represented. Some important exceptions are two subcorpora of L2 Japanese (C-JAS and I-JAS) included in the NINJAL platform (https://www.ninjal.ac.jp/english/database/subject/jsl/), the Jinan Chinese Learner Corpus and the Korean Learner Corpus, but they mainly focus on L1 English, and the L1 Spanish - L2 Japanese combination is almost non-existent.

To fill this gap, a new project has been launched to build an L1 Spanish - L2 Japanese learner corpus called the Japanese as a Foreign Language Corpus (JFLCorp), as part of an ongoing project featuring a wide variety of L1s (English, Spanish, Japanese, Arabic, Chinese, among others) and L2s (English, Spanish, Japanese). The L2 Spanish corpus (Corpus Escrito del Español L2, CEDEL2) (Lozano, 2009; Lozano & Mendikoetxea, 2013) is already available at http://cedel2.learnercorpora.com/, and the L2 English version (Corpus of English as a Foreign Language, COREFL) (Lozano, Díaz-Negrillo & Callies, 2020) is about to be launched (http://corefl.loearnercorpora.com).

In our presentation, we will introduce the criteria that have been strictly applied to JFLCorp, following the 10 corpus-design principles stated by Sinclair (2005) and adapted to L2 corpora (Lozano & Mendikoetxea, 2013). JFLCorp follows the same design principles as the other corpora within the project (CEDEL2 and COREFL), which allows for multi-layered inter- and cross-linguistic comparisons among (sub)corpora. We will illustrate the online data collection method and present the different sections of the digital data-gathering tool, including a test of Japanese grammar used to establish the participants' level of competence. The transliteration criteria (Minami, 1998a, 1998b) will also be discussed. Finally, some preliminary corpus data will be reported, and the CEDEL2 search engine will be demonstrated to illustrate how the future web-based JFLCorp interface will operate.
