- Convenor:
-
Toshinobu Ogiso
(National Institute for Japanese Language and Linguistics)
- Discussant:
-
Bjarke Frellesvig
(University of Oxford)
- Stream:
- Language and Linguistics
- Location:
- Torre B, Piso 3, T16
- Start time:
- 31 August, 2017 at
Time zone: Europe/Lisbon
- Session slots:
- 1
Short Abstract:
At NINJAL, construction of the Corpus of Historical Japanese (CHJ) is under way. We are currently developing two new sub-corpora, Man'yōshū and Christian Materials. This panel includes two presentations on these new sub-corpora and one research presentation utilizing the new CHJ.
Long Abstract:
At the National Institute for Japanese Language and Linguistics, construction of the Corpus of Historical Japanese (CHJ) is under way as part of the project "Construction of Diachronic Corpora and New Developments in Research on the History of Japanese". The corpus is planned as a diachronic corpus that will make it possible to study the full history of the Japanese language. So far, we have created and published four sub-corpora: Heian period series (16 works, including the Tale of Genji), Kamakura period series I: Setsuwa and Zuihitsu (5 works, including Konjaku Monogatarishū), Muromachi period series I: Kyōgen (Toraakira-bon Kyōgenshū), and Meiji-Taisho series I: Magazines (Meiroku Zasshi, Taiyō, etc.). We are currently developing two new sub-corpora: Nara period series I: Man'yōshū and Muromachi period series II: Christian Materials.
This panel includes two presentations on these new sub-corpora and one research presentation utilizing the new CHJ.
In presentation 1, Atsuko Kawaguchi, Miwako Murayama, and Yuki Watanabe will report on the construction of the corpus of "Christian materials". This sub-corpus consists of Esopo no Fabulas (Aesop's Fables) and Feiqe Monogatari (The Tale of the Heike), published by Portuguese missionaries in the late 16th century. As the original texts are written in the Latin alphabet, we created a Japanese character text as a basis for automated morphological analysis. At the same time, the original romanized text allows us to study the pronunciation of the period.
In presentation 2, Tomoaki Kōno will report on the construction of the corpus of Man'yōshū. As is widely known, Man'yōshū is an 8th-century anthology of poetry, and it is the most important material for studying the Japanese language of that time.
In presentation 3, as a case study using the new CHJ, Takashi Nomura will present research on the usage of the perfective auxiliaries -tu and -nu in Nara and Heian period Japanese.
Afterwards, we will take questions and comments from the audience and discuss the methodology and possibilities of corpus-based studies in historical Japanese linguistics. The discussant will be Bjarke Frellesvig.
Accepted papers:
Session 1
Paper short abstract:
In this presentation, we report on the construction and the utilisation of the corpus of Christian materials (Kirishitan shiryō). These are crucial materials reflecting colloquial late Middle Japanese, and this corpus will be released as part of the Corpus of Historical Japanese.
Paper long abstract:
As part of the Corpus of Historical Japanese, we are constructing a corpus of Christian materials (Kirishitan shiryō) using Feiqe Monogatari (1592) and Esopo no Fabulas (1593). In this presentation, we discuss the characteristics of the Christian materials and comment on the construction of this corpus. We also present an example study of late Middle Japanese that uses the corpus. Christian materials are documents written by Catholic missionaries, mainly the Jesuits, in the 16th and 17th centuries AD. They are important materials for the study of late Middle Japanese. Feiqe is a digest of Heike-monogatari. Esopo is a Japanese translation of Aesop's Fables. Both texts are written in the spoken language using the Roman alphabet in the Portuguese style; they reveal much information about the Japanese language that cannot be recovered from texts in Japanese characters alone. In constructing this corpus, we transliterate the texts into Japanese characters so that they can be analysed with UniDic (the dictionary for morphological analysis). To do this, we use previous research and reprints of the original texts as references. To convert the texts to kanji, we use the representative orthography given for each form in UniDic. In addition, we render the texts in phonetic spelling using information from the Roman alphabet in the originals. This gives us evidence about the pronunciation of the time, for example euphonic changes and the voicing of consonants.
The corpus can not only function as an index, but also enables more advanced research and statistical analysis. In the corpus of Christian materials written in the Roman alphabet, there is scope for quantitatively investigating tendencies in consonant voicing and euphonic change, for example. There is also scope for studying politeness by combining morphological information with information about the speakers. We have already released the corpus of Kyōgen (traditional Japanese theatrical works), and we plan to construct a corpus of Heike-monogatari. There is therefore scope for comparing and analysing multiple works reflecting colloquial late Middle Japanese.
Paper short abstract:
Man'yoshu is a collection of Japanese poetry compiled in the eighth century. The anthology includes contemporary dialects, and the original texts are written in kanji characters. We designed and are constructing the corpus of Man'yoshu, which enables researchers to study these features.
Paper long abstract:
The National Institute for Japanese Language and Linguistics is constructing an annotated diachronic corpus of the Japanese language. As part of this work, we designed and are constructing the corpus of Man'yoshu.
Man'yoshu is a collection of poems compiled in the late eighth century. The composers range from emperors to peasants and soldiers. The anthology contains 4,500 poems in 20 volumes, and the total number of words is about 100,000. Some volumes represent contemporary dialects, and the regions from which the authors originated are noted, which enhances the value of Man'yoshu as a linguistic resource. Because no native Japanese script existed yet, the poems were written in kanji borrowed to represent the Japanese language.
Our Man'yoshu corpus has four main features. First, information on the composer and the volume is attached to each poem, which is useful for the study of expressions peculiar to a composer or a volume. Second, the original (kanji) characters are aligned with the transcribed words. This enables researchers to study the individual style of kanji use peculiar to each composer. Third, the text of our corpus is divided into two types of lexical items: short unit words and long unit words. Short unit words are determined by the combinatory patterns of morphemes and are intended for searching for examples. Long unit words are based on phrases and are intended for examining linguistic properties. Both types are provided so that researchers can choose between them depending on their purposes. Fourth, morphological information is provided for all of the texts: headwords, part-of-speech classifications, conjugation types, etc. Each lexical item in the corpus is linked to a corresponding entry in an electronic dictionary called UniDic. The entries of UniDic have a hierarchical structure consisting of three levels: the lemma, form, and orthographic levels. The lemma is like the headword of a general dictionary and is the highest level of the hierarchy. The form level distinguishes different forms and conjugation types, while the orthographic level distinguishes variant spellings. Thus the corpus allows researchers to study the variation of dialectal forms or phonologically changed forms in Man'yoshu.
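The three-level hierarchy described above can be illustrated with a small sketch. This is not the actual UniDic or CHJ data format: the `Token` class, its field names, and the placeholder values such as `LEMMA_A` and `form_a1` are all hypothetical and chosen only for illustration. The sketch shows how linking every token to a lemma makes it possible to collect the distinct forms attested for that lemma.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One lexical item; field names are illustrative assumptions."""
    lemma: str   # highest level: comparable to a dictionary headword
    form: str    # form level: distinguishes forms and conjugation types
    orth: str    # orthographic level: distinguishes variant spellings
    volume: int  # volume of the anthology the token comes from


def forms_by_lemma(tokens):
    """Map each lemma to the set of distinct forms attested for it."""
    grouped = defaultdict(set)
    for token in tokens:
        grouped[token.lemma].add(token.form)
    return dict(grouped)


# Placeholder tokens, invented for illustration; not real corpus entries.
tokens = [
    Token("LEMMA_A", "form_a1", "orth_x", 1),
    Token("LEMMA_A", "form_a2", "orth_y", 14),
    Token("LEMMA_A", "form_a1", "orth_z", 5),
    Token("LEMMA_B", "form_b1", "orth_w", 20),
]

variants = forms_by_lemma(tokens)
for lemma, forms in sorted(variants.items()):
    print(lemma, sorted(forms))
```

A lemma with more than one form in the result is a candidate for the kind of dialectal or phonological variation mentioned above. Grouping on `orth` instead of `form` would give the spelling variants in the same way.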
Paper short abstract:
I discuss the historical changes in the distribution of -tu and -nu, using the Corpus of Historical Japanese. In the Nara Period, -tu followed volitional verbs while -nu followed non-volitional verbs, and -tu rarely co-occurred with -keri. In the Heian Period, unpredicted patterns appeared.
Paper long abstract:
From the 7th century to the present, Japanese people have written various materials, some of which reflect the contemporary spoken language. In pre-modern times, most Japanese people were monolingual, and the Japanese language changed chiefly through internal factors. To examine historical changes over such a long span, it is effective to utilize the Corpus of Historical Japanese (CHJ). I discuss the historical changes in the distribution patterns of the perfective markers -tu and -nu from the Nara to the Heian period, using the CHJ.
In the 8th-century Japanese poetry collection Man'yoshu, -tu and -nu followed verbs in strictly regular ways: -tu followed volitional verbs, while -nu followed verbs concerned with non-volitional or natural change. In addition, -nu often co-occurred with the past tense marker -keri and formed -nikeri, while -tu rarely co-occurred with -keri. Speakers in the Nara period presumably grasped these differences between -tu and -nu intuitively; exceptions therefore hardly occurred.
In the Genji Monogatari of the early 11th century, the usage of -tu and -nu began to be confused. We find unpredicted patterns of -nu with volitional verbs and -tu with non-volitional verbs, and there are quite a few examples of -tu with -keri, forming -tekeri.
This confusion progressed further in the Konjaku Monogatari-shu of the mid-12th century. Moreover, the previously stative marker -tari was reanalyzed as a perfective marker and took the place of both -tu and -nu. The form -tari eventually changed to -ta, which is still used now. Concatenations such as -teiru and -teari emerged as complex aspectual elements in this period. I examined these phenomena quantitatively and qualitatively with the CHJ.
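A quantitative comparison of this kind can be sketched as a simple cross-tabulation. The code below is an illustration only and does not reproduce the author's method or any CHJ export format: the function names, the period labels `period_1` and `period_2`, and the counts are all invented placeholders and not corpus results. It assumes each observation has already been annotated with a period, a verb class, and the auxiliary that follows the verb.

```python
from collections import Counter


def cross_tabulate(observations):
    """Count each (period, verb_class, auxiliary) combination."""
    return Counter(observations)


def share(table, period, verb_class, auxiliary):
    """Proportion of one auxiliary among all tokens of a verb class in a period."""
    total = sum(
        count
        for (p, v, _aux), count in table.items()
        if p == period and v == verb_class
    )
    if total == 0:
        return 0.0  # no observations for this period and verb class
    return table[(period, verb_class, auxiliary)] / total


# Invented placeholder observations, NOT real corpus counts.
observations = (
    [("period_1", "volitional", "-tu")] * 3
    + [("period_1", "volitional", "-nu")] * 1
    + [("period_1", "non-volitional", "-nu")] * 4
)

table = cross_tabulate(observations)
print(share(table, "period_1", "volitional", "-tu"))
```

Comparing such proportions across periods is one way to check whether the volitional/non-volitional split weakens over time. A real study would also need a principled classification of verbs, which this sketch takes as given.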