Dh02: Data-Driven Humanities in Japan: From AI Infrastructure to Computational Literary Analysis

Convenor:: Nobuhiko Kikuchi (National Institute of Japanese Literature)

Chair:: Nobuhiko Kikuchi (National Institute of Japanese Literature)

Discussant:: Nobuhiko Kikuchi (National Institute of Japanese Literature)

Format:: Panel

Section:: Interdisciplinary Section: Digital Humanities

Location:: 4.9

Sessions:: Saturday 29 August, 11:00-12:30
Time zone: Europe/Warsaw

Short Abstract

Showcasing "The Model Building in Humanities through Data-Driven Problem Solving," this panel progresses from AI infrastructure to literary analysis. Papers cover CLIP image retrieval, RAG-based learning tools, gender in Kokinshu, and modern novel evolution, offering new digital pathways.

Long Abstract

This panel presents the latest achievements of the National Institute of Japanese Literature’s large-scale project, "The Model Building in Humanities through Data-Driven Problem Solving." Structured to trace the progression from digital infrastructure to analytical application, the four presentations demonstrate how computational approaches are reshaping Japanese humanities research.

The first half addresses the construction of AI-driven infrastructure for accessibility and education. The first presentation introduces a deep learning method using CLIP to retrieve illustrations via natural language queries. By fine-tuning models to recognize historical items like kichou, the study enables the efficient retrieval of visual materials lacking metadata. This approach supports intuitive exploration of classical materials and contributes to advanced historical and humanities research. Bridging retrieval and learning, the second study proposes an interactive search system using Retrieval-Augmented Generation (RAG). By visualizing textual variations across editions, this system supports secondary-level learners in understanding the complexity of textual transmission in classical books.

Building on these technological foundations, the second half applies quantitative methods to derive new literary insights. The third paper conducts a collocation network analysis of The Kokin Wakashuu, focusing on the verb omou. This systematic investigation not only offers a new perspective on Heian gender ideology but also shows how quantitative methods can reveal patterns that previous scholarship has left unidentified. Finally, the fourth presentation expands the scope to a macro-analysis of one million newspaper novels (1875–2025). By evidencing a genre-wide evolution toward psychological interiority, the study illustrates how quantitative measures, when combined with close reading, can reposition forgotten works and open new comparative perspectives on media formats, genres, and narrative techniques.

As "Data-Driven Humanities" rapidly establishes itself as a key term in Japanese academia, this panel seeks to go beyond mere presentation. By sharing these methodologies—ranging from AI-based archival retrieval to large-scale literary history—we aim to initiate a collaborative dialogue with European scholars to co-create a new paradigm of humanities research that transcends geographical boundaries and traditional methodological confines.

Accepted papers

Session 1 Saturday 29 August, 2026, 11:00-12:30

Tracing the Evolution of Modern Japanese Novel Titles: A Computational Literary Analysis of Newspaper Novels (1875–2025)

Yoshitaka Hibi (Nagoya University)

Paper short abstract

This paper applies computational analysis to a digitized chronology of Japanese newspaper novels (1875–2025) to trace large-scale shifts in title vocabulary, authorship, and deixis, highlighting emotionalization, stabilization, and increasing internal focalization.

Paper long abstract

This paper explores what computational literary studies can contribute to the history of modern Japanese fiction by examining a large-scale dataset of newspaper novels. Drawing on Takeo Takagi’s Chronology of Newspaper Novels, I construct a digitized corpus of a million serial works by more than 2,500 authors, published between 1875 and 2025. Focusing on the titles of these novels, which succinctly signal themes and guide readers’ interpretations, I apply basic quantitative methods—morphological analysis, lexical matching, and diversity measures—to trace long-term changes in vocabulary and authorship.
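As a rough illustration of the lexical matching and diversity measures mentioned above, the following minimal sketch groups titles by decade and reports two numbers per decade: the share of titles containing a word from a category lexicon, and a type-token ratio. Everything concrete in it is an assumption made for illustration: the romanized toy titles, the `EMOTION_WORDS` lexicon, and the function names are invented, and the titles are assumed to be already tokenized by a Japanese morphological analyzer. It is not the paper's implementation.

```python
from collections import defaultdict

# Hypothetical, pre-tokenized titles as (year, tokens). A real pipeline would
# obtain tokens from a Japanese morphological analyzer; these are invented.
TITLES = [
    (1888, ["hana", "no", "kage"]),
    (1892, ["te", "to", "ashi"]),
    (1934, ["kokoro", "no", "tabi"]),
    (1937, ["ai", "no", "kokoro"]),
]

# Hypothetical category lexicon used for lexical matching.
EMOTION_WORDS = {"kokoro", "ai", "namida"}


def decade(year):
    """Map a year to the first year of its decade, e.g. 1888 -> 1880."""
    return year // 10 * 10


def by_decade(titles):
    """Group token lists by decade of publication."""
    groups = defaultdict(list)
    for year, tokens in titles:
        groups[decade(year)].append(tokens)
    return dict(groups)


def match_rate(title_tokens, lexicon):
    """Share of titles containing at least one lexicon word."""
    hits = sum(1 for tokens in title_tokens if lexicon.intersection(tokens))
    return hits / len(title_tokens)


def type_token_ratio(title_tokens):
    """Distinct tokens divided by total tokens: a simple diversity measure."""
    flat = [t for tokens in title_tokens for t in tokens]
    return len(set(flat)) / len(flat)


for dec, group in sorted(by_decade(TITLES).items()):
    print(dec, round(match_rate(group, EMOTION_WORDS), 2),
          round(type_token_ratio(group), 2))
```

Note that a type-token ratio is sensitive to sample size, so decades with very different numbers of titles would need equal-sized samples or a length-corrected measure before being compared.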

The preliminary analysis highlights three notable tendencies that become especially pronounced from the 1930s onward. First, there is a marked rise in emotion- and psychology-related vocabulary in titles, alongside a decline in body-related terms. This contrast suggests a gradual shift from narratives centered on physicality toward fiction that foregrounds psychological interiority. Second, the field of newspaper novels becomes more stable and concentrated: a shrinking number of authors accounts for a growing share of titles, and the range of lexical choices in titles also narrows. As serialization lengths increase, the overall diversity of surface expressions declines, pointing to a phase of standardization within the genre.

Third, the frequency of deictic expressions such as “this,” “that,” “here,” and “now” rises significantly in titles from the 1930s. Because deixis invariably presupposes a speaking or perceiving subject, this tendency can be read as evidence of a progressive localization of narrative perspective, in which stories are increasingly presented through the viewpoint of a protagonist or narrator—what Gérard Genette terms internal focalization. Taken together, these findings indicate that newspaper novels participated in a broader transformation of modern Japanese fiction toward psychological emphasis and individualized perspective. More broadly, the study illustrates how even relatively simple quantitative measures, when combined with close reading, can reposition forgotten or minor works within literary history and open new comparative perspectives on media formats, genres, and narrative techniques in modern Japan.
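The concentration of authorship described above (a shrinking number of authors accounting for a growing share of titles) can be expressed as a top-k share. The sketch below is only one possible operationalization, under stated assumptions: the records, the pivot year of 1930, and the function names are hypothetical, and the abstract does not specify which concentration measure the study uses.

```python
from collections import Counter

# Hypothetical (year, author) records, one per serialized title; invented data.
RECORDS = [
    (1895, "A"), (1896, "B"), (1897, "C"), (1898, "D"),
    (1935, "E"), (1936, "E"), (1937, "E"), (1938, "F"),
]


def top_k_share(authors, k):
    """Fraction of titles written by the k most prolific authors."""
    counts = Counter(authors)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(authors)


def split_by_year(records, pivot):
    """Split author labels into those before and from the pivot year."""
    before = [a for y, a in records if y < pivot]
    after = [a for y, a in records if y >= pivot]
    return before, after


before, after = split_by_year(RECORDS, 1930)
print(top_k_share(before, 1))  # share held by the single most prolific author
print(top_k_share(after, 1))
```

A higher value in the later period would correspond to the kind of concentration the abstract reports. The toy records here are constructed to show the mechanics only and carry no empirical claim.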

Supporting Secondary-Level Learning of Japanese Classical Texts through Edition-Aware Difference Retrieval

Tokinori Suzuki (University of Tsukuba) Keizo Oyama (National Institute of Japanese Literature)

Paper short abstract

We propose a RAG-based interactive search system that summarizes Japanese classical books and visualizes differences across editions to support secondary-level learning.

Paper long abstract

Japanese classical texts are taught as part of the Japanese language curriculum at the secondary level. In classes, a fixed version of the text is typically adopted through authorized textbooks. However, Japanese classical texts often exhibit textual variation across different editions, including revisions, deletions, and additions, as these works were historically copied by hand. We believe that this characteristic of classical texts is both distinctive and of scholarly interest, and that having secondary-level students in Japan investigate these variations themselves helps them better understand the narratives and their historical contexts, thereby fostering greater interest in the stories.

To this end, this study develops a learning-oriented search system that supports the exploration of classical texts through guided, interactive retrieval. The search system employs Retrieval-Augmented Generation (RAG) with a large language model (LLM) to identify and present differences between multiple editions of the same text. In the retrieval step, the system searches for semantically similar passages across editions and passes them to the LLM, which summarizes textual changes, such as additions, deletions, and revisions, based on both the passage content and associated metadata, including dates of composition. The system then highlights the modified parts and presents them to the user.

This function enables edition-aware difference retrieval, which allows users to search for and interpret how specific passages vary across different editions. This search helps learners grasp where and how texts have changed, supporting deeper comprehension of textual transmission. This study aims to propose a search and learning support framework tailored to Japanese classical texts, contributing to more engaging classical literature education in secondary school settings.
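The retrieve-then-compare flow described above can be sketched with the Python standard library. This is a simplified stand-in under explicit assumptions: surface similarity from `difflib.SequenceMatcher` replaces the semantic retrieval the abstract describes, the LLM summarization step is omitted, and the edition passages are invented English placeholders rather than quotations from any classical text. The function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical edition passages as word lists (invented English stand-ins).
EDITION_A = ["long ago there lived an old bamboo cutter".split()]
EDITION_B = [
    "the moon rose over the eastern hills".split(),
    "long ago there lived an honest old bamboo cutter".split(),
]


def most_similar(query, candidates):
    """Return the candidate with the highest surface-similarity ratio.

    Stand-in for semantic retrieval: a real system would compare embeddings.
    """
    return max(candidates,
               key=lambda c: SequenceMatcher(None, query, c).ratio())


def describe_changes(old, new):
    """List (operation, old_words, new_words) for every non-equal span."""
    matcher = SequenceMatcher(None, old, new)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


counterpart = most_similar(EDITION_A[0], EDITION_B)
for tag, old_words, new_words in describe_changes(EDITION_A[0], counterpart):
    print(tag, old_words, new_words)
```

The operation tags ("insert", "delete", "replace") map naturally onto the additions, deletions, and revisions that the system is said to highlight. In a full RAG pipeline, these spans and the edition metadata would be passed to the LLM for summarization.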

Image Retrieval in Japanese Classical Documents using Deep Learning Method

Satoru Fujita (Hosei University) Keizo Oyama (National Institute of Japanese Literature) Shin'ichi Satoh

Paper short abstract

Japanese classical documents often contain numerous illustrations embedded in the margins. This paper presents a deep learning–based method that enables efficient retrieval of such illustrations using natural language queries. The method supports advanced historical research in the humanities.

Paper long abstract

Japanese classical documents often contain numerous illustrations embedded in the margins or across entire pages. While a few of these illustrations are well known, the majority remain largely unexplored. This paper presents a deep learning–based method for efficiently retrieving such illustrations from large-scale digital libraries using natural language queries. Our method employs CLIP (Contrastive Language–Image Pre-training), which learns joint text–image feature representations and enables users to retrieve relevant images based on natural language descriptions.

Several contributors provide CLIP models trained on Japanese texts and images; however, applying these models to Japanese classical documents requires additional adaptation. First, we fine-tuned the model to better recognize classical Japanese terms, including “kichou,” a type of partitioning curtain, and “shitomi,” a type of gate board, both of which are frequently depicted in historical materials. Second, to address CLIP’s difficulty in detecting small objects within high-resolution page images, we implemented a preprocessing step that identifies small items, such as dishes or instruments, and registers them as individual sub-images. Because this process significantly increases the total number of images by multiplying the number of sub-images per original page, we further designed a fast similarity computation method to maintain interactive retrieval performance. We also introduce a method for refining retrieved images through additional conditional queries, including color-related constraints. In this case, each query is represented as a combination of a reference image and natural-language modifiers.

The proposed approach enables users to rapidly discover illustrations of various sizes, styles, and colors across extensive digital libraries. It supports intuitive exploration of classical materials and contributes to advanced historical and humanities research.
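To make the retrieval step concrete, the sketch below ranks pages by the cosine similarity between a query embedding and the embeddings of each page's registered sub-images, keeping the best-matching region per page. It rests on several assumptions: the three-dimensional vectors are hypothetical placeholders for real CLIP embeddings, the names `SUB_IMAGES`, `build_index`, and `search` are invented, and the linear scan shown here is a plain baseline, not the fast similarity computation method the paper describes.

```python
import math

# Hypothetical 3-dimensional embeddings; real CLIP vectors have hundreds of
# dimensions and would come from the fine-tuned text and image encoders.
SUB_IMAGES = [
    # (page_id, region_label, embedding)
    ("page_01", "full page", [0.9, 0.1, 0.0]),
    ("page_01", "small dish", [0.1, 0.9, 0.1]),
    ("page_02", "full page", [0.2, 0.2, 0.9]),
]


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_index(sub_images):
    """Normalize once so that each later comparison is a plain dot product."""
    return [(page, label, normalize(vec)) for page, label, vec in sub_images]


def search(index, query_vec, top_k=2):
    """Rank pages by their best-matching sub-image (cosine similarity)."""
    q = normalize(query_vec)
    best = {}
    for page, label, vec in index:
        score = sum(a * b for a, b in zip(q, vec))
        if page not in best or score > best[page][0]:
            best[page] = (score, label)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(page, label, score) for page, (score, label) in ranked[:top_k]]


INDEX = build_index(SUB_IMAGES)
# Hypothetical text embedding for a query such as "a small dish".
results = search(INDEX, [0.0, 1.0, 0.1])
for page, label, score in results:
    print(page, label, round(score, 3))
```

Registering small items as separate sub-images lets a page surface through a small region even when the full-page embedding matches the query poorly, which is the motivation the abstract gives for the preprocessing step.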

What Do They Omou in the Poems?—Gender Differences in the Diction of the Short Poems in The Kokin Wakashuu

Ayano Takeuchi (National Institute of Japanese Literature)

Paper short abstract

Using a quantitative method known as collocation network analysis, the current study aims to demonstrate gender differences in the diction of short poems contained in The Kokin Wakashuu, the first imperial anthology of poems, compiled during the early Heian period (794-1192).

Paper long abstract

This study aims to demonstrate gender differences in the diction of the short poems included in The Kokin Wakashuu, the first imperial anthology of poems, which was compiled during the early Heian period (794-1192). This anthology had a significant impact on Japanese poetry, shaping its future development. Additionally, it embodied the gender ideology of this period (Kondo 2005). While previous research on The Kokin Wakashuu primarily employs qualitative analysis, Kondo (2005) conducts an n-gram analysis on the poems and identifies expressions that appear exclusively in the poems composed by men. She argues that these expressions represent the ideal man of the Heian period. However, her analysis focuses on expressions used only in the male poems, leaving unexamined other expressions that appear only in the female poems or in both male and female poems.

The current study focuses on the verb omou (to think/feel) that appears in the poems contained in The Kokin Wakashuu. Kondo (2005) notes that expressions containing this verb, such as omou-hito (person/people), omou-kokoro (heart), and mono (thing)-o (particle)-omou, appear only in the male poems and depict the man as active, which was the ideal for men of this period. However, the verb omou itself appears in both male and female poems. This study thus examines the patterns of its use through collocation network analysis, a quantitative method, to identify gender distinctions in the use of the verb. Collocation network analysis is a data visualization technique based on the idea, summarized by Firth (1957), that "you shall know a word by the company it keeps." This analysis visualizes patterns of words that frequently co-occur, providing insights that may be difficult to obtain through close reading. By conducting a quantitative analysis, the current study not only provides a thorough and systematic investigation of gender differences in diction in the poems but also sheds light on how quantitative methods contribute to the study of literary texts and reveal patterns that would otherwise remain undetected.
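The counting step behind a collocation network can be sketched as follows: for a node word such as omou, count the words that occur within a fixed window around it, separately for each group of poems, and keep the frequent pairs as edges. All specifics are assumptions for illustration: the lemmatized toy poems are invented and are not lines from the anthology, the window size and threshold are arbitrary, and raw co-occurrence counts stand in for the association measures (for example mutual information or log-likelihood) that collocation studies commonly use.

```python
from collections import Counter

# Hypothetical lemmatized poems tagged by poet gender; invented toy data,
# not lines from the anthology.
POEMS = [
    ("male", ["hito", "o", "omou", "kokoro"]),
    ("male", ["omou", "hito", "yume"]),
    ("female", ["mono", "o", "omou", "aki"]),
    ("female", ["omou", "aki", "kaze"]),
]


def collocates(poems, node, group, window=2):
    """Count words occurring within `window` tokens of `node` in one group."""
    counts = Counter()
    for g, tokens in poems:
        if g != group:
            continue
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def edges(counts, node, min_count=2):
    """Keep collocates seen at least `min_count` times as network edges."""
    return sorted((node, w, n) for w, n in counts.items() if n >= min_count)


male_edges = edges(collocates(POEMS, "omou", "male"), "omou")
female_edges = edges(collocates(POEMS, "omou", "female"), "omou")
print(male_edges)
print(female_edges)
```

Each resulting triple is an edge from the node word to a collocate, weighted by frequency. Drawing the two edge lists side by side gives the kind of visual comparison between groups that the study describes. The toy data carries no claim about the anthology itself.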