Accepted Paper
Paper short abstract
We propose a RAG-based interactive search system that summarizes Japanese classical texts and visualizes differences across editions to support learning at the secondary level.
Paper long abstract
Japanese classical texts are taught as part of the Japanese language curriculum at the secondary level. In class, a single fixed version of each text is typically adopted through authorized textbooks. However, because these works were historically copied by hand, they often exhibit textual variation across editions, including revisions, deletions, and additions. We believe that this characteristic of classical texts is both distinctive and of scholarly interest, and that having secondary-level students in Japan investigate these variations themselves helps them better understand the narratives and their historical contexts, thereby fostering greater interest in the stories.
To this end, in this study we develop a learning-oriented search system that supports the exploration of classical texts through guided, interactive retrieval. The system employs Retrieval-Augmented Generation (RAG) with large language models (LLMs) to identify and present differences between multiple editions of the same text. In the retrieval step, the system searches for semantically similar passages across editions and passes them to an LLM, which summarizes textual changes, such as additions, deletions, and revisions, based on both the passage content and associated metadata, including dates of composition. The system then highlights the modified parts and presents them to the searcher.
This functionality enables edition-aware difference retrieval, which allows users to search for and interpret how specific passages vary across editions. Such a search helps learners grasp where and how texts have changed, supporting deeper comprehension of textual transmission. This study aims to propose a search and learning support framework tailored to Japanese classical texts, contributing to more engaging classical literature education in secondary school settings.
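The retrieve, compare, and highlight flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The edition data, years, and function names are all hypothetical. Character-level similarity from the standard library's `difflib` stands in for semantic (embedding-based) retrieval. The LLM call is replaced by a function that only assembles the prompt text. The placeholder passages are in English and split on whitespace; real Japanese text would need a different tokenization, for example per character.

```python
import difflib

# Hypothetical toy data: invented placeholder passages and years,
# not taken from any real edition.
EDITIONS = {
    "edition_a": {"year": 1300, "passages": [
        "the old pine stands by the gate",
        "rain fell on the river at dawn",
    ]},
    "edition_b": {"year": 1450, "passages": [
        "rain fell on the wide river at dawn",
        "the old pine stands by the gate in snow",
    ]},
}


def similarity(a, b):
    """Stand-in for semantic similarity: difflib ratio in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def retrieve(query, edition):
    """Return the passage in `edition` most similar to `query`."""
    return max(EDITIONS[edition]["passages"], key=lambda p: similarity(query, p))


def classify_changes(old, new):
    """Label word-level differences as addition, deletion, or revision."""
    a, b = old.split(), new.split()
    changes = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "insert":
            changes.append(("addition", " ".join(b[j1:j2])))
        elif tag == "delete":
            changes.append(("deletion", " ".join(a[i1:i2])))
        elif tag == "replace":
            changes.append(("revision", " ".join(a[i1:i2]) + " -> " + " ".join(b[j1:j2])))
    return changes


def highlight(old, new):
    """Mark removed words as [-...-] and added words as [+...+]."""
    a, b = old.split(), new.split()
    out = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("[+" + " ".join(b[j1:j2]) + "+]")
    return " ".join(out)


def build_prompt(old_name, new_name, old, new):
    """Assemble the text an LLM would receive; no model is called here."""
    return (
        f"Passage from {old_name} ({EDITIONS[old_name]['year']}): {old}\n"
        f"Passage from {new_name} ({EDITIONS[new_name]['year']}): {new}\n"
        "Summarize the additions, deletions, and revisions between them."
    )


query = EDITIONS["edition_a"]["passages"][0]
match = retrieve(query, "edition_b")
print(highlight(query, match))
print(classify_changes(query, match))
print(build_prompt("edition_a", "edition_b", query, match))
```

In this sketch the deterministic diff produces the highlighting, while the summary of changes would come from the model given the prompt and the edition metadata. Whether the proposed system divides the work this way is not stated in the abstract.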
Data-Driven Humanities in Japan: From AI Infrastructure to Computational Literary Analysis