- Convenors:
-
James Morris
(National Museum of Japanese History, National Institutes for the Humanities)
Paweł Dybała (Laboratorium Językowe KOTOKEN)
- Format:
- Panel
- Section:
- Interdisciplinary Section: Digital Humanities
Accepted papers
Session 1
Paper short abstract
OdGenji is the first interoperable LOD knowledge graph of olfactory evidence in Japanese classical literature, extracted from The Tale of Genji: AI disambiguates polysemous nioi in TEI-XML, annotates 140 passages, and publishes CIDOC-CRM/Odeuropa-aligned RDF for reproducible cross-lingual research.
Paper long abstract
OdGenji <https://odgenji.vercel.app/ja/smells/> is a scholarly Linked Open Data (LOD) database that curates strictly olfactory (smell-as-sensed) expressions in The Tale of Genji as searchable and reusable research data. Although cultural-heritage digitization has progressed, olfactory information typically remains embedded in narrative text, hindering cross-text comparison, quantitative analysis, and reuse in education or exhibitions. In Genji, this problem is compounded by the polysemy of nioi (にほひ), which frequently denotes visual radiance or metaphorical aura in addition to literal fragrance; robust extraction therefore requires explicit selection criteria and machine-readable structuring. Responding to current expectations for open, FAIR-aligned data—especially when generative AI is used—OdGenji emphasizes transparency in both methods and data modeling.
Using the Kōi Genji Monogatari TEI-XML corpus (a variant-collated TEI edition) as input, we implement an end-to-end pipeline that (1) extracts candidate scent passages from TEI-XML, (2) applies a generative AI model (Google Gemini 2.5 Pro) to retain only genuinely olfactory descriptions while excluding visual/metaphorical uses, and (3) enriches each selected instance with aligned textual layers—original text, a modern Japanese translation, and an English translation—together with structured metadata (scent lexemes and properties, odor sources, locations/spaces, perceivers, time, situational context, and effects). Records are published in RDF/XML under a data model aligned with CIDOC-CRM and the Odeuropa ontologies, supporting interoperability and cross-project querying; the system can also query Odeuropa to surface related information. We release technical documentation of the extraction procedures and modeling choices to enable third-party verification and reuse.
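Step (1) of the pipeline described above, candidate extraction from TEI-XML, might look roughly like the following minimal sketch. It is not the project's actual code. The `div`/`seg` structure, the `SCENT_LEXEMES` list, and the sample sentences are all assumptions made for illustration; the sentences are invented and are not quotations from the corpus. Candidates found this way would still need step (2) to separate olfactory from visual or metaphorical uses.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical seed lexemes; the project's real selection criteria may differ.
SCENT_LEXEMES = ("にほひ", "にほふ", "かをり", "香")

# Invented sample markup; the real corpus may use different elements.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="chapter" n="1">
      <seg xml:id="s1">梅の香がただよふ</seg>
      <seg xml:id="s2">花の色にほふ</seg>
      <seg xml:id="s3">月がのぼる</seg>
    </div>
  </body></text>
</TEI>"""


def extract_candidates(tei_xml, lexemes=SCENT_LEXEMES):
    """Return segments containing any seed lexeme, grouped by chapter number."""
    root = ET.fromstring(tei_xml)
    candidates = []
    for div in root.iterfind(".//tei:div[@type='chapter']", TEI_NS):
        for seg in div.iterfind(".//tei:seg", TEI_NS):
            text = "".join(seg.itertext()).strip()
            hits = [lex for lex in lexemes if lex in text]
            if hits:
                candidates.append({
                    "chapter": div.get("n"),
                    "id": seg.get(XML_ID),
                    "text": text,
                    "lexemes": hits,
                })
    return candidates


if __name__ == "__main__":
    for cand in extract_candidates(SAMPLE_TEI):
        print(cand)
```

In this toy input the second segment uses にほふ of colour rather than scent. A pure string match cannot tell the difference, which is why the abstract adds a separate filtering step.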
Applied to all 54 chapters, OdGenji identifies 140 olfactory descriptions and publishes integrated RDF for the 33 chapters in which olfactory evidence is confirmed. A web interface provides faceted search and cross-cutting views such as “smell sources” and “fragrant spaces,” reframing narrative descriptions as analyzable relations among persons, materials, spaces, and situations. Developed within a joint project with the National Institute of Japanese Literature (FY2025–FY2026), OdGenji provides (to our knowledge) the first interoperable LOD dataset of olfactory information in Japanese classical literature and a generalizable TEI-XML × generative AI × RDF workflow for multisensory textual scholarship.
Paper short abstract
This paper presents an automatic interlinear gloss generation framework using existing Japanese NLP tools, developed as a digital infrastructure for endangered language documentation within the Japonic language family.
Paper long abstract
This paper proposes an automatic interlinear gloss generation framework designed as a digital humanities infrastructure for endangered language documentation. The long-term goal of this project is to support community-centered documentation of endangered Japonic languages, particularly Ryukyuan varieties, by reducing the technical barriers associated with manual glossing.
As a foundational step, we implement and evaluate automatic gloss generation for Standard Japanese using existing Japanese natural language processing tools. Rather than developing a new morphological analyzer, the proposed system repurposes established NLP outputs (part-of-speech tags, dependency relations, and morphological features) and converts them into linguistically interpretable interlinear glosses following the Leipzig Glossing Rules. The framework distinguishes roots, clitics, and affixes, and generates gloss labels such as =TOP, =POL.NPST, and -PST without relying on predefined dictionaries.
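The conversion from analyzer output to a gloss line can be illustrated with a minimal sketch. It assumes the upstream tools have already segmented each word into morphemes and classified them; the `Morph` record, its field names, and the `BOUNDARY` table are hypothetical and do not come from the paper. Following the Leipzig Glossing Rules, clitics are joined with `=` and affixes with `-`. Roots are glossed here with their lemma, since no dictionary is assumed.

```python
from dataclasses import dataclass


@dataclass
class Morph:
    """One morpheme as a hypothetical upstream analyzer might deliver it."""
    form: str          # surface form
    kind: str          # "root", "clitic", or "affix"
    lemma: str = ""    # used as the gloss for roots
    feats: tuple = ()  # grammatical labels for clitics and affixes


# Leipzig-style boundary symbols placed before each non-initial morpheme.
BOUNDARY = {"root": "", "clitic": "=", "affix": "-"}


def gloss_word(morphs):
    """Return (segmented form line, gloss line) for one word."""
    form_line, gloss_line = "", ""
    for m in morphs:
        sep = BOUNDARY[m.kind]
        label = m.lemma if m.kind == "root" else ".".join(m.feats)
        form_line += sep + m.form
        gloss_line += sep + label
    return form_line, gloss_line


if __name__ == "__main__":
    words = [
        [Morph("太郎", "root", "太郎"), Morph("は", "clitic", feats=("TOP",))],
        [Morph("行き", "root", "行く"), Morph("ます", "clitic", feats=("POL", "NPST"))],
        [Morph("食べ", "root", "食べる"), Morph("た", "affix", feats=("PST",))],
    ]
    for word in words:
        print(*gloss_word(word), sep="\t")
```

Keeping the boundary symbols in a single table means a different clitic/affix analysis only requires changing the `kind` assigned upstream, not the rendering code.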
This approach positions Japanese not as the primary object of study but as a high-resource testbed for constructing reusable documentation infrastructure. By evaluating the system against manually annotated data, we identify both the potential and the limitations of existing NLP tools when applied to humanistic annotation tasks. The results demonstrate that a substantial portion of gloss-level information can be derived deterministically from syntactic and morphological cues, while highlighting challenges in auxiliary sequencing and particle disambiguation.
From a digital humanities perspective, this study reframes interlinear glossing as a form of structured cultural knowledge representation rather than a purely linguistic task. The proposed framework emphasizes transparency, reproducibility, and extensibility, making it suitable for adaptation to endangered language contexts where linguistic expertise and resources are limited. Ultimately, this work contributes a methodological bridge between computational linguistics and documentary practice, advancing digital infrastructures that enable broader participation in the preservation of linguistic and cultural heritage.
Paper short abstract
This paper examines AI and IT content in European Japanese Studies curricula and analyzes attitudes toward AI, MT, and CAI among Polish students and alumni. Focusing on Japanese translation and interpreting, as affected industries, it highlights concerns over AI’s role in academic curricula.
Paper long abstract
The growing prominence of information technologies (IT), accelerated by the rapid development of generative artificial intelligence (AI) based on large language models (LLMs), is reshaping higher education worldwide. Universities are increasingly challenged to navigate this shifting landscape and to adapt curricula in response to technological change and its potential impact on graduate employability. This paper examines the presence of IT- and AI-related content in Japanese Studies university curricula in Europe and explores attitudes toward AI and the use of digital tools among students and alumni of Japanese Studies programs in Poland, with particular attention to translation and interpreting (TI).
The TI industry has been highly affected by the development of generative AI. Tools based on AI are increasingly used both as support for translators and interpreters, such as computer-assisted interpreting (CAI) and machine translation (MT), and, in some contexts, as substitutes for human professionals. Against this background, the study employs a mixed-methods approach. First, selected curricula of Japanese Studies programs in Europe are analyzed to identify the scope and positioning of IT-related and digital-competence components. Second, empirical data were collected through an online survey of 150 students and alumni of Japanese Studies programs in Poland, complemented by 15 in-depth semi-structured interviews with participants from the same population.
The research investigates respondents’ experiences with digital tools in their studies, their perceptions of the inclusion of such tools in Japanese Studies programs, and their views on the future role of technology in Japanese-language TI. Findings indicate a limited and uneven representation of IT- and AI-related content in formal academic curricula. While survey participants expressed some support for the inclusion of such elements, interviewees frequently articulated concerns about AI’s impact on the Japanese translation industry, alongside a prevailing belief that human translators cannot be fully replaced.
The study contributes to broader discussions on curriculum development, digital literacy, and the future of Japanese Studies within an increasingly technology-driven academic landscape.
Paper short abstract
This presentation introduces a specialized Japanese corpus for Croatian tourism, based on authentic recordings of Japanese-speaking guides and translators across Croatia. It outlines corpus design, transcription, keyword extraction, and planned use in an AI-augmented VIRAI educational application.
Paper long abstract
This presentation introduces the development of JaTGuideCro-ja, a specialized Japanese language corpus focused on tourism in Croatia, and its application in terminology extraction, lexical analysis, and AI-augmented corpus linguistics. The corpus is based on authentic audio and video recordings of guided tours conducted in Japanese at multiple Croatian destinations.
The data were collected between 2019 and 2023 during on-site simulated and virtual tourist tours organized as practical training of Japanese language students for future professions. Licensed tourist guides and translators conducted tours in Japanese at key cultural and historical locations across Croatia, including Istria, Dalmatia, Zagreb, and surrounding regions. In total, approximately 30 hours of material were recorded from six on-site and six virtual tours, covering both tangible and intangible cultural heritage.
The methodology involved several transcription and analysis steps. Because the recordings contained speech in multiple languages, speaker diarization was first performed using the open-source toolkit pyannote.audio (Bredin et al., 2020), followed by automatic speech recognition (ASR) using Whisper by OpenAI (Radford et al., 2023). Selected recordings from Istrian locations (Pula, Opatija, Barban, and surrounding towns) were then morphologically analyzed using Japanese language tools within Sketch Engine (Kilgarriff et al., 2004; Srdanović et al., 2008). The resulting Japanese language corpus enabled corpus comparison and keyword extraction using frequency-based and statistical analyses.
A comparative analysis with the large-scale Japanese web corpus JaTenTen11 identified three vocabulary categories: specialized terms absent from general corpora but essential in the Croatian tourism context; terms shared by both corpora but used with domain-specific meanings; and general-purpose vocabulary. Lemma extraction also revealed limitations in existing Japanese morphological analyzers, particularly regarding place names, culture-specific terminology, and non-Japan-centered lexicon.
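A comparison of this kind between a focus corpus and a reference corpus can be sketched as follows. The abstract does not state which keyness statistic was used, so this example assumes the "simple maths" ratio of smoothed per-million frequencies commonly associated with Sketch Engine. The function names, corpus sizes, and word counts are invented for illustration. Words with a reference count of zero are flagged as candidates for the first category; the second category (shared words with domain-specific meanings) cannot be detected by frequency alone and would need manual inspection.

```python
def per_million(count, corpus_size):
    """Normalize a raw count to frequency per million tokens."""
    return count * 1_000_000 / corpus_size


def keyness(focus_counts, focus_size, ref_counts, ref_size, n=1.0):
    """Rank focus-corpus words by (fpm_focus + n) / (fpm_ref + n).

    Returns (word, score, absent_from_reference) tuples, highest score first.
    The smoothing constant n avoids division by zero for unseen words.
    """
    rows = []
    for word, focus_count in focus_counts.items():
        ref_count = ref_counts.get(word, 0)
        score = (per_million(focus_count, focus_size) + n) / (
            per_million(ref_count, ref_size) + n
        )
        rows.append((word, score, ref_count == 0))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Invented toy counts, not data from JaTGuideCro-ja or JaTenTen11.
    focus = {"円形闘技場": 20, "こと": 50}
    reference = {"こと": 5000}
    for word, score, absent in keyness(focus, 10_000, reference, 1_000_000):
        print(word, round(score, 2), "absent from reference" if absent else "")
```

A score near 1 indicates general-purpose vocabulary that is about equally frequent in both corpora, while high scores point to domain-specific terms.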
The corpus forms the basis for an AI-augmented Japanese language learning platform under development within the VIRAI project. A prototype language tutor and cultural tour application, tested using content related to the city of Pula and its attraction, the Arena, a Roman-period amphitheater, received positive feedback for usability and educational potential (Srdanović et al., 2025). Overall, the research demonstrates the value of specialized corpora for language learning, professional training, and applied linguistic research through AI-enhanced tools.
Paper short abstract
This paper offers an account of the challenges, possibilities, and limitations of drawing connections between Japanese dystopian fiction and dystopian fiction from other contexts while developing a system of structured annotation for a dystopian fiction database.
Paper long abstract
Keywords: Dystopian Fiction; Critical Dystopia; Mesotext; Japanese dystopia
This paper offers an account of the process of building a mesotext for documenting and connecting works of dystopian fiction across media, with a particular focus on Japanese dystopian fiction. A mesotext is a system of structured annotations that draws connections across primary sources and research outputs (Boot 2025 [2009]); the concept was developed as a digital approach to the study of emblems. These typed annotations can "facilitate entry into and exploration of primary texts, and can provide the supporting arguments for the articles and studies that scholars write about primary texts" (ibid.). A mesotext ultimately reflects a model of the area under study. Beyond emblem studies, the concepts underpinning mesotexts have been applied in the development of ENDLIT, a multi-area database for works of dystopian and post-apocalyptic fiction. Building upon the understanding of dystopian and post-apocalyptic fiction as 'fiction that warns' (Cavalcanti 2022; Moylan 2018) about the present, especially in the form of critical dystopian fiction, ENDLIT has operationalized a model for documenting dystopian fiction across areas, one that has had to contend with the peculiarities of dystopian fiction in local contexts.
In the Japanese case, this has resulted in the highlighting of specific emphases and warnings that run in contrast with the underlying sensibilities of hegemonic critical dystopian fiction, especially when the future of Japan is discussed through a lens that is nevertheless dystopic and critical. One aspect that is most evident is the focus that Japanese dystopian fiction places on the Japanese polity as a set of persons, worldviews, social roles and ways (cf. Tanaka 2014; Masataka 2011; Oguma 1995). Within this framework, ruin is not universalized or abstracted to a global scale, but rendered as local and situated, unfolding within a socially and culturally specific horizon of meaning. By outlining such contrasts through the development of the ENDLIT database, this paper highlights the challenges and possibilities of looking at dystopia from Japanese perspectives, and how it might be, or might not be, connected to other geo-socio-cultural contexts.