OdGenji: Developing a Scent LOD Database for The Tale of Genji <https://odgenji.vercel.app/ja/smells/>

Accepted Paper

Masao Oi (Doshisha University)

Paper short abstract

OdGenji is, to our knowledge, the first interoperable LOD knowledge graph of olfactory evidence in Japanese classical literature, built from The Tale of Genji. A generative AI model disambiguates the polysemous word nioi in a TEI-XML edition, yielding 140 annotated olfactory passages that are published as RDF aligned with CIDOC-CRM and the Odeuropa ontologies for reproducible, cross-lingual research.

Paper long abstract

OdGenji <https://odgenji.vercel.app/ja/smells/> is a scholarly Linked Open Data (LOD) database that curates strictly olfactory (smell-as-sensed) expressions in The Tale of Genji as searchable and reusable research data. Although cultural-heritage digitization has progressed, olfactory information typically remains embedded in narrative text, which hinders cross-text comparison, quantitative analysis, and reuse in education or exhibitions. In Genji, this problem is compounded by the polysemy of nioi (にほひ), which frequently denotes visual radiance or metaphorical aura in addition to literal fragrance; robust extraction therefore requires explicit selection criteria and machine-readable structuring. Responding to current expectations for open, FAIR-aligned data, especially when generative AI is used, OdGenji emphasizes transparency in both methods and data modeling.

Using the Kōi Genji Monogatari TEI-XML corpus (a variant-collated TEI edition) as input, we implement an end-to-end pipeline that (1) extracts candidate scent passages from TEI-XML, (2) applies a generative AI model (Google Gemini 2.5 Pro) to retain only genuinely olfactory descriptions while excluding visual and metaphorical uses, and (3) enriches each selected instance with three aligned textual layers (the original text, a modern Japanese translation, and an English translation) together with structured metadata (scent lexemes and properties, odor sources, locations/spaces, perceivers, time, situational context, and effects). Records are published in RDF/XML under a data model aligned with CIDOC-CRM and the Odeuropa ontologies, supporting interoperability and cross-project querying; the system can also query Odeuropa to surface related information. We release technical documentation of the extraction procedures and modeling choices to enable third-party verification and reuse.
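Step (1) of the pipeline, candidate extraction, might look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the project's actual code: the use of `<seg>` elements as passage units, the lexeme list `SCENT_LEXEMES`, the function name `extract_candidates`, and the sample markup are all hypothetical, and the sample phrases are invented placeholders rather than text from the corpus. A variant-collated edition would also need a policy for choosing among readings, which is omitted here.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace and the expanded name of the xml:id attribute.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical seed list of scent-related lexemes (assumption for illustration).
SCENT_LEXEMES = ("にほひ", "にほふ", "かをり", "かをる")


def extract_candidates(tei_xml, lexemes=SCENT_LEXEMES):
    """Return every <seg> whose text contains at least one seed lexeme.

    This only collects candidates. Deciding whether a hit is literally
    olfactory (rather than visual or metaphorical) is left to step (2).
    """
    root = ET.fromstring(tei_xml)
    candidates = []
    for seg in root.iterfind(".//tei:seg", TEI_NS):
        # Flatten nested markup inside the segment into plain text.
        text = "".join(seg.itertext()).strip()
        hits = [lex for lex in lexemes if lex in text]
        if hits:
            candidates.append(
                {"id": seg.get(XML_ID), "text": text, "lexemes": hits}
            )
    return candidates


# Invented placeholder markup, not taken from the Kōi Genji Monogatari corpus.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    '<seg xml:id="s1">花のにほひ</seg>'
    '<seg xml:id="s2">月の光</seg>'
    "</p></body></text></TEI>"
)

if __name__ == "__main__":
    for candidate in extract_candidates(SAMPLE):
        print(candidate["id"], candidate["lexemes"], candidate["text"])
```

Keeping this step purely lexical means it over-collects by design; the polysemous hits it returns are exactly what the generative AI filter in step (2) is meant to sort.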

Applied to all 54 chapters, OdGenji identifies 140 olfactory descriptions and publishes integrated RDF for the 33 chapters in which olfactory evidence is confirmed. A web interface provides faceted search and cross-cutting views such as “smell sources” and “fragrant spaces,” reframing narrative descriptions as analyzable relations among persons, materials, spaces, and situations. Developed within a joint project with the National Institute of Japanese Literature (FY2025–FY2026), OdGenji provides (to our knowledge) the first interoperable LOD dataset of olfactory information in Japanese classical literature and a generalizable TEI-XML × generative AI × RDF workflow for multisensory textual scholarship.
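The cross-cutting views mentioned above can be thought of as inverted indexes over the structured metadata. The sketch below is one possible way to build such an index, assuming each record is a dictionary whose facet fields hold lists of values. The field names `source` and `space`, the function `build_facet_index`, and the three records are hypothetical placeholders, not entries from the OdGenji dataset or a description of how the web interface is implemented.

```python
from collections import defaultdict


def build_facet_index(records, facet):
    """Map each value of one facet field to the ids of records that carry it."""
    index = defaultdict(list)
    for record in records:
        # A record may list several values for a facet, or none at all.
        for value in record.get(facet, []):
            index[value].append(record["id"])
    return dict(index)


# Placeholder records for illustration only (not real OdGenji data).
RECORDS = [
    {"id": "rec-001", "source": ["incense"], "space": ["room"]},
    {"id": "rec-002", "source": ["plum blossom"], "space": ["garden"]},
    {"id": "rec-003", "source": ["incense"], "space": ["garden"]},
]

if __name__ == "__main__":
    print(build_facet_index(RECORDS, "source"))
    print(build_facet_index(RECORDS, "space"))
```

Indexing by facet value is what turns a list of passages into relations that can be browsed from either end, for example from a smell source to every passage in which it appears.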

Panel Dh01
Interdisciplinary Section: Digital Humanities individual proposals panel
Session 2 Saturday 29 August, 2026, 9:00-10:30