Operationalizing topic models for writing conceptual history: the evolution of "data" (dēta) as observed in speeches in the Japanese National Diet

Accepted Paper

Harald Kuemmerle (German Institute for Japanese Studies)

Paper short abstract

Japanese parliamentary records are an understudied source that is rich in context. Combining topic models with grounded theory makes it possible to analyze how concepts evolve over time. The approach is applied to the concept of data, giving insight into the discourse on the digital transformation.

Paper long abstract

This paper is an outcome of a larger project that analyzes the discourse on the "digital transformation" in Japan. For investigating this discourse, it is highly insightful to track how the concept of data evolved. Following the approach of conceptual history (Begriffsgeschichte) in the footsteps of Reinhart Koselleck, the usual procedure would be to select representative Japanese texts and to analyze systematically how the term dēta (the standard translation of "data") is used in them. Instead, a very large number of texts is collected and investigated using topic models.

A corpus in which all documents are context-rich and inherently relevant to politics has been chosen: parliamentary records. While projects in the digital humanities such as the ParlaCLARIN workshops explore and gather insight about such records for a variety of countries, the Japanese National Diet has not been given much attention yet. This is surprising, as all sessions since 1947 are completely digitized and accessible via a well-documented API. To make use of this, an existing software tool for browsing a Japanese-language text corpus guided by a topic model has been adapted. Given a keyword (in this case, dēta), all relevant speeches by members of parliament in a given timespan are retrieved, and the resulting corpus is analyzed through topic modeling. Through a link in the metadata, each speech, which is often part of an exchange with multiple speakers, can be seen in context on the official website of the Diet.
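The retrieval step described above can be illustrated with a minimal Python sketch. It is not the tool used in the paper. The endpoint URL, the query parameter names (any, from, until, startRecord, maximumRecords, recordPacking) and the record fields (speech, date, speechURL) are assumptions about the Diet record search API and should be checked against its official documentation. The sketch only builds a query URL and groups already downloaded records by year; it does not send a request, and the sample records are hypothetical placeholders.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Assumed endpoint of the Diet record search API (verify before use).
API_ENDPOINT = "https://kokkai.ndl.go.jp/api/speech"


def build_query_url(keyword, date_from, date_until, start=1, page_size=100):
    """Build a keyword query URL for one page of speeches.

    All parameter names below are assumptions, not confirmed by the paper.
    """
    params = {
        "any": keyword,              # full-text search term, e.g. "データ"
        "from": date_from,           # start of timespan, YYYY-MM-DD
        "until": date_until,         # end of timespan, YYYY-MM-DD
        "startRecord": start,        # 1-based offset for paging
        "maximumRecords": page_size,
        "recordPacking": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def group_speeches_by_year(records, keyword):
    """Keep speeches containing the keyword and group them by year.

    Each record is assumed to be a dict with the keys "speech" (text),
    "date" (YYYY-MM-DD) and "speechURL" (link to the speech in context).
    """
    by_year = defaultdict(list)
    for rec in records:
        if keyword in rec["speech"]:
            year = int(rec["date"][:4])
            by_year[year].append({"text": rec["speech"], "url": rec["speechURL"]})
    return dict(by_year)


if __name__ == "__main__":
    # Hypothetical records standing in for a parsed API response.
    sample = [
        {"speech": "データの活用について伺います。", "date": "1999-03-02",
         "speechURL": "https://example.org/speech/1"},
        {"speech": "予算について質問します。", "date": "1999-03-02",
         "speechURL": "https://example.org/speech/2"},
    ]
    print(build_query_url("データ", "1999-01-01", "1999-12-31"))
    print(group_speeches_by_year(sample, "データ"))
```

Keeping the link to each speech next to its text is what makes the move from the topic model back to close reading possible; the per-year grouping is one simple way to prepare the corpus for a comparison over time.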

In this setting, arriving at satisfactory results requires a systematic combination of distant reading and close reading, often referred to as blended reading. As David Mimno pointed out in 2017, however, this practice can be considered equivalent to engaging in grounded theory. How the categories are created, and what they mean for writing a conceptual history, is explored in this paper.

As the need for triangulation using multiple text corpora is well acknowledged, a comparison with results obtained by mining a corpus of articles from the Nihon Keizai Shinbun (which focuses on business and industry) is presented to point out limitations of the approach.

Panel Ling13
Individual papers in Language and Linguistics IX
Session 1 Wednesday 25 August, 2021, 8:00-9:30