Accepted Paper:

Re-activating archive analytics: opportunities for multilingual search and advanced analytics for Nordic tradition archives  
Timothy Tangherlini (University of California, Berkeley)

Paper short abstract:

We offer an overview of the ISEBEL project and some of the challenges faced while developing a multilingual search engine for tradition archives representing diverse languages and cultures. We propose a series of analytic tools that could support a data-driven analysis of these expressive forms.

Paper long abstract:

In an EU-NEH project, researchers from the Netherlands (Meertens), Germany (Wossidlo), and Denmark (University of California) created ISEBEL (Intelligent Search Engine for Belief Legends), a "search once, retrieve from multiple archives" search engine designed specifically for tradition archives. Initial challenges included the complexities of archival restrictions, disparate classification regimes, and diverse languages. ISEBEL uses the OAI-PMH model for metadata harvesting and a customizable open-source management portal, CKAN, that incorporates a relational database, Solr indexing, and map-based display and navigation. Downstream challenges included the development of (i) a common, minimal schema allowing local archives to control their data, (ii) an extensible open vocabulary for tradition-specific terms, and (iii) neural machine translation models to support multilingual search. The SAMLA project in Norway explores many of these problems at a greater scale and with more heterogeneous data. SAMLA also presents the tantalizing challenge of working with dialect and the closely related Nordic languages. As such, it offers a unique opportunity for Nordic tradition archives to expand on some of the developments of ISEBEL and of national projects such as Sagnagrunnur (Iceland) and Folke (Sweden). Yet there is a pressing need to develop data-driven analytics that provide more sophisticated visualizations, such as correlations between local regions and topics, and that use context-aware word embedding models for the discovery of communities. Incorporating sophisticated multiplex representations of the tradition space would make available a series of graph-theoretic methods that may unlock unexpected insight into that space.

Panel Narr01b
Re-activating the archives II
Session 1 Tuesday 14 June, 2022