Mini-Muse research project

NLP algorithms and data visualization for accessing cultural archives.

Mini-Muse is a preliminary study that aims to combine Natural Language Processing algorithms and data visualization to enhance access to, and engagement with, digitized publications about history.

User interface of the Mini-Muse prototype, an information system that aims to provide users with a visual overview and graphical hints for exploring the cultural collection.

Launch the Mini-Muse prototype

Project outline

Mini-Muse is a preliminary project aimed at implementing AI-based data visualizations to enhance access to, and engagement with, archives of digitized publications. The project combines Natural Language Processing algorithms and design methods to go beyond the traditional search bar (which is often ineffective for users seeking a small set of relevant articles for their research) by providing a visual overview and graphical hints to explore the collection. The project envisions using artificial intelligence and data visualization to improve access, discoverability, and the informative value of digital archives. It builds on current studies in information design, artificial intelligence and digital humanities to lay the groundwork for innovative archival technology that can be applied in research, cultural, and educational fields.

Goals

The Mini-Muse project aims to acquire preliminary knowledge on the integration of AI algorithms and data visualization methods to access and analyze digital libraries. Specifically, Mini-Muse builds a basic prototype that serves as a means to achieve the following objectives:

  • Define a set of user requirements

    for the development of NLP algorithms and data visualizations to access digital publications.
  • Develop a basic prototype of the information system

    to access the archive. The information system is composed of a set of NLP algorithms, data visualizations and graphical user interfaces.
  • Gather reader feedback

    from testing the prototype, to assess the information system's usability and its efficiency in providing readers with meaningful information while avoiding information overload.

Methodology

The Mini-Muse project adopts a user-centered design approach to develop user-friendly ICT and design solutions and to meet real user needs. The project uses technologies based on natural language processing, integrated within a novel digital archival interface paradigm. The project runs over 12 months and consists of the following work packages:

  1. WP1: User research and content acquisition

    1. Semi-structured user interviews with people who regularly use cultural digital archives to study history.
    2. Collection of a set of desiderata concerning the features of cultural digital archives.
  2. WP2: Development of the NLP algorithms

    1. The journal content from E-Periodica is converted into a format suitable for the NLP algorithms.
    2. Selection of 50 recent articles with the following attributes: about politics, written in German, with more than one article from the same issue, and with shared historical figures.
    3. Storage of annotations, extracted content, and summarized documents in a graph database to enable data retrieval and relationship mapping (a sketch is given after this list).
    4. Implementation of a secure web API (backend).
    5. Selection of a set of action types (e.g. deciding something, stating something).
    6. Definition of annotation guidelines: a written document specifying what counts as a historical figure and which entity types are to be detected.
    7. Implementation of NLP algorithms belonging to two categories: rule-based and Large Language Model-based (LLM).
  3. WP3: Design and implementation of the basic prototype

    1. Definition of a set of use cases.
    2. Design of interface wireframes and mockups.
    3. Implementation of the user interface (frontend) based on the use of metadata visualization.
  4. WP4: Evaluation of the prototype

    1. Review of both backend and frontend according to the feedback received from the ETH Library team.
    2. Semi-structured user interviews with the people involved in the user research.
    3. Anonymous survey of the people involved in the user research.
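
The work packages do not name a specific graph database or driver. As a minimal sketch of WP2 step 3, the snippet below assumes Neo4j with its official Python driver; node labels, property names, and the example values are hypothetical choices for illustration only, not the project's actual data model.

```python
# Sketch of storing extracted annotations in a graph database.
# Assumes Neo4j and the official `neo4j` Python driver; labels, properties,
# and example values are illustrative, not the project's documented schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_action(tx, article_id, figure, action_type, action_text, date):
    # MERGE keeps articles and historical figures unique; each extracted
    # action becomes its own node linking a figure to the article.
    tx.run(
        """
        MERGE (a:Article {id: $article_id})
        MERGE (f:HistoricalFigure {name: $figure})
        CREATE (act:Action {type: $action_type, text: $action_text, date: $date})
        CREATE (f)-[:PERFORMED]->(act)
        CREATE (act)-[:MENTIONED_IN]->(a)
        """,
        article_id=article_id, figure=figure,
        action_type=action_type, action_text=action_text, date=date,
    )

with driver.session() as session:
    session.execute_write(
        store_action,
        article_id="article-001",  # hypothetical identifier, not from the corpus
        figure="Henri Guisan",
        action_type="decide",
        action_text="decided to concentrate the army in the Réduit",
        date="1940-07-25",
    )
driver.close()
```

Keeping articles, figures, and actions as separate nodes makes it straightforward to retrieve, for example, every action performed by the same historical figure across different issues.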

Data source

The project uses a set of articles from the Swiss Historical Journal, published quarterly by the Swiss History Society since 1951 and made available online by E-Periodica, a service of ETH Library. The Swiss Historical Journal features scientific articles about Swiss and global history in German, French and Italian. It is printed by Schwabe Verlag in Basel.

For the purposes of the project we selected a small set of articles with the following attributes: written in German, about politics, and recent (published after 2012). Furthermore, we selected more than one article from the same issue, with shared historical entities.

Information extraction and summarization

The project utilizes information extraction algorithms to build a piece of software capable of interpreting the text of the publications and indexing content such as dates, titles and subtitles, places, historical entities and performed actions. Text summarization, in turn, is intended as a means to extract key information and to produce narratives according to the user's visual and textual input.
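
As an illustration of the indexing step, the sketch below uses spaCy and its German pipeline `de_core_news_sm`; both are assumptions rather than the project's documented tool choices. The German pipelines carry no DATE entity label, so dates are matched here with a simple pattern purely for illustration.

```python
# Illustrative indexing sketch: group named entities by type per article.
# spaCy and the `de_core_news_sm` pipeline are assumed, not confirmed by the project.
import re
import spacy

nlp = spacy.load("de_core_news_sm")

# German spaCy models have no DATE label; a naive pattern stands in for dates.
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\s?\w+\s\d{4}\b|\b\d{4}\b")

def index_article(text: str) -> dict:
    """Return a simple per-article index of persons, places, organizations, dates."""
    doc = nlp(text)
    index = {"persons": set(), "places": set(), "organizations": set()}
    label_map = {"PER": "persons", "LOC": "places", "ORG": "organizations"}
    for ent in doc.ents:
        key = label_map.get(ent.label_)
        if key:
            index[key].add(ent.text)
    index["dates"] = set(DATE_PATTERN.findall(text))
    return {key: sorted(values) for key, values in index.items()}

sample = "Am 25. Juli 1940 versammelte General Guisan die Offiziere auf dem Rütli."
print(index_article(sample))
```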

The NLP algorithms map action flows: the sequences of actions in a text that show who did what, when, and where. The project uses two types of NLP algorithms:

  • Rule-based

    Text parsing, Named Entity Recognition (NER), dependency parsing, and rule-based systems analyze sentence structure and grammatical relationships to classify and link entities such as historical figures, objects, and locations (see the first sketch after this list).

  • Large Language Model-based (LLM)

    Models such as GPT-4 use structured prompting and large text inputs to understand and interpret complex contexts, identifying actions, historical figures, and relationships, and inferring details such as time and place for accurate action-flow detection (see the second sketch after this list).
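
A minimal sketch of the rule-based approach, assuming spaCy's German pipeline and the TIGER dependency labels it uses (`sb` for subjects, `oa`/`da` for objects); the project's actual rule set is more elaborate than these few lines.

```python
# Rule-based sketch: dependency parsing to extract rough "who did what" triples.
# spaCy's German pipeline and its TIGER dependency labels are assumptions.
import spacy

nlp = spacy.load("de_core_news_sm")

def extract_actions(text: str) -> list[dict]:
    """Return candidate (subject, verb, object) triples as action records."""
    actions = []
    doc = nlp(text)
    for sent in doc.sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ == "sb"]
                objects = [c for c in token.children if c.dep_ in ("oa", "da")]
                if subjects:
                    actions.append({
                        "who": subjects[0].text,
                        "did": token.lemma_,
                        "whom_or_what": objects[0].text if objects else None,
                        "sentence": sent.text,
                    })
    return actions
```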
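For the LLM-based approach, structured prompting can be sketched as follows, assuming the OpenAI Python client; the project's actual prompts, model configuration, and post-processing are not documented here.

```python
# LLM-based sketch: structured prompting for action-flow extraction.
# The OpenAI client, prompt wording, and JSON schema are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """You are extracting action flows from a historical article in German.
Return a JSON array in which every element has the keys:
"who", "action", "action_type" (one of: decide, state, other),
"when", "where". Use null when a value cannot be inferred.

Article excerpt:
{text}
"""

def extract_action_flow(text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # deterministic output keeps the JSON easier to parse
    )
    return json.loads(response.choices[0].message.content)
```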

Metadata and information visualization

The project adopts metadata and information visualization methods and tools, following a humanistic approach, to define the layout and the set of charts of the tool's graphical user interface, which provides readers with preliminary insights into the articles while they carry out their research.

Specifically, the prototype uses an interactive timeline built programmatically with the D3 JavaScript library. The position of each timeline bar shows when a historical entity undertook an individual action, and the color of the bar indicates the type of action undertaken.
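
As a hedged sketch, extracted actions could be serialized to JSON records for the D3 timeline as shown below; the field names and the action-type color mapping are assumptions, not the prototype's actual data contract.

```python
# Sketch of preparing timeline data for D3: one record per dated action,
# with a color per action type. Field names and colors are illustrative only.
import json

ACTION_COLORS = {"decide": "#d95f02", "state": "#1b9e77", "other": "#7570b3"}

def timeline_records(actions: list[dict]) -> str:
    """Convert extracted actions into the JSON consumed by the timeline."""
    records = [
        {
            "figure": a["who"],
            "date": a["when"],                  # ISO date drives the bar position
            "actionType": a["action_type"],
            "color": ACTION_COLORS.get(a["action_type"], ACTION_COLORS["other"]),
            "label": a["action"],
        }
        for a in actions
        if a.get("when")                        # undated actions cannot be plotted
    ]
    return json.dumps(records, ensure_ascii=False, indent=2)
```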

Project results

Results dissemination

  • AI-assisted search for digitized publication archives. Fostering the study of historical figures through the use of Natural Language Processing (NLP) and data visualization techniques.
    Profeta, G., Presentation at Digital History Switzerland 2024 conference, University of Basel, Department of History, September 12, 2024.
    → Read the extended abstract