Towards Unlocking Insights from Logbooks Using AI
Authors:
Antonin Sulc,
Alex Bien,
Annika Eichler,
Daniel Ratner,
Florian Rehm,
Frank Mayet,
Gregor Hartmann,
Hayden Hoschouer,
Henrik Tuennermann,
Jan Kaiser,
Jason St. John,
Jennefer Maldonado,
Kyle Hazelwood,
Raimund Kammering,
Thorsten Hellert,
Tim Wilksen,
Verena Kain,
Wan-Lin Hu
Abstract:
Electronic logbooks contain valuable information about activities and events concerning their associated particle accelerator facilities. However, the highly technical nature of logbook entries can hinder their usability and automation. As natural language processing (NLP) continues advancing, it offers opportunities to address various challenges that logbooks present. This work explores jointly t…
▽ More
Electronic logbooks contain valuable information about activities and events concerning their associated particle accelerator facilities. However, the highly technical nature of logbook entries can hinder their usability and automation. As natural language processing (NLP) continues advancing, it offers opportunities to address various challenges that logbooks present. This work explores jointly testing a tailored Retrieval Augmented Generation (RAG) model for enhancing the usability of particle accelerator logbooks at institutes like DESY, BESSY, Fermilab, BNL, SLAC, LBNL, and CERN. The RAG model uses a corpus built on logbook contributions and aims to unlock insights from these logbooks by leveraging retrieval over facility datasets, including discussion about potential multimodal sources. Our goals are to increase the FAIR-ness (findability, accessibility, interoperability, and reusability) of logbooks by exploiting their information content to streamline everyday use, enable macro-analysis for root cause analysis, and facilitate problem-solving automation.
△ Less
Submitted 25 May, 2024;
originally announced June 2024.
PACuna: Automated Fine-Tuning of Language Models for Particle Accelerators
Authors:
Antonin Sulc,
Raimund Kammering,
Annika Eichler,
Tim Wilksen
Abstract:
Navigating the landscape of particle accelerators has become increasingly challenging with recent surges in contributions. These intricate devices challenge comprehension, even within individual facilities. To address this, we introduce PACuna, a fine-tuned language model refined through publicly available accelerator resources like conferences, pre-prints, and books. We automated data collection…
▽ More
Navigating the landscape of particle accelerators has become increasingly challenging with recent surges in contributions. These intricate devices challenge comprehension, even within individual facilities. To address this, we introduce PACuna, a fine-tuned language model refined through publicly available accelerator resources like conferences, pre-prints, and books. We automated data collection and question generation to minimize expert involvement and make the data publicly available. PACuna demonstrates proficiency in addressing intricate accelerator questions, validated by experts. Our approach shows adapting language models to scientific domains by fine-tuning technical texts and auto-generated corpora capturing the latest developments can further produce pre-trained models to answer some intricate questions that commercially available assistants cannot and can serve as intelligent assistants for individual facilities.
△ Less
Submitted 27 November, 2023; v1 submitted 29 October, 2023;
originally announced October 2023.