-
A new light at the end of the tunnel: fiber gas discharge lasers
Authors:
A. L. Love,
S. A. Bateman,
W. Belardi,
F. Yu,
J. C. Knight,
D. W. Coutts,
C. E. Webb,
W. J. Wadsworth
Abstract:
Optical fibers have emerged as a transformative platform for building better and more robust solid state lasers. However, the wavelengths available to these lasers are limited. Using hollow core optical fibers allows us to add gases as new potential gain media for fiber lasers, and also liberates the gas laser from the limits normally imposed by diffraction. To demonstrate the new technology, we present a fiber laser at 3500 nm wavelength, using an antiresonant guiding hollow core optical fiber containing neutral xenon atoms pumped by an afterglow discharge of a helium-xenon mixture within a fiber of over 1 m in length. Laser action is confirmed through observation of polarization dependence, mode pulling and mode beating. Our results unlock a new breed of flexible fiber lasers operating at a plethora of wavelengths, many previously unavailable.
Submitted 19 December, 2023;
originally announced December 2023.
-
LitSumm: Large language models for literature summarisation of non-coding RNAs
Authors:
Andrew Green,
Carlos Ribas,
Nancy Ontiveros-Palacios,
Sam Griffiths-Jones,
Anton I. Petrov,
Alex Bateman,
Blake Sweeney
Abstract:
Motivation: Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritise their efforts.
Results: In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for non-coding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We also applied the most commonly used automated evaluation approaches, finding that they do not correlate with human assessment. Finally, we apply our tool to a selection of over 4,600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided careful prompting and automated checking are applied.
Availability: Code used to produce these summaries can be found here: https://github.com/RNAcentral/litscan-summarization and the dataset of contexts and summaries can be found here: https://huggingface.co/datasets/RNAcentral/litsumm-v1. Summaries are also displayed on the RNA report pages in RNAcentral (https://rnacentral.org/)
Submitted 19 April, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Semiotically-grounded distant viewing of diagrams: insights from two multimodal corpora
Authors:
Tuomo Hiippala,
John A. Bateman
Abstract:
In this article, we bring together theories of multimodal communication and computational methods to study how primary school science diagrams combine multiple expressive resources. We position our work within the field of digital humanities, and show how annotations informed by multimodality research, which target expressive resources and discourse structure, allow imposing structure on the output of computational methods. We illustrate our approach by analysing two multimodal diagram corpora: the first corpus is intended to support research on automatic diagram processing, whereas the second is oriented towards studying diagrams as a mode of communication. Our results show that multimodally-informed annotations can bring out structural patterns in the diagrams, which also extend across diagrams that deal with different topics.
Submitted 8 March, 2021;
originally announced March 2021.
-
Introducing the diagrammatic semiotic mode
Authors:
Tuomo Hiippala,
John A. Bateman
Abstract:
As the use and diversity of diagrams across many disciplines grows, there is an increasing interest in the diagrams research community concerning how such diversity might be documented and explained. In this article, we argue that one way of achieving increased reliability, coverage, and utility for a general classification of diagrams is to draw on semiotic principles recently developed within the field of multimodality. To this end, we sketch out the internal details of what may tentatively be termed the diagrammatic semiotic mode. This provides a natural account of how diagrammatic representations may integrate natural language, various forms of graphics, diagrammatic elements such as arrows, lines and other expressive resources into coherent organisations, while still respecting the crucial diagrammatic contributions of visual organisation. We illustrate the proposed approach using two recent diagram corpora and show how a multimodal approach supports the empirical analysis of diagrammatic representations, especially in identifying diagrammatic constituents and describing their interrelations in a manner that may be generalised across diagram types and be used to characterise distinct kinds of functionality.
Submitted 12 June, 2022; v1 submitted 30 January, 2020;
originally announced January 2020.
-
AI2D-RST: A multimodal corpus of 1000 primary school science diagrams
Authors:
Tuomo Hiippala,
Malihe Alikhani,
Jonas Haverinen,
Timo Kalliokoski,
Evanfiya Logacheva,
Serafina Orekhova,
Aino Tuomainen,
Matthew Stone,
John A. Bateman
Abstract:
This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowd-sourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of their multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.
Submitted 20 March, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
Application-driven automatic subgrammar extraction
Authors:
Renate Henschel,
John A. Bateman
Abstract:
The space and run-time requirements of broad coverage grammars appear for many applications unreasonably large in relation to the relative simplicity of the task at hand. On the other hand, handcrafted development of application-dependent grammars is in danger of duplicating work which is then difficult to re-use in other contexts of application. To overcome this problem, we present in this paper a procedure for the automatic extraction of application-tuned consistent subgrammars from proved large-scale generation grammars. The procedure has been implemented for large-scale systemic grammars and builds on the formal equivalence between systemic grammars and typed unification based grammars. Its evaluation for the generation of encyclopedia entries is described, and directions of future development, applicability, and extensions are discussed.
Submitted 19 November, 1997;
originally announced November 1997.
-
Some apparently disjoint aims and requirements for grammar development environments: the case of natural language generation
Authors:
John A. Bateman
Abstract:
Grammar development environments (GDE's) for analysis and for generation have not yet come together. Despite the fact that analysis-oriented GDE's (such as ALEP) may include some possibility of sentence generation, the development techniques and kinds of resources suggested are apparently not those required for practical, large-scale natural language generation work. Indeed, there is no use of `standard' (i.e., analysis-oriented) GDE's in current projects/applications targeting the generation of fluent, coherent texts. This unsatisfactory situation requires some analysis and explanation, which this paper attempts using as an example an extensive GDE for generation. The support provided for distributed large-scale grammar development, multilinguality, and resource maintenance is discussed and contrasted with analysis-oriented approaches.
Submitted 19 November, 1997;
originally announced November 1997.
-
Emphatic generation: employing the theory of semantic emphasis for text generation
Authors:
Elke Teich,
Beate Firzlaff,
John A. Bateman
Abstract:
The paper deals with the problem of text generation and planning approaches making only limited formally specifiable contact with accounts of grammar. We propose an enhancement of a systemically-based generation architecture for German (the KOMET system) by aspects of Kunze's theory of semantic emphasis. Doing this, we gain more control over both concept selection in generation and choice of fine-grained grammatical variation.
Submitted 25 April, 1997;
originally announced April 1997.
-
The Theoretical Status of Ontologies in Natural Language Processing
Authors:
John A. Bateman
Abstract:
This paper discusses the use of `ontologies' in Natural Language Processing. It classifies various kinds of ontologies that have been employed in NLP and discusses various benefits and problems with those designs. Particular focus is then placed on experiences gained in the use of the Upper Model, a linguistically-motivated `ontology' originally designed for use with the Penman text generation system. Some proposals for further NLP ontology design criteria are then made.
Submitted 25 April, 1997;
originally announced April 1997.