
A new light at the end of the tunnel: fiber gas discharge lasers

Authors: A. L. Love, S. A. Bateman, W. Belardi, F. Yu, J. C. Knight, D. W. Coutts, C. E. Webb, W. J. Wadsworth

Abstract: Optical fibers have emerged as a transformative platform for building better and more robust solid-state lasers. However, the wavelengths available to these lasers are limited. Using hollow-core optical fibers allows us to add gases as new potential gain media for fiber lasers, and also liberates the gas laser from the limits normally imposed by diffraction. To demonstrate the new technology, we present a fiber laser at 3500 nm wavelength, using an antiresonant guiding hollow-core optical fiber containing neutral xenon atoms pumped by an afterglow discharge of a helium-xenon mixture within a fiber of over 1 m in length. Laser action is confirmed through observation of polarization dependence, mode pulling and mode beating. Our results unlock a new breed of flexible fiber lasers operating at a plethora of wavelengths, many previously unavailable.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 10 pages

arXiv:2311.03056 [pdf]

LitSumm: Large language models for literature summarisation of non-coding RNAs

Authors: Andrew Green, Carlos Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I. Petrov, Alex Bateman, Blake Sweeney

Abstract: Motivation: Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature, and all have to prioritise their efforts. Results: In this work, we take a first step towards alleviating the lack of curator time in RNA science by generating summaries of literature for non-coding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We also applied the most commonly used automated evaluation approaches, finding that they do not correlate with human assessment. Finally, we apply our tool to a selection of over 4,600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided careful prompting and automated checking are applied. Availability: Code used to produce these summaries can be found here: https://github.com/RNAcentral/litscan-summarization and the dataset of contexts and summaries can be found here: https://huggingface.co/datasets/RNAcentral/litsumm-v1. Summaries are also displayed on the RNA report pages in RNAcentral (https://rnacentral.org/).

Submitted 19 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2103.04692 [pdf, other]

doi 10.1093/llc/fqab063

Semiotically-grounded distant viewing of diagrams: insights from two multimodal corpora

Authors: Tuomo Hiippala, John A. Bateman

Abstract: In this article, we bring together theories of multimodal communication and computational methods to study how primary school science diagrams combine multiple expressive resources. We position our work within the field of digital humanities and show how annotations informed by multimodality research, which target expressive resources and discourse structure, allow structure to be imposed on the output of computational methods. We illustrate our approach by analysing two multimodal diagram corpora: the first corpus is intended to support research on automatic diagram processing, whereas the second is oriented towards studying diagrams as a mode of communication. Our results show that multimodally-informed annotations can bring out structural patterns in the diagrams, which also extend across diagrams that deal with different topics.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 22 pages, 11 figures. Under review at Digital Scholarship in the Humanities

Journal ref: Digital Scholarship in the Humanities, 2021 (ahead of press)

arXiv:2001.11224 [pdf, other]

Introducing the diagrammatic semiotic mode

Authors: Tuomo Hiippala, John A. Bateman

Abstract: As the use and diversity of diagrams across many disciplines grow, there is increasing interest in the diagrams research community concerning how such diversity might be documented and explained. In this article, we argue that one way of achieving increased reliability, coverage, and utility for a general classification of diagrams is to draw on semiotic principles recently developed within the field of multimodality. To this end, we sketch out the internal details of what may tentatively be termed the diagrammatic semiotic mode. This provides a natural account of how diagrammatic representations may integrate natural language, various forms of graphics, diagrammatic elements such as arrows and lines, and other expressive resources into coherent organisations, while still respecting the crucial diagrammatic contributions of visual organisation. We illustrate the proposed approach using two recent diagram corpora and show how a multimodal approach supports the empirical analysis of diagrammatic representations, especially in identifying diagrammatic constituents and describing their interrelations in a manner that may be generalised across diagram types and be used to characterise distinct kinds of functionality.

Submitted 12 June, 2022; v1 submitted 30 January, 2020; originally announced January 2020.

Comments: 16 pages; accepted at Diagrams 2022

arXiv:1912.03879 [pdf, other]

doi 10.1007/s10579-020-09517-1

AI2D-RST: A multimodal corpus of 1000 primary school science diagrams

Authors: Tuomo Hiippala, Malihe Alikhani, Jonas Haverinen, Timo Kalliokoski, Evanfiya Logacheva, Serafina Orekhova, Aino Tuomainen, Matthew Stone, John A. Bateman

Abstract: This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowd-sourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of the diagrams' multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.

Submitted 20 March, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: 24 pages; revised version submitted to Language Resources & Evaluation

Journal ref: Language Resources and Evaluation 55(3), 2021, pp. 661-688

arXiv:cmp-lg/9711010 [pdf, ps, other]

Application-driven automatic subgrammar extraction

Authors: Renate Henschel, John A. Bateman

Abstract: The space and run-time requirements of broad-coverage grammars appear unreasonably large for many applications, given the relative simplicity of the task at hand. On the other hand, handcrafted development of application-dependent grammars risks duplicating work that is then difficult to reuse in other application contexts. To overcome this problem, we present a procedure for the automatic extraction of application-tuned, consistent subgrammars from proven large-scale generation grammars. The procedure has been implemented for large-scale systemic grammars and builds on the formal equivalence between systemic grammars and typed unification-based grammars. Its evaluation for the generation of encyclopedia entries is described, and directions for future development, applicability, and extensions are discussed.

Submitted 19 November, 1997; originally announced November 1997.

Comments: 8 pages, uses: aclap.sty, epic.sty, put-inserts. Paper presented at the ACL/EACL'97 Madrid Workshop on Computational Environments for Grammar Development and Linguistic Engineering

arXiv:cmp-lg/9711005 [pdf, ps, other]

Some apparently disjoint aims and requirements for grammar development environments: the case of natural language generation

Authors: John A. Bateman

Abstract: Grammar development environments (GDEs) for analysis and for generation have not yet come together. Although analysis-oriented GDEs (such as ALEP) may include some possibility of sentence generation, the development techniques and kinds of resources they suggest are apparently not those required for practical, large-scale natural language generation work. Indeed, 'standard' (i.e., analysis-oriented) GDEs are not used in any current projects or applications targeting the generation of fluent, coherent texts. This unsatisfactory situation requires some analysis and explanation, which this paper attempts, using an extensive GDE for generation as an example. The support provided for distributed large-scale grammar development, multilinguality, and resource maintenance is discussed and contrasted with analysis-oriented approaches.

Submitted 19 November, 1997; originally announced November 1997.

Comments: 9 pages, EPS figures, uses: aclap.sty, psfig.sty. Paper presented at the ACL/EACL'97 Madrid Workshop on Computational Environments for Grammar Development and Linguistic Engineering

arXiv:cmp-lg/9704012 [pdf, ps, other]

Emphatic generation: employing the theory of semantic emphasis for text generation

Authors: Elke Teich, Beate Firzlaff, John A. Bateman

Abstract: The paper addresses the problem that text generation and planning approaches make only limited, formally specifiable contact with accounts of grammar. We propose enhancing a systemically-based generation architecture for German (the KOMET system) with aspects of Kunze's theory of semantic emphasis. In doing so, we gain more control over both concept selection in generation and the choice of fine-grained grammatical variation.

Submitted 25 April, 1997; originally announced April 1997.

Comments: 11pp; uses: 11pt,twocolumn,named,a4wide,program,psfig; 1psfig

arXiv:cmp-lg/9704010 [pdf, ps, other]

The Theoretical Status of Ontologies in Natural Language Processing

Authors: John A. Bateman

Abstract: This paper discusses the use of 'ontologies' in Natural Language Processing. It classifies the kinds of ontologies that have been employed in NLP and discusses the benefits and problems of those designs. Particular focus is then placed on experiences gained in the use of the Upper Model, a linguistically-motivated 'ontology' originally designed for use with the Penman text generation system. Some proposals for further NLP ontology design criteria are then made.

Submitted 25 April, 1997; originally announced April 1997.

Comments: 43pp, uses: twocolumn,named,a4wide,psfig, 7eps figs

Showing 1–9 of 9 results for author: Bateman, A