-
On the Evaluation of Machine-Generated Reports
Authors:
James Mayfield,
Eugene Yang,
Dawn Lawrie,
Sean MacAvaney,
Paul McNamee,
Douglas W. Oard,
Luca Soldaini,
Ian Soboroff,
Orion Weller,
Efsun Kayi,
Kate Sanders,
Marc Mason,
Noah Hibbler
Abstract:
Large Language Models (LLMs) have enabled new ways to satisfy information needs. Although great strides have been made in applying them to settings like document ranking and short-form text generation, they still struggle to compose complete, accurate, and verifiable long-form reports. Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users. In this perspective paper, we draw together opinions from industry and academia, and from a variety of related research areas, to present our vision for automatic report generation, and -- critically -- a flexible framework by which such reports can be evaluated. In contrast with other summarization tasks, automatic report generation starts with a detailed description of an information need, stating the necessary background, requirements, and scope of the report. Further, the generated reports should be complete, accurate, and verifiable. These qualities, which are desirable -- if not required -- in many analytic report-writing settings, require rethinking how to build and evaluate systems that exhibit these qualities. To foster new efforts in building these systems, we present an evaluation framework that draws on ideas found in various evaluations. To test completeness and accuracy, the framework uses nuggets of information, expressed as questions and answers, that need to be part of any high-quality generated report. Additionally, evaluation of citations that map claims made in the report to their source documents ensures verifiability.
Submitted 9 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
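As a rough illustration of the nugget-based evaluation described in the abstract above, the following sketch scores a report by the fraction of nuggets it covers and by the fraction of its cited claims that are judged supported. The names (`Nugget`, `nugget_recall`, `citation_precision`), the substring-matching rule, and the sample data are all hypothetical. The paper's framework does not prescribe this implementation, and a real evaluation would rely on human or model judgments rather than string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nugget:
    """A question paired with its acceptable answers (hypothetical structure)."""
    question: str
    answers: tuple


def nugget_recall(report, nuggets):
    """Fraction of nuggets with at least one answer string found in the report.

    Substring matching is an assumption made to keep the sketch runnable.
    """
    if not nuggets:
        return 0.0
    text = report.lower()
    covered = sum(
        1 for n in nuggets if any(a.lower() in text for a in n.answers)
    )
    return covered / len(nuggets)


def citation_precision(cited_claims, supported):
    """Fraction of (claim, document id) pairs that were judged supported.

    `supported` stands in for assessor judgments; it is an assumed input.
    """
    if not cited_claims:
        return 0.0
    return sum(1 for pair in cited_claims if pair in supported) / len(cited_claims)


report = "The track ran in 2023. Six teams participated [doc1]."
nuggets = [
    Nugget("How many teams took part?", ("six",)),
    Nugget("In which year did the track run?", ("2023",)),
    Nugget("Where was the meeting held?", ("Gaithersburg",)),
]
cited_claims = [("Six teams participated", "doc1"), ("The track ran in 2022", "doc2")]
supported = {("Six teams participated", "doc1")}

print(nugget_recall(report, nuggets))
print(citation_precision(cited_claims, supported))
```

Keeping completeness (nugget recall) separate from verifiability (citation precision) mirrors the abstract's distinction between the two qualities, so a report cannot make up for unsupported claims by covering more nuggets.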
-
Extending Translate-Train for ColBERT-X to African Language CLIR
Authors:
Eugene Yang,
Dawn J. Lawrie,
Paul McNamee,
James Mayfield
Abstract:
This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023. Our submissions use machine translation models to translate the documents and the training passages, and ColBERT-X as the retrieval model. Additionally, we present a set of unofficial runs that use an alternative training procedure with a similar training setting.
Submitted 11 April, 2024;
originally announced April 2024.
-
Overview of the TREC 2023 NeuCLIR Track
Authors:
Dawn Lawrie,
Sean MacAvaney,
James Mayfield,
Paul McNamee,
Douglas W. Oard,
Luca Soldaini,
Eugene Yang
Abstract:
The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the impact of neural approaches to cross-language information retrieval. The track has created four collections: large collections of Chinese, Persian, and Russian newswire, and a smaller collection of Chinese scientific abstracts. The principal tasks are ranked retrieval of news in one of the three languages, using English topics. Results for a multilingual task, also with English topics but with documents from all three newswire collections, are also reported. New in this second year of the track is a pilot technical documents CLIR task for ranked retrieval of Chinese technical documents using English topics. A total of 220 runs across all tasks were submitted by six participating teams and, as baselines, by track coordinators. Task descriptions and results are presented.
Submitted 11 April, 2024;
originally announced April 2024.
-
Feature Aggregation in Joint Sound Classification and Localization Neural Networks
Authors:
Brendan Healy,
Patrick McNamee,
Zahra Nili Ahmadabadi
Abstract:
This study addresses the application of deep learning techniques in joint sound signal classification and localization networks. Current state-of-the-art sound source localization (SSL) deep learning networks lack feature aggregation within their architecture. Feature aggregation enhances model performance by enabling the consolidation of information from different feature scales, thereby improving feature robustness and invariance. This is particularly important in SSL networks, which must differentiate direct and indirect acoustic signals. To address this gap, we adapt feature aggregation techniques from computer vision neural networks to signal detection neural networks. Additionally, we propose the Scale Encoding Network (SEN) for feature aggregation to encode features from various scales, compressing the network for more computationally efficient aggregation. To evaluate the efficacy of feature aggregation in SSL networks, we integrated the following computer vision feature aggregation sub-architectures into an SSL control architecture: Path Aggregation Network (PANet), Weighted Bi-directional Feature Pyramid Network (BiFPN), and SEN. These sub-architectures were evaluated using two metrics for signal classification and two metrics for direction-of-arrival regression. PANet and BiFPN are established aggregators in computer vision models, while the proposed SEN is a more compact aggregator. The results suggest that models incorporating feature aggregation outperformed the control model, the Sound Event Localization and Detection network (SELDnet), in both sound signal classification and localization. The feature aggregation techniques enhance the performance of sound detection neural networks, particularly in direction-of-arrival regression.
Submitted 27 January, 2024; v1 submitted 29 October, 2023;
originally announced October 2023.
-
Autonomous search of real-life environments combining dynamical system-based path planning and unsupervised learning
Authors:
Uyiosa Philip Amadasun,
Patrick McNamee,
Zahra Nili Ahmadabadi,
Peiman Naseradinmousavi
Abstract:
In recent years, advancements have been made towards the goal of using chaotic coverage path planners for autonomous search and traversal of spaces with limited environmental cues. However, this field is still in its infancy, as little experimental work has been done. Current experimental work has not developed robust methods to satisfactorily address the immediate set of problems a chaotic coverage path planner needs to overcome in order to scan realistic environments within reasonable coverage times. These immediate problems are as follows: (1) an obstacle avoidance technique which generally maintains the kinematic efficiency of the robot's motion, (2) a means to spread chaotic trajectories across the environment that needs to be covered (especially crucial for large and/or complex-shaped environments), and (3) a real-time coverage calculation technique that is accurate and independent of cell size. This paper aims to progress the field by proposing algorithms that address all of these problems by providing techniques for obstacle avoidance, chaotic trajectory dispersal, and accurate coverage calculation. The algorithms produce generally smooth chaotic trajectories and provide high scanning coverage of environments. These algorithms were created within the ROS framework and make up a newly developed chaotic path planning application. The performance of this application was comparable to that of a conventional optimal path planner. The performance tests were carried out in environments of various sizes, shapes, and obstacle densities, both in real life and in Gazebo simulations.
Submitted 27 October, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Overview of the TREC 2022 NeuCLIR Track
Authors:
Dawn Lawrie,
Sean MacAvaney,
James Mayfield,
Paul McNamee,
Douglas W. Oard,
Luca Soldaini,
Eugene Yang
Abstract:
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval. The main task in this year's track was ad hoc ranked retrieval of Chinese, Persian, or Russian newswire documents using queries expressed in English. Topics were developed using standard TREC processes, except that topics developed by an annotator for one language were assessed by a different annotator when evaluating that topic on a different language. There were 172 total runs submitted by twelve teams.
Submitted 24 September, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models
Authors:
Suraj Nair,
Eugene Yang,
Dawn Lawrie,
Kevin Duh,
Paul McNamee,
Kenton Murray,
James Mayfield,
Douglas W. Oard
Abstract:
The advent of transformer-based models such as BERT has led to the rise of neural ranking models. These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25. While monolingual retrieval tasks have benefited from large-scale training collections such as MS MARCO and advances in neural architectures, cross-language retrieval tasks have fallen behind these advancements. This paper introduces ColBERT-X, a generalization of the ColBERT multi-representation dense retrieval model that uses the XLM-RoBERTa (XLM-R) encoder to support cross-language information retrieval (CLIR). ColBERT-X can be trained in two ways. In zero-shot training, the system is trained on the English MS MARCO collection, relying on the XLM-R encoder for cross-language mappings. In translate-train, the system is trained on the MS MARCO English queries coupled with machine translations of the associated MS MARCO passages. Results on ad hoc document ranking tasks in several languages demonstrate substantial and statistically significant improvements of these trained dense retrieval models over traditional lexical CLIR baselines.
Submitted 20 January, 2022;
originally announced January 2022.
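ColBERT-style multi-representation models score a query against a document by late interaction: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. The sketch below illustrates only that scoring rule. The toy two-dimensional vectors stand in for encoder outputs, and `dot` and `maxsim_score` are hypothetical helper names. It is not the ColBERT-X implementation and involves no training.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy embeddings (assumed values); a real system would take these from the
# encoder, one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "doc_b": [[0.5, 0.5], [0.6, 0.0]],
}

# Rank documents by descending late-interaction score.
ranking = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
print(ranking)
```

Because the score depends only on token embeddings, an English query can in principle be matched against documents in another language when, as the abstract notes, the multilingual encoder supplies the cross-language mapping.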
-
Online search of unknown terrains using a dynamical system-based path planning approach
Authors:
Karan Sridharan,
Patrick McNamee,
Zahra Nili Ahmadabadi,
Jeffrey Hudack
Abstract:
Surveillance and exploration of large environments is a tedious task. In spaces with limited environmental cues, random-like search is an effective approach as it allows the robot to perform online coverage of environments using simple algorithm designs. One way to generate random-like scanning search is to use nonlinear dynamical systems to impart chaos into the searching robot's controller. This will result in the generation of unpredictable yet deterministic trajectories, allowing designers to control the system and achieve a high scanning coverage of an area. However, the unpredictability comes at the cost of increased coverage time and a lack of scalability, both of which have been ignored by the state-of-the-art chaotic path planners. This work introduces a new, scalable technique that helps a robot to steer away from the obstacles and cover the entire search space in a short period of time. The technique involves coupling and manipulating two chaotic systems to reduce the coverage time and enable scanning of unknown environments with different online properties. Using this new technique resulted in an average 49% boost in the robot's performance compared to the state-of-the-art planners. The overall search performance of the chaotic planner remained comparable to optimal systems while still ensuring unpredictable paths.
Submitted 11 November, 2022; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Curriculum Learning for Domain Adaptation in Neural Machine Translation
Authors:
Xuan Zhang,
Pamela Shapiro,
Gaurav Kumar,
Paul McNamee,
Marine Carpuat,
Kevin Duh
Abstract:
We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. Samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule. This approach is simple to implement on top of any neural framework or architecture, and consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.
Submitted 14 May, 2019;
originally announced May 2019.
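The grouping-and-scheduling idea in the abstract above can be sketched as follows: samples are ranked by a similarity score to the target domain, split into shards, and exposed to training phase by phase. The function names, the equal-size sharding, and the "most similar first, then widen" schedule are assumptions made for illustration. The abstract says only that each group is fed with a particular schedule, and the similarity scores here are made-up numbers.

```python
def make_shards(samples, scores, num_shards):
    """Split samples into shards ordered from most to least domain-similar."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    if not samples:
        return []
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    size = -(-len(order) // num_shards)  # ceiling division
    return [
        [samples[i] for i in order[start:start + size]]
        for start in range(0, len(order), size)
    ]


def curriculum_phases(shards):
    """Yield the training pool for each phase, adding one shard per phase."""
    visible = []
    for shard in shards:
        visible = visible + shard
        yield list(visible)


samples = ["a", "b", "c", "d"]
scores = [0.1, 0.9, 0.5, 0.3]  # hypothetical similarity to the target domain

shards = make_shards(samples, scores, num_shards=2)
phases = list(curriculum_phases(shards))
for phase_index, pool in enumerate(phases):
    # A real trainer would run some number of updates on `pool` here.
    print(phase_index, pool)
```

Because the schedule only decides which samples are visible in each phase, a sketch like this sits in front of an existing training loop without changing the model, which is consistent with the abstract's claim that the approach works on top of any neural framework or architecture.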
-
An Empirical Exploration of Curriculum Learning for Neural Machine Translation
Authors:
Xuan Zhang,
Gaurav Kumar,
Huda Khayrallah,
Kenton Murray,
Jeremy Gwinnup,
Marianna J Martindale,
Paul McNamee,
Kevin Duh,
Marine Carpuat
Abstract:
Machine translation systems based on deep neural networks are expensive to train. Curriculum learning aims to address this issue by choosing the order in which samples are presented during training to help train better models faster. We adopt a probabilistic view of curriculum learning, which lets us flexibly evaluate the impact of curricula design, and perform an extensive exploration on a German-English translation task. Results show that it is possible to improve convergence time at no loss in translation quality. However, results are highly sensitive to the choice of sample difficulty criteria, curriculum schedule and other hyperparameters.
Submitted 2 November, 2018;
originally announced November 2018.
-
Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
Authors:
Brian Thompson,
Huda Khayrallah,
Antonios Anastasopoulos,
Arya D. McCarthy,
Kevin Duh,
Rebecca Marvin,
Paul McNamee,
Jeremy Gwinnup,
Tim Anderson,
Philipp Koehn
Abstract:
To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed. We also find that continued training does not move the model very far from the out-of-domain model, compared to a sensitivity analysis metric, suggesting that the out-of-domain model can provide a good generic initialization for the new domain.
Submitted 15 January, 2019; v1 submitted 13 September, 2018;
originally announced September 2018.
-
Using of heterogeneous corpora for training of an ASR system
Authors:
Jan Trmal,
Gaurav Kumar,
Vimal Manohar,
Sanjeev Khudanpur,
Matt Post,
Paul McNamee
Abstract:
The paper summarizes the development of the LVCSR system built as a part of the Pashto speech-translation system at the SCALE (Summer Camp for Applied Language Exploration) 2015 workshop on "Speech-to-text-translation for low-resource languages". The Pashto language was chosen as a good "proxy" low-resource language, exhibiting multiple phenomena that make the development of speech-recognition and speech-to-text-translation systems hard.
Even when the amount of data is seemingly sufficient, given that the data originates from multiple sources, the preliminary experiments reveal that there is little to no benefit in merging (concatenating) the corpora, and more elaborate ways of making use of all of the data must be worked out.
This paper concentrates only on the LVCSR part and presents a range of techniques that were found useful for benefiting from multiple different corpora.
Submitted 1 June, 2017;
originally announced June 2017.
-
Interactive Knowledge Base Population
Authors:
Travis Wolfe,
Mark Dredze,
James Mayfield,
Paul McNamee,
Craig Harman,
Tim Finin,
Benjamin Van Durme
Abstract:
Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible. We argue for and describe a new paradigm where the focus is on a high-recall extraction over a small collection of documents under the supervision of a human expert, that we call Interactive Knowledge Base Population (IKBP).
Submitted 31 May, 2015;
originally announced June 2015.