-
FOLIO: Natural Language Reasoning with First-Order Logic
Authors:
Simeng Han,
Hailey Schoelkopf,
Yilun Zhao,
Zhenting Qi,
Martin Riddell,
Wenfei Zhou,
James Coady,
David Peng,
Yujie Qiao,
Luke Benson,
Lucy Sun,
Alex Wardle-Solano,
Hannah Szabo,
Ekaterina Zubova,
Matthew Burtell,
Jonathan Fan,
Yixin Liu,
Brian Wong,
Malcolm Sailor,
Ansong Ni,
Linyong Nan,
Jungo Kasai,
Tao Yu,
Rui Zhang,
Alexander R. Fabbri
, et al. (10 additional authors not shown)
Abstract:
Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason for the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models. For both NL reasoning and NL-FOL translation, we benchmark multiple state-of-the-art language models. Our results show that a subset of FOLIO presents a challenge for one of the most capable large language models (LLMs) publicly available, GPT-4.
Submitted 17 May, 2024; v1 submitted 2 September, 2022;
originally announced September 2022.
-
R2D2: Robust Data-to-Text with Replacement Detection
Authors:
Linyong Nan,
Lorenzo Jaime Yu Flores,
Yilun Zhao,
Yixin Liu,
Luke Benson,
Weijin Zou,
Dragomir Radev
Abstract:
Unfaithful text generation is a common problem for text generation systems. In the case of Data-to-Text (D2T) systems, the factuality of the generated text is particularly crucial for any real-world application. We introduce R2D2, a training framework that addresses unfaithful Data-to-Text generation by training a system both as a generator and a faithfulness discriminator with additional replacement detection and unlikelihood learning tasks. To facilitate such training, we propose two methods for sampling unfaithful sentences. We argue that the poor entity retrieval capability of D2T systems is one of the primary sources of unfaithfulness, so in addition to the existing metrics, we further propose NER-based metrics to evaluate the fidelity of D2T generations. Our experimental results show that R2D2 systems can effectively mitigate unfaithful text generation, and they achieve new state-of-the-art results on FeTaQA, LogicNLG, and ToTTo, all with significant improvements.
Submitted 24 May, 2022;
originally announced May 2022.
-
Modelling Electron Transfers Using Quasidiabatic Hartree-Fock States
Authors:
Kristopher T. Jensen,
Raz L. Benson,
Salvatore Cardamone,
Alex J. W. Thom
Abstract:
Electron transfer processes are ubiquitous in chemistry and of great importance in many systems of biological and commercial interest. The ab-initio description of these processes remains a challenge in theoretical chemistry, partly due to the high scaling of many post-Hartree-Fock computational methods. This poses a problem for systems of interest that are not easily investigated experimentally. We show that readily available Hartree-Fock solutions can be used as a quasidiabatic basis to understand electron transfer reactions in a Marcus framework. Non-orthogonal configuration interaction calculations can be used to quantify interactions between the resulting electronic states, and to investigate the adiabatic electron transfer process. When applied to a titanium-alizarin complex used as a model of a Grätzel-type solar cell, this approach yields a correct description of the electron transfer and provides information about the electronic states involved in the process.
Submitted 13 July, 2018;
originally announced July 2018.
-
A Quantitative Analysis of Possible Futures of Autonomous Transport
Authors:
Christopher L. Benson,
Pranav D Sumanth,
Alina P Colling
Abstract:
Autonomous ships (AS) used for cargo transport have gained a considerable amount of attention in recent years. They promise benefits such as reduced crew costs, increased safety and increased flexibility. This paper explores the effects of a faster increase in technological performance in maritime shipping achieved by leveraging fast-improving technological domains such as computer processors and advanced energy storage. Based on historical improvement rates of several modes of transport (Cargo Ships, Air, Rail, Trucking), a simplified Markov-chain Monte-Carlo (MCMC) simulation of an intermodal transport model (IMTM) is used to explore the effects of differing technological improvement rates for AS. The results show that the annual improvement rates of traditional shipping (Ocean Cargo Ships = 2.6%, Air Cargo = 5.5%, Trucking = 0.6%, Rail = 1.9%, Inland Water Transport = 0.4%) are lower than those of technologies associated with automation, such as Computer Processors (35.6%), Fuel Cells (14.7%) and Automotive Autonomous Hardware (27.9%). The IMTM simulations up to the year 2050 show that the introduction of any mode of autonomous transport will increase competition in lower cost shipping options, but is unlikely to significantly alter the overall distribution of transport mode costs. Secondly, if all forms of transport end up converting to autonomous systems, then the uncertainty surrounding the improvement rates yields a complex intermodal transport solution involving several options, all at a much lower cost over time. Ultimately, the research shows a need for more accurate measurement of current autonomous transport costs and how they are changing over time.
Submitted 5 June, 2018;
originally announced June 2018.
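To put the improvement rates quoted in the abstract above on a common footing, the following sketch converts each one into a doubling time. It assumes the percentages are compound annual rates, which the abstract does not state; the dictionary `RATES` and the function `doubling_time` are illustrative names, not part of the paper.

```python
import math

# Annual improvement rates (percent per year) as quoted in the abstract.
RATES = {
    "Ocean Cargo Ships": 2.6,
    "Air Cargo": 5.5,
    "Trucking": 0.6,
    "Rail": 1.9,
    "Inland Water Transport": 0.4,
    "Computer Processors": 35.6,
    "Fuel Cells": 14.7,
    "Automotive Autonomous Hardware": 27.9,
}


def doubling_time(rate_percent):
    """Years for performance to double at a compound annual rate.

    Assumption: performance grows as (1 + r)^t with r = rate_percent / 100.
    """
    return math.log(2) / math.log(1 + rate_percent / 100)


if __name__ == "__main__":
    # List domains from fastest to slowest improvement.
    for name, rate in sorted(RATES.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rate}% per year, doubles in ~{doubling_time(rate):.1f} years")
```

Under this reading, the automation-related domains double in a few years while the traditional transport modes take decades, which is the gap the abstract describes.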
-
Data-Driven Investment Decision-Making: Applying Moore's Law and S-Curves to Business Strategies
Authors:
Christopher L. Benson,
Christopher L. Magee
Abstract:
This paper introduces a method for linking technological improvement rates (i.e. Moore's Law) and technology adoption curves (i.e. S-Curves). There has been considerable research surrounding Moore's Law and the generalized versions applied to the time dependence of performance for other technologies. The prior work has culminated in a methodology for quantitative estimation of technological improvement rates for nearly any technology. This paper examines the implications of such regular time dependence for performance upon the timing of key events in the technological adoption process. We propose a simple crossover point in performance which is based upon the technological improvement rates and current level differences for target and replacement technologies. The timing for the cross-over is hypothesized as corresponding to the first "knee" in the technology adoption "S-curve" and signals when the market for a given technology will start to be rewarding for innovators. This is also when potential entrants are likely to intensely experiment with product-market fit and when the competition to achieve a dominant design begins. This conceptual framework is then back-tested by examining two technological changes brought about by the internet, namely music and video transmission. The uncertainty analysis around the cases highlights opportunities for organizations to reduce future technological uncertainty. Overall, the results from the case studies support the reliability and utility of the conceptual framework in strategic business decision-making with the caveat that while technical uncertainty is reduced, it is not eliminated.
Submitted 16 May, 2018;
originally announced May 2018.
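The crossover point described in the abstract above can be illustrated with a small calculation. This is only a sketch under a stated assumption: both technologies improve exponentially, P(t) = P0 * exp(k * t), in the spirit of a generalized Moore's Law. The abstract does not give the paper's exact formula, and the function name, parameter names, and example numbers here are hypothetical.

```python
import math


def crossover_time(incumbent_level, incumbent_rate, challenger_level, challenger_rate):
    """Years until the challenger's performance reaches the incumbent's.

    Assumption: each technology follows P(t) = P0 * exp(k * t), where P0 is
    the current performance level and k the annual improvement rate
    (as a fraction, e.g. 0.28 for 28% per year).
    """
    if challenger_level >= incumbent_level:
        return 0.0  # the challenger is already at or ahead of the incumbent
    if challenger_rate <= incumbent_rate:
        return math.inf  # the gap never closes
    # Solve P0_inc * exp(k_inc * t) = P0_ch * exp(k_ch * t) for t.
    gap = math.log(incumbent_level / challenger_level)
    return gap / (challenger_rate - incumbent_rate)


if __name__ == "__main__":
    # Hypothetical inputs: the challenger starts 10x behind but improves faster.
    years = crossover_time(100.0, 0.05, 10.0, 0.28)
    print(f"Crossover after about {years:.1f} years")
```

The result depends only on the ratio of current levels and the difference in improvement rates, which matches the two inputs the abstract names for the crossover point.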