-
14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon
Authors:
Kevin Maik Jablonka,
Qianxiang Ai,
Alexander Al-Feghali,
Shruti Badhwar,
Joshua D. Bocarsly,
Andres M Bran,
Stefan Bringuier,
L. Catherine Brinson,
Kamal Choudhary,
Defne Circi,
Sam Cox,
Wibe A. de Jong,
Matthew L. Evans,
Nicolas Gastellu,
Jerome Genzling,
María Victoria Gil,
Ankur K. Gupta,
Zhi Hong,
Alishba Imran,
Sabine Kruschwitz,
Anne Labarre,
Jakub Lála,
Tao Liu,
Steven Ma,
Sauradeep Majumdar,
et al. (28 additional authors not shown)
Abstract:
Large language models (LLMs) such as GPT-4 have caught the interest of many scientists. Recent studies have suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon.
This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications.
The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.
Submitted 14 July, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
The Logic of Logic Programming
Authors:
Marc Denecker,
David S. Warren
Abstract:
Our position is that logic programming is not programming in the Horn clause sublogic of classical logic, but programming in a logic of (inductive) definitions. Thus, the similarity between prototypical Prolog programs (e.g., member, append, ...) and the way inductive definitions are expressed in mathematical text is not coincidental but essential. We argue here that this provides a natural solution to the main lingering semantic questions of Logic Programming and its extensions.
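To make the "logic of inductive definitions" reading concrete, here is a minimal Python sketch (not taken from the paper) that treats a Prolog-style definition of reachability, reach(X,Y) :- edge(X,Y). reach(X,Y) :- edge(X,Z), reach(Z,Y)., as an inductive definition: the defined relation is the least set of pairs closed under both rules, obtained by applying the rules repeatedly until nothing new is derived. The relation names and the example edges are illustrative assumptions.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

def least_fixpoint_reach(edges: Set[Pair]) -> Set[Pair]:
    """Least relation closed under the two (assumed) rules:
         reach(X, Y) <- edge(X, Y)
         reach(X, Y) <- edge(X, Z), reach(Z, Y)
    """
    reach: Set[Pair] = set()
    while True:
        # Apply both rules once to the current approximation of reach.
        derived = set(edges)
        derived |= {(x, y) for (x, z) in edges for (z2, y) in reach if z == z2}
        if derived <= reach:
            return reach  # nothing new: the least fixpoint has been reached
        reach |= derived
```

For the chain a→b→c the result is {(a,b), (b,c), (a,c)}; on a three-node cycle every one of the nine ordered pairs is reachable, and the iteration still stops because the set of possible pairs is finite.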
Submitted 26 April, 2023;
originally announced April 2023.
-
Proceedings of the 2nd Workshop on Logic and Practice of Programming (LPOP)
Authors:
David S. Warren,
Peter Van Roy,
Yanhong A. Liu
Abstract:
These proceedings contain abstracts and position papers for the work presented at the second Logic and Practice of Programming (LPOP) Workshop. The workshop was held online, in place of Chicago, USA, on November 15, 2020, in conjunction with the ACM SIGPLAN Conference on Systems, Programming, Languages, and Applications: Software for Humanity (SPLASH) 2020. The purpose of this workshop is to be a bridge between different areas of computer science that use logic as a practical tool. We take advantage of the common language of formal logic to exchange ideas between these different areas.
Submitted 17 November, 2022;
originally announced November 2022.
-
Unity Perception: Generate Synthetic Data for Computer Vision
Authors:
Steve Borkman,
Adam Crespi,
Saurav Dhakad,
Sujoy Ganguly,
Jonathan Hogins,
You-Cyuan Jhang,
Mohsen Kamalzadeh,
Bowen Li,
Steven Leal,
Pete Parisi,
Cesar Romero,
Wesley Smith,
Alex Thaman,
Samuel Warren,
Nupur Yadav
Abstract:
We introduce the Unity Perception package, which aims to simplify and accelerate the process of generating synthetic datasets for computer vision tasks by offering an easy-to-use and highly customizable toolset. This open-source package extends the Unity Editor and engine components to generate perfectly annotated examples for several common computer vision tasks. Additionally, it offers an extensible Randomization framework that lets the user quickly construct and configure randomized simulation parameters in order to introduce variation into the generated datasets. We give an overview of the tools and how they work, and demonstrate the value of the generated synthetic datasets by training a 2D object detection model. The model trained with mostly synthetic data outperforms the model trained using only real data.
Submitted 19 July, 2021; v1 submitted 9 July, 2021;
originally announced July 2021.
-
LPOP: Challenges and Advances in Logic and Practice of Programming
Authors:
David S. Warren,
Yanhong A. Liu
Abstract:
This article describes the work presented at the first Logic and Practice of Programming (LPOP) Workshop, which was held in Oxford, UK, on July 18, 2018, in conjunction with the Federated Logic Conference (FLoC) 2018. Its focus is challenges and advances in logic and practice of programming. The workshop was organized around a challenge problem that specifies issues in role-based access control (RBAC), with many participants proposing combined imperative and declarative solutions expressed in the languages of their choice.
Submitted 15 August, 2020;
originally announced August 2020.
-
Mobility Changes in Response to COVID-19
Authors:
Michael S. Warren,
Samuel W. Skillman
Abstract:
In response to the COVID-19 pandemic, both voluntary changes in behavior and administrative restrictions on human interactions have occurred. These actions are intended to reduce the transmission rate of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We use anonymized and/or de-identified mobile device locations to measure mobility, a statistic representing the distance a typical member of a given population moves in a day. Results indicate that a large reduction in mobility has taken place, both in the US and globally. In the United States, large mobility reductions have been detected associated with the onset of the COVID-19 threat and specific government directives. Mobility data at the US admin1 (state) and admin2 (county) level have been made freely available under a Creative Commons Attribution (CC BY 4.0) license via the GitHub repository https://github.com/descarteslabs/DL-COVID-19/
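The abstract defines mobility only as "the distance a typical member of a given population moves in a day". The Python sketch below is one plausible way to compute such a statistic and is an assumption, not the authors' published definition: for each device, take the largest great-circle distance from its first reported location that day, then report the median over devices as the "typical" value. The input layout (time-ordered latitude/longitude pairs per device) and the function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def daily_mobility_km(pings: Dict[str, List[LatLon]]) -> float:
    """Assumed statistic: median over devices of the maximum distance
    from each device's first location of the day (pings in time order)."""
    per_device = []
    for track in pings.values():
        if not track:
            continue  # skip devices with no locations that day
        origin = track[0]
        per_device.append(max(haversine_km(origin, p) for p in track))
    return median(per_device)
```

Using the median rather than the mean keeps a few long-distance travellers from dominating the "typical" value, which is one reading of the word in the abstract.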
Submitted 31 March, 2020;
originally announced March 2020.
-
Visual search over billions of aerial and satellite images
Authors:
Ryan Keisler,
Samuel W. Skillman,
Sunny Gonnabathula,
Justin Poehnelt,
Xander Rudelis,
Michael S. Warren
Abstract:
We present a system for performing visual search over billions of aerial and satellite images. The purpose of visual search is to find images that are visually similar to a query image. We define visual similarity using 512 abstract visual features generated by a convolutional neural network that has been trained on aerial and satellite imagery. The features are converted to binary values to reduce data and compute requirements. We employ a hash-based search using Bigtable, a scalable database service from Google Cloud. Searching the continental United States at 1-meter pixel resolution, corresponding to approximately 2 billion images, takes approximately 0.1 seconds. This system enables real-time visual search over the surface of the earth, and an interactive demo is available at https://search.descarteslabs.com.
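The pipeline described above (512 CNN features, binarized, then a hash-based lookup) can be sketched in Python as follows. Several details are assumptions and are not taken from the paper: a feature becomes a 1 bit when it is positive, the 512-bit code is split into sixteen 32-bit bands used as hash keys, candidates are images matching the query exactly in at least one band, and candidates are reranked by Hamming distance. An in-memory dict stands in for Bigtable, and the random vectors are synthetic.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

N_FEATURES = 512  # feature count from the abstract
BAND_BITS = 32    # assumed width of each hash key (16 bands of 32 bits)

def binarize(features: Sequence[float]) -> int:
    """Pack features into one integer, setting bit i when feature i > 0 (assumed threshold)."""
    code = 0
    for i, f in enumerate(features):
        if f > 0:
            code |= 1 << i
    return code

def bands(code: int) -> List[Tuple[int, int]]:
    """Split a code into (band index, band bits) pairs used as hash keys."""
    mask = (1 << BAND_BITS) - 1
    return [(b, (code >> (b * BAND_BITS)) & mask) for b in range(N_FEATURES // BAND_BITS)]

class BinaryIndex:
    """Hypothetical in-memory index; a plain dict stands in for Bigtable."""

    def __init__(self) -> None:
        self.codes: Dict[str, int] = {}
        self.table: Dict[Tuple[int, int], List[str]] = defaultdict(list)

    def add(self, image_id: str, features: Sequence[float]) -> None:
        code = binarize(features)
        self.codes[image_id] = code
        for key in bands(code):
            self.table[key].append(image_id)

    def query(self, features: Sequence[float], k: int = 5) -> List[Tuple[str, int]]:
        code = binarize(features)
        # Candidates: images that match the query exactly in at least one band.
        candidates = {i for key in bands(code) for i in self.table.get(key, [])}
        # Rerank candidates by Hamming distance over all 512 bits.
        scored = [(i, bin(code ^ self.codes[i]).count("1")) for i in candidates]
        return sorted(scored, key=lambda t: (t[1], t[0]))[:k]

def build_demo_index(n: int, seed: int = 0) -> Tuple[BinaryIndex, Dict[str, List[float]]]:
    """Index n random feature vectors (synthetic data for illustration only)."""
    rng = random.Random(seed)
    vecs = {f"img{i}": [rng.uniform(-1, 1) for _ in range(N_FEATURES)] for i in range(n)}
    index = BinaryIndex()
    for name, v in vecs.items():
        index.add(name, v)
    return index, vecs
```

Binary codes make each comparison a single XOR plus a popcount, which is the data and compute saving the abstract refers to; the banding trick is one common way to turn near-duplicate search into exact hash lookups.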
Submitted 6 February, 2020;
originally announced February 2020.
-
Top-down and Bottom-up Evaluation Procedurally Integrated
Authors:
David S. Warren
Abstract:
This paper describes how XSB combines top-down and bottom-up computation through the mechanisms of variant tabling and subsumptive tabling with abstraction, respectively.
It is well known that top-down evaluation of logical rules in Prolog has a procedural interpretation as recursive procedure invocation (Kowalski 1986). Tabling adds the intuition of short-circuiting redundant computations (Warren 1992). This paper shows how to introduce into tabled logic program evaluation a bottom-up component, whose procedural intuition is the initialization of a data structure: a relation is computed and filled on first demand, and then used for efficient lookup throughout the remainder of a larger computation. This allows many Prolog programs that formerly needed procedural features such as assert for efficiency to be expressed fully declaratively.
This paper is under consideration for acceptance in "Theory and Practice of Logic Programming (TPLP)".
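The "fill on first demand, then look up" intuition described above can be illustrated with a small Python analogy. This is not XSB's subsumptive tabling; the class DemandFilledRelation and the helper transitive_closure are hypothetical names. The first lookup triggers a semi-naive bottom-up computation of the whole relation and builds an index on its first argument; every later lookup is a dictionary hit.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Pair = Tuple[str, str]

class DemandFilledRelation:
    """Hypothetical helper: computes a relation bottom-up on first demand,
    then answers all later queries from an index on the first argument."""

    def __init__(self, compute: Callable[[], Iterable[Pair]]) -> None:
        self._compute = compute
        self._index: Optional[Dict[str, Set[str]]] = None
        self.fills = 0  # number of times the bottom-up computation has run

    def lookup(self, x: str) -> Set[str]:
        if self._index is None:
            self.fills += 1
            index: Dict[str, Set[str]] = defaultdict(set)
            for a, b in self._compute():
                index[a].add(b)
            self._index = index
        return self._index.get(x, set())

def transitive_closure(edges: Set[Pair]) -> Set[Pair]:
    """Semi-naive bottom-up evaluation of reach(X,W) :- reach(X,Y), edge(Y,W)."""
    reach, delta = set(edges), set(edges)
    while delta:
        # Join only the facts derived in the previous round with edge/2.
        new = {(x, w) for (x, y) in delta for (z, w) in edges if y == z} - reach
        reach |= new
        delta = new
    return reach
```

Usage: with edges a→b→c→d, DemandFilledRelation(lambda: transitive_closure(edges)) computes nothing until the first lookup("a"), which returns {"b", "c", "d"}; subsequent lookups reuse the same index without recomputation.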
Submitted 23 April, 2018;
originally announced April 2018.
-
AppLP: A Dialogue on Applications of Logic Programming
Authors:
David S. Warren,
Yanhong A. Liu
Abstract:
This document describes the contributions of the 2016 Applications of Logic Programming Workshop (AppLP), which was held on October 17 and associated with the International Conference on Logic Programming (ICLP) in Flushing, New York City.
Submitted 7 April, 2017;
originally announced April 2017.
-
Data-Intensive Supercomputing in the Cloud: Global Analytics for Satellite Imagery
Authors:
Michael S. Warren,
Samuel W. Skillman,
Rick Chartrand,
Tim Kelton,
Ryan Keisler,
David Raleigh,
Matthew Turk
Abstract:
We present our experiences using cloud computing to support data-intensive analytics on satellite imagery for commercial applications. Building on our background in high-performance computing, we draw parallels between the early days of clustered computing systems and the current state of cloud computing, including its potential to disrupt the HPC market. Using our own virtual file system layer on top of cloud remote object storage, we demonstrate aggregate read bandwidth of 230 gigabytes per second using 512 Google Compute Engine (GCE) nodes accessing a USA multi-region standard storage bucket. This figure is comparable to the best HPC storage systems in existence. We also present several of our application results, including the identification of field boundaries in Ukraine, and the generation of a global cloud-free base layer from Landsat imagery.
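Dividing the aggregate figure by the node count gives the average per-node read rate implied by the abstract, assuming decimal units and an even split of the load across nodes:

$$
\frac{230\ \text{GB/s}}{512\ \text{nodes}} \approx 0.449\ \text{GB/s} \approx 450\ \text{MB/s per node}
$$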
Submitted 13 February, 2017;
originally announced February 2017.
-
Pulse processing routines for neutron time-of-flight data
Authors:
P. Žugec,
C. Weiß,
C. Guerrero,
F. Gunsing,
V. Vlachoudis,
M. Sabate-Gilarte,
A. Stamatopoulos,
T. Wright,
J. Lerendegui-Marco,
F. Mingrone,
J. A. Ryan,
S. G. Warren,
A. Tsinganis,
M. Barbagallo
Abstract:
A pulse shape analysis framework is described, which was developed for n_TOF-Phase3, the third phase in the operation of the n_TOF facility at CERN. The most notable feature of this new framework is the adoption of generic pulse shape analysis routines, characterized by a minimal number of explicit assumptions about the nature of pulses. The aim of these routines is to be applicable to a wide variety of detectors, thus facilitating the introduction of new detectors or types of detectors into the analysis framework. The operational details of the routines are adapted to the specific requirements of particular detectors by adjusting a set of external input parameters. Pulse recognition, baseline calculation and the pulse shape fitting procedure are described. Special emphasis is put on their computational efficiency, since the most basic implementations of these conceptually simple methods are often computationally inefficient.
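As a generic illustration of two of the steps named above, baseline calculation and pulse recognition, here is a minimal Python sketch. It is not the n_TOF routine: it assumes positive-going pulses, estimates the baseline as the median of the whole record (robust when pulses are sparse), and recognises a pulse as a run of samples exceeding the baseline by more than a threshold, with the threshold playing the role of one detector-specific external parameter.

```python
from statistics import median
from typing import List, NamedTuple, Optional, Sequence

class Pulse(NamedTuple):
    start: int        # first sample above threshold
    end: int          # last sample above threshold
    peak: int         # index of the maximum sample in the run
    amplitude: float  # peak height above the baseline

def estimate_baseline(signal: Sequence[float]) -> float:
    """Assumed approach: the median is insensitive to a few pulse samples."""
    return median(signal)

def find_pulses(signal: Sequence[float], threshold: float) -> List[Pulse]:
    """Recognise positive pulses as runs of samples above baseline + threshold."""
    base = estimate_baseline(signal)
    pulses: List[Pulse] = []
    start: Optional[int] = None
    for i, s in enumerate(signal):
        above = s - base > threshold
        if above and start is None:
            start = i  # a run begins
        if start is not None and (not above or i == len(signal) - 1):
            # The run ends here (or the record ends while still above threshold).
            end = i if above else i - 1
            peak = max(range(start, end + 1), key=lambda j: signal[j])
            pulses.append(Pulse(start, end, peak, signal[peak] - base))
            start = None
    return pulses
```

A single linear pass like this keeps the cost proportional to the record length, in line with the abstract's emphasis on efficiency; a real framework would add pulse shape fitting on top of the recognised runs.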
Submitted 18 January, 2016;
originally announced January 2016.
-
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
Authors:
Michael S. Warren
Abstract:
We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k ($2^{18}$) processors. We present error analysis and scientific application results from a series of more than ten 69 billion ($4096^3$) particle cosmological simulations, accounting for $4 \times 10^{20}$ floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.
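A hashed oct-tree stores tree cells in a hash table keyed by a bit-interleaved (Morton-order) key rather than linking them with pointers. The Python sketch below shows the general idea; the exact bit ordering and conventions of HOT/2HOT are assumptions here. Coordinates in [0, 1) are quantized at a chosen depth, their bits are interleaved level by level behind a leading 1 bit (so keys of different depths never collide), the parent of a cell is obtained by dropping the last three bits, and every occupied cell at every level is recorded in a dict.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

def morton_key(p: Point, depth: int) -> int:
    """Key of the depth-`depth` cell containing p (coordinates in [0, 1)).
    A leading 1 bit marks the root; x, y, z bit ordering is assumed."""
    key = 1
    scaled = [min(int(c * (1 << depth)), (1 << depth) - 1) for c in p]
    for level in range(depth - 1, -1, -1):
        for c in scaled:  # interleave one bit of x, y and z per level
            key = (key << 1) | ((c >> level) & 1)
    return key

def parent(key: int) -> int:
    """Drop the three bits of the deepest level."""
    return key >> 3

def key_depth(key: int) -> int:
    """Tree level encoded by a key (root key 1 has depth 0)."""
    return (key.bit_length() - 1) // 3

def build_hashed_tree(points: List[Point], depth: int) -> Dict[int, List[int]]:
    """Map every occupied cell key, at all levels, to the particle indices inside it."""
    cells: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        k = morton_key(p, depth)
        while k >= 1:  # walk up to and including the root (key 1)
            cells.setdefault(k, []).append(i)
            k = parent(k)
    return cells
```

Because parent and child keys are related by shifts, a traversal can locate any cell by computing its key and probing the table, which is what makes the structure convenient to distribute across many processors.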
Submitted 16 October, 2013;
originally announced October 2013.
-
Efficiently Retrieving Function Dependencies in the Linux Kernel Using XSB
Authors:
Spyros Hadjichristodoulou,
Donald E. Porter,
David S. Warren
Abstract:
In this paper we investigate XSB-Prolog as a static analysis engine for data represented by medium-sized graphs. We use XSB-Prolog to automatically identify function dependencies in the Linux Kernel---queries that are difficult to implement efficiently in a commodity database and that developers often have to identify manually. This project illustrates that Prolog systems are ideal for building tools for use in other disciplines that require sophisticated inferences, because Prolog is both declarative and can efficiently implement complex problem specifications through tabling and indexing.
Submitted 19 August, 2013;
originally announced August 2013.
-
Interning Ground Terms in XSB
Authors:
David S. Warren
Abstract:
This paper presents an implementation of interning of ground terms in the XSB Tabled Prolog system. This is related to the idea of hash-consing. I describe the concept of interning atoms and discuss the issues around interning ground structured terms, motivating why tabling Prolog systems may change the cost-benefit tradeoffs from those of traditional Prolog systems. I describe the details of the implementation of interning ground terms in the XSB Tabled Prolog System and show some of its performance properties. This implementation achieves the same effects as that of Zhou and Have but is tuned for XSB's representations and is arguably simpler.
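Hash-consing, which the abstract relates to interning, means keeping one canonical copy of every distinct ground term so that structurally equal terms share a single object. The minimal Python sketch below illustrates the idea only and is not XSB's implementation: atoms are represented as strings and a compound term f(t1, ..., tn) as the tuple ("f", t1, ..., tn), both representation choices being assumptions. Subterms are interned bottom-up, so once two terms are interned, equality reduces to an identity check.

```python
from typing import Dict, Tuple, Union

# Ground terms: atoms are strings; a compound f(t1, ..., tn) is ("f", t1, ..., tn).
Term = Union[str, Tuple["Term", ...]]

class Interner:
    """Hash-consing table: structurally equal ground terms map to one shared object."""

    def __init__(self) -> None:
        self._table: Dict[Term, Term] = {}

    def intern(self, term: Term) -> Term:
        if isinstance(term, tuple):
            # Intern subterms first, so the stored tuple is built from canonical parts.
            term = tuple(self.intern(arg) for arg in term)
        # Return the stored copy if one exists; otherwise store this one.
        return self._table.setdefault(term, term)

    def __len__(self) -> int:
        return len(self._table)
```

Interning f(g(a), b) twice yields the same object, and the table holds six entries: the atoms f, g, a and b, the subterm g(a), and the whole term. Sharing subterms this way saves space when the same large ground terms recur, for example across many table answers.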
Submitted 17 July, 2013;
originally announced July 2013.
-
XSB: Extending Prolog with Tabled Logic Programming
Authors:
Terrance Swift,
David S. Warren
Abstract:
The paradigm of Tabled Logic Programming (TLP) is now supported by a number of Prolog systems, including XSB, YAP Prolog, B-Prolog, Mercury, ALS, and Ciao. The reasons for this are partly theoretical: tabling ensures termination and optimal known complexity for queries to a large class of programs. However, the overriding reasons are practical. TLP allows sophisticated programs to be written concisely and efficiently, especially when mechanisms such as tabled negation and call and answer subsumption are supported. As a result, TLP has now been used in a variety of applications, from program analysis to querying over the semantic web. This paper provides a survey of TLP and its applications as implemented in XSB Prolog, along with a discussion of how XSB supports tabling with dynamically changing code and in a multi-threaded environment.
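The termination claim can be illustrated with the classic left-recursive path/2 program, which loops forever under plain SLD resolution on a cyclic graph. The Python sketch below is an analogy, not XSB's SLG engine: answers to the call path(start, Y) are stored in a table, and the recursive clause consumes each tabled answer exactly once, so evaluation stops as soon as no new answers appear. The function name and graph encoding are assumptions.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple

Edge = Tuple[str, str]

def tabled_path(edges: Set[Edge], start: str) -> Set[str]:
    """Answers Y to ?- path(start, Y) for the left-recursive program
         path(X, Y) :- path(X, Z), edge(Z, Y).
         path(X, Y) :- edge(X, Y).
    Each answer is recorded in a table and consumed once by the recursive
    clause, so evaluation terminates even on cyclic graphs.
    """
    succ: Dict[str, Set[str]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    # Exit clause: seed the table with direct edges from start.
    table: Set[str] = set(succ.get(start, ()))
    pending: Deque[str] = deque(table)
    # Recursive clause: extend each answer Z by one edge; only new answers are queued.
    while pending:
        z = pending.popleft()
        for y in succ.get(z, ()):
            if y not in table:
                table.add(y)
                pending.append(y)
    return table
```

On the cyclic graph a→b, b→a, b→c, the call from a returns {a, b, c} and stops, because revisiting a produces no answer that is not already in the table.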
Submitted 22 December, 2010;
originally announced December 2010.
-
Swapping Evaluation: A Memory-Scalable Solution for Answer-On-Demand Tabling
Authors:
Pablo Chico de Guzman,
Manuel Carro,
David S. Warren
Abstract:
One of the differences among the various approaches to suspension-based tabled evaluation is the scheduling strategy. The two most popular strategies are local and batched evaluation.
The former collects all the solutions to a tabled predicate before making any one of them available outside the tabled computation. The latter returns answers one by one before computing them all, which in principle is better if only one answer (or a subset of the answers) is desired.
Batched evaluation is closer to SLD evaluation in that it computes solutions lazily as they are demanded, but it may need arbitrarily more memory than local evaluation, which is able to reclaim memory sooner. Some programs which in practice can be executed under the local strategy quickly run out of memory under batched evaluation. This has led to the general adoption of local evaluation at the expense of the more depth-first batched strategy.
In this paper we study the reasons for the high memory consumption of batched evaluation and propose a new scheduling strategy, which we term swapping evaluation. Swapping evaluation also returns answers one by one before completing a tabled call, but its memory usage can be orders of magnitude lower than that of batched evaluation. An experimental implementation in the XSB system shows that swapping evaluation is a feasible, memory-scalable strategy that need not compromise execution speed.
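The difference in answer-return order between the two classic schedulers described above can be mimicked with Python generators. The sketch below illustrates only when answers become available to the caller; it does not model memory reclamation or the proposed swapping strategy, and all names are hypothetical. Both schedulers share one answer producer; local evaluation completes the table before releasing anything, while batched evaluation releases each answer as soon as it is derived.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Set

Graph = Dict[str, Set[str]]

def _answers(succ: Graph, start: str, log: List[str]) -> Iterator[str]:
    """Generate answers to path(start, Y) in discovery order; `log`
    records each answer at the moment it is computed."""
    seen: Set[str] = set()
    pending: Deque[str] = deque([start])
    while pending:
        z = pending.popleft()
        for y in sorted(succ.get(z, ())):
            if y not in seen:
                seen.add(y)
                log.append(y)
                pending.append(y)
                yield y

def local_eval(succ: Graph, start: str, log: List[str]) -> Iterator[str]:
    """Local scheduling: complete the whole table before releasing any answer."""
    completed = list(_answers(succ, start, log))
    yield from completed

def batched_eval(succ: Graph, start: str, log: List[str]) -> Iterator[str]:
    """Batched scheduling: release each answer as soon as it is found."""
    yield from _answers(succ, start, log)
```

On the chain a→b→c→d, asking either scheduler for its first answer returns b, but local evaluation has already computed b, c and d by then, whereas batched evaluation has computed only b. This is why batched evaluation pays off when only the first answer, or a few answers, are needed.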
Submitted 22 July, 2010;
originally announced July 2010.
-
Managing Information for Sparsely Distributed Articles and Readers: The Virtual Journals of the Joint Institute for Nuclear Astrophysics (JINA)
Authors:
Richard H. Cyburt,
Sam M. Austin,
Timothy C. Beers,
Alfredo Estrade,
Ryan M. Ferguson,
A. Sakharuk,
Karl Smith,
Scott Warren
Abstract:
The research area of nuclear astrophysics is characterized by a need for information published in tens of journals in several fields and an extremely dilute distribution of researchers. For these reasons it is difficult for researchers, especially students, to be adequately informed of the relevant published research. For example, the commonly employed journal club is inefficient for a group consisting of a professor and his two students. In an attempt to address this problem, we have developed a virtual journal (VJ), a process for collecting and distributing a weekly compendium of articles of interest to researchers in nuclear astrophysics. Subscribers are notified of each VJ issue via an email list server or an RSS feed. The VJ database is searchable by topics assigned by the editors, or by keywords. There are two related VJs: the Virtual Journal of Nuclear Astrophysics (JINA VJ), and the SEGUE Virtual Journal (SEGUE VJ). The JINA VJ also serves as a source of new experimental and theoretical information for the JINA REACLIB reaction rate database. References to review articles and popular-level articles provide an introduction to the literature for students. The VJs and support information are available at http://groups.nscl.msu.edu/**a/journals
Submitted 16 July, 2009;
originally announced July 2009.
-
TCHR: a framework for tabled CLP
Authors:
Tom Schrijvers,
Bart Demoen,
David S. Warren
Abstract:
Tabled Constraint Logic Programming is a powerful execution mechanism for dealing with Constraint Logic Programming without worrying about fixpoint computation. Various applications, e.g., in the fields of program analysis and model checking, have been proposed. Unfortunately, a high-level system for developing new applications is lacking, and programmers are forced to resort to complicated ad hoc solutions.
This paper presents TCHR, a high-level framework for tabled Constraint Logic Programming. It integrates Constraint Handling Rules (CHR), a high-level language for constraint solvers, with tabled Logic Programming in a lightweight manner. The framework is easily instantiated with new application-specific constraint domains. Various high-level operations can be instantiated to control performance. In particular, we propose a novel, generalized technique for compacting answer sets.
Submitted 26 December, 2007;
originally announced December 2007.
-
An Environment for the Exploration of Non Monotonic Logic Programs
Authors:
Luis F. Castro,
David S. Warren
Abstract:
Stable Model Semantics and Well-Founded Semantics have been shown to be very useful in several applications of non-monotonic reasoning. However, Stable Model Semantics has high computational complexity, whereas Well-Founded Semantics is easy to compute and provides an approximation of Stable Models. Efficient engines exist for both semantics of logic programs. This work presents a computational integration of two such systems, namely XSB and SMODELS. The resulting system, called XNMR, provides an interactive environment for exploring both semantics. Aspects such as modularity can be exploited to ease debugging of large knowledge bases with the usual Prolog debugging techniques and an interactive environment. In addition, using a full Prolog system as a front-end to a Stable Models engine extends the language usually accepted by such systems.
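To make the notion of a stable model concrete, here is a brute-force Python sketch of the standard guess-and-check definition via the Gelfond-Lifschitz reduct. It bears no relation to how SMODELS or XNMR compute models and is exponential in the number of atoms, so it is suitable only for tiny propositional programs. A candidate set M is stable when it equals the least model of the reduct: drop every rule with a negated atom in M, then delete the remaining negative literals.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# A normal rule  head :- pos_1, ..., pos_k, not neg_1, ..., not neg_m
Rule = Tuple[str, FrozenSet[str], FrozenSet[str]]

def least_model(rules: List[Tuple[str, FrozenSet[str]]]) -> Set[str]:
    """Least model of a negation-free program, by forward chaining."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and body <= model:
                model.add(head)
                changed = True
    return model

def stable_models(rules: List[Rule]) -> List[Set[str]]:
    """Enumerate all stable models by checking every subset of atoms."""
    atoms = sorted({h for h, _, _ in rules} | {a for _, p, n in rules for a in p | n})
    found: List[Set[str]] = []
    for r in range(len(atoms) + 1):
        for guess in combinations(atoms, r):
            m = set(guess)
            # Gelfond-Lifschitz reduct with respect to the guess m.
            reduct = [(h, p) for h, p, n in rules if not (n & m)]
            if least_model(reduct) == m:
                found.append(m)
    return found
```

The program {p :- not q. q :- not p.} has two stable models, {p} and {q}, while {p :- not p.} has none. Well-founded semantics assigns p and q the value undefined in the first program instead of branching, which illustrates why it is cheaper to compute and serves only as an approximation of the stable models.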
Submitted 19 November, 2001;
originally announced November 2001.