Search | arXiv e-print repository

Machine-Learned Premise Selection for Lean

Authors: Bartosz Piotrowski, Ramon Fernández Mir, Edward Ayers

Abstract: We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashi… ▽ More We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively. △ Less

Submitted 14 June, 2023; v1 submitted 17 March, 2023; originally announced April 2023.

arXiv:2302.12433 [pdf, ps, other]

ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics

Authors: Zhangir Azerbayev, Bartosz Piotrowski, Hailey Schoelkopf, Edward W. Ayers, Dragomir Radev, Jeremy Avigad

Abstract: We introduce ProofNet, a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The ProofNet benchmarks consists of 371 examples, each consisting of a formal theorem statement in Lean 3, a natural language theorem statement, and a natural language proof. The problems are primarily drawn from popular undergraduate pure mathematics textbooks and cover topics such as r… ▽ More We introduce ProofNet, a benchmark for autoformalization and formal proving of undergraduate-level mathematics. The ProofNet benchmarks consists of 371 examples, each consisting of a formal theorem statement in Lean 3, a natural language theorem statement, and a natural language proof. The problems are primarily drawn from popular undergraduate pure mathematics textbooks and cover topics such as real and complex analysis, linear algebra, abstract algebra, and topology. We intend for ProofNet to be a challenging benchmark that will drive progress in autoformalization and automatic theorem proving. We report baseline results on statement autoformalization via in-context learning. Moreover, we introduce two novel statement autoformalization methods: prompt retrieval and distilled backtranslation. △ Less

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2209.11559 [pdf, other]

doi 10.1609/aaai.v37i12.26717

Query-based Hard-Image Retrieval for Object Detection at Test Time

Authors: Edward Ayers, Jonathan Sadeghi, John Redford, Romain Mueller, Puneet K. Dokania

Abstract: There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will g… ▽ More There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will generally require closer inspection than a missed detection of a car in the distance. The problem of predicting such potential failures at test time has largely been overlooked in the literature and conventional approaches based on detection uncertainty fall short in that they are agnostic to such fine-grained characterisation of errors. In this work, we propose to reformulate the problem of finding "hard" images as a query-based hard image retrieval task, where queries are specific definitions of "hardness", and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of a detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground-truth. We show experimentally that it can be applied successfully to a wide variety of queries for which it can reliably identify hard images for a given detector without any labelled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors. The code for this project is available at https://github.com/fiveai/hardest. △ Less

Submitted 29 June, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 37(12), 14692-14700 (2023)

arXiv:2102.06203 [pdf, other]

Proof Artifact Co-training for Theorem Proving with Language Models

Authors: Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, Stanislas Polu

Abstract: Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built. This is particularly challenging when applying large Transformer language models to tactic prediction, because the scaling of performance with respect to model size is quickly disrupted in the data-sca… ▽ More Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built. This is particularly challenging when applying large Transformer language models to tactic prediction, because the scaling of performance with respect to model size is quickly disrupted in the data-scarce, easily-overfitted regime. We propose PACT ({\bf P}roof {\bf A}rtifact {\bf C}o-{\bf T}raining), a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective. We apply this methodology to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date. We instrument Lean with a neural theorem prover driven by a Transformer language model and show that PACT improves theorem proving success rate on a held-out suite of test theorems from 32\% to 48\%. △ Less

Submitted 15 March, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

arXiv:2001.02152 [pdf, other]

PaRoT: A Practical Framework for Robust Deep Neural Network Training

Authors: Edward Ayers, Francisco Eiras, Majd Hawasly, Iain Whiteside

Abstract: Deep Neural Networks (DNNs) are finding important applications in safety-critical systems such as Autonomous Vehicles (AVs), where perceiving the environment correctly and robustly is necessary for safe operation. Raising unique challenges for assurance due to their black-box nature, DNNs pose a fundamental problem for regulatory acceptance of these types of systems. Robust training --- training t… ▽ More Deep Neural Networks (DNNs) are finding important applications in safety-critical systems such as Autonomous Vehicles (AVs), where perceiving the environment correctly and robustly is necessary for safe operation. Raising unique challenges for assurance due to their black-box nature, DNNs pose a fundamental problem for regulatory acceptance of these types of systems. Robust training --- training to minimize excessive sensitivity to small changes in input --- has emerged as one promising technique to address this challenge. However, existing robust training tools are inconvenient to use or apply to existing codebases and models: they typically only support a small subset of model elements and require users to extensively rewrite the training code. In this paper we introduce a novel framework, PaRoT, developed on the popular TensorFlow platform, that greatly reduces the barrier to entry. Our framework enables robust training to be performed on arbitrary DNNs without any rewrites to the model. We demonstrate that our framework's performance is comparable to prior art, and exemplify its ease of use on off-the-shelf, trained models and its testing capabilities on a real-world industrial application: a traffic light detection network. △ Less

Submitted 25 March, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

Comments: Accepted at 12th NASA Formal Methods Symposium, NFM 2020

arXiv:1907.05567 [pdf]

doi 10.1119/perc.2019.pr.Porter

A controlled study of virtual reality in first-year magnetostatics

Authors: Chris D. Porter, Jonathan Brown, Joseph R. Smith, Amber Simmons, Megan Nieberding, Abigail E. Ayers, Chris Orban

Abstract: Stereoscopic virtual reality (VR) has experienced a resurgence due to flagship products such as the Oculus Rift, HTC Vive and smartphone-based VR solutions like Google Cardboard. This is causing the question to resurface: how can stereoscopic VR be useful in instruction, if at all, and what are the pedagogical best practices for its use? To address this, and to continue our work in this sphere, we… ▽ More Stereoscopic virtual reality (VR) has experienced a resurgence due to flagship products such as the Oculus Rift, HTC Vive and smartphone-based VR solutions like Google Cardboard. This is causing the question to resurface: how can stereoscopic VR be useful in instruction, if at all, and what are the pedagogical best practices for its use? To address this, and to continue our work in this sphere, we performed a study of 289 introductory physics students who were sorted into three different treatment types: stereoscopic virtual reality, WebGL simulation, and static 2D images, each designed to provide information about magnetic fields and forces. Students were assessed using preliminary items designed to focus on heavily-3D systems. We report on assessment reliability, and on student performance. Overall, we find that students who used VR did not significantly outperform students using other treatment types. There were significant differences between sexes, as other studies have noted. Dependence on students' self-reported 3D videogame play was observed, in kee** with previous studies, but this dependence was not restricted to the VR treatment. △ Less

Submitted 11 July, 2019; originally announced July 2019.

Comments: 6 pages, 4 figures, submitted to the 2019 Physics Education Research Conference

Showing 1–6 of 6 results for author: Ayers, E