-
REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking
Authors:
Nacime Bouziani,
Shubhi Tyagi,
Joseph Fisher,
Jens Lehmann,
Andrea Pierleoni
Abstract:
Extracting structured information from unstructured text is critical for many downstream NLP applications and is traditionally achieved by closed information extraction (cIE). However, existing approaches for cIE suffer from two limitations: (i) they are often pipelines which makes them prone to error propagation, and/or (ii) they are restricted to sentence level which prevents them from capturing long-range dependencies and results in expensive inference time. We address these limitations by proposing REXEL, a highly efficient and accurate model for the joint task of document-level cIE (DocIE). REXEL performs mention detection, entity typing, entity disambiguation, coreference resolution and document-level relation classification in a single forward pass to yield facts fully linked to a reference knowledge graph. It is on average 11 times faster than competitive existing approaches in a similar setting and performs competitively both when optimised for any of the individual subtasks and a variety of combinations of different joint tasks, surpassing the baselines by an average of more than 6 F1 points. The combination of speed and accuracy makes REXEL an accurate cost-efficient system for extracting structured information at web-scale. We also release an extension of the DocRED dataset to enable benchmarking of future work on DocIE, which is available at https://github.com/amazon-science/e2e-docie.
Submitted 19 April, 2024;
originally announced April 2024.
-
Physics-driven machine learning models coupling PyTorch and Firedrake
Authors:
Nacime Bouziani,
David A. Ham
Abstract:
Partial differential equations (PDEs) are central to describing and modelling complex physical systems that arise in many disciplines across science and engineering. However, in many realistic applications PDE modelling provides an incomplete description of the physics of interest. PDE-based machine learning techniques are designed to address this limitation. In this approach, the PDE is used as an inductive bias enabling the coupled model to rely on fundamental physical laws while requiring less training data. The deployment of high-performance simulations coupling PDEs and machine learning to complex problems necessitates the composition of capabilities provided by machine learning and PDE-based frameworks. We present a simple yet effective coupling between the machine learning framework PyTorch and the PDE system Firedrake that provides researchers, engineers and domain specialists with a highly productive way of specifying coupled models while only requiring trivial changes to existing code.
Submitted 1 April, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Escaping the abstraction: a foreign function interface for the Unified Form Language [UFL]
Authors:
Nacime Bouziani,
David A. Ham
Abstract:
High-level domain-specific languages for the finite element method underpin high-productivity programming environments for simulations based on partial differential equations (PDEs) while employing automatic code generation to achieve high performance. However, a limitation of this approach is that it does not support operators that are not directly expressible in the vector calculus. This is critical in applications where PDEs are not enough to accurately describe the physical problem of interest. The use of deep learning techniques has become increasingly popular in filling this knowledge gap, for example to include features not represented in the differential equations, or closures for unresolved spatiotemporal scales. We introduce an interface within the Firedrake finite element system that enables seamless coupling with deep learning models. This new feature composes with the automatic differentiation capabilities of Firedrake, enabling the automated solution of inverse problems. Our implementation interfaces with PyTorch and can be extended to other machine learning libraries. The resulting framework supports complex models coupling PDEs and deep learning whilst maintaining separation of concerns between application scientists and software experts.
Submitted 1 November, 2021;
originally announced November 2021.
-
Cometary Activity Beyond The Planets
Authors:
Naceur Bouziani,
David Jewitt
Abstract:
Recent observations show activity in long-period comet C/2017 K2 at heliocentric distances beyond the orbit of Uranus. With this as motivation, we constructed a simple model that takes a detailed account of gas transport modes and simulates the time-dependent sublimation of super-volatile ice from beneath a porous mantle on an incoming cometary nucleus. The model reveals a localized increase in carbon monoxide (CO) sublimation close to heliocentric distance $r_H = 150$ AU (local blackbody temperature around 23 K), followed by a plateau and then a slow increase in activity towards smaller distances. This localized increase occurs as heat transport in the nucleus transitions between two regimes characterized by the rising temperature of the CO front at larger distances and nearly isothermal CO at smaller distances. As this transition is a general property of sublimation through a porous mantle, we predict that future observations of sufficient sensitivity will show that inbound comets (and interstellar interlopers) will exhibit activity at distances far beyond the planetary region of the solar system.
Submitted 31 October, 2021;
originally announced November 2021.
-
Optimal Order Simple Regret for Gaussian Process Bandits
Authors:
Sattar Vakili,
Nacime Bouziani,
Sepehr Jalali,
Alberto Bernacchia,
Da-shan Shiu
Abstract:
Consider the sequential optimization of a continuous, possibly non-convex, and expensive-to-evaluate objective function $f$. The problem can be cast as a Gaussian Process (GP) bandit where $f$ lives in a reproducing kernel Hilbert space (RKHS). The state-of-the-art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When $N$ is the number of exploration trials and $\gamma_N$ is the maximal information gain, we prove an $\tilde{\mathcal{O}}(\sqrt{\gamma_N/N})$ bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest.
Submitted 20 August, 2021;
originally announced August 2021.
-
A Unified Framework for Double Sweep Methods for the Helmholtz Equation
Authors:
Nacime Bouziani,
Frédéric Nataf,
Pierre-Henri Tournier
Abstract:
We consider sweeping domain decomposition preconditioners to solve the Helmholtz equation in the case of stripwise domain decomposition with or without overlaps. We unify their derivation and convergence studies by expressing them as Jacobi, Gauss-Seidel, and Symmetric Gauss-Seidel methods for different numbering of the unknowns. The proposed framework enables theoretical comparisons between the double sweep methods in [Nataf and Nier (1997), Vion and Geuzaine (2018)] and those in [Stolk (2013, 2017), Vion and Geuzaine (2014)]. Additionally, it facilitates the introduction of a new sweeping algorithm. We provide numerical test cases to assess the validity of the theoretical studies.
Submitted 5 December, 2023; v1 submitted 26 October, 2020;
originally announced October 2020.