-
The Ethics of Advanced AI Assistants
Authors:
Iason Gabriel,
Arianna Manzini,
Geoff Keeling,
Lisa Anne Hendricks,
Verena Rieser,
Hasan Iqbal,
Nenad Tomašev,
Ira Ktena,
Zachary Kenton,
Mikel Rodriguez,
Seliem El-Sayed,
Sasha Brown,
Canfer Akbulut,
Andrew Trask,
Edward Hughes,
A. Stevie Bergman,
Renee Shelby,
Nahema Marchal,
Conor Griffin,
Juan Mateos-Garcia,
Laura Weidinger,
Winnie Street,
Benjamin Lange,
Alex Ingerman,
Alison Lentz
, et al. (32 additional authors not shown)
Abstract:
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…
▽ More
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
△ Less
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI
Authors:
Seliem El-Sayed,
Canfer Akbulut,
Amanda McCroskery,
Geoff Keeling,
Zachary Kenton,
Zaria Jalan,
Nahema Marchal,
Arianna Manzini,
Toby Shevlane,
Shannon Vallor,
Daniel Susser,
Matija Franklin,
Sophie Bridgers,
Harry Law,
Matthew Rahtz,
Murray Shanahan,
Michael Henry Tessler,
Arthur Douillard,
Tom Everitt,
Sasha Brown
Abstract:
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, high…
▽ More
Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate against process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
SMC Is All You Need: Parallel Strong Scaling
Authors:
Xinzhu Liang,
Joseph M. Lukens,
Sanjaya Lohani,
Brian T. Kirby,
Thomas A. Searles,
Kody J. H. Law
Abstract:
The Bayesian posterior distribution can only be evaluated up-to a constant of proportionality, which makes simulation and consistent estimation challenging. Classical consistent Bayesian methods such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) have unbounded time complexity requirements. We develop a fully parallel sequential Monte Carlo (pSMC) method which provably deliver…
▽ More
The Bayesian posterior distribution can only be evaluated up-to a constant of proportionality, which makes simulation and consistent estimation challenging. Classical consistent Bayesian methods such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) have unbounded time complexity requirements. We develop a fully parallel sequential Monte Carlo (pSMC) method which provably delivers parallel strong scaling, i.e. the time complexity (and per-node memory) remains bounded if the number of asynchronous processes is allowed to grow. More precisely, the pSMC has a theoretical convergence rate of Mean Square Error (MSE)$ = O(1/NP)$, where $N$ denotes the number of communicating samples in each processor and $P$ denotes the number of processors. In particular, for suitably-large problem-dependent $N$, as $P \rightarrow \infty$ the method converges to infinitesimal accuracy MSE$=O(\varepsilon^2)$ with a fixed finite time-complexity Cost$=O(1)$ and with no efficiency leakage, i.e. computational complexity Cost$=O(\varepsilon^{-2})$. A number of Bayesian inference problems are taken into consideration to compare the pSMC and MCMC methods.
△ Less
Submitted 2 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need
Authors:
Shangda Yang,
Vitaly Zankin,
Maximilian Balandat,
Stefan Scherer,
Kevin Carlberg,
Neil Walton,
Kody J. H. Law
Abstract:
We leverage multilevel Monte Carlo (MLMC) to improve the performance of multi-step look-ahead Bayesian optimization (BO) methods that involve nested expectations and maximizations. Often these expectations must be computed by Monte Carlo (MC). The complexity rate of naive MC degrades for nested operations, whereas MLMC is capable of achieving the canonical MC convergence rate for this type of prob…
▽ More
We leverage multilevel Monte Carlo (MLMC) to improve the performance of multi-step look-ahead Bayesian optimization (BO) methods that involve nested expectations and maximizations. Often these expectations must be computed by Monte Carlo (MC). The complexity rate of naive MC degrades for nested operations, whereas MLMC is capable of achieving the canonical MC convergence rate for this type of problem, independently of dimension and without any smoothness assumptions. Our theoretical study focuses on the approximation improvements for twoand three-step look-ahead acquisition functions, but, as we discuss, the approach is generalizable in various ways, including beyond the context of BO. Our findings are verified numerically and the benefits of MLMC for BO are illustrated on several benchmark examples. Code is available at https://github.com/Shangda-Yang/MLMCBO .
△ Less
Submitted 25 June, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Longest-chain Attacks: Difficulty Adjustment and Timestamp Verifiability
Authors:
Tzuo Hann Law,
Selman Erol,
Lewis Tseng
Abstract:
We study an adversary who attacks a Proof-of-Work (POW) blockchain by selfishly constructing an alternative longest chain. We characterize optimal strategies employed by the adversary when a difficulty adjustment rule alà Bitcoin applies. As time (namely the times-tamp specified in each block) in most permissionless POW blockchains is somewhat subjective, we focus on two extreme scenarios: when ti…
▽ More
We study an adversary who attacks a Proof-of-Work (POW) blockchain by selfishly constructing an alternative longest chain. We characterize optimal strategies employed by the adversary when a difficulty adjustment rule alà Bitcoin applies. As time (namely the times-tamp specified in each block) in most permissionless POW blockchains is somewhat subjective, we focus on two extreme scenarios: when time is completely verifiable, and when it is completely unverifiable. We conclude that an adversary who faces a difficulty adjustment rule will find a longest-chain attack very challenging when timestamps are verifiable. POW blockchains with frequent difficulty adjustments relative to time reporting flexibility will be substantially more vulnerable to longest-chain attacks. Our main fining provides guidance on the design of difficulty adjustment rules and demonstrates the importance of timestamp verifiability.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Infinite Photorealistic Worlds using Procedural Generation
Authors:
Alexander Raistrick,
Lahav Lipson,
Zeyu Ma,
Lingjie Mei,
Mingzhe Wang,
Yiming Zuo,
Karhan Kayan,
Hongyu Wen,
Beining Han,
Yihan Wang,
Alejandro Newell,
Hei Law,
Ankit Goyal,
Kaiyu Yang,
Jia Deng
Abstract:
We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, anima…
▽ More
We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, animals, terrains, and natural phenomena such as fire, cloud, rain, and snow. Infinigen can be used to generate unlimited, diverse training data for a wide range of computer vision tasks including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond. Please visit https://infinigen.org for videos, code and pre-generated data.
△ Less
Submitted 26 June, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Generating Dispatching Rules for the Interrupting Swap-Allowed Blocking Job Shop Problem Using Graph Neural Network and Reinforcement Learning
Authors:
Vivian W. H. Wong,
Sang Hun Kim,
Junyoung Park,
**kyoo Park,
Kincho H. Law
Abstract:
The interrupting swap-allowed blocking job shop problem (ISBJSSP) is a complex scheduling problem that is able to model many manufacturing planning and logistics applications realistically by addressing both the lack of storage capacity and unforeseen production interruptions. Subjected to random disruptions due to machine malfunction or maintenance, industry production settings often choose to ad…
▽ More
The interrupting swap-allowed blocking job shop problem (ISBJSSP) is a complex scheduling problem that is able to model many manufacturing planning and logistics applications realistically by addressing both the lack of storage capacity and unforeseen production interruptions. Subjected to random disruptions due to machine malfunction or maintenance, industry production settings often choose to adopt dispatching rules to enable adaptive, real-time re-scheduling, rather than traditional methods that require costly re-computation on the new configuration every time the problem condition changes dynamically. To generate dispatching rules for the ISBJSSP problem, we introduce a dynamic disjunctive graph formulation characterized by nodes and edges subjected to continuous deletions and additions. This formulation enables the training of an adaptive scheduler utilizing graph neural networks and reinforcement learning. Furthermore, a simulator is developed to simulate interruption, swap**, and blocking in the ISBJSSP setting. Employing a set of reported benchmark instances, we conduct a detailed experimental study on ISBJSSP instances with a range of machine shutdown probabilities to show that the scheduling policies generated can outperform or are at least as competitive as existing dispatching rules with predetermined priority. This study shows that the ISBJSSP, which requires real-time adaptive solutions, can be scheduled efficiently with the proposed method when production interruptions occur with random machine shutdowns.
△ Less
Submitted 28 September, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
-
A randomized multi-index sequential Monte Carlo method
Authors:
Xinzhu Liang,
Shangda Yang,
Simon L. Cotter,
Kody J. H. Law
Abstract:
We consider the problem of estimating expectations with respect to a target distribution with an unknown normalizing constant, and where even the unnormalized target needs to be approximated at finite resolution. Under such an assumption, this work builds upon a recently introduced multi-index Sequential Monte Carlo (SMC) ratio estimator, which provably enjoys the complexity improvements of multi-…
▽ More
We consider the problem of estimating expectations with respect to a target distribution with an unknown normalizing constant, and where even the unnormalized target needs to be approximated at finite resolution. Under such an assumption, this work builds upon a recently introduced multi-index Sequential Monte Carlo (SMC) ratio estimator, which provably enjoys the complexity improvements of multi-index Monte Carlo (MIMC) and the efficiency of SMC for inference. The present work leverages a randomization strategy to remove bias entirely, which simplifies estimation substantially, particularly in the MIMC context, where the choice of index set is otherwise important. Under reasonable assumptions, the proposed method provably achieves the same canonical complexity of MSE$^{-1}$ as the original method (where MSE is mean squared error), but without discretization bias. It is illustrated on examples of Bayesian inverse and spatial statistics problems.
△ Less
Submitted 28 June, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Label-Free Synthetic Pretraining of Object Detectors
Authors:
Hei Law,
Jia Deng
Abstract:
We propose a new approach, Synthetic Optimized Layout with Instance Detection (SOLID), to pretrain object detectors with synthetic images. Our "SOLID" approach consists of two main components: (1) generating synthetic images using a collection of unlabelled 3D models with optimized scene arrangement; (2) pretraining an object detector on "instance detection" task - given a query image depicting an…
▽ More
We propose a new approach, Synthetic Optimized Layout with Instance Detection (SOLID), to pretrain object detectors with synthetic images. Our "SOLID" approach consists of two main components: (1) generating synthetic images using a collection of unlabelled 3D models with optimized scene arrangement; (2) pretraining an object detector on "instance detection" task - given a query image depicting an object, detecting all instances of the exact same object in a target image. Our approach does not need any semantic labels for pretraining and allows the use of arbitrary, diverse 3D models. Experiments on COCO show that with optimized data generation and a proper pretraining task, synthetic data can be highly effective data for pretraining object detectors. In particular, pretraining on rendered images achieves performance competitive with pretraining on real images while using significantly less computing resources. Code is available at https://github.com/princeton-vl/SOLID.
△ Less
Submitted 8 August, 2022;
originally announced August 2022.
-
Digital Fingerprinting of Microstructures
Authors:
Michael D. White,
Alexander Tarakanov,
Christopher P. Race,
Philip J. Withers,
Kody J. H. Law
Abstract:
Finding efficient means of fingerprinting microstructural information is a critical step towards harnessing data-centric machine learning approaches. A statistical framework is systematically developed for compressed characterisation of a population of images, which includes some classical computer vision methods as special cases. The focus is on materials microstructure. The ultimate purpose is t…
▽ More
Finding efficient means of fingerprinting microstructural information is a critical step towards harnessing data-centric machine learning approaches. A statistical framework is systematically developed for compressed characterisation of a population of images, which includes some classical computer vision methods as special cases. The focus is on materials microstructure. The ultimate purpose is to rapidly fingerprint sample images in the context of various high-throughput design/make/test scenarios. This includes, but is not limited to, quantification of the disparity between microstructures for quality control, classifying microstructures, predicting materials properties from image data and identifying potential processing routes to engineer new materials with specific properties. Here, we consider microstructure classification and utilise the resulting features over a range of related machine learning tasks, namely supervised, semi-supervised, and unsupervised learning.
The approach is applied to two distinct datasets to illustrate various aspects and some recommendations are made based on the findings. In particular, methods that leverage transfer learning with convolutional neural networks (CNNs), pretrained on the ImageNet dataset, are generally shown to outperform other methods. Additionally, dimensionality reduction of these CNN-based fingerprints is shown to have negligible impact on classification accuracy for the supervised learning approaches considered. In situations where there is a large dataset with only a handful of images labelled, graph-based label propagation to unlabelled data is shown to be favourable over discarding unlabelled data and performing supervised learning. In particular, label propagation by Poisson learning is shown to be highly effective at low label rates.
△ Less
Submitted 22 January, 2024; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Multilevel Bayesian Deep Neural Networks
Authors:
Neil K. Chada,
Ajay Jasra,
Kody J. H. Law,
Sumeetpal S. Singh
Abstract:
In this article we consider Bayesian inference associated to deep neural networks (DNNs) and in particular, trace-class neural network (TNN) priors which were proposed by Sell et al. [39]. Such priors were developed as more robust alternatives to classical architectures in the context of inference problems. For this work we develop multilevel Monte Carlo (MLMC) methods for such models. MLMC is a p…
▽ More
In this article we consider Bayesian inference associated to deep neural networks (DNNs) and in particular, trace-class neural network (TNN) priors which were proposed by Sell et al. [39]. Such priors were developed as more robust alternatives to classical architectures in the context of inference problems. For this work we develop multilevel Monte Carlo (MLMC) methods for such models. MLMC is a popular variance reduction technique, with particular applications in Bayesian statistics and uncertainty quantification. We show how a particular advanced MLMC method that was introduced in [4] can be applied to Bayesian inference from DNNs and establish mathematically, that the computational cost to achieve a particular mean square error, associated to posterior expectation computation, can be reduced by several orders, versus more conventional techniques. To verify such results we provide numerous numerical experiments on model problems arising in machine learning. These include Bayesian regression, as well as Bayesian classification and reinforcement learning.
△ Less
Submitted 20 July, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Multi-index Sequential Monte Carlo ratio estimators for Bayesian Inverse problems
Authors:
Kody J. H. Law,
Neil Walton,
Shangda Yang,
Ajay Jasra
Abstract:
We consider the problem of estimating expectations with respect to a target distribution with an unknown normalizing constant, and where even the unnormalized target needs to be approximated at finite resolution. This setting is ubiquitous across science and engineering applications, for example in the context of Bayesian inference where a physics-based model governed by an intractable partial dif…
▽ More
We consider the problem of estimating expectations with respect to a target distribution with an unknown normalizing constant, and where even the unnormalized target needs to be approximated at finite resolution. This setting is ubiquitous across science and engineering applications, for example in the context of Bayesian inference where a physics-based model governed by an intractable partial differential equation (PDE) appears in the likelihood. A multi-index Sequential Monte Carlo (MISMC) method is used to construct ratio estimators which provably enjoy the complexity improvements of multi-index Monte Carlo (MIMC) as well as the efficiency of Sequential Monte Carlo (SMC) for inference. In particular, the proposed method provably achieves the canonical complexity of MSE$^{-1}$, while single level methods require MSE$^{-ξ}$ for $ξ>1$. This is illustrated on examples of Bayesian inverse problems with an elliptic PDE forward model in $1$ and $2$ spatial dimensions, where $ξ=5/4$ and $ξ=3/2$, respectively. It is also illustrated on a more challenging log Gaussian process models, where single level complexity is approximately $ξ=9/4$ and multilevel Monte Carlo (or MIMC with an inappropriate index set) gives $ξ= 5/4 + ω$, for any $ω> 0$, whereas our method is again canonical.
△ Less
Submitted 21 March, 2023; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Charged and CP-violating Kink Solutions in the Two Higgs Doublet Model
Authors:
Kai Hong Law,
Apostolos Pilaftsis
Abstract:
Minimal extensions of the Standard Model (SM), such as the so-called Two Higgs Doublet Model (2HDM), can possess several accidental discrete symmetries whose spontaneous breakdown in the early universe usually lead to the formation of domain walls. We extend an earlier work [arXiv:2006.13273] on this topic by studying in more detail the analytic properties of electrically charged and CP-violating…
▽ More
Minimal extensions of the Standard Model (SM), such as the so-called Two Higgs Doublet Model (2HDM), can possess several accidental discrete symmetries whose spontaneous breakdown in the early universe usually lead to the formation of domain walls. We extend an earlier work [arXiv:2006.13273] on this topic by studying in more detail the analytic properties of electrically charged and CP-violating kink solutions in the $Z_{2}$-symmetric 2HDM. We derive the complete set of equations of motion that describe the 1D spatial profile of both the 2HDM vacuum parameters and the would-be Goldstone bosons $G^{1,2,3}$ of the SM. These equations are then solved numerically using the gradient flow technique, and the results of our analysis are presented in different parametrizations of the Higgs doublets. In particular, we show analytically how an electrically charged profile should arise in 1D kink solutions when asymmetric boundary conditions are imposed on the Goldstone mode $G^{2}$ at spatial infinities, i.e. as ${x\to \pm \infty}$. If asymmetric boundary conditions are selected at ${x\to \pm \infty}$ for the Goldstone mode $G^{3}$ or the longitudinal mode $θ$ corresponding to a would-be massive photon, the derived kink solutions are then shown to exhibit CP violation. Possible cosmological implications of the electrically charged and CP-violating domain walls in the 2HDM are discussed.
△ Less
Submitted 25 February, 2022; v1 submitted 24 October, 2021;
originally announced October 2021.
-
Randomized multilevel Monte Carlo for embarrassingly parallel inference
Authors:
Ajay Jasra,
Kody J. H. Law,
Alexander Tarakanov,
Fangyuan Yu
Abstract:
This position paper summarizes a recently developed research program focused on inference in the context of data centric science and engineering applications, and forecasts its trajectory forward over the next decade. Often one endeavours in this context to learn complex systems in order to make more informed predictions and high stakes decisions under uncertainty. Some key challenges which must b…
▽ More
This position paper summarizes a recently developed research program focused on inference in the context of data centric science and engineering applications, and forecasts its trajectory forward over the next decade. Often one endeavours in this context to learn complex systems in order to make more informed predictions and high stakes decisions under uncertainty. Some key challenges which must be met in this context are robustness, generalizability, and interpretability. The Bayesian framework addresses these three challenges elegantly, while bringing with it a fourth, undesirable feature: it is typically far more expensive than its deterministic counterparts. In the 21st century, and increasingly over the past decade, a growing number of methods have emerged which allow one to leverage cheap low-fidelity models in order to precondition algorithms for performing inference with more expensive models and make Bayesian inference tractable in the context of high-dimensional and expensive models. Notable examples are multilevel Monte Carlo (MLMC), multi-index Monte Carlo (MIMC), and their randomized counterparts (rMLMC), which are able to provably achieve a dimension-independent (including $\infty-$dimension) canonical complexity rate with respect to mean squared error (MSE) of $1/$MSE. Some parallelizability is typically lost in an inference context, but recently this has been largely recovered via novel double randomization approaches. Such an approach delivers i.i.d. samples of quantities of interest which are unbiased with respect to the infinite resolution target distribution. Over the coming decade, this family of algorithms has the potential to transform data centric science and engineering, as well as classical machine learning applications such as deep learning, by scaling up and scaling out fully Bayesian inference.
△ Less
Submitted 3 December, 2021; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline
Authors:
Ankit Goyal,
Hei Law,
Bowei Liu,
Alejandro Newell,
Jia Deng
Abstract:
Processing point cloud data is an important component of many real-world systems. As such, a wide variety of point-based approaches have been proposed, reporting steady benchmark improvements over time. We study the key ingredients of this progress and uncover two critical results. First, we find that auxiliary factors like different evaluation schemes, data augmentation strategies, and loss funct…
▽ More
Processing point cloud data is an important component of many real-world systems. As such, a wide variety of point-based approaches have been proposed, reporting steady benchmark improvements over time. We study the key ingredients of this progress and uncover two critical results. First, we find that auxiliary factors like different evaluation schemes, data augmentation strategies, and loss functions, which are independent of the model architecture, make a large difference in performance. The differences are large enough that they obscure the effect of architecture. When these factors are controlled for, PointNet++, a relatively older network, performs competitively with recent methods. Second, a very simple projection-based method, which we refer to as SimpleView, performs surprisingly well. It achieves on par or better results than sophisticated state-of-the-art methods on ModelNet40 while being half the size of PointNet++. It also outperforms state-of-the-art methods on ScanObjectNN, a real-world point cloud benchmark, and demonstrates better cross-dataset generalization. Code is available at https://github.com/princeton-vl/SimpleView.
△ Less
Submitted 9 June, 2021;
originally announced June 2021.
-
Sparse online variational Bayesian regression
Authors:
Kody J. H. Law,
Vitaly Zankin
Abstract:
This work considers variational Bayesian inference as an inexpensive and scalable alternative to a fully Bayesian approach in the context of sparsity-promoting priors. In particular, the priors considered arise from scale mixtures of Normal distributions with a generalized inverse Gaussian mixing distribution. This includes the variational Bayesian LASSO as an inexpensive and scalable alternative…
▽ More
This work considers variational Bayesian inference as an inexpensive and scalable alternative to a fully Bayesian approach in the context of sparsity-promoting priors. In particular, the priors considered arise from scale mixtures of Normal distributions with a generalized inverse Gaussian mixing distribution. This includes the variational Bayesian LASSO as an inexpensive and scalable alternative to the Bayesian LASSO introduced in [65]. It also includes a family of priors which more strongly promote sparsity. For linear models the method requires only the iterative solution of deterministic least squares problems. Furthermore, for p unknown covariates the method can be implemented exactly online with a cost of $O(p^3)$ in computation and $O(p^2)$ in memory per iteration -- in other words, the cost per iteration is independent of n, and in principle infinite data can be considered. For large $p$ an approximation is able to achieve promising results for a cost of $O(p)$ per iteration, in both computation and memory. Strategies for hyper-parameter tuning are also considered. The method is implemented for real and simulated data. It is shown that the performance in terms of variable selection and uncertainty quantification of the variational Bayesian LASSO can be comparable to the Bayesian LASSO for problems which are tractable with that method, and for a fraction of the cost. The present method comfortably handles $n = 65536$, $p = 131073$ on a laptop in less than 30 minutes, and $n = 10^5$, $p = 2.1 \times 10^6$ overnight.
△ Less
Submitted 22 December, 2021; v1 submitted 24 February, 2021;
originally announced February 2021.
-
On Unbiased Estimation for Discretized Models
Authors:
Jeremy Heng,
Ajay Jasra,
Kody J. H. Law,
Alexander Tarakanov
Abstract:
In this article, we consider computing expectations w.r.t. probability measures which are subject to discretization error. Examples include partially observed diffusion processes or inverse problems, where one may have to discretize time and/or space, in order to practically work with the probability of interest. Given access only to these discretizations, we consider the construction of unbiased…
▽ More
In this article, we consider computing expectations w.r.t. probability measures which are subject to discretization error. Examples include partially observed diffusion processes or inverse problems, where one may have to discretize time and/or space, in order to practically work with the probability of interest. Given access only to these discretizations, we consider the construction of unbiased Monte Carlo estimators of expectations w.r.t. such target probability distributions. It is shown how to obtain such estimators using a novel adaptation of randomization schemes and Markov simulation methods. Under appropriate assumptions, these estimators possess finite variance and finite expected cost. There are two important consequences of this approach: (i) unbiased inference is achieved at the canonical complexity rate, and (ii) the resulting estimators can be generated independently, thereby allowing strong scaling to arbitrarily many parallel processors. Several algorithms are presented, and applied to some examples of Bayesian inference problems, with both simulated and real observed data.
△ Less
Submitted 24 February, 2021;
originally announced February 2021.
-
Automatic Volumetric Segmentation of Additive Manufacturing Defects with 3D U-Net
Authors:
Vivian Wen Hui Wong,
Max Ferguson,
Kincho H. Law,
Yung-Tsun Tina Lee,
Paul Witherell
Abstract:
Segmentation of additive manufacturing (AM) defects in X-ray Computed Tomography (XCT) images is challenging, due to the poor contrast, small sizes and variation in appearance of defects. Automatic segmentation can, however, provide quality control for additive manufacturing. Over recent years, three-dimensional convolutional neural networks (3D CNNs) have performed well in the volumetric segmenta…
▽ More
Segmentation of additive manufacturing (AM) defects in X-ray Computed Tomography (XCT) images is challenging, due to the poor contrast, small sizes and variation in appearance of defects. Automatic segmentation can, however, provide quality control for additive manufacturing. Over recent years, three-dimensional convolutional neural networks (3D CNNs) have performed well in the volumetric segmentation of medical images. In this work, we leverage techniques from the medical imaging domain and propose training a 3D U-Net model to automatically segment defects in XCT images of AM samples. This work not only contributes to the use of machine learning for AM defect detection but also demonstrates for the first time 3D volumetric segmentation in AM. We train and test with three variants of the 3D U-Net on an AM dataset, achieving a mean intersection of union (IOU) value of 88.4%.
△ Less
Submitted 22 January, 2021;
originally announced January 2021.
-
Materials Fingerprinting Classification
Authors:
Adam Spannaus,
Kody J. H. Law,
Piotr Luszczek,
Farzana Nasrin,
Cassie Putman Micucci,
Peter K. Liaw,
Louis J. Santodonato,
David J. Keffer,
Vasileios Maroulas
Abstract:
Significant progress in many classes of materials could be made with the availability of experimentally-derived large datasets composed of atomic identities and three-dimensional coordinates. Methods for visualizing the local atomic structure, such as atom probe tomography (APT), which routinely generate datasets comprised of millions of atoms, are an important step in realizing this goal. However…
▽ More
Significant progress in many classes of materials could be made with the availability of experimentally-derived large datasets composed of atomic identities and three-dimensional coordinates. Methods for visualizing the local atomic structure, such as atom probe tomography (APT), which routinely generate datasets comprised of millions of atoms, are an important step in realizing this goal. However, state-of-the-art APT instruments generate noisy and sparse datasets that provide information about elemental type, but obscure atomic structures, thus limiting their subsequent value for materials discovery. The application of a materials fingerprinting process, a machine learning algorithm coupled with topological data analysis, provides an avenue by which here-to-fore unprecedented structural information can be extracted from an APT dataset. As a proof of concept, the material fingerprint is applied to high-entropy alloy APT datasets containing body-centered cubic (BCC) and face-centered cubic (FCC) crystal structures. A local atomic configuration centered on an arbitrary atom is assigned a topological descriptor, with which it can be characterized as a BCC or FCC lattice with near perfect accuracy, despite the inherent noise in the dataset. This successful identification of a fingerprint is a crucial first step in the development of algorithms which can extract more nuanced information, such as chemical ordering, from existing datasets of complex materials.
△ Less
Submitted 14 January, 2021;
originally announced January 2021.
-
A Bayesian analysis of classical shadows
Authors:
Joseph M. Lukens,
Kody J. H. Law,
Ryan S. Bennink
Abstract:
The method of classical shadows heralds unprecedented opportunities for quantum estimation with limited measurements [H.-Y. Huang, R. Kueng, and J. Preskill, Nat. Phys. 16, 1050 (2020)]. Yet its relationship to established quantum tomographic approaches, particularly those based on likelihood models, remains unclear. In this article, we investigate classical shadows through the lens of Bayesian me…
▽ More
The method of classical shadows heralds unprecedented opportunities for quantum estimation with limited measurements [H.-Y. Huang, R. Kueng, and J. Preskill, Nat. Phys. 16, 1050 (2020)]. Yet its relationship to established quantum tomographic approaches, particularly those based on likelihood models, remains unclear. In this article, we investigate classical shadows through the lens of Bayesian mean estimation (BME). In direct tests on numerical data, BME is found to attain significantly lower error on average, but classical shadows prove remarkably more accurate in specific situations -- such as high-fidelity ground truth states -- which are improbable in a fully uniform Hilbert space. We then introduce an observable-oriented pseudo-likelihood that successfully emulates the dimension-independence and state-specific optimality of classical shadows, but within a Bayesian framework that ensures only physical states. Our research reveals how classical shadows effect important departures from conventional thinking in quantum state estimation, as well as the utility of Bayesian methods for uncovering and formalizing statistical assumptions.
△ Less
Submitted 16 December, 2020;
originally announced December 2020.
-
Quasiconformal model with CNN features for large deformation image registration
Authors:
Ho Law,
Gary P. T. Choi,
Ka Chun Lam,
Lok Ming Lui
Abstract:
Image registration has been widely studied over the past several decades, with numerous applications in science, engineering and medicine. Most of the conventional mathematical models for large deformation image registration rely on prescribed landmarks, which usually require tedious manual labeling and are prone to error. In recent years, there has been a surge of interest in the use of machine l…
▽ More
Image registration has been widely studied over the past several decades, with numerous applications in science, engineering and medicine. Most of the conventional mathematical models for large deformation image registration rely on prescribed landmarks, which usually require tedious manual labeling and are prone to error. In recent years, there has been a surge of interest in the use of machine learning for image registration. In this paper, we develop a novel method for large deformation image registration by a fusion of quasiconformal theory and convolutional neural network (CNN). More specifically, we propose a quasiconformal energy model with a novel fidelity term that incorporates the features extracted using a pre-trained CNN, thereby allowing us to obtain meaningful registration results without any guidance of prescribed landmarks. Moreover, unlike many prior image registration methods, the bijectivity of our method is guaranteed by quasiconformal theory. Experimental results are presented to demonstrate the effectiveness of the proposed method. More broadly, our work sheds light on how rigorous mathematical theories and practical machine learning approaches can be integrated for develo** computational methods with improved performance.
△ Less
Submitted 28 June, 2021; v1 submitted 30 October, 2020;
originally announced November 2020.
-
Decomposition of Longitudinal Deformations via Beltrami Descriptors
Authors:
Ho Law,
Lok Ming Lui,
Chun Yin Siu
Abstract:
We present a mathematical model to decompose a longitudinal deformation into normal and abnormal components. The goal is to detect and extract subtle quivers from periodic motions in a video sequence. It has important applications in medical image analysis. To achieve this goal, we consider a representation of the longitudinal deformation, called the Beltrami descriptor, based on quasiconformal th…
▽ More
We present a mathematical model to decompose a longitudinal deformation into normal and abnormal components. The goal is to detect and extract subtle quivers from periodic motions in a video sequence. It has important applications in medical image analysis. To achieve this goal, we consider a representation of the longitudinal deformation, called the Beltrami descriptor, based on quasiconformal theories. The Beltrami descriptor is a complex-valued matrix. Each longitudinal deformation is associated to a Beltrami descriptor and vice versa. To decompose the longitudinal deformation, we propose to carry out the low rank and sparse decomposition of the Beltrami descriptor. The low rank component corresponds to the periodic motions, whereas the sparse part corresponds to the abnormal motions of a longitudinal deformation. Experiments have been carried out on both synthetic and real video sequences. Results demonstrate the efficacy of our proposed model to decompose a longitudinal deformation into regular and irregular components.
△ Less
Submitted 30 March, 2021; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Unbiased Estimation of the Gradient of the Log-Likelihood in Inverse Problems
Authors:
Ajay Jasra,
Kody J. H. Law,
Deng Lu
Abstract:
We consider the problem of estimating a parameter associated to a Bayesian inverse problem. Treating the unknown initial condition as a nuisance parameter, typically one must resort to a numerical approximation of gradient of the log-likelihood and also adopt a discretization of the problem in space and/or time. We develop a new methodology to unbiasedly estimate the gradient of the log-likelihood…
▽ More
We consider the problem of estimating a parameter associated to a Bayesian inverse problem. Treating the unknown initial condition as a nuisance parameter, typically one must resort to a numerical approximation of gradient of the log-likelihood and also adopt a discretization of the problem in space and/or time. We develop a new methodology to unbiasedly estimate the gradient of the log-likelihood with respect to the unknown parameter, i.e. the expectation of the estimate has no discretization bias. Such a property is not only useful for estimation in terms of the original stochastic model of interest, but can be used in stochastic gradient algorithms which benefit from unbiased estimates. Under appropriate assumptions, we prove that our estimator is not only unbiased but of finite variance. In addition, when implemented on a single processor, we show that the cost to achieve a given level of error is comparable to multilevel Monte Carlo methods, both practically and theoretically. However, the new algorithm provides the possibility for parallel computation on arbitrarily many processors without any loss of efficiency, asymptotically. In practice, this means any precision can be achieved in a fixed, finite constant time, provided that enough processors are available.
△ Less
Submitted 16 March, 2020; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Reconstruction of effective potential from statistical analysis of dynamic trajectories
Authors:
Ali Yousefzadi Nobakht,
Ondrej Dyck,
David B. Lingerfelt,
Feng Bao,
Maxim Ziatdinov,
Artem Maksov,
Bobby G. Sumpter,
Richard Archibald,
Stephen Jesse,
Sergei V. Kalinin,
Kody J. H. Law
Abstract:
The broad incorporation of microscopic methods is yielding a wealth of information on atomic and mesoscale dynamics of individual atoms, molecules, and particles on surfaces and in open volumes. Analysis of such data necessitates statistical frameworks to convert observed dynamic behaviors to effective properties of materials. Here we develop a method for stochastic reconstruction of effective act…
▽ More
The broad incorporation of microscopic methods is yielding a wealth of information on atomic and mesoscale dynamics of individual atoms, molecules, and particles on surfaces and in open volumes. Analysis of such data necessitates statistical frameworks to convert observed dynamic behaviors to effective properties of materials. Here we develop a method for stochastic reconstruction of effective acting potentials from observed trajectories. Using the Silicon vacancy defect in graphene as a model, we develop a statistical framework to reconstruct the free energy landscape from calculated atomic displacements.
△ Less
Submitted 27 February, 2020;
originally announced February 2020.
-
A practical and efficient approach for Bayesian quantum state estimation
Authors:
Joseph M. Lukens,
Kody J. H. Law,
Ajay Jasra,
Pavel Lougovski
Abstract:
Bayesian inference is a powerful paradigm for quantum state tomography, treating uncertainty in meaningful and informative ways. Yet the numerical challenges associated with sampling from complex probability distributions hampers Bayesian tomography in practical settings. In this Article, we introduce an improved, self-contained approach for Bayesian quantum state estimation. Leveraging advances i…
▽ More
Bayesian inference is a powerful paradigm for quantum state tomography, treating uncertainty in meaningful and informative ways. Yet the numerical challenges associated with sampling from complex probability distributions hampers Bayesian tomography in practical settings. In this Article, we introduce an improved, self-contained approach for Bayesian quantum state estimation. Leveraging advances in machine learning and statistics, our formulation relies on highly efficient preconditioned Crank--Nicolson sampling and a pseudo-likelihood. We theoretically analyze the computational cost, and provide explicit examples of inference for both actual and simulated datasets, illustrating improved performance with respect to existing approaches.
△ Less
Submitted 24 February, 2020;
originally announced February 2020.
-
Building Information Modeling and Classification by Visual Learning At A City Scale
Authors:
Qian Yu,
Chaofeng Wang,
Barbaros Cetiner,
Stella X. Yu,
Frank Mckenna,
Ertugrul Taciroglu,
Kincho H. Law
Abstract:
In this paper, we provide two case studies to demonstrate how artificial intelligence can empower civil engineering. In the first case, a machine learning-assisted framework, BRAILS, is proposed for city-scale building information modeling. Building information modeling (BIM) is an efficient way of describing buildings, which is essential to architecture, engineering, and construction. Our propose…
▽ More
In this paper, we provide two case studies to demonstrate how artificial intelligence can empower civil engineering. In the first case, a machine learning-assisted framework, BRAILS, is proposed for city-scale building information modeling. Building information modeling (BIM) is an efficient way of describing buildings, which is essential to architecture, engineering, and construction. Our proposed framework employs deep learning technique to extract visual information of buildings from satellite/street view images. Further, a novel machine learning (ML)-based statistical tool, SURF, is proposed to discover the spatial patterns in building metadata.
The second case focuses on the task of soft-story building classification. Soft-story buildings are a type of buildings prone to collapse during a moderate or severe earthquake. Hence, identifying and retrofitting such buildings is vital in the current earthquake preparedness efforts. For this task, we propose an automated deep learning-based procedure for identifying soft-story buildings from street view images at a regional scale. We also create a large-scale building image database and a semi-automated image labeling approach that effectively annotates new database entries. Through extensive computational experiments, we demonstrate the effectiveness of the proposed method.
△ Less
Submitted 20 July, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Cluster, Classify, Regress: A General Method For Learning Discountinous Functions
Authors:
David E. Bernholdt,
Mark R. Cianciosa,
Clement Etienam,
David L. Green,
Kody J. H. Law,
J. M. Park
Abstract:
This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for eac…
▽ More
This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. It has not yet been proposed to combine these 3 fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology is illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.
△ Less
Submitted 16 May, 2019; v1 submitted 15 May, 2019;
originally announced May 2019.
-
CornerNet-Lite: Efficient Keypoint Based Object Detection
Authors:
Hei Law,
Yun Teng,
Olga Russakovsky,
Jia Deng
Abstract:
Keypoint-based methods are a relatively new paradigm in object detection, eliminating the need for anchor boxes and offering a simplified detection framework. Keypoint-based CornerNet achieves state of the art accuracy among single-stage detectors. However, this accuracy comes at high processing cost. In this work, we tackle the problem of efficient keypoint-based object detection and introduce Co…
▽ More
Keypoint-based methods are a relatively new paradigm in object detection, eliminating the need for anchor boxes and offering a simplified detection framework. Keypoint-based CornerNet achieves state of the art accuracy among single-stage detectors. However, this accuracy comes at high processing cost. In this work, we tackle the problem of efficient keypoint-based object detection and introduce CornerNet-Lite. CornerNet-Lite is a combination of two efficient variants of CornerNet: CornerNet-Saccade, which uses an attention mechanism to eliminate the need for exhaustively processing all pixels of the image, and CornerNet-Squeeze, which introduces a new compact backbone architecture. Together these two variants address the two critical use cases in efficient object detection: improving efficiency without sacrificing accuracy, and improving accuracy at real-time efficiency. CornerNet-Saccade is suitable for offline processing, improving the efficiency of CornerNet by 6.0x and the AP by 1.0% on COCO. CornerNet-Squeeze is suitable for real-time detection, improving both the efficiency and accuracy of the popular real-time detector YOLOv3 (34.4% AP at 30ms for CornerNet-Squeeze compared to 33.0% AP at 39ms for YOLOv3 on COCO). Together these contributions for the first time reveal the potential of keypoint-based detection to be useful for applications requiring processing efficiency.
△ Less
Submitted 16 September, 2020; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Estimation and uncertainty quantification for the output from quantum simulators
Authors:
Ryan Bennink,
Ajay Jasra,
Kody J. H. Law,
Pavel Lougovski
Abstract:
The problem of estimating certain distributions over $\{0,1\}^d$ is considered here. The distribution represents a quantum system of $d$ qubits, where there are non-trivial dependencies between the qubits. A maximum entropy approach is adopted to reconstruct the distribution from exact moments or observed empirical moments. The Robbins Monro algorithm is used to solve the intractable maximum entro…
▽ More
The problem of estimating certain distributions over $\{0,1\}^d$ is considered here. The distribution represents a quantum system of $d$ qubits, where there are non-trivial dependencies between the qubits. A maximum entropy approach is adopted to reconstruct the distribution from exact moments or observed empirical moments. The Robbins Monro algorithm is used to solve the intractable maximum entropy problem, by constructing an unbiased estimator of the un-normalized target with a sequential Monte Carlo sampler at each iteration. In the case of empirical moments, this coincides with a maximum likelihood estimator. A Bayesian formulation is also considered in order to quantify posterior uncertainty. Several approaches are proposed in order to tackle this challenging problem, based on recently developed methodologies. In particular, unbiased estimators of the gradient of the log posterior are constructed and used within a provably convergent Langevin-based Markov chain Monte Carlo method. The methods are illustrated on classically simulated output from quantum simulators.
△ Less
Submitted 7 March, 2019;
originally announced March 2019.
-
Bayesian Point Set Registration
Authors:
Adam Spannaus,
Vasileios Maroulas,
David J. Keffer,
Kody J. H. Law
Abstract:
Point set registration involves identifying a smooth invertible transformation between corresponding points in two point sets, one of which may be smaller than the other and possibly corrupted by observation noise. This problem is traditionally decomposed into two separate optimization problems: (i) assignment or correspondence, and (ii) identification of the optimal transformation between the ord…
▽ More
Point set registration involves identifying a smooth invertible transformation between corresponding points in two point sets, one of which may be smaller than the other and possibly corrupted by observation noise. This problem is traditionally decomposed into two separate optimization problems: (i) assignment or correspondence, and (ii) identification of the optimal transformation between the ordered point sets. In this work, we propose an approach solving both problems simultaneously. In particular, a coherent Bayesian formulation of the problem results in a marginal posterior distribution on the transformation, which is explored within a Markov chain Monte Carlo scheme. Motivated by Atomic Probe Tomography (APT), in the context of structure inference for high entropy alloys (HEA), we focus on the registration of noisy sparse observations of rigid transformations of a known reference configuration.Lastly, we test our method on synthetic data sets.
△ Less
Submitted 23 December, 2018;
originally announced December 2018.
-
Hyperparameter Learning via Distributional Transfer
Authors:
Ho Chung Leon Law,
Peilin Zhao,
Lucian Chan,
Junzhou Huang,
Dino Sejdinovic
Abstract:
Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of training datasets used in those tasks. This results in a joint Gaussian process model on hyperparameters and data representations. Representations…
▽ More
Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of training datasets used in those tasks. This results in a joint Gaussian process model on hyperparameters and data representations. Representations make use of the framework of distribution embeddings into reproducing kernel Hilbert spaces. The developed method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective.
△ Less
Submitted 26 May, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.
-
Detection and Segmentation of Manufacturing Defects with Convolutional Neural Networks and Transfer Learning
Authors:
Max Ferguson,
Ronay Ak,
Yung-Tsun Tina Lee,
Kincho H. Law
Abstract:
Quality control is a fundamental component of many manufacturing processes, especially those involving casting or welding. However, manual quality control procedures are often time-consuming and error-prone. In order to meet the growing demand for high-quality products, the use of intelligent visual inspection systems is becoming essential in production lines. Recently, Convolutional Neural Networ…
▽ More
Quality control is a fundamental component of many manufacturing processes, especially those involving casting or welding. However, manual quality control procedures are often time-consuming and error-prone. In order to meet the growing demand for high-quality products, the use of intelligent visual inspection systems is becoming essential in production lines. Recently, Convolutional Neural Networks (CNNs) have shown outstanding performance in both image classification and localization tasks. In this article, a system is proposed for the identification of casting defects in X-ray images, based on the Mask Region-based CNN architecture. The proposed defect detection system simultaneously performs defect detection and segmentation on input images, making it suitable for a range of defect detection tasks. It is shown that training the network to simultaneously perform defect detection and defect instance segmentation, results in a higher defect detection accuracy than training on defect detection alone. Transfer learning is leveraged to reduce the training data demands and increase the prediction accuracy of the trained model. More specifically, the model is first trained with two large openly-available image datasets before finetuning on a relatively small metal casting X-ray dataset. The accuracy of the trained model exceeds state-of-the art performance on the GRIMA database of X-ray images (GDXray) Castings dataset and is fast enough to be used in a production setting. The system also performs well on the GDXray Welds dataset. A number of in-depth studies are conducted to explore how transfer learning, multi-task learning, and multi-class learning influence the performance of the trained system.
△ Less
Submitted 2 September, 2018; v1 submitted 7 August, 2018;
originally announced August 2018.
-
CornerNet: Detecting Objects as Paired Keypoints
Authors:
Hei Law,
Jia Deng
Abstract:
We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we…
▽ More
We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
△ Less
Submitted 18 March, 2019; v1 submitted 3 August, 2018;
originally announced August 2018.
-
A Differentially Private Kernel Two-Sample Test
Authors:
Anant Raj,
Ho Chung Leon Law,
Dino Sejdinovic,
Mijung Park
Abstract:
Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new frame…
▽ More
Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite dimensional approximations to those representations. As a result, a simple chi-squared test is obtained, where a test statistic depends on a mean and covariance of empirical differences between the samples, which we perturb for a privacy guarantee. We investigate the utility of our framework in two realistic settings and conclude that our method requires only a relatively modest increase in sample size to achieve a similar level of power to the non-private tests in both settings.
△ Less
Submitted 1 August, 2018;
originally announced August 2018.
-
Unbiased inference for discretely observed hidden Markov model diffusions
Authors:
Neil K. Chada,
Jordan Franks,
Ajay Jasra,
Kody J. H. Law,
Matti Vihola
Abstract:
We develop a Bayesian inference method for diffusions observed discretely and with noise, which is free of discretisation bias. Unlike existing unbiased inference methods, our method does not rely on exact simulation techniques. Instead, our method uses standard time-discretised approximations of diffusions, such as the Euler--Maruyama scheme. Our approach is based on particle marginal Metropolis-…
▽ More
We develop a Bayesian inference method for diffusions observed discretely and with noise, which is free of discretisation bias. Unlike existing unbiased inference methods, our method does not rely on exact simulation techniques. Instead, our method uses standard time-discretised approximations of diffusions, such as the Euler--Maruyama scheme. Our approach is based on particle marginal Metropolis--Hastings, a particle filter, randomised multilevel Monte Carlo, and importance sampling type correction of approximate Markov chain Monte Carlo. The resulting estimator leads to inference without a bias from the time-discretisation as the number of Markov chain iterations increases. We give convergence results and recommend allocations for algorithm inputs. Our method admits a straightforward parallelisation, and can be computationally efficient. The user-friendly approach is illustrated on three examples, where the underlying diffusion is an Ornstein--Uhlenbeck process, a geometric Brownian motion, and a 2d non-reversible Langevin equation.
△ Less
Submitted 9 March, 2021; v1 submitted 26 July, 2018;
originally announced July 2018.
-
Variational Learning on Aggregate Outputs with Gaussian Processes
Authors:
Ho Chung Leon Law,
Dino Sejdinovic,
Ewan Cameron,
Tim CD Lucas,
Seth Flaxman,
Katherine Battle,
Kenji Fukumizu
Abstract:
While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global map** of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on varia…
▽ More
While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global map** of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.
△ Less
Submitted 22 May, 2018;
originally announced May 2018.
-
Multi-Index Sequential Monte Carlo Methods for partially observed Stochastic Partial Differential Equations
Authors:
Yaxian Xu,
Ajay Jasra,
Kody J. H. Law
Abstract:
In this paper we consider sequential joint state and static parameter estimation given discrete time observations associated to a partially observed stochastic partial differential equation (SPDE). It is assumed that one can only estimate the hidden state using a discretization of the model. In this context, it is known that the multi-index Monte Carlo (MIMC) method of [11] can be used to improve…
▽ More
In this paper we consider sequential joint state and static parameter estimation given discrete time observations associated to a partially observed stochastic partial differential equation (SPDE). It is assumed that one can only estimate the hidden state using a discretization of the model. In this context, it is known that the multi-index Monte Carlo (MIMC) method of [11] can be used to improve over direct Monte Carlo from the most precise discretizaton. However, in the context of interest, it cannot be directly applied, but rather must be used within another advanced method such as sequential Monte Carlo (SMC). We show how one can use the MIMC method by renormalizing the MI identity and approximating the resulting identity using the SMC$^2$ method of [5]. We prove that our approach can reduce the cost to obtain a given mean square error (MSE), relative to just using SMC$^2$ on the most precise discretization. We demonstrate this with some numerical examples.
△ Less
Submitted 10 September, 2020; v1 submitted 1 May, 2018;
originally announced May 2018.
-
Multilevel Particle Filters for Lévy-driven stochastic differential equations
Authors:
Ajay Jasra,
Kody J. H. Law,
Prince Peprah Osei
Abstract:
We develop algorithms for computing expectations of the laws of models associated to stochastic differential equations (SDEs) driven by pure Lévy processes. We consider filtering such processes and well as pricing of path dependent options. We propose a multilevel particle filter (MLPF) to address the computational issues involved in solving these continuum problems. We show via numerical simulati…
▽ More
We develop algorithms for computing expectations of the laws of models associated to stochastic differential equations (SDEs) driven by pure Lévy processes. We consider filtering such processes and well as pricing of path dependent options. We propose a multilevel particle filter (MLPF) to address the computational issues involved in solving these continuum problems. We show via numerical simulations and theoretical results that under suitable assumptions of the discretization of the underlying driving Lévy proccess, our proposed method achieves optimal convergence rates. The cost to obtain MSE $O(ε^2)$ scales like $O(ε^{-2})$ for our method, as compared with the standard particle filter $O(ε^{-3})$.
△ Less
Submitted 11 July, 2018; v1 submitted 12 April, 2018;
originally announced April 2018.
-
Learning Robust and Adaptive Real-World Continuous Control Using Simulation and Transfer Learning
Authors:
M Ferguson,
K. H. Law
Abstract:
We use model-free reinforcement learning, extensive simulation, and transfer learning to develop a continuous control algorithm that has good zero-shot performance in a real physical environment. We train a simulated agent to act optimally across a set of similar environments, each with dynamics drawn from a prior distribution. We propose that the agent is able to adjust its actions almost immedia…
▽ More
We use model-free reinforcement learning, extensive simulation, and transfer learning to develop a continuous control algorithm that has good zero-shot performance in a real physical environment. We train a simulated agent to act optimally across a set of similar environments, each with dynamics drawn from a prior distribution. We propose that the agent is able to adjust its actions almost immediately, based on small set of observations. This robust and adaptive behavior is enabled by using a policy gradient algorithm with an Long Short Term Memory (LSTM) function approximation. Finally, we train an agent to navigate a two-dimensional environment with uncertain dynamics and noisy observations. We demonstrate that this agent has good zero-shot performance in a real physical environment. Our preliminary results indicate that the agent is able to infer the environmental dynamics after only a few timesteps, and adjust its actions accordingly.
△ Less
Submitted 8 March, 2018; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Multilevel ensemble Kalman filtering for spatio-temporal processes
Authors:
Alexey Chernov,
Håkon Hoel,
Kody J. H. Law,
Fabio Nobile,
Raul Tempone
Abstract:
We design and analyse the performance of a multilevel ensemble Kalman filter method (MLEnKF) for filtering settings where the underlying state-space model is an infinite-dimensional spatio-temporal process. We consider underlying models that needs to be simulated by numerical methods, with discretization in both space and time. The multilevel Monte Carlo (MLMC) sampling strategy, achieving varianc…
▽ More
We design and analyse the performance of a multilevel ensemble Kalman filter method (MLEnKF) for filtering settings where the underlying state-space model is an infinite-dimensional spatio-temporal process. We consider underlying models that needs to be simulated by numerical methods, with discretization in both space and time. The multilevel Monte Carlo (MLMC) sampling strategy, achieving variance reduction through pairwise coupling of ensemble particles on neighboring resolutions, is used in the sample-moment step of MLEnKF to produce an efficient hierarchical filtering method for spatio-temporal models. Under sufficient regularity, MLEnKF is proven to be more efficient for weak approximations than EnKF, asymptotically in the large-ensemble and fine-numerical-resolution limit. Numerical examples support our theoretical findings.
△ Less
Submitted 10 March, 2020; v1 submitted 19 October, 2017;
originally announced October 2017.
-
Bayesian Approaches to Distribution Regression
Authors:
Ho Chung Leon Law,
Danica J. Sutherland,
Dino Sejdinovic,
Seth Flaxman
Abstract:
Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally…
▽ More
Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.
△ Less
Submitted 14 January, 2021; v1 submitted 11 May, 2017;
originally announced May 2017.
-
Testing and Learning on Distributions with Symmetric Noise Invariance
Authors:
Ho Chung Leon Law,
Christopher Yau,
Dino Sejdinovic
Abstract:
Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rarely that all possible differences between samples are of interest -- discovered differences can be due to different types of measurement noise, data collection artefacts…
▽ More
Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rarely that all possible differences between samples are of interest -- discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise.
△ Less
Submitted 5 November, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Bayesian Static Parameter Estimation for Partially Observed Diffusions via Multilevel Monte Carlo
Authors:
Ajay Jasra,
Kengo Kamatani,
Kody J. H. Law,
Yan Zhou
Abstract:
In this article we consider static Bayesian parameter estimation for partially observed diffusions that are discretely observed. We work under the assumption that one must resort to discretizing the underlying diffusion process, for instance using the Euler-Maruyama method. Given this assumption, we show how one can use Markov chain Monte Carlo (MCMC) and particularly particle MCMC [Andrieu, C., D…
▽ More
In this article we consider static Bayesian parameter estimation for partially observed diffusions that are discretely observed. We work under the assumption that one must resort to discretizing the underlying diffusion process, for instance using the Euler-Maruyama method. Given this assumption, we show how one can use Markov chain Monte Carlo (MCMC) and particularly particle MCMC [Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods (with discussion). J. R. Statist. Soc. Ser. B, 72, 269--342] to implement a new approximation of the multilevel (ML) Monte Carlo (MC) collapsing sum identity. Our approach comprises constructing an approximate coupling of the posterior density of the joint distribution over parameter and hidden variables at two different discretization levels and then correcting by an importance sampling method. The variance of the weights are independent of the length of the observed data set. The utility of such a method is that, for a prescribed level of mean square error, the cost of this MLMC method is provably less than i.i.d. sampling from the posterior associated to the most precise discretization. However the method here comprises using only known and efficient simulation methodologies. The theoretical results are illustrated by inference of the parameters of two prototypical processes given noisy partial observations of the process: the first is an Ornstein Uhlenbeck process and the second is a more general Langevin equation.
△ Less
Submitted 20 January, 2017;
originally announced January 2017.
-
Multilevel particle filter
Authors:
Ajay Jasra,
Kengo Kamatani,
Kody J. H. Law,
Yan Zhou
Abstract:
In this paper the filtering of partially observed diffusions, with discrete-time observations, is considered. It is assumed that only biased approximations of the diffusion can be obtained, for choice of an accuracy parameter indexed by $l$. A multilevel estimator is proposed, consisting of a telescopic sum of increment estimators associated to the successive levels. The work associated to…
▽ More
In this paper the filtering of partially observed diffusions, with discrete-time observations, is considered. It is assumed that only biased approximations of the diffusion can be obtained, for choice of an accuracy parameter indexed by $l$. A multilevel estimator is proposed, consisting of a telescopic sum of increment estimators associated to the successive levels. The work associated to $\mathcal{O}(\varepsilon^2)$ mean-square error between the multilevel estimator and average with respect to the filtering distribution is shown to scale optimally, for example as $\mathcal{O}(\varepsilon^{-2})$ for optimal rates of convergence of the underlying diffusion approximation. The method is illustrated on some toy examples as well as estimation of interest rate based on real S&P 500 stock price data.
△ Less
Submitted 16 October, 2015;
originally announced October 2015.
-
Data Assimilation: A Mathematical Introduction
Authors:
K. J. H. Law,
A. M. Stuart,
K. C. Zygalakis
Abstract:
These notes provide a systematic mathematical treatment of the subject of data assimilation.
These notes provide a systematic mathematical treatment of the subject of data assimilation.
△ Less
Submitted 25 June, 2015;
originally announced June 2015.
-
Accelerated dimension-independent adaptive Metropolis
Authors:
Yuxin Chen,
David Keyes,
Kody J. H. Law,
Hatem Ltaief
Abstract:
This work considers black-box Bayesian inference over high-dimensional parameter spaces. The well-known adaptive Metropolis (AM) algorithm of (Haario etal. 2001) is extended herein to scale asymptotically uniformly with respect to the underlying parameter dimension for Gaussian targets, by respecting the variance of the target. The resulting algorithm, referred to as the dimension-independent adap…
▽ More
This work considers black-box Bayesian inference over high-dimensional parameter spaces. The well-known adaptive Metropolis (AM) algorithm of (Haario etal. 2001) is extended herein to scale asymptotically uniformly with respect to the underlying parameter dimension for Gaussian targets, by respecting the variance of the target. The resulting algorithm, referred to as the dimension-independent adaptive Metropolis (DIAM) algorithm, also shows improved performance with respect to adaptive Metropolis on non-Gaussian targets. This algorithm is further improved, and the possibility of probing high-dimensional targets is enabled, via GPU-accelerated numerical libraries and periodically synchronized concurrent chains (justified a posteriori). Asymptotically in dimension, this GPU implementation exhibits a factor of four improvement versus a competitive CPU-based Intel MKL parallel version alone. Strong scaling to concurrent chains is exhibited, through a combination of longer time per sample batch (weak scaling) and yet fewer necessary samples to convergence. The algorithm performance is illustrated on several Gaussian and non-Gaussian target examples, in which the dimension may be in excess of one thousand.
△ Less
Submitted 18 June, 2015;
originally announced June 2015.
-
Multilevel ensemble Kalman filtering
Authors:
Håkon Hoel,
Kody J. H. Law,
Raul Tempone
Abstract:
This work embeds a multilevel Monte Carlo sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel numerical integration of that SDE.…
▽ More
This work embeds a multilevel Monte Carlo sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel numerical integration of that SDE. The resulting multilevel EnKF is proved to asymptotically outperform EnKF in terms of computational cost versus approximation accuracy. The theoretical results are illustrated numerically.
△ Less
Submitted 28 June, 2016; v1 submitted 21 February, 2015;
originally announced February 2015.
-
Dimension-independent likelihood-informed MCMC
Authors:
Tiangang Cui,
Kody J. H. Law,
Youssef M. Marzouk
Abstract:
Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. This work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. Two distinct lines of research intersect in the methods developed her…
▽ More
Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. This work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. Two distinct lines of research intersect in the methods developed here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian information and any associated low-dimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent, likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Two nonlinear inverse problems are used to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.
△ Less
Submitted 17 October, 2015; v1 submitted 13 November, 2014;
originally announced November 2014.
-
Controlling Unpredictability with Observations in the Partially Observed Lorenz '96 Model
Authors:
K. J. H. Law,
D. Sanz-Alonso,
A. Shukla,
A. M. Stuart
Abstract:
In the context of filtering chaotic dynamical systems it is well-known that partial observations, if sufficiently informative, can be used to control the inherent uncertainty due to chaos. The purpose of this paper is to investigate, both theoretically and numerically, conditions on the observations of chaotic systems under which they can be accurately filtered. In particular, we highlight the adv…
▽ More
In the context of filtering chaotic dynamical systems it is well-known that partial observations, if sufficiently informative, can be used to control the inherent uncertainty due to chaos. The purpose of this paper is to investigate, both theoretically and numerically, conditions on the observations of chaotic systems under which they can be accurately filtered. In particular, we highlight the advantage of adaptive observation operators over fixed ones. The Lorenz '96 model is used to exemplify our findings.
We consider discrete-time and continuous-time observations in our theoretical developments. We prove that, for fixed observation operator, the 3DVAR filter can recover the system state within a neighbourhood determined by the size of the observational noise. It is required that a sufficiently large proportion of the state vector is observed, and an explicit form for such sufficient fixed observation operator is given. Numerical experiments, where the data is incorporated by use of the 3DVAR and extended Kalman filters, suggest that less informative fixed operators than given by our theory can still lead to accurate signal reconstruction. Adaptive observation operators are then studied numerically; we show that, for carefully chosen adaptive observation operators, the proportion of the state vector that needs to be observed is drastically smaller than with a fixed observation operator. Indeed, we show that the number of state coordinates that need to be observed may even be significantly smaller than the total number of positive Lyapunov exponents of the underlying system.
△ Less
Submitted 21 September, 2015; v1 submitted 12 November, 2014;
originally announced November 2014.
-
Infinite Gammoids: Minors and Duality
Authors:
Seyed Hadi Afzali Borujeni,
Hiu Fai Law,
Malte Müller
Abstract:
This sequel to our paper (Infinite gammoids, 2014) considers minors and duals of infinite gammoids. We prove that a class of gammoids definable by digraphs not containing a certain type of substructure, called an outgoing comb, is minor-closed. Also, we prove that finite-rank minors of gammoids are gammoids. Furthermore, the topological gammoids introduced by Carmesin (Topological infinite gammoid…
▽ More
This sequel to our paper (Infinite gammoids, 2014) considers minors and duals of infinite gammoids. We prove that a class of gammoids definable by digraphs not containing a certain type of substructure, called an outgoing comb, is minor-closed. Also, we prove that finite-rank minors of gammoids are gammoids. Furthermore, the topological gammoids introduced by Carmesin (Topological infinite gammoids, and a new Menger-type theorem for infinite graphs, 2014) are proved to coincide, as matroids, with the finitary gammoids. A corollary is that topological gammoids are minor-closed. It is a well-known fact that the dual of any finite strict gammoid is a transversal matroid. The class of alternating-comb-free strict gammoids, introduced in the prequel, contains examples which are not dual to any transversal matroid. However, we describe the duals of matroids in this class as a natural extension of transversal matroids. While finite gammoids are closed under duality, we construct a strict gammoid that is not dual to any gammoid.
△ Less
Submitted 9 November, 2014;
originally announced November 2014.