-
Task-specific experimental design for treatment effect estimation
Authors:
Bethany Connolly,
Kim Moore,
Tobias Schwedes,
Alexander Adam,
Gary Willis,
Ilya Feige,
Christopher Frye
Abstract:
Understanding causality should be a core requirement of any attempt to build real impact through AI. Due to the inherent unobservability of counterfactuals, large randomised trials (RCTs) are the standard for causal inference. But large experiments are generically expensive, and randomisation carries its own costs, e.g. when suboptimal decisions are trialed. Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought. In this work, we develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications. Across a range of important tasks, real-world datasets, and sample sizes, our method outperforms other benchmarks, e.g. requiring an order-of-magnitude less data to match RCT performance on targeted marketing tasks.
Submitted 8 June, 2023;
originally announced June 2023.
-
A Bayesian Framework for learning governing Partial Differential Equation from Data
Authors:
Kalpesh More,
Tapas Tripura,
Rajdip Nayek,
Souvik Chakraborty
Abstract:
The discovery of partial differential equations (PDEs) is a challenging task that involves both theoretical and empirical methods. Machine learning approaches have been developed for this problem; however, existing methods often struggle to identify the underlying equation accurately in the presence of noise. In this study, we present a new approach to discovering PDEs by combining variational Bayes and sparse linear regression. PDE discovery is posed as the problem of learning the relevant basis functions from a predefined dictionary of basis functions. To accelerate the overall process, a variational Bayes-based approach is proposed. To ensure sparsity, we employ a spike-and-slab prior. We illustrate the efficacy of our strategy on several examples, including the Burgers, Korteweg-de Vries, and Kuramoto-Sivashinsky equations, the wave equation, and the heat equation (1D as well as 2D). Our method offers a promising avenue for discovering PDEs from data and has potential applications in fields such as physics, engineering, and biology.
Submitted 7 June, 2023;
originally announced June 2023.
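As a rough illustration of the dictionary-based formulation described in this abstract, the sketch below recovers the 1D heat equation from synthetic data. It is not the authors' method: the variational Bayes inference with a spike-and-slab prior is replaced here by plain sequentially thresholded least squares, and the function name `sparse_fit`, the four-term dictionary, the threshold, and the use of analytic derivatives are all assumptions made for the example.

```python
import numpy as np

def sparse_fit(theta, target, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit.

    Stand-in for the paper's variational Bayes / spike-and-slab inference.
    """
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(iters):
        keep = np.abs(coef) >= threshold
        coef[~keep] = 0.0
        if not keep.any():
            break
        # Refit using only the dictionary columns that survived thresholding.
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef

# Synthetic two-mode solution of the heat equation u_t = nu * u_xx.
nu = 0.1
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
t = np.linspace(0.0, 1.0, 32)
X, T = np.meshgrid(x, t)
a = np.exp(-nu * T)
b = 0.5 * np.exp(-4.0 * nu * T)

u = a * np.sin(X) + b * np.sin(2.0 * X)
# Derivatives are written analytically to keep the example exact;
# with measured data they would have to be estimated numerically.
u_t = -nu * a * np.sin(X) - 4.0 * nu * b * np.sin(2.0 * X)
u_x = a * np.cos(X) + 2.0 * b * np.cos(2.0 * X)
u_xx = -a * np.sin(X) - 4.0 * b * np.sin(2.0 * X)

# Hypothetical candidate dictionary: one column per basis function.
names = ["u", "u_x", "u_xx", "u*u_x"]
theta = np.column_stack([u.ravel(), u_x.ravel(), u_xx.ravel(), (u * u_x).ravel()])

coef = sparse_fit(theta, u_t.ravel())
for name, c in zip(names, coef):
    print(f"{name:6s} {c:+.4f}")
```

Only the `u_xx` column should keep a nonzero coefficient, close to `nu`. A Bayesian treatment such as the one the abstract describes would additionally provide uncertainty on which terms are active, which this point estimate does not.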
-
MAntRA: A framework for model agnostic reliability analysis
Authors:
Yogesh Chandrakant Mathpati,
Kalpesh Sanjay More,
Tapas Tripura,
Rajdip Nayek,
Souvik Chakraborty
Abstract:
We propose a novel model-agnostic, data-driven framework for time-dependent reliability analysis. The proposed approach -- referred to as MAntRA -- combines interpretable machine learning, Bayesian statistics, and the identification of stochastic dynamic equations to evaluate the reliability of stochastically excited dynamical systems for which the governing physics is a priori unknown. A two-stage approach is adopted: in the first stage, an efficient variational Bayesian equation discovery algorithm is developed to determine the governing physics of an underlying stochastic differential equation (SDE) from measured output data. The developed algorithm is efficient and accounts for epistemic uncertainty due to limited and noisy data, and aleatoric uncertainty due to environmental effects and external excitation. In the second stage, the discovered SDE is solved using a stochastic integration scheme and the probability of failure is computed. The efficacy of the proposed approach is illustrated on three numerical examples. The results obtained indicate the possible application of the proposed approach for reliability analysis of in-situ and heritage structures from on-site measurements.
Submitted 12 December, 2022;
originally announced December 2022.
-
The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes
Authors:
Nicholas Heller,
Niranjan Sathianathen,
Arveen Kalapara,
Edward Walczak,
Keenan Moore,
Heather Kaluzniak,
Joel Rosenberg,
Paul Blake,
Zachary Rengel,
Makinna Oestreich,
Joshua Dean,
Michael Tradewell,
Aneri Shah,
Resha Tejpaul,
Zachary Edgerton,
Matthew Peterson,
Shaneabbas Raza,
Subodh Regmi,
Nikolaos Papanikolopoulos,
Christopher Weight
Abstract:
The morphometry of a kidney tumor revealed by contrast-enhanced Computed Tomography (CT) imaging is an important factor in clinical decision making surrounding the lesion's diagnosis and treatment. Quantitative study of the relationship between kidney tumor morphology and clinical outcomes is difficult due to data scarcity and the laborious nature of manually quantifying imaging predictors. Automatic semantic segmentation of kidneys and kidney tumors is a promising tool towards automatically quantifying a wide array of morphometric features, but no sizeable annotated dataset is currently available to train models for this task. We present the KiTS19 challenge dataset: A collection of multi-phase CT imaging, segmentation masks, and comprehensive clinical outcomes for 300 patients who underwent nephrectomy for kidney tumors at our center between 2010 and 2018. 210 (70%) of these patients were selected at random as the training set for the 2019 MICCAI KiTS Kidney Tumor Segmentation Challenge and have been released publicly. With the presence of clinical context and surgical outcomes, this data can serve not only for benchmarking semantic segmentation models, but also for developing and studying biomarkers which make use of the imaging and semantic segmentation masks.
Submitted 15 March, 2020; v1 submitted 31 March, 2019;
originally announced April 2019.
-
Random Walk Null Models for Time Series Data
Authors:
Daryl DeFord,
Katherine Moore
Abstract:
Permutation entropy has become a standard tool for time series analysis that exploits the temporal properties of these data sets. Many current applications use an approach based on Shannon entropy, which implicitly assumes an underlying uniform distribution of patterns. In this paper, we analyze random walk null models for time series and determine the corresponding permutation distributions. These new techniques allow us to explicitly describe the behavior of real world data in terms of more complex generative processes. Additionally, building on recent results of Martinez, we define a validation measure that allows us to determine when a random walk is an appropriate model for a time series. We demonstrate the usefulness of our methods using empirical data drawn from a variety of fields.
Submitted 5 October, 2017;
originally announced October 2017.
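For readers unfamiliar with the quantity this abstract starts from, the following is a minimal sketch of the standard Shannon-based permutation entropy: slide a window over the series, record the ordinal pattern of each window, and take the entropy of the resulting pattern distribution. The function names, the default order of 3, and the tie-breaking rule (ties resolved by position) are choices made for this example; the random walk null models and validation measure from the paper are not implemented here.

```python
import math
from collections import Counter

def ordinal_pattern(window):
    """Indices of the window sorted by value (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def permutation_distribution(series, order=3):
    """Relative frequency of each ordinal pattern of the given order."""
    counts = Counter(
        ordinal_pattern(series[i:i + order])
        for i in range(len(series) - order + 1)
    )
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}

def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy (in nats) of the ordinal pattern distribution.

    With normalize=True the value is divided by log(order!), the entropy
    of a uniform distribution over all patterns.
    """
    dist = permutation_distribution(series, order)
    h = -sum(p * math.log(p) for p in dist.values())
    if normalize:
        h /= math.log(math.factorial(order))
    return h

series = [4, 7, 9, 10, 6, 11, 3]
print(permutation_distribution(series, order=3))
print(permutation_entropy(series, order=3, normalize=False))
```

The normalization step is where the uniform-distribution assumption mentioned in the abstract enters: the entropy is compared against a baseline in which every pattern is equally likely.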