-
Structured Learning of Compositional Sequential Interventions
Authors:
Jialin Yu,
Andreas Koukorinis,
Nicolò Colombo,
Yuchen Zhu,
Ricardo Silva
Abstract:
We consider sequential treatment regimes where each unit is exposed to combinations of interventions over time. When interventions are described by qualitative labels, such as ``close schools for a month due to a pandemic'' or ``promote this podcast to this user during this week'', it is unclear which appropriate structural assumptions allow us to generalize behavioral predictions to previously unseen combinatorial sequences. Standard black-box approaches mapping sequences of categorical variables to outputs are applicable, but they rely on poorly understood assumptions on how reliable generalization can be obtained, and may underperform under sparse sequences, temporal variability, and large action spaces. To approach that, we pose an explicit model for \emph{composition}, that is, how the effect of sequential interventions can be isolated into modules, clarifying which data conditions allow for the identification of their combined effect at different units and time steps. We show the identification properties of our compositional model, inspired by advances in causal matrix factorization methods but focusing on predictive models for novel compositions of interventions instead of matrix completion tasks and causal effect estimation. We compare our approach to flexible but generic black-box models to illustrate how structure aids prediction in sparse data conditions.
Submitted 9 June, 2024;
originally announced June 2024.
-
Bounding Causal Effects with Leaky Instruments
Authors:
David S. Watson,
Jordan Penn,
Lee M. Gunderson,
Gecia Bravo-Hermsdorff,
Afsaneh Mastouri,
Ricardo Silva
Abstract:
Instrumental variables (IVs) are a popular and powerful tool for estimating causal effects in the presence of unobserved confounding. However, classical approaches rely on strong assumptions such as the $\textit{exclusion criterion}$, which states that instrumental effects must be entirely mediated by treatments. This assumption often fails in practice. When IV methods are improperly applied to data that do not meet the exclusion criterion, estimated causal effects may be badly biased. In this work, we propose a novel solution that provides $\textit{partial}$ identification in linear systems given a set of $\textit{leaky instruments}$, which are allowed to violate the exclusion criterion to some limited degree. We derive a convex optimization objective that provides provably sharp bounds on the average treatment effect under some common forms of information leakage, and implement inference procedures to quantify the uncertainty of resulting estimates. We demonstrate our method in a set of experiments with simulated data, where it performs favorably against the state of the art. An accompanying $\texttt{R}$ package, $\texttt{leakyIV}$, is available from $\texttt{CRAN}$.
Submitted 8 May, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
Counterfactual Fairness Is Not Demographic Parity, and Other Observations
Authors:
Ricardo Silva
Abstract:
Blanket statements of equivalence between causal concepts and purely probabilistic concepts should be approached with care. In this short note, I examine a recent claim that counterfactual fairness is equivalent to demographic parity. The claim fails to hold up upon closer examination. I will take the opportunity to address some broader misunderstandings about counterfactual fairness.
Submitted 4 February, 2024;
originally announced February 2024.
-
Intervention Generalization: A View from Factor Graph Models
Authors:
Gecia Bravo-Hermsdorff,
David S. Watson,
Jialin Yu,
Jakob Zeitler,
Ricardo Silva
Abstract:
One of the goals of causal inference is to generalize from past experiments and observational data to novel conditions. While it is in principle possible to eventually learn a mapping from a novel experimental condition to an outcome of interest, provided a sufficient variety of experiments is available in the training data, coping with a large combinatorial space of possible interventions is hard. Under a typical sparse experimental design, this mapping is ill-posed without relying on heavy regularization or prior distributions. Such assumptions may or may not be reliable, and can be hard to defend or test. In this paper, we take a close look at how to warrant a leap from past experiments to novel conditions based on minimal assumptions about the factorization of the distribution of the manipulated system, communicated in the well-understood language of factor graph models. A postulated $\textit{interventional factor model}$ (IFM) may not always be informative, but it conveniently abstracts away a need for explicitly modeling unmeasured confounding and feedback mechanisms, leading to directly testable claims. Given an IFM and datasets from a collection of experimental regimes, we derive conditions for identifiability of the expected outcomes of new regimes never observed in these training data. We implement our framework using several efficient algorithms, and apply them on a range of semi-synthetic experiments.
Submitted 8 November, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Precision Dose-finding Cancer Clinical Trials in the Setting of Broadened Eligibility
Authors:
Rebecca B. Silva,
Bin Cheng,
Richard D. Carvajal,
Shing M. Lee
Abstract:
Broadening eligibility criteria in cancer trials has been advocated to represent the true patient population more accurately. While the advantages are clear in terms of generalizability and recruitment, novel dose-finding designs are needed to ensure patient safety. These designs should be able to recommend precise doses for subpopulations if such subpopulations with different toxicity profiles exist. While dose-finding designs accounting for patient heterogeneity have been proposed, all existing methods assume the source of heterogeneity is known and thus pre-specify the subpopulations or only allow inclusion of a few patient characteristics. We propose a precision dose-finding design to address the setting of unknown patient heterogeneity in phase I cancer clinical trials where eligibility is expanded, and multiple eligibility criteria could potentially lead to different optimal doses for patient subgroups. The design offers a two-in-one approach to dose-finding by simultaneously selecting patient criteria that differentiate the maximum tolerated dose (MTD) and recommending the subpopulation-specific MTD if needed, using marginal models to sequentially incorporate patient covariates. Our simulation study compares the proposed design to the naive approach of assuming patient homogeneity and our design recommends multiple doses when heterogeneity exists and a single dose when no heterogeneity exists. The proposed dose-finding design addresses the challenges of broadening eligibility criteria in cancer trials and the desire for a more precise dose in the context of early phase clinical trials.
Submitted 11 January, 2023;
originally announced January 2023.
-
Inferring urban polycentricity from the variability in human mobility patterns
Authors:
Carmen Cabrera-Arnau,
Chen Zhong,
Michael Batty,
Ricardo Silva,
Soong Moon Kang
Abstract:
The polycentric city model has gained popularity in spatial planning policy, since it is believed to overcome some of the problems often present in monocentric metropolises, ranging from congestion to difficult accessibility to jobs and services. However, the concept 'polycentric city' has a fuzzy definition and as a result, the extent to which a city is polycentric cannot be easily determined. Here, we leverage the fine spatio-temporal resolution of smart travel card data to infer urban polycentricity by examining how a city departs from a well-defined monocentric model. In particular, we analyse the human movements that arise as a result of sophisticated forms of urban structure by introducing a novel probabilistic approach which captures the complexity of these human movements. We focus on London (UK) and Seoul (South Korea) as our two case studies, and we specifically find evidence that London displays a higher degree of monocentricity than Seoul, suggesting that Seoul is likely to be more polycentric than London.
Submitted 7 December, 2022;
originally announced December 2022.
-
A family of mixture models for beta valued DNA methylation data
Authors:
Koyel Majumdar,
Romina Silva,
Antoinette Sabrina Perry,
Ronald William Watson,
Andrea Rau,
Florence Jaffrezic,
Thomas Brendan Murphy,
Isobel Claire Gormley
Abstract:
As hypermethylation of promoter cytosine-guanine dinucleotide (CpG) islands has been shown to silence tumour suppressor genes, identifying differentially methylated CpG sites between different samples can assist in understanding disease. Differentially methylated CpG sites (DMCs) can be identified using moderated t-tests or nonparametric tests, but this typically requires the use of data transformations due to a lack of appropriate statistical methods able to adequately account for the bounded nature of DNA methylation data.
We propose a family of beta mixture models (BMMs) which use a model-based approach to cluster CpG sites given their original beta-valued methylation data, with no need for transformations. The BMMs allow (i) objective inference of methylation state thresholds and (ii) identification of DMCs between different sample types. The BMMs employ different parameter constraints facilitating application to different study settings. Parameter estimation proceeds via an expectation-maximisation algorithm, with a novel approximation in the maximization step providing tractability and computational feasibility.
Performance of BMMs is assessed through thorough simulation studies, and the BMMs are used to analyse a prostate cancer dataset. The BMMs objectively infer intuitive and biologically interpretable methylation state thresholds, and identify DMCs that are related to genes implicated in carcinogenesis and involved in cancer related pathways. An R package betaclust facilitates widespread use of BMMs.
Submitted 18 March, 2024; v1 submitted 3 November, 2022;
originally announced November 2022.
-
No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling
Authors:
Marília Costa Rosendo Silva,
Felipe Alves Siqueira,
João Pedro Mantovani Tarrega,
João Vitor Pataca Beinotti,
Augusto Sousa Nunes,
Miguel de Mattos Gardini,
Vinícius Adolfo Pereira da Silva,
Nádia Félix Felipe da Silva,
André Carlos Ponce de Leon Ferreira de Carvalho
Abstract:
Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.
Submitted 2 August, 2022;
originally announced August 2022.
-
Causal Machine Learning: A Survey and Open Problems
Authors:
Jean Kaddour,
Aengus Lynch,
Qi Liu,
Matt J. Kusner,
Ricardo Silva
Abstract:
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.
Submitted 21 July, 2022; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Causal Inference with Treatment Measurement Error: A Nonparametric Instrumental Variable Approach
Authors:
Yuchen Zhu,
Limor Gultchin,
Arthur Gretton,
Matt Kusner,
Ricardo Silva
Abstract:
We propose a kernel-based nonparametric estimator for the causal effect when the cause is corrupted by error. We do so by generalizing estimation in the instrumental variable setting. Despite significant work on regression with measurement error, additionally handling unobserved confounding in the continuous setting is non-trivial: we have seen little prior work. As a by-product of our investigation, we clarify a connection between mean embeddings and characteristic functions, and how learning one simultaneously allows one to learn the other. This opens the way for kernel method research to leverage existing results in characteristic function estimation. Finally, we empirically show that our proposed method, MEKIV, improves over baselines and is robust under changes in the strength of measurement error and to the type of error distributions.
Submitted 18 June, 2022;
originally announced June 2022.
-
Modified Galton-Watson processes with immigration under an alternative offspring mechanism
Authors:
Wagner Barreto-Souza,
Sokol Ndreca,
Rodrigo B. Silva,
Roger W. C. Silva
Abstract:
We propose a novel class of count time series models alternative to the classic Galton-Watson process with immigration (GWI) and Bernoulli offspring. A new offspring mechanism is developed and its properties are explored. This novel mechanism, called geometric thinning operator, is used to define a class of modified GWI (MGWI) processes, which induces a certain non-linearity to the models. We show that this non-linearity can produce better results in terms of prediction when compared to the linear case commonly considered in the literature. We explore both stationary and non-stationary versions of our MGWI processes. Inference on the model parameters is addressed and the finite-sample behavior of the estimators investigated through Monte Carlo simulations. Two real data sets are analyzed to illustrate the stationary and non-stationary cases and the gain of the non-linearity induced for our method over the existing linear methods. A generalization of the geometric thinning operator and an associated MGWI process are also proposed and motivated for dealing with zero-inflated or zero-deflated count time series data.
Submitted 1 June, 2022;
originally announced June 2022.
-
Reductive MDPs: A Perspective Beyond Temporal Horizons
Authors:
Thomas Spooner,
Rui Silva,
Joshua Lockhart,
Jason Long,
Vacslav Glukhov
Abstract:
Solving general Markov decision processes (MDPs) is a computationally hard problem. Solving finite-horizon MDPs, on the other hand, is highly tractable with well known polynomial-time algorithms. What drives this extreme disparity, and do problems exist that lie between these diametrically opposed complexities? In this paper we identify and analyse a sub-class of stochastic shortest path problems (SSPs) for general state-action spaces whose dynamics satisfy a particular drift condition. This construction generalises the traditional, temporal notion of a horizon via decreasing reachability: a property called reductivity. It is shown that optimal policies can be recovered in polynomial-time for reductive SSPs -- via an extension of backwards induction -- with an efficient analogue in reductive MDPs. The practical considerations of the proposed approach are discussed, and numerical verification provided on a canonical optimal liquidation problem.
Submitted 15 May, 2022;
originally announced May 2022.
-
Causal discovery under a confounder blanket
Authors:
David S. Watson,
Ricardo Silva
Abstract:
Inferring causal relationships from observational data is rarely straightforward, but the problem is especially difficult in high dimensions. For these applications, causal discovery algorithms typically require parametric restrictions or extreme sparsity constraints. We relax these assumptions and focus on an important but more specialized problem, namely recovering the causal order among a subgraph of variables known to descend from some (possibly large) set of confounding covariates, i.e. a $\textit{confounder blanket}$. This is useful in many settings, for example when studying a dynamic biomolecular subsystem with genetic data providing background information. Under a structural assumption called the $\textit{confounder blanket principle}$, which we argue is essential for tractable causal discovery in high dimensions, our method accommodates graphs of low or high sparsity while maintaining polynomial time complexity. We present a structure learning algorithm that is provably sound and complete with respect to a so-called $\textit{lazy oracle}$. We design inference procedures with finite sample error control for linear and nonlinear systems, and demonstrate our approach on a range of simulated and real-world datasets. An accompanying $\texttt{R}$ package, $\texttt{cbl}$, is available from $\texttt{CRAN}$.
Submitted 28 June, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Sequential time-window learning with approximate Bayesian computation: an application to epidemic forecasting
Authors:
João Pedro Valeriano,
Pedro Henrique Cintra,
Gustavo Libotte,
Igor Reis,
Felipe Fontinele,
Renato Silva,
Sandra Malta
Abstract:
The long duration of the COVID-19 pandemic allowed for multiple bursts in the infection and death rates, the so-called epidemic waves. This complex behavior is no longer tractable by simple compartmental models and requires more sophisticated mathematical techniques for analyzing epidemic data and generating reliable forecasts. In this work, we propose a framework for analyzing complex dynamical systems by dividing the data into consecutive time-windows to be separately analyzed. We fit parameters for each time-window through an Approximate Bayesian Computation (ABC) algorithm, and the posterior distribution of parameters obtained for one window is used as the prior distribution for the next window. This Bayesian learning approach is tested with data on COVID-19 cases in multiple countries and is shown to improve ABC performance and to produce good short-term forecasting.
Submitted 21 March, 2022;
originally announced March 2022.
-
The Causal Marginal Polytope for Bounding Treatment Effects
Authors:
Jakob Zeitler,
Ricardo Silva
Abstract:
Due to unmeasured confounding, it is often not possible to identify causal effects from a postulated model. Nevertheless, we can ask for partial identification, which usually boils down to finding upper and lower bounds of a causal quantity of interest derived from all solutions compatible with the encoded structural assumptions. One appealing way to derive such bounds is by casting it in terms of a constrained optimization method that searches over all causal models compatible with evidence, as introduced in the classic work of Balke and Pearl (1994) for discrete data. Although by construction this guarantees tight bounds, it poses a formidable computational challenge. To cope with this issue, alternatives include using algorithms that are not guaranteed to be tight, or introducing restrictions on the class of models. In this paper, we introduce a novel alternative: inspired by ideas coming from belief propagation, we enforce compatibility between marginals of a causal model and data, without constructing a global causal model. We call this collection of locally consistent marginals the causal marginal polytope. As global independence constraints disappear when considering small dimensional tractable marginals, this also leads to a rethinking of how to elicit and express causal knowledge. We provide an explicit algorithm and implementation of this idea, and assess its practicality with numerical experiments.
Submitted 28 February, 2022;
originally announced February 2022.
-
Stochastic Causal Programming for Bounding Treatment Effects
Authors:
Kirtan Padh,
Jakob Zeitler,
David Watson,
Matt Kusner,
Ricardo Silva,
Niki Kilbertus
Abstract:
Causal effect estimation is important for many tasks in the natural and social sciences. We design algorithms for the continuous partial identification problem: bounding the effects of multivariate, continuous treatments when unmeasured confounding makes identification impossible. Specifically, we cast causal effects as objective functions within a constrained optimization problem, and minimize/maximize these functions to obtain bounds. We combine flexible learning algorithms with Monte Carlo methods to implement a family of solutions under the name of stochastic causal programming. In particular, we show how the generic framework can be efficiently formulated in settings where auxiliary variables are clustered into pre-treatment and post-treatment sets, where no fine-grained causal graph can be easily specified. In these settings, we can avoid the need for fully specifying the distribution family of hidden common causes. Monte Carlo computation is also much simplified, leading to algorithms which are more computationally stable against alternatives.
Submitted 17 May, 2023; v1 submitted 22 February, 2022;
originally announced February 2022.
-
When Do Flat Minima Optimizers Work?
Authors:
Jean Kaddour,
Linqing Liu,
Ricardo Silva,
Matt J. Kusner
Abstract:
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: 1. Stochastic Weight Averaging (SWA), and 2. Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap here by comparing the loss surfaces of the models trained with each method and through broad benchmarking across computer vision, natural language processing, and graph representation learning tasks. We discover several surprising findings from these results, which we hope will help researchers further improve deep learning optimizers, and practitioners identify the right optimizer for their problem.
Submitted 27 January, 2023; v1 submitted 1 February, 2022;
originally announced February 2022.
-
Operationalizing Complex Causes: A Pragmatic View of Mediation
Authors:
Limor Gultchin,
David S. Watson,
Matt J. Kusner,
Ricardo Silva
Abstract:
We examine the problem of causal response estimation for complex objects (e.g., text, images, genomics). In this setting, classical \emph{atomic} interventions are often not available (e.g., changes to characters, pixels, DNA base-pairs). Instead, we only have access to indirect or \emph{crude} interventions (e.g., enrolling in a writing program, modifying a scene, applying a gene therapy). In this work, we formalize this problem and provide an initial solution. Given a collection of candidate mediators, we propose (a) a two-step method for predicting the causal responses of crude interventions; and (b) a testing procedure to identify mediators of crude interventions. We demonstrate, on a range of simulated and real-world-inspired examples, that our approach allows us to efficiently estimate the effect of crude interventions with limited data from new treatment regimes.
Submitted 10 June, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Parameter Estimation for Grouped Data Using EM and MCEM Algorithms
Authors:
Zahra A. Shirazi,
João Pedro A. R. da Silva,
Camila P. E. de Souza
Abstract:
Nowadays, the confidentiality of data and information is of great importance for many companies and organizations. For this reason, they may prefer not to release exact data, but instead to grant researchers access to approximate data. For example, rather than providing the exact measurements of their clients, they may only provide researchers with grouped data, that is, the number of clients falling in each of a set of non-overlapping measurement intervals. The challenge is to estimate the mean and variance structure of the hidden ungrouped data based on the observed grouped data. To tackle this problem, this work considers the exact observed data likelihood and applies the Expectation-Maximization (EM) and Monte-Carlo EM (MCEM) algorithms for cases where the hidden data follow a univariate, bivariate, or multivariate normal distribution. Simulation studies are conducted to evaluate the performance of the proposed EM and MCEM algorithms. The well-known Galton data set is considered as an application example.
Submitted 22 December, 2021; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Causal Effect Inference for Structured Treatments
Authors:
Jean Kaddour,
Yuchen Zhu,
Qi Liu,
Matt J. Kusner,
Ricardo Silva
Abstract:
We address the estimation of conditional average treatment effects (CATEs) for structured treatments (e.g., graphs, images, texts). Given a weak condition on the effect, we propose the generalized Robinson decomposition, which (i) isolates the causal estimand (reducing regularization bias), (ii) allows one to plug in arbitrary models for learning, and (iii) possesses a quasi-oracle convergence guarantee under mild assumptions. In experiments with small-world and molecular graphs we demonstrate that our approach outperforms prior work in CATE estimation.
Submitted 27 October, 2021; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Fractional Poisson random sum and its associated normal variance mixture
Authors:
Gabriela Oliveira,
Wagner Barreto-Souza,
Roger W. C. Silva
Abstract:
In this work, we study the partial sums of independent and identically distributed random variables with the number of terms following a fractional Poisson (FP) distribution. The FP sum contains the Poisson and geometric summations as particular cases. We show that the weak limit of the FP summation, when properly normalized, is a mixture between the normal and Mittag-Leffler distributions, which we call the Normal-Mittag-Leffler (NML) law. A parameter estimation procedure for the NML distribution is developed and the associated asymptotic distribution is derived. Simulations are performed to check the performance of the proposed estimators under finite samples. An empirical illustration on the daily log-returns of the Brazilian stock exchange index (IBOVESPA) shows that the NML distribution captures the tails better than some of its competitors. Related problems such as a mixed Poisson representation for the FP law and the weak convergence for the Conway-Maxwell-Poisson random sum are also addressed.
Submitted 31 March, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Surrogate-based Bayesian Comparison of Computationally Expensive Models: Application to Microbially Induced Calcite Precipitation
Authors:
Stefania Scheurer,
Aline Schäfer Rodrigues Silva,
Farid Mohammadi,
Johannes Hommel,
Sergey Oladyshkin,
Bernd Flemisch,
Wolfgang Nowak
Abstract:
Geochemical processes in subsurface reservoirs affected by microbial activity change the material properties of porous media. This is a complex biogeochemical process in subsurface reservoirs that is currently subject to strong conceptual uncertainty. This means that several modeling approaches describing the biogeochemical process are plausible, and modelers face the uncertainty of choosing the most appropriate one. Once observation data becomes available, a rigorous Bayesian model selection accompanied by a Bayesian model justifiability analysis could be employed to choose the most appropriate model, i.e., the one that describes the underlying physical processes best in the light of the available data. However, biogeochemical modeling is computationally very demanding because it conceptualizes different phases, biomass dynamics, geochemistry, precipitation and dissolution in porous media. Therefore, the Bayesian framework cannot be based directly on the full computational models, as this would require too many expensive model evaluations. To circumvent this problem, we suggest performing both Bayesian model selection and justifiability analysis after constructing surrogates for the competing biogeochemical models. Here, we use the arbitrary polynomial chaos expansion. We account for the approximation error in the Bayesian analysis by introducing novel correction factors for the resulting model weights. Thereby, we extend the Bayesian justifiability analysis and assess model similarities for computationally expensive models. We demonstrate the method on a representative scenario for microbially induced calcite precipitation in a porous medium. Our extension of the justifiability analysis provides a suitable approach for the comparison of computationally demanding models and gives insight into the amount of data necessary for a reliable model performance.
Submitted 25 November, 2020;
originally announced November 2020.
-
Short-term forecasting of Amazon rainforest fires based on ensemble decomposition model
Authors:
Ramon Gomes da Silva,
Matheus Henrique Dal Molin Ribeiro,
Viviana Cocco Mariani,
Leandro dos Santos Coelho
Abstract:
Accurate forecasting is important for decision-makers. Recently, the Amazon rainforest has been reaching a record number of fires, a situation that raises both climate and public health concerns. Obtaining the desired forecasting accuracy is difficult and challenging. In this paper, a novel heterogeneous decomposition-ensemble model is developed by using Seasonal and Trend decomposition based on Loess in combination with algorithms for short-term load forecasting multi-month-ahead, to explore temporal patterns of Amazon rainforest fires in Brazil. The results demonstrate that the proposed decomposition-ensemble models can provide more accurate forecasting, as evaluated by performance measures. The Diebold-Mariano statistical test showed that the proposed models are better than the other compared models, but statistically equal to one of them.
Submitted 23 July, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Exact and computationally efficient Bayesian inference for generalized Markov modulated Poisson processes
Authors:
Flavio B. Gonçalves,
Livia M. Dutra,
Roger W. C. Silva
Abstract:
Statistical modeling of point patterns is an important and common problem in several areas. The Poisson process is the most common process used for this purpose, in particular its generalization that considers the intensity function to be stochastic. This is called a Cox process, and different choices to model the dynamics of the intensity give rise to a wide range of models. We present a new class of unidimensional Cox process models in which the intensity function assumes parametric functional forms and switches among them according to a continuous-time Markov chain. A novel methodology is proposed to perform exact Bayesian inference based on MCMC algorithms. The term exact refers to the fact that no discrete time approximation is used and Monte Carlo error is the only source of inaccuracy. The reliability of the algorithms depends on a variety of specifications which are carefully addressed, resulting in a computationally efficient (in terms of computing time) algorithm and enabling its use with large data sets. Simulated and real examples are presented to illustrate the efficiency and applicability of the proposed methodology. A specific model to fit epidemic curves is proposed and used to analyze data from Dengue Fever in Brazil and COVID-19 in some countries.
Submitted 25 February, 2021; v1 submitted 17 June, 2020;
originally announced June 2020.
-
A Class of Algorithms for General Instrumental Variable Models
Authors:
Niki Kilbertus,
Matt J. Kusner,
Ricardo Silva
Abstract:
Causal treatment effect estimation is a key problem that arises in a variety of real-world settings, from personalized medicine to governmental policy making. There has been a flurry of recent work in machine learning on estimating causal effects when one has access to an instrument. However, to achieve identifiability, they in general require one-size-fits-all assumptions such as an additive error model for the outcome. An alternative is partial identification, which provides bounds on the causal effect. Little exists in terms of bounding methods that can deal with the most general case, where the treatment itself can be continuous. Moreover, bounding methods generally do not allow for a continuum of assumptions on the shape of the causal effect that can smoothly trade off stronger background knowledge for more informative bounds. In this work, we provide a method for causal effect bounding in continuous distributions, leveraging recent advances in gradient-based methods for the optimization of computationally intractable objective functions. We demonstrate on a set of synthetic and real-world data that our bounds capture the causal effect when additive methods fail, providing a useful range of answers compatible with observation as opposed to relying on unwarranted structural assumptions.
Submitted 21 October, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Learning Joint Nonlinear Effects from Single-variable Interventions in the Presence of Hidden Confounders
Authors:
Sorawit Saengkyongam,
Ricardo Silva
Abstract:
We propose an approach to estimate the effect of multiple simultaneous interventions in the presence of hidden confounders. To overcome the problem of hidden confounding, we consider the setting where we have access to not only the observational data but also sets of single-variable interventions in which each of the treatment variables is intervened on separately. We prove identifiability under the assumption that the data is generated from a nonlinear continuous structural causal model with additive Gaussian noise. In addition, we propose a simple parameter estimation method by pooling all the data from different regimes and jointly maximizing the combined likelihood. We also conduct comprehensive experiments to verify the identifiability result as well as to compare the performance of our approach against a baseline on both synthetic and real-world data.
Submitted 16 June, 2020; v1 submitted 23 May, 2020;
originally announced May 2020.
-
Multivariate Log-Skewed Distributions with normal kernel and their Applications
Authors:
Marina M. de Queiroz,
Rosangela H. Loschi,
Roger W. C. Silva
Abstract:
We introduce two classes of multivariate log-skewed distributions with normal kernel: the log canonical fundamental skew-normal (log-CFUSN) and the log unified skew-normal (log-SUN). We also discuss some properties of the log-CFUSN family of distributions. These new classes of log-skewed distributions include the log-normal and multivariate log-skew normal families as particular cases. We discuss some issues related to Bayesian inference in the log-CFUSN family of distributions, focusing mainly on how to model the prior uncertainty about the skewing parameter. Based on the stochastic representation of the log-CFUSN family, we propose a data augmentation strategy for sampling from the posterior distributions. The proposed family is used to analyze the US national monthly precipitation data. We conclude that a high-dimensional skewing function leads to a better model fit.
Submitted 1 May, 2020;
originally announced May 2020.
-
Forecasting in Non-stationary Environments with Fuzzy Time Series
Authors:
Petrônio Cândido de Lima e Silva,
Carlos Alberto Severiano Junior,
Marcos Antonio Alves,
Rodrigo Silva,
Miri Weiss Cohen,
Frederico Gadelha Guimarães
Abstract:
In this paper we introduce a Non-Stationary Fuzzy Time Series (NSFTS) method with time-varying parameters adapted from the distribution of the data. In this approach, we employ Non-Stationary Fuzzy Sets, in which perturbation functions are used to adapt the membership function parameters in the knowledge base in response to statistical changes in the time series. The proposed method is capable of dynamically adapting its fuzzy sets to reflect the changes in the stochastic process based on the residual errors, without the need to retrain the model. This method can handle non-stationary and heteroskedastic data as well as scenarios with concept drift. The proposed approach allows the model to be trained only once and remain useful long after while keeping reasonable accuracy. The flexibility of the method was tested by means of computational experiments with eight synthetic non-stationary time series data with several kinds of concept drifts, four real market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three real FOREX pairs (EUR-USD, EUR-GBP, GBP-USD), and two real cryptocoin exchange rates (Bitcoin-USD and Ethereum-USD). As competitor models, the Time Variant fuzzy time series and the Incremental Ensemble were used; these are two of the major approaches for handling non-stationary data sets. Non-parametric tests are employed to check the significance of the results. The proposed method shows resilience to concept drift, by adapting parameters of the model, while preserving the symbolic structure of the knowledge base.
Submitted 26 April, 2020;
originally announced April 2020.
-
Visualizing and Understanding Large-Scale Assessments in Mathematics through Dimensionality Reduction
Authors:
Esdras Medeiros,
Jorge Lira,
Romildo Silva,
Caio Azevedo
Abstract:
In this paper, we apply the Logistic PCA (LPCA) as a dimensionality reduction tool for visualizing patterns and characterizing the relevance of mathematics abilities from a given population measured by a large-scale assessment. We establish an equivalence of parameters between LPCA, Inner Product Representation (IPR) and the two-parameter logistic model (2PL) from the Item Response Theory (IRT). This equivalence provides three complementary ways of looking at data that assist professionals in education in performing in-context interpretations. In particular, we analyse the data collected from SPAECE, a large-scale assessment in Mathematics that has been applied yearly in the public educational system of the state of Ceará, Brazil. As the main result, we show that the poor performance of examinees at the end of middle school is primarily caused by their disabilities in number sense.
Submitted 31 May, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
Differentiable Causal Backdoor Discovery
Authors:
Limor Gultchin,
Matt J. Kusner,
Varun Kanade,
Ricardo Silva
Abstract:
Discovering the causal effect of a decision is critical to nearly all forms of decision-making. In particular, it is a key quantity in drug development, in crafting government policy, and when implementing a real-world machine learning system. Given only observational data, confounders often obscure the true causal effect. Luckily, in some cases, it is possible to recover the causal effect by using certain observed variables to adjust for the effects of confounders. However, without access to the true causal model, finding this adjustment requires brute-force search. In this work, we present an algorithm that exploits auxiliary variables, similar to instruments, in order to find an appropriate adjustment by a gradient-based optimization method. We demonstrate that it outperforms practical alternatives in estimating the true causal effect, without knowledge of the full causal graph.
Submitted 3 March, 2020;
originally announced March 2020.
-
Generalized Autoregressive Neural Network Models
Authors:
Renato Rodrigues Silva
Abstract:
A time series is a sequence of observations taken sequentially in time. The autoregressive integrated moving average is the class of models most used for time series data. However, this class of models has two critical limitations: it fits well only Gaussian data with a linear correlation structure. Here, I present a new model named generalized autoregressive neural networks, GARNN. The GARNN is an extension of the generalized linear model where the marginal mean depends on the lagged values via the inclusion of the neural network in the link function. A practical application of the model is shown using well-known poliomyelitis case numbers, originally analyzed by Zeger and Qaqish (1988).
Submitted 13 February, 2020;
originally announced February 2020.
-
Neural Network Approximation of Graph Fourier Transforms for Sparse Sampling of Networked Flow Dynamics
Authors:
Alessio Pagani,
Zhuangkun Wei,
Ricardo Silva,
Weisi Guo
Abstract:
Infrastructure monitoring is critical for safe operations and sustainability. Water distribution networks (WDNs) are large-scale networked critical systems with complex cascade dynamics which are difficult to predict. Ubiquitous monitoring is expensive and a key challenge is to infer the contaminant dynamics from partial sparse monitoring data. Existing approaches use multi-objective optimisation to find the minimum set of essential monitoring points, but lack performance guarantees and a theoretical framework.
Here, we first develop Graph Fourier Transform (GFT) operators to compress networked contamination spreading dynamics to identify the essential principal data collection points with inference performance guarantees. We then build autoencoder (AE) inspired neural networks (NN) to generalize the GFT sampling process and under-sample further from the initial sampling set, allowing a very small set of data points to largely reconstruct the contamination dynamics over real and artificial WDNs. Various sources of the contamination are tested and we obtain high-accuracy reconstruction using around 5-10% of the sample set. This general approach of compression and under-sampled recovery via neural networks can be applied to a wide range of networked infrastructures to enable digital twins.
Submitted 11 February, 2020;
originally announced February 2020.
-
Adversarial recovery of agent rewards from latent spaces of the limit order book
Authors:
Jacobo Roa-Vicens,
Yuanbo Wang,
Virgile Mison,
Yarin Gal,
Ricardo Silva
Abstract:
Inverse reinforcement learning has proved its ability to explain state-action trajectories of expert agents by recovering their underlying reward functions in increasingly challenging environments. Recent advances in adversarial learning have allowed extending inverse RL to applications with non-stationary environment dynamics unknown to the agents, arbitrary structures of reward functions and improved handling of the ambiguities inherent to the ill-posed nature of inverse RL. This is particularly relevant in real-time applications on stochastic environments involving risk, like volatile financial markets. Moreover, recent work on simulation of complex environments enables learning algorithms to engage with real market data through simulations of its latent space representations, avoiding a costly exploration of the original environment. In this paper, we explore whether adversarial inverse RL algorithms can be adapted and trained within such latent space simulations from real market data, while maintaining their ability to recover agent rewards robust to variations in the underlying dynamics, and transfer them to new regimes of the original environment.
Submitted 9 December, 2019;
originally announced December 2019.
-
Multidataset Independent Subspace Analysis with Application to Multimodal Fusion
Authors:
Rogers F. Silva,
Sergey M. Plis,
Tulay Adali,
Marios S. Pattichis,
Vince D. Calhoun
Abstract:
In the last two decades, unsupervised latent variable models---blind source separation (BSS) especially---have enjoyed a strong reputation for the interpretable features they produce. Seldom do these models combine the rich diversity of information available in multiple datasets. Multidatasets, on the other hand, yield joint solutions otherwise unavailable in isolation, with a potential for pivotal insights into complex systems.
To take advantage of the complex multidimensional subspace structures that capture underlying modes of shared and unique variability across and within datasets, we present a direct, principled approach to multidataset combination. We design a new method called multidataset independent subspace analysis (MISA) that leverages joint information from multiple heterogeneous datasets in a flexible and synergistic fashion.
Methodological innovations exploiting the Kotz distribution for subspace modeling in conjunction with a novel combinatorial optimization for evasion of local minima enable MISA to produce a robust generalization of independent component analysis (ICA), independent vector analysis (IVA), and independent subspace analysis (ISA) in a single unified model.
We highlight the utility of MISA for multimodal information fusion, including sample-poor regimes and low signal-to-noise ratio scenarios, promoting novel applications in both unimodal and multimodal brain imaging data.
Submitted 10 November, 2019;
originally announced November 2019.
-
A simple study of the correlation effects in the superposition of waves of electric fields: the emergence of extreme events
Authors:
Roberto da Silva,
Sandra D. Prado
Abstract:
In this paper, we study the effects of correlated random phases in the intensity of a superposition of $N$ wave-fields. Our results suggest that, regardless of whether the phase distribution is continuous or discrete, if the phases are correlated random variables, we must observe a heavier-tailed distribution and the emergence of extreme events as the correlation between phases increases. We believe that such a simple method can be easily applied in other situations to show the existence of extreme statistical events in the context of nonlinear complex systems.
Submitted 3 November, 2019;
originally announced November 2019.
-
Improved Differentially Private Decentralized Source Separation for fMRI Data
Authors:
Hafiz Imtiaz,
Jafar Mohammadi,
Rogers Silva,
Bradley Baker,
Sergey M. Plis,
Anand D. Sarwate,
Vince Calhoun
Abstract:
Blind source separation algorithms such as independent component analysis (ICA) are widely used in the analysis of neuroimaging data. In order to leverage larger sample sizes, different data holders/sites may wish to collaboratively learn feature representations. However, such datasets are often privacy-sensitive, precluding centralized analyses that pool the data at a single site. In this work, we propose a differentially private algorithm for performing ICA in a decentralized data setting. Conventional approaches to decentralized differentially private algorithms may introduce too much noise due to the typically small sample sizes at each site. We propose a novel protocol that uses correlated noise to remedy this problem. We show that our algorithm outperforms existing approaches on synthetic and real neuroimaging datasets and demonstrate that it can sometimes reach the same level of utility as the corresponding non-private algorithm. This indicates that it is possible to have meaningful utility while preserving privacy.
Submitted 22 February, 2021; v1 submitted 28 October, 2019;
originally announced October 2019.
-
Counterfactual Distribution Regression for Structured Inference
Authors:
Nicolo Colombo,
Ricardo Silva,
Soong M Kang,
Arthur Gretton
Abstract:
We consider problems in which a system receives external \emph{perturbations} from time to time. For instance, the system can be a train network in which particular lines are repeatedly disrupted without warning, having an effect on passenger behavior. The goal is to predict changes in the behavior of the system at particular points of interest, such as passenger traffic around stations at the affected rails. We assume that the data available provides records of the system functioning at its "natural regime" (e.g., the train network without disruptions) and data on cases where perturbations took place. The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations. We approach this problem from the point of view of a mapping from the counterfactual distribution of the system behavior without disruptions to the distribution of the disrupted system. A variant on \emph{distribution regression} is developed for this setup.
Submitted 20 August, 2019;
originally announced August 2019.
-
The Sensitivity of Counterfactual Fairness to Unmeasured Confounding
Authors:
Niki Kilbertus,
Philip J. Ball,
Matt J. Kusner,
Adrian Weller,
Ricardo Silva
Abstract:
Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions in causal modeling is that you know the causal graph. This introduces a new opportunity for bias, caused by misspecifying the causal model. One common way for misspecification to occur is via unmeasured confounding: the true causal effect between variables is partially described by unobserved quantities. In this work we design tools to assess the sensitivity of fairness measures to this confounding for the popular class of non-linear additive noise models (ANMs). Specifically, we give a procedure for computing the maximum difference between two counterfactually fair predictors, where one has become biased due to confounding. For the case of bivariate confounding our technique can be swiftly computed via a sequence of closed-form updates. For multivariate confounding we give an algorithm that can be efficiently solved via automatic differentiation. We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding.
Submitted 1 July, 2019;
originally announced July 2019.
-
Towards Inverse Reinforcement Learning for Limit Order Book Dynamics
Authors:
Jacobo Roa-Vicens,
Cyrine Chtourou,
Angelos Filos,
Francisco Rullan,
Yarin Gal,
Ricardo Silva
Abstract:
Multi-agent learning is a promising method to simulate aggregate competitive behaviour in finance. Learning expert agents' reward functions through their external demonstrations is hence particularly relevant for subsequent design of realistic agent-based simulations. Inverse Reinforcement Learning (IRL) aims at acquiring such reward functions through inference, allowing the resulting policy to generalize to states not observed in the past. This paper investigates whether IRL can infer such rewards from agents within real financial stochastic environments: limit order books (LOB). We introduce a simple one-level LOB, where the interactions of a number of stochastic agents and an expert trading agent are modelled as a Markov decision process. We consider two cases for the expert's reward: either a simple linear function of state features, or a complex, more realistic non-linear function. Given the expert agent's demonstrations, we attempt to discover their strategy by modelling their latent reward function using linear and Gaussian process (GP) regressors from previous literature, and our own approach through Bayesian neural networks (BNN). While the three methods can learn the linear case, only the GP-based and our proposed BNN methods are able to discover the non-linear reward case. Our BNN IRL algorithm outperforms the other two approaches as the number of samples increases. These results illustrate that complex behaviours, induced by non-linear reward functions amid agent-based stochastic scenarios, can be deduced through inference, encouraging the use of inverse reinforcement learning for opponent-modelling in multi-agent systems.
Submitted 11 June, 2019;
originally announced June 2019.
-
Netherlands Dataset: A New Public Dataset for Machine Learning in Seismic Interpretation
Authors:
Reinaldo Mozart Silva,
Lais Baroni,
Rodrigo S. Ferreira,
Daniel Civitarese,
Daniela Szwarcman,
Emilio Vital Brazil
Abstract:
Machine learning and, more specifically, deep learning algorithms have seen remarkable growth in their popularity and usefulness in recent years. This is arguably due to three main factors: powerful computers, new techniques to train deeper networks and larger datasets. Although the first two are readily available in modern computers and ML libraries, the last one remains a challenge for many domains. Big data is a reality in almost all fields nowadays, and geosciences are not an exception. However, to achieve the success of general-purpose applications such as ImageNet, for which there are more than 14 million labeled images for 1000 target classes, we not only need more data, we need more high-quality labeled data. When it comes to the oil and gas industry, confidentiality issues further hamper the sharing of datasets. In this work, we present the Netherlands interpretation dataset, a contribution to the development of machine learning in seismic interpretation. The Netherlands F3 dataset acquisition was carried out in the North Sea, Netherlands offshore. The data is publicly available and contains post-stack data, 8 horizons and well logs of 4 wells. For the purposes of our machine learning tasks, the original dataset was reinterpreted, generating 9 horizons separating different seismic facies intervals. The interpreted horizons were used to generate approximately 190,000 labeled images for inlines and crosslines. Finally, we present two deep learning applications in which the proposed dataset was employed and produced compelling results.
Submitted 26 March, 2019;
originally announced April 2019.
-
Neural Likelihoods via Cumulative Distribution Functions
Authors:
Pawel Chilinski,
Ricardo Silva
Abstract:
We leverage neural networks as universal approximators of monotonic functions to build a parameterization of conditional cumulative distribution functions (CDFs). By the application of automatic differentiation with respect to response variables and then to parameters of this CDF representation, we are able to build black box CDF and density estimators. A suite of families is introduced as alternative constructions for the multivariate case. At one extreme, the simplest construction is a competitive density estimator against state-of-the-art deep learning methods, although it does not provide an easily computable representation of multivariate CDFs. At the other extreme, we have a flexible construction from which multivariate CDF evaluations and marginalizations can be obtained by a simple forward pass in a deep neural net, but where the computation of the likelihood scales exponentially with dimensionality. Alternatives in between the extremes are discussed. We evaluate the different representations empirically on a variety of tasks involving tail area probabilities, tail dependence and (partial) density estimation.
Submitted 6 June, 2020; v1 submitted 2 November, 2018;
originally announced November 2018.
-
Bayesian Semi-supervised Learning with Graph Gaussian Processes
Authors:
Yin Cheng Ng,
Nicolo Colombo,
Ricardo Silva
Abstract:
We propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs. The proposed model shows extremely competitive performance when compared to the state-of-the-art graph neural networks on semi-supervised learning benchmark experiments, and outperforms the neural networks in active learning experiments where labels are scarce. Furthermore, the model does not require a validation data set for early stopping to control over-fitting. Our model can be viewed as an instance of empirical distribution regression weighted locally by network connectivity. We further motivate the intuitive construction of the model with a Bayesian linear model interpretation where the node features are filtered by an operator related to the graph Laplacian. The method can be easily implemented by adapting off-the-shelf scalable variational inference algorithms for Gaussian processes.
Submitted 12 October, 2018; v1 submitted 12 September, 2018;
originally announced September 2018.
-
Causal Interventions for Fairness
Authors:
Matt J. Kusner,
Chris Russell,
Joshua R. Loftus,
Ricardo Silva
Abstract:
Most approaches in algorithmic fairness constrain machine learning methods so the resulting predictions satisfy one of several intuitive notions of fairness. While this may help private companies comply with non-discrimination laws or avoid negative publicity, we believe it is often too little, too late. By the time the training data is collected, individuals in disadvantaged groups have already suffered from discrimination and lost opportunities due to factors out of their control. In the present work we focus instead on interventions such as a new public policy, and in particular, how to maximize their positive effects while improving the fairness of the overall system. We use causal methods to model the effects of interventions, allowing for potential interference--each individual's outcome may depend on who else receives the intervention. We demonstrate this with an example of allocating a budget of teaching resources using a dataset of schools in New York City.
Submitted 6 June, 2018;
originally announced June 2018.
-
Alpha-Beta Divergence For Variational Inference
Authors:
Jean-Baptiste Regli,
Ricardo Silva
Abstract:
This paper introduces a variational approximation framework using direct optimization of what is known as the {\it scale invariant Alpha-Beta divergence} (sAB divergence). This new objective encompasses most variational objectives that use the Kullback-Leibler, the Rényi or the gamma divergences. It also gives access to objective functions never exploited before in the context of variational inference. This is achieved via two easy to interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target distribution and robustness to outliers in the data. Furthermore, the sAB variational objective can be optimized directly by repurposing existing methods for Monte Carlo computation of complex variational objectives, leading to estimates of the divergence instead of variational lower bounds. We show the advantages of this objective on Bayesian models for regression problems.
Submitted 20 May, 2018; v1 submitted 2 May, 2018;
originally announced May 2018.
-
Modeling goal chances in soccer: a Bayesian inference approach
Authors:
Gavin A. Whitaker,
Ricardo Silva,
Daniel Edwards
Abstract:
We consider the task of determining the number of chances a soccer team creates, along with the composite nature of each chance: the players involved and the locations on the pitch of the assist and the chance. We propose an interpretable Bayesian inference approach and implement a Poisson model to capture chance occurrences, from which we infer team abilities. We then use a Gaussian mixture model to capture the areas on the pitch where a player makes an assist or takes a chance. This approach allows the visualization of differences between players in the way they approach attacking play (making assists/taking chances). We apply the resulting scheme to the 2016/2017 English Premier League, capturing team abilities to create chances, before highlighting key areas where players have most impact.
Submitted 23 February, 2018;
originally announced February 2018.
-
Two-way sparsity for time-varying networks, with applications in genomics
Authors:
Thomas E. Bartlett,
Ioannis Kosmidis,
Ricardo Silva
Abstract:
We propose a novel way of modelling time-varying networks, by inducing two-way sparsity on local models of node connectivity. This two-way sparsity separately promotes sparsity across time and sparsity across variables (within time). Separation of these two types of sparsity is achieved through a novel prior structure, which draws on ideas from the Bayesian lasso and from copula modelling. We provide an efficient implementation of the proposed model via a Gibbs sampler, and we apply the model to data from neural development. In doing so, we demonstrate that the proposed model is able to identify changes in genomic network structure that match current biological knowledge. Such changes in genomic network structure can then be used by neuro-biologists to identify potential targets for further experimental investigation.
Submitted 18 November, 2020; v1 submitted 22 February, 2018;
originally announced February 2018.
-
A Dynamic Edge Exchangeable Model for Sparse Temporal Networks
Authors:
Yin Cheng Ng,
Ricardo Silva
Abstract:
We propose a dynamic edge exchangeable network model that can capture sparse connections observed in real temporal networks, in contrast to existing models which are dense. The model achieved superior link prediction accuracy on multiple data sets when compared to a dynamic variant of the blockmodel, and is able to extract interpretable time-varying community structures from the data. In addition to sparsity, the model accounts for the effect of social influence on vertices' future behaviours. Compared to the dynamic blockmodels, our model has a smaller latent space. The compact latent space requires a smaller number of parameters to be estimated in variational inference and results in a computationally friendly inference algorithm.
Submitted 11 October, 2017;
originally announced October 2017.
-
Comparing reverse complementary genomic words based on their distance distributions and frequencies
Authors:
Ana Helena Tavares,
Jakob Raymaekers,
Peter Rousseeuw,
Raquel M. Silva,
Carlos A. C. Bastos,
Armando Pinho,
Paula Brito,
Vera Afreixo
Abstract:
In this work we study reverse complementary genomic word pairs in human DNA, by comparing both the distance distribution and the frequency of a word to those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and it is found that the peak dissimilarity works best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is also explored, and it is speculated that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.
Submitted 6 October, 2017;
originally announced October 2017.
-
A Bayesian inference approach for determining player abilities in football
Authors:
Gavin A. Whitaker,
Ricardo Silva,
Daniel Edwards,
Ioannis Kosmidis
Abstract:
We consider the task of determining a football player's ability for a given event type, for example, scoring a goal. We propose an interpretable Bayesian model which is fit using variational inference methods. We implement a Poisson model to capture occurrences of event types, from which we infer player abilities. Our approach also allows the visualisation of differences between players, for a specific ability, through the marginal posterior variational densities. We then use these inferred player abilities to extend the Bayesian hierarchical model of Baio and Blangiardo (2010) which captures a team's scoring rate (the rate at which they score goals). We apply the resulting scheme to the English Premier League, capturing player abilities over the 2013/2014 season, before using output from the hierarchical model to predict whether over or under 2.5 goals will be scored in a given game in the 2014/2015 season. This validates our model as a way of providing insights into team formation and the individual success of sports teams.
Submitted 23 September, 2020; v1 submitted 25 September, 2017;
originally announced October 2017.
-
Counterfactual Fairness
Authors:
Matt J. Kusner,
Joshua R. Loftus,
Chris Russell,
Ricardo Silva
Abstract:
Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school.
Submitted 8 March, 2018; v1 submitted 20 March, 2017;
originally announced March 2017.