
Structured Learning of Compositional Sequential Interventions

Authors: Jialin Yu, Andreas Koukorinis, Nicolò Colombo, Yuchen Zhu, Ricardo Silva

Abstract: We consider sequential treatment regimes where each unit is exposed to combinations of interventions over time. When interventions are described by qualitative labels, such as ``close schools for a month due to a pandemic'' or ``promote this podcast to this user during this week'', it is unclear which appropriate structural assumptions allow us to generalize behavioral predictions to previously unseen combinatorial sequences. Standard black-box approaches mapping sequences of categorical variables to outputs are applicable, but they rely on poorly understood assumptions on how reliable generalization can be obtained, and may underperform under sparse sequences, temporal variability, and large action spaces. To approach that, we pose an explicit model for \emph{composition}, that is, how the effect of sequential interventions can be isolated into modules, clarifying which data conditions allow for the identification of their combined effect at different units and time steps. We show the identification properties of our compositional model, inspired by advances in causal matrix factorization methods but focusing on predictive models for novel compositions of interventions instead of matrix completion tasks and causal effect estimation. We compare our approach to flexible but generic black-box models to illustrate how structure aids prediction in sparse data conditions.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2404.04446 [pdf, other]

Bounding Causal Effects with Leaky Instruments

Authors: David S. Watson, Jordan Penn, Lee M. Gunderson, Gecia Bravo-Hermsdorff, Afsaneh Mastouri, Ricardo Silva

Abstract: Instrumental variables (IVs) are a popular and powerful tool for estimating causal effects in the presence of unobserved confounding. However, classical approaches rely on strong assumptions such as the $\textit{exclusion criterion}$, which states that instrumental effects must be entirely mediated by treatments. This assumption often fails in practice. When IV methods are improperly applied to data that do not meet the exclusion criterion, estimated causal effects may be badly biased. In this work, we propose a novel solution that provides $\textit{partial}$ identification in linear systems given a set of $\textit{leaky instruments}$, which are allowed to violate the exclusion criterion to some limited degree. We derive a convex optimization objective that provides provably sharp bounds on the average treatment effect under some common forms of information leakage, and implement inference procedures to quantify the uncertainty of resulting estimates. We demonstrate our method in a set of experiments with simulated data, where it performs favorably against the state of the art. An accompanying $\texttt{R}$ package, $\texttt{leakyIV}$, is available from $\texttt{CRAN}$.

Submitted 8 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: Camera ready version (UAI 2024)

Journal ref: 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

arXiv:2402.02663 [pdf, other]

Counterfactual Fairness Is Not Demographic Parity, and Other Observations

Authors: Ricardo Silva

Abstract: Blanket statements of equivalence between causal concepts and purely probabilistic concepts should be approached with care. In this short note, I examine a recent claim that counterfactual fairness is equivalent to demographic parity. The claim fails to hold up upon closer examination. I will take the opportunity to address some broader misunderstandings about counterfactual fairness.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 17 pages, 2 figures

arXiv:2306.04027 [pdf, other]

Intervention Generalization: A View from Factor Graph Models

Authors: Gecia Bravo-Hermsdorff, David S. Watson, Jialin Yu, Jakob Zeitler, Ricardo Silva

Abstract: One of the goals of causal inference is to generalize from past experiments and observational data to novel conditions. While it is in principle possible to eventually learn a mapping from a novel experimental condition to an outcome of interest, provided a sufficient variety of experiments is available in the training data, coping with a large combinatorial space of possible interventions is hard. Under a typical sparse experimental design, this mapping is ill-posed without relying on heavy regularization or prior distributions. Such assumptions may or may not be reliable, and can be hard to defend or test. In this paper, we take a close look at how to warrant a leap from past experiments to novel conditions based on minimal assumptions about the factorization of the distribution of the manipulated system, communicated in the well-understood language of factor graph models. A postulated $\textit{interventional factor model}$ (IFM) may not always be informative, but it conveniently abstracts away a need for explicitly modeling unmeasured confounding and feedback mechanisms, leading to directly testable claims. Given an IFM and datasets from a collection of experimental regimes, we derive conditions for identifiability of the expected outcomes of new regimes never observed in these training data. We implement our framework using several efficient algorithms, and apply them on a range of semi-synthetic experiments.

Submitted 8 November, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: Camera ready version (NeurIPS 2023)

arXiv:2301.04578 [pdf, other]

Precision Dose-finding Cancer Clinical Trials in the Setting of Broadened Eligibility

Authors: Rebecca B. Silva, Bin Cheng, Richard D. Carvajal, Shing M. Lee

Abstract: Broadening eligibility criteria in cancer trials has been advocated to represent the true patient population more accurately. While the advantages are clear in terms of generalizability and recruitment, novel dose-finding designs are needed to ensure patient safety. These designs should be able to recommend precise doses for subpopulations if such subpopulations with different toxicity profiles exist. While dose-finding designs accounting for patient heterogeneity have been proposed, all existing methods assume the source of heterogeneity is known and thus pre-specify the subpopulations or only allow inclusion of a few patient characteristics. We propose a precision dose-finding design to address the setting of unknown patient heterogeneity in phase I cancer clinical trials where eligibility is expanded, and multiple eligibility criteria could potentially lead to different optimal doses for patient subgroups. The design offers a two-in-one approach to dose-finding by simultaneously selecting patient criteria that differentiate the maximum tolerated dose (MTD) and recommending the subpopulation-specific MTD if needed, using marginal models to sequentially incorporate patient covariates. Our simulation study compares the proposed design to the naive approach of assuming patient homogeneity and our design recommends multiple doses when heterogeneity exists and a single dose when no heterogeneity exists. The proposed dose-finding design addresses the challenges of broadening eligibility criteria in cancer trials and the desire for a more precise dose in the context of early phase clinical trials.

Submitted 11 January, 2023; originally announced January 2023.

arXiv:2212.03973 [pdf, other]

doi 10.1038/s41598-023-33003-7

Inferring urban polycentricity from the variability in human mobility patterns

Authors: Carmen Cabrera-Arnau, Chen Zhong, Michael Batty, Ricardo Silva, Soong Moon Kang

Abstract: The polycentric city model has gained popularity in spatial planning policy, since it is believed to overcome some of the problems often present in monocentric metropolises, ranging from congestion to difficult accessibility to jobs and services. However, the concept 'polycentric city' has a fuzzy definition and as a result, the extent to which a city is polycentric cannot be easily determined. Here, we leverage the fine spatio-temporal resolution of smart travel card data to infer urban polycentricity by examining how a city departs from a well-defined monocentric model. In particular, we analyse the human movements that arise as a result of sophisticated forms of urban structure by introducing a novel probabilistic approach which captures the complexity of these human movements. We focus on London (UK) and Seoul (South Korea) as our two case studies, and we specifically find evidence that London displays a higher degree of monocentricity than Seoul, suggesting that Seoul is likely to be more polycentric than London.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 15 pages, 5 figures

Journal ref: Sci. Rep. 13 (2023) 5751

arXiv:2211.01938 [pdf, other]

A family of mixture models for beta valued DNA methylation data

Authors: Koyel Majumdar, Romina Silva, Antoinette Sabrina Perry, Ronald William Watson, Andrea Rau, Florence Jaffrezic, Thomas Brendan Murphy, Isobel Claire Gormley

Abstract: As hypermethylation of promoter cytosine-guanine dinucleotide (CpG) islands has been shown to silence tumour suppressor genes, identifying differentially methylated CpG sites between different samples can assist in understanding disease. Differentially methylated CpG sites (DMCs) can be identified using moderated t-tests or nonparametric tests, but this typically requires the use of data transformations due to a lack of appropriate statistical methods able to adequately account for the bounded nature of DNA methylation data. We propose a family of beta mixture models (BMMs) which use a model-based approach to cluster CpG sites given their original beta-valued methylation data, with no need for transformations. The BMMs allow (i) objective inference of methylation state thresholds and (ii) identification of DMCs between different sample types. The BMMs employ different parameter constraints facilitating application to different study settings. Parameter estimation proceeds via an expectation-maximisation algorithm, with a novel approximation in the maximization step providing tractability and computational feasibility. Performance of BMMs is assessed through thorough simulation studies, and the BMMs are used to analyse a prostate cancer dataset. The BMMs objectively infer intuitive and biologically interpretable methylation state thresholds, and identify DMCs that are related to genes implicated in carcinogenesis and involved in cancer related pathways. An R package betaclust facilitates widespread use of BMMs.

Submitted 18 March, 2024; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: 27 pages, 4 figures

arXiv:2208.01712 [pdf, other]

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Authors: Marília Costa Rosendo Silva, Felipe Alves Siqueira, João Pedro Mantovani Tarrega, João Vitor Pataca Beinotti, Augusto Sousa Nunes, Miguel de Mattos Gardini, Vinícius Adolfo Pereira da Silva, Nádia Félix Felipe da Silva, André Carlos Ponce de Leon Ferreira de Carvalho

Abstract: Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.

Submitted 2 August, 2022; originally announced August 2022.

ACM Class: I.2; I.2.7; I.5.3

arXiv:2206.15475 [pdf, other]

Causal Machine Learning: A Survey and Open Problems

Authors: Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva

Abstract: Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.

Submitted 21 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: 191 pages. v02. Work in progress. Feedback and comments are highly appreciated!

arXiv:2206.09186 [pdf, other]

Causal Inference with Treatment Measurement Error: A Nonparametric Instrumental Variable Approach

Authors: Yuchen Zhu, Limor Gultchin, Arthur Gretton, Matt Kusner, Ricardo Silva

Abstract: We propose a kernel-based nonparametric estimator for the causal effect when the cause is corrupted by error. We do so by generalizing estimation in the instrumental variable setting. Despite significant work on regression with measurement error, additionally handling unobserved confounding in the continuous setting is non-trivial: we have seen little prior work. As a by-product of our investigation, we clarify a connection between mean embeddings and characteristic functions, and how learning one simultaneously allows one to learn the other. This opens the way for kernel method research to leverage existing results in characteristic function estimation. Finally, we empirically show that our proposed method, MEKIV, improves over baselines and is robust under changes in the strength of measurement error and to the type of error distributions.

Submitted 18 June, 2022; originally announced June 2022.

Comments: UAI 2022 (Oral)

arXiv:2206.00736 [pdf, other]

Modified Galton-Watson processes with immigration under an alternative offspring mechanism

Authors: Wagner Barreto-Souza, Sokol Ndreca, Rodrigo B. Silva, Roger W. C. Silva

Abstract: We propose a novel class of count time series models alternative to the classic Galton-Watson process with immigration (GWI) and Bernoulli offspring. A new offspring mechanism is developed and its properties are explored. This novel mechanism, called geometric thinning operator, is used to define a class of modified GWI (MGWI) processes, which induces a certain non-linearity to the models. We show that this non-linearity can produce better results in terms of prediction when compared to the linear case commonly considered in the literature. We explore both stationary and non-stationary versions of our MGWI processes. Inference on the model parameters is addressed and the finite-sample behavior of the estimators investigated through Monte Carlo simulations. Two real data sets are analyzed to illustrate the stationary and non-stationary cases and the gain of the non-linearity induced for our method over the existing linear methods. A generalization of the geometric thinning operator and an associated MGWI process are also proposed and motivated for dealing with zero-inflated or zero-deflated count time series data.

Submitted 1 June, 2022; originally announced June 2022.

Comments: Paper submitted for publication

arXiv:2205.07338 [pdf, other]

Reductive MDPs: A Perspective Beyond Temporal Horizons

Authors: Thomas Spooner, Rui Silva, Joshua Lockhart, Jason Long, Vacslav Glukhov

Abstract: Solving general Markov decision processes (MDPs) is a computationally hard problem. Solving finite-horizon MDPs, on the other hand, is highly tractable with well known polynomial-time algorithms. What drives this extreme disparity, and do problems exist that lie between these diametrically opposed complexities? In this paper we identify and analyse a sub-class of stochastic shortest path problems (SSPs) for general state-action spaces whose dynamics satisfy a particular drift condition. This construction generalises the traditional, temporal notion of a horizon via decreasing reachability: a property called reductivity. It is shown that optimal policies can be recovered in polynomial-time for reductive SSPs -- via an extension of backwards induction -- with an efficient analogue in reductive MDPs. The practical considerations of the proposed approach are discussed, and numerical verification provided on a canonical optimal liquidation problem.

Submitted 15 May, 2022; originally announced May 2022.

Comments: 15 pages, 10 figures, 1 algorithm

arXiv:2205.05715 [pdf, other]

Causal discovery under a confounder blanket

Authors: David S. Watson, Ricardo Silva

Abstract: Inferring causal relationships from observational data is rarely straightforward, but the problem is especially difficult in high dimensions. For these applications, causal discovery algorithms typically require parametric restrictions or extreme sparsity constraints. We relax these assumptions and focus on an important but more specialized problem, namely recovering the causal order among a subgraph of variables known to descend from some (possibly large) set of confounding covariates, i.e. a $\textit{confounder blanket}$. This is useful in many settings, for example when studying a dynamic biomolecular subsystem with genetic data providing background information. Under a structural assumption called the $\textit{confounder blanket principle}$, which we argue is essential for tractable causal discovery in high dimensions, our method accommodates graphs of low or high sparsity while maintaining polynomial time complexity. We present a structure learning algorithm that is provably sound and complete with respect to a so-called $\textit{lazy oracle}$. We design inference procedures with finite sample error control for linear and nonlinear systems, and demonstrate our approach on a range of simulated and real-world datasets. An accompanying $\texttt{R}$ package, $\texttt{cbl}$, is available from $\texttt{CRAN}$.

Submitted 28 June, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

Comments: Camera ready version (UAI 2022)

Journal ref: 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022)

arXiv:2203.10982 [pdf, other]

doi 10.1007/s11071-022-07865-x

Sequential time-window learning with approximate Bayesian computation: an application to epidemic forecasting

Authors: João Pedro Valeriano, Pedro Henrique Cintra, Gustavo Libotte, Igor Reis, Felipe Fontinele, Renato Silva, Sandra Malta

Abstract: The long duration of the COVID-19 pandemic allowed for multiple bursts in the infection and death rates, the so-called epidemic waves. This complex behavior is no longer tractable by simple compartmental models and requires more sophisticated mathematical techniques for analyzing epidemic data and generating reliable forecasts. In this work, we propose a framework for analyzing complex dynamical systems by dividing the data into consecutive time-windows to be separately analyzed. We fit parameters for each time-window through an Approximate Bayesian Computation (ABC) algorithm, and the posterior distribution of parameters obtained for one window is used as the prior distribution for the next window. This Bayesian learning approach is tested with data on COVID-19 cases in multiple countries and is shown to improve ABC performance and to produce good short-term forecasting.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 12 pages, 7 figures; + supplementary material -- 31 pages

Journal ref: Nonlinear Dyn (2022)

arXiv:2202.13851 [pdf, other]

The Causal Marginal Polytope for Bounding Treatment Effects

Authors: Jakob Zeitler, Ricardo Silva

Abstract: Due to unmeasured confounding, it is often not possible to identify causal effects from a postulated model. Nevertheless, we can ask for partial identification, which usually boils down to finding upper and lower bounds of a causal quantity of interest derived from all solutions compatible with the encoded structural assumptions. One appealing way to derive such bounds is by casting it in terms of a constrained optimization method that searches over all causal models compatible with evidence, as introduced in the classic work of Balke and Pearl (1994) for discrete data. Although by construction this guarantees tight bounds, it poses a formidable computational challenge. To cope with this issue, alternatives include algorithms that are not guaranteed to be tight, or by introducing restrictions on the class of models. In this paper, we introduce a novel alternative: inspired by ideas coming from belief propagation, we enforce compatibility between marginals of a causal model and data, without constructing a global causal model. We call this collection of locally consistent marginals the causal marginal polytope. As global independence constraints disappear when considering small dimensional tractable marginals, this also leads to a rethinking of how to elicit and express causal knowledge. We provide an explicit algorithm and implementation of this idea, and assess its practicality with numerical experiments.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2202.10806 [pdf, other]

Stochastic Causal Programming for Bounding Treatment Effects

Authors: Kirtan Padh, Jakob Zeitler, David Watson, Matt Kusner, Ricardo Silva, Niki Kilbertus

Abstract: Causal effect estimation is important for many tasks in the natural and social sciences. We design algorithms for the continuous partial identification problem: bounding the effects of multivariate, continuous treatments when unmeasured confounding makes identification impossible. Specifically, we cast causal effects as objective functions within a constrained optimization problem, and minimize/maximize these functions to obtain bounds. We combine flexible learning algorithms with Monte Carlo methods to implement a family of solutions under the name of stochastic causal programming. In particular, we show how the generic framework can be efficiently formulated in settings where auxiliary variables are clustered into pre-treatment and post-treatment sets, where no fine-grained causal graph can be easily specified. In these settings, we can avoid the need for fully specifying the distribution family of hidden common causes. Monte Carlo computation is also much simplified, leading to algorithms which are more computationally stable against alternatives.

Submitted 17 May, 2023; v1 submitted 22 February, 2022; originally announced February 2022.

Journal ref: Proceedings of Machine Learning Research vol 213:1-35, 2023

arXiv:2202.00661 [pdf, other]

When Do Flat Minima Optimizers Work?

Authors: Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner

Abstract: Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: 1. Stochastic Weight Averaging (SWA), and 2. Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap here by comparing the loss surfaces of the models trained with each method and through broad benchmarking across computer vision, natural language processing, and graph representation learning tasks. We discover several surprising findings from these results, which we hope will help researchers further improve deep learning optimizers, and practitioners identify the right optimizer for their problem.

Submitted 27 January, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

arXiv:2106.05074 [pdf, other]

Operationalizing Complex Causes: A Pragmatic View of Mediation

Authors: Limor Gultchin, David S. Watson, Matt J. Kusner, Ricardo Silva

Abstract: We examine the problem of causal response estimation for complex objects (e.g., text, images, genomics). In this setting, classical \emph{atomic} interventions are often not available (e.g., changes to characters, pixels, DNA base-pairs). Instead, we only have access to indirect or \emph{crude} interventions (e.g., enrolling in a writing program, modifying a scene, applying a gene therapy). In this work, we formalize this problem and provide an initial solution. Given a collection of candidate mediators, we propose (a) a two-step method for predicting the causal responses of crude interventions; and (b) a testing procedure to identify mediators of crude interventions. We demonstrate, on a range of simulated and real-world-inspired examples, that our approach allows us to efficiently estimate the effect of crude interventions with limited data from new treatment regimes.

Submitted 10 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Journal ref: International Conference on Machine Learning 2021

arXiv:2106.02909 [pdf, other]

Parameter Estimation for Grouped Data Using EM and MCEM Algorithms

Authors: Zahra A. Shirazi, João Pedro A. R. da Silva, Camila P. E. de Souza

Abstract: Nowadays, the confidentiality of data and information is of great importance for many companies and organizations. For this reason, they may prefer not to release exact data, but instead to grant researchers access to approximate data. For example, rather than providing the exact measurements of their clients, they may only provide researchers with grouped data, that is, the number of clients falling in each of a set of non-overlapping measurement intervals. The challenge is to estimate the mean and variance structure of the hidden ungrouped data based on the observed grouped data. To tackle this problem, this work considers the exact observed data likelihood and applies the Expectation-Maximization (EM) and Monte-Carlo EM (MCEM) algorithms for cases where the hidden data follow a univariate, bivariate, or multivariate normal distribution. Simulation studies are conducted to evaluate the performance of the proposed EM and MCEM algorithms. The well-known Galton data set is considered as an application example.

Submitted 22 December, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

Comments: 32 pages, 9 tables and 7 figures
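
Illustrative note (not part of the arXiv record): the univariate case described in the abstract above can be sketched with a textbook EM iteration. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the starting values, and the iteration count are hypothetical. The E-step replaces each unseen measurement by the first and second moments of a normal distribution truncated to its interval; the M-step re-estimates the mean and standard deviation from those expected moments.

```python
import math


def norm_pdf(x):
    """Standard normal density; returns 0.0 at +/- infinity."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _x_pdf(x):
    """x * pdf(x), with the limit 0 taken at infinite endpoints."""
    return 0.0 if math.isinf(x) else x * norm_pdf(x)


def truncated_moments(a, b, mu, sigma):
    """E[X] and E[X^2] for X ~ N(mu, sigma^2) restricted to (a, b)."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = norm_cdf(beta) - norm_cdf(alpha)
    if z <= 0.0:
        raise ValueError("interval has zero probability under current parameters")
    d = (norm_pdf(alpha) - norm_pdf(beta)) / z
    mean = mu + sigma * d
    var = sigma * sigma * (1.0 + (_x_pdf(alpha) - _x_pdf(beta)) / z - d * d)
    return mean, var + mean * mean


def em_grouped_normal(edges, counts, mu, sigma, n_iter=200):
    """EM for a univariate normal observed only as interval counts.

    edges has len(counts) + 1 entries and may start/end with -inf/+inf.
    mu and sigma are user-supplied starting values (an assumption here).
    """
    total = sum(counts)
    for _ in range(n_iter):
        s1 = 0.0
        s2 = 0.0
        # E-step: expected sufficient statistics, interval by interval.
        for a, b, n in zip(edges[:-1], edges[1:], counts):
            if n == 0:
                continue
            m1, m2 = truncated_moments(a, b, mu, sigma)
            s1 += n * m1
            s2 += n * m2
        # M-step: closed-form normal updates from the expected moments.
        mu = s1 / total
        sigma = math.sqrt(s2 / total - mu * mu)
    return mu, sigma
```

A call such as `em_grouped_normal(edges, counts, mu=9.0, sigma=3.0)` returns the estimated mean and standard deviation. The starting values should give every non-empty interval positive probability, otherwise the sketch raises `ValueError`.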

arXiv:2106.01939 [pdf, other]

Causal Effect Inference for Structured Treatments

Authors: Jean Kaddour, Yuchen Zhu, Qi Liu, Matt J. Kusner, Ricardo Silva

Abstract: We address the estimation of conditional average treatment effects (CATEs) for structured treatments (e.g., graphs, images, texts). Given a weak condition on the effect, we propose the generalized Robinson decomposition, which (i) isolates the causal estimand (reducing regularization bias), (ii) allows one to plug in arbitrary models for learning, and (iii) possesses a quasi-oracle convergence guarantee under mild assumptions. In experiments with small-world and molecular graphs we demonstrate that our approach outperforms prior work in CATE estimation.

Submitted 27 October, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 Camera-Ready submission

arXiv:2103.08691 [pdf, other]

Fractional Poisson random sum and its associated normal variance mixture

Authors: Gabriela Oliveira, Wagner Barreto-Souza, Roger W. C. Silva

Abstract: In this work, we study the partial sums of independent and identically distributed random variables with the number of terms following a fractional Poisson (FP) distribution. The FP sum contains the Poisson and geometric summations as particular cases. We show that the weak limit of the FP summation, when properly normalized, is a mixture between the normal and Mittag-Leffler distributions, which we call the Normal-Mittag-Leffler (NML) law. A parameter estimation procedure for the NML distribution is developed and the associated asymptotic distribution is derived. Simulations are performed to check the performance of the proposed estimators under finite samples. An empirical illustration on the daily log-returns of the Brazilian stock exchange index (IBOVESPA) shows that the NML distribution captures the tails better than some of its competitors. Related problems such as a mixed Poisson representation for the FP law and the weak convergence for the Conway-Maxwell-Poisson random sum are also addressed.

Submitted 31 March, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: Paper submitted for publication

arXiv:2011.12756 [pdf, other]

Surrogate-based Bayesian Comparison of Computationally Expensive Models: Application to Microbially Induced Calcite Precipitation

Authors: Stefania Scheurer, Aline Schäfer Rodrigues Silva, Farid Mohammadi, Johannes Hommel, Sergey Oladyshkin, Bernd Flemisch, Wolfgang Nowak

Abstract: Geochemical processes in subsurface reservoirs affected by microbial activity change the material properties of porous media. This is a complex biogeochemical process in subsurface reservoirs that currently contains strong conceptual uncertainty. This means, several modeling approaches describing the biogeochemical process are plausible and modelers face the uncertainty of choosing the most appropriate one. Once observation data becomes available, a rigorous Bayesian model selection accompanied by a Bayesian model justifiability analysis could be employed to choose the most appropriate model, i.e. the one that describes the underlying physical processes best in the light of the available data. However, biogeochemical modeling is computationally very demanding because it conceptualizes different phases, biomass dynamics, geochemistry, precipitation and dissolution in porous media. Therefore, the Bayesian framework cannot be based directly on the full computational models as this would require too many expensive model evaluations. To circumvent this problem, we suggest performing both Bayesian model selection and justifiability analysis after constructing surrogates for the competing biogeochemical models. Here, we use the arbitrary polynomial chaos expansion. We account for the approximation error in the Bayesian analysis by introducing novel correction factors for the resulting model weights. Thereby, we extend the Bayesian justifiability analysis and assess model similarities for computationally expensive models.
We demonstrate the method on a representative scenario for microbially induced calcite precipitation in a porous medium. Our extension of the justifiability analysis provides a suitable approach for the comparison of computationally demanding models and gives an insight on the necessary amount of data for a reliable model performance.

Submitted 25 November, 2020; originally announced November 2020.

arXiv:2007.07979 [pdf, other]

Short-term forecasting of Amazon rainforest fires based on ensemble decomposition model

Authors: Ramon Gomes da Silva, Matheus Henrique Dal Molin Ribeiro, Viviana Cocco Mariani, Leandro dos Santos Coelho

Abstract: Accurate forecasting is important for decision-makers. Recently, the Amazon rainforest has been reaching record numbers of fires, a situation that raises both climate and public health concerns. Obtaining the desired forecasting accuracy is difficult and challenging. In this paper, we develop a novel heterogeneous decomposition-ensemble model that uses Seasonal and Trend decomposition based on Loess in combination with algorithms for short-term load forecasting multi-month-ahead, to explore temporal patterns of Amazon rainforest fires in Brazil. The results demonstrate that the proposed decomposition-ensemble models can provide more accurate forecasting as evaluated by performance measures. The Diebold-Mariano statistical test showed that the proposed models are better than the other compared models, but statistically equal to one of them.

Submitted 23 July, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

Comments: 6 pages with 3 figures; Comments edited

arXiv:2006.09949 [pdf, other]

Exact and computationally efficient Bayesian inference for generalized Markov modulated Poisson processes

Authors: Flavio B. Gonçalves, Livia M. Dutra, Roger W. C. Silva

Abstract: Statistical modeling of point patterns is an important and common problem in several areas. The Poisson process is the most common process used for this purpose, in particular, its generalization that considers the intensity function to be stochastic. This is called a Cox process and different choices to model the dynamics of the intensity give rise to a wide range of models. We present a new class of unidimensional Cox process models in which the intensity function assumes parametric functional forms that switch among them according to a continuous-time Markov chain. A novel methodology is proposed to perform exact Bayesian inference based on MCMC algorithms. The term exact refers to the fact that no discrete time approximation is used and Monte Carlo error is the only source of inaccuracy. The reliability of the algorithms depends on a variety of specifications which are carefully addressed, resulting in a computationally efficient (in terms of computing time) algorithm and enabling its use with large data sets. Simulated and real examples are presented to illustrate the efficiency and applicability of the proposed methodology. A specific model to fit epidemic curves is proposed and used to analyze data from Dengue Fever in Brazil and COVID-19 in some countries.

Submitted 25 February, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:2006.06366 [pdf, other]

A Class of Algorithms for General Instrumental Variable Models

Authors: Niki Kilbertus, Matt J. Kusner, Ricardo Silva

Abstract: Causal treatment effect estimation is a key problem that arises in a variety of real-world settings, from personalized medicine to governmental policy making. There has been a flurry of recent work in machine learning on estimating causal effects when one has access to an instrument. However, to achieve identifiability, they in general require one-size-fits-all assumptions such as an additive error model for the outcome. An alternative is partial identification, which provides bounds on the causal effect. Little exists in terms of bounding methods that can deal with the most general case, where the treatment itself can be continuous. Moreover, bounding methods generally do not allow for a continuum of assumptions on the shape of the causal effect that can smoothly trade off stronger background knowledge for more informative bounds. In this work, we provide a method for causal effect bounding in continuous distributions, leveraging recent advances in gradient-based methods for the optimization of computationally intractable objective functions. We demonstrate on a set of synthetic and real-world data that our bounds capture the causal effect when additive methods fail, providing a useful range of answers compatible with observation as opposed to relying on unwarranted structural assumptions.

Submitted 21 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Appeared at Neural Information Processing Systems (NeurIPS) 2020; Code at https://github.com/nikikilbertus/general-iv-models

arXiv:2005.11528 [pdf, other]

Learning Joint Nonlinear Effects from Single-variable Interventions in the Presence of Hidden Confounders

Authors: Sorawit Saengkyongam, Ricardo Silva

Abstract: We propose an approach to estimate the effect of multiple simultaneous interventions in the presence of hidden confounders. To overcome the problem of hidden confounding, we consider the setting where we have access to not only the observational data but also sets of single-variable interventions in which each of the treatment variables is intervened on separately. We prove identifiability under the assumption that the data is generated from a nonlinear continuous structural causal model with additive Gaussian noise. In addition, we propose a simple parameter estimation method by pooling all the data from different regimes and jointly maximizing the combined likelihood. We also conduct comprehensive experiments to verify the identifiability result as well as to compare the performance of our approach against a baseline on both synthetic and real-world data.

Submitted 16 June, 2020; v1 submitted 23 May, 2020; originally announced May 2020.

Comments: Accepted to The Conference on Uncertainty in Artificial Intelligence (UAI) 2020

arXiv:2005.00501 [pdf, ps, other]

Multivariate Log-Skewed Distributions with normal kernel and their Applications

Authors: Marina M. de Queiroz, Rosangela H. Loschi, Roger W. C. Silva

Abstract: We introduce two classes of multivariate log-skewed distributions with normal kernel: the log canonical fundamental skew-normal (log-CFUSN) and the log unified skew-normal (log-SUN). We also discuss some properties of the log-CFUSN family of distributions. These new classes of log-skewed distributions include the log-normal and multivariate log-skew normal families as particular cases. We discuss some issues related to Bayesian inference in the log-CFUSN family of distributions, focusing mainly on how to model the prior uncertainty about the skewing parameter. Based on the stochastic representation of the log-CFUSN family, we propose a data augmentation strategy for sampling from the posterior distributions. This proposed family is used to analyze the US national monthly precipitation data. We conclude that a high dimensional skewing function leads to a better model fit.

Submitted 1 May, 2020; originally announced May 2020.

Comments: 20 pages

Journal ref: Statistics (Berlin), 2016

arXiv:2004.12554 [pdf, other]

Forecasting in Non-stationary Environments with Fuzzy Time Series

Authors: Petrônio Cândido de Lima e Silva, Carlos Alberto Severiano Junior, Marcos Antonio Alves, Rodrigo Silva, Miri Weiss Cohen, Frederico Gadelha Guimarães

Abstract: In this paper we introduce a Non-Stationary Fuzzy Time Series (NSFTS) method with time varying parameters adapted from the distribution of the data. In this approach, we employ Non-Stationary Fuzzy Sets, in which perturbation functions are used to adapt the membership function parameters in the knowledge base in response to statistical changes in the time series. The proposed method is capable of dynamically adapting its fuzzy sets to reflect the changes in the stochastic process based on the residual errors, without the need to retrain the model. This method can handle non-stationary and heteroskedastic data as well as scenarios with concept-drift. The proposed approach allows the model to be trained only once and remain useful long after while keeping reasonable accuracy. The flexibility of the method was tested by means of computational experiments with eight synthetic non-stationary time series data with several kinds of concept drifts, four real market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three real FOREX pairs (EUR-USD, EUR-GBP, GBP-USD), and two real cryptocoins exchange rates (Bitcoin-USD and Ethereum-USD). As competitor models the Time Variant fuzzy time series and the Incremental Ensemble were used; these are two of the major approaches for handling non-stationary data sets. Non-parametric tests are employed to check the significance of the results. The proposed method shows resilience to concept drift, by adapting parameters of the model, while preserving the symbolic structure of the knowledge base.

Submitted 26 April, 2020; originally announced April 2020.

Comments: 21 pages, 7 figures, submitted to Applied Soft Computing

arXiv:2003.01864 [pdf, other]

Visualizing and Understanding Large-Scale Assessments in Mathematics through Dimensionality Reduction

Authors: Esdras Medeiros, Jorge Lira, Romildo Silva, Caio Azevedo

Abstract: In this paper, we apply the Logistic PCA (LPCA) as a dimensionality reduction tool for visualizing patterns and characterizing the relevance of mathematics abilities from a given population measured by a large-scale assessment. We establish an equivalence of parameters between LPCA, Inner Product Representation (IPR) and the two parameter logistic model (2PL) from the Item Response Theory (IRT). This equivalence provides three complementary ways of looking at data that assist professionals in education to perform in-context interpretations. Particularly, we analyse the data collected from SPAECE, a large-scale assessment in Mathematics that has been applied yearly in the public educational system of the state of Ceará, Brazil. As the main result, we show that the poor performance of examinees at the end of middle school is primarily caused by their disabilities in number sense.

Submitted 31 May, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

Comments: To be submitted for a journal

arXiv:2003.01461 [pdf, other]

Differentiable Causal Backdoor Discovery

Authors: Limor Gultchin, Matt J. Kusner, Varun Kanade, Ricardo Silva

Abstract: Discovering the causal effect of a decision is critical to nearly all forms of decision-making. In particular, it is a key quantity in drug development, in crafting government policy, and when implementing a real-world machine learning system. Given only observational data, confounders often obscure the true causal effect. Luckily, in some cases, it is possible to recover the causal effect by using certain observed variables to adjust for the effects of confounders. However, without access to the true causal model, finding this adjustment requires brute-force search. In this work, we present an algorithm that exploits auxiliary variables, similar to instruments, in order to find an appropriate adjustment by a gradient-based optimization method. We demonstrate that it outperforms practical alternatives in estimating the true causal effect, without knowledge of the full causal graph.

Submitted 3 March, 2020; originally announced March 2020.

Comments: Published in the Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, Palermo, Italy

arXiv:2002.05676 [pdf, other]

Generalized Autoregressive Neural Network Models

Authors: Renato Rodrigues Silva

Abstract: A time series is a sequence of observations taken sequentially in time. The autoregressive integrated moving average is the class of models most commonly used for time series data. However, this class of models has two critical limitations: it fits well only Gaussian data with a linear correlation structure. Here, I present a new model named generalized autoregressive neural networks (GARNN). The GARNN is an extension of the generalized linear model where the marginal mean depends on the lagged values via the inclusion of the neural network in the link function. A practical application of the model is shown using a well-known data set of poliomyelitis case numbers, originally analyzed by Zeger and Qaqish (1988).

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2002.05508 [pdf, other]

Neural Network Approximation of Graph Fourier Transforms for Sparse Sampling of Networked Flow Dynamics

Authors: Alessio Pagani, Zhuangkun Wei, Ricardo Silva, Weisi Guo

Abstract: Infrastructure monitoring is critical for safe operations and sustainability. Water distribution networks (WDNs) are large-scale networked critical systems with complex cascade dynamics which are difficult to predict. Ubiquitous monitoring is expensive and a key challenge is to infer the contaminant dynamics from partial sparse monitoring data. Existing approaches use multi-objective optimisation to find the minimum set of essential monitoring points, but lack performance guarantees and a theoretical framework. Here, we first develop Graph Fourier Transform (GFT) operators to compress networked contamination spreading dynamics to identify the essential principle data collection points with inference performance guarantees. We then build autoencoder (AE) inspired neural networks (NN) to generalize the GFT sampling process and under-sample further from the initial sampling set, allowing a very small set of data points to largely reconstruct the contamination dynamics over real and artificial WDNs. Various sources of the contamination are tested and we obtain high accuracy reconstruction using around 5-10% of the sample set. This general approach of compression and under-sampled recovery via neural networks can be applied to a wide range of networked infrastructures to enable digital twins.

Submitted 11 February, 2020; originally announced February 2020.

arXiv:1912.04242 [pdf, other]

Adversarial recovery of agent rewards from latent spaces of the limit order book

Authors: Jacobo Roa-Vicens, Yuanbo Wang, Virgile Mison, Yarin Gal, Ricardo Silva

Abstract: Inverse reinforcement learning has proved its ability to explain state-action trajectories of expert agents by recovering their underlying reward functions in increasingly challenging environments. Recent advances in adversarial learning have allowed extending inverse RL to applications with non-stationary environment dynamics unknown to the agents, arbitrary structures of reward functions and improved handling of the ambiguities inherent to the ill-posed nature of inverse RL. This is particularly relevant in real time applications on stochastic environments involving risk, like volatile financial markets. Moreover, recent work on simulation of complex environments enables learning algorithms to engage with real market data through simulations of its latent space representations, avoiding a costly exploration of the original environment. In this paper, we explore whether adversarial inverse RL algorithms can be adapted and trained within such latent space simulations from real market data, while maintaining their ability to recover agent rewards robust to variations in the underlying dynamics, and transfer them to new regimes of the original environment.

Submitted 9 December, 2019; originally announced December 2019.

Comments: Published as a workshop paper on NeurIPS 2019 Workshop on Robust AI in Financial Services. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1911.04048 [pdf, other]

Multidataset Independent Subspace Analysis with Application to Multimodal Fusion

Authors: Rogers F. Silva, Sergey M. Plis, Tulay Adali, Marios S. Pattichis, Vince D. Calhoun

Abstract: In the last two decades, unsupervised latent variable models---blind source separation (BSS) especially---have enjoyed a strong reputation for the interpretable features they produce. Seldom do these models combine the rich diversity of information available in multiple datasets. Multidatasets, on the other hand, yield joint solutions otherwise unavailable in isolation, with a potential for pivotal insights into complex systems. To take advantage of the complex multidimensional subspace structures that capture underlying modes of shared and unique variability across and within datasets, we present a direct, principled approach to multidataset combination. We design a new method called multidataset independent subspace analysis (MISA) that leverages joint information from multiple heterogeneous datasets in a flexible and synergistic fashion. Methodological innovations exploiting the Kotz distribution for subspace modeling in conjunction with a novel combinatorial optimization for evasion of local minima enable MISA to produce a robust generalization of independent component analysis (ICA), independent vector analysis (IVA), and independent subspace analysis (ISA) in a single unified model. We highlight the utility of MISA for multimodal information fusion, including sample-poor regimes and low signal-to-noise ratio scenarios, promoting novel applications in both unimodal and multimodal brain imaging data.

Submitted 10 November, 2019; originally announced November 2019.

Comments: For associated code, see https://github.com/rsilva8/MISA ; for associated data, see https://github.com/rsilva8/MISA-data . Submitted to IEEE Transactions on Image Processing on Nov/7/2019: 13 pages, 8 figures. Supplement: 16 pages, 5 figures.

ACM Class: G.1.6; G.2.1; G.3; H.1.1; J.3; I.5.1; I.2.6

arXiv:1911.00827 [pdf, other]

doi 10.1016/j.physleta.2019.126231

A simple study of the correlation effects in the superposition of waves of electric fields: the emergence of extreme events

Authors: Roberto da Silva, Sandra D. Prado

Abstract: In this paper, we study the effects of correlated random phases in the intensity of a superposition of $N$ wave-fields. Our results suggest that regardless of whether the phase distribution is continuous or discrete if the phases are random correlated variables, we must observe a heavier tail distribution and the emergence of extreme events as the correlation between phases increases. We believe that such a simple method can be easily applied in other situations to show the existence of extreme statistical events in the context of nonlinear complex systems.

Submitted 3 November, 2019; originally announced November 2019.

Comments: 11 pages, 3 figures

arXiv:1910.12913 [pdf, other]

Improved Differentially Private Decentralized Source Separation for fMRI Data

Authors: Hafiz Imtiaz, Jafar Mohammadi, Rogers Silva, Bradley Baker, Sergey M. Plis, Anand D. Sarwate, Vince Calhoun

Abstract: Blind source separation algorithms such as independent component analysis (ICA) are widely used in the analysis of neuroimaging data. In order to leverage larger sample sizes, different data holders/sites may wish to collaboratively learn feature representations. However, such datasets are often privacy-sensitive, precluding centralized analyses that pool the data at a single site. In this work, we propose a differentially private algorithm for performing ICA in a decentralized data setting. Conventional approaches to decentralized differentially private algorithms may introduce too much noise due to the typically small sample sizes at each site. We propose a novel protocol that uses correlated noise to remedy this problem. We show that our algorithm outperforms existing approaches on synthetic and real neuroimaging datasets and demonstrate that it can sometimes reach the same level of utility as the corresponding non-private algorithm. This indicates that it is possible to have meaningful utility while preserving privacy.

Submitted 22 February, 2021; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. arXiv admin note: text overlap with arXiv:1904.10059

arXiv:1908.07193 [pdf, other]

Counterfactual Distribution Regression for Structured Inference

Authors: Nicolo Colombo, Ricardo Silva, Soong M Kang, Arthur Gretton

Abstract: We consider problems in which a system receives external \emph{perturbations} from time to time. For instance, the system can be a train network in which particular lines are repeatedly disrupted without warning, having an effect on passenger behavior. The goal is to predict changes in the behavior of the system at particular points of interest, such as passenger traffic around stations at the affected rails. We assume that the data available provides records of the system functioning at its "natural regime" (e.g., the train network without disruptions) and data on cases where perturbations took place. The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations. We approach this problem from the point of view of a mapping from the counterfactual distribution of the system behavior without disruptions to the distribution of the disrupted system. A variant on \emph{distribution regression} is developed for this setup.

Submitted 20 August, 2019; originally announced August 2019.

Comments: 24 pages, 5 figures

arXiv:1907.01040 [pdf, other]

The Sensitivity of Counterfactual Fairness to Unmeasured Confounding

Authors: Niki Kilbertus, Philip J. Ball, Matt J. Kusner, Adrian Weller, Ricardo Silva

Abstract: Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions in causal modeling is that you know the causal graph. This introduces a new opportunity for bias, caused by misspecifying the causal model. One common way for misspecification to occur is via unmeasured confounding: the true causal effect between variables is partially described by unobserved quantities. In this work we design tools to assess the sensitivity of fairness measures to this confounding for the popular class of non-linear additive noise models (ANMs). Specifically, we give a procedure for computing the maximum difference between two counterfactually fair predictors, where one has become biased due to confounding. For the case of bivariate confounding our technique can be swiftly computed via a sequence of closed-form updates. For multivariate confounding we give an algorithm that can be efficiently solved via automatic differentiation. We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding.

Submitted 1 July, 2019; originally announced July 2019.

Comments: published at UAI 2019

arXiv:1906.04813 [pdf, other]

Towards Inverse Reinforcement Learning for Limit Order Book Dynamics

Authors: Jacobo Roa-Vicens, Cyrine Chtourou, Angelos Filos, Francisco Rullan, Yarin Gal, Ricardo Silva

Abstract: Multi-agent learning is a promising method to simulate aggregate competitive behaviour in finance. Learning expert agents' reward functions through their external demonstrations is hence particularly relevant for subsequent design of realistic agent-based simulations. Inverse Reinforcement Learning (IRL) aims at acquiring such reward functions through inference, allowing to generalize the resulting policy to states not observed in the past. This paper investigates whether IRL can infer such rewards from agents within real financial stochastic environments: limit order books (LOB). We introduce a simple one-level LOB, where the interactions of a number of stochastic agents and an expert trading agent are modelled as a Markov decision process. We consider two cases for the expert's reward: either a simple linear function of state features; or a complex, more realistic non-linear function. Given the expert agent's demonstrations, we attempt to discover their strategy by modelling their latent reward function using linear and Gaussian process (GP) regressors from previous literature, and our own approach through Bayesian neural networks (BNN). While the three methods can learn the linear case, only the GP-based and our proposed BNN methods are able to discover the non-linear reward case. Our BNN IRL algorithm outperforms the other two approaches as the number of samples increases. These results illustrate that complex behaviours, induced by non-linear reward functions amid agent-based stochastic scenarios, can be deduced through inference, encouraging the use of inverse reinforcement learning for opponent-modelling in multi-agent systems.

Submitted 11 June, 2019; originally announced June 2019.

Comments: Published as a workshop paper on AI in Finance: Applications and Infrastructure for Multi-Agent Learning at the 36th International Conference on Machine Learning (ICML), Long Beach, California, PMLR97, 2019. Copyright 2019 by the author(s)

arXiv:1904.00770 [pdf, other]

Netherlands Dataset: A New Public Dataset for Machine Learning in Seismic Interpretation

Authors: Reinaldo Mozart Silva, Lais Baroni, Rodrigo S. Ferreira, Daniel Civitarese, Daniela Szwarcman, Emilio Vital Brazil

Abstract: Machine learning and, more specifically, deep learning algorithms have seen remarkable growth in their popularity and usefulness in the last years. This is arguably due to three main factors: powerful computers, new techniques to train deeper networks and larger datasets. Although the first two are readily available in modern computers and ML libraries, the last one remains a challenge for many domains. It is a fact that big data is a reality in almost all fields nowadays, and geosciences are not an exception. However, to achieve the success of general-purpose applications such as ImageNet - for which there are +14 million labeled images for 1000 target classes - we not only need more data, we need more high-quality labeled data. When it comes to the Oil&Gas industry, confidentiality issues hamper even more the sharing of datasets. In this work, we present the Netherlands interpretation dataset, a contribution to the development of machine learning in seismic interpretation. The Netherlands F3 dataset acquisition was carried out in the North Sea, Netherlands offshore. The data is publicly available and contains post-stack data, 8 horizons and well logs of 4 wells. For the purposes of our machine learning tasks, the original dataset was reinterpreted, generating 9 horizons separating different seismic facies intervals. The interpreted horizons were used to generate approximately 190,000 labeled images for inlines and crosslines. Finally, we present two deep learning applications in which the proposed dataset was employed and produced compelling results.

Submitted 26 March, 2019; originally announced April 2019.

Comments: 5 pages, 5 figures

arXiv:1811.00974 [pdf, other]

Neural Likelihoods via Cumulative Distribution Functions

Authors: Pawel Chilinski, Ricardo Silva

Abstract: We leverage neural networks as universal approximators of monotonic functions to build a parameterization of conditional cumulative distribution functions (CDFs). By the application of automatic differentiation with respect to response variables and then to parameters of this CDF representation, we are able to build black box CDF and density estimators. A suite of families is introduced as alternative constructions for the multivariate case. At one extreme, the simplest construction is a competitive density estimator against state-of-the-art deep learning methods, although it does not provide an easily computable representation of multivariate CDFs. At the other extreme, we have a flexible construction from which multivariate CDF evaluations and marginalizations can be obtained by a simple forward pass in a deep neural net, but where the computation of the likelihood scales exponentially with dimensionality. Alternatives in between the extremes are discussed. We evaluate the different representations empirically on a variety of tasks involving tail area probabilities, tail dependence and (partial) density estimation.

Submitted 6 June, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

Comments: 10 pages

arXiv:1809.04379 [pdf, other]

Bayesian Semi-supervised Learning with Graph Gaussian Processes

Authors: Yin Cheng Ng, Nicolo Colombo, Ricardo Silva

Abstract: We propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs. The proposed model shows extremely competitive performance when compared to the state-of-the-art graph neural networks on semi-supervised learning benchmark experiments, and outperforms the neural networks in active learning experiments where labels are scarce. Furthermore, the model does not require a validation data set for early stopping to control over-fitting. Our model can be viewed as an instance of empirical distribution regression weighted locally by network connectivity. We further motivate the intuitive construction of the model with a Bayesian linear model interpretation where the node features are filtered by an operator related to the graph Laplacian. The method can be easily implemented by adapting off-the-shelf scalable variational inference algorithms for Gaussian processes.

Submitted 12 October, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

Comments: To appear in NIPS 2018 Fixed an error in Figure 2. The previous arxiv version contains two identical sub-figures

arXiv:1806.02380 [pdf, other]

Causal Interventions for Fairness

Authors: Matt J. Kusner, Chris Russell, Joshua R. Loftus, Ricardo Silva

Abstract: Most approaches in algorithmic fairness constrain machine learning methods so the resulting predictions satisfy one of several intuitive notions of fairness. While this may help private companies comply with non-discrimination laws or avoid negative publicity, we believe it is often too little, too late. By the time the training data is collected, individuals in disadvantaged groups have already suffered from discrimination and lost opportunities due to factors out of their control. In the present work we focus instead on interventions such as a new public policy, and in particular, how to maximize their positive effects while improving the fairness of the overall system. We use causal methods to model the effects of interventions, allowing for potential interference--each individual's outcome may depend on who else receives the intervention. We demonstrate this with an example of allocating a budget of teaching resources using a dataset of schools in New York City.

Submitted 6 June, 2018; originally announced June 2018.

arXiv:1805.01045 [pdf, other]

Alpha-Beta Divergence For Variational Inference

Authors: Jean-Baptiste Regli, Ricardo Silva

Abstract: This paper introduces a variational approximation framework using direct optimization of what is known as the {\it scale invariant Alpha-Beta divergence} (sAB divergence). This new objective encompasses most variational objectives that use the Kullback-Leibler, the Rényi or the gamma divergences. It also gives access to objective functions never exploited before in the context of variational inference. This is achieved via two easy to interpret control parameters, which allow for a smooth interpolation over the divergence space while trading-off properties such as mass-covering of a target distribution and robustness to outliers in the data. Furthermore, the sAB variational objective can be optimized directly by repurposing existing methods for Monte Carlo computation of complex variational objectives, leading to estimates of the divergence instead of variational lower bounds. We show the advantages of this objective on Bayesian models for regression problems.

Submitted 20 May, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

arXiv:1802.08664 [pdf, other]

Modeling goal chances in soccer: a Bayesian inference approach

Authors: Gavin A. Whitaker, Ricardo Silva, Daniel Edwards

Abstract: We consider the task of determining the number of chances a soccer team creates, along with the composite nature of each chance: the players involved and the locations on the pitch of the assist and the chance. We propose an interpretable Bayesian inference approach and implement a Poisson model to capture chance occurrences, from which we infer team abilities. We then use a Gaussian mixture model to capture the areas on the pitch where a player makes an assist or takes a chance. This approach allows the visualization of differences between players in the way they approach attacking play (making assists/taking chances). We apply the resulting scheme to the 2016/2017 English Premier League, capturing team abilities to create chances, before highlighting key areas where players have most impact.

Submitted 23 February, 2018; originally announced February 2018.

Comments: 19 pages, 12 figures

arXiv:1802.08114 [pdf, other]

Two-way sparsity for time-varying networks, with applications in genomics

Authors: Thomas E. Bartlett, Ioannis Kosmidis, Ricardo Silva

Abstract: We propose a novel way of modelling time-varying networks, by inducing two-way sparsity on local models of node connectivity. This two-way sparsity separately promotes sparsity across time and sparsity across variables (within time). Separation of these two types of sparsity is achieved through a novel prior structure, which draws on ideas from the Bayesian lasso and from copula modelling. We provide an efficient implementation of the proposed model via a Gibbs sampler, and we apply the model to data from neural development. In doing so, we demonstrate that the proposed model is able to identify changes in genomic network structure that match current biological knowledge. Such changes in genomic network structure can then be used by neuro-biologists to identify potential targets for further experimental investigation.

Submitted 18 November, 2020; v1 submitted 22 February, 2018; originally announced February 2018.

arXiv:1710.04008 [pdf, other]

A Dynamic Edge Exchangeable Model for Sparse Temporal Networks

Authors: Yin Cheng Ng, Ricardo Silva

Abstract: We propose a dynamic edge exchangeable network model that can capture sparse connections observed in real temporal networks, in contrast to existing models which are dense. The model achieved superior link prediction accuracy on multiple data sets when compared to a dynamic variant of the blockmodel, and is able to extract interpretable time-varying community structures from the data. In addition to sparsity, the model accounts for the effect of social influence on vertices' future behaviours. Compared to the dynamic blockmodels, our model has a smaller latent space. The compact latent space requires a smaller number of parameters to be estimated in variational inference and results in a computationally friendly inference algorithm.

Submitted 11 October, 2017; originally announced October 2017.

arXiv:1710.02520 [pdf, other]

doi 10.1007/s12539-017-0273-0

Comparing reverse complementary genomic words based on their distance distributions and frequencies

Authors: Ana Helena Tavares, Jakob Raymaekers, Peter Rousseeuw, Raquel M. Silva, Carlos A. C. Bastos, Armando Pinho, Paula Brito, Vera Afreixo

Abstract: In this work we study reverse complementary genomic word pairs in the human DNA, by comparing both the distance distribution and the frequency of a word to those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and it is found that the peak dissimilarity works best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is explored also, and it is speculated that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.

Submitted 6 October, 2017; originally announced October 2017.

Comments: Post-print of a paper accepted to publication in "Interdisciplinary Sciences: Computational Life Sciences" (ISSN: 1913-2751, ESSN: 1867-1462)

MSC Class: 62P10

Journal ref: Interdisciplinary Sciences: Computational Life Sciences, 2018, Vol. 10, 1-11

arXiv:1710.00001 [pdf, other]

A Bayesian inference approach for determining player abilities in football

Authors: Gavin A. Whitaker, Ricardo Silva, Daniel Edwards, Ioannis Kosmidis

Abstract: We consider the task of determining a football player's ability for a given event type, for example, scoring a goal. We propose an interpretable Bayesian model which is fit using variational inference methods. We implement a Poisson model to capture occurrences of event types, from which we infer player abilities. Our approach also allows the visualisation of differences between players, for a specific ability, through the marginal posterior variational densities. We then use these inferred player abilities to extend the Bayesian hierarchical model of Baio and Blangiardo (2010) which captures a team's scoring rate (the rate at which they score goals). We apply the resulting scheme to the English Premier League, capturing player abilities over the 2013/2014 season, before using output from the hierarchical model to predict whether over or under 2.5 goals will be scored in a given game in the 2014/2015 season. This validates our model as a way of providing insights into team formation and the individual success of sports teams.

Submitted 23 September, 2020; v1 submitted 25 September, 2017; originally announced October 2017.

Comments: 31 pages, 14 figures

arXiv:1703.06856 [pdf, other]

Counterfactual Fairness

Authors: Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva

Abstract: Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school.

Submitted 8 March, 2018; v1 submitted 20 March, 2017; originally announced March 2017.
