-
Fair Generalized Linear Models with a Convex Penalty
Authors:
Hyungrok Do,
Preston Putzel,
Axel Martin,
Padhraic Smyth,
Judy Zhong
Abstract:
Despite recent advances in algorithmic fairness, methodologies for achieving fairness with generalized linear models (GLMs) have yet to be explored in general, even though GLMs are widely used in practice. In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. We prove that for GLMs both criteria can be achieved via a convex penalty term based solely on the linear components of the GLM, thus permitting efficient optimization. We also derive theoretical properties for the resulting fair GLM estimator. To empirically demonstrate the efficacy of the proposed fair GLM, we compare it with other well-known fair prediction methods on an extensive set of benchmark datasets for binary classification and regression. In addition, we demonstrate that the fair GLM can generate fair predictions for a range of response variables other than binary and continuous outcomes.
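The abstract does not spell out the penalty, so the following is only a minimal sketch of why a penalty on the linear components stays convex. It assumes one simple instance of "equalizing expected outcomes": penalize the squared gap between the two groups' mean linear predictors, lam * (d . beta)^2, where d is the difference of group-mean feature vectors. Because d . beta is linear in beta, its square is convex, and adding it to the convex logistic loss keeps the whole objective convex. The function names, the plain gradient-descent solver, and the penalty itself are illustrative assumptions, not the authors' estimator.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def group_mean_gap_direction(X, group):
    """Vector d such that d . beta = mean linear predictor in group 0 minus group 1."""
    p = len(X[0])
    rows0 = [row for row, g in zip(X, group) if g == 0]
    rows1 = [row for row, g in zip(X, group) if g == 1]
    return [sum(r[j] for r in rows0) / len(rows0) - sum(r[j] for r in rows1) / len(rows1)
            for j in range(p)]

def fit_fair_logistic(X, y, group, lam, lr=0.05, steps=2000):
    """Gradient descent on mean logistic loss + lam * (d . beta)^2 (hypothetical penalty)."""
    n, p = len(X), len(X[0])
    d = group_mean_gap_direction(X, group)
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for row, target in zip(X, y):
            # gradient of the logistic negative log-likelihood
            residual = sigmoid(sum(xj * bj for xj, bj in zip(row, beta))) - target
            for j in range(p):
                grad[j] += residual * row[j] / n
        gap = sum(dj * bj for dj, bj in zip(d, beta))
        for j in range(p):
            grad[j] += 2.0 * lam * gap * d[j]  # gradient of lam * gap^2
        beta = [bj - lr * gj for bj, gj in zip(beta, grad)]
    return beta

def linear_gap(X, group, beta):
    """Difference in mean linear predictor between group 0 and group 1."""
    d = group_mean_gap_direction(X, group)
    return sum(dj * bj for dj, bj in zip(d, beta))
```

Fitting once with lam = 0 and once with lam > 0 on the same data should show the between-group gap in the linear predictor shrinking as the penalty weight grows.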
Submitted 17 June, 2022;
originally announced June 2022.
-
Bayesian additive regression trees for probabilistic programming
Authors:
Miriana Quiroga,
Pablo G Garay,
Juan M. Alonso,
Juan Martin Loyola,
Osvaldo A Martin
Abstract:
Bayesian additive regression trees (BART) is a non-parametric method to approximate functions. It is a black-box method based on the sum of many trees where priors are used to regularize inference, mainly by restricting trees' learning capacity so that no individual tree is able to explain the data, but rather the sum of trees. We discuss BART in the context of probabilistic programming languages (PPL), i.e., we present BART as a primitive that can be used as a component of a probabilistic model rather than as a standalone model. Specifically, we introduce the Python library PyMC-BART, which works by extending PyMC, a library for probabilistic programming. We showcase a few examples of models that can be built using PyMC-BART, discuss recommendations for the selection of hyperparameters, and finally, we close with limitations of our implementation and future directions for improvement.
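The regularizing prior mentioned above is, in the original BART formulation of Chipman, George and McCulloch, a depth-dependent split probability alpha * (1 + depth)^(-beta), commonly with alpha = 0.95 and beta = 2. The sketch below assumes those defaults (whether PyMC-BART uses exactly them is an assumption here) and shows how quickly the prior pushes trees toward being shallow. The `max_depth` truncation is a hypothetical convenience for the recursion.

```python
def split_probability(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at `depth` (root = 0) is split rather than a leaf."""
    return alpha * (1.0 + depth) ** (-beta)

def expected_leaves(depth=0, alpha=0.95, beta=2.0, max_depth=30):
    """Expected number of leaves of a subtree rooted at `depth` under the split prior.

    Nodes at `max_depth` are forced to be leaves; this truncation is an assumption
    added only so the recursion terminates.
    """
    if depth >= max_depth:
        return 1.0
    p = split_probability(depth, alpha, beta)
    # with probability 1 - p the node is a leaf; otherwise it has two child subtrees
    return (1.0 - p) + p * 2.0 * expected_leaves(depth + 1, alpha, beta, max_depth)
```

Under these defaults a single tree has only about two to three leaves in expectation, which is why no individual tree can explain the data and the fit must come from the sum of many trees.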
Submitted 15 August, 2023; v1 submitted 7 June, 2022;
originally announced June 2022.
-
Prior knowledge elicitation: The past, present, and future
Authors:
Petrus Mikkola,
Osvaldo A. Martin,
Suyog Chandramouli,
Marcelo Hartmann,
Oriol Abril Pla,
Owen Thomas,
Henri Pesonen,
Jukka Corander,
Aki Vehtari,
Samuel Kaski,
Paul-Christian Bürkner,
Arto Klami
Abstract:
Specification of the prior distribution for a Bayesian model is a central part of the Bayesian workflow for data analysis, but it is often difficult even for statistical experts. In principle, prior elicitation transforms domain knowledge of various kinds into well-defined prior distributions, and offers a solution to the prior specification problem. In practice, however, we are still fairly far from having usable prior elicitation tools that could significantly influence the way we build probabilistic models in academia and industry. We lack elicitation methods that integrate well into the Bayesian workflow and perform elicitation efficiently in terms of costs of time and effort. We even lack a comprehensive theoretical framework for understanding different facets of the prior elicitation problem.
Why are we not widely using prior elicitation? We analyse the state of the art by identifying a range of key aspects of prior knowledge elicitation, from properties of the modelling task and the nature of the priors to the form of interaction with the expert. The existing prior elicitation literature is reviewed and categorized in these terms. This allows us to recognize under-studied directions in prior elicitation research, finally leading to a proposal of several new avenues to improve prior elicitation methodology.
Submitted 9 May, 2023; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Many Objective Bayesian Optimization
Authors:
Lucia Asencio Martín,
Eduardo C. Garrido-Merchán
Abstract:
Some real problems require the evaluation of expensive and noisy objective functions. Moreover, the analytical expression of these objective functions may be unknown. These functions are known as black boxes; examples include estimating the generalization error of a machine learning algorithm and computing its prediction time as functions of its hyperparameters. Multi-objective Bayesian optimization (MOBO) is a set of methods that has been successfully applied to the simultaneous optimization of black boxes. Concretely, BO methods rely on a probabilistic model of the objective functions, typically a Gaussian process (GP). This model generates a predictive distribution of the objectives. However, MOBO methods have problems when the number of objectives in a multi-objective optimization problem is 3 or more, which is the many-objective setting. In particular, the BO process becomes more costly as more objectives are considered, computing the quality of the solution via the hypervolume is also more costly and, most importantly, we have to evaluate every objective function, wasting expensive computational, economic or other resources. Yet, as more objectives are involved in the optimization problem, it is highly probable that some of them are redundant and do not add information about the problem solution. We propose a measure that represents how similar GP predictive distributions are. We also propose a many-objective Bayesian optimization algorithm that uses this metric to determine whether two objectives are redundant. The algorithm stops evaluating one of them if they are found to be similar, saving resources without hurting the performance of the multi-objective BO algorithm. We show empirical evidence of the effectiveness of the metric and the algorithm in a set of toy, synthetic, benchmark and real experiments.
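The abstract does not define its similarity measure, so the following is only a hedged sketch of the general idea. It assumes each GP returns a Gaussian predictive (mean, standard deviation) at the same candidate points, uses the average symmetric Kullback-Leibler divergence between those Gaussians as a stand-in dissimilarity, and flags two objectives as redundant below a hypothetical threshold. None of these choices are taken from the paper.

```python
import math

def kl_normal(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

def predictive_dissimilarity(pred_a, pred_b):
    """Average symmetric KL over candidate points; each pred is a list of (mean, std)."""
    total = 0.0
    for (ma, sa), (mb, sb) in zip(pred_a, pred_b):
        total += 0.5 * (kl_normal(ma, sa, mb, sb) + kl_normal(mb, sb, ma, sa))
    return total / len(pred_a)

def is_redundant(pred_a, pred_b, threshold=0.1):
    """Hypothetical rule: stop evaluating one objective if the two look alike."""
    return predictive_dissimilarity(pred_a, pred_b) < threshold
```

In a many-objective loop, one would compare the GP predictions of each pair of objectives after every iteration and drop one member of any pair flagged as redundant from further (expensive) evaluations.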
Submitted 8 July, 2021;
originally announced July 2021.
-
Bambi: A simple interface for fitting Bayesian linear models in Python
Authors:
Tomás Capretto,
Camen Piho,
Ravin Kumar,
Jacob Westfall,
Tal Yarkoni,
Osvaldo A. Martin
Abstract:
The popularity of Bayesian statistical methods has increased dramatically in recent years across many research areas and industrial applications. This is the result of a variety of methodological advances, together with faster and cheaper hardware and the development of new software tools. Here we introduce an open source Python package named Bambi (BAyesian Model Building Interface) that is built on top of the PyMC probabilistic programming framework and the ArviZ package for exploratory analysis of Bayesian models. Bambi makes it easy to specify complex generalized linear hierarchical models using a formula notation similar to that found in R. We demonstrate Bambi's versatility and ease of use with a few examples spanning a range of common statistical models including multiple regression, logistic regression, and mixed-effects modeling with crossed group specific effects. Additionally we discuss how automatic priors are constructed. Finally, we conclude with a discussion of our plans for the future development of Bambi.
Submitted 11 January, 2022; v1 submitted 19 December, 2020;
originally announced December 2020.
-
Out-Of-Bag Anomaly Detection
Authors:
Egor Klevak,
Sangdi Lin,
Andy Martin,
Ondrej Linda,
Eric Ringger
Abstract:
Data anomalies are ubiquitous in real world datasets, and can have an adverse impact on machine learning (ML) systems, such as automated home valuation. Detecting anomalies could make ML applications more responsible and trustworthy. However, the lack of labels for anomalies and the complex nature of real-world datasets make anomaly detection a challenging unsupervised learning problem. In this paper, we propose a novel model-based anomaly detection method, which we call Out-of-Bag anomaly detection, that handles multi-dimensional datasets consisting of numerical and categorical features. The proposed method decomposes the unsupervised problem into the training of a set of ensemble models. Out-of-Bag estimates are leveraged to derive an effective measure for anomaly detection. We not only demonstrate the state-of-the-art performance of our method through comprehensive experiments on benchmark datasets, but also show that our model can improve the accuracy and reliability of an ML system as a data pre-processing step via a case study on home valuation.
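A minimal sketch of how out-of-bag (OOB) estimates can yield an anomaly score, under simplifying assumptions: a single numeric target feature predicted from one other feature by least-squares lines, each fitted on a bootstrap sample, and each point scored by its mean absolute OOB residual. The paper's method handles many numerical and categorical features with richer ensemble models; the per-feature regression setup, `fit_line`, and `oob_anomaly_scores` here are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept) or None if degenerate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def oob_anomaly_scores(x, y, n_models=100, seed=0):
    """Mean absolute out-of-bag residual per point (higher = more anomalous)."""
    rng = random.Random(seed)
    n = len(x)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_models):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of indices
        model = fit_line([x[i] for i in bag], [y[i] for i in bag])
        if model is None:
            continue
        slope, intercept = model
        in_bag = set(bag)
        for i in range(n):
            if i not in in_bag:  # only score points the model never saw
                sums[i] += abs(y[i] - (slope * x[i] + intercept))
                counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Scoring only on OOB points means a model is never asked to judge a point it was fitted to, so an anomaly cannot "explain itself away".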
Submitted 20 September, 2020;
originally announced September 2020.
-
The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction
Authors:
Alice Martin,
Charles Ollion,
Florian Strub,
Sylvain Le Corff,
Olivier Pietquin
Abstract:
This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of the observations in a transformer architecture. The keys, queries, values and attention vectors of the network are considered as the unobserved stochastic states of its hidden structure. This generative model is such that at each time step the received observation is a random function of its past states in a given attention window. In this general state-space setting, we use Sequential Monte Carlo methods to approximate the posterior distributions of the states given the observations, and to estimate the gradient of the log-likelihood. We hence propose a generative model giving a predictive distribution, instead of a single-point estimate.
Submitted 15 December, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Backward importance sampling for online estimation of state space models
Authors:
Alice Martin,
Marie-Pierre Etienne,
Pierre Gloaguen,
Sylvain Le Corff,
Jimmy Olsson
Abstract:
This paper proposes a new Sequential Monte Carlo algorithm to perform online estimation in the context of state space models when either the transition density of the latent state or the conditional likelihood of an observation given a state is intractable. In this setting, obtaining low variance estimators of expectations under the posterior distributions of the unobserved states given the observations is a challenging task. Following recent theoretical results for pseudo-marginal sequential Monte Carlo smoothers, a pseudo-marginal backward importance sampling step is introduced to estimate such expectations. This new step significantly reduces the computational time of existing numerical solutions based on an acceptance-rejection procedure for similar performance, and broadens the class of eligible models for such methods. For instance, in the context of multivariate stochastic differential equations, the proposed algorithm makes use of unbiased estimates of the unknown transition densities under much weaker assumptions than standard alternatives. The performance of this estimator is assessed for high-dimensional discrete-time latent data models, for recursive maximum likelihood estimation in the context of partially observed diffusion processes, and in the case of a bidimensional partially observed stochastic Lotka-Volterra model.
Submitted 7 May, 2021; v1 submitted 13 February, 2020;
originally announced February 2020.
-
On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation
Authors:
Nicolas Brosse,
Carlos Riquelme,
Alice Martin,
Sylvain Gelly,
Éric Moulines
Abstract:
Uncertainty quantification for deep learning is a challenging open problem. Bayesian statistics offer a mathematically grounded framework to reason about uncertainties; however, approximate posteriors for modern neural networks still require prohibitive computational costs. We propose a family of algorithms which split the classification task into two stages: representation learning and uncertainty estimation. We compare four specific instances, where uncertainty estimation is performed via either an ensemble of Stochastic Gradient Descent or Stochastic Gradient Langevin Dynamics snapshots, an ensemble of bootstrapped logistic regressions, or via a number of Monte Carlo Dropout passes. We evaluate their performance in terms of selective classification (risk-coverage), and their ability to detect out-of-distribution samples. Our experiments suggest there is limited value in adding multiple uncertainty layers to deep classifiers, and we observe that these simple methods strongly outperform a vanilla point-estimate SGD in some complex benchmarks like ImageNet.
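To make the selective-classification (risk-coverage) evaluation concrete, here is a minimal sketch. It assumes the last-layer ensemble (for example, several bootstrapped logistic regressions on frozen features) yields per-member class probabilities; the averaged maximum probability serves as the confidence, and the risk-coverage curve reports the error rate among the k most confident predictions. The helper names are hypothetical.

```python
def ensemble_confidence(member_probs):
    """Average class probabilities over ensemble members; return (predicted class, confidence)."""
    n_classes = len(member_probs[0])
    avg = [sum(m[k] for m in member_probs) / len(member_probs) for k in range(n_classes)]
    best = max(range(n_classes), key=lambda k: avg[k])
    return best, avg[best]

def risk_coverage(confidences, correct):
    """(coverage, risk) pairs when only the most confident predictions are kept."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    curve, errors = [], 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        curve.append((k / len(order), errors / k))  # risk = error rate among kept points
    return curve
```

A good uncertainty estimate gives low risk at low coverage: abstaining on the least confident inputs should remove most of the mistakes.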
Submitted 22 January, 2020;
originally announced January 2020.
-
Approximate Bayesian inference of directed acyclic graphs in biology with flexible priors on edge states
Authors:
Evan A Martin,
Audrey Qiuyan Fu
Abstract:
Graphical models or networks describe the statistical dependence among multiple variables and are widely used in biology (e.g., gene regulatory networks). Under appropriate assumptions, directed edges may represent causal relationships. A key feature of a biological network is sparsity, defined by how likely an edge is present, of which we often have some knowledge. However, most existing Bayesian methods use priors for the entire graph, making it difficult to specify the level of sparsity. The few methods that use priors on edges estimate the two directions independently; the sum of the two probabilities can exceed 1. Here, we present baycn (BAYesian Causal Network), a novel approximate Bayesian method that represents a graph in terms of three states of edges: the two directions and edge absence, and specifies priors on these edge states. We design a pseudo Bayesian sampling algorithm for efficient inference. We apply baycn to two genomic problems: i) distinguishing direct and indirect target genes of genetic variants, using these variants as instrumental variables, and ii) inferring combinatorial binding of highly-correlated transcription factors in Drosophila. In both cases and in extensive simulations, our method demonstrates much improved accuracy over existing methods for the whole graph and for individual edges.
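The key representational idea, one prior per edge over three mutually exclusive states, can be sketched as follows. Because forward, reverse and absent share a single distribution, the two direction probabilities can never sum to more than 1. This sketch assumes independent edges, so the graph prior is a product of edge-state priors; it deliberately ignores the acyclicity constraint that a real DAG sampler must enforce, and the names are illustrative rather than baycn's API.

```python
import itertools
import math

EDGE_STATES = ("forward", "reverse", "absent")

def check_edge_prior(prior):
    """Each candidate edge gets one distribution over its three states."""
    if set(prior) != set(EDGE_STATES) or not math.isclose(sum(prior.values()), 1.0):
        raise ValueError("edge prior must cover the three states and sum to 1")

def graph_prior(config, priors):
    """Prior of one edge-state configuration, assuming independent edges."""
    p = 1.0
    for state, prior in zip(config, priors):
        p *= prior[state]
    return p

def all_configurations(n_edges):
    """Every assignment of a state to each candidate edge (cycles not filtered)."""
    return itertools.product(EDGE_STATES, repeat=n_edges)
```

Sparsity is controlled directly through each edge's "absent" probability, which is the knob the abstract says whole-graph priors make hard to set.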
Submitted 27 November, 2023; v1 submitted 23 September, 2019;
originally announced September 2019.
-
An Online-Learning Approach to Inverse Optimization
Authors:
Andreas Bärmann,
Alexander Martin,
Sebastian Pokutta,
Oskar Schneider
Abstract:
In this paper, we demonstrate how to learn the objective function of a decision-maker while only observing the problem input data and the decision-maker's corresponding decisions over multiple rounds. We present exact algorithms for this online version of inverse optimization which converge at a rate of $ \mathcal{O}(1/\sqrt{T}) $ in the number of observations~$T$ and compare their further properties. In particular, they all allow taking decisions that are essentially as good as those of the observed decision-maker after relatively few iterations, but each is best suited to a different setting. Our approach is based on online learning and works for linear objectives over arbitrary feasible sets for which we have a linear optimization oracle. As such, it generalizes previous approaches based on KKT-system decomposition and dualization. We also introduce several generalizations, such as the approximate learning of non-linear objective functions, dynamically changing as well as parameterized objectives and the case of suboptimal observed decisions. When applied to the stochastic offline case, our algorithms are able to give guarantees on the quality of the learned objectives in expectation. Finally, we show the effectiveness and possible applications of our methods in indicative computational experiments.
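A minimal sketch of the online-learning view, under stated assumptions: the decision-maker maximizes an unknown linear objective, each round's feasible set is a finite list of vectors (so the "linear optimization oracle" is a simple `max`), and the guess c is updated by projected online gradient descent on the loss c . (y_t - x_t), where y_t is the oracle's decision under the current guess and x_t is the observed decision. The L2-ball projection and fixed step size are illustrative choices, not necessarily those analysed in the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_l2_ball(c, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(dot(c, c))
    return list(c) if norm <= radius else [v * radius / norm for v in c]

def learn_objective(rounds, c0, eta=0.1):
    """rounds: list of (feasible_set, observed_decision) for a maximizing decision-maker."""
    c = list(c0)
    for feasible, x_obs in rounds:
        y = max(feasible, key=lambda v: dot(c, v))  # oracle decision under the current guess
        # gradient of c -> c . (y - x_obs); stepping against it favours the observed decision
        c = project_l2_ball([ci - eta * (yi - xi) for ci, yi, xi in zip(c, y, x_obs)])
    return c
```

Once the guess reproduces the observed decisions, y_t equals x_t and the update stops moving c, matching the abstract's remark that good decisions appear after relatively few iterations.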
Submitted 28 March, 2020; v1 submitted 30 October, 2018;
originally announced October 2018.
-
Grand Challenge: Real-time Destination and ETA Prediction for Maritime Traffic
Authors:
Oleh Bodunov,
Florian Schmidt,
André Martin,
Andrey Brito,
Christof Fetzer
Abstract:
In this paper, we present our approach for solving the DEBS Grand Challenge 2018. The challenge asks participants to predict (i) the destination and (ii) the arrival time of ships in a streaming fashion using geospatial data in the maritime context. Novel aspects of our approach include the use of ensemble learning based on Random Forest, Gradient Boosting Decision Trees (GBDT), XGBoost Trees and Extremely Randomized Trees (ERT) to predict the destination, while for the arrival time we propose the use of feed-forward neural networks. In our evaluation, we were able to achieve an accuracy of 97% for the port destination classification problem and 90% (in mins) for the ETA prediction.
Submitted 12 October, 2018;
originally announced October 2018.
-
MRPC: An R package for accurate inference of causal graphs
Authors:
Md. Bahadur Badsha,
Evan A Martin,
Audrey Qiuyan Fu
Abstract:
We present MRPC, an R package that learns causal graphs with improved accuracy over existing packages, such as pcalg and bnlearn. Our algorithm builds on the powerful PC algorithm, the canonical algorithm in computer science for learning directed acyclic graphs. The improvement in accuracy results from online control of the false discovery rate (FDR) that reduces false positive edges, a more accurate approach to identifying v-structures (i.e., $T_1 \rightarrow T_2 \leftarrow T_3$), and robust estimation of the correlation matrix among nodes. For genomic data that contain genotypes and gene expression for each sample, MRPC incorporates the principle of Mendelian randomization to orient the edges. Our package can be applied to continuous and discrete data.
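Online FDR control decides on each hypothesis (here, each edge test) as it arrives, without seeing future p-values. The sketch below implements LOND (Javanmard and Montanari), one well-known online FDR rule: test i is rejected if p_i <= beta_i * (D + 1), where D is the number of discoveries so far and the beta_i sum to alpha. That MRPC uses exactly this rule and this beta_i sequence is an assumption made for illustration.

```python
import math

def lond(p_values, alpha=0.05):
    """LOND online FDR procedure; returns a reject/accept decision per test, in order."""
    discoveries = 0
    decisions = []
    for i, p in enumerate(p_values, start=1):
        # beta_i = alpha * 6 / (pi^2 i^2) sums to alpha over i = 1, 2, ...
        beta_i = alpha * 6.0 / (math.pi ** 2 * i ** 2)
        reject = p <= beta_i * (discoveries + 1)  # threshold grows with past discoveries
        decisions.append(reject)
        if reject:
            discoveries += 1
    return decisions
```

Each earlier discovery loosens later thresholds, so the procedure stays powerful on long sequences of tests while fewer spurious edges survive than with unadjusted testing.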
Submitted 5 June, 2018;
originally announced June 2018.
-
Robust and Scalable Learning of Complex Dataset Topologies via ElPiGraph
Authors:
Luca Albergante,
Evgeny M. Mirkes,
Huidong Chen,
Alexis Martin,
Louis Faure,
Emmanuel Barillot,
Luca Pinello,
Alexander N. Gorban,
Andrei Zinovyev
Abstract:
Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryos being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring the local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximation of datasets with complex structures which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies.
Submitted 20 June, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
-
Dynamic time warping distance for message propagation classification in Twitter
Authors:
Siwar Jendoubi,
Arnaud Martin,
Ludovic Liétard,
Boutheina Ben Yaghlane,
Hend Ben Hadji
Abstract:
Social message classification is a research domain that has attracted the attention of many researchers in recent years. Indeed, a social message differs from ordinary text because it has some special characteristics, such as its shortness. Therefore, developing new approaches for processing social messages is essential to make their classification more efficient. In this paper, we are mainly interested in the classification of social messages based on their spreading on online social networks (OSN). We propose a new distance metric based on the Dynamic Time Warping distance, and we use it with the probabilistic and the evidential k Nearest Neighbors (k-NN) classifiers to classify propagation networks (PrNets) of messages. The propagation network is a directed acyclic graph (DAG) that is used to record propagation traces of the message, the traversed links and their types. We tested the proposed metric with the chosen k-NN classifiers on real-world propagation traces collected from the Twitter social network and obtained good classification accuracy.
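For readers unfamiliar with the building block, the classic Dynamic Time Warping distance between two numeric sequences and a plain majority-vote k-NN on top of it are sketched below. The paper's metric is defined on propagation networks (DAGs) and is paired with probabilistic and evidential k-NN; reducing a PrNet to a numeric sequence (for example, counts of new shares per time step) and using a simple majority vote are assumptions made here for illustration.

```python
from collections import Counter

def dtw_distance(a, b, cost=lambda u, v: abs(u - v)):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost(a[i - 1], b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def knn_classify(query, train, k=3):
    """Majority vote among the k training sequences closest under DTW; train is [(seq, label)]."""
    nearest = sorted(train, key=lambda item: dtw_distance(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

DTW tolerates local stretching in time, so two messages that spread in the same pattern but at different speeds still come out close.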
Submitted 26 January, 2017;
originally announced January 2017.
-
Evidential-EM Algorithm Applied to Progressively Censored Observations
Authors:
Kuang Zhou,
Arnaud Martin,
Quan Pan
Abstract:
The Evidential-EM (E2M) algorithm is an effective approach for computing maximum likelihood estimates under finite mixture models, especially when there is uncertain information about data. In this paper we present an extension of the E2M method in a particular case of incomplete data, where the loss of information is due to both mixture models and censored observations. The prior uncertain information is expressed by belief functions, while the pseudo-likelihood function is derived based on imprecise observations and prior knowledge. Then the E2M method is invoked to maximize the generalized likelihood function to obtain the optimal estimation of parameters. Numerical examples show that the proposed method could effectively integrate the uncertain prior information with the current imprecise knowledge conveyed by the observed data.
Submitted 7 January, 2015;
originally announced January 2015.
-
Kinetic modeling of opinion formation of peoples via multiple political parties
Authors:
Ryosuke Yano,
Arnaud Martin
Abstract:
We investigate opinion formation among people and multiple political parties using the one-dimensional relativistic Boltzmann-Vlasov equation for multiple components. A political party is made up of politicians. When we restrict ourselves to the conciliatory exchange of opinions between two individuals, opinion formation depends on the self-thinking of people and politicians and on the constraint the political party places on its politicians' opinions. In particular, shock-like profiles are obtained in the distribution of people's opinions when the self-thinking of politicians is absent in the binary exchange of opinions between two politicians in the same political party.
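The paper works at the level of a kinetic (Boltzmann-Vlasov) equation; the sketch below is only a particle-level Monte Carlo caricature of the conciliatory binary exchange it builds on, with self-thinking noise and party constraints omitted. In each interaction two randomly chosen individuals move toward each other by a fraction gamma of their disagreement. The rule conserves the mean opinion, keeps opinions inside their initial bounds, and shrinks the spread. The exchange rule and parameter values are assumptions for illustration.

```python
import random
import statistics

def binary_exchange(opinions, gamma=0.2, steps=1000, seed=0):
    """Repeated conciliatory pairwise exchanges: o_i' = o_i + gamma * (o_j - o_i), symmetric for j."""
    rng = random.Random(seed)
    o = list(opinions)
    for _ in range(steps):
        i, j = rng.sample(range(len(o)), 2)
        oi, oj = o[i], o[j]
        o[i] = oi + gamma * (oj - oi)
        o[j] = oj + gamma * (oi - oj)
    return o

def summary(opinions):
    """(mean, population variance) of a list of opinions."""
    return statistics.fmean(opinions), statistics.pvariance(opinions)
```

In the kinetic picture this pairwise averaging plays the role of the "cooling" term, while self-thinking would add noise that acts as "heating".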
Submitted 16 May, 2014; v1 submitted 19 March, 2014;
originally announced March 2014.
-
Opinion formation with upper and lower bounds
Authors:
Ryosuke Yano,
Arnaud Martin
Abstract:
We investigate the opinion formation with upper and lower bounds. We formulate the binary exchange of opinions between two individuals, and effects of the self-thinking and political party using the relativistic Boltzmann-Vlasov type equation with the randomly perturbed motion. The convergent form of the distribution function is determined by the balance between the cooling rate via the binary exchange of opinions between two individuals and the concentration of opinions by the political party, and heating rate via the self-thinking.
Submitted 23 November, 2015; v1 submitted 28 February, 2014;
originally announced February 2014.