-
Sample Average Approximation for Black-Box VI
Authors:
Javier Burroni,
Justin Domke,
Daniel Sheldon
Abstract:
We present a novel approach for black-box VI that bypasses the difficulties of stochastic gradient ascent, including the task of selecting step-sizes. Our approach involves using a sequence of sample average approximation (SAA) problems. SAA approximates the solution of stochastic optimization problems by transforming them into deterministic ones. We use quasi-Newton methods and line search to solve each deterministic optimization problem and present a heuristic policy to automate hyperparameter selection. Our experiments show that our method simplifies the VI problem and achieves faster performance than existing methods.
Submitted 17 May, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
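As a rough illustration of the SAA idea described in the abstract above, the sketch below draws a set of standard normal variates once and holds them fixed, so that the objective becomes deterministic, then solves it with a secant iteration (a one-dimensional quasi-Newton method). This is repeated for a growing sample size. The Gaussian target, the unit-variance variational family N(mu, 1), the sample-size schedule, and all function names are assumptions made for illustration, not the authors' implementation, and no line search is included.

```python
import random
from statistics import mean

TARGET_MEAN = 3.0  # hypothetical target: log p(z) = -(z - 3)^2 / 2 + const


def saa_gradient(mu, eps):
    # Gradient in mu of the sample-average objective (1/n) * sum_i log p(mu + eps_i).
    # The entropy of N(mu, 1) does not depend on mu, so it is dropped.
    return -mean(mu + e - TARGET_MEAN for e in eps)


def secant_solve(grad, x0, x1, tol=1e-10, max_iter=50):
    # Secant iteration on grad(x) = 0, a one-dimensional quasi-Newton method.
    g0, g1 = grad(x0), grad(x1)
    for _ in range(max_iter):
        if abs(g1) < tol or g1 == g0:
            break
        x0, x1, g0 = x1, x1 - g1 * (x1 - x0) / (g1 - g0), g1
        g1 = grad(x1)
    return x1


def solve_saa_sequence(sample_sizes=(10, 100, 1000), seed=0):
    # Solve a sequence of deterministic problems, warm-starting each from the last.
    rng = random.Random(seed)
    mu, history = 0.0, []
    for n in sample_sizes:
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]  # drawn once, then held fixed
        mu = secant_solve(lambda m: saa_gradient(m, eps), mu, mu + 1.0)
        history.append((n, mu, mean(eps)))
    return history


if __name__ == "__main__":
    for n, mu, eps_mean in solve_saa_sequence():
        print(n, mu, eps_mean)
```

In this toy problem each deterministic solution is 3 minus the mean of the fixed draws, so the error comes only from the sample, not from the optimizer, and it shrinks as the sample size grows.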
-
U-Statistics for Importance-Weighted Variational Inference
Authors:
Javier Burroni,
Kenta Takatsu,
Justin Domke,
Daniel Sheldon
Abstract:
We propose the use of U-statistics to reduce variance for gradient estimation in importance-weighted variational inference. The key observation is that, given a base gradient estimator that requires $m > 1$ samples and a total of $n > m$ samples to be used for estimation, lower variance is achieved by averaging the base estimator on overlapping batches of size $m$ than disjoint batches, as currently done. We use classical U-statistic theory to analyze the variance reduction, and propose novel approximations with theoretical guarantees to ensure computational efficiency. We find empirically that U-statistic variance reduction can lead to modest to significant improvements in inference performance on a range of models, with little computational cost.
Submitted 27 February, 2023;
originally announced February 2023.
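The contrast between disjoint and overlapping batches in the abstract above can be sketched with a toy base estimator. The kernel used here (an unbiased two-sample variance estimate, so m = 2) and all function names are assumptions chosen for illustration. It is not the importance-weighted gradient estimator from the paper, and the complete enumeration of subsets shown here is the expensive form that the paper's approximations are meant to avoid.

```python
from itertools import combinations
from statistics import mean, variance


def kernel(batch):
    # Toy base estimator that needs m = 2 samples: unbiased for the variance.
    x, y = batch
    return 0.5 * (x - y) ** 2


def disjoint_estimate(samples, m, h):
    # Average h over consecutive, non-overlapping batches of size m.
    batches = [samples[i:i + m] for i in range(0, len(samples) - m + 1, m)]
    return mean(h(b) for b in batches)


def u_statistic(samples, m, h):
    # Average h over every size-m subset of the samples (overlapping batches).
    return mean(h(b) for b in combinations(samples, m))


samples = [1.0, 4.0, 2.0, 8.0]
disjoint = disjoint_estimate(samples, 2, kernel)  # uses 2 batches
complete = u_statistic(samples, 2, kernel)        # uses all 6 pairs

if __name__ == "__main__":
    print(disjoint, complete, variance(samples))
```

For this kernel the complete U-statistic coincides with the usual unbiased sample variance, which is a convenient sanity check that the overlapping average uses the same n samples more fully.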
-
Automatically Marginalized MCMC in Probabilistic Programming
Authors:
Jinlin Lai,
Javier Burroni,
Hui Guan,
Daniel Sheldon
Abstract:
Hamiltonian Monte Carlo (HMC) is a powerful algorithm to sample latent variables from Bayesian models. The advent of probabilistic programming languages (PPLs) frees users from writing inference algorithms and lets users focus on modeling. However, many models are difficult for HMC to solve directly, and often require tricks like model reparameterization. We are motivated by the fact that many of those models could be simplified by marginalization. We propose to use automatic marginalization as part of the sampling process using HMC in a graphical model extracted from a PPL, which substantially improves sampling from real-world hierarchical models.
Submitted 1 June, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
The Random Conditional Distribution for Higher-Order Probabilistic Inference
Authors:
Zenna Tavares,
Xin Zhang,
Edgar Minaysan,
Javier Burroni,
Rajesh Ranganath,
Armando Solar Lezama
Abstract:
The need to condition distributional properties such as expectation, variance, and entropy arises in algorithmic fairness, model simplification, robustness and many other areas. At face value however, distributional properties are not random variables, and hence conditioning them is a semantic error and type error in probabilistic programming languages. On the other hand, distributional properties are contingent on other variables in the model, change in value when we observe more information, and hence in a precise sense are random variables too. In order to capture the uncertainty over distributional properties, we introduce a probability construct -- the random conditional distribution -- and incorporate it into a probabilistic programming language Omega. A random conditional distribution is a higher-order random variable whose realizations are themselves conditional random variables. In Omega we extend distributional properties of random variables to random conditional distributions, such that for example while the expectation of a real-valued random variable is a real value, the expectation of a random conditional distribution is a distribution over expectations. As a consequence, it requires minimal syntax to encode inference problems over distributional properties, which so far have evaded treatment within probabilistic programming systems and probabilistic modeling in general. We demonstrate our approach through case studies in algorithmic fairness and robustness.
Submitted 25 March, 2019;
originally announced March 2019.
-
Soft Constraints for Inference with Declarative Knowledge
Authors:
Zenna Tavares,
Javier Burroni,
Edgar Minaysan,
Armando Solar Lezama,
Rajesh Ranganath
Abstract:
We develop a likelihood-free inference procedure for conditioning a probabilistic model on a predicate. A predicate is a Boolean-valued function which expresses a yes/no question about a domain. Our contribution, which we call predicate exchange, constructs a softened predicate which takes value in the unit interval [0, 1] as opposed to simply true or false. Intuitively, 1 corresponds to true, and a high value (such as 0.999) corresponds to "nearly true" as determined by a distance metric. We define Boolean algebra for soft predicates, such that they can be negated, conjoined and disjoined arbitrarily. A softened predicate can serve as a tractable proxy to a likelihood function for approximate posterior inference. However, to target exact inference, we temper the relaxation by a temperature parameter, and add an accept/reject phase using replica exchange Markov chain Monte Carlo, which exchanges states between a sequence of models conditioned on predicates at varying temperatures. We describe a lightweight implementation of predicate exchange that provides a language-independent layer that can be implemented on top of existing modeling formalisms.
Submitted 16 January, 2019;
originally announced January 2019.
-
Inference of Demographic Attributes based on Mobile Phone Usage Patterns and Social Network Topology
Authors:
Carlos Sarraute,
Jorge Brea,
Javier Burroni,
Pablo Blanc
Abstract:
Mobile phone usage provides a wealth of information, which can be used to better understand the demographic structure of a population. In this paper, we focus on the population of Mexican mobile phone users. We first present an observational study of mobile phone usage according to gender and age groups. We are able to detect significant differences in phone usage among different subgroups of the population. We then study the performance of different machine learning (ML) methods to predict demographic features (namely, age and gender) of unlabeled users by leveraging individual calling patterns, as well as the structure of the communication graph. We show how a specific implementation of a diffusion model, harnessing the graph structure, has significantly better performance over other node-based standard ML methods. We provide details of the methodology together with an analysis of the robustness of our results to changes in the model parameters. Furthermore, by carefully examining the topological relations of the training nodes (seed nodes) to the rest of the nodes in the network, we find topological metrics which have a direct influence on the performance of the algorithm.
Submitted 9 January, 2019;
originally announced January 2019.
-
Compiling Stan to Generative Probabilistic Languages and Extension to Deep Probabilistic Programming
Authors:
Guillaume Baudart,
Javier Burroni,
Martin Hirzel,
Louis Mandel,
Avraham Shinnar
Abstract:
Stan is a probabilistic programming language that is popular in the statistics community, with a high-level syntax for expressing probabilistic models. Stan differs by nature from generative probabilistic programming languages like Church, Anglican, or Pyro. This paper presents a comprehensive compilation scheme to compile any Stan model to a generative language and proves its correctness. We use our compilation scheme to build two new backends for the Stanc3 compiler targeting Pyro and NumPyro. Experimental results show that the NumPyro backend yields a 2.3x speedup compared to Stan in geometric mean over 26 benchmarks. Building on Pyro we extend Stan with support for explicit variational inference guides and deep probabilistic models. That way, users familiar with Stan get access to new features without having to learn a fundamentally new language.
Submitted 11 April, 2021; v1 submitted 30 September, 2018;
originally announced October 2018.
-
Inference of Users Demographic Attributes based on Homophily in Communication Networks
Authors:
Jorge Brea,
Javier Burroni,
Carlos Sarraute
Abstract:
Over the past decade, mobile phones have become prevalent in all parts of the world, across all demographic backgrounds. Mobile phones are used by men and women across a wide age range in both developed and developing countries. Consequently, they have become one of the most important mechanisms for social interaction within a population, making them an increasingly important source of information to understand human demographics and human behaviour.
In this work we combine two sources of information: communication logs from a major mobile operator in a Latin American country, and information on the demographics of a subset of the user population. This allows us to perform an observational study of mobile phone usage, differentiated by age group categories. This study is interesting in its own right, since it provides knowledge on the structure and demographics of the mobile phone market in the studied country.
We then tackle the problem of inferring the age group for all users in the network. We present here an exclusively graph-based inference method relying solely on the topological structure of the mobile network, together with a topological analysis of the performance of the algorithm. The equations for our algorithm can be described as a diffusion process with two added properties: (i) memory of its initial state, and (ii) the information is propagated as a probability vector for each node attribute (instead of the value of the attribute itself). Our algorithm can successfully infer different age groups within the network population given known values for a subset of nodes (seed nodes). Most interestingly, we show that by carefully analysing the topological relationships between correctly predicted nodes and the seed nodes, we can characterize particular subsets of nodes for which our inference method has significantly higher accuracy.
Submitted 1 August, 2018;
originally announced August 2018.
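The abstract above describes the inference method as a diffusion with (i) memory of the initial state and (ii) a probability vector propagated per node. A minimal sketch of one update rule with those two properties follows. The specific rule (a convex mix of each node's initial vector and the average of its neighbours' current vectors), the `memory` weight, the uniform prior for unlabeled nodes, and the tiny path graph are all assumptions for illustration, since the abstract does not state the equations.

```python
def diffuse(neighbors, seeds, n_classes, memory=0.5, steps=50):
    # neighbors: dict mapping node -> list of adjacent nodes (each node needs >= 1).
    # seeds: dict mapping node -> known class index.
    uniform = [1.0 / n_classes] * n_classes
    initial = {}
    for node in neighbors:
        if node in seeds:
            # Seed nodes start as a one-hot probability vector.
            initial[node] = [1.0 if c == seeds[node] else 0.0 for c in range(n_classes)]
        else:
            # Unlabeled nodes start from an assumed uniform prior.
            initial[node] = uniform[:]
    state = {node: vec[:] for node, vec in initial.items()}
    for _ in range(steps):
        new_state = {}
        for node, nbrs in neighbors.items():
            # Average the neighbours' current probability vectors.
            avg = [sum(state[v][c] for v in nbrs) / len(nbrs) for c in range(n_classes)]
            # Mix with the node's own initial vector (the "memory" term).
            new_state[node] = [
                memory * initial[node][c] + (1.0 - memory) * avg[c]
                for c in range(n_classes)
            ]
        state = new_state
    return state


# Path graph 0 - 1 - 2 - 3 with the two end nodes as seeds of different classes.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = diffuse(graph, seeds={0: 0, 3: 1}, n_classes=2)

if __name__ == "__main__":
    for node, probs in sorted(result.items()):
        print(node, [round(p, 3) for p in probs])
```

Because every update is a convex combination of probability vectors, each node's vector stays normalized, and each unlabeled node ends up leaning toward the class of its nearer seed.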
-
Social Events in a Time-Varying Mobile Phone Graph
Authors:
Carlos Sarraute,
Jorge Brea,
Javier Burroni,
Klaus Wehmuth,
Artur Ziviani,
J. I. Alvarez-Hamelin
Abstract:
The large-scale study of human mobility has been significantly enhanced over the last decade by the massive use of mobile phones in urban populations. Studying the activity of mobile phones allows us, not only to infer social networks between individuals, but also to observe the movements of these individuals in space and time. In this work, we investigate how these two related sources of information can be integrated within the context of detecting and analyzing large social events. We show that large social events can be characterized not only by an anomalous increase in activity of the antennas in the neighborhood of the event, but also by an increase in social relationships of the attendants present in the event. Moreover, having detected a large social event via increased antenna activity, we can use the network connections to infer whether an unobserved user was present at the event. More precisely, we address the following three challenges: (i) automatically detecting large social events via increased antenna activity; (ii) characterizing the social cohesion of the detected event; and (iii) analyzing the feasibility of inferring whether unobserved users were in the event.
Submitted 19 June, 2017;
originally announced June 2017.
-
Harnessing Mobile Phone Social Network Topology to Infer Users Demographic Attributes
Authors:
Jorge Brea,
Javier Burroni,
Martin Minnoni,
Carlos Sarraute
Abstract:
We study the structure of the social graph of mobile phone users in the country of Mexico, with a focus on demographic attributes of the users (more specifically the users' age). We examine assortativity patterns in the graph, and observe a strong age homophily in the communication preferences. We propose a graph-based algorithm for the prediction of the age of mobile phone users. The algorithm exploits the topology of the mobile phone network, together with a subset of known users' ages (seeds), to infer the age of the remaining users. We provide the details of the methodology, and show experimental results on a network GT with more than 70 million users. By carefully examining the topological relations of the seeds to the rest of the nodes in GT, we find topological metrics which have a direct influence on the performance of the algorithm. In particular we characterize subsets of users for which the accuracy of the algorithm is 62% when predicting between 4 age categories (whereas a pure random guess would yield an accuracy of 25%). We also show that we can use the probabilistic information computed by the algorithm to further increase its inference power to 72% on a significant subset of users.
Submitted 23 November, 2015;
originally announced November 2015.
-
A Study of Age and Gender seen through Mobile Phone Usage Patterns in Mexico
Authors:
Carlos Sarraute,
Pablo Blanc,
Javier Burroni
Abstract:
Mobile phone usage provides a wealth of information, which can be used to better understand the demographic structure of a population. In this paper we focus on the population of Mexican mobile phone users. Our first contribution is an observational study of mobile phone usage according to gender and age groups. We were able to detect significant differences in phone usage among different subgroups of the population. Our second contribution is to provide a novel methodology to predict demographic features (namely age and gender) of unlabeled users by leveraging individual calling patterns, as well as the structure of the communication graph. We provide details of the methodology and show experimental results on a real world dataset that involves millions of users.
Submitted 20 November, 2015;
originally announced November 2015.
-
Outrepasser les limites des techniques classiques de Prise d'Empreintes grace aux Reseaux de Neurones
Authors:
Javier Burroni,
Carlos Sarraute
Abstract:
We present an application of Artificial Intelligence techniques to the field of Information Security. The problem of remote Operating System (OS) Detection, also called OS Fingerprinting, is a crucial step of the penetration testing process, since the attacker (hacker or security professional) needs to know the OS of the target host in order to choose the exploits that he will use. OS Detection is accomplished by passively sniffing network packets and actively sending test packets to the target host, to study specific variations in the host responses revealing information about its operating system.
The first fingerprinting implementations were based on the analysis of differences between TCP/IP stack implementations. The next generation focused the analysis on application layer data such as the DCE RPC endpoint information. Even though more information was analyzed, some variation of the "best fit" algorithm was still used to interpret this new information. Our new approach involves an analysis of the composition of the information collected during the OS identification process to identify key elements and their relations. To implement this approach, we have developed tools using Neural Networks and techniques from the field of Statistics. These tools have been successfully integrated into a commercial software product (Core Impact).
Submitted 14 June, 2010;
originally announced June 2010.
-
Using Neural Networks to improve classical Operating System Fingerprinting techniques
Authors:
Carlos Sarraute,
Javier Burroni
Abstract:
We present remote Operating System detection as an inference problem: given a set of observations (the target host responses to a set of tests), we want to infer the OS type which most probably generated these observations. Classical techniques used to perform this analysis present several limitations. To improve the analysis, we have developed tools using neural networks and statistical techniques. We present two working modules: one which uses DCE-RPC endpoints to distinguish Windows versions, and another which uses Nmap signatures to distinguish different versions of Windows, Linux, Solaris, OpenBSD, FreeBSD and NetBSD systems. We explain the details of the topology and inner workings of the neural networks used, and the fine tuning of their parameters. Finally we show positive experimental results.
△ Less
Submitted 9 June, 2010;
originally announced June 2010.