
Showing 1–20 of 20 results for author: Airoldi, M

  1. arXiv:2406.16527  [pdf]

    cs.CL cs.CY cs.DL cs.LG

    SyROCCo: Enhancing Systematic Reviews using Machine Learning

    Authors: Zheng Fang, Miguel Arana-Catania, Felix-Anselm van Lier, Juliana Outes Velarde, Harry Bregazzi, Mara Airoldi, Eleanor Carter, Rob Procter

    Abstract: The sheer number of research outputs published every year makes systematic reviewing increasingly time- and resource-intensive. This paper explores the use of machine learning techniques to help navigate the systematic review process. ML has previously been used to reliably 'screen' articles for review - that is, identify relevant articles based on reviewers' inclusion criteria. The application of…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 28 pages, 5 figures. To appear in Data & Policy journal

  2. Estimating Total Treatment Effect in Randomized Experiments with Unknown Network Structure

    Authors: Christina Lee Yu, Edoardo M Airoldi, Christian Borgs, Jennifer T Chayes

    Abstract: Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, to public policy and to the technology industry at large. Here, we consider situations where classical methods for estimating the total treatment effect on a target popu…

    Submitted 24 September, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  3. arXiv:1909.07578  [pdf, other]

    stat.ML cs.LG cs.SI physics.data-an q-bio.MN

    Stacking Models for Nearly Optimal Link Prediction in Complex Networks

    Authors: Amir Ghasemian, Homa Hosseinmardi, Aram Galstyan, Edoardo M. Airoldi, Aaron Clauset

    Abstract: Most real-world networks are incompletely observed. Algorithms that can accurately predict which links are missing can dramatically speed up the collection of network data and improve the validity of network models. Many algorithms now exist for predicting missing links, given a partially observed network, but it has remained unknown whether a single best predictor exists, how link predictability v…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: 30 pages, 9 figures, 22 tables

    Journal ref: Proc. Natl. Acad. Sci. USA 117(38), 23393-23400 (2020)

  4. arXiv:1507.00803  [pdf, other]

    stat.ME cs.SI physics.soc-ph stat.ML

    Model-assisted design of experiments in the presence of network correlated outcomes

    Authors: Guillaume W. Basse, Edoardo M. Airoldi

    Abstract: We consider the problem of how to assign treatment in a randomized experiment, in which the correlation among the outcomes is informed by a network available pre-intervention. Working within the potential outcome causal framework, we develop a class of models that posit such a correlation structure among the outcomes. Then we leverage these models to develop restricted randomization strategies for…

    Submitted 18 May, 2017; v1 submitted 2 July, 2015; originally announced July 2015.

    Comments: 56 pages, 6 figures

  5. arXiv:1506.03159  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Copula variational inference

    Authors: Dustin Tran, David M. Blei, Edoardo M. Airoldi

    Abstract: We develop a general variational inference method that preserves dependency among the latent variables. Our method uses copulas to augment the families of distributions used in mean-field and structured approximations. Copulas model the dependency that is not captured by the original variational distribution, and thus the augmented variational family guarantees better approximations to the posteri…

    Submitted 31 October, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

    Comments: Appears in Neural Information Processing Systems, 2015

  6. arXiv:1505.02417  [pdf, other]

    stat.ME cs.LG stat.CO stat.ML

    Towards stability and optimality in stochastic gradient descent

    Authors: Panos Toulis, Dustin Tran, Edoardo M. Airoldi

    Abstract: Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets. However, in both theory and practice, they suffer from numerical instability. Moreover, they are statistically inefficient as estimators of the true parameter value. To address these two issues, we propose a new iterative procedure termed averaged implicit SGD (AI-…

    Submitted 7 June, 2016; v1 submitted 10 May, 2015; originally announced May 2015.

    Comments: Appears in Artificial Intelligence and Statistics, 2016

  7. arXiv:1412.6734  [pdf, other]

    stat.ML cs.LG

    Implicit Temporal Differences

    Authors: Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

    Abstract: In reinforcement learning, the TD($λ$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD($λ$) is its sensitivity to the choice of the step-size. It is an empirically well-known fact that a large step-size leads to fast convergence, at the cost of higher variance and risk of instability…

    Submitted 21 December, 2014; originally announced December 2014.

  8. arXiv:1410.8597  [pdf, other]

    stat.ME cs.SI math.ST physics.soc-ph

    Consistent estimation of dynamic and multi-layer block models

    Authors: Qiuyi Han, Kevin S. Xu, Edoardo M. Airoldi

    Abstract: Significant progress has been made recently on theoretical analysis of estimators for the stochastic block model (SBM). In this paper, we consider the multi-graph SBM, which serves as a foundation for many application settings including dynamic and multi-layer networks. We explore the asymptotic properties of two estimators for the multi-graph SBM, namely spectral clustering and the maximum-likeli…

    Submitted 19 May, 2015; v1 submitted 30 October, 2014; originally announced October 2014.

    Comments: To appear at ICML 2015

    Journal ref: Proceedings of the 32nd International Conference on Machine Learning (2015) 1511-1520

  9. arXiv:1405.2566  [pdf, other]

    stat.ML cs.SI physics.soc-ph q-bio.QM stat.AP

    Learning modular structures from network data and node variables

    Authors: Elham Azizi, James E. Galagan, Edoardo M. Airoldi

    Abstract: A standard technique for understanding underlying dependency structures among a set of variables posits a shared conditional probability distribution for the variables measured on individuals within a group. This approach is often referred to as module networks, where individuals are represented by nodes in a network, groups are termed modules, and the focus is on estimating the network structure…

    Submitted 11 May, 2014; originally announced May 2014.

    Comments: 22 pages, 6 figures, 3 tables, 3 algorithms

  10. arXiv:1311.1731  [pdf, ps, other]

    stat.ME cs.LG cs.SI physics.data-an stat.ML

    Stochastic blockmodel approximation of a graphon: Theory and consistent estimation

    Authors: Edoardo M Airoldi, Thiago B Costa, Stanley H Chan

    Abstract: Non-parametric approaches for analyzing network data based on exchangeable graph models (ExGM) have recently gained interest. The key object that defines an ExGM is often referred to as a graphon. This non-parametric perspective on network modeling poses challenging questions on how to make inference on the graphon underlying observed network data. In this paper, we propose a computationally effic…

    Submitted 7 November, 2013; v1 submitted 7 November, 2013; originally announced November 2013.

    Comments: 20 pages, 4 figures, 2 algorithms. Neural Information Processing Systems (NIPS), 2013

  11. arXiv:1206.4631  [pdf, other]

    cs.LG cs.CL cs.IR stat.ME stat.ML

    A Poisson convolution model for characterizing topical content with word frequency and exclusivity

    Authors: Edoardo M Airoldi, Jonathan M Bischof

    Abstract: An ongoing challenge in the analysis of document collections is how to summarize content in terms of a set of inferred themes that can be interpreted substantively in terms of topics. The current practice of parametrizing the themes in terms of most frequent words limits interpretability by ignoring the differential use of words across topics. We argue that words that are both common and exclusive…

    Submitted 27 July, 2014; v1 submitted 18 June, 2012; originally announced June 2012.

    Comments: Originally appeared in ICML2012

  12. arXiv:1203.2821  [pdf, other]

    stat.ME cs.LG cs.SI physics.soc-ph

    Graphlet decomposition of a weighted network

    Authors: Hossein Azari Soufiani, Edoardo M Airoldi

    Abstract: We introduce the graphlet decomposition of a weighted network, which encodes a notion of social information based on social structure. We develop a scalable inference algorithm, which combines EM with Bron-Kerbosch in a novel fashion, for estimating the parameters of the model underlying graphlets using one network sample. We explore some theoretical properties of the graphlet decomposition, inclu…

    Submitted 13 March, 2012; originally announced March 2012.

    Comments: 25 pages, 4 figures, 3 tables

    Journal ref: Journal of Machine Learning Research, Workshop & Conference Proceedings, vol. 22 (AISTATS), 2012

  13. arXiv:1105.6245  [pdf, other]

    stat.ME cs.SI physics.soc-ph

    Confidence sets for network structure

    Authors: Edoardo M. Airoldi, David S. Choi, Patrick J. Wolfe

    Abstract: Latent variable models are frequently used to identify structure in dichotomous network data, in part because they give rise to a Bernoulli product likelihood that is both well understood and consistent with the notion of exchangeable random graphs. In this article we propose conservative confidence sets that hold with respect to these underlying Bernoulli parameters as a function of any given par…

    Submitted 31 May, 2011; originally announced May 2011.

    Comments: 17 pages, 3 figures, 3 tables

    Journal ref: Statistical Analysis and Data Mining, vol. 4, pp. 461-469, 2011

  14. arXiv:1105.2526  [pdf]

    stat.ME cs.SI

    Deconvolution of mixing time series on a graph

    Authors: Alexander W. Blocker, Edoardo M. Airoldi

    Abstract: In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, y_t = A x_t, where the pro…

    Submitted 10 June, 2011; v1 submitted 12 May, 2011; originally announced May 2011.

    Comments: 10 pages, 11 page supplement; updated with minor edits; accepted into UAI 2011

  15. arXiv:1012.0866  [pdf, other]

    math.ST cs.LG stat.ME

    Generalized Species Sampling Priors with Latent Beta reinforcements

    Authors: Edoardo M. Airoldi, Thiago Costa, Federico Bassetti, Fabrizio Leisen, Michele Guindani

    Abstract: Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of indepen…

    Submitted 1 August, 2014; v1 submitted 3 December, 2010; originally announced December 2010.

    Comments: To appear in the Journal of the American Statistical Association

  16. arXiv:1011.4644  [pdf, ps, other]

    math.ST cs.SI stat.ME stat.ML

    Stochastic blockmodels with growing number of classes

    Authors: David S. Choi, Patrick J. Wolfe, Edoardo M. Airoldi

    Abstract: We present asymptotic and finite-sample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least poly-logarithmically in this size.…

    Submitted 30 April, 2011; v1 submitted 21 November, 2010; originally announced November 2010.

    Comments: 12 pages, 3 figures; revised version

    Journal ref: Biometrika, 99:273-284, 2012

  17. arXiv:0912.5410  [pdf, other]

    stat.ME cs.LG physics.soc-ph q-bio.MN stat.ML

    A survey of statistical network models

    Authors: Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi

    Abstract: Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociolog…

    Submitted 29 December, 2009; originally announced December 2009.

    Comments: 96 pages, 14 figures, 333 references

    Journal ref: Foundations and Trends in Machine Learning, 2(2):1-117, 2009

  18. arXiv:0912.5193  [pdf, ps, other]

    stat.ME cs.LG physics.soc-ph q-bio.QM stat.AP

    Ranking relations using analogies in biological and information networks

    Authors: Ricardo Silva, Katherine Heller, Zoubin Ghahramani, Edoardo M. Airoldi

    Abstract: Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects $\mathbf{S}=\{A^{(1)}:B^{(1)},A^{(2)}:B^{(2)},\ldots,A^{(N)}:B^{(N)}\}$, measures how well other pairs A:B fit in with the set $\mathbf{S}$. Our work addresses the following question: is the relation…

    Submitted 29 August, 2013; v1 submitted 28 December, 2009; originally announced December 2009.

    Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS321

    Report number: IMS-AOAS-AOAS321

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 2, 615-644

  19. arXiv:0706.2040  [pdf, other]

    q-bio.QM cs.LG physics.soc-ph stat.ME stat.ML

    Getting started in probabilistic graphical models

    Authors: Edoardo M Airoldi

    Abstract: Probabilistic graphical models (PGMs) have become a popular tool for computational analysis of biological data in a variety of domains. But, what exactly are they and how do they work? How can we use PGMs to discover patterns that are biologically relevant? And to what extent can PGMs help us formulate new hypotheses that are testable at the bench? This note sketches out some answers and illustr…

    Submitted 10 November, 2007; v1 submitted 14 June, 2007; originally announced June 2007.

    Comments: 12 pages, 1 figure

    Journal ref: Airoldi EM (2007) Getting started in probabilistic graphical models. PLoS Comput Biol 3(12): e252

  20. arXiv:0705.4485  [pdf, other]

    stat.ME cs.LG math.ST physics.soc-ph stat.ML

    Mixed membership stochastic blockmodels

    Authors: Edoardo M Airoldi, David M Blei, Stephen E Fienberg, Eric P Xing

    Abstract: Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we d…

    Submitted 30 May, 2007; originally announced May 2007.

    Comments: 46 pages, 14 figures, 3 tables

    Journal ref: Journal of Machine Learning Research, 9, 1981-2014.