Showing 1–45 of 45 results for author: Upfal, E

  1. arXiv:2306.01658  [pdf, other]

    cs.LG

    An Adaptive Method for Weak Supervision with Drifting Data

    Authors: Alessio Mazzetto, Reza Esfandiarpoor, Eli Upfal, Stephen H. Bach

    Abstract: We introduce an adaptive method with formal quality guarantees for weak supervision in a non-stationary setting. Our goal is to infer the unknown labels of a sequence of data by using weak supervision sources that provide independent noisy signals of the correct classification for each data point. This setting includes crowdsourcing and programmatic weak supervision. We focus on the non-stationary…

    Submitted 2 June, 2023; originally announced June 2023.

  2. arXiv:2305.02252  [pdf, ps, other]

    cs.LG

    An Adaptive Algorithm for Learning with Unknown Distribution Drift

    Authors: Alessio Mazzetto, Eli Upfal

    Abstract: We develop and analyze a general technique for learning with an unknown distribution drift. Given a sequence of independent observations from the last $T$ steps of a drifting distribution, our algorithm agnostically learns a family of functions with respect to the current distribution at time $T$. Unlike previous work, our technique does not require prior knowledge about the magnitude of the drift…

    Submitted 27 October, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Updated version for Camera-ready with minor changes in text for readability, and including a new small section on linear regression

  3. arXiv:2302.02460  [pdf, other]

    cs.LG stat.ML

    Nonparametric Density Estimation under Distribution Drift

    Authors: Alessio Mazzetto, Eli Upfal

    Abstract: We study nonparametric density estimation in non-stationary drift settings. Given a sequence of independent samples taken from a distribution that gradually changes in time, the goal is to compute the best estimate for the current distribution. We prove tight minimax risk bounds for both discrete and continuous smooth densities, where the minimum is over all possible estimates and the maximum is o…

    Submitted 27 October, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Camera Ready version

  4. arXiv:2205.13068  [pdf, other]

    cs.LG

    Tight Lower Bounds on Worst-Case Guarantees for Zero-Shot Learning with Attributes

    Authors: Alessio Mazzetto, Cristina Menghini, Andrew Yuan, Eli Upfal, Stephen H. Bach

    Abstract: We develop a rigorous mathematical analysis of zero-shot learning with attributes. In this setting, the goal is to label novel classes with no training data, only detectors for attributes and a description of how those attributes are correlated with the target classes, called the class-attribute matrix. We develop the first non-trivial lower bound on the worst-case error of the best map from attri…

    Submitted 28 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  5. arXiv:2111.08118  [pdf, other]

    stat.AP cs.SI q-bio.NC

    Brain Functional Connectivity Estimation Utilizing Diffusion Kernels on a Structural Connectivity Graph

    Authors: Nathan Tung, Jerome Sanes, Eli Upfal, Ani Eloyan

    Abstract: Functional connectivity (FC) refers to the investigation of interactions between brain regions to understand integration of neural activity in several regions. FC is often estimated using functional magnetic resonance images (fMRI). There has been increasing interest in the potential of multi-modal imaging to obtain robust estimates of FC in high-dimensional settings. We develop novel algorithms a…

    Submitted 21 January, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 29 pages, 6 figures, 1 table, 2 algorithms

  6. arXiv:2111.07372  [pdf, other]

    stat.ML cs.DS

    Fast Doubly-Adaptive MCMC to Estimate the Gibbs Partition Function with Weak Mixing Time Bounds

    Authors: Shahrzad Haddadan, Yue Zhuang, Cyrus Cousins, Eli Upfal

    Abstract: We present a novel method for reducing the computational complexity of rigorously estimating the partition functions (normalizing constants) of Gibbs (Boltzmann) distributions, which arise ubiquitously in probabilistic graphical models. A major obstacle to practical applications of Gibbs distributions is the need to estimate their partition functions. The state of the art in addressing this proble…

    Submitted 14 November, 2021; originally announced November 2021.

    Comments: A short version of this paper will appear in the 35th Conference on Neural Information Processing Systems, NeurIPS 2021

  7. RePBubLik: Reducing the Polarized Bubble Radius with Link Insertions

    Authors: Shahrzad Haddadan, Cristina Menghini, Matteo Riondato, Eli Upfal

    Abstract: The topology of the hyperlink graph among pages expressing different opinions may influence the exposure of readers to diverse content. Structural bias may trap a reader in a polarized bubble with no access to other opinions. We model readers' behavior as random walks. A node is in a polarized bubble if the expected length of a random walk from it to a page of different opinion is large. The struc…

    Submitted 12 January, 2021; originally announced January 2021.

  8. arXiv:2011.11129  [pdf, ps, other]

    cs.DS

    Making mean-estimation more efficient using an MCMC trace variance approach: DynaMITE

    Authors: Cyrus Cousins, Shahrzad Haddadan, Eli Upfal

    Abstract: We introduce a novel statistical measure for MCMC-mean estimation, the inter-trace variance ${\rm trv}^{(τ_{rel})}({\cal M},f)$, which depends on a Markov chain ${\cal M}$ and a function $f:S\to [a,b]$. The inter-trace variance can be efficiently estimated from observed data and leads to a more efficient MCMC-mean estimator. Prior MCMC mean-estimators receive, as input, upper-bounds on $τ_{mix}$ o…

    Submitted 4 August, 2021; v1 submitted 22 November, 2020; originally announced November 2020.

  9. How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing Topics?

    Authors: Cristina Menghini, Aris Anagnostopoulos, Eli Upfal

    Abstract: Wikipedia relies on an extensive review process to verify that the content of each individual page is unbiased and presents a neutral point of view. Less attention has been paid to possible biases in the hyperlink structure of Wikipedia, which has a significant influence on the user's exploration process when visiting more than one page. The evaluation of hyperlink bias is challenging because it d…

    Submitted 31 March, 2022; v1 submitted 16 July, 2020; originally announced July 2020.

  10. arXiv:1910.03493  [pdf, other]

    eess.SP cs.LG eess.SY

    A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis

    Authors: Lorenzo De Stefani, Eli Upfal

    Abstract: While standard statistical inference techniques and machine learning generalization bounds assume that tests are run on data selected independently of the hypotheses, practical data analysis and machine learning are usually iterative and adaptive processes where the same holdout data is often used for testing a sequence of hypotheses (or models), which may each depend on the outcome of the previou…

    Submitted 4 October, 2019; originally announced October 2019.

  11. arXiv:1905.13379  [pdf, other]

    cs.GT

    Learning Equilibria of Simulation-Based Games

    Authors: Enrique Areyan Viqueira, Cyrus Cousins, Eli Upfal, Amy Greenwald

    Abstract: We tackle a fundamental problem in empirical game-theoretic analysis (EGTA), that of learning equilibria of simulation-based games. Such games cannot be described in analytical form; instead, a black-box simulator can be queried to obtain noisy samples of utilities. Our approach to EGTA is in the spirit of probably approximately correct learning. We design algorithms that learn so-called empirical…

    Submitted 30 May, 2019; originally announced May 2019.

  12. arXiv:1902.06306  [pdf, ps, other]

    cs.CR

    On the Complexity of Anonymous Communication Through Public Networks

    Authors: Megumi Ando, Anna Lysyanskaya, Eli Upfal

    Abstract: Onion routing is the most widely used approach to anonymous communication online. The idea is that Alice wraps her message to Bob in layers of encryption to form an "onion," and routes it through a series of intermediaries. Each intermediary's job is to decrypt ("peel") the onion it receives to obtain instructions for where to send it next, and what to send. The intuition is that, by the time it g…

    Submitted 28 July, 2021; v1 submitted 17 February, 2019; originally announced February 2019.

  13. arXiv:1812.07568  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Uniform Convergence Bounds for Codec Selection

    Authors: Clayton Sanford, Cyrus Cousins, Eli Upfal

    Abstract: We frame the problem of selecting an optimal audio encoding scheme as a supervised learning task. Through uniform convergence theory, we guarantee approximately optimal codec selection while controlling for selection bias. We present rigorous statistical guarantees for the codec selection problem that hold for arbitrary distributions over audio sequences and for arbitrary quality metrics. Our tech…

    Submitted 17 December, 2018; originally announced December 2018.

  14. arXiv:1811.00602  [pdf, other]

    cs.DB

    VizRec: A framework for secure data exploration via visual representation

    Authors: Lorenzo De Stefani, Leonhard F. Spiegelberg, Tim Kraska, Eli Upfal

    Abstract: Visual representations of data (visualizations) are tools of great importance and widespread use in data analytics, as they give users visual insight into patterns in the observed data in a simple and effective way. However, since visualization tools are applied to sample data, there is a risk of visualizing random fluctuations in the sample rather than a true pattern in the data. This problem…

    Submitted 1 November, 2018; originally announced November 2018.

  15. arXiv:1808.08294  [pdf, other]

    cs.LG stat.ML

    Unknown Examples & Machine Learning Model Generalization

    Authors: Yeounoh Chung, Peter J. Haas, Eli Upfal, Tim Kraska

    Abstract: Over the past decades, researchers and ML practitioners have come up with better and better ways to build, understand and improve the quality of ML models, but mostly under the key assumption that the training data is distributed identically to the testing data. In many real-world applications, however, some potential training examples are unknown to the modeler, due to sample selection bias or, m…

    Submitted 11 October, 2019; v1 submitted 24 August, 2018; originally announced August 2018.

  16. arXiv:1807.02876  [pdf, other]

    physics.comp-ph cs.LG hep-ex stat.ML

    Machine Learning in High Energy Physics Community White Paper

    Authors: Kim Albertsson, Piero Altoe, Dustin Anderson, John Anderson, Michael Andrews, Juan Pedro Araque Espinosa, Adam Aurisano, Laurent Basara, Adrian Bevan, Wahid Bhimji, Daniele Bonacorsi, Bjorn Burkle, Paolo Calafiura, Mario Campanelli, Louis Capps, Federico Carminati, Stefano Carrazza, Yi-fan Chen, Taylor Childers, Yann Coadou, Elias Coniavitis, Kyle Cranmer, Claire David, Douglas Davis, Andrea De Simone, et al. (103 additional authors not shown)

    Abstract: Machine learning has been applied to several problems in particle physics research, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas for machine learning in particle physics. We d…

    Submitted 16 May, 2019; v1 submitted 8 July, 2018; originally announced July 2018.

    Comments: Editors: Sergei Gleyzer, Paul Seyfert and Steven Schramm

  17. arXiv:1710.02108  [pdf, ps, other]

    cs.DS

    Tiered Sampling: An Efficient Method for Approximate Counting Sparse Motifs in Massive Graph Streams

    Authors: Lorenzo De Stefani, Erisa Terolli, Eli Upfal

    Abstract: We introduce Tiered Sampling, a novel technique for approximately counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size $M$, which can be magnitudes smaller than the number of edges. Our method addresses the challenging task of counting sparse motifs - sub-graph patterns that have low pr…

    Submitted 5 October, 2017; originally announced October 2017.

    Comments: 28 pages

    MSC Class: 68W20 ACM Class: G.1.2; G.2.2

  18. arXiv:1706.05367  [pdf, ps, other]

    cs.CR

    Practical and Provably Secure Onion Routing

    Authors: Megumi Ando, Anna Lysyanskaya, Eli Upfal

    Abstract: In an onion routing protocol, messages travel through several intermediaries before arriving at their destinations; they are wrapped in layers of encryption (hence they are called "onions"). The goal is to make it hard to establish who sent the message. It is a practical and widespread tool for creating anonymous channels. For the standard adversary models -- network, passive, and active -- we p…

    Submitted 29 July, 2021; v1 submitted 16 June, 2017; originally announced June 2017.

  19. arXiv:1612.01040  [pdf, other]

    cs.DB stat.ME

    Controlling False Discoveries During Interactive Data Exploration

    Authors: Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig, Eli Upfal, Tim Kraska

    Abstract: Recent tools for interactive data exploration significantly increase the chance that users make false discoveries. The crux is that these tools implicitly allow the user to test a large body of different hypotheses with just a few clicks, thus incurring the issue commonly known in statistics as the multiple hypothesis testing error. In this paper, we propose solutions to integrate multiple hypot…

    Submitted 3 December, 2016; originally announced December 2016.

  20. arXiv:1609.00790  [pdf, other]

    cs.SI cs.DS

    Scalable Betweenness Centrality Maximization via Sampling

    Authors: Ahmad Mahmoody, Charalampos E. Tsourakakis, Eli Upfal

    Abstract: Betweenness centrality is a fundamental centrality measure in social network analysis. Given a large-scale network, how can we find the most central nodes? This question is of key importance to numerous important applications that rely on betweenness centrality, including community detection and understanding graph vulnerability. Despite the large amount of work on designing scalable approximation…

    Submitted 3 September, 2016; originally announced September 2016.

    Comments: Accepted in KDD 2016

  21. arXiv:1605.05590  [pdf, other]

    cs.DC

    MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in…

    Submitted 23 January, 2017; v1 submitted 18 May, 2016; originally announced May 2016.

    Comments: Extended version of http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5, January 2017

  22. Balanced Allocation: Patience is not a Virtue

    Authors: John Augustine, William K. Moses Jr., Amanda Redlich, Eli Upfal

    Abstract: Load balancing is a well-studied problem, with balls-in-bins being the primary framework. The greedy algorithm $\mathsf{Greedy}[d]$ of Azar et al. places each ball by probing $d > 1$ random bins and placing the ball in the least loaded of them. With high probability, the maximum load under $\mathsf{Greedy}[d]$ is exponentially lower than the result when balls are placed uniformly randomly. Vöcking…

    Submitted 22 January, 2018; v1 submitted 26 February, 2016; originally announced February 2016.

    Comments: 26 pages, preliminary version accepted at SODA 2016

    ACM Class: F.2.2; G.2.0; G.3

    Journal ref: In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2016), 655-671
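
The $\mathsf{Greedy}[d]$ rule described in the abstract above can be illustrated with a short simulation. This is a minimal sketch, not the paper's algorithm or analysis: the function name `greedy_d`, the choice to probe bins with replacement, and the tie-breaking rule (first probed bin wins) are all assumptions made for illustration.

```python
import random


def greedy_d(n_balls, n_bins, d, rng):
    """Place each ball in the least loaded of d bins probed uniformly at random.

    Assumptions (not taken from the paper): probes are drawn with
    replacement, and ties go to the first probed bin with minimum load.
    """
    loads = [0] * n_bins
    for _ in range(n_balls):
        probes = [rng.randrange(n_bins) for _ in range(d)]
        # min() returns the first probe with the smallest current load.
        target = min(probes, key=lambda b: loads[b])
        loads[target] += 1
    return loads


if __name__ == "__main__":
    # d = 1 reduces to placing each ball uniformly at random.
    for d in (1, 2, 3):
        loads = greedy_d(10_000, 10_000, d, random.Random(0))
        print(f"d={d}: maximum load = {max(loads)}")
```

Running the script for several values of `d` gives a rough empirical feel for how the maximum load changes with the number of probes; it says nothing about the variants the paper itself studies.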

  23. arXiv:1602.07424  [pdf, other]

    cs.DS cs.DB

    TRIÈST: Counting Local and Global Triangles in Fully-dynamic Streams with Fixed Memory Size

    Authors: Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, Eli Upfal

    Abstract: We present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions. Our algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all…

    Submitted 28 June, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

    Comments: 49 pages, 7 figures, extended version of the paper appeared at ACM KDD'16

    ACM Class: G.2.2; H.2.8
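
The abstract above names reservoir sampling as the building block for working within a fixed memory budget. The sketch below shows only the textbook insertion-only version of reservoir sampling over a stream of items; it is not the TRIÈST estimator and does not handle deletions. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, capacity, rng):
    """Keep a uniform random sample of at most `capacity` items from a stream.

    Textbook insertion-only reservoir sampling: the t-th item is kept with
    probability capacity / t, evicting a uniformly chosen stored item.
    """
    sample = []
    for t, item in enumerate(stream, start=1):
        if t <= capacity:
            # The reservoir is not full yet, so every item is stored.
            sample.append(item)
        elif rng.random() < capacity / t:
            # Replace a uniformly chosen stored item with the new one.
            sample[rng.randrange(capacity)] = item
    return sample


if __name__ == "__main__":
    # Hypothetical edge stream: pairs of vertex ids.
    edges = [(u, u + 1) for u in range(1000)]
    print(reservoir_sample(edges, 5, random.Random(0)))
```

The memory used never exceeds `capacity` items regardless of stream length, which is the property the abstract refers to as a user-specified memory space.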

  24. arXiv:1602.05866  [pdf, other]

    cs.DS

    ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

    Authors: Matteo Riondato, Eli Upfal

    Abstract: We present ABRA, a suite of algorithms that compute and maintain probabilistically-guaranteed, high-quality approximations of the betweenness centrality of all nodes (or edges) on both static and fully dynamic graphs. Our algorithms rely on random sampling, and their analysis leverages Rademacher averages and pseudodimension, fundamental concepts from statistical learning theory. To our knowled…

    Submitted 18 February, 2016; originally announced February 2016.

    ACM Class: G.2.2; H.2.8

  25. arXiv:1509.02487  [pdf, ps, other]

    cs.DS cs.LG

    Optimizing Static and Adaptive Probing Schedules for Rapid Event Detection

    Authors: Ahmad Mahmoody, Evgenios M. Kornaropoulos, Eli Upfal

    Abstract: We formulate and study a fundamental search and detection problem, Schedule Optimization, motivated by a variety of real-world applications, ranging from monitoring content changes on the web, social networks, and user activities to detecting failure on large systems with many individual machines. We consider a large system consisting of many nodes, where each node has its own rate of generating n…

    Submitted 9 September, 2015; v1 submitted 8 September, 2015; originally announced September 2015.

  26. arXiv:1506.03265  [pdf, other]

    cs.DC

    A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: We present a space and time efficient practical parallel algorithm for approximating the diameter of massive weighted undirected graphs on distributed platforms supporting a MapReduce-like abstraction. The core of the algorithm is a weighted graph decomposition strategy generating disjoint clusters of bounded weighted radius. Theoretically, our algorithm uses linear space and yields a polylogarith…

    Submitted 9 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

  27. arXiv:1504.03275  [pdf, other]

    cs.SI

    Wiggins: Detecting Valuable Information in Dynamic Networks Using Limited Resources

    Authors: Ahmad Mahmoody, Matteo Riondato, Eli Upfal

    Abstract: Detecting new information and events in a dynamic network by probing individual nodes has many practical applications: discovering new webpages, analyzing influence properties in a network, and detecting failure propagation in electronic circuits or infections in public drinkable water systems. In practice, it is infeasible for anyone but the owner of the network (if existent) to monitor all nodes a…

    Submitted 29 July, 2015; v1 submitted 13 April, 2015; originally announced April 2015.

  28. arXiv:1407.3144  [pdf, other]

    cs.DC cs.DS

    Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation

    Authors: Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: We develop a novel parallel decomposition strategy for unweighted, undirected graphs, based on growing disjoint connected clusters from batches of centers progressively selected from yet uncovered nodes. With respect to similar previous decompositions, our strategy exercises a tighter control on both the number of clusters and their maximum radius. We present two important applications of our pa…

    Submitted 6 February, 2015; v1 submitted 11 July, 2014; originally announced July 2014.

    Comments: 14 pages

  29. arXiv:1402.5524  [pdf, other]

    cs.CR cs.DC cs.DS

    The Melbourne Shuffle: Improving Oblivious Storage in the Cloud

    Authors: Olga Ohrimenko, Michael T. Goodrich, Roberto Tamassia, Eli Upfal

    Abstract: We present a simple, efficient, and secure data-oblivious randomized shuffle algorithm. This is the first secure data-oblivious shuffle that is not based on sorting. Our method can be used to improve previous oblivious storage solutions for network-based outsourcing of data.

    Submitted 22 February, 2014; originally announced February 2014.

  30. arXiv:1312.1277  [pdf, ps, other]

    cs.DS cs.LG

    Bandits and Experts in Metric Spaces

    Authors: Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

    Abstract: In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications su…

    Submitted 15 April, 2019; v1 submitted 4 December, 2013; originally announced December 2013.

    Comments: This manuscript is a merged and definitive version of (R. Kleinberg, Slivkins, Upfal: STOC 2008) and (R. Kleinberg, Slivkins: SODA 2010), with a significantly revised presentation

  31. arXiv:1309.4286  [pdf, other]

    q-bio.QM stat.AP

    Accurate Computation of Survival Statistics in Genome-wide Studies

    Authors: Fabio Vandin, Alexandra Papoutsaki, Benjamin J. Raphael, Eli Upfal

    Abstract: A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because: the two populations determined by a ge…

    Submitted 17 September, 2013; originally announced September 2013.

    Comments: Full version of RECOMB 2013 paper

  32. arXiv:1305.1121  [pdf, ps, other]

    cs.DC

    Storage and Search in Dynamic Peer-to-Peer Networks

    Authors: John Augustine, Anisur Rahaman Molla, Ehab Morsy, Gopal Pandurangan, Peter Robinson, Eli Upfal

    Abstract: We study robust and efficient distributed algorithms for searching, storing, and maintaining data in dynamic Peer-to-Peer (P2P) networks. P2P networks are highly dynamic networks that experience heavy node churn (i.e., nodes join and leave the network continuously over time). Our goal is to guarantee, despite high node churn rate, that a large number of nodes in the network can store, retrieve, an…

    Submitted 6 May, 2013; originally announced May 2013.

    Comments: to appear at SPAA 2013

    ACM Class: F.2.2

  33. arXiv:1301.2277  [pdf]

    cs.AI cs.DS

    A Clustering Approach to Solving Large Stochastic Matching Problems

    Authors: Milos Hauskrecht, Eli Upfal

    Abstract: In this work we focus on efficient heuristics for solving a class of stochastic planning problems that arise in a variety of business, investment, and industrial applications. The problem is best described in terms of future buy and sell contracts. By buying less reliable, but less expensive, buy (supply) contracts, a company or a trader can cover a position of more reliable and more expensive sel…

    Submitted 10 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI2001)

    Report number: UAI-P-2001-PG-219-226

  34. Fast Distributed PageRank Computation

    Authors: Atish Das Sarma, Anisur Rahaman Molla, Gopal Pandurangan, Eli Upfal

    Abstract: Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved to be effective in determining node importance in large graphs (and was a pioneering idea behind Google's search engine). In distributed computing alone, PageRank vector, or more generally random walk based quantities have been used for several different applications ranging…

    Submitted 25 November, 2015; v1 submitted 15 August, 2012; originally announced August 2012.

    Comments: 14 pages

    Journal ref: Theoretical Computer Science, Volume 561, Pages 113-121, 2015

  35. arXiv:1111.6937  [pdf, other]

    cs.DS cs.DB cs.LG

    Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees

    Authors: Matteo Riondato, Eli Upfal

    Abstract: The tasks of extracting (top-$K$) Frequent Itemsets (FI's) and Association Rules (AR's) are fundamental primitives in data mining and database applications. Exact algorithms for these problems exist and are widely used, but their running time is hindered by the need of scanning the entire dataset, possibly multiple times. High quality approximations of FI's and AR's are sufficient for most practic…

    Submitted 22 February, 2013; v1 submitted 29 November, 2011; originally announced November 2011.

    Comments: 19 pages, 7 figures. A shorter version of this paper appeared in the proceedings of ECML PKDD 2012

    ACM Class: H.2.8

  36. Space-Round Tradeoffs for MapReduce Computations

    Authors: Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal

    Abstract: This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by allowing for a flexible use of parallelism. Indeed, the model diverges from a traditional processor-centric view by featuring parameters which embody only global and…

    Submitted 9 November, 2011; originally announced November 2011.

    Journal ref: Final version in Proc. of the 26th ACM international conference on Supercomputing, pages 235-244, 2012

  37. arXiv:1108.0809  [pdf, ps, other]

    cs.DS cs.DC

    Distributed Agreement in Dynamic Peer-to-Peer Networks

    Authors: John Augustine, Gopal Pandurangan, Peter Robinson, Eli Upfal

    Abstract: Motivated by the need for robust and fast distributed computation in highly dynamic Peer-to-Peer (P2P) networks, we study algorithms for the fundamental distributed agreement problem. P2P networks are highly dynamic networks that experience heavy node churn. Our goal is to design fast algorithms (running in a small number of rounds) that guarantee, despite high node churn rate, that almost a…

    Submitted 10 September, 2014; v1 submitted 3 August, 2011; originally announced August 2011.

    Comments: to appear at the Journal of Computer and System Sciences; preliminary version appeared at SODA 2012

  38. arXiv:1101.5805  [pdf, ps, other]

    cs.DB cs.DS cs.LG

    The VC-Dimension of Queries and Selectivity Estimation Through Sampling

    Authors: Matteo Riondato, Mert Akdere, Ugur Cetintemel, Stanley B. Zdonik, Eli Upfal

    Abstract: We develop a novel method, based on the statistical concept of the Vapnik-Chervonenkis dimension, to evaluate the selectivity (output cardinality) of SQL queries - a crucial step in optimizing the execution of large scale database and data-mining operations. The major theoretical contribution of this work, which is of independent interest, is an explicit bound to the VC-dimension of a range space…

    Submitted 11 August, 2011; v1 submitted 30 January, 2011; originally announced January 2011.

    Comments: 20 pages, 3 figures

    ACM Class: H.2.4; G.3

  39. arXiv:1101.4609  [pdf, ps, other]

    cs.DM cs.DS

    Tight Bounds on Information Dissemination in Sparse Mobile Networks

    Authors: Alberto Pettarin, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: Motivated by the growing interest in mobile systems, we study the dynamics of information dissemination between agents moving independently on a plane. Formally, we consider $k$ mobile agents performing independent random walks on an $n$-node grid. At time $0$, each agent is located at a random node of the grid and one agent has a rumor. The spread of the rumor is governed by a dynamic communicati…

    Submitted 1 February, 2011; v1 submitted 24 January, 2011; originally announced January 2011.

    Comments: 19 pages; we rewrote Lemma 4, fixing a claim which was not fully justified in the first version of the draft

  40. arXiv:1007.1604  [pdf, other]

    cs.DM cs.DS

    Infectious Random Walks

    Authors: Alberto Pettarin, Andrea Pietracaprina, Geppino Pucci, Eli Upfal

    Abstract: We study the dynamics of information (or virus) dissemination by $m$ mobile agents performing independent random walks on an $n$-node grid. We formulate our results in terms of two scenarios: broadcasting and gossiping. In the broadcasting scenario, the mobile agents are initially placed uniformly at random among the grid nodes. At time 0, one agent is informed of a rumor and starts a random walk.…

    Submitted 25 January, 2011; v1 submitted 9 July, 2010; originally announced July 2010.

    Comments: 21 pages, 3 figures --- The results presented in this paper have been extended in: Pettarin et al., Tight Bounds on Information Dissemination in Sparse Mobile Networks, http://arxiv.longhoe.net/abs/1101.4609

  41. Mining Top-K Frequent Itemsets Through Progressive Sampling

    Authors: Andrea Pietracaprina, Matteo Riondato, Eli Upfal, Fabio Vandin

    Abstract: We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this purpose, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes) all very frequent (resp., very infrequent) itemsets, together with an estimate of these itemsets' frequencies with a bounded error. Our first result is an uppe…

    Submitted 27 June, 2010; originally announced June 2010.

    Comments: 16 pages, 2 figures, accepted for presentation at ECML PKDD 2010 and publication in the ECML PKDD 2010 special issue of the Data Mining and Knowledge Discovery journal

  42. arXiv:1002.1104  [pdf, ps, other]

    cs.DB cs.DS

    An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

    Authors: Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin

    Abstract: As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support thres…

    Submitted 4 February, 2010; originally announced February 2010.

    Comments: A preliminary version of this work was presented in ACM PODS 2009. 20 pages, 0 figures

    ACM Class: H.2.8

  43. arXiv:1002.0874  [pdf, ps, other]

    cs.DS

    MADMX: A Novel Strategy for Maximal Dense Motif Extraction

    Authors: Roberto Grossi, Andrea Pietracaprina, Nadia Pisanti, Geppino Pucci, Eli Upfal, Fabio Vandin

    Abstract: We develop, analyze and experiment with a new tool, called MADMX, which extracts frequent motifs, possibly including don't care characters, from biological sequences. We introduce density, a simple and flexible measure for bounding the number of don't cares in a motif, defined as the ratio of solid (i.e., different from don't care) characters to the total length of the motif. By extracting only…

    Submitted 3 February, 2010; originally announced February 2010.

    Comments: A preliminary version of this work was presented in WABI 2009. 10 pages, 0 figures

  44. arXiv:0809.4882  [pdf, ps, other]

    cs.DS cs.LG

    Multi-Armed Bandits in Metric Spaces

    Authors: Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

    Abstract: In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications…

    Submitted 28 September, 2008; originally announced September 2008.

    Comments: 16 pages, 0 figures

  45. arXiv:math/0209357  [pdf, ps, other]

    math.PR

    Steady state analysis of balanced-allocation routing

    Authors: Aris Anagnostopoulos, Ioannis Kontoyiannis, Eli Upfal

    Abstract: We compare the long-term, steady-state performance of a variant of the standard Dynamic Alternative Routing (DAR) technique commonly used in telephone and ATM networks, to the performance of a path-selection algorithm based on the "balanced-allocation" principle; we refer to this new algorithm as the Balanced Dynamic Alternative Routing (BDAR) algorithm. While DAR checks alternative routes seque…

    Submitted 25 September, 2002; originally announced September 2002.

    Comments: 22 pages, 1 figure