-
Low-Distortion Clustering in Bounded Growth Graphs
Authors:
Yi-Jun Chang,
Varsha Dani,
Thomas P. Hayes
Abstract:
The well-known clustering algorithm of Miller, Peng, and Xu (SPAA 2013) is useful for many applications, including low-diameter decomposition and low-energy distributed algorithms. One nice property of their clustering, shown in previous work by Chang, Dani, Hayes, and Pettie (PODC 2020), is that distances in the cluster graph are rescaled versions of distances in the original graph, up to an $O(\log n)$ distortion factor and rounding issues. Minimizing this distortion factor is important for efficiency in computing the clustering, as well as in other applications.
We prove that there exist graphs for which an $Ω((\log n)^{1/3})$ distortion factor is necessary for any clustering. We also consider a class of nice graphs which we call uniformly bounded independence graphs. These include, for example, paths, lattice graphs, and "dense" unit disk graphs. For these graphs, we prove that clusterings of distortion $O(1)$ always exist, and moreover, we give new efficient distributed algorithms to construct them. This clustering is based on Voronoi cells centered at the vertices of a maximal independent set in a suitable power graph.
Applications include low-energy simulation of distributed algorithms in the LOCAL, CONGEST, and RADIO-CONGEST models and efficient approximate solutions to distributed combinatorial optimization problems. We also investigate related lower bounds.
Submitted 8 May, 2024;
originally announced May 2024.
-
Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning
Authors:
Mingtian Zhang,
Shawn Lan,
Peter Hayes,
David Barber
Abstract:
Retrieval Augmented Generation (RAG) has emerged as an effective solution for mitigating hallucinations in Large Language Models (LLMs). The retrieval stage in RAG typically involves a pre-trained embedding model, which converts queries and passages into vectors to capture their semantics. However, a standard pre-trained embedding model may exhibit sub-optimal performance when applied to specific domain knowledge, necessitating fine-tuning. This paper addresses scenarios where the embeddings are only available from a black-box model. We introduce Model augmented fine-tuning (Mafin) -- a novel approach for fine-tuning a black-box embedding model by augmenting it with a trainable embedding model. Our results demonstrate that Mafin significantly enhances the performance of the black-box embeddings by only requiring the training of a small augmented model. We validate the effectiveness of our method on both labeled and unlabeled datasets, illustrating its broad applicability and efficiency.
Submitted 12 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Active Preference Learning for Large Language Models
Authors:
William Muldrew,
Peter Hayes,
Mingtian Zhang,
David Barber
Abstract:
As large language models (LLMs) become more capable, fine-tuning techniques for aligning with human intent are increasingly important. A key consideration for aligning these models is how to most effectively use human resources, or model resources in the case where LLMs themselves are used as oracles. Reinforcement learning from Human or AI preferences (RLHF/RLAIF) is the most prominent example of such a technique, but is complex and often unstable. Direct Preference Optimization (DPO) has recently been proposed as a simpler and more stable alternative. In this work, we develop an active learning strategy for DPO to make better use of preference labels. We propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model and a measure of certainty of the implicit preference model optimized by DPO. We demonstrate how our approach improves both the rate of learning and final performance of fine-tuning on pairwise preference data.
Submitted 28 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Low Noise Inverse Magnetoelectric Magnetic Field Sensor
Authors:
L. Thormählen,
P. Hayes,
E. Elzenheimer,
E. Spetzler,
J. McCord,
G. Schmidt,
M. Höft,
D. Meyners,
E. Quandt
Abstract:
In the development of any type of magnetic field sensor based on magnetic films, special consideration must be given to the magnetic layer component. The presented work investigates the use of flux closing magnetostrictive multilayers for inverse magnetoelectric sensors. In such a type of magnetic field sensor, highly sensitive AC and DC field detection relies on strong excitation of the incorporated magnetic layers by piezoelectrically driven cantilever oscillation at mechanical resonances. The provoked periodic flux change is influenced by the magnetic field to be measured and is picked up by a coil, which generates the measured output. The effect of the magnetic multilayer on linearity, noise behavior, and detection limit of DC and AC signals is investigated. This study demonstrates the next step for inverse magnetoelectric thin film sensors, which achieve one order of magnitude improved detection limits with less than $8\,\mathrm{pT}/\mathrm{Hz}^{1/2}$ at 10 Hz and $18\,\mathrm{pT}/\mathrm{Hz}^{1/2}$ at DC using exchange bias stabilized magnetic multilayers for obtaining flux closure.
Submitted 8 December, 2023;
originally announced December 2023.
-
Optimal Mixing via Tensorization for Random Independent Sets on Arbitrary Trees
Authors:
Charilaos Efthymiou,
Thomas P. Hayes,
Daniel Stefankovic,
Eric Vigoda
Abstract:
We study the mixing time of the single-site update Markov chain, known as the Glauber dynamics, for generating a random independent set of a tree. Our focus is obtaining optimal convergence results for arbitrary trees. We consider the more general problem of sampling from the Gibbs distribution in the hard-core model where independent sets are weighted by a parameter $λ>0$; the special case $λ=1$ corresponds to the uniform distribution over all independent sets. Previous work of Martinelli, Sinclair and Weitz (2004) obtained optimal mixing time bounds for the complete $Δ$-regular tree for all $λ$. However, Restrepo et al. (2014) showed that for sufficiently large $λ$ there are bounded-degree trees where optimal mixing does not hold. Recent work of Eppstein and Frishberg (2022) proved a polynomial mixing time bound for the Glauber dynamics for arbitrary trees, and more generally for graphs of bounded tree-width.
We establish an optimal bound on the relaxation time (i.e., inverse spectral gap) of $O(n)$ for the Glauber dynamics for unweighted independent sets on arbitrary trees. We stress that our results hold for arbitrary trees and there is no dependence on the maximum degree $Δ$. Interestingly, our results extend (far) beyond the uniqueness threshold which is on the order $λ=O(1/Δ)$. Our proof approach is inspired by recent work on spectral independence. In fact, we prove that spectral independence holds with a constant independent of the maximum degree for any tree, but this does not imply mixing for general trees as the optimal mixing results of Chen, Liu, and Vigoda (2021) only apply for bounded degree graphs. We instead utilize the combinatorial nature of independent sets to directly prove approximate tensorization of variance via a non-trivial inductive proof.
Submitted 18 February, 2024; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Towards Healing the Blindness of Score Matching
Authors:
Mingtian Zhang,
Oscar Key,
Peter Hayes,
David Barber,
Brooks Paige,
François-Xavier Briol
Abstract:
Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of density estimation and report improved performance compared to traditional approaches.
Submitted 15 October, 2022; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Integrated Weak Learning
Authors:
Peter Hayes,
Mingtian Zhang,
Raza Habib,
Jordan Burgess,
Emine Yilmaz,
David Barber
Abstract:
We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple sources of weak supervision. We introduce a label model that can learn to aggregate weak supervision sources differently for different datapoints and takes into consideration the performance of the end-model during training. We show that our approach outperforms existing weak learning techniques across a set of 6 benchmark classification datasets. When both a small amount of labeled data and weak supervision are present the increase in performance is both consistent and large, reliably getting a 2-5 point test F1 score gain over non-integrated methods.
Submitted 19 June, 2022;
originally announced June 2022.
-
How to Wake Up Your Neighbors: Safe and Nearly Optimal Generic Energy Conservation in Radio Networks
Authors:
Varsha Dani,
Thomas P. Hayes
Abstract:
Recent work has shown that it is sometimes feasible to significantly reduce the energy usage of some radio-network algorithms by adaptively powering down the radio receiver when it is not needed. Although past work has focused on modifying specific network algorithms in this way, we now ask the question of whether this problem can be solved in a generic way, treating the algorithm as a kind of black box.
We are able to answer this question in the affirmative, presenting a new general way to modify arbitrary radio-network algorithms in an attempt to save energy. At the expense of a small increase in the time complexity, we can provably reduce the energy usage to an extent that is provably nearly optimal within a certain class of general-purpose algorithms.
As an application, we show that our algorithm reduces the energy cost of breadth-first search in radio networks from the previous best bound of $2^{O(\sqrt{\log n})}$ to $\mathrm{polylog}(n)$, where $n$ is the number of nodes in the network.
A key ingredient in our algorithm is hierarchical clustering based on additive Voronoi decomposition done at multiple scales. Similar clustering algorithms have been used in other recent work on energy-aware computation in radio networks, but we believe the specific approach presented here may be of independent interest.
Submitted 25 May, 2022;
originally announced May 2022.
-
Generalization Gap in Amortized Inference
Authors:
Mingtian Zhang,
Peter Hayes,
David Barber
Abstract:
The ability of likelihood-based probabilistic models to generalize to unseen data is central to many machine learning applications such as lossless compression. In this work, we study the generalization of a popular class of probabilistic model - the Variational Auto-Encoder (VAE). We discuss the two generalization gaps that affect VAEs and show that overfitting is usually dominated by amortized inference. Based on this observation, we propose a new training objective that improves the generalization of amortized inference. We demonstrate how our method can improve performance in the context of image modeling and lossless compression.
Submitted 15 October, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Sample Efficient Model Evaluation
Authors:
Emine Yilmaz,
Peter Hayes,
Raza Habib,
Jordan Burgess,
David Barber
Abstract:
Labelling data is a major practical bottleneck in training and testing classifiers. Given a collection of unlabelled data points, we address how to select which subset to label to best estimate test metrics such as accuracy, $F_1$ score or micro/macro $F_1$. We consider two sampling-based approaches: the well-known Importance Sampling and a novel application of Poisson Sampling. For both approaches we derive the minimal error sampling distributions and show how to approximate and use them to form estimators and confidence intervals. We show that Poisson Sampling outperforms Importance Sampling both theoretically and experimentally.
Submitted 24 September, 2021;
originally announced September 2021.
-
CeMux: Maximizing the Accuracy of Stochastic Mux Adders and an Application to Filter Design
Authors:
Timothy J. Baker,
John P. Hayes
Abstract:
Stochastic computing (SC) is a low-cost computational paradigm that has promising applications in digital filter design, image processing and neural networks. Fundamental to these applications is the weighted addition operation which is most often implemented by a multiplexer (mux) tree. Mux-based adders have very low area but typically require long bit-streams to reach practical accuracy thresholds when the number of summands is large. In this work, we first identify the main contributors to mux adder error. We then demonstrate with analysis and experiment that two new techniques, precise sampling and full correlation, can target and mitigate these error sources. Implementing these techniques in hardware leads to the design of CeMux (Correlation-enhanced Multiplexer), a stochastic mux adder that is significantly more accurate and uses much less area than traditional weighted adders. We compare CeMux to other SC and hybrid designs for an electrocardiogram filtering case study that employs a large digital filter. One major result is that CeMux is shown to be accurate even for large input sizes. CeMux's higher accuracy leads to a latency reduction of 4x to 16x over other designs. Further, CeMux uses about 35% less area than existing designs, and we demonstrate that a small amount of accuracy can be traded for a further 50% reduction in area. Finally, we compare CeMux to a conventional binary design and we show that CeMux can achieve a 50 to 73% area reduction for similar power and latency as the conventional design, but at a slightly higher level of error.
Submitted 30 August, 2021; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Reconstruction of Random Geometric Graphs: Breaking the Omega(r) distortion barrier
Authors:
Varsha Dani,
Josep Díaz,
Thomas P. Hayes,
Cristopher Moore
Abstract:
Embedding graphs in a geographical or latent space, i.e., inferring locations for vertices in Euclidean space or on a smooth manifold or submanifold, is a common task in network analysis, statistical inference, and graph visualization. We consider the classic model of random geometric graphs where $n$ points are scattered uniformly in a square of area $n$, and two points have an edge between them if and only if their Euclidean distance is less than $r$. The reconstruction problem then consists of inferring the vertex positions, up to the symmetries of the square, given only the adjacency matrix of the resulting graph. We give an algorithm that, if $r=n^α$ for any $α> 0$, with high probability reconstructs the vertex positions with a maximum error of $O(n^β)$ where $β=1/2-(4/3)α$, until $α\ge 3/8$ where $β=0$ and the error becomes $O(\sqrt{\log n})$. This improves over earlier results, which were unable to reconstruct with error less than $r$. Our method estimates Euclidean distances using a hybrid of graph distances and short-range estimates based on the number of common neighbors. We extend our results to the surface of the sphere in $\mathbb{R}^3$ and to hypercubes in any constant fixed dimension. Additionally we examine the extent to which reconstruction is still possible when the original adjacency lists have had a subset of the edges independently deleted at random.
Submitted 17 May, 2022; v1 submitted 29 July, 2021;
originally announced July 2021.
-
Estimating the Uncertainty of Neural Network Forecasts for Influenza Prevalence Using Web Search Activity
Authors:
Michael Morris,
Peter Hayes,
Ingemar J. Cox,
Vasileios Lampos
Abstract:
Influenza is an infectious disease with the potential to become a pandemic, and hence, forecasting its prevalence is an important undertaking for planning an effective response. Research has found that web search activity can be used to improve influenza models. Neural networks (NN) can provide state-of-the-art forecasting accuracy but do not commonly incorporate uncertainty in their estimates, something essential for using them effectively during decision making. In this paper, we demonstrate how Bayesian Neural Networks (BNNs) can be used to both provide a forecast and a corresponding uncertainty without significant loss in forecasting accuracy compared to traditional NNs. Our method accounts for two sources of uncertainty: data and model uncertainty, arising due to measurement noise and model specification, respectively. Experiments are conducted using 14 years of data for England, assessing the model's accuracy over the last 4 flu seasons in this dataset. We evaluate the performance of different models including competitive baselines with conventional metrics as well as error functions that incorporate uncertainty estimates. Our empirical analysis indicates that considering both sources of uncertainty simultaneously is superior to considering either one separately. We also show that a BNN with recurrent layers that models both sources of uncertainty yields superior accuracy for these metrics for forecasting horizons greater than 7 days.
Submitted 26 May, 2021;
originally announced May 2021.
-
Wake Up and Join Me! An Energy-Efficient Algorithm for Maximal Matching in Radio Networks
Authors:
Varsha Dani,
Aayush Gupta,
Thomas P. Hayes,
Seth Pettie
Abstract:
We consider networks of small, autonomous devices that communicate with each other wirelessly. Minimizing energy usage is an important consideration in designing algorithms for such networks, as battery life is a crucial and limited resource. Working in a model where both sending and listening for messages deplete energy, we consider the problem of finding a maximal matching of the nodes in a radio network of arbitrary and unknown topology.
We present a distributed randomized algorithm that produces, with high probability, a maximal matching. The maximum energy cost per node is $O(\log^2 n)$, where $n$ is the size of the network. The total latency of our algorithm is $O(n \log n)$ time steps. We observe that there exist families of network topologies for which both of these bounds are simultaneously optimal up to polylog factors, so any significant improvement will require additional assumptions about the network topology.
We also consider the related problem of assigning, for each node in the network, a neighbor to back up its data in case of node failure. Here, a key goal is to minimize the maximum load, defined as the number of nodes assigned to a single node. We present a decentralized low-energy algorithm that finds a neighbor assignment whose maximum load is at most a polylog($n$) factor larger than the optimum.
Submitted 16 April, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Are Multilevel functional models the next step in sports biomechanics and wearable technology? A case study of Knee Biomechanics patterns in typical training sessions of recreational runners
Authors:
Marcos Matabuena,
Sherveen Riazati,
Nick Caplan,
Phil Hayes
Abstract:
This paper illustrates how multilevel functional models can detect and characterize biomechanical changes along different sport training sessions. Our analysis focuses on the relevant cases to identify differences in knee biomechanics in recreational runners during low and high-intensity exercise sessions with the same energy expenditure by recording $20$ steps. To do so, we review the existing literature of multilevel models, and then, we propose a new hypothesis test to look at the changes between different levels of the multilevel model as low and high-intensity training sessions. We also evaluate the reliability of measures recorded in three-dimensional knee angles from the functional intra-class correlation coefficient (ICC) obtained from the decomposition performed with the multilevel functional model taking into account $20$ measures recorded in each test. The results show that there are no statistically significant differences between the two modes of exercise. However, we have to be careful with the conclusions since, as we have shown, human gait-patterns are very individual and heterogeneous between groups of athletes, and other alternatives to the p-value may be more appropriate to detect statistical differences in biomechanical changes in this context.
Submitted 5 April, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
The Energy Complexity of BFS in Radio Networks
Authors:
Yi-Jun Chang,
Varsha Dani,
Thomas P. Hayes,
Seth Pettie
Abstract:
We consider a model of energy complexity in Radio Networks in which transmitting or listening on the channel costs one unit of energy and computation is free. This simplified model captures key aspects of battery-powered sensors: that battery life is most influenced by transceiver usage, and that at low transmission powers, the actual costs of transmitting and listening are very similar.
The energy complexity of tasks in single-hop networks is well understood. Recent work of Chang et al. considered energy complexity in multi-hop networks and showed that $\mathsf{Broadcast}$ admits an energy-efficient protocol, by which we mean each of the $n$ nodes in the network spends $O(\text{polylog}(n))$ energy. This work left open the strange possibility that all natural problems in multi-hop networks might admit such an energy-efficient solution.
In this paper we prove that the landscape of energy complexity is rich enough to support a multitude of problem complexities. Whereas $\mathsf{Broadcast}$ can be solved by an energy-efficient protocol, exact computation of $\mathsf{Diameter}$ cannot, requiring $Ω(n)$ energy. Our main result is that $\mathsf{Breadth First Search}$ has sub-polynomial energy complexity at most $2^{O(\sqrt{\log n\log\log n})}=n^{o(1)}$; whether it admits an efficient $O(\text{polylog}(n))$-energy protocol is an open problem.
Our main algorithm involves recursively solving a generalized BFS problem on a cluster graph introduced by Miller, Peng, and Xu. In this application, we make crucial use of a close relationship between distances in this cluster graph, and distances in the original network. This relationship is new and may be of independent interest.
Submitted 19 July, 2020;
originally announced July 2020.
-
Improved Strong Spatial Mixing for Colorings on Trees
Authors:
Charilaos Efthymiou,
Andreas Galanis,
Thomas P. Hayes,
Daniel Stefankovic,
Eric Vigoda
Abstract:
Strong spatial mixing (SSM) is a form of correlation decay that has played an essential role in the design of approximate counting algorithms for spin systems. A notable example is the algorithm of Weitz (2006) for the hard-core model on weighted independent sets. We study SSM for the $q$-colorings problem on the infinite $(d+1)$-regular tree. Weak spatial mixing (WSM) captures whether the influence of the leaves on the root vanishes as the height of the tree grows. Jonasson (2002) established WSM when $q>d+1$. In contrast, in SSM, we first fix a coloring on a subset of internal vertices, and we again ask if the influence of the leaves on the root is vanishing. It was known that SSM holds on the $(d+1)$-regular tree when $q>αd$ where $α\approx 1.763...$ is a constant that has arisen in a variety of results concerning random colorings. Here we improve on this bound by showing SSM for $q>1.59d$. Our proof establishes an $L^2$ contraction for the BP operator. For the contraction we bound the norm of the BP Jacobian by exploiting combinatorial properties of the coloring of the tree.
△ Less
Submitted 16 September, 2019;
originally announced September 2019.
-
Distributed Metropolis Sampler with Optimal Parallelism
Authors:
Weiming Feng,
Thomas P. Hayes,
Yitong Yin
Abstract:
The Metropolis-Hastings algorithm is a fundamental Markov chain Monte Carlo (MCMC) method for sampling and inference. With the advent of Big Data, distributed and parallel variants of MCMC methods are attracting increased attention. In this paper, we give a distributed algorithm that can correctly simulate sequential single-site Metropolis chains without any bias in a fully asynchronous message-passing model. Furthermore, if a natural Lipschitz condition is satisfied by the Metropolis filters, our algorithm can simulate $N$-step Metropolis chains within $O(N/n+\log n)$ rounds of asynchronous communications, where $n$ is the number of variables. For sequential single-site dynamics, whose mixing requires $Ω(n\log n)$ steps, this achieves an optimal linear speedup. For several well-studied important graphical models, including proper graph coloring, hardcore model, and Ising model, our condition for linear speedup is weaker than the respective uniqueness (mixing) conditions.
The novel idea in our algorithm is to resolve updates in advance: the local Metropolis filters can often be executed correctly before the full information about neighboring spins is available. This achieves optimal parallelism without introducing any bias.
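For orientation, the sequential single-site Metropolis chain that the distributed algorithm is said to simulate can be sketched as follows for the hardcore model, one of the graphical models named above. This is only the sequential baseline, not the distributed algorithm; the function name, the adjacency-dictionary input, and the 6-cycle example are illustrative assumptions.

```python
import random

def metropolis_hardcore(adj, lam, steps, seed=0):
    """Sequential single-site Metropolis chain on independent sets.

    adj   : dict mapping each vertex to a list of its neighbors (assumed input format)
    lam   : fugacity, i.e. the weight per occupied vertex (lam > 0)
    steps : number of single-site updates to perform
    """
    rng = random.Random(seed)
    vertices = list(adj)
    occupied = {v: False for v in vertices}  # start from the empty independent set
    for _ in range(steps):
        v = rng.choice(vertices)
        if occupied[v]:
            # Proposal: remove v. The weight ratio is 1/lam.
            if rng.random() < min(1.0, 1.0 / lam):
                occupied[v] = False
        else:
            # Proposal: add v. Rejected outright if any neighbor is occupied.
            if any(occupied[u] for u in adj[v]):
                continue
            # Metropolis filter: accept with probability min(1, lam).
            if rng.random() < min(1.0, lam):
                occupied[v] = True
    return {v for v in vertices if occupied[v]}

# Illustrative run on a 6-cycle.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
sample = metropolis_hardcore(cycle, lam=1.0, steps=1000)
```

The accept/reject step for adding a vertex depends on the neighbors' current states, which is the dependency a parallel simulation has to resolve.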
Submitted 14 July, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Spread Divergence
Authors:
Mingtian Zhang,
Peter Hayes,
Tom Bird,
Raza Habib,
David Barber
Abstract:
For distributions $\mathbb{P}$ and $\mathbb{Q}$ with different supports or undefined densities, the divergence $\textrm{D}(\mathbb{P}||\mathbb{Q})$ may not exist. We define a Spread Divergence $\tilde{\textrm{D}}(\mathbb{P}||\mathbb{Q})$ on modified $\mathbb{P}$ and $\mathbb{Q}$ and describe sufficient conditions for the existence of such a divergence. We demonstrate how to maximize the discriminatory power of a given divergence by parameterizing and learning the spread. We also give examples of using a Spread Divergence to train implicit generative models, including linear models (Independent Components Analysis) and non-linear models (Deep Generative Networks).
Submitted 4 December, 2022; v1 submitted 21 November, 2018;
originally announced November 2018.
-
Distributed Symmetry Breaking in Sampling (Optimal Distributed Randomly Coloring with Fewer Colors)
Authors:
Weiming Feng,
Thomas P. Hayes,
Yitong Yin
Abstract:
We examine the problem of almost-uniform sampling proper $q$-colorings of a graph whose maximum degree is $Δ$. A famous result, discovered independently by Jerrum (1995) and Salas and Sokal (1997), is that, assuming $q > (2+δ) Δ$, the Glauber dynamics (a.k.a. single-site dynamics) for this problem has mixing time $O(n \log n)$, where $n$ is the number of vertices, and thus provides a nearly linear time sampling algorithm for this problem. A natural question is the extent to which this algorithm can be parallelized. Previous work of Feng, Sun and Yin [PODC'17] has shown that an $O(Δ\log n)$ time parallelized algorithm is possible, and that $Ω(\log n)$ time is necessary.
We give a distributed sampling algorithm, which we call the Lazy Local Metropolis Algorithm, that achieves an optimal parallelization of this classic algorithm. It improves its predecessor, the Local Metropolis algorithm of Feng, Sun and Yin [PODC'17], by introducing a step of distributed symmetry breaking that helps the mixing of the distributed sampling algorithm.
For sampling almost-uniform proper $q$-colorings of graphs $G$ on $n$ vertices, we show that the Lazy Local Metropolis algorithm achieves an optimal $O(\log n)$ mixing time if either of the following conditions is true for an arbitrary constant $δ>0$:
$\bullet$ $q\ge(2+δ)Δ$, on general graphs with maximum degree $Δ$;
$\bullet$ $q \geq (α^* + δ)Δ$, where $α^* \approx 1.763$ satisfies $α^* = \mathrm{e}^{1/α^*}$, on graphs with sufficiently large maximum degree $Δ\ge Δ_0(δ)$ and girth at least $9$.
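The constant $α^*$ in the second condition is defined only implicitly, by $α^* = \mathrm{e}^{1/α^*}$. A minimal numerical check of the quoted value, using plain bisection (the function name and the bracket $[1, 2]$ are choices made here for illustration):

```python
import math

def alpha_star(tol=1e-12):
    """Solve alpha = exp(1/alpha) by bisection on g(a) = a - exp(1/a)."""
    lo, hi = 1.0, 2.0  # g(1) = 1 - e < 0 and g(2) = 2 - sqrt(e) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # g is increasing on [1, 2], so the sign of g(mid) tells us which half holds the root.
        if mid - math.exp(1 / mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = alpha_star()
```

The root agrees with the stated approximation $1.763$ to three decimal places.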
Submitted 21 June, 2018; v1 submitted 19 February, 2018;
originally announced February 2018.
-
The Energy Complexity of Broadcast
Authors:
Yi-Jun Chang,
Varsha Dani,
Thomas P. Hayes,
Qizheng He,
Wenzheng Li,
Seth Pettie
Abstract:
Energy is often the most constrained resource in networks of battery-powered devices, and as devices become smaller, they spend a larger fraction of their energy on communication (transceiver usage) not computation. As an imperfect proxy for true energy usage, we define energy complexity to be the number of time slots a device transmits/listens; idle time and computation are free.
In this paper we investigate the energy complexity of fundamental communication primitives such as broadcast in multi-hop radio networks. We consider models with collision detection (CD) and without (No-CD), as well as both randomized and deterministic algorithms. Some take-away messages from this work include:
1. The energy complexity of broadcast in a multi-hop network is intimately connected to the time complexity of leader election in a single-hop (clique) network. Many existing lower bounds on time complexity immediately transfer to energy complexity. For example, in the CD and No-CD models, we need $Ω(\log n)$ and $Ω(\log^2 n)$ energy, respectively.
2. The energy lower bounds above can almost be achieved, given sufficient ($Ω(n)$) time. In the CD and No-CD models we can solve broadcast using $O(\frac{\log n\log\log n}{\log\log\log n})$ energy and $O(\log^3 n)$ energy, respectively.
3. The complexity measures of Energy and Time are in conflict, and it is an open problem whether both can be minimized simultaneously. We give a tradeoff showing it is possible to be nearly optimal in both measures simultaneously. For any constant $ε>0$, broadcast can be solved in $O(D^{1+ε}\log^{O(1/ε)} n)$ time with $O(\log^{O(1/ε)} n)$ energy, where $D$ is the diameter of the network.
Submitted 4 October, 2017;
originally announced October 2017.
-
Sampling Random Colorings of Sparse Random Graphs
Authors:
Charilaos Efthymiou,
Thomas P. Hayes,
Daniel Stefankovic,
Eric Vigoda
Abstract:
We study the mixing properties of the single-site Markov chain known as the Glauber dynamics for sampling $k$-colorings of a sparse random graph $G(n,d/n)$ for constant $d$. The best known rapid mixing results for general graphs are in terms of the maximum degree $Δ$ of the input graph $G$ and hold when $k>11Δ/6$ for all $G$. Improved results hold when $k>αΔ$ for graphs with girth $\geq 5$ and $Δ$ sufficiently large where $α\approx 1.7632\ldots$ is the root of $α=\exp(1/α)$; further improvements on the constant $α$ hold with stronger girth and maximum degree assumptions. For sparse random graphs the maximum degree is a function of $n$ and the goal is to obtain results in terms of the expected degree $d$. The following rapid mixing results for $G(n,d/n)$ hold with high probability over the choice of the random graph for sufficiently large constant $d$. Mossel and Sly (2009) proved rapid mixing for constant $k$, and Efthymiou (2014) improved this to $k$ linear in $d$. The condition was improved to $k>3d$ by Yin and Zhang (2016) using non-MCMC methods. Here we prove rapid mixing when $k>αd$ where $α\approx 1.7632\ldots$ is the same constant as above. Moreover we obtain $O(n^{3})$ mixing time of the Glauber dynamics, while in previous rapid mixing results the exponent was an increasing function in $d$. As in previous results for random graphs our proof analyzes an appropriately defined block dynamics to "hide" high-degree vertices. One new aspect in our improved approach is utilizing so-called local uniformity properties for the analysis of block dynamics. To analyze the "burn-in" phase we prove a concentration inequality for the number of disagreements propagating in large blocks.
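The single-site Glauber dynamics for $k$-colorings can be sketched as below. This is a simplified illustration rather than the setting analyzed in the abstract: it assumes $k$ exceeds the maximum degree, so that every vertex always has an available color and a greedy proper start exists, whereas the result above concerns $G(n,d/n)$ with $k$ tied to the expected degree. The function name and the 8-cycle example are illustrative.

```python
import random

def glauber_colorings(adj, k, steps, seed=0):
    """Glauber dynamics on proper k-colorings; assumes k > maximum degree.

    adj : dict mapping each vertex to a list of its neighbors (assumed input format)
    """
    rng = random.Random(seed)
    vertices = list(adj)
    # Greedy proper starting coloring, which always succeeds when k > max degree.
    color = {}
    for v in vertices:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(k) if c not in used)
    for _ in range(steps):
        v = rng.choice(vertices)                      # pick a uniformly random vertex
        blocked = {color[u] for u in adj[v]}          # colors currently used by its neighbors
        available = [c for c in range(k) if c not in blocked]
        color[v] = rng.choice(available)              # recolor uniformly among allowed colors
    return color

# Illustrative run: an 8-cycle with k = 4 colors.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
coloring = glauber_colorings(ring, k=4, steps=500)
```

Each update keeps the coloring proper, so the chain never leaves the set of proper colorings.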
Submitted 12 July, 2017;
originally announced July 2017.
-
Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing
Authors:
Vincent T. Lee,
Armin Alaghi,
John P. Hayes,
Visvesh Sathe,
Luis Ceze
Abstract:
Recent advances in neural networks (NNs) exhibit unprecedented success at transforming large, unstructured data streams into compact higher-level semantic information for tasks such as handwriting recognition, image classification, and speech recognition. Ideally, systems would employ near-sensor computation to execute these tasks at sensor endpoints to maximize data reduction and minimize data movement. However, near-sensor computing presents its own set of challenges such as operating power constraints, energy budgets, and communication bandwidth capacities. In this paper, we propose a stochastic-binary hybrid design which splits the computation between the stochastic and binary domains for near-sensor NN applications. In addition, our design uses a new stochastic adder and multiplier that are significantly more accurate than existing adders and multipliers. We also show that retraining the binary portion of the NN computation can compensate for precision losses introduced by shorter stochastic bit-streams, allowing faster run times at minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary design can achieve 9.8x energy efficiency savings, and application-level accuracies within 0.05% compared to conventional all-binary designs.
Submitted 7 June, 2017;
originally announced June 2017.
-
Distributed Computing with Channel Noise
Authors:
Abhinav Aggarwal,
Varsha Dani,
Thomas P. Hayes,
Jared Saia
Abstract:
A group of $n$ users want to run a distributed protocol $π$ over a network where communication occurs via private point-to-point channels. Unfortunately, an adversary, who knows $π$, is able to maliciously flip bits on the channels. Can we efficiently simulate $π$ in the presence of such an adversary? We show that this is possible, even when $L$, the number of bits sent in $π$, and $T$, the number of bits flipped by the adversary, are not known in advance. In particular, we show how to create a robust version of $π$ that 1) fails with probability at most $δ$, for any $δ>0$; and 2) sends $\tilde{O}(L + T)$ bits, where the $\tilde{O}$ notation hides a $\log (nL/ δ)$ term multiplying $L$. Additionally, we show how to improve this result when the average message size $α$ is not constant. In particular, we give an algorithm that sends $O( L (1 + (1/α) \log (n L/δ)) + T)$ bits. This algorithm is adaptive in that it does not require a priori knowledge of $α$. We note that if $α$ is $Ω\left( \log (n L/δ) \right)$, then this improved algorithm sends only $O(L+T)$ bits, and is therefore within a constant factor of optimal.
Submitted 24 July, 2017; v1 submitted 18 December, 2016;
originally announced December 2016.
-
Codes, Lower Bounds, and Phase Transitions in the Symmetric Rendezvous Problem
Authors:
Varsha Dani,
Thomas P. Hayes,
Cristopher Moore,
Alexander Russell
Abstract:
In the rendezvous problem, two parties with different labelings of the vertices of a complete graph are trying to meet at some vertex at the same time. It is well-known that if the parties have predetermined roles, then the strategy where one of them waits at one vertex, while the other visits all $n$ vertices in random order is optimal, taking at most $n$ steps and averaging about $n/2$. Anderson and Weber considered the symmetric rendezvous problem, where both parties must use the same randomized strategy. They analyzed strategies where the parties repeatedly play the optimal asymmetric strategy, determining their role independently each time by a biased coin-flip. By tuning the bias, Anderson and Weber achieved an expected meeting time of about $0.829 n$, which they conjectured to be asymptotically optimal.
We change perspective slightly: instead of minimizing the expected meeting time, we seek to maximize the probability of meeting within a specified time $T$. The Anderson-Weber strategy, which fails with constant probability when $T= Θ(n)$, is not asymptotically optimal for large $T$ in this setting. Specifically, we exhibit a symmetric strategy that succeeds with probability $1-o(1)$ in $T=4n$ steps. This is tight: for any $α< 4$, any symmetric strategy with $T = αn$ fails with constant probability. Our strategy uses a new combinatorial object that we dub a "rendezvous code," which may be of independent interest.
When $T \le n$, we show that the probability of meeting within $T$ steps is indeed asymptotically maximized by the Anderson-Weber strategy. Our results imply new lower bounds, showing that the best symmetric strategy takes at least $0.638 n$ steps in expectation. We also present some partial results for the symmetric rendezvous problem on other vertex-transitive graphs.
Submitted 6 September, 2016;
originally announced September 2016.
-
Evaluation System for a Bayesian Optimization Service
Authors:
Ian Dewancker,
Michael McCourt,
Scott Clark,
Patrick Hayes,
Alexandra Johnson,
George Ke
Abstract:
Bayesian optimization is an elegant solution to the hyperparameter optimization problem in machine learning. Building a reliable and robust Bayesian optimization service requires careful testing methodology and sound statistical analysis. In this talk we will outline our development of an evaluation framework to rigorously test and measure the impact of changes to the SigOpt optimization service. We present an overview of our evaluation system and discuss how this framework empowers our research engineers to confidently and quickly make changes to our core optimization engine.
Submitted 19 May, 2016;
originally announced May 2016.
-
Convergence of MCMC and Loopy BP in the Tree Uniqueness Region for the Hard-Core Model
Authors:
Charilaos Efthymiou,
Thomas P. Hayes,
Daniel Stefankovic,
Eric Vigoda,
Yitong Yin
Abstract:
We study the hard-core model defined on independent sets of an input graph where the independent sets are weighted by a parameter $λ>0$. For constant $Δ$, previous work of Weitz (2006) established an FPTAS for the partition function for graphs of maximum degree $Δ$ when $λ< λ_c(Δ)$. The threshold $λ_c(Δ)$ is the critical point for the phase transition for uniqueness/non-uniqueness on the infinite $Δ$-regular trees. Sly (2010) showed that there is no FPRAS, unless NP=RP, when $λ>λ_c(Δ)$. The running time of Weitz's algorithm is exponential in $\log(Δ)$. Here we present an FPRAS for the partition function whose running time is $O^*(n^2)$. We analyze the simple single-site Glauber dynamics for sampling from the associated Gibbs distribution. We prove there exists a constant $Δ_0$ such that for all graphs with maximum degree $Δ\geqΔ_0$ and girth $\geq 7$, the mixing time of the Glauber dynamics is $O(n\log(n))$ when $λ<λ_c(Δ)$. Our work complements that of Weitz which applies for constant $Δ$ whereas our work applies for all $Δ\geq Δ_0$.
We utilize loopy BP (belief propagation), a widely-used inference algorithm. A novel aspect of our work is using the principal eigenvector for the BP operator to design a distance function which contracts in expectation for pairs of states that behave like the BP fixed point. We also prove that the Glauber dynamics behaves locally like loopy BP. As a byproduct we obtain that the Glauber dynamics converges, after a short burn-in period, close to the BP fixed point, and this implies that the fixed point of loopy BP is a close approximation to the Gibbs distribution. Using these connections we establish that loopy BP quickly converges to the Gibbs distribution when the girth $\geq 6$ and $λ<λ_c(Δ)$.
Submitted 29 August, 2016; v1 submitted 5 April, 2016;
originally announced April 2016.
-
A Stratified Analysis of Bayesian Optimization Methods
Authors:
Ian Dewancker,
Michael McCourt,
Scott Clark,
Patrick Hayes,
Alexandra Johnson,
George Ke
Abstract:
Empirical analysis serves as an important complement to theoretical analysis for studying practical Bayesian optimization. Often empirical insights expose strengths and weaknesses inaccessible to theoretical analysis. We define two metrics for comparing the performance of Bayesian optimization methods and propose a ranking mechanism for summarizing performance within various genres or strata of test functions. These test functions serve to mimic the complexity of hyperparameter optimization problems, the most prominent application of Bayesian optimization, but with a closed form which allows for rapid evaluation and more predictable behavior. This offers a flexible and efficient way to investigate functions with specific properties of interest, such as oscillatory behavior or an optimum on the domain boundary.
Submitted 30 March, 2016;
originally announced March 2016.
-
Interactive Communication with Unknown Noise Rate
Authors:
Varsha Dani,
Thomas P. Hayes,
Mahnush Movahedi,
Jared Saia,
Maxwell Young
Abstract:
Alice and Bob want to run a protocol over a noisy channel, where a certain number of bits are flipped adversarially. Several results take a protocol requiring $L$ bits of noise-free communication and make it robust over such a channel. In a recent breakthrough result, Haeupler described an algorithm that sends a number of bits that is conjectured to be near optimal in such a model. However, his algorithm critically requires $a \ priori$ knowledge of the number of bits that will be flipped by the adversary.
We describe an algorithm requiring no such knowledge. If an adversary flips $T$ bits, our algorithm sends $L + O\left(\sqrt{L(T+1)\log L} + T\right)$ bits in expectation and succeeds with high probability in $L$. It does so without any $a \ priori$ knowledge of $T$. Assuming a conjectured lower bound by Haeupler, our result is optimal up to logarithmic factors.
Our algorithm critically relies on the assumption of a private channel. We show that privacy is necessary when the amount of noise is unknown.
Submitted 13 August, 2015; v1 submitted 23 April, 2015;
originally announced April 2015.
-
Spatial Mixing for Independent Sets in Poisson Random Trees
Authors:
Varsha Dani,
Thomas P. Hayes,
Cristopher Moore
Abstract:
We consider correlation decay in the hard-core model with fugacity $λ$ on a rooted tree $T$ in which the arity of each vertex is independently Poisson distributed with mean $d$. Specifically, we investigate the question of which parameter settings $(d, λ)$ result in strong spatial mixing, weak spatial mixing, or neither. (In our context, weak spatial mixing is equivalent to Gibbs uniqueness.) For finite fugacity, a zero-one law implies that these spatial mixing properties hold either almost surely or almost never, once we have conditioned on whether $T$ is finite or infinite.
We provide a partial answer to this question, which implies in particular that
1. As $d \to \infty$, weak spatial mixing on the Poisson tree occurs whenever $λ< f(d) - o(1)$ but not when $λ$ is slightly above $f(d)$, where $f(d)$ is the threshold for WSM (and SSM) on the $d$-regular tree. This suggests that, in most cases, Poisson trees have similar spatial mixing behavior to regular trees.
2. When $1 < d \le 1.179$, there is weak spatial mixing on the Poisson($d$) tree for all values of $λ$. However, strong spatial mixing does not hold for sufficiently large $λ$. This is in contrast to regular trees, for which strong spatial mixing and weak spatial mixing always coincide.
For infinite fugacity SSM holds only when the tree is finite, and hence almost surely fails on the Poisson($d$) tree when $d>1$. We show that WSM almost surely holds on the Poisson($d$) tree for $d < \mathbf{e}^{1/\sqrt{2}}/\sqrt{2} =1.434...$, but that it fails with positive probability if $d>\mathbf{e}$.
Submitted 21 February, 2015;
originally announced February 2015.
-
Lower Bounds on the Critical Density in the Hard Disk Model via Optimized Metrics
Authors:
Thomas P. Hayes,
Cristopher Moore
Abstract:
We prove a new lower bound on the critical density $ρ_c$ of the hard disk model, i.e., the density below which it is possible to efficiently sample random configurations of $n$ non-overlapping disks in a unit torus. We use a classic Markov chain which moves one disk at a time, but with an improved path coupling analysis. Our main tool is an optimized metric on neighboring pairs of configurations, i.e., configurations that differ in the position of a single disk: we define a metric that depends on the difference in these positions, and which approaches zero continuously as they coincide. This improves the previous lower bound $ρ_c \ge 1/8$ to $ρ_c \ge 0.154$.
Submitted 7 July, 2014;
originally announced July 2014.
-
Block Coordinate Descent for Sparse NMF
Authors:
Vamsi K. Potluru,
Sergey M. Plis,
Jonathan Le Roux,
Barak A. Pearlmutter,
Vince D. Calhoun,
Thomas P. Hayes
Abstract:
Nonnegative matrix factorization (NMF) has become a ubiquitous tool for data analysis. An important variant is the sparse NMF problem which arises when we explicitly require the learnt features to be sparse. A natural measure of sparsity is the L$_0$ norm, however its optimization is NP-hard. Mixed norms, such as L$_1$/L$_2$ measure, have been shown to model sparsity robustly, based on intuitive attributes that such measures need to satisfy. This is in contrast to computationally cheaper alternatives such as the plain L$_1$ norm. However, present algorithms designed for optimizing the mixed norm L$_1$/L$_2$ are slow and other formulations for sparse NMF have been proposed such as those based on L$_1$ and L$_0$ norms. Our proposed algorithm allows us to solve the mixed norm sparsity constraints while not sacrificing computation time. We present experimental evidence on real-world datasets that shows our new algorithm performs an order of magnitude faster compared to the current state-of-the-art solvers optimizing the mixed norm and is suitable for large-scale datasets.
Submitted 18 March, 2013; v1 submitted 15 January, 2013;
originally announced January 2013.
-
How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman
Authors:
Thomas P. Hayes
Abstract:
Consider a gambling game in which we are allowed to repeatedly bet a portion of our bankroll at favorable odds. We investigate the question of how to minimize the expected number of rounds needed to increase our bankroll to a given target amount.
Specifically, we disprove a 50-year old conjecture of L. Breiman, that there exists a threshold strategy that optimizes the expected number of rounds; that is, a strategy that always bets to try to win in one round whenever the bankroll is at least a certain threshold, and that makes Kelly bets (a simple proportional betting scheme) whenever the bankroll is below the threshold.
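To make the shape of a threshold strategy concrete, here is a small sketch under assumptions the abstract does not spell out: each round is won with probability `p` and pays net odds `b` (a stake `s` returns `s * b` profit on a win and is lost otherwise), and the Kelly fraction is the standard $f^* = p - (1-p)/b$. The names `kelly_fraction`, `threshold_bet`, `target`, and `threshold` are hypothetical.

```python
def kelly_fraction(p, b):
    """Standard Kelly fraction for win probability p and net odds b (assumed game model)."""
    return p - (1.0 - p) / b

def threshold_bet(bankroll, target, threshold, p, b):
    """Stake chosen by a threshold strategy of the kind described above.

    At or above the threshold: stake just enough to reach the target in one win
    (capped at the whole bankroll). Below it: stake the Kelly fraction of the bankroll.
    """
    if bankroll >= threshold:
        return min(bankroll, (target - bankroll) / b)
    return kelly_fraction(p, b) * bankroll

# Illustrative numbers: even-money bets won with probability 0.6.
low_stake = threshold_bet(50.0, 100.0, 60.0, p=0.6, b=1.0)   # Kelly regime
high_stake = threshold_bet(80.0, 100.0, 60.0, p=0.6, b=1.0)  # one-shot regime
```

The conjecture concerns whether some choice of `threshold` makes such a rule optimal for the expected number of rounds; the sketch only shows what the rule looks like.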
Submitted 4 December, 2011;
originally announced December 2011.
-
The Adams-Bashforth-Moulton Integration Methods Generalized to an Adaptive Grid
Authors:
A. P. Hayes
Abstract:
We present a generalization of the Adams-Bashforth-Moulton predictor-corrector numerical integration methods to an adaptive grid. The step size may be chosen dynamically in order to maintain a desired relative magnitude of error in each step. We demonstrate that the methods remain convergent to the expected degree, and apply various methods to the famous problem of determining the maximum possible mass of a neutron star supported by pure fermionic exclusion pressure. We reproduce the Tolman-Oppenheimer-Volkoff result of 0.71 solar masses using only 23 integration steps, and reproducing both mass and radius within 1% requires 27. We also present various optimizations and features of our implementation.
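For readers unfamiliar with the underlying scheme, a classical fixed-step predictor-corrector of this family can be sketched as follows: a two-step Adams-Bashforth predictor paired with the trapezoidal (one-step Adams-Moulton) corrector. This is the textbook uniform-grid version, not the adaptive-grid generalization described above; the function name, the Heun bootstrap step, and the test equation $y' = -y$ are choices made here for illustration.

```python
import math

def abm2(f, t0, y0, h, n_steps):
    """Fixed-step AB2 predictor with a trapezoidal Adams-Moulton corrector."""
    t, y = t0, y0
    f_prev = f(t, y)
    # Bootstrap with one Heun step, since the two-step predictor needs a slope history.
    y_euler = y + h * f_prev
    y = y + 0.5 * h * (f_prev + f(t + h, y_euler))
    t += h
    for _ in range(n_steps - 1):
        f_curr = f(t, y)
        # Predictor (AB2): extrapolate from the two most recent slopes.
        y_pred = y + h * (1.5 * f_curr - 0.5 * f_prev)
        # Corrector (trapezoid): average the current slope and the slope at the prediction.
        y = y + 0.5 * h * (f_curr + f(t + h, y_pred))
        f_prev = f_curr
        t += h
    return t, y

# Illustrative run: integrate y' = -y from t = 0 to t = 1 with 100 steps.
t_end, y_end = abm2(lambda t, y: -y, 0.0, 1.0, 0.01, 100)
```

An adaptive variant would additionally adjust `h` from step to step, for example using the gap between predictor and corrector as an error estimate.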
Submitted 15 April, 2011;
originally announced April 2011.
-
A Mechanism for the Present-Day Creation of a New Class of Black Holes
Authors:
Andrew P. Hayes,
Neil F. Comins
Abstract:
In this first paper of a series on the formation and abundance of substellar mass dwarf black holes (DBHs), we present a heuristic for deducing the stability of non-rotating matter embedded in a medium against collapse and the formation of a black hole. We demonstrate the heuristic's accuracy for a family of spherical mass distributions whose stability is known through other means. We also present the applications of this heuristic that could be applied to data sets of various types of simulations, including the possible formation of DBHs in the expanding gases of extreme astrophysical phenomena including Type Ia and Type II supernovae, hypernovae, and in the collision of two compact objects. These papers will also explore the observational and cosmological implications of DBHs, including estimates of the total masses of these objects bound in galaxies and ejected into the intergalactic medium. Applying our formalism to a Type II supernova simulation, we have found regions in one data set that are within a factor of three to four of both the density and mass necessary to create a DBH.
Submitted 13 April, 2011;
originally announced April 2011.
-
Randomly coloring planar graphs with fewer colors than the maximum degree
Authors:
Thomas P. Hayes,
Juan C. Vera,
Eric Vigoda
Abstract:
We study Markov chains for randomly sampling $k$-colorings of a graph with maximum degree $Δ$. Our main result is a polynomial upper bound on the mixing time of the single-site update chain known as the Glauber dynamics for planar graphs when $k=Ω(Δ/\logΔ)$. Our results can be partially extended to the more general case where the maximum eigenvalue of the adjacency matrix of the graph is at most $Δ^{1-ε}$, for fixed $ε > 0$.
The main challenge when $k \le Δ+ 1$ is the possibility of "frozen" vertices, that is, vertices for which only one color is possible, conditioned on the colors of its neighbors. Indeed, when $Δ= O(1)$, even a typical coloring can have a constant fraction of the vertices frozen. Our proofs rely on recent advances in techniques for bounding mixing time using "local uniformity" properties.
Submitted 31 August, 2011; v1 submitted 11 June, 2007;
originally announced June 2007.
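The single-site update chain and the notion of a "frozen" vertex described in the abstract above can be illustrated with a minimal Python sketch. It assumes the heat-bath form of the Glauber dynamics (pick a uniformly random vertex, then recolor it uniformly among the colors not used by its neighbors); the function names and the adjacency-list representation are illustrative choices, not taken from the paper.

```python
import random


def available_colors(adj, coloring, v, k):
    """Colors in range(k) that no neighbor of v currently uses."""
    used = {coloring[u] for u in adj[v]}
    return [c for c in range(k) if c not in used]


def glauber_step(adj, coloring, k, rng):
    """One heat-bath update: recolor a random vertex with a random available color.

    Assumes `coloring` is proper on entry, so the vertex's current color is
    always available and the list of choices is never empty.
    """
    v = rng.randrange(len(adj))
    coloring[v] = rng.choice(available_colors(adj, coloring, v, k))


def frozen_vertices(adj, coloring, k):
    """Vertices with exactly one available color (their current one)."""
    return [v for v in range(len(adj))
            if len(available_colors(adj, coloring, v, k)) == 1]


def is_proper(adj, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v]
               for v in range(len(adj)) for u in adj[v])
```

For instance, on a triangle with $k = 3$ every vertex of a proper coloring is frozen, so the chain cannot move at all; this is the kind of obstruction the abstract refers to when $k \le Δ + 1$.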
-
Checking Equivalence of Quantum Circuits and States
Authors:
George F. Viamontes,
Igor L. Markov,
John P. Hayes
Abstract:
Quantum computing promises exponential speed-ups for important simulation and optimization problems. It also poses new CAD problems that are similar to, but more challenging than, the related problems in classical (non-quantum) CAD, such as determining if two states or circuits are functionally equivalent. While differences in classical states are easy to detect, quantum states, which are represented by complex-valued vectors, exhibit subtle differences leading to several notions of equivalence. This provides flexibility in optimizing quantum circuits, but leads to difficult new equivalence-checking issues for simulation and synthesis. We identify several different equivalence-checking problems and present algorithms for practical benchmarks, including quantum communication and search circuits, which are shown to be very fast and robust for hundreds of qubits.
Submitted 1 May, 2007; v1 submitted 1 May, 2007;
originally announced May 2007.
-
Coupling with the stationary distribution and improved sampling for colorings and independent sets
Authors:
Thomas P. Hayes,
Eric Vigoda
Abstract:
We present an improved coupling technique for analyzing the mixing time of Markov chains. Using our technique, we simplify and extend previous results for sampling colorings and independent sets. Our approach uses properties of the stationary distribution to avoid worst-case configurations which arise in the traditional approach. As an application, we show that for $k/Δ>1.764$, the Glauber dynamics on $k$-colorings of a graph on $n$ vertices with maximum degree $Δ$ converges in $O(n\log n)$ steps, assuming $Δ=Ω(\log n)$ and that the graph is triangle-free. Previously, girth $\ge 5$ was needed. As a second application, we give a polynomial-time algorithm for sampling weighted independent sets from the Gibbs distribution of the hard-core lattice gas model at fugacity $λ<(1-ε)e/Δ$, on a regular graph $G$ on $n$ vertices of degree $Δ=Ω(\log n)$ and girth $\ge 6$. The best known algorithm for general graphs currently assumes $λ<2/(Δ-2)$.
Submitted 5 October, 2006;
originally announced October 2006.
-
How to Beat the Adaptive Multi-Armed Bandit
Authors:
Varsha Dani,
Thomas P. Hayes
Abstract:
The multi-armed bandit is a concise model for the problem of iterated decision-making under uncertainty. In each round, a gambler must pull one of $K$ arms of a slot machine, without any foreknowledge of their payouts, except that they are uniformly bounded. A standard objective is to minimize the gambler's regret, defined as the largest payout which would have been achieved by any fixed arm, in hindsight, minus the gambler's total payout. Note that the gambler is only told the payout for the arm actually chosen, not for the unchosen arms.
Almost all previous work on this problem assumed the payouts to be non-adaptive, in the sense that the distribution of the payout of arm $j$ in round $i$ is completely independent of the choices made by the gambler on rounds $1, \dots, i-1$. In the more general model of adaptive payouts, the payouts in round $i$ may depend arbitrarily on the history of past choices made by the algorithm.
We present a new algorithm for this problem, and prove nearly optimal guarantees for the regret against both non-adaptive and adaptive adversaries. After $T$ rounds, our algorithm has regret $O(\sqrt{T})$ with high probability (the tail probability decays exponentially). This dependence on $T$ is best possible, and matches that of the full-information version of the problem, in which the gambler is told the payouts for all $K$ arms after each round.
Previously, even for non-adaptive payouts, the best high-probability bounds known were $O(T^{2/3})$, due to Auer, Cesa-Bianchi, Freund and Schapire. The expected regret of their algorithm is $O(T^{1/2})$ for non-adaptive payouts, but as we show, $Ω(T^{2/3})$ for adaptive payouts.
Submitted 14 February, 2006;
originally announced February 2006.
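As a small illustration of the regret objective in the abstract above, the following Python sketch computes the regret of a sequence of arm choices against the best fixed arm in hindsight. It uses the usual sign convention (best fixed arm's total minus the gambler's total) and assumes a fixed, non-adaptive payout table; the function name and the table layout are hypothetical, and the sketch says nothing about the paper's algorithm.

```python
def regret(payouts, choices):
    """Regret of `choices` against the best fixed arm in hindsight.

    payouts[i][j] is the payout of arm j in round i (a fixed table, i.e. the
    non-adaptive setting); choices[i] is the arm the gambler pulled in round i.
    """
    num_arms = len(payouts[0])
    gambler_total = sum(payouts[i][arm] for i, arm in enumerate(choices))
    best_fixed_total = max(sum(row[j] for row in payouts)
                           for j in range(num_arms))
    return best_fixed_total - gambler_total
```

Note that the gambler could not evaluate this quantity during play, since only the payout of the chosen arm is revealed in each round.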
-
A general lower bound for mixing of single-site dynamics on graphs
Authors:
Thomas P. Hayes,
Alistair Sinclair
Abstract:
We prove that any Markov chain that performs local, reversible updates on randomly chosen vertices of a bounded-degree graph necessarily has mixing time at least $Ω(n\log n)$, where $n$ is the number of vertices. Our bound applies to the so-called ``Glauber dynamics'' that has been used extensively in algorithms for the Ising model, independent sets, graph colorings and other structures in computer science and statistical physics, and demonstrates that many of these algorithms are optimal up to constant factors within their class. Previously, no superlinear lower bound was known for this class of algorithms. Though widely conjectured, such a bound had been proved previously only in very restricted circumstances, such as for the empty graph and the path. We also show that the assumption of bounded degree is necessary by giving a family of dynamics on graphs of unbounded degree with mixing time $O(n)$.
Submitted 1 August, 2007; v1 submitted 25 July, 2005;
originally announced July 2005.
-
Is Quantum Search Practical?
Authors:
George F. Viamontes,
Igor L. Markov,
John P. Hayes
Abstract:
Quantum algorithms and circuits can, in principle, outperform the best non-quantum (classical) techniques for some hard computational problems. However, this does not necessarily lead to useful applications. To gauge the practical significance of a quantum algorithm, one must weigh it against the best conventional techniques applied to useful instances of the same problem. Grover's quantum search algorithm is one of the most widely studied.
We identify requirements for Grover's algorithm to be useful in practice: (1) a search application S where classical methods do not provide sufficient scalability; (2) an instantiation of Grover's algorithm Q(S) for S that has a smaller asymptotic worst-case runtime than any classical algorithm C(S) for S; (3) Q(S) with smaller actual runtime for practical instances of S than that of any C(S).
We show that several commonly-suggested applications fail to satisfy these requirements, and outline directions for future work on quantum search.
Submitted 30 April, 2004;
originally announced May 2004.
-
Fault Testing for Reversible Circuits
Authors:
Ketan N. Patel,
John P. Hayes,
Igor L. Markov
Abstract:
Applications of reversible circuits can be found in the fields of low-power computation, cryptography, communications, digital signal processing, and the emerging field of quantum computation. Furthermore, prototype circuits for low-power applications are already being fabricated in CMOS. Regardless of the eventual technology adopted, testing is sure to be an important component in any robust implementation.
We consider the test set generation problem. Reversibility affects the testing problem in fundamental ways, making it significantly simpler than for the irreversible case. For example, we show that any test set that detects all single stuck-at faults in a reversible circuit also detects all multiple stuck-at faults. We present efficient test set constructions for the standard stuck-at fault model as well as the usually intractable cell-fault model. We also give a practical test set generation algorithm, based on an integer linear programming formulation, that yields test sets approximately half the size of those produced by conventional ATPG.
Submitted 31 March, 2004;
originally announced April 2004.
-
Graph-based simulation of quantum computation in the density matrix representation
Authors:
George F. Viamontes,
Igor L. Markov,
John P. Hayes
Abstract:
Quantum-mechanical phenomena are playing an increasing role in information processing, as transistor sizes approach the nanometer level, and quantum circuits and data encoding methods appear in the securest forms of communication. Simulating such phenomena efficiently is exceedingly difficult because of the vast size of the quantum state space involved. A major complication is caused by errors (noise) due to unwanted interactions between the quantum states and the environment. Consequently, simulating quantum circuits and their associated errors using the density matrix representation is potentially significant in many applications, but is well beyond the computational abilities of most classical simulation techniques in both time and memory resources. The size of a density matrix grows exponentially with the number of qubits simulated, rendering array-based simulation techniques that explicitly store the density matrix intractable. In this work, we propose a new technique aimed at efficiently simulating quantum circuits that are subject to errors. In particular, we describe new graph-based algorithms implemented in the simulator QuIDDPro/D. While previously reported graph-based simulators operate in terms of the state-vector representation, these new algorithms use the density matrix representation. To gauge the improvements offered by QuIDDPro/D, we compare its simulation performance with an optimized array-based simulator called QCSim. Empirical results, generated by both simulators on a set of quantum circuit benchmarks involving error correction, reversible logic, communication, and quantum search, show that the graph-based approach far outperforms the array-based approach.
Submitted 16 March, 2004; v1 submitted 16 March, 2004;
originally announced March 2004.
-
Improving Gate-Level Simulation of Quantum Circuits
Authors:
George F. Viamontes,
Igor L. Markov,
John P. Hayes
Abstract:
Simulating quantum computation on a classical computer is a difficult problem. The matrices representing quantum gates and the vectors modeling qubit states grow exponentially with an increase in the number of qubits. However, by using a novel data structure called the Quantum Information Decision Diagram (QuIDD) that exploits the structure of quantum operators, a useful subset of operator matrices and state vectors can be represented in a form that grows polynomially with the number of qubits. This subset contains, but is not limited to, any equal superposition of n qubits, any computational basis state, n-qubit Pauli matrices, and n-qubit Hadamard matrices. It does not, however, contain the discrete Fourier transform (employed in Shor's algorithm) and some oracles used in Grover's algorithm. We first introduce and motivate decision diagrams and QuIDDs. We then analyze the runtime and memory complexity of QuIDD operations. Finally, we empirically validate QuIDD-based simulation by means of a general-purpose quantum computing simulator QuIDDPro implemented in C++. We simulate various instances of Grover's algorithm with QuIDDPro, and the results demonstrate that QuIDDs asymptotically outperform all other known simulation techniques. Our simulations also show that well-known worst-case instances of classical searching can be circumvented in many specific cases by data compression techniques.
Submitted 29 November, 2003; v1 submitted 6 September, 2003;
originally announced September 2003.
-
Efficient Synthesis of Linear Reversible Circuits
Authors:
K. N. Patel,
I. L. Markov,
J. P. Hayes
Abstract:
In this paper we consider circuit synthesis for n-wire linear reversible circuits using the C-NOT gate library. These circuits are an important class of reversible circuits with applications to quantum computation. Previous algorithms, based on Gaussian elimination and LU-decomposition, yield circuits with O(n^2) gates in the worst case. However, an information-theoretic bound suggests that it may be possible to reduce this to as few as O(n^2/log n) gates.
We present an algorithm that is optimal up to a multiplicative constant, as well as Theta(log n) times faster than previous methods. While our results are primarily asymptotic, simulation results show that even for relatively small n our algorithm is faster and yields more efficient circuits than the standard method. Generically our algorithm can be interpreted as a matrix decomposition algorithm, yielding an asymptotically efficient decomposition of a binary matrix into a product of elementary matrices.
Submitted 2 February, 2003;
originally announced February 2003.
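The Gaussian-elimination baseline mentioned in the abstract above can be sketched in Python as follows. This is only the standard O(n^2)-gate method, not the paper's improved algorithm. It assumes the circuit is given as an invertible n-by-n binary matrix A acting on the wire values as x -> Ax over GF(2); the function names and the (control, target) gate encoding are illustrative.

```python
def synthesize_cnot(matrix):
    """Return a list of (control, target) C-NOT gates computing x -> A x over GF(2).

    Reduces A to the identity by row additions. Each row addition
    "row t += row c" is itself a C-NOT with control c and target t, and since
    every such gate is its own inverse, the circuit for A is the recorded
    row operations applied in reverse order.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    ops = []
    for col in range(n):
        if a[col][col] == 0:
            # Borrow a 1 from a lower row so the diagonal entry becomes 1.
            pivot = next((r for r in range(col + 1, n) if a[r][col]), None)
            if pivot is None:
                raise ValueError("matrix is not invertible over GF(2)")
            a[col] = [x ^ y for x, y in zip(a[col], a[pivot])]
            ops.append((pivot, col))
        for r in range(n):
            if r != col and a[r][col]:
                a[r] = [x ^ y for x, y in zip(a[r], a[col])]
                ops.append((col, r))
    return ops[::-1]


def apply_circuit(gates, bits):
    """Apply the gates in order; each gate XORs the control wire into the target."""
    x = list(bits)
    for control, target in gates:
        x[target] ^= x[control]
    return x
```

Each column contributes at most n row additions, so the sketch emits at most n^2 gates, which matches the worst-case bound quoted for the elimination-based methods.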
-
Gate-Level Simulation of Quantum Circuits
Authors:
George F. Viamontes,
Manoj Rajagopalan,
Igor L. Markov,
John P. Hayes
Abstract:
While thousands of experimental physicists and chemists are currently trying to build scalable quantum computers, it appears that simulation of quantum computation will be at least as critical as circuit simulation in classical VLSI design. However, since the work of Richard Feynman in the early 1980s, little progress has been made in practical quantum simulation. Most researchers focused on polynomial-time simulation of restricted types of quantum circuits that fall short of the full power of quantum computation. Simulating quantum computing devices and useful quantum algorithms on classical hardware now requires excessive computational resources, making many important simulation tasks infeasible. In this work we propose a new technique for gate-level simulation of quantum circuits which greatly reduces the difficulty and cost of such simulations. The proposed technique is implemented in a simulation tool called the Quantum Information Decision Diagram (QuIDD) and evaluated by simulating Grover's quantum search algorithm. The back-end of our package, QuIDD Pro, is based on Binary Decision Diagrams, well-known for their ability to efficiently represent many seemingly intractable combinatorial structures. This reliance on a well-established area of research allows us to take advantage of existing software for BDD manipulation and achieve unparalleled empirical results for quantum simulation.
Submitted 1 August, 2002;
originally announced August 2002.
-
Reversible Logic Circuit Synthesis
Authors:
Vivek V. Shende,
Aditya K. Prasad,
Igor L. Markov,
John P. Hayes
Abstract:
Reversible or information-lossless circuits have applications in digital signal processing, communication, computer graphics and cryptography. They are also a fundamental requirement in the emerging field of quantum computation. We investigate the synthesis of reversible circuits that employ a minimum number of gates and contain no redundant input-output line-pairs (temporary storage channels). We prove constructively that every even permutation can be implemented without temporary storage using NOT, CNOT and TOFFOLI gates. We describe an algorithm for the synthesis of optimal circuits and study the reversible functions on three wires, reporting distributions of circuit sizes. We study circuit decompositions of reversible circuits where gates of the same type are next to each other. Finally, in an application important to quantum computing, we synthesize oracle circuits for Grover's search algorithm, and show a significant improvement over a previously proposed synthesis algorithm.
Submitted 27 February, 2003; v1 submitted 28 June, 2002;
originally announced July 2002.