-
Realistic binary neutron star initial data with Elliptica
Authors:
Alireza Rashti,
Andrew Noe
Abstract:
This work introduces the Elliptica pseudo-spectral code for generating initial data of binary neutron star systems. Building upon the recent Elliptica code update, we can now construct initial data using not only piecewise polytropic equations of state, but also tabulated equations of state for these binary systems. Furthermore, the code allows us to endow neutron stars within the binary system with spins. These spins can have a magnitude close to the mass shedding limit and can point in any direction.
Submitted 1 July, 2024;
originally announced July 2024.
-
An Information Theoretic Metric for Evaluating Unlearning Models
Authors:
Dongjae Jeon,
Wonje Jeung,
Taeheon Kim,
Albert No,
Jonghyun Choi
Abstract:
Machine unlearning (MU) addresses privacy concerns by removing information about "forgetting data" samples from trained models. Typically, evaluating MU methods involves comparing unlearned models to those retrained from scratch without the forgetting data, using metrics such as membership inference attacks (MIA) and accuracy measurements. These evaluations implicitly assume that if the output logits of the unlearned and retrained models are similar, the unlearned model has successfully forgotten the data. Here, we question whether this assumption is valid. In particular, we conduct a simple experiment of training only the last layer of a given original model using a novel masked-distillation technique while keeping the rest fixed. Surprisingly, simply altering the last layer yields favorable outcomes in the existing evaluation metrics, while the model does not successfully unlearn the samples or classes. To better evaluate MU methods, we propose a metric that quantifies the residual information about forgetting data samples in intermediate features using mutual information, called the information difference index, or IDI for short. The IDI provides a comprehensive evaluation of MU methods by efficiently analyzing the internal structure of DNNs. Our metric is scalable to large datasets and adaptable to various model architectures. Additionally, we present COLapse-and-Align (COLA), a simple contrastive-based method that effectively unlearns intermediate features.
Submitted 28 May, 2024;
originally announced May 2024.
-
Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model
Authors:
Joo Young Choi,
Jaesung R. Park,
Inkyu Park,
Jaewoong Cho,
Albert No,
Ernest K. Ryu
Abstract:
Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.
Submitted 6 May, 2024;
originally announced May 2024.
-
Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy
Authors:
Wei-Ning Chen,
Berivan Isik,
Peter Kairouz,
Albert No,
Sewoong Oh,
Zheng Xu
Abstract:
We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin's representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers.
In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that fast converge to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x improvement of compression for DP-SGD across various FL tasks.
Submitted 1 May, 2024;
originally announced May 2024.
-
Fully Quantized Always-on Face Detector Considering Mobile Image Sensors
Authors:
Haechang Lee,
Wongi Jeong,
Dongil Ryu,
Hyunwoo Je,
Albert No,
Kijeong Kim,
Se Young Chun
Abstract:
Despite significant research on lightweight deep neural networks (DNNs) designed for edge devices, the current face detectors do not fully meet the requirements for "intelligent" CMOS image sensors (iCISs) integrated with embedded DNNs. These sensors are essential in various practical applications, such as energy-efficient mobile phones and surveillance systems with always-on capabilities. One noteworthy limitation is the absence of suitable face detectors for the always-on scenario, a crucial aspect of image sensor-level applications. These detectors must operate directly with sensor RAW data before the image signal processor (ISP) takes over. This gap poses a significant challenge in achieving optimal performance in such scenarios. Further research and development are necessary to bridge this gap and fully leverage the potential of iCIS applications. In this study, we aim to bridge the gap by exploring extremely low-bit lightweight face detectors, focusing on the always-on face detection scenario for mobile image sensor applications. To achieve this, our proposed model utilizes sensor-aware synthetic RAW inputs, simulating always-on face detection processed "before" the ISP chain. Our approach employs ternary (-1, 0, 1) weights for potential implementations in image sensors, resulting in a relatively simple network architecture with shallow layers and extremely low-bitwidth. Our method demonstrates reasonable face detection performance and excellent efficiency in simulation studies, offering promising possibilities for practical always-on face detectors in real-world applications.
Submitted 2 November, 2023;
originally announced November 2023.
-
Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback
Authors:
TaeHo Yoon,
Kibeom Myoung,
Keon Lee,
Jaewoong Cho,
Albert No,
Ernest K. Ryu
Abstract:
Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.
Submitted 30 October, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Exact Optimality of Communication-Privacy-Utility Tradeoffs in Distributed Mean Estimation
Authors:
Berivan Isik,
Wei-Ning Chen,
Ayfer Ozgur,
Tsachy Weissman,
Albert No
Abstract:
We study the mean estimation problem under communication and local differential privacy constraints. While previous work has proposed \emph{order}-optimal algorithms for the same problem (i.e., asymptotically optimal as we spend more bits), \emph{exact} optimality (in the non-asymptotic setting) still has not been achieved. In this work, we take a step towards characterizing the \emph{exact}-optimal approach in the presence of shared randomness (a random variable shared between the server and the user) and identify several conditions for \emph{exact} optimality. We prove that one of the conditions is to utilize a rotationally symmetric shared random codebook. Based on this, we propose a randomization mechanism where the codebook is a randomly rotated simplex -- satisfying the properties of the \emph{exact}-optimal codebook. The proposed mechanism is based on a $k$-closest encoding which we prove to be \emph{exact}-optimal for the randomly rotated simplex codebook.
Submitted 28 October, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Shuai Liu,
Chaoyu Feng,
Furui Bai,
Xiaotao Wang,
Lei Lei,
Ziyao Yi,
Yan Xiang,
Zibin Liu,
Shaoqing Li,
Keming Shi,
Dehui Kong,
Ke Xu,
Minsu Kwon,
Yaqi Wu,
Jiesi Zheng,
Zhihao Fan,
Xun Wu,
Feng Zhang,
Albert No,
Minhyeok Cho,
Zewen Chen,
Xiaze Zhang,
Ran Li
, et al. (13 additional authors not shown)
Abstract:
The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon's 8 Gen 1 GPU that provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
Submitted 7 November, 2022;
originally announced November 2022.
-
A Metaheuristic Algorithm for Large Maximum Weight Independent Set Problems
Authors:
Yuanyuan Dong,
Andrew V. Goldberg,
Alexander Noe,
Nikos Parotsidis,
Mauricio G. C. Resende,
Quico Spaen
Abstract:
Motivated by a real-world vehicle routing application, we consider the maximum-weight independent set problem: Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum. Some of the graphs arising in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges. To solve instances of this size, we develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search (GRASP) framework. This algorithm, which we call METAMIS, uses a wider range of simple local search operations than previously described in the literature. We introduce data structures that make these operations efficient. A new variant of path-relinking is introduced to escape local optima and so is a new alternating augmenting-path local search move that improves algorithm performance. We compare an implementation of our algorithm with a state-of-the-art openly available code on public benchmark sets, including some large instances with hundreds of millions of vertices. Our algorithm is, in general, competitive and outperforms this openly available code on large vehicle routing instances. We hope that our results will lead to even better MWIS algorithms.
Submitted 28 March, 2022;
originally announced March 2022.
-
PyNET-QxQ: An Efficient PyNET Variant for QxQ Bayer Pattern Demosaicing in CMOS Image Sensors
Authors:
Minhyeok Cho,
Haechang Lee,
Hyunwoo Je,
Kijeong Kim,
Dongil Ryu,
Albert No
Abstract:
Deep learning-based image signal processor (ISP) models for mobile cameras can generate high-quality images that rival those of professional DSLR cameras. However, their computational demands often make them unsuitable for mobile settings. Additionally, modern mobile cameras employ non-Bayer color filter arrays (CFA) such as Quad Bayer, Nona Bayer, and QxQ Bayer to enhance image quality, yet most existing deep learning-based ISP (or demosaicing) models focus primarily on standard Bayer CFAs. In this study, we present PyNET-QxQ, a lightweight demosaicing model specifically designed for QxQ Bayer CFA patterns, which is derived from the original PyNET. We also propose a knowledge distillation method called progressive distillation to train the reduced network more effectively. Consequently, PyNET-QxQ contains less than 2.5% of the parameters of the original PyNET while preserving its performance. Experiments using QxQ images captured by a prototype QxQ camera sensor show that PyNET-QxQ outperforms existing conventional algorithms in terms of texture and edge reconstruction, despite its significantly reduced parameter count.
Submitted 5 May, 2023; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Neural Tangent Kernel Analysis of Deep Narrow Neural Networks
Authors:
Jongmin Lee,
Joo Young Choi,
Ernest K. Ryu,
Albert No
Abstract:
The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a specific initialization and establish a trainability guarantee using the NTK theory. We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.
Submitted 27 June, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Prune Your Model Before Distill It
Authors:
Jinhyuk Park,
Albert No
Abstract:
Knowledge distillation transfers the knowledge from a cumbersome teacher to a small student. Recent results suggest that a student-friendly teacher is more appropriate to distill since it provides more transferable knowledge. In this work, we propose a novel framework, "prune, then distill," that first prunes the model to make it more transferable and then distills it to the student. We provide several exploratory examples where the pruned teacher teaches better than the original unpruned networks. We further show theoretically that the pruned teacher plays the role of a regularizer in distillation, which reduces the generalization error. Based on this result, we propose a novel neural network compression scheme in which the student network is formed based on the pruned teacher and the "prune, then distill" strategy is then applied. The code is available at https://github.com/ososos888/prune-then-distill
Submitted 25 July, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
-
Algorithm Engineering for Cut Problems
Authors:
Alexander Noe
Abstract:
Graphs are a natural representation of data from various contexts, such as social connections, the web, road networks, and many more. In the last decades, many of these networks have become enormous, requiring efficient algorithms to cut networks into smaller, more readily comprehensible blocks. In this work, we aim to partition the vertices of a graph into multiple blocks while minimizing the number of edges that connect different blocks. There is a multitude of cut or partitioning problems that have been the focus of research for multiple decades. This work develops highly-efficient algorithms for the (global) minimum cut problem, the balanced graph partitioning problem and the multiterminal cut problem. All of these algorithms are efficient in practice and freely available for use.
Submitted 10 August, 2021;
originally announced August 2021.
-
Random Rank-Based, Hierarchical or Trivial: Which Dynamic Graph Algorithm Performs Best in Practice?
Authors:
Monika Henzinger,
Alexander Noe
Abstract:
Fully dynamic graph algorithms that achieve polylogarithmic or better time per operation use either a hierarchical graph decomposition or random-rank based approach. There are so far two graph properties for which efficient algorithms for both types of data structures exist, namely fully dynamic (Delta + 1) coloring and fully dynamic maximal matching. In this paper we present an extensive experimental study of these two types of algorithms for these two problems together with very simple baseline algorithms to determine which of these algorithms are the fastest. Our results indicate that the data structures used by the different algorithms dominate their performance.
Submitted 10 August, 2021;
originally announced August 2021.
-
New instances for maximum weight independent set from a vehicle routing application
Authors:
Yuanyuan Dong,
Andrew V. Goldberg,
Alexander Noe,
Nikos Parotsidis,
Mauricio G. C. Resende,
Quico Spaen
Abstract:
We present a set of new instances of the maximum weight independent set problem. These instances are derived from a real-world vehicle routing problem and are challenging to solve in part because of their large size. We present instances with up to 881 thousand nodes and 383 million edges.
Submitted 27 May, 2021; v1 submitted 26 May, 2021;
originally announced May 2021.
-
An Information-Theoretic Justification for Model Pruning
Authors:
Berivan Isik,
Tsachy Weissman,
Albert No
Abstract:
We study the neural network (NN) compression problem, viewing the tension between the compression ratio and NN performance through the lens of rate-distortion theory. We choose a distortion metric that reflects the effect of NN compression on the model output and derive the tradeoff between rate (compression) and distortion. In addition to characterizing theoretical limits of NN compression, this formulation shows that \emph{pruning}, implicitly or explicitly, must be a part of a good compression algorithm. This observation bridges a gap between parts of the literature pertaining to NN and data compression, respectively, providing insight into the empirical success of model pruning. Finally, we propose a novel pruning strategy derived from our information-theoretic formulation and show that it outperforms the relevant baselines on CIFAR-10 and ImageNet datasets.
Submitted 9 February, 2022; v1 submitted 16 February, 2021;
originally announced February 2021.
-
WGAN with an Infinitely Wide Generator Has No Spurious Stationary Points
Authors:
Albert No,
TaeHo Yoon,
Sehyun Kwon,
Ernest K. Ryu
Abstract:
Generative adversarial networks (GAN) are a widely used class of deep generative models, but their minimax training dynamics are not understood very well. In this work, we show that GANs with a 2-layer infinite-width generator and a 2-layer finite-width discriminator trained with stochastic gradient ascent-descent have no spurious stationary points. We then show that when the width of the generator is finite but wide, there are no spurious stationary points within a ball whose radius becomes arbitrarily large (to cover the entire parameter space) as the width goes to infinity.
Submitted 9 June, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Practical Fully Dynamic Minimum Cut Algorithms
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz
Abstract:
We present a practically efficient algorithm for maintaining a global minimum cut in large dynamic graphs under both edge insertions and deletions. While there has been theoretical work on this problem, our algorithm is the first implementation of a fully-dynamic algorithm. The algorithm uses the theoretical foundation and combines it with efficient and finely-tuned implementations to give an algorithm that can maintain the global minimum cut of a graph with rapid update times. We show that our algorithm gives up to multiple orders of magnitude speedup compared to static approaches both on edge insertions and deletions.
Submitted 13 January, 2021;
originally announced January 2021.
-
Recent Advances in Practical Data Reduction
Authors:
Faisal Abu-Khzam,
Sebastian Lamm,
Matthias Mnich,
Alexander Noe,
Christian Schulz,
Darren Strash
Abstract:
Over the last two decades, significant advances have been made in the design and analysis of fixed-parameter algorithms for a wide variety of graph-theoretic problems. This has resulted in an algorithmic toolbox that is by now well-established. However, these theoretical algorithmic ideas have received very little attention from the practical perspective. We survey recent trends in data reduction engineering results for selected problems. Moreover, we describe concrete techniques that may be useful for future implementations in the area and give open problems and research questions.
Submitted 31 December, 2020; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Faster Parallel Multiterminal Cuts
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz
Abstract:
We give an improved branch-and-bound solver for the multiterminal cut problem, based on the recent work of Henzinger et al. We contribute new, highly effective data reduction rules to transform the graph into a smaller equivalent instance. In addition, we present a local search algorithm that can significantly improve a given solution to the multiterminal cut problem. Our exact algorithm is able to give exact solutions to more and harder problems compared to the state-of-the-art algorithm by Henzinger et al., and gives better solutions for more than two thirds of the problems that are too large to be solved to optimality. Additionally, we give an inexact heuristic algorithm that computes high-quality solutions for very hard instances in reasonable time.
Submitted 24 April, 2020;
originally announced April 2020.
-
Finding All Global Minimum Cuts In Practice
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz,
Darren Strash
Abstract:
We present a practically efficient algorithm that finds all global minimum cuts in huge undirected graphs. Our algorithm uses a multitude of kernelization rules to reduce the graph to a small equivalent instance and then finds all minimum cuts using an optimized version of the algorithm of Nagamochi, Nakao and Ibaraki. In shared memory we are able to find all minimum cuts of graphs with up to billions of edges and millions of minimum cuts in a few minutes. We also give a new linear time algorithm to find the most balanced minimum cuts given as input the representation of all minimum cuts.
Submitted 17 February, 2020;
originally announced February 2020.
-
Shared-Memory Branch-and-Reduce for Multiterminal Cuts
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz
Abstract:
We introduce the fastest known exact algorithm for the multiterminal cut problem with k terminals. In particular, we engineer existing as well as new data reduction rules. We use the rules within a branch-and-reduce framework and to boost the performance of an ILP formulation. Our algorithms achieve improvements in running time of up to multiple orders of magnitude over the ILP formulation without data reductions, which has been the de facto standard used by practitioners. This allows us to solve instances to optimality that are significantly larger than was previously possible.
Submitted 17 August, 2019; v1 submitted 12 August, 2019;
originally announced August 2019.
-
Markov Decision Policies for Dynamic Video Delivery in Wireless Caching Networks
Authors:
Minseok Choi,
Albert No,
Mingyue Ji,
Joongheon Kim
Abstract:
This paper proposes a video delivery strategy for dynamic streaming services which maximizes time-average streaming quality under a playback delay constraint in wireless caching networks. The network where popular videos encoded by scalable video coding are already stored in randomly distributed caching nodes is considered under adaptive video streaming concepts, and distance-based interference management is investigated in this paper. In this network model, a streaming user makes delay-constrained decisions depending on stochastic network states: 1) caching node for video delivery, 2) video quality, and 3) the quantity of video chunks to receive. Since wireless link activation for video delivery may introduce delays, different timescales for updating caching node association, video quality adaptation, and chunk amounts are considered. After associating with a caching node for video delivery, the streaming user chooses combinations of quality and chunk amounts in the small timescale. The dynamic decision-making process for video quality and chunk amounts at each slot is modeled using a Markov decision process, and the caching node decision is made based on the framework of Lyapunov optimization. Our intensive simulations verify that the proposed video delivery algorithm works reliably and can also control the tradeoff between video quality and playback latency.
Submitted 28 February, 2019;
originally announced February 2019.
-
Shared-memory Exact Minimum Cuts
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz
Abstract:
The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. In this paper, we engineer the fastest known exact algorithm for the problem.
State-of-the-art algorithms like the algorithm of Padberg and Rinaldi or the algorithm of Nagamochi, Ono and Ibaraki identify edges that can be contracted to reduce the graph size such that at least one minimum cut is maintained in the contracted graph. Our algorithm achieves improvements in running time over these algorithms by a multitude of techniques. First, we use a recently developed fast and parallel inexact minimum cut algorithm to obtain a better bound for the problem. Then we use reductions that depend on this bound to reduce the size of the graph much faster than previously possible. We use improved data structures to further improve the running time of our algorithm. Additionally, we parallelize the contraction routines of Nagamochi, Ono and Ibaraki. Overall, we arrive at a system that significantly outperforms the fastest state-of-the-art solvers for the exact minimum cut problem.
Submitted 16 August, 2018;
originally announced August 2018.
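The minimum cut definition used in the abstract above can be made concrete with a small sketch. This is not the authors' algorithm: it is an exponential-time brute force over all two-block partitions, shown only to illustrate what "weight sum of the cut edges" means. The function names and the four-node example graph are hypothetical.

```python
from itertools import combinations


def cut_weight(edges, block):
    """Sum the weights of edges with exactly one endpoint in `block`."""
    return sum(w for u, v, w in edges if (u in block) != (v in block))


def brute_force_min_cut(nodes, edges):
    """Enumerate every two-block partition; only feasible for tiny graphs."""
    first, rest = nodes[0], nodes[1:]
    best_weight, best_block = None, None
    # Fix `first` in one block to avoid counting each partition twice,
    # and always leave at least one node for the other block.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            block = {first, *extra}
            weight = cut_weight(edges, block)
            if best_weight is None or weight < best_weight:
                best_weight, best_block = weight, block
    return best_weight, best_block


# Hypothetical example graph: (node, node, weight) triples.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 4), ("a", "d", 2), ("a", "c", 1)]
weight, block = brute_force_min_cut(nodes, edges)
print(weight, sorted(block))
```

On this example the smallest cut has weight 4. Exact solvers such as the one described above avoid this enumeration by contracting edges that provably preserve at least one minimum cut.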
-
ILP-based Local Search for Graph Partitioning
Authors:
Alexandra Henzinger,
Alexander Noe,
Christian Schulz
Abstract:
Computing high-quality graph partitions is a challenging problem with numerous applications. In this paper, we present a novel meta-heuristic for the balanced graph partitioning problem. Our approach is based on integer linear programs that solve the partitioning problem to optimality. However, since those programs typically do not scale to large inputs, we adapt them to heuristically improve a given partition. We do so by defining a much smaller model that allows us to use symmetry breaking and other techniques that make the approach scalable. For example, in Walshaw's well-known benchmark tables we are able to improve roughly half of all entries when the number of blocks is high.
Submitted 20 February, 2018;
originally announced February 2018.
-
Universality of Logarithmic Loss in Lossy Compression
Authors:
Albert No,
Tsachy Weissman
Abstract:
We establish two strong senses of universality of logarithmic loss as a distortion criterion in lossy compression: For any fixed length lossy compression problem under an arbitrary distortion criterion, we show that there is an equivalent lossy compression problem under logarithmic loss. In the successive refinement problem, if the first decoder operates under logarithmic loss, we show that any discrete memoryless source is successively refinable under an arbitrary distortion criterion for the second decoder.
Submitted 31 August, 2017;
originally announced September 2017.
-
Practical Minimum Cut Algorithms
Authors:
Monika Henzinger,
Alexander Noe,
Christian Schulz,
Darren Strash
Abstract:
The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. Here, we introduce a linear-time algorithm to compute near-minimum cuts. Our algorithm is based on cluster contraction using label propagation and Padberg and Rinaldi's contraction heuristics [SIAM Review, 1991]. We give both sequential and shared-memory parallel implementations of our algorithm. Extensive experiments on both real-world and generated instances show that our algorithm finds the optimal cut on nearly all instances significantly faster than other state-of-the-art algorithms while our error rate is lower than that of other heuristic algorithms. In addition, our parallel algorithm shows good scalability.
Submitted 27 August, 2017; v1 submitted 21 August, 2017;
originally announced August 2017.
-
Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++
Authors:
Timo Bingmann,
Michael Axtmann,
Emanuel Jöbstl,
Sebastian Lamm,
Huyen Chau Nguyen,
Alexander Noe,
Sebastian Schlag,
Matthias Stumpp,
Tobias Sturm,
Peter Sanders
Abstract:
We present the design and a first performance evaluation of Thrill -- a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction.
Submitted 19 August, 2016;
originally announced August 2016.
-
Strong Successive Refinability and Rate-Distortion-Complexity Tradeoff
Authors:
Albert No,
Amir Ingber,
Tsachy Weissman
Abstract:
We investigate the second order asymptotics (source dispersion) of the successive refinement problem. Similarly to the classical definition of a successively refinable source, we say that a source is strongly successively refinable if successive refinement coding can achieve the second order optimum rate (including the dispersion terms) at both decoders. We establish a sufficient condition for strong successive refinability. We show that any discrete source under Hamming distortion and the Gaussian source under quadratic distortion are strongly successively refinable.
We also demonstrate how successive refinement ideas can be used in point-to-point lossy compression problems in order to reduce complexity. We give two examples, the binary-Hamming and Gaussian-quadratic cases, in which a layered code construction results in a low complexity scheme that attains optimal performance. For example, when the number of layers grows with the block length $n$, we show how to design an $O(n^{\log(n)})$ algorithm that asymptotically achieves the rate-distortion bound.
Submitted 15 March, 2016; v1 submitted 10 June, 2015;
originally announced June 2015.
-
Algorithms for Mapping Parallel Processes onto Grid and Torus Architectures
Authors:
Roland Glantz,
Henning Meyerhenke,
Alexander Noe
Abstract:
Static mapping is the assignment of parallel processes to the processing elements (PEs) of a parallel system, where the assignment does not change during the application's lifetime. In our scenario we model an application's computations and their dependencies by an application graph. This graph is first partitioned into (nearly) equally sized blocks. These blocks need to communicate at block boundaries. To assign the processes to PEs, our goal is to compute a communication-efficient bijective mapping between the blocks and the PEs.
This approach of partitioning followed by bijective mapping has many degrees of freedom. Thus, users and developers of parallel applications need to know more about which choices work for which application graphs and which parallel architectures. To this end, we not only develop new mapping algorithms (derived from known greedy methods). We also perform extensive experiments involving different classes of application graphs (meshes and complex networks), architectures of parallel computers (grids and tori), as well as different partitioners and mapping algorithms. Surprisingly, the quality of the partitions, unless very poor, has little influence on the quality of the mapping.
More importantly, one of our new mapping algorithms always yields the best results in terms of the quality measure maximum congestion when the application graphs are complex networks. In case of meshes as application graphs, this mapping algorithm always leads in terms of maximum congestion AND maximum dilation, another common quality measure.
Submitted 2 March, 2015; v1 submitted 4 November, 2014;
originally announced November 2014.
-
Rateless Lossy Compression via the Extremes
Authors:
Albert No,
Tsachy Weissman
Abstract:
We begin by presenting a simple lossy compressor operating at near-zero rate: The encoder merely describes the indices of the few maximal source components, while the decoder's reconstruction is a natural estimate of the source components based on this information. This scheme turns out to be near-optimal for the memoryless Gaussian source in the sense of achieving the zero-rate slope of its distortion-rate function. Motivated by this finding, we then propose a scheme comprised of iterating the above lossy compressor on an appropriately transformed version of the difference between the source and its reconstruction from the previous iteration. The proposed scheme achieves the rate distortion function of the Gaussian memoryless source (under squared error distortion) when employed on any finite-variance ergodic source. It further possesses desirable properties we respectively refer to as infinitesimal successive refinability, ratelessness, and complete separability. Its storage and computation requirements are of order no more than $\frac{n^2}{\log^β n}$ per source symbol for $β>0$ at both the encoder and decoder. Though the details of its derivation, construction, and analysis differ considerably, we discuss similarities between the proposed scheme and the recently introduced Sparse Regression Codes (SPARC) of Venkataramanan et al.
Submitted 8 March, 2016; v1 submitted 25 June, 2014;
originally announced June 2014.
-
Information Measures: the Curious Case of the Binary Alphabet
Authors:
Jiantao Jiao,
Thomas Courtade,
Albert No,
Kartik Venkat,
Tsachy Weissman
Abstract:
Four problems related to information divergence measures defined on finite alphabets are considered. In three of the cases we consider, we illustrate a contrast which arises between the binary-alphabet and larger-alphabet settings. This is surprising in some instances, since characterizations for the larger-alphabet settings do not generalize their binary-alphabet counterparts. Specifically, we show that $f$-divergences are not the unique decomposable divergences on binary alphabets that satisfy the data processing inequality, thereby clarifying claims that have previously appeared in the literature. We also show that KL divergence is the unique Bregman divergence which is also an $f$-divergence for any alphabet size. We show that KL divergence is the unique Bregman divergence which is invariant to statistically sufficient transformations of the data, even when non-decomposable divergences are considered. Like some of the problems we consider, this result holds only when the alphabet size is at least three.
Submitted 28 November, 2014; v1 submitted 27 April, 2014;
originally announced April 2014.
-
Minimax Filtering via Relations between Information and Estimation
Authors:
Albert No,
Tsachy Weissman
Abstract:
We investigate the problem of continuous-time causal estimation under a minimax criterion. Let $X^T = \{X_t,0\leq t\leq T\}$ be governed by the probability law $P_θ$ from a class of possible laws indexed by $θ\in Λ$, and $Y^T$ be the noise corrupted observations of $X^T$ available to the estimator. We characterize the estimator minimizing the worst case regret, where regret is the difference between the causal estimation loss of the estimator and that of the optimum estimator.
One of the main contributions of this paper is characterizing the minimax estimator, showing that it is in fact a Bayesian estimator. We then relate minimax regret to the channel capacity when the channel is either Gaussian or Poisson. In this case, we characterize the minimax regret and the minimax estimator more explicitly. If we further assume that the uncertainty set consists of deterministic signals, the worst case regret is exactly equal to the corresponding channel capacity, namely the maximal mutual information attainable across the channel among all possible distributions on the uncertainty set of signals. The corresponding minimax estimator is the Bayesian estimator assuming the capacity-achieving prior. Using this relation, we also show that the capacity achieving prior coincides with the least favorable input. Moreover, we show that this minimax estimator is not only minimizing the worst case regret but also essentially minimizing regret for "most" of the other sources in the uncertainty set.
We present a couple of examples for the construction of a minimax filter via an approximation of the associated capacity achieving distribution.
Submitted 7 July, 2014; v1 submitted 22 January, 2013;
originally announced January 2013.
-
Reference Based Genome Compression
Authors:
Bobbie Chern,
Idoia Ochoa,
Alexandros Manolakos,
Albert No,
Kartik Venkat,
Tsachy Weissman
Abstract:
DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target genome, and then compresses this mapping with an entropy coder. As an illustration of the performance: applying our algorithm to James Watson's genome with hg18 as a reference, we are able to reduce the 2991 megabyte (MB) genome down to 6.99 MB, while Gzip compresses it to 834.8 MB.
Submitted 9 April, 2012;
originally announced April 2012.
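The file sizes quoted in the abstract above can be turned into compression ratios with a few lines of arithmetic. The sketch below uses only the three figures stated in the abstract (2991 MB, 6.99 MB, 834.8 MB); the function and variable names are illustrative and do not come from the paper.

```python
def compression_ratio(original_mb, compressed_mb):
    """Return how many times smaller the compressed file is."""
    return original_mb / compressed_mb


ORIGINAL_MB = 2991.0   # uncompressed genome size from the abstract
PROPOSED_MB = 6.99     # size after reference-based compression
GZIP_MB = 834.8        # size after Gzip

proposed_ratio = compression_ratio(ORIGINAL_MB, PROPOSED_MB)
gzip_ratio = compression_ratio(ORIGINAL_MB, GZIP_MB)
# How much smaller the reference-based output is than the Gzip output.
advantage = GZIP_MB / PROPOSED_MB

print(f"reference-based: {proposed_ratio:.1f}x")
print(f"gzip:            {gzip_ratio:.2f}x")
print(f"advantage:       {advantage:.1f}x")
```

With these figures the reference-based output is roughly 428 times smaller than the original, against roughly 3.6 times for Gzip.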