-
Tuning neural posterior estimation for gravitational wave inference
Authors:
Alex Kolmus,
Justin Janquart,
Tomasz Baka,
Twan van Laarhoven,
Chris Van Den Broeck,
Tom Heskes
Abstract:
Modern simulation-based inference techniques use neural networks to solve inverse problems efficiently. One notable strategy is neural posterior estimation (NPE), wherein a neural network parameterizes a distribution to approximate the posterior. This approach is particularly advantageous for tackling low-latency or high-volume inverse problems. However, the accuracy of NPE varies significantly within the learned parameter space. This variability is observed even in seemingly straightforward systems like coupled-harmonic oscillators. This paper emphasizes the critical role of prior selection in ensuring the consistency of NPE outcomes. Our findings indicate a clear relationship between NPE performance across the parameter space and the number of similar samples trained on by the model. Thus, the prior should match the sample diversity across the parameter space to promote strong, uniform performance. Furthermore, we introduce a novel procedure, in which amortized and sequential NPE are combined to swiftly refine NPE predictions for individual events. This method substantially improves sample efficiency, on average from nearly 0% to 10-80% within ten minutes. Notably, our research demonstrates its real-world applicability by achieving a significant milestone: accurate and swift inference of posterior distributions for low-mass binary black hole (BBH) events with NPE.
Submitted 4 March, 2024;
originally announced March 2024.
-
Going Grayscale: The Road to Understanding and Improving Unlearnable Examples
Authors:
Zhuoran Liu,
Zhengyu Zhao,
Alex Kolmus,
Tijn Berns,
Twan van Laarhoven,
Tom Heskes,
Martha Larson
Abstract:
Recent work has shown that imperceptible perturbations can be applied to craft unlearnable examples (ULEs), i.e. images whose content cannot be used to improve a classifier during training. In this paper, we reveal the road that researchers should follow for understanding ULEs and improving ULEs as they were originally formulated (ULEOs). The paper makes four contributions. First, we show that ULEOs exploit color and, consequently, their effects can be mitigated by simple grayscale pre-filtering, without resorting to adversarial training. Second, we propose an extension to ULEOs, which is called ULEO-GrayAugs, that forces the generated ULEs away from channel-wise color perturbations by making use of grayscale knowledge and data augmentations during optimization. Third, we show that ULEOs generated using Multi-Layer Perceptrons (MLPs) are effective in the case of complex Convolutional Neural Network (CNN) classifiers, suggesting that CNNs suffer specific vulnerability to ULEs. Fourth, we demonstrate that when a classifier is trained on ULEOs, adversarial training will prevent a drop in accuracy measured both on clean images and on adversarial images. Taken together, our contributions represent a substantial advance in the state of the art of unlearnable examples, but also reveal important characteristics of their behavior that must be better understood in order to achieve further improvements.
Submitted 25 November, 2021;
originally announced November 2021.
-
Swift sky localization of gravitational waves using deep learning seeded importance sampling
Authors:
Alex Kolmus,
Grégory Baltus,
Justin Janquart,
Twan van Laarhoven,
Sarah Caudill,
Tom Heskes
Abstract:
Fast, highly accurate, and reliable inference of the sky origin of gravitational waves would enable real-time multi-messenger astronomy. Current Bayesian inference methodologies, although highly accurate and reliable, are slow. Deep learning models have shown themselves to be accurate and extremely fast for inference tasks on gravitational waves, but their output is inherently questionable due to the blackbox nature of neural networks. In this work, we join Bayesian inference and deep learning by applying importance sampling on an approximate posterior generated by a multi-headed convolutional neural network. The neural network parametrizes Von Mises-Fisher and Gaussian distributions for the sky coordinates and two masses for given simulated gravitational wave injections in the LIGO and Virgo detectors. We generate skymaps for unseen gravitational-wave events that highly resemble predictions generated using Bayesian inference in a few minutes. Furthermore, we can detect poor predictions from the neural network, and quickly flag them.
Submitted 1 November, 2021;
originally announced November 2021.
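The abstract above describes applying importance sampling to an approximate posterior produced by a neural network. A minimal sketch of self-normalized importance sampling follows. It is not the authors' pipeline: the one-dimensional Gaussian proposal stands in for the network output, and the names `log_normal_pdf`, `importance_sample`, and the toy target are hypothetical. The effective-sample-size fraction it returns is one common diagnostic; a low value suggests that the proposal matches the target poorly.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def importance_sample(log_target, proposal_mean, proposal_std, n, rng):
    """Draw n samples from a Gaussian proposal and reweight them to the target.

    Returns the samples, their normalized weights, and the sample
    efficiency (effective sample size divided by n).
    """
    samples = [rng.gauss(proposal_mean, proposal_std) for _ in range(n)]
    log_w = [
        log_target(s) - log_normal_pdf(s, proposal_mean, proposal_std)
        for s in samples
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(log_w)
    raw = [math.exp(lw - shift) for lw in log_w]
    total = sum(raw)
    weights = [w / total for w in raw]
    ess = 1.0 / sum(w * w for w in weights)
    return samples, weights, ess / n


# Toy setup (assumed): the "true" posterior is N(1.0, 0.5) and the
# proposal N(0.8, 0.7) plays the role of the network's approximation.
rng = random.Random(0)
log_target = lambda x: log_normal_pdf(x, 1.0, 0.5)
samples, weights, efficiency = importance_sample(log_target, 0.8, 0.7, 5000, rng)
posterior_mean = sum(w * s for w, s in zip(weights, samples))
```

The proposal is deliberately wider than the target, which keeps the importance weights bounded in this toy case.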
-
Universal approximation and model compression for radial neural networks
Authors:
Iordan Ganev,
Twan van Laarhoven,
Robin Walters
Abstract:
We introduce a class of fully-connected neural networks whose activation functions, rather than being pointwise, rescale feature vectors by a function depending only on their norm. We call such networks radial neural networks, extending previous work on rotation equivariant networks that considers rescaling activations in less generality. We prove universal approximation theorems for radial neural networks, including in the more difficult cases of bounded widths and unbounded domains. Our proof techniques are novel, distinct from those in the pointwise case. Additionally, radial neural networks exhibit a rich group of orthogonal change-of-basis symmetries on the vector space of trainable parameters. Factoring out these symmetries leads to a practical lossless model compression algorithm. Optimization of the compressed model by gradient descent is equivalent to projected gradient descent for the full model.
Submitted 16 February, 2023; v1 submitted 6 July, 2021;
originally announced July 2021.
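The radial activation described in the abstract above rescales a whole feature vector by a function of its norm instead of acting coordinate by coordinate. The sketch below illustrates the idea under assumptions: the choice of `tanh` as the norm function and the helper names `radial_activation` and `rotate2d` are hypothetical and not taken from the paper. It also checks that such an activation commutes with a rotation, which is the kind of orthogonal symmetry the abstract refers to.

```python
import math


def radial_activation(x, rho=math.tanh):
    """Map x to rho(|x|) * x / |x|; the zero vector is left unchanged."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return list(x)
    scale = rho(norm) / norm
    return [scale * v for v in x]


def rotate2d(x, angle):
    """Rotate a two-dimensional vector counterclockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


x = [3.0, 4.0]
# Rotating first and activating second gives the same result as the reverse.
rotate_then_activate = radial_activation(rotate2d(x, 0.7))
activate_then_rotate = rotate2d(radial_activation(x), 0.7)
```

A pointwise activation such as ReLU does not commute with rotations in general, which is the contrast the abstract draws.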
-
Gaussian Processes with Skewed Laplace Spectral Mixture Kernels for Long-term Forecasting
Authors:
Kai Chen,
Twan van Laarhoven,
Elena Marchiori
Abstract:
Long-term forecasting involves predicting a horizon that is far ahead of the last observation. It is a problem of high practical relevance, for instance for companies in order to decide upon expensive long-term investments. Despite the recent progress and success of Gaussian processes (GPs) based on spectral mixture kernels, long-term forecasting remains a challenging problem for these kernels because they decay exponentially at large horizons. This is mainly due to their use of a mixture of Gaussians to model spectral densities. Characteristics of the signal important for long-term forecasting can be unravelled by investigating the distribution of the Fourier coefficients of (the training part of) the signal, which is non-smooth, heavy-tailed, sparse, and skewed. The heavy tail and skewness characteristics of such distributions in the spectral domain allow to capture long-range covariance of the signal in the time domain. Motivated by these observations, we propose to model spectral densities using a skewed Laplace spectral mixture (SLSM) due to the skewness of its peaks, sparsity, non-smoothness, and heavy tail characteristics. By applying the inverse Fourier Transform to this spectral density we obtain a new GP kernel for long-term forecasting. In addition, we adapt the lottery ticket method, originally developed to prune weights of a neural network, to GPs in order to automatically select the number of kernel components. Results of extensive experiments, including a multivariate time series, show the beneficial effect of the proposed SLSM kernel for long-term extrapolation and robustness to the choice of the number of mixture components.
Submitted 2 October, 2021; v1 submitted 8 November, 2020;
originally announced November 2020.
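The abstract above attributes the fast decay of spectral mixture kernels to their Gaussian spectral densities. The sketch below evaluates the standard one-dimensional spectral mixture kernel, k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi mu_q tau), to make that decay visible. It shows the baseline kernel only, not the proposed SLSM kernel, and the component values and the function name `spectral_mixture_kernel` are assumptions chosen for illustration.

```python
import math


def spectral_mixture_kernel(tau, weights, means, variances):
    """Standard 1-D spectral mixture kernel with Gaussian spectral components.

    weights, means, and variances hold one entry per mixture component;
    means are frequencies and variances are spectral variances.
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * m * tau)
        for w, m, v in zip(weights, means, variances)
    )


# Illustrative two-component mixture (values are assumed, not from the paper).
weights = [1.0, 0.5]
means = [0.2, 1.0]
variances = [0.01, 0.01]

k_at_zero = spectral_mixture_kernel(0.0, weights, means, variances)
k_near = spectral_mixture_kernel(1.0, weights, means, variances)
k_far = spectral_mixture_kernel(10.0, weights, means, variances)
```

With these values the Gaussian envelope at a lag of 10 is exp(-2 pi^2 * 100 * 0.01), so the covariance is negligible at that horizon, and the model's predictions revert to the prior mean.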
-
Approximate Voronoi cells for lattices, revisited
Authors:
Thijs Laarhoven
Abstract:
We revisit the approximate Voronoi cells approach for solving the closest vector problem with preprocessing (CVPP) on high-dimensional lattices, and settle the open problem of Doulgerakis-Laarhoven-De Weger [PQCrypto, 2019] of determining exact asymptotics on the volume of these Voronoi cells under the Gaussian heuristic. As a result, we obtain improved upper bounds on the time complexity of the randomized iterative slicer when using less than $2^{0.076d + o(d)}$ memory, and we show how to obtain time-memory trade-offs even when using less than $2^{0.048d + o(d)}$ memory. We also settle the open problem of obtaining a continuous trade-off between the size of the advice and the query time complexity, as the time complexity with subexponential advice in our approach scales as $d^{d/2 + o(d)}$, matching worst-case enumeration bounds, and achieving the same asymptotic scaling as average-case enumeration algorithms for the closest vector problem.
Submitted 10 July, 2019;
originally announced July 2019.
-
Evolutionary techniques in lattice sieving algorithms
Authors:
Thijs Laarhoven
Abstract:
Lattice-based cryptography has recently emerged as a prominent candidate for secure communication in the quantum age. Its security relies on the hardness of certain lattice problems, and the inability of known lattice algorithms, such as lattice sieving, to solve these problems efficiently. In this paper we investigate the similarities between lattice sieving and evolutionary algorithms, how various improvements to lattice sieving can be viewed as applications of known techniques from evolutionary computation, and how other evolutionary techniques can benefit lattice sieving in practice.
Submitted 10 July, 2019;
originally announced July 2019.
-
Polytopes, lattices, and spherical codes for the nearest neighbor problem
Authors:
Thijs Laarhoven
Abstract:
We study locality-sensitive hash methods for the nearest neighbor problem for the angular distance, focusing on the approach of first projecting down onto a low-dimensional subspace, and then partitioning the projected vectors according to Voronoi cells induced by a suitable spherical code. This approach generalizes and interpolates between the fast but suboptimal hyperplane hashing of Charikar [STOC'02] and the asymptotically optimal but practically often slower hash families of Andoni-Indyk [FOCS'06], Andoni-Indyk-Nguyen-Razenshteyn [SODA'14] and Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt [NIPS'15]. We set up a framework for analyzing the performance of any spherical code in this context, and we provide results for various codes from the literature, such as those related to regular polytopes and root lattices. Similar to hyperplane hashing, and unlike cross-polytope hashing, our analysis of collision probabilities and query exponents is exact and does not hide order terms which vanish only for large $d$, facilitating an easy parameter selection.
For the two-dimensional case, we derive closed-form expressions for arbitrary spherical codes, and we show that the equilateral triangle is optimal, achieving a better performance than the two-dimensional analogues of hyperplane and cross-polytope hashing. In three and four dimensions, we numerically find that the tetrahedron, $5$-cell, and $16$-cell achieve the best query exponents, while in five or more dimensions orthoplices appear to outperform regular simplices, as well as the root lattice families $A_k$ and $D_k$. We argue that in higher dimensions, larger spherical codes will likely exist which will outperform orthoplices in theory, and we argue why using the $D_k$ root lattices will likely lead to better results in practice, due to a better trade-off between the asymptotic query exponent and the concrete costs of hashing.
Submitted 22 April, 2020; v1 submitted 10 July, 2019;
originally announced July 2019.
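The abstract above uses hyperplane hashing as its baseline. As a rough illustration of that baseline only (not of the spherical-code scheme studied in the paper), the sketch below hashes vectors by the signs of random projections and compares the empirical collision rate of a single hyperplane with the known value 1 - theta/pi for two vectors at angle theta. The helper names `random_normals` and `hyperplane_hash`, the seed, and the trial count are assumptions.

```python
import math
import random


def random_normals(k, d, rng):
    """Sample k hyperplane normals with independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def hyperplane_hash(x, normals):
    """Return one bit per hyperplane: the side of the hyperplane x falls on."""
    return tuple(
        1 if sum(a * b for a, b in zip(normal, x)) >= 0.0 else 0
        for normal in normals
    )


rng = random.Random(1)
theta = math.pi / 3.0
x = [1.0, 0.0, 0.0]
y = [math.cos(theta), math.sin(theta), 0.0]  # unit vector at angle theta from x

trials = 20000
collisions = 0
for _ in range(trials):
    normals = random_normals(1, 3, rng)
    if hyperplane_hash(x, normals) == hyperplane_hash(y, normals):
        collisions += 1

empirical = collisions / trials
expected = 1.0 - theta / math.pi  # collision probability for one hyperplane
```

Using k hyperplanes per hash raises the expected collision probability to the k-th power, which is how the number of buckets is traded against selectivity.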
-
Unsupervised Domain Adaptation using Graph Transduction Games
Authors:
Sebastiano Vascon,
Sinem Aslan,
Alessandro Torcinovich,
Twan van Laarhoven,
Elena Marchiori,
Marcello Pelillo
Abstract:
Unsupervised domain adaptation (UDA) amounts to assigning class labels to the unlabeled instances of a dataset from a target domain, using labeled instances of a dataset from a related source domain. In this paper, we propose to cast this problem in a game-theoretic setting as a non-cooperative game and introduce a fully automatized iterative algorithm for UDA based on graph transduction games (GTG). The main advantages of this approach are its principled foundation, guaranteed termination of the iterative algorithms to a Nash equilibrium (which corresponds to a consistent labeling condition) and soft labels quantifying the uncertainty of the label assignment process. We also investigate the beneficial effect of using pseudo-labels from linear classifiers to initialize the iterative process. The performance of the resulting methods is assessed on publicly available object recognition benchmark datasets involving both shallow and deep features. Results of experiments demonstrate the suitability of the proposed game-theoretic approach for solving UDA tasks.
Submitted 6 May, 2019;
originally announced May 2019.
-
Nearest neighbor decoding for Tardos fingerprinting codes
Authors:
Thijs Laarhoven
Abstract:
Over the past decade, various improvements have been made to Tardos' collusion-resistant fingerprinting scheme [Tardos, STOC 2003], ultimately resulting in a good understanding of what is the minimum code length required to achieve collusion-resistance. In contrast, decreasing the cost of the actual decoding algorithm for identifying the potential colluders has received less attention, even though previous results have shown that using joint decoding strategies, deemed too expensive for decoding, may lead to better code lengths. Moreover, in dynamic settings a fast decoder may be required to provide answers in real-time, further raising the question whether the decoding costs of score-based fingerprinting schemes can be decreased with a smarter decoding algorithm. In this paper we show how to model the decoding step of score-based fingerprinting as a nearest neighbor search problem, and how this relation allows us to apply techniques from the field of (approximate) nearest neighbor searching to obtain decoding times which are sublinear in the total number of users. As this does not affect the encoding and embedding steps, this decoding mechanism can easily be deployed within existing fingerprinting schemes, and this may bring a truly efficient joint decoder closer to reality. Besides the application to fingerprinting, similar techniques can be used to decrease the decoding costs of group testing methods, which may be of independent interest.
Submitted 16 February, 2019;
originally announced February 2019.
-
Multi-Output Convolution Spectral Mixture for Gaussian Processes
Authors:
Kai Chen,
Twan van Laarhoven,
Perry Groot,
Jinsong Chen,
Elena Marchiori
Abstract:
Multi-output Gaussian processes (MOGPs) are an extension of Gaussian Processes (GPs) for predicting multiple output variables (also called channels, tasks) simultaneously. In this paper we use the convolution theorem to design a new kernel for MOGPs, by modeling cross channel dependencies through cross convolution of time and phase delayed components in the spectral domain. The resulting kernel is called Multi-Output Convolution Spectral Mixture (MOCSM) kernel. Results of extensive experiments on synthetic and real-life datasets demonstrate the advantages of the proposed kernel and its state of the art performance. MOCSM enjoys the desirable property to reduce to the well known Spectral Mixture (SM) kernel when a single-channel is considered. A comparison with the recently introduced Multi-Output Spectral Mixture kernel reveals that this is not the case for the latter kernel, which contains quadratic terms that generate undesirable scale effects when the spectral densities of different channels are either very close or very far from each other in the frequency domain.
Submitted 7 October, 2021; v1 submitted 7 August, 2018;
originally announced August 2018.
-
Multitask Gaussian Process with Hierarchical Latent Interactions
Authors:
Kai Chen,
Twan van Laarhoven,
Elena Marchiori,
Feng Yin,
Shuguang Cui
Abstract:
Multitask Gaussian process (MTGP) is powerful for joint learning of multiple tasks with complicated correlation patterns. However, due to the assembling of additive independent latent functions, all current MTGPs including the salient linear model of coregionalization (LMC) and convolution frameworks cannot effectively represent and learn the hierarchical latent interactions between its latent functions. In this paper, we further investigate the interactions in LMC of MTGP and then propose a novel kernel representation of the hierarchical interactions, which ameliorates both the expressiveness and the interpretability of MTGP. Specifically, we express the interaction as a product of function interaction and coefficient interaction. The function interaction is modeled by using cross convolution of latent functions. The coefficient interaction between the LMCs is described as a cross coregionalization term. We validate that considering the interactions can promote knowledge transferring in MTGP and compare our approach with some state-of-the-art MTGPs on both synthetic- and real-world datasets.
Submitted 2 October, 2021; v1 submitted 3 August, 2018;
originally announced August 2018.
-
Generative models for local network community detection
Authors:
Twan van Laarhoven
Abstract:
Local network community detection aims to find a single community in a large network, while inspecting only a small part of that network around a given seed node. This is much cheaper than finding all communities in a network. Most methods for local community detection are formulated as ad-hoc optimization problems. In this work, we instead start from a generative model for networks with community structure. By assuming that the network is uniform, we can approximate the structure of unobserved parts of the network to obtain a method for local community detection. We apply this local approximation technique to two variants of the stochastic block model. To our knowledge, this results in the first local community detection methods based on probabilistic models. Interestingly, in the limit, one of the proposed approximations corresponds to conductance, a popular metric in this field. Experiments on real and synthetic datasets show comparable or improved results compared to state-of-the-art local community detection algorithms.
Submitted 12 April, 2018;
originally announced April 2018.
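The abstract above notes that one of the proposed approximations reduces, in the limit, to conductance. For reference, the sketch below computes conductance for a candidate community in an undirected, unweighted graph, using the usual definition: the number of edges leaving the community divided by the smaller of the two volumes (sums of degrees). The adjacency-dictionary representation, the function name `conductance`, and the toy graph are assumptions; the paper's own methods are not reproduced here.

```python
def conductance(adjacency, community):
    """Conductance of a node set in an undirected, unweighted graph.

    adjacency maps each node to a list of its neighbors, with every
    edge listed in both directions.
    """
    community = set(community)
    cut = 0          # edges with exactly one endpoint in the community
    vol_inside = 0   # sum of degrees of nodes in the community
    vol_outside = 0  # sum of degrees of all other nodes
    for node, neighbors in adjacency.items():
        if node in community:
            vol_inside += len(neighbors)
            cut += sum(1 for v in neighbors if v not in community)
        else:
            vol_outside += len(neighbors)
    smaller_volume = min(vol_inside, vol_outside)
    if smaller_volume == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / smaller_volume


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
triangle_score = conductance(graph, {0, 1, 2})
pair_score = conductance(graph, {0, 1})
```

Lower values indicate a better-separated community, so the full triangle scores better than the pair of nodes taken from it.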
-
Adversarial Alignment of Class Prediction Uncertainties for Domain Adaptation
Authors:
Jeroen Manders,
Twan van Laarhoven,
Elena Marchiori
Abstract:
We consider unsupervised domain adaptation: given labelled examples from a source domain and unlabelled examples from a related target domain, the goal is to infer the labels of target examples. Under the assumption that features from pre-trained deep neural networks are transferable across related domains, domain adaptation reduces to aligning source and target domain at class prediction uncertainty level. We tackle this problem by introducing a method based on adversarial learning which forces the label uncertainty predictions on the target domain to be indistinguishable from those on the source domain. Pre-trained deep neural networks are used to generate deep features having high transferability across related domains. We perform an extensive experimental analysis of the proposed method over a wide set of publicly available pre-trained deep neural networks. Results of our experiments on domain adaptation tasks for image classification show that class prediction uncertainty alignment with features extracted from pre-trained deep neural networks provides an efficient, robust and effective method for domain adaptation.
Submitted 4 January, 2019; v1 submitted 12 April, 2018;
originally announced April 2018.
-
Domain Adaptation with Randomized Expectation Maximization
Authors:
Twan van Laarhoven,
Elena Marchiori
Abstract:
Domain adaptation (DA) is the task of classifying an unlabeled dataset (target) using a labeled dataset (source) from a related domain. The majority of successful DA methods try to directly match the distributions of the source and target data by transforming the feature space. Despite their success, state of the art methods based on this approach are either involved or unable to directly scale to data with many features. This article shows that domain adaptation can be successfully performed by using a very simple randomized expectation maximization (EM) method. We consider two instances of the method, which involve logistic regression and support vector machine, respectively. The underlying assumption of the proposed method is the existence of a good single linear classifier for both source and target domain. The potential limitations of this assumption are alleviated by the flexibility of the method, which can directly incorporate deep features extracted from a pre-trained deep neural network. The resulting algorithm is strikingly easy to implement and apply. We test its performance on 36 real-life adaptation tasks over text and image data with diverse characteristics. The method achieves state-of-the-art results, competitive with those of involved end-to-end deep transfer-learning methods.
Submitted 20 March, 2018;
originally announced March 2018.
-
Graph-based time-space trade-offs for approximate near neighbors
Authors:
Thijs Laarhoven
Abstract:
We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size $n = 2^{o(d)}$ on the $d$-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approximate nearest neighbor problem with approximation factor $c > 1$ in query time $n^{ρ_q + o(1)}$ and space $n^{1 + ρ_s + o(1)}$, for arbitrary $ρ_q, ρ_s \geq 0$ satisfying \begin{align} (2c^2 - 1) ρ_q + 2 c^2 (c^2 - 1) \sqrt{ρ_s (1 - ρ_s)} \geq c^4. \end{align} Graph-based near neighbor searching is especially competitive with hash-based methods for small $c$ and near-linear memory, and in this regime the asymptotic scaling of a greedy graph-based search matches the recent optimal hash-based trade-offs of Andoni-Laarhoven-Razenshteyn-Waingarten [SODA'17]. We further study how the trade-offs scale when the data set is of size $n = 2^{Θ(d)}$, and analyze asymptotic complexities when applying these results to lattice sieving.
Submitted 8 December, 2017;
originally announced December 2017.
-
Spectral-spatial classification of hyperspectral images: three tricks and a new supervised learning setting
Authors:
Jacopo Acquarelli,
Elena Marchiori,
Lutgarde M. C. Buydens,
Thanh Tran,
Twan van Laarhoven
Abstract:
Spectral-spatial classification of hyperspectral images has been the subject of many studies in recent years. In the presence of only very few labeled pixels, this task becomes challenging. In this paper we address the following two research questions: 1) Can a simple neural network with just a single hidden layer achieve state of the art performance in the presence of few labeled pixels? 2) How is the performance of hyperspectral image classification methods affected when using disjoint train and test sets? We give a positive answer to the first question by using three tricks within a very basic shallow Convolutional Neural Network (CNN) architecture: a tailored loss function, and smooth- and label-based data augmentation. The tailored loss function enforces that neighborhood wavelengths have similar contributions to the features generated during training. A new label-based technique here proposed favors selection of pixels in smaller classes, which is beneficial in the presence of very few labeled pixels and skewed class distributions. To address the second question, we introduce a new sampling procedure to generate disjoint train and test set. Then the train set is used to obtain the CNN model, which is then applied to pixels in the test set to estimate their labels. We assess the efficacy of the simple neural network method on five publicly available hyperspectral images. On these images our method significantly outperforms considered baselines. Notably, with just 1% of labeled pixels per class, on these datasets our method achieves an accuracy that goes from 86.42% (challenging dataset) to 99.52% (easy dataset). Furthermore we show that the simple neural network method improves over other baselines in the new challenging supervised setting. Our analysis substantiates the highly beneficial effect of using the entire image (so train and test data) for constructing a model.
Submitted 23 July, 2018; v1 submitted 15 November, 2017;
originally announced November 2017.
-
Deep Learning for Automatic Stereotypical Motor Movement Detection using Wearable Sensors in Autism Spectrum Disorders
Authors:
Nastaran Mohammadian Rad,
Seyed Mostafa Kia,
Calogero Zarbo,
Twan van Laarhoven,
Giuseppe Jurman,
Paola Venuti,
Elena Marchiori,
Cesare Furlanello
Abstract:
Autism Spectrum Disorders are associated with atypical movements, of which stereotypical motor movements (SMMs) interfere with learning and social interaction. Automatic SMM detection using inertial measurement units (IMU) remains complex due to the strong intra- and inter-subject variability, especially when handcrafted features are extracted from the signal. We propose a new application of deep learning to facilitate automatic SMM detection using multi-axis IMUs. We use a convolutional neural network (CNN) to learn a discriminative feature space from raw data. We show how the CNN can be used for parameter transfer learning to enhance the detection rate on longitudinal data. We also combine long short-term memory (LSTM) with the CNN to model the temporal patterns in a sequence of multi-axis signals. Further, we employ ensemble learning to combine multiple LSTM learners into a more robust SMM detector. Our results show that: 1) feature learning outperforms handcrafted features; 2) parameter transfer learning is beneficial in longitudinal settings; 3) using LSTM to learn the temporal dynamics of signals enhances the detection rate, especially for skewed training data; 4) an ensemble of LSTMs provides more accurate and stable detectors. These findings provide a significant step toward accurate SMM detection in real-time scenarios.
Submitted 14 September, 2017;
originally announced September 2017.
-
L2 Regularization versus Batch and Weight Normalization
Authors:
Twan van Laarhoven
Abstract:
Batch Normalization is a commonly used trick to improve the training of deep neural networks. These neural networks use L2 regularization, also called weight decay, ostensibly to prevent overfitting. However, we show that L2 regularization has no regularizing effect when combined with normalization. Instead, regularization has an influence on the scale of weights, and thereby on the effective learning rate. We investigate this dependence, both in theory, and experimentally. We show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate. This leads to a discussion on other ways to mitigate this issue.
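The scale invariance behind this claim can be illustrated with a minimal sketch. It assumes a single linear unit followed by plain batch standardization, with no epsilon term and no learned scale or shift; the helper names `batch_normalize` and `pre_activations` and the toy data are hypothetical and not taken from the paper.

```python
import math

def batch_normalize(values):
    """Standardize a batch to zero mean and unit variance.

    Assumption: no epsilon and no learned scale/shift, so the batch
    must have nonzero variance.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]

def pre_activations(weights, batch):
    """Dot product of one weight vector with every row of the batch."""
    return [sum(w * x for w, x in zip(weights, row)) for row in batch]

# Toy data (hypothetical): four examples with two features each.
batch = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]
weights = [0.7, -0.3]
scaled_weights = [10.0 * w for w in weights]  # same direction, 10x the norm

out = batch_normalize(pre_activations(weights, batch))
out_scaled = batch_normalize(pre_activations(scaled_weights, batch))

# The normalized outputs agree, so the weight norm does not change the function.
scale_invariant = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(out, out_scaled)
)
print(scale_invariant)
```

Because the normalized output is unchanged when the weights are rescaled, an L2 penalty that shrinks the weight norm cannot restrict what this unit computes, which is consistent with the abstract's point that its remaining influence is on the scale of the weights.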
Submitted 16 June, 2017;
originally announced June 2017.
-
Unsupervised Domain Adaptation with Random Walks on Target Labelings
Authors:
Twan van Laarhoven,
Elena Marchiori
Abstract:
Unsupervised Domain Adaptation (DA) is used to automate the task of labeling data: an unlabeled dataset (target) is annotated using a labeled dataset (source) from a related domain. We cast domain adaptation as the problem of finding stable labels for target examples. A new definition of label stability is proposed, motivated by a generalization error bound for large margin linear classifiers: a target labeling is stable when, with high probability, a classifier trained on a random subsample of the target with that labeling yields the same labeling. We find stable labelings using a random walk on a directed graph with transition probabilities based on labeling stability. The majority vote of those labelings visited by the walk yields a stable label for each target example. The resulting domain adaptation algorithm is strikingly easy to implement and apply: it does not rely on data transformations, which are in general computationally prohibitive in the presence of many input features, and does not need to access the source data, which is advantageous when data sharing is restricted. By acting on the original feature space, our method is able to take full advantage of deep features from external pre-trained neural networks, as demonstrated by the results of our experiments.
Submitted 20 March, 2018; v1 submitted 16 June, 2017;
originally announced June 2017.
-
Faster tuple lattice sieving using spherical locality-sensitive filters
Authors:
Thijs Laarhoven
Abstract:
To overcome the large memory requirement of classical lattice sieving algorithms for solving hard lattice problems, Bai-Laarhoven-Stehlé [ANTS 2016] studied tuple lattice sieving, where tuples instead of pairs of lattice vectors are combined to form shorter vectors. Herold-Kirshanova [PKC 2017] recently improved upon their results for arbitrary tuple sizes, for example showing that a triple sieve can solve the shortest vector problem (SVP) in dimension $d$ in time $2^{0.3717d + o(d)}$, using a technique similar to locality-sensitive hashing for finding nearest neighbors.
In this work, we generalize the spherical locality-sensitive filters of Becker-Ducas-Gama-Laarhoven [SODA 2016] to obtain space-time tradeoffs for near neighbor searching on dense data sets, and we apply these techniques to tuple lattice sieving to obtain even better time complexities. For instance, our triple sieve heuristically solves SVP in time $2^{0.3588d + o(d)}$. For practical sieves based on Micciancio-Voulgaris' GaussSieve [SODA 2010], this shows that a triple sieve uses less space and less time than the current best near-linear space double sieve.
Submitted 23 February, 2018; v1 submitted 8 May, 2017;
originally announced May 2017.
-
Hypercube LSH for approximate near neighbors
Authors:
Thijs Laarhoven
Abstract:
A celebrated technique for finding near neighbors for the angular distance involves using a set of \textit{random} hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of \textit{orthogonal} hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions.
In this work, we provide explicit asymptotics for the collision probabilities when using hypercubes to partition the space. For instance, two near-orthogonal vectors are expected to collide with probability $(\frac{1}{π})^{d + o(d)}$ in dimension $d$, compared to $(\frac{1}{2})^d$ when using random hyperplanes. Vectors at angle $\frac{π}{3}$ collide with probability $(\frac{\sqrt{3}}{π})^{d + o(d)}$, compared to $(\frac{2}{3})^d$ for random hyperplanes, and near-parallel vectors collide with similar asymptotic probabilities in both cases.
For $c$-approximate nearest neighbor searching, this translates to a decrease in the exponent $ρ$ of locality-sensitive hashing (LSH) methods of a factor up to $\log_2(π) \approx 1.652$ compared to hyperplane LSH. For $c = 2$, we obtain $ρ\approx 0.302 + o(1)$ for hypercube LSH, improving upon the $ρ\approx 0.377$ for hyperplane LSH. We further describe how to use hypercube LSH in practice, and we consider an example application in the area of lattice algorithms.
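The quoted collision probabilities can be checked numerically with a short sketch. The hyperplane figures follow from the standard fact that one random hyperplane fails to separate two vectors at angle θ with probability 1 − θ/π; the hypercube bases are simply copied from the abstract with the $o(d)$ terms ignored. Reading the factor $\log_2(π)$ as the ratio of log collision probabilities for orthogonal vectors is an interpretation made here, and all names are hypothetical.

```python
import math

def hyperplane_collision_base(theta):
    """Probability that one random hyperplane does not separate
    two vectors at angle theta (standard hyperplane LSH fact)."""
    return 1.0 - theta / math.pi

# Leading-order per-dimension bases for hypercube LSH, copied from the
# abstract; the o(d) terms in the exponent are ignored in this sketch.
HYPERCUBE_BASE_ORTHOGONAL = 1.0 / math.pi
HYPERCUBE_BASE_60_DEGREES = math.sqrt(3.0) / math.pi

orthogonal_hyperplane = hyperplane_collision_base(math.pi / 2)  # 1/2
sixty_hyperplane = hyperplane_collision_base(math.pi / 3)       # 2/3

# Ratio of log collision probabilities for orthogonal vectors:
# ln(1/pi) / ln(1/2) = log2(pi).
exponent_ratio = math.log(HYPERCUBE_BASE_ORTHOGONAL) / math.log(orthogonal_hyperplane)

print(orthogonal_hyperplane, sixty_hyperplane, exponent_ratio)
```

Orthogonal vectors are thus separated more aggressively by the hypercube partition (base 1/π versus 1/2 per dimension), which is the direction of improvement the abstract describes.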
Submitted 19 February, 2017;
originally announced February 2017.
-
Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors
Authors:
Alexandr Andoni,
Thijs Laarhoven,
Ilya Razenshteyn,
Erik Waingarten
Abstract:
[See the paper for the full abstract.]
We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + ρ_u + o(1)} + O(dn)$ and query time $n^{ρ_q + o(1)} + d n^{o(1)}$ for every $ρ_u, ρ_q \geq 0$ such that: \begin{equation} c^2 \sqrt{ρ_q} + (c^2 - 1) \sqrt{ρ_u} = \sqrt{2c^2 - 1}. \end{equation}
This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015].
Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the whole above trade-off in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $ρ_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.
Submitted 21 May, 2017; v1 submitted 11 August, 2016;
originally announced August 2016.
-
Sieving for closest lattice vectors (with preprocessing)
Authors:
Thijs Laarhoven
Abstract:
Lattice-based cryptography has recently emerged as a prime candidate for efficient and secure post-quantum cryptography. The two main hard problems underlying its security are the shortest vector problem (SVP) and the closest vector problem (CVP). Various algorithms have been studied for solving these problems, and for SVP, lattice sieving currently dominates in terms of the asymptotic time complexity: one can heuristically solve SVP in time $2^{0.292d}$ in high dimensions $d$ [BDGL'16]. Although several SVP algorithms can also be used to solve CVP, it is not clear whether this also holds for heuristic lattice sieving methods. The best time complexity for CVP is currently $2^{0.377d}$ [BGJ'14].
In this paper we revisit sieving algorithms for solving SVP, and study how these algorithms can be modified to solve CVP and its variants as well. Our first method is aimed at solving one problem instance and minimizes the overall time complexity for a single CVP instance with a time complexity of $2^{0.292d}$. Our second method minimizes the amortized time complexity for several instances on the same lattice, at the cost of a larger preprocessing cost. We can solve the closest vector problem with preprocessing (CVPP) with $2^{0.636d}$ space and preprocessing, in $2^{0.136d}$ time, while the query complexity can even be reduced to $2^{εd}$ at the cost of preprocessing time and memory complexities of $(1/ε)^{O(d)}$.
For easier variants of CVP, such as approximate CVP and bounded distance decoding (BDD), we further show how the preprocessing method achieves even better complexities. For instance, we can solve approximate CVPP with large approximation factors $k$ with polynomial-sized advice in polynomial time if $k = Ω(\sqrt{d/\log d})$, heuristically closing the gap between the decision-CVPP result of [AR'04] and the search-CVPP result of [DRS'14].
Submitted 16 July, 2016;
originally announced July 2016.
-
Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors
Authors:
Alexandr Andoni,
Thijs Laarhoven,
Ilya Razenshteyn,
Erik Waingarten
Abstract:
We show tight lower bounds for the entire trade-off between space and query time for the Approximate Near Neighbor search problem. Our lower bounds hold in a restricted model of computation, which captures all hashing-based approaches. In particular, our lower bound matches the upper bound recently shown in [Laarhoven 2015] for the random instance on a Euclidean sphere (which we show in fact extends to the entire space $\mathbb{R}^d$ using the techniques from [Andoni, Razenshteyn 2015]).
We also show tight, unconditional cell-probe lower bounds for one and two probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than for one probe. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.
Submitted 18 August, 2016; v1 submitted 9 May, 2016;
originally announced May 2016.
-
Local Network Community Detection with Continuous Optimization of Conductance and Weighted Kernel K-Means
Authors:
Twan van Laarhoven,
Elena Marchiori
Abstract:
Local network community detection is the task of finding a single community of nodes concentrated around a few given seed nodes in a localized way. Conductance is a popular objective function used in many algorithms for local community detection. This paper studies a continuous relaxation of conductance. We show that continuous optimization of this objective still leads to discrete communities. We investigate the relation of conductance with weighted kernel k-means for a single community, which leads to the introduction of a new objective function, $σ$-conductance. Conductance is obtained by setting $σ$ to $0$. Two algorithms, EMc and PGDc, are proposed to locally optimize $σ$-conductance and automatically tune the parameter $σ$. They are based on expectation maximization and projected gradient descent, respectively. We prove locality and give performance guarantees for EMc and PGDc for a class of dense and well-separated communities centered around the seeds. Experiments are conducted on networks with ground-truth communities, comparing to state-of-the-art graph diffusion algorithms for conductance optimization. On large graphs, results indicate that EMc and PGDc stay localized and produce communities most similar to the ground truth, while graph diffusion algorithms generate large communities of lower quality.
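For readers unfamiliar with the objective, here is a minimal sketch of discrete conductance on an unweighted, undirected graph. It assumes the common textbook definition (cut size divided by the smaller of the two volumes); the paper's own formulation, its continuous relaxation, and $σ$-conductance may differ, and the function name and toy graph are hypothetical.

```python
def conductance(edges, community):
    """Conductance of a node set in an unweighted, undirected graph.

    Assumed definition: cut(S, rest) / min(vol(S), vol(rest)), where vol
    is the sum of degrees. Both sides must have nonzero volume.
    """
    community = set(community)
    degree = {}
    cut = 0
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        # An edge is cut when exactly one endpoint lies in the community.
        if (u in community) != (v in community):
            cut += 1
    vol_in = sum(d for node, d in degree.items() if node in community)
    vol_out = sum(degree.values()) - vol_in
    return cut / min(vol_in, vol_out)

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

phi_triangle = conductance(edges, {0, 1, 2})  # one cut edge, volume 7
phi_partial = conductance(edges, {0, 1})      # two cut edges, volume 4
print(phi_triangle, phi_partial)
```

The full triangle scores lower (better) than the partial one, matching the intuition that a good local community has few edges leaving it relative to its total degree.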
Submitted 17 August, 2016; v1 submitted 21 January, 2016;
originally announced January 2016.
-
Tradeoffs for nearest neighbors on the sphere
Authors:
Thijs Laarhoven
Abstract:
We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{ρ_q}$ and update complexity $n^{ρ_u}$ for data sets of size $n$ is given by the following equation in terms of the approximation factor $c$ and the exponents $ρ_q$ and $ρ_u$: $$c^2\sqrt{ρ_q}+(c^2-1)\sqrt{ρ_u}=\sqrt{2c^2-1}.$$
For small $c=1+ε$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity $n^{1-4ε^2}$. Balancing the query and update costs leads to optimal complexities $n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4ε^2)}$, matching the bound $n^{Ω(1/ε^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].
For large $c$, minimizing the update complexity results in a query complexity of $n^{2/c^2+O(1/c^4)}$, improving upon the related exponent for large $c$ of [Kapralov, PODS'15] by a factor $2$, and matching the bound $n^{Ω(1/c^2)}$ of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal complexities $n^{1/(2c^2-1)}$, while a minimum query time complexity can be achieved with update complexity $n^{2/c^2+O(1/c^4)}$, improving upon the previous best exponents of Kapralov by a factor $2$.
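The tradeoff equation can be explored numerically with a small sketch that solves it for $ρ_q$ given $c$ and $ρ_u$. The function names are hypothetical; only the equation itself and the balanced value $1/(2c^2-1)$ come from the abstract.

```python
import math

def query_exponent(c, rho_u):
    """Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q."""
    rhs = math.sqrt(2 * c**2 - 1) - (c**2 - 1) * math.sqrt(rho_u)
    if rhs < 0:
        raise ValueError("rho_u lies beyond the point where rho_q reaches 0")
    return (rhs / c**2) ** 2

def balanced_exponent(c):
    """Common value of rho_q and rho_u when both are set equal."""
    return 1.0 / (2 * c**2 - 1)

c = 2.0
balanced = balanced_exponent(c)                # 1/7 for c = 2
query_at_min_update = query_exponent(c, 0.0)   # (2c^2 - 1)/c^4 = 7/16 for c = 2

print(balanced, query_exponent(c, balanced), query_at_min_update)
```

Setting $ρ_u = 0$ gives $ρ_q = (2c^2-1)/c^4 = 2/c^2 - 1/c^4$, which agrees with the $n^{2/c^2+O(1/c^4)}$ query complexity stated for large $c$.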
Submitted 9 September, 2016; v1 submitted 23 November, 2015;
originally announced November 2015.
-
Practical and Optimal LSH for Angular Distance
Authors:
Alexandr Andoni,
Piotr Indyk,
Thijs Laarhoven,
Ilya Razenshteyn,
Ludwig Schmidt
Abstract:
We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets.
We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.
Submitted 9 September, 2015;
originally announced September 2015.
-
Optimal sequential fingerprinting: Wald vs. Tardos
Authors:
Thijs Laarhoven
Abstract:
We study sequential collusion-resistant fingerprinting, where the fingerprinting code is generated in advance but accusations may be made between rounds, and show that in this setting both the dynamic Tardos scheme and schemes building upon Wald's sequential probability ratio test (SPRT) are asymptotically optimal. We further compare these two approaches to sequential fingerprinting, highlighting differences between the two schemes. Based on these differences, we argue that Wald's scheme should in general be preferred over the dynamic Tardos scheme, even though both schemes have their merits. As a side result, we derive an optimal sequential group testing method for the classical model, which can easily be generalized to different group testing models.
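As background, the following sketch shows Wald's generic SPRT for a Bernoulli parameter, using Wald's usual approximate thresholds. It is not the fingerprinting scheme from the paper: the hypotheses, error rates, and the function name `sprt_bernoulli` are illustrative assumptions.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.01, beta=0.01):
    """Generic SPRT for H0: p = p0 versus H1: p = p1 on 0/1 observations.

    Uses Wald's approximate thresholds; alpha and beta are the target
    false-positive and false-negative rates.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0  # running log-likelihood ratio
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(observations)

decision_ones = sprt_bernoulli([1] * 10, p0=0.2, p1=0.8)
decision_zeros = sprt_bernoulli([0] * 10, p0=0.2, p1=0.8)
print(decision_ones, decision_zeros)
```

The test stops as soon as the accumulated evidence crosses a threshold, which is the property that makes a sequential setting, where decisions may be made between rounds, attractive.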
Submitted 12 February, 2015;
originally announced February 2015.
-
Resolution-limit-free and local Non-negative Matrix Factorization quality functions for graph clustering
Authors:
Twan van Laarhoven,
Elena Marchiori
Abstract:
Many graph clustering quality functions suffer from a resolution limit, the inability to find small clusters in large graphs. So-called resolution-limit-free quality functions do not have this limit. This property was previously introduced for hard clustering, that is, graph partitioning.
We investigate the resolution-limit-free property in the context of Non-negative Matrix Factorization (NMF) for hard and soft graph clustering. To use NMF in the hard clustering setting, a common approach is to assign each node to its highest membership cluster. We show that in this case symmetric NMF is not resolution-limit-free, but that it becomes so when hardness constraints are used as part of the optimization. The resulting function is strongly linked to the Constant Potts Model. In soft clustering, nodes can belong to more than one cluster, with varying degrees of membership. In this setting resolution-limit-free turns out to be too strong a property. Therefore we introduce locality, which roughly states that changing one part of the graph does not affect the clustering of other parts of the graph. We argue that this is a desirable property, provide conditions under which NMF quality functions are local, and propose a novel class of local probabilistic NMF quality functions for soft graph clustering.
Submitted 22 July, 2014;
originally announced July 2014.
-
Asymptotics of Fingerprinting and Group Testing: Capacity-Achieving Log-Likelihood Decoders
Authors:
Thijs Laarhoven
Abstract:
We study the large-coalition asymptotics of fingerprinting and group testing, and derive explicit decoders that provably achieve capacity for many of the considered models. We do this both for simple decoders (fast but suboptimal) and for joint decoders (slow but optimal), and both for informed and uninformed settings.
For fingerprinting, we show that if the pirate strategy is known, the Neyman-Pearson-based log-likelihood decoders provably achieve capacity, regardless of the strategy. The decoder built against the interleaving attack is further shown to be a universal decoder, able to deal with arbitrary attacks and achieving the uninformed capacity. This universal decoder is shown to be closely related to the Lagrange-optimized decoder of Oosterwijk et al. and the empirical mutual information decoder of Moulin. Joint decoders are also proposed, and we conjecture that these also achieve the corresponding joint capacities.
For group testing, the simple decoder for the classical model is shown to be more efficient than the one of Chan et al. and it provably achieves the simple group testing capacity. For generalizations of this model such as noisy group testing, the resulting simple decoders also achieve the corresponding simple capacities.
Submitted 9 April, 2014;
originally announced April 2014.
-
Asymptotics of Fingerprinting and Group Testing: Tight Bounds from Channel Capacities
Authors:
Thijs Laarhoven
Abstract:
In this work we consider the large-coalition asymptotics of various fingerprinting and group testing games, and derive explicit expressions for the capacities for each of these models. We do this both for simple decoders (fast but suboptimal) and for joint decoders (slow but optimal).
For fingerprinting, we show that if the pirate strategy is known, the capacity often decreases linearly with the number of colluders, instead of quadratically as in the uninformed fingerprinting game. For many attacks the joint capacity is further shown to be strictly higher than the simple capacity.
For group testing, we improve upon known results about the joint capacities, and derive new explicit asymptotics for the simple capacities. These show that existing simple group testing algorithms are suboptimal, and that simple decoders cannot asymptotically be as efficient as joint decoders. For the traditional group testing model, we show that the gap between the simple and joint capacities is a factor 1.44 for large numbers of defectives.
Submitted 9 April, 2014;
originally announced April 2014.
-
Capacities and Capacity-Achieving Decoders for Various Fingerprinting Games
Authors:
Thijs Laarhoven
Abstract:
Combining an information-theoretic approach to fingerprinting with a more constructive, statistical approach, we derive new results on the fingerprinting capacities for various informed settings, as well as new log-likelihood decoders with provable code lengths that asymptotically match these capacities. The simple decoder built against the interleaving attack is further shown to achieve the simple capacity for unknown attacks, and is argued to be an improved version of the recently proposed decoder of Oosterwijk et al. With this new universal decoder, cut-offs on the bias distribution function can finally be dismissed.
Besides the application of these results to fingerprinting, a direct consequence of our results to group testing is that (i) a simple decoder asymptotically requires a factor 1.44 more tests to find defectives than a joint decoder, and (ii) the simple decoder presented in this paper provably achieves this bound.
Submitted 2 April, 2014; v1 submitted 22 January, 2014;
originally announced January 2014.
-
Axioms for graph clustering quality functions
Authors:
Twan van Laarhoven,
Elena Marchiori
Abstract:
We investigate properties that intuitively ought to be satisfied by graph clustering quality functions, that is, functions that assign a score to a clustering of a graph. Graph clustering, also known as network community detection, is often performed by optimizing such a function. Two axioms tailored for graph clustering quality functions are introduced, and the four axioms introduced in previous work on distance based clustering are reformulated and generalized for the graph setting. We show that modularity, a standard quality function for graph clustering, does not satisfy all of these six properties. This motivates the derivation of a new family of quality functions, adaptive scale modularity, which does satisfy the proposed axioms. Adaptive scale modularity has two parameters, which give greater flexibility in the kinds of clusterings that can be found. Standard graph clustering quality functions, such as normalized cut and unnormalized cut, are obtained as special cases of adaptive scale modularity.
In general, the results of our investigation indicate that the considered axiomatic framework covers existing `good' quality functions for graph clustering, and can be used to derive an interesting new family of quality functions.
Submitted 22 July, 2014; v1 submitted 15 August, 2013;
originally announced August 2013.
-
Efficient Probabilistic Group Testing Based on Traitor Tracing
Authors:
Thijs Laarhoven
Abstract:
Inspired by recent results from collusion-resistant traitor tracing, we provide a framework for constructing efficient probabilistic group testing schemes. In the traditional group testing model, our scheme asymptotically requires T ~ 2 K ln N tests to find (with high probability) the correct set of K defectives out of N items. The framework is also applied to several noisy group testing and threshold group testing models, often leading to improvements over previously known results, but we emphasize that this framework can be applied to other variants of the classical model as well, both in adaptive and in non-adaptive settings.
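To give a feel for the scale of the asymptotic estimate T ~ 2 K ln N, the sketch below evaluates it for one illustrative case. The formula is taken from the abstract and ignores lower-order terms; the function name and the chosen values of N and K are hypothetical.

```python
import math

def asymptotic_tests(num_items, num_defectives):
    """Leading-order test count T ~ 2 K ln N quoted in the abstract."""
    return 2 * num_defectives * math.log(num_items)

# Illustrative values (assumption): one million items, ten defectives.
tests = asymptotic_tests(1_000_000, 10)
print(round(tests))
```

For these values the estimate is a few hundred pooled tests, compared with one million tests if every item were tested individually.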
Submitted 9 April, 2014; v1 submitted 9 July, 2013;
originally announced July 2013.
-
Dynamic Traitor Tracing Schemes, Revisited
Authors:
Thijs Laarhoven
Abstract:
We revisit recent results from the area of collusion-resistant traitor tracing, and show how they can be combined and improved to obtain more efficient dynamic traitor tracing schemes. In particular, we show how the dynamic Tardos scheme of Laarhoven et al. can be combined with the optimized score functions of Oosterwijk et al. to trace coalitions much faster. If the attack strategy is known, in many cases the order of the code length goes down from quadratic to linear in the number of colluders, while if the attack is not known, we show how the interleaving defense may be used to catch all colluders about twice as fast as in the dynamic Tardos scheme. Some of these results also apply to the static traitor tracing setting where the attack strategy is known in advance, and to group testing.
Submitted 30 June, 2013;
originally announced July 2013.
-
Discrete Distributions in the Tardos Scheme, Revisited
Authors:
Thijs Laarhoven,
Benne de Weger
Abstract:
The Tardos scheme is a well-known traitor tracing scheme to protect copyrighted content against collusion attacks. The original scheme contained some suboptimal design choices, such as the score function and the distribution function used for generating the biases. Skoric et al. previously showed that a symbol-symmetric score function leads to shorter codes, while Nuida et al. obtained the optimal distribution functions for arbitrary coalition sizes. Later, Nuida et al. showed that combining these results leads to even shorter codes when the coalition size is small. We extend their analysis to the case of large coalitions and prove that these optimal distributions converge to the arcsine distribution, thus showing that the arcsine distribution is asymptotically optimal in the symmetric Tardos scheme. We also present a new, practical alternative to the discrete distributions of Nuida et al. and give a comparison of the estimated lengths of the fingerprinting codes for each of these distributions.
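The arcsine distribution mentioned above has density 1/(π sqrt(p(1-p))) on (0, 1). A minimal sketch of drawing bias values from it follows, using the fact that sin²(θ) is arcsine-distributed when θ is uniform on (0, π/2). The function names and sample size are illustrative assumptions, and the sketch omits any cutoff parameter a practical scheme may apply near 0 and 1.

```python
import math
import random


def sample_arcsine_bias(rng):
    """Draw p in (0, 1) with density 1 / (pi * sqrt(p * (1 - p)))."""
    theta = rng.uniform(0.0, math.pi / 2)
    return math.sin(theta) ** 2


def arcsine_cdf(p):
    """Closed-form CDF of the arcsine distribution on (0, 1)."""
    return (2 / math.pi) * math.asin(math.sqrt(p))


rng = random.Random(0)  # fixed seed so the run is reproducible
biases = [sample_arcsine_bias(rng) for _ in range(100_000)]

# The empirical fraction of biases below 0.1 should be close to the CDF value.
below_tenth = sum(b <= 0.1 for b in biases) / len(biases)
print(below_tenth, arcsine_cdf(0.1))
```

The mass concentrates near 0 and 1, which is the qualitative feature that distinguishes this choice from a uniform bias distribution.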
Submitted 29 April, 2013; v1 submitted 7 February, 2013;
originally announced February 2013.
-
Solving the Shortest Vector Problem in Lattices Faster Using Quantum Search
Authors:
Thijs Laarhoven,
Michele Mosca,
Joop van de Pol
Abstract:
By applying Grover's quantum search algorithm to the lattice algorithms of Micciancio and Voulgaris, Nguyen and Vidick, Wang et al., and Pujol and Stehlé, we obtain improved asymptotic quantum results for solving the shortest vector problem. With quantum computers we can provably find a shortest vector in time $2^{1.799n + o(n)}$, improving upon the classical time complexity of $2^{2.465n + o(n)}$ of Pujol and Stehlé and the $2^{2n + o(n)}$ of Micciancio and Voulgaris, while heuristically we expect to find a shortest vector in time $2^{0.312n + o(n)}$, improving upon the classical time complexity of $2^{0.384n + o(n)}$ of Wang et al. These quantum complexities will be an important guide for the selection of parameters for post-quantum cryptosystems based on the hardness of the shortest vector problem.
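As a rough illustration of how such exponents could feed into parameter selection, the sketch below finds the smallest lattice dimension n for which a cost of the form 2^{cn} reaches a target number of bits. It ignores the o(n) terms entirely, so the output is a back-of-the-envelope comparison and not a parameter recommendation. The function name `min_dimension` and the 128-bit target are assumptions.

```python
import math

# Leading exponents c in 2^{c n + o(n)}, taken from the abstract above.
HEURISTIC_CLASSICAL = 0.384
HEURISTIC_QUANTUM = 0.312


def min_dimension(target_bits, exponent):
    """Smallest integer n with exponent * n >= target_bits (o(n) ignored)."""
    return math.ceil(target_bits / exponent)


target = 128  # assumed security target, in bits
print(min_dimension(target, HEURISTIC_CLASSICAL))
print(min_dimension(target, HEURISTIC_QUANTUM))
```

Under these simplifications, the lower quantum exponent translates directly into a larger dimension for the same target cost.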
Submitted 25 January, 2013;
originally announced January 2013.
-
The Collatz conjecture and De Bruijn graphs
Authors:
Thijs Laarhoven,
Benne de Weger
Abstract:
We study variants of the well-known Collatz graph, by considering the action of the 3n+1 function on congruence classes. For moduli equal to powers of 2, these graphs are shown to be isomorphic to binary De Bruijn graphs. Unlike the Collatz graph, these graphs are very structured, and have several interesting properties. We then look at a natural generalization of these finite graphs to the 2-adic integers, and show that the isomorphism between these infinite graphs is exactly the conjugacy map previously studied by Bernstein and Lagarias. Finally, we show that for generalizations of the 3n+1 function, we get similar relations with 2-adic and p-adic De Bruijn graphs.
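The construction on congruence classes can be explored numerically. The sketch below assumes the common shortcut form of the 3n+1 function, T(n) = n/2 for even n and (3n+1)/2 for odd n, and draws an edge from class a to class b modulo 2^k whenever some n ≡ a maps to T(n) ≡ b. It only checks that every vertex has in-degree and out-degree 2, as in a binary De Bruijn graph on 2^k vertices. This is a necessary condition and does not by itself establish the isomorphism. All function names are illustrative.

```python
def collatz_step(n):
    """Shortcut 3n+1 map (assumed form): n/2 if even, (3n+1)/2 if odd."""
    return n // 2 if n % 2 == 0 else (3 * n + 1) // 2


def collatz_graph_mod(k):
    """Successor sets of the Collatz graph on residue classes modulo 2^k."""
    modulus = 2 ** k
    successors = {a: set() for a in range(modulus)}
    for a in range(modulus):
        # T(n) mod 2^k depends only on n mod 2^(k+1), so the two
        # representatives a and a + 2^k cover every possible image.
        for n in (a, a + modulus):
            successors[a].add(collatz_step(n) % modulus)
    return successors


def degrees_match_de_bruijn(k):
    """True if every vertex has out-degree 2 and in-degree 2."""
    successors = collatz_graph_mod(k)
    in_degree = {a: 0 for a in successors}
    for targets in successors.values():
        for b in targets:
            in_degree[b] += 1
    out_ok = all(len(targets) == 2 for targets in successors.values())
    in_ok = all(d == 2 for d in in_degree.values())
    return out_ok and in_ok


for k in range(1, 9):
    print(k, degrees_match_de_bruijn(k))
```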
Submitted 16 September, 2012;
originally announced September 2012.
-
Dynamic Traitor Tracing for Arbitrary Alphabets: Divide and Conquer
Authors:
Thijs Laarhoven,
Jan-Jaap Oosterwijk,
Jeroen Doumen
Abstract:
We give a generic divide-and-conquer approach for constructing collusion-resistant probabilistic dynamic traitor tracing schemes with larger alphabets from schemes with smaller alphabets. This construction offers a linear tradeoff between the alphabet size and the codelength. In particular, we show that applying our results to the binary dynamic Tardos scheme of Laarhoven et al. leads to schemes that are shorter by a factor equal to half the alphabet size. Asymptotically, these codelengths correspond, up to a constant factor, to the fingerprinting capacity for static probabilistic schemes. This gives a hierarchy of probabilistic dynamic traitor tracing schemes, and bridges the gap between the low bandwidth, high codelength scheme of Laarhoven et al. and the high bandwidth, low codelength scheme of Fiat and Tassa.
Submitted 26 September, 2012; v1 submitted 28 June, 2012;
originally announced June 2012.
-
Dynamic Tardos Traitor Tracing Schemes
Authors:
Thijs Laarhoven,
Jeroen Doumen,
Peter Roelse,
Boris Skoric,
Benne de Weger
Abstract:
We construct binary dynamic traitor tracing schemes, where the number of watermark bits needed to trace and disconnect any coalition of pirates is quadratic in the number of pirates, and logarithmic in the total number of users and the error probability. Our results improve upon results of Tassa, and our schemes have several other advantages, such as being able to generate all codewords in advance, a simple accusation method, and flexibility when the feedback from the pirate network is delayed.
Submitted 25 January, 2013; v1 submitted 15 November, 2011;
originally announced November 2011.
-
Optimal symmetric Tardos traitor tracing schemes
Authors:
Thijs Laarhoven,
Benne de Weger
Abstract:
For the Tardos traitor tracing scheme, we show that by combining the symbol-symmetric accusation function of Skoric et al. with the improved analysis of Blayer and Tassa we get further improvements. Our construction gives codes that are up to 4 times shorter than Blayer and Tassa's, and up to 2 times shorter than the codes from Skoric et al. Asymptotically, we achieve the theoretical optimal codelength for Tardos' distribution function and the symmetric score function. For large coalitions, our codelengths are asymptotically about 4.93% of Tardos' original codelengths, which also improves upon results from Nuida et al.
Submitted 15 November, 2011; v1 submitted 18 July, 2011;
originally announced July 2011.